
12 posts tagged with "weekly"


· 4 min read

Databend is a modern cloud data warehouse that serves your massive-scale analytics needs at low cost and complexity. It is an open-source alternative to Snowflake. Also available in the cloud: https://app.databend.com.

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Meta

  • add databend-meta config grpc_api_advertise_host (#9835)

AST

  • select from stage with files/pattern (#9877)
  • parse decimal type (#9894)

Expression

  • add Decimal128 and Decimal256 type (#9856)

Functions

  • support array_indexof (#9840)
  • support array function array_unique, array_distinct (#9875)
  • support array aggregate functions (#9903)

Query

  • add column id in TableSchema; use column id instead of index when read and write data (#9623)
  • support view in system.columns (#9853)

Storage

  • ParquetTable support topk optimization (#9824)

Sqllogictest

  • leverage sqllogictest to benchmark tpch (#9887)

Code Refactoring 🎉

Meta

  • remove obsolete meta service api read_msg() and write_msg() (#9891)
  • simplify UserAPI and RoleAPI by introducing a method update_xx_with(id, f: FnOnce) (#9921)

Cluster

  • split exchange source to reader and deserializer (#9805)
  • split and eliminate the status for exchange transform and sink (#9910)

Functions

  • rename some array functions add array_ prefix (#9886)

Query

  • TableArgs preserve info of positioned and named args (#9917)

Storage

  • ParquetTable list file in read_partition (#9871)

Build/Testing/CI Infra Changes 🔌

  • support for running benchmark on PRs (#9788)

Bug Fixes 🔧

Functions

  • fix nullable and or domain cal (#9928)

Planner

  • fix slow planner when ndv error backtrace (#9876)
  • fix order by contains aggregation function (#9879)
  • prevent panic when delete with subquery (#9902)

Query

  • fix insert default value datatype (#9816)

What's On In Databend

Stay connected with the latest news about Databend.

Why You Should Try Sccache

Sccache is a ccache-like project started by the Mozilla team, supporting C/C++, Rust, and other languages, and storing caches locally or in a cloud storage backend. Native support for the GitHub Actions Cache Service was first added to Sccache in version 0.3.3 and then improved in v0.4.0-pre.6, so it can now be used in production CI.

Now opendal, open-sourced by Datafuse Labs, acts as a storage access layer for sccache to interface with various storage services (s3/gcs/azblob, etc.).

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Try using build-info

To get information about git commits, build options and credits, we now use vergen and cargo-license.

build-info can collect build information about your Rust crate. It might be possible to use it to refactor the relevant logic in common-building.

pub struct BuildInfo {
    pub timestamp: DateTime<Utc>,
    pub profile: String,
    pub optimization_level: OptimizationLevel,
    pub crate_info: CrateInfo,
    pub compiler: CompilerInfo,
    pub version_control: Option<VersionControl>,
}

Issue 9874: Refactor: Try using build-info

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, dependabot[bot], drmingdrmer, everpcpc, flaneur2020, johnhaxx7, leiysky, lichuang, mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li, TCeason, Xuanwo, xudong963, youngsofun, zhang2014

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

· 4 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

AST

  • add syntax about parsing presign options with content type (#9771)

Format

  • add TSV file format back (#9732)

Functions

  • support array functions prepend and append (#9844)
  • support array concat (#9804)

Query

  • add topn runtime filter in native storage format (#9738)
  • enable hashtable state pass from partial to final (#9809)

Storage

  • add pruning stats to EXPLAIN (#9724)
  • cache bloom index object (#9712)

Code Refactoring 🎉

  • 'select from stage' use ParquetTable (#9801)

Meta

  • expose a single "kvapi" as public interface (#9791)
  • do not remove the last node from a cluster (#9781)

AST/Expression/Planner

  • unify Span and Result (#9713)

Executor

  • merge simple pipe and resize pipe (#9782)

Bug Fixes 🔧

Base

  • fix not linux and macos jemalloc fallback to std (#9786)

Config

  • fix table_meta_cache can't be disabled (#9767)

Meta

  • when import data to meta-service dir, the specified "id" has to be one of the "initial_cluster" (#9755)

Query

  • fix and refactor aggregator (#9748)
  • fix memory leak for data port (#9762)
  • fix panic when cast jsonb to string (#9813)

Storage

  • fix up max_file_size may oom (#9740)

What's On In Databend

Stay connected with the latest news about Databend.

DML Command - UPDATE

Modifies rows in a table with new values.

Note: Databend guarantees data integrity. In Databend, Insert, Update, and Delete operations are guaranteed to be atomic, which means that all data in the operation must succeed or all must fail.

Syntax

UPDATE <table_name>
SET <col_name> = <value> [ , <col_name> = <value> , ... ]
[ FROM <table_name> ]
[ WHERE <condition> ]
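
Here is a minimal sketch of the syntax above in action; the table, columns, and values are hypothetical and only illustrate the statement shape:

-- Create and seed a sample table
CREATE TABLE employees (id INT, name VARCHAR, salary INT);
INSERT INTO employees VALUES (1, 'Alice', 1000), (2, 'Bob', 2000);

-- Give Bob a raise; only rows matching the WHERE clause are modified
UPDATE employees SET salary = salary + 500 WHERE name = 'Bob';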

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Support Arrow Flight SQL Protocol

Currently Databend supports the MySQL protocol, and it would be great if Databend could support the Arrow Flight SQL protocol as well.

A lakehouse typically stores data in Parquet files. Over the MySQL protocol, Databend has to deserialize Parquet into Arrow and then convert it again into MySQL data types; on the client side, users work with data frames or MySQL result iterators, which requires yet another round of serialization. With Arrow Flight SQL, all of this back-and-forth serialization cost can be avoided.

Issue 9832: Feature: Support Arrow Flight SQL protocol

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, dantengsky, dependabot[bot], drmingdrmer, everpcpc, flaneur2020, johnhaxx7, leiysky, mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li, TCeason, Xuanwo, youngsofun, yufan022, zhang2014

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

· 4 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

SQL

  • eliminate extra group by scalars (#9708)

Query

  • add privilege check for insert/delete/optimize (#9664)
  • enable empty projection (#9675)
  • add aggregate limit in final aggregate stage (#9716)
  • add optional column names to create/alter view statement (#9715)

Storage

  • add prewhere support in native storage format (#9600)

Code Refactoring 🎉

IO

  • move io constants to common/io (#9700)
  • refine fuse/io/read (#9711)

Planner

  • rename Scalar to ScalarExpr (#9665)

Storage

  • refactor cache layer (#9672)
  • pruner.rs -> fuse_bloom_pruner.rs (#9710)
  • make pruner hierarchy to chain (#9714)

Build/Testing/CI Infra Changes 🔌

  • support setup minio storage & external s3 storage in docker image (#9676)

Bug Fixes 🔧

Expression

  • fix missing simple_cast (#9671)

Query

  • fix efficiently_memory_final_aggregator result is not stable (#9685)
  • fix max_result_rows only limit output results nums (#9661)
  • fix query hang in two level aggregator (#9694)

Storage

  • may get wrong datablocks if not sorted by output schema (#9470)
  • bloom filter is using wrong cache key (#9706)

What's On In Databend

Stay connected with the latest news about Databend.

Databend All-in-One Docker Image

Databend Docker Image now supports setting up MinIO storage and external AWS S3 storage.

Now you can easily use a Docker image for your first experiment with Databend.

Run with MinIO as backend

docker run \
    -p 8000:8000 \
    -p 9000:9000 \
    -e MINIO_ENABLED=true \
    datafuselabs/databend

Run with self-managed query config

docker run \
    -p 8000:8000 \
    -e DATABEND_QUERY_CONFIG_FILE=/etc/databend/mine.toml \
    -v query_config_file:/etc/databend/mine.toml \
    datafuselabs/databend

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Vector search captures the meaning and context of unstructured data and is commonly used for text or image processing. It uses semantics to find similar results and returns more relevant results than traditional keyword retrieval.

Databend plans to provide users with a richer and more efficient means of querying by supporting vector search, and the introduction of Faiss Index may be an initial solution.

Issue 9699: feat: vector search (Faiss index)

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, dantengsky, dependabot[bot], everpcpc, flaneur2020, johnhaxx7, leiysky, mergify[bot], PsiACE, RinChanNOWWW, sandflee, sundy-li, xudong963, zhang2014, zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

· 4 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Meta

  • use expression::TableSchema to replace obsolete datavalues::DataSchema (#9506)
  • iter() iterates every tree and every record in these trees (#9621)

Expression

  • add other base geo functions (#9588)

Optimizer

  • improve cardinality estimation for join based on histogram (#9594)

Planner

  • improve join reorder algorithm (#9571)

Query

  • support insert with placeholder (#9575)
  • set setting support expr (#9574)
  • add information_schema for sharding-jdbc (#9583)
  • support named params for table functions (#9630)

Storage

  • read_parquet page index (#9563)
  • update interpreter and storage support (#9261)

Code Refactoring 🎉

  • refine on_error mode (#9473)

Meta

  • remove unused meta types and conversion util (#9584)

Parser

  • more strict parser for format_options (#9635)

Expression

  • rearrange common_expression and common_function (#9585)

Build/Testing/CI Infra Changes 🔌

  • run sqllogictests with binary (#9603)

Bug Fixes 🔧

Expression

  • constant folder should run repeatedly until stable (#9572)
  • check_date() and to_string(boolean) may panic (#9561)

Planner

  • fix stack overflow when applying RuleFilterPushDownJoin (#9645)

Storage

  • fix range filter read stat with index (#9619)

Sqllogictest

  • sqllogic test hangs (cluster mode + clickhouse handler) (#9615)

What's On In Databend

Stay connected with the latest news about Databend.

Upgrade Databend Query from 0.8 to 0.9

Databend-query 0.9 introduces incompatible changes in metadata, which have to be migrated manually. Databend provides a program for this job: databend-meta-upgrade-09, which you can find in a release package or build from source.

Upgrade

databend-meta-upgrade-09 --cmd upgrade --raft-dir "<./your/raft-dir/>"

Learn More

Release Proposal: Nightly v1.0

The call for proposals for the release of v1.0 is now open.

The preliminary plan is to release in March, mainly focusing on alter table, update, and group by spill.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Add Type Checker for Sqllogictest

We can check whether the type of each element in every row is correct.

databend/tests/sqllogictests/src/client/mysql_client.rs

// TODO: add types to compare
Ok(DBOutput::Rows {
    types,
    rows: parsed_rows,
})

Issue 9647: Feature: Add type checker for sqllogictest

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

We're gearing up for the v0.9 release of Databend. Stay tuned.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, dantengsky, drmingdrmer, everpcpc, leiysky, mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li, TCeason, Xuanwo, xudong963, youngsofun, yufan022, zhang2014, zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

· 4 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Meta

  • add reader-min-msg-ver and msg-min-reader-ver in proto-conv (#9535)

Planner

  • support tuple.1 and get(1)(tuple) (#9493)
  • support display estimated rows in EXPLAIN (#9528)

Query

  • efficiently memory two level group by in standalone mode (#9504)

Storage

  • support nested type in read_parquet (#9486)
  • add build options table (#9502)

Code Refactoring 🎉

  • merge new expression (#9411)
  • remove and rename crates (#9481)
  • bump rust version (#9540)

Expression

  • move negative functions to binder (#9484)
  • use error_to_null() to eval try_cast (#9545)

Functions

  • replace h3ron to h3o (#9553)

Format

  • extract AligningStateTextBased (#9472)
  • richer error context (#9534)

Query

  • use ctx to store the function evaluation error (#9501)
  • refactor map access to support view read tuple inner (#9516)

Storage

  • bump opendal for streaming read support (#9503)
  • refactor bloom index to use vectorized siphash function (#9542)

Bug Fixes 🔧

HashTable

  • fix memory leak for unsized hash table (#9551)

Storage

  • fix row group stats collection (#9537)

What's On In Databend

Stay connected with the latest news about Databend.

New Year, New Expression!

We're so thrilled to tell you that Databend now fully works with New Expression after more than half a year of dedicated work. New Expression introduces a formal type system to Databend and supports type-safe downward casting, making the definition of functions easier.

New Expression is still being tuned, and a new version (v0.9) of Databend will be released once the tuning work is complete.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

UNNEST Function

The UNNEST function takes an array as a parameter, and returns a table containing each element of the array in a row.

Syntax

UNNEST(ARRAY) [WITH OFFSET]
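
Since the function is still at the proposal stage, the following is only a sketch of the intended behavior, not a confirmed implementation:

-- Expected to expand the array into one row per element: 1, 2, 3
SELECT * FROM UNNEST([1, 2, 3]);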

If you're interested in becoming a contributor, helping us develop the UNNEST function would be a good start.

Issue 9549: Feature: Support unnest

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

We're gearing up for the v0.9 release of Databend. Stay tuned.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, ClSlaid, dantengsky, dependabot[bot], drmingdrmer, everpcpc, flaneur2020, leiysky, mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li, TCeason, wubx, Xuanwo, xudong963, youngsofun, zhang2014

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

· 4 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Format

  • basic output format JSON (#9447)

Query

  • check connection params (#9437)
  • add max_query_row_nums (#9406)

Storage

  • support prewhere in hive (#9427)
  • add generic cache trait for different object reader (#9436)
  • add metrics for new cache (#9445)

New Expression

  • migrate hash func to func-v2 (#9402)

Sqllogictest

  • run all tests in parallel (#9400)

Code Refactoring 🎉

Storage

  • add to_bytes and from_bytes for CachedObject (#9439)
  • better table-meta and parquet reader function (#9434)
  • convert fuse_snapshot unit tests to sqllogic test (#9428)

Bug Fixes 🔧

Format

  • catch unwind when read split (#9420)

Planner

  • create Stage URL's path should end with / (#9450)

What's On In Databend

Stay connected with the latest news about Databend.

Databend 2022 Recap

Let's look back and see how Databend did in 2022.

  • Open source: got 2,000+ stars, merged 2,400+ PRs, and solved 1,900 issues.
  • From data warehouse to lakehouse: Brand-new design with enhanced capabilities.
  • Rigorous testing: SQL Logic Tests, SQLancer, and https://perf.databend.rs.
  • Building the ecosystem: More customers chose, trusted, and grew with Databend, including Kuaishou and SAP.
  • Databend Cloud: Built on top of Databend, the next big data analytics platform.

We wish everyone a Happy New Year and look forward to engaging with you.

Learn More

Databend 2023 Roadmap

As the new year approaches, Databend is also actively planning its roadmap for 2023.

We will continue to polish the Planner and work on data and query caching. Addressing storage and query issues for PB-level data volumes is also on our list.

Try Databend and join the roadmap discussion.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Profile-Guided Optimization (PGO)

The basic concept of PGO is to collect data about the typical execution of a program (e.g. which branches it is likely to take) and then use this data to inform optimizations such as inlining, machine-code layout, register allocation, etc.

rustc supports doing profile-guided optimization (PGO). We expect to be able to use it to enhance the build.

Issue 9387: Feature: Add PGO Support

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

ariesdevil, BohuTANG, dantengsky, dependabot[bot], everpcpc, flaneur2020, hantmac, leiysky, mergify[bot], PsiACE, sandflee, soyeric128, sundy-li, TCeason, Xuanwo, xudong963, youngsofun, zhang2014

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

· 6 min read

The year is coming to an end, and Databend is about to enter its third year. Before we count down to the new year, it's a good idea to look back and see how Databend did in 2022.

Open Source: Receiving Increased Attention

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

The open-source philosophy has guided Databend from the very beginning. The entire team works seamlessly on GitHub where the Rust community and many data pros are fully involved. In 2022, the Databend repository:

  • Got 2,000+ stars, totaling 5,000.
  • Merged 2,400+ PRs, totaling 5,600.
  • Solved 1,900 issues, totaling 3,000.
  • Received 16,000 commits, totaling 23,000.
  • Attracted more contributors, totaling 138.

Development: Inspired by Real Scenarios

Databend brought many new features and improvements in 2022 to help customers with their real work scenarios.

Brand-New Data Warehouse

As a data warehouse inspired by and benchmarking itself against Snowflake and Clickhouse, Databend fully took advantage of "Cloud Native" to bring you a new design and implementation without breaking the balance between performance and maintainability:

  • Added support for Stage and Data Sharing, helping users manage their data life cycle with more options.
  • Introduced a new planner with user-friendly error prompts and efficient optimization techniques for the execution plan.
  • Redesigned the type system to support type checking and type-safe downward casting.
  • Enhanced the new processor framework: It can now work in both Pull and Push modes.
  • Added experimental support for Native Format to improve performance when running on a local disk.

Databend as Lakehouse

Storing and managing massive data is key to our vision of "Databend as Lakehouse". A lot of effort went into supporting a larger data payload and a wider range of accepted data sources in 2022:

  • Adopted OpenDAL in the data access layer as a unified interface.
  • Expanded support for structured and semi-structured data.
  • Added the ability to keep multiple catalogs: This makes integrations with custom catalogs such as Hive much easier.
  • Added the ability to query data directly from a local, staged, or remote file.

Optimal Efficiency Ratio

After a year of continuous tuning, we brought Databend to a new stage featuring elastic scheduling and separating storage from compute. We're thrilled to see a significant improvement in the efficiency ratio:

  • In some scenarios, Databend works as efficiently as Clickhouse.
  • Lowered costs by 90% compared to Elasticsearch, and by 30% compared to Clickhouse.

Testing: Put Us at Ease

Comprehensive tests help make a database management system robust. While optimizing performance, we also care about the accuracy and reproducibility of SQL results returned from Databend.

Correctness Testing

In 2022, we first replaced stateless tests with SQL Logic Tests for Databend and migrated a large number of mature test cases to cover as many scenarios as possible. Afterward, we switched from the previous Python test program to a Rust native one called sqllogictest-rs, which saved us a lot of time on CI without losing the maintainability of the tests.

Furthermore, we planned and implemented three types of automated testing (TLP, PQS, and NoREC) supported by SQLancer. All of them have been successfully merged into the main branch with dozens of bug fixes.

Performance Testing

Performance testing is also essential for us. In 2022, we launched a website (https://perf.databend.rs/) to track daily performance changes and spot potential issues. Meanwhile, we actively evaluated Databend against Clickbench and some other benchmarks.

Ecosystem: Give and Take

The Databend ecosystem and users benefit from each other. More and more users were attracted to the ecosystem and joined the community in 2022. As they brought their own creative ideas to Databend and made them come true, the Databend ecosystem made tremendous progress and started to flourish in the field.

Positive Expansion

We build and value the Databend ecosystem. Databend is now compatible with the MySQL protocol and Clickhouse HTTP Handler, and can seamlessly integrate with the following data services or utilities:

  • Airbyte
  • DBT
  • Addax (Datax)
  • Vector
  • Jupyter Notebook
  • DBeaver

To help users develop and customize services based on Databend, we developed drivers in multiple languages, including Python and Go.

Growing with Users

Users are the basis of Databend. They help develop Databend and stir up the whole community.

In 2022, Databend added support for the Hive Catalog with the help of Kuaishou Technology. This connected Databend to the Hive ecosystem and encouraged us to consider the possibility of multiple catalogs. DMALL implemented and verified data archiving with Databend. We also appreciate SHAREit, Voyance, DigiFinex, and Weimob for their trust and support.

The Databend ecosystem includes a few projects that are loved and trusted by other products:

  • OpenDAL now manages the data access layer for sccache, which provides further support for Firefox CI. Other database and data analysis projects, such as GreptimeDB and deepeth/mars, also used OpenDAL for data access.
  • OpenRaft was used to implement a Feature Registry (a database to hold feature metadata) in Azure/Feathr. SAP, Huobi, and Meituan also used it in a few internal projects.
  • The MySQL protocol implementation in OpenSrv has been applied to multiple database projects such as GreptimeDB and CeresDB.

Knowledge Sharing

In 2022, the Databend community launched the "Data Infra Club" for knowledge sharing. Our friends from PingCAP, Kuaishou Technology, DMALL, and SHAREit were invited to share their insights on big data platforms, Data Mesh, and Modern Data Stack. You can find all the video replays on Bilibili if you're interested.

Going Cloud: Sky's the Limit

Going cloud is part of Databend's business strategy, as most Databend users come from the cloud.

Built on top of Databend, Databend Cloud is a next-generation big data analytics platform featuring ease of use, low cost, and high performance. Two versions of Databend Cloud are now available and open for trial.

· 4 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Meta

  • remove stream when a watch client is dropped (#9334)

Planner

  • support selectivity estimation for range predicates (#9398)

Query

  • support copy on error (#9312)
  • support databend-local (#9282)
  • external storage support location part prefix (#9381)

Storage

  • rangefilter support in (#9330)
  • try to improve object storage io read (#9335)
  • support table compression (#9370)

Metrics

  • add more metrics for fuse compact and block write (#9399)

Sqllogictest

  • add no-fail-fast support (#9391)

Code Refactoring 🎉

*

  • adopt rustls entirely, removing all deps to native-tls (#9358)

Format

  • remove format_xxx settings (#9360)
  • adjust interface of FileFormatOptionsExt (#9395)

Planner

  • remove SyncTypeChecker (#9352)

Query

  • split fuse source to read data and deserialize (#9353)
  • avoid io copy in read parquet data (#9365)
  • add uncompressed buffer for parquet reader (#9379)

Storage

  • add read/write settings (#9359)

Bug Fixes 🔧

Format

  • fix align_flush with header only (#9327)

Settings

  • use logical CPU number as default value of num_cpus (#9396)

Processors

  • the data type on both sides of the union does not match (#9361)

HTTP Handler

  • false alarm (warning log) about query not exists (#9380)

Sqllogictest

  • refactor sqllogictest http client and fix expression string like (#9363)

What's On In Databend

Stay connected with the latest news about Databend.

Introducing databend-local

Inspired by clickhouse-local, databend-local allows you to perform fast processing on local files without the need to launch a Databend cluster.

> export CONFIG_FILE=tests/local/config/databend-local.toml
> cargo run --bin=databend-local -- --sql="SELECT * FROM tbl1" --table=tbl1=/path/to/databend/docs/public/data/books.parquet

exec local query: SELECT * FROM tbl1
+------------------------------+---------------------+------+
| title                        | author              | date |
+------------------------------+---------------------+------+
| Transaction Processing       | Jim Gray            | 1992 |
| Readings in Database Systems | Michael Stonebraker | 2004 |
| Transaction Processing       | Jim Gray            | 1992 |
| Readings in Database Systems | Michael Stonebraker | 2004 |
+------------------------------+---------------------+------+
4 rows in set. Query took 0.015 seconds.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Compressing Short Strings

When processing the same queries with short strings involved, Databend usually reads more data than other databases, such as Snowflake.

SELECT SearchPhrase, MIN(URL), COUNT(*) AS c FROM hits WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;

Such queries might be more efficient if short strings (URLs, etc.) were compressed.

Issue 9001: performance: compressing for short strings

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, BohuTANG, dantengsky, drmingdrmer, eastfisher, everpcpc, leiysky, mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li, Xuanwo, xudong963, youngsofun, zhang2014, zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

· 4 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Multiple Catalogs

  • implement show tables (from|in catalog.database) (#9153)

Planner

  • introduce histogram in column statistics (#9310)

Query

  • support attaching stage for insert values (#9249)
  • add native format in fuse table (#9279)
  • add internal_enable_sandbox_tenant config and sandbox_tenant (#9277)

Sqllogictest

  • introduce rust native sqllogictest framework (#9150)

Code Refactoring 🎉

*

  • unify apply_file_format_options for copy & insert (#9323)

IO

  • remove unused code (#9266)

Meta

  • test watcher count (#9324)

Planner

  • replace TableContext in planner with PlannerContext (#9290)

Bug Fixes 🔧

Base

  • try fix SIGABRT when catch unwind (#9269)
  • replace #[thread_local] to thread_local macro (#9280)

Query

  • fix unknown database in query without relation to this database (#9250)
  • fix wrong current_role when drop the role (#9276)

What's On In Databend

Stay connected with the latest news about Databend.

Introduced a Rust Native Sqllogictest Framework

Sqllogictest verifies the results returned from a SQL database engine by comparing them with the results of other engines for the same queries.

In the past, Databend ran such tests using a program written in Python and migrated a large number of test cases from other popular databases. We recently reimplemented the program with sqllogictest-rs.

Learn More

Experimental: Native Format

PA is a native storage format based on Apache Arrow. Similar to Arrow IPC, PA aims at optimizing the storage layer.

Databend is introducing PA as a native storage format in the hope of getting a performance boost, though it's still at an early stage of development.

create table tmp (a int) ENGINE=FUSE STORAGE_FORMAT='native';

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Checking File Existence Before Returning Presigned URL

When presigning a file, Databend currently returns a potentially valid URL based on the filename without checking whether the file really exists. Thus, a 404 error might occur if the file doesn't exist at all.
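
For context, this is roughly how presigning is used today; the stage and file names below are hypothetical:

-- Generate a presigned URL for downloading a staged file
PRESIGN DOWNLOAD @my_stage/data.csv;
-- If data.csv does not actually exist in @my_stage, fetching the returned URL yields a 404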

Issue 8702: Before return presign url add file exist judgement

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

ariesdevil, b41sh, BohuTANG, ClSlaid, drmingdrmer, everpcpc, leiysky, mergify[bot], PsiACE, sandflee, soyeric128, sundy-li, Xuanwo, xudong963, youngsofun, zhang2014, ZhiHanZ, zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

· 5 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Multiple Catalogs

  • extends show databases SQL (#9152)

Stage

  • support select from URI (#9247)

Streaming Load

  • support file_format syntax in streaming load insert sql (#9063)

Planner

  • push down limit to union (#9210)

Query

  • use analyze table instead of optimize table statistic (#9143)
  • fast parse insert values (#9214)

Storage

  • use distinct count calculated by the xor hash function (#9159)
  • read_parquet read meta before read data (#9154)
  • push down filter to parquet reader (#9199)
  • prune row groups before reading (#9228)

Open Sharing

  • add prototype open sharing and add sharing stateful tests (#9177)

Code Refactoring 🎉

*

  • simplify the global data registry logic (#9187)

Storage

  • refactor deletion (#8824)

Build/Testing/CI Infra Changes 🔌

  • release databend deb package and databend with hive (#9138, #9241, etc.)

Bug Fixes 🔧

Format

  • support ASCII control code hex as format field delimiter (#9160)

Planner

  • prewhere_column empty and predicate is not const will return empty (#9116)
  • don't push down topk to Merge when its child is Aggregate (#9183)
  • fix nullable column validity not equal (#9220)

Query

  • address unit test hang on test_insert (#9242)

Storage

  • too many io requests for read blocks during compact (#9128)
  • collect orphan snapshots (#9108)

What's On In Databend

Stay connected with the latest news about Databend.

Breaking Change: Unified File Format Options

To simplify, we're rolling out a set of unified file format options as follows for the COPY INTO command, the Streaming Load API, and all the other cases where users need to describe their file formats:

[ FILE_FORMAT = ( TYPE = { CSV | TSV | NDJSON | PARQUET | XML} [ formatTypeOptions ] ) ]
  • Please note that the current format options starting with format_* will be deprecated.
  • ... FORMAT CSV ... will still be accepted by the ClickHouse handler.
  • Support for customized formats created by CREATE FILE FORMAT ... will be added in a future release: ... FILE_FORMAT = (format_name = 'MyCustomCSV') ....
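
For example, a COPY INTO statement written with the unified options might look like this; the table and stage names are hypothetical:

COPY INTO my_table
FROM @my_stage
PATTERN = '.*[.]csv'
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);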

Learn More

Open Sharing

Open Sharing is a simple and secure data-sharing protocol designed for databend-query nodes running in a multi-cloud environment.

  • Simple & Free: Open Sharing is open-source and basically a RESTful API implementation.
  • Secure: Open Sharing verifies incoming requesters' identities and access permissions, and provides an audit log.
  • Multi-Cloud: Open Sharing supports a variety of public cloud platforms, including AWS, Azure, GCP, etc.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

We're about to run stage-related tests again using the Streaming Load API to move files to a stage, instead of using an AWS command like this:

aws --endpoint-url ${STORAGE_S3_ENDPOINT_URL} s3 cp s3://testbucket/admin/data/ontime_200.csv s3://testbucket/admin/stage/internal/s1/ontime_200.csv >/dev/null 2>&1

This is because Databend users do not need to care about, and may not even know, the stage paths that the AWS command requires.

Issue 8528: refactor stage related tests

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

ariesdevil, b41sh, BohuTANG, Chasen-Zhang, ClSlaid, dantengsky, drmingdrmer, hantmac, lichuang, mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li, wubx, Xuanwo, xudong963, youngsofun, ZhiHanZ, zhyass, zzzdong

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

· 4 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Planner

  • optimize topk in cluster mode (#9092)

Query

  • support select * exclude [column_name | (col_name, col_name,...)] (#9009)
  • alter table flashback (#8967)
  • new table function read_parquet to read parquet files as a table (#9080)
  • support select * from @stage (#9123)

Storage

  • cache policy (#9062)
  • support hive nullable partition (#9064)

Code Refactoring 🎉

Memory Tracker

  • keep tracker state consistent (#8973)

REST API

  • drop ctx after query finished (#9091)

Bug Fixes 🔧

Configs

  • add more tests for hive config loading (#9074)

Planner

  • try to fix table name case sensitivity (#9055)

Functions

  • vector_const like bug fix (#9082)

Storage

  • update last_snapshot_hint file when purge (#9060)

Cluster

  • try fix broken pipe or connect reset (#9104)

What's On In Databend

Stay connected with the latest news about Databend.

RESTORE TABLE

Using the snapshot ID or timestamp you specify in the command, Databend restores the table to a prior state where the snapshot was created. To retrieve snapshot IDs and timestamps of a table, use FUSE_SNAPSHOT.

-- Restore with a snapshot ID
ALTER TABLE <table> FLASHBACK TO (SNAPSHOT => '<snapshot-id>');
-- Restore with a snapshot timestamp
ALTER TABLE <table> FLASHBACK TO (TIMESTAMP => '<timestamp>'::TIMESTAMP);
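
Putting the two together, a typical flow might look like this; the database and table names are hypothetical:

-- Retrieve snapshot IDs and timestamps of the table
SELECT snapshot_id, timestamp FROM FUSE_SNAPSHOT('my_db', 'my_table');
-- Then flash the table back to the chosen snapshot
ALTER TABLE my_db.my_table FLASHBACK TO (SNAPSHOT => '<snapshot-id>');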

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Adding Build Information to Error Report

An error report currently only contains an error code and some information about why the error occurred. When build information is available, troubleshooting will become easier.

"Code: xx. Error: error msg... (version ...)"

Issue 9117: Add Build Information to the Error Report

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, b41sh, BohuTANG, dantengsky, drmingdrmer, everpcpc, lichuang, mergify[bot], PsiACE, RinChanNOWWW, sandflee, soyeric128, sundy-li, TCeason, Xuanwo, xudong963, youngsofun, zhang2014, ZhiHanZ

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

· 5 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Format

  • better checking of format options (#8981)
  • add basic schema infer for parquet (#9043)

Query

  • QualifiedName support 'db.table.*' and 'table.*' (#8965)
  • support bulk insert without expression (#8966)

Storage

  • add cache layer for fuse engine (#8830)
  • add system table system.memory_statistics (#8945)
  • add optimize statistic ddl support (#8891)

Code Refactoring 🎉

Base

  • remove common macros (#8936)

Format

  • TypeDeserializer get rid of FormatSetting (#8950)

Planner

  • refactor extract or predicate (#8951)

Processors

  • optimize join by merging build data block (#8961)

New Expression

  • allow sparse column id in chunk, redo #8789 with a new approach. (#9008)

Bug Fixes 🔧

Base

  • try fix lost tracker (#8932)

Meta

  • fix share db bug, create DatabaseIdToName if need (#9006)

MySQL Handler

  • fix mysql conns leak (#8894)

Processors

  • try fix update list memory leak (#9023)

Storage

  • read and write block in parallel when compact (#8921)

What's On In Databend

Stay connected with the latest news about Databend.

Infer Schema at a Glance

You usually need to create a table before loading data from a file stored on a stage or elsewhere. Unfortunately, you might not know the file's schema, or you might be unable to input the schema due to its complexity.

Introducing the capability to infer schema from an existing file will make the work much easier. You will even be able to query data directly from a stage using a SELECT statement like select * from @my_stage.

INFER 's3://mybucket/data.csv' FILE_FORMAT = ( TYPE = CSV );
+-------------+---------+----------+
| COLUMN_NAME | TYPE    | NULLABLE |
|-------------+---------+----------|
| CONTINENT   | TEXT    | True     |
| COUNTRY     | VARIANT | True     |
+-------------+---------+----------+

We've added support for inferring the basic schema from parquet files in #9043, and we're now working on #7211 to implement select from @stage.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Add TLS Support for MySQL Handler

opensrv-mysql v0.3.0, which was released recently, includes support for TLS. It sounds like a good idea to introduce it to Databend.

let (is_ssl, init_params) = opensrv_mysql::AsyncMysqlIntermediary::init_before_ssl(
    &mut shim,
    &mut r,
    &mut w,
    &Some(tls_config.clone()),
)
.await
.unwrap();

opensrv_mysql::secure_run_with_options(shim, w, ops, tls_config, init_params).await

Issue 8983: Feature: tls support for mysql handler

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, dantengsky, drmingdrmer, everpcpc, flaneur2020, leiysky, lichuang, mergify[bot], PsiACE, sandflee, soyeric128, sundy-li, TCeason, TracyZYJ, Xuanwo, xudong963, youngsofun, yufan022, zhang2014, zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.