Releases: duckdb/duckdb
0.9.0 Preview Release "Undulata"
This preview release of DuckDB is named "Undulata" after the aptly named Yellow-billed duck native to Africa.
Note: Again, this release introduces a backwards-incompatible change to the on-disk storage format. We suggest you use the EXPORT DATABASE command with the old version followed by IMPORT DATABASE with the new version to migrate your data. See the documentation for details.
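The migration described in the note can be sketched as two statements, one per DuckDB version. This is a minimal illustration, not an official tool: the helper names `sql_string_literal` and `migration_statements` and the directory name `duckdb_export` are hypothetical, and the sketch only builds the SQL text; it does not run it.

```python
def sql_string_literal(text: str) -> str:
    """Quote text as a SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def migration_statements(export_dir: str) -> dict:
    """Return the statement to run under each DuckDB version (hypothetical helper).

    "old_version": run with the old DuckDB while connected to the existing database.
    "new_version": run with the new DuckDB against a fresh database.
    """
    target = sql_string_literal(export_dir)
    return {
        "old_version": f"EXPORT DATABASE {target};",
        "new_version": f"IMPORT DATABASE {target};",
    }


# "duckdb_export" is an assumed directory name chosen for illustration.
statements = migration_statements("duckdb_export")
print(statements["old_version"])
print(statements["new_version"])
```

Run the first statement with the old binary and the second with the new one; the exported directory is the only thing shared between the two versions.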
What's Changed
- [Dev] Merge master into feature by @Tishj in #7535
- Issue #7563: make_timestamptz by @hawkfish in #7597
- Add support for nested laterals by @arhamchopra in #7528
- Issue #7563: epoch_us(temporal) by @hawkfish in #7629
- Fix lingering clang-tidy issues by @Mytherin in #7670
- Add list_intersect, list_has_any, and list_has_all by @maiadegraaf in #7518
- Issue #7563: epoch_xs(temporal) by @hawkfish in #7648
- Pivot - dynamically switch between using filtered aggregates or the new pivot operator by @Mytherin in #7688
- Add wildcard to JSON Path by @lnkuiper in #7624
- [Dev] Add optional build flag to disable assertions in debug mode by @Tishj in #7618
- [DEV]: ICU C Casts by @hawkfish in #7715
- List_resize by @maiadegraaf in #7401
- Issue #7187: AsOf Join Performance by @hawkfish in #7607
- Some minor CI changes by @samansmink in #7763
- Binder coverage by @hawkfish in #7791
- Vacuum Completely Deleted Row Groups by @Mytherin in #7794
- Issue #7187: AsOf Coverage by @hawkfish in #7774
- Implement FIELD_IDS for parquet writes by @lnkuiper in #7696
- Optimize Regexp_matches to LIKE statements when possible by @Tmonster in #7264
- Jemalloc configuration, more buffer allocator, and remove redundant string copying in parquet dictionary by @lnkuiper in #7697
- Truncate Database File on Checkpoint by @Mytherin in #7824
- LEFT JOIN ON TRUE support by @taniabogatsch in #7821
- Issue #7809: Segment Tree Performance by @hawkfish in #7831
- C Data Interface: `duckdb_arrow_scan` and `duckdb_arrow_array_scan` by @angadn in #7570
- Update Julia to 0.8.1 by @Mytherin in #7932
- Add conn.interrupt() to DuckDB python API by @henrinikku in #7895
- renaming part of extension build refactor PR by @samansmink in #7926
- fix swapped x/y regression parameters by @MartinNowak in #7855
- [Docs] Aggregate function README.md by @hawkfish in #7881
- PhysicalPiecewiseMergeJoin improvement by @xuke-hat in #7832
- Initial set of commits to add support for zOS (an IBM mainframe operating system) by @v1gnesh in #7805
- test(nodejs): add test_all_types.test.ts by @Mause in #7740
- Issue #7879: Missing JDBC TIMESTAMP_TZ by @hawkfish in #7922
- Attempt to fix CI on Windows 32 and Python on Windows by @carlopi in #7961
- Fix 7947 by @lnkuiper in #7963
- test: patch test_7652 to skip on pyarrow<11 by @gforsyth in #7966
- NodeJS: Add `columns()` method to get type info from prepared statement by @Maxxen in #7948
- Fix: Don't free arrow children explicitly by @Maxxen in #7917
- CSV Rejects table by @Maxxen in #7681
- Issue #7809: Segment Tree Performance by @hawkfish in #7891
- Add tpch benchmark run exclusively on parquet files by @Tmonster in #7519
- Bidirectional check storage + minor CI fixes by @carlopi in #7955
- [Swift] fix #7985 by @tcldr in #7993
- Move @samansmink's extension_header_rename.patch by @carlopi in #8001
- [Python] Properly use NumPy array `stride` when scanning `object` arrays. by @Tishj in #7964
- CI - No longer run on PR synchronize, instead run on ready_for_review by @Mytherin in #8007
- Parallel pipeline execution should call NextBatch on first batch by @bleskes in #7978
- Micro-optimization for generating collation keys by @Krechals in #7983
- Multiple assignment for `UPDATE SET` by @nickgerrets in #7968
- CI job to move synchronized PRs to draft by @carlopi in #8010
- [ADBC] ConnectionGetTableSchema and StatementSetSubstraitPlan Functions by @pdet in #7914
- Issue #7852: Window Vectorisation by @hawkfish in #7996
- Moving JDBC Linux x64 builds to CentOS 7 by @hannes in #7991
- CI Draft - token is called GH_TOKEN by @Mytherin in #8016
- Add support for materialized CTEs by @kryonix in #7126
- Reduce memory usage of Parquet writer by @lnkuiper in #7995
- CI auto draft: pass token via environment + avoid wrapping action by @carlopi in #8024
- CI autodraft: use implicit variable [test] by @carlopi in #8027
- remove duplicate pivots declare by @douenergy in #7992
- Fix typo in fts indexing exception by @alexanderchiu in #8034
- Fix issue 7988 by @samansmink in #8023
- Delete DraftMe.yml by @Mytherin in #8048
- Fix 3eb9ab3: Remove unneeded move by @carlopi in #8038
- [CI] Skip many more CI jobs for pull requests, and add make coverage-check to run coverage locally by @Mytherin in #8046
- Extension build configuration refactor by @samansmink in #7735
- Compressed Materialization by @lnkuiper in #7644
- [Relation] Add support for creating an empty `ValueRelation` by @Tishj in #7967
- Join Order Optimizer has duplicate enumerations and lost some neighbors by @lokax in #7358
- Fix CI wasm by @carlopi in #8057
- [CI] More CI reduction and clean-up by @Mytherin in #8052
- Restore auto-draft functionality by @carlopi in #8058
- Move WebAssembly.yml to NightlyTests.yml by @carlopi in #8060
- Unskip, attach HTTPFS test, and create HTTPState when the opener is not available by @pdet in #8012
- CI fixes: Don't persist ccache for nightlies by @carlopi in #8075
- Fix regression & fix draft mechanism by @carlopi in #8071
- CI compliance feature branch by @carlopi in #8070
- Fix python flaky test (potentially GET requests gets back 403) by @carlopi in #8074
- [Arrow] Fix segfault in appending list data by @Tishj in #8042
- Issue #7852: Window Vectorisation by @hawkfish in #8050
- CONTRIBUTING.md by @carlopi in #8077
- Add ORDER BY clause to query in test_bool.test by @Flogex in #8082
- ART test and benchmark refactor by @taniabogatsch in #8055
- Update plan cost runner script to remove 20% threshold for cardinality estimates by @Tmonster in #7989
- Fix #8067 by @lnkuiper in #8090
- ART prefix refactor by @taniabogatsch in #7930
- Bump Substrait by @pdet in #8110
- Merge feature into master by @Mytherin in #8136
- Increase memory limit in test to prevent non-deterministic CI failures by @lnkuiper in #8138
- UNNEST binder fix by @taniabogatsch in #8111
- Out-of-Core Hash Aggregate by @lnkuiper in #7931
- Add Unittests for ODBC by @maiadegraaf in #7875
- Hive types by @lverdoes in #7674
...
0.8.1 Bugfix Release
This is a bug fix release for various issues discovered after we released 0.8.0. There are no new features, just bug fixes. Database files created by DuckDB v0.8.0 can be read by DuckDB v0.8.1 (i.e. v0.8.1 is backwards compatible with v0.8.0). Note that database files created by v0.8.1 cannot be read by DuckDB v0.8.0 (i.e. v0.8.0 is not forwards compatible with v0.8.1).
Changes
- [Julia] Update DuckDB_jll to v0.8.0 by @Mytherin in #7568
- CSV reader - allow parallel option to be set in COPY statement as well by @Mytherin in #7579
- shell: Remove .dbinfo command. by @omo in #7569
- Catalog::LookupEntry(): Remove unused code. by @omo in #7557
- Add the default scheme to the CREATE TYPE's type search path. by @omo in #7555
- Use std::all_of instead of raw loop in Disjoint. by @ttsugriy in #7549
- feat: introduce a common grammar/types file for libpgquery parser and update Python scripts to take source/target directory paths as argument by @stephaniewang526 in #7574
- Fix #7582 - correctly set "last_offset" in InitializeScanWithOffset and turn assertion into run-time check by @Mytherin in #7586
- Partially fix #7551 - throw internal exception in case of type mismatch in ExpressionExecutor by @Mytherin in #7587
- Fix #7602 - allow reserved keywords in named parameters by @Mytherin in #7604
- Fix #7599 - output a clear error message when a subquery is used in a table function that does not support it by @Mytherin in #7603
- Rework Code Coverage CI - Remove CodeCov and instead track uncovered lines explicitly + turn lack of coverage into a CI failure by @Mytherin in #7611
- Use unordered_set insert range overload. by @ttsugriy in #7615
- Reserve expression_costs storage. by @ttsugriy in #7608
- [ADBC] Testing Unhappy Paths, Fixing Memory Leaks from Error Setting, Removing Macros by @pdet in #7589
- Windows - path is only absolute if path starts with a single back-slash by @Mytherin in #7623
- Fix #7564 - if the auto-complete extension is not enabled, inline it into the shell by @Mytherin in #7621
- Remove 2 extra bytes from magic string pattern. by @ttsugriy in #7626
- Avoid unnecessary table lookup. by @ttsugriy in #7630
- Reserve enough storage for unbound_expressions. by @ttsugriy in #7627
- Increment code coverage by @Mytherin in #7636
- Remove all C-style casts and add clang-tidy rule to forbid them by @Mytherin in #7656
- Fix sql auto complete extension CI issue by @Mytherin in #7650
- Add missing entries to ParquetDecodeUtils::BITPACK_MASKS by @Tishj in #7658
- Fix: allow distinct and order by in list aggregates by @taniabogatsch in #7638
- Rework the AggregateExecutor interface to no longer have unnecessary pointers and arrays by @Mytherin in #7671
- Fix #7660 - avoid exporting the same catalog multiple times in EXPORT by @Mytherin in #7676
- Move BindUpdateConstraints into a virtual function that is implemented by the DuckTableEntry by @Mytherin in #7679
- Fix #7567 - when setting the schema to a different schema within another catalog, keep the correct catalog by @Mytherin in #7678
- Fix exception fmt by @carlopi in #7683
- Fix amalgamation build by avoiding overloading multiplication by @carlopi in #7661
- Fix #7659 - use correct catalog when replaying a CREATE TABLE in the WAL by @Mytherin in #7675
- Implement #7662 - add the "lock_configuration" setting which allows configurations to be locked down by @Mytherin in #7682
- Fix #7663 - add in_search_path function, correctly show temporary views in SHOW TABLES, and show views in SHOW ALL TABLES by @Mytherin in #7680
- expose the `StripUnicodeSpaces` parser utility method by @stephaniewang526 in #7705
- Add FuzzyDuck fuzzer - and move fuzzer CI to separate repo by @Mytherin in #7712
- Add missing std::move for old GCCs by @Mytherin in #7714
- [Dev] Fix failing assertion in python debug by @Tishj in #7722
- Fix crash in `ArrowTableFunction::GetArrowLogicalType` on Linux by @Tishj in #7718
- Allow core duckdb to handle unrecognized JDBC configuration by @elefeint in #7713
- [ADBC] Transactions and explicitly not-supporting Partition Reading/Execution by @pdet in #7639
- Verify that Parallel CSV Reader skips lines mid-threads by @pdet in #7637
- Fix issue with setup.py builds without dependencies by @samansmink in #7695
- [Python] Fix tests for Pandas 2.0.2 by @Tishj in #7726
- Code Coverage CI check - allow one uncovered line by @Mytherin in #7724
- Generate `default_types` from json files by @Tishj in #7646
- Fix fuzzer issues found by new fuzzer CI runs by @Mytherin in #7736
- [Python] Fix conversion of deeply nested dictionaries by @Tishj in #7739
- Fix TupleDataCollection List serialization by @lnkuiper in #7741
- Fuzzer #156: Copy Before Swizzle by @hawkfish in #7747
- Minor fixes to failing CI runs by @carlopi in #7768
- Fix more fuzzer issues found by new fuzzer CI by @Mytherin in #7759
- Add option to disable serialization by @stephaniewang526 in #7745
- fix(httpfs): correct listobjectv2_url for strict s3/http servers by @Mause in #7761
- Fuzzer #209: Multiple Scalar Blocks by @hawkfish in #7764
- Fuzzer #206: Fix Cast Overflow by @hawkfish in #7770
- More minor CI fixes by @Mytherin in #7779
- Add Exception on dependency verification for Enum Types and Temp Tables by @pdet in #7641
- Add fuzz_all_functions fuzzer, and add support for varargs to test_vector_types by @Mytherin in #7754
- JSON fixes by @lnkuiper in #7762
- [Julia] Fix issue related to table function callbacks and IO by @Tishj in #7783
- [Dev] Use `sql` in the `python_regression_test.py`. by @Tishj in #7787
- Allow core duckdb to handle unrecognized C API configuration by @elefeint in #7804
- Fuzzer #214: ROWS BETWEEN Overflow by @hawkfish in #7767
- Add tests to cover issue 5132 and enable force reload by @taniabogatsch in #7800
- Fuzzer #215: Timestamp Arithmetic Overflow by @hawkfish in #7769
- Remove grammar support for CREATE/DROP DATABASE by @stephaniewang526 in #7806
- Serialize: fix some uncovered cases, part 1 by @carlopi in #7810
- CodeCov tweaks by @carlopi in #7815
- fix(jdbc): arrow error handling by @Mause in #7814
- Fix duck fuzzer #218 and #220 by @carlopi in #7818
- Add msan and ubsan to cifuzz (+ fix zstd + msan) by @carlopi in #7813
- Art bug fixes by @taniabogatsch in #7801
- Check GlobalSortState for external scan in PhysicalWindow by @lnkuiper in #7827
- remove un-used PGNodeTag by @stephaniewang526 in #7833
- refactor(fsspec): remove seekable flag by @Mause in #6585
- Unnest_rewriter fixes by @taniabogatsch in #7836
- [Julia] Fix comments on #7783 by @Tishj in #7843
- Disable attaching on-disk DuckDB databases if external access is disabled by @Mytherin in #7850
- Fix #7711 - disallow detaching the currently USEd database by @Mytherin in #7851
- [Python] only execute in `DuckDBPyRelation::Close` if it was never executed before by @Tishj in #7844
- Add rel_from_table_function to R relational API by @hannes in https://github.com/duckdb/d...
0.8.0 Preview Release "Fulvigula"
This preview release of DuckDB is named "Fulvigula" after the Mottled duck (Anas fulvigula) which lives in the Gulf of Mexico, where it is apparently highly prized amongst (heartless) hunters.
There are two SQL-level breaking changes in this release:
- #7174 The default sort order switched from `NULLS FIRST` to `NULLS LAST` because this is more intuitive, especially in conjunction with `LIMIT`.
- #7082 The division operator `/` will now always lead to a floating point result even with integer parameters. The new operator `//` retains the old semantics. This change is consistent with Python.
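The two breaking changes can be modelled in plain Python. This is only a sketch of the semantics described above, not DuckDB code: `order_by` is a hypothetical helper in which `None` stands in for SQL `NULL`, and the division lines show Python's own operators, which the release note says the new behaviour is consistent with. Only non-negative operands are used, and no claim is made here about how DuckDB's `//` treats negative values.

```python
# Division: in Python, / on two integers yields a float, // yields an integer.
true_quotient = 7 / 2      # floating point result
integer_quotient = 7 // 2  # integer result


def order_by(values, nulls_first):
    """Sort ascending, placing None (standing in for NULL) first or last."""
    non_null = sorted(v for v in values if v is not None)
    nulls = [v for v in values if v is None]
    return nulls + non_null if nulls_first else non_null + nulls


column = [3, None, 1, 2]
limit = 2

# With NULLS FIRST, a LIMIT can be consumed by NULLs before any real value.
old_default = order_by(column, nulls_first=True)[:limit]
# With NULLS LAST, the LIMIT returns the smallest actual values.
new_default = order_by(column, nulls_first=False)[:limit]

print(true_quotient, integer_quotient)
print(old_default, new_default)
```

The second pair shows why `NULLS LAST` tends to be the less surprising default when combined with `LIMIT`: the limited result contains real values rather than NULLs.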
Note: Again, this release introduces a backwards-incompatible change to the on-disk storage format. We suggest you use the EXPORT DATABASE command with the old version followed by IMPORT DATABASE with the new version to migrate your data. See the documentation for details.
What's Changed
- Issue 5984 #4 LogicalColumnIndex out of range Error by @Tmonster in #6303
- Implementing Integration with PyTorch by @pdet in #6295
- Implement #4941: Python client: for streaming fetches construct a streaming result (fetch_one, record_batch_reader, etc) by @Mytherin in #6346
- Implement sharable Buffer Pool across DatabaseInstances by @jkub in #6299
- Add table functions range and generate_series for TIMESTAMPTZ by @papparapa in #6285
- Add Initial DuckDB Swift API by @tcldr in #6351
- Integration with TensorFlow Tensors by @pdet in #6348
- Windows - remove delayload code and enable statically linking extensions by default by @Mytherin in #6399
- Add support for Pivot/Unpivot statements by @Mytherin in #6387
- [C-API] Add support for StreamQueryResult by @Tishj in #6318
- [Swift] add remaining non-composite types by @tcldr in #6422
- [Swift] Add Prepared Statements by @tcldr in #6459
- [Python] Exclude jemalloc files while pip install on Android OS by @papparapa in #6450
- CI: Swap cron for repository_dispatch by @carlopi in #6498
- CI improvements + add version badge to README by @carlopi in #6493
- Storage: store lists as uint64 offsets instead of as list_entry_t by @Mytherin in #6499
- two changes facilitating sending table/column stats over the wire (M… by @peterboncz in #6440
- Rework Value class internals to have a similar structure to LogicalType and others by @Mytherin in #6503
- Remove unswizzle flag from SortedData::Unswizzle by @lnkuiper in #6501
- [Swift] Add Appender by @tcldr in #6482
- JDBC: Remove DuckDBDatabase by @MariusVolkhart in #6426
- Add nan and inf arithmetic by @Tmonster in #6415
- Update `tools/rpkg` README.md by @Tishj in #6530
- Merge feature into master by @Mytherin in #6534
- Restrict threads for reliability. by @hawkfish in #6540
- Replace replace with format strings by @domoritz in #6542
- Add missing escape for " by @domoritz in #6543
- Blob <-> Bitstring casting by @LindsayWray in #6488
- Mapfunctions: map_entries, map_values, map_keys by @LindsayWray in #6522
- Issue #5920: Ordered Aggregate Buffering by @hawkfish in #6539
- Handle SQL-tagged strings correctly with dplyr::tbl, fixes #6506 by @rsund in #6536
- CI: Update Swift.yml by @carlopi in #6553
- Update SwiftRelease.yml by @carlopi in #6554
- Java: Implement JDBC 4.1 by @MariusVolkhart in #6376
- Bitstring aggregations by @LindsayWray in #6417
- Make our default `threads` setting Cgroup-aware on Linux by @Tishj in #6550
- [Swift] Add composite type support by @tcldr in #6557
- Statistics Rework: Switch to single BaseStatistics class, use separate static classes for methods on the stats instead by @Mytherin in #6560
- Introduce Syntax for SEMI and ANTI joins by @Tmonster in #6480
- Update storage_info with version 0.7.1 by @carlopi in #6572
- [Python] Add the ability to supply a DuckDBPyRelation instance to `register` by @Tishj in #6483
- [Python] `map` now defaults to original type when analyzed type at bind is NULL by @Tishj in #6571
- [Dev] Fix broken `test_filesystem.py` test by @Tishj in #6582
- CI: Node.js, add common NPM-setup step by @carlopi in #6590
- build: add builds for nodejs linux arm64 by @Mause in #6586
- CI: move to setup-node@v3 by @carlopi in #6596
- Issue #6604: TIMESTAMP <=> TIMESTAMPTZ by @hawkfish in #6605
- [Python] Add support for EXPLAIN ANALYZE to `explain` method by @Tishj in #6561
- Add ICU list functions generate_series and range by @papparapa in #6445
- feat(nodejs): add errorType attribute to DuckDbError by @Mause in #6434
- Fix TPC-DS date insertion by @ywelsch in #6591
- Fix #4016: Test amalgamation with --split param by @carlopi in #6587
- feat(python): throw HTTPExceptions instead of IOException for http errors by @Mause in #6533
- Add httpfs config to support packaging it as an extension by @ankrgyl in #6608
- Issue #6595: N-Ary Positional Joins by @hawkfish in #6598
- [Swift] inline documentation plus API tweaks by @tcldr in #6614
- Fix #6602: add inet extension to build/distribute script by @Mytherin in #6610
- CI remove amalgama x8 + swift release by @carlopi in #6615
- Fix too many open file handles during JSON schema detection by @lnkuiper in #6613
- Issue #6580: Parquet Int96 Timestamps by @hawkfish in #6601
- Exception_static_build defalt: Partial revert of dabbead by @carlopi in #6620
- Make DISTINCT ON respect the ORDER BY clause similar to Postgres + several ordered aggregate improvements by @Mytherin in #6616
- fix url encode issue for R2 by @samansmink in #6609
- [Swift] Database.Configuration type + documentation enhancements by @tcldr in #6617
- R: Avoid passing SEXP by reference by @krlmlr in #6475
- Test and fix preservation of class attribute in external pointers by @krlmlr in #6526
- Add support for lambda functions to `COLUMNS`, and allow COLUMNS to be used in the ORDER BY/WHERE clauses by @Mytherin in #6621
- [R] Remove duplicate occurrence of dependency by @Tishj in #6625
- Automatically Fully Download Files through HTTPFS if no length header is provided by @pdet in #6448
- Remove some function calls that can throw potential false positives in CI by @Tmonster in #6623
- [Python] Add `__getattr__` and `__getitem__` implementations for DuckDBPyRelation by @Tishj in #6624
- [Optimizer] Regex Optimization Rule fix by @Tishj in #6634
- [Bug Fix] Enum Serialization by @pdet in #6040
- Update interval for arrow by @handstuyennn in #6515
- SQLLogicTest - instead of moving prepared statements over avoid restarting database when there are prepared statements by @Mytherin in #6638
- Bind replace table function by @samansmink in #6639
- Fix #6630: correctly set bind_data->types in the Parquet scan when using union_by_name by @Mytherin in #6642
- [Python] `read_csv` can now read from a file-like object. by @Tishj in #6568
- Fix #6640: correctly throw an error on altering schemas by @Mytherin in #6643
- Support multiple aggregates in top-level pivot by @m...
0.7.1 Bugfix Release
This is a bug fix release for various issues discovered after we released 0.7.0. There are no new features, just bug fixes. Notably, there is no incompatibility with database files created with v0.7.0.
Changes
- When building extensions we need to add _storage_init to the whitelist on MacOS by @Mytherin in #6243
- Some more read_json_auto bugfixes by @lnkuiper in #6244
- Fix for Thrift.h: std::iterator is deprecated by @hannes in #6250
- Add missing shell mode descriptions by @papparapa in #6256
- Fix #6255: Shell should be installed in INSTALL_BIN_DIR by @Mytherin in #6266
- Bump Julia to v0.7.0 by @Mytherin in #6280
- Skip headers in read_csv functions as well by @pdet in #6267
- Correctly compute Windows terminal width, and add a `.maxwidth` option to the shell for duckbox mode by @Mytherin in #6274
- Fix lateral join bug by @taniabogatsch in #6268
- fix: add storage_version_info entry for v0.7.0 by @Mause in #6279
- Fix to #5461 by @annnei in #6265
- CI fixes by @Mytherin in #6289
- [Fuzzer] Fixes fuzzer issue 11 by @Tishj in #6191
- Partially Fix #6253: Improve handling of timezones in the regular VARCHAR -> TIMESTAMP cast by @Mytherin in #6283
- Error message on no content-length header by @samansmink in #6293
- fixes #6238 by @rpbouman in #6239
- fixes #6236 by @rpbouman in #6252
- Missing extension exceptions by @lverdoes in #6294
- feat: allow extensions to implement CREATE/DROP DATABASE by @rjatwal in #6115
- fix(python): python object types in stubs by @Mause in #5732
- Fix UPSERT binding issue related to the source table_index by @Tishj in #6275
- fix: DESCRIBE does not show primary key by @gkaretka in #6068
- Fix #6276: avoid transforming the root arg of a case expression multiple times by @Mytherin in #6300
- More read_json(_auto) bugfixes by @lnkuiper in #6281
- JDBC: Expand Blob, add UUID support by @MariusVolkhart in #6302
- CMake: Move from GREATER_EQUAL to GREATER, fixing #5528 by @carlopi in #6310
- Implement #6003 - add names option to CSV reader by @Mytherin in #6308
- CI: Test for cron based workflows by @carlopi in #6311
- CI Fix + match tests on less specific error messages by @Mytherin in #6320
- Fix #6314: select correct block index in IEJoin - and fix issues with left/right IE join resuming in case of multiple matches by @Mytherin in #6323
- CI: all workflows moved to nightly by @carlopi in #6334
- Fixes #6315: keep names/types around so description can be used after result is closed by @Mytherin in #6326
- Fix #5800: add missing Copy() calls, and add ALTERNATE_VERIFY method to verify Copy of INSERT/UPDATE/DELETE/COPY statements by @Mytherin in #6327
- Apply lower casing to extension aliases by @Mytherin in #6331
- Fix #6304: correctly handle NULL partitions and constant vectors, plus handle default parameters in COPY by @Mytherin in #6336
- [Python] DuckDBPyRelation: Change `explain` method and add `sql` method by @Tishj in #6287
- Fix Polars CI and properly implement check_ methods in the dataframes by @pdet in #6347
- Fixing a clang16 problem that slipped through by @hannes in #6345
- Fix #6341: LEFT/RIGHT/OUTER join on condition that is always true is only equal to a cross product if the other side is not empty by @Mytherin in #6342
- CI: Skip any CI on branches named 'feature' or 'master' by @carlopi in #6350
- Add correct bail-out to CSV auto-detection on oddly/inconsistently formatted CSV files by @Mytherin in #6330
- CI: Invert path-ignore for tools folders by @carlopi in #6353
- NULLs sort last in relational by @krlmlr in #5994
- Properly deal with Star (*) expressions in `COPY ... (FORMAT JSON)` by @lnkuiper in #6319
- fixes #6227 by @rpbouman in #6230
- fix typos in dictionary_store_worst_case.benchmark by @hnjylwb in #6371
- Julia: Support change timezone config by @xcaptain in #6358
- Paths-ignore on push by @carlopi in #6363
- JDBC - Add separate treatment for timestamptz values by @Jens-H in #6364
- bugfix: switch to fsspec's strip protocol impl by @Mause in #6361
- Disable tidy on ODBC for now by @Mytherin in #6379
- Implements function "sqlite3_column_table_name" for the sqlite3 wrapper by @TinyTinni in #6385
- [Python] No jemalloc for successful build on android by @papparapa in #6383
- throw BinderException on empty list in percentile by @samansmink in #6378
- Add optimizer flag to R and Python Substrait api by @LindsayWray in #6097
- fixes #6269 by @rpbouman in #6291
- Java: Use automatic resource management for AutoCloseable types by @MariusVolkhart in #6377
- Fix progress bar in (parallel) CSV reader by @Mytherin in #6397
- Fix #6393: for DESCRIBE order by column_index instead of column_name by @Mytherin in #6398
- ART (bug) fixes by @taniabogatsch in #6396
- [NodeJS] Support multi-statement prepare by @Tishj in #6278
- Java: Use StringBuilder where appropriate by @MariusVolkhart in #6373
- Bitpacking bug by @samansmink in #6402
- bugfix(fsspec): missing fs methods by @Mause in #6395
- Auto-load HTTPFS extension when http(s)/s3 files are queried and it is not loaded + upgrade SQLite scanner version/other extension fixes by @Mytherin in #6401
- Add helpful error message if a setting from an extension is attempted to be set when the extension is not loaded by @Mytherin in #6406
- namespace typos in blocking concurrent queue by @csruiliu in #6408
- Java: Implement DatabaseMetaData#isReadOnly() by @MariusVolkhart in #6375
- CI fixes by @carlopi in #6414
- Parquet: for DELTA_BYTE_ARRAY encoding verify that lengths of subsequent arrays do not exceed length of BYTE_ARRAY by @Mytherin in #6412
- Fix #6235: correctly return catalog for views in information_schema by @Mytherin in #6413
- Enable CMAKE_EXPORT_COMPILE_COMMANDS ON default by @JackDrogon in #6394
- Fix #5878: only delete the temp directory if we created it, otherwise delete only our temp files by @Mytherin in #6425
- Fix under-specified test by @Mytherin in #6419
- fix: logic fix to allow storage extension to implement DROP DATABASE by @stephaniewang526 in #6430
- Map bug combo of const & non-const lists by @LindsayWray in #6354
- Issue #6272: Window Scaled Repartitioning by @hawkfish in #6366
- respect column order for partitioned write by @samansmink in #6436
- Properly initialize string vector when reading large JSON arrays of strings by @lnkuiper in #6437
- Fix #6420 - correctly delete temporary files that are not explicitly read back but just dropped by @Mytherin in #6424
- Julia: support Pkg.test() by @chris-b1 in #6431
- fixes sqlite3_column_bytes nullptr access on some call ordering by @TinyTinni in #6409
- Write struct fields as optionally quoted in EXPORT DATABASE by @Tishj in #6416
- Enables sqlite3 wrapper tests for win32 builds by @TinyTinni in #6427
- Adding separate extension_directory configuration setting by @hannes in https://github.com/duckdb/duckdb/...
0.7.0 Preview Release "Labradorius"
This preview release of DuckDB is named "Labradorius" after the Labrador duck (Camptorhynchus labradorius) which was native to North America and went extinct in 1878 despite its reportedly bad taste.
Again, @Mytherin has written a blog post explaining the exciting list of new features in this release.
Binary builds are listed at the bottom of this post. Please note that it can take a couple of hours until binary builds for all platforms and environments are available.
Note: Again, this release introduces a backwards-incompatible change to the on-disk storage format. We suggest you use the EXPORT DATABASE command with the old version followed by IMPORT DATABASE with the new version to migrate your data. See the documentation for details.
What's Changed
- Use structs to avoid confusing C pointer wrappers by @krlmlr in #4961
- Enum type added to the types metadata table by @LindsayWray in #5290
- R: code format by @krlmlr in #5185
- Add starts_with function and operator by @papparapa in #5334
- Feature: Allow binary-formatted strings to be cast to integers by @Maxxen in #5337
- For range joins use NL join when the LHS or RHS side is tiny by @Mytherin in #5399
- Add support for LATERAL joins by @Mytherin in #5393
- [Julia] Add support for consuming a UNION vector into a DataFrame by @Tishj in #5360
- Issue #5314: At Time Zone by @hawkfish in #5341
- Decimal values now round when the value given has more decimals than the `scale` of the target by @Tishj in #5362
- Shell: add individual SQL queries to the history, instead of individual lines by @Mytherin in #5414
- Shell: add support for history search by @Mytherin in #5415
- Parallelise scanning result of ORDER_BY by @lnkuiper in #5403
- Add translate function by @zhouliqi in #5212
- Enable cmake to recognize AppleClang by @changhiskhan in #5432
- Support enum_code() function by @lokax in #5408
- Fix binder error and produce more informative error message. by @Tmonster in #5302
- Parquet Reader: Re-use (de)compression and dictionary buffers and allocate powers of two by @Mytherin in #5445
- Support RLE, DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY Parquet encodings by @Mytherin in #5457
- print profiling output for deserialized logical query plans by @ila in #5448
- Issue #5277: Sorted Aggregate Sorting by @hawkfish in #5456
- Add internal flag to duckdb_functions, and correctly set internal flag for internal functions by @Mytherin in #5462
- Add experimental R String passthrough support by @hannes in #5479
- Issue #5258: Quantile Negative Fractions by @hawkfish in #5463
- Arrow stream ingestion for JDBC client by @hannes in #5449
- PER_THREAD_OUTPUT flag for COPY by @hannes in #5412
- Feature: skip broken tests for now by @Mytherin in #5532
- Add Union All support to R extention by @Tmonster in #5484
- [Python] Add from_parquet features by @papparapa in #5492
- Add ExtractStatements to C API by @LindsayWray in #5524
- Improve http retry by @samansmink in #5549
- Issue #5277: Sorted Aggregate Window by @hawkfish in #5571
- Issue #5422: QUANTILE_DESC Decimals by @hawkfish in #5572
- Issue #5559: 2022g Time Zones by @hawkfish in #5570
- [Dev] Clean up of the python pkg folder structure by @Tishj in #5436
- httpfs: check environment vars for AWS Credentials by @satotake in #5419
- Misc union-type improvements by @Maxxen in #5617
- Fix so Left inner join doesn't re-optimize nodes by @Tmonster in #5620
- [Substrait] C API + from_substrait_json + bump on substrait version. by @pdet in #5613
- Allow strings in ColumnDataCollection to be written to disk by @lnkuiper in #5543
- [PythonDEV] Let `clean.sh` be run from anywhere, not just `tools/pythonpkg` by @Tishj in #5625
- Reorganize Join order optimizer code by @Tmonster in #5621
- [Catalog] Grab missing write_locks in a couple places by @Tishj in #5601
- Parquet info to Substrait by @pdet in #5627
- HTTP parquet optimizations by @samansmink in #5405
- Adding delta compression to Bitpacking compression by @samansmink in #5491
- [Python] Changed use of DuckDBPyConnection to shared_ptr by @Tishj in #5635
- Merge feature branch into master by @Mytherin in #5645
- [Python] Display progress bar by default in an interactive environment by @Tishj in #5596
- Add support for `RESET` statement on configuration options by @Tishj in #5603
- httpfs: Encode url path on request by @satotake in #5587
- Fix broken CI because of RESET statement by @Tishj in #5671
- Don't automatically set the bug label on issues by @Mytherin in #5680
- Add support for CREATE VIEW IF NOT EXISTS by @Mytherin in #5682
- Issue #5622: Validate Timezone Characters by @hawkfish in #5658
- Issue 5630 fix. by @Tmonster in #5644
- Adding COLUMN_TYPES option for read_csv_auto by @pdet in #5552
- [Python] Get rid of DuckDBPyResult (merged functionality into DuckDBPyRelation) by @Tishj in #5597
- feat: port nodejs tests to typescript by @Mause in #5632
- Improve nodejs README by @Tishj in #5688
- [Python] Add (partial) support for `numpy.datetime64` objects by @Tishj in #5659
- retry on all httplib errors by @samansmink in #5684
- Return false if file doesn't exist by @Y-- in #5701
- Adding context option to not run replacement scans and exporting namespace of json substrait function - R by @pdet in #5689
- Issue #5609: Scope CTE Windows by @hawkfish in #5690
- Attempt to fix random NodeJS CI failure by @Tishj in #5710
- [Python] `duckdb.execute()` == `duckdb.default_connection.execute()` by @Tishj in #5650
- NodeJS: switch to using package_build, and add support to BUILD_NODE to Makefile by @Mytherin in #5691
- JDBC SNAPSHOT Jars by @hannes in #5687
- Fix NodeJS 19 CI for Windows by @Tishj in #5719
- Fix issue 5664 by @lokax in #5667
- Issue #5712: CURRENT_TIMESTAMP and CURRENT_TIME by @hawkfish in #5713
- [CSVReader] Catch a user error in supplying 'columns' option by @Tishj in #5721
- Improve suggestions when LOAD of an extension fails by @Mytherin in #5722
- doc(nodejs): amend arrow stream type docs by @Mause in #5731
- Fix for TSV throwing during sniffing by @pdet in #5555
- Statically link extensions on Linux with Clang by @jkub in #5653
- [Python] Add support for named parameters by @Tishj in #5611
- fix: nodejs source releases should be standalone by @Mause in #5734
- build: don't install python from chocolatey by @Mause in #5740
- fix: use non-string-splitting variable interpolation in binding.gyp.in by @Mause in #5745
- Equalizing DBConfig constructors by @nicku33 in #5747
- We should not treat replacement open paths as disk paths by @nicku33 in #5748
- Allow table in-out functions to be used in correlated subqueries and as LATERAL queries by @Mytherin in https://github.com/duckdb/duckdb...
0.6.1 Bugfix Release
This is a bug fix release for various issues discovered after we released 0.6.0. There are no new features, just bug fixes.
What's Changed
- Correctly accept BUILD_JEMALLOC_EXTENSION on Linux by @Mytherin in #5343
- [julia] fix docstring of `load!` and relax type restriction by @jfb-h in #5354
- Bump DuckDB_jll compat to v0.6 by @jeremiahpslewis in #5356
- Issue #5342: DATE_PART Struct Indexing by @hawkfish in #5382
- Add reference to cleanup function for duckdb_result_get_chunk by @ak-coram in #5389
- Fix #5390: in filter pull-up optimizer avoid adding columns to one side of a set operation by @Mytherin in #5400
- Fix #5371: correctly use instance cache in JDBC and ODBC connector by @Mytherin in #5398
- Add support for reading JSON type columns from Parquet files by @Mytherin in #5401
- [Dev] Fix compilation issues related to MSVC and Windows.h by @Tishj in #5386
- fix: upgrade npm's internal node-gyp by @Mause in #5402
- [Appender] Appender can now properly append to DECIMAL columns by @Tishj in #5364
- Fix bug causing loss of order preservation in insert by @lnkuiper in #5427
- Allocator: throw std::bad_alloc if a malloc allocation fails by @Mytherin in #5439
- Fix the use of COLUMNS(...) in ORDER BY clause by @lokax in #5444
- Adding lazy relation -> data.frame conversion for R client by @hannes in #5181
- Fix #5450, don't crash on integer dates in R by @hannes in #5451
- Issue #5366: QUANTILE_DISC Intervals by @hawkfish in #5442
- Remove the f off by @hatvik in #5475
- Fix many fuzzer issues by @Mytherin in #5482
- Allow column references in constant table functions by @Mytherin in #5483
- Node register arrow ipc buffer fix by @samansmink in #5433
- Add initializer for queue_insertions by @hannes in #5504
- Disabling per-value materialization of r altrep strings in results by @hannes in #5454
- Correctly set delim_offset in flatten dependent join and disable linux arrow test by @Mytherin in #5509
- update arrow extension by @samansmink in #5506
- [Python] Correct stub for DuckDBPyConnection::df by @Tishj in #5385
- Add deserialization to custom operators by @rjatwal in #5496
- [Python] No longer truncate ByteArray values by nullbytes by @Tishj in #5517
- Add in the pg_database, pg_proc, and pg_settings views to pg_catalog by @jwills in #5526
- Fix various BufferManager issues by @lnkuiper in #5476
- Add feature request link by @Mause in #5324
- [Python] Fix `relation.query()` not accepting non-select statements by @Tishj in #5531
- fix issue #5488 by @samansmink in #5519
- [Python] Adding back Query interrupt support (through Ctrl+C) by @Tishj in #5487
- Adding dummy user/username/password settings by @hannes in #5530
- Add memory leak tests, and fix memory leaks related to repeated table creation/destruction by @Mytherin in #5537
- DuckBox renderer fixes by @Mytherin in #5539
- Fix #5533: correctly use timestamp logical type unit in Parquet stats reader by @Mytherin in #5540
- Disable the extended code coverage tests for now by @Mytherin in #5542
- NLJoin is not always terrible by @pdet in #5538
- naming mismatch for linux arm extension upload by @samansmink in #5556
- Deprecate 'sprintf' usage using MacOSX SDK 13 by @darrenfu in #5545
- Fix #5546: allow foldable scalar expressions in standard table functions by @Mytherin in #5550
- Upgrade sqlite scanner hash by @Mytherin in #5551
- [Python] Fixed bug where creating a cursor from a closed connection caused a segfault by @Tishj in #5565
- Fsst pull bugfix from upstream by @samansmink in #5567
- Parquet: Not setting num_children for primitive types as per spec by @hannes in #5579
- [Python] Fix accidental dependency on `pandas` by @Tishj in #5581
- Throw error when sorting or using indexes on big endian architecture by @Mytherin in #5588
- fix: separate artifacts for 32bit and 64bit builds by @Mause in #5592
- Bug fix for 5523 by @taniabogatsch in #5554
- Disabling truncating of temporary buffer manager files on Windows by @hannes in #5600
- Removed FSST unused global that triggered compiler warning by @hannes in #5602
- Copy JDBC Properties to not lose readonly setting by @hannes in #5594
Full Changelog: v0.6.0...v0.6.1
0.6.0 Preview Release "Oxyura"
This preview release of DuckDB is named "Oxyura" after the White-headed duck (Oxyura leucocephala) which is an endangered species native to Eurasia.
This time, @Mytherin has written a blog post explaining the quite long and exciting list of new features in this release.
Binary builds are listed at the bottom of this post. Please note that it can take a couple of hours until binary builds for all platforms and environments are available.
Note: Again, this release introduces a backwards-incompatible change to the on-disk storage format. We suggest you use the EXPORT DATABASE command with the old version followed by IMPORT DATABASE with the new version to migrate your data. See the documentation for details.
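The migration suggested in the note above can be sketched as a pair of SQL statements: one to run with the old DuckDB version, one with the new. This is a minimal sketch, not an official tool; the helper name `migration_statements` and the export directory are hypothetical, and only the `EXPORT DATABASE` and `IMPORT DATABASE` commands themselves come from the note.

```python
from typing import Tuple


def migration_statements(export_dir: str) -> Tuple[str, str]:
    """Build the two SQL statements for a storage-format migration.

    Hypothetical helper: the first statement is meant to be run with the
    OLD DuckDB version (against the existing database file), the second
    with the NEW version (against a fresh database file).
    """
    # Quote the directory as a SQL string literal, doubling single quotes.
    quoted = "'" + export_dir.replace("'", "''") + "'"
    export_sql = f"EXPORT DATABASE {quoted};"
    import_sql = f"IMPORT DATABASE {quoted};"
    return export_sql, import_sql


# Example with a hypothetical export directory.
export_sql, import_sql = migration_statements("/tmp/duckdb_export")
print(export_sql)  # run with the old version
print(import_sql)  # run with the new version
```

The two statements are kept separate because they have to be executed by different DuckDB binaries; the exported directory is the only thing they share.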
Featured Changes
- Optimistically write data to disk when batch loading data into the system by @Mytherin in #4996
- Parallel non-order preserving CREATE TABLE AS and INSERT INTO by @Mytherin in #5033
- Parallel order preserving CREATE TABLE AS and INSERT INTO by @Mytherin in #5082
- FSST compression by @samansmink in #4366
- CHIMP128 Compression by @Tishj in #4878
- Patas Compression (float/double) (variation on Chimp) by @Tishj in #5044
- Parallel CSV Reader by @pdet in #5194
- Parallelize CREATE INDEX of ART by @taniabogatsch in #4655
- Improve memory management of ART indexes by @Mytherin in #5292
- DISTINCT aggregates with GROUP BY are now executed in parallel by @Tishj in #5146
- Nested "UNION"-type by @Maxxen in #4966
- Allow for queries to start with FROM, instead of with SELECT by @Mytherin in #5076
- Support for the COLUMNS expression, which allows expanding computations on multiple columns by @Mytherin in #5120
- Python-style list-comprehension syntax by @Mytherin in #4926
- Improvements to Out-of-Core Hash Join by @lnkuiper in #4970
- jemalloc "extension" for Linux by @lnkuiper in #4971
- Improve rendering of result sets for the shell by @Mytherin in #5140
- Add auto-complete support to the shell by @Mytherin in #4921
- Nicer looking progress bar by @Mytherin in #5187
All Changes
- Fix #4747: Handle pandas num categories between 128 and 256 by @pankajp in #4757
- Julia 0.5.1 by @Mytherin in #4758
- Fix #3595: avoid using system hash for floating point values by @Mytherin in #4761
- Fix #4704. Correct the column name for pragma_storage_info with generated column by @zippond in #4750
- Allow to load extensions through compiler variable definitions by @pdet in #4767
- Fix some typo in code comments by @buaazhwb in #4769
- Enhance duckdb_constraints() by @krlmlr in #4346
- Issue #4764: Window Ignore Nulls by @hawkfish in #4773
- [Python (Relational)] Query now returns a DuckDBPyRelation by @Tishj in #4471
- R types expansion by @hannes in #4778
- Add json_contains by @lnkuiper in #4686
- Fix #4152: create base table reference in returning clause so generated columns are correctly resolved by @Mytherin in #4783
- Fix Exists and ANY correlated subqueries by @lokax in #4752
- Fix for ORDER BY on large dictionary vectors: correctly pass offset into get_index of selection vector by @Mytherin in #4787
- Missing json_contains in extension list by @Mytherin in #4788
- Extensible Casts & Cast Function Rework by @Mytherin in #4785
- Bump sqlite scanner by @hannes in #4789
- Improve sorting for strings and push projections into sort operator by @lnkuiper in #4697
- Parquet: Refactor decompression, including more complete datapage v2 support by @wisp3rwind in #4628
- Parallelize CREATE INDEX of ART by @taniabogatsch in #4655
- Unify LocalStorage and DataTable Storage by @Mytherin in #4798
- feat: support passing all db config to jdbc driver by @Mause in #4794
- Fix #4806: correctly use offset index in pragma_table_info on view by @Mytherin in #4807
- Map VARCHAR, JSON, ENUM to Julia String by @nickrobinson251 in #4810
- fix: support SHOW query types in jdbc client by @Mause in #4799
- Replacement Open Hooks by @hannes in #4721
- Build multiple out of tree extensions in one pass by @Mytherin in #4828
- fix(jdbc): release results before releasing statements by @Mause in #4831
- Fix for #4827 by @PedroTadim in #4829
- Multiblock2 by @jkub in #4555
- Disconnect after test by @krlmlr in #4835
- Check prefix length, not string_t::INLINE_LENGTH when comparing strings while sorting by @lnkuiper in #4816
- Adding a CI workflow to re-build individual out-of-tree extensions by @hannes in #4833
- fix: json getColumnType error by @Mause in #4847
- Attempt two at rebuilding old extensions by @hannes in #4848
- Updating postgres scanner by @hannes in #4832
- Extension Rebuild Attempt 3 by @hannes in #4849
- Adding overwrite flag to R duckdb_register by @hannes in #4850
- Move LocalStorage row groups directly to DataTable instead of re-appending by @Mytherin in #4851
- fix for macos CI by @samansmink in #4854
- Fully qualified s3url by @LindsayWray in #4786
- FSST compression by @samansmink in #4366
- Julia: add support for handling errors in replacement scans by @Mytherin in #4865
- Extension build: turn IGNORE_WARNINGS into generic OPTIONS field, and add --main-only field by @Mytherin in #4866
- Issue #4867: Approximate Quantile Hugeint by @hawkfish in #4868
- Install OpenSSH on ubuntu 16 by @Mytherin in #4877
- Join order regression test: add 20% threshold to cardinalities before we care about regressions by @Mytherin in #4880
- Move LocalStorage row groups directly to DataTable if there are enough rows being appended by @Mytherin in #4876
- Allow referencing of aliases in SELECT clause and TPC-DS extension clean-up by @Mytherin in #4879
- Add github to known hosts by @Mytherin in #4884
- Adding a serialized version of all TPCH queries and test we can read them by @bleskes in #4605
- Add support for custom bind functions to RegisterCastFunction, and propagate client context to the bind function by @Mytherin in #4885
- CSV reader: quoted NULL values should be kept as non-NULL by @Mytherin in #4888
- fix: add numpy to setup_requires to fix build from source by @Mause in #4893
- fix openFlags overwriting in shell fixing #4894 by @kouta-kun in #4895
- Remove filter columns from table scans if they are unused in the remainder of the plan by @lnkuiper in #4817
- feat: add duckdb_library_version method and fix extension load state by @Mause in #4881
- uuid.cpp: GenerateRandomUUID: fix indexing by @nodakai in #4892
- Update serialized plans by @Mytherin in #4900
- Add CPython 3.11 to build matrix by @edgarrmondragon in #4906
- Support UNION_BY_NAME option in read_csv_auto by @douenergy in #4837
- support for virtualizing storage layer by @jkub in #4858
- Reduce data set size of IE join test by @Mytherin in #4905
- Making sure parquet column readers return the expected amount of rows by @ha...
0.5.1 Bugfix Release
This is a bug fix release for various issues discovered after we released 0.5.0. There are no new features, just bug fixes. The following PRs were included in this release:
- [Fuzzer] Issue #4152 - Lag window function issue by @lokax in #4603
- Fix zonemap check for VARCHAR by @lokax in #4613
- Remove the DLLEXPORT from deleted API methods by @emmenlau in #4611
- Fix update statement on generated column by @lokax in #4616
- [Fuzzer] Issue #4152 - Limit 0% on ANY subquery by @lokax in #4544
- [Fuzzer] Issue #4610 - Vacuum table with generated column by @lokax in #4622
- [Fuzzer] Decimal scale+width overflows too quickly by @Tishj in #4627
- [Fuzzer] issue #4566 by @Tishj in #4592
- Issue #4635: DATE_DIFF Week Boundaries by @hawkfish in #4648
- Fix issue #4630 by @lnkuiper in #4642
- [Python] Fix unwanted conversion from NaN -> NULL in param list by @Tishj in #4624
- Fix home directory setter by @attilahorvath in #4617
- fix(jdbc): correct mapping for TIMESTAMP_WITH_TIME_ZONE by @Mause in #4654
- Fix bug changing input order on array_sort column by @taniabogatsch in #4643
- Fix issue #4625 by @lnkuiper in #4653
- [Extensions] Suggesting which extension to Load/Install by @pdet in #4634
- Fixes issue #4123 by @Tishj in #4523
- Updating jdbc deploy script by @hannes in #4663
- Consistent struct definitions by @hannes in #4667
- Fix #4666 by @taofengliu in #4670
- Fix for #3417 by @PedroTadim in #4664
- feat: improve python replacement scan error by @Mause in #4672
- [C-API] Data chunk invalid left-shift by @Tishj in #4660
- fix: correct mislabelling of amd64 libs in jars by @Mause in #4691
- Fix #4647 by @taofengliu in #4698
- Throw error if attempting to delete from table without physical columns by @Tishj in #4693
- Fix #4475: allow ignore_errors in read_csv and read_csv_auto by @Mytherin in #4713
- Fix #4442: correctly handle TIMESTAMP logicalType in Parquet files by @Mytherin in #4714
- Fix #4699: when no file is found globbing, fallback to using the literal string name as a path by @Mytherin in #4716
- Fuzzer fixes batch 1 by @Mytherin in #4707
- Fix #4677. Correctly set_not_null when table contains generated column by @zippond in #4706
- Fix #4703 by @taofengliu in #4715
- Fixing Extension naming CI Checker by @pdet in #4717
- [Python(pandas)] Scan multiple chunks worth of values from a 'object' dtype DataFrame by @Tishj in #4692
- Fix #4694: Keep shared pointer to pipelines around in additionally scheduled events by @Mytherin in #4724
- Fuzzer Batch Fixes 2 by @Mytherin in #4722
- Fix #4702. Correctly use index when generated column is involved by @zippond in #4727
- Fix for #4583 by @PedroTadim in #4728
- Fuzzer fix batch 3 by @Mytherin in #4726
- Fix #4562: generate table index for dummy scan generated from VALUES clause by @Mytherin in #4731
- [Arrow] Guarantee threads don't call get_next after stream is done. by @pdet in #4712
- Correctly catch and report exceptions thrown during a pipeline's scheduling by @Mytherin in #4733
- Fix for issue #4708 by @PedroTadim in #4711
- Fix #4568: correctly handle casts in deliminator by @Mytherin in #4734
- No longer disable vptr sanitizer on M1 macs by @Mytherin in #4735
- Use version tag as dir for extensions for releases by @samansmink in #4729
- Correctly call ::Skip function of child of structs by @Mytherin in #4736
- [Map] Map extract now properly uses the selection vectors of the `map` and `key` vectors by @Tishj in #4725
- Fix #4356 by @taofengliu in #4740
- Fuzzer Batch 4 by @Mytherin in #4737
- feat: bump Julia package version by @Mause in #4742
- Julia API: Add load! to add a DataFrame as a table by @jfb-h in #4743
- aarch64 extensions by @samansmink in #4745
- Faster hive part filters by @samansmink in #4746
- [Python] DECIMAL with value 0.00... issue fix by @Tishj in #4690
- enable out-of-tree extensions for aarch64 by @samansmink in #4751
Full Changelog: v0.5.0...v0.5.1
0.5.0 Preview Release "Pulchellus"
This preview release of DuckDB is named "Pulchellus" after the Green pygmy goose (Nettapus pulchellus) which is native to Australia where VLDB 2022 is starting today. Despite being called a "goose" it is actually a duck.
Binary builds are listed at the bottom of this post. Feedback is very welcome.
Note: Again, this release introduces a backwards-incompatible change to the on-disk storage format. We suggest you use the EXPORT DATABASE command with the old version followed by IMPORT DATABASE with the new version to migrate your data. See the documentation for details.
Below is a list of changes in this release
Major Changes & Features
- #4189: Implement Out-of-Core Hash Join and Re-Work Query Verification
- #4022: Art Index Storage
- #4274: Join Order Optimizer improvements
- #4420: Logical Plan Serialization
- #4137, #4347, #4293, #4190, #4178, #4177, #3954 & #4159: Scalability and performance improvements for Window operator
- #4004: Add support for extensions to the parser, and add an example of this to the loadable extension demo
- #4089: Signed Extensions
- #4097 & #4211: Filename column + Hive partitioning support for Parquet Reader
- #4501, #4511: Aarch64 Linux builds of CLI, shared library, JDBC & ODBC
Minor Changes & Bug Fixes
- #4594: [Map] Fix map_extract from multiple rows
- #4585: Fix for r test instability, #4549
- #4560: Support all basic integer types in node API
- #4558: [CPP-API] Comment no longer causes crash
- #4552: [Fuzzer] Issue #4152 - Remove ToString roundtrip in query verification
- #4543: Fixing silent assertions
- #4542: Check if database is still alive when trying to connect for nodejs
- #4541: fix for issue 4533
- #4539: Parallelization non-dependent on Arrow rows
- #4524: Explicitly deleting default connection on js side
- #4522: Correct architecture name for Linux aarch64
- #4521: Adding correct substrait release tag to out-of-tree extension deployment
- #4520: Added test cases for several fixed JDBC issues
- #4516: Fix #4455, dont set default schema in transform
- #4513: Issue 4502
- #4510: [Casting] Varchar -> Decimal cast fix
- #4507: [CSV] Fixed bug related to invalidated iterators
- #4505: extension trigger event
- #4504: fix: short-circuit hash and version discovery
- #4496: [Fuzzer] Issue #4152 - Force no cross-product issue
- #4495: Build ODBC driver binary for OSX
- #4494: [Fuzzer] Issue #4152 - Analyze inexisting column
- #4493: Declare all variables for nodejs.
- #4491: Issue #4419: Range Join Swizzling
- #4488: Making the parquet extension loadable
- #4484: fix: ignore status message from output of mypy stubs check
- #4483: [Development bug] unittest result_helper.cpp triggers assertion
- #4480: Remove REST server
- #4479: Remove assertion
- #4477: Removing Substrait From DuckDB Repo
- #4474: WIP #4152
- #4472: [Python] Removed mutable default parameters
- #4470: Fix hidden merge conflict with fetchmany
- #4465: [Python] `fetchmany` implemented
- #4458: Issue #4454: VARCHAR/DATE Reversibility
- #4448: Issue #3954: Pinned Heap Blocks
- #4440: Added support for HUGEINT input type to BIT_COUNT scalar function
- #4434: Python: Add PyRelation.fetchnumpy()
- #4429: Allow indicating a format version that should be used to write/read from (De)serializer and use it for plans
- #4427: Python: Improve docstrings for DuckDBPyRelation and DuckDBPyResult
- #4418: Fix typo
- #4416: Fix several update issues
- #4413: Correctly schedule mix of union/child pipelines (again)
- #4409: Increase timeout for coverage checks
- #4405: Hybrid ART Leaf Part I
- #4404: Add support for TS_MS, TS_NS, and TS_S
- #4400: Issue #4388: DATE_TRUNC Low Precision
- #4398: fix: correct object return types for arrow functions
- #4395: Fix name of environment variable
- #4390: Support UNION BY NAME set operation
- #4383: Missing LISTs are NULL
- #4382: Include PID in test directory name
- #4380: R: Avoid `translate_duckdb()` in tests
- #4377: R: Full BLOB support
- #4372: Fix #4370: correctly handle non-flat vectors in list_sort
- #4371: [Python] Changed all RuntimeErrors thrown in the Python client
- #4368: Fixes issue #4365 - Not null constraint is no longer duplicated
- #4364: Allow extra parameters in list_aggr to be passed in, as long as they are constant and only used during the bind
- #4363: Fix for array_position with NaNs: use Equals::Operation instead of regular equality
- #4362: Allow table functions to set cardinality stats through the C API - and utilize this in Julia DataFrame scans
- #4359: Mark slow tests
- #4355: Fix typo in exception text
- #4354: R: Use preinstalled symbol
- #4353: Shell: Add missing newline in help output
- #4352: Tweak contributing guide [ci skip]
- #4345: [Substrait] Pushing-down projections and filters to read relation
- #4340: Correctly schedule pipeline dependencies when scheduling mix of UNION and FULL OUTER JOINs
- #4336: feat: add basic json support to jdbc client
- #4334: Bring ibis/substrait tests to a sane state
- #4332: Fix Julia parallelism interleaving with the garbage collector, and expose Pending Query Result in C interface
- #4328: Allow specifying a custom home directory using the SET home_directory option
- #4327: [Aggregate] DISTINCT aggregates without GROUP BY are now executed in parallel
- #4324: Fix #4309: fix for multiple foreign key constraints on the same table-table pair
- #4323: Optimizer profiling
- #4322: Print NOT operator correctly
- #4319: feat: add missing node versions to CI
- #4317: refactor: remove dead code in python client
- #4316: R: Add rlang as suggested dependency
- #4315: Column Data Collection, Arrow Result conversion rework, Cross Product performance fixes & more
- #4312: R: Install tidy CLI tool
- #4310: R: Add test for `test_all_types()`
- #4304: Improve numeric hash function to a better but slightly slower hash function
- #4301: Add unit of measurement in timer function
- #4300: Support root type on expressions #4278
- #4298: Feature/nodejs client docs
- #4297: fix: remove nodejs test focus
- #4296: Avoid infinite loop in range(NULL)
- #4294: #4276 Serializing data types on table schema in substrait
- #4289: [Python/Pandas] fix +/- inf wrongly converting to NaN (NULL)
- #4288: Fix fuzzer issue w.r.t. NULL values in generate_series
- #4286: [Python - Relation] CreateView on a filtered relation does not cause infinite loop anymore
- #4285: chore: remove cython constraint now that bug is fixed
- #4284: Pandas timezone
- #4283: Return errors from RecordBatchReader
- #4280: R: Remove nycflights13 dependency
- #4279: R: Don't export duckdb_explain()
- #4277: feat: update setup.py links
- #4272: Allow 0 as a seed parameter
- #4266: R: Only quote non-syntactic and reserved words
- #4265: Specialize LIST aggregate function implementation
- #4263: R: Avoid attaching package during tests
- #4259: Add ANY_VALUE agg function
- #4256: Schedule child pipeline correctly
- #4255: Disable ibis substrait tests for now
- #4250: C API: Report appender error in case conversion fails
- #4240: DELIM_JOIN now propagate statistics correctly
- #4237: fix: pin cython to work around bug
- #4236: Integer types now correctly increase `width` of DECIMAL type.
- #4235: Parquet writer: Write dictionary_page_offset, and distinct_count for dictionary encoded strings/enum
- #4234: Implement json_merge_patch and jsonlines output mode
- #4233: feat: fix pandas types in docstrings/python types
- #4230: Handle nulls in structs and lists
- #4225: Add Jaro Winkler
- #4215: Use right template for smallint
- #4213: feat: update instructions for installing master builds in bug report template
- #4212: Improve error message
- #4210: PARQUET: Move StringColumnWriter dictionary to use string_t to avoid allocations
- #4209: Remove unused PhysicalTypes
- #4207: Disable GC during Julia execution to avoid internal GC deadlock in DataFrame scan
- #4206: Fix #4202: in the comparison simplification optimizer, we can only shift the cast to the constant if both casts are invertible
- #4199: feat: Use pip to install and uninstall python client
- #4198: [capi] impl clear bindings for prepared stmt
- #4197: feat: port bug_report.md to bug_report.yml
- #4196: Fix RTTI issue across extension boundaries on OSX
- #4192: Correctly call SetFilePointerEx on Windows so the truncate works as expected
- #4191: Fix Expanded CI test case by adding swap space to test
- #4188: ALTER SEQUENCE IF EXISTS fix
- #4187: [Storage] FOR compression
- #4185: ISSUE #3248 Support for ALTER TABLE altering columns NOT NULL
- #4183: Julia multi-threading fix: avoid using a time-out to cancel threads in case there are no tasks
- #4179: node: add async-iterator-based streaming
- #4175: [CI] Python Build with Sanitizer
- #4172: Update stubs test
- #4168: Issue #4161: Create WindowExecutor
- #4167: node: report memory usage to the node GC
- #4166: Fix #4165: correctly fill in false_sel when performing comparison with constant null value
- #4160: node: don't crash on syntax errors
- #4154: Making date_trunc statistics handling consistent with date_part
- #4153: Support for int64 round trips in R driver using the bit64 package
- #4151: Fix orrify merge conflict
- #4143: Correctly handle query parameters in JDBC
- #4140: CI Fixes
- #4139: Remove redundant code
- #4138: Support struct.* to retrieve all struct fields in SELECT list
- #4134: Fuzzer Fixes
- #4133: Remove DUCKDB_API for deletes. (For Windows/ZIG)
- #4132: [Python] `project` now correctly inherits owning references to PyObjects
- #4131: Missing error messages
- #4125: Fix Orrify rename merge confl...
0.4.0 Preview Release "Ferruginea"
This preview release of DuckDB is named "Ferruginea" after the Andean Duck.
Binary builds are listed below. Feedback is very welcome.
Note: This release should be backwards-compatible wrt the on-disk storage format, but the next release may very well be incompatible again. So please don't rely on this just yet. We suggest you use the EXPORT DATABASE command with the old version followed by IMPORT DATABASE with the new version to migrate your data. See the documentation for details.
Also note: DuckDB is switching to semantic versioning. Version numbers look like this: MAJOR.MINOR.PATCH with changes to
- MAJOR version when you make incompatible API changes,
- MINOR version when you add functionality in a backwards compatible manner, and
- PATCH version when you make backwards compatible bug fixes.
However, note that because MAJOR is currently 0, "Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable."
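The MAJOR.MINOR.PATCH scheme described above can be illustrated with a small parser and comparison. This is a sketch under stated assumptions: the names `Version`, `parse_version`, `change_kind`, and `is_initial_development` are hypothetical and are not part of DuckDB or any of its clients.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A MAJOR.MINOR.PATCH version; tuples compare field by field."""
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH', tolerating a leading 'v' as in tag names."""
    parts = text.strip().lstrip("v").split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(*(int(p) for p in parts))


def change_kind(old: Version, new: Version) -> str:
    """Name the most significant component that differs between two versions."""
    if new.major != old.major:
        return "major"
    if new.minor != old.minor:
        return "minor"
    if new.patch != old.patch:
        return "patch"
    return "none"


def is_initial_development(version: Version) -> bool:
    """Under semantic versioning, 0.y.z carries no stability guarantee."""
    return version.major == 0


old, new = parse_version("v0.6.0"), parse_version("v0.6.1")
print(change_kind(old, new))        # which component changed
print(is_initial_development(new))  # whether the 0.y.z caveat applies
```

Comparing the parsed integers rather than the raw strings matters once a component reaches two digits. The `is_initial_development` check reflects the caveat quoted above: while MAJOR is 0, even a MINOR bump may be incompatible.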
Below is a list of changes in this release
Major Changes & Features
- #3767: Table function rework, parallel Julia DF scans & Python regression tests
- #3749 & #3747: Query cancellation with CTRL-C for R and Python clients
- #3700: Support Parallel Order-Preserving Result Set Materialization
- #3696: Support WINDOW FILTER
- #3620: HTTP read optimization
- #3668: Adding alias type
- #3435: Add support for reading newline-delimited JSON
- #3783: Extension loading by statically linking DuckDB
Minor Changes & Bug Fixes
- #3905: Fix SQLancer CI
- #3904: Fix #3896: correctly compute GroupRowsAvailable in struct reader in case a child-entry is not just a list, but a struct with only list entries
- #3902: Fuzzer: fix sanitization of address sanitizer error
- #3901: R: Extract DetectLogicalType() function
- #3899: R: Check query return type instead of query type in dbFetch()
- #3898: Issue #3880: Rebind DATE_TRUNC dates
- #3894: Purge concurrent queue when enqueueing entries to prevent entries from piling up
- #3892: Fix for issue #3878
- #3889: Fix TreeRenderer crash on invalid UTF8
- #3888: Julia Table Functions: add stack trace to errors reported
- #3887: Correctly reset interrupted flag so verification does not overwrite original error
- #3886: Remove the check_tread from python connection
- #3879: Avoid title is too long error in fuzzer issue submission
- #3877: Fix use-after-free in create view with prepared statement parameter
- #3872: Glob with search paths
- #3871: [Python] Making new connections to cursors and adding lock on queries over same connection
- #3869: Several OSSFuzz fixes
- #3865: Fix #3860: add support for creating foreign keys on temporary tables, and for now disable support for cross-schema foreign keys
- #3863: Out-of-tree Extensions for Windows
- #3862: Rework of Struct <> Dictionary Vectors, and add test_vector_types function
- #3852: Added support for generated columns to TableCatalogEntry->ToSQL()
- #3850: Enable EXTENSION_STATIC_BUILD for Mac too
- #3849: [Python] Unbundle Substrait
- #3848: Parquet: fix for fixed length byte arrays in dictionary column reader
- #3847: Expand oss-fuzz tests to run queries and check for internal errors
- #3846: Pass through read only flag for node connector
- #3845: Add queries over Arrow to Python regression tests, and time entirety of TPC-H
- #3843: [JDBC] Pass through scale and precision for decimal types from DuckDBColumnTypeMetaData
- #3842: Allow to use custom memory allocator through DuckDB API on Windows
- #3837: Fix overflow in generate_series and overflow in abs operator
- #3832: Issue #3816: Parquet Time Zones
- #3831: s3fs decode keys correctly
- #3828: Update testthat snapshots
- #3818: Add SQLancer to CI Fuzzing Framework
- #3815: Out-of-tree Extension Builds
- #3812: Fix several issues found by Valgrind
- #3810: DuckDB.jl Julia Package History
- #3809: Add `shell: bash` everywhere
- #3802: fix ci breaking from extension PR
- #3799: Optimisation rule for regexp_matches with literal pattern
- #3798: Substrait: Adding more compatibility with Substrait and Ibis
- #3792: Issue #3790: Temporal IsFinite/IsInf
- #3791: Issue #3721: Rightshift Negative Hugeint
- #3786: Fix binding of fully qualified view reference
- #3785: Python: Allowing cursor to set check threads flag
- #3784: Improve speed of ALTER TABLE ADD COLUMN
- #3778: More node types
- #3777: Python: Updating Stubs and Bringing Stubs tests back
- #3776: Simplify `clangd` target
- #3775: Expose dbgen speed_seed functions on header file and add missing ones
- #3771: Increment R package version
- #3765: Issue #3759: Node Time Zone
- #3764: Issue #3763: List Min/Max Problems
- #3761: Fix .import not creating missing table in CLI
- #3760: Requiring keys provided to `map` to be unique
- #3757: Fix #3756: fix issue when running blockwise NL join on dictionary vectors of structs
- #3752: Fixed error handling for node exec()
- #3751: Decreasing the overallocation for list aggregates
- #3750: Fix a bug in HyperLogLog
- #3746: Check if replacement scans don't leak memory
- #3745: Arrow/Pandas Case Insensitive Columns
- #3744: Treating ENUM Case in pyresult describe
- #3739: DuckDBPyRelation: support `offset` argument for `limit()`
- #3738: Fix #3730: avoid modifying the payload in-place in aggregate hash table, because it might be used multiple times in case of grouping sets
- #3736: JDBC better error handling
- #3733: Progress bar clean-up: fix thread sanitizer issue, and move progress bar code to individual operators
- #3720: Issue #3515: Add statistical rounding
- #3707: Fix #3702: avoid assertion that we are not storing internal entries in the file
- #3706: Implement sqlite3_file_control and sqlite3_sleep
- #3705: Add support for ENUM converted types in the Parquet reader
- #3699: Zero-copy scans for non-list uncompressed segments
- #3695: Only rename pandas columns that have duplicates
- #3692: Compatibility with dev dbplyr
- #3691: Fix #3690: correctly assign catalog set to default objects to avoid crash when used as dependency
- #3681: R: Fail CI/CD on NOTEs, check examples on UBSAN, log valgrind output
- #3677: Fuzzer fix: avoid reporting non-internal errors
- #3676: More ccache removal from OSX Extension Release
- #3675: More extensive SQLLogicTest testing, and temporarily disable OR pushdown
- #3667: Handling dataframes with repeated names in columns outside the bind. Now when registering df for scan.
- #3665: Delete correct revision in pypi cleanup script
- #3664: try/except in pypi cleanup
- #3663: Return PY registered objects from temporary views
- #3662: Remove CCache from the OSX Extensions Release build
- #3661: Automatic PyPI cleanup in CI
- #3653: Fixing enum comparison at where clause to TRY_CAST
- #3652: to issue#3475 optimize CSG & CMP enumeration of join order optimizer
- #3650: Issue #3610 mem leak
- #3648: Julia DataFrame Scan Performance Improvements & TPC-H Tests
- #3646: ODBC: adjustments because of ADO
- #3643: Fix for #3639, don't use string copy and value api to fill factor vector
- #3635: Avoid running approx quantile with vsize=2
- #3634: Fix some issues with the fuzzer auto-closing issue behavior
- #3633: Add default type generator, move built-in types to default type class and improve error reporting for types
- #3632: Check for div by zero in distinct stats
- #3630: Fix issue 3611
- #3629: S3 Minio fix
- #3628: Issue #3625: Adding canonical guards around Arrow CData Interface
- #3624: Add interval to DBAPI description
- #3615: Fix #1785: correctly copy constraints in ADD COLUMN of alter table
- #3614: Correctly propagate what a statement returns from the binder
- #3613: SQLSmith fuzzer fixes
- #3612: SQLite UDF fixes for writefile and friends
- #3609: Fix operator precedence of ** in the parser
- #3608: Turn the expression depth limit into a configurable parameter
- #3607: Implements enter and exit functions on pyconnection to allow the use of context managers
- #3606: Use Python 3 for configuring R
- #3604: Equal or null optimization
- #3603: Fixing ascii bug in histogram strings
- #3602: Support for Arrow Timezone
- #3598: Add auto-commit off to JDBC Connection
- #3594: Issue #3588: Half constant BETWEEN
- #3592: Issue #3444: Approximate quantile lists
- #3589: Issue #1187: Virtual Generated Columns
- #3576: More compliant with substrait and upgrading version up to 0.1.2
- #3575: Issue #3534: Remove TIMESTAMPTZ casts
- #3574: Issue #3430: Temporal Infinity Values
- #3571: Fixing JNI, matching function signature exactly
- #3569: Implicit struct_pack
- #3564: Fix for #3562
- #3551: Issue #2309: Update benchmark info in README.
- #3550: ICU Extension Rework: clangd for extensions
- #3547: Issue #3273 support multistatements for JDBC driver
- #3546: Issue #2910: Support pandas boolean datatype
- #3533: Exit with the correct exit code in the regression test runner
- #3531: Correctly increment list offset on histogram aggregation
- #3528: Julia Client - re-enable parallelism by executing tasks on dedicated Julia threads
- #3524: Rework table-in-out function API, and move Unnest table function to table-in-out function
- #3523: Improve HyperLogLog
- #3519: Support in-place updates for unsigned integers
- #3516: Issue #3497: Round DECIMAL casts
- #3514: Issue #3453: Window Partition Collections
- #3512: Issue #3418: Match Multiple Spaces
- #3511: Fix #3505: Correctly handle Foreign Key syntax for when primary-key columns are not specified
- #3507: Fix merge conflicts
- #3504: ODBC: issue #3398
- #3503: ODBC: issue #3478
- #3502: Random-value ge...