Description
What happens?
We have a situation where some of our DuckLake tables in our application do not appear to get compacted when calling ducklake_merge_adjacent_files.
One of the tables that we have been testing with contains about 2838 rows. Looking through the ducklake_data_files table and filtering by the table_id of that table, we see that there are 2555 data files linked to the table, most of them containing only 1 row, with the notable exception of the initial file, which contained 157 rows. There are no entries in ducklake_deleted_files associated with the table, and none of the existing files contain any lightweight snapshot entries in the partial_file_info column. We also do not have a custom target_file_size set. Yet the table does not get compacted, and as a result queries running against it slow down significantly.
I have attempted to reproduce the issue but have not been able to so far. I have created tables with approximately the same number of rows as well as distribution amongst the files, but while I have sometimes seen a table compact only "partially" (in the sense that a handful of single-row files survive the compaction), I have not been able to reproduce the issue where no files at all are getting compacted.
I have attached the rows of the ducklake_data_files table that pertain to the table that we investigated.
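For anyone looking at the attached rows, the following is a minimal sketch of how the per-file row distribution described above could be summarized, assuming the rows are exported as CSV with a header. The helper name `summarize_data_files` and the `record_count` column name are assumptions for illustration (adjust the column name to match the actual export), and the sample data is made up, not the real attachment.

```python
import csv
import io
from collections import Counter


def summarize_data_files(csv_text, count_column="record_count"):
    """Summarize per-file row counts from an exported metadata CSV.

    `count_column` is an assumed column name holding each file's row count.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = [int(row[count_column]) for row in reader]
    histogram = Counter(counts)
    return {
        "files": len(counts),                      # number of data files
        "rows": sum(counts),                       # total rows across files
        "single_row_files": histogram.get(1, 0),   # files holding exactly 1 row
        "largest_file_rows": max(counts, default=0),
    }


# Tiny made-up sample: one larger initial file plus a few small ones.
sample = "data_file_id,record_count\n0,157\n1,1\n2,1\n3,2\n"
summary = summarize_data_files(sample)
print(summary)
```

Running the same summary before and after calling ducklake_merge_adjacent_files should make it easy to see whether the file count changed at all.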
To Reproduce
Unfortunately I do not have a fully reproducible example at this point, but I am still trying to get one and will update here if I succeed.
OS:
macOS 15.5
DuckDB Version:
1.4.1
DuckLake Version:
DuckDB Client:
Python
Hardware:
No response
Full Name:
Oliver Hsu
Affiliation:
Ascend.io
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - Other reason (please specify in the issue body)
Did you include all code required to reproduce the issue?
Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
Yes, I have