merge_adjacent_files not compacting a table with lots of single row files #536

@h2o1

Description

What happens?

We have a situation where some of our DuckLake tables in our application do not appear to get compacted when calling ducklake_merge_adjacent_files.

One of the tables we have been testing with contains about 2838 rows. Looking through the ducklake_data_files table and filtering by that table's table_id, we see 2555 data files linked to the table, most of them containing only 1 row, with the notable exception of the initial file, which contained 157 rows. There are no entries in ducklake_deleted_files associated with the table, and none of the existing files have lightweight snapshot entries in the partial_file_info column. We also do not have a custom target_file_size set. Yet for some reason the table refuses to be compacted, and as a result queries running against this table experience significant slowdown.

I have attempted to reproduce the issue but have not been able to do so far. I have created tables with approximately the same number of rows and the same distribution among the files, but while I have sometimes seen a table compact only "partially" (in the sense that a handful of single-row files survive the compaction), I have not been able to reproduce a case where no files at all get compacted.

I have attached the rows of the ducklake_data_files table that pertain to the table that we investigated.

ducklake_data_file_tb_13.csv

To Reproduce

Unfortunately I do not have a fully reproducible example at this point, but I am still trying to get one and will update here if I succeed.

OS:

macOS 15.5

DuckDB Version:

1.4.1

DuckLake Version:

f134ad8

DuckDB Client:

Python

Hardware:

No response

Full Name:

Oliver Hsu

Affiliation:

Ascend.io

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

No - Other reason (please specify in the issue body)

Did you include all code required to reproduce the issue?

Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

Yes, I have
