[Improvement]: Improve major optimizing: compact segment files to target size #2330

Closed · 3 tasks done · Tracked by #2176 · Fixed by #2332

zhongqishang opened this issue Nov 20, 2023 · 0 comments
zhongqishang (Contributor) commented Nov 20, 2023

Search before asking

  • I have searched in the issues and found no similar issues.

Background

A continuously optimized Iceberg format table accumulates a large number of segment files (16 MB~128 MB), most of which are close to the lower size limit. A large number of small files has the following impacts:

  • Too many small files occupy a lot of memory and directly degrade NameNode performance.
  • They cause query performance loss in query engines (SparkSQL, Trino).

What would you like to be improved?

After major optimizing, most segment file sizes should be close to the target size.

How should we improve?

Fix the current segment planner logic so that segment files are compacted toward the target size. See the Design Doc.
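
To make the intent concrete, here is a minimal, hypothetical sketch of size-based packing in a planner. `SegmentFile`, `planGroups`, and the 128 MB `TARGET_SIZE` constant are illustrative placeholders, not Amoro's actual planner classes or configuration; the real logic is described in the Design Doc and also has to respect partitions and delete files.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of size-based segment packing: undersized segment files
 * are grouped so that each rewrite group's total size is close to the target.
 * Names and the target value are illustrative, not Amoro's real planner API.
 */
public class SegmentPackingSketch {

  /** Illustrative stand-in for a segment/data file handle. */
  record SegmentFile(String path, long sizeInBytes) {}

  // Assumed target size of 128 MB, for illustration only.
  static final long TARGET_SIZE = 128L * 1024 * 1024;

  /** Sequentially pack undersized files into groups whose total size approaches the target. */
  static List<List<SegmentFile>> planGroups(List<SegmentFile> segments) {
    List<List<SegmentFile>> groups = new ArrayList<>();
    List<SegmentFile> current = new ArrayList<>();
    long currentSize = 0;

    for (SegmentFile file : segments) {
      if (file.sizeInBytes() >= TARGET_SIZE) {
        continue; // already at or above the target size, leave it alone
      }
      current.add(file);
      currentSize += file.sizeInBytes();
      if (currentSize >= TARGET_SIZE) {
        groups.add(current); // emit a group whose rewritten output should be near the target size
        current = new ArrayList<>();
        currentSize = 0;
      }
    }
    if (!current.isEmpty()) {
      groups.add(current); // remaining undersized files form one last group
    }
    return groups;
  }
}
```

Each emitted group would then be rewritten into (ideally) one file near the target size, instead of many files hovering just above the segment lower limit.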

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Subtasks

No response

Code of Conduct

zhongqishang mentioned this issue on May 6, 2024 (66 tasks).
zhoujinsong changed the title from "[Improvement]: Improve major compaction" to "[Improvement]: Improve major optimizing: compact segment files to target size" on Jun 26, 2024.