I have searched in the issues and found no similar issues.
Background
A continuously optimized Iceberg format table accumulates a large number of segment files (16 MB to 128 MB), and most of them are close to the lower limit. Such a large number of small files has the following impacts:
Too many small files occupy a lot of NameNode memory and directly degrade NameNode performance.
They cause a query performance loss in query engines (SparkSQL, Trino).
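To make the file-count impact concrete, here is a rough back-of-the-envelope sketch. The 1 TiB table size is a hypothetical figure chosen for illustration, and `file_count` is a helper name introduced here, not part of any library.

```python
MB = 1024 * 1024


def file_count(table_bytes: int, avg_file_bytes: int) -> int:
    """Number of files needed to hold table_bytes at the given average file size."""
    # Ceiling division: a partially filled last file still counts as one file.
    return -(-table_bytes // avg_file_bytes)


table_bytes = 1024 * 1024 * MB  # hypothetical 1 TiB table

near_lower_limit = file_count(table_bytes, 16 * MB)   # files close to 16 MB
near_target_size = file_count(table_bytes, 128 * MB)  # files close to 128 MB

print(near_lower_limit, near_target_size, near_lower_limit // near_target_size)
# prints: 65536 8192 8
```

Under these assumptions, files near the lower limit mean roughly eight times as many file entries to track and open as files near the upper limit.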
zhoujinsong changed the title from "[Improvement]: Improve major compaction" to "[Improvement]: Improve major optimizing: compact segment files to target size" on Jun 26, 2024
Search before asking
What would you like to be improved?
After major optimizing, most file sizes should be close to the target size.
How should we improve?
Fix the current segment planner logic. See the design doc.
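As one possible illustration of "compact segment files to target size", the sketch below groups segment files with a first-fit-decreasing bin-packing heuristic so that each group's total size stays at or below the target. This is not the actual planner logic or the approach in the design doc; `plan_groups`, the sample sizes, and the 128 MB target are assumptions made for the example.

```python
from typing import List

MB = 1024 * 1024


def plan_groups(file_sizes: List[int], target_size: int) -> List[List[int]]:
    """Pack file sizes into groups whose totals do not exceed target_size.

    Uses first-fit-decreasing: sort files from largest to smallest and place
    each one into the first group that still has room for it.
    A file larger than target_size ends up alone in its own group.
    """
    groups: List[List[int]] = []
    totals: List[int] = []
    for size in sorted(file_sizes, reverse=True):
        for i, total in enumerate(totals):
            if total + size <= target_size:
                groups[i].append(size)
                totals[i] += size
                break
        else:
            # No existing group has room, so open a new one.
            groups.append([size])
            totals.append(size)
    return groups


# Hypothetical segment file sizes in MB, all within the 16 MB to 128 MB range.
sizes = [s * MB for s in (16, 20, 24, 30, 40, 64, 100)]
groups = plan_groups(sizes, target_size=128 * MB)
print([[s // MB for s in g] for g in groups])
# prints: [[100, 24], [64, 40, 20], [30, 16]]
```

In this example two of the three output groups total 124 MB, close to the assumed 128 MB target, while the leftover 46 MB group shows why a real planner might defer such remainders to a later optimizing round instead of rewriting them immediately.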
Are you willing to submit PR?
Subtasks
No response
Code of Conduct