[Improvement]: Improve major optimizing: compact segment files to target size #2330

Closed · 3 tasks done · Tracked by #2176 · Fixed by #2332

zhongqishang opened this issue Nov 20, 2023 · 0 comments
zhongqishang (Contributor) commented Nov 20, 2023

Search before asking

  • I have searched in the issues and found no similar issues.

Background

A continuously optimized Iceberg format table accumulates a large number of segment files (16 MB~128 MB), most of which are close to the lower size limit. A large number of small files has the following impacts:

  • Too many small files occupy a lot of memory and directly degrade NameNode performance.
  • They cause query performance loss in query engines (SparkSQL, Trino).

What would you like to be improved?

After major optimizing, most segment file sizes should be close to the target size.

How should we improve?

Fix the current segment planner logic so that segment files are compacted toward the target size. See the Design Doc.
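
To make the intent concrete, here is a minimal, hypothetical sketch of size-based packing in a planner. `SegmentFile`, `planGroups`, and the 128 MB `TARGET_SIZE` constant are illustrative placeholders, not Amoro's actual planner classes or configuration; the real logic is described in the Design Doc and also has to respect partitions and delete files.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of size-based segment packing: undersized segment files
 * are grouped so that each rewrite group's total size is close to the target.
 * Names and the target value are illustrative, not Amoro's real planner API.
 */
public class SegmentPackingSketch {

  /** Illustrative stand-in for a segment/data file handle. */
  record SegmentFile(String path, long sizeInBytes) {}

  // Assumed target size of 128 MB, for illustration only.
  static final long TARGET_SIZE = 128L * 1024 * 1024;

  /** Sequentially pack undersized files into groups whose total size approaches the target. */
  static List<List<SegmentFile>> planGroups(List<SegmentFile> segments) {
    List<List<SegmentFile>> groups = new ArrayList<>();
    List<SegmentFile> current = new ArrayList<>();
    long currentSize = 0;

    for (SegmentFile file : segments) {
      if (file.sizeInBytes() >= TARGET_SIZE) {
        continue; // already at or above the target size, leave it alone
      }
      current.add(file);
      currentSize += file.sizeInBytes();
      if (currentSize >= TARGET_SIZE) {
        groups.add(current); // emit a group whose rewritten output should be near the target size
        current = new ArrayList<>();
        currentSize = 0;
      }
    }
    if (!current.isEmpty()) {
      groups.add(current); // remaining undersized files form one last group
    }
    return groups;
  }
}
```

Each emitted group would then be rewritten into (ideally) one file near the target size, instead of many files hovering just above the segment lower limit.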

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Subtasks

No response

Code of Conduct

zhongqishang mentioned this issue on May 6, 2024 (66 tasks).
zhoujinsong changed the title from "[Improvement]: Improve major compaction" to "[Improvement]: Improve major optimizing: compact segment files to target size" on Jun 26, 2024.