Add segment end criteria check for SVForwardIndex and Dictionary #15120

Open
wants to merge 3 commits into base: master


lnbest0707-uber
Contributor

enhancement ingestion
Inspired by #14479.
In addition to the MVForwardIndex check added in that PR, this PR adds a similar check for SVForwardIndex and Dictionary. It only adds the interface and UTs.
The actual check policy is outside the scope of this PR and remains an open question for discussion. The end criteria need to take into account:

  • The ForwardIndex hits a 4GB limit once it is converted into an immutable segment.
  • The Dictionary has a cardinality limit.

Getting this right is tricky because:

  • The 4GB limit applies to the compressed size, and the compression ratio cannot be predicted accurately while data is still being ingested into the mutable segment.
  • With optimizeDictionary enabled, a dictionary-encoded column could also end up as a forward index in the immutable segment. That makes it even harder to guess:
    - Whether the Dictionary would be converted to a ForwardIndex
    - If converted, what the final compressed size would be

Some proposed policies:

  • Make the uncompressed size limit configurable, and use that value as the mutable segment's threshold.
  • Use the previous segment's compression ratio as a reference to predict the current segment's.
  • For the dictionary, use the same policy as the forward index (possibly with a larger configured limit).
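
To make the proposed policies concrete, here is a minimal sketch of how the first two could be expressed. It is not code from this PR or from Pinot: the class name, method names, and the `safetyFactor` parameter are all hypothetical, and the 4GB limit is assumed to mean 4 GiB.

```java
// Hypothetical sketch only: none of these names exist in Apache Pinot or in this PR.
final class SegmentEndCriteriaSketch {
  // Assumption: the "4GB limit" on the immutable forward index is 4 GiB of compressed data.
  static final long IMMUTABLE_FORWARD_INDEX_LIMIT_BYTES = 4L * 1024 * 1024 * 1024;

  private SegmentEndCriteriaSketch() {
  }

  /** Policy 1: compare the mutable (uncompressed) size against a configured threshold. */
  static boolean exceedsConfiguredLimit(long uncompressedBytes, long configuredLimitBytes) {
    return uncompressedBytes >= configuredLimitBytes;
  }

  /**
   * Policy 2: scale the current uncompressed size by the previous segment's
   * compression ratio (compressed / uncompressed) to estimate the final size.
   */
  static long estimateCompressedBytes(long uncompressedBytes, long prevCompressedBytes,
      long prevUncompressedBytes) {
    if (prevCompressedBytes <= 0 || prevUncompressedBytes <= 0) {
      // No usable history: conservatively assume the data does not compress at all.
      return uncompressedBytes;
    }
    double ratio = (double) prevCompressedBytes / prevUncompressedBytes;
    return (long) Math.ceil(uncompressedBytes * ratio);
  }

  /** Ends the segment once the estimated compressed size reaches a fraction of the limit. */
  static boolean shouldEndSegment(long uncompressedBytes, long prevCompressedBytes,
      long prevUncompressedBytes, double safetyFactor) {
    long estimated = estimateCompressedBytes(uncompressedBytes, prevCompressedBytes, prevUncompressedBytes);
    return estimated >= (long) (IMMUTABLE_FORWARD_INDEX_LIMIT_BYTES * safetyFactor);
  }
}
```

Under the third policy, the dictionary could reuse `exceedsConfiguredLimit` with its own, possibly larger, configured value. A `safetyFactor` below 1.0 would leave headroom for the case where the current segment compresses worse than the previous one, which is exactly the uncertainty described above.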


codecov-commenter commented Feb 24, 2025

Codecov Report

Attention: Patch coverage is 86.66667% with 6 lines in your changes missing coverage. Please review.

Project coverage is 63.66%. Comparing base (59551e4) to head (4edee94).
Report is 1837 commits behind head on master.

Files with missing lines                                  Patch %   Lines
...local/indexsegment/mutable/MutableSegmentImpl.java     88.63%    1 Missing and 4 partials ⚠️
...t/segment/spi/index/mutable/MutableDictionary.java     0.00%     1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #15120      +/-   ##
============================================
+ Coverage     61.75%   63.66%   +1.91%     
- Complexity      207     1461    +1254     
============================================
  Files          2436     2772     +336     
  Lines        133233   156256   +23023     
  Branches      20636    23980    +3344     
============================================
+ Hits          82274    99482   +17208     
- Misses        44911    49299    +4388     
- Partials       6048     7475    +1427     
Flag                     Coverage   Δ
custom-integration1      100.00%    <ø> (+99.99%) ⬆️
integration              100.00%    <ø> (+99.99%) ⬆️
integration1             100.00%    <ø> (+99.99%) ⬆️
integration2             0.00%      <ø> (ø)
java-11                  63.61%     <86.66%> (+1.90%) ⬆️
java-21                  63.54%     <86.66%> (+1.92%) ⬆️
skip-bytebuffers-false   63.63%     <86.66%> (+1.88%) ⬆️
skip-bytebuffers-true    63.52%     <86.66%> (+35.79%) ⬆️
temurin                  63.66%     <86.66%> (+1.91%) ⬆️
unittests                63.66%     <86.66%> (+1.91%) ⬆️
unittests1               56.22%     <20.00%> (+9.33%) ⬆️
unittests2               34.18%     <86.66%> (+6.45%) ⬆️

3 participants