High Memory Usage in Mutable Dictionary Index When Cardinality Increases Significantly #15147

Open
jtao15 opened this issue Feb 27, 2025 · 1 comment

jtao15 commented Feb 27, 2025

We've observed excessive heap memory allocation when a new column with a dictionary forward index is added first and its values are populated from Kafka only later. The cause is that buffer sizes are allocated statically from the column cardinality recorded in StatsHistory, which can be as low as 1 when only the default value has been seen.

Once Kafka ingestion begins and the actual cardinality grows far beyond that estimate, small buffers are allocated repeatedly, producing a long list of bufferReader entries. Because the list is a CopyOnWriteArrayList, every append copies the entire backing array, so adding bufferReader entries becomes more expensive as the list grows. The result is increased memory usage, excessive garbage collection, and potentially out-of-memory (OOM) errors.
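To make the cost concrete, here is a rough back-of-the-envelope sketch in plain Java. It is not Pinot code: the class and method names are hypothetical, and the "entries per buffer" figure is an assumed stand-in for whatever size is derived from StatsHistory. The only fact it relies on is that `CopyOnWriteArrayList.add` allocates a new backing array and copies the existing references on every call.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical model, not Pinot's implementation.
class CopyOnWriteGrowthSketch {

  // Fixed-size buffers needed to hold `entries` values (rounded up).
  static long buffersNeeded(long entries, long entriesPerBuffer) {
    return (entries + entriesPerBuffer - 1) / entriesPerBuffer;
  }

  // Appending n items one at a time to a copy-on-write list copies
  // 0 + 1 + ... + (n - 1) existing references in total.
  static long referencesCopied(long appends) {
    return appends * (appends - 1) / 2;
  }

  public static void main(String[] args) {
    long cardinality = 1_000_000L; // assumed real cardinality after ingestion starts
    long perBuffer = 1L;           // assumed size derived from a stale cardinality of 1

    long buffers = buffersNeeded(cardinality, perBuffer);
    System.out.println("buffers allocated: " + buffers);
    System.out.println("references copied: " + referencesCopied(buffers));

    // Small live demonstration: each add() below replaces the backing array.
    List<Object> readers = new CopyOnWriteArrayList<>();
    for (int i = 0; i < 1_000; i++) {
      readers.add(new Object());
    }
    System.out.println("list size: " + readers.size());
  }
}
```

Under these assumptions the number of copied references grows quadratically with the number of buffers, and every append also leaves the previous backing array behind as garbage, which would be consistent with the GC pressure described above.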

Ideally, the buffer size should be adjusted dynamically at consumption time rather than derived only from StatsHistory.
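One possible shape for such a policy, purely as an illustration: start from the StatsHistory-based size but double the capacity of each subsequent buffer up to a cap. The class, method names, and the doubling rule are all assumptions made for this sketch and are not an existing Pinot API or a proposed patch.

```java
// Hypothetical growth policy, not Pinot's implementation.
class GrowingBufferSketch {

  // Capacity of the next buffer: double the last one, bounded by maxCapacity.
  static int nextCapacity(int lastCapacity, int maxCapacity) {
    long doubled = 2L * lastCapacity; // long arithmetic avoids int overflow
    return (int) Math.min(doubled, (long) maxCapacity);
  }

  // Number of buffers needed to hold `entries` values under the doubling policy.
  static int buffersNeeded(long entries, int initialCapacity, int maxCapacity) {
    if (initialCapacity < 1 || maxCapacity < initialCapacity) {
      throw new IllegalArgumentException("capacities must satisfy 1 <= initial <= max");
    }
    int buffers = 0;
    long covered = 0;
    int capacity = initialCapacity;
    while (covered < entries) {
      covered += capacity;
      buffers++;
      capacity = nextCapacity(capacity, maxCapacity);
    }
    return buffers;
  }

  public static void main(String[] args) {
    // Assumed scenario: initial capacity 1 (stale cardinality), one million real values.
    System.out.println(buffersNeeded(1_000_000L, 1, Integer.MAX_VALUE));
  }
}
```

With geometric growth the number of buffers, and therefore the number of appends to the reader list, grows logarithmically with cardinality instead of linearly. The trade-off is that the last buffer may be over-allocated, which is why the sketch takes a cap.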

jtao15 commented Feb 27, 2025

cc @vvivekiyer @sajjad-moradi
