High Memory Usage in Mutable Dictionary Index When Cardinality Increases Significantly #15147

jtao15 · 2025-02-27T16:53:51Z

We've observed excessive heap memory allocation when a new column with a dictionary-encoded forward index is added first and its values are only populated from Kafka later. The cause is static buffer size allocation based on the column cardinality recorded in StatsHistory, which can be as low as 1 because the column initially holds only the default value.

When Kafka ingestion begins and the actual cardinality increases significantly, small buffers are allocated repeatedly, producing a long list of bufferReader entries. Because the implementation stores them in a CopyOnWriteArrayList, every added bufferReader entry copies the whole backing array, so additions are expensive. The result is increased memory usage, excessive garbage collection, and potentially out-of-memory (OOM) errors.
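
To give a rough sense of the cost, here is a minimal sketch, not Pinot's actual code. The class `FixedBufferCostSketch`, its methods, and the `valuesPerBuffer` parameter are hypothetical names used only for illustration. It relies on one documented property of `CopyOnWriteArrayList`: each `add` copies the existing backing array, so appending n entries one at a time copies n(n-1)/2 slots in total.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class FixedBufferCostSketch {
  // Number of fixed-size buffers needed to hold `cardinality` values
  // when every buffer holds `valuesPerBuffer` values (ceiling division).
  // Both parameters are illustrative assumptions, not Pinot settings.
  static long buffersNeeded(long cardinality, long valuesPerBuffer) {
    return (cardinality + valuesPerBuffer - 1) / valuesPerBuffer;
  }

  // Total array slots copied when n entries are appended one at a time to a
  // CopyOnWriteArrayList: the k-th add copies the k-1 existing entries.
  static long slotsCopied(long n) {
    return n * (n - 1) / 2;
  }

  public static void main(String[] args) {
    long buffers = buffersNeeded(1_000_000L, 1_000L);
    System.out.println("buffers needed: " + buffers);
    System.out.println("slots copied while appending readers: " + slotsCopied(buffers));

    // The same append pattern against a real list; Object stands in for a reader.
    List<Object> readers = new CopyOnWriteArrayList<>();
    for (long i = 0; i < buffers; i++) {
      readers.add(new Object()); // each add allocates a new backing array
    }
    System.out.println("reader entries: " + readers.size());
  }
}
```

The copy count grows quadratically with the number of buffers, so undersized buffers hurt twice: more buffers are created, and each one makes the next append more expensive.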

Ideally, buffer size allocation should be dynamically adjusted at consumption time rather than relying only on StatsHistory.
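
One possible shape for such dynamic adjustment is geometric growth, where each new buffer is twice the size of the previous one up to a cap. The sketch below is an assumption about how this could work, not a proposed patch. `GrowingBufferSketch`, `buffersNeededWithDoubling`, and all three parameters are hypothetical names.

```java
public class GrowingBufferSketch {
  // Counts how many buffers are allocated to hold `cardinality` values when
  // the first buffer holds `initialValuesPerBuffer` values and each later
  // buffer doubles in size, capped at `maxValuesPerBuffer`.
  // All three parameters are illustrative assumptions.
  static int buffersNeededWithDoubling(long cardinality, long initialValuesPerBuffer,
      long maxValuesPerBuffer) {
    if (initialValuesPerBuffer <= 0 || maxValuesPerBuffer < initialValuesPerBuffer) {
      throw new IllegalArgumentException("invalid buffer sizes");
    }
    long capacity = 0;
    long nextSize = initialValuesPerBuffer;
    int buffers = 0;
    while (capacity < cardinality) {
      capacity += nextSize;
      buffers++;
      // Double the next allocation; the cap also guards against overflow.
      nextSize = (nextSize > maxValuesPerBuffer / 2) ? maxValuesPerBuffer : nextSize * 2;
    }
    return buffers;
  }

  public static void main(String[] args) {
    // Starting from a stale cardinality estimate of 1.
    System.out.println(buffersNeededWithDoubling(1_000_000L, 1L, 1L << 20));
    // Starting from a larger initial buffer.
    System.out.println(buffersNeededWithDoubling(1_000_000L, 1_000L, 1_000_000L));
  }
}
```

With doubling, the number of buffers grows logarithmically with cardinality instead of linearly, which keeps the bufferReader list short even when the initial estimate from StatsHistory is far too low. The trade-off is that the last buffer may be largely unused.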


jtao15 · 2025-02-27T23:49:28Z

cc @vvivekiyer @sajjad-moradi

Jackie-Jiang added the ingestion label Mar 7, 2025
