[Doc] How long till data is ready to be consumed at speed? #15036

Open
hpvd opened this issue Feb 12, 2025 · 4 comments

hpvd commented Feb 12, 2025

Pinot can deliver query results with stunning speed / low latency, which is described in many places,
e.g. very nicely in StarTree's blog: https://startree.ai/resources/what-makes-apache-pinot-fast-chapter-ii

In contrast, it is hard to find any numbers or examples answering: how long does it take until the data is ready to be consumed at this speed?

How long does it take from data ingest through the layers of Pinot, including updating the different indexes, etc.?

It would be handy to have some information on this in the docs, in a blog post, or as a first step directly in this issue.

hpvd changed the title from "Doc: how long till data is ready to be consumed at speed?" to "[Doc] How long till data is ready to be consumed at speed?" on Feb 12, 2025

hpvd commented Feb 12, 2025

btw: when operating Pinot, this kind of "e2e freshness" metric would also be relevant in many use cases, and would allow you to

  • find problems
  • optimize
  • decide whether the freshness of the data inside Pinot is good enough, or whether the data has to be taken from other places, e.g. event sourcing
  • prove SLAs
  • ...

There is a "well aged" issue for this: #4007,
incl. a proposal: https://cwiki.apache.org/confluence/display/PINOT/Pinot+Freshness+Metric
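
Until such a metric exists inside Pinot, a rough approximation can be measured from the outside. The following is a minimal sketch only, assuming a table with an epoch-millis ingestion/event timestamp column and the default broker SQL endpoint on port 8099; the table and column names (myTable, ingestTimeMillis) are hypothetical placeholders:

```python
# Rough external freshness probe: how far behind wall-clock time is the
# newest row that Pinot can already serve?
# Table and column names are hypothetical placeholders.
import time
import requests

BROKER_URL = "http://localhost:8099/query/sql"  # default Pinot broker SQL endpoint
TABLE = "myTable"                               # hypothetical table name
TS_COLUMN = "ingestTimeMillis"                  # hypothetical epoch-millis column

def freshness_lag_ms() -> float:
    """Return (now - newest queryable timestamp) in milliseconds."""
    sql = f"SELECT MAX({TS_COLUMN}) FROM {TABLE}"
    resp = requests.post(BROKER_URL, json={"sql": sql}, timeout=5)
    resp.raise_for_status()
    newest_ts = resp.json()["resultTable"]["rows"][0][0]
    return time.time() * 1000 - newest_ts

if __name__ == "__main__":
    while True:
        print(f"freshness lag: {freshness_lag_ms():.0f} ms")
        time.sleep(1)
```

Note this only measures freshness relative to whatever the timestamp column records (e.g. event time vs. Kafka append time), so the choice of column defines which part of the pipeline the lag covers.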


hpvd commented Feb 13, 2025

@Jackie-Jiang do you have any first example numbers on this, or a source to share,
to get a very first impression of possible time spans and the influencing factors, e.g. depending on index type
(for everyone interested, before the full doc / metric implementation is done)?

Jackie-Jiang (Contributor) commented:

I'm not sure if I completely get the question, but I can answer from the perspective of how Pinot handles streaming data. Unlike a lot of other databases that ingest streaming data as mini batches (where the delay happens), Pinot directly writes the data into the index row by row, and the data immediately becomes queryable. The delay from streaming data arriving at Pinot to it becoming queryable is usually below a millisecond (Pinot can easily ingest thousands of messages per second). If you count the end-to-end time from data being produced to the streaming system (e.g. Kafka) to it becoming queryable in Pinot, the delay is usually a few seconds, and the majority of that delay comes from the streaming system processing and then delivering the messages to Pinot.


hpvd commented Feb 14, 2025

Pinot directly writes the data into the index row by row, and the data immediately becomes queryable. The delay from streaming data arriving at Pinot to it becoming queryable is usually below a millisecond

Many thanks for this insight!

It would be really interesting to have some real end-to-end benchmarks of the duration
from the arrival of a (Kafka) message at Pinot
to writing the results of a query that uses an index and includes data from the freshly arrived message to a new message
(e.g. maybe a continuous query that keeps sending results until the new data is included).
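
Not a proper benchmark, but here is a minimal sketch of how such a measurement could be wired up, assuming kafka-python and requests, a Kafka topic and Pinot table both named events, and an eventId column; all of these names are hypothetical placeholders, and the measured number includes the polling granularity:

```python
# Minimal end-to-end latency measurement: produce one uniquely keyed message
# to Kafka, then poll Pinot until a query over that key returns it.
# Topic, table and column names are hypothetical placeholders.
import json
import time
import uuid

import requests
from kafka import KafkaProducer  # pip install kafka-python

KAFKA_BOOTSTRAP = "localhost:9092"
TOPIC = "events"                                 # hypothetical Kafka topic
BROKER_URL = "http://localhost:8099/query/sql"   # default Pinot broker SQL endpoint
TABLE = "events"                                 # hypothetical Pinot realtime table

producer = KafkaProducer(
    bootstrap_servers=KAFKA_BOOTSTRAP,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event_id = str(uuid.uuid4())
produced_at = time.time()
producer.send(TOPIC, {"eventId": event_id, "eventTimeMillis": int(produced_at * 1000)})
producer.flush()

# Poll Pinot until the freshly produced row becomes queryable.
sql = f"SELECT COUNT(*) FROM {TABLE} WHERE eventId = '{event_id}'"
while True:
    resp = requests.post(BROKER_URL, json={"sql": sql}, timeout=5)
    resp.raise_for_status()
    count = resp.json()["resultTable"]["rows"][0][0]
    if count > 0:
        print(f"end-to-end visible after {(time.time() - produced_at) * 1000:.0f} ms")
        break
    time.sleep(0.01)  # 10 ms polling interval bounds the measurement resolution
```

This measures "produced to Kafka → visible in a Pinot query result", so it includes the Kafka delivery time mentioned in the previous comment, not only the in-Pinot part; an index-specific comparison would repeat the same run against tables configured with different index types.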
