Recent improvements (#14836) to ensure a valid data type is returned when queries result in empty responses (due to broker pruning or segment pruning) have introduced substantial query overhead.
• In some cases, we observed a 200x increase in query latency (e.g., from 6ms to 1200ms).
• On average, the overhead was at least 100x, leading to high CPU utilization and even host failures.
Root Cause
This issue appears to be linked to the following PRs:
1. PR #13831
2. PR #14918
Findings
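A/B testing and profiling point to operations performed inside compileQuery, such as optimize and toRelation (see pinot/pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java, line 347 in ad6662f), as the source of the significant overhead.
This can be easily reproduced by running one of the integration tests with and without the changes from the above PRs.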
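For anyone who wants to repeat the A/B comparison outside the integration tests, a minimal timing sketch follows. It is not Pinot code: `median_latency_ms` and `slowdown` are hypothetical helper names, and the callable passed in stands for whatever issues the query against a build with or without the PRs above.

```python
import statistics
import time
from typing import Callable, List


def median_latency_ms(fn: Callable[[], object], warmup: int = 5, runs: int = 30) -> float:
    """Return the median wall-clock latency of fn() in milliseconds.

    fn is a placeholder (assumption): in practice it would submit one query
    to a cluster running the build under test.
    """
    # Warm-up calls are discarded so one-time costs do not skew the samples.
    for _ in range(warmup):
        fn()
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def slowdown(baseline_ms: float, candidate_ms: float) -> float:
    """Ratio of candidate latency to baseline latency."""
    if baseline_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return candidate_ms / baseline_ms
```

With the numbers reported above, `slowdown(6, 1200)` gives the 200x figure. Using the median rather than the mean keeps a single slow outlier from dominating the comparison.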
Open Question
We’ve observed that this logic is commonly used in the multi-stage query engine for query compilation.
• Is this latency a known issue, or has something changed downstream that we may not be aware of?
• The current Calcite library has been in place for nearly eight months, so it’s unclear if a recent change is causing this behavior.
@praveenc7 can you share some of the queries where you saw disproportionately large latency overheads due to the MSQE compilation? Do they have really large IN clauses? Based on the image of the profile that you've shared, it looks like the root cause is this known issue - #13617. There are currently some attempts at solving it (#14615, #15027) but we're still discussing the cleanest option to fix that issue.
@yashmayya Yes, we do see this in queries with large IN clauses. However, it was also observed in some simple queries.
Query pattern
SELECT col_a, MAX(col_b)
FROM table_x
WHERE col_b >= 10000
AND col_c NOT IN ('value_x')
AND col_d = 123456789
-- Large IN clause
AND col_a IN (
'x1',
'x2',
'x3',
'x4',
-- ... (values elided)
'x1000'
);

SELECT col_a, col_b, col_c, SUM(col_d)
FROM table_y
WHERE col_a IN ('XXXXX') -- High cardinality
AND col_c >= 10000
GROUP BY col_a, col_b, col_c
ORDER BY SUM(col_d) DESC
LIMIT 20000;
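To vary the size of the IN list when reproducing the first pattern, the query text can be generated rather than written by hand. The sketch below mirrors the first query as posted (same anonymized table and column names); `build_in_clause_query` is a hypothetical helper, not part of Pinot.

```python
def build_in_clause_query(num_values: int) -> str:
    """Build the first query pattern with an IN list of num_values literals.

    The literals follow the 'x1' ... 'xN' naming used in the pattern above.
    """
    if num_values < 1:
        raise ValueError("num_values must be at least 1")
    values = ", ".join(f"'x{i}'" for i in range(1, num_values + 1))
    return (
        "SELECT col_a, MAX(col_b) "
        "FROM table_x "
        "WHERE col_b >= 10000 "
        "AND col_c NOT IN ('value_x') "
        "AND col_d = 123456789 "
        f"AND col_a IN ({values})"
    )
```

Calling `build_in_clause_query(1000)` yields the 1000-value case shown above; smaller or larger values can help show how compilation time scales with the list length.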
cc : @vvivekiyer @Jackie-Jiang @albertobastos