Improve bad queries (with excessive number of groups) observability #15254
   * But if a single query has 2 different aggregate operators and each one reaches the limit, this will be increased
   * by 2.
   */
  AGGREGATE_TIMES_NUM_GROUPS_LIMIT_WARNING("times", true),
We would probably also like to have a similar metric that is not global.
@@ -123,6 +124,7 @@ protected void processSegments() {
      if (resultsBlock.isNumGroupsLimitReached()) {
        _numGroupsLimitReached = true;
      }
      _numGroups = resultsBlock.getNumGroups();
This code can be called more than once. In fact, what I think you should be doing is to increase `_numGroups` by `mergedKeys`, which is calculated below.
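The suggestion above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the class `NumGroupsAccumulator` and the method `onBlockMerged` are hypothetical names, and `mergedKeys` is assumed to be the number of group keys contributed by one processed block. The point is only that a plain assignment keeps the last block's value, while accumulation survives repeated calls.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the operator state; not Pinot's actual class. */
public class NumGroupsAccumulator {
  // Atomic because the processing code may run more than once, possibly concurrently.
  private final AtomicInteger _numGroups = new AtomicInteger();

  /**
   * Called once per processed results block.
   * mergedKeys is assumed to be the number of keys merged for that block.
   */
  public void onBlockMerged(int mergedKeys) {
    // Accumulate instead of overwriting, so earlier blocks are not lost.
    _numGroups.addAndGet(mergedKeys);
  }

  public int getNumGroups() {
    return _numGroups.get();
  }
}
```

With this shape, two calls with 3 and 4 merged keys report 7 groups, whereas an assignment would report only the value from the last call.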
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #15254 +/- ##
============================================
+ Coverage 61.75% 63.62% +1.87%
- Complexity 207 1459 +1252
============================================
Files 2436 2772 +336
Lines 133233 156301 +23068
Branches 20636 23982 +3346
============================================
+ Hits 82274 99450 +17176
- Misses 44911 49362 +4451
- Partials 6048 7489 +1441
This PR improves observability for queries that may degrade cluster performance.
In particular, it targets group-by queries that cause an excessive number of groups to be processed.
It makes the following changes:
- Change the level of the `LOGGER` statement that traces all incoming queries from `DEBUG` to `INFO`. Without it, queries are not logged until they finish, which does not help when troubleshooting the latest queries against a cluster that is presenting issues.
- Add a `num.groups.limit.default.warn.factor` Server configuration parameter with `1.5` as its default value. When a group-by operator detects that more than `default num.groups.limit * num.groups.limit.default.warn.factor` groups have been created, it emits a warning message.
- Add a `numGroups` attribute to the Broker response, next to the already existing `numGroupsLimitReached` attribute. When the latter is `true`, `numGroups` will match the configured limit. Otherwise, it will show the total number of groups processed by the query. For non-aggregated queries the value will be `0`.
- Add a new Server metric named `aggregateTimesNumGroupsLimitWarning` that counts the number of times the warning message from above has been logged.
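The warning threshold and its counter described above can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the PR's implementation: the class `GroupsLimitWarning`, its methods, and the limit value used in the usage note are hypothetical, and the counter only stands in for the `aggregateTimesNumGroupsLimitWarning` metric. Only the `1.5` default factor and the "more than limit times factor" comparison come from the description.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper; names are illustrative and not part of Pinot. */
public class GroupsLimitWarning {
  // Default of num.groups.limit.default.warn.factor, as stated in the PR description.
  public static final double DEFAULT_WARN_FACTOR = 1.5;

  private final long _warnThreshold;
  // Stand-in for the aggregateTimesNumGroupsLimitWarning server metric.
  private final AtomicLong _warningCount = new AtomicLong();

  public GroupsLimitWarning(int defaultNumGroupsLimit, double warnFactor) {
    // Threshold = default num.groups.limit * warn factor.
    _warnThreshold = (long) (defaultNumGroupsLimit * warnFactor);
  }

  /** Returns true (and counts a warning) when strictly more groups than the threshold exist. */
  public boolean check(long numGroups) {
    if (numGroups > _warnThreshold) {
      _warningCount.incrementAndGet();
      return true;
    }
    return false;
  }

  public long getWarnThreshold() {
    return _warnThreshold;
  }

  public long getWarningCount() {
    return _warningCount.get();
  }
}
```

For example, assuming a default limit of 100,000 groups (a value chosen here only for illustration), the default factor gives a threshold of 150,000: a query creating exactly 150,000 groups stays silent, while one creating 150,001 triggers the warning and increments the counter.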