Substitute TransferableBlock with MseBlock #15245
base: master
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #15245 +/- ##
=============================================
- Coverage 61.75% 34.17% -27.58%
- Complexity 207 687 +480
=============================================
Files 2436 2775 +339
Lines 133233 156261 +23028
Branches 20636 23944 +3308
=============================================
- Hits 82274 53399 -28875
- Misses 44911 98593 +53682
+ Partials 6048 4269 -1779
/// Pinot source code is migrated to Java 17 or newer, specially in Java 21 where pattern matching can also be used,
/// removing the need for the [Visitor] pattern.
///
/// Meanwhile, the API force callers to do some castings, but it is a trade-off to have a more robust and maintainable
Typo: relay on -> rely on.
/// Meanwhile, the API force callers to do some castings, but it is a trade-off to have a more robust and maintainable
/// codebase given that we can relay on Java typesystem to verify some important properties at compile time instead of
/// adding runtime checks.
///
alternative of -> alternative to
/// A block that contains data in row heap format.
///
/// This class is a subclass of [MseBlock.Data] and is used to store data in row heap format.
/// This is probably the less efficient way to store data, but it is also the easiest to work with.
everytime -> every time.
As the day this comment was written -> At the time of writing (the comment)
@Nullable
@SuppressWarnings("rawtypes")
private final AggregationFunction[] _aggFunctions;
It'd be good to document _aggFunctions.
// Keep reading the input blocks until we find a match row or all blocks are processed.
// TODO: Consider batching the rows to improve performance.
while (true) {
- TransferableBlock block = _input.nextBlock();
- if (block.isErrorBlock()) {
+ MseBlock block = _input.nextBlock();
Is this correct?
@@ -176,16 +174,25 @@ protected TransferableBlock getNextBlock()
// Terminate when receiving exception block
Map<Integer, String> exceptions = _exceptions;
if (exceptions != null) {
This block is used twice so it could be made into a regular or static constructor in ErrorMseBlock.
@@ -156,15 +156,15 @@ public Void visitJoin(JoinNode node, ServerPlanRequestContext context) {
if (visit(left, context)) {
PipelineBreakerResult pipelineBreakerResult = context.getPipelineBreakerResult();
int resultMapId = pipelineBreakerResult.getNodeIdMap().get(right);
- List<TransferableBlock> transferableBlocks =
- pipelineBreakerResult.getResultMap().getOrDefault(resultMapId, Collections.emptyList());
+ List<MseBlock> blocks = pipelineBreakerResult.getResultMap().getOrDefault(resultMapId, Collections.emptyList());
List<Object[]> resultDataContainer = new ArrayList<>();
DataSchema dataSchema = right.getDataSchema();
Could this loop receive a column-based block?
@@ -244,10 +245,10 @@ public void processQuery(WorkerMetadata workerMetadata, StagePlan stagePlan, Map
// Send error block to all the receivers if pipeline breaker fails
if (pipelineBreakerResult != null && pipelineBreakerResult.getErrorBlock() != null) {
- TransferableBlock errorBlock = pipelineBreakerResult.getErrorBlock();
+ ErrorMseBlock errorBlock = pipelineBreakerResult.getErrorBlock();
int stageId = stageMetadata.getStageId();
LOGGER.error("Error executing pipeline breaker for request: {}, stage: {}, sending error block: {}", requestId,
Won't this just use the default toString() implementation, producing a useless message?
Introduction
This is a large mechanical PR that replaces TransferableBlock with a new MseBlock. I’ve wanted to apply this refactor for over a year but postponed it for other reasons. Now, with the need to add multi-stage stats on failures, implementing that change within TransferableBlock proved too complex. As a result, I also redesigned how stats are communicated between operators.
From TransferableBlock to MseBlock
TransferableBlock tried to model too many different things at once. It had three logical types:
Additionally, blocks could be in either a serialized or deserialized form, significantly increasing complexity. A single implementation had to support all cases, leading to an API where many methods were illegal in certain contexts. This made it difficult to determine when a method was safe to call.
For instance, TransferableBlock was designed to store stats only in EOS blocks, but the API allowed stats in other block types, potentially breaking invariants.
MseBlock: A More Structured Approach
The new MseBlock follows a different design principle. Instead of a single class handling all cases, we now have separate interfaces for each block type. The actual methods are defined in these interfaces, reducing ambiguity.
For example, MseBlock itself does not have a getNumRows() method. Instead, that method is part of MseBlock.Data. This forces callers to explicitly cast their MseBlock to the appropriate type, making the API safer.
While this may initially seem more restrictive, it actually simplifies development. Previously, developers had to rely on context to determine whether a method could be called. Now, the Java compiler enforces correctness, reducing potential errors.
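As a rough illustration of the idea above, here is a minimal sketch, not the actual Pinot code. Only the names MseBlock, MseBlock.Data, MseBlock.Eos and getNumRows() come from this description; RowBlock, SuccessBlock and countRows are hypothetical names made up for the example.

```java
import java.util.Arrays;
import java.util.List;

public class MseBlockSketch {
  // Root type: intentionally exposes no row-level methods such as getNumRows().
  interface MseBlock {
    // Only data blocks know how many rows they hold.
    interface Data extends MseBlock {
      int getNumRows();
    }

    // End-of-stream marker; it carries no rows.
    interface Eos extends MseBlock {
    }
  }

  // Hypothetical row-based data block (assumption for this sketch).
  static final class RowBlock implements MseBlock.Data {
    private final List<Object[]> _rows;

    RowBlock(List<Object[]> rows) {
      _rows = rows;
    }

    @Override
    public int getNumRows() {
      return _rows.size();
    }
  }

  // Hypothetical successful end-of-stream block (assumption for this sketch).
  static final class SuccessBlock implements MseBlock.Eos {
  }

  // Callers must narrow the type before using row-level methods.
  static int countRows(MseBlock block) {
    if (block instanceof MseBlock.Data) {
      return ((MseBlock.Data) block).getNumRows();
    }
    // Calling block.getNumRows() here would not compile: MseBlock has no such method.
    return 0;
  }

  public static void main(String[] args) {
    MseBlock data = new RowBlock(Arrays.asList(new Object[]{1, "a"}, new Object[]{2, "b"}));
    MseBlock eos = new SuccessBlock();
    System.out.println(countRows(data));
    System.out.println(countRows(eos));
  }
}
```

The point of the sketch is that a misuse such as asking an EOS block for its row count becomes a compile error instead of a runtime check.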
New MseBlock Structure
Currently, this structure requires explicit casting (or the visitor pattern), but once Pinot upgrades to Java 17 or 21, we can use sealed classes and pattern matching, further simplifying the API.
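For readers unfamiliar with the visitor alternative to casting, the following is a simplified sketch of how it might look. It is an assumption-laden example, not the PR's code: the accept method, the Visitor interface shape, RowBlock, SuccessBlock and describe are all hypothetical names chosen for illustration.

```java
public class MseBlockVisitorSketch {
  interface MseBlock {
    // Double dispatch: each block calls back the visitor method for its own type.
    <R> R accept(Visitor<R> visitor);

    interface Data extends MseBlock {
      int getNumRows();
    }

    interface Eos extends MseBlock {
    }

    // Hypothetical visitor shape; one method per block kind.
    interface Visitor<R> {
      R visitData(Data block);

      R visitEos(Eos block);
    }
  }

  // Hypothetical data block that only remembers its row count.
  static final class RowBlock implements MseBlock.Data {
    private final int _numRows;

    RowBlock(int numRows) {
      _numRows = numRows;
    }

    @Override
    public int getNumRows() {
      return _numRows;
    }

    @Override
    public <R> R accept(MseBlock.Visitor<R> visitor) {
      return visitor.visitData(this);
    }
  }

  // Hypothetical end-of-stream block.
  static final class SuccessBlock implements MseBlock.Eos {
    @Override
    public <R> R accept(MseBlock.Visitor<R> visitor) {
      return visitor.visitEos(this);
    }
  }

  // No casts needed: the visitor receives the already-narrowed type.
  static String describe(MseBlock block) {
    return block.accept(new MseBlock.Visitor<String>() {
      @Override
      public String visitData(MseBlock.Data data) {
        return "data block with " + data.getNumRows() + " rows";
      }

      @Override
      public String visitEos(MseBlock.Eos eos) {
        return "end of stream";
      }
    });
  }

  public static void main(String[] args) {
    System.out.println(describe(new RowBlock(3)));
    System.out.println(describe(new SuccessBlock()));
  }
}
```

With sealed interfaces and pattern matching for switch, describe could presumably become a single switch over the block type, and the Visitor interface would no longer be needed.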
A New Way to Communicate Stats
Another key design change is that MseBlock never stores stats.
In TransferableBlock, stats were added as a workaround inside DataBlock, making the API more complex. This also polluted MultiStageOperator.nextBlock() with unnecessary logic for collecting stats.
MseBlock, instead, does not contain stats at all. MultiStageOperator.nextBlock() now only deals with data blocks, EOS blocks, and error blocks. Instead of storing stats within blocks, we compute them separately using a new method: calculateStats()
Workflow
Call calculateStats() on each input operator to retrieve upstream stats. If no inputs exist, start with empty stats.
The exception is MailboxReceiveOperator, which cannot directly call calculateStats() on upstream operators in other stages (as they may be running in separate JVMs). Instead, stats are sent through SendingMailbox, packaged with EOS blocks.
How Stats Are Transmitted
When MailboxSendOperator receives an EOS or error block, it calls calculateStats() and calls Exchange.send(MseBlock.Eos, Stats)
SendingMailbox.send(MseBlock.Eos, Stats), whose implementation depends on the transport layer.
DataBlock format as before.
ReceivingMailbox queue.
When ReceivingMailbox reads messages, it extracts the stats, which are then retrieved when MailboxReceiveOperator.calculateStats() is called.
This system ensures transparent multi-stage stat propagation while keeping MseBlock clean. It also shows how the new MseBlock API can be used to enforce a cleaner design. The new SendingMailbox and Exchange methods define their restrictions using the Java type system:
Summary of Benefits