Fix concurrent exception related to multi-match statement #4605

Merged
merged 4 commits into from
Sep 7, 2022

Conversation

czpmango
Contributor

@czpmango czpmango commented Sep 1, 2022

What type of PR is this?

  • bug
  • feature
  • enhancement

What problem(s) does this PR solve?

Issue(s) number:

#4430

Description:

Plan of match (v:player) where v.player.name == "Tim Duncan" MATCH (v)-[e:serve]->(n:team) return v :
[Image: execution plan of the query above]
In the execution plan of the multi-match query above, the _AppendVertices_3 dataset is an input to both Filter_14 and Argument_6, and these two operators are scheduled concurrently. Filter_14 modifies the _AppendVertices_3 dataset in place, which causes a write-read conflict. If Argument_6 is scheduled after Filter_14, the result may be undefined behavior (a crash or incorrect output).
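
The hazard can be illustrated without threads. The following is a minimal sketch, not the engine's code: `Row`, `DataSet`, `filterInPlace`, and `countRows` are hypothetical, simplified stand-ins. It shows that once a filter erases rows in place from a dataset that a second consumer also reads, the second consumer no longer sees the original rows. Under concurrent scheduling the same mutation can additionally race with the read, which is where the undefined behavior comes from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the engine's row and dataset types.
struct Row {
  std::string name;
};
struct DataSet {
  std::vector<Row> rows;
};

// A filter that erases non-matching rows in place, i.e. it mutates its input.
void filterInPlace(DataSet& ds, const std::string& keep) {
  auto& rows = ds.rows;
  rows.erase(std::remove_if(rows.begin(), rows.end(),
                            [&keep](const Row& r) { return r.name != keep; }),
             rows.end());
}

// A second consumer that only reads the dataset.
std::size_t countRows(const DataSet& ds) { return ds.rows.size(); }

// Runs the filter and then the reader on the same shared dataset and
// returns the number of rows the reader observes.
std::size_t rowsSeenByReaderAfterFilter() {
  auto shared = std::make_shared<DataSet>();
  shared->rows = std::vector<Row>{Row{"Tim Duncan"}, Row{"Tony Parker"},
                                  Row{"Manu Ginobili"}};
  assert(countRows(*shared) == 3);       // what the reader expects to see
  filterInPlace(*shared, "Tim Duncan");  // the filter mutates the shared input
  return countRows(*shared);             // the reader now sees the filtered data
}
```

In this sequential version the reader simply observes one row instead of three. With two threads, the erase and the iteration could overlap, which is a data race.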

How do you solve it?

I have considered the following options:
Option 1. Customize a special Iterator view for Filter/Limit, because these two operators reuse the dataset of the preceding operator.
Option 2. Use the readby member of the dataset to detect the scenarios that may cause concurrent exceptions, and eliminate possible write-read conflicts by copying the dataset.
Option 3. Set a validity flag in the dataset to handle write-read conflicts.
Option 4. Make the scheduler aware of possible concurrent write-read conflicts so that it schedules the execution order of operators accordingly.

Option 1: Execution plans are generated in complicated ways for different statements. For example, the iterator type of the Filter in a GO statement may be getNeighborsIter. Keeping the behavior consistent with the previous behavior may make the modification complicated.
Option 3: This affects how every operator accesses the dataset. The impact is relatively large, and the performance impact is difficult to assess.
Option 4: This may lead to complex scheduling logic.

Therefore, the current PR adopts Option 2. It leaves unrelated implementation untouched as far as possible, handles only the scenarios with concurrent scheduling exceptions, and has a relatively small performance impact.
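
The idea behind Option 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `Variable`, its `readBy` counter, and `filterEven` are hypothetical names, and the real bookkeeping in the engine may look different. The filter mutates its input in place only when it is the sole reader; otherwise it works on a private copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified dataset.
struct DataSet {
  std::vector<int> rows;
};

// Hypothetical variable holding a dataset plus the number of operators that
// read it (assumed bookkeeping, standing in for the "readby" information).
struct Variable {
  std::shared_ptr<DataSet> data;
  std::size_t readBy = 0;
};

// Keeps only the even rows. Mutates the input in place when this operator is
// the sole reader; otherwise filters a private copy and leaves the shared
// input untouched.
std::shared_ptr<DataSet> filterEven(const Variable& input) {
  assert(input.data != nullptr);
  const bool soleReader = input.readBy <= 1;
  std::shared_ptr<DataSet> target =
      soleReader ? input.data : std::make_shared<DataSet>(*input.data);
  auto& rows = target->rows;
  rows.erase(std::remove_if(rows.begin(), rows.end(),
                            [](int v) { return v % 2 != 0; }),
             rows.end());
  return target;
}
```

The copy is paid only when more than one operator reads the variable, which matches the stated goal of keeping the performance impact limited to the conflicting scenarios.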

Checklist:

Tests:

  • Unit test(positive and negative cases)
  • Function test
  • Performance test
  • N/A

Affects:

  • Documentation affected (Please add the label if documentation needs to be modified.)
  • Incompatibility (If it breaks the compatibility, please describe it and add the label.)
  • If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
  • Performance impacted: Consumes more CPU/Memory

Special notes for reviewer

Where concurrency exceptions are possible, the data must be copied, which adds overhead, but this overhead is necessary for correctness.
This is a patch, and a better fix may exist.
Because problems caused by concurrency are hard to reproduce, no test case is provided to cover this one.

@czpmango czpmango changed the title fix filter executor Fix concurrent exception related to multi-match statement Sep 1, 2022
@Sophie-Xie Sophie-Xie added the affects/v3.2 PR/issue: this bug affects v3.2.x version. label Sep 1, 2022
@czpmango czpmango added the ready-for-testing PR: ready for the CI test label Sep 1, 2022
@czpmango czpmango force-pushed the fix/concurrent-exception branch from e45ad87 to 40afef3 Compare September 2, 2022 01:52
@czpmango czpmango marked this pull request as ready for review September 2, 2022 01:52
@czpmango czpmango added the do not review PR: not ready for the code review yet label Sep 2, 2022
@Sophie-Xie Sophie-Xie added the cherry-pick-v3.2 PR: need cherry-pick to this version label Sep 5, 2022
@czpmango czpmango force-pushed the fix/concurrent-exception branch 4 times, most recently from d724974 to d4486a6 Compare September 6, 2022 06:17
@czpmango czpmango added ready for review and removed do not review PR: not ready for the code review yet labels Sep 6, 2022
@Shylock-Hg
Contributor

Please add a comment.

@czpmango
Contributor Author

czpmango commented Sep 6, 2022

> Please add a comment.

I explained in the PR description why I need to fix it this way. As for the fix itself, the code logic is relatively simple.

@Shylock-Hg
Contributor

I think you could simply remove the modifying operations from the DataSet Iterator and keep the Iterator immutable. For Filter, you could copy the valid rows to a new DataSet, and do the same for Limit or any other operator that needs to modify its input DataSet.
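
A minimal sketch of this suggestion, assuming a hypothetical simplified `DataSet` and hypothetical helpers `filterCopy` and `limitCopy` (none of these are the engine's actual API): the input is taken by const reference and never modified, and each operator writes its result into a new dataset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical, simplified dataset.
struct DataSet {
  std::vector<int> rows;
};

// Filter: copies the rows that satisfy `pred` into a new dataset.
// The input is read-only, so concurrent readers are not affected.
DataSet filterCopy(const DataSet& input, const std::function<bool(int)>& pred) {
  DataSet out;
  out.rows.reserve(input.rows.size());
  std::copy_if(input.rows.begin(), input.rows.end(),
               std::back_inserter(out.rows), pred);
  return out;
}

// Limit: copies at most `count` rows starting at `offset` into a new dataset.
DataSet limitCopy(const DataSet& input, std::size_t offset, std::size_t count) {
  const std::size_t size = input.rows.size();
  const std::size_t begin = std::min(offset, size);
  const std::size_t end = begin + std::min(count, size - begin);
  assert(begin <= end && end <= size);
  DataSet out;
  out.rows.assign(input.rows.begin() + static_cast<std::ptrdiff_t>(begin),
                  input.rows.begin() + static_cast<std::ptrdiff_t>(end));
  return out;
}
```

The trade-off is that every Filter or Limit pays for a copy of the surviving rows, even when no other operator reads the input.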

@czpmango
Contributor Author

czpmango commented Sep 6, 2022

> I think you could simply remove the modifying operations from the DataSet Iterator and keep the Iterator immutable. For Filter, you could copy the valid rows to a new DataSet, and do the same for Limit or any other operator that needs to modify its input DataSet.

Some earlier revisions weighed the performance issues more comprehensively. Reverting that refactoring is feasible if those earlier considerations can still be covered.
Here are some earlier PRs for your reference:
vesoft-inc/nebula-graph#789
vesoft-inc/nebula-graph#757
vesoft-inc/nebula-graph#791

The fix in the current PR is feasible, and its code changes and performance impact are manageable.

jievince
jievince previously approved these changes Sep 6, 2022
@czpmango czpmango force-pushed the fix/concurrent-exception branch 3 times, most recently from c3b449c to 1d3ffcf Compare September 6, 2022 08:55
@Shylock-Hg
Contributor

Intersection calls unstableErase too. I think you should check every operator that calls erase or eraseRange.

@czpmango czpmango force-pushed the fix/concurrent-exception branch from 1d3ffcf to c3eb378 Compare September 6, 2022 08:58
@czpmango
Contributor Author

czpmango commented Sep 6, 2022

> Intersection calls unstableErase too. I think you should check every operator that calls erase or eraseRange.

The IntersectExecutor only consumes data through the erase function, so it will not cause concurrent read-write conflicts.

@Shylock-Hg
Contributor

> > Intersection calls unstableErase too. I think you should check every operator that calls erase or eraseRange.

> The IntersectExecutor only consumes data through the erase function, so it will not cause concurrent read-write conflicts.

If another operator references the input of Intersection, that is of course a read-write conflict.

@czpmango
Contributor Author

czpmango commented Sep 6, 2022

> > Intersection calls unstableErase too. I think you should check every operator that calls erase or eraseRange.

> The IntersectExecutor only consumes data through the erase function, so it will not cause concurrent read-write conflicts.

This issue can only be reproduced through the Argument operator of multi-match statements.
It only crashes in the following code block:

auto iter = ectx_->getResult(argNode->inputVar()).iter();
DCHECK(iter != nullptr);
DataSet ds;
ds.colNames = argNode->colNames();
ds.rows.reserve(iter->size());
std::unordered_set<Value> unique;
for (; iter->valid(); iter->next()) {
  auto val = iter->getColumn(alias);
  if (!val.isVertex()) {
    continue;
  }
  if (unique.emplace(val.getVertex().vid).second) {
    Row row;
    row.values.emplace_back(std::move(val));
    ds.rows.emplace_back(std::move(row));
  }
}
return finish(ResultBuilder().value(Value(std::move(ds))).build());

Shylock-Hg
Shylock-Hg previously approved these changes Sep 6, 2022
@@ -84,6 +84,10 @@ Status FilterExecutor::handleSingleJobFilter() {
  bool canMoveData = movable(inputVar);
  Result result = ectx_->getResult(inputVar);
  auto *iter = result.iterRef();
  // Always reuse getNeighbors's dataset to avoid some go statement execution plan related issues
  if (iter->isGetNeighborsIter()) {
    canMoveData = true;
Contributor

Why add this?

Contributor Author

Reuse getNeighbors's dataset. Go statements do not cause write-read conflicts.

fix iterator

fix

small delete

small delete

skip iterator type handle for concurrency

small delete

fix scan edges

small delete

small delete

fix

small delete

small change

small change

fix ut

small fix
@czpmango czpmango force-pushed the fix/concurrent-exception branch from 9e2f26f to a051931 Compare September 7, 2022 03:35
@codecov-commenter

Codecov Report

Merging #4605 (a051931) into master (3157fad) will increase coverage by 0.03%.
The diff coverage is 86.82%.

@@            Coverage Diff             @@
##           master    #4605      +/-   ##
==========================================
+ Coverage   84.72%   84.75%   +0.03%     
==========================================
  Files        1357     1358       +1     
  Lines      135323   135531     +208     
==========================================
+ Hits       114652   114870     +218     
+ Misses      20671    20661      -10     
Impacted Files Coverage Δ
src/common/expression/LabelAttributeExpression.h 77.41% <0.00%> (ø)
src/graph/planner/plan/PlanNodeVisitor.h 100.00% <ø> (ø)
src/graph/planner/plan/Query.h 96.41% <ø> (+0.53%) ⬆️
src/graph/visitor/PropertyTrackerVisitor.h 100.00% <ø> (ø)
src/graph/visitor/PrunePropertiesVisitor.h 50.00% <ø> (ø)
src/daemons/MetaDaemon.cpp 67.18% <50.00%> (ø)
src/daemons/StorageDaemon.cpp 65.59% <50.00%> (ø)
src/graph/visitor/PropertyTrackerVisitor.cpp 85.11% <69.76%> (-0.68%) ⬇️
src/graph/context/Result.h 98.30% <80.00%> (-1.70%) ⬇️
...aph/optimizer/rule/PushFilterDownInnerJoinRule.cpp 85.71% <85.71%> (ø)
... and 56 more

@Sophie-Xie Sophie-Xie requested a review from dutor September 7, 2022 06:54
@dutor dutor merged commit 085381b into vesoft-inc:master Sep 7, 2022
Sophie-Xie added a commit that referenced this pull request Sep 7, 2022
* fix filter executor

* Fix concurrency exception of multi-match statements

Co-authored-by: Sophie <[email protected]>
Sophie-Xie added a commit that referenced this pull request Sep 13, 2022
* fix lookup (#4552)

fix

Co-authored-by: jimingquan <[email protected]>
Co-authored-by: Sophie <[email protected]>

* fix split brain in raft (#4479)

Co-authored-by: Sophie <[email protected]>

* fix invalid filter in GetProp make storage crashed (#4568)

Co-authored-by: haowen <[email protected]>

* fix scan vertex/edge do not handle ttl (#4578)

* fix scan vertex/edge do not handle ttl

* use ErrorCode to unify community version and end version

* Fix #1212. Return FoldConstantExprVisitor, if status_ already failed due to found syntax errors. (#4607)

Co-authored-by: jie.wang <[email protected]>

* Avoid fatal when expression illegal. (#4618)

* Fix concurrent exception related to multi-match statement (#4605)

* fix filter executor

* Fix concurrency exception of multi-match statements

Co-authored-by: Sophie <[email protected]>

* Prune properties(#4523)

* fix conflict

* extract attribute from properties function (#4604)

* extract attribute from properties function

* fix error

* fix subscript error

* add test case

* process scanEdges

* fix test error

* add unwind & check vidType when executing not validate (#4456)

* Update AppendVerticesExecutor.cpp

fix conflict

* Update AppendVerticesExecutor.cpp

* Replace obsolete RocksDB API (#4395)

Co-authored-by: Sophie <[email protected]>

* Update PrunePropertiesRule.feature

* remove useless dc (#4533)

* Update PrunePropertiesRule.feature

* fix test error

Co-authored-by: kyle.cao <[email protected]>
Co-authored-by: jimingquan <[email protected]>
Co-authored-by: liwenhui-soul <[email protected]>
Co-authored-by: Doodle <[email protected]>
Co-authored-by: haowen <[email protected]>
Co-authored-by: Cheng Xuntao <[email protected]>
Co-authored-by: jie.wang <[email protected]>
Co-authored-by: shylock <[email protected]>
Co-authored-by: Qiaolin Yu <[email protected]>
@Sophie-Xie Sophie-Xie removed the affects/v3.2 PR/issue: this bug affects v3.2.x version. label Sep 15, 2022
Labels
cherry-pick-v3.2 PR: need cherry-pick to this version ready for review ready-for-testing PR: ready for the CI test
7 participants