Memory Tracker #5082
Codecov Report
Base: 77.36% // Head: 77.47% // Increases project coverage by +0.11%

@@            Coverage Diff             @@
##           master    #5082      +/-   ##
==========================================
+ Coverage   77.36%   77.47%   +0.11%
==========================================
  Files        1105     1110       +5
  Lines       82188    82917     +729
==========================================
+ Hits        63586    64242     +656
- Misses      18602    18675      +73
@@ -65,6 +66,14 @@ folly::Future<Status> SubgraphExecutor::getNeighbors() {
}
vids_.clear();
return handleResponse(std::move(resp));
})
.thenError(folly::tag_t<std::bad_alloc>{},
How about other Executor?
I am trying to add more checks, including on storaged's execution path.
Could we catch all bad-allocation exceptions in one place, such as Scheduler?
I tried, but experiments show that an exception thrown inside folly's Executor can be caught in thenError, whereas code outside does not seem able to catch it.
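The behavior described in this reply can be illustrated without folly. The following is a minimal sketch that uses `std::async` as a stand-in for a folly executor (an assumption; `launch` and `probe` are hypothetical names). An exception thrown on the worker thread is stored in the future, so a `try`/`catch` around the launching call never sees it; it only surfaces where the future's result is consumed.

```cpp
#include <future>
#include <new>
#include <string>

// Hypothetical stand-in for work that runs on an executor thread and
// fails with std::bad_alloc.
std::future<int> launch() {
  return std::async(std::launch::async,
                    []() -> int { throw std::bad_alloc(); });
}

// Reports where the exception was observed.
std::string probe() {
  std::future<int> fut;
  try {
    fut = launch();  // Does not throw: the task runs on another thread.
  } catch (const std::bad_alloc&) {
    return "caught at launch";
  }
  try {
    fut.get();  // The stored exception is rethrown here.
  } catch (const std::bad_alloc&) {
    return "caught at get";
  }
  return "not caught";
}
```

In folly the analogous consumption point is a continuation such as `thenError`, which is why the handler has to be attached to the future rather than wrapped around the call that schedules the work.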
There is no conflict. As far as I know, Scheduler calls the execute function of each Executor and gets the Future of the execution, so we could handle the bad-allocation exception from that Future. WDYT?
Yes, there is no conflict. I have also added a catch in Scheduler; see Scheduler.cpp.
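The idea agreed on here, handling the failure once on the future that each executor returns, can be sketched as follows. This is not the code in Scheduler.cpp: `Status`, `Executor`, `runAll`, `succeed`, and `runOutOfMemory` are hypothetical names, `std::future` stands in for `folly::Future`, and the blocking `get()` stands in for folly's continuation chaining.

```cpp
#include <functional>
#include <future>
#include <new>
#include <vector>

// Hypothetical status type for this sketch.
enum class Status { kOk, kMemoryExceeded };

// Hypothetical executor: a callable that returns a future of its status.
using Executor = std::function<std::future<Status>()>;

// Scheduler-like loop: every executor's future is consumed in one place,
// so std::bad_alloc is translated into a status exactly once.
Status runAll(const std::vector<Executor>& executors) {
  for (const auto& execute : executors) {
    try {
      Status s = execute().get();
      if (s != Status::kOk) {
        return s;
      }
    } catch (const std::bad_alloc&) {
      return Status::kMemoryExceeded;
    }
  }
  return Status::kOk;
}

// Two sample executors used to exercise the loop.
std::future<Status> succeed() {
  return std::async(std::launch::async, [] { return Status::kOk; });
}

std::future<Status> runOutOfMemory() {
  return std::async(std::launch::async,
                    []() -> Status { throw std::bad_alloc(); });
}
```

With this shape, individual executors do not need their own handler for the scheduler to report a memory error, although per-executor handlers can still coexist with it, which matches the "no conflict" conclusion of the thread.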
6c4995a to 8293e78
src/storage/StorageServer.cpp
Outdated
memoryMonitorThread_ = std::make_unique<thread::GenericWorker>();
if (!memoryMonitorThread_ || !memoryMonitorThread_->start("graph-memory-monitor")) {
  return Status::Error("Fail to start query engine background thread.");
query engine? graph-memory?
good catch
try {
if (req.common_ref().has_value() && req.get_common()->profile_detail_ref().value_or(false)) {
profileDetailFlag_ = true;
profileDetail("GetDstBySrcProcessorTotal", 0);
profileDetail("GetDstBySrcProcessorDedup", 0);
}

spaceId_ = req.get_space_id();
auto retCode = getSpaceVidLen(spaceId_);
if (retCode != nebula::cpp2::ErrorCode::SUCCEEDED) {
for (auto& p : req.get_parts()) {
pushResultCode(retCode, p.first);
spaceId_ = req.get_space_id();
auto retCode = getSpaceVidLen(spaceId_);
if (retCode != nebula::cpp2::ErrorCode::SUCCEEDED) {
for (auto& p : req.get_parts()) {
pushResultCode(retCode, p.first);
}
onFinished();
return;
}
onFinished();
return;
}
this->planContext_ = std::make_unique<PlanContext>(
this->env_, spaceId_, this->spaceVidLen_, this->isIntId_, req.common_ref());

// check edgetypes exists
retCode = checkAndBuildContexts(req);
if (retCode != nebula::cpp2::ErrorCode::SUCCEEDED) {
for (auto& p : req.get_parts()) {
pushResultCode(retCode, p.first);
this->planContext_ = std::make_unique<PlanContext>(
this->env_, spaceId_, this->spaceVidLen_, this->isIntId_, req.common_ref());

// check edgetypes exists
retCode = checkAndBuildContexts(req);
if (retCode != nebula::cpp2::ErrorCode::SUCCEEDED) {
for (auto& p : req.get_parts()) {
pushResultCode(retCode, p.first);
}
onFinished();
return;
}
Could we put the try-catch outside the doProcess function, in GraphStorageServiceHandler.cpp?
try {
  doProcess();
} catch (const std::bad_alloc& e) {
  // handle the allocation failure here
}
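The wrapper suggested in this comment could look roughly like the sketch below. `Code` and `guarded` are hypothetical names introduced for illustration, not identifiers from GraphStorageServiceHandler.cpp, and the callable stands in for `doProcess`.

```cpp
#include <new>
#include <utility>

// Hypothetical result code for this sketch.
enum class Code { kSucceeded, kMemoryExceeded };

// Runs the given callable and converts std::bad_alloc into an error code,
// so the callable's body does not need its own try/catch.
template <typename F>
Code guarded(F&& doProcess) {
  try {
    std::forward<F>(doProcess)();
    return Code::kSucceeded;
  } catch (const std::bad_alloc&) {
    return Code::kMemoryExceeded;
  }
}
```

A wrapper of this kind only covers work that runs synchronously inside the call. Anything the callable schedules onto another thread or executor reports its exception through the returned future instead.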
…valable, depends on jemalloc to get accurate free size
Well done!
Looks good to me mostly.
What type of PR is this?
What problem(s) does this PR solve?
Issue(s) number:
#5081
Description:
UPDATE: MemoryTracker is only turned on when jemalloc is enabled, because the default malloc cannot report the accurate size of free(ptr) (see untrackMemory in Memory.h; malloc_usable_size() is not accurate).
Override the default C++ new/delete operators; the newly added new/delete functions do the memory quota counting before doing the real alloc/free.
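The mechanism described above can be sketched in standard C++ as follows. This is not the PR's implementation: the namespace `quota_sketch` and the names `g_used`, `g_limit`, `trackedAlloc`, and `trackedFree` are hypothetical. It also departs from the description in one respect: the PR relies on jemalloc to learn the size being freed, whereas this sketch stores the requested size in a small header in front of each block so that it stays portable (C++17 is assumed for `inline` variables).

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <new>

namespace quota_sketch {

// Hypothetical process-wide counters: bytes in use and the quota.
inline std::atomic<std::int64_t> g_used{0};
inline std::atomic<std::int64_t> g_limit{
    std::numeric_limits<std::int64_t>::max()};

// Header placed before each block; sized to keep max alignment.
constexpr std::size_t kHeader = alignof(std::max_align_t);
static_assert(kHeader >= sizeof(std::size_t), "header must hold a size");

inline void* trackedAlloc(std::size_t size) {
  const auto n = static_cast<std::int64_t>(size);
  // Count first; refuse the request if it would exceed the quota.
  const std::int64_t now = g_used.fetch_add(n, std::memory_order_relaxed) + n;
  if (now > g_limit.load(std::memory_order_relaxed)) {
    g_used.fetch_sub(n, std::memory_order_relaxed);
    throw std::bad_alloc();
  }
  void* raw = std::malloc(size + kHeader);
  if (raw == nullptr) {
    g_used.fetch_sub(n, std::memory_order_relaxed);
    throw std::bad_alloc();
  }
  *static_cast<std::size_t*>(raw) = size;  // Remember the size for free.
  return static_cast<char*>(raw) + kHeader;
}

inline void trackedFree(void* p) noexcept {
  if (p == nullptr) {
    return;
  }
  char* raw = static_cast<char*>(p) - kHeader;
  const std::size_t size = *reinterpret_cast<std::size_t*>(raw);
  g_used.fetch_sub(static_cast<std::int64_t>(size),
                   std::memory_order_relaxed);
  std::free(raw);
}

}  // namespace quota_sketch

// Replace the global allocation functions. The default array forms
// forward to these, so new[] and delete[] are counted as well.
void* operator new(std::size_t size) {
  return quota_sketch::trackedAlloc(size);
}
void operator delete(void* p) noexcept { quota_sketch::trackedFree(p); }
void operator delete(void* p, std::size_t) noexcept {
  quota_sketch::trackedFree(p);
}
```

The counting happens before the real allocation, so a request that would cross the limit fails with `std::bad_alloc` without touching the allocator. Over-aligned allocations are not replaced here and therefore go uncounted in this sketch.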
Checklist:
Tests:
Test
match (v:Person)-[e*3]->(x) where id(v)==933 return x
graphd/storaged are not killed.
Debug logs indicate that it does not leak memory.
graphd memory stats log
storaged has a FLAGS_query_concurrently flag that selects a different code path; both paths were tested, with FLAGS_query_concurrently set to false and to true.
Performance
E2E test
FlameGraph
The percentage of time spent in MemoryTracker's allocImpl() and free() is trivial. I think the cache misses introduced by the atomic member (which a FlameGraph cannot reveal) are the major reason for the slowdown.
GraphD
[Image: perf_graphd2]
StorageD
[Image: perf_storaged]
Cannot find it; maybe too trivial.
Affects:
Release notes:
Please confirm whether to be reflected in release notes and how to describe: