Add support for orphan segment cleanup #15142
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #15142 +/- ##
============================================
+ Coverage 61.75% 63.68% +1.92%
- Complexity 207 1461 +1254
============================================
Files 2436 2772 +336
Lines 133233 156306 +23073
Branches 20636 23988 +3352
============================================
+ Hits 82274 99536 +17262
- Misses 44911 49283 +4372
- Partials 6048 7487 +1439
a14c5bc to d2d1bcf
@@ -124,8 +138,19 @@ private void manageRetentionForTable(TableConfig tableConfig) {
  }

  private void manageRetentionForOfflineTable(String offlineTableName, RetentionStrategy retentionStrategy) {
    List<SegmentZKMetadata> segmentZKMetadataList = _pinotHelixResourceManager.getSegmentsZKMetadata(offlineTableName);
Segment metadata is added to ZK before the segment is added to the IdealState.
Segment metadata is removed from ZK after the segment has been removed from the IdealState.
The segment list from ZK will therefore always be a superset of the segments in the IdealState. Thus, I have not checked the IS to filter out segments that are still in it.
Can we not just rely on the ideal state as the source of truth, since that would avoid reading all of the segment ZK metadata? We are going to check the retention window as well prior to deletion (if the concern is deleting a new segment that just got added, which can happen even when the segment is not in ZK metadata but just in deep store).
The deletion occurs based on the endTime present in the segmentZKMetadata. That's why isPurgeable relies on segmentZKMetadata to decide whether we need to remove a segment or not.
Are you suggesting some other approach where we move away from the endTime of the segment?
Could it happen that a segment's ZKMetadata is still in ZK but the segment is no longer in the IS?
I initially did not realize that we already loop through all of the segment ZK metadata to check whether a segment is purgeable (retentionStrategy.isPurgeable).
Yes, it's possible for a segment to be in ZK metadata but not in the IS @klsince, but we want to handle that case as well, right?
If the deletion fails midway, the segment will be in ZK but not in the IS.
Moreover, as per Xiang, we have seen cases where the SegmentRefreshTask etc. push a segment to deepstore and ZK but fail to update the IS.
In that case as well we end up with such segments.
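The relationship discussed in this thread can be sketched as a set difference. This is only an illustration, not code from the PR: `SegmentSets` and `inZkButNotInIdealState` are hypothetical names, and plain string sets stand in for the ZK metadata list and the IdealState.

```java
import java.util.Set;
import java.util.TreeSet;

final class SegmentSets {
  private SegmentSets() {
  }

  // Hypothetical helper: returns the segments that have ZK metadata but are absent from the IdealState.
  // Such segments can appear after a partially failed deletion or a push that never updated the IS.
  static Set<String> inZkButNotInIdealState(Set<String> zkSegments, Set<String> idealStateSegments) {
    Set<String> result = new TreeSet<>(zkSegments);
    result.removeAll(idealStateSegments);
    return result;
  }
}
```

Because the ZK list is expected to be a superset of the IS, the result is empty in the healthy case and non-empty only for the failure scenarios described above.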
long lastModifiedTime = fileMetadata.getLastModifiedTime();

if (retentionStrategy.isPurgeable(segmentName, offlineTableName, lastModifiedTime)) {
This ensures that a segment that has no entry in ZK but has been uploaded to deepstore does not get deleted.
}

String segmentName = extractSegmentName(fileMetadata.getFilePath());
if (segmentName == null || segmentsToExclude.contains(segmentName)) {
Rely on endTime in ZK Metadata for retention in case the segment ZK metadata is present.
This function relies on file modification time to decide whether a segment should be purged or not.
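A rough sketch of the rule described in this comment, under stated assumptions: segments that have ZK metadata are skipped here (their retention is decided from endTime elsewhere), and the remaining deep store files are judged by modification time. `UntrackedSegmentSketch`, `DeepStoreFile`, and `findPurgeableUntracked` are hypothetical names, and the clock value is passed in explicitly for testability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class UntrackedSegmentSketch {
  private UntrackedSegmentSketch() {
  }

  // Hypothetical stand-in for one deep store listing entry (segment name plus modification time).
  static final class DeepStoreFile {
    final String _segmentName;
    final long _lastModifiedTimeMs;

    DeepStoreFile(String segmentName, long lastModifiedTimeMs) {
      _segmentName = segmentName;
      _lastModifiedTimeMs = lastModifiedTimeMs;
    }
  }

  // Returns deep store segments that have no ZK metadata and whose file is older than the retention period.
  static List<String> findPurgeableUntracked(List<DeepStoreFile> files, Set<String> segmentsWithZkMetadata,
      long retentionMs, long nowMs) {
    List<String> purgeable = new ArrayList<>();
    for (DeepStoreFile file : files) {
      if (segmentsWithZkMetadata.contains(file._segmentName)) {
        // Tracked segment: excluded here, its retention is based on endTime from ZK metadata.
        continue;
      }
      if (nowMs - file._lastModifiedTimeMs > retentionMs) {
        purgeable.add(file._segmentName);
      }
    }
    return purgeable;
  }
}
```

A recently uploaded file without ZK metadata fails the modification-time check, so it is left alone until it ages past the retention period.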
2fd0b0e to 4060631
 * @param endTimeMs The end time of the segment in milliseconds
 * @return Whether the segment should be purged
 */
boolean isPurgeable(String segmentName, String tableNameWithType, long endTimeMs);
Suggest having this method local to SegmentDeletionManager since endTime is supplied from outside. It'll be confusing as to what needs to be passed in (file modification time, segment time, etc.). The caller can get the retention time and do its own checks?
Yes, but I would still keep it as part of the RetentionManager. I feel that the SegmentDeletionManager should just have the job of deleting (moving to the Deleted_Segments dir) and should not know which segments need to be deleted.
For our case, the RetentionManager can fetch the segments from deepstore and decide based on the file modification time whether they need to be deleted or not.
On a second pass I think it's fine to have this.
One can pass the endTimeMs that needs to be compared to System.currentTimeMillis() - retentionMS, i.e. System.currentTimeMillis() - retentionMS > endTimeMs.
It's a more generic implementation of
public boolean isPurgeable(String tableNameWithType, SegmentZKMetadata segmentZKMetadata)
where one only needs to rely on the endTime present in segmentZKMetadata.
Have changed as per @klsince's suggestions.
@@ -221,6 +221,7 @@ private void deleteSegmentMetadataFromStore(PinotFS pinotFS, URI segmentFileUri,
URI segmentMetadataUri = SegmentPushUtils.generateSegmentMetadataURI(segmentFileUri.toString(), segmentId);
if (pinotFS.exists(segmentMetadataUri)) {
  LOGGER.info("Deleting segment metadata {} from {}", segmentId, segmentMetadataUri);
  // TODO: check if the deletion was successful and add a warning here.
Is this TODO supposed to check the return value of the delete() method and log a warning if it is false? If so, maybe we can just add it in this PR.
Done.
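What the reviewer asks for could look roughly like the sketch below. It is not the code that was merged: `FileStore` is a hypothetical stand-in for the file system abstraction (with no checked exceptions, to keep the example short), and warnings are collected in a list instead of going through a logger so the behavior is easy to verify.

```java
import java.net.URI;
import java.util.List;

final class MetadataCleanupSketch {
  private MetadataCleanupSketch() {
  }

  // Hypothetical, simplified file system interface used only for this illustration.
  interface FileStore {
    boolean exists(URI uri);

    boolean delete(URI uri, boolean forceDelete);
  }

  // Deletes the file if it exists and records a warning when delete() reports failure.
  static boolean deleteIfExists(FileStore fileStore, URI uri, List<String> warnings) {
    if (!fileStore.exists(uri)) {
      return false;
    }
    boolean deleted = fileStore.delete(uri, true);
    if (!deleted) {
      warnings.add("Could not delete segment metadata at " + uri);
    }
    return deleted;
  }
}
```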
@@ -77,6 +88,7 @@ protected void processTable(String tableNameWithType) {
  return;
}

// Did not understand
remove?
Done. Thanks for pointing it out.
    segmentsToDelete.add(segmentName);
  }
}
Add an INFO log about how many segments will be deleted.
Done.
PinotFS pinotFS = PinotFSFactory.create(tableDataUri.getScheme());

List<FileMetadata> deepstoreFiles = pinotFS.listFilesWithMetadata(tableDataUri, false);
Maybe print an INFO log here about how many segments were found. The timestamp of that log, together with another INFO log after the for-loop, can be used to calculate how long it takes to figure out the files to delete.
Done.
@@ -50,4 +50,17 @@ public boolean isPurgeable(String tableNameWithType, SegmentZKMetadata segmentZK

  return System.currentTimeMillis() - endTimeMs > _retentionMs;
}

@Override
public boolean isPurgeable(String segmentName, String tableNameWithType, long endTimeMs) {
- Better to move tableNameWithType to be the first param for consistency
- This method can be reused to implement the method above
- Perhaps rename endTimeMs to segmentTimeMs to be generic, and leave a comment that segmentTimeMs can be endTime, mtime, etc., to be decided by the caller
Done
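The three suggestions above might combine into something like the following simplified sketch. It is an assumption about the shape of the result, not the merged code: `TimeRetentionSketch` and `SegmentInfo` are hypothetical names (`SegmentInfo` stands in for SegmentZKMetadata), the validity checks a real strategy would perform on the end time are omitted, and the overload with an explicit `nowMs` exists only to make the comparison testable.

```java
final class TimeRetentionSketch {
  private final long _retentionMs;

  TimeRetentionSketch(long retentionMs) {
    _retentionMs = retentionMs;
  }

  // Hypothetical stand-in for the segment metadata; only the end time matters here.
  interface SegmentInfo {
    long getEndTimeMs();
  }

  // Metadata-based check, implemented by reusing the generic time-based check.
  boolean isPurgeable(String tableNameWithType, SegmentInfo segmentInfo) {
    return isPurgeable(tableNameWithType, segmentInfo.getEndTimeMs());
  }

  // Generic check. segmentTimeMs can be the segment end time, a file modification time, etc.;
  // the caller decides which one to pass.
  boolean isPurgeable(String tableNameWithType, long segmentTimeMs) {
    return isPurgeable(tableNameWithType, segmentTimeMs, System.currentTimeMillis());
  }

  // Same comparison with an explicit clock value. tableNameWithType is unused in this sketch
  // and kept only for parity with the signature discussed in the review.
  boolean isPurgeable(String tableNameWithType, long segmentTimeMs, long nowMs) {
    return nowMs - segmentTimeMs > _retentionMs;
  }
}
```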
URI tableDataUri = URIUtils.getUri(_pinotHelixResourceManager.getDataDir(), rawTableName);
PinotFS pinotFS = PinotFSFactory.create(tableDataUri.getScheme());

List<FileMetadata> deepstoreFiles = pinotFS.listFilesWithMetadata(tableDataUri, false);
Can we add a metric for the count of files that are in deep store but not in segmentZkMetadata? This is to see how many dangling files are in the deep store.
Done
5cdcfef to abac218
List<FileMetadata> deepstoreFiles = pinotFS.listFilesWithMetadata(tableDataUri, false);
long listEndTimeMs = System.currentTimeMillis();
LOGGER.info("Found: {} segments in deepstore for table: {}. Time taken to list segments: {} ms",
nit: "Found ... in {} ms" to be short
  }
}
long endTimeMs = System.currentTimeMillis();
LOGGER.info(
Perhaps combine the two INFO logs into one, as "Found {} segments ... have no corresponding ZK metadata, in {} ms"
Done
} catch (IOException e) {
  LOGGER.warn("Unable to fetch segments from deep store that are beyond retention period for table: {}",
      realtimeTableName);
}
Make a helper method from L241-L258 and use it for both manageRetentionForOfflineTable and manageRetentionForRealtimeTable?
Done
a77116a to 168a771
168a771 to e3b7d7e
… to be deleted in a single run of RetentionManager
_controllerMetrics.setValueOfTableGauge(tableNameWithType, ControllerGauge.UNTRACKED_SEGMENTS_COUNT,
    segmentsToDelete.size());

if (segmentsToDelete.size() > untrackedSegmentsDeletionBatchSize) {
Added the logic to pick as per the configured batch size.
The ideal way would be to paginate on the FS and bring only untrackedSegmentsDeletionBatchSize files/segments from deepstore.
We can add this improvement later.
    List<SegmentZKMetadata> segmentZKMetadataList, int untrackedSegmentsDeletionBatchSize) {
  List<String> segmentsToDelete = new ArrayList<>();

  if (untrackedSegmentsDeletionBatchSize <= 0) {
Logging and returning early in case it's set to zero to avoid listing all the files in the deepstore.
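The batch-size behavior discussed in the last two threads can be sketched as below, under stated assumptions. `DeletionBatchSketch` and `selectForDeletion` are hypothetical names, and a `Supplier` stands in for the deep store listing so that the early return visibly skips it; the real code may choose which segments to keep in the batch differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class DeletionBatchSketch {
  private DeletionBatchSketch() {
  }

  // Returns at most batchSize untracked segments. When batchSize <= 0 the (potentially expensive)
  // deep store listing is never invoked.
  static List<String> selectForDeletion(int batchSize, Supplier<List<String>> untrackedSegmentLister) {
    if (batchSize <= 0) {
      return new ArrayList<>();
    }
    List<String> candidates = untrackedSegmentLister.get();
    if (candidates.size() > batchSize) {
      // Keep only the first batchSize candidates; the rest are left for later runs.
      return new ArrayList<>(candidates.subList(0, batchSize));
    }
    return new ArrayList<>(candidates);
  }
}
```

This still lists every file before trimming, which is the limitation the pagination comment above points out.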
Context
Follow-up to this PR: #15048
The first PR ensured that we don't miss any segments that are created from now on.
This PR aims to fix the orphan segments that are present in deepstore and have passed the retention time but are present in neither ZK nor the IdealState.
Scope of the PR
The PR aims to handle two scenarios:
The RetentionManager will not be able to delete a segment in subsequent runs if any failure (a controller restart, for example) occurs between the last two steps of the process.
Testing
Update
After analyzing the test results:
https://docs.google.com/document/d/1ZvNefSsRL716NspQc1f5VrRYjd9CSqocac5TIfLJWPA/edit?usp=sharing
I concluded that limiting the number of segments deleted in a single run would be beneficial. To address this, I have introduced a configurable batch size for segment deletion (untrackedSegmentsDeletionBatchSize, part of SegmentsValidationAndRetentionConfig).
Reasons for This Change
Deletion
respond
but the deletion will stay blocked