Support snapshots configurations in stats collector, fix bugs where snapshot retention didn't support past DAY format #291

Open: wants to merge 2 commits into base: main

Will-Lo
Collaborator

@Will-Lo Will-Lo commented Feb 26, 2025

Summary


We want to emit events for user tables that include their history configuration, so that monitoring systems can accurately detect when snapshot expiration fails for configured tables.

This PR also fixes a bug where retention did not support all date granularities: TimeUnit tops out at DAYS, whereas the snapshot policy can technically be set at the coarser granularities of MONTH and YEAR, although we currently still limit it to 3 DAYS.
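To illustrate the granularity gap, here is a minimal sketch, not the code in this PR. `java.util.concurrent.TimeUnit` has no constant coarser than `DAYS`, while `java.time.temporal.ChronoUnit` also offers `MONTHS` and `YEARS`. The class name `RetentionCutoff`, its methods, and the granularity spellings (`HOUR`, `DAY`, `MONTH`, `YEAR`) are assumptions made for the example.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for illustration only; not taken from this PR.
final class RetentionCutoff {

  // Maps a policy granularity string (assumed spellings) to a ChronoUnit.
  // Unlike TimeUnit, ChronoUnit includes MONTHS and YEARS.
  static ChronoUnit toChronoUnit(String granularity) {
    switch (granularity) {
      case "HOUR":
        return ChronoUnit.HOURS;
      case "DAY":
        return ChronoUnit.DAYS;
      case "MONTH":
        return ChronoUnit.MONTHS;
      case "YEAR":
        return ChronoUnit.YEARS;
      default:
        throw new IllegalArgumentException("Unsupported granularity: " + granularity);
    }
  }

  // Returns the epoch-millis cutoff: snapshots older than this exceed maxAge.
  // Calendar units (MONTHS, YEARS) need a zoned date-time, because
  // Instant.minus rejects units longer than a day.
  static long cutoffMillis(long nowMillis, int maxAge, String granularity) {
    ZonedDateTime now = Instant.ofEpochMilli(nowMillis).atZone(ZoneOffset.UTC);
    return now.minus(maxAge, toChronoUnit(granularity)).toInstant().toEpochMilli();
  }
}
```

The sketch uses UTC for the calendar arithmetic; a real implementation would need to decide which time zone the retention policy is defined in.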

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

  • Manually Tested on local docker setup. Please include commands ran, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.

Ran unit tests, tested the snapshot version, and ran the tablestatscollector job in a cluster on Spark; did not see any errors.

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

@Will-Lo Will-Lo changed the title [WIP] Support snapshots configurations in stats collector, fix bugs where s… Support snapshots configurations in stats collector, fix bugs where s… Feb 27, 2025
@Will-Lo Will-Lo changed the title Support snapshots configurations in stats collector, fix bugs where s… Support snapshots configurations in stats collector, fix bugs where snapshot retention didn't support past DAY format Feb 27, 2025
@@ -336,6 +343,9 @@ private static Map<String, Object> getTablePolicies(Table table) {
policyMap.put(
"sharingEnabled", Boolean.valueOf(policiesObject.get("sharingEnabled").getAsString()));
}
if (policiesObject.get("history") != null) {
Collaborator

is this edge case tested?

Collaborator Author

@Will-Lo Will-Lo Feb 27, 2025

Yup it's tested as part of this test which sets up the history policy json on the backend. Also tested manually with a table with a configured history policy.

public void testCollectHistoryPolicyStatsWithSnapshots() throws Exception {

private static HistoryPolicyStatsSchema buildHistoryPolicy(
Map<String, Object> historyPolicy, Long currentSnapshotTimestamp) {
String granularity = (String) historyPolicy.getOrDefault("granularity", null);
Integer maxAge = Integer.valueOf((String) historyPolicy.getOrDefault("maxAge", "0"));
Collaborator

do these defaults accurately represent the table?

aren't there global defaults that apply in this case?

Collaborator Author

Ah, good callout. I will change it to the current default of 3 days. I was also thinking of representing tables with no configured history policy as null, but in practice they'd have 3 days of snapshot retention.
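The fallback discussed in this thread could look roughly like the sketch below. It is not the PR's implementation: the class name `HistoryPolicyDefaults`, its constants, and the choice to accept either a `String` or a `Number` for `maxAge` are assumptions. The only values taken from the thread are the 3-day default and the `maxAge` and `granularity` keys.

```java
import java.util.Map;

// Hypothetical sketch; names are illustrative and not from the PR.
final class HistoryPolicyDefaults {
  // Assumed from the "3 days" default mentioned in this thread.
  static final int DEFAULT_MAX_AGE = 3;
  // Assumed spelling, matching the "DAY" value asserted in the PR's test.
  static final String DEFAULT_GRANULARITY = "DAY";

  // Reads maxAge, accepting a String or a Number, and falls back to the
  // default when the key is absent.
  static int maxAge(Map<String, Object> historyPolicy) {
    Object raw = historyPolicy.get("maxAge");
    if (raw == null) {
      return DEFAULT_MAX_AGE;
    }
    if (raw instanceof Number) {
      return ((Number) raw).intValue();
    }
    return Integer.parseInt(raw.toString());
  }

  // Reads granularity and falls back to the default when the key is absent.
  static String granularity(Map<String, Object> historyPolicy) {
    Object raw = historyPolicy.get("granularity");
    return raw == null ? DEFAULT_GRANULARITY : raw.toString();
  }
}
```

With this shape, a table with no configured policy reports the effective retention (3, `DAY`) rather than 0 or null, which is the behavior the reply above settles on.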

@@ -30,6 +30,8 @@ public class IcebergTableStats extends BaseTableMetadata {

private Long oldestSnapshotTimestamp;

private Long numSnapshots;
Collaborator

hmmm. this sounds like a good time to also add secondOldestSnapshotTimestamp. wdyt?

this will address the LONG standing and very noisy bug of false positive snapshot expiration alerts

Collaborator Author

Can you explain more about the false positives that we are seeing? Currently we keep track of the oldest snapshot and the newest snapshot, so I'm wondering what value the secondOldestSnapshot would bring. I can add it though if needed.

Assertions.assertEquals(stats.getSharingEnabled(), false);
Assertions.assertEquals(stats.getHistoryPolicy().getMaxAge(), 2);
Assertions.assertEquals(stats.getHistoryPolicy().getNumVersions(), 20);
Assertions.assertEquals(stats.getHistoryPolicy().getDateGranularity(), "DAY");
Member

Should we add a test for HOURS as well?

3 participants