Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMORO-1940] Support list partitions via flink catalog #1994

Merged
merged 7 commits into from
Oct 11, 2023

Conversation

huyuanfeng2018
Copy link
Contributor

@huyuanfeng2018 huyuanfeng2018 commented Sep 18, 2023

Why are the changes needed?

Close #1940

Brief change log

  • Implement the listpartitions method in ArcticCatalog

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@github-actions github-actions bot added the module:mixed-flink Flink moduel for Mixed Format label Sep 18, 2023
@codecov
Copy link

codecov bot commented Sep 18, 2023

Codecov Report

All modified lines are covered by tests ✅

see 50 files with indirect coverage changes

📢 Thoughts on this report? Let us know!.

@YesOrNo828
Copy link
Contributor

@huyuanfeng2018 Thanks for your contribution. The CI failed due to the checkstyle. Could you take a look?

@huyuanfeng2018
Copy link
Contributor Author

@huyuanfeng2018 Thanks for your contribution. The CI failed due to the checkstyle. Could you take a look?

done.

List<Object[]> data = new LinkedList<>();
data.add(new Object[]{1, "mark", "2023-10-01"});
data.add(new Object[]{2, "Gerry", "2023-10-02"});
data.add(new Object[]{RowKind.DELETE, 2, "Gerry", "2023-10-02"});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this, first insert a piece of data with ID 2 and then delete it. ListPartition cannot determine whether the partition 2023-10-02 really exists.
@YesOrNo828
WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huyuanfeng2018 I don't think there's an excellent way to return the data's partitions. Additionally, the issue is present in the iceberg v2 table.
Perhaps we can add a comment to the method to clarify that the partitions returned in this scenario encompass all the partitions related to writing the data, including -D and -U.

cc @zhoujinsong WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay to return this partition in this situation.

tables.add(arcticTable.asUnkeyedTable());
}
for (Table table : tables) {
try (CloseableIterable<FileScanTask> tasks = table.newScan().planFiles()) {
Copy link
Contributor

@shidayang shidayang Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixed table Files is not equal change files + base files. you can use mixed plan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixed table Files is not equal change files + base files. you can use mixed plan

I see that the logic in ams should be calculated using change files + base files.
https://sourcegraph.com/github.com/NetEase/amoro/-/blob/ams/server/src/main/java/com/netease/arctic/server/dashboard/ServerTableDescriptor.java?L273:37&popover=pinned

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had talked with the auth of ServerTableDescriptor. The logic is error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use KeyedTable#newScan to plan files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use KeyedTable#newScan to plan files

It seems cannot get partition or file information through this.

image
There is no planfiles method, and plantask cannot get file and partition information.

Copy link
Contributor

@shidayang shidayang Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

planTasks() -> KeyedTableScanTask -> ArcticFileScanTask. ArcticFileScanTask is a FileScanTask.
or
planTasks() -> KeyedTableScanTask -> ArcticFileScanTask -> PrimaryKeyedFile. PrimaryKeyedFile is a ContentFile

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

planTasks() -> KeyedTableScanTask -> ArcticFileScanTask. ArcticFileScanTask is a FileScanTask. or planTasks() -> KeyedTableScanTask -> ArcticFileScanTask -> PrimaryKeyedFile. PrimaryKeyedFile is a ContentFile

done

@zhoujinsong zhoujinsong merged commit b164ff3 into apache:master Oct 11, 2023
@huyuanfeng2018 huyuanfeng2018 deleted the list_partition branch October 11, 2023 02:08
@YesOrNo828
Copy link
Contributor

@huyuanfeng2018 Thanks for your contribution.

I find that ArcticCatalog.listPartitions(ObjectPath tablePath, CatalogPartitionSpec partitionSpec) and ArcticCatalog.listPartitionsByFilter(ObjectPath tablePath, List<Expression> filters) aren't completed.

Would you like to open new issues to complete?

@huyuanfeng2018
Copy link
Contributor Author

@huyuanfeng2018 Thanks for your contribution.

I find that ArcticCatalog.listPartitions(ObjectPath tablePath, CatalogPartitionSpec partitionSpec) and ArcticCatalog.listPartitionsByFilter(ObjectPath tablePath, List<Expression> filters) aren't completed.

Would you like to open new issues to complete?

Okay, I will open a new issue to complete it

czy006 added a commit to czy006/amoro that referenced this pull request Oct 11, 2023
ShawHee pushed a commit to ShawHee/arctic that referenced this pull request Dec 29, 2023
* ArcticCatalog support listPartitions

* add test case

* checkstyle

* resolve conflicts for flink1.12

* fix

* spotless apply

---------

Co-authored-by: huyuanfeng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:mixed-flink Flink moduel for Mixed Format
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement]: Support list partitions via flink catalog
4 participants