[Improvement][Spark]: Speed up commit process for Mixed Hive format tables #1350

Closed
3 tasks done
baiyangtx opened this issue Apr 14, 2023 · 0 comments · Fixed by #1463
Labels
module:mixed-spark Spark module for Mixed Format type:improvement
Milestone

Comments

@baiyangtx
Contributor

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

image

After running insert overwrite on a mixed-hive table, all stages complete, but the commit itself then takes a long time. The thread dump shows the stack blocked in the List.contains() method, so the commit could be sped up by using Set.contains() instead.

How should we improve?

image

Replace this List.contains() call with Set.contains().
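
To illustrate the idea, here is a minimal sketch of the two lookup styles. It is not the actual commit code from the Spark module: the class name `ContainsSketch`, the method names, and the sample file paths are all hypothetical, chosen only to show why a membership check inside a loop is cheaper against a HashSet than against a List.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names do not come from the project.
public class ContainsSketch {

    // Slow variant: List.contains() scans the list linearly,
    // so the loop costs O(n * m) for n candidates and m known entries.
    static List<String> keepWithList(List<String> candidates, List<String> known) {
        List<String> kept = new ArrayList<>();
        for (String path : candidates) {
            if (known.contains(path)) {
                kept.add(path);
            }
        }
        return kept;
    }

    // Faster variant: copy the entries into a HashSet once,
    // then each contains() lookup is O(1) on average.
    static List<String> keepWithSet(List<String> candidates, List<String> known) {
        Set<String> knownSet = new HashSet<>(known);
        List<String> kept = new ArrayList<>();
        for (String path : candidates) {
            if (knownSet.contains(path)) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample paths, assumed for the example.
        List<String> known = Arrays.asList("dt=2023-04-13/f1", "dt=2023-04-14/f2");
        List<String> candidates = Arrays.asList("dt=2023-04-14/f2", "dt=2023-04-15/f3");
        System.out.println(keepWithList(candidates, known));
        System.out.println(keepWithSet(candidates, known));
    }
}
```

Both methods return the same result; only the lookup cost differs. The set is built once outside the loop, which is what makes the change worthwhile when the number of files per commit is large.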

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Subtasks

No response

Code of Conduct

@baiyangtx baiyangtx added type:improvement module:mixed-spark Spark module for Mixed Format labels Apr 14, 2023
@baiyangtx baiyangtx added this to the Release 0.5.0 milestone Apr 14, 2023
@baiyangtx baiyangtx mentioned this issue Jun 7, 2023
27 tasks
@zhoujinsong zhoujinsong changed the title [Improvement][Spark]: Spark commit hive partition is too slow [Improvement][Spark]: Speed up commit process for Mixed Hive format tables Aug 2, 2023