Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: When the flink jobs reads the arctic table for a while, the flink job fails #1063

Closed
1 task done
shendanfengg opened this issue Feb 7, 2023 · 0 comments · Fixed by #1130
Closed
1 task done
Assignees
Labels
module:mixed-flink Flink moduel for Mixed Format priority:blocker security, data-loss, correctness, etc. type:bug Something isn't working
Milestone

Comments

@shendanfengg
Copy link
Contributor

What happened?

When using flink to read the arctic table and setting ('arctic_read_mode'='file','streaming'='true'), after the flink job has been running for longer than a certain amount of time (probably longer than snapshot.change.keep.minutes), it will go to read a snapshot file that no longer exists and the flink job will fail

Affects Versions

0.3.x

What engines are you seeing the problem on?

Flink

How to reproduce

  1. Create table and set snapshot.change.keep.minutes = 360
  2. Read the arctic table with flink and set ('arctic_read_mode'='file', 'streaming'='true', 'scan.startup.mode'='latest'), then the task runs for longer than the time set in step 1

Relevant log output

2023-02-07 09:56:52,208 ERROR org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext [] - Exception while handling result from async call in SourceCoordinator-Source: com.netease.arctic.flink.read.ArcticSource. Triggering job failover.
org.apache.flink.util.FlinkRuntimeException: Failed to scan arctic table due to 
	at com.netease.arctic.flink.read.hybrid.enumerator.ArcticSourceEnumerator.handleResultOfSplits(ArcticSourceEnumerator.java:174) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at org.apache.flink.runtime.source.coordinator.ExecutorNotifier.lambda$null$4(ExecutorNotifier.java:136) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at org.apache.flink.uti...
2023-02-07 09:56:52,212 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Trying to recover from a global failure.
org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'Source: com.netease.arctic.flink.read.ArcticSource' (operator feca28aff5a3958840bee985ee7de4d3).
	at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:564) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$QuiesceableContext.failJob(RecreateOnResetOperatorCoordinator.java:223) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.failJob(SourceCoordinatorContext.java:285) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.handleUncaughtExceptionFromAsyncCall(SourceCoordinatorContext.java:298) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:42) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_77]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_77]
	at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_77]
Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to scan arctic table due to 
	at com.netease.arctic.flink.read.hybrid.enumerator.ArcticSourceEnumerator.handleResultOfSplits(ArcticSourceEnumerator.java:174) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at org.apache.flink.runtime.source.coordinator.ExecutorNotifier.lambda$null$4(ExecutorNotifier.java:136) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:40) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	... 3 more
Caused by: java.lang.IllegalArgumentException: from snapshot 4537158955858717973 does not exist
	at com.netease.arctic.shaded.org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:203) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at com.netease.arctic.shaded.org.apache.iceberg.IncrementalDataTableScan.validateSnapshotIds(IncrementalDataTableScan.java:147) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at com.netease.arctic.shaded.org.apache.iceberg.IncrementalDataTableScan.<init>(IncrementalDataTableScan.java:39) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at com.netease.arctic.shaded.org.apache.iceberg.DataTableScan.appendsBetween(DataTableScan.java:54) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at com.netease.arctic.flink.read.hybrid.enumerator.ContinuousSplitPlannerImpl.discoverIncrementalSplits(ContinuousSplitPlannerImpl.java:81) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at com.netease.arctic.flink.read.hybrid.enumerator.ContinuousSplitPlannerImpl.planSplits(ContinuousSplitPlannerImpl.java:67) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at com.netease.arctic.flink.read.hybrid.enumerator.ArcticSourceEnumerator.doPlanSplits(ArcticSourceEnumerator.java:168) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at com.netease.arctic.flink.read.hybrid.enumerator.ArcticSourceEnumerator.planSplits(ArcticSourceEnumerator.java:150) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at org.apache.flink.runtime.source.coordinator.ExecutorNotifier.lambda$notifyReadyAsync$5(ExecutorNotifier.java:133) ~[plugin_ne-flink-1.12.4-1.1.7.1_scala2.12_hive2.3.6-release-3.9.4-1.4.5.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_77]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[?:1.8.0_77]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_77]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[?:1.8.0_77]
	... 3 more

Anything else

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@shendanfengg shendanfengg added type:bug Something isn't working priority:blocker security, data-loss, correctness, etc. labels Feb 7, 2023
@baiyangtx baiyangtx added the module:mixed-flink Flink moduel for Mixed Format label Feb 8, 2023
@shendanfengg shendanfengg added this to the Release 0.4.1 milestone Feb 16, 2023
lklhdu added a commit to lklhdu/arctic that referenced this issue Feb 19, 2023
lklhdu added a commit to lklhdu/arctic that referenced this issue Feb 19, 2023
lklhdu added a commit to lklhdu/arctic that referenced this issue Feb 19, 2023
lklhdu added a commit to lklhdu/arctic that referenced this issue Feb 19, 2023
lklhdu added a commit to lklhdu/arctic that referenced this issue Feb 19, 2023
lklhdu added a commit to lklhdu/arctic that referenced this issue Feb 19, 2023
lklhdu added a commit to lklhdu/arctic that referenced this issue Feb 20, 2023
lklhdu added a commit to lklhdu/arctic that referenced this issue Feb 20, 2023
lklhdu added a commit to lklhdu/arctic that referenced this issue Feb 20, 2023
lklhdu added a commit to lklhdu/arctic that referenced this issue Feb 20, 2023
lklhdu added a commit to lklhdu/arctic that referenced this issue Feb 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:mixed-flink Flink moduel for Mixed Format priority:blocker security, data-loss, correctness, etc. type:bug Something isn't working
Projects
None yet
4 participants