Remove tracer provider guard. #444
Conversation
Codecov Report

```
@@           Coverage Diff            @@
##            main     #444     +/-   ##
==========================================
- Coverage   47.73%   46.93%   -0.81%
==========================================
  Files          95       95
  Lines        8798     8934    +136
==========================================
- Hits         4200     4193      -7
- Misses      4598     4741    +143
```

Continue to review full report at Codecov.
So for the single-thread runtime, one solution that would do the trick is to spawn a new thread with its own runtime to run the batch processor. That solves the deadlock issue. For example:

```rust
use futures::future::BoxFuture;
use std::thread;

let spawn = |box_future: BoxFuture<'static, ()>| {
    thread::spawn(move || {
        // A dedicated runtime on its own thread, so blocking on shutdown
        // in the application's runtime can't starve this task.
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .unwrap();
        rt.block_on(box_future);
    });
};
```

(We would need to handle the result and make some changes to the batch processor as well.)
That seems like a fine solution!
We may want to consider whether any of these could deadlock if used in tests that execute in parallel.

I don't think it will cause a deadlock, since we guard the global tracer provider with a lock. But if two tests try to modify the global tracer provider at the same time, the result will likely be non-deterministic. Thus, I run these tests sequentially.
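For illustration, one common way to serialize tests that mutate global state is a shared lock; this is an assumed pattern, not necessarily what this PR's test suite does:

```rust
use std::sync::Mutex;

// A process-wide lock; every test that touches the global tracer provider
// takes it first, so those tests run one at a time even under the default
// parallel test runner.
static GLOBAL_PROVIDER_TEST_LOCK: Mutex<()> = Mutex::new(());

#[test]
fn shutdown_on_multi_thread_runtime() {
    let _serial = GLOBAL_PROVIDER_TEST_LOCK.lock().unwrap();
    // set the global tracer provider, create spans, shut down ...
}
```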
This adds a suite of tests to check whether the global tracer provider has trouble shutting down on different runtimes.
…kio runtime use case. To improve performance, it would be good if we could use `tokio::spawn` instead of `tokio::task::spawn_blocking`. The problem with using `tokio::spawn` for the background task that sends spans is that, when shutting down, we need to block on the shutdown function while the remaining spans are still being sent. In a multi-thread runtime those tasks can be scheduled on different threads, but in a single-thread runtime we get a deadlock. The proposed solution is to spawn the background task on a separate thread, outside the tokio runtime.
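A rough sketch of that idea; the message type, channel layout, and function names are illustrative, not the PR's actual `BatchSpanProcessor` code:

```rust
use std::sync::mpsc;
use std::thread;

enum Message {
    Export(Vec<String>), // stand-in for a batch of spans
    Shutdown(mpsc::Sender<()>),
}

fn spawn_worker() -> mpsc::Sender<Message> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Dedicated runtime, independent of the application's runtime,
        // so async exporters can still run here.
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .unwrap();
        rt.block_on(async move {
            // Blocking recv is fine here: this thread's runtime has no
            // other tasks to starve.
            while let Ok(msg) = rx.recv() {
                match msg {
                    Message::Export(batch) => {
                        // the real processor would export asynchronously here
                        drop(batch);
                    }
                    Message::Shutdown(done) => {
                        // flush anything still queued, then acknowledge
                        let _ = done.send(());
                        break;
                    }
                }
            }
        });
    });
    tx
}

fn shutdown(handle: &mpsc::Sender<Message>) {
    let (done_tx, done_rx) = mpsc::channel();
    let _ = handle.send(Message::Shutdown(done_tx));
    // Safe to block: the worker runs outside the caller's runtime.
    let _ = done_rx.recv();
}
```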
Also removed two unnecessary dependencies.
This is great! I used opentelemetry recently for a small CLI and ran into two issues:

- Using the Tokio current-thread runtime
- Waiting for remaining spans to export when also using `std::process::exit(1)` to specify an exit code
I'm beginning to think that I would prefer an async shutdown function that I could call explicitly. It seems to be the hardest thing to accidentally misuse. I had worked around the above-mentioned issues by doing something like the following. When I tried this PR, I added the shutdown call and it didn't work:
```rust
async fn async_main() -> i32 {
    // install_pipeline ... create spans ...
    0 // placeholder exit code
}

fn main() {
    let exit_code = tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .unwrap()
        .block_on(async_main());
    // opentelemetry::global::shutdown_tracer_provider();
    // ^-- works without this line on master; spans don't export when I
    //     add this line using this PR
    std::process::exit(exit_code);
}
```
The error I got was "send failed because receiver is gone" (using the Jaeger exporter). I think this makes sense, because I tried to shut down the pipeline after shutting down the Tokio runtime. If `shutdown_tracer_provider` were async, I wouldn't have tried this code, because I couldn't have awaited the future.
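For comparison, roughly what such an async shutdown could look like; `shutdown_tracer_provider_async` is hypothetical, not something this PR or the crate provides, and `async_main` is stubbed to keep the sketch self-contained:

```rust
// Hypothetical async shutdown; not part of this PR or the crate's API.
async fn shutdown_tracer_provider_async() { /* flush spans, then shut down */ }

async fn async_main() -> i32 {
    // install pipeline, create spans ...
    0 // placeholder exit code
}

fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();
    let exit_code = rt.block_on(async {
        let code = async_main().await;
        // An awaited shutdown can only run while the runtime is still
        // alive, ruling out the mistake described above.
        shutdown_tracer_provider_async().await;
        code
    });
    std::process::exit(exit_code);
}
```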
That said, though, this PR already seems like a huge improvement. Both of the following much simpler versions of my main function worked just fine based on this PR.
```rust
#[tokio::main]
async fn main() {
    let exit_code = async_main().await;
    pipeline::shutdown_pipeline();
    std::process::exit(exit_code);
}
```

```rust
#[tokio::main(flavor = "current_thread")]
async fn main() {
    let exit_code = async_main().await;
    pipeline::shutdown_pipeline();
    std::process::exit(exit_code);
}
```
So to me this is definitely a nice improvement. And having the tests to make sure we don't regress is nice as well. 👍
Thanks for all the work on this @TommyCpp, looks good to me. Users may forget to call shutdown and miss a few last unexported spans, but that is preferable to the current situation where shutdown can happen silently and no spans are exported at all.
The global function name seems fine to me. I think I would love to unify the naming of the different runtime features, maybe something along the lines of `rt-tokio` / `rt-tokio-current-thread` / `rt-async-std`. We could possibly even throw a compile error if more than one runtime feature is activated. I saw this pattern in another project. Not sure if this requires any special features (translated to our use case but untested):

```rust
#[cfg(any(
    not(any(
        feature = "rt-tokio",
        feature = "rt-tokio-current-thread",
        feature = "rt-async-std"
    )),
    all(feature = "rt-tokio", feature = "rt-tokio-current-thread"),
    all(feature = "rt-tokio", feature = "rt-async-std"),
    all(feature = "rt-tokio-current-thread", feature = "rt-async-std"),
))]
compile_error!(
    "exactly one of the features ['rt-tokio', 'rt-tokio-current-thread', 'rt-async-std'] must be enabled"
);
```

But I'm happy to discuss both of those things in separate PRs.
Sounds good to me.
I think the current approach when multiple conflicting features are enabled is to choose one of them; for example, enabling two of the runtime features will silently pick one. I'd vote to change this behavior, but it seems we should address this in a different PR.
Definitely agree, this should be a different PR.
This is good to merge, I guess?
… same time. As per open-telemetry#444 (comment), we should try to throw a compile error when users set two exclusive features, instead of defaulting to one of them.
Resolve #427
Resolve #364
Problem statement
The problem we have is how to allow users to send the remaining spans before the application exits. The current approach works for the multi-thread tokio runtime, but not for the current-thread tokio runtime.

Also, we currently allow users to use multiple tracer providers throughout the application, which I believe is not a common situation. Most people should have one global tracer provider.
Proposed solution
We currently have the `set_tracer_provider` function to set the tracer provider. It returns a guard, and dropping the guard triggers the shutdown of the tracer provider. Instead of returning the guard, the function should return the replaced tracer provider. If users need multiple tracer providers, they can manage those tracer providers on their own.
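A minimal sketch of that shape, with a stand-in `TracerProvider` type and illustrative signatures rather than the crate's actual code:

```rust
use std::sync::{Mutex, OnceLock};

#[derive(Clone, Default)]
struct TracerProvider; // stand-in for the real SDK type

fn global_provider() -> &'static Mutex<TracerProvider> {
    static PROVIDER: OnceLock<Mutex<TracerProvider>> = OnceLock::new();
    PROVIDER.get_or_init(Default::default)
}

/// Swap in a new provider and hand the old one back to the caller,
/// instead of returning a shutdown guard.
fn set_tracer_provider(new: TracerProvider) -> TracerProvider {
    std::mem::replace(&mut *global_provider().lock().unwrap(), new)
}
```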
In order to shut down the tracer provider properly, we could have a `shut_down_provider` function that blocks in place until the current tracer provider shuts down. The problem then becomes how we spawn the background task so that it will not block forever on shutdown.

Different situations
The objective is: if users call `shut_down_provider` before exit, the `BatchSpanProcessor` should send all of its remaining spans; if not, all remaining spans in it will be dropped. I tried a few different setups to see if we can do it.
[Table: for each runtime setup, the spawn method used by `BatchProcessor` (`tokio::spawn` or `spawn_blocking`*) and whether the remaining spans are exported when `shut_down_provider` is called before exit.]

* `spawn_blocking` = `|fut| tokio::task::spawn_blocking(|| futures::executor::block_on(fut))`
So my theory on why opentelemetry hangs forever when using single-thread tokio: when we shut down the `BatchSpanProcessor`, we need to wait for the worker task to finish. But by blocking there, we prevent the worker task from being polled. If we don't block and wait, then when `shutdown` finishes, the runtime will go ahead and drop the worker task, so we can't guarantee the spans in `BatchSpanProcessor` can be sent.

opentelemetry-rust/opentelemetry/src/sdk/trace/span_processor.rs, lines 232 to 240 in f06f38b
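To illustrate the theory, here is a minimal reproduction of blocking the runtime's only thread while waiting on a task of that same runtime; this is an assumed setup, not the actual span_processor.rs code, and it requires the `tokio` and `futures` crates:

```rust
#[tokio::main(flavor = "current_thread")]
async fn main() {
    let (ack_tx, ack_rx) = tokio::sync::oneshot::channel::<()>();
    tokio::spawn(async move {
        // Stands in for the worker flushing spans and acknowledging shutdown.
        let _ = ack_tx.send(());
    });
    // Blocking the runtime's only thread: the spawned task is never polled,
    // the acknowledgement never arrives, and this hangs forever.
    futures::executor::block_on(ack_rx).unwrap();
}
```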
I have been poking around and tried random stuff, but none of it seems to work. So I would love some advice here, as I believe some of you have more experience with how tokio works.
Final solution

- The `set_tracer_provider` function returns the replaced tracer provider.
- The `shut_down_tracer_provider` function is used to shut down the tracer provider.
- `tokio::spawn` is used to spawn background tasks in `BatchProcessor` if the runtime has multiple threads (a sketch of this selection logic follows).
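As a rough illustration of that last point, a spawn helper could pick its strategy from the runtime flavor. This is a sketch under assumptions: `spawn_task` is an illustrative name rather than the PR's code, and `Handle::runtime_flavor` requires a reasonably recent tokio (1.26+):

```rust
use futures::future::BoxFuture;
use tokio::runtime::{Handle, RuntimeFlavor};

// Illustrative helper, not the PR's actual plumbing.
fn spawn_task(fut: BoxFuture<'static, ()>) {
    match Handle::try_current() {
        // Multi-thread runtime: other worker threads keep polling the task
        // even if one thread blocks inside shutdown.
        Ok(handle) if matches!(handle.runtime_flavor(), RuntimeFlavor::MultiThread) => {
            handle.spawn(fut);
        }
        // Current-thread runtime, or no runtime at all: use a dedicated
        // thread so a blocking shutdown cannot starve the task.
        _ => {
            std::thread::spawn(move || futures::executor::block_on(fut));
        }
    }
}
```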