Add pipeline uninstall mechanism #229

jtescher · 2020-09-24T19:26:14Z

In order to address the issues raised in #225, the following changes are proposed:

Fix the current bug where futures are being dropped instead of properly awaited by updating:

- let spawn = |future| tokio::task::spawn_blocking(|| future);
+ let spawn = |future| tokio::task::spawn_blocking(|| futures::executor::block_on(future));

Because spawn_blocking will now block the program from exiting until the future completes, add a drop guard to pipeline install methods which uninstalls the global tracer provider (and in the otlp case in the future also the meter provider) so the exporters properly shut down, causing the future to resolve.

Alternatives considered here could be:

a) let users unset the global(s) themselves (but they would likely forget, causing the same blocking behavior)
b) return a method that could be called instead of a drop guard (similar problems)
c) Current impl uses custom uninstall objects per exporter to make the struct opentelemetry_jaeger::Uninstall rather than a more generic global::TracerProviderGuard or global::UninstallPipeline, alternative name suggestions could be helpful if they improve clarity.
c) Return a pipeline object that contains a Tracer, and could be the drop guard itself, maybe cleaner API but less explicit, possibly a better choice.
Because Tracers currently retain strong references to their TracerProvider, and shutdown is called on the inner's drop, it is still never called. To fix this, update the Tracer to instead contain a Weak<TracerProviderInner> reference, which can be upgraded when needed. This does force other logic like span creation to consider the case where the TracerProvider has been dropped on shutdown, but making this case explicit seems beneficial in most cases.
Additional clarity/consistency tweaks:
- Rename global::set_provider to global::set_tracer_provider
- Rename global::trace_provider to global::tracer_provider
- Rename ProviderInner to TracerProviderInner

Fixes #225

In order to address the issues raised in #225, the following changes are proposed: 1) Fix the current bug where futures are being dropped instead of properly awaited by updating: ```diff - let spawn = |future| tokio::task::spawn_blocking(|| future); + let spawn = |future| tokio::task::spawn_blocking(|| futures::executor::block_on(future)); ``` 2) Because `spawn_blocking` will now block the program from exiting until the future completes, add a drop guard to pipeline install methods which uninstalls the global tracer provider (and in the otlp case in the future also the meter provider) so the exporters properly shut down, causing the future to resolve. Alternatives considered here could be: a) let users unset the global(s) themselves (but they would likely forget, causing the same blocking behavior) b) return a method that could be called instead of a drop guard (similar problems) c) Current impl uses custom uninstall objects per exporter to make the struct `opentelemetry_jaeger::Uninstall` rather than a more generic `global::TracerProviderGuard` or `global::UninstallPipeline`, alternative name suggestions could be helpful if they improve clarity. c) Return a pipeline object that contains a `Tracer`, and could be the drop guard itself, maybe cleaner API but less explicit, possibly a better choice. 3) Because `Tracer`s currently retain strong references to their `TracerProvider`, and shutdown is called on the inner's drop, it is still never called. To fix this, update the `Tracer` to instead contain a `Weak<TracerProviderInner>` reference, which can be upgraded when needed. This does force other logic like span creation to consider the case where the `TracerProvider` has been dropped on shutdown, but making this case explicit seems beneficial in most cases. 4) Additional clarity/consistency tweaks: * Rename `global::set_provider` to `global::set_tracer_provider` * Rename `global::trace_provider` to `global::tracer_provider` * Rename `ProviderInner` to `TracerProviderInner`

jtescher · 2020-09-24T19:26:36Z

cc @frigus02

frigus02

Nice. Thanks for working on this. I left 2 tiny comments, but it's probably better to focus on the overall direction.

Regarding the Uninstall struct returned from the exporters:

I like the name. "Uninstall" is quite clear to me. It doesn't immedately tell me it's going to happen on drop, which something like PipelineGuard might do. But I can't say if that's actually a better name. I think wouldn't mind returning global::TracerProviderGuard directly, either. But really, Uninstall seems good 🙂.
I'm not sure if returning the tuple of a pipeline object is better. The tuple is quite convenient to destructure. 🤔

The additional tweaks make sense to me. 👍

I'm not entirely convinced using weak references in Tracers is a good idea. I'm still thinking about it but these are my thoughts:

What if I don't install a global tracer provider? This could be an option when using tracing-opentelemetry or maybe if I'm writing a performance critical application? In any case, if I understand it correctly I would have to keep at least 1 strong reference to the provider around, which is not obvious.
In general I like the idea of calling shutdown on drop. I think this means whenever there is a span or tracer somewhere, there is definitelty still a reference to the provider, so the pipeline is still running.
Weak references are only needed because we keep references to named tracers. I might be missing something but I don't understand the value this brings at the moment. It doesn't save memory because we're cloning anyway. And there is no logic happening on tracer initialization, so we don't save CPU, either. Would we loose anything if we remove the named tracers HashMap? 🤔

examples/aws-xray/src/client.rs

README.md

jtescher · 2020-09-24T22:04:14Z

@frigus02 great point, probably better to just drop the map. The locking is likely a higher cost than initializing a new tracer (even if they were reduced to Arc strong increments), so this may be a good idea in either case.

I'm not totally convinced that uninstall is the best name, but on the other hand I'm not sure how familiar the rust community is generally with drop guards, this option seems a little more oriented to devs who want things to just work.

The pipeline object may have the benefit of being less of a breaking change to update things, e.g. not totally clear for OTLP if people would prefer one pipeline initialization vs separate metrics and trace pipelines 🤷‍♂️. But generally I like the simplicity of the tuple destructure. The current impl may optimize slightly more for the getting started example readability, which has benifits as that may be how most devs will interact with otel in general.

Co-authored-by: Jan Kühle <[email protected]>

TommyCpp · 2020-09-24T23:25:49Z

Checked with the current version of spec about named tracer.

It is unspecified whether or under which conditions the same or different Tracer instances are returned from this functions.

Didn't really say we need to return the same tracer instance given the same instrument library info. We just need to make sure they can pick up the configuration change, which could be done by keeping a reference of TracerProvider in Tracer. So it should be good to drop the map

frigus02 · 2020-09-25T07:33:58Z

I'm not totally convinced that uninstall is the best name

Yeah, it might be better to be consistent with other Rust APIs and use something like a PipelineGuard.

I also like the idea of a Pipeline object because of it's extensibility. But I'm not sure if the drop behaviour is obvious in that case. Althought this might be fine if we add a big note in the docs 🙂

jtescher · 2020-09-25T16:00:28Z

@frigus02 removed the provider's tracer cache 👍

Weak references are only needed because we keep references to named tracers.

The other reason to switch to weak refs is because it is reasonable for other systems to store tracers or objects with references to tracers in thread locals or other effectively 'static ways (e.g. tracing stores a tracer in the otel layer impl).

frigus02 · 2020-09-25T16:03:09Z

Yeah, that's true. Maybe we need to document that tracers will stop working if their provider is not stored in any strong reference.

jtescher · 2020-09-25T16:31:34Z

A larger alternative would instead be to switch exporters to be async, allowing spawn instead of spawn_blocking which may remove the need to uninstall? Not sure if there are other reasons even in that case where we would want an explicit uninstall.

frigus02 · 2020-09-26T08:13:54Z

Thinking about it again, I think I would prefer if tracing-opentelemetry would be responsible to provide a way to release the Tracer reference, similar to the Uninstall guard in this PR. It feels like a slightly more intuitive API without weak references. But that might well be subjective. 🙂

A larger alternative would instead be to switch exporters to be async, allowing spawn instead of spawn_blocking which may remove the need to uninstall? Not sure if there are other reasons even in that case where we would want an explicit uninstall.

Async exporters could be nice. I wonder if that eliminates the need for a controlled shutdown, though. We would still need to notify the batch processor to export the last spans and we want to wait for this to finish before exiting the program.

We actually still have this issue with async-std with the implementation in this PR. It seems to be a tokio thing to automatically wait for blocking tasks to finish.

I played a bit with this recently. This makes the BatchSpanProcessor wait for its worker to finish on shutdown: frigus02@5dbe15d. This works for async-std and tokio now and I think it should also work for async exporters later. If that looks alright, I can submit this as a follow up PR. Or feel free to copy-paste it here.

Co-authored-by: Jan Kühle <[email protected]>

jtescher · 2020-09-26T19:42:48Z

@frigus02 good point, I applied your patch with a slight change to generics to allow tokio::spawn to be used directly as we aren't actually interested in the results 👍

jtescher · 2020-09-26T20:00:37Z

Thinking about it again, I think I would prefer if tracing-opentelemetry would be responsible to provide a way to release the Tracer reference, similar to the Uninstall guard in this PR. It feels like a slightly more intuitive API without weak references. But that might well be subjective. 🙂

Seems like the two architectural choices here are drop guards vs explicit method calls, and weak refs vs other mechanism for storing shutdown state and making subsequent spans no-ops?

I think I'd lean toward drop guards as they are trickier to use incorrectly (although assigning to _ is one way people will likely run into issues), I don't have as strong feelings about weak refs vs some other no-op state mechanism as they seem like internal implementation details. Did you have other alternatives or tradeoffs in mind?

frigus02 · 2020-09-26T20:14:15Z

I think I'd lean toward drop guards as they are trickier to use incorrectly

I think I agree.

I don't have as strong feelings about weak refs vs some other no-op state mechanism as they seem like internal implementation details.

My only concern with Tracer having a weak reference to the TraceProvider is that code like this would not work anymore, because the provider would be dropped after init_tracer:

fn init_tracer() -> sdk::Tracer {
    let provider = sdk::TracerProvider::builder().with_exporter(...).build();
    provider.get_tracer("", None)
}

fn main() {
    let tracer = init_tracer();
    tracer.in_span(...);
}

That could also be solved with documentation. I probably prefer the way it currently works. But that might just be because I'm used to it 🙂.

frigus02

Again, thanks for working on this!

I mentioned the downside of having to keep a TracerProvider around, which we may want to document. And everything else looks great 🎉

There are 2 READMEs, which we should update as well:

opentelemetry-jaeger/README.md
opentelemetry-otlp/README.md

src/sdk/trace/span_processor.rs

examples/basic-otlp/src/main.rs

jtescher · 2020-09-26T20:21:00Z

@frigus02 yeah seems like that is a casualty of the drop guard choice, with an explicit call to shutdown that example could work, but with guards it is tricky (in that example it would need some method of shutting down if the batch exporter was used in the builder)

Co-authored-by: Jan Kühle <[email protected]>

frigus02 · 2020-09-26T20:23:39Z

Hm, if the Tracer owns a clone of the TracerProvider it should work, I think. When the last Tracer is dropped, the TracerProviderInner would drop as well and can shutdown the span processor. 🤔

Edit: However, it might actually be more intuitive with the way it works in this PR, because it's easy to see when the TracerProvider is dropped. Which means it's also easy to see when span processors will shutdown.

Co-authored-by: Jan Kühle <[email protected]>

TommyCpp

Not too familar with async, but the rest changes looks good. Nice job.

jtescher · 2020-09-27T20:09:32Z

Ok can merge for now to keep things going. If there are clearer name ideas or feedback about how to improve this we can add in subsequent PR's 👍

jtescher · 2020-09-27T20:14:02Z

@frigus02 if you want to be a member of the otel org (requirements) I'm happy to sponsor you, just have open an issue here to get added.

frigus02 · 2020-09-28T20:14:10Z

Yeah, why not. The responsibilities are reasonable. Sounds like I need one more sponsor. @TommyCpp would you be up for that? 🙂

TommyCpp · 2020-09-28T21:16:56Z

@frigus02 More than happy to 😄! Just added me as sponsor in the issue.

jtescher requested a review from a team September 24, 2020 19:26

frigus02 reviewed Sep 24, 2020

View reviewed changes

examples/aws-xray/src/client.rs Outdated Show resolved Hide resolved

README.md Outdated Show resolved Hide resolved

Fix guard nits

9d0163b

Co-authored-by: Jan Kühle <[email protected]>

Remove tracer provider's tracer map

699d981

jtescher and others added 2 commits September 26, 2020 12:40

Await worker future on shutdown

e6c8f9d

Co-authored-by: Jan Kühle <[email protected]>

Merge branch 'master' into uninstall-pipelines

6d6e957

frigus02 approved these changes Sep 26, 2020

View reviewed changes

src/sdk/trace/span_processor.rs Outdated Show resolved Hide resolved

examples/basic-otlp/src/main.rs Outdated Show resolved Hide resolved

Update examples/basic-otlp/src/main.rs

914f454

Co-authored-by: Jan Kühle <[email protected]>

Update span processor shutdown to take &mut self

3725755

Co-authored-by: Jan Kühle <[email protected]>

TommyCpp approved these changes Sep 26, 2020

View reviewed changes

Update jaeger and otlp readmes

7a45c74

frigus02 approved these changes Sep 27, 2020

View reviewed changes

jtescher merged commit 6e10eb2 into master Sep 27, 2020

jtescher deleted the uninstall-pipelines branch September 27, 2020 20:09

cijothomas mentioned this pull request Aug 17, 2023

Perf bottleneck when Loggers upgrade weak ref to LoggerProvider #1209

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add pipeline uninstall mechanism #229

Add pipeline uninstall mechanism #229

jtescher commented Sep 24, 2020 •

edited

Loading

jtescher commented Sep 24, 2020

frigus02 left a comment •

edited

Loading

jtescher commented Sep 24, 2020

TommyCpp commented Sep 24, 2020

frigus02 commented Sep 25, 2020

jtescher commented Sep 25, 2020 •

edited

Loading

frigus02 commented Sep 25, 2020 •

edited

Loading

jtescher commented Sep 25, 2020

frigus02 commented Sep 26, 2020

jtescher commented Sep 26, 2020

jtescher commented Sep 26, 2020

frigus02 commented Sep 26, 2020

frigus02 left a comment

jtescher commented Sep 26, 2020

frigus02 commented Sep 26, 2020 •

edited

Loading

TommyCpp left a comment

jtescher commented Sep 27, 2020

jtescher commented Sep 27, 2020

frigus02 commented Sep 28, 2020

TommyCpp commented Sep 28, 2020

Add pipeline uninstall mechanism #229

Add pipeline uninstall mechanism #229

Conversation

jtescher commented Sep 24, 2020 • edited Loading

jtescher commented Sep 24, 2020

frigus02 left a comment • edited Loading

Choose a reason for hiding this comment

jtescher commented Sep 24, 2020

TommyCpp commented Sep 24, 2020

frigus02 commented Sep 25, 2020

jtescher commented Sep 25, 2020 • edited Loading

frigus02 commented Sep 25, 2020 • edited Loading

jtescher commented Sep 25, 2020

frigus02 commented Sep 26, 2020

jtescher commented Sep 26, 2020

jtescher commented Sep 26, 2020

frigus02 commented Sep 26, 2020

frigus02 left a comment

Choose a reason for hiding this comment

jtescher commented Sep 26, 2020

frigus02 commented Sep 26, 2020 • edited Loading

TommyCpp left a comment

Choose a reason for hiding this comment

jtescher commented Sep 27, 2020

jtescher commented Sep 27, 2020

frigus02 commented Sep 28, 2020

TommyCpp commented Sep 28, 2020

jtescher commented Sep 24, 2020 •

edited

Loading

frigus02 left a comment •

edited

Loading

jtescher commented Sep 25, 2020 •

edited

Loading

frigus02 commented Sep 25, 2020 •

edited

Loading

frigus02 commented Sep 26, 2020 •

edited

Loading