Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pipeline uninstall mechanism #229

Merged
merged 8 commits into from
Sep 27, 2020
Merged

Add pipeline uninstall mechanism #229

merged 8 commits into from
Sep 27, 2020

Conversation

jtescher
Copy link
Member

@jtescher jtescher commented Sep 24, 2020

In order to address the issues raised in #225, the following changes are proposed:

  1. Fix the current bug where futures are being dropped instead of properly awaited by updating:
- let spawn = |future| tokio::task::spawn_blocking(|| future);
+ let spawn = |future| tokio::task::spawn_blocking(|| futures::executor::block_on(future));
  1. Because spawn_blocking will now block the program from exiting until the future completes, add a drop guard to pipeline install methods which uninstalls the global tracer provider (and in the otlp case in the future also the meter provider) so the exporters properly shut down, causing the future to resolve.

    Alternatives considered here could be:

    a) let users unset the global(s) themselves (but they would likely forget, causing the same blocking behavior)
    b) return a method that could be called instead of a drop guard (similar problems)
    c) Current impl uses custom uninstall objects per exporter to make the struct opentelemetry_jaeger::Uninstall rather than a more generic global::TracerProviderGuard or global::UninstallPipeline, alternative name suggestions could be helpful if they improve clarity.
    c) Return a pipeline object that contains a Tracer, and could be the drop guard itself, maybe cleaner API but less explicit, possibly a better choice.

  2. Because Tracers currently retain strong references to their TracerProvider, and shutdown is called on the inner's drop, it is still never called. To fix this, update the Tracer to instead contain a Weak<TracerProviderInner> reference, which can be upgraded when needed. This does force other logic like span creation to consider the case where the TracerProvider has been dropped on shutdown, but making this case explicit seems beneficial in most cases.

  3. Additional clarity/consistency tweaks:

    • Rename global::set_provider to global::set_tracer_provider
    • Rename global::trace_provider to global::tracer_provider
    • Rename ProviderInner to TracerProviderInner

Fixes #225

In order to address the issues raised in #225, the following changes are
proposed:

1) Fix the current bug where futures are being dropped instead of
properly awaited by updating:

```diff
- let spawn = |future| tokio::task::spawn_blocking(|| future);
+ let spawn = |future| tokio::task::spawn_blocking(|| futures::executor::block_on(future));
```

2) Because `spawn_blocking` will now block the program from exiting
until the future completes, add a drop guard to pipeline install methods
which uninstalls the global tracer provider (and in the otlp case in the
future also the meter provider) so the exporters properly shut down,
causing the future to resolve.

    Alternatives considered here could be:

    a) let users unset the global(s) themselves (but they would likely
forget, causing the same blocking behavior)
    b) return a method that could be called instead of a drop guard
(similar problems)
    c) Current impl uses custom uninstall objects per exporter to make
the struct `opentelemetry_jaeger::Uninstall` rather than a more generic
`global::TracerProviderGuard` or `global::UninstallPipeline`,
alternative name suggestions could be helpful if they improve clarity.
    c) Return a pipeline object that contains a `Tracer`, and could be
the drop guard itself, maybe cleaner API but less explicit, possibly a
better choice.

3) Because `Tracer`s currently retain strong references to their
`TracerProvider`, and shutdown is called on the inner's drop, it is
still never called. To fix this, update the `Tracer` to instead contain
a `Weak<TracerProviderInner>` reference, which can be upgraded when
needed. This does force other logic like span creation to consider the
case where the `TracerProvider` has been dropped on shutdown, but making
this case explicit seems beneficial in most cases.

4) Additional clarity/consistency tweaks:

    * Rename `global::set_provider` to `global::set_tracer_provider`
    * Rename `global::trace_provider` to `global::tracer_provider`
    * Rename `ProviderInner` to `TracerProviderInner`
@jtescher jtescher requested a review from a team September 24, 2020 19:26
@jtescher
Copy link
Member Author

cc @frigus02

Copy link
Member

@frigus02 frigus02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Thanks for working on this. I left 2 tiny comments, but it's probably better to focus on the overall direction.

Regarding the Uninstall struct returned from the exporters:

  • I like the name. "Uninstall" is quite clear to me. It doesn't immedately tell me it's going to happen on drop, which something like PipelineGuard might do. But I can't say if that's actually a better name. I think wouldn't mind returning global::TracerProviderGuard directly, either. But really, Uninstall seems good 🙂.
  • I'm not sure if returning the tuple of a pipeline object is better. The tuple is quite convenient to destructure. 🤔

The additional tweaks make sense to me. 👍

I'm not entirely convinced using weak references in Tracers is a good idea. I'm still thinking about it but these are my thoughts:

  • What if I don't install a global tracer provider? This could be an option when using tracing-opentelemetry or maybe if I'm writing a performance critical application? In any case, if I understand it correctly I would have to keep at least 1 strong reference to the provider around, which is not obvious.
  • In general I like the idea of calling shutdown on drop. I think this means whenever there is a span or tracer somewhere, there is definitelty still a reference to the provider, so the pipeline is still running.
  • Weak references are only needed because we keep references to named tracers. I might be missing something but I don't understand the value this brings at the moment. It doesn't save memory because we're cloning anyway. And there is no logic happening on tracer initialization, so we don't save CPU, either. Would we loose anything if we remove the named tracers HashMap? 🤔

@jtescher
Copy link
Member Author

@frigus02 great point, probably better to just drop the map. The locking is likely a higher cost than initializing a new tracer (even if they were reduced to Arc strong increments), so this may be a good idea in either case.

I'm not totally convinced that uninstall is the best name, but on the other hand I'm not sure how familiar the rust community is generally with drop guards, this option seems a little more oriented to devs who want things to just work.

The pipeline object may have the benefit of being less of a breaking change to update things, e.g. not totally clear for OTLP if people would prefer one pipeline initialization vs separate metrics and trace pipelines 🤷‍♂️. But generally I like the simplicity of the tuple destructure. The current impl may optimize slightly more for the getting started example readability, which has benifits as that may be how most devs will interact with otel in general.

Co-authored-by: Jan Kühle <[email protected]>
@TommyCpp
Copy link
Contributor

Checked with the current version of spec about named tracer.

It is unspecified whether or under which conditions the same or different Tracer instances are returned from this functions.

Didn't really say we need to return the same tracer instance given the same instrument library info. We just need to make sure they can pick up the configuration change, which could be done by keeping a reference of TracerProvider in Tracer. So it should be good to drop the map

@frigus02
Copy link
Member

I'm not totally convinced that uninstall is the best name

Yeah, it might be better to be consistent with other Rust APIs and use something like a PipelineGuard.

I also like the idea of a Pipeline object because of it's extensibility. But I'm not sure if the drop behaviour is obvious in that case. Althought this might be fine if we add a big note in the docs 🙂

@jtescher
Copy link
Member Author

jtescher commented Sep 25, 2020

@frigus02 removed the provider's tracer cache 👍

Weak references are only needed because we keep references to named tracers.

The other reason to switch to weak refs is because it is reasonable for other systems to store tracers or objects with references to tracers in thread locals or other effectively 'static ways (e.g. tracing stores a tracer in the otel layer impl).

@frigus02
Copy link
Member

frigus02 commented Sep 25, 2020

Yeah, that's true. Maybe we need to document that tracers will stop working if their provider is not stored in any strong reference.

@jtescher
Copy link
Member Author

A larger alternative would instead be to switch exporters to be async, allowing spawn instead of spawn_blocking which may remove the need to uninstall? Not sure if there are other reasons even in that case where we would want an explicit uninstall.

@frigus02
Copy link
Member

Thinking about it again, I think I would prefer if tracing-opentelemetry would be responsible to provide a way to release the Tracer reference, similar to the Uninstall guard in this PR. It feels like a slightly more intuitive API without weak references. But that might well be subjective. 🙂

A larger alternative would instead be to switch exporters to be async, allowing spawn instead of spawn_blocking which may remove the need to uninstall? Not sure if there are other reasons even in that case where we would want an explicit uninstall.

Async exporters could be nice. I wonder if that eliminates the need for a controlled shutdown, though. We would still need to notify the batch processor to export the last spans and we want to wait for this to finish before exiting the program.

We actually still have this issue with async-std with the implementation in this PR. It seems to be a tokio thing to automatically wait for blocking tasks to finish.

I played a bit with this recently. This makes the BatchSpanProcessor wait for its worker to finish on shutdown: frigus02@5dbe15d. This works for async-std and tokio now and I think it should also work for async exporters later. If that looks alright, I can submit this as a follow up PR. Or feel free to copy-paste it here.

@jtescher
Copy link
Member Author

@frigus02 good point, I applied your patch with a slight change to generics to allow tokio::spawn to be used directly as we aren't actually interested in the results 👍

@jtescher
Copy link
Member Author

Thinking about it again, I think I would prefer if tracing-opentelemetry would be responsible to provide a way to release the Tracer reference, similar to the Uninstall guard in this PR. It feels like a slightly more intuitive API without weak references. But that might well be subjective. 🙂

Seems like the two architectural choices here are drop guards vs explicit method calls, and weak refs vs other mechanism for storing shutdown state and making subsequent spans no-ops?

I think I'd lean toward drop guards as they are trickier to use incorrectly (although assigning to _ is one way people will likely run into issues), I don't have as strong feelings about weak refs vs some other no-op state mechanism as they seem like internal implementation details. Did you have other alternatives or tradeoffs in mind?

@frigus02
Copy link
Member

I think I'd lean toward drop guards as they are trickier to use incorrectly

I think I agree.

I don't have as strong feelings about weak refs vs some other no-op state mechanism as they seem like internal implementation details.

My only concern with Tracer having a weak reference to the TraceProvider is that code like this would not work anymore, because the provider would be dropped after init_tracer:

fn init_tracer() -> sdk::Tracer {
    let provider = sdk::TracerProvider::builder().with_exporter(...).build();
    provider.get_tracer("", None)
}

fn main() {
    let tracer = init_tracer();
    tracer.in_span(...);
}

That could also be solved with documentation. I probably prefer the way it currently works. But that might just be because I'm used to it 🙂.

Copy link
Member

@frigus02 frigus02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, thanks for working on this!

I mentioned the downside of having to keep a TracerProvider around, which we may want to document. And everything else looks great 🎉

There are 2 READMEs, which we should update as well:

  • opentelemetry-jaeger/README.md
  • opentelemetry-otlp/README.md

@jtescher
Copy link
Member Author

@frigus02 yeah seems like that is a casualty of the drop guard choice, with an explicit call to shutdown that example could work, but with guards it is tricky (in that example it would need some method of shutting down if the batch exporter was used in the builder)

@frigus02
Copy link
Member

frigus02 commented Sep 26, 2020

Hm, if the Tracer owns a clone of the TracerProvider it should work, I think. When the last Tracer is dropped, the TracerProviderInner would drop as well and can shutdown the span processor. 🤔

Edit: However, it might actually be more intuitive with the way it works in this PR, because it's easy to see when the TracerProvider is dropped. Which means it's also easy to see when span processors will shutdown.

Copy link
Contributor

@TommyCpp TommyCpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too familar with async, but the rest changes looks good. Nice job.

@jtescher
Copy link
Member Author

Ok can merge for now to keep things going. If there are clearer name ideas or feedback about how to improve this we can add in subsequent PR's 👍

@jtescher jtescher merged commit 6e10eb2 into master Sep 27, 2020
@jtescher jtescher deleted the uninstall-pipelines branch September 27, 2020 20:09
@jtescher
Copy link
Member Author

@frigus02 if you want to be a member of the otel org (requirements) I'm happy to sponsor you, just have open an issue here to get added.

@frigus02
Copy link
Member

Yeah, why not. The responsibilities are reasonable. Sounds like I need one more sponsor. @TommyCpp would you be up for that? 🙂

@TommyCpp
Copy link
Contributor

@frigus02 More than happy to 😄! Just added me as sponsor in the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TraceProvider::with_exporter immediately drops BatchSpanProcessor
3 participants