-
Notifications
You must be signed in to change notification settings - Fork 502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pipeline uninstall mechanism #229
Conversation
In order to address the issues raised in #225, the following changes are proposed: 1) Fix the current bug where futures are being dropped instead of properly awaited by updating: ```diff - let spawn = |future| tokio::task::spawn_blocking(|| future); + let spawn = |future| tokio::task::spawn_blocking(|| futures::executor::block_on(future)); ``` 2) Because `spawn_blocking` will now block the program from exiting until the future completes, add a drop guard to pipeline install methods which uninstalls the global tracer provider (and in the otlp case in the future also the meter provider) so the exporters properly shut down, causing the future to resolve. Alternatives considered here could be: a) let users unset the global(s) themselves (but they would likely forget, causing the same blocking behavior) b) return a method that could be called instead of a drop guard (similar problems) c) Current impl uses custom uninstall objects per exporter to make the struct `opentelemetry_jaeger::Uninstall` rather than a more generic `global::TracerProviderGuard` or `global::UninstallPipeline`, alternative name suggestions could be helpful if they improve clarity. c) Return a pipeline object that contains a `Tracer`, and could be the drop guard itself, maybe cleaner API but less explicit, possibly a better choice. 3) Because `Tracer`s currently retain strong references to their `TracerProvider`, and shutdown is called on the inner's drop, it is still never called. To fix this, update the `Tracer` to instead contain a `Weak<TracerProviderInner>` reference, which can be upgraded when needed. This does force other logic like span creation to consider the case where the `TracerProvider` has been dropped on shutdown, but making this case explicit seems beneficial in most cases. 4) Additional clarity/consistency tweaks: * Rename `global::set_provider` to `global::set_tracer_provider` * Rename `global::trace_provider` to `global::tracer_provider` * Rename `ProviderInner` to `TracerProviderInner`
cc @frigus02 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Thanks for working on this. I left 2 tiny comments, but it's probably better to focus on the overall direction.
Regarding the Uninstall
struct returned from the exporters:
- I like the name. "Uninstall" is quite clear to me. It doesn't immedately tell me it's going to happen on drop, which something like
PipelineGuard
might do. But I can't say if that's actually a better name. I think wouldn't mind returningglobal::TracerProviderGuard
directly, either. But really,Uninstall
seems good 🙂. - I'm not sure if returning the tuple of a pipeline object is better. The tuple is quite convenient to destructure. 🤔
The additional tweaks make sense to me. 👍
I'm not entirely convinced using weak references in Tracer
s is a good idea. I'm still thinking about it but these are my thoughts:
- What if I don't install a global tracer provider? This could be an option when using
tracing-opentelemetry
or maybe if I'm writing a performance critical application? In any case, if I understand it correctly I would have to keep at least 1 strong reference to the provider around, which is not obvious. - In general I like the idea of calling shutdown on drop. I think this means whenever there is a span or tracer somewhere, there is definitelty still a reference to the provider, so the pipeline is still running.
- Weak references are only needed because we keep references to named tracers. I might be missing something but I don't understand the value this brings at the moment. It doesn't save memory because we're cloning anyway. And there is no logic happening on tracer initialization, so we don't save CPU, either. Would we loose anything if we remove the named tracers HashMap? 🤔
@frigus02 great point, probably better to just drop the map. The locking is likely a higher cost than initializing a new tracer (even if they were reduced to Arc strong increments), so this may be a good idea in either case. I'm not totally convinced that uninstall is the best name, but on the other hand I'm not sure how familiar the rust community is generally with drop guards, this option seems a little more oriented to devs who want things to just work. The pipeline object may have the benefit of being less of a breaking change to update things, e.g. not totally clear for OTLP if people would prefer one pipeline initialization vs separate metrics and trace pipelines 🤷♂️. But generally I like the simplicity of the tuple destructure. The current impl may optimize slightly more for the getting started example readability, which has benifits as that may be how most devs will interact with otel in general. |
Co-authored-by: Jan Kühle <[email protected]>
Checked with the current version of spec about named tracer.
Didn't really say we need to return the same tracer instance given the same instrument library info. We just need to make sure they can pick up the configuration change, which could be done by keeping a reference of |
Yeah, it might be better to be consistent with other Rust APIs and use something like a I also like the idea of a |
@frigus02 removed the provider's tracer cache 👍
The other reason to switch to weak refs is because it is reasonable for other systems to store tracers or objects with references to tracers in thread locals or other effectively |
Yeah, that's true. Maybe we need to document that tracers will stop working if their provider is not stored in any strong reference. |
A larger alternative would instead be to switch exporters to be async, allowing |
Thinking about it again, I think I would prefer if
Async exporters could be nice. I wonder if that eliminates the need for a controlled shutdown, though. We would still need to notify the batch processor to export the last spans and we want to wait for this to finish before exiting the program. We actually still have this issue with I played a bit with this recently. This makes the BatchSpanProcessor wait for its worker to finish on shutdown: frigus02@5dbe15d. This works for async-std and tokio now and I think it should also work for async exporters later. If that looks alright, I can submit this as a follow up PR. Or feel free to copy-paste it here. |
Co-authored-by: Jan Kühle <[email protected]>
@frigus02 good point, I applied your patch with a slight change to generics to allow |
Seems like the two architectural choices here are drop guards vs explicit method calls, and weak refs vs other mechanism for storing shutdown state and making subsequent spans no-ops? I think I'd lean toward drop guards as they are trickier to use incorrectly (although assigning to |
I think I agree.
My only concern with Tracer having a weak reference to the TraceProvider is that code like this would not work anymore, because the provider would be dropped after fn init_tracer() -> sdk::Tracer {
let provider = sdk::TracerProvider::builder().with_exporter(...).build();
provider.get_tracer("", None)
}
fn main() {
let tracer = init_tracer();
tracer.in_span(...);
} That could also be solved with documentation. I probably prefer the way it currently works. But that might just be because I'm used to it 🙂. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, thanks for working on this!
I mentioned the downside of having to keep a TracerProvider around, which we may want to document. And everything else looks great 🎉
There are 2 READMEs, which we should update as well:
- opentelemetry-jaeger/README.md
- opentelemetry-otlp/README.md
@frigus02 yeah seems like that is a casualty of the drop guard choice, with an explicit call to shutdown that example could work, but with guards it is tricky (in that example it would need some method of shutting down if the batch exporter was used in the builder) |
Co-authored-by: Jan Kühle <[email protected]>
Hm, if the Tracer owns a clone of the TracerProvider it should work, I think. When the last Tracer is dropped, the TracerProviderInner would drop as well and can shutdown the span processor. 🤔 Edit: However, it might actually be more intuitive with the way it works in this PR, because it's easy to see when the TracerProvider is dropped. Which means it's also easy to see when span processors will shutdown. |
Co-authored-by: Jan Kühle <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not too familar with async, but the rest changes looks good. Nice job.
Ok can merge for now to keep things going. If there are clearer name ideas or feedback about how to improve this we can add in subsequent PR's 👍 |
@frigus02 if you want to be a member of the otel org (requirements) I'm happy to sponsor you, just have open an issue here to get added. |
Yeah, why not. The responsibilities are reasonable. Sounds like I need one more sponsor. @TommyCpp would you be up for that? 🙂 |
@frigus02 More than happy to 😄! Just added me as sponsor in the issue. |
In order to address the issues raised in #225, the following changes are proposed:
Because
spawn_blocking
will now block the program from exiting until the future completes, add a drop guard to pipeline install methods which uninstalls the global tracer provider (and in the otlp case in the future also the meter provider) so the exporters properly shut down, causing the future to resolve.Alternatives considered here could be:
a) let users unset the global(s) themselves (but they would likely forget, causing the same blocking behavior)
b) return a method that could be called instead of a drop guard (similar problems)
c) Current impl uses custom uninstall objects per exporter to make the struct
opentelemetry_jaeger::Uninstall
rather than a more genericglobal::TracerProviderGuard
orglobal::UninstallPipeline
, alternative name suggestions could be helpful if they improve clarity.c) Return a pipeline object that contains a
Tracer
, and could be the drop guard itself, maybe cleaner API but less explicit, possibly a better choice.Because
Tracer
s currently retain strong references to theirTracerProvider
, and shutdown is called on the inner's drop, it is still never called. To fix this, update theTracer
to instead contain aWeak<TracerProviderInner>
reference, which can be upgraded when needed. This does force other logic like span creation to consider the case where theTracerProvider
has been dropped on shutdown, but making this case explicit seems beneficial in most cases.Additional clarity/consistency tweaks:
global::set_provider
toglobal::set_tracer_provider
global::trace_provider
toglobal::tracer_provider
ProviderInner
toTracerProviderInner
Fixes #225