opentelemetry-sdk: Implement tracer configurator #4861
Conversation
    return environ.get(OTEL_PYTHON_ID_GENERATOR, _DEFAULT_ID_GENERATOR)


def _get_tracer_configurator() -> str | None:
I've added the configuration via env var to match what we are doing with the other TracerProvider parameters; since this is in development I can drop it. For my use case I can just use _OTelSDKConfigurator._configure.
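For context, such a helper would presumably mirror the ID-generator lookup shown in the diff above. A minimal sketch, assuming a hypothetical environment variable name (the actual name used in the PR may differ):

from os import environ

# Hypothetical environment variable name, used here only for illustration.
_OTEL_PYTHON_TRACER_CONFIGURATOR = "OTEL_PYTHON_TRACER_CONFIGURATOR"


def _get_tracer_configurator() -> str | None:
    # Same pattern as the id-generator lookup above: return the configured
    # value, or None when the variable is unset.
    return environ.get(_OTEL_PYTHON_TRACER_CONFIGURATOR)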
        set_status_on_exception: bool = True,
    ) -> trace_api.Span:
        if not self._is_enabled:
            return INVALID_SPAN
The key behavior of the tracer configurator is:
"If a Tracer is disabled, it MUST behave equivalently to a No-op Tracer."
Where the behavior of a noop tracer is:
However, there is one important exception to this general rule, and that is related to propagation of a SpanContext: The API MUST return a non-recording Span with the SpanContext in the parent Context (whether explicitly given or implicit current). If the Span in the parent Context is already non-recording, it SHOULD be returned directly without instantiating a new Span. If the parent Context contains no Span, an empty non-recording Span MUST be returned instead (i.e., having a SpanContext with all-zero Span and Trace IDs, empty Tracestate, and unsampled TraceFlags). This means that a SpanContext that has been provided by a configured Propagator will be propagated through to any child span and ultimately also Inject, but that no new SpanContexts will be created.
Does INVALID_SPAN here follow this behavior? In the Java implementation, we use the noop tracer when a tracer is disabled, which returns a non-recording span to propagate the span context here.
The NoOpTracer in Python has the very same behavior of returning INVALID_SPAN, which is a NonRecordingSpan:
INVALID_SPAN = NonRecordingSpan(INVALID_SPAN_CONTEXT)
That may not match what the spec says though. Thanks for the review.
Is the point here that the INVALID_SPAN from a disabled tracer will break trace propagation/parenting for any spans created under the disabled tracer's span?
Yes. Well, that's what I assume without looking deeply at the Python code.
Suppose you have a trace:
-> (root) span A, from scope A
   |-> span B, from scope B
      |-> span C, from scope C
If I disable scope B, then I should see:
-> (root) span A, from scope A
   |-> span C, from scope C
But unless I'm misunderstanding, in Python the behavior is something like:
-> (root) span A, from scope A
-> span C, from scope C
Where span C's parent is an invalid span, which results in a broken trace.
Please correct me if I'm misunderstanding.
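To make the expected parenting concrete, here is a small illustration using only stable API calls; it assumes an SDK TracerProvider has been configured, and it fakes scope B's span as a NonRecordingSpan to stand in for whatever a disabled tracer would return:

from opentelemetry import trace

tracer_a = trace.get_tracer("scope.a")
tracer_c = trace.get_tracer("scope.c")

with tracer_a.start_as_current_span("span A") as span_a:
    # Pretend scope B is disabled: its span is a NonRecordingSpan that still
    # carries span A's SpanContext, so the parent keeps propagating.
    span_b = trace.NonRecordingSpan(span_a.get_span_context())
    with trace.use_span(span_b, end_on_exit=False):
        with tracer_c.start_as_current_span("span C"):
            # span C's parent is span A's SpanContext, not an invalid one,
            # so the trace stays intact: A -> C.
            pass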
Updated the NoOpTracer and the disabled SDK tracer to return a NonRecordingSpan with the proper context instead of an invalid span. Could use more tests, I think.
Missed the part about reusing the very same NonRecordingSpan; updated and added more tests.
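For anyone following along, the behavior the spec asks for amounts to roughly this (a sketch only, not the exact code added in the PR):

from typing import Optional

from opentelemetry.context import Context
from opentelemetry.trace import INVALID_SPAN, NonRecordingSpan, Span, get_current_span


def _span_for_disabled_tracer(context: Optional[Context] = None) -> Span:
    # Sketch of the no-op/disabled-tracer behavior: propagate the parent
    # SpanContext rather than always returning INVALID_SPAN.
    parent = get_current_span(context)
    if not parent.get_span_context().is_valid:
        # No parent in the context: return the empty non-recording span.
        return INVALID_SPAN
    if not parent.is_recording():
        # Parent is already non-recording: reuse it directly.
        return parent
    # Parent is recording: wrap its SpanContext in a non-recording span.
    return NonRecordingSpan(parent.get_span_context())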
Implement part of the tracing SDK spec to configure tracers (https://opentelemetry.io/docs/specs/otel/trace/sdk/#configuration). At the moment this adds helpers to enable or disable a tracer after it has been created. The spec is in development, so attributes, helpers, and classes are prefixed with an underscore. TODO: hook into SDK configuration.
…eclarative config schema (force-pushed from b0e8ccb to 9726fec)
]


def _tracer_name_matches_glob(
Given that we'll want to add configurations for meters and loggers, maybe we can make this more general and move it somewhere else, to just be:
import fnmatch
from typing import Callable
from opentelemetry.sdk.util.instrumentation import InstrumentationScope

# Assumed alias for the predicate type used in the PR.
_InstrumentationScopePredicateT = Callable[[InstrumentationScope], bool]


def _scope_name_matches_glob(
    glob_pattern: str,
) -> _InstrumentationScopePredicateT:
    def inner(scope: InstrumentationScope) -> bool:
        return fnmatch.fnmatch(scope.name, glob_pattern)

    return inner
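Usage would then look something like this (illustrative only):

matches = _scope_name_matches_glob("opentelemetry.instrumentation.*")
print(matches(InstrumentationScope("opentelemetry.instrumentation.requests")))  # True
print(matches(InstrumentationScope("my.custom.scope")))  # False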
Good point
I'll make this generic once I have a second user
        )
        self._cached_tracers: WeakSet[Tracer] = WeakSet()

    def _set_tracer_configurator(
Is it required that the tracer config can be updated after creating tracer providers?
It's not required (it's a MAY in the spec: https://opentelemetry.io/docs/specs/otel/trace/sdk/#configuration), but that's the part I'm interested in :) The idea is to update the function that computes the TracerConfig on the TracerProvider and then push the new TracerConfigs to the tracers.
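In other words, something along these lines (a rough sketch of the push-based idea; class and method names are illustrative, not the PR's actual code):

from weakref import WeakSet


class _PushBasedTracerProvider:
    """Sketch: the provider keeps weak references to the tracers it handed out
    and pushes a freshly computed config to each one on update."""

    def __init__(self):
        self._cached_tracers = WeakSet()
        self._tracer_configurator = None  # callable: scope -> config

    def _set_tracer_configurator(self, configurator):
        self._tracer_configurator = configurator
        # Recompute and push the config for every outstanding tracer
        # (_apply_config and _instrumentation_scope are hypothetical names).
        for tracer in self._cached_tracers:
            tracer._apply_config(configurator(tracer._instrumentation_scope))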
Maybe it would be cleaner to invert things so each tracer gets the active configurator through a reference to the tracer provider? That way you don't have to broadcast the config change to each tracer.
I think you could avoid the WeakSet altogether then.
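Roughly, the pull-based version would look like this (again only a sketch with illustrative names, assuming the config object exposes a disabled flag like the spec's TracerConfig):

class _PullBasedTracer:
    """Sketch: the tracer keeps a reference to its provider and asks for the
    current configurator lazily, so no broadcast (and no WeakSet) is needed."""

    def __init__(self, provider, instrumentation_scope):
        self._provider = provider
        self._instrumentation_scope = instrumentation_scope

    @property
    def _is_enabled(self) -> bool:
        configurator = self._provider._tracer_configurator
        if configurator is None:
            return True  # default TracerConfig: enabled
        return not configurator(self._instrumentation_scope).disabled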
That change will require getting a new TracerConfig and then checking if the tracer is enabled in start_span, though. I don't expect that to be slow, but the configuration function is provided by the user, so we cannot guarantee it'll be fast.
Made the change in 1844f8e.
That change will require getting a new TracerConfig and then checking if the tracer is enabled in start_span, though.
Interesting point... does the spec say anything about this being a pure function, or about when the tracer configurator should be evaluated? I am worried about slowness, but I'll leave it up to you anyway.
The spec says it should return quickly and that the configurator should be called on update:
This function is called when a Tracer is first created, and for each outstanding Tracer when a TracerProvider’s TracerConfigurator is updated (if updating is supported). Therefore, it is important that it returns quickly.
So I think now we are drifting a bit from the spec. Let me run some benchmarks to understand the overhead.
Pushed the benchmarks in 5b16f02:
-------------------------------------------------------------------------------------------------------- benchmark: 5 tests --------------------------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_simple_start_span_with_tracer_configurator_rules[None] 11.4000 (1.0) 78.0850 (1.96) 12.5320 (1.0) 1.2733 (1.12) 12.3960 (1.0) 0.5780 (1.0) 184;338 79.7955 (1.0) 9169 1
test_simple_start_span_with_tracer_configurator_rules[0] 12.2860 (1.08) 91.2420 (2.29) 13.5666 (1.08) 2.4278 (2.14) 13.3040 (1.07) 0.5940 (1.03) 311;684 73.7102 (0.92) 21168 1
test_simple_start_span_with_tracer_configurator_rules[1] 13.7130 (1.20) 97.1370 (2.44) 15.5183 (1.24) 2.9091 (2.57) 15.0875 (1.22) 0.7425 (1.28) 243;376 64.4400 (0.81) 6844 1
test_simple_start_span_with_tracer_configurator_rules[10] 22.0500 (1.93) 39.8470 (1.0) 23.7657 (1.90) 1.1332 (1.0) 23.5890 (1.90) 0.7828 (1.35) 243;115 42.0774 (0.53) 2855 1
test_simple_start_span_with_tracer_configurator_rules[50] 56.9020 (4.99) 96.8840 (2.43) 59.5743 (4.75) 3.0849 (2.72) 58.9710 (4.76) 1.3700 (2.37) 51;65 16.7858 (0.21) 847 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The None case is when the tracer_provider is None, so we don't go through the TracerConfigurator function at all; the other cases are the number of rules that are evaluated.
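For reference, the general shape of such a pytest-benchmark test is roughly the following; the configurator setup driven by the rules parameter is PR-specific and only hinted at in a comment:

import pytest
from opentelemetry.sdk.trace import TracerProvider


@pytest.mark.parametrize("rules", [None, 0, 1, 10, 50])
def test_simple_start_span_with_tracer_configurator_rules(benchmark, rules):
    provider = TracerProvider()
    # The real benchmark installs a TracerConfigurator built from `rules` glob
    # rules when rules is not None; omitted here because it is PR-specific.
    tracer = provider.get_tracer("benchmark.scope")

    def _start_span():
        tracer.start_span("benchmark-span").end()

    benchmark(_start_span)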
Since it is the function that changes, if I decorate it with lru_cache the overhead is gone:
-------------------------------------------------------------------------------------------------------- benchmark: 5 tests --------------------------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_simple_start_span_with_tracer_configurator_rules[None] 11.5800 (1.0) 62.7350 (1.11) 12.6467 (1.0) 1.3824 (1.08) 12.4890 (1.0) 0.6270 (1.21) 155;263 79.0723 (1.0) 8802 1
test_simple_start_span_with_tracer_configurator_rules[0] 11.7430 (1.01) 60.6950 (1.07) 12.8948 (1.02) 1.3589 (1.06) 12.7530 (1.02) 0.5190 (1.0) 317;556 77.5503 (0.98) 17918 1
test_simple_start_span_with_tracer_configurator_rules[50] 12.0780 (1.04) 56.6670 (1.0) 13.2263 (1.05) 1.9171 (1.50) 13.0615 (1.05) 0.5605 (1.08) 8;15 75.6070 (0.96) 708 1
test_simple_start_span_with_tracer_configurator_rules[10] 12.1520 (1.05) 62.8130 (1.11) 13.5478 (1.07) 1.9299 (1.51) 13.2370 (1.06) 0.6488 (1.25) 94;155 73.8126 (0.93) 2807 1
test_simple_start_span_with_tracer_configurator_rules[1] 12.1630 (1.05) 64.3120 (1.13) 13.4501 (1.06) 1.2799 (1.0) 13.3360 (1.07) 0.6240 (1.20) 162;197 74.3488 (0.94) 7501 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Added that, and documented the performance issue and the cache invalidation in case of reuse in 4d13f29.
So instead of caching the config when the TracerConfigurator changes, we calculate it when we need to check if the tracer is enabled. This simplifies the code quite a bit, and hopefully people will use fast tracer configurators.
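For reference, the caching trick boils down to something like this (a sketch; the helper name is illustrative, and it assumes the scope object is hashable):

from functools import lru_cache


@lru_cache(maxsize=128)
def _resolve_tracer_config(configurator, scope):
    # Cached on (configurator, scope). When the TracerConfigurator is replaced,
    # the new function object is a new cache key, so stale entries are never
    # hit again and simply age out of the LRU.
    return configurator(scope)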
Description
Implement part of the tracing SDK spec to configure the tracers https://opentelemetry.io/docs/specs/otel/trace/sdk/#configuration
At the moment this adds helpers to enable or disable a tracer after it has been created.
The spec is in development, so attributes, helpers, and classes are prefixed with an underscore.
TODO: hook into SDK configuration