-
Notifications
You must be signed in to change notification settings - Fork 398
feat: add process tags to traces #5033
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
… This is still missing memoization and additional tests.
|
Thank you for updating Change log entry section 👏 Visited at: 2025-11-14 09:35:11 UTC |
|
✅ Tests 🎉 All green!❄️ No new flaky tests detected 🎯 Code Coverage 🔗 Commit SHA: 23d9769 | Docs | Datadog PR Page | Was this helpful? Give us feedback! |
| def tag_process_tags! | ||
| return unless trace.experimental_propagate_process_tags_enabled | ||
| process_tags = Core::Environment::Process.formatted_process_tags_k1_v1 | ||
| return if process_tags.empty? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is impossible right? If so, we can remove it, as it would give us a false sense of uncertainty here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I fixed it in 8dae705 by just removing the check in process tags, but let me know if you spot issues with it!
…he payload has the process tag only when the feature is enabled.
…versions so this fixes that.
Co-authored-by: Marco Costa <[email protected]>
BenchmarksBenchmark execution time: 2025-11-14 23:15:00 Comparing candidate commit 23d9769 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 2 unstable metrics. |
| format! | ||
| expect(first_span.meta).to include('_dd.tags.process') | ||
| expect(first_span.meta['_dd.tags.process']).to eq(Datadog::Core::Environment::Process.serialized) | ||
| # TODO figure out if we need an assertion for the value, ie |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marcotc - do you think there's value in asserting for the values of the tag? Or is the test in process_spec enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you are doing with expect(first_span.meta['_dd.tags.process']).to eq(Datadog::Core::Environment::Process.serialized) seems good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't test realistic values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main thing to test here is that it's respecting the configuring option, which you did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main thing to test here is that it's respecting the configuring option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! In that case it doesn't seem like I need to make any changes to the assertions then?
Typing analysisNote: Ignored files are excluded from the next sections. Untyped methodsThis PR introduces 1 partially typed method. It increases the percentage of typed methods from 54.67% to 54.8% (+0.13%). Partially typed methods (+1-0)❌ Introduced:If you believe a method or an attribute is rightfully untyped or partially typed, you can add |
…uby conflict with sqlite and it is not needed for this test
lib/datadog/core/normalizer.rb
Outdated
| # Invalid characters are replaced with an underscore | ||
| normalized_value.gsub!(INVALID_TAG_CHARACTERS, '_') | ||
| # Merge consecutive underscores with a single underscore | ||
| normalized_value.squeeze!('_') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge squeeze with the gsub above by changing the regex in the gsub to match one or more characters, instead of just one (regex +).
This saves us a string operation and string copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this is still needed if there's a conditional check now to see if there are still double underscores: __?
normalized_value.sub!(LEADING_INVALID_CHARS, "")
normalized_value.sub!(TRAILING_UNDERSCORES, "")
normalized_value.squeeze!('_') if normalized_value.include?('__')
```
lib/datadog/core/normalizer.rb
Outdated
| def self.normalize(original_value) | ||
| return "" if original_value.nil? || original_value.to_s.strip.empty? | ||
|
|
||
| # Removes whitespaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this comment, and the one for downcase, since they only capture what the code already does (and the code is not ambiguous).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up moving all the code comments up in a66e635, to clarify the order of operations based on the Trace Agent logic.
lib/datadog/core/normalizer.rb
Outdated
| # Merge consecutive underscores with a single underscore | ||
| normalized_value.squeeze!('_') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Merge consecutive underscores with a single underscore | |
| normalized_value.squeeze!('_') |
Actually I don't believe we're supposed to do the squeeze (reduce multiple _ characters to a single character). Or at least the spec doesn't say to do so. Here the Python tracer leaves repeated underscores:
We should remove this to be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following the Trace Agent, which has a test case seen here: https://github.com/DataDog/datadog-agent/blob/45799c842bbd216bcda208737f9f11cade6fdd95/pkg/trace/traceutil/normalize_test.go#L33
{in: "contiguous_____underscores", out: "contiguous_underscores"},
It looks like the Trace Agent merges them. I'll start a separate thread since I think we want this all to be consistent.
lib/datadog/core/normalizer.rb
Outdated
| normalized_value.sub!(LEADING_INVALID_CHARS, "") | ||
| normalized_value.sub!(TRAILING_UNDERSCORES, "") | ||
| normalized_value.squeeze!('_') | ||
| normalized_value = normalized_value[MAX_CHARACTER_LENGTH] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a value of having range when it could be normalized_value[0, 200]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, so I removed the range approach.
I played around with a few things here: 4747259 and now it looks like this:
normalized_value.slice!(MAX_CHARACTER_LENGTH..-1) if normalized_value.length > MAX_CHARACTER_LENGTH
(the conditional portion should help skip this operation if the text is small enough)
Let me know if this is more in line with what you're thinking of!
| @serialized: untyped | ||
|
|
||
| def self?.entrypoint_workdir: () -> untyped | ||
|
|
||
| def self?.entrypoint_type: () -> untyped | ||
|
|
||
| def self?.entrypoint_name: () -> untyped | ||
|
|
||
| def self?.entrypoint_basedir: () -> untyped | ||
| def self?.serialized_kv_helper: (untyped key, untyped value) -> ::String | ||
| def self?.serialized: () -> untyped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Could you please type it. You can use Codex, it's good at it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Thanks for pointing this out! Addressed in adfa416!
sig/datadog/core/normalizer.rbs
Outdated
| module Core | ||
| module Normalizer | ||
| INVALID_TAG_CHARACTERS: ::Regexp | ||
| def self.normalize: (untyped original_value) -> ("" | untyped) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
untyped will cover everything, but still, it's not untyped, it's a ::String?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in adfa416!
Co-authored-by: Sergey Fedorov <[email protected]>
Co-authored-by: Sergey Fedorov <[email protected]>
| tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_WORKDIR, entrypoint_workdir) if entrypoint_workdir | ||
| tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_NAME, entrypoint_name) if entrypoint_name | ||
| tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_BASEDIR, entrypoint_basedir) if entrypoint_basedir | ||
| tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_TYPE, entrypoint_type) if entrypoint_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need to specify namespacing in Ruby up until the common point between: the current class or module we are in; and the object we want to reference.
In this case, we are in Datadog::Core::Environment::Process and want to reference Datadog::Core::Environment::Ext::TAG_ENTRYPOINT_WORKDIR.
We can remove the prefix namespace that is identical. For example Ext::TAG_ENTRYPOINT_WORKDIR will work here.
BUT, Ruby namespace resolution is very lenient, and we will try to match Ext (from Ext::TAG_ENTRYPOINT_WORKDIR), in order, to: Datadog::Core::Environment::Process::Ext, Datadog::Core::Environment::Ext, Datadog::Core::Ext, Datadog::Ext, and ::Ext.
This is important because the namespace matching doesn't try to match the complete Ext::TAG_ENTRYPOINT_WORKDIR path; it only tries to match the first token you provided: the Ext in Ext::TAG_ENTRYPOINT_WORKDIR.
And because more than one of these locations in the possible search logic are realistic matches, we should be a bit more specific than Ext::TAG_ENTRYPOINT_WORKDIR.
A good practice is to stop at the closet common namespace location. In this case, it would be the Environment. So I suggest using Environment::Ext::TAG_ENTRYPOINT_WORKDIR (and the equivalent for the other constants) here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation! I'll keep this in mind going forward!
Addressed in: 31d9796
| # Returns the entrypoint type of the process | ||
| # @return [String] the type of the process, which is fixed in Ruby | ||
| def entrypoint_type | ||
| Core::Environment::Ext::PROCESS_TYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove Core:: from this constant access (see comment in def serialized).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this note! see 31d9796!
| # - Trailing underscores are removed | ||
| # - Consecutive underscores are merged into a single underscore | ||
| # - Maximum length is 200 characters | ||
| def self.normalize(original_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given how many operations happen inside this method, I recommend adding a "fast-case", where we do some checks and return immediately if the provided original_value is already valid.
This suggestion is equivalent to the early return by the agent here.
I suggest trying to use a regular expression, instead of implementing the agent's isNormalizedASCIITag in Ruby, since Ruby code is slower than Go code, but Ruby regex is pretty fast.
Something like:
return original_value if original_value.size <= MAX_CHARACTER_LENGTH && original_value.matches?(VALID_ASCII_TAG)The hypothetical VALID_ASCII_TAG doesn't have to catch all valid cases: it's a trade-off between matching most valid tags vs making the regex complicated and slow. As long as it never matches invalid tags, it's all good.
| TRAILING_UNDERSCORES = %r{_++\z} | ||
| MAX_CHARACTER_LENGTH = 200 | ||
|
|
||
| # Based on https://github.com/DataDog/datadog-agent/blob/45799c842bbd216bcda208737f9f11cade6fdd95/pkg/trace/traceutil/normalize.go#L131 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general here: do we need NormalizeTag or NormalizeTagValue for process tags? https://github.com/DataDog/datadog-agent/blob/45799c842bbd216bcda208737f9f11cade6fdd95/pkg/trace/traceutil/normalize.go#L120-L129
What does this PR do?
The goal of AIDM-253 is to add process tags to the trace payloads.
After this gets merged, the next step is to add it for the other products.
To run the tests in docker
Main tests:
Motivation:
We're trying to add process tags to various payloads so they can be used in different use cases.
Note I still want to try adding server type but I'll have to tackle that in a separate PR.
Change log entry
Yes. Add process tags to the trace payloads.
Additional Notes:
How to test the change?