Skip to content

Conversation

@wantsui
Copy link
Collaborator

@wantsui wantsui commented Nov 7, 2025

What does this PR do?

The goal of AIDM-253 is to add process tags to the trace payloads.

After this gets merged, the next step is to add it for the other products.

To run the tests in docker

docker compose run --rm tracer-3.3 /bin/bash
bundle exec rake compile
bundle exec rake test:core_with_rails

Main tests:

BUNDLE_GEMFILE=/app/gemfiles/ruby_3.3_rails8.gemfile bundle exec rspec spec/datadog/core/environment/process_spec.rb
bundle exec rspec spec/datadog/tracing/transport/trace_formatter_spec.rb
bundle exec rspec spec/datadog/core/normalizer_spec.rb
bundle exec rspec spec/datadog/core/configuration/settings_spec.rb

Motivation:

We're trying to add process tags to various payloads so they can be used in different use cases.

Note I still want to try adding server type but I'll have to tackle that in a separate PR.

Change log entry

Yes. Add process tags to the trace payloads.

Additional Notes:

How to test the change?

… This is still missing memoization and additional tests.
@github-actions github-actions bot added core Involves Datadog core libraries tracing labels Nov 7, 2025
@github-actions
Copy link

github-actions bot commented Nov 7, 2025

Thank you for updating Change log entry section 👏

Visited at: 2025-11-14 09:35:11 UTC

@datadog-official
Copy link

datadog-official bot commented Nov 7, 2025

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 100.61%
Total Coverage: 98.50% (+0.05%)

View detailed report

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 23d9769 | Docs | Datadog PR Page | Was this helpful? Give us feedback!

@wantsui wantsui added the AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos label Nov 7, 2025
def tag_process_tags!
return unless trace.experimental_propagate_process_tags_enabled
process_tags = Core::Environment::Process.formatted_process_tags_k1_v1
return if process_tags.empty?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is impossible right? If so, we can remove it, as it would give us a false sense of uncertainty here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I fixed it in 8dae705 by just removing the check in process tags, but let me know if you spot issues with it!

@pr-commenter
Copy link

pr-commenter bot commented Nov 10, 2025

Benchmarks

Benchmark execution time: 2025-11-14 23:15:00

Comparing candidate commit 23d9769 in PR branch add-process-tags-to-tracing with baseline commit 49cee89 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 2 unstable metrics.

format!
expect(first_span.meta).to include('_dd.tags.process')
expect(first_span.meta['_dd.tags.process']).to eq(Datadog::Core::Environment::Process.serialized)
# TODO figure out if we need an assertion for the value, ie
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcotc - do you think there's value in asserting for the values of the tag? Or is the test in process_spec enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are doing with expect(first_span.meta['_dd.tags.process']).to eq(Datadog::Core::Environment::Process.serialized) seems good to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't test realistic values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing to test here is that it's respecting the configuring option, which you did.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing to test here is that it's respecting the configuring option.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! In that case it doesn't seem like I need to make any changes to the assertions then?

@github-actions
Copy link

github-actions bot commented Nov 11, 2025

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 1 partially typed method. It increases the percentage of typed methods from 54.67% to 54.8% (+0.13%).

Partially typed methods (+1-0)Introduced:
sig/datadog/core/normalizer.rbs:8
└── def self.normalize: (untyped original_value) -> ::String

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept to the end of the line to remove it from the stats.

@wantsui wantsui marked this pull request as ready for review November 12, 2025 15:31
@wantsui wantsui requested review from a team as code owners November 12, 2025 15:31
@wantsui wantsui requested a review from mabdinur November 12, 2025 15:31
# Invalid characters are replaced with an underscore
normalized_value.gsub!(INVALID_TAG_CHARACTERS, '_')
# Merge consecutive underscores with a single underscore
normalized_value.squeeze!('_')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge squeeze with the gsub above by changing the regex in the gsub to match one or more characters, instead of just one (regex +).
This saves us a string operation and string copy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this is still needed if there's a conditional check now to see if there are still double underscores: __?

        normalized_value.sub!(LEADING_INVALID_CHARS, "")
        normalized_value.sub!(TRAILING_UNDERSCORES, "")
        normalized_value.squeeze!('_') if normalized_value.include?('__')
        ```

def self.normalize(original_value)
return "" if original_value.nil? || original_value.to_s.strip.empty?

# Removes whitespaces
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this comment, and the one for downcase, since they only capture what the code already does (and the code is not ambiguous).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up moving all the code comments up in a66e635, to clarify the order of operations based on the Trace Agent logic.

Comment on lines 23 to 24
# Merge consecutive underscores with a single underscore
normalized_value.squeeze!('_')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Merge consecutive underscores with a single underscore
normalized_value.squeeze!('_')

Actually I don't believe we're supposed to do the squeeze (reduce multiple _ characters to a single character). Or at least the spec doesn't say to do so. Here the Python tracer leaves repeated underscores:

https://github.com/DataDog/dd-trace-py/pull/15146/files#diff-734f80f7c77b609471c1ca40c131a2604c2f50647ad9cf264863e2016f07b209R23

We should remove this to be consistent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the Trace Agent, which has a test case seen here: https://github.com/DataDog/datadog-agent/blob/45799c842bbd216bcda208737f9f11cade6fdd95/pkg/trace/traceutil/normalize_test.go#L33

{in: "contiguous_____underscores", out: "contiguous_underscores"},

It looks like the Trace Agent merges them. I'll start a separate thread since I think we want this all to be consistent.

normalized_value.sub!(LEADING_INVALID_CHARS, "")
normalized_value.sub!(TRAILING_UNDERSCORES, "")
normalized_value.squeeze!('_')
normalized_value = normalized_value[MAX_CHARACTER_LENGTH]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a value of having range when it could be normalized_value[0, 200]?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, so I removed the range approach.

I played around with a few things here: 4747259 and now it looks like this:

normalized_value.slice!(MAX_CHARACTER_LENGTH..-1) if normalized_value.length > MAX_CHARACTER_LENGTH

(the conditional portion should help skip this operation if the text is small enough)

Let me know if this is more in line with what you're thinking of!

Comment on lines 5 to 15
@serialized: untyped

def self?.entrypoint_workdir: () -> untyped

def self?.entrypoint_type: () -> untyped

def self?.entrypoint_name: () -> untyped

def self?.entrypoint_basedir: () -> untyped
def self?.serialized_kv_helper: (untyped key, untyped value) -> ::String
def self?.serialized: () -> untyped
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Could you please type it. You can use Codex, it's good at it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Thanks for pointing this out! Addressed in adfa416!

module Core
module Normalizer
INVALID_TAG_CHARACTERS: ::Regexp
def self.normalize: (untyped original_value) -> ("" | untyped)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

untyped will cover everything, but still, it's not untyped, it's a ::String?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in adfa416!

Comment on lines 54 to 57
tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_WORKDIR, entrypoint_workdir) if entrypoint_workdir
tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_NAME, entrypoint_name) if entrypoint_name
tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_BASEDIR, entrypoint_basedir) if entrypoint_basedir
tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_TYPE, entrypoint_type) if entrypoint_type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need to specify namespacing in Ruby up until the common point between: the current class or module we are in; and the object we want to reference.
In this case, we are in Datadog::Core::Environment::Process and want to reference Datadog::Core::Environment::Ext::TAG_ENTRYPOINT_WORKDIR.

We can remove the prefix namespace that is identical. For example Ext::TAG_ENTRYPOINT_WORKDIR will work here.

BUT, Ruby namespace resolution is very lenient, and we will try to match Ext (from Ext::TAG_ENTRYPOINT_WORKDIR), in order, to: Datadog::Core::Environment::Process::Ext, Datadog::Core::Environment::Ext, Datadog::Core::Ext, Datadog::Ext, and ::Ext.
This is important because the namespace matching doesn't try to match the complete Ext::TAG_ENTRYPOINT_WORKDIR path; it only tries to match the first token you provided: the Ext in Ext::TAG_ENTRYPOINT_WORKDIR.
And because more than one of these locations in the possible search logic are realistic matches, we should be a bit more specific than Ext::TAG_ENTRYPOINT_WORKDIR.

A good practice is to stop at the closet common namespace location. In this case, it would be the Environment. So I suggest using Environment::Ext::TAG_ENTRYPOINT_WORKDIR (and the equivalent for the other constants) here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! I'll keep this in mind going forward!
Addressed in: 31d9796

# Returns the entrypoint type of the process
# @return [String] the type of the process, which is fixed in Ruby
def entrypoint_type
Core::Environment::Ext::PROCESS_TYPE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove Core:: from this constant access (see comment in def serialized).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this note! see 31d9796!

# - Trailing underscores are removed
# - Consecutive underscores are merged into a single underscore
# - Maximum length is 200 characters
def self.normalize(original_value)
Copy link
Member

@marcotc marcotc Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how many operations happen inside this method, I recommend adding a "fast-case", where we do some checks and return immediately if the provided original_value is already valid.
This suggestion is equivalent to the early return by the agent here.

I suggest trying to use a regular expression, instead of implementing the agent's isNormalizedASCIITag in Ruby, since Ruby code is slower than Go code, but Ruby regex is pretty fast.

Something like:

return original_value if original_value.size <= MAX_CHARACTER_LENGTH && original_value.matches?(VALID_ASCII_TAG)

The hypothetical VALID_ASCII_TAG doesn't have to catch all valid cases: it's a trade-off between matching most valid tags vs making the regex complicated and slow. As long as it never matches invalid tags, it's all good.

TRAILING_UNDERSCORES = %r{_++\z}
MAX_CHARACTER_LENGTH = 200

# Based on https://github.com/DataDog/datadog-agent/blob/45799c842bbd216bcda208737f9f11cade6fdd95/pkg/trace/traceutil/normalize.go#L131
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos core Involves Datadog core libraries tracing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants