feat: add process tags to traces #5033

wantsui · 2025-11-07T22:46:53Z

What does this PR do?

The goal of AIDM-253 is to add process tags to the trace payloads.

After this gets merged, the next step is to add it for the other products.

To run the tests in Docker:

docker compose run --rm tracer-3.3 /bin/bash
bundle exec rake compile
bundle exec rake test:core_with_rails

Main tests:

BUNDLE_GEMFILE=/app/gemfiles/ruby_3.3_rails8.gemfile bundle exec rspec spec/datadog/core/environment/process_spec.rb
bundle exec rspec spec/datadog/tracing/transport/trace_formatter_spec.rb
bundle exec rspec spec/datadog/core/normalizer_spec.rb
bundle exec rspec spec/datadog/core/configuration/settings_spec.rb

Motivation:

We're trying to add process tags to various payloads so they can be used in different use cases.

Note I still want to try adding server type but I'll have to tackle that in a separate PR.

Change log entry

Yes. Add process tags to the trace payloads.

Additional Notes:

How to test the change?

github-actions · 2025-11-07T22:47:06Z

Thank you for updating Change log entry section 👏

Visited at: 2025-11-14 09:35:11 UTC

datadog-official · 2025-11-07T22:51:11Z

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
• Patch Coverage: 100.61%
• Total Coverage: 98.50% (+0.05%)

This comment will be updated automatically if new data arrives.

Commit SHA: 23d9769

lib/datadog/core/environment/process.rb

lib/datadog/tracing/configuration/settings.rb

marcotc · 2025-11-10T21:12:26Z

lib/datadog/tracing/transport/trace_formatter.rb

+        def tag_process_tags!
+          return unless trace.experimental_propagate_process_tags_enabled
+          process_tags = Core::Environment::Process.formatted_process_tags_k1_v1
+          return if process_tags.empty?


This is impossible, right? If so, we can remove it; keeping the check would give us a false sense of uncertainty here.

I think I fixed it in 8dae705 by just removing the check in process tags, but let me know if you spot issues with it!

pr-commenter · 2025-11-10T22:17:05Z

Benchmarks

Benchmark execution time: 2025-11-14 23:15:00

Comparing candidate commit 23d9769 in PR branch add-process-tags-to-tracing with baseline commit 49cee89 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 2 unstable metrics.

wantsui · 2025-11-11T16:29:31Z

spec/datadog/tracing/transport/trace_formatter_spec.rb

+          format!
+          expect(first_span.meta).to include('_dd.tags.process')
+          expect(first_span.meta['_dd.tags.process']).to eq(Datadog::Core::Environment::Process.serialized)
+          # TODO figure out if we need an assertion for the value, ie


@marcotc - do you think there's value in asserting for the values of the tag? Or is the test in process_spec enough?

What you are doing with expect(first_span.meta['_dd.tags.process']).to eq(Datadog::Core::Environment::Process.serialized) seems good to me.

I wouldn't test realistic values.

The main thing to test here is that it's respecting the configuration option, which you did.

The main thing to test here is that it's respecting the configuration option.

Thanks! In that case it doesn't seem like I need to make any changes to the assertions then?

github-actions · 2025-11-11T18:43:19Z

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 1 partially typed method. It increases the percentage of typed methods from 54.67% to 54.8% (+0.13%).

Partially typed methods (+1-0)

❌ Introduced:

sig/datadog/core/normalizer.rbs:8
└── def self.normalize: (untyped original_value) -> ::String

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept to the end of the line to remove it from the stats.

lib/datadog/core/environment/ext.rb

lib/datadog/core/normalizer.rb

marcotc · 2025-11-12T19:31:16Z

lib/datadog/core/normalizer.rb

+        # Invalid characters are replaced with an underscore
+        normalized_value.gsub!(INVALID_TAG_CHARACTERS, '_')
+        # Merge consecutive underscores with a single underscore
+        normalized_value.squeeze!('_')


Let's merge squeeze with the gsub above by changing the regex in the gsub to match one or more characters, instead of just one (regex +).
This saves us a string operation and string copy.

Do you think this is still needed if there's a conditional check now to see if there are still double underscores: __?

normalized_value.sub!(LEADING_INVALID_CHARS, "")
normalized_value.sub!(TRAILING_UNDERSCORES, "")
normalized_value.squeeze!('_') if normalized_value.include?('__')
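
For reference, here is a minimal sketch of what folding the squeeze into the substitution could look like. The character classes are assumptions for illustration only (the PR's actual INVALID_TAG_CHARACTERS is not shown in this thread), and `two_pass` / `single_pass` are hypothetical helper names. One detail worth noting: the run pattern has to include the underscore itself, otherwise an invalid character sitting next to an existing underscore would still leave `__` behind.

```ruby
# Hypothetical allowed character set; the PR's real INVALID_TAG_CHARACTERS
# constant is not shown in this thread.
INVALID_TAG_CHARACTERS = %r{[^a-z0-9_:./-]}
# One or more invalid characters and/or underscores, matched as a single run.
INVALID_OR_UNDERSCORE_RUN = %r{[^a-z0-9:./-]+}

# Two passes: replace each invalid character, then merge repeated underscores.
def two_pass(value)
  value.gsub(INVALID_TAG_CHARACTERS, '_').squeeze('_')
end

# One pass: every run of invalid characters and/or underscores becomes one underscore.
def single_pass(value)
  value.gsub(INVALID_OR_UNDERSCORE_RUN, '_')
end

puts two_pass('contiguous_____underscores')
puts single_pass('contiguous_____underscores')
```

Both helpers should agree on their output; the single-pass version just avoids a second traversal of the string. Whether that saving still matters once the squeeze is guarded by the `include?('__')` check is the open question in this thread.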

lib/datadog/core/normalizer.rb

spec/datadog/core/environment/process_spec.rb

lib/datadog/core/configuration/settings.rb

lib/datadog/core/environment/process.rb

marcotc · 2025-11-12T20:34:26Z

lib/datadog/core/normalizer.rb

+      def self.normalize(original_value)
+        return "" if original_value.nil? || original_value.to_s.strip.empty?
+
+        # Removes whitespaces


You can remove this comment, and the one for downcase, since they only capture what the code already does (and the code is not ambiguous).

I ended up moving all the code comments up in a66e635, to clarify the order of operations based on the Trace Agent logic.

lib/datadog/core/configuration/settings.rb

lib/datadog/core/normalizer.rb

tlhunter · 2025-11-12T22:51:43Z

lib/datadog/core/normalizer.rb

+        # Merge consecutive underscores with a single underscore
+        normalized_value.squeeze!('_')


Suggested change

# Merge consecutive underscores with a single underscore

normalized_value.squeeze!('_')

Actually I don't believe we're supposed to do the squeeze (reduce multiple _ characters to a single character). Or at least the spec doesn't say to do so. Here the Python tracer leaves repeated underscores:

https://github.com/DataDog/dd-trace-py/pull/15146/files#diff-734f80f7c77b609471c1ca40c131a2604c2f50647ad9cf264863e2016f07b209R23

We should remove this to be consistent.

I was following the Trace Agent, which has a test case seen here: https://github.com/DataDog/datadog-agent/blob/45799c842bbd216bcda208737f9f11cade6fdd95/pkg/trace/traceutil/normalize_test.go#L33

{in: "contiguous_____underscores", out: "contiguous_underscores"},

It looks like the Trace Agent merges them. I'll start a separate thread since I think we want this all to be consistent.

lib/datadog/core/normalizer.rb

Strech · 2025-11-14T09:41:18Z

lib/datadog/core/normalizer.rb

+        normalized_value.sub!(LEADING_INVALID_CHARS, "")
+        normalized_value.sub!(TRAILING_UNDERSCORES, "")
+        normalized_value.squeeze!('_')
+        normalized_value = normalized_value[MAX_CHARACTER_LENGTH]


Is there any value in using a range here when it could be normalized_value[0, 200]?

Not really, so I removed the range approach.

I played around with a few things here: 4747259 and now it looks like this:

normalized_value.slice!(MAX_CHARACTER_LENGTH..-1) if normalized_value.length > MAX_CHARACTER_LENGTH

(the conditional portion should help skip this operation if the text is small enough)

Let me know if this is more in line with what you're thinking of!
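
To illustrate the two options discussed in this thread, here is a small sketch. `truncate!` is a hypothetical helper name used only for this example; the constant mirrors the MAX_CHARACTER_LENGTH value of 200 from the diff.

```ruby
MAX_CHARACTER_LENGTH = 200

# In-place truncation that is skipped entirely for short strings.
# (`truncate!` is a hypothetical helper, not a method from the PR.)
def truncate!(value)
  value.slice!(MAX_CHARACTER_LENGTH..-1) if value.length > MAX_CHARACTER_LENGTH
  value
end

long_value = 'a' * 250
truncate!(long_value)
puts long_value.length

# The alternative from the review comment returns a new, truncated copy instead.
copy = ('b' * 250)[0, MAX_CHARACTER_LENGTH]
puts copy.length
```

The `slice!` form mutates the receiver and only does work when the string is too long, while `value[0, 200]` always allocates a new string.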

lib/datadog/tracing/transport/trace_formatter.rb

sig/datadog/core/environment/ext.rbs

Strech · 2025-11-14T09:43:12Z

sig/datadog/core/environment/process.rbs

+        @serialized: untyped
+
+        def self?.entrypoint_workdir: () -> untyped
+
+        def self?.entrypoint_type: () -> untyped
+
+        def self?.entrypoint_name: () -> untyped
+
+        def self?.entrypoint_basedir: () -> untyped
+        def self?.serialized_kv_helper: (untyped key, untyped value) -> ::String
+        def self?.serialized: () -> untyped


Minor: could you please type these? You can use Codex; it's good at it.

Yes! Thanks for pointing this out! Addressed in adfa416!

Strech · 2025-11-14T09:43:42Z

sig/datadog/core/normalizer.rbs

+  module Core
+    module Normalizer
+      INVALID_TAG_CHARACTERS: ::Regexp
+      def self.normalize: (untyped original_value) -> ("" | untyped)


untyped will cover everything, but still, it's not untyped, it's a ::String?

Addressed in adfa416!

lib/datadog/core/normalizer.rb

marcotc · 2025-11-14T21:50:38Z

lib/datadog/core/environment/process.rb

+          tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_WORKDIR, entrypoint_workdir) if entrypoint_workdir
+          tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_NAME, entrypoint_name) if entrypoint_name
+          tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_BASEDIR, entrypoint_basedir) if entrypoint_basedir
+          tags << serialized_kv_helper(Core::Environment::Ext::TAG_ENTRYPOINT_TYPE, entrypoint_type) if entrypoint_type


In Ruby, we only need to qualify a constant up to the closest namespace shared by the current class or module and the constant we want to reference.
In this case, we are in Datadog::Core::Environment::Process and want to reference Datadog::Core::Environment::Ext::TAG_ENTRYPOINT_WORKDIR.

We can remove the prefix namespace that is identical. For example Ext::TAG_ENTRYPOINT_WORKDIR will work here.

BUT, Ruby namespace resolution is very lenient, and we will try to match Ext (from Ext::TAG_ENTRYPOINT_WORKDIR), in order, to: Datadog::Core::Environment::Process::Ext, Datadog::Core::Environment::Ext, Datadog::Core::Ext, Datadog::Ext, and ::Ext.
This is important because the namespace matching doesn't try to match the complete Ext::TAG_ENTRYPOINT_WORKDIR path; it only tries to match the first token you provided: the Ext in Ext::TAG_ENTRYPOINT_WORKDIR.
And because more than one of these locations in the possible search logic are realistic matches, we should be a bit more specific than Ext::TAG_ENTRYPOINT_WORKDIR.

A good practice is to stop at the closest common namespace location. In this case, it would be the Environment. So I suggest using Environment::Ext::TAG_ENTRYPOINT_WORKDIR (and the equivalent for the other constants) here.

Thanks for the explanation! I'll keep this in mind going forward!
Addressed in: 31d9796
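
The lookup behaviour described above can be reproduced with a small standalone sketch. The `Demo` namespace and its constants are hypothetical stand-ins for the Datadog modules, and the sketch assumes the modules are declared with nested `module` blocks, since that is what determines the lexical search order.

```ruby
module Demo
  module Ext
    NAME = 'Demo::Ext'
  end

  module Core
    module Environment
      module Ext
        NAME = 'Demo::Core::Environment::Ext'
      end

      module Process
        # Bare reference: Ruby walks the enclosing lexical scopes looking for `Ext`.
        def self.bare
          Ext::NAME
        end

        # Anchored at the closest common namespace.
        def self.anchored
          Environment::Ext::NAME
        end
      end
    end
  end
end

puts Demo::Core::Environment::Process.bare
puts Demo::Core::Environment::Process.anchored

# If a nested Ext is introduced later, the bare reference silently changes meaning.
module Demo
  module Core
    module Environment
      module Process
        module Ext
          NAME = 'Demo::Core::Environment::Process::Ext'
        end
      end
    end
  end
end

puts Demo::Core::Environment::Process.bare
puts Demo::Core::Environment::Process.anchored
```

After the second `Ext` is defined, `bare` resolves to the new inner module while `anchored` keeps pointing at the intended one, which is the reason for preferring `Environment::Ext::...` here.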

marcotc · 2025-11-14T21:51:13Z

lib/datadog/core/environment/process.rb

+        # Returns the entrypoint type of the process
+        # @return [String] the type of the process, which is fixed in Ruby
+        def entrypoint_type
+          Core::Environment::Ext::PROCESS_TYPE


We can remove Core:: from this constant access (see comment in def serialized).

Thanks for this note! see 31d9796!

marcotc · 2025-11-14T22:51:45Z

lib/datadog/core/normalizer.rb

+      # - Trailing underscores are removed
+      # - Consecutive underscores are merged into a single underscore
+      # - Maximum length is 200 characters
+      def self.normalize(original_value)


Given how many operations happen inside this method, I recommend adding a "fast-case", where we do some checks and return immediately if the provided original_value is already valid.
This suggestion is equivalent to the early return by the agent here.

I suggest trying to use a regular expression, instead of implementing the agent's isNormalizedASCIITag in Ruby, since Ruby code is slower than Go code, but Ruby regex is pretty fast.

Something like:

return original_value if original_value.size <= MAX_CHARACTER_LENGTH && original_value.match?(VALID_ASCII_TAG)

The hypothetical VALID_ASCII_TAG doesn't have to catch all valid cases: it's a trade-off between matching most valid tags vs making the regex complicated and slow. As long as it never matches invalid tags, it's all good.
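
A rough sketch of how such a fast path could look, under stated assumptions: VALID_ASCII_TAG below is a hypothetical, deliberately conservative pattern (the allowed character set is an assumption, not taken from the PR), and `slow_normalize` is a simplified stand-in for the full normalization, not the PR's implementation.

```ruby
MAX_CHARACTER_LENGTH = 200

# Hypothetical conservative pattern: starts with a lowercase letter, and every
# underscore must be followed by a non-underscore valid character, so it can
# never match consecutive or trailing underscores.
VALID_ASCII_TAG = %r{\A[a-z](?:_?[a-z0-9:./-])*\z}

# Simplified stand-in for the full normalization (assumed order of steps).
def slow_normalize(value)
  normalized = value.downcase
  normalized.gsub!(%r{[^a-z0-9_:./-]}, '_')
  normalized.squeeze!('_')
  normalized.sub!(/\A[^a-z]+/, '')
  normalized.sub!(/_+\z/, '')
  normalized[0, MAX_CHARACTER_LENGTH]
end

def normalize(original_value)
  value = original_value.to_s
  # Fast path: already-valid values skip every string operation below.
  return value if value.size <= MAX_CHARACTER_LENGTH && value.match?(VALID_ASCII_TAG)

  slow_normalize(value)
end

puts normalize('valid_tag:1.0')
puts normalize('Hello  World!!')
```

The pattern intentionally rejects some valid tags; that only costs a trip through the slow path, whereas accepting an invalid tag would be a correctness bug.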

marcotc · 2025-11-14T22:53:34Z

lib/datadog/core/normalizer.rb

+      TRAILING_UNDERSCORES = %r{_++\z}
+      MAX_CHARACTER_LENGTH = 200
+
+      # Based on https://github.com/DataDog/datadog-agent/blob/45799c842bbd216bcda208737f9f11cade6fdd95/pkg/trace/traceutil/normalize.go#L131


In general here: do we need NormalizeTag or NormalizeTagValue for process tags? https://github.com/DataDog/datadog-agent/blob/45799c842bbd216bcda208737f9f11cade6fdd95/pkg/trace/traceutil/normalize.go#L120-L129

Add initial attempt at adding process related tags on trace payloads. This is still missing memoization and additional tests.

1d8bab2

github-actions bot added the core and tracing labels Nov 7, 2025

wantsui added the AI Generated label Nov 7, 2025

Add test for multiple calls to the formatter tags

58592a3

marcotc reviewed Nov 10, 2025

lib/datadog/core/environment/process.rb (outdated)

marcotc reviewed Nov 10, 2025

lib/datadog/core/environment/process.rb (outdated)

marcotc reviewed Nov 10, 2025

lib/datadog/core/environment/process.rb

marcotc reviewed Nov 10, 2025

lib/datadog/core/environment/process.rb (outdated)

marcotc reviewed Nov 10, 2025

lib/datadog/tracing/configuration/settings.rb (outdated)

marcotc reviewed Nov 10, 2025

wantsui and others added 6 commits November 10, 2025 16:29

Add tests for trace formatter spec to assert that the first span of the payload has the process tag only when the feature is enabled.

7dc9184

it turns out you cannot just pin things to rails 7 due to newer ruby versions so this fixes that.

cad26a6

Update lib/datadog/core/environment/process.rb

f31440a

Co-authored-by: Marco Costa <[email protected]>

fix string and rename formatted_process_tags_k1_v1 to serialized

cfec602

remove unneeded line

8dae705

remove server type for now until more research is done

055586f

Add new tag normalizer logic following the trace agent.

cacb500

wantsui commented Nov 11, 2025

wantsui added 2 commits November 11, 2025 13:38

lint fix

7661a3f

add missing files from prototype command

7825940

wantsui added 3 commits November 11, 2025 13:47

Add missing constants to ext rbs file

5de6efd

jruby fix for the process spec

f5ca84a

remove the active record during rails creation because it caused a jruby conflict with sqlite and it is not needed for this test

9ad5be5

wantsui mentioned this pull request Nov 11, 2025

swap out the existing headers normalization logic with the tag normalizer #5041

Draft

wantsui requested a review from vandonr November 12, 2025 15:27

wantsui marked this pull request as ready for review November 12, 2025 15:31

wantsui requested review from a team as code owners November 12, 2025 15:31

wantsui requested a review from mabdinur November 12, 2025 15:31

marcotc reviewed Nov 12, 2025

tlhunter reviewed Nov 12, 2025

lib/datadog/core/normalizer.rb (outdated)

tlhunter reviewed Nov 12, 2025

wantsui and others added 3 commits November 13, 2025 16:14

Bring tag normalization to 1:1 parity with the Trace Agent

a66e635

Add changes from code review around comments and add test for the new environment variable.

ec1e930

Merge branch 'master' into add-process-tags-to-tracing

4073ab5

Strech reviewed Nov 14, 2025

wantsui and others added 5 commits November 14, 2025 13:47

Remove the rails gem install from process_spec

22a3680

Remove 1 sec delay.

5784833

Update sig/datadog/core/environment/ext.rbs

2b705e3

Co-authored-by: Sergey Fedorov <[email protected]>

Update lib/datadog/tracing/transport/trace_formatter.rb

e3deb4c

Co-authored-by: Sergey Fedorov <[email protected]>

Add improvements for long strings.

4747259

github-advanced-security bot found potential problems Nov 14, 2025

lib/datadog/core/normalizer.rb (fixed)

wantsui added 4 commits November 14, 2025 16:09

small improvement to the whitespace removal.

41bc6c0

Add upper bound to regex to avoid the polynomial regex on uncontrolled data error.

c3605c0

Change untyped to string.

adfa416

Use possessive quantifiers in regex instead of limiting the upper bound to 200 characters

0dff545

marcotc reviewed Nov 14, 2025

wantsui added 4 commits November 14, 2025 16:54

Fix types for steep check command

7d8da40

Remove unneeded Core prefix

31d9796

lint fixes

3672a8a

restructure folder lookup so it works on the macos ci tests

23d9769

marcotc reviewed Nov 14, 2025
