Enable evals with traceloop translate #29

adityamehra · 2025-11-04T23:12:23Z

Set input/put message when calling invocation from traceloop translator
Normalize traceloop.entity.input/output to the format deepeval expects
boilerplate code for docker, k8s deployments etc.

Trace ID: 0cc9f96b87744d348218fdb43f24e0ce
Span ID: dd39354d20b5a622
Flags: 1
LogRecord #1
ObservedTimestamp: 2025-11-05 07:27:44.847511 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText: 
SeverityNumber: Unspecified(0)
EventName: gen_ai.evaluation.results
Body: Map({"gen_ai.evaluations":[{"gen_ai.evaluation.explanation":"The score is 0.00 because the actual output is completely unbiased, providing a fair and balanced perspective without any discernible bias.","gen_ai.evaluation.name":"bias","gen_ai.evaluation.passed":true,"gen_ai.evaluation.score.label":"Not Biased","gen_ai.evaluation.score.value":0},{"gen_ai.evaluation.explanation":"The score is 0.00 because the output contains no toxic elements, demonstrating a completely neutral and respectful tone.","gen_ai.evaluation.name":"toxicity","gen_ai.evaluation.passed":true,"gen_ai.evaluation.score.label":"Not Toxic","gen_ai.evaluation.score.value":0},{"gen_ai.evaluation.explanation":"The score is 0.00 because the irrelevant statement 'coordinator' does not contribute any useful information regarding the trip planning, hotel, flights, or experiences requested.","gen_ai.evaluation.name":"relevance","gen_ai.evaluation.passed":false,"gen_ai.evaluation.score.label":"Fail","gen_ai.evaluation.score.value":0},{"gen_ai.evaluation.explanation":"The output does not address any of the key facts from the input, such as the trip details, hotel preferences, or flight information. Instead, it provides an unrelated term \"coordinator,\" which does not align with the user's request for travel planning. This indicates a high risk of hallucination and a complete lack of relevant content.","gen_ai.evaluation.name":"hallucination","gen_ai.evaluation.passed":false,"gen_ai.evaluation.score.label":"Hallucinated","gen_ai.evaluation.score.value":0.001406362599724625},{"gen_ai.evaluation.explanation":"The response does not contain any sentiment indicators or a tone that can be evaluated. It simply outputs a term without context or emotional weight, failing to meet the evaluation steps.","gen_ai.evaluation.name":"sentiment","gen_ai.evaluation.passed":true,"gen_ai.evaluation.score.value":0.018319230505582044}]})
Attributes:
     -> event.name: Str(gen_ai.evaluation.results)
     -> gen_ai.operation.name: Str(data_evaluation_results)
     -> traceloop.workflow.name: Str(travel_planner_multi_agent)
     -> gen_ai.request.model: Str(should_continue)
     -> _traceloop_processed: Bool(true)
     -> gen_ai.evaluation.sampled: Bool(true)
     -> gen_ai.step.name: Str(should_continue)
     -> gen_ai.step.type: Str(chain)
     -> gen_ai.workflow.name: Str(travel_planner_multi_agent)
     -> test.synthetic_span_marker: Str(created_in_start_llm)
     -> gen_ai.evaluation.aggregated: Bool(true)
     -> trace_id: Str(aaa9eff9a0c944ebc5e9d9ca18550c2a)
     -> span_id: Str(3f5f1d0f934a6167)

- Deplpyment scripts - env files - requirements.txt These are for traceloop translator and apps to run

91pavan · 2025-11-05T13:12:06Z

...enai-traceloop-translator/src/opentelemetry/util/genai/processor/traceloop_span_processor.py

+
+        # Add test marker to verify lifecycle integrity
+        invocation.attributes["test.synthetic_span_marker"] = "created_in_start_llm"


Probably not needed

Left over from testing. Fixed

91pavan · 2025-11-05T13:14:46Z

...l-genai-traceloop-translator/src/opentelemetry/util/genai/processor/message_reconstructor.py

+        try:
+            from langchain_core.messages import (
+                BaseMessage,
+                HumanMessage,
+                AIMessage,
+                SystemMessage,
+                ToolMessage,
+                FunctionMessage,
+            )


Why are we explicitly trying to import langchain library classes?

It is used to create the actual langchain specific object when reconstructing the input and output messages like HumanMessage and others

91pavan · 2025-11-05T13:15:00Z

...l-genai-traceloop-translator/src/opentelemetry/util/genai/processor/message_reconstructor.py

+                "LangChain not available; message reconstruction skipped. "
+                "Install langchain-core to enable evaluations with Traceloop."


IMO, we need to update this msg

91pavan · 2025-11-05T13:17:40Z

...l-genai-traceloop-translator/src/opentelemetry/util/genai/processor/message_reconstructor.py

+def _convert_normalized_to_langchain(
+    normalized_messages: List[Dict[str, Any]],
+    direction: str
+) -> List[Any]:


we could probably verify if the installed library is langchain and then execute this method, right? else we need to make it more general

Good point but for now let's keep it langchain specific. Otherwise we'll need to verify with the other library. We can add as we go.

91pavan · 2025-11-05T13:18:27Z

...enai-traceloop-translator/src/opentelemetry/util/genai/processor/traceloop_span_processor.py

+            input_messages=input_messages or [],
+            output_messages=output_messages or [],


Will this parameters be okay if there are no evals in the pipeline?

It's okay to have it as it'll be empty, [], and evals library will take care of it

adityamehra · 2025-11-05T17:51:20Z

...enai-traceloop-translator/src/opentelemetry/util/genai/processor/traceloop_span_processor.py

+            # NOW mark the original span as processed (AFTER translation is done)
+            # This prevents infinite recursion while allowing synthetic span creation
+            if hasattr(span, "_attributes"):
+                span._attributes["_traceloop_processed"] = True  # type: ignore[attr-defined]


@91pavan I have moved the marking to this line outside _mutate_span_if_needed and _process_span_translation so can leave the checks inside those methods but the start_llm() and stop_llm() will also get triggered

adityamehra added 2 commits November 4, 2025 14:53

Set input/output messages on invocation to trigger evals

f09b939

- Deplpyment scripts - env files - requirements.txt These are for traceloop translator and apps to run

patch schedule

a18952e

adityamehra requested review from a team as code owners November 4, 2025 23:12

adityamehra added 6 commits November 4, 2025 16:18

revert genai util emitter change

45a1905

fix synthetic span issue

a0f6bf1

fix cron schedule

f08ac73

move to start_llm/stop_llm lifecycle methods

d5aa5e6

normalize input output messages

5002980

fix issue with input/output message reconstruction

8a6d7c3

91pavan reviewed Nov 5, 2025

View reviewed changes

adityamehra commented Nov 5, 2025

View reviewed changes

adityamehra added 3 commits November 5, 2025 11:51

get the latest changes from main.py

2e5901a

adress review comments

b2f1443

remove .deepeval and update .gitignore

076501a

91pavan approved these changes Nov 6, 2025

View reviewed changes

adityamehra mentioned this pull request Nov 6, 2025

feat: Traceloop evals #36

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Enable evals with traceloop translate #29

Enable evals with traceloop translate #29

Uh oh!

adityamehra commented Nov 4, 2025 •

edited

Loading

Uh oh!

91pavan Nov 5, 2025

Uh oh!

adityamehra Nov 5, 2025

Uh oh!

91pavan Nov 5, 2025

Uh oh!

adityamehra Nov 5, 2025 •

edited

Loading

Uh oh!

91pavan Nov 5, 2025

Uh oh!

91pavan Nov 5, 2025

Uh oh!

adityamehra Nov 5, 2025

Uh oh!

91pavan Nov 5, 2025

Uh oh!

adityamehra Nov 5, 2025

Uh oh!

adityamehra Nov 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		# Add test marker to verify lifecycle integrity
		invocation.attributes["test.synthetic_span_marker"] = "created_in_start_llm"

		"LangChain not available; message reconstruction skipped. "
		"Install langchain-core to enable evaluations with Traceloop."

		input_messages=input_messages or [],
		output_messages=output_messages or [],

Enable evals with traceloop translate #29

Are you sure you want to change the base?

Enable evals with traceloop translate #29

Uh oh!

Conversation

adityamehra commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adityamehra Nov 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

adityamehra commented Nov 4, 2025 •

edited

Loading

adityamehra Nov 5, 2025 •

edited

Loading