@adityamehra adityamehra commented Nov 4, 2025

  • Set input/output messages when calling the invocation from the Traceloop translator
  • Normalize traceloop.entity.input/output to the format deepeval expects
  • Boilerplate code for Docker and k8s deployments, etc.
Trace ID: 0cc9f96b87744d348218fdb43f24e0ce
Span ID: dd39354d20b5a622
Flags: 1
LogRecord #1
ObservedTimestamp: 2025-11-05 07:27:44.847511 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText: 
SeverityNumber: Unspecified(0)
EventName: gen_ai.evaluation.results
Body: Map({"gen_ai.evaluations":[{"gen_ai.evaluation.explanation":"The score is 0.00 because the actual output is completely unbiased, providing a fair and balanced perspective without any discernible bias.","gen_ai.evaluation.name":"bias","gen_ai.evaluation.passed":true,"gen_ai.evaluation.score.label":"Not Biased","gen_ai.evaluation.score.value":0},{"gen_ai.evaluation.explanation":"The score is 0.00 because the output contains no toxic elements, demonstrating a completely neutral and respectful tone.","gen_ai.evaluation.name":"toxicity","gen_ai.evaluation.passed":true,"gen_ai.evaluation.score.label":"Not Toxic","gen_ai.evaluation.score.value":0},{"gen_ai.evaluation.explanation":"The score is 0.00 because the irrelevant statement 'coordinator' does not contribute any useful information regarding the trip planning, hotel, flights, or experiences requested.","gen_ai.evaluation.name":"relevance","gen_ai.evaluation.passed":false,"gen_ai.evaluation.score.label":"Fail","gen_ai.evaluation.score.value":0},{"gen_ai.evaluation.explanation":"The output does not address any of the key facts from the input, such as the trip details, hotel preferences, or flight information. Instead, it provides an unrelated term \"coordinator,\" which does not align with the user's request for travel planning. This indicates a high risk of hallucination and a complete lack of relevant content.","gen_ai.evaluation.name":"hallucination","gen_ai.evaluation.passed":false,"gen_ai.evaluation.score.label":"Hallucinated","gen_ai.evaluation.score.value":0.001406362599724625},{"gen_ai.evaluation.explanation":"The response does not contain any sentiment indicators or a tone that can be evaluated. It simply outputs a term without context or emotional weight, failing to meet the evaluation steps.","gen_ai.evaluation.name":"sentiment","gen_ai.evaluation.passed":true,"gen_ai.evaluation.score.value":0.018319230505582044}]})
Attributes:
     -> event.name: Str(gen_ai.evaluation.results)
     -> gen_ai.operation.name: Str(data_evaluation_results)
     -> traceloop.workflow.name: Str(travel_planner_multi_agent)
     -> gen_ai.request.model: Str(should_continue)
     -> _traceloop_processed: Bool(true)
     -> gen_ai.evaluation.sampled: Bool(true)
     -> gen_ai.step.name: Str(should_continue)
     -> gen_ai.step.type: Str(chain)
     -> gen_ai.workflow.name: Str(travel_planner_multi_agent)
     -> test.synthetic_span_marker: Str(created_in_start_llm)
     -> gen_ai.evaluation.aggregated: Bool(true)
     -> trace_id: Str(aaa9eff9a0c944ebc5e9d9ca18550c2a)
     -> span_id: Str(3f5f1d0f934a6167)
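
For readers who want to inspect a record like the one above programmatically, here is a minimal sketch of pulling the failed evaluations out of the Body payload. The dictionary is a trimmed copy of the log record (explanations and labels omitted), and the helper name `failed_evaluations` is hypothetical, not part of this PR.

```python
from typing import Any, Dict, List

# Trimmed copy of the "Body" payload from the log record above.
# Explanations and score labels are omitted for brevity.
evaluation_body: Dict[str, Any] = {
    "gen_ai.evaluations": [
        {"gen_ai.evaluation.name": "bias",
         "gen_ai.evaluation.passed": True,
         "gen_ai.evaluation.score.value": 0},
        {"gen_ai.evaluation.name": "toxicity",
         "gen_ai.evaluation.passed": True,
         "gen_ai.evaluation.score.value": 0},
        {"gen_ai.evaluation.name": "relevance",
         "gen_ai.evaluation.passed": False,
         "gen_ai.evaluation.score.value": 0},
        {"gen_ai.evaluation.name": "hallucination",
         "gen_ai.evaluation.passed": False,
         "gen_ai.evaluation.score.value": 0.001406362599724625},
        {"gen_ai.evaluation.name": "sentiment",
         "gen_ai.evaluation.passed": True,
         "gen_ai.evaluation.score.value": 0.018319230505582044},
    ]
}


def failed_evaluations(body: Dict[str, Any]) -> List[str]:
    """Return the names of evaluations whose 'passed' flag is False (hypothetical helper)."""
    return [
        result["gen_ai.evaluation.name"]
        for result in body.get("gen_ai.evaluations", [])
        if not result.get("gen_ai.evaluation.passed", False)
    ]


print(failed_evaluations(evaluation_body))
```

For this record the helper reports the relevance and hallucination metrics, which matches the explanations in the log: the output "coordinator" does not address the travel-planning input.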

- Deployment scripts
- env files
- requirements.txt

These are needed to run the Traceloop translator and the apps.
@adityamehra adityamehra requested review from a team as code owners November 4, 2025 23:12
Comment on lines 304 to 306

# Add test marker to verify lifecycle integrity
invocation.attributes["test.synthetic_span_marker"] = "created_in_start_llm"
Contributor

Probably not needed

Contributor Author

Left over from testing. Fixed

Comment on lines +44 to +52
try:
    from langchain_core.messages import (
        BaseMessage,
        HumanMessage,
        AIMessage,
        SystemMessage,
        ToolMessage,
        FunctionMessage,
    )
Contributor

Why are we explicitly trying to import langchain library classes?

Contributor Author

@adityamehra adityamehra Nov 5, 2025

They are used to create the actual LangChain-specific objects (HumanMessage and others) when reconstructing the input and output messages.

Comment on lines 55 to 56
"LangChain not available; message reconstruction skipped. "
"Install langchain-core to enable evaluations with Traceloop."
Contributor

IMO, we need to update this msg

Comment on lines +98 to +101
def _convert_normalized_to_langchain(
    normalized_messages: List[Dict[str, Any]],
    direction: str
) -> List[Any]:
Contributor

We could probably verify that the installed library is LangChain before executing this method, right? Otherwise we need to make it more general.

Contributor Author

Good point, but for now let's keep it LangChain-specific. Otherwise we'd need to verify against the other libraries. We can add support as we go.
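
To make the discussion concrete, here is a rough sketch of a guarded, LangChain-specific conversion. It is not the PR's `_convert_normalized_to_langchain`: the function name `convert_normalized_messages`, the `LANGCHAIN_AVAILABLE` flag, and the assumed normalized shape (dicts with "role" and "content" keys) are all illustrative assumptions, and the real function's `direction` parameter is left out.

```python
from typing import Any, Dict, List

# Optional dependency: fall back gracefully when langchain-core is not installed.
try:
    from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
    LANGCHAIN_AVAILABLE = True
except ImportError:
    LANGCHAIN_AVAILABLE = False


def convert_normalized_messages(normalized: List[Dict[str, Any]]) -> List[Any]:
    """Hypothetical helper: turn normalized dicts into LangChain message objects.

    Assumes each dict looks like {"role": "...", "content": "..."}.
    Returns an empty list when langchain-core is unavailable, so callers
    can skip message reconstruction instead of failing.
    """
    if not LANGCHAIN_AVAILABLE:
        return []

    # Assumed role names; the real normalized format may differ.
    role_to_class = {
        "user": HumanMessage,
        "assistant": AIMessage,
        "system": SystemMessage,
    }

    converted: List[Any] = []
    for message in normalized:
        message_class = role_to_class.get(message.get("role"))
        if message_class is None:
            continue  # Unknown roles are skipped in this sketch.
        converted.append(message_class(content=message.get("content", "")))
    return converted
```

Keeping the availability check inside the helper means callers do not need their own guard, and supporting another library later would only require a second mapping selected by a similar flag.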

Comment on lines +662 to +663
input_messages=input_messages or [],
output_messages=output_messages or [],
Contributor

Will these parameters be okay if there are no evals in the pipeline?

Contributor Author

It's okay to have them: they'll be empty lists ([]), and the evals library will take care of that.
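
As an illustration of the `or []` pattern under discussion, the sketch below coerces missing messages to empty lists and lets a downstream check skip them. The `Invocation` dataclass, `build_invocation`, and `has_content_to_evaluate` are hypothetical stand-ins; how the evals library actually treats empty lists is taken from the comment above, not verified here.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Invocation:
    """Hypothetical stand-in for the invocation object built by the translator."""
    input_messages: List[Any] = field(default_factory=list)
    output_messages: List[Any] = field(default_factory=list)


def build_invocation(
    input_messages: Optional[List[Any]],
    output_messages: Optional[List[Any]],
) -> Invocation:
    # `x or []` maps None (and any other falsy value) to a fresh empty list,
    # so downstream code can always iterate without a None check.
    return Invocation(
        input_messages=input_messages or [],
        output_messages=output_messages or [],
    )


def has_content_to_evaluate(invocation: Invocation) -> bool:
    """Assumed downstream check: only evaluate when both sides have messages."""
    return bool(invocation.input_messages) and bool(invocation.output_messages)


empty = build_invocation(None, None)
print(empty.input_messages, has_content_to_evaluate(empty))
```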

# NOW mark the original span as processed (AFTER translation is done)
# This prevents infinite recursion while allowing synthetic span creation
if hasattr(span, "_attributes"):
    span._attributes["_traceloop_processed"] = True  # type: ignore[attr-defined]
Contributor Author

@91pavan I have moved the marking to this line, outside _mutate_span_if_needed and _process_span_translation, so we can leave the checks inside those methods, but start_llm() and stop_llm() will also get triggered.
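
The ordering described here (translate first, mark afterwards, and skip anything already marked) can be sketched in isolation as follows. `FakeSpan` and `TranslatingProcessor` are hypothetical stand-ins rather than the PR's classes, and marking the synthetic span as processed is an assumption made for this sketch so that its re-entry into the processor is a no-op.

```python
from typing import Any, Dict, List


class FakeSpan:
    """Hypothetical stand-in for an SDK span exposing a private _attributes dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._attributes: Dict[str, Any] = {}


class TranslatingProcessor:
    """Sketch of a processor that creates one synthetic span per original span."""

    def __init__(self) -> None:
        self.synthetic_spans: List[FakeSpan] = []

    def on_end(self, span: FakeSpan) -> None:
        # Guard: spans that were already handled are skipped, which stops recursion.
        if span._attributes.get("_traceloop_processed"):
            return

        # Translation step: build the synthetic span (assumed to carry the marker too).
        synthetic = FakeSpan(f"{span.name}.translated")
        synthetic._attributes["_traceloop_processed"] = True
        self.synthetic_spans.append(synthetic)

        # Mark the original only AFTER translation is done.
        if hasattr(span, "_attributes"):
            span._attributes["_traceloop_processed"] = True

        # Simulate the synthetic span ending and re-entering this processor.
        self.on_end(synthetic)


processor = TranslatingProcessor()
original = FakeSpan("llm_call")
processor.on_end(original)
processor.on_end(original)  # Second delivery is ignored because of the marker.
print(len(processor.synthetic_spans))
```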

@adityamehra adityamehra mentioned this pull request Nov 6, 2025