Skip to content

Conversation

@rpanackal
Copy link
Member

Context

AI/ai-sdk-java-backlog#295

New ADR for logging guidelines

Definition of Done

  • Functionality scope stated & covered

@rpanackal rpanackal changed the title Docs/adr logging docs: [DevOps] ADR logging strategy Nov 10, 2025
```[reqId=e3eaa45c] OpenAI request completed successfully with duration=1628ms, size=1,2KB.```

* **Correlate logs.**
Include a request identifier (e.g., `reqId`) in per-request logs to assist with correlation and debugging.
Copy link
Contributor

@newtork newtork Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Major)

What does request identifier mean? Specifically, what is a request exactly?

Please consider there may be users who explicitly (directly) or implicitly (via plugin/extension) put a request identifier for incoming request* contexts. I wouldn't be surprised if Spring framework already today offers this mechanism. We need to make sure, that we do not break/confuse our own semantics in that case.

Edit:
*I remember, the word I was looking for was: correlation id :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correlating logs (to disambiguate multiple parallel threads) is usually something the application configures centrally, so personally I wouldn't see the need to add this from our side.

However, when AI Core responds with an HTTP error, and the response headers contain a correlation ID, we should include that in our exceptions as well.

Copy link
Member Author

@rpanackal rpanackal Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed reqId to callId and added an explanation to what a "request" correspond to in the document.

Hopefully, this adds enough clarity.

Logs must be clear enough for a developer to understand what happened without checking the code.
Use the `metric=value` pattern to include structured details with extensibility in mind.

```[reqId=e3eaa45c] OpenAI request completed successfully with duration=1628ms, size=1,2KB.```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Minor/Nitpick)

In your example what does size mean? is this the response content size?
I'm okay with metric=value, but it'll get challenging, once you serve multiple "sizes" - or you want to be specific.


---

### 3. MDC (Mapped Diagnostic Context)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Question/Minor)

Can you give an example with current code base?

Copy link
Member

@MatKuhr MatKuhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for writing this up! left some comments

```[reqId=e3eaa45c] OpenAI request completed successfully with duration=1628ms, size=1,2KB.```

* **Correlate logs.**
Include a request identifier (e.g., `reqId`) in per-request logs to assist with correlation and debugging.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correlating logs (to disambiguate multiple parallel threads) is usually something the application configures centrally, so personally I wouldn't see the need to add this from our side.

However, when AI Core responds with an HTTP error, and the response headers contain a correlation ID, we should include that in our exceptions as well.

MDC is used to carry contextual information (e.g., `reqId`, `endpoint`, `service`) across execution blocks within the same thread.

* **Setting and clearing context.**
Set MDC values deliberately and close to their scope of relevance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MDC uses ThreadLocal variables internally, which will get lost when creating new Threads, either explicitly or implicitly via Cloud SDK resilience.

In addition, please keep in mind that the application developer may also use MDC. So the SDK should not erase anything that is set by app developers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. About using MDC across threads, I have outlined the issue with an example
  2. Also there is a following section that mentions with example on cautiously clearing the MDC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure about the correctness of the example about incorrect usage of MDC. Just let me know if I missed something.

rpanackal and others added 2 commits November 11, 2025 11:03
- adding . at the end
- reqId explanation
- 'size' renaming in log for clarity
- MDc examplee
- "object creation" with example instead of vague computaiton
- remove mention of "placeholder" warnings
content improvment

Co-authored-by: Jonas-Isr <[email protected]>
- add formatting rules
- caution MDC use on thread switch
- logging framework information
- clarify what "request" means
@rpanackal rpanackal added the please-review Request to review a pull-request label Nov 11, 2025
Copy link
Member Author

@rpanackal rpanackal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MatKuhr

However, when AI Core responds with an HTTP error, and the response headers contain a correlation ID, we should include that in our exceptions as well.

Afaik we do not infer the correlation id from the error response headers. Preferably, lets do that in a separate item with any other such missing details that needs to be included in the exceptions.

@rpanackal rpanackal self-assigned this Nov 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

please-review Request to review a pull-request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants