-
Notifications
You must be signed in to change notification settings - Fork 16
docs: [DevOps] ADR logging strategy #651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
docs/adrs/006-logging-strategy.md
Outdated
| ```[reqId=e3eaa45c] OpenAI request completed successfully with duration=1628ms, size=1,2KB.``` | ||
|
|
||
| * **Correlate logs.** | ||
| Include a request identifier (e.g., `reqId`) in per-request logs to assist with correlation and debugging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Major)
What does request identifier mean? Specifically, what is a request exactly?
Please consider there may be users who explicitly (directly) or implicitly (via plugin/extension) put a request identifier for incoming request* contexts. I wouldn't be surprised if Spring framework already today offers this mechanism. We need to make sure, that we do not break/confuse our own semantics in that case.
Edit:
*I remember, the word I was looking for was: correlation id :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correlating logs (to disambiguate multiple parallel threads) is usually something the application configures centrally, so personally I wouldn't see the need to add this from our side.
However, when AI Core responds with an HTTP error, and the response headers contain a correlation ID, we should include that in our exceptions as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed reqId to callId and added an explanation to what a "request" correspond to in the document.
Hopefully, this adds enough clarity.
docs/adrs/006-logging-strategy.md
Outdated
| Logs must be clear enough for a developer to understand what happened without checking the code. | ||
| Use the `metric=value` pattern to include structured details with extensibility in mind. | ||
|
|
||
| ```[reqId=e3eaa45c] OpenAI request completed successfully with duration=1628ms, size=1,2KB.``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Minor/Nitpick)
In your example what does size mean? is this the response content size?
I'm okay with metric=value, but it'll get challenging, once you serve multiple "sizes" - or you want to be specific.
|
|
||
| --- | ||
|
|
||
| ### 3. MDC (Mapped Diagnostic Context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Question/Minor)
Can you give an example with current code base?
MatKuhr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for writing this up! left some comments
docs/adrs/006-logging-strategy.md
Outdated
| ```[reqId=e3eaa45c] OpenAI request completed successfully with duration=1628ms, size=1,2KB.``` | ||
|
|
||
| * **Correlate logs.** | ||
| Include a request identifier (e.g., `reqId`) in per-request logs to assist with correlation and debugging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correlating logs (to disambiguate multiple parallel threads) is usually something the application configures centrally, so personally I wouldn't see the need to add this from our side.
However, when AI Core responds with an HTTP error, and the response headers contain a correlation ID, we should include that in our exceptions as well.
| MDC is used to carry contextual information (e.g., `reqId`, `endpoint`, `service`) across execution blocks within the same thread. | ||
|
|
||
| * **Setting and clearing context.** | ||
| Set MDC values deliberately and close to their scope of relevance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MDC uses ThreadLocal variables internally, which will get lost when creating new Threads, either explicitly or implicitly via Cloud SDK resilience.
In addition, please keep in mind that the application developer may also use MDC. So the SDK should not erase anything that is set by app developers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- About using MDC across threads, I have outlined the issue with an example
- Also there is a following section that mentions with example on cautiously clearing the MDC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure about the correctness of the example about incorrect usage of MDC. Just let me know if I missed something.
- adding . at the end - reqId explanation - 'size' renaming in log for clarity - MDc examplee - "object creation" with example instead of vague computaiton - remove mention of "placeholder" warnings
content improvment Co-authored-by: Jonas-Isr <[email protected]>
…into docs/adr-logging
- add formatting rules - caution MDC use on thread switch - logging framework information - clarify what "request" means
rpanackal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, when AI Core responds with an HTTP error, and the response headers contain a correlation ID, we should include that in our exceptions as well.
Afaik we do not infer the correlation id from the error response headers. Preferably, lets do that in a separate item with any other such missing details that needs to be included in the exceptions.
Context
AI/ai-sdk-java-backlog#295
New ADR for logging guidelines
Definition of Done