Skip to content

Conversation

@maxi297
Copy link
Contributor

@maxi297 maxi297 commented Aug 18, 2025

What

We want debug logs in the form of AirbyteMessages to be logged when the flag LOG_LEVEL is debug.

How

Identify if the debug log has been enable by the environment variable (i.e. the platform) or by the arg --debug. If both, we prioritize the platform logs. Debug logs locally looks like this:

{
    "type": "DEBUG",
    "message": "Making outbound API request",
    "data": {
        "request_body": "None",
        "headers": "{'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}",
        "url": "https://pokeapi.co/api/v2/pokemon?startIndex=0"
    }
}
{
    "type": "DEBUG",
    "message": "Receiving response",
    "data": {
        "headers": "{'Date': 'Fri, 15 Aug 2025 18:11:04 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '306', 'Connection': 'keep-alive', 'Access-Control-Allow-Origin': '*', 'Cache-Control': 'public, max-age=86400, s-maxage=86400', 'Content-Encoding': 'gzip', 'Etag': 'W/\"588-/X6dSpO6F3LPFuUhaJHfN5AqUco\"', 'Function-Execution-Id': 'bjn17ug6hu0q', 'Server': 'cloudflare', 'Strict-Transport-Security': 'max-age=31556926', 'X-Cloud-Trace-Context': 'f9dbb9c0a7d4a32e40676e3b047b2dc1', 'X-Country-Code': 'US', 'X-Powered-By': 'Express', 'Accept-Ranges': 'bytes', 'X-Served-By': 'cache-lga21983-LGA', 'X-Cache': 'MISS', 'X-Cache-Hits': '0', 'X-Timer': 'S1755281464.437889,VS0,VE423', 'Vary': 'Accept-Encoding,cookie,need-authorization, x-fh-requested-host, accept-encoding', 'Alt-Svc': 'h3=\":443\"; ma=86400', 'Cf-Cache-Status': 'MISS', 'Nel': '{\"report_to\":\"cf-nel\",\"success_fraction\":0.0,\"max_age\":604800}', 'Report-To': '{\"group\":\"cf-nel\",\"max_age\":604800,\"endpoints\":[{\"url\":\"https://a.nel.cloudflare.com/report/v4?s=lWItFraaMRhmdeAXfB52vPaffGwp%2FIbOjUQBSZmrtdJ0Zhc%2FGJtVVNNmuFaGoh8YEGU9iRNZLOfMb4H4AlI5WHNoqGtFJhTa8LM%3D\"}]}', 'CF-RAY': '96faa7009e4db29e-EWR'}",
        "body": "{\"count\":1302,\"next\":\"https://pokeapi.co/api/v2/pokemon?offset=20&limit=20\",\"previous\":null,\"results\":[{\"name\":\"bulbasaur\",\"url\":\"https://pokeapi.co/api/v2/pokemon/1/\"},{\"name\":\"ivysaur\",\"url\":\"https://pokeapi.co/api/v2/pokemon/2/\"},{\"name\":\"venusaur\",\"url\":\"https://pokeapi.co/api/v2/pokemon/3/\"},{\"name\":\"charmander\",\"url\":\"https://pokeapi.co/api/v2/pokemon/4/\"},{\"name\":\"charmeleon\",\"url\":\"https://pokeapi.co/api/v2/pokemon/5/\"},{\"name\":\"charizard\",\"url\":\"https://pokeapi.co/api/v2/pokemon/6/\"},{\"name\":\"squirtle\",\"url\":\"https://pokeapi.co/api/v2/pokemon/7/\"},{\"name\":\"wartortle\",\"url\":\"https://pokeapi.co/api/v2/pokemon/8/\"},{\"name\":\"blastoise\",\"url\":\"https://pokeapi.co/api/v2/pokemon/9/\"},{\"name\":\"caterpie\",\"url\":\"https://pokeapi.co/api/v2/pokemon/10/\"},{\"name\":\"metapod\",\"url\":\"https://pokeapi.co/api/v2/pokemon/11/\"},{\"name\":\"butterfree\",\"url\":\"https://pokeapi.co/api/v2/pokemon/12/\"},{\"name\":\"weedle\",\"url\":\"https://pokeapi.co/api/v2/pokemon/13/\"},{\"name\":\"kakuna\",\"url\":\"https://pokeapi.co/api/v2/pokemon/14/\"},{\"name\":\"beedrill\",\"url\":\"https://pokeapi.co/api/v2/pokemon/15/\"},{\"name\":\"pidgey\",\"url\":\"https://pokeapi.co/api/v2/pokemon/16/\"},{\"name\":\"pidgeotto\",\"url\":\"https://pokeapi.co/api/v2/pokemon/17/\"},{\"name\":\"pidgeot\",\"url\":\"https://pokeapi.co/api/v2/pokemon/18/\"},{\"name\":\"rattata\",\"url\":\"https://pokeapi.co/api/v2/pokemon/19/\"},{\"name\":\"raticate\",\"url\":\"https://pokeapi.co/api/v2/pokemon/20/\"}]}",
        "status": "200"
    }
}

In the platform, they will look like this:

{
    "type": "LOG",
    "log": {
        "level": "DEBUG",
        "message": "Making outbound API request ///\nExtra logs: {\"request_body\": \"None\", \"url\": \"https://pokeapi.co/api/v2/pokemon?startIndex=0\", \"headers\": \"{'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}\"}"
    }
}
{
    "type": "LOG",
    "log": {
        "level": "DEBUG",
        "message": "Receiving response ///\nExtra logs: {\"body\": \"{\\\"count\\\":1302,\\\"next\\\":\\\"https://pokeapi.co/api/v2/pokemon?offset=20&limit=20\\\",\\\"previous\\\":null,\\\"results\\\":[{\\\"name\\\":\\\"bulbasaur\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/1/\\\"},{\\\"name\\\":\\\"ivysaur\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/2/\\\"},{\\\"name\\\":\\\"venusaur\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/3/\\\"},{\\\"name\\\":\\\"charmander\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/4/\\\"},{\\\"name\\\":\\\"charmeleon\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/5/\\\"},{\\\"name\\\":\\\"charizard\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/6/\\\"},{\\\"name\\\":\\\"squirtle\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/7/\\\"},{\\\"name\\\":\\\"wartortle\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/8/\\\"},{\\\"name\\\":\\\"blastoise\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/9/\\\"},{\\\"name\\\":\\\"caterpie\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/10/\\\"},{\\\"name\\\":\\\"metapod\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/11/\\\"},{\\\"name\\\":\\\"butterfree\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/12/\\\"},{\\\"name\\\":\\\"weedle\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/13/\\\"},{\\\"name\\\":\\\"kakuna\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/14/\\\"},{\\\"name\\\":\\\"beedrill\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/15/\\\"},{\\\"name\\\":\\\"pidgey\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/16/\\\"},{\\\"name\\\":\\\"pidgeotto\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/17/\\\"},{\\\"name\\\":\\\"pidgeot\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/18/\\\"},{\\\"name\\\":\\\"rattata\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/19/\\\"},{\\\"name\\\":\\\"raticate\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/20/\\\"}]}\", \"status\": \"200\", \"headers\": \"{'Date': 'Mon, 18 Aug 2025 18:13:02 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '306', 'Connection': 'keep-alive', 'Access-Control-Allow-Origin': '*', 'Cache-Control': 'public, max-age=86400, s-maxage=86400', 'Content-Encoding': 'gzip', 'Etag': 'W/\\\"588-/X6dSpO6F3LPFuUhaJHfN5AqUco\\\"', 'Function-Execution-Id': '5xn0fh0ds3bn', 'Server': 'cloudflare', 'Strict-Transport-Security': 'max-age=31556926', 'X-Cloud-Trace-Context': '2b047a717c5c0cd8533f8617856e2aa1', 'X-Country-Code': 'GB', 'X-Powered-By': 'Express', 'Accept-Ranges': 'bytes', 'X-Served-By': 'cache-man4180-MAN', 'X-Cache': 'MISS', 'X-Cache-Hits': '0', 'X-Timer': 'S1755540743.501342,VS0,VE730', 'Vary': 'Accept-Encoding,cookie,need-authorization, x-fh-requested-host, accept-encoding', 'Alt-Svc': 'h3=\\\":443\\\"; ma=86400', 'Age': '39', 'Cf-Cache-Status': 'HIT', 'Nel': '{\\\"report_to\\\":\\\"cf-nel\\\",\\\"success_fraction\\\":0.0,\\\"max_age\\\":604800}', 'Report-To': '{\\\"group\\\":\\\"cf-nel\\\",\\\"max_age\\\":604800,\\\"endpoints\\\":[{\\\"url\\\":\\\"https://a.nel.cloudflare.com/report/v4?s=Ko97QSz%2BvTny%2BDuj6FJSq%2BQEMn3VUMogN88Vl2wmojZTtU02znLgt117XpsdBGkh7AzpTaZZPUeDJQzmBO0bjaC8Hgl%2B1%2FkW\\\"}]}', 'CF-RAY': '9713620498bf8ad8-MAN'}\"}"
    }
}

More context as to why we implemented this solution

We want those logs to be human readable as much as possible. Right now, the logs are printed as this:

{
    "type": "DEBUG",
    "message": "Receiving response",
    "data": {
        "headers": <a dict as a string>,
        "body": "{\"count\":1302, <...>}",
        "status": "200"
    }
}

The flow of the developer is usually to format the DEBUG message, copy/paste the body, decode it and read it. The Airbyte protocol equivalent would be this:

{
    "type": "LOG",
    "log": {
        "level": "DEBUG",
        "message": "Receiving response",
        "extras": {
            "body": "{\"count\":1302, <...>}",
            "status": "200",
            "headers": <a dict as a string>
        }
    }
}

However even though it is allowed, I'm not sure what the platform does with the extras even though additional properties are allowed. Therefore, it'll lead to a solution that the developer will have to find where in the log message the body is and to decode it twice which adds a lot of work to be done on any request that needs to be read. The alternatives I see are:

  • If it's not already the case, we have the platform consider the additional properties
  • We remove the structure for the extra part and put everything in a single string (something like Receiving response. Status: %s, headers: %s, body: %s). It'll degrade the DX compared to what we do locally because between formatting the DEBUG message and copy/pasting the body, the dev will have to locate the body within the string. We can add a sequence of character before and after that we expect will not be present in must HTTP bodies to ease that but it is still not ideal.
  • We print the response information both in extra and in the message which allow to cover both cases but can add a lot in terms of size. This might be dangerous in terms of the JSON limit there was in jackson when the platform deserialize (I don't know if it is still a problem)
  • We develop tooling for the devs to do that (we have already though about this even before considering this change and have not taken action on this because the amount of work was non negligible but we can probably aggressively scope this problem)
  • The CDK consider how the debut logs were enabled. If it is through the environment variable, we print it in the single string, if it is through the --debug param, we print this through the extras dictionary

Summary by CodeRabbit

  • New Features
    • Debug logging can now be enabled via a platform-controlled environment override in addition to the existing CLI flag.
    • When neither the flag nor the platform override is active, logs default to INFO level.
    • All logs are emitted as structured JSON messages for consistent consumption across tools.
    • Debug logs may include richer, filtered contextual data when platform debug is enabled.

… have HTTP request/responses being logged as airbyte messages
@github-actions github-actions bot added the enhancement New feature or request label Aug 18, 2025
@github-actions
Copy link

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/debug-log-project#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/debug-log-project

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 18, 2025

📝 Walkthrough

Walkthrough

Adds a platform-driven debug flag API (is_platform_debug_log_enabled) and uses it in entrypoint to choose DEBUG vs INFO. Reworks AirbyteLogFormatter to emit structured AirbyteMessage JSON for all levels, with DEBUG formatting differing when platform debug is enabled.

Changes

Cohort / File(s) Summary
Platform debug control API
airbyte_cdk/logger.py
Added is_platform_debug_log_enabled() reading LOG_LEVEL env var (case-insensitive) to detect "debug"; added os import.
Entrypoint log level selection
airbyte_cdk/entrypoint.py
run now sets logging to DEBUG if --debug flag OR is_platform_debug_log_enabled() is true; otherwise explicitly sets INFO.
Structured logging output
airbyte_cdk/logger.py
AirbyteLogFormatter.format now always emits serialized AirbyteMessage JSON. DEBUG branch: if platform debug enabled, include JSON-serialized, secret-filtered extras; otherwise use prior simpler debug JSON. Non-DEBUG branch produces AirbyteMessage JSON with filtered message.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Entrypoint
  participant LoggerAPI as logger.is_platform_debug_log_enabled
  participant PythonLogging as logging

  User->>Entrypoint: run(..., --debug?)
  Entrypoint->>LoggerAPI: is_platform_debug_log_enabled()
  LoggerAPI-->>Entrypoint: bool
  alt --debug or platform debug
    Entrypoint->>PythonLogging: setLevel(DEBUG)
  else
    Entrypoint->>PythonLogging: setLevel(INFO)
  end
Loading
sequenceDiagram
  participant Code as Calling Code
  participant Logger as AirbyteLogFormatter
  participant Env as LOG_LEVEL
  participant Serializer as AirbyteMessageSerializer

  Code->>Logger: format(record)
  alt record.level == DEBUG
    Logger->>Env: is_platform_debug_log_enabled()
    alt platform debug = true
      Logger->>Serializer: build AirbyteMessage(LOG, level=DEBUG, message+extras)
      Serializer-->>Logger: json
    else
      Logger->>Serializer: build AirbyteMessage(LOG, level=DEBUG, simple fields)
      Serializer-->>Logger: json
    end
  else
    Logger->>Serializer: build AirbyteMessage(LOG, level=record.level, filtered message)
    Serializer-->>Logger: json
  end
  Logger-->>Code: structured JSON string
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Would you like me to suggest specific unit tests or examples to validate the new debug-vs-platform behavior, wdyt?

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch maxi297/debug-log-project

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (2)
airbyte_cdk/logger.py (1)

75-79: Default to Level.INFO enum, not string

airbyte_level falls back to the literal "INFO" string which is inconsistent with the Level enum used elsewhere and could cause subtle type issues. Can we default to Level.INFO (and optionally annotate the variable)?

Apply this diff?

-        airbyte_level = self.level_mapping.get(record.levelno, "INFO")
+        airbyte_level: Level = self.level_mapping.get(record.levelno, Level.INFO)

wdyt?

airbyte_cdk/entrypoint.py (1)

161-165: Annotate is_platform_debug_log_enabled with -> bool to satisfy mypy (and maybe capture its result in a local variable for clearer logs? wdyt?)

We’re currently calling an untyped helper, which is tripping mypy. Let’s update its signature and optionally surface the flag in entrypoint.py for a more descriptive debug message.

• In airbyte_cdk/logger.py (around line 43):

--- a/airbyte_cdk/logger.py
+++ b/airbyte_cdk/logger.py
@@ -43,7 +43,7 @@
 
-def is_platform_debug_log_enabled():
+def is_platform_debug_log_enabled() -> bool:
     return os.environ.get("LOG_LEVEL", "info").lower() == "debug"

• In airbyte_cdk/entrypoint.py (lines 161–165), optionally capture the platform flag and update the log text:

--- a/airbyte_cdk/entrypoint.py
+++ b/airbyte_cdk/entrypoint.py
@@ -161,7 +161,11 @@
-        if (hasattr(parsed_args, "debug") and parsed_args.debug) or is_platform_debug_log_enabled():
+        platform_debug = is_platform_debug_log_enabled()
+        if (hasattr(parsed_args, "debug") and parsed_args.debug) or platform_debug:
             self.logger.setLevel(logging.DEBUG)
             logger.setLevel(logging.DEBUG)
-            self.logger.debug("Debug logs enabled")
+            self.logger.debug(
+                "Debug logs enabled%s" % (" (platform override)" if platform_debug else "")
+            )

These changes should clear the mypy error and give a clearer “platform override” indicator when appropriate—wdyt?

🧹 Nitpick comments (3)
airbyte_cdk/logger.py (3)

92-93: Preserve structure of extras (avoid str-casting) and make JSON robust

extract_extra_args_from_record coerces all extras to str, which loses structure for request/response payloads that are dicts/lists. This contradicts the stated goal of having structured data locally. Could we keep native types and only fall back to str during serialization?

Apply these diffs?

-                debug_dict = {"type": "DEBUG", "message": record.getMessage(), "data": extras}
-                return filter_secrets(json.dumps(debug_dict))
+                debug_dict = {"type": "DEBUG", "message": record.getMessage(), "data": extras}
+                # Ensure non-serializable extras don't break logging
+                return filter_secrets(json.dumps(debug_dict, default=str))
-        default_attrs = logging.LogRecord("", 0, "", 0, None, None, None).__dict__.keys()
-        extra_keys = set(record.__dict__.keys()) - default_attrs
-        return {k: str(getattr(record, k)) for k in extra_keys if hasattr(record, k)}
+        default_attrs = logging.LogRecord("", 0, "", 0, None, None, None).__dict__.keys()
+        extra_keys = set(record.__dict__.keys()) - default_attrs
+        # Preserve native types; let JSON serialization handle them (with fallback to str at call site)
+        return {k: getattr(record, k) for k in extra_keys if hasattr(record, k)}

wdyt?

Also applies to: 103-111


80-91: Consistent and safe serialization for platform debug path

Minor: For consistency and performance, would you consider using orjson for the extras JSON in the platform path and adding a default=str fallback? Also, this avoids double filtering passes by filtering once on the final composed string.

Apply this diff?

-                message = f"{filter_secrets(record.getMessage())} ///\nExtra logs: {filter_secrets(json.dumps(extras))}"
+                extras_json = orjson.dumps(extras, default=str).decode()
+                message = filter_secrets(f"{record.getMessage()} ///\nExtra logs: {extras_json}")
                 log_message = AirbyteMessage(type=Type.LOG, log=AirbyteLogMessage(level=airbyte_level, message=message))
                 return orjson.dumps(AirbyteMessageSerializer.dump(log_message)).decode()

wdyt?


80-93: Optional hardening: mask common sensitive HTTP headers in extras

While filter_secrets helps for values sourced from config, responses/headers often contain credentials not present in the config (e.g., Set-Cookie, Authorization). Would you like to mask a small set of common sensitive headers before logging to reduce leakage risk?

As a follow-up, we could add this helper:

SENSITIVE_HEADER_NAMES = {"authorization", "proxy-authorization", "cookie", "set-cookie", "x-api-key", "x-auth-token"}

def _sanitize_http_extras(extras: Mapping[str, Any]) -> Mapping[str, Any]:
    if not isinstance(extras, dict):
        return extras
    sanitized = dict(extras)
    headers = sanitized.get("headers")
    if isinstance(headers, Mapping):
        sanitized_headers = {}
        for k, v in headers.items():
            if isinstance(k, str) and k.lower() in SENSITIVE_HEADER_NAMES:
                sanitized_headers[k] = "****"
            else:
                sanitized_headers[k] = v
        sanitized["headers"] = sanitized_headers
    return sanitized

…then call _sanitize_http_extras(extras) right before serializing in both DEBUG branches. Happy to wire this in if you want, wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1c9049a and ab53ce6.

📒 Files selected for processing (2)
  • airbyte_cdk/entrypoint.py (2 hunks)
  • airbyte_cdk/logger.py (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
airbyte_cdk/entrypoint.py (4)
airbyte_cdk/sources/declarative/partition_routers/substream_partition_router.py (1)
  • logger (421-422)
airbyte_cdk/sources/streams/core.py (1)
  • logger (128-129)
unit_tests/test_logger.py (1)
  • logger (15-17)
airbyte_cdk/logger.py (2)
  • init_logger (47-52)
  • is_platform_debug_log_enabled (43-44)
airbyte_cdk/logger.py (2)
airbyte_cdk/utils/airbyte_secrets_utils.py (1)
  • filter_secrets (73-80)
airbyte_cdk/models/airbyte_protocol.py (1)
  • AirbyteMessage (79-88)
🪛 GitHub Actions: Linters
airbyte_cdk/entrypoint.py

[error] 161-161: mypy: Call to untyped function is_platform_debug_log_enabled in typed context [no-untyped-call]. Step: poetry run mypy --config-file mypy.ini airbyte_cdk

airbyte_cdk/logger.py

[error] 43-43: mypy: Function is missing a return type annotation [no-untyped-def]. Step: poetry run mypy --config-file mypy.ini airbyte_cdk


[error] 80-80: mypy: Call to untyped function is_platform_debug_log_enabled in typed context [no-untyped-call]. Step: poetry run mypy --config-file mypy.ini airbyte_cdk

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)

@github-actions
Copy link

github-actions bot commented Aug 18, 2025

PyTest Results (Fast)

3 696 tests  ±0   3 685 ✅ ±0   6m 44s ⏱️ +15s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit ec49c93. ± Comparison against base commit 1c9049a.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Aug 18, 2025

PyTest Results (Full)

3 699 tests  ±0   3 688 ✅ ±0   11m 52s ⏱️ +18s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit ec49c93. ± Comparison against base commit 1c9049a.

♻️ This comment has been updated with latest results.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
airbyte_cdk/logger.py (4)

43-45: Add a brief docstring to clarify contract and scope?

Since this function is a new public API, could we add a short docstring clarifying that it only checks the platform-controlled LOG_LEVEL env var (case-insensitive), and that CLI vs env precedence is handled by the caller (entrypoint)? It helps future readers avoid assuming it reads CLI flags. wdyt?

 def is_platform_debug_log_enabled() -> bool:
-    return os.environ.get("LOG_LEVEL", "info").lower() == "debug"
+    """True if platform-level debug logging is enabled via LOG_LEVEL=debug (case-insensitive).
+
+    Note: CLI vs env precedence is determined by the caller (e.g., entrypoint), not here.
+    """
+    return os.environ.get("LOG_LEVEL", "info").lower() == "debug"

75-79: Use Level.INFO for the default instead of the string "INFO"

The fallback currently returns a bare string, while the mapping returns Level enum values. For type consistency (and to avoid subtle comparisons), can we default to Level.INFO? wdyt?

-        airbyte_level = self.level_mapping.get(record.levelno, "INFO")
+        airbyte_level = self.level_mapping.get(record.levelno, Level.INFO)

80-96: Consider preserving structured extras and standardizing serialization

Two small improvements here to better align with the PR objective of structured request/response data and to keep serialization consistent:

  • Today, extract_extra_args_from_record() stringifies extras, so the "data" object under local DEBUG is actually strings (e.g., dicts become single-quoted strings), which defeats structured data. Would you be open to making extras JSON-safe but still structured (fallback to str only when necessary)?
  • Also, we mix json and orjson; can we standardize on orjson for speed and consistency?

Proposed tweaks in this block (switch to orjson), wdyt?

-                message = f"{filter_secrets(record.getMessage())} ///\nExtra logs: {filter_secrets(json.dumps(extras))}"
+                extras_json = orjson.dumps(extras).decode()
+                message = f"{filter_secrets(record.getMessage())} ///\nExtra logs: {filter_secrets(extras_json)}"
...
-                debug_dict = {"type": "DEBUG", "message": record.getMessage(), "data": extras}
-                return filter_secrets(json.dumps(debug_dict))
+                debug_dict = {"type": "DEBUG", "message": record.getMessage(), "data": extras}
+                return filter_secrets(orjson.dumps(debug_dict).decode())

And to actually keep extras structured, consider updating extract_extra_args_from_record() (outside this hunk) to only stringify values that aren’t JSON-serializable:

@staticmethod
def extract_extra_args_from_record(record: logging.LogRecord) -> Mapping[str, Any]:
    default_attrs = logging.LogRecord("", 0, "", 0, None, None, None).__dict__.keys()
    extra_keys = set(record.__dict__.keys()) - default_attrs

    def _json_safe(v: Any) -> Any:
        try:
            orjson.dumps(v)
            return v
        except TypeError:
            return str(v)

    return {k: _json_safe(getattr(record, k)) for k in extra_keys if hasattr(record, k)}

This should keep request/response bodies and headers as genuine objects in local DEBUG while still being robust. wdyt?


80-96: Guardrails for very large extras payloads?

HTTP bodies/headers can be big. Do you want to consider truncating extras (e.g., cap body length) in platform mode to keep single LOG messages manageable, or at least put a size cap and note truncation in the message? Happy to propose a small helper if helpful. wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ab53ce6 and ec49c93.

📒 Files selected for processing (1)
  • airbyte_cdk/logger.py (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/logger.py (2)
airbyte_cdk/utils/airbyte_secrets_utils.py (1)
  • filter_secrets (73-80)
airbyte_cdk/models/airbyte_protocol.py (1)
  • AirbyteMessage (79-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-shopify
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
🔇 Additional comments (3)
airbyte_cdk/logger.py (3)

7-7: Import looks good and scoped

os is used by is_platform_debug_log_enabled; no concerns here.


43-45: Nice: this resolves the mypy error noted earlier

The explicit -> bool return type should satisfy the linter and downstream call sites.


75-103: Verify entrypoint debug vs. LOG_LEVEL precedence

  • In airbyte_cdk/entrypoint.py (lines 161–164), debug is enabled when
    (parsed_args.debug) OR is_platform_debug_log_enabled().
  • is_platform_debug_log_enabled() returns true only if LOG_LEVEL env var == "debug".
  • As written, CLI --debug still enables debug even if LOG_LEVEL!="debug", and vice versa.

Could you confirm this OR-based logic matches the contract that LOG_LEVEL should “take priority” when both are present? For example, if the intent is for the env var to override CLI in both directions, we may need to check LOG_LEVEL first and ignore parsed_args.debug when it’s set. wdyt?

@maxi297 maxi297 merged commit 3244eec into main Aug 18, 2025
26 checks passed
@maxi297 maxi297 deleted the maxi297/debug-log-project branch August 18, 2025 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants