feat(debug log): enable debug log through environment variable and have HTTP request/responses being logged as airbyte messages #711

maxi297 · 2025-08-18T18:29:19Z

What

We want debug logs in the form of AirbyteMessages to be logged when the flag LOG_LEVEL is debug.

How

Identify if the debug log has been enable by the environment variable (i.e. the platform) or by the arg --debug. If both, we prioritize the platform logs. Debug logs locally looks like this:

{
    "type": "DEBUG",
    "message": "Making outbound API request",
    "data": {
        "request_body": "None",
        "headers": "{'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}",
        "url": "https://pokeapi.co/api/v2/pokemon?startIndex=0"
    }
}
{
    "type": "DEBUG",
    "message": "Receiving response",
    "data": {
        "headers": "{'Date': 'Fri, 15 Aug 2025 18:11:04 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '306', 'Connection': 'keep-alive', 'Access-Control-Allow-Origin': '*', 'Cache-Control': 'public, max-age=86400, s-maxage=86400', 'Content-Encoding': 'gzip', 'Etag': 'W/\"588-/X6dSpO6F3LPFuUhaJHfN5AqUco\"', 'Function-Execution-Id': 'bjn17ug6hu0q', 'Server': 'cloudflare', 'Strict-Transport-Security': 'max-age=31556926', 'X-Cloud-Trace-Context': 'f9dbb9c0a7d4a32e40676e3b047b2dc1', 'X-Country-Code': 'US', 'X-Powered-By': 'Express', 'Accept-Ranges': 'bytes', 'X-Served-By': 'cache-lga21983-LGA', 'X-Cache': 'MISS', 'X-Cache-Hits': '0', 'X-Timer': 'S1755281464.437889,VS0,VE423', 'Vary': 'Accept-Encoding,cookie,need-authorization, x-fh-requested-host, accept-encoding', 'Alt-Svc': 'h3=\":443\"; ma=86400', 'Cf-Cache-Status': 'MISS', 'Nel': '{\"report_to\":\"cf-nel\",\"success_fraction\":0.0,\"max_age\":604800}', 'Report-To': '{\"group\":\"cf-nel\",\"max_age\":604800,\"endpoints\":[{\"url\":\"https://a.nel.cloudflare.com/report/v4?s=lWItFraaMRhmdeAXfB52vPaffGwp%2FIbOjUQBSZmrtdJ0Zhc%2FGJtVVNNmuFaGoh8YEGU9iRNZLOfMb4H4AlI5WHNoqGtFJhTa8LM%3D\"}]}', 'CF-RAY': '96faa7009e4db29e-EWR'}",
        "body": "{\"count\":1302,\"next\":\"https://pokeapi.co/api/v2/pokemon?offset=20&limit=20\",\"previous\":null,\"results\":[{\"name\":\"bulbasaur\",\"url\":\"https://pokeapi.co/api/v2/pokemon/1/\"},{\"name\":\"ivysaur\",\"url\":\"https://pokeapi.co/api/v2/pokemon/2/\"},{\"name\":\"venusaur\",\"url\":\"https://pokeapi.co/api/v2/pokemon/3/\"},{\"name\":\"charmander\",\"url\":\"https://pokeapi.co/api/v2/pokemon/4/\"},{\"name\":\"charmeleon\",\"url\":\"https://pokeapi.co/api/v2/pokemon/5/\"},{\"name\":\"charizard\",\"url\":\"https://pokeapi.co/api/v2/pokemon/6/\"},{\"name\":\"squirtle\",\"url\":\"https://pokeapi.co/api/v2/pokemon/7/\"},{\"name\":\"wartortle\",\"url\":\"https://pokeapi.co/api/v2/pokemon/8/\"},{\"name\":\"blastoise\",\"url\":\"https://pokeapi.co/api/v2/pokemon/9/\"},{\"name\":\"caterpie\",\"url\":\"https://pokeapi.co/api/v2/pokemon/10/\"},{\"name\":\"metapod\",\"url\":\"https://pokeapi.co/api/v2/pokemon/11/\"},{\"name\":\"butterfree\",\"url\":\"https://pokeapi.co/api/v2/pokemon/12/\"},{\"name\":\"weedle\",\"url\":\"https://pokeapi.co/api/v2/pokemon/13/\"},{\"name\":\"kakuna\",\"url\":\"https://pokeapi.co/api/v2/pokemon/14/\"},{\"name\":\"beedrill\",\"url\":\"https://pokeapi.co/api/v2/pokemon/15/\"},{\"name\":\"pidgey\",\"url\":\"https://pokeapi.co/api/v2/pokemon/16/\"},{\"name\":\"pidgeotto\",\"url\":\"https://pokeapi.co/api/v2/pokemon/17/\"},{\"name\":\"pidgeot\",\"url\":\"https://pokeapi.co/api/v2/pokemon/18/\"},{\"name\":\"rattata\",\"url\":\"https://pokeapi.co/api/v2/pokemon/19/\"},{\"name\":\"raticate\",\"url\":\"https://pokeapi.co/api/v2/pokemon/20/\"}]}",
        "status": "200"
    }
}

In the platform, they will look like this:

{
    "type": "LOG",
    "log": {
        "level": "DEBUG",
        "message": "Making outbound API request ///\nExtra logs: {\"request_body\": \"None\", \"url\": \"https://pokeapi.co/api/v2/pokemon?startIndex=0\", \"headers\": \"{'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}\"}"
    }
}
{
    "type": "LOG",
    "log": {
        "level": "DEBUG",
        "message": "Receiving response ///\nExtra logs: {\"body\": \"{\\\"count\\\":1302,\\\"next\\\":\\\"https://pokeapi.co/api/v2/pokemon?offset=20&limit=20\\\",\\\"previous\\\":null,\\\"results\\\":[{\\\"name\\\":\\\"bulbasaur\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/1/\\\"},{\\\"name\\\":\\\"ivysaur\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/2/\\\"},{\\\"name\\\":\\\"venusaur\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/3/\\\"},{\\\"name\\\":\\\"charmander\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/4/\\\"},{\\\"name\\\":\\\"charmeleon\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/5/\\\"},{\\\"name\\\":\\\"charizard\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/6/\\\"},{\\\"name\\\":\\\"squirtle\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/7/\\\"},{\\\"name\\\":\\\"wartortle\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/8/\\\"},{\\\"name\\\":\\\"blastoise\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/9/\\\"},{\\\"name\\\":\\\"caterpie\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/10/\\\"},{\\\"name\\\":\\\"metapod\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/11/\\\"},{\\\"name\\\":\\\"butterfree\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/12/\\\"},{\\\"name\\\":\\\"weedle\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/13/\\\"},{\\\"name\\\":\\\"kakuna\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/14/\\\"},{\\\"name\\\":\\\"beedrill\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/15/\\\"},{\\\"name\\\":\\\"pidgey\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/16/\\\"},{\\\"name\\\":\\\"pidgeotto\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/17/\\\"},{\\\"name\\\":\\\"pidgeot\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/18/\\\"},{\\\"name\\\":\\\"rattata\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/19/\\\"},{\\\"name\\\":\\\"raticate\\\",\\\"url\\\":\\\"https://pokeapi.co/api/v2/pokemon/20/\\\"}]}\", \"status\": \"200\", \"headers\": \"{'Date': 'Mon, 18 Aug 2025 18:13:02 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '306', 'Connection': 'keep-alive', 'Access-Control-Allow-Origin': '*', 'Cache-Control': 'public, max-age=86400, s-maxage=86400', 'Content-Encoding': 'gzip', 'Etag': 'W/\\\"588-/X6dSpO6F3LPFuUhaJHfN5AqUco\\\"', 'Function-Execution-Id': '5xn0fh0ds3bn', 'Server': 'cloudflare', 'Strict-Transport-Security': 'max-age=31556926', 'X-Cloud-Trace-Context': '2b047a717c5c0cd8533f8617856e2aa1', 'X-Country-Code': 'GB', 'X-Powered-By': 'Express', 'Accept-Ranges': 'bytes', 'X-Served-By': 'cache-man4180-MAN', 'X-Cache': 'MISS', 'X-Cache-Hits': '0', 'X-Timer': 'S1755540743.501342,VS0,VE730', 'Vary': 'Accept-Encoding,cookie,need-authorization, x-fh-requested-host, accept-encoding', 'Alt-Svc': 'h3=\\\":443\\\"; ma=86400', 'Age': '39', 'Cf-Cache-Status': 'HIT', 'Nel': '{\\\"report_to\\\":\\\"cf-nel\\\",\\\"success_fraction\\\":0.0,\\\"max_age\\\":604800}', 'Report-To': '{\\\"group\\\":\\\"cf-nel\\\",\\\"max_age\\\":604800,\\\"endpoints\\\":[{\\\"url\\\":\\\"https://a.nel.cloudflare.com/report/v4?s=Ko97QSz%2BvTny%2BDuj6FJSq%2BQEMn3VUMogN88Vl2wmojZTtU02znLgt117XpsdBGkh7AzpTaZZPUeDJQzmBO0bjaC8Hgl%2B1%2FkW\\\"}]}', 'CF-RAY': '9713620498bf8ad8-MAN'}\"}"
    }
}

More context as to why we implemented this solution

We want those logs to be human readable as much as possible. Right now, the logs are printed as this:

{
    "type": "DEBUG",
    "message": "Receiving response",
    "data": {
        "headers": <a dict as a string>,
        "body": "{\"count\":1302, <...>}",
        "status": "200"
    }
}

The flow of the developer is usually to format the DEBUG message, copy/paste the body, decode it and read it. The Airbyte protocol equivalent would be this:

{
    "type": "LOG",
    "log": {
        "level": "DEBUG",
        "message": "Receiving response",
        "extras": {
            "body": "{\"count\":1302, <...>}",
            "status": "200",
            "headers": <a dict as a string>
        }
    }
}

However even though it is allowed, I'm not sure what the platform does with the extras even though additional properties are allowed. Therefore, it'll lead to a solution that the developer will have to find where in the log message the body is and to decode it twice which adds a lot of work to be done on any request that needs to be read. The alternatives I see are:

If it's not already the case, we have the platform consider the additional properties
We remove the structure for the extra part and put everything in a single string (something like Receiving response. Status: %s, headers: %s, body: %s). It'll degrade the DX compared to what we do locally because between formatting the DEBUG message and copy/pasting the body, the dev will have to locate the body within the string. We can add a sequence of character before and after that we expect will not be present in must HTTP bodies to ease that but it is still not ideal.
We print the response information both in extra and in the message which allow to cover both cases but can add a lot in terms of size. This might be dangerous in terms of the JSON limit there was in jackson when the platform deserialize (I don't know if it is still a problem)
We develop tooling for the devs to do that (we have already though about this even before considering this change and have not taken action on this because the amount of work was non negligible but we can probably aggressively scope this problem)
The CDK consider how the debut logs were enabled. If it is through the environment variable, we print it in the single string, if it is through the --debug param, we print this through the extras dictionary

Summary by CodeRabbit

New Features
- Debug logging can now be enabled via a platform-controlled environment override in addition to the existing CLI flag.
- When neither the flag nor the platform override is active, logs default to INFO level.
- All logs are emitted as structured JSON messages for consistent consumption across tools.
- Debug logs may include richer, filtered contextual data when platform debug is enabled.

… have HTTP request/responses being logged as airbyte messages

github-actions · 2025-08-18T18:29:33Z

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/debug-log-project#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/debug-log-project

Helpful Resources

CDK API Reference

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

/autofix - Fixes most formatting and linting issues
/poetry-lock - Updates poetry.lock file
/test - Runs connector tests with the updated CDK
/poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
/poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

coderabbitai · 2025-08-18T18:35:11Z

📝 Walkthrough

Walkthrough

Adds a platform-driven debug flag API (is_platform_debug_log_enabled) and uses it in entrypoint to choose DEBUG vs INFO. Reworks AirbyteLogFormatter to emit structured AirbyteMessage JSON for all levels, with DEBUG formatting differing when platform debug is enabled.

Changes

Cohort / File(s)	Summary
Platform debug control API `airbyte_cdk/logger.py`	Added `is_platform_debug_log_enabled()` reading `LOG_LEVEL` env var (case-insensitive) to detect `"debug"`; added `os` import.
Entrypoint log level selection `airbyte_cdk/entrypoint.py`	`run` now sets logging to DEBUG if `--debug` flag OR `is_platform_debug_log_enabled()` is true; otherwise explicitly sets INFO.
Structured logging output `airbyte_cdk/logger.py`	`AirbyteLogFormatter.format` now always emits serialized `AirbyteMessage` JSON. DEBUG branch: if platform debug enabled, include JSON-serialized, secret-filtered extras; otherwise use prior simpler debug JSON. Non-DEBUG branch produces AirbyteMessage JSON with filtered message.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Entrypoint
  participant LoggerAPI as logger.is_platform_debug_log_enabled
  participant PythonLogging as logging

  User->>Entrypoint: run(..., --debug?)
  Entrypoint->>LoggerAPI: is_platform_debug_log_enabled()
  LoggerAPI-->>Entrypoint: bool
  alt --debug or platform debug
    Entrypoint->>PythonLogging: setLevel(DEBUG)
  else
    Entrypoint->>PythonLogging: setLevel(INFO)
  end

sequenceDiagram
  participant Code as Calling Code
  participant Logger as AirbyteLogFormatter
  participant Env as LOG_LEVEL
  participant Serializer as AirbyteMessageSerializer

  Code->>Logger: format(record)
  alt record.level == DEBUG
    Logger->>Env: is_platform_debug_log_enabled()
    alt platform debug = true
      Logger->>Serializer: build AirbyteMessage(LOG, level=DEBUG, message+extras)
      Serializer-->>Logger: json
    else
      Logger->>Serializer: build AirbyteMessage(LOG, level=DEBUG, simple fields)
      Serializer-->>Logger: json
    end
  else
    Logger->>Serializer: build AirbyteMessage(LOG, level=record.level, filtered message)
    Serializer-->>Logger: json
  end
  Logger-->>Code: structured JSON string

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Would you like me to suggest specific unit tests or examples to validate the new debug-vs-platform behavior, wdyt?

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch maxi297/debug-log-project

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (2)

airbyte_cdk/logger.py (1)
75-79: Default to Level.INFO enum, not string

airbyte_level falls back to the literal "INFO" string which is inconsistent with the Level enum used elsewhere and could cause subtle type issues. Can we default to Level.INFO (and optionally annotate the variable)?

Apply this diff?
-        airbyte_level = self.level_mapping.get(record.levelno, "INFO")
+        airbyte_level: Level = self.level_mapping.get(record.levelno, Level.INFO)
wdyt?
airbyte_cdk/entrypoint.py (1)
161-165: Annotate is_platform_debug_log_enabled with -> bool to satisfy mypy (and maybe capture its result in a local variable for clearer logs? wdyt?)

We’re currently calling an untyped helper, which is tripping mypy. Let’s update its signature and optionally surface the flag in entrypoint.py for a more descriptive debug message.

• In airbyte_cdk/logger.py (around line 43):
--- a/airbyte_cdk/logger.py
+++ b/airbyte_cdk/logger.py
@@ -43,7 +43,7 @@
 
-def is_platform_debug_log_enabled():
+def is_platform_debug_log_enabled() -> bool:
     return os.environ.get("LOG_LEVEL", "info").lower() == "debug"
• In airbyte_cdk/entrypoint.py (lines 161–165), optionally capture the platform flag and update the log text:
--- a/airbyte_cdk/entrypoint.py
+++ b/airbyte_cdk/entrypoint.py
@@ -161,7 +161,11 @@
-        if (hasattr(parsed_args, "debug") and parsed_args.debug) or is_platform_debug_log_enabled():
+        platform_debug = is_platform_debug_log_enabled()
+        if (hasattr(parsed_args, "debug") and parsed_args.debug) or platform_debug:
             self.logger.setLevel(logging.DEBUG)
             logger.setLevel(logging.DEBUG)
-            self.logger.debug("Debug logs enabled")
+            self.logger.debug(
+                "Debug logs enabled%s" % (" (platform override)" if platform_debug else "")
+            )
These changes should clear the mypy error and give a clearer “platform override” indicator when appropriate—wdyt?

🧹 Nitpick comments (3)

airbyte_cdk/logger.py (3)

92-93: Preserve structure of extras (avoid str-casting) and make JSON robust

extract_extra_args_from_record coerces all extras to str, which loses structure for request/response payloads that are dicts/lists. This contradicts the stated goal of having structured data locally. Could we keep native types and only fall back to str during serialization?

Apply these diffs?

-                debug_dict = {"type": "DEBUG", "message": record.getMessage(), "data": extras}
-                return filter_secrets(json.dumps(debug_dict))
+                debug_dict = {"type": "DEBUG", "message": record.getMessage(), "data": extras}
+                # Ensure non-serializable extras don't break logging
+                return filter_secrets(json.dumps(debug_dict, default=str))

-        default_attrs = logging.LogRecord("", 0, "", 0, None, None, None).__dict__.keys()
-        extra_keys = set(record.__dict__.keys()) - default_attrs
-        return {k: str(getattr(record, k)) for k in extra_keys if hasattr(record, k)}
+        default_attrs = logging.LogRecord("", 0, "", 0, None, None, None).__dict__.keys()
+        extra_keys = set(record.__dict__.keys()) - default_attrs
+        # Preserve native types; let JSON serialization handle them (with fallback to str at call site)
+        return {k: getattr(record, k) for k in extra_keys if hasattr(record, k)}

wdyt?

Also applies to: 103-111

80-91: Consistent and safe serialization for platform debug path

Minor: For consistency and performance, would you consider using orjson for the extras JSON in the platform path and adding a default=str fallback? Also, this avoids double filtering passes by filtering once on the final composed string.

Apply this diff?

-                message = f"{filter_secrets(record.getMessage())} ///\nExtra logs: {filter_secrets(json.dumps(extras))}"
+                extras_json = orjson.dumps(extras, default=str).decode()
+                message = filter_secrets(f"{record.getMessage()} ///\nExtra logs: {extras_json}")
                 log_message = AirbyteMessage(type=Type.LOG, log=AirbyteLogMessage(level=airbyte_level, message=message))
                 return orjson.dumps(AirbyteMessageSerializer.dump(log_message)).decode()

wdyt?

80-93: Optional hardening: mask common sensitive HTTP headers in extras

While filter_secrets helps for values sourced from config, responses/headers often contain credentials not present in the config (e.g., Set-Cookie, Authorization). Would you like to mask a small set of common sensitive headers before logging to reduce leakage risk?

As a follow-up, we could add this helper:

SENSITIVE_HEADER_NAMES = {"authorization", "proxy-authorization", "cookie", "set-cookie", "x-api-key", "x-auth-token"}

def _sanitize_http_extras(extras: Mapping[str, Any]) -> Mapping[str, Any]:
    if not isinstance(extras, dict):
        return extras
    sanitized = dict(extras)
    headers = sanitized.get("headers")
    if isinstance(headers, Mapping):
        sanitized_headers = {}
        for k, v in headers.items():
            if isinstance(k, str) and k.lower() in SENSITIVE_HEADER_NAMES:
                sanitized_headers[k] = "****"
            else:
                sanitized_headers[k] = v
        sanitized["headers"] = sanitized_headers
    return sanitized

…then call _sanitize_http_extras(extras) right before serializing in both DEBUG branches. Happy to wire this in if you want, wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1c9049a and ab53ce6.

📒 Files selected for processing (2)

airbyte_cdk/entrypoint.py (2 hunks)
airbyte_cdk/logger.py (3 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (2)

airbyte_cdk/entrypoint.py (4)

airbyte_cdk/sources/declarative/partition_routers/substream_partition_router.py (1)

logger (421-422)

airbyte_cdk/sources/streams/core.py (1)

logger (128-129)

unit_tests/test_logger.py (1)

logger (15-17)

airbyte_cdk/logger.py (2)

init_logger (47-52)

is_platform_debug_log_enabled (43-44)

airbyte_cdk/logger.py (2)

airbyte_cdk/utils/airbyte_secrets_utils.py (1)

filter_secrets (73-80)

airbyte_cdk/models/airbyte_protocol.py (1)

AirbyteMessage (79-88)

🪛 GitHub Actions: Linters

airbyte_cdk/entrypoint.py

[error] 161-161: mypy: Call to untyped function is_platform_debug_log_enabled in typed context [no-untyped-call]. Step: poetry run mypy --config-file mypy.ini airbyte_cdk

airbyte_cdk/logger.py

[error] 43-43: mypy: Function is missing a return type annotation [no-untyped-def]. Step: poetry run mypy --config-file mypy.ini airbyte_cdk

[error] 80-80: mypy: Call to untyped function is_platform_debug_log_enabled in typed context [no-untyped-call]. Step: poetry run mypy --config-file mypy.ini airbyte_cdk

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: Check: source-pokeapi
GitHub Check: Check: source-shopify
GitHub Check: Check: destination-motherduck
GitHub Check: Check: source-intercom
GitHub Check: Check: source-hardcoded-records
GitHub Check: Pytest (Fast)
GitHub Check: SDM Docker Image Build
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Analyze (python)

airbyte_cdk/logger.py

github-actions · 2025-08-18T18:36:48Z

PyTest Results (Fast)

3 696 tests ±0 3 685 ✅ ±0 6m 44s ⏱️ +15s
1 suites ±0 11 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit ec49c93. ± Comparison against base commit 1c9049a.

♻️ This comment has been updated with latest results.

github-actions · 2025-08-18T18:42:08Z

PyTest Results (Full)

3 699 tests ±0 3 688 ✅ ±0 11m 52s ⏱️ +18s
1 suites ±0 11 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit ec49c93. ± Comparison against base commit 1c9049a.

♻️ This comment has been updated with latest results.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

airbyte_cdk/logger.py (4)
43-45: Add a brief docstring to clarify contract and scope?

Since this function is a new public API, could we add a short docstring clarifying that it only checks the platform-controlled LOG_LEVEL env var (case-insensitive), and that CLI vs env precedence is handled by the caller (entrypoint)? It helps future readers avoid assuming it reads CLI flags. wdyt?
 def is_platform_debug_log_enabled() -> bool:
-    return os.environ.get("LOG_LEVEL", "info").lower() == "debug"
+    """True if platform-level debug logging is enabled via LOG_LEVEL=debug (case-insensitive).
+
+    Note: CLI vs env precedence is determined by the caller (e.g., entrypoint), not here.
+    """
+    return os.environ.get("LOG_LEVEL", "info").lower() == "debug"
75-79: Use Level.INFO for the default instead of the string "INFO"

The fallback currently returns a bare string, while the mapping returns Level enum values. For type consistency (and to avoid subtle comparisons), can we default to Level.INFO? wdyt?
-        airbyte_level = self.level_mapping.get(record.levelno, "INFO")
+        airbyte_level = self.level_mapping.get(record.levelno, Level.INFO)
80-96: Consider preserving structured extras and standardizing serialization

Two small improvements here to better align with the PR objective of structured request/response data and to keep serialization consistent:

Today, extract_extra_args_from_record() stringifies extras, so the "data" object under local DEBUG is actually strings (e.g., dicts become single-quoted strings), which defeats structured data. Would you be open to making extras JSON-safe but still structured (fallback to str only when necessary)?

Also, we mix json and orjson; can we standardize on orjson for speed and consistency?

Proposed tweaks in this block (switch to orjson), wdyt?
-                message = f"{filter_secrets(record.getMessage())} ///\nExtra logs: {filter_secrets(json.dumps(extras))}"
+                extras_json = orjson.dumps(extras).decode()
+                message = f"{filter_secrets(record.getMessage())} ///\nExtra logs: {filter_secrets(extras_json)}"
...
-                debug_dict = {"type": "DEBUG", "message": record.getMessage(), "data": extras}
-                return filter_secrets(json.dumps(debug_dict))
+                debug_dict = {"type": "DEBUG", "message": record.getMessage(), "data": extras}
+                return filter_secrets(orjson.dumps(debug_dict).decode())
And to actually keep extras structured, consider updating extract_extra_args_from_record() (outside this hunk) to only stringify values that aren’t JSON-serializable:
@staticmethod
def extract_extra_args_from_record(record: logging.LogRecord) -> Mapping[str, Any]:
    default_attrs = logging.LogRecord("", 0, "", 0, None, None, None).__dict__.keys()
    extra_keys = set(record.__dict__.keys()) - default_attrs

    def _json_safe(v: Any) -> Any:
        try:
            orjson.dumps(v)
            return v
        except TypeError:
            return str(v)

    return {k: _json_safe(getattr(record, k)) for k in extra_keys if hasattr(record, k)}
This should keep request/response bodies and headers as genuine objects in local DEBUG while still being robust. wdyt?

80-96: Guardrails for very large extras payloads?

HTTP bodies/headers can be big. Do you want to consider truncating extras (e.g., cap body length) in platform mode to keep single LOG messages manageable, or at least put a size cap and note truncation in the message? Happy to propose a small helper if helpful. wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ab53ce6 and ec49c93.

📒 Files selected for processing (1)

airbyte_cdk/logger.py (3 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

airbyte_cdk/logger.py (2)

airbyte_cdk/utils/airbyte_secrets_utils.py (1)

filter_secrets (73-80)

airbyte_cdk/models/airbyte_protocol.py (1)

AirbyteMessage (79-88)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: Check: source-hardcoded-records
GitHub Check: Check: source-pokeapi
GitHub Check: Check: source-intercom
GitHub Check: Check: destination-motherduck
GitHub Check: Check: source-shopify
GitHub Check: SDM Docker Image Build
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (Fast)
GitHub Check: Analyze (python)

🔇 Additional comments (3)

airbyte_cdk/logger.py (3)

7-7: Import looks good and scoped

os is used by is_platform_debug_log_enabled; no concerns here.

43-45: Nice: this resolves the mypy error noted earlier

The explicit -> bool return type should satisfy the linter and downstream call sites.

75-103: Verify entrypoint debug vs. LOG_LEVEL precedence

In airbyte_cdk/entrypoint.py (lines 161–164), debug is enabled when
(parsed_args.debug) OR is_platform_debug_log_enabled().

is_platform_debug_log_enabled() returns true only if LOG_LEVEL env var == "debug".

As written, CLI --debug still enables debug even if LOG_LEVEL!="debug", and vice versa.

Could you confirm this OR-based logic matches the contract that LOG_LEVEL should “take priority” when both are present? For example, if the intent is for the env var to override CLI in both directions, we may need to check LOG_LEVEL first and ignore parsed_args.debug when it’s set. wdyt?

debug log projects: enable debug log through environment variable and…

ab53ce6

… have HTTP request/responses being logged as airbyte messages

github-actions bot added the enhancement New feature or request label Aug 18, 2025

maxi297 requested review from brianjlai, frifriSF59 and tolik0 August 18, 2025 18:29

format and mypy

ec49c93

coderabbitai bot requested changes Aug 18, 2025

View reviewed changes

airbyte_cdk/logger.py Outdated Show resolved Hide resolved

coderabbitai bot reviewed Aug 18, 2025

View reviewed changes

coderabbitai bot approved these changes Aug 18, 2025

View reviewed changes

brianjlai approved these changes Aug 18, 2025

View reviewed changes

maxi297 merged commit 3244eec into main Aug 18, 2025
26 checks passed

maxi297 deleted the maxi297/debug-log-project branch August 18, 2025 20:49

coderabbitai bot mentioned this pull request Aug 21, 2025

chore: remove declarative stream #707

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(debug log): enable debug log through environment variable and have HTTP request/responses being logged as airbyte messages #711

feat(debug log): enable debug log through environment variable and have HTTP request/responses being logged as airbyte messages #711

maxi297 commented Aug 18, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

github-actions bot commented Aug 18, 2025

Uh oh!

coderabbitai bot commented Aug 18, 2025 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

github-actions bot commented Aug 18, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Aug 18, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat(debug log): enable debug log through environment variable and have HTTP request/responses being logged as airbyte messages #711

feat(debug log): enable debug log through environment variable and have HTTP request/responses being logged as airbyte messages #711

Conversation

maxi297 commented Aug 18, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

How

More context as to why we implemented this solution

Summary by CodeRabbit

Uh oh!

github-actions bot commented Aug 18, 2025

👋 Greetings, Airbyte Team Member!

Testing This CDK Version

Helpful Resources

PR Slash Commands

Uh oh!

coderabbitai bot commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PyTest Results (Fast)

Uh oh!

github-actions bot commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PyTest Results (Full)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

maxi297 commented Aug 18, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Aug 18, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

github-actions bot commented Aug 18, 2025 •

edited

Loading

github-actions bot commented Aug 18, 2025 •

edited

Loading