fix: properly handle multi-line secrets in secret masking #774

aaronsteers · 2025-09-26T04:54:33Z

Summary by CodeRabbit

Bug Fixes
- Improved CI secret masking to recursively find secrets in nested configurations (dicts/lists) and mask multi-line values line-by-line, preventing accidental exposure while preserving non-secret output and CI gating behavior.
Tests
- Added comprehensive unit tests validating masking across flat, nested, list-based, multi-line and edge-case configurations.

github-actions · 2025-09-26T04:54:46Z

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@aj/ci/fix/handle-multi-line-secrets-in-secrets-masking#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch aj/ci/fix/handle-multi-line-secrets-in-secrets-masking

Helpful Resources

CDK API Reference

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

/autofix - Fixes most formatting and linting issues
/poetry-lock - Updates poetry.lock file
/test - Runs connector tests with the updated CDK
/poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
/poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

Copilot

Pull Request Overview

This PR fixes secret masking functionality to properly handle multi-line secrets like private keys in GitHub CI environments. The change refactors the masking logic to process secrets line-by-line instead of as single strings.

Extracted secret value masking logic into a dedicated helper function
Added line-by-line processing for multi-line secrets using splitlines()
Fixed control flow logic by changing nested if to elif for better structure

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

airbyte_cdk/cli/airbyte_cdk/_secrets.py

coderabbitai · 2025-09-26T05:00:02Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds two helpers to recursively locate and mask secret values in configs: _print_ci_secret_mask_for_string(secret: str) masks multi-line strings line-by-line; _print_ci_secret_mask_for_value(value: Any) traverses dicts/lists and delegates to the string masker. _print_ci_secrets_masks_for_config now uses the recursive helper for secrets.

Changes

Cohort / File(s)	Summary
Secret masking helpers & config masking update `airbyte_cdk/cli/airbyte_cdk/_secrets.py`	Added `_print_ci_secret_mask_for_string(secret: str)` and `_print_ci_secret_mask_for_value(value: Any)`; updated `_print_ci_secrets_masks_for_config` to recurse into lists/dicts and delegate masking to the new helpers, emitting `::add-mask::` per line for scalar strings and preserving existing CI gating and error handling.
Unit tests for secret masking `unit_tests/cli/airbyte_cdk/test_secret_masks.py`	Added parameterized tests covering flat/nested dicts, lists, multi-line secrets, deep nesting, and edge cases; tests mock the internal mask-printing function and assert expected masked-string invocations (order-insensitive).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor CI as CI Runner
  participant CFG as _print_ci_secrets_masks_for_config
  participant VAL as _print_ci_secret_mask_for_value
  participant STR as _print_ci_secret_mask_for_string
  participant GHA as GitHub Actions Mask

  CI->>CFG: provide config, spec, connector_root
  CFG->>CFG: iterate properties/keys
  alt property marked as secret
    CFG->>VAL: mask(value)
    alt value is dict
      VAL->>VAL: recurse over key/value pairs
    else value is list
      VAL->>VAL: recurse over items
    else scalar/string
      VAL->>STR: mask(string)
      STR->>STR: split into lines
      loop per line
        STR->>GHA: ::add-mask::<line>
      end
    end
  else non-secret or no value
    CFG->>CFG: skip masking
  end
  CFG-->>CI: done

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Would you like a short note in docs or tests highlighting the multi-line masking behavior, wdyt?

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title clearly and concisely describes the core change of this pull request, which is fixing the handling of multi-line secrets in the secret masking logic, matching the implemented code updates and new helper functions.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch aj/ci/fix/handle-multi-line-secrets-in-secrets-masking

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
467-479: Mask whole serialized containers too?

I noticed we only traverse nested members and emit masks for the leaf values, so composite secrets (dict/list) never have their serialized form masked even though the docstring says we handle the JSON stringified version. If an integrator logs the whole container in one go (e.g. json.dumps(secret_value)), punctuation and even delimiter-only lines would remain unmasked unless every leaf value happens to be split exactly the same way. Mind adding an extra mask pass for the serialized container (after the child recursion) so we cover that path as well, wdyt?
     if isinstance(value, dict):
         # For nested dicts, we also need to mask the json-stringified version
         for v in value.values():
             _print_ci_secret_mask_for_value(v)
 
+        json_mask = json.dumps(value, separators=(",", ":"))
+        for line in json_mask.splitlines() or [json_mask]:
+            print(f"::add-mask::{line}")
         return
 
     if isinstance(value, list):
         # For lists, we also need to mask the json-stringified version
         for list_item in value:
             _print_ci_secret_mask_for_value(list_item)
 
+        json_mask = json.dumps(value, separators=(",", ":"))
+        for line in json_mask.splitlines() or [json_mask]:
+            print(f"::add-mask::{line}")
         return

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b10ee9 and 7e93206.

📒 Files selected for processing (1)

airbyte_cdk/cli/airbyte_cdk/_secrets.py (1 hunks)

🧰 Additional context used

🪛 GitHub Check: CodeQL

airbyte_cdk/cli/airbyte_cdk/_secrets.py

[failure] 486-486: Clear-text logging of sensitive information
This expression logs sensitive data (secret) as clear text.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: Check: source-intercom
GitHub Check: Check: source-pokeapi
GitHub Check: Check: destination-motherduck
GitHub Check: Check: source-shopify
GitHub Check: Check: source-hardcoded-records
GitHub Check: SDM Docker Image Build
GitHub Check: Manifest Server Docker Image Build
GitHub Check: Pytest (Fast)
GitHub Check: Pytest (All, Python 3.12, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (All, Python 3.13, Ubuntu)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)

github-actions · 2025-09-26T05:01:44Z

PyTest Results (Fast)

3 771 tests +9 3 759 ✅ +9 6m 21s ⏱️ +5s
1 suites ±0 12 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit 5a7ce56. ± Comparison against base commit 5b10ee9.

♻️ This comment has been updated with latest results.

github-actions · 2025-09-26T05:06:39Z

PyTest Results (Full)

3 774 tests +9 3 762 ✅ +9 10m 57s ⏱️ -3s
1 suites ±0 12 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit 5a7ce56. ± Comparison against base commit 5b10ee9.

♻️ This comment has been updated with latest results.

airbyte_cdk/cli/airbyte_cdk/_secrets.py

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e93206 and f36b41a.

📒 Files selected for processing (2)

airbyte_cdk/cli/airbyte_cdk/_secrets.py (1 hunks)
unit_tests/cli/airbyte_cdk/test_secret_masks.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

unit_tests/cli/airbyte_cdk/test_secret_masks.py (1)

airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)

_print_ci_secrets_masks_for_config (497-513)

🪛 GitHub Actions: Linters

unit_tests/cli/airbyte_cdk/test_secret_masks.py

[error] 1-6: Ruff format failed due to formatting changes. 1 file would be reformatted; an extra blank line was added after the module docstring. Run 'ruff format' to fix the style issues.

🪛 GitHub Check: CodeQL

airbyte_cdk/cli/airbyte_cdk/_secrets.py

[failure] 469-469: Clear-text logging of sensitive information
This expression logs sensitive data (secret) as clear text.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: Check: source-intercom
GitHub Check: Check: source-pokeapi
GitHub Check: Check: source-hardcoded-records
GitHub Check: Check: destination-motherduck
GitHub Check: Check: source-shopify
GitHub Check: Pytest (Fast)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (All, Python 3.13, Ubuntu)
GitHub Check: Pytest (All, Python 3.12, Ubuntu)
GitHub Check: Manifest Server Docker Image Build
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: SDM Docker Image Build

🔇 Additional comments (1)

airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
477-488: Restore JSON-level masking on containers

By skipping the json.dumps branch we no longer register a mask for the serialized dict/list itself. That regresses prior behavior and leaks booleans or nulls when the config is emitted via json.dumps because we now only register "True"/"False" while the log prints true/false. Could we restore the container-level mask before recursing to keep the old coverage, wdyt?

Apply this diff to fix the regression:
     if isinstance(value, dict):
-        # For nested dicts, we call recursively on each value
-        for v in value.values():
+        _print_ci_secret_mask_for_string(json.dumps(value))
+        for v in value.values():
             _print_ci_secret_mask_for_value(v)
 
         return
 
     if isinstance(value, list):
-        # For lists, we call recursively on each list item
-        for list_item in value:
+        _print_ci_secret_mask_for_string(json.dumps(value))
+        for list_item in value:
             _print_ci_secret_mask_for_value(list_item)
 
         return

unit_tests/cli/airbyte_cdk/test_secret_masks.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
462-471: Multi-line masking is correct; add a defensive CI guard to avoid accidental leaks and quiet CodeQL?

This helper works as intended. Given the CodeQL “clear-text logging” alert on Line 470 and that this function could be called out of band in the future, would you add a CI check here as defense-in-depth (we already gate upstream) and to document intent, wdyt?
 def _print_ci_secret_mask_for_string(secret: str) -> None:
     """Print GitHub CI mask for a single secret string.

     We expect single-line secrets, but we also handle the case where the secret contains newlines.
     For multi-line secrets, we must print a secret mask for each line separately.
     """
+    # Defense-in-depth: only emit mask commands in CI environments
+    if not os.environ.get("CI"):
+        logger.debug("Skipping ::add-mask:: outside CI context")
+        return
     for line in secret.splitlines():
         if line.strip():  # Skip empty lines
             print(f"::add-mask::{line!s}")

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f36b41a and acee4a8.

📒 Files selected for processing (1)

airbyte_cdk/cli/airbyte_cdk/_secrets.py (1 hunks)

🧰 Additional context used

🪛 GitHub Check: CodeQL

airbyte_cdk/cli/airbyte_cdk/_secrets.py

[failure] 470-470: Clear-text logging of sensitive information
This expression logs sensitive data (secret) as clear text.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Check: source-shopify
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (All, Python 3.12, Ubuntu)
GitHub Check: Pytest (All, Python 3.13, Ubuntu)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)

🔇 Additional comments (2)

airbyte_cdk/cli/airbyte_cdk/_secrets.py (2)
473-497: Remove double splitting and (optionally) also mask container JSON for dict/list values?

Minor: You’re splitting lines here and again inside _print_ci_secret_mask_for_string. You could delegate once to reduce duplication.

Optional/compat: Previous behavior masked json.dumps(value) for dicts/lists. If logs might include the entire container serialized, adding that back preserves coverage. It won’t hurt and is cheap. Wdyt?
 def _print_ci_secret_mask_for_value(value: Any) -> None:
     """Print GitHub CI mask for a single secret value.

     Call this function for any values identified as secrets, regardless of type.
     """
     if isinstance(value, dict):
-        # For nested dicts, we call recursively on each value
+        # Also mask the JSON-stringified container in case logs print the whole object
+        print(f"::add-mask::{json.dumps(value)}")
+        # Then recurse into values
         for v in value.values():
             _print_ci_secret_mask_for_value(v)

         return

     if isinstance(value, list):
-        # For lists, we call recursively on each list item
+        # Also mask the JSON-stringified container
+        print(f"::add-mask::{json.dumps(value)}")
+        # Then recurse into items
         for list_item in value:
             _print_ci_secret_mask_for_value(list_item)

         return

-    # For any other types besides dict and list, we convert to string and mask each line
-    # separately to handle multi-line secrets (e.g. private keys).
-    for line in str(value).splitlines():
-        if line.strip():  # Skip empty lines
-            _print_ci_secret_mask_for_string(line)
+    # For scalars, delegate to the string masker (handles multi-line)
+    _print_ci_secret_mask_for_string(str(value))
508-516: LGTM on recursive traversal and delegation.

The switch to delegating to _print_ci_secret_mask_for_value for matched keys and recursing into nested structures looks correct and aligns with the PR goal to cover multi-line and nested cases.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

unit_tests/cli/airbyte_cdk/test_secret_masks.py (1)

34-35: Broaden the config type hint?

Since a few parametrized cases feed lists into config, the current dict annotation is lying to type checkers. Could we widen it to something like Mapping[str, Any] | Sequence[Any] (or even Any) so the signature matches the data we pass, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between acee4a8 and 5a7ce56.

📒 Files selected for processing (1)

unit_tests/cli/airbyte_cdk/test_secret_masks.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

unit_tests/cli/airbyte_cdk/test_secret_masks.py (1)

airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)

_print_ci_secrets_masks_for_config (499-515)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Check: source-shopify

🔇 Additional comments (1)

unit_tests/cli/airbyte_cdk/test_secret_masks.py (1)

2-6: Drop the blank line after the module docstring

Ruff is still failing because it wants the first import immediately after the docstring (same thing the bot flagged earlier). Could we pull the import up and rerun ruff format so CI goes green, wdyt?

airbyte_cdk/cli/airbyte_cdk/_secrets.py

dbgold17

discussed online, approving!

fix: properly handle multi-line secrets in secret masking

7e93206

Copilot AI review requested due to automatic review settings September 26, 2025 04:54

github-actions bot added bug Something isn't working security labels Sep 26, 2025

Copilot AI reviewed Sep 26, 2025

View reviewed changes

airbyte_cdk/cli/airbyte_cdk/_secrets.py Show resolved Hide resolved

airbyte_cdk/cli/airbyte_cdk/_secrets.py Show resolved Hide resolved

airbyte_cdk/cli/airbyte_cdk/_secrets.py Outdated Show resolved Hide resolved

github-advanced-security bot found potential problems Sep 26, 2025

View reviewed changes

airbyte_cdk/cli/airbyte_cdk/_secrets.py Fixed Show fixed Hide fixed

coderabbitai bot reviewed Sep 26, 2025

View reviewed changes

coderabbitai bot approved these changes Sep 26, 2025

View reviewed changes

fix: add handling for multi-line ci secrets mask (with tests)

f36b41a

github-advanced-security bot found potential problems Sep 26, 2025

View reviewed changes

airbyte_cdk/cli/airbyte_cdk/_secrets.py Fixed Show fixed Hide fixed

skip empty lines

acee4a8

github-advanced-security bot found potential problems Sep 26, 2025

View reviewed changes

airbyte_cdk/cli/airbyte_cdk/_secrets.py Dismissed Show dismissed Hide dismissed

coderabbitai bot requested changes Sep 26, 2025

View reviewed changes

unit_tests/cli/airbyte_cdk/test_secret_masks.py Show resolved Hide resolved

aaronsteers requested a review from jpefaur September 26, 2025 05:31

lint fix

5a7ce56

coderabbitai bot reviewed Sep 26, 2025

View reviewed changes

jpefaur reviewed Sep 26, 2025

View reviewed changes

airbyte_cdk/cli/airbyte_cdk/_secrets.py Show resolved Hide resolved

airbyte_cdk/cli/airbyte_cdk/_secrets.py Show resolved Hide resolved

dbgold17 reviewed Sep 26, 2025

View reviewed changes

airbyte_cdk/cli/airbyte_cdk/_secrets.py Show resolved Hide resolved

dbgold17 reviewed Sep 26, 2025

View reviewed changes

airbyte_cdk/cli/airbyte_cdk/_secrets.py Show resolved Hide resolved

dbgold17 reviewed Sep 26, 2025

View reviewed changes

airbyte_cdk/cli/airbyte_cdk/_secrets.py Show resolved Hide resolved

dbgold17 approved these changes Sep 26, 2025

View reviewed changes

coderabbitai bot approved these changes Sep 26, 2025

View reviewed changes

aaronsteers merged commit 90fdec5 into main Sep 26, 2025
28 of 29 checks passed

aaronsteers deleted the aj/ci/fix/handle-multi-line-secrets-in-secrets-masking branch September 26, 2025 19:43

fix: properly handle multi-line secrets in secret masking #774

fix: properly handle multi-line secrets in secret masking #774

Conversation

aaronsteers commented Sep 26, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

github-actions bot commented Sep 26, 2025

👋 Greetings, Airbyte Team Member!

Testing This CDK Version

Helpful Resources

PR Slash Commands

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Other AI code review bot(s) detected

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PyTest Results (Fast)

Uh oh!

github-actions bot commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PyTest Results (Full)

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dbgold17 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

aaronsteers commented Sep 26, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 26, 2025 •

edited

Loading

github-actions bot commented Sep 26, 2025 •

edited

Loading

github-actions bot commented Sep 26, 2025 •

edited

Loading