-
Notifications
You must be signed in to change notification settings - Fork 30
fix: properly handle multi-line secrets in secret masking #774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: properly handle multi-line secrets in secret masking #774
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@aj/ci/fix/handle-multi-line-secrets-in-secrets-masking#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch aj/ci/fix/handle-multi-line-secrets-in-secrets-masking Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes secret masking functionality to properly handle multi-line secrets like private keys in GitHub CI environments. The change refactors the masking logic to process secrets line-by-line instead of as single strings.
- Extracted secret value masking logic into a dedicated helper function
- Added line-by-line processing for multi-line secrets using
splitlines()
- Fixed control flow logic by changing nested
if
toelif
for better structure
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughAdds two helpers to recursively locate and mask secret values in configs: Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor CI as CI Runner
participant CFG as _print_ci_secrets_masks_for_config
participant VAL as _print_ci_secret_mask_for_value
participant STR as _print_ci_secret_mask_for_string
participant GHA as GitHub Actions Mask
CI->>CFG: provide config, spec, connector_root
CFG->>CFG: iterate properties/keys
alt property marked as secret
CFG->>VAL: mask(value)
alt value is dict
VAL->>VAL: recurse over key/value pairs
else value is list
VAL->>VAL: recurse over items
else scalar/string
VAL->>STR: mask(string)
STR->>STR: split into lines
loop per line
STR->>GHA: ::add-mask::<line>
end
end
else non-secret or no value
CFG->>CFG: skip masking
end
CFG-->>CI: done
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Would you like a short note in docs or tests highlighting the multi-line masking behavior, wdyt? Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
467-479
: Mask whole serialized containers too?I noticed we only traverse nested members and emit masks for the leaf values, so composite secrets (dict/list) never have their serialized form masked even though the docstring says we handle the JSON stringified version. If an integrator logs the whole container in one go (e.g.
json.dumps(secret_value)
), punctuation and even delimiter-only lines would remain unmasked unless every leaf value happens to be split exactly the same way. Mind adding an extra mask pass for the serialized container (after the child recursion) so we cover that path as well, wdyt?if isinstance(value, dict): # For nested dicts, we also need to mask the json-stringified version for v in value.values(): _print_ci_secret_mask_for_value(v) + json_mask = json.dumps(value, separators=(",", ":")) + for line in json_mask.splitlines() or [json_mask]: + print(f"::add-mask::{line}") return if isinstance(value, list): # For lists, we also need to mask the json-stringified version for list_item in value: _print_ci_secret_mask_for_value(list_item) + json_mask = json.dumps(value, separators=(",", ":")) + for line in json_mask.splitlines() or [json_mask]: + print(f"::add-mask::{line}") return
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/cli/airbyte_cdk/_secrets.py
(1 hunks)
🧰 Additional context used
🪛 GitHub Check: CodeQL
airbyte_cdk/cli/airbyte_cdk/_secrets.py
[failure] 486-486: Clear-text logging of sensitive information
This expression logs sensitive data (secret) as clear text.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/cli/airbyte_cdk/_secrets.py
(1 hunks)unit_tests/cli/airbyte_cdk/test_secret_masks.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
unit_tests/cli/airbyte_cdk/test_secret_masks.py (1)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
_print_ci_secrets_masks_for_config
(497-513)
🪛 GitHub Actions: Linters
unit_tests/cli/airbyte_cdk/test_secret_masks.py
[error] 1-6: Ruff format failed due to formatting changes. 1 file would be reformatted; an extra blank line was added after the module docstring. Run 'ruff format' to fix the style issues.
🪛 GitHub Check: CodeQL
airbyte_cdk/cli/airbyte_cdk/_secrets.py
[failure] 469-469: Clear-text logging of sensitive information
This expression logs sensitive data (secret) as clear text.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
🔇 Additional comments (1)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
477-488
: Restore JSON-level masking on containersBy skipping the
json.dumps
branch we no longer register a mask for the serialized dict/list itself. That regresses prior behavior and leaks booleans or nulls when the config is emitted viajson.dumps
because we now only register"True"
/"False"
while the log printstrue
/false
. Could we restore the container-level mask before recursing to keep the old coverage, wdyt?Apply this diff to fix the regression:
if isinstance(value, dict): - # For nested dicts, we call recursively on each value - for v in value.values(): + _print_ci_secret_mask_for_string(json.dumps(value)) + for v in value.values(): _print_ci_secret_mask_for_value(v) return if isinstance(value, list): - # For lists, we call recursively on each list item - for list_item in value: + _print_ci_secret_mask_for_string(json.dumps(value)) + for list_item in value: _print_ci_secret_mask_for_value(list_item) return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
462-471
: Multi-line masking is correct; add a defensive CI guard to avoid accidental leaks and quiet CodeQL?This helper works as intended. Given the CodeQL “clear-text logging” alert on Line 470 and that this function could be called out of band in the future, would you add a CI check here as defense-in-depth (we already gate upstream) and to document intent, wdyt?
def _print_ci_secret_mask_for_string(secret: str) -> None: """Print GitHub CI mask for a single secret string. We expect single-line secrets, but we also handle the case where the secret contains newlines. For multi-line secrets, we must print a secret mask for each line separately. """ + # Defense-in-depth: only emit mask commands in CI environments + if not os.environ.get("CI"): + logger.debug("Skipping ::add-mask:: outside CI context") + return for line in secret.splitlines(): if line.strip(): # Skip empty lines print(f"::add-mask::{line!s}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/cli/airbyte_cdk/_secrets.py
(1 hunks)
🧰 Additional context used
🪛 GitHub Check: CodeQL
airbyte_cdk/cli/airbyte_cdk/_secrets.py
[failure] 470-470: Clear-text logging of sensitive information
This expression logs sensitive data (secret) as clear text.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
🔇 Additional comments (2)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (2)
473-497
: Remove double splitting and (optionally) also mask container JSON for dict/list values?
- Minor: You’re splitting lines here and again inside
_print_ci_secret_mask_for_string
. You could delegate once to reduce duplication.- Optional/compat: Previous behavior masked
json.dumps(value)
for dicts/lists. If logs might include the entire container serialized, adding that back preserves coverage. It won’t hurt and is cheap. Wdyt?def _print_ci_secret_mask_for_value(value: Any) -> None: """Print GitHub CI mask for a single secret value. Call this function for any values identified as secrets, regardless of type. """ if isinstance(value, dict): - # For nested dicts, we call recursively on each value + # Also mask the JSON-stringified container in case logs print the whole object + print(f"::add-mask::{json.dumps(value)}") + # Then recurse into values for v in value.values(): _print_ci_secret_mask_for_value(v) return if isinstance(value, list): - # For lists, we call recursively on each list item + # Also mask the JSON-stringified container + print(f"::add-mask::{json.dumps(value)}") + # Then recurse into items for list_item in value: _print_ci_secret_mask_for_value(list_item) return - # For any other types besides dict and list, we convert to string and mask each line - # separately to handle multi-line secrets (e.g. private keys). - for line in str(value).splitlines(): - if line.strip(): # Skip empty lines - _print_ci_secret_mask_for_string(line) + # For scalars, delegate to the string masker (handles multi-line) + _print_ci_secret_mask_for_string(str(value))
508-516
: LGTM on recursive traversal and delegation.The switch to delegating to
_print_ci_secret_mask_for_value
for matched keys and recursing into nested structures looks correct and aligns with the PR goal to cover multi-line and nested cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
unit_tests/cli/airbyte_cdk/test_secret_masks.py (1)
34-35
: Broaden theconfig
type hint?Since a few parametrized cases feed lists into
config
, the currentdict
annotation is lying to type checkers. Could we widen it to something likeMapping[str, Any] | Sequence[Any]
(or evenAny
) so the signature matches the data we pass, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
unit_tests/cli/airbyte_cdk/test_secret_masks.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
unit_tests/cli/airbyte_cdk/test_secret_masks.py (1)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
_print_ci_secrets_masks_for_config
(499-515)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Check: source-shopify
🔇 Additional comments (1)
unit_tests/cli/airbyte_cdk/test_secret_masks.py (1)
2-6
: Drop the blank line after the module docstringRuff is still failing because it wants the first import immediately after the docstring (same thing the bot flagged earlier). Could we pull the import up and rerun
ruff format
so CI goes green, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed online, approving!
Summary by CodeRabbit
Bug Fixes
Tests