Skip to content

fix: Handle account-scoped ADLS properties in Iceberg IO config conversion#6359

Open
firecast wants to merge 2 commits intoEventual-Inc:mainfrom
firecast:fix/adls-scoped-iceberg-properties
Open

fix: Handle account-scoped ADLS properties in Iceberg IO config conversion#6359
firecast wants to merge 2 commits intoEventual-Inc:mainfrom
firecast:fix/adls-scoped-iceberg-properties

Conversation

@firecast
Copy link
Copy Markdown

@firecast firecast commented Mar 6, 2026

Summary

  • When using Polaris catalog with ADLS vended credentials, PyIceberg returns account-scoped property keys like adls.sas-token.<account>.dfs.core.windows.net instead of plain adls.sas-token. These were silently ignored, causing auth failures.
  • Added get_adls_property_value() helper that tries exact key match first, then falls back to scanning for scoped keys and extracts the account name from the suffix.
  • Uses the scoped account name as fallback for storage_account when adls.account-name isn't explicitly set.

Closes #6357

Test plan

  • test_exact_adls_properties — existing exact-match behavior preserved
  • test_scoped_sas_token_extracts_account — scoped SAS token + account extraction
  • test_scoped_account_key_extracts_account — scoped account key + account extraction
  • test_exact_key_takes_precedence_over_scoped — exact key wins over scoped
  • test_explicit_account_name_not_overridden_by_scoped — explicit account name preserved
  • test_empty_props_returns_none — no props returns None
  • test_s3_properties_still_work — S3 config unaffected

🤖 Generated with Claude Code

…rsion

When using Polaris catalog with ADLS vended credentials, PyIceberg returns
account-scoped property keys like `adls.sas-token.<account>.dfs.core.windows.net`
instead of plain `adls.sas-token`. These were silently ignored, causing auth failures.

Add a helper that falls back to scanning for scoped keys and extracts the account
name from the key suffix.

Closes Eventual-Inc#6357

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the fix label Mar 6, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR fixes a bug where account-scoped ADLS property keys vended by Polaris catalog (e.g., adls.sas-token.<account>.dfs.core.windows.net) were silently ignored during Iceberg IO config conversion, causing authentication failures. The fix introduces a nested helper get_adls_property_value inside _convert_iceberg_file_io_properties_to_io_config that first tries an exact key match and then falls back to a prefix scan for scoped keys, also extracting the account name from the key suffix so storage_account can be inferred when adls.account-name is absent.

The implementation correctly handles the Polaris use case with adls.* namespaced keys. However, the scoped-key fallback currently only scans the base_key prefix (e.g., adls.sas-token.*), not alternate_keys prefixes (e.g., adlfs.sas-token.*). This means if a future deployment uses scoped adlfs.*-style keys, they could be silently missed by the fallback. While not an issue for the current Polaris integration, extending the fallback to cover alternate-key prefixes (or adding a clarifying comment about the intentional restriction) would prevent similar gaps.

Confidence Score: 4/5

  • Safe to merge—fixes the stated Polaris ADLS credential bug with comprehensive tests and correct handling of the target use case. One logic gap identified: scoped fallback doesn't cover alternate-key namespaces, but this doesn't affect current deployments.
  • The PR successfully resolves the core issue: account-scoped adls.* keys are now properly extracted and inferred for storage_account. Tests are comprehensive and all passing. The one identified improvement—extending the scoped-key fallback to cover adlfs.* alternate-key prefixes (or adding a clarifying comment)—is a potential edge case for future deployments using that namespace, not a blocker for the current Polaris integration. The code is functionally sound and well-tested for its intended scope.
  • daft/io/iceberg/_iceberg.py — consider extending scoped-key fallback to cover alternate-key prefixes, or document why restriction to base_key is intentional.

Last reviewed commit: 00c477c

Comment thread daft/io/iceberg/_iceberg.py Outdated
…s.*)

Address review feedback: the scoped-key fallback now scans all candidate
prefixes (both adls.* and adlfs.*), not just the base key.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@madvart madvart requested a review from rchowell March 31, 2026 20:27
Comment on lines +30 to +53
def get_adls_property_value(base_key: str, *alternate_keys: str) -> tuple[Any | None, str | None]:
"""Try exact key match first, then fall back to account-scoped keys (e.g. adls.sas-token.<account>.dfs.core.windows.net)."""
for key in (base_key, *alternate_keys):
if value := props.get(key):
nonlocal any_props_set
any_props_set = True
return value, None
for candidate_key in (base_key, *alternate_keys):
prefix = f"{candidate_key}."
for key, value in props.items():
if key.startswith(prefix) and value:
any_props_set = True
account_name = key[len(prefix) :].split(".")[0]
return value, account_name
return None, None

sas_token, scoped_account = get_adls_property_value("adls.sas-token", "adlfs.sas-token")
access_key, scoped_account_from_key = get_adls_property_value("adls.account-key", "adlfs.account-key")
scoped_account = scoped_account or scoped_account_from_key

storage_account = get_first_property_value("adls.account-name", "adlfs.account-name")
if storage_account is None and scoped_account is not None:
storage_account = scoped_account
any_props_set = True
Copy link
Copy Markdown
Contributor

@rchowell rchowell Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. On one hand, I was thinking about things like the prefix being too greedy or not having any pattern/regex matching on the account id, but then I realized this is just config parsing.

I think this is a pragmatic approach

  1. Check for the account-name first.
  2. If there's a storage account, then use that like:
# we can internalize the "adls" and "adlfs" into the get_azure_property function
azure_account_name = get_azure_property("account-name")

# pass an optional scope suffix
azure_sas_token = get_azure_property("sas-token", scope=azure_account_name)
  1. If there's no explicit account, you could go super lenient with a prefix match like
azure_sas_token = get_azure_property("sas-token", prefix_match=True)

But I understand this doesn't extract the account_name. The prefix stuff is tough because we aren't asserting that any of the account_name values even match.

What are your thoughts here?

Copy link
Copy Markdown
Author

@firecast firecast Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points! The API shape you suggest is cleaner. A couple thoughts:

Chicken-and-egg: In the Polaris case that motivated this PR, adls.account-name is never set explicitly — the account is only available as part of the scoped key suffix (e.g., adls.sas-token.<account>.dfs.core.windows.net). So we can't look up account-name first and then use it as a scope.

Account consistency: You raise a good point that we don't validate the account names match across different scoped keys. I could add a consistency check — if sas-token and account-key both have scoped keys, assert their extracted accounts match (or warn/error).

Refactored API: I could restructure to something like:

def get_azure_property(prop_name, scope=None, prefix_match=False):
    ...

Where scope does exact scoped lookup (when account is known) and prefix_match=True does the greedy fallback + account extraction. This gives us both paths — explicit scope when we have it, and discovery when we don't.

Want me to refactor to that API shape? The behavior would be the same, but the intent at each call site would be clearer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Account-scoped ADLS properties from Iceberg/Polaris are silently ignored

2 participants