fix: Handle account-scoped ADLS properties in Iceberg IO config conversion#6359
fix: Handle account-scoped ADLS properties in Iceberg IO config conversion#6359firecast wants to merge 2 commits intoEventual-Inc:mainfrom
Conversation
…rsion When using Polaris catalog with ADLS vended credentials, PyIceberg returns account-scoped property keys like `adls.sas-token.<account>.dfs.core.windows.net` instead of plain `adls.sas-token`. These were silently ignored, causing auth failures. Add a helper that falls back to scanning for scoped keys and extracts the account name from the key suffix. Closes Eventual-Inc#6357 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Greptile SummaryThis PR fixes a bug where account-scoped ADLS property keys vended by Polaris catalog (e.g., The implementation correctly handles the Polaris use case with Confidence Score: 4/5
Last reviewed commit: 00c477c |
…s.*) Address review feedback: the scoped-key fallback now scans all candidate prefixes (both adls.* and adlfs.*), not just the base key. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| def get_adls_property_value(base_key: str, *alternate_keys: str) -> tuple[Any | None, str | None]: | ||
| """Try exact key match first, then fall back to account-scoped keys (e.g. adls.sas-token.<account>.dfs.core.windows.net).""" | ||
| for key in (base_key, *alternate_keys): | ||
| if value := props.get(key): | ||
| nonlocal any_props_set | ||
| any_props_set = True | ||
| return value, None | ||
| for candidate_key in (base_key, *alternate_keys): | ||
| prefix = f"{candidate_key}." | ||
| for key, value in props.items(): | ||
| if key.startswith(prefix) and value: | ||
| any_props_set = True | ||
| account_name = key[len(prefix) :].split(".")[0] | ||
| return value, account_name | ||
| return None, None | ||
|
|
||
| sas_token, scoped_account = get_adls_property_value("adls.sas-token", "adlfs.sas-token") | ||
| access_key, scoped_account_from_key = get_adls_property_value("adls.account-key", "adlfs.account-key") | ||
| scoped_account = scoped_account or scoped_account_from_key | ||
|
|
||
| storage_account = get_first_property_value("adls.account-name", "adlfs.account-name") | ||
| if storage_account is None and scoped_account is not None: | ||
| storage_account = scoped_account | ||
| any_props_set = True |
There was a problem hiding this comment.
Thanks for the PR. On one hand, I was thinking about things like the prefix being too greedy or not having any pattern/regex matching on the account id, but then I realized this is just config parsing.
I think this is a pragmatic approach
- Check for the account-name first.
- If there's a storage account, then use that like:
# we can internalize the "adls" and "adlfs" into the get_azure_property function
azure_account_name = get_azure_property("account-name")
# pass an optional scope suffix
azure_sas_token = get_azure_property("sas-token", scope=azure_account_name)- If there's no explicit account, you could go super lenient with a prefix match like
azure_sas_token = get_azure_property("sas-token", prefix_match=True)But I understand this doesn't extract the account_name. The prefix stuff is tough because we aren't asserting that any of the account_name values even match.
What are your thoughts here?
There was a problem hiding this comment.
Good points! The API shape you suggest is cleaner. A couple thoughts:
Chicken-and-egg: In the Polaris case that motivated this PR, adls.account-name is never set explicitly — the account is only available as part of the scoped key suffix (e.g., adls.sas-token.<account>.dfs.core.windows.net). So we can't look up account-name first and then use it as a scope.
Account consistency: You raise a good point that we don't validate the account names match across different scoped keys. I could add a consistency check — if sas-token and account-key both have scoped keys, assert their extracted accounts match (or warn/error).
Refactored API: I could restructure to something like:
def get_azure_property(prop_name, scope=None, prefix_match=False):
...
Where scope does exact scoped lookup (when account is known) and prefix_match=True does the greedy fallback + account extraction. This gives us both paths — explicit scope when we have it, and discovery when we don't.
Want me to refactor to that API shape? The behavior would be the same, but the intent at each call site would be clearer.
Summary
adls.sas-token.<account>.dfs.core.windows.netinstead of plainadls.sas-token. These were silently ignored, causing auth failures.get_adls_property_value()helper that tries exact key match first, then falls back to scanning for scoped keys and extracts the account name from the suffix.storage_accountwhenadls.account-nameisn't explicitly set.Closes #6357
Test plan
test_exact_adls_properties— existing exact-match behavior preservedtest_scoped_sas_token_extracts_account— scoped SAS token + account extractiontest_scoped_account_key_extracts_account— scoped account key + account extractiontest_exact_key_takes_precedence_over_scoped— exact key wins over scopedtest_explicit_account_name_not_overridden_by_scoped— explicit account name preservedtest_empty_props_returns_none— no props returns Nonetest_s3_properties_still_work— S3 config unaffected🤖 Generated with Claude Code