Azure Blob Storage ingestion error #15312

@petros94

Describe the bug
When the ABS (Azure Blob Storage) source is configured as described in the documentation, the ingestion run fails with the following error:

Failed to configure the source (abs): 'dict' object has no attribute 'is_abs'
Stack trace:

~~~~ Execution Summary - RUN_INGEST ~~~~
Execution finished with errors.
{'exec_id': 'bf6564a0-e1c3-4edd-a453-aa8d444afa47',
 'infos': ['2025-11-17 12:00:30.528159 INFO: Starting execution for task with name=RUN_INGEST',
           "2025-11-17 12:01:06.185933 INFO: Failed to execute 'datahub ingest', exit code 1",
           '2025-11-17 12:01:06.190365 INFO: Caught exception EXECUTING task_id=bf6564a0-e1c3-4edd-a453-aa8d444afa47, name=RUN_INGEST, '
           'stacktrace=Traceback (most recent call last):\n'
           '  File "/home/datahub/.venv/lib/python3.11/site-packages/acryl/executor/execution/default_executor.py", line 153, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete\n'
           '    return future.result()\n'
           '           ^^^^^^^^^^^^^^^\n'
           '  File "/home/datahub/.venv/lib/python3.11/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 324, in execute\n'
           '    await self._execute_with_debug(validated_args, ctx, exec_id)\n'
           '  File "/home/datahub/.venv/lib/python3.11/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 386, in '
           '_execute_with_debug\n'
           '    self._handle_subprocess_completion(\n'
           '  File "/home/datahub/.venv/lib/python3.11/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 633, in '
           '_handle_subprocess_completion\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
 'errors': []}

~~~~ Ingestion Logs ~~~~
Setting up venv for plugin 'abs' with version '1.3.1.2'
Creating dynamic venv - this may take a few minutes...
Creating new venv: /tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867
+/usr/bin/uv venv --python /home/datahub/.venv/bin/python3 /tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867
Using CPython 3.11.13 interpreter at: /home/datahub/.venv/bin/python3
Creating virtual environment at: datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867
Installing requirements from: /tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/requirements.txt
+cat /tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/requirements.txt
# Generated at 2025-11-17T12:00:30.679054+00:00
acryl-datahub[abs]==1.3.1.2
+/usr/bin/uv pip install -r /tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/requirements.txt
Using Python 3.11.13 environment at: datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867
Resolved 101 packages in 12.90s
Downloading botocore (13.5MiB)
Downloading pandas (11.6MiB)
Downloading acryl-datahub (2.4MiB)
Downloading aiohttp (1.7MiB)
Downloading numpy (13.9MiB)
Downloading pyarrow (42.9MiB)
Downloading sqlalchemy (3.2MiB)
Downloading cryptography (4.1MiB)
Downloading pydantic-core (1.8MiB)
   Building pyspark==3.5.7
   Building unicodecsv==0.14.1
   Building linear-tsv==1.1.0
      Built linear-tsv==1.1.0
 Downloading aiohttp
 Downloading pydantic-core
      Built unicodecsv==0.14.1
 Downloading acryl-datahub
 Downloading sqlalchemy
 Downloading cryptography
 Downloading numpy
 Downloading botocore
 Downloading pandas
 Downloading pyarrow
      Built pyspark==3.5.7
Prepared 101 packages in 14.44s
Installed 101 packages in 245ms
 + acryl-datahub==1.3.1.2
 + aiohappyeyeballs==2.6.1
 + aiohttp==3.13.2
 + aiosignal==1.4.0
 + annotated-types==0.7.0
 + anyio==4.11.0
 + asgiref==3.10.0
 + attrs==25.4.0
 + avro==1.12.1
 + avro-gen3==0.7.16
 + azure-common==1.1.28
 + azure-core==1.36.0
 + azure-identity==1.25.1
 + azure-storage-blob==12.27.1
 + azure-storage-file-datalake==12.22.0
 + boto3==1.40.74
 + botocore==1.40.74
 + bracex==2.6
 + cached-property==2.0.1
 + cachetools==6.2.2
 + certifi==2025.11.12
 + cffi==2.0.0
 + chardet==5.2.0
 + charset-normalizer==3.4.4
 + click==8.3.1
 + click-default-group==1.2.4
 + click-spinner==0.1.10
 + cryptography==46.0.3
 + dataflows-tabulator==1.54.3
 + deprecated==1.3.1
 + docker==7.1.0
 + et-xmlfile==2.0.0
 + expandvars==1.1.2
 + frozenlist==1.8.0
 + greenlet==3.2.4
 + h11==0.16.0
 + httpcore==1.0.9
 + httpx==0.28.1
 + humanfriendly==10.0
 + idna==3.11
 + ijson==3.4.0.post0
 + isodate==0.7.2
 + jmespath==1.0.1
 + jsonlines==4.0.0
 + jsonref==1.1.0
 + jsonschema==4.25.1
 + jsonschema-specifications==2025.9.1
 + linear-tsv==1.1.0
 + mixpanel==5.0.0
 + more-itertools==10.8.0
 + msal==1.34.0
 + msal-extensions==1.3.1
 + multidict==6.7.0
 + mypy-extensions==1.1.0
 + numpy==2.3.5
 + openpyxl==3.1.5
 + packaging==25.0
 + pandas==2.3.3
 + parse==1.20.2
 + progressbar2==4.5.0
 + propcache==0.4.1
 + psutil==7.1.3
 + py4j==0.10.9.7
 + pyarrow==22.0.0
 + pycparser==2.23
 + pydantic==2.12.4
 + pydantic-core==2.41.5
 + pydeequ==1.5.0
 + pyjwt==2.10.1
 + pyspark==3.5.7
 + python-dateutil==2.9.0.post0
 + python-utils==3.9.1
 + pytz==2025.2
 + pyyaml==6.0.3
 + referencing==0.37.0
 + requests==2.32.5
 + requests-file==3.0.1
 + rfc3986==2.0.0
 + rpds-py==0.29.0
 + ruamel-yaml==0.18.16
 + ruamel-yaml-clib==0.2.15
 + s3transfer==0.14.0
 + sentry-sdk==2.44.0
 + six==1.17.0
 + smart-open==7.5.0
 + sniffio==1.3.1
 + sqlalchemy==2.0.44
 + tableschema==1.21.0
 + tabulate==0.9.0
 + toml==0.10.2
 + typing-extensions==4.15.0
 + typing-inspect==0.9.0
 + typing-inspection==0.4.2
 + tzdata==2025.2
 + ujson==5.11.0
 + unicodecsv==0.14.1
 + urllib3==2.5.0
 + wcmatch==10.1
 + wrapt==2.0.1
 + xlrd==2.0.2
 + yarl==1.22.0
✅ Venv ready at: /tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867
This version of datahub supports report-to functionality
+ exec datahub ingest run -c /tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/recipe.yml --report-to /tmp/datahub/logs/bf6564a0-e1c3-4edd-a453-aa8d444afa47/artifacts/ingestion_report.json
2025-11-17 12:01:03,073 [datahub.masking.bootstrap] INFO: Initializing secret masking infrastructure
2025-11-17 12:01:03,073 [datahub.masking.masking_filter] INFO: Installed SecretMaskingFilter on root logger
2025-11-17 12:01:03,073 [datahub.masking.masking_filter] DEBUG: Wrapped sys.stdout with StreamMaskingWrapper
2025-11-17 12:01:03,073 [datahub.masking.masking_filter] DEBUG: Wrapped sys.stderr with StreamMaskingWrapper
2025-11-17 12:01:03,073 [datahub.masking.masking_filter] DEBUG: Updated 4 logging handlers to use wrapped streams
2025-11-17 12:01:03,073 [datahub.masking.bootstrap] DEBUG: Installed custom exception hook for secret masking
2025-11-17 12:01:03,074 [datahub.masking.bootstrap] INFO: Secret masking infrastructure initialized successfully. Secrets will be registered automatically as they are loaded.
[2025-11-17 12:01:03,074] INFO     {datahub.cli.ingest_cli:155} - DataHub CLI version: 1.3.1.2
[2025-11-17 12:01:03,075] INFO     {datahub.ingestion.run.pipeline:202} - No sink configured, attempting to use the default datahub-rest sink.
[2025-11-17 12:01:03,105] INFO     {datahub.ingestion.run.pipeline:225} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
[2025-11-17 12:01:04,827] ERROR    {datahub.entrypoints:249} - Command failed: Failed to configure the source (abs): 'dict' object has no attribute 'is_abs'
Traceback (most recent call last):
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/ingestion/run/pipeline.py", line 78, in _add_init_error_context
    yield
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/ingestion/run/pipeline.py", line 247, in __init__
    source_class.create(
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/ingestion/source/abs/source.py", line 167, in create
    config = DataLakeSourceConfig.model_validate(config_dict)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/pydantic/main.py", line 716, in model_validate
    return cls.__pydantic_validator__.validate_python(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/configuration/common.py", line 156, in _track_nesting_context
    instance = handler(data)
               ^^^^^^^^^^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/ingestion/source/abs/config.py", line 117, in check_path_specs_and_infer_platform
    guessed_platforms = set(
                        ^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/ingestion/source/abs/config.py", line 118, in <genexpr>
    "abs" if path_spec.is_abs else "file" for path_spec in path_specs
             ^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'is_abs'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/entrypoints.py", line 236, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/click/core.py", line 1485, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/click/core.py", line 1406, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/click/core.py", line 1873, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/click/core.py", line 1873, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/click/core.py", line 1269, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/click/core.py", line 824, in invoke
    return callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/telemetry/telemetry.py", line 490, in wrapper
    raise e
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/telemetry/telemetry.py", line 438, in wrapper
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/telemetry/telemetry.py", line 490, in wrapper
    raise e
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/telemetry/telemetry.py", line 438, in wrapper
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/upgrade/upgrade.py", line 491, in async_wrapper
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/cli/ingest_cli.py", line 176, in run
    pipeline = Pipeline.create(
               ^^^^^^^^^^^^^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/ingestion/run/pipeline.py", line 430, in create
    return cls(
           ^^^^
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/ingestion/run/pipeline.py", line 243, in __init__
    with _add_init_error_context(
  File "/usr/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/tmp/datahub/ingest/bf6564a0-e1c3-4edd-a453-aa8d444afa47/venv-abs-50e815d22ff70867/lib/python3.11/site-packages/datahub/ingestion/run/pipeline.py", line 82, in _add_init_error_context
    raise PipelineInitError(f"Failed to {step}: {e}") from e
datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the source (abs): 'dict' object has no attribute 'is_abs'
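
One reading of the first traceback is that `check_path_specs_and_infer_platform` iterated over `path_specs` entries that were still plain dicts (as parsed from the recipe) rather than path-spec objects, so the attribute access `path_spec.is_abs` failed. The following is a minimal, stdlib-only sketch of that failure mode, not DataHub's actual code: `PathSpec`, `infer_platforms`, the example URL, and the rule used for `is_abs` are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PathSpec:
    """Hypothetical stand-in for the real path-spec model (not DataHub's class)."""

    include: str

    @property
    def is_abs(self) -> bool:
        # Assumption for illustration only: treat Azure blob URLs as ABS paths.
        return ".blob.core.windows.net" in self.include


def infer_platforms(path_specs):
    # Same shape as the expression shown in the traceback (config.py, line 118).
    return set("abs" if path_spec.is_abs else "file" for path_spec in path_specs)


# Entries as they would come out of a parsed recipe: plain dicts.
raw_specs = [{"include": "https://account.blob.core.windows.net/container/*.csv"}]

error_message = None
try:
    infer_platforms(raw_specs)
except AttributeError as exc:
    # Attribute access on a dict fails the same way as in the report.
    error_message = str(exc)

# Coercing each dict into an object before the check avoids the error.
coerced = [PathSpec(**spec) for spec in raw_specs]
platforms = infer_platforms(coerced)
```

If this reading is right, the check either needs to run after the entries have been converted to objects or must handle dict input itself; which of the two applies to the real validator is not established by the logs above.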

To Reproduce
Steps to reproduce the behavior:

  1. Configure the ABS source (plugin 'abs', version 1.3.1.2) as described in the documentation
  2. Run ingestion
  3. Check the logs of the failed run

Desktop (please complete the following information):

  • OS: macOS (Docker)
  • Browser: Safari
  • Version: 18.5

Metadata

Labels

bug
