DFS with n_jobs > 1 fails #2767

@simon-forb

Description

Running DFS with n_jobs > 1 fails with the error shown below. My Python linter also flags that self._state of the Future class in distributed/client.py may be None, which is what triggers the AttributeError at the end of the traceback.

2025-09-02 12:09:03 INFO     Remove client Client-d9874ed3-87e4-11f0-9634-7c10c942c95f
2025-09-02 12:09:03 INFO     Received 'close-stream' from tcp://127.0.0.1:41922; closing.
2025-09-02 12:09:03 INFO     Remove client Client-d9874ed3-87e4-11f0-9634-7c10c942c95f
2025-09-02 12:09:03 INFO     Close client connection: Client-d9874ed3-87e4-11f0-9634-7c10c942c95f
2025-09-02 12:09:03 INFO     Retire worker addresses (stimulus_id='retire-workers-1756807743.3685405') (0, 1)
2025-09-02 12:09:03 INFO     Closing Nanny at 'tcp://127.0.0.1:38085'. Reason: nanny-close
2025-09-02 12:09:03 INFO     Nanny asking worker to close. Reason: nanny-close
2025-09-02 12:09:03 INFO     Closing Nanny at 'tcp://127.0.0.1:37619'. Reason: nanny-close
2025-09-02 12:09:03 INFO     Nanny asking worker to close. Reason: nanny-close
2025-09-02 12:09:03 INFO     Received 'close-stream' from tcp://127.0.0.1:41918; closing.
2025-09-02 12:09:03 INFO     Remove worker addr: tcp://127.0.0.1:41559 name: 1 (stimulus_id='handle-worker-cleanup-1756807743.3698847')
2025-09-02 12:09:03 INFO     Received 'close-stream' from tcp://127.0.0.1:41908; closing.
2025-09-02 12:09:03 INFO     Remove worker addr: tcp://127.0.0.1:33377 name: 0 (stimulus_id='handle-worker-cleanup-1756807743.370498')
2025-09-02 12:09:03 ERROR    Removing worker 'tcp://127.0.0.1:33377' caused the cluster to lose scattered data, which can't be recovered: {'EntitySet-f6e3babe773ddc25334b143e0a0ee2ca'} (stimulus_id='handle-worker-cleanup-1756807743.370498')
2025-09-02 12:09:03 INFO     Lost all workers
2025-09-02 12:09:03 INFO     Nanny at 'tcp://127.0.0.1:38085' closed.
2025-09-02 12:09:03 INFO     Nanny at 'tcp://127.0.0.1:37619' closed.
2025-09-02 12:09:03 INFO     Closing scheduler. Reason: unknown
2025-09-02 12:09:03 INFO     Scheduler closing all comms
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/simon/code/tabmodel-for-relbench/src/tabmodel_for_relbench/feature_eng/__main__.py", line 65, in <module>
    main()
  File "/home/simon/code/tabmodel-for-relbench/src/tabmodel_for_relbench/feature_eng/__main__.py", line 40, in main
    train_feat_mat = construct_feature_matrix(train_es, train_time_col)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/simon/code/tabmodel-for-relbench/src/tabmodel_for_relbench/feat_matrix_constructor.py", line 15, in construct_feature_matrix
    train_feat_matrix, _ = dfs(
                           ^^^^
  File "/home/simon/.cache/pypoetry/virtualenvs/tabmodel-for-relbench-7r7VglWt-py3.12/lib/python3.12/site-packages/featuretools/utils/entry_point.py", line 39, in function_wrapper
    raise e
  File "/home/simon/.cache/pypoetry/virtualenvs/tabmodel-for-relbench-7r7VglWt-py3.12/lib/python3.12/site-packages/featuretools/utils/entry_point.py", line 32, in function_wrapper
    return_value = func(*args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^
  File "/home/simon/.cache/pypoetry/virtualenvs/tabmodel-for-relbench-7r7VglWt-py3.12/lib/python3.12/site-packages/featuretools/synthesis/dfs.py", line 283, in dfs
    feature_matrix = calculate_feature_matrix(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/simon/.cache/pypoetry/virtualenvs/tabmodel-for-relbench-7r7VglWt-py3.12/lib/python3.12/site-packages/featuretools/computational_backends/calculate_feature_matrix.py", line 298, in calculate_feature_matrix
    feature_matrix = parallel_calculate_chunks(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/simon/.cache/pypoetry/virtualenvs/tabmodel-for-relbench-7r7VglWt-py3.12/lib/python3.12/site-packages/featuretools/computational_backends/calculate_feature_matrix.py", line 747, in parallel_calculate_chunks
    client.who_has([Future(es_token)]).get(es_token, []),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/simon/.cache/pypoetry/virtualenvs/tabmodel-for-relbench-7r7VglWt-py3.12/lib/python3.12/site-packages/distributed/client.py", line 4193, in who_has
    futures = self.futures_of(futures)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/simon/.cache/pypoetry/virtualenvs/tabmodel-for-relbench-7r7VglWt-py3.12/lib/python3.12/site-packages/distributed/client.py", line 4905, in futures_of
    return futures_of(futures, client=self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/simon/.cache/pypoetry/virtualenvs/tabmodel-for-relbench-7r7VglWt-py3.12/lib/python3.12/site-packages/distributed/client.py", line 6121, in futures_of
    if not f.cancelled():
           ^^^^^^^^^^^^^
  File "/home/simon/.cache/pypoetry/virtualenvs/tabmodel-for-relbench-7r7VglWt-py3.12/lib/python3.12/site-packages/distributed/client.py", line 514, in cancelled
    return self._state.status == "cancelled"
           ^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'status'
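
The last two frames of the traceback can be illustrated with a small stand-in. This is only a sketch: SketchFuture and FutureState are hypothetical names, not the real distributed classes, and it assumes that _state stays None when a Future is created from a bare key without being bound to a client. Only the expression inside cancelled() is taken from the traceback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FutureState:
    """Hypothetical stand-in for the state object a bound future would hold."""
    status: str = "pending"


class SketchFuture:
    """Simplified stand-in for distributed.client.Future (not the real class)."""

    def __init__(self, key: str, state: Optional[FutureState] = None):
        self.key = key
        # Assumption: the state is only populated once a client has registered the key.
        self._state = state

    def cancelled(self) -> bool:
        # Same expression as the failing line in the traceback.
        return self._state.status == "cancelled"


bound = SketchFuture("EntitySet-example", FutureState("finished"))
unbound = SketchFuture("EntitySet-example")

bound_result = bound.cancelled()  # works: the state object exists
error_message = None
try:
    unbound.cancelled()  # fails: _state is None
except AttributeError as exc:
    error_message = str(exc)

print(bound_result)
print(error_message)
```

In this sketch the unbound instance reproduces the same AttributeError message as the report, which is consistent with the linter warning that _state may be None.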

Code Sample, a copy-pastable example to reproduce your bug.

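    from featuretools import dfs

    # es, cutoff_times_df and get_runtime_config() come from the project (not shown).
    train_feat_matrix, _ = dfs(
        entityset=es,
        target_dataframe_name="entity_table",
        max_depth=get_runtime_config().max_depth,
        cutoff_time=cutoff_times_df,
        cutoff_time_in_index=True,
        n_jobs=2,  # more than one job takes the parallel path that fails
    )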
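
Until the parallel path is fixed, one possible stopgap is to retry serially when this specific error appears. The following is a minimal sketch, not part of Featuretools: call_with_serial_fallback and fake_dfs are hypothetical names, fake_dfs only imitates the reported failure, and the sketch assumes that n_jobs=1 avoids the failing code path.

```python
from typing import Any, Callable

REPORTED_ERROR = "'NoneType' object has no attribute 'status'"


def call_with_serial_fallback(func: Callable[..., Any], *, n_jobs: int, **kwargs: Any) -> Any:
    """Call func with n_jobs; retry once with n_jobs=1 on the reported error."""
    try:
        return func(n_jobs=n_jobs, **kwargs)
    except AttributeError as exc:
        # Only handle the specific failure from the traceback; re-raise anything else.
        if n_jobs == 1 or REPORTED_ERROR not in str(exc):
            raise
        return func(n_jobs=1, **kwargs)


def fake_dfs(*, n_jobs: int, **kwargs: Any) -> str:
    """Hypothetical stand-in for featuretools.dfs that fails like the report."""
    if n_jobs > 1:
        raise AttributeError(REPORTED_ERROR)
    return f"computed with n_jobs={n_jobs}"


result = call_with_serial_fallback(fake_dfs, n_jobs=2, max_depth=2)
print(result)
```

With the real library, the dfs function would be passed in place of fake_dfs along with the keyword arguments from the sample above. The serial retry gives up the speed-up from multiple jobs, so it is a workaround rather than a fix.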

Output of featuretools.show_info()

Featuretools version: 1.31.0
Featuretools installation directory: /home/simon/.cache/pypoetry/virtualenvs/tabmodel-for-relbench-7r7VglWt-py3.12/lib/python3.12/site-packages/featuretools

SYSTEM INFO

python: 3.12.3.final.0
python-bits: 64
OS: Linux
OS-release: 6.8.0-79-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

INSTALLED VERSIONS

numpy: 2.3.2
pandas: 2.3.2
tqdm: 4.67.1
cloudpickle: 3.1.1
dask: 2025.7.0
distributed: 2025.7.0
psutil: 7.0.0
pip: 25.1.1
setuptools: 80.9.0

Labels: bug (Something isn't working)
