Description
Hi 👋
Analogous to the max_tasks_per_child keyword argument of concurrent.futures.ProcessPoolExecutor (added in Python 3.11) and the maxtasksperchild keyword argument of multiprocessing.pool.Pool (added in Python 3.2), it would be great to be able to control how many tasks a loky subprocess completes before it is flushed and replaced with a new subprocess.
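For reference, here is a minimal sketch of the standard-library behaviour I mean, using multiprocessing.pool.Pool (not loky). The helper name worker_pids and its parameters are made up for illustration. With a single worker and maxtasksperchild=2, the worker should be replaced after every two completed tasks, so the PIDs change over time.

```python
import multiprocessing
import os


def worker_pids(n_tasks, max_tasks_per_child, start_method="spawn"):
    """Run n_tasks trivial tasks on a one-worker pool; return the PID that ran each.

    Hypothetical helper for illustration only; this is stdlib behaviour, not loky.
    """
    ctx = multiprocessing.get_context(start_method)
    with ctx.Pool(processes=1, maxtasksperchild=max_tasks_per_child) as pool:
        # os.getpid is a picklable builtin, so no user-defined function
        # has to be importable from the worker process.
        pending = [pool.apply_async(os.getpid) for _ in range(n_tasks)]
        # Each worker exits after max_tasks_per_child tasks and is replaced.
        return [result.get(timeout=60) for result in pending]
```

When using the spawn start method, call this from under an `if __name__ == "__main__":` guard. Something equivalent for loky's executor is what this request is about.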
Our dask workers are currently failing consistently with loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.
This is most likely caused by upstream memory leaks in lxml: the same loky pool subprocesses run for 5+ hours and eventually hit our 60 GiB memory limit. Periodically flushing the workers (we use the spawn start method) would most likely fix these errors.
Many thanks!