'generator raised StopIteration' error when running 'randomstats' with multiple processes #377

@tparket

Hi,

First of all - thank you for your amazing work. pybedtools has been super useful for my research so far and I am very grateful.

I'm trying to run 'randomstats' with the following args:

results_dict = a.randomstats(b, iterations=1000, new=True, genome_fn=chromsizes_fn, processes=4, shuffle_kwargs={"chrom": True}, intersect_kwargs={"f": 1})

---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
~/.local/lib/python3.7/site-packages/pybedtools/bedtool.py in parallel_apply(self, iterations, func, func_args, func_kwargs, processes, _orig_pool)
2932 for it in range(iterations):
-> 2933 yield func(*func_args, **func_kwargs)
2934 raise StopIteration

~/.local/lib/python3.7/site-packages/pybedtools/stats.py in random_intersection(x, y, genome_fn, shuffle_kwargs, intersect_kwargs)
16 result = len(zz)
---> 17 helpers.close_or_delete(z, zz)
18 return result

~/.local/lib/python3.7/site-packages/pybedtools/helpers.py in close_or_delete(*args)
547 if hasattr(x.fn, "throw"):
--> 548 x.fn.throw(StopIteration)
549

StopIteration:

The above exception was the direct cause of the following exception:

RuntimeError Traceback (most recent call last)
in

~/.local/lib/python3.7/site-packages/pybedtools/bedtool.py in randomstats(self, other, iterations, new, genome_fn, include_distribution, **kwargs)
2846 )
2847 distribution = self._randomintersection(
-> 2848 other, iterations=iterations, genome_fn=genome_fn, **kwargs
2849 )
2850

~/.local/lib/python3.7/site-packages/pybedtools/bedtool.py in _randomintersection(self, other, iterations, genome_fn, intersect_kwargs, _orig_pool, shuffle_kwargs, processes)
3038 ),
3039 processes=processes,
-> 3040 _orig_pool=_orig_pool,
3041 )
3042 )

RuntimeError: generator raised StopIteration

When I remove the 'processes' argument, 'randomstats' works just fine, but every time I run it with 'processes' (even with a value of 1), I get the aforementioned error.

Other relevant data:

  • 'a' and 'b' are both bedtool objects generated from a df. A regular a.intersect(b, f=1) works perfectly.
  • 'chromsizes_fn' is the name of a genome file generated from a dict with:
    chromsizes_fn = pybedtools.chromsizes_to_file(chromsizes_dic, fn=temp_genome.name)
    I tried using both fn=False and fn=temp_genome.name
  • I tried running it both with and without new=True. It crashed both times.

I would really appreciate your help. I'm planning to run 'randomstats' on a large number of files, with at least 1000 iterations each, and only multiprocessing will make that feasible.
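
For context, the final RuntimeError in the traceback matches the Python 3.7 behaviour introduced by PEP 479: a StopIteration that escapes from inside a generator body is converted into "RuntimeError: generator raised StopIteration". The snippet below is a minimal sketch of that mechanism only. It does not use pybedtools, and the names leaky_cleanup, apply_repeatedly and apply_repeatedly_fixed are hypothetical stand-ins, not the library's code.

```python
def leaky_cleanup():
    # Hypothetical stand-in for a helper that lets StopIteration escape.
    raise StopIteration


def apply_repeatedly(iterations, func):
    # Generator shaped like the parallel_apply frame in the traceback:
    # it yields the result of calling func on each iteration.
    for _ in range(iterations):
        yield func()


def apply_repeatedly_fixed(iterations, func):
    # Same generator, but the call is guarded so StopIteration cannot
    # leak out of the generator body; a plain return ends the iteration.
    for _ in range(iterations):
        try:
            result = func()
        except StopIteration:
            return
        yield result


def demo():
    # On Python 3.7+, the escaping StopIteration surfaces as RuntimeError.
    try:
        list(apply_repeatedly(3, leaky_cleanup))
    except RuntimeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(demo())
    print(list(apply_repeatedly_fixed(3, leaky_cleanup)))
```

If this is what is happening here, the same pattern would explain why the error appears even with processes=1: the failure would come from the generator-based code path rather than from the number of workers. That is an inference from the traceback, not something confirmed against the pybedtools source.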
