Description
Is your feature request related to a problem? Please describe
Howdy folks
I'd like to discuss a common convention I see with nf-core modules, where a process has two separate inputs for e.g. a sample and an index. Here are a few examples I found:
So you have two inputs:
    input:
    tuple val(meta), path(reads)
    tuple val(meta2), path(index)

This convention works fine as long as you have a single index, in which case you can provide the index as a value channel and it will be "broadcast" to every sample, basically an implicit cross product.
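To make the single-index case concrete, here is a minimal sketch. The sample IDs, the file names, and the `echo` script are placeholders I made up for illustration (the files would need to exist for the tasks to stage them); only the two-input shape comes from the convention above.

```nextflow
process PROC {
    input:
    tuple val(meta), path(reads)
    tuple val(meta2), path(index)

    output:
    stdout

    script:
    """
    echo "${meta.id} against ${meta2.id}"
    """
}

workflow {
    // Queue channel: one item per sample (hypothetical file names)
    ch_samples = Channel.of(
        [ [id: 's1'], file('s1.fastq') ],
        [ [id: 's2'], file('s2.fastq') ]
    )

    // Value channel: a single index that can be read any number of times,
    // so every sample gets paired with it
    ch_index = Channel.value( [ [id: 'genomeA'], file('genomeA.idx') ] )

    PROC(ch_samples, ch_index)
    PROC.out.view()
}
```

If `ch_index` were a queue channel holding that one item instead of a value channel, the process would consume it on the first task and stop, because two queue channel inputs are paired item by item.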
But what if you have multiple indices? The process inputs are not really set up to handle this, so you have to do a bit of hacking:
    ch_samples = Channel.of( /* ... */ )
    ch_indices = Channel.of( /* ... */ )
    ch_inputs = ch_samples.combine(ch_indices)
    // each combined item is [meta, reads, meta2, index]
    ch_multi = ch_inputs.multiMap { it ->
        samples: it[0..1]   // [meta, reads]
        indices: it[2..3]   // [meta2, index]
    }
    PROC(ch_multi.samples, ch_multi.indices)

But now you're wondering if multiMap preserves the order of its inputs, and that question leads down a deep rabbit hole. I have now led multiple people through that rabbit hole, and every time it leads me back to the original problem of multiple inputs. It's the reason why I added this note to the docs.
It's not always nf-core modules that are the cause, just "someone else's process that I'm trying to re-use". In any case, I'm hoping that I can broach the subject and spread this best practice to the community. Are people aware of this issue? Have you debated this convention in the past? If so, I would prefer to build on whatever previous discussions were had.
By the way, here's how I think you SHOULD do it:
    process PROC {
        input:
        tuple val(meta), path(reads), val(meta2), path(index)
        // ...
    }

    workflow {
        ch_samples = Channel.of( /* ... */ )
        ch_indices = Channel.of( /* ... */ )
        PROC( ch_samples.combine(ch_indices) )
    }

Easy! It works in all cases (one-to-one, many-to-one, many-to-many), and it doesn't require you to play fast and loose with your dataflow.
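Here is a rough sketch of how the single combined input could be built for each of the three cases, using only channel operators so the pairing can be inspected with `view`. The IDs, file names, and the `genome` key in the sample metadata are assumptions for illustration; in particular, matching samples to indices by a `genome` field is just one possible reading of "one-to-one".

```nextflow
workflow {
    // Hypothetical samples; meta.genome names the index each sample belongs to
    ch_samples = Channel.of(
        [ [id: 's1', genome: 'genomeA'], 's1.fastq' ],
        [ [id: 's2', genome: 'genomeB'], 's2.fastq' ]
    )
    // Hypothetical indices
    ch_indices = Channel.of(
        [ [id: 'genomeA'], 'genomeA.idx' ],
        [ [id: 'genomeB'], 'genomeB.idx' ]
    )

    // many-to-many: every sample with every index (2 x 2 = 4 tuples)
    ch_samples
        .combine(ch_indices)
        .view { "many-to-many: $it" }

    // many-to-one: the same operator with a channel holding a single index
    ch_one_index = Channel.of( [ [id: 'genomeA'], 'genomeA.idx' ] )
    ch_samples
        .combine(ch_one_index)
        .view { "many-to-one: $it" }

    // one-to-one: key both channels, combine on the key, then drop the key
    ch_samples_keyed = ch_samples.map { meta, reads -> [ meta.genome, meta, reads ] }
    ch_indices_keyed = ch_indices.map { meta2, index -> [ meta2.id, meta2, index ] }
    ch_samples_keyed
        .combine(ch_indices_keyed, by: 0)
        .map { key, meta, reads, meta2, index -> [ meta, reads, meta2, index ] }
        .view { "one-to-one: $it" }
}
```

In all three cases the result is a single channel of `[meta, reads, meta2, index]` tuples, which matches the four-element `tuple` input of `PROC` above, so the sample and its index always travel together.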