@mohe1linux

The gprofiler2 module outputs were missing gene names/IDs in the expected columns, so it was impossible to tell which genes contribute to each enriched pathway.

Expected behavior:

*.gprofiler2.all_enriched_pathways.tsv should contain an intersection column with gene names/IDs
*.gprofiler2.[source].sub_enriched_pathways.tsv should contain actual gene names in the DE_genes_names column

Actual behavior:

all_enriched_pathways.tsv file lacks the intersection column entirely
sub_enriched_pathways.tsv files have DE_genes_names column containing numeric values (same as DE_genes) instead of gene names

With this fix:

Enable g:Profiler evidence codes so the intersection column is emitted.
Populate sub-tables with both Ensembl IDs and symbols:
DE_genes_ids = original intersection IDs
DE_genes_names = gene symbols (from the DE table where available, else gprofiler2::gconvert), falling back to IDs if unmapped
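
The lookup order described above (DE table first, then converted symbols, then the raw ID) can be sketched as follows. This is an illustrative Python sketch, not the module's R code. The function name `resolve_gene_names`, the two dictionaries, and the comma-separated intersection format are assumptions, and the third gene ID is a made-up placeholder.

```python
def resolve_gene_names(intersection, de_symbols, converted=None):
    """Return (ids, names) strings for one enriched term.

    intersection: comma-separated gene IDs (assumed format).
    de_symbols:   ID -> symbol taken from the DE table (hypothetical dict).
    converted:    ID -> symbol from an ID-conversion step (hypothetical dict).
    """
    converted = converted or {}
    ids = [g.strip() for g in intersection.split(",") if g.strip()]
    names = []
    for gene_id in ids:
        # Prefer the DE table, then the converted symbol, then the ID itself.
        names.append(de_symbols.get(gene_id) or converted.get(gene_id) or gene_id)
    return ",".join(ids), ",".join(names)


# Illustrative inputs only; the last ID is a placeholder with no mapping.
de_symbols = {"ENSMUSG00000000001": "Gnai3"}
converted = {"ENSMUSG00000000028": "Cdc45"}
ids, names = resolve_gene_names(
    "ENSMUSG00000000001,ENSMUSG00000000028,ENSMUSG00000099999",
    de_symbols,
    converted,
)
print(ids)
print(names)
```

Falling back to the ID keeps both columns the same length, so each position in DE_genes_names still lines up with its entry in DE_genes_ids.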

nextflow run . -profile test,docker \
  --gprofiler2_run true \
  --gprofiler2_organism mmusculus \
  --gprofiler2_evcodes true \
  --outdir test_gprofile_symbols

  • *all_enriched_pathways.tsv now contains intersection.
  • *sub_enriched_pathways.tsv now has DE_genes_ids and DE_genes_names (symbols present; IDs used as fallback).
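
One way to confirm the new columns after a test run is to inspect the TSV header. The following is a minimal Python sketch; the helper name `missing_columns` and the sample header (including `term_name`) are assumptions for illustration, not the module's actual output schema.

```python
import csv
import io


def missing_columns(tsv_text, required):
    """Return the required column names that are absent from the TSV header."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    return [col for col in required if col not in header]


# Hypothetical header and row standing in for a sub_enriched_pathways.tsv file.
sample = "term_name\tDE_genes_ids\tDE_genes_names\nexample term\tID1,ID2\tSym1,ID2\n"
missing = missing_columns(sample, ["DE_genes_ids", "DE_genes_names"])
print(missing)
```

For a real run, the same helper could be applied to the text of each file under the chosen `--outdir`; an empty list means all required columns are present.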

Notes

  • Output file names are unchanged; the sub tables gain a DE_genes_ids column.


Labels

good first issue Good for newcomers
