Tips for running on a haplotype-resolved reference genome #942

@marija-kra

Hi,

We just assembled a genome de novo and managed to resolve the haplotypes (most chromosomes have two copies, one has four). The organism's genome is a bit peculiar: it has some hemizygous regions, and some of the chromosome copies differ substantially, to the point where determining the karyotype wasn't trivial.

I would also like to run HiCExplorer on this genome; however, when I tried it 'as is', most of the data was discarded (see below). I presume this is caused by non-unique mapping, since both haplotypes are present in the reference. I also suspect this is why we get so many interchromosomal contacts.

How would you suggest dealing with this? Is there anything that can be done beyond:

  • playing around with minimum mapping quality
  • running HiCExplorer with hap1 and hap2 separately (ideally we would prefer not doing this)
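
For reference, the second option could be prepared roughly as follows. This is only a minimal sketch: it assumes the haplotype is encoded as a suffix of each sequence name (`_hap1`, `_hap2`), which is a hypothetical naming scheme and would need adapting to the actual assembly (for example, for the chromosome with four copies). The function name `split_fasta_by_haplotype` is made up for illustration.

```python
from collections import defaultdict


def split_fasta_by_haplotype(fasta_text, suffixes=("_hap1", "_hap2")):
    """Group FASTA records by a haplotype suffix in the sequence name.

    The suffixes are hypothetical; adjust them to the assembly's naming.
    Records whose name matches no suffix go under the key "unassigned".
    """
    groups = defaultdict(list)
    current = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The sequence name is the first whitespace-separated token.
            name = line[1:].split()[0]
            current = next(
                (s for s in suffixes if name.endswith(s)), "unassigned"
            )
        if current is not None:
            groups[current].append(line)
    return {key: "\n".join(lines) + "\n" for key, lines in groups.items()}


if __name__ == "__main__":
    demo = ">chr1_hap1\nACGT\n>chr1_hap2\nACGA\n>scaffold9\nTT\n"
    for key, text in split_fasta_by_haplotype(demo).items():
        print(key, text.count(">"), "record(s)")
```

Each resulting FASTA text could then be written to its own file and indexed separately before mapping.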

Thanks!


I used bwa mem for mapping with the settings recommended in your docs and ran hicBuildMatrix with default settings.

Sequenced reads 274841122
Min rest. site distance 300
Max library insert size 1000

count (percentage w.r.t. total sequenced reads)

Pairs mappable, unique and high quality 16024497 (5.83)
Hi-C contacts 6920723 (2.52)
One mate unmapped 14912468 (5.43)
One mate not unique 173998712 (63.31)
Low mapping quality 69905445 (25.43)

count (percentage w.r.t. mappable, unique and high quality pairs)

dangling end GATC (restriction sequence GATC) 1616 (0.01)
dangling end AATT (restriction sequence AATT) 5365 (0.03)
self ligation (removed) 3053633 (19.06)
One mate not close to rest site 0 (0.00)
same fragment 1794660 (11.20)
self circle 190154 (1.19)
duplicated pairs 4248500 (26.51)

count (percentage w.r.t. total valid pairs used)

inter chromosomal 3341476 (48.28)
Intra short range (< 20kb) 1398180 (20.20)
Intra long range (>= 20kb) 2181067 (31.52)
Read pair type: inward pairs 802255 (11.59)
Read pair type: outward pairs 1022407 (14.77)
Read pair type: left pairs 886001 (12.80)
Read pair type: right pairs 868584 (12.55)
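
As a quick check on the first block of the log, the short sketch below sums the pairs lost to ambiguous mapping. The counts are copied from the log above; the dictionary keys and the `percent` helper are labels introduced here for illustration only.

```python
# Counts copied from the hicBuildMatrix QC log above.
total_pairs = 274841122
counts = {
    "usable": 16024497,                # mappable, unique and high quality
    "one_mate_unmapped": 14912468,
    "one_mate_not_unique": 173998712,
    "low_mapping_quality": 69905445,
}


def percent(count, total=total_pairs):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Pairs discarded because a mate was not unique or had low mapping quality.
lost_to_ambiguity = (
    counts["one_mate_not_unique"] + counts["low_mapping_quality"]
)

print("lost to ambiguous mapping (%):", percent(lost_to_ambiguity))
print("categories sum to total:", sum(counts.values()) == total_pairs)
```

The four categories add up exactly to the total number of sequenced pairs, and the two ambiguity-related categories together account for roughly 89% of them, which fits the suspicion that duplicated haplotype sequence in the reference is the main cause of the loss.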
