Range of balance correction #950

@oganesson76

Dear Author, thank you for providing such a useful tool. I have two sample cool files: sdata_2000.cool and rdata_2000.cool. Following your tutorial's instructions, I first normalized them:
hicNormalize -m sdata_2000.cool rdata_2000.cool --normalize smallest -o sdata_2000_norm.cool rdata_2000_norm.cool --setToZeroThreshold 1
Then I corrected them using KR correction:
hicCorrectMatrix correct --matrix sdata_2000_norm.cool --correctionMethod KR --outFileName sdata_2000_KR.cool
hicCorrectMatrix correct --matrix rdata_2000_norm.cool --correctionMethod KR --outFileName rdata_2000_KR.cool
Since I prefer using cooltools, I examined the weight columns in these cool files.
$ cooler dump --table bins sdata_2000_KR.cool
chr1 0 2000 1.11513
chr1 2000 4000 1.21425
chr1 4000 6000 0.900315
chr1 6000 8000 0.827184
chr1 8000 10000 0.763178
I am curious as to why the values in these weight columns are all clustered around 1. When I previously normalised using cooler balance (which defaults to ICE), the values in the weight column were quite low, typically ranging from 0.01 to 0.1. For example, sdata_2000_balance.cool is the file produced by running cooler balance:
$ cooler dump --table bins sdata_2000_balance.cool
chr1 0 2000 0.00240566
chr1 2000 4000 0.0025555
chr1 4000 6000 0.00190824
chr1 6000 8000 0.00178005
Subsequently, I ran the ICE method of hicCorrectMatrix for further validation:
hicCorrectMatrix diagnostic_plot --matrix sdata_500_norm.cool -o sdata_500_norm_diag.png
hicCorrectMatrix correct -m sdata_500_norm.cool --correctionMethod ICE --filterThreshold -2.8 4 -o sdata_500_ice.cool
$ cooler dump --table bins sdata_500_ice.cool
chr1 0 500 0.952762
chr1 500 1000 0.938808
chr1 1000 1500 0.722369
chr1 1500 2000 0.813357
chr1 2000 2500 1.16379
I've noticed that the values in this weight column remain close to 1, even though cooler balance also uses ICE. Why then do the two sets of weight values differ so significantly? Below is a screenshot from the cooltools tutorial (https://cooltools.readthedocs.io/en/latest/notebooks/viz.html), where you can see that its weight values are all quite small.

Image
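To make the question above concrete, here is a minimal pure-Python sketch of ICE-style balancing on a made-up 3x3 matrix. It assumes the `weight` column holds multiplicative weights, so that a balanced value is `count * w[i] * w[j]`, and it rescales the weights so that each row sums to 1. Whether HiCExplorer's output follows the same scaling convention is an assumption here, not something confirmed in this issue; the function names and the toy matrix are hypothetical.

```python
import math


def row_sums(counts, weights):
    """Row sums of the balanced matrix counts[i][j] * w[i] * w[j]."""
    n = len(counts)
    return [
        sum(counts[i][j] * weights[i] * weights[j] for j in range(n))
        for i in range(n)
    ]


def ice_weights(counts, n_iter=200):
    """Iteratively equalise row sums, then rescale them to 1 (toy ICE sketch)."""
    n = len(counts)
    w = [1.0] * n
    for _ in range(n_iter):
        sums = row_sums(counts, w)
        mean = sum(sums) / n
        # Divide each weight by its row's coverage relative to the mean.
        w = [w[i] * mean / sums[i] for i in range(n)]
    # Rows now share one common sum; divide by its square root so it becomes 1.
    scale = sum(row_sums(counts, w)) / n
    return [x / math.sqrt(scale) for x in w]


# Hypothetical symmetric contact counts, not taken from the files in this issue.
counts = [[10, 4, 2], [4, 8, 3], [2, 3, 6]]

w_unit = ice_weights(counts)           # row sums of about 1
c = 100.0
w_scaled = [c * x for x in w_unit]     # same balancing, different global scale

print(row_sums(counts, w_unit))
print(row_sums(counts, w_scaled))
```

In this sketch both weight vectors balance the matrix equally well: multiplying every weight by a constant `c` leaves the rows equal to each other and only changes their common sum to `c**2`. Two tools could therefore report very different weight magnitudes while describing the same relative correction.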

Subsequently, I used Python to examine it further.
The sdata_2000_KR.cool file generated by using the KR method:

Image

It can be seen that the sum of the values in each row is approximately 235,175. But I recall that after KR correction the sum of each row or column should be close to 1. Is my result normal? Is this the expected output from HiCExplorer?
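As a rough sanity check on the numbers quoted in this issue, the sketch below asks what constant factor would turn weights that give row sums of 1 into weights that give row sums of about 235,175. It assumes multiplicative weights (balanced value = `count * w[i] * w[j]`), under which a row sum of `S` corresponds to scaling every weight by `sqrt(S)`. The weight values are copied from the `cooler dump` output above; the interpretation is a hypothesis, not a statement about how HiCExplorer works.

```python
import math

# Approximate row sum reported above for sdata_2000_KR.cool.
kr_row_sum = 235175.0

# Under the multiplicative-weight assumption, row sums scale with factor**2.
factor = math.sqrt(kr_row_sum)

# First four weights from sdata_2000_balance.cool and sdata_2000_KR.cool.
cooler_weights = [0.00240566, 0.0025555, 0.00190824, 0.00178005]
kr_weights = [1.11513, 1.21425, 0.900315, 0.827184]

rescaled = [w * factor for w in cooler_weights]
for cw, rw, kw in zip(cooler_weights, rescaled, kr_weights):
    print(f"cooler {cw:.8f} -> rescaled {rw:.4f} (KR weight {kw})")
```

The rescaled cooler weights land within about 10% of the KR weights for these four bins. That is consistent with the two outputs differing mainly in global scale, though KR and ICE are different algorithms and an exact match should not be expected.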

The sdata_500_ice.cool file generated using the ICE method:

Image

Running cooler balance on the sdata_2000_norm.cool file generated by hicNormalize produces sdata_2000_balance.cool:

Image

This is a situation I'm quite familiar with; you can see that the sum of every row is around 1. I would like to ask the following questions:

1. Are the results generated using hicNormalize and hicCorrectMatrix (whether KR or ICE) as described above reasonable, and simply not scaled to the 0-1 range? If so, may I read this matrix using `mat = clr.matrix(balance=True)[:]`, in particular with the `balance=True` parameter?
2. May I use hicNormalize to normalise the cool files and then proceed directly to correction using cooler balance? I am unsure whether you are familiar with cooler balance, so this question may be somewhat presumptuous. Or do you consider this method of scaling values to fall between 0 and 1 unreasonable?

Apologies for the rather lengthy list of questions. Thank you for taking the time to answer them amidst your busy schedule.
