Description
These tracks show the regions unique to the T2T-CHM13 v2.0 assembly compared to the GRCh38/hg38 and GRCh37/hg19 reference assemblies.
Methods
Converting a chain file to the PAF format
We used the `to_paf.py` script from chaintools (https://doi.org/10.5281/zenodo.6342391, v0.1) to convert the v1_nfLO chains to the PAF format.
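In PAF, each record begins with the query name, query length, query start and query end (0-based, end-exclusive, like BED). As a result, `cut -f 1,3,4` in the commands that follow yields BED intervals on the query sequence. The sketch below is illustrative only: the record uses invented coordinates and `extract_query_bed` is a hypothetical helper name. Because the pipeline passes the CHM13 v2.0 `.fai` as the genome file, it presumably relies on CHM13 v2.0 being the query side of the converted PAF.

```shell
#!/bin/sh
# Hypothetical helper: keep PAF columns 1, 3 and 4 (query name, start, end),
# which line up with the first three BED columns.
extract_query_bed() {
  cut -f 1,3,4 "$@"
}

# Toy PAF record with the 12 mandatory columns (all values invented):
# qname qlen qstart qend strand tname tlen tstart tend nmatch alnlen mapq
printf 'chr1\t1000000\t10000\t20000\t+\tchr1\t1000000\t10500\t20500\t9990\t10000\t60\n' \
  | extract_query_bed
# Expected output (tab-separated): chr1 10000 20000
```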
Obtaining unique regions
We used the following commands to obtain, in BED format, the regions of T2T-CHM13 v2.0 that are not covered by alignments to GRCh38/hg38 and GRCh37/hg19.
cut -f 1,3,4 grch38-chm13v2.paf \
| bedtools sort -i - -g chm13v2.0.fasta.fai \
| bedtools merge -i - \
| bedtools complement -g chm13v2.0.fasta.fai -i - \
| bedtools merge -i - \
> T2T-CHM13v2.0_unique_regions_hg38.bed
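The merge-then-complement step above reports every stretch of a chromosome that no aligned interval touches. The awk sketch below mimics that logic for illustration only. It is not the method used for these tracks, which relied on bedtools. It assumes the BED input is sorted by start within each chromosome and that the genome file lists chromosome names and lengths in its first two columns, as a `.fai` does. `complement_bed` is a hypothetical helper name and the toy data are invented.

```shell
#!/bin/sh
# Hypothetical helper: print the gaps left after merging the intervals in a
# sorted BED file, walking chromosomes in genome-file order.
#   $1: genome file (chrom<TAB>length ...), e.g. a .fai index
#   $2: BED file sorted by start within each chromosome
complement_bed() {
  awk -v OFS='\t' '
    NR == FNR { order[++n] = $1; len[$1] = $2 + 0; next }  # genome file
    { k = ++cnt[$1]; s[$1, k] = $2 + 0; e[$1, k] = $3 + 0 }  # BED intervals
    END {
      for (i = 1; i <= n; i++) {
        c = order[i]; last = 0; m = cnt[c] + 0
        for (j = 1; j <= m; j++) {
          # Gap between the covered end so far and the next interval start
          if (s[c, j] > last) print c, last, s[c, j]
          # Overlapping or book-ended intervals just extend the covered end
          if (e[c, j] > last) last = e[c, j]
        }
        # Tail after the last interval (or the whole chromosome if none)
        if (len[c] > last) print c, last, len[c]
      }
    }' "$1" "$2"
}

genome=$(mktemp); bed=$(mktemp)
printf 'chrA\t100\nchrB\t50\n' > "$genome"
printf 'chrA\t10\t20\nchrA\t15\t30\nchrA\t30\t40\nchrA\t60\t100\n' > "$bed"
complement_bed "$genome" "$bed"
# Expected output (tab-separated):
# chrA 0 10
# chrA 40 60
# chrB 0 50
rm -f "$genome" "$bed"
```

Like `bedtools complement`, this reports chromosomes with no aligned intervals (here `chrB`) in full.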
cut -f 1,3,4 hg19-chm13v2.paf \
| bedtools sort -i - -g chm13v2.0.fasta.fai \
| bedtools merge -i - \
| bedtools complement -g chm13v2.0.fasta.fai -i - \
| bedtools merge -i - \
> T2T-CHM13v2.0_unique_regions_hg19.bed
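A quick sanity check on each output file is to total the bases it covers. The sketch below assumes non-overlapping intervals, which the final `bedtools merge` guarantees. `bed_total_bp` is a hypothetical helper and the demo intervals are invented.

```shell
#!/bin/sh
# Hypothetical helper: sum (end - start) over all BED records read from the
# given files, or from stdin when no file is given.
bed_total_bp() {
  # %.0f avoids exponent notation for large totals in some awk versions
  awk '{ total += $3 - $2 } END { printf "%.0f\n", total }' "$@"
}

printf 'chr1\t0\t10\nchr1\t40\t60\n' | bed_total_bp
# Expected output: 30
```

Applied to a real output, for example `bed_total_bp T2T-CHM13v2.0_unique_regions_hg38.bed`, it would report how many CHM13 v2.0 bases fall outside the GRCh38/hg38 alignments.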
Credits
The unique region annotations were generated by Nae-Chyun Chen and Mitchell Vollger.
References
Nurk S, Koren S, Rhie A, Rautiainen M, et al. The complete sequence of a human genome. bioRxiv, 2021.