Short Variants HPRC Variants <= 3bp Track Settings
 
HPRC VCF variants filtered for item size <= 3 bp

Track collection: Short Variants

Haplotype sorting display

Enable Haplotype sorting display
Haplotype sorting order:
using middle variant in viewing window as anchor.
If this mode is selected and genotypes are phased or homozygous, then each genotype is split into two independent haplotypes. These local haplotypes are clustered by similarity around a central variant. Haplotypes are reordered for display using the clustering tree, which is drawn in the left label area. Local haplotype blocks can often be identified using this display.
To anchor the sorting to a particular variant, click on the variant in the genome browser, and then click on the 'Use this variant' button on the next page.
using the order in which samples appear in the underlying VCF file
Haplotype clustering tree leaf shape:
draw branches whose samples are all identical as <
draw branches whose samples are all identical as [
Allele coloring scheme:
reference alleles invisible, alternate alleles in black
reference alleles in blue, alternate alleles in red
first base of allele (A = red, C = blue, G = green, T = magenta)
Haplotype sorting display height:

Filters

Exclude variants with Quality/confidence score (QUAL) less than
Exclude variants with these FILTER values:
PASS (All filters passed)
Minimum minor allele frequency (if INFO column includes AF or AC+AN):


Display data as a density graph:

Source data version: August 2023
Assembly: Human Dec. 2013 (GRCh38/hg38)

Description

This track shows short variants, each a few base pairs long, found by aligning HPRC genomes to the hg38 reference assembly. The alignment was made with the Minigraph-Cactus approach described in the references below.

There are three subtracks in this superTrack:

  1. All short variants up to 50bp, without any length filter
  2. All short variants <= 3 bp long
  3. All short variants > 3 bp long

VCF Decomposition from HPRC Pangenome Resources Github: "The Raw VCF files contain a site for each bubble in the graph. Nested bubbles will result in overlapping sites. The nesting relationships are denoted with the PS (parent snarl), LV (level) and AT (allele traversal) tags and need to be taken into account when interpreting the VCF. Alternatively, you can use the 'Decomposed VCFs' which have been normalized by using vcfbub to 'pop' bubbles with alleles larger than 100k and vcfwave to realign each alt (script). Note that in order to reproduce the PanGenie analyses from the papers, you should instead use the PanGenie HPRC Workflow. This workflow has a CHM13 branch to use when working with that reference.

The exact tools and commands used to produce the VCFs are given here."
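
As a rough illustration of the nesting tags mentioned in the quote, the sketch below reads LV and PS from a VCF INFO string. The function names and the example INFO strings are hypothetical. It assumes that LV=0 marks a top-level site and, purely for this example, treats a missing LV tag as level 0.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-style keys map to True."""
    tags = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        # Split on the first '=' only, since values such as PS contain '>' and '<'
        key, sep, value = item.partition("=")
        tags[key] = value if sep else True
    return tags


def nesting(info):
    """Return (level, parent_snarl) for one record.

    Assumption: a record without an LV tag is treated as level 0.
    """
    tags = parse_info(info)
    return int(tags.get("LV", 0)), tags.get("PS")


# Made-up INFO strings, for illustration only
print(nesting("AC=2;AN=88;LV=1;PS=>10>20"))  # (1, '>10>20')
print(nesting("AC=1;AN=88;LV=0"))            # (0, None)
```

A record with a level above 0 lies inside the site named by its PS tag, so counting it together with its parent would count the same region twice.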

Display Conventions and Configuration

The Name of each item is the pair of node labels that denote the site's location in the graph, with '>' and '<' denoting the forward and reverse orientation of the node. Mouseover on items in "squish" and "pack" modes shows the item's Name and Genotypes. Mouseover on items in "full" mode shows Alleles.
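
The naming convention above can be unpacked with a few lines of Python. This is a minimal sketch: the helper name `parse_site_name` and the example node labels are made up, and it assumes a name consists of exactly two oriented node labels.

```python
import re

# One step is an orientation character followed by a node label
_STEP = re.compile(r"([<>])([^<>]+)")


def parse_site_name(name):
    """Split a site name into two (orientation, node_label) pairs."""
    steps = _STEP.findall(name)
    # Rebuild the name from the matched steps to reject stray characters
    rebuilt = "".join(orient + label for orient, label in steps)
    if rebuilt != name or len(steps) != 2:
        raise ValueError(f"unexpected site name: {name!r}")
    return [("forward" if orient == ">" else "reverse", label)
            for orient, label in steps]


# Hypothetical name, for illustration only
print(parse_site_name(">1234<1240"))
# [('forward', '1234'), ('reverse', '1240')]
```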

Methods

The Minigraph-Cactus HPRC v1.0 graph was converted to VCF using vg deconstruct. The result was postprocessed with vcfbub to flatten nested sites, and then with vcfwave to normalize it by realigning alt alleles to the reference. All steps are described in Hickey et al. 2023. The postprocessing command lines and data can be found on GitHub. Finally, the resulting VCF was filtered by length and split into two VCFs using a cutoff of 3 bp.
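
The final length split could look roughly like the sketch below. It is not the script used for this track. The function names are hypothetical, and measuring length as the longest of the REF and ALT alleles is an assumption, since the text does not say how length was measured.

```python
def variant_length(ref, alts):
    """Length of the longest allele (assumed definition of variant size)."""
    return max(len(allele) for allele in [ref, *alts])


def split_by_length(vcf_lines, cutoff=3):
    """Partition VCF text lines into (header, short, long) lists."""
    header, short, long_ = [], [], []
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns 4 and 5 of a VCF record are REF and the comma-separated ALTs
        ref, alts = fields[3], fields[4].split(",")
        target = short if variant_length(ref, alts) <= cutoff else long_
        target.append(line)
    return header, short, long_


# Made-up records, for illustration only
lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t>1>3\tA\tG",
    "chr1\t200\t>4>7\tACGTA\tA",
]
header, short, long_ = split_by_length(lines)
print(len(header), len(short), len(long_))  # 2 1 1
```

Writing the shared header lines in front of each record list would give two standalone VCFs, one for each side of the cutoff.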

Credits

Thanks to Glenn Hickey for providing the HAL file from the HPRC project and for making these VCFs from it.

References

Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature. 2020 Nov;587(7833):246-251. PMID: 33177663; PMC: PMC7673649; DOI: 10.1038/s41586-020-2871-y

Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y, Human Pangenome Reference Consortium, Marschall T, Li H, Paten B. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nature Biotechnology. 2023 May 10. PMID: 37165083; DOI: 10.1038/s41587-023-01793-w

Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, Haussler D. Cactus: Algorithms for genome multiple sequence alignment. Genome Res. 2011 Sep;21(9):1512-28. PMID: 21665927; PMC: PMC3166836; DOI: 10.1101/gr.123356.111

Liao WW, Asri M, Ebler J, ...et al, Li H, Paten B. A draft human pangenome reference. Nature. 2023 May;617(7960):312-324. PMID: 37165242; PMC: PMC10172123; DOI: 10.1038/s41586-023-05896-x