dbSNP Archive Common SNPs(150) Track Settings


Simple Nucleotide Polymorphisms (dbSNP 150) Found in >= 1% of Samples

Track collection: dbSNP Track Archive

Description

All tracks in this collection (29)

dbSNP 153	Short Genetic Variants from dbSNP release 153
Common SNPs(151)	Simple Nucleotide Polymorphisms (dbSNP 151) Found in >= 1% of Samples
All SNPs(151)	Simple Nucleotide Polymorphisms (dbSNP 151)
Flagged SNPs(151)	Simple Nucleotide Polymorphisms (dbSNP 151) Flagged by dbSNP as Clinically Assoc
Mult. SNPs(151)	Simple Nucleotide Polymorphisms (dbSNP 151) That Map to Multiple Genomic Loci
Mult. SNPs(150)	Simple Nucleotide Polymorphisms (dbSNP 150) That Map to Multiple Genomic Loci
All SNPs(150)	Simple Nucleotide Polymorphisms (dbSNP 150)
Common SNPs(150)	Simple Nucleotide Polymorphisms (dbSNP 150) Found in >= 1% of Samples
Flagged SNPs(150)	Simple Nucleotide Polymorphisms (dbSNP 150) Flagged by dbSNP as Clinically Assoc
Mult. SNPs(147)	Simple Nucleotide Polymorphisms (dbSNP 147) That Map to Multiple Genomic Loci
Flagged SNPs(147)	Simple Nucleotide Polymorphisms (dbSNP 147) Flagged by dbSNP as Clinically Assoc
Common SNPs(147)	Simple Nucleotide Polymorphisms (dbSNP 147) Found in >= 1% of Samples
All SNPs(147)	Simple Nucleotide Polymorphisms (dbSNP 147)
Mult. SNPs(146)	Simple Nucleotide Polymorphisms (dbSNP 146) That Map to Multiple Genomic Loci
Flagged SNPs(146)	Simple Nucleotide Polymorphisms (dbSNP 146) Flagged by dbSNP as Clinically Assoc
Common SNPs(146)	Simple Nucleotide Polymorphisms (dbSNP 146) Found in >= 1% of Samples
All SNPs(146)	Simple Nucleotide Polymorphisms (dbSNP 146)
Mult. SNPs(144)	Simple Nucleotide Polymorphisms (dbSNP 144) That Map to Multiple Genomic Loci
Flagged SNPs(144)	Simple Nucleotide Polymorphisms (dbSNP 144) Flagged by dbSNP as Clinically Assoc
Common SNPs(144)	Simple Nucleotide Polymorphisms (dbSNP 144) Found in >= 1% of Samples
All SNPs(144)	Simple Nucleotide Polymorphisms (dbSNP 144)
Mult. SNPs(142)	Simple Nucleotide Polymorphisms (dbSNP 142) That Map to Multiple Genomic Loci
Flagged SNPs(142)	Simple Nucleotide Polymorphisms (dbSNP 142) Flagged by dbSNP as Clinically Assoc
Common SNPs(142)	Simple Nucleotide Polymorphisms (dbSNP 142) Found in >= 1% of Samples
All SNPs(142)	Simple Nucleotide Polymorphisms (dbSNP 142)
Mult. SNPs(141)	Simple Nucleotide Polymorphisms (dbSNP 141) That Map to Multiple Genomic Loci
Flagged SNPs(141)	Simple Nucleotide Polymorphisms (dbSNP 141) Flagged by dbSNP as Clinically Assoc
Common SNPs(141)	Simple Nucleotide Polymorphisms (dbSNP 141) Found in >= 1% of Samples
All SNPs(141)	Simple Nucleotide Polymorphisms (dbSNP 141)


Include Chimp state and observed human alleles in name:
(If enabled, chimp allele is displayed first, then '>', then human alleles).
Show alleles on strand of reference genome reported by dbSNP:

Use Gene Tracks for Functional Annotation

On details page, show function and coding differences relative to:

GENCODE V47	NCBI RefSeq: RefSeq All	NCBI RefSeq: RefSeq Curated	NCBI RefSeq: RefSeq Predicted
NCBI RefSeq: UCSC RefSeq	NCBI RefSeq: RefSeq Select and MANE	NCBI RefSeq: RefSeq HGMD	NCBI RefSeq: RefSeq Historical
All GENCODE V47: Genes: Basic	All GENCODE V46: Genes: Basic	All GENCODE V47: Genes: Comprehensive	All GENCODE V45: Genes: Basic
All GENCODE V46: Genes: Comprehensive	All GENCODE V47: Genes: Pseudogenes	All GENCODE V44: Genes: Basic	All GENCODE V45: Genes: Comprehensive
All GENCODE V46: Genes: Pseudogenes	All GENCODE V43: Genes: Basic	All GENCODE V44: Genes: Comprehensive	All GENCODE V45: Genes: Pseudogenes
All GENCODE V47: PolyA	All GENCODE V42: Genes: Basic	All GENCODE V43: Genes: Comprehensive	All GENCODE V44: Genes: Pseudogenes
All GENCODE V46: PolyA	All GENCODE V42: Genes: Comprehensive	All GENCODE V43: Genes: Pseudogenes	All GENCODE V41: Genes: Basic
All GENCODE V45: PolyA	All GENCODE V42: Genes: Pseudogenes	All GENCODE V40: Genes: Basic	All GENCODE V41: Genes: Comprehensive
All GENCODE V44: PolyA	All GENCODE V39: Genes: Basic	All GENCODE V40: Genes: Comprehensive	All GENCODE V43: PolyA
All GENCODE V41: Genes: Pseudogenes	All GENCODE V38: Genes: Basic	All GENCODE V39: Genes: Comprehensive	All GENCODE V42: PolyA
All GENCODE V40: Genes: Pseudogenes	All GENCODE V41: 2-Way: 2-way Pseudogenes	All GENCODE V37: Genes: Basic	All GENCODE V38: Genes: Comprehensive
All GENCODE V39: Genes: Pseudogenes	All GENCODE V40: 2-Way: 2-way Pseudogenes	All GENCODE V41: PolyA	All GENCODE V36: Genes: Basic
All GENCODE V37: Genes: Comprehensive	All GENCODE V38: Genes: Pseudogenes	All GENCODE V39: 2-Way: 2-way Pseudogenes	All GENCODE V40: PolyA
All GENCODE V35: Genes: Basic	All GENCODE V36: Genes: Comprehensive	All GENCODE V37: Genes: Pseudogenes	All GENCODE V38: 2-Way: 2-way Pseudogenes
All GENCODE V39: PolyA	All GENCODE V35: Genes: Comprehensive	All GENCODE V36: Genes: Pseudogenes	All GENCODE V37: 2-Way: 2-way Pseudogenes
All GENCODE V34: Genes: Basic	All GENCODE V38: PolyA	All GENCODE V35: Genes: Pseudogenes	All GENCODE V36: 2-Way: 2-way Pseudogenes
All GENCODE V33: Genes: Basic	All GENCODE V34: Genes: Comprehensive	All GENCODE V37: PolyA	All GENCODE V35: 2-Way: 2-way Pseudogenes
All GENCODE V32: Genes: Basic	All GENCODE V33: Genes: Comprehensive	All GENCODE V36: PolyA	All GENCODE V34: Genes: Pseudogenes
All GENCODE V31: Genes: Basic	All GENCODE V32: Genes: Comprehensive	All GENCODE V35: PolyA	All GENCODE V33: Genes: Pseudogenes
All GENCODE V34: 2-Way: 2-way Pseudogenes	All GENCODE V30: Genes: Basic	All GENCODE V31: Genes: Comprehensive	All GENCODE V32: Genes: Pseudogenes
All GENCODE V33: 2-Way: 2-way Pseudogenes	All GENCODE V34: PolyA	All GENCODE V29: Genes: Basic	All GENCODE V30: Genes: Comprehensive
All GENCODE V31: Genes: Pseudogenes	All GENCODE V32: 2-Way: 2-way Pseudogenes	All GENCODE V33: PolyA	All GENCODE V28: Genes: Basic
All GENCODE V29: Genes: Comprehensive	All GENCODE V30: Genes: Pseudogenes	All GENCODE V31: 2-Way: 2-way Pseudogenes	All GENCODE V32: PolyA
All GENCODE V28: Genes: Comprehensive	All GENCODE V29: Genes: Pseudogenes	All GENCODE V30: 2-Way: 2-way Pseudogenes	All GENCODE V27: Genes: Basic
All GENCODE V31: PolyA	All GENCODE V28: Genes: Pseudogenes	All GENCODE V29: 2-Way: 2-way Pseudogenes	All GENCODE V26: Genes: Basic
All GENCODE V27: Genes: Comprehensive	All GENCODE V30: PolyA	All GENCODE V28: 2-Way: 2-way Pseudogenes	All GENCODE V25: Genes: Basic
All GENCODE V26: Genes: Comprehensive	All GENCODE V29: PolyA	All GENCODE V27: Genes: Pseudogenes	All GENCODE V24: Genes: Basic
All GENCODE V25: Genes: Comprehensive	All GENCODE V28: PolyA	All GENCODE V26: Genes: Pseudogenes	All GENCODE V27: 2-Way: 2-way Pseudogenes
All GENCODE V23: Genes: Basic	All GENCODE V24: Genes: Comprehensive	All GENCODE V25: Genes: Pseudogenes	All GENCODE V26: 2-Way: 2-way Pseudogenes
All GENCODE V27: PolyA	All GENCODE V22: Genes: Basic	All GENCODE V23: Genes: Comprehensive	All GENCODE V24: Genes: Pseudogenes
All GENCODE V25: 2-Way: 2-way Pseudogenes	All GENCODE V26: PolyA	All GENCODE V22: Genes: Comprehensive	All GENCODE V23: Genes: Pseudogenes
All GENCODE V24: 2-Way: 2-way Pseudogenes	All GENCODE V25: PolyA	All GENCODE V22: Genes: Pseudogenes	All GENCODE V23: 2-Way: 2-way Pseudogenes
GENCODE V20 (Ensembl 76): Genes: Basic	All GENCODE V24: PolyA	All GENCODE V22: 2-Way: 2-way Pseudogenes	GENCODE V20 (Ensembl 76): Genes: Comprehensive
All GENCODE V23: PolyA	All GENCODE V22: PolyA	GENCODE V20 (Ensembl 76): Genes: Pseudogenes	GENCODE V20 (Ensembl 76): 2-Way: 2-way Pseudogenes
GENCODE V20 (Ensembl 76): PolyA	CCDS	Old UCSC Genes	Other RefSeq
Prediction Archive: AUGUSTUS	Prediction Archive: Geneid Genes	Prediction Archive: Genscan Genes	Non-coding RNA: lincRNA TUCP
Prediction Archive: SGP Genes	Prediction Archive: SIB Genes

Filtering Options

Coloring Options

SNP Feature for Color Specification:

The color options below apply to the attributes of the "Feature for Color Specification" selected above. Only the color options for the selected feature are used to color items; options for other features are not shown. If a SNP has more than one of these attributes, the stronger color overrides the weaker one. The order of colors, from strongest to weakest, is red, green, blue, gray, and black.

Unknown			Locus			Coding - Synonymous			Coding - Non-Synonymous
Untranslated			Intron			Splice Site
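The override rule described above amounts to picking the strongest color among a SNP's attributes. The sketch below is a minimal illustration and not the Genome Browser's code. The function name, the category-to-color mapping in the example, and the fallback to black for a SNP with no colored attribute are all hypothetical.

```python
# Strongest-to-weakest color order, as stated for the coloring options.
COLOR_STRENGTH = ["red", "green", "blue", "gray", "black"]


def display_color(attribute_colors):
    """Return the strongest color among the colors assigned to a SNP's attributes.

    attribute_colors: iterable of color names, one per attribute the SNP has.
    Unrecognized names are ignored; if none remain, fall back to black
    (an assumption made for this sketch).
    """
    colors = [c for c in attribute_colors if c in COLOR_STRENGTH]
    if not colors:
        return "black"
    # A lower index in COLOR_STRENGTH means a stronger color.
    return min(colors, key=COLOR_STRENGTH.index)


# Hypothetical mapping from attribute to configured color.
configured = {"Coding - Non-Synonymous": "red", "Intron": "blue", "Locus": "gray"}
snp_attributes = ["Intron", "Locus", "Coding - Non-Synonymous"]
print(display_color(configured[a] for a in snp_attributes))  # red
```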

Data schema/format description and download

Assembly: Human Dec. 2013 (GRCh38/hg38)
Data last updated at UCSC: 2017-04-17

Description

This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels), collectively called Simple Nucleotide Polymorphisms, from dbSNP build 150, available from ftp.ncbi.nlm.nih.gov/snp. Only SNPs that have a minor allele frequency (MAF) of at least 1% and map to a single location in the reference genome assembly are included in this subset. Because frequency data are not available for all SNPs, this subset is incomplete. When determining MAF, allele counts from all submissions that include frequency data are combined; for example, the allele counts from the 1000 Genomes Project and from an independent submitter may be pooled for the same variant.
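Pooling allele counts before computing MAF can change whether a variant passes the 1% threshold. The sketch below illustrates only the frequency part of the selection (it does not check the single-mapping requirement). The function names and the example counts are hypothetical, and taking MAF as the frequency of the second most common allele is a common convention assumed here rather than a documented UCSC rule.

```python
from collections import Counter


def pooled_maf(submissions):
    """Pool allele counts across submissions and return the minor allele frequency.

    submissions: list of dicts mapping allele -> number of chromosomes observed.
    Returns None when no counts are available (no frequency data), and 0.0 when
    only one allele was observed. MAF is taken as the frequency of the second
    most common allele (an assumed convention).
    """
    pooled = Counter()
    for counts in submissions:
        pooled.update(counts)
    total = sum(pooled.values())
    if total == 0:
        return None
    ranked = sorted(pooled.values(), reverse=True)
    return ranked[1] / total if len(ranked) > 1 else 0.0


def is_common(maf, threshold=0.01):
    """True when frequency data exist and MAF meets the 1% threshold."""
    return maf is not None and maf >= threshold


# Hypothetical counts: one study alone sees T below 1%, but pooling lifts it above.
study_a = {"C": 995, "T": 5}    # MAF 0.005 on its own
study_b = {"C": 180, "T": 20}   # MAF 0.100 on its own
maf = pooled_maf([study_a, study_b])
print(round(maf, 4), is_common(maf))  # 0.0208 True
```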

dbSNP provides download files in the Variant Call Format (VCF) that include a "COMMON" flag in the INFO column. This flag is determined by a different method and generally marks a superset of the UCSC Common set. dbSNP uses frequency data from the 1000 Genomes Project only, and considers a variant COMMON if it has a MAF of at least 0.01 in any of the five super-populations:

African (AFR)
Admixed American (AMR)
East Asian (EAS)
European (EUR)
South Asian (SAS)

In build 151 (which has replaced build 150 on the dbSNP website and download site), dbSNP marks approximately 38M variants as COMMON; 23M of those have a global MAF < 0.01. The remaining variants should agree with UCSC's Common subset.
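The difference between the two definitions can be sketched as two predicates. This is an illustration under stated assumptions: the function names are hypothetical, the per-super-population MAFs are taken as given, and the UCSC predicate uses a single pooled MAF plus the single-location requirement. A variant that is common within one super-population but rare overall passes the dbSNP test and fails the UCSC one, which is one way the COMMON set can be larger.

```python
SUPER_POPULATIONS = ("AFR", "AMR", "EAS", "EUR", "SAS")


def dbsnp_common(superpop_maf, threshold=0.01):
    """dbSNP-style COMMON: MAF >= threshold in any 1000 Genomes super-population.

    superpop_maf: dict mapping super-population code -> MAF (missing = no data).
    """
    return any(superpop_maf.get(pop, 0.0) >= threshold for pop in SUPER_POPULATIONS)


def ucsc_common(pooled_maf, mapped_locations, threshold=0.01):
    """UCSC-style Common: pooled MAF >= threshold and exactly one mapped location."""
    return pooled_maf is not None and pooled_maf >= threshold and mapped_locations == 1


# Hypothetical variant: common in AFR only, rare when all samples are pooled.
superpops = {"AFR": 0.03, "AMR": 0.002, "EAS": 0.0, "EUR": 0.0, "SAS": 0.001}
print(dbsnp_common(superpops), ucsc_common(0.006, mapped_locations=1))  # True False
```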

The selection of SNPs with a minor allele frequency of 1% or greater is an attempt to identify variants that appear to be reasonably common in the general population. Taken as a set, common variants should be less likely to be associated with severe genetic diseases due to the effects of natural selection, following the view that deleterious variants are not likely to become common in the population. However, the significance of any particular variant should be interpreted only by a trained medical geneticist using all available information.

The remainder of this page is identical on the following tracks:

Common SNPs(150) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to the reference assembly.
Flagged SNPs(150) - SNPs with < 1% minor allele frequency (MAF) (or unknown MAF), mapping only once to the reference assembly, flagged in dbSNP as "clinically associated" (not necessarily a risk allele!).
Mult. SNPs(150) - SNPs mapping to more than one place on the reference assembly.
All SNPs(150) - all SNPs from dbSNP mapping to the reference assembly.

Interpreting and Configuring the Graphical Display

Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed one base wide, and multiple nucleotide variants are represented by a block that spans two or more bases.

On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes:

Class: Describes the observed alleles
- Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles)
- In-del - insertion/deletion
- Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'
- Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats
- Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/-
- No Variation - the submission reports an invariant region in the surveyed sequence
- Mixed - the cluster contains submissions from multiple classes
- Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1
- Insertion - the polymorphism is an insertion relative to the reference assembly
- Deletion - the polymorphism is a deletion relative to the reference assembly
- Unknown - no classification provided by data contributor
Validation: Method used to validate the variant (each variant may be validated by more than one method)
- By Frequency - at least one submitted SNP in cluster has frequency data submitted
- By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method
- By Submitter - at least one submitted SNP in cluster was validated by independent assay
- By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes
- By HapMap (human only) - submitted by HapMap project
- By 1000Genomes (human only) - submitted by 1000Genomes project
- Unknown - no validation has been reported for this variant
Function: dbSNP's predicted functional effect of the variant on RefSeq transcripts, both curated (NM_* and NR_*), as in the RefSeq Genes track, and predicted (XM_* and XR_*), which are not shown in the UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser.
- Unknown - no functional classification provided (possibly intergenic)
- synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon)
- intron_variant - A transcript variant occurring within an intron (dbSNP term: intron)
- downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3)
- upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5)
- nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA)
- stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense)
- missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense)
- stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss)
- frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift)
- inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel)
- 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3)
- 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5)
- splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3)
- splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5)
In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors. If a SNP has more than one of these attributes, the stronger color will override the weaker color. The order of colors, from strongest to weakest, is red, green, blue, gray, and black.
- Locus: downstream_gene_variant, upstream_gene_variant
- Coding - Synonymous: synonymous_variant
- Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel
- Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant
- Intron: intron_variant
- Splice Site: splice_acceptor_variant, splice_donor_variant
- Non-coding (ncRNA): nc_transcript_variant (colored blue)
Molecule Type: Sample used to find this variant
- Genomic - variant discovered using a genomic template
- cDNA - variant discovered using a cDNA template
- Unknown - sample type not known
Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found:
- AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete.
- DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed).
- FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
- MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly.
- NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
- NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
- NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect.
- ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N).
- ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed.
- ObservedTooLong - Observed allele not given (length too long).
- ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class.
- RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range.
- RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele.
- SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
- SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
Another condition, which does not necessarily imply any problem, is noted:
- SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two).
Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details)
- Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies.
- Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information.
- Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA.
- Submitted by Locus-Specific Database - At least one of the SNP's submitters is a database of variants in a particular gene. These variants may or may not be known to be causative.
- MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed.
- MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed.
- Genotype Conflict - Quality check: different genotypes have been submitted for the same individual.
- Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles.
- Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles.
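Three of the frequency- and allele-based conditions above reduce to simple arithmetic checks. The sketch below is a hypothetical illustration of those checks as described (tolerance 0.01, observed alleles written as slash-separated strings such as "A/C/G"), not UCSC's implementation; the function names are invented for this example.

```python
NUCLEOTIDES = {"A", "C", "G", "T"}


def allele_freq_sum_not_1(freqs, tol=0.01):
    """AlleleFreqSumNot1: allele frequencies do not sum to 1.0 within +-tol."""
    return abs(sum(freqs) - 1.0) > tol


def non_integer_chrom_count(freqs, chrom_count, tol=0.01):
    """NonIntegerChromCount: some frequency times the total number of
    chromosomes (2N) is farther than tol from a whole number."""
    for freq in freqs:
        count = freq * chrom_count
        if abs(count - round(count)) > tol:
            return True
    return False


def single_class_multiallelic(observed):
    """Return the tri-/quad-allelic note for a single-base observed string, else None."""
    alleles = set(observed.upper().split("/"))
    if not alleles <= NUCLEOTIDES:
        return None  # not class single (e.g. contains '-' or longer alleles)
    if len(alleles) == 3:
        return "SingleClassTriAllelic"
    if len(alleles) == 4:
        return "SingleClassQuadAllelic"
    return None


print(allele_freq_sum_not_1([0.5, 0.45]))             # True
print(non_integer_chrom_count([0.333, 0.667], 100))   # True: about 33.3 chromosomes
print(single_class_multiallelic("A/C/G"))             # SingleClassTriAllelic
```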

Several other properties do not have coloring options, but do have some filtering options:

Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters.
- Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.
Weight: Alignment quality assigned by dbSNP. Before dbSNP build 147, weight had values 1, 2 or 3, with 1 being the highest quality (mapped to a single genomic location). As of dbSNP build 147, dbSNP releases only variants with weight 1.
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples).
AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP.
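For a single population, the standard expected heterozygosity is one minus the sum of squared allele frequencies, which is why a bi-allelic site cannot exceed 0.5 (reached when both alleles have frequency 0.5). The sketch below uses that textbook formula; dbSNP's averaged value and its standard error follow the method in the computation document cited above, which may differ in detail. The function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) for allele frequencies p_i."""
    return 1.0 - sum(p * p for p in freqs)


def chromosomes_assayed(individuals):
    """Sample size in chromosomes (2N) for N diploid individuals."""
    return 2 * individuals


print(expected_heterozygosity([0.5, 0.5]))   # 0.5, the bi-allelic maximum
print(expected_heterozygosity([0.25] * 4))   # 0.75, possible only with more than 2 alleles
print(chromosomes_assayed(1000))             # 2000
```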

You can configure this track so that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath the heading "On details page, show function and coding differences relative to". When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), using the same keywords as the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq.

Insertions/Deletions

dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'.
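The length comparison can be sketched as follows. This assumes dbSNP's slash-separated observed-allele notation with '-' for a zero-length allele; the function name is hypothetical and this is not UCSC's code.

```python
def refine_indel_class(ref_allele, observed):
    """Reclassify dbSNP 'in-del' by comparing allele lengths.

    ref_allele: the reference allele, with '-' meaning zero length.
    observed: slash-separated observed alleles, e.g. '-/AT'.
    """
    def allele_length(allele):
        return 0 if allele == "-" else len(allele)

    ref_len = allele_length(ref_allele)
    # Compare against every observed allele other than the reference itself.
    other_lengths = [allele_length(a) for a in observed.split("/") if a != ref_allele]
    if other_lengths and all(ref_len < n for n in other_lengths):
        return "insertion"
    if other_lengths and all(ref_len > n for n in other_lengths):
        return "deletion"
    return "in-del"


print(refine_indel_class("-", "-/AT"))      # insertion
print(refine_indel_class("AT", "-/AT"))     # deletion
print(refine_indel_class("A", "-/A/ATT"))   # in-del
```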

UCSC Re-alignment of flanking sequences

dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition.

Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period >= 12) is shown in lower case, and matching bases are indicated by a "+".
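The match row between the genomic and flanking sequences can be sketched from the rules above: "|" for a match in non-repetitive (upper-case) sequence and "+" for a match in repetitive (lower-case) sequence. The text does not say how mismatches or gaps are drawn, so this hypothetical sketch leaves them blank.

```python
def match_line(genomic, flank):
    """Build the row of match symbols shown between aligned genomic and flanking bases.

    genomic: aligned genomic bases; lower case marks repetitive sequence.
    flank: aligned flanking-sequence bases (same length), upper case.
    Matches get '|' (non-repetitive) or '+' (repetitive); mismatches and gaps
    ('-') get a space, which is an assumption of this sketch.
    """
    if len(genomic) != len(flank):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for g, f in zip(genomic, flank):
        if g != "-" and g.upper() == f.upper():
            marks.append("+" if g.islower() else "|")
        else:
            marks.append(" ")
    return "".join(marks)


print(repr(match_line("ACgtA", "ACGTT")))  # '||++ ' (last position is a mismatch)
```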

Data Sources and Methods

The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/database/data/organism_data/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh38p7/database/data/organism_data/ for hg38. The fasta files were downloaded from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/rs_fasta/ for hg19 and from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b150_GRCh38p7/rs_fasta/ for hg38.

Coordinates, orientation, location type and dbSNP reference allele data were obtained from b150_SNPContigLoc_N.bcp.gz and b150_ContigInfo_N.bcp.gz. (N = 105 for hg19, 107 for hg38)
b150_SNPMapInfo_N.bcp.gz provided the alignment weights.
Functional classification was obtained from b150_SNPContigLocusId_N.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms.
Validation status and heterozygosity were obtained from SNP.bcp.gz.
SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz.
Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz.
SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details.
The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.

Data Access

The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38 and hg19 (snp150*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.
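For automated analysis of a downloaded file, a minimal reader might look like the sketch below. The column list is an assumption based on the UCSC snp150 table schema (check the table's schema description before relying on it), as is the convention that list-valued columns such as alleleFreqs are comma-separated with a trailing comma. The function names are hypothetical.

```python
import gzip

# Assumed column order of the snp150 tables; verify against the schema page.
SNP150_COLUMNS = [
    "bin", "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "refNCBI", "refUCSC", "observed", "molType", "class", "valid",
    "avHet", "avHetSE", "func", "locType", "weight", "exceptions",
    "submitterCount", "submitters", "alleleFreqCount", "alleles",
    "alleleNs", "alleleFreqs", "bitfields",
]


def split_list(value):
    """Split a comma-separated list field, dropping the trailing empty item."""
    return [item for item in value.split(",") if item]


def parse_snp_row(line):
    """Turn one tab-separated row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(SNP150_COLUMNS):
        raise ValueError(f"expected {len(SNP150_COLUMNS)} columns, got {len(fields)}")
    row = dict(zip(SNP150_COLUMNS, fields))
    # UCSC tables use 0-based, half-open coordinates.
    row["chromStart"] = int(row["chromStart"])
    row["chromEnd"] = int(row["chromEnd"])
    row["alleles"] = split_list(row["alleles"])
    row["alleleFreqs"] = [float(f) for f in split_list(row["alleleFreqs"])]
    return row


def iter_snps(path):
    """Yield parsed rows from a gzip-compressed table dump."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            yield parse_snp_row(line)


# Usage (file name follows the snp150*.txt.gz pattern; not verified here):
# for snp in iter_snps("snp150Common.txt.gz"):
#     print(snp["name"], snp["chrom"], snp["chromStart"], snp["alleleFreqs"])
```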

Orthologous Alleles (human assemblies only)

For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meet the criteria:

class = 'single'
mapped position in the human reference genome is one base long
aligned to only one location in the human reference genome
not aligned to a chrN_random chrom
biallelic (not tri- or quad-allelic)

In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero).
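The candidate filter and the placeholder values can be sketched as below. The record keys (including n_alignments) and the representation of a liftOver result as an (allele, start, end) tuple, with None for a failed lift and an allele of None when the base is unknown, are hypothetical choices for this illustration; the lifting itself is done by liftOver and is not reproduced here.

```python
def is_ortholog_candidate(snp):
    """Check the listed criteria on a SNP record (a dict with hypothetical keys)."""
    alleles = set(snp["observed"].split("/"))
    return (
        snp["class"] == "single"
        and snp["chromEnd"] - snp["chromStart"] == 1   # one base long
        and snp["n_alignments"] == 1                    # aligned to one location
        and not snp["chrom"].endswith("_random")        # not on a chrN_random chrom
        and len(alleles) == 2                           # bi-allelic
    )


def ortholog_record(lifted):
    """Apply the placeholder rules to a liftOver result.

    lifted: None if the lift failed, else (allele, start, end) where allele
    may be None when the orthologous base is unknown.
    """
    if lifted is None:
        return ("?", 0, 0)
    allele, start, end = lifted
    return (allele if allele else "N", start, end)


snp = {"class": "single", "chromStart": 100, "chromEnd": 101,
       "n_alignments": 1, "chrom": "chr1", "observed": "A/G"}
print(is_ortholog_candidate(snp))   # True
print(ortholog_record(None))        # ('?', 0, 0)
```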

Masked FASTA Files (human assemblies only)

FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download: GRCh37/hg19, GRCh38/hg38. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.
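Masking a position with an IUPAC code amounts to looking up the code for the set of alleles at that position. The sketch below is a hypothetical illustration: it includes the reference base in the allele set, skips any entry that is not purely single-base (matching the note that insertions and deletions were not used), and preserves the case of the original base, which is an assumption about the files. It does not apply UCSC's additional filtering of problematic SNPs.

```python
# Standard IUPAC nucleotide ambiguity codes for sets of two or more bases.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def mask_sequence(seq, substitutions):
    """Replace each substituted position with the IUPAC code for its alleles.

    seq: reference sequence string.
    substitutions: dict of 0-based position -> observed string like 'A/G'.
    """
    bases = list(seq)
    for pos, observed in substitutions.items():
        alleles = set(observed.upper().split("/")) | {bases[pos].upper()}
        if not alleles <= set("ACGT"):
            continue  # skip entries that are not single-base substitutions
        code = IUPAC.get(frozenset(alleles))
        if code:
            bases[pos] = code if bases[pos].isupper() else code.lower()
    return "".join(bases)


print(mask_sequence("ACGTacgt", {1: "C/T", 5: "C/G"}))  # AYGTasgt
```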

References

Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783