SNP Discovery and Genotyping
Quality Control (QC)/ Quality Assurance (QA):
SNP Discovery:
Genes are submitted for sequencing along with LocusLink IDs and accession numbers. These are checked for consistency, and the cDNA sequence and coding sequence are extracted from the accession. The cDNA is aligned with the latest assembly of the human genome so that PCR assays can be designed from genomic sequence. Genes for which the cDNA fails to align properly to the genome (proper alignment requires at least 95% of the coding region to align) are set aside for future attempts as the genome assembly becomes more complete. For the remaining genes, primers are designed to leave 80 bases between the primer and the targeted exon(s), which avoids the lower-quality sequence near the beginning of the sequencing reaction. Assays are designed to be approximately 500 bases long; where overlapping assays are required to cover large exons, they are designed with 80 bases of overlap. Once primers are designed, each primer pair (“assay”) is characterized by amplifying 3 DNAs and scoring the products on an agarose gel. Assays producing a single band in 2 or 3 of the 3 DNAs are scored as passing (typical results are 3/3 single bands); remaining assays are set aside for future primer resynthesis or redesign. The current success rate is 87%.
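The assay layout and characterization rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the half-open coordinate convention, and the fixed 500-base window are hypothetical, and the sketch ignores primer placement, primer quality, and everything else a real primer-design tool would consider.

```python
def tile_assays(exon_start, exon_end, flank=80, assay_len=500, overlap=80):
    """Return half-open (start, end) windows covering an exon plus its flanks.

    Assumption: each window stands for the region between a primer pair,
    so the 80-base flank keeps the primers away from the exon itself.
    """
    target_start = exon_start - flank
    target_end = exon_end + flank
    # A small exon fits in a single assay.
    if target_end - target_start <= assay_len:
        return [(target_start, target_end)]
    # Large exons get overlapping windows; consecutive windows share `overlap` bases.
    step = assay_len - overlap
    windows = []
    start = target_start
    while True:
        end = min(start + assay_len, target_end)
        windows.append((start, end))
        if end >= target_end:
            break
        start += step
    return windows


def assay_passes(single_band_results):
    """Pass if at least 2 of the 3 test DNAs gave a single band on the gel."""
    return sum(1 for ok in single_band_results if ok) >= 2


small = tile_assays(1000, 1200)
large = tile_assays(1000, 2000)
```

Here `small` is a single window and `large` is three windows whose neighbours overlap by 80 bases; an assay scored `[True, True, False]` would pass characterization.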
Once the assay has passed characterization, the samples to be sequenced are amplified; spot checking on an agarose gel is performed again to catch systematic failures. PCR products are purified and sent to the Sequencing Facility at 320 Charles Street, where fluorescent dideoxy sequencing occurs in a fully automated, bar-coded process on ABI 3700 capillary sequencers. The QA and QC for this process are those used to sequence reads for the human genome. Once sequence data have been obtained, gels are scored by two independent observers using Consed as a viewer and PolyPhred to call heterozygotes; heterozygotes are also identified manually. At polymorphic positions, genotypes are tentatively assigned to all samples by PolyPhred with some additional manual review. Gels with pass rates of < 60% are redone (a passing lane has at least 40% of bases with Phred scores > 20). The overall pass rate is 88%, and passing lanes typically have well over 70% of bases with Phred scores > 20.
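The lane and gel thresholds above amount to two simple checks. The following is a minimal sketch, assuming each lane is available as a list of per-base Phred scores and that a gel's pass rate means the fraction of its lanes that pass; the function names are hypothetical and are not part of Phred, PolyPhred, or Consed.

```python
def lane_passes(phred_scores, min_quality=20, min_fraction=0.40):
    """A lane passes if at least 40% of its bases have Phred score > 20."""
    if not phred_scores:
        return False
    good = sum(1 for q in phred_scores if q > min_quality)
    return good / len(phred_scores) >= min_fraction


def gel_needs_redo(lanes, min_pass_rate=0.60):
    """A gel is redone if fewer than 60% of its lanes pass."""
    if not lanes:
        return True
    passed = sum(1 for lane in lanes if lane_passes(lane))
    return passed / len(lanes) < min_pass_rate


good_lane = [30] * 4 + [10] * 6   # exactly 40% of bases above Q20
bad_lane = [20] * 10              # no base strictly above Q20
gel = [good_lane] * 5 + [bad_lane] * 5
```

With these inputs `good_lane` passes, `bad_lane` fails, and `gel` has a 50% pass rate, so it would be flagged for a redo.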
Once the gels have been scored, polymorphisms are screened for departures from Hardy-Weinberg equilibrium and for triallelic markers (both of which suggest problems with scoring), with additional review of markers failing these screens. Candidate polymorphisms are validated either 1) by resequencing one or more of the DNAs and again observing the polymorphism or 2) by observing multiple genotypes in a genotyping reaction. Once candidate polymorphisms have been validated, they are submitted to the PGA webpage and to dbSNP.
SNP Genotyping:
To genotype SNPs in the Framingham population, we are using a MassArray platform provided by Sequenom. This methodology involves multiplex PCR amplification of up to 7 SNPs. Excess nucleotides are removed by treatment with shrimp alkaline phosphatase, and multiplex primer extension is performed using a mix of deoxy- and dideoxynucleotides, such that products of different masses are obtained for each allele of each SNP. These products are resolved by mass spectrometry (MALDI-TOF). Automated allele-calling algorithms are used to process the data.
Quality control is achieved by typing families (non-Mendelian transmissions give an estimate of error rate) and by typing replicate samples (at least 5% of total). In our hands, the average error rate with this method is < 1%; SNPs with error rates of > 2% are excluded or regenotyped. SNPs with genotype call rates of < 75% are excluded or regenotyped. SNPs are also checked for Hardy-Weinberg equilibrium (HWE), and SNPs out of HWE (P < 0.01) are excluded or regenotyped.
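The three exclusion criteria above can be combined into a single per-SNP filter. The sketch below is illustrative only: the text does not say which Hardy-Weinberg test is used, so a one-degree-of-freedom chi-square goodness-of-fit test is assumed here, and the function names and the genotype-count input format are hypothetical.

```python
import math


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions for a biallelic SNP."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes called")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0                    # monomorphic: nothing to test
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(call_rate, error_rate, genotype_counts,
                  min_call_rate=0.75, max_error_rate=0.02, min_hwe_p=0.01):
    """Apply the call-rate, error-rate, and HWE thresholds from the text."""
    if call_rate < min_call_rate:
        return False
    if error_rate > max_error_rate:
        return False
    return hwe_p_value(*genotype_counts) >= min_hwe_p


in_hwe = (25, 50, 25)    # matches expected proportions for p = 0.5
out_of_hwe = (50, 0, 50) # no heterozygotes at all
```

A SNP failing any one check would be excluded or regenotyped. For SNPs with small expected genotype counts, an exact test may be more appropriate than the chi-square approximation assumed here.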