Key Points
- DNA pooling is an effective way of reducing the cost of genotyping in large-scale association studies.
- DNA pools should be constituted with care to ensure that equal amounts of DNA are contributed by the individuals that make up a pool.
- Accurate quantitative genotyping assays are available for use on pooled DNA.
- Differential amplification occurs for many single-nucleotide polymorphisms, and this bias should be corrected in the estimation of allele frequency from pooled DNA.
- A two-stage design, in which positive marker loci from pooling studies are followed by confirmatory individual genotyping, might represent the best trade-off between the cost savings of pooling and the full information that is provided by individual genotyping.
- Random experimental errors in the constitution of DNA pools and in the measurement of allele frequencies from pooled DNA should be taken into account in statistical analysis.
- Sophisticated pooling designs are being developed that can take account of hidden population stratification, confounders and interactions, and that allow the analysis of haplotypes.
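Two of the points above, correcting for differential amplification and allowing for experimental error in the analysis, can be illustrated with a short sketch. The formulas are a commonly used approach and are an assumption here, because this summary does not give them: the correction factor `k` is taken as the mean ratio of the two allele signals in known heterozygotes, and the test adds a per-pool measurement variance to the usual binomial sampling variance. All function and parameter names (`amplification_ratio`, `corrected_frequency`, `pooled_z`, `error_sd`) are hypothetical.

```python
import math
from statistics import mean


def amplification_ratio(het_signals):
    """Estimate k as the mean A/B signal ratio in known heterozygotes.

    In a heterozygote the true allele ratio is 1:1, so any departure of
    the signal ratio from 1 reflects unequal amplification or detection.
    """
    return mean(a / b for a, b in het_signals)


def corrected_frequency(signal_a, signal_b, k):
    """Frequency of allele A in a pool, scaling the B signal by k."""
    return signal_a / (signal_a + k * signal_b)


def pooled_z(p_case, p_control, n_case, n_control, error_sd):
    """z statistic for the case-control difference in pooled allele frequency.

    Variance = binomial sampling variance (2n chromosomes per group)
             + measurement variance (error_sd squared for each of two pools).
    """
    p_bar = (n_case * p_case + n_control * p_control) / (n_case + n_control)
    sampling_var = p_bar * (1 - p_bar) * (1 / (2 * n_case) + 1 / (2 * n_control))
    pooling_var = 2 * error_sd ** 2
    return (p_case - p_control) / math.sqrt(sampling_var + pooling_var)


# Illustrative (made-up) signals from three heterozygous individuals.
k = amplification_ratio([(1.2, 1.0), (1.3, 1.0), (1.1, 1.0)])
raw = 60 / (60 + 40)                      # uncorrected estimate from pool signals
adjusted = corrected_frequency(60, 40, k)  # estimate after correction
z_ignoring_error = pooled_z(0.30, 0.25, 500, 500, error_sd=0.0)
z_with_error = pooled_z(0.30, 0.25, 500, 500, error_sd=0.02)
```

With these made-up numbers the correction lowers the frequency estimate, and including the measurement variance gives a smaller z statistic than ignoring it, which is why omitting pooling error from the analysis tends to overstate significance.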
Abstract
DNA pooling is a practical way to reduce the cost of large-scale association studies to identify susceptibility loci for common diseases. Pooling allows allele frequencies in groups of individuals to be measured using far fewer PCR reactions and genotyping assays than are used when genotyping individuals. Here, we discuss recent developments in quantitative genotyping assays and in the design and analysis of pooling studies. Sophisticated pooling designs are being developed that can take account of hidden population stratification, confounders and inter-loci interactions, and that allow the analysis of haplotypes.
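The saving in assays can be made concrete with simple arithmetic. The sketch below is illustrative only: the study size, number of pools and number of replicate measurements per pool are assumed values, not figures from the article, and `assay_counts` is a hypothetical helper.

```python
def assay_counts(n_individuals, n_markers, n_pools, replicates_per_pool):
    """Compare the number of genotyping assays needed with and without pooling.

    Individual genotyping needs one assay per individual per marker.
    Pooled genotyping needs one assay per pool replicate per marker.
    """
    individual = n_individuals * n_markers
    pooled = n_pools * replicates_per_pool * n_markers
    return individual, pooled, individual / pooled


# Assumed design: 500 cases + 500 controls, 1,000 markers,
# one case pool and one control pool, each measured 4 times.
individual, pooled, fold_saving = assay_counts(
    n_individuals=1000, n_markers=1000, n_pools=2, replicates_per_pool=4
)
```

Under these assumptions, individual genotyping needs 1,000,000 assays and pooling needs 8,000, a 125-fold reduction. The saving shrinks as more pools or replicates are used to control experimental error.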
Acknowledgements
P.S. was supported by grants from the UK Medical Research Council, the Wellcome Trust and the National Eye Institute.
Glossary
- STUTTER BANDS
  The signals that indicate the presence of DNA fragments that are one or two repeats shorter than the true allele, owing to a 'slippage' artefact that arises during PCR.
- FLUORIMETRY
  An assay for measuring DNA concentration in which a fluorescent dye is used that intercalates quantitatively between stacked DNA base pairs.
- QUANTITATIVE REAL-TIME PCR
  A procedure in which the PCR is tracked as it progresses, by monitoring the accumulating signal that is provided by a fluorescent dye released during each PCR cycle.
- MALDI–TOF
  A mass spectrometry method in which laser-vaporized PCR fragments are accelerated through a vacuum by an electric field until they strike a detector. The time taken for the fragments to travel from the plate to the detector is measured and depends on the mass-to-charge ratio of each molecule, so providing a way to distinguish between allele-specific products.
- TAQMAN™
  A proprietary system that allows the progression of a PCR to be monitored in real time.
- QUANTITATIVE TRAIT
  A measurable trait that depends on the cumulative action of many genes and that can vary among individuals over a given range to produce a continuous distribution of phenotypes. Common examples include height, weight and blood pressure.
- PEARSON χ²-TEST
  A statistical test that is used to assess whether the frequencies of individuals in different categories of one or more qualitative variables are consistent with those frequencies that are predicted under a certain hypothesis.
- QUALITATIVE TRAIT
  A trait for which there is a sharp distinction between phenotypes: the trait is usually either present or absent. Often only one or a few genes are involved in the expression of qualitative traits.
- RELATIVE RISK
  The ratio of the risk of developing a disease in individuals who have been exposed to a risk factor to that in individuals who have not been exposed to the risk factor.
- POPULATION STRATIFICATION
  The presence of multiple population subgroups that show limited inter-breeding. When such subgroups differ both in allele frequency and in disease prevalence, this can lead to erroneous results in association studies.
- HAPLOTYPE
  The configuration of alleles at two or more loci on a single chromosome of a given individual.
- LINKAGE DISEQUILIBRIUM
  This occurs when the frequency of a particular haplotype for two or more loci deviates significantly from that expected from the product of the observed allelic frequencies at each locus.
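Three of the glossary entries (Pearson test, relative risk and linkage disequilibrium) correspond to simple formulas, sketched below for the two-by-two and two-locus cases. These are the standard textbook expressions rather than anything specified in the article. The function names and example numbers are hypothetical, and the sketch assumes biallelic loci and non-zero row and column totals.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of disease in the exposed group divided by risk in the unexposed group."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 d.f., no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    D is the haplotype frequency minus the product of the allele
    frequencies; r2 rescales D squared by the allele-frequency variances.
    """
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Made-up counts: 30 of 100 exposed and 20 of 100 unexposed individuals affected.
rr = relative_risk(30, 100, 20, 100)
chi2 = pearson_chi2_2x2(30, 70, 20, 80)
# Made-up frequencies: haplotype AB at 0.4, alleles A and B each at 0.5.
d, r2 = linkage_disequilibrium(0.4, 0.5, 0.5)
```

In the last example the haplotype is more common than the 0.25 expected under independence, so D is positive.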
Cite this article
Sham, P., Bader, J., Craig, I. et al. DNA Pooling: a tool for large-scale association studies. Nat Rev Genet 3, 862–871 (2002). https://doi.org/10.1038/nrg930