Accurate whole human genome sequencing using reversible terminator chemistry

doi:10.1038/nature07517

Nature. 2008 Nov 6;456(7218):53-9.

David R Bentley¹, Shankar Balasubramanian, Harold P Swerdlow, Geoffrey P Smith, John Milton, Clive G Brown, Kevin P Hall, Dirk J Evers, Colin L Barnes, Helen R Bignell, Jonathan M Boutell, Jason Bryant, Richard J Carter, R Keira Cheetham, Anthony J Cox, Darren J Ellis, Michael R Flatbush, Niall A Gormley, Sean J Humphray, Leslie J Irving, Mirian S Karbelashvili, Scott M Kirk, Heng Li, Xiaohai Liu, Klaus S Maisinger, Lisa J Murray, Bojan Obradovic, Tobias Ost, Michael L Parkinson, Mark R Pratt, Isabelle M J Rasolonjatovo, Mark T Reed, Roberto Rigatti, Chiara Rodighiero, Mark T Ross, Andrea Sabot, Subramanian V Sankar, Aylwyn Scally, Gary P Schroth, Mark E Smith, Vincent P Smith, Anastassia Spiridou, Peta E Torrance, Svilen S Tzonev, Eric H Vermaas, Klaudia Walter, Xiaolin Wu, Lu Zhang, Mohammed D Alam, Carole Anastasi, Ify C Aniebo, David M D Bailey, Iain R Bancarz, Saibal Banerjee, Selena G Barbour, Primo A Baybayan, Vincent A Benoit, Kevin F Benson, Claire Bevis, Phillip J Black, Asha Boodhun, Joe S Brennan, John A Bridgham, Rob C Brown, Andrew A Brown, Dale H Buermann, Abass A Bundu, James C Burrows, Nigel P Carter, Nestor Castillo, Maria Chiara E Catenazzi, Simon Chang, R Neil Cooley, Natasha R Crake, Olubunmi O Dada, Konstantinos D Diakoumakos, Belen Dominguez-Fernandez, David J Earnshaw, Ugonna C Egbujor, David W Elmore, Sergey S Etchin, Mark R Ewan, Milan Fedurco, Louise J Fraser, Karin V Fuentes Fajardo, W Scott Furey, David George, Kimberley J Gietzen, Colin P Goddard, George S Golda, Philip A Granieri, David E Green, David L Gustafson, Nancy F Hansen, Kevin Harnish, Christian D Haudenschild, Narinder I Heyer, Matthew M Hims, Johnny T Ho, Adrian M Horgan, Katya Hoschler, Steve Hurwitz, Denis V Ivanov, Maria Q Johnson, Terena James, T A Huw Jones, Gyoung-Dong Kang, Tzvetana H Kerelska, Alan D Kersey, Irina Khrebtukova, Alex P Kindwall, Zoya Kingsbury, Paula I Kokko-Gonzales, Anil Kumar, Marc A Laurent, Cynthia T Lawley, Sarah E Lee, Xavier Lee, Arnold K 
Liao, Jennifer A Loch, Mitch Lok, Shujun Luo, Radhika M Mammen, John W Martin, Patrick G McCauley, Paul McNitt, Parul Mehta, Keith W Moon, Joe W Mullens, Taksina Newington, Zemin Ning, Bee Ling Ng, Sonia M Novo, Michael J O'Neill, Mark A Osborne, Andrew Osnowski, Omead Ostadan, Lambros L Paraschos, Lea Pickering, Andrew C Pike, Alger C Pike, D Chris Pinkard, Daniel P Pliskin, Joe Podhasky, Victor J Quijano, Come Raczy, Vicki H Rae, Stephen R Rawlings, Ana Chiva Rodriguez, Phyllida M Roe, John Rogers, Maria C Rogert Bacigalupo, Nikolai Romanov, Anthony Romieu, Rithy K Roth, Natalie J Rourke, Silke T Ruediger, Eli Rusman, Raquel M Sanches-Kuiper, Martin R Schenker, Josefina M Seoane, Richard J Shaw, Mitch K Shiver, Steven W Short, Ning L Sizto, Johannes P Sluis, Melanie A Smith, Jean Ernest Sohna Sohna, Eric J Spence, Kim Stevens, Neil Sutton, Lukasz Szajkowski, Carolyn L Tregidgo, Gerardo Turcatti, Stephanie Vandevondele, Yuli Verhovsky, Selene M Virk, Suzanne Wakelin, Gregory C Walcott, Jingwen Wang, Graham J Worsley, Juying Yan, Ling Yau, Mike Zuerlein, Jane Rogers, James C Mullikin, Matthew E Hurles, Nick J McCooke, John S West, Frank L Oaks, Peter L Lundberg, David Klenerman, Richard Durbin, Anthony J Smith

PMID: 18987734
PMCID: PMC2581791
DOI: 10.1038/nature07517

David R Bentley et al. Nature. 2008.

Affiliation

¹ Illumina Cambridge Ltd. (Formerly Solexa Ltd), Chesterford Research Park, Little Chesterford, Nr Saffron Walden, Essex CB10 1XL, UK. dbentley@illumina.com


Abstract

DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.
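The ">30x average depth of paired 35-base reads" figure can be put in perspective with simple coverage arithmetic: mean depth is the number of reads times the read length, divided by the genome size. The sketch below is only a back-of-envelope illustration; the 3 Gb genome size is a round-number assumption, and the function names are hypothetical rather than anything taken from the paper.

```python
def reads_needed(genome_size_bp: int, target_depth: int, read_length_bp: int) -> int:
    """Smallest number of reads whose total bases reach target_depth x genome size."""
    total_bases = genome_size_bp * target_depth
    # Integer ceiling division avoids floating-point rounding on large counts.
    return -(-total_bases // read_length_bp)


def mean_depth(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth = sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


# Assumption: a round 3 Gb haploid genome; the abstract gives 30x and 35-base reads.
GENOME_SIZE_BP = 3_000_000_000
n_reads = reads_needed(GENOME_SIZE_BP, target_depth=30, read_length_bp=35)
n_pairs = -(-n_reads // 2)  # each paired read contributes two 35-base reads

print(f"reads: {n_reads:,}  pairs: {n_pairs:,}")
print(f"depth check: {mean_depth(n_reads, 35, GENOME_SIZE_BP):.2f}x")
```

Under these assumptions the total comes to roughly 2.6 billion reads, which ignores unmapped, duplicate and filtered reads, so a real experiment would need more raw data than this lower bound.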

Figures

**Figure 1. Sample preparation**
a. DNA fragments are generated, e.g. by random shearing, and joined to a pair of oligonucleotides in a forked adapter configuration. The ligated products are amplified using two oligonucleotide primers, resulting in double-stranded blunt-ended material with a different adapter sequence on either end.
b. Formation of a clonal single-molecule array. DNA fragments prepared as in a are denatured and single strands are annealed to complementary oligonucleotides on the flowcell surface (hatched in the figure). A new strand (dotted) is copied from the original strand in an extension reaction that is primed from the 3’ end of the surface-bound oligonucleotide, and the original strand is then removed by denaturation. The adapter sequence at the 3’ end of each copied strand is annealed to a new surface-bound complementary oligonucleotide, forming a bridge and generating a new site for synthesis of a second strand (shown dotted). Multiple cycles of annealing, extension and denaturation in isothermal conditions result in growth of clusters, each ~1 micron in physical diameter. This follows the basic method outlined in ref.
c. The DNA in each cluster is linearised by cleavage within one adapter sequence (gap marked by an asterisk) and denatured, generating single-stranded template for sequencing by synthesis to obtain a sequence read (read 1; the sequencing product is shown dotted). To perform paired-read sequencing, the products of read 1 are removed by denaturation, the template is used to generate a bridge, the second strand is re-synthesised (shown dotted), and the opposite strand is then cleaved (gap marked by an asterisk) to provide the template for the second read (read 2).
d. Long-range paired-end sample preparation. To sequence the ends of a long (e.g. >1 kb) DNA fragment, the ends of each fragment are tagged by incorporation of biotinylated (B) nucleotide and then circularised, forming a junction between the two ends. Circularised DNA is randomly fragmented and the biotinylated junction fragments are recovered and used as starting material in the standard sample preparation procedure illustrated in a above. The orientation of the sequence reads relative to the DNA fragment is tracked in the figure by magenta arrows. When aligned to the reference sequence, these reads are oriented with their 5’ ends towards each other (in contrast to the short insert paired reads produced as shown in a–c). See fig S17a for examples of both.
Turquoise and blue lines represent oligonucleotides and red lines represent genomic DNA. Note that all surface-bound oligonucleotides are attached to the flowcell by their 5’ ends. Dotted lines indicate newly synthesized strands during cluster formation or sequencing. See supplementary methods for details.
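The paired-read scheme in panel c (read 1 from one strand, read 2 from the opposite strand after re-synthesis) can be mimicked in a toy model. This is a sketch only, assuming an idealised error-free fragment with adapters already removed; `paired_reads` and its `read_len` parameter are hypothetical names, and the choice of which strand counts as "forward" is arbitrary.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_reads(fragment: str, read_len: int = 35) -> tuple[str, str]:
    """Toy short-insert paired read: each read starts at the 5' end of one strand."""
    if len(fragment) < read_len:
        raise ValueError("fragment is shorter than the read length")
    read1 = fragment[:read_len]                      # read from the first strand
    read2 = reverse_complement(fragment)[:read_len]  # read from the opposite strand
    return read1, read2


read1, read2 = paired_reads("AAACCCGGGT", read_len=3)
print(read1, read2)  # AAA ACC
```

In this toy model the two reads come from opposite ends of the fragment and opposite strands. The long-insert protocol in panel d changes the relative orientation of the reads, as the caption notes, and is not modelled here.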

**Figure 2. X chromosome data**
a. Distribution of mapped read depth in the X chromosome dataset, sampled at every 50th position along the chromosome and displayed as a histogram (‘all’). An equivalent analysis of mapped read depth for the unique subset of these positions is also shown (‘unique only’). The solid line represents a Poisson distribution with the same mean. b. Distribution of X chromosome uniquely mapped reads as a function of GC content. Note that the x axis is % GC content and is scaled by percentile of unique sequence. The solid line is average mapped depth of unique sequence; the grey region is the central 80% of the data (10th to 90th centiles); the dashed lines are 10th and 90th centiles of a Poisson distribution with the same mean as the data.
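The Poisson comparison in this figure can be reproduced in outline: given a mean depth, compute the Poisson probabilities and the 10th and 90th centiles, then compare them with the spread of observed depths. The following is a minimal standard-library sketch; the depth values are made-up illustrative numbers, not data from the paper, and the helper names are hypothetical.

```python
import math
import statistics


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for X ~ Poisson(mean), computed in log space for stability."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def poisson_quantile(q: float, mean: float) -> int:
    """Smallest k whose cumulative probability is at least q (0 < q < 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    limit = int(mean + 50 * math.sqrt(mean) + 50)  # generous upper bound on k
    cumulative = 0.0
    for k in range(limit + 1):
        cumulative += poisson_pmf(k, mean)
        if cumulative >= q:
            return k
    return limit


def dispersion_index(depths: list[int]) -> float:
    """Variance-to-mean ratio; about 1 for Poisson data, larger if over-dispersed."""
    return statistics.pvariance(depths) / statistics.fmean(depths)


# Hypothetical depths sampled at regular positions (illustrative values only).
observed = [28, 31, 35, 22, 40, 30, 27, 33, 19, 36]
mean = statistics.fmean(observed)
print("Poisson 10th/90th centiles:", poisson_quantile(0.1, mean), poisson_quantile(0.9, mean))
print("dispersion index:", round(dispersion_index(observed), 2))
```

A dispersion index well above 1 would indicate more variation in depth than random sampling alone predicts, which is the kind of deviation the grey region and dashed lines in panel b are designed to expose.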

**Figure 3. SNPs identified in the human genome sequence of NA18507**
a. Number of SNPs detected by class and % in dbSNP (release 128). Results from ELAND and MAQ alignments are reported separately. b. Comparison of the SNPs detected in each analysis reveals extensive overlap. The % of NA18507 SNP calls that match previous entries in dbSNP is lower than that of our X chromosome study (see fig S6). We expect this because individual NA07340 (from the X study) was also previously used for discovery and submission of SNPs to dbSNP during the HapMap project, in contrast to NA18507.
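The comparison described in this caption (overlap between two call sets, and the percentage of each already present in a database) reduces to set operations. The sketch below uses tiny invented call sets purely for illustration; the tuple layout `(chromosome, position, allele)` and the function names are assumptions, not the paper's data format.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of part in whole, or 0.0 for an empty denominator."""
    return 100.0 * part / whole if whole else 0.0


def overlap_summary(calls_a: set, calls_b: set, known: set) -> dict:
    """Summarise the overlap of two SNP call sets and their match rate to known SNPs."""
    shared = calls_a & calls_b
    union = calls_a | calls_b
    return {
        "only_a": len(calls_a - calls_b),
        "only_b": len(calls_b - calls_a),
        "shared": len(shared),
        "pct_shared_of_union": percent(len(shared), len(union)),
        "pct_a_known": percent(len(calls_a & known), len(calls_a)),
        "pct_b_known": percent(len(calls_b & known), len(calls_b)),
    }


# Invented toy data: each SNP is (chromosome, position, alternate allele).
calls_from_aligner_a = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G")}
calls_from_aligner_b = {("chr1", 100, "A"), ("chr2", 40, "G"), ("chr3", 7, "C")}
known_snps = {("chr1", 100, "A"), ("chr3", 7, "C")}

summary = overlap_summary(calls_from_aligner_a, calls_from_aligner_b, known_snps)
for key, value in summary.items():
    print(key, value)
```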

**Figure 4. Homozygous complex rearrangement detected by anomalous paired reads. The rearrangement involves an inversion of 369 bp (blue-turquoise bar in the schematic) flanked by deletions (red bars) of 1206 and 164 bp, respectively, at the left and right hand breakpoints**
a. Summary tracks in the Resembl browser, denoting scale, simulated alignability of reads to reference (blue plot), actual aligned depth of coverage by NA18507 reads (green plot), density of anomalous reads indicating structural variants (red plot; peaks denote ‘hotspots’), and density of singleton reads (pink plot). b. Anomalous long insert read pairs (orange lines denote DNA fragment, blocks at either end denote each read); the data indicate loss of ~1.3 kb in NA18507 relative to the reference. c. Anomalous short insert pairs of two types (red and pink) indicate an inverted sequence flanked by two deletions. d. Normal short insert read pair alignments (each green line denotes the extent of the reference that is covered by the short fragment, including the two reads). e. The schematic depicts the arrangement of normal and anomalous read pairs relative to the rearrangement. Top line: structure of NA18507; second line: structure of reference sequence. Green bars denote sequence that is collinear in the reference and NA18507. The turquoise-blue bar illustrates the inverted segment. Red bars indicate the sequences present in the reference but absent in NA18507. Arrows denote orientation of reads when aligned to the reference. Note that the display in a–d is a composite of screen shots of the same window, overlapped for display purposes in this figure.
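The idea of flagging "anomalous" read pairs can be sketched as a simple rule on orientation and reference span. This is an illustrative classifier, not the method used in the paper: the `ReadPair` type, the category labels and the span thresholds are all hypothetical, and the expected orientation shown is the short-insert one (a long-insert library would need the opposite orientation rule).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    left_pos: int       # leftmost reference coordinate of the left-hand read
    left_strand: str    # '+' or '-'
    right_pos: int      # leftmost reference coordinate of the right-hand read
    right_strand: str   # '+' or '-'
    read_len: int = 35

    @property
    def span(self) -> int:
        """Reference distance covered by the pair, including both reads."""
        return self.right_pos + self.read_len - self.left_pos


def classify(pair: ReadPair, min_span: int, max_span: int) -> str:
    """Label a pair by comparing it with the expected orientation and span."""
    if pair.left_strand == pair.right_strand:
        return "inversion-like"          # both reads map to the same strand
    if (pair.left_strand, pair.right_strand) != ("+", "-"):
        return "orientation-anomalous"   # reads point away from each other
    if pair.span > max_span:
        return "deletion-like"           # sample lacks sequence present in reference
    if pair.span < min_span:
        return "insertion-like"          # sample has sequence absent from reference
    return "normal"


# Hypothetical thresholds for a library with ~200 bp fragments.
pairs = [
    ReadPair(1000, "+", 1165, "-"),   # span 200
    ReadPair(1000, "+", 2400, "-"),   # span 1435
    ReadPair(1000, "+", 1165, "+"),   # same strand
]
for p in pairs:
    print(p.span, classify(p, min_span=150, max_span=250))
```

A cluster of pairs sharing the same label at one locus, as in panels b and c, is much stronger evidence for a structural variant than any single anomalous pair.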

**Figure 5. Effect of sequence depth on coverage and accuracy of human genome sequencing. ELAND alignments were used for this analysis**
a. Accumulation of sequence-based SNP calls, including all SNPs (squares), heterozygous SNPs (triangles) and homozygous SNPs (circles) with increasing input read depth. b. Decrease in genotype positions not covered by sequence (squares), heterozygote undercalls in sequence data relative to genotype data (triangles) and discordant SNP calls compared to genotypes (circles) with increasing input read depth. Vertical dotted lines indicate various input read depths (10x, 15x, 30x haploid genome).
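The trends in this figure (fewer uncovered positions and fewer heterozygote undercalls as depth rises) can be illustrated with an idealised sampling model. The sketch assumes Poisson-distributed depth, equal sampling of both alleles, no sequencing errors, and a hypothetical rule that each allele must be seen at least `min_obs` times; none of these are stated as the paper's calling criteria.

```python
import math


def prob_uncovered(mean_depth: float) -> float:
    """P(depth = 0) under a Poisson model."""
    return math.exp(-mean_depth)


def prob_both_alleles_seen(n: int, min_obs: int) -> float:
    """P(each allele observed >= min_obs times) among n reads, Binomial(n, 0.5)."""
    if n < 2 * min_obs:
        return 0.0
    favourable = sum(math.comb(n, k) for k in range(min_obs, n - min_obs + 1))
    return favourable / 2 ** n


def het_detect_prob(mean_depth: float, min_obs: int = 2) -> float:
    """Average the per-depth detection probability over Poisson-distributed depth."""
    max_n = int(mean_depth + 10 * math.sqrt(mean_depth) + 20)  # truncate far tail
    total = 0.0
    for n in range(max_n + 1):
        p_depth = math.exp(n * math.log(mean_depth) - mean_depth - math.lgamma(n + 1))
        total += p_depth * prob_both_alleles_seen(n, min_obs)
    return total


for depth in (10, 15, 30):
    print(f"{depth}x  uncovered={prob_uncovered(depth):.2e}  "
          f"het detected={het_detect_prob(depth):.4f}")
```

Real data deviate from this model because of mapping ambiguity, GC-dependent coverage and base-calling errors, so the printed values indicate the shape of the curves rather than the values plotted in the figure.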

Comment in

Human genetics: Individual genomes diversify.
Levy S, Strausberg RL. Nature. 2008 Nov 6;456(7218):49-51. doi: 10.1038/456049a. PMID: 18987731. No abstract available.

Cited by

The Evolution of Next-Generation Sequencing Technologies.
Akintunde O, Tucker T, Carabetta VJ. Methods Mol Biol. 2025;2866:3-29. doi: 10.1007/978-1-0716-4192-7_1. PMID: 39546194. Review.
A mapping-free natural language processing-based technique for sequence search in nanopore long-reads.
Strzoda T, Cruz-Garcia L, Najim M, Badie C, Polanska J. BMC Bioinformatics. 2024 Nov 13;25(1):354. doi: 10.1186/s12859-024-05980-7. PMID: 39538122. Free PMC article.
Comparative Transcriptome Analysis of Cold Tolerance Mechanism in Honeybees (Apis mellifera sinisxinyuan).
Shan J, Cheng R, Magaoya T, Duan Y, Chen C. Insects. 2024 Oct 11;15(10):790. doi: 10.3390/insects15100790. PMID: 39452366. Free PMC article.
Avidity sequencing of whole genomes from retinal degeneration pedigrees identifies causal variants.
Biswas P, Villanueva A, Krajacich BJ, Moreno J, Zhao J, Berry AM, Lazaro D, Lajoie BR, Kruglyak S, Ayyagari R. PLoS One. 2024 Oct 4;19(10):e0307266. doi: 10.1371/journal.pone.0307266. eCollection 2024. PMID: 39365799. Free PMC article.
Telomere-to-telomere assemblies of cattle and sheep Y-chromosomes uncover divergent structure and gene content.
Olagunju TA, Rosen BD, Neibergs HL, Becker GM, Davenport KM, Elsik CG, Hadfield TS, Koren S, Kuhn KL, Rhie A, Shira KA, Skibiel AL, Stegemiller MR, Thorne JW, Villamediana P, Cockett NE, Murdoch BM, Smith TPL. Nat Commun. 2024 Sep 27;15(1):8277. doi: 10.1038/s41467-024-52384-5. PMID: 39333471. Free PMC article.

References

1. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004;431:931–945.
2. Levy S, et al. The diploid genome sequence of an individual human. PLoS Biol. 2007;5:e254.
3. Margulies M, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380.
4. Shendure J, et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science. 2005;309:1728–1732.
5. Harris TD, et al. Single-molecule DNA sequencing of a viral genome. Science. 2008;320:106–109.


