STAR: ultrafast universal RNA-seq aligner

Bioinformatics. 2013 Jan 1;29(1):15-21.

doi: 10.1093/bioinformatics/bts635. Epub 2012 Oct 25.

Authors

Alexander Dobin¹, Carrie A Davis, Felix Schlesinger, Jorg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson, Thomas R Gingeras

Affiliation

¹ Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA. dobin@cshl.edu

PMID: 23104886
PMCID: PMC3530905
DOI: 10.1093/bioinformatics/bts635

Abstract

Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases.

Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy.
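The "sequential maximum mappable seed search in uncompressed suffix arrays" can be illustrated with a toy sketch. The Python below is not STAR's C++ implementation: the function names, the naive suffix-array construction and the tiny made-up genome are all assumptions chosen for readability, and only exact matching is shown (no mismatches, clustering, stitching or scoring). It finds the longest prefix of a read that occurs exactly in the genome, then repeats the search on the unmapped remainder, so a read spanning a splice junction is split into two seeds.

```python
from typing import List, Tuple


def build_suffix_array(genome: str) -> List[int]:
    """Naive suffix array: start positions sorted by suffix (toy genomes only)."""
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def maximum_mappable_prefix(read: str, genome: str, sa: List[int]) -> Tuple[int, List[int]]:
    """Return (length, genome positions) of the longest exactly matching read prefix."""
    # Binary search for the insertion point of the read among the sorted suffixes.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if genome[sa[mid]:] < read:
            lo = mid + 1
        else:
            hi = mid
    # The longest common prefix is attained by a neighbour of the insertion point.
    best = 0
    for k in (lo - 1, lo):
        if 0 <= k < len(sa):
            best = max(best, common_prefix_len(read, genome[sa[k]:]))
    if best == 0:
        return 0, []
    # Suffixes sharing that prefix are contiguous in the suffix array,
    # so walk outwards from the insertion point in both directions.
    prefix = read[:best]
    hits = []
    k = lo - 1
    while k >= 0 and genome.startswith(prefix, sa[k]):
        hits.append(sa[k])
        k -= 1
    k = lo
    while k < len(sa) and genome.startswith(prefix, sa[k]):
        hits.append(sa[k])
        k += 1
    return best, sorted(hits)


def sequential_mmp_seeds(read: str, genome: str, sa: List[int]) -> List[Tuple[int, int, List[int]]]:
    """Split a read into seeds: (offset in read, seed length, genome positions)."""
    seeds = []
    offset = 0
    while offset < len(read):
        length, positions = maximum_mappable_prefix(read[offset:], genome, sa)
        if length == 0:
            offset += 1  # toy handling: skip a base that maps nowhere
            continue
        seeds.append((offset, length, positions))
        offset += length
    return seeds


# Hypothetical toy genome: two exons separated by a GT...AG intron.
exon1, intron, exon2 = "ACGTTGCA", "GT" + "A" * 6 + "AG", "CCTGATCG"
genome = exon1 + intron + exon2
sa = build_suffix_array(genome)

# A read covering the end of exon 1 and the start of exon 2.
read = "GTTGCA" + "CCTGAT"
for offset, length, positions in sequential_mmp_seeds(read, genome, sa):
    print(offset, length, positions)
```

In this toy case the first seed ends at genome position 8 and the second starts at position 18, so the 10-base gap between them corresponds to the intron. A real aligner would go on to cluster and stitch such seeds and score the result, which this sketch does not attempt.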

Availability and implementation: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

Figures

**Fig. 1.**
Schematic representation of the Maximum Mappable Prefix search in the STAR algorithm for detecting (a) splice junctions, (b) mismatches and (c) tails

**Fig. 2.**
True-positive rate versus false-positive rate (ROC-curve) for simulated RNA-seq data for STAR, TopHat2, GSNAP, RUM and MapSplice

**Fig. 3.**
Various accuracy metrics for splice junction detection in the experimental RNA-seq data. The color-coding scheme for mappers is the same in all plots. X-axis in plots (a), (b), (d) and (e) is the detection threshold defined as the number of reads mapped across each junction, i.e. each point with the X-value of N represents all junctions that are supported by at least N reads mapped by a given aligner. (a) Total number of detected junctions, annotated (solid lines) and unannotated (dashed lines); (b) percentage of detected junctions that are annotated; (c) pseudo-ROC curve: percentage of all annotated junctions that are detected versus percentage of detected junctions that are unannotated; (d) number of unannotated junctions detected by at least two mappers (solid lines) and number of unannotated junctions detected exclusively by only one mapper (dashed lines); (e) percentage of detected unannotated junctions that are detected exclusively by only one mapper and (f) pseudo-ROC curve: percentage of unannotated junctions that are detected by at least two mappers versus percentage of detected unannotated junctions that are detected exclusively by only one mapper
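The detection-threshold metrics described for panels (a) and (b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the junction tuples, read counts and annotation set are made-up toy values, and the function name `junction_metrics` is hypothetical rather than taken from the paper or from STAR.

```python
from typing import Dict, Iterable, List, Set, Tuple

# A junction is identified here by (chromosome, intron start, intron end).
Junction = Tuple[str, int, int]


def junction_metrics(
    read_counts: Dict[Junction, int],
    annotated: Set[Junction],
    thresholds: Iterable[int],
) -> List[Tuple[int, int, int, float]]:
    """For each threshold N, report junctions supported by at least N reads.

    Returns rows of (N, annotated detected, unannotated detected,
    percentage of detected junctions that are annotated).
    """
    rows = []
    for n in thresholds:
        detected = {j for j, count in read_counts.items() if count >= n}
        n_annotated = len(detected & annotated)
        n_unannotated = len(detected) - n_annotated
        pct = 100.0 * n_annotated / len(detected) if detected else float("nan")
        rows.append((n, n_annotated, n_unannotated, pct))
    return rows


# Made-up example: reads mapped across each junction by one aligner.
toy_counts = {
    ("chr1", 100, 200): 12,
    ("chr1", 300, 400): 3,
    ("chr2", 50, 90): 1,
    ("chr2", 500, 800): 5,
}
# Made-up annotation; one annotated junction is never detected.
toy_annotated = {("chr1", 100, 200), ("chr2", 500, 800), ("chr3", 10, 20)}

for row in junction_metrics(toy_counts, toy_annotated, [1, 2, 5, 10]):
    print(row)
```

Raising the threshold discards weakly supported junctions, which in this toy data are the unannotated ones, so the annotated percentage rises with N.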

Cited by

Conserved glucokinase regulation in zebrafish confirms therapeutic utility for pharmacologic modulation in diabetes.
Schmitner N, Thumer S, Regele D, Mayer E, Bergerweiss I, Helker C, Stainier DYR, Meyer D, Kimmel RA. Commun Biol. 2024 Nov 23;7(1):1557. doi: 10.1038/s42003-024-07264-5. PMID: 39580550. Free PMC article.
Proteogenomic analysis reveals non-small cell lung cancer subtypes predicting chromosome instability, and tumor microenvironment.
Song KJ, Choi S, Kim K, Hwang HS, Chang E, Park JS, Shim SB, Choi S, Heo YJ, An WJ, Yang DY, Cho KC, Ji W, Choi CM, Lee JC, Kim HR, Yoo J, Ahn HS, Lee GH, Hwa C, Kim S, Kim K, Kim MS, Paek E, Na S, Jang SJ, An JY, Kim KP. Nat Commun. 2024 Nov 23;15(1):10164. doi: 10.1038/s41467-024-54434-4. PMID: 39580524. Free PMC article.
A multi-region single nucleus transcriptomic atlas of Parkinson's disease.
N M P, Fullard JF, Clarence T, Mathur D, Casey C, Hennigan E, Alvia M, Krause-Massaguer J, Barreda A, Davis DA, Vontell RT, Garamszegi SP, Vance JM, Sang L, Chatigny M, Vismer D, Landin B, Burstein D, Lee D, Voloudakis G, Berretta S, Haroutunian V, Scott WK, Bendl J, Roussos P. Sci Data. 2024 Nov 23;11(1):1274. doi: 10.1038/s41597-024-04117-y. PMID: 39580497. Free PMC article.
Bacterial single-cell RNA sequencing captures biofilm transcriptional heterogeneity and differential responses to immune pressure.
Korshoj LE, Kielian T. Nat Commun. 2024 Nov 24;15(1):10184. doi: 10.1038/s41467-024-54581-8. PMID: 39580490. Free PMC article.
Analysis of the Pattern of RNA Expression in the Skin of TR-Deficient Mice By RNA-seq.
Gallardo-Gómez M. Methods Mol Biol. 2025;2876:151-162. doi: 10.1007/978-1-0716-4252-8_10. PMID: 39579314.

Grants and funding

U54HG004557/HG/NHGRI NIH HHS/United States
