STAR: ultrafast universal RNA-seq aligner
- PMID: 23104886
- PMCID: PMC3530905
- DOI: 10.1093/bioinformatics/bts635
Abstract
Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases.
Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy.
Availability and implementation: STAR is implemented as standalone C++ code. STAR is free open source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
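The seed search named in the Results can be illustrated with a toy example. The sketch below is not STAR's implementation: it is a minimal Python illustration, assuming exact matching on a single strand, of how a maximal mappable prefix of a read can be found by binary search in an uncompressed suffix array, and how the search is then repeated sequentially on the unmapped remainder of the read. All function names (`build_suffix_array`, `maximal_mappable_prefix`, `sequential_seeds`) and the example sequences are hypothetical, and the clustering and stitching stage is not shown.

```python
def build_suffix_array(genome):
    # Naive construction by sorting all suffixes; adequate for a toy example only.
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def _narrow(genome, sa, lo, hi, depth, ch):
    """Narrow the suffix-array interval [lo, hi), whose suffixes all share the
    same first `depth` characters, to those whose next character is `ch`."""
    def char_at(k):
        pos = sa[k] + depth
        # A suffix that has ended sorts before any real character.
        return genome[pos] if pos < len(genome) else ""

    a, b = lo, hi
    while a < b:                      # lower bound: first suffix with char >= ch
        mid = (a + b) // 2
        if char_at(mid) < ch:
            a = mid + 1
        else:
            b = mid
    start = a
    b = hi
    while a < b:                      # upper bound: first suffix with char > ch
        mid = (a + b) // 2
        if char_at(mid) <= ch:
            a = mid + 1
        else:
            b = mid
    return start, a


def maximal_mappable_prefix(genome, sa, read, start=0):
    """Return (length, genome positions) of the longest prefix of read[start:]
    that occurs exactly in the genome."""
    lo, hi, depth = 0, len(sa), 0
    while start + depth < len(read):
        new_lo, new_hi = _narrow(genome, sa, lo, hi, depth, read[start + depth])
        if new_lo == new_hi:          # no suffix extends the match any further
            break
        lo, hi = new_lo, new_hi
        depth += 1
    positions = sorted(sa[lo:hi]) if depth > 0 else []
    return depth, positions


def sequential_seeds(genome, sa, read):
    """Repeat the prefix search on the unmapped remainder of the read."""
    seeds, start = [], 0
    while start < len(read):
        length, positions = maximal_mappable_prefix(genome, sa, read, start)
        if length == 0:               # base absent from the genome: skip it
            start += 1
            continue
        seeds.append((start, length, positions))
        start += length
    return seeds


# Toy genome: two "exons" separated by an "intron" (made-up sequences).
genome = "GATTACA" + "GTCCCCAG" + "TGCATGC"
read = "GATTACA" + "TGCATGC"          # spliced read spanning the junction
sa = build_suffix_array(genome)
seeds = sequential_seeds(genome, sa, read)
print(seeds)
```

In this toy case the read splits into two seeds, one per exon, with the break falling at the splice junction. A real aligner would additionally need to handle mismatches, both strands, multimapping seeds and the subsequent clustering and stitching of seeds into an alignment.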