Machine Learning and Integrative Analysis of Biomedical Big Data
Abstract
1. Introduction
2. Curse of Dimensionality
- (1) Filter methods,
- (2) Wrapper methods,
- (3) Embedded methods.
3. Heterogeneous Data
4. Missing Data
5. Rarity and Class Imbalance
6. Big Data Scalability
7. Conclusions and Future Perspectives
Author Contributions
Funding
Conflicts of Interest
- Veropoulos, K.; Campbell, C.; Cristianini, N. Controlling the sensitivity of support vector machines. In Proceedings of the International Joint Conference on AI, Stockholm, Sweden, 31 July–6 August 1999; p. 60. [Google Scholar]
- Bao, F.; Deng, Y.; Zhao, Y.; Suo, J.; Dai, Q. Bosco: Boosting corrections for genome-wide association studies with imbalanced samples. IEEE Trans. Nanobiosci. 2017, 16, 69–77. [Google Scholar] [CrossRef]
- Martina, F.; Beccuti, M.; Balbo, G.; Cordero, F. Peculiar Genes Selection: A new features selection method to improve classification performances in imbalanced data sets. PLoS ONE 2017, 12, e0177475. [Google Scholar] [CrossRef]
- Liu, Z.; Tang, D.; Cai, Y.; Wang, R.; Chen, F. A hybrid method based on ensemble WELM for handling multi class imbalance in cancer microarray data. Neurocomputing 2017, 266, 641–650. [Google Scholar] [CrossRef]
- Liu, G.-H.; Shen, H.-B.; Yu, D.-J. Prediction of protein–protein interaction sites with machine-learning-based data-cleaning and post-filtering procedures. J. Membr. Biol. 2016, 249, 141–153. [Google Scholar] [CrossRef] [PubMed]
- Mirza, B.; Lin, Z.; Liu, N. Ensemble of subset online sequential extreme learning machine for class imbalance and concept drift. Neurocomputing 2015, 149, 316–329. [Google Scholar] [CrossRef]
- Chen, L.; Jin, P.; Qin, Z.S. DIVAN: Accurate identification of non-coding disease-specific risk variants using multi-omics profiles. Genome Biol. 2016, 17, 252. [Google Scholar] [CrossRef] [PubMed]
- Liu, X.-Y.; Wu, J.; Zhou, Z.-H. Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2009, 39, 539–550. [Google Scholar]
- Yang, P.; Hwa Yang, Y.; Zhou, B.B.; Zomaya, A.Y. A review of ensemble methods in bioinformatics. Curr. Bioinform. 2010, 5, 296–308. [Google Scholar] [CrossRef]
- Li, C.-X.; Wheelock, C.E.; Sköld, C.M.; Wheelock, Å.M. Integration of multi-omics datasets enables molecular classification of COPD. Eur. Respir. J. 2018, 1701930. [Google Scholar] [CrossRef]
- Yan, K.K.; Zhao, H.; Pang, H. A comparison of graph-and kernel-based–omics data integration algorithms for classifying complex traits. BMC Bioinform. 2017, 18, 539. [Google Scholar] [CrossRef]
- Singh, A.; Gautier, B.; Shannon, C.P.; Rohart, F.; Vacher, M.; Tebutt, S.J.; Le Cao, K.-A. DIABLO: From multi-omics assays to biomarker discovery, an integrative approach. bioRxiv 2018. [Google Scholar] [CrossRef]
- Bica, I.; Velickovic, P.; Xiao, H.; Li, P. Multi-omics data integration using cross-modal neural networks. In Proceedings of the 26th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2018), Bruges, Belgium, 25–27 April 2018. [Google Scholar]
- Lin, X.; Chen, X. Heterogeneous data integration by tree-augmented naïve B ayes for protein–protein interactions prediction. Proteomics 2013, 13, 261–268. [Google Scholar] [CrossRef]
- Goldfarb, D.; Hast, B.; Wang, W.; Major, M.B. An Improved Algorithm and Web Application for Predicting Co-Complexed Proteins from Affinity Purification–Mass Spectrometry Data. J. Proteome Res. 2014, 13, 5944. [Google Scholar] [CrossRef] [PubMed]
- Frasca, M.; Bertoni, A.; Valentini, G. UNIPred: Unbalance-aware Network Integration and Prediction of protein functions. J. Comput. Biol. 2015, 22, 1057–1074. [Google Scholar] [CrossRef] [PubMed]
- Yu, G.; Zhu, H.; Domeniconi, C.; Guo, M. Integrating multiple networks for protein function prediction. In Proceedings of the BMC Systems Biology; BioMed Central: London, UK, 2015; Volume 9, p. S3. [Google Scholar]
- Kwon, M.-S.; Kim, Y.; Lee, S.; Namkung, J.; Yun, T.; Yi, S.G.; Han, S.; Kang, M.; Kim, S.W.; Jang, J.-Y. Integrative analysis of multi-omics data for identifying multi-markers for diagnosing pancreatic cancer. BMC Genom. 2015, 16, S4. [Google Scholar] [CrossRef] [PubMed]
- Song, Y.; Westerhuis, J.A.; Aben, N.; Wessels, L.F.; Groenen, P.J.; Smilde, A.K. Generalized Simultaneous Component Analysis of Binary and Quantitative data. arXiv, 2018; arXiv:180704982. [Google Scholar]
- Re, M.; Valentini, G. Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction. In Proceedings of the MLSB, PMLR, Ljubljana, Slovenia, 5–6 September 2009; Volume 8, pp. 98–111. [Google Scholar]
- Yu, H.; Hong, S.; Yang, X.; Ni, J.; Dan, Y.; Qin, B. Recognition of multiple imbalanced cancer types based on DNA microarray data using ensemble classifiers. BioMed Res. Int. 2013, 2013, 239628. [Google Scholar] [CrossRef] [PubMed]
- Fortino, V.; Kinaret, P.; Fyhrquist, N.; Alenius, H.; Greco, D. A robust and accurate method for feature selection and prioritization from multi-class OMICs data. PLoS ONE 2014, 9, e107801. [Google Scholar] [CrossRef] [PubMed]
- Chen, L.; Zhang, Y.-H.; Huang, G.; Pan, X.; Wang, S.; Huang, T.; Cai, Y.-D. Discriminating cirRNAs from other lncRNAs using a hierarchical extreme learning machine (H-ELM) algorithm with feature selection. Mol. Genet. Genom. 2018, 293, 137–149. [Google Scholar] [CrossRef] [PubMed]
- Zhang, L.; Suganthan, P.N. A survey of randomized algorithms for training neural networks. Inf. Sci. 2016, 364, 146–155. [Google Scholar] [CrossRef]
- Cao, W.; Wang, X.; Ming, Z.; Gao, J. A review on neural networks with random weights. Neurocomputing 2018, 275, 278–287. [Google Scholar] [CrossRef]
- Tang, J.; Deng, C.; Huang, G.-B. Extreme learning machine for multilayer perceptron. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 809–821. [Google Scholar] [CrossRef]
- Lai, X.; Cao, J.; Lin, Z. A Novel Relaxed ADMM with Highly Parallel Implementation for Extreme Learning Machine. In Proceedings of the 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, 27–30 May 2018; pp. 1–5. [Google Scholar]
- Wang, X.; Cao, W. Non-Iterative Approaches in Training Feed-Forward Neural Networks and Their Applications. Soft Comput. 2018, 22, 3473–3476. [Google Scholar] [CrossRef]
- Huang, G.-B.; Zhou, H.; Ding, X.; Zhang, R. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2012, 42, 513–529. [Google Scholar] [CrossRef] [PubMed]
- Pao, Y.-H.; Takefuji, Y. Functional-link net computing: Theory, system architecture, and functionalities. Computer 1992, 25, 76–79. [Google Scholar] [CrossRef]
- Zhang, L.; Suganthan, P.N. A comprehensive evaluation of random vector functional link networks. Inf. Sci. 2016, 367, 1094–1105. [Google Scholar] [CrossRef]
- Maass, W.; Natschläger, T.; Markram, H. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Comput. 2002, 14, 2531–2560. [Google Scholar] [CrossRef] [PubMed]
- Jaeger, H. Adaptive nonlinear system identification with echo state networks. In Proceedings of the Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2003; Volume 15, pp. 593–600. [Google Scholar]
- Cevher, V.; Becker, S.; Schmidt, M. Convex optimization for big data: Scalable, randomized, and parallel algorithms for big data analytics. IEEE Signal Process. Mag. 2014, 31, 32–43. [Google Scholar] [CrossRef]
- Rubiolo, M.; Milone, D.H.; Stegmayer, G. Extreme learning machines for reverse engineering of gene regulatory networks from expression time series. Bioinformatics 2017, 34, 1253–1260. [Google Scholar] [CrossRef] [PubMed]
- Lei, H.; Wen, Y.; Elazab, A.; Tan, E.-L.; Zhao, Y.; Lei, B. Protein-protein Interactions Prediction via Multimodal Deep Polynomial Network and Regularized Extreme Learning Machine. IEEE J. Biomed. Health Inform. 2018. [Google Scholar] [CrossRef]
- Belciug, S.; Gorunescu, F. Learning a single-hidden layer feedforward neural network using a rank correlation-based strategy with application to high dimensional gene expression and proteomic spectra datasets in cancer detection. J. Biomed. Inform. 2018, 83, 159–166. [Google Scholar] [CrossRef] [PubMed]
- Pian, C.; Zhang, G.; Chen, Z.; Chen, Y.; Zhang, J.; Yang, T.; Zhang, L. LncRNApred: Classification of long non-coding RNAs and protein-coding transcripts by the ensemble algorithm with a new hybrid feature. PLoS ONE 2016, 11, e0154567. [Google Scholar] [CrossRef] [PubMed]
- Nguyen, T.V.; Mirza, B. Dual-layer kernel extreme learning machine for action recognition. Neurocomputing 2017, 260, 123–130. [Google Scholar] [CrossRef]
- Aiolli, F.; Donini, M. EasyMKL: A scalable multiple kernel learning algorithm. Neurocomputing 2015, 169, 215–224. [Google Scholar] [CrossRef]
- Hoi, S.C.; Sahoo, D.; Lu, J.; Zhao, P. Online Learning: A Comprehensive Survey. arXiv, 2018; arXiv:180202871. [Google Scholar]
- Georga, E.I.; Protopappas, V.C.; Polyzos, D.; Fotiadis, D.I. Online prediction of glucose concentration in type 1 diabetes using extreme learning machines. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015; pp. 3262–3265. [Google Scholar]
- Liang, N.-Y.; Huang, G.-B.; Saratchandran, P.; Sundararajan, N. A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans. Neural Netw. 2006, 17, 1411–1423. [Google Scholar] [CrossRef] [PubMed]
- LeCun, Y.A.; Bottou, L.; Orr, G.B.; Müller, K.-R. Efficient backprop. In Neural Networks: Tricks of the Trade; Springer: Berlin, Germany, 2012; pp. 9–48. [Google Scholar]
- Cauwenberghs, G.; Poggio, T. Incremental and decremental support vector machine learning. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2001; Volume 13, pp. 409–415. [Google Scholar]
- Gu, B.; Quan, X.; Gu, Y.; Sheng, V.S. Chunk Incremental Learning for Cost-Sensitive Hinge Loss Support Vector Machine. Pattern Recognit. 2018, 83, 196–208. [Google Scholar] [CrossRef]
- Mirza, B.; Kok, S.; Dong, F. Multi-layer online sequential extreme learning machine for image classification. In Proceedings of ELM-2015; Springer: Berlin, Germany, 2016; Volume 1, pp. 39–49. [Google Scholar]
- Sahoo, D.; Pham, Q.; Lu, J.; Hoi, S.C. Online deep learning: Learning deep neural networks on the fly. arXiv, 2017; arXiv:171103705. [Google Scholar]
- Dean, J.; Ghemawat, S. MapReduce: Simplified data processing on large clusters. Commun. ACM 2008, 51, 107–113. [Google Scholar] [CrossRef]
- Zou, Q.; Li, X.-B.; Jiang, W.-R.; Lin, Z.-Y.; Li, G.-L.; Chen, K. Survey of MapReduce frame operation in bioinformatics. Brief. Bioinform. 2013, 15, 637–647. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- White, T. Hadoop: The Definitive Guide; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2012; ISBN 1-4493-1152-0. [Google Scholar]
- Foss, A.; Markatou, M.; Ray, B.; Heching, A. A semiparametric method for clustering mixed data. Mach. Learn. 2016, 105, 419–458. [Google Scholar] [CrossRef] [Green Version]
- Foss, A.H.; Markatou, M. kamila: Clustering Mixed-Type Data in R and Hadoop. J. Stat. Softw. 2018, 83, 1–44. [Google Scholar] [CrossRef] [Green Version]
- Zaharia, M.; Xin, R.S.; Wendell, P.; Das, T.; Armbrust, M.; Dave, A.; Meng, X.; Rosen, J.; Venkataraman, S.; Franklin, M.J. Apache spark: A unified engine for big data processing. Commun. ACM 2016, 59, 56–65. [Google Scholar] [CrossRef]
- Meng, X.; Bradley, J.; Yavuz, B.; Sparks, E.; Venkataraman, S.; Liu, D.; Freeman, J.; Tsai, D.B.; Amde, M.; Owen, S. Mllib: Machine learning in apache spark. J. Mach. Learn. Res. 2016, 17, 1235–1241. [Google Scholar]
- Owen, S.; Anil, R.; Dunning, T.; Friedman, E. Mahout in Action; Manning Publications Co.: Shelter Island, NY, USA, 2011; ISBN 1-935182-68-4. [Google Scholar]
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Savannah, GA, USA, 2–4 November 2016; USENIX Association: Berkeley, CA, USA; Volume 16, pp. 265–283. [Google Scholar]
- Afgan, E.; Baker, D.; Batut, B.; van den Beek, M.; Bouvier, D.; Čech, M.; Chilton, J.; Clements, D.; Coraor, N.; Grüning, B.A. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res. 2018, 46, W537–W544. [Google Scholar] [CrossRef]
- Afgan, E.; Baker, D.; Coraor, N.; Goto, H.; Paul, I.M.; Makova, K.D.; Nekrutenko, A.; Taylor, J. Harnessing cloud computing with Galaxy Cloud. Nat. Biotechnol. 2011, 29, 972–974. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Fisch, K.M.; Meißner, T.; Gioia, L.; Ducom, J.-C.; Carland, T.M.; Loguercio, S.; Su, A.I. Omics Pipe: A community-based framework for reproducible multi-omics data analysis. Bioinformatics 2015, 31, 1724–1728. [Google Scholar] [CrossRef] [PubMed]
- Forsberg, E.M.; Huan, T.; Rinehart, D.; Benton, H.P.; Warth, B.; Hilmers, B.; Siuzdak, G. Data processing, multi-omic pathway mapping, and metabolite activity analysis using XCMS Online. Nat. Protoc. 2018, 13, 633–651. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Chong, J.; Soufan, O.; Li, C.; Caraus, I.; Li, S.; Bourque, G.; Wishart, D.S.; Xia, J. MetaboAnalyst 4.0: Towards more transparent and integrative metabolomics analysis. Nucleic Acids Res. 2018, 46, W486–W494. [Google Scholar] [CrossRef] [PubMed]
- Tafti, A.P.; LaRose, E.; Badger, J.C.; Kleiman, R.; Peissig, P. Machine learning-as-a-service and its application to medical informatics. In Proceedings of the International Conference on Machine Learning and Data Mining in Pattern Recognition, New York, NY, USA, 15–20 July 2017; pp. 206–219. [Google Scholar]
- Price, N.D.; Magis, A.T.; Earls, J.C.; Glusman, G.; Levy, R.; Lausted, C.; McDonald, D.T.; Kusebauch, U.; Moss, C.L.; Zhou, Y. A wellness study of 108 individuals using personal, dense, dynamic data clouds. Nat. Biotechnol. 2017, 35, 747. [Google Scholar] [CrossRef] [PubMed]
- Glaab, E. Using prior knowledge from cellular pathways and molecular networks for diagnostic specimen classification. Brief. Bioinform. 2015, 17, 440–452. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Greene, C.S.; Krishnan, A.; Wong, A.K.; Ricciotti, E.; Zelaya, R.A.; Himmelstein, D.S.; Zhang, R.; Hartmann, B.M.; Zaslavsky, E.; Sealfon, S.C. Understanding multicellular function and disease with human tissue-specific networks. Nat. Genet. 2015, 47, 569. [Google Scholar] [CrossRef] [PubMed]
- Yao, V.; Kaletsky, R.; Keyes, W.; Mor, D.E.; Wong, A.K.; Sohrabi, S.; Murphy, C.T.; Troyanskaya, O.G. An integrative tissue-network approach to identify and test human disease genes. Nat. Biotechnol. 2018, 36, 1091–1099. [Google Scholar] [CrossRef] [PubMed]
- Li, J.; Pan, C.; Zhang, S.; Spin, J.M.; Deng, A.; Leung, L.L.; Dalman, R.L.; Tsao, P.S.; Snyder, M. Decoding the Genomics of Abdominal Aortic Aneurysm. Cell 2018, 174, 1361–1372. [Google Scholar] [CrossRef] [PubMed]
- Ritchie, M.D. Large-Scale Analysis of Genetic and Clinical Patient Data. Annu. Rev. Biomed. Data Sci. 2018, 1, 263–274. [Google Scholar] [CrossRef]
- Liem, D.A.; Murali, S.; Sigdel, D.; Shi, Y.; Wang, X.; Shen, J.; Choi, H.; Caufield, J.H.; Wang, W.; Ping, P. Phrase Mining of Textual Data to Analyze Extracellular Matrix Protein Patterns Across Cardiovascular Disease. Am. J. Physiol.-Heart Circ. Physiol. 2018. [Google Scholar] [CrossRef] [PubMed]
- Tao, F.; Zhuang, H.; Yu, C.W.; Wang, Q.; Cassidy, T.; Kaplan, L.R.; Voss, C.R.; Han, J. Multi-Dimensional, Phrase-Based Summarization in Text Cubes. IEEE Data Eng. Bull. 2016, 39, 74–84. [Google Scholar]
- Shokri, R.; Shmatikov, V. Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, 12–16 October 2015; pp. 1310–1321. [Google Scholar]
- Beaulieu-Jones, B.K.; Wu, Z.S.; Williams, C.; Greene, C.S. Privacy-preserving generative deep neural networks support clinical data sharing. BioRxiv 2017. [Google Scholar] [CrossRef]
- Olson, R.S.; La Cava, W.; Orzechowski, P.; Urbanowicz, R.J.; Moore, J.H. PMLB: A large benchmark suite for machine learning evaluation and comparison. BioData Min. 2017, 10, 36. [Google Scholar] [CrossRef] [PubMed]
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Mirza, B.; Wang, W.; Wang, J.; Choi, H.; Chung, N.C.; Ping, P. Machine Learning and Integrative Analysis of Biomedical Big Data. Genes 2019, 10, 87. https://doi.org/10.3390/genes10020087