Abstract
Unsupervised techniques are widely used to study the complex patterns that arise when analyzing genomic data at single-cell resolution. In particular, unsupervised deep learning models provide state-of-the-art solutions for the most common tasks in scRNA-seq data analysis. However, the biological usefulness of these complex models is limited by their black-box nature. To address this limitation, several lines of research have emerged, ranging from post hoc approximations to ante hoc modeling. In this work, we study the behavior of two biologically constrained variational autoencoders (ante hoc modeling). The first uses a one-layer architecture whose constraints come from signaling pathways; the second is a two-layer architecture that we propose following recent trends in mechanistic models of signal transduction. We use the representations learned by the models as proxies of signaling activity at the single-cell level. We assess the performance of this scoring approach on a known public scRNA-seq dataset with a clearly established ground truth. Although both models capture the relevant signals, the one-layer architecture better captures the most pronounced differences, while the two-layer design learns more fine-grained features that can expose less prominent aspects of the data.
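As a rough illustration of the one-layer idea, the sketch below shows how a pathway membership mask can constrain a variational encoder so that each latent unit only receives input from the genes annotated to one pathway. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the gene names, pathway names, random untrained weights, and helper functions (`membership_mask`, `masked_dense`, `encode`, `sample_latent`) are all hypothetical, and no decoder or training loop is included.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy annotation: which genes belong to which pathway.
genes = ["g1", "g2", "g3", "g4", "g5"]
pathways = {"pathway_A": ["g1", "g2"], "pathway_B": ["g3", "g4", "g5"]}


def membership_mask(rows, groups):
    """Binary matrix with mask[i, j] = 1 if rows[i] belongs to group j."""
    index = {name: i for i, name in enumerate(rows)}
    mask = np.zeros((len(rows), len(groups)))
    for j, members in enumerate(groups.values()):
        for name in members:
            mask[index[name], j] = 1.0
    return mask


def masked_dense(x, weights, mask, bias):
    """Dense layer whose weights are zeroed wherever the mask is zero."""
    return x @ (weights * mask) + bias


def encode(x, params, mask):
    """One-layer constrained encoder: genes -> per-pathway mean and log-variance."""
    mu = masked_dense(x, params["w_mu"], mask, params["b_mu"])
    logvar = masked_dense(x, params["w_logvar"], mask, params["b_logvar"])
    return mu, logvar


def sample_latent(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


mask = membership_mask(genes, pathways)
n_genes, n_pathways = mask.shape

# Assumption: random, untrained parameters stand in for learned weights.
params = {
    "w_mu": rng.normal(size=(n_genes, n_pathways)),
    "b_mu": np.zeros(n_pathways),
    "w_logvar": rng.normal(size=(n_genes, n_pathways)),
    "b_logvar": np.zeros(n_pathways),
}

# Toy expression matrix: 4 cells x 5 genes, log-transformed counts.
cells = rng.poisson(2.0, size=(4, n_genes)).astype(float)
x = np.log1p(cells)

mu, logvar = encode(x, params, mask)
z = sample_latent(mu, logvar, rng)

# Assumption: the latent means serve as per-cell pathway activity proxies.
pathway_scores = mu
```

Because the mask zeroes every weight outside a pathway's gene set, each column of `pathway_scores` depends only on that pathway's genes, which is what makes the latent units readable as pathway-level scores. A two-layer variant would chain two such masked layers; the abstract does not specify what the intermediate units represent, so that part is left out here.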
Acknowledgements
This work has been partially supported by grants PID2020-117979RB-I00 and PID2020-117954RB-C22 from the Spanish Ministry of Science and Innovation, IMP/00019 from the Instituto de Salud Carlos III (ISCIII), and PIP-0087-2021 from Junta de Andalucía, co-funded with European Regional Development Funds (ERDF); and by the H2020 Programme of the European Union through the Marie Curie Innovative Training Network “Machine Learning Frontiers in Precision Medicine” (MLFPM) (GA 813533). The authors also acknowledge Junta de Andalucía for the postdoctoral contract of Carlos Loucera (PAIDI2020-DOC_00350), co-funded by the European Social Fund (FSE) 2014-2020.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gundogdu, P., Payá-Milans, M., Alamo-Alvarez, I., Nepomuceno-Chamorro, I.A., Dopazo, J., Loucera, C. (2023). Cell-Level Pathway Scoring Comparison with a Biologically Constrained Variational Autoencoder. In: Pang, J., Niehren, J. (eds) Computational Methods in Systems Biology. CMSB 2023. Lecture Notes in Computer Science, vol 14137. Springer, Cham. https://doi.org/10.1007/978-3-031-42697-1_5
DOI: https://doi.org/10.1007/978-3-031-42697-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42696-4
Online ISBN: 978-3-031-42697-1
eBook Packages: Computer Science, Computer Science (R0)