iCircDA-NEAE: Accelerated attribute network embedding and dynamic convolutional autoencoder for circRNA-disease associations prediction

Lin Yuan; Jiawang Zhao; Zhen Shen; Qinhu Zhang; Yushui Geng; Chun-Hou Zheng; De-Shuang Huang

doi:10.1371/journal.pcbi.1011344

Abstract

Accumulating evidence suggests that circRNAs play crucial roles in human diseases. CircRNA-disease association prediction is extremely helpful in understanding pathogenesis, diagnosis, and prevention, as well as identifying relevant biomarkers. During the past few years, a large number of deep learning (DL) based methods have been proposed for predicting circRNA-disease associations and have achieved impressive prediction performance. However, these methods have two main drawbacks. First, they underutilize the biometric information in the data. Second, the features they extract do not adequately represent the association characteristics between circRNAs and diseases. In this study, we developed a novel deep learning model, named iCircDA-NEAE, to predict circRNA-disease associations. In particular, we use disease semantic similarity, Gaussian interaction profile kernel, circRNA expression profile similarity, and Jaccard similarity simultaneously for the first time, and extract hidden features based on accelerated attribute network embedding (AANE) and dynamic convolutional autoencoder (DCAE). Experimental results on the circR2Disease dataset show that iCircDA-NEAE outperforms other competing methods significantly. Besides, 16 of the top 20 circRNA-disease pairs with the highest prediction scores were validated by relevant literature. Furthermore, we observe that iCircDA-NEAE can effectively predict new potential circRNA-disease associations.

Author summary

CircRNA-disease association prediction is extremely helpful in understanding pathogenesis, diagnosis, and prevention, as well as identifying relevant biomarkers. In this paper, we proposed a novel deep learning-based method called iCircDA-NEAE to discover new potential circRNA-disease associations. Experimental results demonstrated that iCircDA-NEAE outperforms other state-of-the-art prediction methods, and can accurately predict potential circRNA-disease associations. Furthermore, according to the relevant literature, we observed that novel circRNA-disease associations predicted by iCircDA-NEAE are potential associations. The performance of iCircDA-NEAE mainly depends on three factors: (i) iCircDA-NEAE incorporates multi-source biometric information to measure complex associations between circRNAs and diseases. (ii) iCircDA-NEAE uses disease semantic similarity, Gaussian interaction kernel (GIP), circRNA expression profile similarity, and Jaccard similarity to make the most of biometric information in the data. (iii) iCircDA-NEAE incorporates the advantages of AANE and DCAE, which not only effectively integrates multi-source information, but also effectively captures hidden high-level information of data.

Citation: Yuan L, Zhao J, Shen Z, Zhang Q, Geng Y, Zheng C-H, et al. (2023) iCircDA-NEAE: Accelerated attribute network embedding and dynamic convolutional autoencoder for circRNA-disease associations prediction. PLoS Comput Biol 19(8): e1011344. https://doi.org/10.1371/journal.pcbi.1011344

Editor: Qinghua Cui, Peking University, CHINA

Received: May 11, 2023; Accepted: July 10, 2023; Published: August 31, 2023

Copyright: © 2023 Yuan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data and codes underlying this article are available at https://github.com/nathanyl/iCircDA-NEAE.

Funding: DSH is supported by STI 2030—Major Projects (No. 2021ZD0200403), the National Key R&D Program of China (Nos. 2018AAA0100100 & 2018YFA0902600), the National Natural Science Foundation of China (Grant nos. 62002266, 61932008, and 62073231), the Key Project of Science and Technology of Guangxi (Grant no. 2021AB20147), Guangxi Natural Science Foundation (Grant nos. 2022JJD170019 & 2021JJA170204 & 2021JJA170199) and Guangxi Science and Technology Base and Talents Special Project (Grant nos. 2021AC19354 & 2021AC19394), CHZ is supported by the National Natural Science Foundation of China (No. U19A2064), LY is supported by the National Natural Science Foundation of China (No. 62002189), the Natural Science Foundation of Shandong Province, China (No. ZR2020QF038) and Technology Small and Medium Enterprises Innovation Capability Improvement Project of Shandong Province (No. 2023TSGC0279), ZS is supported by the National Natural Science Foundation of China (No. 62102200), YSG is supported by the 20 Planned Projects in Jinan (No. 2021GXRC046) and the Excellent Teaching Team Training Plan Project of QILU UNIVERSITY OF TECHNOLOGY. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Circular RNAs (circRNAs) are a class of non-coding RNA characterized by a covalently closed-loop structure generated through a special type of alternative splicing termed back-splicing. Given that circRNAs lack free ends and are thus relatively stable, they are abundant in the eukaryotic transcriptomes. It has been shown that circRNAs are involved in various life activities of organisms, including functioning as microRNA (miRNA) sponges [1], regulating alternative splicing [2], modulating the expression of parental genes [3], etc. In addition, accumulating evidence suggests that circRNAs affect many diseases, such as glioma [4], breast cancer [5], and liver cancer [6]. Therefore, the study of circRNAs is crucial for disease diagnosis and treatment.

At present, identifying circRNA-disease associations is an appealing way to find potential biomarkers and to understand the diagnosis and treatment of diseases. However, circRNA-disease associations are very complicated and remain obscure. With the development of sequencing and analysis technology, various biological experiments have emerged to identify circRNA-disease associations [7–9]. However, biological experiments are generally costly and labor-intensive. The experimentally supported circRNA-disease association databases (circ2Disease [10], circRNADisease [11], circR2Disease [12], circ2Traits [13], circFunbase [14]) provide an opportunity to develop computational methods for circRNA-disease association identification.

Recently, researchers have proposed many deep learning-based methods to predict circRNA-disease associations. For example, GCNCDA [15], one of the most well-verified DL-based algorithms, applied a graph convolutional network to predict circRNA-disease associations. ASAECDA [16], another impressive DL-based algorithm, calculated weight values of the links between circRNAs and diseases based on graph embedding and a stacked autoencoder. GATCDA [17] used a graph attention network to predict scores for unknown circRNA-disease associations. IMS-CDA [18] identified potential circRNA-disease associations by incorporating multi-source similarity information into a deep stacked autoencoder model. iCDA-CGR [19] used chaos game representation technology to discover the associations between circRNAs and diseases. RNMFLP predicted circRNA-disease associations based on robust nonnegative matrix factorization and label propagation [20]. iGRLCDA identified circRNA-disease associations based on graph representation learning [21]. These methods achieved impressive prediction performance. However, we found that they suffer from two major drawbacks. First, they underutilize the biometric information in the data. Second, the features they use do not adequately represent the association characteristics between circRNAs and diseases.

In this study, we developed a novel deep learning model for identifying CircRNA-Disease Associations based on accelerated attribute Network Embedding and dynamic convolutional AutoEncoder (iCircDA-NEAE). The proposed model iCircDA-NEAE can (i) make the most of the biometric information in the data, (ii) enhance the feature extraction capability of the model by using multiple feature extraction methods, and (iii) predict circRNA-disease associations accurately. Specifically, (i) circRNA-disease association data were collected from the circR2Disease database; (ii) disease semantic similarity, Gaussian interaction kernel (GIP), circRNA expression profile similarity, and Jaccard similarity were used to measure the biometric information in the data, and a multisource information fusion descriptor was then constructed; (iii) accelerated attribute network embedding (AANE) extracts features from the descriptor data; (iv) dynamic convolutional autoencoder (DCAE) extracts hidden features from the data; (v) a random forest classifier uses the hidden features to predict circRNA-disease associations. The schematic overview of the iCircDA-NEAE framework is shown in Fig 1. 5-fold and 10-fold cross-validation experiments on training and test data were used to validate the model performance. Experimental results show that iCircDA-NEAE outperforms other competing methods significantly. Furthermore, according to the relevant literature, we observe that novel circRNA-disease associations predicted by iCircDA-NEAE are potential associations.

Fig 1. Schematic overview of iCircDA-NEAE framework.

Experimental data come from the exoRBase, circR2Disease, and MeSH datasets. Disease semantic similarity, Gaussian interaction kernel (GIP), circRNA expression profile similarity, and Jaccard similarity are used to measure the biometric information in the data, and a multisource information fusion descriptor is then constructed. AANE and DCAE are used to learn the features in the data. A random forest classifier is used to predict circRNA-disease associations.

https://doi.org/10.1371/journal.pcbi.1011344.g001

Results

Hyperparameter Selection of iCircDA-NEAE

In a random forest classifier, max_feature determines the number of features in each decision tree. A max_feature value that is too small may leave out feature information, while one that is too large leads to overfitting. In this section, the important hyperparameter max_feature was investigated experimentally, whereas other hyperparameters were set to default values.

The value of max_feature ranges from 0.1 to 0.5 [22]. As shown in Fig 2, the AUC value of iCircDA-NEAE is the highest when max_feature is set to 0.2. Therefore, in this experiment, we set max_feature to 0.2.
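The sweep described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes scikit-learn, where the corresponding parameter is spelled `max_features` and a float value is the fraction of features considered at each split, and it uses synthetic data as a stand-in for the hidden features produced by AANE and DCAE.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the learned hidden features (hypothetical data).
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
candidates = [0.1, 0.2, 0.3, 0.4, 0.5]

mean_auc = {}
for frac in candidates:
    # All other hyperparameters stay fixed; only max_features varies.
    clf = RandomForestClassifier(n_estimators=50, max_features=frac,
                                 random_state=0)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
    mean_auc[frac] = float(scores.mean())

# Candidate with the highest mean cross-validated AUC.
best = max(mean_auc, key=mean_auc.get)
print(mean_auc, best)
```

Fixing the fold assignment and the random seed across candidates keeps the comparison from being dominated by sampling noise.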

Fig 2. Comparison of model performance under different max_feature values.

AUC value of iCircDA-NEAE is the highest when max_feature is set to 0.2.

https://doi.org/10.1371/journal.pcbi.1011344.g002

Contribution of AANE and DCAE

In this section, the effects of AANE and DCAE were evaluated by ablation experiments with five different models: (i) iCircDA-NEAE without AANE; (ii) iCircDA-NEAE without DCAE; (iii) iCircDA-NEAE without AANE and DCAE; (iv) DCAE replaced by CAE in iCircDA-NEAE; (v) AANE replaced by NE in iCircDA-NEAE.

As shown in Table 1, when we remove AANE or DCAE, the performance drops by about 8%, and after removing both feature extraction models, the model suffers significant performance degradation. Furthermore, after replacing DCAE and AANE with CAE and NE respectively, both models give worse results than our proposed iCircDA-NEAE model. Experimental results show that both AANE and DCAE are beneficial to circRNA-disease association prediction, and that they outperform traditional network embedding and the convolutional autoencoder.

Table 1. Ablation study of iCircDA-NEAE with different kinds of feature extraction model.

https://doi.org/10.1371/journal.pcbi.1011344.t001

We compared the run time of iCircDA-NEAE with that of iCircDA-NEAE’ (DCAE replaced by CAE) on an NVIDIA RTX 3080 GPU with 10 GB of VRAM. The computation time of iCircDA-NEAE (63 min 27 s) is less than that of iCircDA-NEAE’ (80 min 23 s), which indicates that the CAE model is computationally more expensive than the DCAE model. The detailed results are recorded in S1 Table.

Comparison with different classifiers

In this section, we compared iCircDA-NEAE with traditional machine learning algorithms as well as common deep learning algorithms, including SVM (Support Vector Machine) [23], RF (Rotation Forest) classifier [24], DNN (Deep Neural Network) [25] and XGBoost [26]. To make the results comparable, we replaced only the classifier in the model with the classifier being compared. The detailed parameters of all classifiers are presented in Table 2.

Table 2. The detailed parameters of all classifiers.

https://doi.org/10.1371/journal.pcbi.1011344.t002

We compared the performance of iCircDA-NEAE with the five classifiers by using the benchmark dataset and two independent datasets (circRNAdisease and circ2Disease datasets). The ROC curves on the three datasets are shown in Fig 3A–3C, respectively. As shown in Fig 3, iCircDA-NEAE with the random forest classifier outperforms the other classifiers on all datasets. The ACC, Sen, F1, MCC and AUC values are presented in Table 3. As shown in Table 3, iCircDA-NEAE with the random forest classifier outperforms the other classifiers on all evaluation metrics.

Fig 3. The performance of iCircDA-NEAE with five classifiers on three datasets.

(A) The performance on circR2Disease dataset. (B) The performance on circRNAdisease dataset. (C) The performance on circ2Disease dataset.

https://doi.org/10.1371/journal.pcbi.1011344.g003

Table 3. The performance of iCircDA-NEAE with the five classifiers by using benchmark dataset and two independent datasets (circRNAdisease and circ2Disease datasets).

https://doi.org/10.1371/journal.pcbi.1011344.t003

Comparison of different datasets

In this section, the model performance was evaluated by using two independent datasets (circRNAdisease dataset and circ2Disease dataset) with 5-fold and 10-fold cross-validation. As shown in Fig 4, the AUC values of iCircDA-NEAE on the circRNAdisease and circ2Disease datasets are 0.8809 and 0.8505 respectively. The 5-fold cross-validation experimental results on the circRNAdisease and circ2Disease datasets were presented in Table 4. For the circRNAdisease dataset, the ACC, Sen, F1 and MCC of iCircDA-NEAE are 0.8682, 0.8335, 0.8327 and 0.6613, respectively. For the circ2Disease dataset, the ACC, Sen, F1 and MCC of iCircDA-NEAE are 0.8487, 0.7325, 0.7170 and 0.4327, respectively. The 10-fold cross-validation experimental results were presented in S2 and S3 Tables, respectively. For circRNAdisease dataset, the ACC, Sen, F1, MCC and AUC of iCircDA-NEAE are 0.8735, 0.8413, 0.8274, 0.6635 and 0.8962, respectively. For the circ2Disease dataset, the ACC, Sen, F1, MCC and AUC of iCircDA-NEAE are 0.8537, 0.7530, 0.7074, 0.4341 and 0.8575, respectively. These results suggest that iCircDA-NEAE can achieve good prediction performance on several important datasets.
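For readers who want to reproduce the threshold-based metrics reported here, the sketch below computes ACC, Sen, F1 and MCC from binary labels using their standard confusion-matrix definitions. The function name and the toy labels are hypothetical, and the paper does not state which implementation the authors used.

```python
import math

def binary_metrics(y_true, y_pred):
    """Return ACC, Sen, F1 and MCC for binary labels (1 = association)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn) if tp + fn else 0.0      # sensitivity (recall)
    prec = tp / (tp + fp) if tp + fp else 0.0     # precision
    f1 = 2 * prec * sen / (prec + sen) if prec + sen else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Sen": sen, "F1": f1, "MCC": mcc}

# Toy labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

MCC uses all four cells of the confusion matrix, which is why it can be much lower than ACC when one class is predicted poorly, as in the circ2Disease results.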

Fig 4. The performance of iCircDA-NEAE on circRNAdisease and circ2Disease datasets.

(A) AUC values of iCircDA-NEAE on the circRNAdisease dataset. (B) AUC values of iCircDA-NEAE on the circ2Disease dataset.

https://doi.org/10.1371/journal.pcbi.1011344.g004

Table 4. The 5-fold cross-validation experimental results on the circRNAdisease and circ2Disease datasets.

https://doi.org/10.1371/journal.pcbi.1011344.t004

Comparison with other methods

In this section, we used 5-fold cross-validation to compare the performance of iCircDA-NEAE with five state-of-the-art circRNA-disease association prediction models, including iCDA-CGR [19], GCNCDA [15], ASAECDA [16], GATCDA [17] and IMS-CDA [18]. All models were run on a widely used benchmark dataset circR2Disease. As shown in Fig 5, iCircDA-NEAE outperforms other state-of-the-art prediction methods significantly.

Fig 5. Performance comparison of iCircDA-NEAE and the competing methods on the benchmark dataset.

https://doi.org/10.1371/journal.pcbi.1011344.g005

In terms of features, although these state-of-the-art methods use a variety of feature information, they could consider more biometric information. Our proposed iCircDA-NEAE considers both circRNA expression profile similarity and Jaccard similarity. To the best of our knowledge, we are the first to use both circRNA expression profile similarity and Jaccard similarity to predict circRNA-disease associations. Furthermore, our method performs multi-source feature fusion, which can measure the correlation of multiple feature information and fuse this information into a unified information identifier. At the same time, features without redundant information can effectively improve model performance.

In terms of models, these state-of-the-art methods used traditional deep learning or machine learning algorithms. iCDA-CGR used chaos game representation (CGR) technology to quantify the nonlinear relationship of circRNA sequences. However, the model did not deal with redundant information, resulting in poor predictive performance. IMS-CDA and ASAECDA are two deep learning methods based on stacked autoencoder (SAE), which use SAE to extract features from multi-source information. Compared with SAE, our proposed DCAE can capture high-level representations of the data. GCNCDA is a GCN (Graph Convolutional Network)-based prediction method, and GATCDA is a GAT (Graph Attention Network)-based prediction method. Compared with these two methods, iCircDA-NEAE incorporates the advantages of AANE and DCAE, which not only effectively integrates multi-source information, but also effectively captures hidden high-level information of data.

Case studies

In this section, we applied iCircDA-NEAE to the benchmark dataset circR2Disease to predict novel potential circRNA-disease associations. We sorted all unconfirmed circRNA-disease associations in descending order based on their prediction scores. The higher the score, the greater the likelihood of a circRNA-disease association. We selected the top 20 circRNA-disease associations (as shown in Table 5), 17 of which have been confirmed by different databases and literature. For example, hsa_circ_0004214 is highly upregulated in breast cancer and promotes tumorigenesis [27]; hsa_circ_0001785 acts as a diagnostic biomarker in breast cancer treatment [28]; and hsa_circ_0004277 is considered a potential diagnostic marker and therapeutic target for acute myeloid leukemia [29]. The three unconfirmed circRNA-disease associations are hsa_circ_0046701-lung cancer, hsa_circ_0037911-pancreatic cancer, and hsa_circ_0005836-colorectal cancer. hsa_circ_0046701 promotes carcinogenesis by increasing the expression of ITGB8 in glioma [30], and the expression level of ITGB8 is significantly upregulated in lung cancer tissues compared with normal tissues [31]. These pieces of evidence suggest that hsa_circ_0046701 may serve as a potential biomarker in lung cancer. miRNA-637 suppresses tumorigenesis in pancreatic ductal adenocarcinoma cells [32]. In essential hypertension, hsa_circ_0037911 was found to suppress miR-637 activity by acting as a sponge [33]. These results suggest that hsa_circ_0037911 may promote pancreatic ductal adenocarcinoma by inhibiting miR-637 activity. In pulmonary tuberculosis, hsa_circ_0005836 is related to the regulation of the mTOR signaling pathway [34]. The mTOR signaling pathway is a target for colorectal cancer therapy [35]. These studies suggest that hsa_circ_0005836 may be related to colorectal cancer.
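The ranking step used for the case study can be sketched as below. This is an illustrative helper under stated assumptions: `scores` is a hypothetical matrix of predicted association scores with circRNAs as rows and diseases as columns, `assoc` is the known association matrix of the same shape, and the function name is invented for this example.

```python
import numpy as np

def top_unconfirmed(scores, assoc, k=20):
    """Return (circRNA index, disease index, score) for the k best unconfirmed pairs."""
    # Known associations are masked out so that only unconfirmed pairs are ranked.
    masked = np.where(assoc == 1, -np.inf, scores)
    order = np.argsort(masked, axis=None)[::-1][:k]   # descending by score
    rows, cols = np.unravel_index(order, scores.shape)
    return [(int(r), int(c), float(scores[r, c]))
            for r, c in zip(rows, cols) if assoc[r, c] == 0]

# Toy example: 2 circRNAs x 3 diseases, one known association at (0, 0).
scores = np.array([[0.9, 0.2, 0.7],
                   [0.4, 0.8, 0.1]])
assoc = np.array([[1, 0, 0],
                  [0, 0, 0]])
ranking = top_unconfirmed(scores, assoc, k=2)
print(ranking)
```

The returned index pairs would then be mapped back to circRNA and disease names before searching databases and literature for supporting evidence.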

Table 5. The top 20 circRNA-disease associations.

https://doi.org/10.1371/journal.pcbi.1011344.t005

Discussion

Accumulating evidence suggests that circRNAs play crucial roles in human diseases. CircRNA-disease association prediction is extremely helpful in understanding pathogenesis, diagnosis, and prevention, as well as identifying relevant biomarkers. Therefore, there is an urgent need to develop novel computational methods to accurately predict circRNA-disease associations.

In this paper, we proposed a novel deep learning-based method called iCircDA-NEAE to discover new potential circRNA-disease associations. Experimental results demonstrated that iCircDA-NEAE outperforms other state-of-the-art prediction methods, and can accurately predict potential circRNA-disease associations. Besides, 16 of the top 20 circRNA-disease pairs with the highest prediction scores were validated by relevant literature. Furthermore, according to the relevant literature, we observed that novel circRNA-disease associations predicted by iCircDA-NEAE are potential associations.

The performance of iCircDA-NEAE mainly depends on three factors: (i) iCircDA-NEAE incorporates multi-source biometric information to measure complex associations between circRNAs and diseases. (ii) iCircDA-NEAE uses disease semantic similarity, Gaussian interaction kernel (GIP), circRNA expression profile similarity, and Jaccard similarity to make the most of biometric information in the data. (iii) iCircDA-NEAE incorporates the advantages of AANE and DCAE, which not only effectively integrates multi-source information, but also effectively captures hidden high-level information of data.

Two possible issues in this paper should be discussed. (i) Since negative samples are difficult to obtain, we can only randomly select negative samples from the unconfirmed samples. The numbers of positive and negative samples are the same, which avoids the sample imbalance problem, but the negative samples will inevitably contain a few true positive samples. (ii) Since iCircDA-NEAE uses strongly supervised label information (true association labels) to predict circRNA-disease associations, it is heavily dependent on the quality of the ground-truth association labels. Therefore, more comprehensive methods should be proposed to address these two issues in future work.

Materials and methods

Datasets and model

Since circR2Disease (http://bioinfo.snnu.edu.cn/) is the most comprehensive and commonly used database, this study used circR2Disease as the benchmark database. The circRNA expression profiles and disease information were collected from the exoRBase database (http://www.exoRBase.org) [36] and the MeSH database (http://www.nlm.nih.gov/mesh) [37], respectively.

We constructed a sample-balanced circRNA-disease association dataset using the circR2Disease dataset. The association dataset contains 661 circRNAs, 100 diseases, 739 circRNA-disease positive associations, and 739 circRNA-disease negative associations. The 739 positive associations are experimentally validated associations, and the 739 negative associations are randomly selected from the 65361 unconfirmed pairs of the circR2Disease dataset (661 × 100 = 66100 possible pairs minus the 739 known associations). The circRNAdisease database contains 330 circRNAs, 48 diseases and 354 circRNA-disease associations. The circ2Disease database contains 249 circRNAs, 61 diseases and 273 circRNA-disease associations.
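The balanced sampling described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name is hypothetical, and the association matrix is filled with random positions purely to reproduce the stated dimensions (661 circRNAs, 100 diseases, 739 positives).

```python
import numpy as np

def sample_balanced_pairs(assoc, seed=0):
    """Pair every known association with one randomly drawn unconfirmed pair."""
    rng = np.random.default_rng(seed)
    positives = np.argwhere(assoc == 1)
    unconfirmed = np.argwhere(assoc == 0)
    chosen = rng.choice(len(unconfirmed), size=len(positives), replace=False)
    negatives = unconfirmed[chosen]
    pairs = np.vstack([positives, negatives])
    labels = np.concatenate([np.ones(len(positives), dtype=int),
                             np.zeros(len(negatives), dtype=int)])
    return pairs, labels

# Placeholder matrix with the dataset's dimensions; the positions are random, not real data.
rng = np.random.default_rng(1)
assoc = np.zeros((661, 100), dtype=int)
rows, cols = np.unravel_index(rng.choice(assoc.size, size=739, replace=False),
                              assoc.shape)
assoc[rows, cols] = 1

n_unconfirmed = int((assoc == 0).sum())   # 661 * 100 - 739
pairs, labels = sample_balanced_pairs(assoc)
print(pairs.shape, int(labels.sum()), n_unconfirmed)
```

Because the negatives are drawn from unconfirmed pairs, a small number of them may be true associations that have not yet been validated, which is the limitation the authors note in the Discussion.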

First, iCircDA-NEAE uses disease semantic similarity, Gaussian interaction kernel (GIP), circRNA expression profile similarity and Jaccard similarity to measure the biometric information in the data, and constructs a multisource information fusion descriptor. Second, AANE extracts features from the descriptor data. Third, DCAE extracts hidden features from the data. Finally, the random forest classifier uses the hidden features to predict circRNA-disease associations. The flow chart of iCircDA-NEAE is shown in Fig 1. The source code and data are available at: https://github.com/nathanyl/iCircDA-NEAE.

Similarity measures

Before introducing the method, we summarize the notation used in this paper as follows: italic indicates a scalar quantity, as in A or a; lower case boldface indicates a vector quantity, as in a; upper case boldface indicates a matrix quantity, as in A.

Similarity measurement can convert the relationship between biological factors into feature information that can be used by the model, so it is a crucial step in building a prediction model. We constructed similarity matrices from four aspects: disease semantic similarity, Gaussian interaction profile kernel, circRNA expression profile similarity, and Jaccard similarity.

Construction of disease semantic similarity

Disease semantic similarity measures the relationship between diseases [38–40]. The MeSH database uses a directed acyclic graph (DAG) to represent diseases and disease associations. A node in the DAG represents a disease, and the edges of the DAG represent associations between diseases. In MeSH, DAG_d(d, N_d, E_d) is used to represent information about disease d, N_d represents the set of disease nodes that are related to d and contain d itself, and E_d represents the set of edges between these diseases. For a disease e in N_d, if e = d, the disease contribution value of e to d is defined as 1 (D_d(e) = 1). If e ≠ d, the disease contribution value is calculated as follows:

$$D_d(e) = \max\{\mu \cdot D_d(e') \mid e' \in \text{children of } e\} \tag{1}$$

where μ is the semantic contribution factor between diseases; we set μ to 0.5 according to the study [41].

Then, the semantic value DV(d) of disease d is defined as follows:

$$DV(d) = \sum_{e \in N_d} D_d(e) \tag{2}$$

In the DAG, the more nodes two diseases share, the more similar they are. The semantic similarity DSS₁(d(i), d(j)) between disease d(i) and d(j) is defined as follows:

$$DSS_1(d(i), d(j)) = \frac{\sum_{e \in N_{d(i)} \cap N_{d(j)}} \left( D_{d(i)}(e) + D_{d(j)}(e) \right)}{DV(d(i)) + DV(d(j))} \tag{3}$$

where DSS₁ is the disease semantic similarity matrix.
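The first semantic similarity model can be sketched in a few lines. The sketch assumes the standard form of Wang's measure (a contribution of 1 for the disease itself, decayed by μ = 0.5 per step toward its ancestors, and a similarity equal to the shared contributions divided by the sum of the two semantic values). The toy DAG and node names are hypothetical and are not taken from MeSH.

```python
def semantic_contributions(disease, parents, mu=0.5):
    """Contribution D_d(e) of every node e in the DAG of `disease` (itself plus ancestors)."""
    contrib = {disease: 1.0}
    stack = [disease]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            value = mu * contrib[node]
            # Keep the largest contribution over all paths from the disease to this ancestor.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                stack.append(parent)
    return contrib

def semantic_similarity(d1, d2, parents, mu=0.5):
    """DSS1: shared contributions divided by the sum of the two semantic values."""
    c1 = semantic_contributions(d1, parents, mu)
    c2 = semantic_contributions(d2, parents, mu)
    shared = set(c1) & set(c2)
    numerator = sum(c1[e] + c2[e] for e in shared)
    return numerator / (sum(c1.values()) + sum(c2.values()))

# Toy DAG given as child -> parents: A and B share parent P, whose parent is R.
parents = {"A": ["P"], "B": ["P"], "P": ["R"], "R": []}
sim_ab = semantic_similarity("A", "B", parents)
sim_aa = semantic_similarity("A", "A", parents)
print(sim_ab, sim_aa)
```

In the toy DAG each disease has semantic value 1 + 0.5 + 0.25 = 1.75, and the shared ancestors contribute 1.5 in total, so the similarity of A and B is 1.5 / 3.5.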

While considering the disease semantic similarity DSS₁, the impact of disease number on disease contribution should also be considered. Inspired by Wang’s method [42], the contribution of disease e under the influence of the disease number can be defined as follows: (4) where num(DAG_d(e)) is the number of diseases associated with disease d and num(diseases) is the number of all diseases.

Then, the disease semantic similarity DSS₂(d(i), d(j)) of disease d(i) and d(j) can be defined as follows: (5)
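For orientation, the second semantic model that is widely used alongside Wang's measure has the form below. That Eqs 4 and 5 take exactly this form is an assumption, and D′ and DV′ are labels introduced here only for illustration.

```latex
D'_d(e) = -\log\!\left(\frac{\mathrm{num}(DAG_d(e))}{\mathrm{num}(diseases)}\right),
\qquad
DV'(d) = \sum_{e \in N_d} D'_d(e)

DSS_2(d(i), d(j)) =
\frac{\sum_{e \in N_{d(i)} \cap N_{d(j)}} \left( D'_{d(i)}(e) + D'_{d(j)}(e) \right)}
     {DV'(d(i)) + DV'(d(j))}
```

Under this form, a disease term that occurs in few DAGs is more specific and therefore receives a larger contribution than a term shared by many diseases.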

Construction of the Gaussian interaction profile kernel

To obtain comprehensive disease similarity information, we used the Gaussian interaction profile (GIP) kernel [43–45] to calculate disease similarity. Assuming that circRNA c₁ is associated with disease d₁, if disease d₂ is highly similar to disease d₁, then disease d₂-associated circRNAs tend to have similar functions to circRNA c₁ [46]. Therefore, we used the circRNA-disease association adjacency matrix to calculate the GIP kernel similarity between disease d(i) and d(j):

$$GD(d(i), d(j)) = \exp\left(-\mu_d \lVert d(i) - d(j) \rVert^2\right) \tag{6}$$

where GD is the GIP kernel similarity matrix between diseases, d(i) represents the row vector of the i-th disease, and μ_d is the bandwidth parameter of the GIP, which can be calculated by the following formula:

$$\mu_d = 1 \Big/ \left( \frac{1}{n} \sum_{i=1}^{n} \lVert d(i) \rVert^2 \right) \tag{7}$$

where n is the number of rows of the circRNA-disease association matrix.

Similarly, the GIP kernel similarity between circRNAs is defined as follows:

$$GC(c(i), c(j)) = \exp\left(-\mu_c \lVert c(i) - c(j) \rVert^2\right) \tag{8}$$

where GC is the GIP kernel similarity matrix between circRNAs, c(i) represents the column vector of the i-th circRNA, and μ_c is the bandwidth parameter of the GIP, which can be calculated by the following formula:

$$\mu_c = 1 \Big/ \left( \frac{1}{m} \sum_{i=1}^{m} \lVert c(i) \rVert^2 \right) \tag{9}$$

where m is the number of columns of the circRNA-disease association matrix.
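A compact NumPy sketch of the GIP kernel follows. It assumes the usual definition, in which the bandwidth is the reciprocal of the mean squared norm of the interaction profiles, and it takes one profile per row. The toy association matrix stores circRNAs as rows and diseases as columns, so the disease kernel is computed on its transpose; both the matrix and the function name are illustrative.

```python
import numpy as np

def gip_kernel(profiles):
    """GIP kernel similarity between the rows of a binary interaction-profile matrix."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    bandwidth = 1.0 / sq_norms.mean()   # assumes at least one known association
    # Squared Euclidean distances between all pairs of profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-bandwidth * np.maximum(sq_dists, 0.0))

# Toy association matrix: 4 circRNAs (rows) x 3 diseases (columns).
assoc = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 0, 1],
                  [0, 1, 1]])
GC = gip_kernel(assoc)      # circRNA-circRNA similarity, 4 x 4
GD = gip_kernel(assoc.T)    # disease-disease similarity, 3 x 3
print(GC.round(3))
print(GD.round(3))
```

Normalizing the bandwidth by the mean squared norm makes the kernel values comparable between datasets with different numbers of known associations.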

Construction of the CircRNA expression profile similarity

The circRNA expression profile (EP) similarity from the exoRBase database is another important source of information for constructing circRNA-disease association prediction models. We used 32-dimensional feature vectors to represent circRNAs, and sorted the circRNAs in descending order according to the feature vectors [16,47,48]. The Spearman correlation coefficient [49] was used to calculate the EP similarity between circRNAs: (10) where d_p is the feature vector difference between circRNA i and circRNA j, lⁱ represents the 32-dimensional vector of the i-th circRNA after sorting, and k is the number of circRNAs. Let SE be a k×k circRNA adjacency matrix consisting of ρ(c_i, c_j).
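The expression profile similarity can be illustrated with a rank-based correlation in NumPy. The sketch computes the Spearman coefficient as the Pearson correlation of average ranks, which coincides with the closed form 1 − 6Σd²/(n(n² − 1)) when a profile has no tied values. The helper names and the toy 4-dimensional profiles are hypothetical; the paper uses 32-dimensional vectors.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]

def spearman_similarity(expression):
    """Spearman correlation between every pair of rows (one circRNA per row)."""
    ranked = np.vstack([average_ranks(row) for row in expression])
    return np.corrcoef(ranked)

# Toy profiles for three circRNAs; rows with constant values would yield NaN.
expression = np.array([[1.0, 2.0, 3.0, 4.0],
                       [2.0, 4.0, 6.0, 8.0],
                       [4.0, 3.0, 2.0, 1.0]])
SE = spearman_similarity(expression)
print(SE)
```

Because the coefficient depends only on ranks, it is insensitive to the scale of the expression values, which suits profiles measured across heterogeneous samples.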

Construction of the Jaccard similarity

Jaccard similarity is used to represent the similarity between sets [50–52]. J(A, B) is the ratio of the intersection of sets A and B to the union of A and B. The larger the Jaccard value, the higher the similarity between sets A and B. We used Jaccard to calculate the similarities between diseases and circRNAs. We calculated the Jaccard similarity of disease d(i) and disease d(j) with the following formula:

$$JD(d(i), d(j)) = \frac{|ca(d(i)) \cap ca(d(j))|}{|ca(d(i)) \cup ca(d(j))|} \tag{11}$$

where JD is the Jaccard similarity matrix between diseases, and ca(d(i)) represents the circRNAs associated with disease d(i).

The Jaccard similarity calculation formula of circRNAs is defined as follows:

$$JC(c(i), c(j)) = \frac{|da(c(i)) \cap da(c(j))|}{|da(c(i)) \cup da(c(j))|} \tag{12}$$

where JC is the Jaccard similarity matrix between circRNAs, and da(c(i)) represents the diseases associated with circRNA c(i).
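Both Jaccard matrices can be computed from the binary association matrix with a few matrix operations, as in the sketch below. The function name and the toy matrix are illustrative, and pairs whose union is empty are assigned a similarity of 0, which is a convention chosen here rather than one stated in the text.

```python
import numpy as np

def jaccard_matrix(profiles):
    """Jaccard similarity between the rows of a binary association matrix."""
    binary = (np.asarray(profiles) > 0).astype(float)
    intersection = binary @ binary.T                      # |A ∩ B| for every pair
    sizes = binary.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - intersection
    # Pairs with an empty union get similarity 0 instead of dividing by zero.
    return np.divide(intersection, union,
                     out=np.zeros_like(intersection), where=union > 0)

# Toy association matrix: 4 circRNAs (rows) x 3 diseases (columns).
assoc = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 0, 1],
                  [0, 1, 1]])
JC = jaccard_matrix(assoc)      # similarity between circRNAs
JD = jaccard_matrix(assoc.T)    # similarity between diseases
print(JC.round(3))
print(JD.round(3))
```

The matrix product counts shared associations for all pairs at once, which avoids an explicit double loop over circRNAs or diseases.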

Multisource feature fusion

The multisource feature fusion method can fuse a variety of biological feature information, eliminate redundant information, and improve the accuracy of feature extraction. Feature fusion was used to integrate multiple similarity information into a unified identifier, which contains a large amount of circRNA and disease feature information as well as multiple association information. The fusion of disease similarity multisource information can be defined as follows: (13) (14)

The fusion of circRNA similarity multisource information can be defined as follows: (15) (16)

Finally, we used principal component analysis (PCA) [53] to reduce the dimensionality of CM and DM; the reduced matrices are used in place of the originals below. The fusion information of circRNA and disease is obtained according to the following formula: (17) where CM(c(i)) represents the i-th row vector of CM, and DM(d(j)) represents the j-th column vector of DM.
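The dimensionality reduction and pairing step can be sketched as follows. Two points are assumptions made for illustration: PCA is implemented through a singular value decomposition of the centered matrix, and the descriptor of a circRNA-disease pair is formed by concatenating the two reduced vectors, which may differ from the exact form of Eq 17. The function names and the random similarity matrices are hypothetical.

```python
import numpy as np

def pca_reduce(matrix, n_components):
    """Project the rows of `matrix` onto its first principal components."""
    centered = matrix - matrix.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def pair_descriptor(circ_features, disease_features, i, j):
    """Assumed fusion rule: concatenate the circRNA vector and the disease vector."""
    return np.concatenate([circ_features[i], disease_features[j]])

# Random symmetric matrices standing in for the fused similarities CM and DM.
rng = np.random.default_rng(0)
a = rng.random((6, 6))
b = rng.random((4, 4))
CM = (a + a.T) / 2
DM = (b + b.T) / 2

CM_low = pca_reduce(CM, n_components=3)
DM_low = pca_reduce(DM, n_components=2)
descriptor = pair_descriptor(CM_low, DM_low, i=0, j=1)
print(CM_low.shape, DM_low.shape, descriptor.shape)
```

Since DM is symmetric, its j-th row and j-th column are identical, so indexing rows after the reduction is consistent with the column-vector wording in the text.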

Let AM be an m×n adjacency matrix corresponding to the circRNA-disease association dataset from circR2Disease database, where m (m = 661) is the number of circRNAs and n (n = 100) is the number of diseases. If AM(i, j) = 1, it means that circRNA c(i) is associated with disease d(j), otherwise AM(i, j) = 0.

Feature extraction methods

AANE algorithm to extract features.

Compared with the widely used feature extraction methods PCA, LINE (Large-scale Information Network Embedding) [54], node2vec [55] and DeepWalk [56], AANE incorporates the correlation between node attributes into the network embedding to better learn feature representations. AANE is used to extract low-dimensional features. The flowchart of the AANE algorithm is shown in Fig 6.

Fig 6. Schematic overview of AANE framework.

https://doi.org/10.1371/journal.pcbi.1011344.g006

For a network N = (V, E, W), V is the node set, E is the edge set, and W is the set of edge weights, where w_ij in W is the weight of the edge e_ij connecting node i and node j. The value of w_ij is closely related to the similarity between nodes: the larger the value of w_ij, the more similar node i is to node j. According to the theory that a real symmetric matrix can be diagonalized by an orthogonal matrix, the formula is defined as follows:

$$\mathbf{A} = \mathbf{H} \mathbf{\Lambda} \mathbf{H}^{T} = (\mathbf{H}\mathbf{B})(\mathbf{H}\mathbf{B})^{T} \tag{18}$$

where A is a positive semi-definite symmetric matrix, which can be represented by an orthogonal matrix H and a diagonal matrix Λ. B is a diagonal matrix consisting of the square roots of the elements of Λ.

When applying this algorithm, the similarity matrix S is calculated by applying the cosine similarity algorithm to the attribute matrix AM. Based on Eq 18, matrix S is decomposed into two matrices Q and Q^T:

$$\mathbf{S} = \mathbf{Q}\mathbf{Q}^{T} \tag{19}$$
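The decomposition in Eqs 18 and 19 can be checked numerically with the sketch below. It builds a cosine similarity matrix from a random attribute matrix, factors it through an eigendecomposition, and forms Q as the eigenvector matrix scaled by the square roots of the eigenvalues. The function names and the random attributes are hypothetical, and this covers only the exact factorization, not the AANE optimization itself.

```python
import numpy as np

def cosine_similarity_matrix(attributes):
    """Cosine similarity between the rows of an attribute matrix."""
    norms = np.linalg.norm(attributes, axis=1, keepdims=True)
    unit = attributes / np.where(norms == 0, 1.0, norms)   # zero rows stay zero
    return unit @ unit.T

def symmetric_factor(similarity):
    """Return Q = H B with similarity = Q Q^T, using similarity = H Lambda H^T."""
    eigvals, eigvecs = np.linalg.eigh(similarity)
    eigvals = np.clip(eigvals, 0.0, None)   # remove tiny negative round-off values
    return eigvecs * np.sqrt(eigvals)       # scales column k by sqrt(lambda_k)

rng = np.random.default_rng(0)
attributes = rng.random((5, 8))             # 5 nodes with 8 attributes each
S = cosine_similarity_matrix(attributes)
Q = symmetric_factor(S)
print(np.abs(Q @ Q.T - S).max())
```

A cosine similarity matrix is a Gram matrix of unit vectors, so it is positive semi-definite and the factorization reproduces it up to floating-point error.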

Node vectors have high similarity in two situations: when the nodes have high similarity in topological structure, and when the weight between the nodes is large. The objective function is defined as follows: (20) where λ is the balance parameter. Based on Z = Q, the objective function can be written as follows: (21) where q represents the penalty parameter, and u_i is the scaled dual variable. The alternating direction method of multipliers (ADMM) is used to solve the objective function: (22) (23)
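As a point of reference, the AANE literature states the objective and the augmented form solved by ADMM as shown below. This is written from the original AANE formulation rather than recovered from Eqs 20 and 21, so the correspondence is an assumption: H stands for the embedding matrix (Q in the text), h_i, z_i and s_i are the i-th rows of H, Z and S, w_ij is the edge weight, and ρ is the penalty parameter (q in the text).

```latex
\min_{\mathbf{H}} \;
\lVert \mathbf{S} - \mathbf{H}\mathbf{H}^{\top} \rVert_F^2
+ \lambda \sum_{(i,j) \in E} w_{ij} \, \lVert \mathbf{h}_i - \mathbf{h}_j \rVert_2

\mathcal{L} =
\sum_{i=1}^{n} \lVert \mathbf{s}_i - \mathbf{h}_i \mathbf{Z}^{\top} \rVert_2^2
+ \lambda \sum_{(i,j) \in E} w_{ij} \, \lVert \mathbf{h}_i - \mathbf{z}_j \rVert_2
+ \frac{\rho}{2} \sum_{i=1}^{n}
  \left( \lVert \mathbf{h}_i - \mathbf{z}_i + \mathbf{u}_i \rVert_2^2
       - \lVert \mathbf{u}_i \rVert_2^2 \right)
```

With Z held fixed the problem separates over the rows h_i, and with H held fixed it separates over the rows z_i, which is what allows the row-wise updates to be computed independently.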

Dynamic convolutional autoencoder to extract features.

A convolutional autoencoder (CAE) can efficiently extract hidden features from data [57,58]. Inspired by dynamic convolution [59,60], we proposed a dynamic convolutional autoencoder (DCAE) by replacing the convolution with dynamic convolution. DCAE extracts features more efficiently than CAE (see Table 1). The flowchart of the DCAE algorithm is shown in Fig 7. The details of DCAE are as follows. First, the input vector x passes through the dynamic convolution layer, the pooling layer and the hidden layer to obtain an output vector y. This process is called encoding. The encoding formula is as follows: (24) (25) (26) where Π_k denotes the attention weight of the k-th linear function, ⨂ denotes the convolution operation, W and b are the weight matrix and bias vector, g is the sigmoid activation function, and the two remaining terms are the aggregation weight and the aggregation bias.


Fig 7. Schematic overview of DCAE framework.

https://doi.org/10.1371/journal.pcbi.1011344.g007

Then, the encoded vector y passes through the deconvolution layer and the output layer to obtain the reconstructed vector x’. This process is called decoding. The formula for decoding is as follows: (27)

During the training of each layer, we computed the loss between the reconstructed vector x’ and the input vector x, and minimized it until it reached a threshold. An optimization process was performed at each layer.

The attention weights vary with the input x to obtain the optimal aggregation model. Therefore, the dynamic convolutional autoencoder can learn better higher-level representations than the ordinary autoencoder. The dynamic convolution consists of three parts: the attention weights, the aggregation weight, and the aggregation bias. In DCAE, for an input feature of size H×W×C_in, the computational cost is much smaller than that of ordinary convolution. The computational cost is as follows: (28) (29) where O(•) denotes computational cost, D_k denotes the kernel size, and C_out denotes the number of output channels. The computational cost of the attention weights is much lower than that of directly calculating the optimal parameters. DCAE has better flexibility and lower computational cost than ordinary autoencoders.
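To make the idea of input-dependent kernel aggregation concrete, the following NumPy sketch implements a single one-dimensional dynamic convolution layer. It is a simplified illustration under stated assumptions, not the authors' DCAE: the attention branch is reduced to global average pooling followed by one linear map and a softmax, and all names (`dynamic_conv1d`, `att_weights`, `att_bias`) and shapes are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dynamic_conv1d(x, kernels, biases, att_weights, att_bias):
    """
    x:           (C_in, L) input signal
    kernels:     (K, C_out, C_in, D) K candidate kernels
    biases:      (K, C_out) K candidate biases
    att_weights: (K, C_in), att_bias: (K,)  assumed one-layer attention branch
    Returns the activated output (C_out, L - D + 1) and the attention (K,).
    """
    # Attention weights depend on the input through its channel-wise mean.
    pooled = x.mean(axis=1)
    pi = softmax(att_weights @ pooled + att_bias)

    # Aggregate the K kernels and biases with the attention weights.
    w_agg = np.tensordot(pi, kernels, axes=1)  # (C_out, C_in, D)
    b_agg = pi @ biases                        # (C_out,)

    # "Valid" sliding-window convolution with the aggregated kernel.
    c_out, _, d = w_agg.shape
    length = x.shape[1] - d + 1
    y = np.empty((c_out, length))
    for t in range(length):
        window = x[:, t:t + d]                 # (C_in, D)
        y[:, t] = np.tensordot(w_agg, window, axes=([1, 2], [0, 1])) + b_agg
    return sigmoid(y), pi

rng = np.random.default_rng(0)
K, C_in, C_out, D, L = 4, 3, 2, 3, 10
x = rng.standard_normal((C_in, L))
kernels = rng.standard_normal((K, C_out, C_in, D))
biases = rng.standard_normal((K, C_out))
att_w = rng.standard_normal((K, C_in))
att_b = np.zeros(K)

y, pi = dynamic_conv1d(x, kernels, biases, att_w, att_b)
print(y.shape, pi.sum())
```

In this sketch the aggregation step touches only the kernel parameters, so its cost does not grow with the length of the input; only the final convolution does.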

In the experiment, we set the DCAE as a two-layer network with a learning rate of 0.001, using mean squared error (MSE) as the loss function and the gradient descent algorithm as the optimization method.

Random forest classifier predicts associations

In the experiment, a random forest classifier used the extracted features to complete a classification task to discover potential circRNA-disease associations. The execution steps of the random forest classifier can be summarized as follows:

1. The classifier selects N samples using the bootstrap method. The selected N samples are used to train a decision tree.
2. The classifier randomly selects m features from the M features of the sample (m << M), and selects one feature from the m features as the split feature of the node using the information gain ratio. In the process of forming a decision tree, each node is split until it can no longer be split.
3. According to steps 1 and 2, a large number of decision trees are constructed to form a random forest.

The random forest classifier predicts scores for circRNA-disease associations. An association is considered a potential association if the prediction score is greater than a set threshold. The grid search algorithm was used to determine parameters in the classifier, and the number of decision trees was set to 100.
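The classification step can be sketched with scikit-learn's `RandomForestClassifier`. This is only an illustration on synthetic data: the feature matrix, labels, train/test split, and the 0.5 threshold are assumptions, and scikit-learn splits nodes by Gini impurity or entropy (information gain) rather than the information gain ratio named in the steps above, so `criterion="entropy"` is used as the closest available option.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the extracted circRNA-disease pair features.
rng = np.random.default_rng(0)
n_pairs, n_features = 200, 16
X = rng.standard_normal((n_pairs, n_features))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # hypothetical labels

clf = RandomForestClassifier(
    n_estimators=100,      # number of trees, as stated in the text
    criterion="entropy",   # information gain; gain ratio is not offered
    max_features="sqrt",   # random subset of m << M features per split
    bootstrap=True,        # each tree is trained on a bootstrap sample
    random_state=0,
)
clf.fit(X[:150], y[:150])

# Prediction score = estimated probability of the positive class.
scores = clf.predict_proba(X[150:])[:, 1]
threshold = 0.5            # assumed cut-off, not specified in the source
predicted_association = scores > threshold
print(predicted_association.sum(), "pairs above the threshold")
```

Hyperparameters other than the number of trees could be tuned with `sklearn.model_selection.GridSearchCV`, in line with the grid search mentioned above.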

Evaluation methods

Two commonly used methods (k-fold cross-validation and independent dataset testing) were used to evaluate the model performance. In the experiments, we recorded the true positive (TP), false negative (FN), true negative (TN) and false positive (FP) values. Five evaluation metrics were used to assess the model, namely area under curve (AUC), accuracy (ACC), sensitivity (Sen), F1-Score and Matthews correlation coefficient (MCC). These evaluation metrics are defined as follows: (30)
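Four of these metrics can be computed directly from the confusion-matrix counts. The sketch below uses the standard textbook definitions of ACC, Sen, F1-Score and MCC, which are assumed to match Eq 30. AUC is omitted because it requires the ranked prediction scores rather than the four counts. The function name and the example counts are illustrative.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix metrics; returns 0.0 where a ratio is undefined."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sen / (precision + sen)) if (precision + sen) else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {"ACC": acc, "Sen": sen, "F1": f1, "MCC": mcc}

# Illustrative counts, not results from the paper.
metrics = classification_metrics(tp=40, fn=10, tn=45, fp=5)
print(metrics)
```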

Supporting information

S1 Table. Comparison of running times of iCircDA-NEAE and iCircDA-NEAE.

https://doi.org/10.1371/journal.pcbi.1011344.s001

(DOCX)

S2 Table. The 10-fold cross-validation experimental results on the circRNAdisease.

https://doi.org/10.1371/journal.pcbi.1011344.s002

(DOCX)

S3 Table. The 10-fold cross-validation experimental results on the circ2Disease.

https://doi.org/10.1371/journal.pcbi.1011344.s003

(DOCX)

References

1. Hansen TB, Jensen TI, Clausen BH, Bramsen JB, Finsen B, Damgaard CK, et al. Natural RNA circles function as efficient microRNA sponges. Nature. 2013;495(7441):384–8. pmid:23446346
2. Das A, Sinha T, Mishra SS, Das D, Panda AC. Identification of potential proteins translated from circular RNA splice variants. European journal of cell biology. 2023;102(1):151286. pmid:36645925
3. Zhang W, Yuan Z, Zhang J, Su X, Huang Q, Liu Q, et al. Identification and Functional Prediction of CircRNAs in Leaves of F1 Hybrid Poplars with Different Growth Potential and Their Parents. International Journal of Molecular Sciences. 2023;24(3):2284. pmid:36768607
4. Wu X, Shi M, Lian Y, Zhang H. Exosomal circRNAs as promising liquid biopsy biomarkers for glioma. Frontiers in Immunology. 2023;14:1039084. pmid:37122733
5. Weidle UH, Birzele F. Triple-negative Breast Cancer: Identification of circRNAs With Efficacy in Preclinical In Vivo Models. Cancer Genomics & Proteomics. 2023;20(2):117–31. pmid:36870692
6. Zhou C, Zhu D, Zhou S, Wang H, Huang M. Screening differential circular RNA expression profiles and the potential role of hsa_circ_0085465 in liver cancer. Journal of Cancer Research and Therapeutics. 2023. pmid:37470573
7. Song C, Zhang Y, Huang W, Shi J, Huang Q, Jiang M, et al. Circular RNA Cwc27 contributes to Alzheimer’s disease pathogenesis by repressing Pur-α activity. Cell Death & Differentiation. 2022;29(2):393–406.
8. Cheng Q, Wang J, Li M, Fang J, Ding H, Meng J, et al. CircSV2b participates in oxidative stress regulation through miR-5107-5p-Foxk1-Akt1 axis in Parkinson’s disease. Redox biology. 2022;56:102430. pmid:35973363
9. Li H, Liu B. BioSeq-Diabolo: Biological sequence similarity analysis using Diabolo. PLOS Computational Biology. 2023;19(6):e1011214. pmid:37339155
10. Yao D, Zhang L, Zheng M, Sun X, Lu Y, Liu P. Circ2Disease: a manually curated database of experimentally validated circRNAs in human disease. Scientific reports. 2018;8(1):11018. pmid:30030469
11. Zhao Z, Wang K, Wu F, Wang W, Zhang K, Hu H, et al. circRNA disease: a manually curated database of experimentally supported circRNA-disease associations. Cell death & disease. 2018;9(5):1–2. pmid:29700306
12. Fan C, Lei X, Fang Z, Jiang Q, Wu F-X. CircR2Disease: a manually curated database for experimentally supported circular RNAs associated with various diseases. Database. 2018;2018. pmid:29741596
13. Ghosal S, Das S, Sen R, Basak P, Chakrabarti J. Circ2Traits: a comprehensive database for circular RNA potentially associated with disease and traits. Frontiers in genetics. 2013;4:283. pmid:24339831
14. Meng X, Hu D, Zhang P, Chen Q, Chen M. CircFunBase: a database for functional circular RNAs. Database. 2019;2019. pmid:30715276
15. Wang L, You Z-H, Li Y-M, Zheng K, Huang Y-A. GCNCDA: a new method for predicting circRNA-disease associations based on graph convolutional network algorithm. PLOS Computational Biology. 2020;16(5):e1007568. pmid:32433655
16. Yang J, Lei X. Predicting circRNA-disease associations based on autoencoder and graph embedding. Information Sciences. 2021;571:323–36.
17. Bian C, Lei X-J, Wu F-X. GATCDA: predicting circRNA-disease associations based on graph attention network. Cancers. 2021;13(11):2595. pmid:34070678
18. Wang L, You Z-H, Li J-Q, Huang Y-A. IMS-CDA: prediction of CircRNA-disease associations from the integration of multisource similarity information with deep stacked autoencoder model. IEEE transactions on cybernetics. 2020;51(11):5522–31.
19. Zheng K, You Z-H, Li J-Q, Wang L, Guo Z-H, Huang Y-A. iCDA-CGR: Identification of circRNA-disease associations based on Chaos Game Representation. PLoS Computational Biology. 2020;16(5):e1007872. pmid:32421715
20. Peng L, Yang C, Huang L, Chen X, Fu X, Liu W. RNMFLP: predicting circRNA–disease associations based on robust nonnegative matrix factorization and label propagation. Briefings in Bioinformatics. 2022;23(5):bbac155. pmid:35534179
21. Zhang H-Y, Wang L, You Z-H, Hu L, Zhao B-W, Li Z-W, et al. iGRLCDA: identifying circRNA–disease association based on graph representation learning. Briefings in Bioinformatics. 2022;23(3):bbac083. pmid:35323894
22. Shah K, Patel H, Sanghvi D, Shah M. A comparative analysis of logistic regression, random forest and KNN models for the text classification. Augmented Human Research. 2020;5:1–16.
23. Schuldt C, Laptev I, Caputo B, editors. Recognizing human actions: a local SVM approach. Proceedings of the 17th International Conference on Pattern Recognition, 2004 ICPR 2004; 2004: IEEE.
24. Rodriguez JJ, Kuncheva LI, Alonso CJ. Rotation forest: A new classifier ensemble method. IEEE transactions on pattern analysis and machine intelligence. 2006;28(10):1619–30. pmid:16986543
25. Montavon G, Samek W, Müller K-R. Methods for interpreting and understanding deep neural networks. Digital signal processing. 2018;73:1–15.
26. Chen T, Guestrin C, editors. Xgboost: A scalable tree boosting system. Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining; 2016.
27. Yang Q, Du WW, Wu N, Yang W, Awan FM, Fang L, et al. A circular RNA promotes tumorigenesis by inducing c-myc nuclear translocation. Cell Death & Differentiation. 2017;24(9):1609–20. pmid:28622299
28. Yin W-B, Yan M-G, Fang X, Guo J-J, Xiong W, Zhang R-P. Circulating circular RNA hsa_circ_0001785 acts as a diagnostic biomarker for breast cancer detection. Clinica chimica acta. 2018;487:363–8. pmid:29045858
29. Li W, Zhong C, Jiao J, Li P, Cui B, Ji C, et al. Characterization of hsa_circ_0004277 as a new biomarker for acute myeloid leukemia via circular RNA profile and bioinformatics analysis. International journal of molecular sciences. 2017;18(3):597. pmid:28282919
30. Li G, Yang H, Han K, Zhu D, Lun P, Zhao Y. A novel circular RNA, hsa_circ_0046701, promotes carcinogenesis by increasing the expression of miR-142-3p target ITGB8 in glioma. Biochemical and biophysical research communications. 2018;498(1):254–61. pmid:29337055
31. Wu P, Wang Y, Wu Y, Jia Z, Song Y, Liang N. Expression and prognostic analyses of ITGA11, ITGB4 and ITGB8 in human non-small cell lung cancer. PeerJ. 2019;7:e8299. pmid:31875161
32. Xu R-l, He W, Tang J, Guo W, Zhuang P, Wang C-q, et al. Primate-specific miRNA-637 inhibited tumorigenesis in human pancreatic ductal adenocarcinoma cells by suppressing Akt1 expression. Experimental cell research. 2018;363(2):310–4. pmid:29366808
33. Tang Y, Bao J, Hu J, Liu L, Xu DY. Circular RNA in cardiovascular disease: Expression, mechanisms and clinical prospects. Journal of cellular and molecular medicine. 2021;25(4):1817–24. pmid:33350091
34. Zhuang Z-G, Zhang J-A, Luo H-L, Liu G-B, Lu Y-B, Ge N-H, et al. The circular RNA of peripheral blood mononuclear cells: Hsa_circ_0005836 as a new diagnostic biomarker and therapeutic target of active pulmonary tuberculosis. Molecular immunology. 2017;90:264–72. pmid:28846924
35. Zhang Y-J, Dai Q, Sun D-F, Xiong H, Tian X-Q, Gao F-H, et al. mTOR signaling pathway is a target for the treatment of colorectal cancer. Annals of surgical oncology. 2009;16:2617–28. pmid:19517193
36. Li S, Li Y, Chen B, Zhao J, Yu S, Tang Y, et al. exoRBase: a database of circRNA, lncRNA and mRNA in human blood exosomes. Nucleic acids research. 2018;46(D1):D106–D12. pmid:30053265
37. Coletti MH, Bleich HL. Medical subject headings used to search the biomedical literature. Journal of the American Medical Informatics Association. 2001;8(4):317–23. pmid:11418538
38. Jiang L, Zhu J. Review of MiRNA-disease association prediction. Current Protein and Peptide Science. 2020;21(11):1044–53. pmid:32039677
39. Zeng X, Lin W, Guo M, Zou Q. A comprehensive overview and evaluation of circular RNA detection tools. PLoS computational biology. 2017;13(6):e1005420. pmid:28594838
40. Zeng X, Lin W, Guo M, Zou Q. Details in the evaluation of circular RNA detection tools: Reply to Chen and Chuang. PLoS Computational Biology. 2019;15(4):e1006916. pmid:31022173
41. Wang D, Wang J, Lu M, Song F, Cui Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics. 2010;26(13):1644–50. pmid:20439255
42. Wang L, You Z-H, Huang Y-A, Huang D-S, Chan KC. An efficient approach based on multi-sources information to predict circRNA–disease associations using deep convolutional neural network. Bioinformatics. 2020;36(13):4038–46. pmid:31793982
43. van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics. 2011;27(21):3036–43. pmid:21893517
44. Zeng X, Zhong Y, Lin W, Zou Q. Predicting disease-associated circular RNAs using deep forests combined with positive-unlabeled learning methods. Briefings in bioinformatics. 2020;21(4):1425–36. pmid:31612203
45. Niu M, Zhang J, Li Y, Wang C, Liu Z, Ding H, et al. CirRNAPL: a web server for the identification of circRNA based on extreme learning machine. Computational and structural biotechnology journal. 2020;18:834–42. pmid:32308930
46. Xuan P, Han K, Guo M, Guo Y, Li J, Ding J, et al. Prediction of microRNAs associated with human diseases based on weighted k most similar neighbors. PloS one. 2013;8(8):e70204. pmid:23950912
47. Jiao S, Wu S, Huang S, Liu M, Gao B. Advances in the identification of circular RNAs and research into circRNAs in human diseases. Frontiers in Genetics. 2021;12:665233. pmid:33815488
48. Niu M, Ju Y, Lin C, Zou Q. Characterizing viral circRNAs and their application in identifying circRNAs in viruses. Briefings in Bioinformatics. 2022;23(1):bbab404. pmid:34585234
49. Myers L, Sirois MJ. Spearman correlation coefficients, differences between. Encyclopedia of statistical sciences. 2004;12.
50. Salvatore S, Dagestad Rand K, Grytten I, Ferkingstad E, Domanska D, Holden L, et al. Beware the Jaccard: the choice of similarity measure is important and non-trivial in genomic colocalisation analysis. Briefings in bioinformatics. 2020;21(5):1523–30. pmid:31624847
51. Niu M, Zou Q, Lin C. CRBPDL: identification of circRNA-RBP interaction sites using an ensemble neural network approach. PLoS computational biology. 2022;18(1):e1009798. pmid:35051187
52. Niu M, Zou Q, Wang C. GMNN2CD: identification of circRNA–disease associations based on variational inference and graph Markov neural networks. Bioinformatics. 2022;38(8):2246–53. pmid:35157027
53. Martinez AM, Kak AC. Pca versus lda. IEEE transactions on pattern analysis and machine intelligence. 2001;23(2):228–33.
54. Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q, editors. Line: Large-scale information network embedding. Proceedings of the 24th international conference on world wide web; 2015.
55. Grover A, Leskovec J, editors. node2vec: Scalable feature learning for networks. Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining; 2016.
56. Perozzi B, Al-Rfou R, Skiena S, editors. Deepwalk: Online learning of social representations. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining; 2014.
57. Xuan P, Fan M, Cui H, Zhang T, Nakaguchi T. GVDTI: graph convolutional and variational autoencoders with attribute-level attention for drug–protein interaction prediction. Briefings in bioinformatics. 2022;23(1):bbab453. pmid:34718408
58. Chen Y, Wang Y, Ding Y, Su X, Wang C. RGCNCDA: relational graph convolutional network improves circRNA-disease association prediction by incorporating microRNAs. Computers in Biology and Medicine. 2022;143:105322. pmid:35217342
59. Chen Y, Wang J, Wang C, Liu M, Zou Q. Deep learning models for disease-associated circRNA prediction: a review. Briefings in Bioinformatics. 2022;23(6):bbac364. pmid:36130259
60. He S, Jiang C, Dong D, Ding L, editors. Sd-conv: Towards the parameter-efficiency of dynamic convolution. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; 2023.

[ref1] 1. Hansen TB, Jensen TI, Clausen BH, Bramsen JB, Finsen B, Damgaard CK, et al. Natural RNA circles function as efficient microRNA sponges. Nature. 2013;495(7441):384–8. pmid:23446346
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. Das A, Sinha T, Mishra SS, Das D, Panda AC. Identification of potential proteins translated from circular RNA splice variants. European journal of cell biology. 2023;102(1):151286. pmid:36645925
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. Zhang W, Yuan Z, Zhang J, Su X, Huang Q, Liu Q, et al. Identification and Functional Prediction of CircRNAs in Leaves of F1 Hybrid Poplars with Different Growth Potential and Their Parents. International Journal of Molecular Sciences. 2023;24(3):2284. pmid:36768607
View Article
PubMed/NCBI
Google Scholar

[10] View Article

[11] PubMed/NCBI

[12] Google Scholar

[ref4] 4. Wu X, Shi M, Lian Y, Zhang H. Exosomal circRNAs as promising liquid biopsy biomarkers for glioma. Frontiers in Immunology. 2023;14:1039084. pmid:37122733
View Article
PubMed/NCBI
Google Scholar

[14] View Article

[15] PubMed/NCBI

[16] Google Scholar

[ref5] 5. Weidle UH, Birzele F. Triple-negative Breast Cancer: Identification of circRNAs With Efficacy in Preclinical In Vivo Models. Cancer Genomics & Proteomics. 2023;20(2):117–31. pmid:36870692
View Article
PubMed/NCBI
Google Scholar

[18] View Article

[19] PubMed/NCBI

[20] Google Scholar

[ref6] 6. Zhou C, Zhu D, Zhou S, Wang H, Huang M. Screening differential circular RNA expression profiles and the potential role of hsa_circ_0085465 in liver cancer. Journal of Cancer Research and Therapeutics. 2023. pmid:37470573
View Article
PubMed/NCBI
Google Scholar

[22] View Article

[23] PubMed/NCBI

[24] Google Scholar

[ref7] 7. Song C, Zhang Y, Huang W, Shi J, Huang Q, Jiang M, et al. Circular RNA Cwc27 contributes to Alzheimer’s disease pathogenesis by repressing Pur-α activity. Cell Death & Differentiation. 2022;29(2):393–406.
View Article
Google Scholar

[26] View Article

[27] Google Scholar

[ref8] 8. Cheng Q, Wang J, Li M, Fang J, Ding H, Meng J, et al. CircSV2b participates in oxidative stress regulation through miR-5107-5p-Foxk1-Akt1 axis in Parkinson’s disease. Redox biology. 2022;56:102430. pmid:35973363
View Article
PubMed/NCBI
Google Scholar

[29] View Article

[30] PubMed/NCBI

[31] Google Scholar

[ref9] 9. Li H, Liu B. BioSeq-Diabolo: Biological sequence similarity analysis using Diabolo. PLOS Computational Biology. 2023;19(6):e1011214. pmid:37339155
View Article
PubMed/NCBI
Google Scholar

[33] View Article

[34] PubMed/NCBI

[35] Google Scholar

[ref10] 10. Yao D, Zhang L, Zheng M, Sun X, Lu Y, Liu P. Circ2Disease: a manually curated database of experimentally validated circRNAs in human disease. Scientific reports. 2018;8(1):11018. pmid:30030469
View Article
PubMed/NCBI
Google Scholar

[37] View Article

[38] PubMed/NCBI

[39] Google Scholar

[ref11] 11. Zhao Z, Wang K, Wu F, Wang W, Zhang K, Hu H, et al. circRNA disease: a manually curated database of experimentally supported circRNA-disease associations. Cell death & disease. 2018;9(5):1–2. pmid:29700306
View Article
PubMed/NCBI
Google Scholar

[41] View Article

[42] PubMed/NCBI

[43] Google Scholar

[ref12] 12. Fan C, Lei X, Fang Z, Jiang Q, Wu F-X. CircR2Disease: a manually curated database for experimentally supported circular RNAs associated with various diseases. Database. 2018;2018. pmid:29741596
View Article
PubMed/NCBI
Google Scholar

[45] View Article

[46] PubMed/NCBI

[47] Google Scholar

[ref13] 13. Ghosal S, Das S, Sen R, Basak P, Chakrabarti J. Circ2Traits: a comprehensive database for circular RNA potentially associated with disease and traits. Frontiers in genetics. 2013;4:283. pmid:24339831
View Article
PubMed/NCBI
Google Scholar

[49] View Article

[50] PubMed/NCBI

[51] Google Scholar

[ref14] 14. Meng X, Hu D, Zhang P, Chen Q, Chen M. CircFunBase: a database for functional circular RNAs. Database. 2019;2019. pmid:30715276
View Article
PubMed/NCBI
Google Scholar

[53] View Article

[54] PubMed/NCBI

[55] Google Scholar

[ref15] 15. Wang L, You Z-H, Li Y-M, Zheng K, Huang Y-A. GCNCDA: a new method for predicting circRNA-disease associations based on graph convolutional network algorithm. PLOS Computational Biology. 2020;16(5):e1007568. pmid:32433655
View Article
PubMed/NCBI
Google Scholar

[57] View Article

[58] PubMed/NCBI

[59] Google Scholar

[ref16] 16. Yang J, Lei X. Predicting circRNA-disease associations based on autoencoder and graph embedding. Information Sciences. 2021;571:323–36.
View Article
Google Scholar

[61] View Article

[62] Google Scholar

[ref17] 17. Bian C, Lei X-J, Wu F-X. GATCDA: predicting circRNA-disease associations based on graph attention network. Cancers. 2021;13(11):2595. pmid:34070678
View Article
PubMed/NCBI
Google Scholar

[64] View Article

[65] PubMed/NCBI

[66] Google Scholar

[ref18] 18. Wang L, You Z-H, Li J-Q, Huang Y-A. IMS-CDA: prediction of CircRNA-disease associations from the integration of multisource similarity information with deep stacked autoencoder model. IEEE transactions on cybernetics. 2020;51(11):5522–31.
View Article
Google Scholar

[68] View Article

[69] Google Scholar

[ref19] 19. Zheng K, You Z-H, Li J-Q, Wang L, Guo Z-H, Huang Y-A. iCDA-CGR: Identification of circRNA-disease associations based on Chaos Game Representation. PLoS Computational Biology. 2020;16(5):e1007872. pmid:32421715
View Article
PubMed/NCBI
Google Scholar

[71] View Article

[72] PubMed/NCBI

[73] Google Scholar

[ref20] 20. Peng L, Yang C, Huang L, Chen X, Fu X, Liu W. RNMFLP: predicting circRNA–disease associations based on robust nonnegative matrix factorization and label propagation. Briefings in Bioinformatics. 2022;23(5):bbac155. pmid:35534179
View Article
PubMed/NCBI
Google Scholar

[75] View Article

[76] PubMed/NCBI

[77] Google Scholar

[ref21] 21. Zhang H-Y, Wang L, You Z-H, Hu L, Zhao B-W, Li Z-W, et al. iGRLCDA: identifying circRNA–disease association based on graph representation learning. Briefings in Bioinformatics. 2022;23(3):bbac083. pmid:35323894
View Article
PubMed/NCBI
Google Scholar

[79] View Article

[80] PubMed/NCBI

[81] Google Scholar

[ref22] 22. Shah K, Patel H, Sanghvi D, Shah M. A comparative analysis of logistic regression, random forest and KNN models for the text classification. Augmented Human Research. 2020;5:1–16.
View Article
Google Scholar

[83] View Article

[84] Google Scholar

[ref23] 23. Schuldt C, Laptev I, Caputo B, editors. Recognizing human actions: a local SVM approach. Proceedings of the 17th International Conference on Pattern Recognition, 2004 ICPR 2004; 2004: IEEE.

[ref24] 24. Rodriguez JJ, Kuncheva LI, Alonso CJ. Rotation forest: A new classifier ensemble method. IEEE transactions on pattern analysis and machine intelligence. 2006;28(10):1619–30. pmid:16986543
View Article
PubMed/NCBI
Google Scholar

[87] View Article

[88] PubMed/NCBI

[89] Google Scholar

[ref25] 25. Montavon G, Samek W, Müller K-R. Methods for interpreting and understanding deep neural networks. Digital signal processing. 2018;73:1–15.
View Article
Google Scholar

[91] View Article

[92] Google Scholar

[ref26] 26. Chen T, Guestrin C, editors. Xgboost: A scalable tree boosting system. Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining; 2016.

[ref27] 27. Yang Q, Du WW, Wu N, Yang W, Awan FM, Fang L, et al. A circular RNA promotes tumorigenesis by inducing c-myc nuclear translocation. Cell Death & Differentiation. 2017;24(9):1609–20. pmid:28622299
View Article
PubMed/NCBI
Google Scholar

[95] View Article

[96] PubMed/NCBI

[97] Google Scholar

[ref28] 28. Yin W-B, Yan M-G, Fang X, Guo J-J, Xiong W, Zhang R-P. Circulating circular RNA hsa_circ_0001785 acts as a diagnostic biomarker for breast cancer detection. Clinica chimica acta. 2018;487:363–8. pmid:29045858
View Article
PubMed/NCBI
Google Scholar

[99] View Article

[100] PubMed/NCBI

[101] Google Scholar

[ref29] 29. Li W, Zhong C, Jiao J, Li P, Cui B, Ji C, et al. Characterization of hsa_circ_0004277 as a new biomarker for acute myeloid leukemia via circular RNA profile and bioinformatics analysis. International journal of molecular sciences. 2017;18(3):597. pmid:28282919
View Article
PubMed/NCBI
Google Scholar

[103] View Article

[104] PubMed/NCBI

[105] Google Scholar

[ref30] 30. Li G, Yang H, Han K, Zhu D, Lun P, Zhao Y. A novel circular RNA, hsa_circ_0046701, promotes carcinogenesis by increasing the expression of miR-142-3p target ITGB8 in glioma. Biochemical and biophysical research communications. 2018;498(1):254–61. pmid:29337055
View Article
PubMed/NCBI
Google Scholar

[107] View Article

[108] PubMed/NCBI

[109] Google Scholar

[ref31] 31. Wu P, Wang Y, Wu Y, Jia Z, Song Y, Liang N. Expression and prognostic analyses of ITGA11, ITGB4 and ITGB8 in human non-small cell lung cancer. PeerJ. 2019;7:e8299. pmid:31875161
View Article
PubMed/NCBI
Google Scholar

[111] View Article

[112] PubMed/NCBI

[113] Google Scholar

[ref32] 32. Xu R-l, He W, Tang J, Guo W, Zhuang P, Wang C-q, et al. Primate-specific miRNA-637 inhibited tumorigenesis in human pancreatic ductal adenocarcinoma cells by suppressing Akt1 expression. Experimental cell research. 2018;363(2):310–4. pmid:29366808
View Article
PubMed/NCBI
Google Scholar

[115] View Article

[116] PubMed/NCBI

[117] Google Scholar

[ref33] 33. Tang Y, Bao J, Hu J, Liu L, Xu DY. Circular RNA in cardiovascular disease: Expression, mechanisms and clinical prospects. Journal of cellular and molecular medicine. 2021;25(4):1817–24. pmid:33350091
View Article
PubMed/NCBI
Google Scholar

[119] View Article

[120] PubMed/NCBI

[121] Google Scholar

[ref34] 34. Zhuang Z-G, Zhang J-A, Luo H-L, Liu G-B, Lu Y-B, Ge N-H, et al. The circular RNA of peripheral blood mononuclear cells: Hsa_circ_0005836 as a new diagnostic biomarker and therapeutic target of active pulmonary tuberculosis. Molecular immunology. 2017;90:264–72. pmid:28846924
View Article
PubMed/NCBI
Google Scholar

[123] View Article

[124] PubMed/NCBI

[125] Google Scholar

[ref35] 35. Zhang Y-J, Dai Q, Sun D-F, Xiong H, Tian X-Q, Gao F-H, et al. mTOR signaling pathway is a target for the treatment of colorectal cancer. Annals of surgical oncology. 2009;16:2617–28. pmid:19517193
View Article
PubMed/NCBI
Google Scholar

[127] View Article

[128] PubMed/NCBI

[129] Google Scholar

[ref36] 36. Li S, Li Y, Chen B, Zhao J, Yu S, Tang Y, et al. exoRBase: a database of circRNA, lncRNA and mRNA in human blood exosomes. Nucleic acids research. 2018;46(D1):D106–D12. pmid:30053265
View Article
PubMed/NCBI
Google Scholar

[131] View Article

[132] PubMed/NCBI

[133] Google Scholar

[ref37] 37. Coletti MH, Bleich HL. Medical subject headings used to search the biomedical literature. Journal of the American Medical Informatics Association. 2001;8(4):317–23. pmid:11418538
View Article
PubMed/NCBI
Google Scholar

[135] View Article

[136] PubMed/NCBI

[137] Google Scholar

[ref38] 38. Jiang L, Zhu J. Review of MiRNA-disease association prediction. Current Protein and Peptide Science. 2020;21(11):1044–53. pmid:32039677
View Article
PubMed/NCBI
Google Scholar

[139] View Article

[140] PubMed/NCBI

[141] Google Scholar

[ref39] 39. Zeng X, Lin W, Guo M, Zou Q. A comprehensive overview and evaluation of circular RNA detection tools. PLoS computational biology. 2017;13(6):e1005420. pmid:28594838
View Article
PubMed/NCBI
Google Scholar

[143] View Article

[144] PubMed/NCBI

[145] Google Scholar

[ref40] 40. Zeng X, Lin W, Guo M, Zou Q. Details in the evaluation of circular RNA detection tools: Reply to Chen and Chuang. PLoS Computational Biology. 2019;15(4):e1006916. pmid:31022173
View Article
PubMed/NCBI
Google Scholar

[147] View Article

[148] PubMed/NCBI

[149] Google Scholar

[ref41] 41. Wang D, Wang J, Lu M, Song F, Cui Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics. 2010;26(13):1644–50. pmid:20439255
View Article
PubMed/NCBI
Google Scholar

[151] View Article

[152] PubMed/NCBI

[153] Google Scholar

[ref42] 42. Wang L, You Z-H, Huang Y-A, Huang D-S, Chan KC. An efficient approach based on multi-sources information to predict circRNA–disease associations using deep convolutional neural network. Bioinformatics. 2020;36(13):4038–46. pmid:31793982
View Article
PubMed/NCBI
Google Scholar

[155] View Article

[156] PubMed/NCBI

[157] Google Scholar

[ref43] 43. van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics. 2011;27(21):3036–43. pmid:21893517
View Article
PubMed/NCBI
Google Scholar

[159] View Article

[160] PubMed/NCBI

[161] Google Scholar

[ref44] 44. Zeng X, Zhong Y, Lin W, Zou Q. Predicting disease-associated circular RNAs using deep forests combined with positive-unlabeled learning methods. Briefings in bioinformatics. 2020;21(4):1425–36. pmid:31612203

[ref45] 45. Niu M, Zhang J, Li Y, Wang C, Liu Z, Ding H, et al. CirRNAPL: a web server for the identification of circRNA based on extreme learning machine. Computational and Structural Biotechnology Journal. 2020;18:834–42. pmid:32308930

[ref46] 46. Xuan P, Han K, Guo M, Guo Y, Li J, Ding J, et al. Prediction of microRNAs associated with human diseases based on weighted k most similar neighbors. PLoS ONE. 2013;8(8):e70204. pmid:23950912

[ref47] 47. Jiao S, Wu S, Huang S, Liu M, Gao B. Advances in the identification of circular RNAs and research into circRNAs in human diseases. Frontiers in Genetics. 2021;12:665233. pmid:33815488

[ref48] 48. Niu M, Ju Y, Lin C, Zou Q. Characterizing viral circRNAs and their application in identifying circRNAs in viruses. Briefings in Bioinformatics. 2022;23(1):bbab404. pmid:34585234

[ref49] 49. Myers L, Sirois MJ. Spearman correlation coefficients, differences between. Encyclopedia of Statistical Sciences. 2004;12.

[ref50] 50. Salvatore S, Dagestad Rand K, Grytten I, Ferkingstad E, Domanska D, Holden L, et al. Beware the Jaccard: the choice of similarity measure is important and non-trivial in genomic colocalisation analysis. Briefings in Bioinformatics. 2020;21(5):1523–30. pmid:31624847

[ref51] 51. Niu M, Zou Q, Lin C. CRBPDL: identification of circRNA-RBP interaction sites using an ensemble neural network approach. PLoS Computational Biology. 2022;18(1):e1009798. pmid:35051187

[ref52] 52. Niu M, Zou Q, Wang C. GMNN2CD: identification of circRNA–disease associations based on variational inference and graph Markov neural networks. Bioinformatics. 2022;38(8):2246–53. pmid:35157027

[ref53] 53. Martinez AM, Kak AC. PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2001;23(2):228–33.

[ref54] 54. Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q. LINE: Large-scale information network embedding. Proceedings of the 24th International Conference on World Wide Web; 2015.

[ref55] 55. Grover A, Leskovec J. node2vec: Scalable feature learning for networks. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016.

[ref56] 56. Perozzi B, Al-Rfou R, Skiena S. DeepWalk: Online learning of social representations. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2014.

[ref57] 57. Xuan P, Fan M, Cui H, Zhang T, Nakaguchi T. GVDTI: graph convolutional and variational autoencoders with attribute-level attention for drug–protein interaction prediction. Briefings in Bioinformatics. 2022;23(1):bbab453. pmid:34718408

[ref58] 58. Chen Y, Wang Y, Ding Y, Su X, Wang C. RGCNCDA: relational graph convolutional network improves circRNA-disease association prediction by incorporating microRNAs. Computers in Biology and Medicine. 2022;143:105322. pmid:35217342

[ref59] 59. Chen Y, Wang J, Wang C, Liu M, Zou Q. Deep learning models for disease-associated circRNA prediction: a review. Briefings in Bioinformatics. 2022;23(6):bbac364. pmid:36130259

[ref60] 60. He S, Jiang C, Dong D, Ding L. SD-Conv: Towards the parameter-efficiency of dynamic convolution. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; 2023.

