2.1. Graph Neural Network
In a computer, a graph is a non-Euclidean structure characterized by an irregular arrangement: it is difficult to define the neighbors of a given node, and different nodes have different numbers of neighbors, so graphs are hard to feed directly into deep-learning methods such as convolutional neural networks and recurrent neural networks. Graph-embedding models aim to reduce the dimensionality of the graph structure, i.e., to map a high-dimensional, dense graph into a low-dimensional vector space. Graph-embedding algorithms are now widely used in node classification, link prediction, clustering, and visualization [
5].
Because the structures of the input graphs and the algorithmic principles applied in the embedding process differ, graph-embedding models are mainly categorized as follows. Cai et al. classified graph-embedding models into five categories, i.e., matrix-decomposition-based graph embedding, deep-learning-based graph embedding, edge-reconstruction-based graph embedding, graph-kernel-based graph embedding, and generative-model-based graph embedding [
6]. Goyal et al. classified graph embedding into three categories according to the model utilized in the embedding process, i.e., factorization-based graph embedding model, random-walk-based graph embedding model, and deep-learning-based graph embedding model [
7]. Chen et al. classified graph embedding models into seven categories by incorporating the type of the graph along with the model utilized in the embedding process as the basis for the classification, i.e., dimensionality-reduction-based graph embedding, random-walk-based graph embedding, matrix-decomposition-based graph embedding, neural-network-based graph embedding, graph embedding for large graphs, graph embedding for hypergraphs, and graph embedding based on attentional mechanisms [
8]. Xie et al. classified dynamic networks according to whether they are of the snapshot type or the continuous-time type and, in accordance with their different graph-embedding strategies, into four categories, i.e., eigenvalue-decomposition-based graph embedding, Skip-Gram-model-based graph embedding, autoencoder-based graph embedding, and graph-convolution-based graph embedding [
9]. Xia et al. classified graph-embedding algorithms into four categories based on graph learning techniques, i.e., graph embedding for graph signal processing, graph embedding for matrix decomposition, graph embedding for random walk, and graph embedding for deep learning [
10]. Noting that existing research ignores the correlation between static and dynamic graphs, Yuan et al. classified static and dynamic graph models together into five categories based on the algorithmic principles applied in the embedding process, namely: graph embedding based on matrix decomposition, graph embedding based on random walks, graph embedding based on autoencoders, graph embedding based on graph neural networks, and graph embedding based on other methods [
11].
The following summarizes the current mainstream graph-embedding models and briefly introduces their principles. Singular-value decomposition (SVD) is a typical algorithm for decomposing an adjacency matrix; in essence, it factors the matrix through orthogonal vector transformations. Building on generalized SVD (GSVD) and combining it with asymmetric transitivity, the HOPE algorithm was derived [
12]. Node2vec [
13], a classical random-walk algorithm, is inspired by the skip-gram algorithm from language modeling and by the DeepWalk algorithm; on the basis of obtaining sequences of neighboring nodes via random walks, it introduces two parameters that balance breadth-first and depth-first traversal, so as to take into account both the homophily and the structural roles of the graph. Because networks in real problems are strongly time-varying, the CTDNE algorithm [
14] introduces temporal information on top of static-graph random walks, i.e., it performs random walks along temporally valid paths, which makes it better suited to dynamic-graph scenarios. Based on the eigendecomposition of the graph Laplacian, the GCN [
15] processes node features together with the connections between nodes to produce a new node representation. Building on the GCN, GraphSAGE [
16] utilizes known nodes to predict unknown nodes through a fixed-size sampling mechanism, and GAT [
17] introduces an attention mechanism that assigns weights to neighboring nodes, so the complete graph is not required as input, which is also more in line with real-world scenarios.
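The propagation rule shared by these convolutional models can be illustrated with a minimal numpy sketch of a single GCN layer, using the symmetric normalization of the adjacency matrix with self-loops; the toy graph, input features, and random weights below are illustrative assumptions, not any specific model from the literature:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)                    # node degrees
    D_inv_sqrt = np.diag(d ** -0.5)          # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# toy graph: 3 nodes on a path 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.eye(3)                                # one-hot input features
W = np.random.default_rng(0).normal(size=(3, 2))
print(gcn_layer(A, H, W).shape)              # (3, 2)
```

Each output row mixes a node's own features with those of its neighbors, which is what allows stacked layers to aggregate information from progressively larger neighborhoods.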
In addition, since many real-world relations are not pairwise, concepts such as hypergraphs and heterogeneous graphs have arisen. To better study these graph structures, scholars have proposed graph-embedding algorithms for hypergraphs, e.g., DHNE [
18], HGNN [
19], etc., as well as algorithms for heterogeneous graphs, e.g., HGSL [
20], etc.
Building on the extensive survey of graph-embedding techniques presented in
Section 2, our study focuses on advances in GNN methods, in particular GAT, GCN, and GraphSAGE. These methods constitute the graph-embedding prior work for our study, owing to their state-of-the-art performance on various graph-based tasks and their relevance to our proposed approach. In the context of these foundational GNN models, our work introduces a novel enhancement by integrating the Louvain method for community detection. This integration significantly boosts model performance across multiple metrics, including precision, recall, F1-score, and AUC. The Louvain method’s ability to uncover latent community structures provides an additional layer of information that enriches the original node features, thereby enabling the GNN models to achieve a deeper understanding of the graph’s topology. This enhancement aligns with recent trends in deep learning that prioritize incorporating structural and contextual information into the learning process, affirming our contribution to the evolving landscape of graph neural network research.
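The feature-enrichment idea can be pictured as follows: community labels produced by a detection step such as the Louvain method are one-hot encoded and concatenated to the original node features before they enter the GNN. This is a minimal sketch of that concatenation only; the label vector here is a hypothetical community assignment, not the output of our actual pipeline:

```python
import numpy as np

def augment_with_communities(X, communities):
    """Concatenate one-hot community membership to node features.
    `communities` holds one integer label per node, e.g. as produced
    by a community-detection algorithm such as the Louvain method."""
    n_comm = communities.max() + 1
    one_hot = np.eye(n_comm)[communities]    # (n_nodes, n_communities)
    return np.concatenate([X, one_hot], axis=1)

X = np.random.default_rng(1).normal(size=(5, 4))   # original node features
labels = np.array([0, 0, 1, 1, 1])                 # hypothetical community labels
X_aug = augment_with_communities(X, labels)
print(X_aug.shape)                                 # (5, 6)
```

The augmented matrix can then be passed to any of the GNN models above in place of the raw features.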
2.2. Link Prediction
Link-prediction algorithms mainly use the following key metrics: the area under the curve (AUC), accuracy, the common neighbor (CN) metric, the Adamic–Adar (AA) metric, the resource allocation (RA) metric, the preferential attachment (PA) metric, the random walk with restart (RWR) metric, and the Katz metric. Traditional link-prediction algorithms can be divided into two categories, unsupervised and supervised, where unsupervised algorithms mainly center on generating functions, node clustering, likelihood analysis, and related models, while the models under supervised algorithms are more diverse. In recent years, many graph-embedding algorithms have been applied to link prediction, e.g., the DeepWalk and LINE algorithms. In solving practical problems, since network structures tend to be complex and diverse, current research on link-prediction models focuses mainly on temporal-network link prediction, heterogeneous-network link prediction, and directed-graph link prediction.
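Several of the neighborhood-based metrics listed above can be computed directly from node adjacencies. A small self-contained sketch of the CN, Adamic–Adar, and RA scores follows; the toy adjacency dictionary is an illustrative assumption:

```python
import math

def neighbor_scores(adj, u, v):
    """Common Neighbors, Adamic-Adar, and Resource Allocation scores
    for a candidate link (u, v) in an undirected graph given as an
    adjacency dict {node: set_of_neighbors}."""
    common = adj[u] & adj[v]
    cn = len(common)                                        # count of shared neighbors
    aa = sum(1.0 / math.log(len(adj[z]))                    # down-weight high-degree
             for z in common if len(adj[z]) > 1)            # shared neighbors (log)
    ra = sum(1.0 / len(adj[z]) for z in common)             # down-weight linearly
    return cn, aa, ra

adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
print(neighbor_scores(adj, 0, 3))   # common neighbors of 0 and 3 are {1, 2}
```

Higher scores indicate a higher predicted likelihood that the missing edge (u, v) exists; ranking all non-edges by such a score is the basis of the heuristic baselines.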
Links in temporal networks change over time and, thus, have a more complex and diverse dynamic structure. For link prediction in temporal networks, Rahman et al. transformed the link-prediction problem into an optimal coding problem over graph embeddings through feature-representation extraction, minimized the reconstruction error through gradient descent, and proposed the Dylink2vec algorithm. This method considers both the topology of the network and the historical changes of the links; compared to traditional link-prediction methods based on a single topological structure and to DeepWalk-based methods, Dylink2vec achieved higher CN, AA, and Katz metrics [
21]. Zhou et al. proposed the DynamicTriad algorithm in conjunction with representation learning; it simultaneously preserves the network’s structural information and its dynamic changes and, through the triadic-closure process, connects the previously unconnected nodes of a triad, in order to capture the dynamic changes at each moment and perform representation learning [
22]. Wu et al. proposed the GETRWR algorithm, which draws on the random-walk algorithms of graph embedding and improves the transition probability in the RWR metric by applying the node2vec model and introducing a restart factor, i.e., during the random walk a particle has a certain probability of returning to its initial position; the particle’s transition probability is computed dynamically, yielding higher indicators such as the AUC and RWR [
23].
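The restart idea behind the RWR metric can be sketched with a simple power iteration: at every step the walker either follows the column-stochastic transition matrix or jumps back to the seed node with probability alpha. The toy graph and the value of alpha below are illustrative assumptions:

```python
import numpy as np

def rwr_scores(A, seed, alpha=0.15, iters=100):
    """Random walk with restart: iterate p <- (1-alpha) P p + alpha r,
    where r is concentrated on the seed node; p converges to the
    stationary visiting probabilities used as link-prediction scores."""
    P = A / A.sum(axis=0, keepdims=True)     # column-stochastic transition matrix
    r = np.zeros(A.shape[0])
    r[seed] = 1.0                            # restart distribution
    p = r.copy()
    for _ in range(iters):
        p = (1 - alpha) * P @ p + alpha * r
    return p

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
scores = rwr_scores(A, seed=0)
print(scores)   # higher score = stronger predicted association with node 0
```

Candidate links from the seed are then ranked by these scores, with existing neighbors excluded.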
Nodes and edges in heterogeneous networks have different types, so many link-prediction methods designed for homogeneous networks are often not directly applicable. For link prediction in heterogeneous networks, Dong et al. proposed the metapath2vec algorithm, which first uses a metapath-based random-walk model to construct node neighborhoods in the network, then performs graph embedding on the walks with the skip-gram algorithm, and finally predicts a node’s neighbors through negative sampling adapted to heterogeneous networks. This algorithm maximally preserves the structure and semantics of the heterogeneous network [
24]. Fu et al., in turn, constructed a neural-network model for heterogeneous-network representation learning: the Hin2vec algorithm captures the different node relationships and the network structure, combines meta-paths with the nodes, and then learns vector features, generating meta-paths for comparative experiments. Compared to traditional heterogeneous-network link-prediction methods, this algorithm better preserves contextual information, including the correlation and differentiation between nodes [
25]. Jiang et al. improved the graph convolutional neural network: a HeGCN layer first replaces the traditional graph-convolution propagation layer, making the model better suited to heterogeneous networks, and is combined with an adversarial-learning layer to obtain node representations; the model is then trained with the Adam optimizer to reduce the loss function, improving the AUC and PA metrics in link prediction compared to the DeepWalk and Hin2vec algorithms [
26].
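The metapath-constrained sampling at the core of metapath2vec can be sketched as follows. This is a simplified illustration, not the published implementation: node types are encoded in the first character of hypothetical node ids, and the "AP" pattern alternates author and paper nodes:

```python
import random

def metapath_walk(neighbors, start, metapath, length):
    """Metapath-guided random walk (the sampling step of metapath2vec):
    at each step, only neighbors whose type matches the next symbol of
    the metapath pattern are candidates for the walk."""
    walk = [start]
    for i in range(length - 1):
        want = metapath[(i + 1) % len(metapath)]             # next required type
        cands = [v for v in neighbors[walk[-1]] if v[0] == want]
        if not cands:                                        # dead end: stop early
            break
        walk.append(random.choice(cands))
    return walk

# toy heterogeneous graph; ids are prefixed by type (A = author, P = paper)
neighbors = {"A1": ["P1"], "A2": ["P1", "P2"], "A3": ["P2"],
             "P1": ["A1", "A2"], "P2": ["A2", "A3"]}
random.seed(0)
print(metapath_walk(neighbors, "A1", "AP", 5))
```

The resulting type-alternating sequences are what the skip-gram step then embeds, so that semantically related nodes of the same type end up close in the vector space.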
Edges in directed networks carry directional information, and the directionality of edges is very common in real-world problems; ignoring it reduces a model’s predictive power. For link prediction in directed networks, Lichtenwalter et al. proposed the PropFlow algorithm based on random walks, which assigns parameters to potential connections through a baseline; when walking, objects choose routes according to the weights, the walk is restricted to the direction of the edges, and no restart or convergence is needed, so the computation time is reduced [
27]. Sarukkai used the Markov-chain algorithm from machine learning for link prediction and path analysis; the model adapts dynamically to historical changes and is combined with eigenvector decomposition to obtain link tours, providing an effective tool for web-path analysis and link prediction [
2]. Pan et al. applied a neural network to link prediction in directed graphs with the NNLP algorithm, optimized it with the standard particle swarm algorithm, and trained the network using the error back-propagation (BP) iterative learning algorithm; the training process fuses indicators such as CN, AA, PA, and Katz, thus improving the model’s complementarity and accuracy [
28].
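The Katz metric used by several of the methods above extends naturally to directed graphs by counting directed paths of all lengths with geometric damping. A minimal sketch follows; the toy graph is illustrative, and the attenuation factor beta must be smaller than the reciprocal of the largest eigenvalue of A for the series to converge:

```python
import numpy as np

def katz_scores(A, beta=0.05):
    """Katz index: S = sum_{l>=1} beta^l A^l = (I - beta A)^-1 - I.
    Entry (i, j) scores a potential link i -> j, weighting longer
    directed paths geometrically less."""
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)

# directed toy graph: 0 -> 1, 1 -> 2, 0 -> 2
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
S = katz_scores(A)
print(S[0, 2] > S[1, 2])   # True: 0 -> 2 has a direct edge plus a 2-step path
```

Because the score matrix is asymmetric whenever A is, the metric distinguishes the direction of a candidate link, which is exactly the information undirected heuristics discard.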
In addressing the evolving landscape of link prediction, our study employs the same rigorous evaluation metrics as previous research to ensure comparability and consistency. We established a robust baseline encompassing heuristic-based methods and machine-learning approaches, which sets the stage for our advanced GNN models.