1. Introduction
With the continuous advance of industrialization, the structure of industrial equipment has become increasingly complex. In the aviation industry, the engine is the core component of an aircraft and directly determines whether the aircraft can operate stably and reliably. Because the engine is subjected to high temperature and high pressure for long periods, its performance is prone to degradation. At the same time, the engine is a highly sophisticated thermomechanical device, so repairing it after a failure is extremely difficult, and a failure can lead to irreparable losses. The turbofan engine, a typical type of aircraft engine, is used in various types of aircraft. However, because of its complex operating conditions and large amount of data, predicting the remaining life of a turbofan engine is challenging. How to extract effective features from the data and accurately predict the remaining useful life of an engine has therefore become both a hot topic and a difficult problem in the field of industrial life prediction.
Existing remaining useful life (RUL) prediction models fall into three types: model-based methods, data-driven methods, and hybrid methods [1,2]. Model-based methods include physical model-based methods and statistical model-based methods. The physical model-based methods for RUL prediction are mainly derived from reliability theory research. They analyze a large amount of experimental data obtained throughout the lifecycle of mechanical equipment and then apply mathematical statistics and probability theory to process the data, thereby obtaining statistical predictions of RUL based on reliability criteria. Kim et al. [3] proposed a physics-based Markov chain model to identify degradation pathways of lithium-ion batteries. By analyzing the direct correlations between phenomena and states, they used capacity measurements to identify degradation pathways and predict the remaining useful life. Statistical model-based methods, also known as empirical model-based methods, estimate the RUL of machinery by establishing statistical models based on empirical knowledge. Usually, the RUL prediction results are presented as conditional probability density functions based on the observations. Zhang et al. [4] proposed a feature fusion method based on statistical quantities for equipment health condition assessment. This method has advantages in decoupling indicators and fusing multi-source information. The aforementioned model-based methods not only require measured parameters from actual engineering systems but also rely on extensive prior knowledge [5] during model construction. They depend on a certain level of expert experience, which may hinder their applicability to transfer learning and to new domains.
With the continuous development of artificial intelligence, machine learning [6] and deep learning [7] are gradually being applied to life prediction. At the same time, because the widespread use of sensors makes monitoring data for devices easier to obtain, data-driven methods [8] are receiving more and more attention. Ren et al. [9] proposed an adaptive sensor weighting method based on a time-varying Gaussian encoder (TGE-ASW) to address the difficulty of building representative features from noisy raw signals from multiple sensors. They used an adaptive sensor weighting strategy and built a convolutional neural network (CNN) to predict RUL by obtaining high-level feature representations. Chen et al. [10] noted that little research had used graph neural networks (GNNs) to capture spatial correlations between sensors; they introduced sensor embedding and proposed a new RUL prediction model based on ConvGAT. Hu et al. [11] proposed an ensemble method of deep bidirectional recurrent neural networks (DBRNNs). In this method, several DBRNNs with different neuron structures were constructed to extract hidden features from sensory data for predicting the remaining useful life of aircraft engines. Li et al. [12] addressed the increasing complexity of the degradation characteristics of aircraft engine components during flight with multi-operating work points (MOP) and proposed a deep learning fusion algorithm based on a self-attention mechanism (SAM). This algorithm uses a one-dimensional CNN to extract spatial features and a long short-term memory (LSTM) network [13,14] to fuse the measurement data of one component, and extracts time features from actual measured data.
Combining model-based and data-driven methods [15] into hybrid approaches can overcome the limitations of both. Hybrid algorithms aim to address the shortcomings of model-based approaches by combining them with data-driven methods. However, because of the complexity and high cost of the models, the development of hybrid methods has not been very successful, and research in this area is limited. Khumprom et al. [16] used an evolutionary selection method to choose features from the C-MAPSS aircraft gas turbine engine dataset and then used the selected features to train a hybrid convolutional long short-term memory (CNN-LSTM) deep neural network for RUL prediction. XueBin et al. [17] proposed a method for predicting remaining useful life based on degradation trajectory similarity. They first constructed degradation trajectories using convolutional neural network autoencoders and attention mechanisms, and then used a new similarity matching rule to evaluate the similarity of the degradation trajectories. The results showed that this method has good predictive performance and low sensitivity to sample size, and that it can be easily incorporated into similarity-based frameworks.
Although neural networks have shown promising performance in predictive tasks, there are several challenges in current research:
- (1) The impact of spatial characteristics on mechanical life prediction may vary in different working environments. In such cases, it is a challenging problem to integrate time features and spatial features to improve trajectory similarity.
- (2) Due to the presence of noise and other interference in the raw signals from sensors, feature weights may be improperly allocated during similarity calculation, thereby affecting the final prediction results.
To address these issues, this study proposes a life prediction model based on spatial–temporal similarity calculation, aiming to improve the accuracy of RUL predictions. First, certain features exhibit minimal variation over the entire time span and carry little information, and including all features directly in the model would lengthen training. This study therefore adopts an adaptive feature selection method that eliminates features that remain unchanged during the lifecycle, which resolves the issue of data redundancy. Second, spatial characteristics are identified within the selected features, and a modified longest common subsequence (LCSS) algorithm is used to calculate the similarity of spatial–temporal trajectories, thereby improving the accuracy of similarity calculation. Finally, the weight training module of the life prediction model assigns the feature weights of the remaining parts, and the life prediction module then generates the final predicted RUL.
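The idea of eliminating features that do not change over the lifecycle can be illustrated with a simple variance filter. This is a minimal sketch under stated assumptions, not the adaptive selection method of this study: the function name `drop_near_constant`, the tolerance `tol`, and the sample readings are all hypothetical.

```python
import numpy as np

def drop_near_constant(features: np.ndarray, tol: float = 1e-8):
    """Return (reduced matrix, kept column indices) after dropping columns
    whose standard deviation over all rows is at most `tol`."""
    stds = features.std(axis=0)
    kept = np.flatnonzero(stds > tol)
    return features[:, kept], kept

# Hypothetical readings: 4 cycles x 3 sensors; the middle sensor never changes.
readings = np.array([
    [641.8, 21.61, 553.9],
    [642.1, 21.61, 554.3],
    [642.4, 21.61, 553.7],
    [642.9, 21.61, 552.8],
])
reduced, kept = drop_near_constant(readings)
```

A fixed tolerance is the simplest possible criterion; an adaptive method would presumably derive the threshold from the data.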
2. Definition of Terms
The dataset used in this study is the NASA dataset, in which each degradation trajectory record consists of the engine ID, the number of rounds (flight cycles), three configuration parameters, and measurement data from 21 sensors. Prior to adaptive matching, certain preprocessing steps need to be applied to the dataset [18]. Because of the particular structure of the dataset and the needs of the subsequent algorithm descriptions, the terms used by the algorithms are defined below.
Definition 1. Spatial–Temporal Trajectory Sequence.
The environmental type of an engine can be represented by a triad of attributes, namely, flight altitude, Mach number, and flight speed, defined as envor = {param_1, param_2, param_3}, where param_1 represents configuration parameter 1, param_2 represents configuration parameter 2, and param_3 represents configuration parameter 3. The set of environmental types is defined as E = {envor_1, envor_2, …, envor_n}, where n is the total number of environmental types. A spatial–temporal event at time t is represented as a tuple event = (t, e), where t is the occurrence time of the event and e is an environmental type from the set of environmental types E. A spatial–temporal trajectory sequence is represented as L = {event_1, event_2, …, event_n}, where n is the total number of space-time events in the trajectory sequence.
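The objects in Definition 1 map naturally onto small record types. The sketch below is only an illustration: the class names `Envor` and `Event`, the helper `build_trajectory`, and the numeric values are hypothetical, chosen to mirror the symbols envor, event, L, and E.

```python
from typing import NamedTuple

class Envor(NamedTuple):
    """One environmental type: the three configuration parameters."""
    param_1: float
    param_2: float
    param_3: float

class Event(NamedTuple):
    """A spatial-temporal event: occurrence time t and environmental type e."""
    t: int
    e: Envor

def build_trajectory(cycles, settings):
    """Pair each cycle with its configuration triple to form the sequence L."""
    return [Event(t, Envor(*params)) for t, params in zip(cycles, settings)]

# Hypothetical values, for illustration only.
settings = [(35.0, 0.84, 100.0), (42.0, 0.84, 100.0), (35.0, 0.84, 100.0)]
L = build_trajectory([1, 2, 3], settings)
E = sorted(set(ev.e for ev in L))  # the distinct environmental types seen in L
```

Because `Envor` is an immutable tuple, two events with the same configuration triple compare equal, which is what a sequence-matching step needs.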
Definition 2. Remaining Useful Life (RUL) Metric.
Since the dataset does not provide a specific indicator for lifespan, the number of rounds from the engine’s healthy state to its failure state is taken as the lifespan indicator after analyzing the dataset. The RUL is calculated as follows:
$$RUL_i(t) = \max(T_i) - t$$
where $i$ represents the current engine ID, $T_i$ represents the operating sequence of the current engine ID, $\max(T_i)$ represents the maximum number of flight cycles for the engine with ID $i$, and $t$ represents the flight cycles at the current time.
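The RUL label described in this definition can be computed per engine as the last recorded cycle minus the current cycle. The sketch below assumes run-to-failure records, so that the last recorded cycle of an engine is its failure cycle; the function name `compute_rul` and the sample records are hypothetical.

```python
from collections import defaultdict

def compute_rul(records):
    """records: iterable of (engine_id, cycle) pairs.
    Returns {(engine_id, cycle): RUL} with RUL = last recorded cycle - cycle."""
    records = list(records)
    last_cycle = defaultdict(int)
    for engine_id, cycle in records:
        last_cycle[engine_id] = max(last_cycle[engine_id], cycle)
    return {(i, t): last_cycle[i] - t for i, t in records}

# Hypothetical records: engine 1 runs for 3 cycles, engine 2 for 2 cycles.
records = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
rul = compute_rul(records)
```

With these records, engine 1 has a RUL of 2 at its first cycle and 0 at its last.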
Definition 3. Matching Result Set.
To make the matching process easier to describe, the spatial–temporal trajectory sequence of a test set engine is defined as the original string (initial), and the spatial–temporal trajectory sequence of a training set sample engine is defined as the mother string (haystack). Each matching operation either succeeds or fails. When a match succeeds, a substring (needle) is obtained. The matching cycles are defined as fitCycle = {c_1, c_2, …, c_n}, where n is the length of the substring and c_1, c_2, …, c_n are the cycles of the substring. The matching sequence is defined as fitSeries = {number, fitCycle_1, fitCycle_2, …, fitCycle_n}, where number is the engine ID of a successful match and the subsequent n entries are the matching cycles of the n successful matches. The matching results are defined as fitResult = {fitSeries_1, fitSeries_2, …, fitSeries_n}, where n is the number of successful matches. The matching result set is defined as fitresultSet = {fitResult_1, fitResult_2, …, fitResult_n}, where n is the number of engines in the test set. The matching result set contains the matching results of all engines in the test set and is used by the subsequent similarity calculation algorithms.
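The nesting of fitCycle, fitSeries, fitResult, and fitresultSet is easier to see as concrete types. The following sketch is one possible layout, not the study's implementation: the Python names are hypothetical, and keying the result set by test engine ID is an assumption made for convenience.

```python
from dataclasses import dataclass, field
from typing import Dict, List

FitCycle = List[int]  # cycles c_1..c_n covered by one matched substring (needle)

@dataclass
class FitSeries:
    """All successful matches against one training engine."""
    number: int  # engine ID of the matched training engine
    fit_cycles: List[FitCycle] = field(default_factory=list)

FitResult = List[FitSeries]          # matches found for one test engine
FitResultSet = Dict[int, FitResult]  # assumption: keyed by test engine ID

# Hypothetical example: test engine 7 matched training engine 3 twice
# and training engine 12 once.
fit_result_set: FitResultSet = {
    7: [
        FitSeries(number=3, fit_cycles=[[10, 11, 12], [40, 41]]),
        FitSeries(number=12, fit_cycles=[[5, 6, 7, 8]]),
    ],
}
total_matches = sum(len(s.fit_cycles) for s in fit_result_set[7])
```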
Definition 4. Spatial–Temporal Similarity.
To calculate the similarity of sequence matching, the following formula is defined:
where the quantities are, in order: the space–time similarity; the substring (needle) obtained when a successful match occurs; the mother string (haystack) when a successful match occurs; the initial string; the length of the longest common subsequence between two of these strings; and the lengths of the three strings.
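The longest-common-subsequence length that this definition relies on can be computed with standard dynamic programming. The sketch below is a minimal illustration under stated assumptions: it uses the classic exact-match LCSS rather than the modified variant this study proposes, it treats each trajectory as a list of environmental-type labels, and it normalizes by the shorter sequence length, which is a common convention assumed here and not necessarily the formula of this study. The function names and sample labels are hypothetical.

```python
def lcss_length(a, b):
    """Length of the longest common subsequence of a and b (classic DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcss_similarity(a, b):
    """Assumed normalization: LCSS length divided by the shorter length."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b) / min(len(a), len(b))

# Hypothetical environmental-type labels for a test and a training engine.
initial = ["e1", "e2", "e3", "e2"]
haystack = ["e1", "e3", "e2", "e2", "e4"]
similarity = lcss_similarity(initial, haystack)
```

The two-row table keeps memory proportional to the length of the second sequence, which matters when the training trajectories are long.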
Definition 5. Sensor Parameter Error.
To ensure that the parameters of each sensor are not affected by differences in units and scale, the data are normalized. The following formula standardizes each parameter to zero mean and unit standard deviation:
$$x' = \frac{x - u}{S}$$
where $x$ is the value to be normalized, $x'$ is the normalized value, $u$ represents the mean of the sample, and $S$ represents the standard deviation of the sample.
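Using the mean and standard deviation named in this definition, the normalization can be sketched as a column-wise z-score. Two choices here are assumptions: the population standard deviation (NumPy's default, `ddof=0`) is used because the text does not say which estimator applies, and constant columns are left at zero to avoid division by zero. The function name `standardize` and the sample values are hypothetical.

```python
import numpy as np

def standardize(values: np.ndarray) -> np.ndarray:
    """Column-wise z-score: subtract the mean u, divide by the standard deviation S."""
    u = values.mean(axis=0)
    s = values.std(axis=0)
    s = np.where(s == 0, 1.0, s)  # assumption: constant columns map to zero
    return (values - u) / s

# Hypothetical readings: 3 cycles x 2 sensors on very different scales.
sensors = np.array([[10.0, 200.0], [12.0, 220.0], [14.0, 240.0]])
z = standardize(sensors)
```

After standardization both columns have the same spread, so neither sensor dominates a distance computation because of its units.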
To calculate the sensor parameter error for a successful match, the engine ID and matching cycles are used to locate the matched sequence, and the Euclidean distance between this sequence and the corresponding sequence in the training set samples is computed. The Euclidean distance for each parameter of the two sequences is weighted by the weights obtained from PCA weighting, which yields the sensor parameter error for a successful match.
where i denotes one of the n engine numbers, each weight coefficient corresponds to a sensor parameter j, and the two sensor-parameter terms are the parameters with index j of the substring and of the mother string, respectively.
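A weighted Euclidean error of the kind described here can be sketched as follows. The exact form is an assumption: this version computes one Euclidean distance per sensor over the matched cycles and then takes the weighted sum, and it expects the per-sensor weights to be supplied (how they are derived from PCA is not shown). The function name `sensor_parameter_error` and the sample arrays are hypothetical.

```python
import numpy as np

def sensor_parameter_error(needle: np.ndarray, haystack: np.ndarray,
                           weights: np.ndarray) -> float:
    """needle, haystack: (cycles, sensors) arrays over the matched cycles.
    weights: one non-negative weight per sensor, assumed given.
    Returns the weighted sum of per-sensor Euclidean distances."""
    if needle.shape != haystack.shape:
        raise ValueError("matched segments must have the same shape")
    per_sensor = np.sqrt(((needle - haystack) ** 2).sum(axis=0))
    return float(np.dot(weights, per_sensor))

# Hypothetical matched segments: 2 cycles x 2 sensors.
needle = np.array([[0.0, 1.0], [0.0, 1.0]])
haystack = np.array([[3.0, 1.0], [4.0, 1.0]])
weights = np.array([0.6, 0.4])
error = sensor_parameter_error(needle, haystack, weights)
```

In this example only the first sensor differs, with a distance of 5 over the two cycles, so the error is 0.6 × 5 = 3.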
Definition 6. Similarity Calculation Formula.
After the spatial–temporal similarity and the sensor parameter error have been obtained through the above operations, the sensor parameter error is first converted into a sensor parameter similarity. To combine the similarities of the different parts [19,20], a weighted training model is used to allocate weights to these two similarities. To compare the similarity of each matching cycle, a similarity formula is defined.
where the quantities are, in order: the spatial–temporal similarity; the sensor parameter similarity; the weights assigned to these two similarities, respectively; and a term representing the maximum value between two of the quantities.
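One simple way to combine the two similarities is a weighted average. The sketch below is an illustration only: the reciprocal mapping from error to similarity and the normalized weighted average are assumptions of this sketch, not the formula of this study, and the maximum term named in the definition is not modelled. The function names and values are hypothetical.

```python
def error_to_similarity(error: float) -> float:
    """Assumed mapping from a non-negative error to a similarity in (0, 1]."""
    if error < 0:
        raise ValueError("error must be non-negative")
    return 1.0 / (1.0 + error)

def combined_similarity(sim_st: float, sim_sensor: float,
                        w_st: float, w_sensor: float) -> float:
    """Weighted average of the two similarities; weights are normalized to sum to 1."""
    total = w_st + w_sensor
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_st * sim_st + w_sensor * sim_sensor) / total

# Hypothetical inputs: spatial-temporal similarity 0.75, sensor error 3.0.
sim_sensor = error_to_similarity(3.0)
score = combined_similarity(0.75, sim_sensor, w_st=0.5, w_sensor=0.5)
```

In the study the weights come from a trained weighting model; here they are fixed by hand to keep the example self-contained.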