1. Introduction
The exponential increase in online activities, particularly during the Covid-19 pandemic [1], has led to significant growth in online infrastructure for numerous new and existing services, resulting in an unprecedented amount of data being processed and stored in cyberspace. For example, the global cloud services market is expected to grow from USD 396.1 billion in 2020 to USD 798.84 billion in 2025 at an annual growth rate of 14% [2]. As governments and organizations plan to deliver increasingly advanced services in smart city areas worldwide and factories embrace Industry 4.0 using Internet of Things (IoT) networks, the number of IoT devices is projected to surpass 30 billion by 2030. The possibilities within the smart city concept include, among others, efficient and eco-friendly use of technology to enhance the quality of services in healthcare [3], coordinated development boosted by a smart economy [4], transportation, water, air quality management, waste management and surveillance. However, the main requirement of a smart city is connectivity across all devices and all aspects, which is only possible if IoT is used on a mass scale [5]. This trend is supported by a recent study showing that about 70% of USA businesses have invested heavily in Industrial IoT [6] and in devices for smart homes and wearables, which directly contributes to the growth of smart cities.
To understand the vulnerabilities and underlying security challenges of smart city applications, it is crucial to understand the threats to and vulnerabilities of the sensors on which those applications are built, along with the associated threat mitigation strategies. Sensors detect and respond to physical stimuli, such as changes in temperature, pressure or motion, and convert them into analog or digital signals for subsequent processing and decision-making. From existing IoT-based smart city applications, it is evident that a wide range of sensors is now being used to operate and monitor the functions of smart cities, reaching almost all facets of society. For example, one of the largest smart city projects was undertaken in the city of Santander, Spain, which utilizes Libelium’s Waspmote sensor platform [7], a versatile and modular sensor network analysis space, to monitor the environmental conditions of the city systems. In this project, Meshlium scanners are used as edge devices to gather data, and 750 sensors were deployed across 22 specified zones. Temperature, luminosity, carbon monoxide and sound noise sensors were employed. Libelium’s dedicated sensor nodes for smart cities, the Plug and Sense! Smart Cities PRO, which are equipped as standard with two radios for 2.4 GHz communication and the IEEE 802.15.4 protocol, were developed exclusively for this purpose. They contain a BME280 temperature, humidity and pressure sensor, along with an SCP v30 07 luminosity sensor, an OPC-N3 dust sensor and CO-A4 carbon monoxide sensors [8]. Libelium’s next advanced smart city project is currently under development in the city of Cartagena [9], where lampposts have been integrated with air quality (OPC-N3) and noise monitoring sensors equipped with fast cellular communication standards (5G). Using OPC-N2 technology, AirSensa is leading a large-scale air pollution monitoring effort in London, capturing precise measurements of PM1, PM2.5 and PM10 concentrations.
The Smart Nation Sensor Platform in Singapore [10] uses sensors to perform important smart city monitoring tasks such as detecting water leaks, minimizing energy wastage and reporting incidents of lightning strikes, for which the Vaisala GLD360 sensor network was integrated [11]. To monitor and keep track of environmental pollution, a major task for smart cities, Lecce, Italy is using air quality sensors in its “Integrated Energy Plan” to reduce CO2 emissions. A project on sustainable environment is taking place in Tartu, Estonia and Sonderborg, Denmark, which are utilizing smart meters and sensor-based monitoring [12] to improve energy efficiency in community housing and optimize solar energy usage. In Vitoria-Gasteiz, low-level sensors measuring temperature, humidity and CO2 are installed in dwellings, along with energy consumption measuring devices, to improve living conditions [13]. The integration of sensors in smart city projects is essential for effective data-driven decision-making and for addressing the diverse challenges facing modern cities.
The exponential growth in the use of these sensors and IoT devices in smart cities, and in regular IoT environments as a whole, presents a unique set of challenges. Because of their resource-constrained nature and the criticality of some applications, these devices are particularly vulnerable to complex and hard-to-detect attacks [14]. As most devices used in the lower tiers of IoT systems are left with at most 1 MB of memory after installation of the operating system [15], resource-intensive security mechanisms are practically infeasible. The importance of the sectors where IoT devices are deployed, such as medical, energy and military operations, compounded by the scarcity of resources, makes them an easy target for cyber-attacks [16].
In the cyber-attack landscape, malware is the most vicious threat to security. Its ability to stay undetected in systems and deploy automated, coordinated attacks makes it particularly destructive for distributed systems such as IoT and smart cities. RapperBot (August 2022), a recent variant of the Mirai malware [17], infected a large number of IoT devices with a combined attack from over 3500 unique IPs, underscoring the importance of protecting resource-constrained devices from cyberattacks. Obfuscated memory malware (OMM) is a particularly notorious form of malware that employs obfuscation techniques, such as code scrambling, command-and-control, string compression and encryption, and code injection, to obscure its presence in device memory, hide its activities and evade detection. The polymorphic ability of OMMs to change their behavior with each iteration also obscures their goals, making it difficult to extract information about the intent of the malware. OMMs include some of the most dangerous types of malware, such as Ransomware, Spyware and Trojans [18]. Thus, OMMs have become powerful tools for infiltrating secure networks and stealing or destroying valuable information. Consequently, there has been growing interest in the development of robust and efficient detection mechanisms that can analyze and identify OMMs in memory.
Recent advances in deep learning (DL) have led to an increased use of advanced neural network algorithms, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, to detect and identify malware [19]. However, it is important to consider the complexity and sporadic nature of OMM activity patterns, especially when performing multiclass detection (i.e., identifying individual attack types), as the nature of this malware makes the task more challenging. While researchers have implemented a wide range of single and hybrid models for malware detection, most of the research has focused on binary detection only, which detects the presence or absence of an attack within a system. Multiclass detection is critical for embedded and constrained devices, as it allows them to optimize their security strategy by observing and investigating the specific nature of each attack type and devising more customized malware prevention systems. Another major challenge is that most existing methods for detecting OMMs fail to deliver sufficient detection accuracy while maintaining small model sizes. Very few works have focused on models specifically tailored to resource-constrained environments, as most DL methods consist of many layers and a large number of parameters, resulting in model sizes that exceed 1 MB and are impractical for deployment on end devices [15].
Compared to LSTMs, which have a tendency to forget patterns after a while, Bi-directional LSTMs (Bi-LSTMs) have the ability to extract rich feature representations and model complex patterns while considering both past and future contexts [20]. They are adept at retaining critical information and discarding redundant information through the use of gates, thus reducing the resources required. These capabilities are useful for overcoming the challenges posed by OMMs in targeted systems. Our model combines CNNs and Bi-LSTMs because CNNs excel at mapping features, thereby eliminating the need for feature selection. CNNs also reduce the number of learnable parameters by sharing weights, and they sub-sample data through pooling layers, reducing the dimensionality of the data and the computation required. Therefore, employing CNNs and Bi-LSTMs as feature extractors and classifiers, respectively, in resource-constrained environments enables us to develop robust and resilient memory attack detection techniques to counter the advanced tactics employed by OMMs.
Overall in this work, we address the associated issues and propose a DL-based OMM detection and attack classification system suited for resource-constrained systems, and make the following contributions:
We designed a hybrid CNN-BiLSTM architecture for the detection and classification of OMM types (multiclass detection). Our model implements a two-layer CNN block, followed by a two-layer Bi-LSTM block to extract high-dimensional feature representations and capture the sequential correlations, respectively.
Through extensive tuning of model parameters, we constructed two distinct models, namely CompactCBL and RobustCBL. While they vary slightly in performance, both models are embeddable in resource-limited IoT devices.
Extensive evaluation on the most recent OMM dataset (CIC-Malmem-2022) demonstrates that our models outperform competing models and offer a tradeoff between performance and resource usage.
Acronyms and notations used in this paper are specified in Abbreviations.
The rest of the paper is organized as follows. Section 2 provides an overview of the current literature on binary and multiclass obfuscated malware detection. Section 3 presents an in-depth look at our proposed method, while Section 4 details the performance results obtained from evaluating the CNN-BiLSTM approach on the CIC-Malmem-2022 dataset. Finally, Section 5 summarizes our study and outlines future perspectives.
4. Experimental Results
To evaluate the effectiveness of our models in detecting obfuscated malware, we utilized the most recent and comprehensive dataset, CIC-Malmem-2022. This section provides a detailed description of the dataset, along with a performance analysis of our models. To ensure a thorough assessment of their capabilities, we conducted three distinct detection tasks: (i) binary attack detection, (ii) attack family detection and (iii) identification of individual attack types. Moreover, we compared our approach with the existing literature to assess its validity.
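The three detection tasks differ only in how finely each sample is labelled. The sketch below shows one way the three label levels could be derived from a single raw label. The "Family-Type" label format and the `task_labels` helper are assumptions made for illustration; they are not taken from the dataset documentation or from the authors' code.

```python
# Hypothetical raw label format: "Benign" or "<Family>-<Type>", e.g. "Trojan-Zeus".
def task_labels(raw_label: str) -> dict:
    """Derive the binary, family and individual-type labels from one raw label."""
    if raw_label == "Benign":
        return {"binary": "Benign", "family": "Benign", "type": "Benign"}
    family, attack_type = raw_label.split("-", 1)
    return {
        "binary": "Attack",                   # task (i): attack vs. benign
        "family": family,                     # task (ii): 3 families + benign
        "type": f"{family}-{attack_type}",    # task (iii): 15 types + benign
    }


# Example labels built from attack names mentioned in this paper.
for raw in ["Benign", "Trojan-Zeus", "Ransomware-Conti", "Spyware-Gator"]:
    print(raw, "->", task_labels(raw))
```

Keeping all three levels in one mapping makes it straightforward to train and evaluate the same architecture on each task with only the target column changed.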
The experiments were conducted on an HP EliteOne Desktop running the 64-bit Windows 10 Education operating system, with an Intel(R) Core(TM) i5-7500 CPU @ 3.40 GHz processor and 8 GB RAM. For model building and testing, we utilized popular libraries such as Pandas, NumPy, TensorFlow, Keras and scikit-learn.
An 80–20% division of the dataset was performed to generate training and testing sets. As such, the training set contained 46,876 samples while the test set had 11,720 samples. The split was stratified to ensure that all classes are represented in both sets. The models were then constructed using the parameters described in Table 1. During the training phase, a batch size of 64 was used to avoid any additional latency; the same batch size was used for the test set to ensure a consistent evaluation process. After the training phase, the model’s weights were saved and loaded for evaluation on the test set. We opted to use the Adam optimizer to update the weights while tracking the loss with the categorical cross-entropy function, which is known for its ability to handle multi-class problems.
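As an illustration of the stratified 80–20% split described above, the following minimal sketch uses only the Python standard library. It is not the authors' pipeline (which may well rely on a library routine); the function name `stratified_split`, the seed and the toy class counts are assumptions chosen for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class at ~test_fraction."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # For classes with at least two samples, keep one or more on each side.
        n_test = min(max(1, round(len(indices) * test_fraction)), len(indices) - 1)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Toy labels standing in for the real dataset (not the actual class counts).
labels = ["Benign"] * 50 + ["Ransomware"] * 17 + ["Spyware"] * 17 + ["Trojan"] * 16
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class, rather than shuffling the whole dataset once, is what guarantees that small classes still appear in the test set.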
4.1. Dataset
The CIC-Malmem-2022 dataset [18] was created by the Canadian Institute for Cybersecurity and made publicly accessible in 2022. The dataset focuses on memory-based obfuscated attacks and contains malware from very recent real-life cyberattacks. The samples were generated from malicious memory dumps using VolMemLyzer [39], a technique for extracting significant feature representations from real-time network communications. Malmem-2022 contains 58,596 samples in total, with 56 features in each sample. Exactly half of the collection consists of benign network data, and the other half is made up of a variety of contemporary and older obfuscation-based cyberattacks. These attacks can be broadly classified into three families, namely Trojan, Spyware and Ransomware, which collectively comprise 15 distinct attack types. The diversity of these attacks provides a unique opportunity to evaluate detection models against an array of threats, ranging from the browser hijacker CWS malware (2003), to the ever-changing Trojan Scar, whose patterns are still changing 10 years after its first detection, to the Ransomware Conti that surfaced in the pandemic era of 2020. This dataset was selected for our study because evaluating new models on old datasets does not translate to efficacy in the real world, as the performance of models depends heavily on the features associated with the attacks and their complexity [40].
Figure 2 shows the class distribution between benign and attack family categories in CIC-Malmem-2022, which comprises 50% Benign Samples, 16.7% of Obfuscated Ransomware, 17.1% Spyware and 16.2% Trojan malware.
We then illustrate the breakdown of individual attacks in Table 2. The number of samples is almost the same for all attacks, except for Transponder, Gator and Shade, which each contain over 2000 samples. TIBS has the lowest number of samples of any class.
4.2. Model Evaluation Metrics
To evaluate the models’ performance in detecting OMMs and their suitability for resource-constrained devices (IoT devices) used in smart city applications, we measured several widely used metrics for determining the detection accuracy of a machine learning model. First, we define the following:
True Positive (TP) = number of samples the model correctly detects as attack.
True Negative (TN) = number of samples the model correctly detects as benign.
False Positive (FP) = number of samples the model incorrectly detects as attack.
False Negative (FN) = number of samples the model incorrectly detects as benign.
Accuracy (ACC): The proportion of correct detections (True Positives and True Negatives) made by the model to the total number of detections (N):
$$ACC = \frac{TP + TN}{N}, \qquad N = TP + TN + FP + FN$$
Precision (P): The proportion of positive (attack) predictions that are correct:
$$P = \frac{TP}{TP + FP}$$
Recall (R): The proportion of actual attack samples that the model is able to correctly identify:
$$R = \frac{TP}{TP + FN}$$
F1-Score (F1): The harmonic mean of precision and recall:
$$F1 = \frac{2 \cdot P \cdot R}{P + R}$$
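The four metrics above follow directly from the TP, TN, FP and FN counts. The sketch below computes them using the standard definitions with attack as the positive class; the function name `detection_metrics` and the example counts are illustrative assumptions, not results from this study.

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only (not taken from the paper's experiments).
metrics = detection_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For the multiclass tasks, the same computation would be applied per class (treating that class as positive) and then averaged; the averaging scheme is not stated in this section.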
To determine the feasibility of integrating the proposed model into resource-constrained devices, two further metrics, (i) model weight size and (ii) detection speed, were calculated and compared with existing works. In the following sections, we demonstrate the results obtained from our experiments to determine our models’ performances in binary, attack family and individual attack detection.
4.3. Binary Attack Detection
First, we experimented with binary attack detection using our proposed models and compared the results with the existing literature. Our RobustCBL and CompactCBL models demonstrate similar performance, attaining accuracies of 99.96% and 99.92%, respectively. Notably, both models achieved almost perfect scores (1.0) on the additional evaluation metrics, namely Precision, Recall and F1-score. However, because the two classes have markedly different features and are relatively straightforward to separate, it is typical for models to achieve near-perfect accuracy, as evidenced in Table 3, where we compare our results with the existing literature. We observe that our models perform on par with or even better than the other models. Further evaluation is required to assess the models’ potential for attack family and individual attack detection.
In alignment with the studies in [18,31], our models were evaluated with an 80–20% division of training and test data, as mentioned above. However, the LSTM model in [27] employed a 70–30% split. To ensure a fair comparison with the LSTM model, we conducted an additional experiment with the same split. With this split, RobustCBL and CompactCBL attain accuracies of 0.9994 and 0.9985, respectively, compared with 0.9943 obtained by the LSTM model in [27].
4.4. Attack Family Detection
Although binary detection results are promising, merely detecting the existence of an attack is not enough. To effectively prevent attacks and secure a network through robust security policies, it is crucial to know the attack types and their nature. In this section, we built models to identify three broad attack families and a benign category, i.e., four classes in total.
Table 4 presents the results of our family detection performance compared to the existing literature.
The two proposed models attain 84.56% (RobustCBL) and 84.22% (CompactCBL) detection accuracy and outperform existing works. RobustCBL achieved slightly higher accuracy due to its larger number of parameters, whereas CompactCBL performed remarkably well despite being nearly 40% smaller than RobustCBL. Both models outperformed the Dilated CNN (DCNN) model of Mezina et al. [31], by 1.03% and 0.7%, respectively. This is the sole other work to evaluate multiclass classification on this dataset, and we use the results reported by its authors. They also evaluated the performance of a Decision Tree, but with less successful outcomes.
Considering other metrics, our models attain 0.85 and 0.84 precision, 0.85 and 0.84 recall and an F1-score of 0.84 in both cases, against 0.76, 0.75 and 0.75, respectively, for DCNN, a significant improvement over existing works. These metrics hold particular relevance in the context of malware detection. The detailed classwise performance comparison is presented in Table 5. Results from both models reveal that the classifiers performed remarkably well in identifying benign samples, whereas obfuscated attack samples posed a greater challenge. Nevertheless, our models perform better than existing works in this regard in most cases. This gap can be attributed to the absence of obfuscation in benign samples, which makes them comparatively easier to detect than obfuscated attack samples. Additionally, ransomware detection is more challenging than detection of the other attack families. This can be attributed to the fact that most Ransomware attacks in the dataset originated quite recently and use advanced obfuscation technologies to hide any characteristics that would help categorize them as malware.
Size constraints represent another key factor in assessing the tradeoff between size and performance. Thus, we compared our models’ weight size to that of DCNN. As no model size for DCNN was reported in [31], we reconstructed an approximate model based on the authors’ description, which yielded a size close to 6 MB, rendering it unsuitable for most sensors and IoT end devices. As alluded to in Section 3, resource-constrained devices, such as IoT devices for real-time or quasi-real-time applications in smart cities where response times are within 1 ms, require ML models of at most 1 MB in size. Our architecture, particularly the CompactCBL model, therefore attains greater performance than DCNN [31] while being significantly smaller, at only 577 kB, or 1/20th the size of DCNN.
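To make the size budget concrete, the following sketch estimates the weight footprint of a two-layer CNN followed by a two-layer Bi-LSTM from standard parameter-count formulas. The layer widths, kernel size and 32-bit weights are assumptions chosen for illustration; they are not the values in Table 1, and saved weight files typically carry some format overhead on top of the raw figure.

```python
def conv1d_params(in_channels: int, filters: int, kernel_size: int) -> int:
    # Each filter has kernel_size * in_channels weights plus one bias.
    return (kernel_size * in_channels + 1) * filters


def lstm_params(input_dim: int, units: int) -> int:
    # Four gates, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def bilstm_params(input_dim: int, units: int) -> int:
    # A bidirectional layer holds one forward and one backward LSTM.
    return 2 * lstm_params(input_dim, units)


def dense_params(input_dim: int, outputs: int) -> int:
    return (input_dim + 1) * outputs


def size_kb(n_params: int, bytes_per_param: int = 4) -> float:
    return n_params * bytes_per_param / 1024


# Hypothetical layer widths (assumed for this example only).
total = (
    conv1d_params(in_channels=1, filters=32, kernel_size=3)
    + conv1d_params(in_channels=32, filters=64, kernel_size=3)
    + bilstm_params(input_dim=64, units=64)
    + bilstm_params(input_dim=128, units=32)   # 128 = 2 * 64 from the first Bi-LSTM
    + dense_params(input_dim=64, outputs=16)   # 16 classes: 15 attack types + benign
)
print(f"{total} parameters, about {size_kb(total):.1f} KB as float32")

# Relative size of CompactCBL (577 kB) versus RobustCBL (970 KB), as reported in this paper.
relative_reduction = 1 - 577 / 970
print(f"CompactCBL is about {relative_reduction:.1%} smaller")
```

In this configuration the recurrent layers dominate the parameter count, so the Bi-LSTM width is the most effective knob when trading accuracy against the 1 MB limit.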
4.5. Attack Type Detection
It is important to note that individual attacks in an attack family can possess unique characteristics and different obfuscation techniques that require customized detection and mitigation strategies. By being able to distinguish among different variants, the effectiveness of security measures can be enhanced. For instance, the Trojan family consists of Zeus, which originated in 2007, and Scar attacks, the latest version of which appeared in 2019 [41]. As such, we can assume there would be significant differences between the mechanisms of Zeus and Scar, as well as the associated patterns or features.
Therefore, evaluating our model’s performance in detecting all 16 individual classes separately would provide insights into its ability to detect individual attack types.
To the best of our knowledge, our study is the first to attempt to identify all 15 separate attack types in the dataset. We show the results of RobustCBL and CompactCBL in Table 6, where they attain detection accuracies of 72.6% and 71.42%, respectively. As no works on attack type detection on the Malmem-2022 dataset have been reported in the literature, these results could not be compared.
As we can see, RobustCBL performs better than CompactCBL in all four metrics. An analysis of the models’ predicted classes shows that the benign class exhibits very high accuracy, as it contains features that are not obfuscated and are hence easy to distinguish from attack class features. As expected, with this many classes, identifying each class with high accuracy is difficult, because extracting features with sufficiently distinguishable characteristics across a large number of classes is challenging. Our observations show that some of the ransomware attacks (Ako, Shade, Pysa, Maze or Conti) are misclassified as one of the Trojans, specifically the Zeus attack. Within the Spyware family, we observe that some of the 180Solutions and CWS samples are misclassified as Transponder attacks. In general, detecting Ransomware attacks was more difficult than detecting Spyware or Trojan attacks.
4.6. Detection Speed
Security measures placed in a network should be able to take prompt and decisive action in response to incoming attacks. This is even more critical in IoT systems, where real-time detection is essential. As alluded to before, sensors and end devices in IoT systems should be able to detect signals within a time frame of less than 1 ms. This motivated us to compare the speed of our proposed models with existing works [18,31], as shown in Table 7. The results demonstrate CompactCBL’s superior performance: it classifies each sample in only 0.255 ms, aided by its compact model size, without significant compromise in accuracy, as previously detailed. RobustCBL classifies each sample in 0.384 ms, indicating that both models are suitable for IoT devices. While DCNN’s detection speed is within the acceptable time limit, its model size makes it unsuitable for many IoT devices. The stacked ensemble-based model, on the other hand, is too slow for real-time attack detection.
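Per-sample detection time of the kind reported above can be measured by timing batched predictions and dividing by the number of samples. The sketch below shows the idea with a stand-in classifier; `per_sample_latency_ms` and `dummy_predict` are hypothetical names, and the measurement procedure actually used for Table 7 is not described in this section.

```python
import time


def per_sample_latency_ms(predict, batches) -> float:
    """Average wall-clock prediction time per sample, in milliseconds."""
    n_samples = 0
    start = time.perf_counter()
    for batch in batches:
        predict(batch)
        n_samples += len(batch)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / n_samples


# Stand-in classifier (hypothetical): thresholds the mean of each feature vector.
def dummy_predict(batch):
    return [int(sum(x) / len(x) > 0.5) for x in batch]


# Ten batches of 64 samples with 56 features each, mirroring the batch size
# and feature count stated in this paper; the values themselves are synthetic.
batches = [[[0.1 * (i % 10)] * 56 for i in range(64)] for _ in range(10)]
latency = per_sample_latency_ms(dummy_predict, batches)
print(f"{latency:.4f} ms/sample; within 1 ms budget: {latency < 1.0}")
```

In practice a trained model's predict call would replace `dummy_predict`, and a few warm-up batches would usually be run before timing so that one-off initialization costs do not inflate the average.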
4.7. Applicability of the Proposed Models
Our CNN-BiLSTM-based models, CompactCBL and RobustCBL, embody the necessary properties for integration into IoT-fuelled smart city applications. As urban spaces increasingly adopt sensor-driven systems, applications such as guided parking in heavily populated areas [42], violence detection through video surveillance and real-time monitoring of communication infrastructures (such as railroads and highways) are key areas that use sensors to monitor and relay information to the processing layer. IoT sensors used in these applications are vulnerable to cyber threats, including obfuscated memory malware attacks. In the event of an attack, the compromised sensors can affect the operational conditions of a smart city, including traffic conditions, the urban environment and law enforcement. Likewise, monitoring systems that assess water, air and noise quality, which are foundational for urban sustainability, demand efficient memory malware detection frameworks. As mentioned in Section 1, OMMs are harder to detect than regular malware, and as such, our models aim to secure the constrained devices used in these applications, bolstering their resilience against complex cyber threats.
Validated on the most recent OMM dataset, CIC-Malmem-2022, our models outperform relevant existing models, achieving high accuracy for binary and attack family (4-class) identification, and good accuracy for individual attack identification (16-class). Most notably, CompactCBL, despite being the smallest, outperforms every other algorithm except RobustCBL. It achieves 99.92%, 84.22% and 71.42% accuracy for binary, family and individual attack detection, respectively, while RobustCBL attains 99.98%, 84.56% and 72.60%. Although its larger size helps RobustCBL achieve higher accuracy, its size of around 970 KB is still easily deployable on constrained devices. The small size of CompactCBL makes it faster, with a detection speed of 0.255 ms/sample, making it more suitable for implementation in the sectors mentioned above. As such, the two models introduced by this study can be used to defend a wide range of IoT and small-scale embedded devices used in smart cities, where real-time response is required.
5. Conclusions and Future Works
Sophisticated evasion mechanisms have made OMMs harder to detect than other types of malware, and as such their usage for malicious purposes has skyrocketed in recent years. Though large DL models with very large numbers of parameters exist to counter these attacks, they cannot be used in small-scale systems such as IoT networks in smart cities and other applications. In this paper, we present a robust system for OMM detection in resource-constrained devices. We utilize CNNs for their ability to extract obscure features from malware memory, and Bidirectional LSTMs for longer and context-aware pattern analysis. To evaluate the effectiveness of our method, we built two models, (i) CompactCBL and (ii) RobustCBL, and evaluated their performance using the recent OMM dataset. Our models outperformed existing models in terms of widely used detection performance metrics and the time required for detection. Additionally, this study provides insights into the enhanced deception ability of obfuscated ransomware compared to other attacks.
While our proposed models demonstrate an advancement in obfuscated malware detection, there is scope for further improvement. Specifically, enhancing accuracy in detecting granular multiclass (individual) attack types remains a challenge. Thus, our future efforts will focus on addressing this limitation. This will involve devising innovative architectures to enhance the identification of individual attack types, as well as unknown or zero-day attacks, for which we will investigate integrating semi-supervised/unsupervised learning models into our framework while keeping the model size small enough to deploy in sensors. Furthermore, future studies will also target evaluating the models on a real-world IoT-based smart city application (Guided Parking) under various obfuscated malware attacks.