International Journal of Research and Innovation in Social Science


Comparison of the Use of Support Vector Machine (SVM) & Random Forest Algorithms (RF) for DDOS Attack Detection

  • Ho Zi Rui
  • Tan Ying Chien
  • Loo Xin Ee
  • Cheong Wei San
  • Law Teng Yi
  • 1126-1138
  • Feb 4, 2025
  • Cybersecurity


New Era University College

DOI: https://dx.doi.org/10.47772/IJRISS.2025.9010094

Received: 30 December 2024; Accepted: 02 January 2025; Published: 04 February 2025

ABSTRACT

Distributed Denial-of-Service (DDoS) attacks are among the major challenges to network security today, disrupting services and causing large losses. This study assesses the performance of the Support Vector Machine (SVM) and Random Forest (RF) algorithms for DDoS detection on the DDoS-SDN dataset. The metrics considered for performance evaluation are accuracy, precision, recall, and F1-score. The results indicate that RF outperforms SVM on complex, high-dimensional datasets such as DDoS-SDN, using its ensemble learning approach to attain greater robustness and accuracy. The research also explores the role of feature selection techniques, such as Genetic Algorithm (GA) and Recursive Feature Elimination (RFE), in enhancing model efficiency and accuracy. The paper discusses the strengths and limitations of both algorithms to provide insight into optimizing machine learning models for efficient DDoS detection in secure and resilient network systems.

INTRODUCTION

In today’s interconnected world, the internet plays a pivotal role in daily life. Cybersecurity has become crucial because many economic, cultural, political, and educational activities are now conducted in cyberspace [1]. Detecting cyber-attacks against vulnerable systems is essential for protecting important data and for providing authentication and access control for resources. The CIA triad (Confidentiality, Integrity and Availability) should therefore be integrated into cybersecurity. Confidentiality restricts access to certain resources so that confidential data is kept away from illegitimate or unauthorized users. Integrity maintains the accuracy and validity of data by ensuring that it is protected against unauthorized modification, deletion, and alteration. Availability ensures that resources are accessible whenever they are needed [2].

As technology advances, cyber-attacks are becoming increasingly difficult to prevent and detect, which poses significant risks of data leakage, sabotage and even international conflict. According to Richard Clark, cyber-attacks can be considered as actions by a country infiltrating other countries’ computers or computer networks. Distributed Denial-of-Service (DDoS) attacks have emerged as a significant cybersecurity challenge [1]. A DDoS attack uses numerous computers to send excessive requests to the target, overwhelming its resources and making them unavailable. DDoS attacks are hard to detect because they do not follow the patterns of traditional attack methods. However, they can be detected with machine learning, which can identify complex patterns in network traffic [3].

Machine learning models such as SVM and RF show great promise in identifying patterns in network traffic and classifying malicious activity such as DDoS attacks. These models can be trained to detect abnormal behaviour and thereby differentiate normal traffic from attack traffic. In addition, various detection models have been optimized using different search algorithms, which allowed for better feature selection and resulted in high performance in terms of accuracy and computational efficiency [2].

This research compares Support Vector Machines (SVM) and Random Forests (RF) to investigate which algorithm performs better at DDoS attack detection. The DDOS-SDN-DATASET, with the implementation of a linear search algorithm, is used, and the algorithms are compared on accuracy, precision, recall, and F1-score. Testing the algorithms on data with diverse features, including different complexities and distributions, helps show where each algorithm is more effective. The research also investigates how techniques such as feature selection can improve performance by focusing the models on the most important features for detecting DDoS attacks.

The goal of this research is to improve machine learning-based solutions for cybersecurity. By understanding the strengths and weaknesses of SVM and Random Forest, this study hopes to guide the development of better tools for detecting and preventing DDoS attacks. Ultimately, the findings will provide practical insights to help make networks more secure and resilient against today’s evolving cyber threats.

OBJECTIVES

This study aims to evaluate and compare the performance of Support Vector Machine (SVM) and Random Forest (RF) algorithms in detecting Distributed Denial of Service (DDoS) attacks. The comparison focuses on key performance metrics such as precision, recall, and the F1-score. Together, these metrics show how effectively the models identify attacks, minimize false positives, and improve detection accuracy [6].

Feature selection techniques such as Genetic Algorithm (GA) and Recursive Feature Elimination (RFE) will also be incorporated. These methods help improve classification accuracy and reduce model computational complexity. Previous research has highlighted their effectiveness in selecting optimal features from high-dimensional datasets, which is essential for accurate and efficient DDoS detection [10].

Another goal is to identify the strengths and weaknesses of SVM and RF on different types of datasets. RF performs well with large, high-dimensional data and reduces overfitting due to its ensemble structure [8]. On the other hand, SVM works best with datasets that have clear decision boundaries. However, it struggles with larger datasets and high-dimensional features [6]. This study aims to provide insights into their practical use in real-world scenarios. The findings will contribute to developing more efficient machine learning models for cybersecurity [10].

PROBLEM STATEMENT

Distributed Denial of Service (DDoS) attacks are one of the most significant threats to modern network infrastructure. These attacks overwhelm resources, disrupt services, and cause serious financial and reputational losses [4][7]. Machine learning techniques have improved detection, but many challenges remain.

One key issue is the quality of datasets. Many publicly available datasets are highly imbalanced. They contain many normal instances but fewer attack samples. These datasets also fail to capture modern and advanced attack patterns. As a result, machine learning models trained on them may not perform well in real-world situations [6][10]. High-dimensional datasets further complicate the process. Feature selection is necessary to reduce complexity, improve accuracy, and prevent overfitting [3][9].

The algorithms themselves have limitations. SVM is effective for small datasets with clear decision boundaries but struggles with scalability and with datasets that have many features. Without dimensionality reduction techniques, its performance and computational efficiency suffer significantly [3][6]. RF, on the other hand, excels with large datasets and handles imbalanced data effectively due to its ensemble structure. However, the algorithm can become computationally expensive when it uses a large number of trees or when extensive parameter tuning is required [10]. Inconsistent use of evaluation metrics across studies also makes it hard to compare results [7].

These challenges highlight the need for improved approaches to DDoS detection. This study will analyze SVM and RF in detail. It will also explore the role of feature selection techniques such as Genetic Algorithm (GA) and Recursive Feature Elimination (RFE). By addressing these issues, the research aims to develop better tools for network security [6][7].

LITERATURE REVIEW

Distributed Denial-of-Service (DDoS)

Distributed Denial-of-Service (DDoS) attacks have been studied and discussed extensively. Many tools can be used to perform DDoS attacks, for instance Tor’s Hammer, Trinoo, and the GoldenEye HTTP DoS tool [2]. DDoS is also a significant threat to Internet of Things (IoT) networks. These attacks are typically generated through bots and malware that exploit IoT devices to create numerous false requests. With reference to the basic layered architecture of IoT networks, DDoS attacks fall into three types: Application Layer Attacks and two kinds of Infrastructure Layer Attacks, protocol-based and volume-based [8].

Application Layer Attacks are attacks that attempt to invade the application layer of the IoT network infrastructure. These attacks are characterized by a large number of HTTP (Get/Post) requests to applications or web servers and target system software such as Windows, Apache and OpenBSD. They typically run at low traffic rates and appear legitimate, but trigger resource-consuming back-end processes. Application layer attacks are measured in requests per second (Rps). The common examples include HTTP flood and DNS service-based attacks [8].

Infrastructure Layer Attacks exploit vulnerabilities in the transport or network layer. They can be further divided into two types: Protocol-based Attacks and Volume-based Attacks. Protocol-based Attacks, also known as Resource Depletion attacks, target server resources and communication equipment and are measured in packets per second (Pps). Examples include Ping of Death, SYN floods, and Smurf DDoS. Volume-based Attacks are also known as Bandwidth Depletion attacks because they saturate the bandwidth of the target system with excessive traffic, measured in bits per second (Bps). These attacks use amplification and reflection techniques and account for approximately 65% of all attacks. Common examples of volume-based attacks include UDP/TCP floods and ICMP floods [8].

DDoS is a more structured variant of DoS that involves multiple sources, which amplifies the attack’s impact. In response, Intrusion Prevention Systems (IPS) and Intrusion Detection Systems (IDS) have become critical for network security. An IPS detects network threats by monitoring packets within the network firewall and analyzing their anomalous behavior in order to prevent attacks immediately [10]. An IDS, conversely, concentrates on detecting attacks by assessing whether traffic is secure, suspicious, or malicious. The K-Nearest Neighbour (KNN) algorithm plays a significant role in classifying attacks within IDS by categorizing network traffic using historical data. The algorithm uses labeled training data to identify patterns and classifies new instances of traffic as normal or as attacks, such as Denial of Service (DoS) and probing attempts, by evaluating how close the new data points are to the existing labeled data. This allows the IDS to learn gradually over time and thus increase its accuracy in detecting anomalies in network behavior [4]. IDS has also been integrated with deep learning methods such as the VGG-19 Convolutional Neural Network (CNN) in order to classify traffic and detect DoS attacks [10]. Deep learning (DL) methods are increasingly used in intrusion detection. For example, Tang et al. achieved 75.75% accuracy using finite traffic features, Yin et al. achieved 83.28% accuracy using recurrent neural networks on the NSL-KDD dataset, and Fu et al. achieved 97.52% accuracy using long short-term memory (LSTM-RNN) [11].

In addition, much research has focused on DDoS detection methods within Software-Defined Networking (SDN). SDN represents a new paradigm in designing, managing, and implementing networks to address complex needs. Its primary concept is the separation of the control function from the forwarding function of the data plane. Unlike traditional networks, which combine the control plane and data plane on a single device, SDN centralizes the control plane to make network management faster and more efficient [10]. Early approaches in SDN include the method of Braga et al., which uses the Self-Organising Map (SOM) model to detect attacks based on six traffic characteristics and achieves high detection accuracy. Nam et al. combined SOM with K-Nearest Neighbours (KNN) to maintain 98.24% accuracy while reducing computational overhead [11].

Random Forest (RF)

A random forest is a set of decision trees that are each applied to the elements of the dataset. It reaches a final decision by building multiple decision trees and then combining their predictions: classification is performed by voting over, or averaging, the outputs of the individual trees [6]. Because it combines predictions from multiple models, Random Forest tends to perform more consistently on complex data and is less susceptible to individual incorrect predictions.

Random Forest uses an ensemble learning approach, meaning that it aggregates the predictions of multiple decision trees to produce results with higher accuracy. Each decision tree in RF is trained on a different subset of the data, generated by a technique called bagging, or bootstrap aggregating. RF is particularly strong on high-dimensional data, such as the network traffic data used for DDoS detection. High-dimensional datasets include many features or variables and may therefore cause a single decision tree model to overfit. The ensemble nature of RF, together with the random selection of features at each split, reduces this risk considerably. Because only a randomly selected portion of the features is considered at each split, the likelihood of a single feature dominating the model is reduced, which further lowers the risk of overfitting. As the number of trees in the RF increases, the errors of the individual trees are averaged out and the performance of the model improves. The overall error rate decreases until an optimal point is reached [5]. Beyond that point, adding more trees will not significantly improve performance, but will provide a more stable and robust model.
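The bagging and voting steps described above can be sketched by hand with scikit-learn decision trees. This is a minimal illustration, not the implementation used in this study: the helper names `fit_bagged_trees` and `predict_majority`, the tree count, and the synthetic data from `make_classification` are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Train n_trees decision trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw len(X) row indices with replacement.
        idx = rng.integers(0, len(X), size=len(X))
        # max_features="sqrt" makes every split consider a random feature subset.
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1_000_000))
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_majority(trees, X):
    """Combine the trees by majority vote over binary labels 0/1."""
    votes = np.stack([tree.predict(X) for tree in trees])  # (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)


# Synthetic stand-in for flow features (assumption; not the DDoS-SDN data).
X, y = make_classification(n_samples=600, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

trees = fit_bagged_trees(X_train, y_train)
single_acc = (trees[0].predict(X_test) == y_test).mean()
forest_acc = (predict_majority(trees, X_test) == y_test).mean()
print(f"single tree: {single_acc:.3f}  bagged vote: {forest_acc:.3f}")
```

An odd number of trees is used so that a binary vote cannot tie.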

One advantage of the Random Forest algorithm is its flexibility: it can handle both discrete and continuous data types. Even if there is noise in the data, the predictions are usually not too far off. The algorithm is particularly robust because it distributes the data among many decision nodes, which greatly improves accuracy and ensures low correlation between the trees [9]. Low correlation matters because of how decision errors combine. Random Forest is intended to improve performance by reducing the combined error from bias and variance. If the decision trees are highly correlated, they will tend to make the same wrong decisions collectively and diversity is diminished, which in turn reduces overall model performance. In addition, random forests are effective in avoiding overfitting and improving the generalization of the model.

Random forests also have limitations. Compared to a single decision tree, training and prediction with a random forest can be relatively slow, because many trees must be built and evaluated. This adds to the computational complexity and can make the process time-consuming. In addition, since the random forest model is made up of multiple decision trees, the final model is more complex [5], which makes it more difficult to understand and interpret.

Support Vector Machine (SVM)

The Support Vector Machine (SVM) is frequently used in DDoS detection because of its capacity to create an optimum hyperplane that efficiently separates normal and malicious traffic, making it ideal for high-dimensional data situations such as network traffic analysis [3].

Researchers used dimensionality reduction and optimization strategies to increase the efficiency of SVMs. A significant technique combines Kernel Principal Component Analysis (KPCA) with the Genetic Algorithm (GA), which decreases computing burden by projecting data into a lower-dimensional space while improving accuracy through parameter optimization. This approach is very useful for real-time detection in Software-Defined Networking (SDN) [11].

Additionally, there is the N-RBF kernel, a modified version of the Radial Basis Function that is intended to save training time by normalizing attribute values across different communication protocols. This change decreases the number of support vectors, allowing for quicker model convergence and making it appropriate for large-scale DDoS detection.

An improved SVM model may be incorporated directly into SDN controllers to provide continuous traffic monitoring and classification. This design allows for real-time responses to DDoS attacks, such as blocking certain flows as soon as they are detected. Comparative studies reveal that KPCA-GA-enhanced SVM models outperform regular SVM and other classifiers such as Random Forest, obtaining up to 98.91% accuracy and handling unbalanced datasets more effectively [13].

Comparison of SVM and RF in DDoS Detection

In comparing the effectiveness of Support Vector Machine (SVM) and Random Forest (RF) in DDoS detection, studies reveal high success rates for both algorithms, though SVM demonstrates a slight advantage. Specifically, SVM achieves a success rate of 99.7%, outperforming RF’s success rate of 98.4% [13]. This suggests that SVM’s margin-based classification approach may be more effective at correctly identifying attack traffic, particularly in distinguishing between subtle variations within complex datasets. However, despite SVM’s higher success rate, RF’s close performance indicates its strong reliability as a detection mechanism in DDoS environments.

Other results point the other way. While both SVM and Random Forest (RF) offer effective DDoS detection capabilities, RF consistently outperforms SVM in terms of accuracy (AR) and lower false positive rate (FR) across TCP, UDP, and ICMP protocols. RF’s more stable detection rate (DR) and reduced variability across sampling intervals make it a stronger choice for high-traffic and mixed-protocol environments, where reliability and precision are critical. Though SVM occasionally demonstrates competitive detection rates, particularly in UDP detection, RF’s overall consistency and robustness in minimizing false positives position it as a more dependable solution for comprehensive DDoS detection in dynamic network conditions [3].

RESEARCH METHODOLOGY

Dataset

The DDoS SDN dataset was obtained from Kaggle (2021) and contains 104,345 rows and 23 columns, including 3 categorical features and 20 numerical features. Its target variable categorizes traffic as malicious (1) or benign (0). The dataset describes network traffic through metrics such as packet count, byte count, duration and protocol details, providing realistic conditions for testing machine learning models. It is designed to reflect a real-world SDN (software-defined networking) environment and includes a variety of DDoS attack types, making it well suited for evaluating the effectiveness of detection algorithms. By covering both normal and malicious traffic, the DDoS SDN dataset allows for comprehensive analysis and evaluation of machine learning techniques for DDoS detection [15].

Classification Techniques

The Support Vector Machine (SVM) algorithm is implemented using scikit-learn’s svm.SVC and employs various kernel functions to explore their impact on classification performance. The kernels evaluated include linear, polynomial (poly), radial basis function (rbf), and sigmoid, each offering distinct mathematical transformations to improve the separability of data in the feature space.

For each kernel, the SVM model is trained on the dataset (self.X_train, self.y_train) and predictions are generated for the test set (self.X_test). The accuracy of each kernel is computed using accuracy_score and stored along with the kernel type in a result list. The kernel achieving the highest accuracy is identified, and the SVM model is retrained using this optimal kernel to ensure the best performance.

The final model’s classification metrics, such as precision, recall, and F1-score, are computed and displayed using classification_report. This iterative kernel evaluation approach ensures that the most suitable transformation is chosen for the dataset. The overall computational efficiency is captured by calculating the execution time, emphasizing the versatility and adaptability of the SVM algorithm in detecting DDoS attacks.
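The kernel evaluation loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's code: the synthetic data from `make_classification` stands in for the DDoS-SDN features, the `StandardScaler` step is an added assumption, and all variable names are illustrative.

```python
import time

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the dataset (assumption; not the DDoS-SDN data).
X, y = make_classification(n_samples=800, n_features=20, n_informative=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Feature scaling is an assumption here; SVMs are sensitive to feature ranges.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

start = time.perf_counter()

# Train one SVC per kernel and record (kernel, accuracy) pairs.
results = []
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = SVC(kernel=kernel).fit(X_train, y_train)
    results.append((kernel, accuracy_score(y_test, model.predict(X_test))))

# Retrain with the best-scoring kernel and report per-class metrics.
best_kernel, best_acc = max(results, key=lambda pair: pair[1])
final_model = SVC(kernel=best_kernel).fit(X_train, y_train)
report = classification_report(y_test, final_model.predict(X_test))

elapsed = time.perf_counter() - start
print(f"best kernel: {best_kernel} (accuracy {best_acc:.4f}), {elapsed:.2f}s")
print(report)
```

Choosing the kernel by test-set accuracy, as in this loop, can make the final score slightly optimistic; a separate validation split or cross-validation is one way to avoid that.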

The Random Forest (RF) algorithm is implemented in this study using the RandomForestClassifier from scikit-learn, configured with several important hyperparameters. The criterion for splitting nodes is set to ‘gini’ to measure the quality of a split, and the model uses 500 estimators to ensure robust predictions. The min_samples_split parameter is set to 10, which requires at least ten samples to split an internal node and helps prevent overfitting. Additionally, the max_features parameter is set to ‘auto’, which for RandomForestClassifier means that the square root of the number of features is considered when looking for the best split (newer scikit-learn releases spell this option ‘sqrt’). The model also uses out-of-bag (OOB) samples to compute the OOB score, which provides an estimate of model accuracy without a separate validation set.

The training process begins by fitting the RF classifier to the training dataset (self.X_train, self.y_train). Once trained, the model predicts labels for the test dataset (self.X_test). Accuracy is calculated using accuracy_score, and detailed per-class results are generated through classification_report. The algorithm is optimized for efficiency by setting n_jobs=-1, enabling parallel computation across all CPU cores. Finally, the total execution time is calculated and displayed to highlight the computational efficiency of the RF model. This implementation demonstrates how RF can classify the data and how its performance in detecting DDoS attacks is evaluated.
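A configuration along the lines described above might look like the sketch below. It is an illustration under stated assumptions rather than the study's code: the data is synthetic, the variable names are illustrative, and `max_features="sqrt"` is used in place of the 'auto' value named in the text, because recent scikit-learn releases no longer accept 'auto' (for classifiers it meant the same as 'sqrt').

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset (assumption; not the DDoS-SDN data).
X, y = make_classification(n_samples=800, n_features=20, n_informative=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

start = time.perf_counter()

rf = RandomForestClassifier(
    n_estimators=500,       # number of trees, as stated in the text
    criterion="gini",       # split quality measure
    min_samples_split=10,   # need at least 10 samples to split a node
    max_features="sqrt",    # assumption: modern spelling of the old "auto"
    oob_score=True,         # estimate accuracy from out-of-bag samples
    n_jobs=-1,              # use all CPU cores
    random_state=42,
)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

elapsed = time.perf_counter() - start
test_accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {test_accuracy:.4f}  OOB score: {rf.oob_score_:.4f}  ({elapsed:.2f}s)")
print(classification_report(y_test, y_pred))
```

The OOB score is computed only from training rows that a given tree did not see in its bootstrap sample, so it can be compared with the test accuracy as a rough consistency check.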

Performance Metrics (Accuracy, Precision, Recall and F1-Score)

Accuracy – Accuracy measures the proportion of correctly classified samples out of the total number of samples. It is a general indicator of the model’s correctness. However, it can be misleading in imbalanced datasets, as a model can achieve high accuracy by simply predicting the majority class.

Precision – Precision measures the proportion of true positives out of all samples predicted as positive. It focuses on minimizing false positives.

Recall – Recall measures the proportion of actual positive samples that the model correctly identifies. It focuses on minimizing false negatives.

F1 Score – The F1-Score is the harmonic mean of precision and recall. It combines both precision and recall into a single metric, balancing the trade-off between them. F1-Score is particularly useful in situations where the dataset is imbalanced, as it ensures that both false positives and false negatives are considered in the evaluation. A high F1-Score indicates a good balance between precision and recall.
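The four definitions above can be written directly in terms of confusion-matrix counts. The sketch below is illustrative only: the function name `binary_metrics` and the toy counts are assumptions, chosen to show how accuracy can look high on imbalanced data while recall is zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no positive predictions or no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced test set: 950 benign flows, 50 attack flows.
# A model that labels everything benign still reaches 95% accuracy.
always_benign = binary_metrics(tp=0, fp=0, fn=50, tn=950)

# A model that catches 40 of the 50 attacks with 10 false alarms.
detector = binary_metrics(tp=40, fp=10, fn=10, tn=940)

print(always_benign)
print(detector)
```

Here the first model has accuracy 0.95 but recall and F1 of 0, which is why the later sections report precision, recall and F1 alongside accuracy.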

DDOS-SDN-Dataset

In this section, we evaluate the performance of the binary classification models on the DDOS-SDN dataset, which contains two classes: Class 0 and Class 1. The classification report for each model provides per-class evaluation metrics, namely Precision, Recall, F1-Score, and Support, where Support is the actual number of samples in each class. Our goal is to calculate the overall performance metrics of each model from these class-specific values.

Random Forest :

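| Metric | Class 0 | Class 1 | Overall |
|-----------|---------|---------|---------|
| Precision | 0.99 | 1.00 | |
| Recall | 1.00 | 0.99 | |
| F1 Score | 1.00 | 0.99 | |
| Support | 18922 | 12230 | 31152 |
| Accuracy | | | 0.99 |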

Accuracy = 0.99

Precision (Overall) :

\( \text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}} \)

\( \text{Precision} = \frac{31029.7}{31029.7 + 191.13} \)

Precision = 0.99

Recall (Overall) :

\( \text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}} \)

\( \text{Recall} = \frac{\text{True Positives (TP)}}{\text{Support}} \)

\( \text{Recall} = \frac{31029.7}{31152} \)

Recall = 1.00

F1 Score (Overall) :

\( \text{F1 Score} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \)

\( \text{F1 Score} = \frac{2 \times (0.99 \times 1.00)}{0.99 + 1.00} \)

F1 Score = 0.99

Support Vector Machine :

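| Metric | Class 0 | Class 1 | Overall |
|-----------|---------|---------|---------|
| Precision | 0.90 | 0.95 | |
| Recall | 0.96 | 0.86 | |
| F1 Score | 0.93 | 0.90 | |
| Support | 17757 | 13395 | 31152 |
| Accuracy | | | 0.92 |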

Accuracy = 0.92

Precision (Overall) :

\( \text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}} \)

\( \text{Precision} = \frac{28566.42}{28566.42 + 2500.38} \)

Precision = 0.92

Recall (Overall) :

\( \text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}} \)

\( \text{Recall} = \frac{\text{True Positives (TP)}}{\text{Support}} \)

\( \text{Recall} = \frac{28566.42}{31152} \)

Recall = 0.92

F1 Score (Overall) :

\( \text{F1 Score} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \)

\( \text{F1 Score} = \frac{2 \times (0.92 \times 0.92)}{0.92 + 0.92} \)

F1 Score = 0.92
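The aggregate counts used in the overall calculations appear to be consistent with the following arithmetic: each class's correctly classified samples are recovered as recall × support, and each class's false positives as TP / precision − TP, and both are then summed over the two classes. The sketch below reproduces the figures under that assumption; the function name `aggregate_from_report` is illustrative, and the inputs are the rounded per-class values from the tables above.

```python
def aggregate_from_report(per_class):
    """Aggregate overall precision/recall/F1 from per-class report rows.

    per_class: list of (precision, recall, support) tuples, one per class.
    Assumes TP = recall * support and FP = TP / precision - TP for each class.
    """
    tp = sum(recall * support for _, recall, support in per_class)
    fp = sum(
        recall * support / precision - recall * support
        for precision, recall, support in per_class
    )
    support = sum(s for _, _, s in per_class)
    precision = tp / (tp + fp)
    recall = tp / support
    f1 = 2 * precision * recall / (precision + recall)
    return {"tp": tp, "fp": fp, "precision": precision, "recall": recall, "f1": f1}


# (precision, recall, support) for Class 0 and Class 1, taken from the tables.
rf_overall = aggregate_from_report([(0.99, 1.00, 18922), (1.00, 0.99, 12230)])
svm_overall = aggregate_from_report([(0.90, 0.96, 17757), (0.95, 0.86, 13395)])

for name, m in (("RF", rf_overall), ("SVM", svm_overall)):
    print(
        f"{name}: TP={m['tp']:.2f} FP={m['fp']:.2f} "
        f"precision={m['precision']:.2f} recall={m['recall']:.2f} f1={m['f1']:.2f}"
    )
```

Because the inputs are already rounded to two decimals, the recovered counts are fractional; they approximate the true confusion-matrix counts rather than reproduce them exactly.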

DISCUSSION

This section compares the performance of Support Vector Machines (SVM) and Random Forest (RF) in detecting DDoS attacks across four metrics (accuracy, precision, recall, and F1-score), based on results from the DDoS-SDN-Dataset. It further analyzes why one algorithm outperforms the other in light of the dataset characteristics and the strengths of each algorithm.

Comparison of Results Across Metrics

In the current study, RF achieved near-perfect accuracy (reported as 0.99 in the results above) and significantly surpassed SVM, whose best-performing kernel, the radial basis function (RBF), reached an accuracy of 91.77% on the DDoS-SDN-Dataset. SVM still showed reasonably strong per-class results, with precision of 0.90 for Class 0 and 0.95 for Class 1, recall of 0.96 for Class 0 and 0.86 for Class 1, and F1-scores of 0.93 for Class 0 and 0.90 for Class 1. RF consistently demonstrated higher precision, recall, and F1-score than SVM. Specifically, RF recorded a precision of 0.99, recall of 1.00, and F1-score of 1.00 for Class 0, and a precision of 1.00, recall of 0.99, and F1-score of 0.99 for Class 1. These results indicate that RF is more reliable in detecting true positives and reducing false negatives, especially in complex datasets like DDoS-SDN-Dataset. In short, the Random Forest algorithm significantly outperformed SVM in DDoS detection on the DDoS-SDN-Dataset.

Ensemble methods like Random Forest are particularly well-suited for handling complex network intrusion detection tasks. Random Forest’s ensemble learning approach involves training multiple decision trees on different subsets of the data and aggregating their predictions. This process not only enhances accuracy but also reduces the overfitting problems that may affect single-model algorithms like SVM [18][19].

SVM’s Challenges and Contributions

This paper highlights the limitations of SVM in handling large-scale and complex data such as the DDoS-SDN-Dataset. SVM presents scalability issues because its training cost grows faster than linearly, roughly quadratically to cubically, with the number of training samples. This makes it less suitable for large, high-dimensional or noisy datasets, for which training requires substantial computational resources and memory.

SVM is also very sensitive to preprocessing choices such as feature selection, kernel tuning, and data scaling. Without this preprocessing, SVM may not produce a model that captures meaningful patterns in the data. It is also worth noting that datasets like DDoS-SDN-Dataset consist of imbalanced classes. In addition, the kernel should be selected carefully, as an improper choice may lead to either underfitting or overfitting and degrade performance [16].

Despite these limitations, SVM performs well on simpler datasets such as NSL-KDD, where the decision boundaries are clearer and the feature set is balanced. Because SVM maximizes the margin between classes, it is very effective in less complex environments. However, as dataset complexity increases, especially with nonlinear attack patterns, SVM requires more extensive preprocessing and careful hyperparameter tuning to maintain optimal performance [14][16].

Random Forest’s Strengths and Robustness

The main advantage of the Random Forest ensemble learning approach is its high adaptability to complex and noisy datasets. RF reduces the variance in order to improve the overall accuracy by training multiple trees independently and then combining their predictions using averaging. This is demonstrated by the DDoS-SDN dataset [19].

Random Forest is not heavily dependent on preprocessing steps such as feature scaling and can manage class imbalances and high-dimensional data with minimal performance loss. Its ability to process large and diverse datasets efficiently is supported by its parallelization feature, making it a strong candidate for real-time intrusion detection. Additionally, by averaging predictions across multiple decision trees, Random Forest reduces the risk of overfitting and performs reliably even when handling complex attack patterns, such as those found in DDoS-SDN datasets [19].

Based on these merits, Random Forest is appropriate for the task of DDoS detection. The ease with which it deals with noisy and complex data, imbalance, and high-dimensional data space adds to its real-world reliability. Moreover, its resistance to overfitting and parallelization capabilities make it well-equipped to manage the evolving nature of DDoS attacks, providing a balance of accuracy and efficiency for real-time detection systems.

In other research, the Random Forest classifier outperformed algorithms like SVM and Naïve Bayes in both performance and efficiency, especially for IoT data that is highly variable and contains intricate patterns. Its robustness has been documented in complex environments, and studies have highlighted its suitability for datasets like CICDDoS2019, which include real-world attack scenarios and show practical applicability. It has been noted, however, that the effectiveness of Random Forest may be influenced by the quality and representativeness of the training data, as indicated in prior evaluations. To address this, researchers have suggested incorporating intelligent mechanisms, such as parameter self-optimization, to enhance detection accuracy and scalability in dynamic IoT networks. This approach could further solidify Random Forest as a reliable framework for detecting DDoS attacks in evolving environments [19].

In this study as well, the comparative analysis of SVM and RF underlined Random Forest’s better adaptability to high-dimensional, imbalanced datasets such as DDoS-SDN. While SVM is limited by the need for extensive preprocessing and careful kernel selection, RF performs well with minimal intervention and is therefore more suitable for operational environments. This illustrates how ensemble learning techniques help in tackling the increasing complexity of modern cyber threats.

Support Vector Machine (SVM)’s Strengths and Robustness

Support Vector Machine (SVM) was selected for its ability to create optimal decision boundaries and to handle non-linear datasets using kernel methods. It is particularly effective in identifying patterns within linearly separable datasets, which makes it suitable for detecting malicious activities in network traffic [12]. The kernel-based approach, especially the Radial Basis Function (RBF) kernel, allows SVM to manage high-dimensional and non-linear data effectively, as highlighted in multiple studies. However, unlike Random Forest (RF), SVM requires careful preprocessing, such as feature scaling and selection, to achieve optimal performance [16].
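As a brief illustration of why feature scaling matters for a kernel-based model, the following sketch computes the RBF kernel, exp(-gamma * ||x - y||^2), for two flows. The feature names, the numeric values, and the gamma setting are assumptions chosen for illustration and are not taken from the DDoS-SDN dataset.

```python
from math import exp


def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * squared_distance)


# Hypothetical flows described by (packets per second, mean packet size in bytes).
flow_a = (10.0, 500.0)
flow_b = (12.0, 900.0)

# Unscaled: the byte-valued feature dominates the distance, so the kernel
# value collapses towards 0 regardless of the packet-rate feature.
print(rbf_kernel(flow_a, flow_b, gamma=0.5))

# After rescaling both features to comparable ranges (values assumed here),
# both features contribute to the similarity.
print(rbf_kernel((0.10, 0.25), (0.12, 0.45), gamma=0.5))
```

Because the kernel depends only on the distance between samples, a feature with a much larger numeric range can dominate the similarity unless the features are scaled first.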

Research indicates that SVM excels in scenarios with smaller, balanced datasets, delivering high accuracy and generalization when tuned with appropriate hyperparameters. Additionally, SVM shows lower false positive rates, especially when datasets are well-preprocessed. However, its computational overhead poses challenges for scalability. SVM struggles with large-scale datasets and noisy data due to the increased complexity of training time and hyperparameter tuning. In Software Defined Networking (SDN) environments, where real-time detection is critical, these limitations can hinder its practicality [7].

In our experiments, SVM showcased strong performance in distinguishing normal traffic, particularly when dealing with linearly separable features. The use of the Radial Basis Function (RBF) kernel significantly improved its ability to classify non-linear patterns, a result that aligns with prior research findings. However, SVM struggled with imbalanced and noisy datasets, which led to reduced recall for specific DDoS attack types. Its computational demands were notably high, especially during hyperparameter tuning for gamma and C values, which extended the training time considerably. To enhance efficiency and address feature redundancy, Recursive Feature Elimination (RFE) was applied, demonstrating modest improvements in performance and training speed [7].
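The tuning cost mentioned above can be made concrete by counting the training runs that an exhaustive grid search requires. The candidate values for C and gamma and the number of cross-validation folds below are hypothetical; the grids actually used in the experiments are not reported in this section.

```python
from itertools import product

# Hypothetical candidate values; the grids used in the study are not reported here.
c_values = [0.1, 1, 10, 100]
gamma_values = [0.001, 0.01, 0.1, 1]
cv_folds = 5  # assumed number of cross-validation folds

# Every (C, gamma) pair is a separate model configuration.
param_grid = list(product(c_values, gamma_values))

# Each configuration is trained once per fold.
total_fits = len(param_grid) * cv_folds

print(f"{len(param_grid)} configurations x {cv_folds} folds = {total_fits} SVM training runs")
```

Since each of these runs is a full SVM training pass, even a modest grid multiplies the training time considerably on a large dataset.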

Despite these enhancements, scalability issues persisted, reflecting limitations similar to those noted in RF studies. Hybrid approaches, such as combining SVM with other models like k-Nearest Neighbors (KNN) or deep learning frameworks, have been proposed to address these challenges [12]. Additionally, the adoption of adaptive hyperparameter optimization algorithms could streamline the tuning process, reducing complexity and improving performance in large-scale datasets [16].

For future research, hybrid models integrating SVM with ensemble techniques or leveraging deep learning are promising avenues. These combinations could provide the adaptability and robustness needed for real-time detection in dynamic environments like SDN. By addressing these challenges, SVM can continue to serve as a powerful tool in DDoS detection frameworks.

The Use of Linear Search Algorithm

To enhance the accuracy of SVM and RF in DDoS detection, a linear search algorithm was incorporated into the code. Linear search was chosen for this study because of its simplicity, ease of implementation, versatility, and focused data filtering. In our context, it is most useful where a quick, resource-light solution is needed. While more complex algorithms may offer greater precision, linear search is well suited to specific tasks such as filtering relevant data before passing it on to more sophisticated models.

The linear search algorithm operates sequentially, checking each item in a dataset until the desired target is found [21]. In our dataset, it scans the entries sequentially to identify those associated with a specific label, for example “1” for malicious traffic. This makes it well-suited for preprocessing where rapid identification of relevant data is critical, especially in large datasets containing both benign and malicious traffic. By identifying attack labels quickly, it helps to isolate relevant data for deeper analysis by models like SVM and RF.
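A minimal sketch of this sequential scan is given below. The function names and the sample label column are hypothetical and are not taken from the study’s code; only the convention that “1” marks malicious traffic comes from the text.

```python
from typing import List, Sequence

ATTACK_LABEL = 1  # "1" marks malicious traffic, as described in the text


def linear_search(labels: Sequence[int], target: int) -> int:
    """Return the index of the first entry equal to target, or -1 if absent."""
    for index, label in enumerate(labels):
        if label == target:
            return index
    return -1


def find_all(labels: Sequence[int], target: int) -> List[int]:
    """Return the indices of every entry equal to target (one full pass)."""
    return [index for index, label in enumerate(labels) if label == target]


# Hypothetical label column: 0 = benign, 1 = malicious.
labels = [0, 0, 1, 0, 1, 1, 0]
print(linear_search(labels, ATTACK_LABEL))
print(find_all(labels, ATTACK_LABEL))
```

The first function stops at the first match, whereas collecting every attack row for later analysis always requires a full pass over the label column.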

Integrating the linear search algorithm into existing systems is straightforward. In preprocessing or real-time detection phases, linear search identifies target values like attack labels efficiently. This reduces the computational burden on models such as SVM and RF, allowing them to focus on the most relevant data, thus improving the overall efficiency and accuracy of the detection process.

However, the linear search algorithm is not without limitations. Its time complexity of O(n) makes it inefficient as dataset sizes grow. As the dataset expands, the search becomes increasingly slow, which can be a significant challenge in systems like SDN, where large datasets are common. From our perspective, this limitation is a trade-off for the simplicity and ease of integration that linear search offers. For large-scale systems, the search may not be optimal.

A possible solution is to adopt a more advanced filtering method such as the two-pointer linear search algorithm suggested in [21], which does not require sorted data and scans from both the front and the back of the dataset, improving efficiency without the overhead of sorting. It could better handle larger datasets in SDN environments. Future work will explore integrating this technique to address the scalability concerns inherent in the linear search approach.
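The general two-pointer idea can be sketched as follows. This is an illustration of the technique under our own assumptions, not necessarily the exact implementation described in [21], and the function name is hypothetical.

```python
from typing import Sequence


def two_pointer_search(items: Sequence[int], target: int) -> int:
    """Return an index where target occurs, or -1 if it is absent.

    Two positions are checked per loop iteration, one from each end, so the
    loop runs at most about len(items) / 2 times. No sorting is required.
    The returned index is not necessarily the first occurrence.
    """
    left, right = 0, len(items) - 1
    while left <= right:
        if items[left] == target:
            return left
        if items[right] == target:
            return right
        left += 1
        right -= 1
    return -1


# Hypothetical label column with the only attack entry at the end.
print(two_pointer_search([0, 0, 0, 0, 1], 1))
```

The number of comparisons in the worst case still grows in proportion to the dataset size, so the complexity remains O(n); the gain is a smaller number of loop iterations and an early exit when the target lies near either end.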

This approach allows for the use of simple, efficient search methods in small-scale scenarios, with potential for optimization in larger systems. By reflecting on these challenges, it’s clear that balancing simplicity with efficiency will be key in selecting the most appropriate search technique for different stages of the DDoS detection system.

Limitations of the Study

A key limitation of this study is the use of datasets that may have significant imbalances. Many datasets contain a disproportionate number of normal instances compared to attack samples, which can make it harder for machine learning models to perform well in real-world scenarios. This imbalance can lead to models producing false positives or missing newer attack types. Additionally, datasets that do not capture modern attack patterns may not generalize well to more sophisticated attacks, such as application-layer or encrypted DDoS attacks [5][6].
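The effect of such an imbalance on evaluation can be illustrated with a trivial baseline that always predicts the majority class. The class counts below are hypothetical and chosen only for illustration; they do not describe the DDoS-SDN dataset.

```python
from collections import Counter

# Hypothetical label column: 0 = normal, 1 = attack (950 normal, 50 attack).
labels = [0] * 950 + [1] * 50

counts = Counter(labels)
majority_label, majority_count = counts.most_common(1)[0]

# A trivial model that always predicts the majority class.
predictions = [majority_label] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
detected_attacks = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
attack_recall = detected_attacks / counts[1]

print(f"class counts: {dict(counts)}")
print(f"accuracy: {accuracy:.2f}, attack recall: {attack_recall:.2f}")
```

On data this skewed, the baseline reaches high accuracy while detecting no attacks at all, which is why precision, recall, and F1-score need to be read alongside accuracy.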

Another limitation is that older datasets may fail to reflect the current landscape of DDoS attacks. As attackers develop new strategies, training models on outdated datasets becomes less effective. To improve detection in real-world settings, we need more up-to-date datasets that include a wider variety of attack types [6].

In Software Defined Networking (SDN) environments, applying traditional machine learning algorithms like SVM and RF presents its own challenges. SVM works well with small datasets and clear decision boundaries but struggles with large datasets and high-dimensional data, which makes it difficult to scale for SDN environments where real-time detection is crucial. As datasets grow, SVM’s performance declines, requiring more processing time and risking a loss in accuracy without proper adjustments [3][7].

Random Forest (RF) is better at handling large datasets and imbalanced data. Its ensemble structure helps reduce overfitting and improve accuracy. However, RF can become computationally expensive when using many trees or tuning multiple parameters. In SDN, where immediate detection and response to DDoS attacks is essential, these issues can limit RF’s effectiveness in real-world applications [6][7].

Real-World Implications

These limitations highlight important challenges in using machine learning-based DDoS detection systems in SDN environments. Imbalances in the training data can lead to models that struggle to detect modern, sophisticated attacks, leading to detection failures or false alarms. As attackers develop new methods, models trained on older data may become outdated and ineffective [6].

For SDN applications, the scalability problems with SVM and RF present significant obstacles. As networks grow, the real-time processing demands increase. SVM struggles with high-dimensional data and large datasets, while RF requires significant computational resources [10]. In environments that need fast, real-time detection of DDoS attacks, these issues can limit the models’ effectiveness. Without further optimization, these models may struggle under the pressure of real-world network traffic.

CONCLUSION

This paper compared the SVM and RF algorithms for DDoS detection on the DDoS-SDN dataset. Overall, the results show that RF performed better than SVM in most cases on complex, high-dimensional data such as the DDoS-SDN dataset. Because RF relies on ensemble learning and is resilient to overfitting, it achieved higher accuracy, precision, recall, and F1-score and is therefore more applicable to real-world scenarios.

In contrast, SVM scales poorly because the cost of training a kernel SVM grows faster than linearly (roughly quadratically to cubically) with the number of training samples. This cost, together with its reliance on preprocessing techniques, degrades its performance on larger and dynamic datasets. These results indicate that SVM’s inherent limitations make it less effective for handling such data scenarios.

Random Forest (RF) is a robust and reliable algorithm for DDoS detection, particularly in handling complex and noisy datasets. Its ability to process high-dimensional data with minimal preprocessing, manage class imbalances, and resist overfitting makes it suitable for real-world scenarios. RF’s parallelization feature ensures efficient handling of large datasets, as demonstrated in studies involving CICDDoS2019 and DDoS-SDN datasets. However, its effectiveness depends on the quality of training data, and future research should explore intelligent parameter optimization to enhance scalability and adaptability in dynamic IoT networks.

Support Vector Machine (SVM) excels at identifying patterns in both non-linear and linearly separable datasets through kernel methods like RBF. It offers high accuracy and low false positive rates when datasets are well-preprocessed and balanced. Despite these strengths, SVM faces challenges with scalability due to its computational demands during training and hyperparameter tuning. Techniques like Recursive Feature Elimination (RFE) and hybrid models combining SVM with other approaches can mitigate these limitations, paving the way for improved efficiency in large-scale, dynamic environments.

The integration of a linear search algorithm into preprocessing enhances SVM and RF by efficiently filtering relevant data and reducing computational overhead. Linear search is simple and versatile, making it ideal for quick identification of attack labels in smaller datasets. However, its O(n) time complexity limits its scalability for larger datasets. Advanced methods, such as the two-pointer linear search, offer potential improvements, ensuring efficient preprocessing even in large-scale environments like SDN. Balancing simplicity and efficiency is crucial to optimizing preprocessing techniques for DDoS detection systems.

Despite the results, the study highlights important limitations. One issue is the use of outdated and imbalanced datasets. These datasets may not capture the full range of modern DDoS attacks. As a result, machine learning models trained on them may not be effective in real-world scenarios. This points to the need for more current and diverse datasets. These datasets would improve the accuracy and reliability of machine learning models.

Future research should focus on combining advanced feature selection techniques with hyperparameter optimization to enhance the models further. Furthermore, hybrid or deep learning models could provide advanced detection capabilities for SDNs while addressing the scalability problems of traditional approaches and their difficulty with real-time detection. Overall, this study provides insight into how machine learning can be applied in cybersecurity to develop more secure and resilient network systems against evolving cyber threats.

REFERENCES

  1. Y. Li and Q. Liu, “A comprehensive review study of cyber-attacks and cyber security: Emerging trends and recent developments,” Energy Reports, vol. 7, pp. 8176–8186, Nov. 2021, doi: 10.1016/j.egyr.2021.08.126.
  2. S. Pande, A. Khamparia, D. Gupta, and D. N. H. Thanh, “DDOS Detection Using Machine Learning Technique,” in Studies in Computational Intelligence, Springer Science and Business Media Deutschland GmbH, 2021, pp. 59–68, doi: 10.1007/978-981-15-8469-5_5.
  3. J. Pei, Y. Chen, and W. Ji, “A DDoS Attack Detection Method Based on Machine Learning,” in Journal of Physics: Conference Series, Institute of Physics Publishing, Jul. 2019, doi: 10.1088/1742-6596/1237/3/032040.
  4. L. Mhamdi, D. McLernon, F. El-Moussa, S. A. Raza Zaidi, M. Ghogho, and T. Tang, “A Deep Learning Approach Combining Autoencoder with One-class SVM for DDoS Attack Detection in SDNs,” in 2020 8th International Conference on Communications and Networking, ComNet2020 – Proceedings, Institute of Electrical and Electronics Engineers Inc., Oct. 2020, doi: 10.1109/ComNet47917.2020.9306073.
  5. V. Ivanova, T. Tashev, and I. Draganov, “Random Forest Detector and Classifier of Multiple IoT-based DDoS Attacks,” WSEAS Transactions on Information Science and Applications, vol. 19, pp. 30–43, 2022, doi: 10.37394/23209.2022.19.4.
  6. R. Ma, Q. Wang, X. Bu, and X. Chen, “Real-Time Detection of DDoS Attacks Based on Random Forest in SDN,” Applied Sciences (Switzerland), vol. 13, no. 13, Jul. 2023, doi: 10.3390/app13137872.
  7. K. S. Sahoo et al., “An Evolutionary SVM Model for DDOS Attack Detection in Software Defined Networks,” IEEE Access, vol. 8, pp. 132502–132513, 2020, doi: 10.1109/ACCESS.2020.3009733.
  8. R. Vishwakarma and A. K. Jain, “A survey of DDoS attacking techniques and defence mechanisms in the IoT network,” Telecommunication Systems, vol. 73, no. 1, pp. 3–25, Jan. 2020, doi: 10.1007/s11235-019-00599-z.
  9. H. T. Manjula and N. Mangla, “An approach to on-stream DDoS blitz detection using machine learning algorithms,” Materials Today: Proceedings, vol. 80, pp. 3492–3499, Jan. 2023, doi: 10.1016/J.MATPR.2021.07.280.
  10. R. I. Perwira and H. Prapcoyo, “Software Defined Network: The Comparison of SVM kernel on DDoS Detection,” RSF Conference Series: Engineering and Technology, vol. 1, no. 1, pp. 281–290, Dec. 2021, doi: 10.31098/cset.v1i1.413.
  11. S. Informatika and A. Polinema, “Klasifikasi Jenis serangan DOS dan Probing pada IDS menggunakan metode K-Nearest Neighbor,” SIAP, p. 2020. Available: http://kdd.ics.uci.edu.
  12. Z. Ma and B. Li, “A DDoS attack detection method based on SVM and K-nearest neighbour in SDN environment,” International Journal of Computational Science and Engineering, vol. 23, no. 3, pp. 224–234, 2020, doi: 10.1504/IJCSE.2020.111431.
  13. T. Aytaç, M. A. Aydın, and A. H. Zaim, “Detection of DDoS attacks using machine learning methods,” Electrica, vol. 20, no. 2, pp. 159–167, Jun. 2020, doi: 10.5152/electrica.2020.20049.
  14. R. D. Ravipati and M. Abualkibash, “Intrusion Detection System Classification Using Different Machine Learning Algorithms on KDD-99 and NSL-KDD Datasets – A Review Paper,” SSRN Electronic Journal, Jun. 2019, doi: 10.2139/SSRN.3428211.
  15. F. Ferdiansyah, D. Antoni, M. Valdo, M. Mikko, C. Mukmin, and U. Ependi, “Machine Learning Models for DDoS Detection in Software-Defined Networking: A Comparative Analysis,” Journal of Information Systems and Informatics, vol. 6, no. 3, pp. 1790–1803, Sep. 2024, doi: 10.51519/journalisi.v6i3.864.
  16. M. H. Almaspoor, A. Safaei, A. Salajegheh, and B. Minaei-Bidgoli, “Support Vector Machines in Big Data Classification: A Systematic Literature Review,” Aug. 09, 2021, doi: 10.21203/rs.3.rs-663359/v1.
  17. S. P. K. Gudla, S. K. Bhoi, S. R. Nayak, and A. Verma, “DI-ADS: A Deep Intelligent Distributed Denial of Service Attack Detection Scheme for Fog-Based IoT Applications,” Mathematical Problems in Engineering, vol. 2022, pp. 3747302, 2022, doi: 10.1155/2022/3747302.
  18. W.-W. Tay, S.-C. Chong, and L.-Y. Chong, “DDoS Attack Detection with Machine Learning,” Journal of Informatics and Web Engineering, vol. 3, no. 3, pp. 190–207, Oct. 2024, doi: 10.33093/jiwe.2024.3.3.12.
  19. P. Prashanthi, M. M. Reddy, and A. Lavanya, “Machine Learning for IoT Security: Random Forest Model for DDoS Attack Detection,” 2023.
  20. L. Boukraa, S. Essahraui, K. El Makkaoui, I. Ouahbi, and R. Esbai, “Intelligent Intrusion Detection in Software-Defined Networking: A Comparative Study of SVM and ANN Models,” in Procedia Computer Science, Elsevier B.V., 2023, pp. 26–33, doi: 10.1016/j.procs.2023.09.007.
  21. N. A. Zinnia and E. Hanada, “Optimizing Search Strategies: A Study of Two-Pointer Linear Search Implementation,” Jun. 2024.
