Cloud Attack Detection with Intelligent Rules

  • Pradeepthi, K.V (Department of Information Science and Technology, College of Engineering, Guindy, Anna University) ;
  • Kannan, A (Department of Information Science and Technology, College of Engineering, Guindy, Anna University)
  • Received : 2015.03.26
  • Accepted : 2015.07.02
  • Published : 2015.10.31

Abstract

Cloud is the latest buzzword in the internet community among developers, consumers and security researchers. There have been many attacks on the cloud in the recent past in which services were interrupted and consumer privacy was compromised. Denial of Service (DoS) attacks affect the availability of services to genuine users. Customers pay to use the cloud, so enhancing the availability of services is a paramount task for the service provider. In the presence of DoS attacks, availability is reduced drastically. Such attacks must be detected and prevented as early as possible, and computational approaches can be used to do so. In the literature, machine learning techniques have been used to detect the presence of attacks. In this paper, a novel approach is proposed in which intelligent rule based feature selection and classification are performed for DoS attack detection in the cloud. The performance of the proposed system has been evaluated on an experimental cloud setup with real-time DoS tools. The proposed system achieved an accuracy of 98.46% on the experimental data of 10,000 instances with 10-fold cross-validation. Using this methodology, service providers will be able to provide a more secure cloud environment to their customers.

Keywords

1. Introduction

The advent of the cloud has revolutionized the web application domain. Any user with a computer and an internet connection can use resources such as software or infrastructure hosted elsewhere in the cloud. This has helped students, researchers and entrepreneurs to use resources according to their needs and to reduce cost. Major software companies such as Amazon and Google have entered the cloud computing arena and launched their own cloud services, which are extremely popular. Cloud data centres are also playing an important role in enhancing data storage for mobile networks [1].

With the advances in cloud computing, the negative aspects of its usage are also increasing. Reports of hacking and data leakage in the cloud are on the rise. In the current scenario, the availability of the cloud and the security of customer data in the cloud are of utmost importance to customers as well as to cloud service providers. Cloud providers take security measures and put up the required firewalls to prevent attacks. Attacks still happen, however, because certain ports, such as port 80, have to be left open for consumer traffic. Attackers use these open ports to send flood pings and make the system unavailable to genuine consumers. This constitutes a Denial of Service (DoS) attack. Some common denial of service attacks are the UDP attack, HTTP attack, ICMP ping flood, Slowloris, SYN flood and ping of death. Recognizing the pattern during an attack is of utmost importance, as an attack should be terminated at an early stage so that the damage it can cause is minimized.

To solve the problem of attack detection in the cloud, Artificial Intelligence (AI) techniques can be used because AI has the ability to solve a problem after learning from examples. Classification methods have been applied to attack detection in many papers. In this paper, we propose a new rule based expert system to solve this problem. An important pre-processing step that is often performed before classification is feature selection. Feature selection is applied when there are many features and the dataset is large; it reduces the time required to perform the classification. In this paper, the detection of DoS attacks in the cloud is done through a knowledge based feature selection method and an intelligent rule based classification system. The knowledge based feature selection uses a novel entity called weight, which is assigned by the domain expert while training the dataset with neural networks. This weight, along with the information gain factor, is used for feature selection. Techniques such as information gain, entropy and gain ratio are currently in use, but they do not give any weight to the domain based importance of a feature during feature selection. Hence, knowledge based feature selection is proposed to promote features that aid the classification of an instance, based on neural networks and the domain expert's opinion. The dataset is first trained using a back propagation neural network, and rules are then formulated by an intelligent rule based system based on the expert weight. It can be inferred from our results that the accuracy improves with weight adjustment using the domain expert's knowledge.

The remainder of this paper is organized as follows: Section 2 gives a literature survey of related works; Section 3 describes the proposed system in detail; Section 4 describes the experimental setup; Section 5 gives the results and related discussion; and Section 6 concludes the paper.

 

2. Literature Survey

Cloud computing is a vast, evolving field, and the need for cloud security research is growing because of the increasing number of attacks. There are many works in the literature on the security and privacy of data. The most important among these techniques are key management [2], admission control [3] and intrusion detection [4], [5]. A survey of the different client side and server side protection mechanisms is given in [6], whereas the technical issues in cloud security from the browser and web service side are touched upon in [7]. Information leakage in third party compute clouds through VM side channel attacks is discussed, and mitigation techniques are presented, in [8].

Many researchers have utilized the unique features of the cloud, such as its dynamic resource allocation [9], software defined networking [10] and statistical modeling [11], to handle attack scenarios. In [12], a DoS attack on the Google cloud and its effects on the servers are discussed. The authors state that the current protection mechanisms are only a temporary solution. Different attack types that affect cloud performance were studied in [13], [14], [15], and solutions such as a graphical model [13], a taxonomy [14], and ellipsoid and kernel component analysis [16] have been proposed for attack detection.

Machine learning is being used effectively in many cloud security research works. Insider activity in the cloud can be monitored through performance data. In [17], rule based classification was used to classify insider activity into different types. Classification algorithms such as Naïve Bayes, Multilayer Perceptron, SVM, Decision Tree and PART were used and their results were compared. In another work [18], a cloud trace back methodology has been proposed which is said to trace back the source of an HTTP denial of service attack or an HTML denial of service attack. The detection and filtering of these attacks is done using back propagation neural networks.

Severity analysis for intrusion in the cloud is proposed in [19], where the virtual machine parameters are analyzed and the severity of the intrusion is then predicted using machine learning techniques. The different intrusions hampering the integrity, confidentiality and availability of cloud services have been surveyed in [20]. Intrusion detection in a network using conditional random fields and a layered approach was done in [21]. The authors selected features manually instead of applying features chosen automatically by feature selection algorithms, and showed that manually selected features give better performance. Machine learning has been incorporated in intrusion detection systems [22] using techniques such as the wrapper approach [23], a learning model [24] and a particle swarm approach [25].

The most prevalent attacks in the cloud are denial of service attacks, cross side channel attacks in the virtual machine, phishing, shared memory attacks and insider malicious activities. Tanzim et al. [26] built a cloud setup, recreated the attack scenarios with 5000 instances and 8 attributes, and performed classification with SVM. In another paper [27], the same authors used 14 attributes and 536 instances to classify the different denial of service attacks on a cloud environment using machine learning algorithms. In the current scenario, where a processor can deal with a large number of instances in seconds, more data can be generated and used for classification. To achieve better accuracy in attack detection, we have replaced the generic classification algorithms with a more problem specific intelligent rule based expert system, as applying domain knowledge to a computational approach has repeatedly been shown to improve results.

Using the facts and rules conceived, we have formulated an intelligent rule based system that can effectively classify the given data into the different attack types. Another important aspect of this paper is that an experimental cloud setup was built to generate a dataset of 10,000 instances.

 

3. Proposed Work

As discussed earlier, firewalls are unable to detect DoS attacks. The cloud is loaded with unlimited resources and is dynamic in nature. Still, when the services of a particular service provider are targeted, genuine users are affected and the services that need to be provided to them are compromised.

In this section, a detailed explanation of the proposed methodology is given. The two techniques proposed in this paper are feature selection and classification with an intelligent rule based system. In this work, a cloud environment was set up to conduct the DoS attacks and understand their effects on the cloud. The performance parameters were observed and a dataset was constituted. All further analysis has been carried out using this extensive dataset. Existing works in this area treat attack detection as a classification problem and apply existing classification algorithms, which do not use domain knowledge to solve a problem. In this paper, we use an intelligent rule based system that makes use of domain knowledge to enhance the classification accuracy compared to the normal classification algorithms. The intelligent rule based classification method consists of a rule base for firing rules and performing deductive inference. The following subsections discuss the feature selection and classification modules in more detail.

3.1 System Architecture

The architecture of the intelligent rule based classifier is shown in Fig. 1. The attack generation module uses tools like LOIC, SynGUI, ping flood, Unicorn and Pyloris for simulating the attacks. When the cloud comes under a DoS attack, cloud parameters such as the various CPU, memory, network and storage values are affected. The performance parameters are monitored during these attacks and also during the no attack phase, so that we can differentiate between an attack and normal, acceptable cloud behavior. The performance metrics captured during the different attack phases and the no attack phase form the dataset for further detection. This data then undergoes feature selection through the novel knowledge based feature selection technique. Feature selection is performed on all the features to reduce their number and select only those that will provide higher accuracy. The selected features are then used by the classification module to classify the different types of attacks. The classification module works based on the intelligent rule based system.

Fig. 1. Intelligent rule based system: architecture.

The intelligent rule based system uses the inference engine to select the rules and schedule them to perform forward chaining inference. The interpreter present in the inference engine carries out the tasks of rule matching and rule execution to perform deductive inference. Based on the results of the classifier, the decision manager module decides on the further course of action with the help of the intelligent rule based system. If the classifier detects the presence of an attack, the particular process is terminated to prevent further spread and depletion of the cloud resources.
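The forward chaining step described above can be illustrated with a minimal sketch. The rule format, the fact names and the `forward_chain` function are hypothetical and chosen only for illustration; the paper does not list its engine's code or its actual rules.

```python
def forward_chain(facts, rules):
    """Fire rules repeatedly until no new fact can be derived.

    facts: set of strings known to be true.
    rules: list of (premises, conclusion) pairs; a rule fires when every
           premise is already in the set of known facts.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Rule matching: all premises hold. Rule execution: add the conclusion.
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# Hypothetical rules, not taken from the paper's knowledge base.
RULES = [
    ({"high_network_traffic", "high_cpu_usage"}, "dos_attack_suspected"),
    ({"dos_attack_suspected"}, "terminate_offending_process"),
]

observed = {"high_network_traffic", "high_cpu_usage"}
conclusions = forward_chain(observed, RULES)
print(sorted(conclusions - observed))
# ['dos_attack_suspected', 'terminate_offending_process']
```

The second rule plays the role of the decision manager here: once an attack is inferred, the follow-up action is derived in the next pass over the rules.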

3.2 Feature selection

The different features in memory, process, disk and network that are significantly affected during an attack were observed. There are 47 features in the dataset. After carefully studying the different features and applying the existing feature selection techniques, the accuracy was, as expected, found to increase and the classification time to decrease, because the dataset size decreases. While applying the existing feature selection techniques, it was noticed that some features which are relevant to one or two of the attack classes were missing from the reduced feature list. When mathematical formulae like information gain or entropy are applied, the set of features selected and their order of preference are not always as perceived by a domain expert. By allowing a domain based knowledge system to select the features, we are able to avoid certain discrepancies that the automatic methods could introduce. The dominant features essential for the accurate classification of attacks are selected in our methodology by the intelligent rule based system. There are 5 classes into which the data has to be classified.

Let C1, C2, C3, C4 and C5 be the 5 classes. The probability that a random instance ‘a’ belongs to a particular class Ci is

where i ranges from 1 to 5 in our case.

Let us assume there are only two classes of data for classification say p1 and p2. Then the information needed to extract the result is given by

If we generalize the information gain to predict the class of an instance, then it can be given as

where pi is the probability that the instance belongs to class Ci.
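The quantities named above correspond to the standard class probability and entropy definitions. As a hedged reading, assuming the usual textbook forms (the symbols D for the dataset, |C_i| for the number of instances of class C_i in D, and Info(D) are notational assumptions), they can be written as:

```latex
% Probability that a random instance a of dataset D belongs to class C_i
p_i = \frac{|C_i|}{|D|}, \qquad i = 1, \dots, 5

% Information needed in the two-class case
\mathrm{Info}(D) = -p_1 \log_2 p_1 - p_2 \log_2 p_2

% Generalization to all classes
\mathrm{Info}(D) = -\sum_{i=1}^{5} p_i \log_2 p_i
```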

We conceive a new parameter called weight, denoted W1, that can be assigned to every attribute. It is formulated as follows,

where expert weight (i) is assigned to an attribute by a domain expert and n is the number of instances.

where the information gain for a feature is calculated as follows.

Let us consider a threshold τ. The weight of a feature should not be more than the threshold τ.

Optimal values for threshold and correction were chosen based on experiments.

Algorithm for Intelligent rule based feature selection:

The threshold for determining the selected weights is set by the domain experts. The idea of manual feature selection [21] using the knowledge base has been applied in our proposed system, and it gives us the freedom to pick features that effectively determine the attack classification. We observed an increase in the accuracy. Using knowledge based manual feature selection, 30 features were selected for the purpose of attack classification.
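The general idea of combining an expert-assigned weight with information gain can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the score is assumed to be the product of expert weight and information gain, a feature is assumed to be kept when its score reaches the threshold, features are assumed to be discretized, and all feature names, weights and data are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the entropy left after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_features(columns, labels, expert_weight, threshold):
    """Keep features whose (expert weight x information gain) reaches the threshold.

    Both the product and the >= test are assumptions made for this sketch.
    """
    scores = {name: expert_weight[name] * information_gain(values, labels)
              for name, values in columns.items()}
    selected = [name for name, score in scores.items() if score >= threshold]
    return selected, scores


# Toy data: feature names, values and expert weights are invented.
labels = ["attack", "attack", "normal", "normal"]
columns = {
    "net_recv": ["high", "high", "low", "low"],   # separates the classes perfectly
    "disk_read": ["low", "high", "low", "high"],  # carries no class information
}
expert_weight = {"net_recv": 0.9, "disk_read": 0.5}

selected, scores = select_features(columns, labels, expert_weight, threshold=0.3)
print(selected)  # ['net_recv']
```

A design point worth noting: because the expert weight multiplies the gain, a feature the expert rates highly can survive even when its gain over the whole dataset is modest, which matches the stated motivation of retaining features relevant to only one or two attack classes.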

3.3 Classification

For the purpose of classification, the intelligent rule based classifier is used. The rules of the system are set using thresholds determined by the knowledge base. We are dealing with a multi-class classification problem; hence the complexity of the system is higher. In a normal decision tree algorithm, the criterion for classification is selected using entropy, the Gini impurity index, information gain or variance reduction. If the selection criterion is entropy, the smallest value is selected, and if it is information gain, the maximum value is selected for making the decision. Our concept is similar to the decision tree methodology of the ID3 or C4.5 algorithm, but the selection criterion for forming the decision rules is different. Instead of using either entropy or information gain, each attribute is assigned an initial weight based on an expert weight, and the classification is carried out based on this expert weight. The expert weights are assigned by the knowledge base. The prerequisite for the whole process is the formation of a knowledge base with the initial rules. To form these initial rules, the dataset is studied and the expert weights are assigned to the attributes. The expert weights are assigned in such a fashion that the attributes expected to classify the classes effectively are assigned higher weights, and vice versa. Moreover, the dataset is trained with a back-propagation neural network and the rules are finalized by adjusting the weights. These rules are compared with the rules formed by the domain experts. Finally, the matching rules are identified and stored in the knowledge base. Rule matching is performed by building a discriminant network, and the forward chaining inference method is used to perform deductive inference.

Each instance has a value for n attributes. It can be represented as

and it can be assigned to a distinct class Ci.

Generic if-then rules used in our system are

where p1, ..., pn are the parameters and C1 is a class to which the instance belongs.

Algorithm for Classification performed by the intelligent rule based system is given below.

Gist of the rules conceived for the knowledge base is given below.

The calculation for evaluating the minimum and maximum values of the different parameters is shown below. For Process blk (pbmin), the minimum value is

where m is the number of instances in a particular experiment.

For calculating the maximum value of a parameter, for example load-1m,

where n is the number of instances in a particular experiment.
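One way to read the minimum and maximum calculations together with the generic if-then rules is as per-class range rules. The sketch below assumes that the minimum (maximum) of a parameter is the smallest (largest) value observed over the instances of an experiment, and that a rule fires when every parameter lies inside the range learned for a class. The parameter names `procs_blk` and `load_1m` are hypothetical renderings of Process blk and load-1m, and all values are invented.

```python
def learn_ranges(instances):
    """Return {class: {parameter: (min, max)}} from labelled instances.

    instances: list of (parameters_dict, class_label) pairs.
    """
    ranges = {}
    for params, label in instances:
        class_ranges = ranges.setdefault(label, {})
        for name, value in params.items():
            lo, hi = class_ranges.get(name, (value, value))
            class_ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def classify(params, ranges, default="unknown"):
    """Apply rules of the form: IF every p_j lies in [min_j, max_j] THEN class C_i."""
    for label, class_ranges in ranges.items():
        if all(lo <= params[name] <= hi for name, (lo, hi) in class_ranges.items()):
            return label  # the first matching rule wins
    return default


# Invented training instances for two of the classes.
training = [
    ({"procs_blk": 0, "load_1m": 0.2}, "no_attack"),
    ({"procs_blk": 1, "load_1m": 0.6}, "no_attack"),
    ({"procs_blk": 4, "load_1m": 3.1}, "ping_flood"),
    ({"procs_blk": 6, "load_1m": 4.8}, "ping_flood"),
]

ranges = learn_ranges(training)
print(ranges["ping_flood"]["load_1m"])                     # (3.1, 4.8)
print(classify({"procs_blk": 5, "load_1m": 4.0}, ranges))  # ping_flood
```

In this simplified form, overlapping ranges between classes are resolved by rule order; a fuller system would need an explicit conflict resolution strategy.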

The parameter names shown and their corresponding feature names are listed in the appendix. The minimum and maximum threshold values for these features were set based on a series of experiments conducted with the individual attack tools under different system conditions, such as varying the load on the memory, processor and network.

New attacks can also be detected based on these rules, as the knowledge base is constantly updated with newly added instances and the rules are refined. In the results section, we compare the results of our classifier with Decision Tree and also compare its accuracy with other works mentioned in the literature.

 

4. Experimental Setup

Two machines were used for the experiments. They had the following configuration: Intel Core i7 with dual processors of 3.40GHz with 4GB RAM. The cloud was set up on one machine and the attack tools were run from the other machine. The cloud machine ran CentOS with the Eucalyptus cloud installed on it. The attack machine ran Windows, on which the various attack tools were run.

In our experiment, we attack the cloud with DoS tools. DoS attacks create futile traffic packets and send them to the target system, so that genuine customers are not able to make use of the resources. The attack tools used are LOIC, Pyloris, Unicorn, SynGUI and pingflood. These DoS tools work in such a way that the CPU, memory and network bandwidth of the cloud are depleted. Hence, genuine users are denied the opportunity to utilize the cloud resources to which they are entitled. Fig. 2 shows a screenshot of the cloud system performance during the no attack phase.

Fig. 2. System parameters: no attack phase.

Fig. 3 shows the scenario when the Unicorn DoS tool is attacking the cloud at full force. There is an immense increase in the network traffic, and the CPU utilization also fluctuates.

Fig. 3. System parameters: Unicorn attack phase.

4.1 Data Preparation

A total of 47 attributes are taken. For the initial no attack case, a cloud application is run and the cloud system's parameters in the fields of CPU, network, memory and storage are noted. These are the fundamental computing resources of a system; hence their analysis helps to understand the cloud system performance. When the system is subjected to an attack, these values fluctuate, which helps to identify the occurrence and onset of an attack. After feature selection, some redundant and irrelevant attributes are removed. The final list of attributes used for classification is shown in Table 1.

Table 1. List of different features and their description.

4.2 Tool Description

Low Orbit Ion Cannon (LOIC) is an open source denial of service and stress testing tool written in C#. It is used by many researchers to perform DoS attacks because it is a GUI based tool and hence easy to operate. It can send TCP, UDP and HTTP packets to the target machine.

Fig. 4. A screenshot of the LOIC tool.

Unicorn is a DoS testing tool, written in C, that is based on HTTP requests. It sends continuous HTTP requests to the server so that its bandwidth is exhausted and genuine users are no longer able to access the resources to which they are entitled.

Fig. 5. A screenshot of the Unicorn tool.

Similarly, Pingflood, SynGUI and Pyloris are open source DoS tools that can be used to study attacks. They offer flexible options to increase the network payload and to choose the type of network packets to send (UDP, HTTP or TCP), the port to which the traffic is directed, the number of packets to be sent, the length of the packets, etc.

Fig. 6. A screenshot of the SynGUI tool.

 

5. Results and Discussions

The dataset is divided into 10 folds and 9 folds were used for training. As we are dealing with a multiclass classifier problem, the analysis of the performance parameters is complex. There are 6 classes of instances in the dataset, namely, 1) No attack, 2) SynGUI attack, 3) Unicorn attack, 4) Pyloris attack, 5) LOIC attack, 6) Ping Flood attack.
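The fold construction can be sketched as below. This is a minimal illustration that assumes stratified folds, so that every fold keeps the same class proportions; whether the authors stratified or shuffled the data is not stated. The class sizes are those the paper reports for the full dataset, while the label strings and function names are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, k=10):
    """Assign each instance index to one of k folds, spreading every class evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(labels, k=10):
    """Yield (train_indices, test_indices): k-1 folds for training, 1 for testing."""
    folds = stratified_folds(labels, k)
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


# Full dataset composition: 5000 no-attack instances and 1000 per attack tool.
labels = (["no_attack"] * 5000 + ["syngui"] * 1000 + ["unicorn"] * 1000 +
          ["pyloris"] * 1000 + ["loic"] * 1000 + ["ping_flood"] * 1000)

splits = list(train_test_splits(labels))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 9000 1000
```

Each instance appears in exactly one test fold, so every one of the 10,000 instances is used for testing once and for training nine times.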

In the feature selection phase, we compare the accuracy achieved with and without manual feature selection, for both the Decision Tree classifier and the intelligent rule based classifier, in Fig. 7. It can be observed that the accuracy increases after applying feature selection.

Fig. 7. Accuracy with and without feature selection.

To make the results easier to compare and understand, the total dataset of 10,000 instances is split into different experimental sets. Experiment 1 consists of 100 instances of each of the types mentioned above. Likewise, experiment 2 consists of 200 instances of each type, experiment 3 has 500 instances of each type, experiment 4 has 1000 instances of each type, and experiment 5 is the full dataset with 1000 instances of each attack type and 5000 instances in the no attack category.

Table 2. Confusion matrix in classification.

The various parameters that have been used for analysis purpose are:

The precision, recall and F-measure values computed for the different experiments are tabulated in Table 3 for the individual types of attacks. There is a gradual increase in the values as the size of the dataset increases. Precision expresses the exactness of the classifier and recall its completeness. Both range from 0 to 1, and the closer they are to 1, the better. From our experimental results we can see that the No Attack class and the Ping Flood attack class have higher precision and recall. F-measure is a measure of test accuracy that takes both precision and recall into consideration. It ranges from 0 to 1, and the nearer it is to 1, the better the test accuracy of the system. In our experiments we can observe that the value is higher for the datasets which have more instances.
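These measures can be computed per class from a multi-class confusion matrix, using the standard definitions precision = TP / (TP + FP), recall = TP / (TP + FN) and F-measure = 2 x precision x recall / (precision + recall). The sketch below assumes rows are actual classes and columns are predicted classes; the class names and counts are invented and are not the paper's results.

```python
def per_class_metrics(confusion, classes):
    """Return {class: (precision, recall, f_measure)}.

    confusion[i][j] = number of instances of actual class i predicted as class j.
    """
    metrics = {}
    for k, name in enumerate(classes):
        tp = confusion[k][k]
        fp = sum(row[k] for row in confusion) - tp  # other classes predicted as k
        fn = sum(confusion[k]) - tp                 # class k predicted as something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        metrics[name] = (precision, recall, f_measure)
    return metrics


def accuracy(confusion):
    """Fraction of instances on the diagonal of the confusion matrix."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Invented 3-class example.
classes = ["no_attack", "attack_a", "attack_b"]
confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]

metrics = per_class_metrics(confusion, classes)
print(metrics["attack_a"])
print(accuracy(confusion))
```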

Table 3. Precision, recall and F-measure values for the various experiments carried out.

The classifier accuracy increases when the dataset size is increased, because the classifier has more examples from which to learn and refine its facts. This can be seen in Table 4.

Table 4. Aggregation of the precision, recall and F-measure values for all the experiments shown in Table 3.

The accuracy of Decision Tree for the different experiment batches is compared with that of our intelligent rule based classifier in Table 5. It can be inferred that, though the accuracy of Decision Tree increases as the dataset size increases, the intelligent rule based classifier has higher accuracy throughout.

Table 5. Comparing intelligent rule based classifier results with Decision Tree.

The accuracy of the intelligent rule based classifier is compared with the existing works [15] in Table 6. It can be seen that by formulating our own rule based classifier, we have achieved greater accuracy.

Table 6. Comparison with existing works.

A false positive is a state where an instance has been falsely predicted as positive. The false positive rate is an important parameter in crucial applications like spam and phishing website detection, intrusion detection and biomedical screening. The lower the false positive rate, the better. From the results of our experiment, we can see in Fig. 8 that the Unicorn class instances display a lower false positive rate throughout. We do not want the FP rate to be high for the No Attack class, as we do not want any attack to be predicted as belonging to the No Attack class. We can observe from the results that in the final batch of 10,000 instances, the FP rate of the No Attack class has been considerably reduced.
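The per-class false positive rate can be computed from a confusion matrix with the standard definition FP / (FP + TN), treating each class in turn as the positive class. The sketch below assumes rows are actual classes and columns are predicted classes; the two-class matrix is invented and only illustrates why a false positive for the No Attack class means an attack that was predicted as no attack.

```python
def false_positive_rates(confusion, classes):
    """Return {class: FP / (FP + TN)} with each class taken as 'positive' in turn.

    confusion[i][j] = number of instances of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    rates = {}
    for k, name in enumerate(classes):
        tp = confusion[k][k]
        fp = sum(row[k] for row in confusion) - tp  # other classes predicted as k
        fn = sum(confusion[k]) - tp
        tn = total - tp - fp - fn
        rates[name] = fp / (fp + tn) if fp + tn else 0.0
    return rates


# Invented example: 100 no-attack and 100 attack instances.
classes = ["no_attack", "attack"]
confusion = [
    [90, 10],  # actual no_attack: 10 raised as false alarms
    [5, 95],   # actual attack: 5 missed, i.e. predicted as no_attack
]

rates = false_positive_rates(confusion, classes)
print(rates)
```

In this example the 5 missed attacks are exactly the false positives of the No Attack class, which is the quantity the paragraph above argues should be kept low.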

Fig. 8. Graph comparing the FP rate across different classes and experiments.

 

6. Conclusion

In the present internet scenario, the cloud is expanding rapidly and the need of the hour is to tap into its flexibility and dynamic resource allocation features. However, the disadvantages of the existing internet scenario, namely the different attacks on the privacy and secrecy of customer data, should not be allowed to surge in the cloud environment as well. Researchers therefore have to build novel and intelligent intrusion detection systems to detect and terminate these attacks. In this paper, we have detected DoS attacks using an intelligent rule based classifier and have also applied knowledge based feature selection. We conclude that a domain specific intelligent system can counter these problems more efficiently than the existing classifier algorithms when ample expert advice is available. The system is suitable for enhancing the security, reliability and availability of web applications deployed in the cloud, such as e-commerce, tele-medicine and e-learning. As a future extension of this work, we intend to add a fuzzy logic component to the system to make the detection of attacks faster and more efficient.

Cited by

  1. An Effective DOS Attack Detection Model in Cloud Using Artificial Bee Colony Optimization vol.9, pp.3, 2015, https://doi.org/10.1007/s13319-018-0195-6
  2. STRIDE and HARM Based Cloud Network Vulnerability Detection Scheme vol.29, pp.3, 2015, https://doi.org/10.13089/jkiisc.2019.29.3.599