DOI QR코드

DOI QR Code

Prefix Cuttings for Packet Classification with Fast Updates

  • Han, Weitao (National Digital Switching System Engineering & Technological R&D Center) ;
  • Yi, Peng (National Digital Switching System Engineering & Technological R&D Center) ;
  • Tian, Le (National Digital Switching System Engineering & Technological R&D Center)
  • 투고 : 2013.11.12
  • 심사 : 2014.04.10
  • 발행 : 2014.04.30

초록

Packet classification is a key technology of the Internet for routers to classify the arriving packets into different flows according to the predefined rulesets. Previous packet classification algorithms have mainly focused on search speed and memory usage, while overlooking update performance. In this paper, we propose PreCuts, which can drastically improve the update speed. According to the characteristics of IP field, we implement three heuristics to build a 3-layer decision tree. In the first layer, we group the rules with the same highest byte of source and destination IP addresses. For the second layer, we cluster the rules which share the same IP prefix length. Finally, we use the heuristic of information entropy-based bit partition to choose some specific bits of IP prefix to split the ruleset into subsets. The heuristics of PreCuts will not introduce rule duplication and incremental update will not reduce the time and space performance. Using ClassBench, it is shown that compared with BRPS and EffiCuts, the proposed algorithm not only improves the time and space performance, but also greatly increases the update speed.

키워드

1. Introduction

With the improvement of the Internet, people have higher requirements for network bandwidth, network security, and network latency. Therefore, Internet devices must be able to classify packets quickly so as to guarantee Quality of Service. Since packet classification is the key technology of Routers, Fire Walls and Network Intrusion Detection Systems (NIDS), its speed directly affects the overall performance of the Internet.

Given a predefined ruleset, packet classification is to find out the best rule that matches the incoming packet. An excellent packet classification algorithm should have the following three features: fast search speed, small memory usage and easy to update. Because of the importance of packet classification, many researchers have carried out detailed studies. However, the early research mainly focuses on the search speed and memory usage, while ignoring the update performance. As a matter of fact, after a ruleset is defined, as the network environment changes, its rules change as well. A ruleset will increase or reduce its rules according to the real time network environment. It is vital for a packet classification algorithm to be able to adapt to the incremental update of a ruleset. Therefore, the current problem that packet classification algorithm needs to address is to meet the time, space and update performance at the same time. However, in the real life network, with the size and the complexity of the ruleset growing, the difficulty in designing a global optimal packet classification algorithm increases.

In this paper, we propose PreCuts―a packet classification algorithm with excellent scalability and update speed. It achieves fast updates without reducing the time and space performance. Its update time is almost the same as a search time. After scrutinizing the characteristics of IP field, we put forward three heuristics to build a 3-layer decision tree, with each layer being constructed by one heuristic. The main contribution of this paper is as follows.

The rest of this paper is organized as follows. Section 2 briefly describes the background of packet classification. Section 3 shows the motivation of PreCuts. Section 4 illustrates the proposed algorithm. Section 5 summarizes the performance results, and the last section concludes this paper.

 

2. Background

2.1 Problem Statement

Packet Classification is a process that classifies packets into flows according to the predefined ruleset. Table 1 illustrates a simplified example of a ruleset. Each rule r consists of D fields and an action. Ruleset R is a set of rules. Let r = {F1,F2,⋯,FD,act}, and packet p = {f1,f2,⋯,fD}, then we call p matches r if fi ∈ Fi for all i = 1,2,⋯,D . From a geometric point of view, the whole ruleset can be seen as a multidimensional space, each rule of the ruleset can be seen as a hypercube, and a packet is a point in the multidimensional space. If a packet matches a rule, the corresponding point will fall within the hypercube of the correct rule. Therefore, packet classification can be treated as a point location problem in computational geometry [1].

Table 1.Simplified Example of Ruleset

Previous packet classification research centers on how to use heuristics to find the optimal way to locate the point [2]. Nevertheless, the rulesets being studied are static; thus the algorithms are designed based on such static rulesets. As a result, when confronted with a ruleset which needs incremental update, their update speed is unsatisfactory. However, what the present-day internet needs is an algorithm which can not only meet the internet demands, but also achieve incremental update with few overhead.

2.2 Packet Classification Algorithm

The current packet classification algorithms can be divided into two categories: RAM-based algorithmic solutions and hardware-based TCAM solutions. RAM-based algorithmic solutions focus on building the best search structure, including the following algorithms. a) Dimensional decomposition-based algorithms [3-5]. They are fast in searching but inadequate in space performance. These algorithms build their search structure by encoding the entire ruleset. This renders them incompetent in incremental update. b) Decision tree-based algorithms [6-10]. They build the decision tree by recursively partitioning the rule space. The heuristics of cutting space inevitably introduce rule redundancy. As a result, when a new rule is being inserted into the ruleset, it may simultaneously fall into various child nodes. The more the rules being inserted, the worse the time and space performance. In the worst case, the search structure has to be rebuilt.

Hardware-based TCAM solutions [11-15] are well known for their parallel search capability and constant processing speed. However, due to the limitation of TCAM’s native circuit structure, these solutions face two major problems, power consumption and range encoding. Current researches mainly focus on addressing the above two problems, while the problem of rule update is almost left unstudied. Since the circuit structure of TCAM renders inserting and deleting rules even more complex, incremental update can only be realized at the expense of extra resources.

2.3 Related work

Previous researchers proposed many packet classification algorithms, but early algorithms [3], [7] only apply to small ruleset. HyperSplit proposed by Qi et al. [2] uses non-equal sized cuts for more efficient memory usage, but it still has rule duplication, and the depth of the decision tree is not satisfactory. BRPS proposed by Chang [18] uses a hierarchical list of sorted ranges and prefixes that allow the binary search to be performed on the list at each level to find the best matched rule, but there are still redundant rules in the search structure. Besides, when BRPS is faced with a huge amount of update tasks, unwanted data structure will reduce the time and storage performance. EffiCuts [8] is an improved algorithm over HyperCuts. It proposes four novel ideas: separable trees, selective tree merging, equi-dense cuts, and node co-location. Employing the same space-decomposition as HyperCuts, EffiCuts still does not fully solve the rule redundancy problem. Although EffiCuts can reduce the memory usage, its heuristics decide that it cannot achieve fast updates. Since incremental update causes more rule duplication, as the number of updated rules is increasing, the performance of EffiCuts will reduce.

 

3. Motivation

A packet classification algorithm which is easy to update should possess the following features:

(1) Rule Completeness

The number of different rules in the search structure should be equal to the number of the rules in the ruleset. The completeness of rules ensures that the newly inserted rule can easily find its position in the search structure according to the rule priority.

(2) Rule Uniqueness

Each different rule in the search structure should be unique. No rule redundancy is allowed. Take decision tree-based algorithm as an example, let each leaf node ruleset be R1,R2,⋯,Rn, it is common that

The uniqueness of rules requires that there is no overlap between any two leaf nodes, i.e., ∀i,j,Ri∩Rj = ∅. Consequently, the uniqueness of rules ensures that the newly inserted rule will not introduce other rule duplication, thus avoiding the performance decrease caused by incremental update.

The heuristics of previous algorithms all fail to achieve these two features. Their heuristics use some certain points in a dimension to cut the rule space, and take these points as the basis to build the decision tree. When a rule needs to be updated, if one of these points is in the hypercube of this rule, this rule will be updated in more than two nodes. This will result in rule duplication. Accordingly, the more rules need to be updated the more redundant rules will be produced. This in turn will result in poorer performance.

In order to guarantee the incremental update performance, our heuristics must possess the above two features. Based on a thorough analysis of the characteristics of the IP domains, we propose three heuristics, Prefix Index Cutting, Prefix Length Cutting, and Bit Partition. All of them promise Rule Uniqueness. Moreover, PreCuts does not remove any rule in the ruleset, and it keeps every rule in the search structure. Therefore, PreCuts also has the feature of Rule Completeness.

We use the three heuristics to cut rulesets step by step. Considering the fact that the highest byte of the low boundary of the IP address is unique, our first heuristic puts the rules with the same highest byte together. This ensures that there is no rule redundancy. Besides, it is common that the bits in the IP prefix are definite, i.e., each bit in a given IP prefix has only one value. When cutting the rulesets based on these definite bits, we can avoid the rule redundancy caused by the wildcards. However, if the rules have different IP prefix lengths, we cannot ensure that all the bits that we choose to cut the ruleset are not in the wildcards. So we introduce our second heuristic to group the rules with the same IP prefix length. In such groups, we employ the third heuristic which uses definite bits to partition the rules to a finer degree.

To linearly search the least rules like other decision tree-based algorithms, PreCuts uses three heuristics to cut the rulesets layer by layer. In this way, PreCuts gradually narrows the scale of the ruleset. When the rules in a leaf node are no more than a given amount (e.g. no more than 8), we linearly search the rules to find the best matched rule. Our simulation profiles show that after the 3 layer cuttings, more than 98% of the leaf nodes have no more than 8 rules. In fact, although our first two heuristics can cut the ruleset into relatively small sub rulesets, some of them still have a large number of rules. To overcome this, we add the third heuristic to further reduce the number of the rules in the leaf nodes. However, if we introduce a fourth layer cutting, the pre-process will be more complex. This will reduce the space performance.

According to Table 1, Fig. 1 depicts a toy example to show the flavor of our algorithm. In Layer-1, we group the rules with the same first two bits in the source and destination IP addresses. Since we use 4 bits to split the ruleset, the total number of the child nodes are 16 (Not all child nodes are shown.). In Layer-2, we put the rules with the different IP prefix length in different groups. For example, the source IP prefix length of R3 is 4 and the destination IP prefix length of R3 is 4, while the source IP prefix length of R6 is 2 and the destination IP prefix length of R6 is 3. Therefore, R3 and R6 belong to different child nodes. In Layer-3, we choose certain bits in IP prefix to cut rulesets. For example, in {R1, R2, R4}, we choose two bits (i.e., the fourth bit of source IP address and the third bit of destination IP address) to cut the rules into three different leaf nodes.

Fig. 1.Example of PreCuts

Combining the aforementioned elements, PreCuts is not only easy to understand, but also easy to evaluate. This simplicity makes it readily employable. Next, we will go into the details of PreCuts.

 

4. The Proposed Algorithm

PreCuts employs a 3-layer search structure. First, we use prefix index cutting to split the ruleset into subsets. Second, we use prefix length cutting to cluster rules with the same IP prefix length. Last, we use bit partition to build the decision trees in each subset. Firstly, we introduce some relevant definitions.

4.1 Relevant Definitions

Table 2.Symbol Description

Definition I Byte Index

Let the source and destination IP addresses of a 5-tuple rule r be ,S1.S2.S3.S4 / MA and D1.D2.D3.D4 / MD, then the 2-dimension array (S1,D1) constructed by the highest bytes of the IP field is the byte index of rule r, recorded as BI(r). If all the rules in ruleset R have the same byte index, then the byte index of R exists, recorded as BI(R).

Definition II Accordance Set

The set formed by the rulesets which have byte index is named accordance set, recorded as AS.

Definition III Prefix Length Coordinate

Let the length of source and destination IP addresses of r be MA and MD, then the prefix length coordinate of r is PL(r) = (MA,MD). If all the rules in R have the same prefix length coordinate, the prefix length coordinate of R exists, recorded as PL(R).

Definition IV Split Bit

In the IP field, the bit which can divide the ruleset into 2 subsets with different bit values is named split bit, recorded as Es.

Definition V Split Vector

The bit vector which contains n Es and can divide the ruleset into 2n subsets is named split vector Vs. Its total length is 64 bits.

Definition VI Split Value

Let the 64 bit vector formed by the source and destination IP addresses of rule r be Vr, then the bit string formed through merging the split bit value with Vs is named the split value of r, recorded as SV(r,Vs) = bitMerge(Vr∧Vs), and the number of the possible values is 2n. If all the rules in ruleset R have the same split value, then, the split value of R exists, recorded as SV(r,Vs) = bitMerge(Vr∧Vs)

Definition VII Split Proportion

where, Ri = {r|S(r,Vs) = Vi,r∈RVs},Vi is the constant vector, SV(Ri,Vs) = Vi,|Ri| is the number of the rules whose split value is Vi, and |RVs| represents the total number of the rules.

Definition VIII Split Entropy

Split entropy reflects the balance degree of subsets split by Vs.

4.2 Prefix Index Cutting

The basic idea of prefix index cutting is to cluster rules with the same byte index, i.e.

where, R' is the subset after clustering, S and D are two constants, and R is the original ruleset.

When making prefix index cutting, we first extract the byte index of the rules, and then put the rules with the same prefix index into one group. Fig. 2 shows the pseudo code of prefix index cutting. In the pseudo code, the input is R, and the output is the prefix index set BIset and the subsets RA.

Fig. 2.Pseudo code of Prefix Index Cutting

According to Theory I, prefix index cutting will not introduce rule duplication. Therefore, when updating rules, it is impossible for one rule to fall into more than two nodes, so as to avoid new rule redundancy.

Byte index is decided by the highest byte of IP address. The max number of the possible values of a given BI(r) is 216. As a result, in the worst case, we will obtain 216 byte index groups. However, through simulation, it is found that, in the actual ruleset, the number of different byte indexes is far less than the theoretical boundary. But in order to guarantee the update and time performance, we retain every possible byte index pointer in the search structure. Consequently, even in the worst case, PreCuts still have enough byte index pointers to ensure that each byte index group has a corresponding node. Moreover, no matter how many byte index groups there are, PreCuts can still find the byte index group that the packet belongs to in one memory read.

There are not 216 byte index groups in one ruleset. If the corresponding byte index group is not empty, then the pointer will directly point to the group. In contrast, if the group is empty, the pointer is set to 0. Saving every possible pointer will slightly increase the space complexity, but the algorithm can locate the corresponding pointer straight away according to the byte index of the packet. In this way, only one memory read is required to find the byte index group to which the packet belongs.

The search algorithm of byte index is as follows: let the pointers be numbered in the ascending order and let the head of the incoming packet be head, then we get BI(head) = (S,D), and the number of the pointer which the packet corresponds to is BI(head) = S ×2n + D (n is the width of S). According to BIn(head), the search algorithm spots the pointer, and jumps to the next node to continue the search. Fig. 3 is the prefix index cutting tree of Table 1. It is built by the first two bits of IP addresses. Let the IP address of head be 100*,10**, BI(head) = (10,10) , then the corresponding number of the node BIn(head) = 2×22+2=10.

Fig. 3.Example of Prefix Index Cutting

Theory I There is no rule overlap in any two rulesets which have different byte indexes.

Proof: ∀X,Y ∈ AS and BI(X) ≠ BI(Y), in X and Y, ∀rX ∈ X, ∀rY ∈ Y from BI(X) ≠ BI(Y), we get BI(rX) ≠ BI(rY) and Yrhave different IP addresses, then rX ≠ rY. This means there is no rule overlap in X and Y.

4.3 Prefix Length Cutting

The basic idea of prefix length cutting is to put the rules which have the same prefix length coordinate into one group, i.e.

where R' represents the subset, M and N are two constants.

Similar to prefix index cutting, when performing prefix length cutting, we first calculate the prefix length coordinate of the rules, and then put the rules which have the same coordinate into one group (called PL Group). Fig. 4 shows the pseudo code of prefix length cutting. In the pseudo code, the input is R, and the output is the prefix length set PLset and the subsets RL.

Fig. 4.Pseudo code of Prefix Length Cutting

According to Theory II, prefix index cutting will not bring rule duplication.

There are two conditions in searching the prefix length nodes, namely, packet lookup and rule lookup.

(1) Packet Prefix Length Lookup

Since there is no mask information in the packet head, we cannot find the match rule using the mask information. To solve this problem, we find the child node that the packet corresponds to through judging whether the packet IP address is within the IP range of the node. In this way, a packet may belong to more than one child node, but when performing prefix length cutting, the rules are put into the subsets according to their priority. The first match rule is the optimal rule. Let the incoming packet head be head. Taking Node10 in Fig. 3 as an example, the packet prefix length lookup is shown in Fig. 5. Let the IP address of the packet be IP(head) = (8,9), and we obtain IP(head) ∈ IPRange(Child2), then the packet corresponds to #Child2.

Fig. 5.Packet Prefix Length Lookup

To get the longest prefix match, we only need to search the corresponding prefix length nodes in a reversed order. For example, in Fig. 5, firstly, we determine whether IP(head) belongs to IPRange(Child2), if yes, then we continue searching Child2; if no, then we will turn to search Child1. For PreCuts, the rules in the left node have shorter prefix than those in the right. Therefore, the rule found through the above method is the longest match rule.

(2) Rule Prefix Length Lookup

As is shown in Fig. 6, let the update rule be r, and compare PL(r) with the prefix length coordinate in PLset , then we get that r belongs to Child2.

Fig. 6.Rule Prefix Length Lookup

Theory II There is no rule overlap in any two rulesets which have different prefix length coordinates.

Proof: Take two arbitrary ruleset X and Y, and PL(X) ≠ PL(Y) . In X and Y , ∀rX ∈ X , ∀rY ∈ Y . From PL(X) ≠ PL(Y), we get PL(rX) ≠ PL(rY) , thus rX ≠ rY. This means there is no rule overlap in X and Y.

4.4 Bit Partition

The basic idea of bit partition is to choose the split vector with the largest split entropy as the partition standard, and then divide the ruleset into several subsets with no overlaps. The process is as follows.

We first calculate the partition entropy of every possible partition vector, and choose the partition vector VsM which has the largest entropy as the final partition vector.

According to VsM, we obtain the partition value of each rule in the ruleset, and then put the rules with the same partition value into one group. The pseudo code of prefix bit partition is shown in Fig. 7. In the pseudo code, the input is the ruleset R, and the output is the final partition vector VsM and the subsets RB.

Fig. 7.Pseudo code of Bit Partition

According to Theory III, prefix bit partition will not bring rule duplication.

In PreCuts, all the possible split value pointers are retained in the search structure. If the next node has no rules, the value of the pointer is 0. There are two reasons to retain all the nodes pointers.

(1) To improve the search speed.

We do not have to compare every split value of the current node to determine the child node that corresponds to the packet. We can locate the corresponding child node right from the split value of the packet head. This reduces the comparisons that the lookup needs.

(2) To improve the update performance.

Since the pointers of all the possible child nodes exist, in update, we do not need to add node pointers for new rules, but only need to modify the existing pointers. This reduces the update complexity.

Fig. 8 shows the bit tree built according to #Node2 in Fig. 3, which selects two split bits and has four leaf nodes.

Fig. 8.Example of Bit Tree

From Definition V, it is shown that the more split bits that are chosen, the better the partition result. This is because the increase of the number of the split bits can reduce the depth of the decision tree and promote the time performance. Choosing too many partition bits will greatly increase the pre-processing complexity. For example, to choose 8 split bits from IP addresses, the possible cases are 4.43×109 . Traversing all the values will bring a great deal of calculation. Therefore, we make some optimizations to prefix bit partition, as is shown in the following.

(1) Restrict the max number of the split bits.

[16] shows that, the prefix length of IP addresses is mainly less than 9 or larger than 12. According to this, we set the max number of split bits as 4, so as to enable the partition bit to achieve satisfactory partition in all cases.

(2) Restrict the select range of the split bits.

In a byte index group, the byte indexes of all the rules are the same. Therefore, it is useless to choose the split bit in the highest byte of the IP address. Additionally, if the split bit is chosen in the wildcards, rule duplication will be introduced. Given these two aspects, we select the split bits within the IP prefix except for the highest byte. The maximum number of candidate values is 48.

We use the method in [17] to complete the lookup in bit tree, as is shown in Fig. 9. Let the IP address of head be 1101, 0010, then we get S(head,Vs) = bitMerge(Vr∧Vs) = 11, the node that packet corresponds to is Child3. The way of update rule lookup is the same as the packet lookup.

Fig. 9.Bit Lookup

Theory III Bit partition will not bring rule duplication.

Proof: Given that we choose the split bit within the IP prefix, since there is no wildcard in IP prefix, the bit of any rules in IP prefix has set values: ‘0’ and ‘1’. Therefore, each rule has a set split value and one rule can only belong to one child node. In this way, rule duplication can be avoided.

4.5 Lookup and Update

4.5.1 Lookup Algorithm

According to the search structure of PreCuts, let the packet header be head, the lookup algorithm is as follows.

Fig. 10 illustrates the pseudo code of lookup algorithm.

Fig. 10.Pseudo code of Packet Lookup

According to the data structure of PreCuts, we analyze the search complexity of each layer. In Layer-1, PreCuts retains all the possible BI pointers. So, we only need one search to pinpoint the next node. Thus, the search complexity of this layer is O(1). In Layer-2, we need to linearly search every IPRange. Let the number of the IPRange be P, then the search complexity is O(P). In Layer-3, for each bit tree node, PreCuts can also fix the next node through only one search, and finally perform a linear search in the leaf nodes. Let the depth of the decision tree be d, the number of rules in the leaf node is N, then the search complexity is O(d+N). In all, the search complexity of PreCuts is O(d+P+N). However, since d is relatively small, the search complexity is O(P+N). Table 3 compares the time complexity of PreCuts with BRPS and EffiCuts.

Table 3.Complexity of BRPS, EffiCuts and PreCuts. M is the total number of rules, d is the number of dimension, N is the number of rules in leaf node

The analysis of the time complexity of PreCuts shows that its time performance mainly depends on the number of PL Groups (in a node) and the number of rules in the leaf node because PreCuts has to perform a linear search in these two parts, so their size will affect the time performance. In the worst case (Max Number), the numbers of these two parameters are relatively high, but the average number is far smaller than that in the worst case. Therefore, the average time performance is still good. When performing the search algorithms, if a node needs a large amount of linear search, a separate thread can be added to finish it. In this way, the impact of the worst case on the time performance of PreCuts can be avoided.

4.5.2 Update

There are two kinds of rule update, namely, reconstruction and incrementation. Reconstruction update means reconstructing the search structure based on a complete ruleset. When the current search structure cannot meet the actual demand of the Internet, reconstruction update will be used. While incrementation update means adding or deleting some rules in the current search structure. When a ruleset needs to add or delete some rules, incrementation update will be used. Incrementation update will not significantly change the current search structure, but just add or delete a certain amount of rules in the search structure.

We have already discussed the reconstruction update of PreCuts. Here, we only focus on the incremental update. Let the rule waiting to be inserted (or deleted) be r, then we find out the node to which r belongs. Next, we perform the function of addRule(r,updateNode) and deleteRule(r,updateNode) to insert and delete rules, respectively. Fig. 11 illustrates the pseudo code of finding rule node.

Fig. 11.Pseudo code of Finding Rule Node

The search method of update algorithm in Layer-1 and Layer-3 is the same as the search algorithm. So is their complexity. In Layer-2, through linearly searching the PL values within the current node, the update algorithm spots the next node corresponding to the new rule. Let the number of PL values be L, then the update complexity is O(L). Therefore, the update complexity of PreCuts is O(L+N). As can be seen in Table 4, the update complexity of BRPS and EffiCuts is the same, and is higher than PreCuts because with similar update method and lookup method, PreCuts only needs relatively low overhead to complete the update.

Table 4.Update Complexity of BRPS, EffiCuts and PreCuts

PreCuts has a very simple search structure, in which packet lookup and rule update are implemented using similar ways. That is, the overhead of one packet lookup and one rule update is almost the same, which ensures a fairly fast update speed of PreCuts. It is worth noting that the search structure of PreCuts does not introduce redundant rules in updating, and consequently PreCuts achieves fast rule update.

 

5. Performance Evaluation

In this section, we evaluate the performance of PreCuts and compare it with other packet classification schemes in terms of memory usage, search speed and update speed. In this paper, we use ClassBench [16] to generate classifiers representative of the real ones, with the scales of ruleset ranging from 1,000 to 1,000,000. Being widely used in the research of packet classification, ClassBench mainly includes three types of rules: ACL (Access Control List), FW (Fire Wall) and IPC (Linux IP Chains). For example, FW1-1K is 1,000 Fire Wall security scheme rules. All the rules are 5-dimensional. They contain 32-bit source/destination IP addresses, 16-bit source/destination port numbers and 8-bit transport layer protocol. The implementation platform is an Ubuntu 10.04 system running on an Intel Core-i3 2350m and 2 GB of main memory.

We implemented BRPS and EffiCuts with all optimizations used in [18] and [8], respectively. The bucketsize of EffiCuts is 8 while other parameters use their default values in [8].

5.1 Details of PreCuts

Table 5 illustrates the details of PreCuts, displaying 5 characteristics of the PreCuts decision tree.

Table 5.Details of PreCuts

5.2 Memory Usage

In this part we compare the memory usage of BRPS, EffiCuts and PreCuts. In order to ensure good time performance, PreCuts retains all the prefix index pointers and the child node pointers of the bit tree. This inevitably brings some redundant pointers to the decision tree structure. These redundancies will reduce as the size of the ruleset grows, for the redundant pointers will be filled up by more rules.

Fig. 12 compares the memory usage of BRPS, EffiCuts and PreCuts. From the figure we can see that, for relatively small ruleset (less than 10k), PreCuts has similar memory usage as BRPS and EffiCuts, while for a larger ruleset (100k), the average memory usage of PreCuts is 27.0% less than BRPS and 33.5% less than EffiCuts. Moreover, as a whole, the average memory usage of PreCuts is 24.1% less than BRPS and 38.5% less than EffiCuts.

Fig. 12.Memory Usage

5.3 Search Speed

In this part, we compare the time performance of BRPS, EffiCuts and PreCuts. We realize the three algorithms in simulation, and use the clock counter of CPU to measure the clocks that the three algorithms need in searching. Fig. 13 compares the search cycles of the three algorithms when dealing with different rulesets. The trace file used in the test is randomly generated by ClassBench [16]. From the simulation results, compared to EffiCuts, it is obvious that PreCuts makes a great improvement, its search cycles have reduced by 84.9% on average. Compared to BRPS, its search cycles have reduced by 21.9% on average. When processing ACL and IPC rulesets, BRPS and PreCuts have similar search performance, whereas, when processing FW rulesets, PreCuts is better.

Fig. 13.Search Cycles

The performance of PreCuts is much superior than EffiCuts in that the two algorithms adopt different data structures. The multi-decision tree search structure of EffiCuts makes it mandatory for the search algorithm to traverse each and every decision tree before obtaining the best match rule. This greatly raises the search cycles. For PreCuts, it is a different story. It only has to traverse one decision tree to get the best match rule. Consequently, the search cycles of EffiCuts are several times as many as those of PreCuts.

BRPS has the similar performance with PreCuts thanks to its huge amount of completed optimized heuristics. However, in practice, these optimized heuristics are hardly realizable. On the contrary, PreCuts is comparable with or even surpasses BRPS in time performance without any extra optimization. This simplicity of PreCuts ensures its easy realization. With the same time performance but easier realization, PreCuts is superior to BRPS.

5.4 Update Speed

We adapt the method in [18] to test the update performance. First, we randomly extract 10% of the rules from the ruleset and build the search structure according to the remaining 90% of the rules. Then we randomly insert the extracted rules into the search structure to obtain the insertion cycles. After that, we randomly select another 10% of rules from the classifier and delete them from the search structure to obtain the deletion cycles. The average of the insertion and deletion cycles is obtained to judge the update performance. Since EffiCuts has no specific algorithms for incremental update, we only compare the update performance of BRPS and PreCuts, as is shown in Fig. 14. The average number of cycles that PreCuts needs in an update is 95% less than BRPS.

Fig. 14.Update Cycles

It can also be seen from Fig. 14 that, compared to BRPS, the number of the cycles that the incremental update of PreCuts needs will not increase significantly as the ruleset gets larger. This is mainly because the average depth of the decision tree of PreCuts does not grow as the ruleset gets larger. This almost keeps the number of the nodes traversed unchanged when PreCuts makes incremental updates to rulesets of various sizes. Consequently, PreCuts can sustain an excellent performance in making incremental updates to larger rulesets.

 

6. Conclusion

Previous packet classification algorithms mainly focus on time and space performance; however, the update performance is often overlooked. In this paper, we propose PreCuts which achieves fast updates. To improve the update speed without reducing the time and space performance, we design a novel 3-layer search structure. Considering the characteristics of the IP field of the rules, the ruleset partition on the three levels is carried out using three different ways, namely, Byte Index Cutting, Prefix Length Cutting and Bit Partition. The experimental results show that our scheme outperforms other schemes not only in terms of classification speed and memory usage but also in terms of update speed.

참고문헌

  1. Fong J, Wang X, Qi Y, et al., "ParaSplit: a scalable architecture on FPGA for terabit packet classification," in Proc. of IEEE 20th Annual Symposium on High-Performance Interconnects, pp. 1-8, August, 2012.
  2. Yaxuan Q, Lianghong X, Baohua Y, et al. "Packet classification algorithms: from theory to practice," in Proc. of IEEE INFOCOM'09, pp. 648-656, April, 2009.
  3. Gupta P and Mckeown N, "Packet classification using hierarchical intelligent cuttings," in Proc. of Hot Interconnects VII, pp. 34-41, August, 1999.
  4. B. Xu, D. Jiang and J. Li, "HSM: a fast packet classification algorithm," in Proc. of IEEE 19th International Conference on Advanced Information Networking and Applications, pp. 987-992, March, 2005.
  5. F. Baboescu and G. Varghese, "Scalable packet classification," ACM SIGCOMM Computer Communication Review, vol.31, no. 4, pp. 199-210, August, 2001. https://doi.org/10.1145/964723.383075
  6. P. Gupta and N. McKeown, "Classifying packets with hierarchical intelligent cuttings," IEEE Micro, vol. 20, no. 1, pp. 34-41, 2000. https://doi.org/10.1109/40.820051
  7. S. Singh, F. Baboescu, G. Varghese and J. Wang, "Packet classification using multidimensional cutting," in Proc. of ACM SIGCOMM'03, pp. 213-224, August, 2003.
  8. B. Vamanan, G. Voskuilen and T. N. Vijaykumar, "Efficuts: optimizing packet classification for memory and throughput," in Proc. of ACM SIGCOMM'10, pp. 207-218, August, 2010.
  9. C. Yeim-Kuan and W. Yu-Hsiang, "CubeCuts: a novel cutting scheme for packet classification," in Proc. of IEEE 26th International Conference on Advanced Information Networking and Applications Workshops, pp. 274-279, March, 2012.
  10. C. Yeim-Kuan and C. Han-Chen, "Layered cutting scheme for packet classification," in Proc. of IEEE International Conference on Advanced Information Networking and Applications, pp. 675-681, March, 2011.
  11. Y. Ma and S. Banerjee, "A smart pre-classifier to reduce power consumption of TCAMs for multi-dimensional packet classification," in Proc. of ACM SIGCOMM'12, pp. 335-346, August, 2012.
  12. C. R. Meiners, A. X. Liu and E. Torng, "TCAM Razor: a systematic approach towards minimizing packet classifiers in TCAMs," in Proc. of IEEE International Conference on Network Protocols, pp. 266-275, October, 2007.
  13. C. R. Meiners, A. X. Liu and E. Torng, "Topological transformation approaches to optimizing TCAM-based packet classification systems," IEEE/ACM Transactions on Networking, vol. 19, no. 1, pp. 237-250, June, 2011. https://doi.org/10.1109/TNET.2010.2061864
  14. B. Vamanan and T. N. Vijaykumar, "TreeCAM: decoupling updates and lookups in packet classification," in Proc. of 7th Conference on Emerging Networking Experiments and Technologies, pp. 27, December, 2011.
  15. C. R. Meiners, A. X. Liu and E. Torng, "Bit weaving: a non-prefix approach to compressing packet classifiers in TCAMs," IEEE/ACM Transactions on Networking, vol. 20, no. 2, pp. 488-500, April, 2012. https://doi.org/10.1109/TNET.2011.2165323
  16. D. E. Taylor and J. S. Turner, "ClassBench: a packet classification benchmark," IEEE/ACM Transactions on Networking, vol. 15, no. 3, pp. 499-511, June, 2007. https://doi.org/10.1109/TNET.2007.893156
  17. B. Yang, J. Fong, W. Jiang, Y. Xue and J. Li, "Practical multi-tuple packet classification using dynamic discrete bit selection," IEEE Transactions on Computers, vol. 63, no. 2, pp. 424-434, January, 2014. https://doi.org/10.1109/TC.2012.191
  18. C. Yeim-Kuan, "Efficient multidimensional packet classification with fast updates," IEEE Transactions on Computers, vol. 58, no. 4, pp. 463-479, February, 2009. https://doi.org/10.1109/TC.2008.181

피인용 문헌

  1. MBitCuts: optimal bit-level cutting in geometric space packet classification vol.76, pp.4, 2014, https://doi.org/10.1007/s11227-019-03090-3