• Title/Summary/Keyword: Rule-Based Classification

Tuple Pruning Using Bloom Filter for Packet Classification (패킷 분류를 위한 블룸 필터 이용 튜플 제거 알고리즘)

  • Kim, So-Yeon;Lim, Hye-Sook
    • Journal of KIISE:Information Networking / v.37 no.3 / pp.175-186 / 2010
  • With the emergence of new applications and the rapid growth in Internet users, Internet routers are required to provide quality of service according to the class of each input packet, which is identified by wire-speed packet classification. For a pre-defined rule set, packet classification performs a multi-dimensional search over various header fields of an input packet to determine the highest-priority rule that matches the packet. Efficient packet classification algorithms have been widely studied. The tuple pruning algorithm provides fast classification by performing hash-based search only against the candidate tuples that may include matching rules. A Bloom filter is an efficient data structure composed of a bit vector that represents the membership information of each element in a given set. It is used as a pre-filter to determine whether a specific input is a member of the set. This paper proposes new tuple pruning algorithms using Bloom filters, which effectively remove unnecessary tuples that do not include matching rules. Using a database known to be similar to actual rule sets used in Internet routers, simulation results show that the proposed tuple pruning algorithm classifies packets faster and consumes less memory than the previous tuple pruning algorithm.
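
The pre-filter idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the two-field rule format, the `RULES` data, the helper names, and the filter sizing are all assumptions made for illustration. Each tuple (a pair of prefix lengths) gets its own Bloom filter, and the hash table of a tuple is probed only when the filter reports a possible match.

```python
import hashlib


class BloomFilter:
    """Bit-vector membership filter: false positives possible, false negatives not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def prefix_key(src, dst, src_len, dst_len):
    """Truncate both bit strings to the tuple's prefix lengths."""
    return f"{src[:src_len]}/{dst[:dst_len]}"


# Hypothetical rules: (source prefix bits, destination prefix bits, priority).
# A lower priority number means a higher priority.
RULES = [("10", "0", 1), ("10", "011", 2), ("1101", "0", 3)]


def build(rules):
    """Group rules into tuples keyed by (src length, dst length)."""
    tuples = {}
    for src, dst, prio in rules:
        lengths = (len(src), len(dst))
        bloom, table = tuples.setdefault(lengths, (BloomFilter(), {}))
        key = prefix_key(src, dst, *lengths)
        bloom.add(key)
        table[key] = min(prio, table.get(key, prio))
    return tuples


def classify(tuples, src, dst):
    """Return (best priority or None, number of hash tables actually probed)."""
    best, probed = None, 0
    for (src_len, dst_len), (bloom, table) in tuples.items():
        key = prefix_key(src, dst, src_len, dst_len)
        if not bloom.might_contain(key):
            continue  # tuple pruned: it cannot hold a matching rule
        probed += 1
        prio = table.get(key)
        if prio is not None and (best is None or prio < best):
            best = prio
    return best, probed
```

Calling `classify(build(RULES), "10110000", "01100000")` returns priority 1 as the best match. Because a Bloom filter can report false positives, a pruned search may still probe a tuple that holds no matching rule, but it never skips one that does.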

Optimum Range Cutting for Packet Classification (최적화된 영역 분할을 이용한 패킷 분류 알고리즘)

  • Kim, Hyeong-Gee;Park, Kyong-Hye;Lim, Hye-Sook
    • Journal of KIISE:Information Networking / v.35 no.6 / pp.497-509 / 2008
  • Various algorithms and architectures for efficient packet classification have been widely studied. Packet classification algorithms based on a decision tree structure, such as HiCuts and HyperCuts, are known to be the best because they exploit the geometrical representation of the rules in a classifier. However, these algorithms are not practical, since they involve complicated heuristics for selecting the dimension to cut and determining the number of cuts at each node of the decision tree. Moreover, the cutting is not efficient enough, because it is based on regular intervals that are unrelated to the actual range each rule covers. In this paper, we propose a new efficient packet classification algorithm using range cutting. The proposed algorithm first identifies the ranges that each rule covers in the 2-dimensional prefix plane and then performs cutting according to those ranges. Hence, the proposed algorithm constructs a very efficient decision tree. The cutting applied at each node of the decision tree is optimal and deterministic, and it does not involve complicated heuristics. Simulation results for rule sets generated using ClassBench databases show that the proposed algorithm has better average search speed and consumes 3 to 300 times less memory than previous cutting algorithms.
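
The contrast between regular-interval cuts and cuts placed at rule boundaries can be sketched in one dimension. The Python sketch below is a simplification under stated assumptions, not the paper's 2-dimensional algorithm: the `RULES` data and the function names are hypothetical. Cut points are taken only where some rule starts or ends, so the set of covering rules is constant inside every resulting interval.

```python
from bisect import bisect_right

# Hypothetical rules: (low, high, priority) with inclusive bounds.
# A lower priority number means a higher priority.
RULES = [(0, 63, 3), (16, 31, 1), (32, 127, 2)]


def build_cuts(rules):
    """Cut only at rule boundaries and precompute the best rule per interval."""
    # An interval starts where a rule begins or just after a rule ends.
    points = sorted({lo for lo, _, _ in rules} | {hi + 1 for _, hi, _ in rules})
    best = []
    for start in points:
        covering = [prio for lo, hi, prio in rules if lo <= start <= hi]
        best.append(min(covering) if covering else None)
    return points, best


def lookup(points, best, value):
    """Binary-search the interval containing value and return its best priority."""
    i = bisect_right(points, value) - 1
    return best[i] if i >= 0 else None
```

With the rules above, `build_cuts(RULES)` yields the cut points `[0, 16, 32, 64, 128]`, and `lookup` on the value 20 returns priority 1. Each lookup is a single binary search, and no heuristic decides how many cuts to make.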

COMPOUNDED METHOD FOR LAND COVERING CLASSIFICATION BASED ON MULTI-RESOLUTION SATELLITE DATA

  • HE WENJU;QIN HUA;SUN WEIDONG
    • Proceedings of the KSRS Conference / 2005.10a / pp.116-119 / 2005
  • For the synthetic estimation of land covering parameters, or the compounded land covering classification of multi-resolution satellite data, earlier research mainly adopted linear or nonlinear regression models to describe the regression relationship of land covering parameters caused by the degradation of spatial resolution, in order to improve the retrieval accuracy of global land covering parameters based on the lower resolution satellite data. However, these methods cannot faithfully represent the complementary characteristics of the spatial resolutions of different satellite data at the arithmetic level. To resolve this problem, a new compounded land covering classification method at the arithmetic level for multi-resolution satellite data is proposed in this paper. First, on the basis of unsupervised clustering analysis of the higher resolution satellite data, the likelihood distribution scatterplot of each cover type is obtained according to the multiple-to-single spatial correspondence between the higher and lower resolution satellite data in some local test regions. Then the Parzen window approach is adopted to derive the real likelihood functions from the scatterplots. Finally, the likelihood functions are extended from the local test regions to the full covering area of the lower resolution satellite data, and the global covering area of the lower resolution satellite is classified under the maximum likelihood rule. Experimental results indicate that the proposed compounded method can improve the classification accuracy of large-scale lower resolution satellite data with the support of some local-area higher resolution satellite data.
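
The last two steps of this abstract, Parzen window likelihood estimation followed by the maximum likelihood rule, can be sketched in Python. This is an illustrative sketch only: the one-dimensional feature, the Gaussian kernel, the window width `h`, and the `TRAINING` samples are all assumptions, since the abstract does not specify them.

```python
import math


def parzen_likelihood(x, samples, h):
    """Parzen window density estimate at x using a Gaussian kernel of width h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def classify(x, samples_by_class, h=0.05):
    """Maximum likelihood rule: pick the class whose estimated density at x is largest."""
    return max(
        samples_by_class,
        key=lambda label: parzen_likelihood(x, samples_by_class[label], h),
    )


# Hypothetical per-class samples of a single low-resolution pixel feature.
TRAINING = {
    "water": [0.05, 0.08, 0.10, 0.12],
    "forest": [0.55, 0.60, 0.62, 0.70],
}
```

For example, `classify(0.09, TRAINING)` selects `"water"` and `classify(0.65, TRAINING)` selects `"forest"`. The window width controls how smooth each estimated likelihood function is, so in practice it would need tuning to the data.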

Rule Discovery for Cancer Classification using Genetic Programming based on Arithmetic Operators (산술 연산자 기반 유전자 프로그래밍을 이용한 암 분류 규칙 발견)

  • 홍진혁;조성배
    • Journal of KIISE:Software and Applications / v.31 no.8 / pp.999-1009 / 2004
  • As a new approach to the diagnosis of cancers, bioinformatics is attracting great interest. Machine learning techniques have produced valuable results, but the field of medicine requires not only highly accurate classifiers but also effective analysis and interpretation of them. Since gene expression data in bioinformatics consist of tens of thousands of features, it is nearly impossible to represent their relations directly. In this paper, we propose a method composed of a feature selection method and genetic programming. Rank-based feature selection is adopted to select useful features, and genetic programming based on arithmetic operators is used to generate classification rules from the selected features. Experimental results on a lymphoma cancer dataset, on which the proposed method obtained 96.6% test accuracy as well as useful classification rules, show the validity of the proposed method.
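
To make the notion of an arithmetic classification rule concrete, the Python sketch below evaluates one expression tree over selected features. It shows only how such a rule could be represented and applied, not how genetic programming evolves it. The tuple encoding, the sign-based decision, the feature names `g1` to `g3`, and the example `RULE` are assumptions and are not taken from the paper.

```python
import operator

# Arithmetic operators allowed in a rule (division is left out here to avoid
# handling division by zero in this short sketch).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, features):
    """Recursively evaluate an expression tree over a dict of feature values."""
    if isinstance(tree, str):
        return features[tree]  # leaf: a selected feature
    if isinstance(tree, (int, float)):
        return tree  # leaf: a numeric constant
    op, left, right = tree
    return OPS[op](evaluate(left, features), evaluate(right, features))


def classify(tree, features):
    """Assumed decision rule: a positive value means one class, otherwise the other."""
    return "positive" if evaluate(tree, features) > 0 else "negative"


# Hypothetical rule (g1 * g2) - g3, as a genetic programming run might produce.
RULE = ("-", ("*", "g1", "g2"), "g3")
```

A rule in this form stays readable as a formula over a handful of features, which fits the interpretability requirement the abstract raises for medical use.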

Factor-analysis based questionnaire categorization method for reliability improvement of evaluation of working conditions in construction enterprises

  • Lin, Jeng-Wen;Shen, Pu Fun
    • Structural Engineering and Mechanics / v.51 no.6 / pp.973-988 / 2014
  • This paper presents a factor-analysis based questionnaire categorization method that improves the reliability of the evaluation of working conditions, without reducing the completeness of the questionnaire, in both Taiwanese and Chinese construction enterprises for structural engineering applications. The proposed approach stems from AI applications and expert systems in structural engineering. Questions with a similar response pattern are grouped into, or categorized as, one factor. Questions that form a single factor usually have higher reliability than the entire questionnaire, especially when the questionnaire is complex and inconsistent. By classifying questions based on the meanings of the words used in them and on the response scores, reliability can be increased. The principle for classification was that 90% of the questions in the same classified group must satisfy the proposed classification rule; in the results, the lowest rate was 92%. The results show that the question classification method could improve the reliability of the questionnaires to at least 0.7. Compared with the question deletion method using SPSS, 75% of the remaining questions were verified to be the same as those obtained by applying the classification method.

Packet Classification Using Two-Dimensional Binary Search on Length (길이에 대한 2차원 이진검색을 이용한 패킷분류 구조)

  • Mun, Ju-Hyoung;Lim, Hye-Sook
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.9B / pp.577-588 / 2007
  • The rapid growth of the Internet has stimulated the development of various new applications and services, and service providers and Internet users now require different levels of service quality rather than the current best-effort service, which treats all incoming packets equally. Therefore, next-generation routers should provide various levels of service. To provide quality of service, incoming packets should be classified into flows according to pre-defined rules, and this should be performed for all incoming packets at wire speed. Packet classification not only involves a multi-dimensional search but also must find the highest-priority rule among all matching rules. The area-based quad-trie is a very good algorithm that constructs a two-dimensional trie using the source and destination prefix fields. However, it performs a linear search on the prefix length, and hence it does not show very good search performance. In this paper, we propose applying binary search on length to the area-based quad-trie algorithm. To improve the search performance, we also propose two new algorithms that consider the priority of rules in building the trie.

A study on the expert system for classification of books (분류전문가시스팀에 관한 연구)

  • 김정현
    • Journal of Korean Library and Information Science Society / v.19 / pp.35-57 / 1992
  • This study attempts to provide helpful data for the design and implementation of an expert system for book classification, based on an analysis of various classification expert system models. Following the introduction, the second chapter gives an overview of the concepts and some features of expert systems, on the basis of which the following concrete cases are introduced and analyzed in the third chapter: (1) ACN System for NC, (2) Expert System for NDC, (3) Expert System for UDC, (4) Herba Medica System, (5) Expert System for IPC, (6) Stratcyclode Project, (7) Expert System for Classification of INIS Database, (8) AutoBC System, etc. In conclusion, for the development of a classification expert system, it turned out that constructing a new system using an AI language such as Prolog or LISP is more desirable than employing any of the expert system shells. In addition, the following requirements need to be met: (1) The subject concept elicited from a document should be accurate. (2) Not only domain knowledge but also knowledge covering all subjects should be represented in the knowledge bases. (3) The knowledge bases should be organized in such a way that the characteristics of the knowledge about classification are well defined. (4) A rule base consisting of accurate rules about classification should be built. (5) It should be possible to generate the desired classification code immediately.

Automatic e-mail classification using Dynamic Category Hierarchy and Principal Component Analysis (주성분 분석과 동적 분류체계를 사용한 자동 이메일 분류)

  • Park, Sun;Kim, Chul-Won;Lee, Yang-weon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.05a / pp.576-579 / 2009
  • The volume of incoming e-mail is increasing rapidly due to the wide usage of the Internet. Therefore, it is increasingly necessary to classify incoming e-mails efficiently and accurately. Currently, e-mail classification techniques focus on two-way classification, filtering spam from normal mail based mainly on Bayesian and rule-based methods. Clustering has been used for the multi-way classification of e-mails, but it has the disadvantage of low classification accuracy. In this paper, we propose a novel multi-way e-mail classification method that uses PCA for automatic category generation and a dynamic category hierarchy for high classification accuracy. It classifies a huge amount of incoming e-mail automatically, efficiently, and accurately.

Comparison of confidence measures useful for classification model building (분류 모형 구축에 유용한 신뢰도 측도 간의 비교)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.25 no.2 / pp.365-371 / 2014
  • Association rule mining, one of the well-studied techniques in data mining, is an exploratory data analysis method for understanding the relevance among the items in a huge database. This method has been used to find the relationship between sets of items based on interestingness measures such as support, confidence, lift, and similarity measures. In the typical association rule technique, we generate association rules that satisfy minimum support and confidence values. Support and confidence are the most frequently used measures, but they have the drawback that they cannot determine the direction of the association, because they always take positive values. In this paper, we compare support, basic confidence, and three kinds of confidence measures useful for classification model building to overcome this problem. The results confirm that the causal confirmed confidence is the best confidence measure from the viewpoint of association mining, because it shows the direction of association more precisely.
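
The drawback described in this abstract can be seen in a small worked example using the standard definitions of support, confidence, and lift. The `TRANSACTIONS` data in this Python sketch are hypothetical. The confidence measures the paper compares, including causal confirmed confidence, are not reproduced here, because the abstract does not give their formulas.

```python
# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)
```

For the rule bread → milk, the support is 0.5 and the confidence is 2/3, both positive, yet the lift is 8/9, which is below 1. Buying bread therefore makes milk slightly less likely than its baseline of 0.75, a negative direction that support and confidence alone do not reveal.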

Automated Modelling of Ontology Schema for Media Classification (미디어 분류를 위한 온톨로지 스키마 자동 생성)

  • Lee, Nam-Gee;Park, Hyun-Kyu;Park, Young-Tack
    • Journal of KIISE / v.44 no.3 / pp.287-294 / 2017
  • With the development of personal media through various channels such as UCC and SNS, many media studies have been carried out for the purposes of analysis and recognition, thereby improving the level of object recognition. These studies focus on classifying media based on recognition of the objects that appear in them, rather than on the title, tag, and scripter information. The media-classification task, however, is intensive in terms of time and energy, because human experts need to model the underlying media ontology. This paper therefore proposes an automated approach to modeling the media-classification ontology schema; the OWL-DL Axiom based on the frequency of the recognized media objects is considered, and the automation of the ontology modeling is described. The authors conducted media-classification experiments across 15 YouTube video categories, and the media-classification accuracy was measured by applying the automated ontology-modeling approach. The promising experimental results show that 1500 actions were successfully classified from 15 media events with 86% accuracy.