http://dx.doi.org/10.5762/KAIS.2010.11.2.517

Design and Implementation of Text Classification System based on ETOM+RPost  

Choi, Yun-Jeong (Department of Information Communication, Seoil University)
Publication Information
Journal of the Korea Academia-Industrial cooperation Society, v.11, no.2, 2010, pp. 517-524
Abstract
Recently, the volume of online text and textual information has been increasing explosively, and automated classification has great potential for handling data such as news materials and images. Text classification systems are based on supervised learning, which requires laborious work by human experts. The main goal of this paper is to reduce the manual intervention required for this task; the other goal is to increase accuracy. Most documents are highly complex in content and highly similar in writing style, so classification results are often unsatisfactory. This paper presents the implementation of a classification system based on the ETOM+RPost algorithm and shows the classification process using SPAM data. In the experiments, we verified our system with both right-training documents and wrong-training documents. The experimental results show that our system achieves high accuracy and stability in all situations, with a 16% improvement in accuracy.
Keywords
Machine Learning; Text Classification System; Learning Algorithm; Feedback System
Citations & Related Records
Times Cited By KSCI: 4