http://dx.doi.org/10.5392/JKCA.2017.17.11.600

Multi-class Support Vector Machines Model Based Clustering for Hierarchical Document Categorization in Big Data Environment  

Kim, Young Soo (Department of Cyber Security, Pai Chai University)
Lee, Byoung Yup (Department of Cyber Security, Pai Chai University)
Abstract
The volume of data has recently been growing exponentially with the rapid expansion of the Internet. Because users need only a portion of the available information, they bear a heavy workload in examining and discovering the content they need. Information retrieval must therefore provide hierarchical class information and an examination priority, derived by evaluating the similarity between the query and the documents. In this paper, we propose multi-class support vector machine (SVM) model-based clustering for hierarchical document categorization, which makes semantic search possible by taking word co-occurrence measures into account. Combining hierarchical document categorization with an SVM classifier gives high performance in the analytical classification of web documents, whose number increases exponentially as the document hierarchy is extended. We expect that more information retrieval systems will adopt the proposed model in their development and thereby provide an accurate and rapid information retrieval service.
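As an illustration of ranking documents by their similarity to a query, the following minimal sketch scores a toy corpus with cosine similarity over term-frequency vectors. This is an assumption made for illustration only: the abstract does not specify the similarity measure, and the paper's model relies on word co-occurrence measures and a multi-class SVM, neither of which is reproduced here. The function names and the corpus are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    # Counter returns 0 for terms missing from b, so no key check is needed.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query, documents):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    q = term_vector(query)
    scored = [
        (doc_id, cosine_similarity(q, term_vector(text)))
        for doc_id, text in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus, not taken from the paper.
documents = {
    "d1": "support vector machines for document classification",
    "d2": "hierarchical clustering of web documents",
    "d3": "cooking recipes for pasta",
}
ranking = rank_documents("hierarchical document classification", documents)
```

Here `d1` ranks first because it shares two terms with the query, `d2` shares one, and `d3` shares none. Plain term matching treats "document" and "documents" as different words, which is one reason a semantic measure such as word co-occurrence may be preferable in practice.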
Keywords
Big Data; Hierarchical Class; Document Categorization; Clustering; Similarity