Design of Automatic Document Classifier for IT documents based on SVM

Kang, Yun-Hee; Park, Young-B.

전기전자학회논문지 (Journal of IKEEE)

Vol. 8, No. 2 / pp. 186-194 / 2004 / 1226-7244 (pISSN) / 2288-243X (eISSN)

한국전기전자학회 (Institute of Korean Electrical and Electronics Engineers)

Design of a Directory-Based Automatic Classification System for Technical Information Documents Using SVM

Kang, Yun-Hee (Division of Information and Communication, Cheonan University);
Park, Young-B. (Division of Electronics and Computer Science, Dankook University)

Published: 2004.12.01


Abstract

The rapid growth of information on the Internet makes it increasingly time-consuming to find needed information and to organize related material. Automatic document classification, which can reduce this information-access overload, is therefore becoming more important and more necessary. In this paper, we describe the design and implementation of an automatic classification system for web documents. We use a support vector machine (SVM) to learn the classification model from a set of representative terms built from the training documents in a directory. In our system, the SVM is trained on the term set extracted from documents in an information and communication web directory and is then used to classify new documents. We use the vector space model to represent features weighted by TF-IDF, and the training data consists of positive and negative sets, each represented as a set of weighted features. Experiments show the classification results and how they relate to vector length.
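The pipeline summarized in the abstract (TF-IDF weighted term vectors, positive and negative training sets, an SVM that then labels new documents) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the toy documents, the smoothed IDF formula, the hyperparameters, and the training procedure, which is a plain sub-gradient descent on the hinge loss without a bias term rather than whatever SVM trainer the authors used.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Learn a sorted vocabulary and smoothed IDF weights from tokenised documents."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(doc_freq)
    # The +1 keeps terms that occur in every document from getting zero weight.
    idf = {term: math.log(n_docs / doc_freq[term]) + 1.0 for term in vocab}
    return vocab, idf


def tfidf_vector(tokens, vocab, idf):
    """Map one tokenised document to an L2-normalised TF-IDF vector.

    Terms outside the learned vocabulary are ignored.
    """
    counts = Counter(tokens)
    vec = [counts[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Fit weights by sub-gradient descent on the L2-regularised hinge loss (no bias)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Regularisation shrinks w on every step.
            w = [wi * (1.0 - eta * lam) for wi in w]
            # A margin violation additionally pulls w toward the example.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def classify(tokens, w, vocab, idf):
    """Return +1 (in the directory category) or -1 (not in it) for a new document."""
    x = tfidf_vector(tokens, vocab, idf)
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= 0 else -1


# Hypothetical toy data: +1 marks the positive set, -1 the negative set.
train_docs = [
    ["network", "router", "protocol"],
    ["wireless", "network", "protocol"],
    ["recipe", "soup", "salt"],
    ["garden", "soil", "seed"],
]
labels = [1, 1, -1, -1]

vocab, idf = fit_idf(train_docs)
X = [tfidf_vector(doc, vocab, idf) for doc in train_docs]
w = train_linear_svm(X, labels)

print(classify(["wireless", "router", "network"], w, vocab, idf))
print(classify(["garden", "soup"], w, vocab, idf))
```

In this sketch the dimension of each vector equals the vocabulary size, so restricting `vocab` to fewer representative terms is one way to vary vector length. Whether that matches the paper's experimental setup is not stated in the abstract.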


