A Real-Time Concept-Based Text Categorization System using the Thesaurus Tool

시소러스 도구를 이용한 실시간 개념 기반 문서 분류 시스템

  • Published : 1999.01.01

Abstract

The majority of text categorization systems use term-based classification. However, because the number of terms is very large, this method is not effective for classifying documents in a real-time environment. This paper presents a real-time concept-based text categorization system that classifies texts using a thesaurus. The system consists of a Korean morphological analyzer, a thesaurus tool, and a probability-vector similarity measurer. The thesaurus tool acquires the meanings of input terms and represents the text as a concept vector rather than a term vector. Because the concept vector is small and consists of semantic units, it enables the system to analyze text in real time. Because it represents the meanings of the text, the vector also supports concept-based classification. The probability-vector similarity measurer determines the subject of the text by calculating the vector similarity between the input text and each subject. The experimental results show that the proposed system can effectively analyze texts in real time and perform concept-based classification. The experiments also indicate that the thesaurus tool must be expanded to improve the system.
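
The pipeline described in the abstract (terms mapped to concepts through a thesaurus, then a vector similarity against each subject) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `THESAURUS` table, the `SUBJECTS` profiles, the function names, and the use of cosine similarity, which stands in for the paper's probability-vector similarity measure because the abstract does not define it. Morphological analysis is skipped, so the input is assumed to be a list of already-extracted terms.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy thesaurus: term -> concept (not taken from the paper).
THESAURUS = {
    "stock": "finance",
    "bank": "finance",
    "loan": "finance",
    "goal": "sport",
    "match": "sport",
    "striker": "sport",
}

# Hypothetical subject profiles: subject -> probability vector over concepts.
SUBJECTS = {
    "economy": {"finance": 0.9, "sport": 0.1},
    "sports": {"finance": 0.1, "sport": 0.9},
}


def concept_vector(terms):
    """Map terms to concepts and normalize the counts to probabilities.

    Terms missing from the thesaurus are ignored.
    """
    counts = Counter(THESAURUS[t] for t in terms if t in THESAURUS)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {concept: n / total for concept, n in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts.

    Assumption: cosine is used here only as a stand-in similarity measure.
    """
    dot = sum(w * v.get(concept, 0.0) for concept, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(terms, subjects=SUBJECTS):
    """Return the subject whose profile is most similar to the text."""
    doc = concept_vector(terms)
    return max(subjects, key=lambda name: cosine(doc, subjects[name]))


if __name__ == "__main__":
    print(classify(["bank", "loan", "goal"]))
```

In this sketch the vector has one dimension per concept rather than one per term, which is the property the abstract credits for the reduced vector size and the real-time analysis.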
