An Extraction Algorithm of Compound Field-associated Terms for Korean Document Classifications  

Lee, Samuel Sang-kon (School of Information Technology Engineering, Jeonju University)
Abstract
Field-associated terms carry field information in themselves, so they can determine the field of a document much as a human reader recognizes it. For Korean, we built and evaluated such terms using a collection of approximately 15,999 documents classified into 180 fields. Extraction contracted 88,782 single field-associated terms into 8,405, a compression rate of approximately 9%, with recall above 0.77 (average 0.85) and precision above 0.90 (average 0.94). When the resulting field-associated terms were applied to the initial field determination for document classification and compared with field determination by humans, more than approximately 90% of the answers were correct. The results can serve as a foundation for initial-stage field determination and can be applied to document retrieval across languages, and thus as basic research for multilingual information retrieval.
Keywords
Compound Field-associated Term (FT); Stability Rank; Inheritance Rank; Passage Retrieval; Information Extraction; Document Classification; Information Retrieval
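
The figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the paper's extraction algorithm: it assumes the compression rate is the ratio of contracted terms to original terms, and that precision and recall follow their standard set-based definitions. The function names and the toy term sets are hypothetical; only the counts 88,782 and 8,405 come from the abstract.

```python
def compression_rate(original_count, reduced_count):
    """Share of the original term list that remains after contraction."""
    if original_count <= 0:
        raise ValueError("original_count must be positive")
    return reduced_count / original_count


def precision_recall(extracted, relevant):
    """Standard set-based precision and recall for extracted terms."""
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Counts reported in the abstract.
rate = compression_rate(88_782, 8_405)
print(f"compression rate: {rate:.4f}")

# Hypothetical toy data, not taken from the paper.
extracted_terms = {"term_a", "term_b", "term_c", "term_d"}
relevant_terms = {"term_a", "term_b", "term_c", "term_e", "term_f"}
precision, recall = precision_recall(extracted_terms, relevant_terms)
print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```

Under this assumed definition the ratio 8,405 / 88,782 comes to a little under 0.095, in line with the "approximately 9%" stated in the abstract.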