[KSCI] Korea Science Citation Index Service

An Extraction Algorithm of Compound Field-associated Terms for Korean Document Classifications

Lee, Samuel Sang-kon (Jeonju University, School of Information Technology Engineering)

Publication Information

Journal of KIISE: Software and Applications, v.32, no.7, 2005, pp. 636-649

Abstract

Field-associated terms carry field information in themselves, so they can determine the field of a document in much the same way a human reader recognizes it. For Korean, we collected approximately 15,999 documents classified into 180 fields, organized the field-associated terms found in them, and ran experiments. The extraction contracted 88,782 single field-associated terms into 8,405 terms, a compression rate of approximately 9%, with recall above 0.77 (average 0.85) and precision above 0.90 (average 0.94). When the established field-associated terms were applied to the initial field determination for document classification and the outcome was compared with field determination by humans, more than approximately 90% of the answers were correct. These results can serve as groundwork for the initial stage of document classification and can be applied to document retrieval across multilingual environments, and thus as groundwork for multilingual information retrieval.
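
The figures quoted in the abstract can be checked with a few lines of arithmetic. The following is a minimal sketch, not the paper's evaluation code: the function names `compression_rate` and `precision_recall` are hypothetical, the compression rate is assumed to be the ratio of contracted terms to original terms, and precision and recall are assumed to follow their standard set-based definitions. The toy term sets are invented for illustration only.

```python
def compression_rate(original_count: int, contracted_count: int) -> float:
    """Ratio of contracted terms to original terms (assumed definition)."""
    return contracted_count / original_count


def precision_recall(extracted: set, reference: set) -> tuple:
    """Standard set-based precision and recall of extracted terms."""
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Counts taken from the abstract: 88,782 terms contracted into 8,405.
rate = compression_rate(88_782, 8_405)
print(f"compression rate: {rate:.1%}")  # compression rate: 9.5%

# Toy data (hypothetical): terms an extractor returned vs. a hand-made list.
extracted_terms = {"home run", "pitcher", "election"}
reference_terms = {"home run", "pitcher", "inning", "strikeout"}
p, r = precision_recall(extracted_terms, reference_terms)
print(f"precision={p:.2f}, recall={r:.2f}")
```

The ratio 8,405 / 88,782 comes to roughly 0.095, which agrees with the "approximately 9%" reported above.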

Keywords

Compound Field-associated Term (FT); Stability Rank; Inheritance Rank; Passage Retrieval; Information Extraction; Document Classification; Information Retrieval

Citations & Related Records

Times Cited By KSCI: 2

Reference

1	Lee, Sang-kon, 'A Method for Computing Topic Fields and Passage Retrieval Using Field-associated Terms,' KIPS Transactions: Part B, Vol. 12, No. 1, pp. 57-68, 2005 (in Korean)
2	Tsuji, T., Nigazawa, H., Okada, M., and Aoe, J., 'Early Field Recognition by Using Field Association Words,' Proceedings of the 18th International Conference on Computer Processing of Oriental Languages (ICCPOL '99), 1999
3	Yoshitaka Hayashi et al., 'Efficient Method for Extracting Keywords of Compound Words Using Pattern Matching Machines,' Transactions of Information Processing Society of Japan, Vol. 38, No. 4, pp. 815-825, 1997 (in Japanese)
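4	Nam, Young-shin, 'Classified Dictionary of Korean (우리말 분류 사전),' Seongandang (성안당), 2001 (in Korean)
5	Lee, Sang-kon, 'A Passage Segmentation Method That Tracks Topic Continuity and Topic Shift Using Field-associated Terms,' KIPS Transactions: Part B, Vol. 10, No. 1, pp. 57-66, 2003 (in Korean)
6	Lee, Sang-kon and Lee, Wan-kwon, 'Collection and Extraction Algorithm of Field-associated Terms,' KIPS Transactions: Part B, Vol. 10, No. 3, pp. 347-358, 2003 (in Korean)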
7	Naoyuki Nomura, 'ConceptBase - A NL-based IT Solution Core,' Proceedings of the 18th International Conference on Computer Processing of Oriental Languages (ICCPOL '99), p. 235, 1999
8	Norbert Fuhr, 'Models for Retrieval with Probabilistic Indexing,' Information Processing & Management, Vol. 25, No. 1, pp. 55-72, 1989
9	Salton, G. and McGill, M. J., 'Introduction to Modern Information Retrieval,' McGraw-Hill Book Company, 1983
10	Salton, G., 'Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer,' Addison-Wesley Publishing Company, 1989
11	Tokunaga, T. and Iwayama, M., 'Text Categorization based on Weighted Inverse Document Frequency,' Natural Language Processing, Vol. 100, No. 5, 1994 (in Japanese)
12	Fumiyo Fukumoto et al., 'Automatic Clustering of Articles Using Dictionary Definition,' Transactions of Information Processing Society of Japan, Vol. 37, No. 10, pp. 1789-1799, 1996 (in Japanese)
13	M. J. Blosseville et al., 'Automatic Document Classification: Natural Language Processing, Statistical Analysis, and Expert System Techniques Used Together,' Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '92), pp. 51-58, 1992
14	Masami Hara et al., 'Keyword Extraction Using a Text Format and Word Importance in a Specific Field,' Transactions of Information Processing Society of Japan, Vol. 38, No. 2, pp. 299-309, 1997 (in Japanese)
15	Mochizuki, H., Iwayama, M., and Okumura, M., 'Passage-Level Document Retrieval Using Lexical Chains,' Journal of Natural Language Processing, Vol. 6, No. 3, pp. 101-126, 1999 (in Japanese)
16	Edwin Williams, 'On the Notions "Lexically Related" and "Head of a Word",' Linguistic Inquiry, Vol. 12, No. 2, pp. 245-274, 1981

Korean title: 한글문서 분류용으로 이용할 복합어로 구성된 분야연상어의 추출법