http://dx.doi.org/10.3745/KIPSTB.2003.10B.3.347

Collection and Extraction Algorithm of Field-Associated Terms  

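Lee, Sang-Kon (Jeonju University, School of Information Technology and Computer Engineering)
Lee, Wan-Kwon (Jeonju University, School of Information Technology and Computer Engineering)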
Abstract
A field-associated term (FT) is a single or compound word that can occur in any document and that lets a reader recognize the field of a text from common human knowledge. For example, a reader recognizes the field of a document, that is, its field name, on encountering a word such as 'pitcher' or 'election'. We propose an efficient method for constructing FTs for specialized fields in order to decide the field of a text. The document classification scheme can be fixed from a well-classified document database or corpus. With respect to the field in focus, we discuss the levels and stability ranks of FTs. To build a balanced FT collection, we first construct single-word FTs. From these collections, the levels and stability ranks of FTs can be constructed automatically. We propose a new algorithm for extracting FTs for document classification that uses each FT's concentration rate and occurrence frequency.
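The abstract does not define the concentration rate, so the following is only a minimal sketch under an assumed definition: the share of a term's total occurrences that fall inside one field. The function names, the thresholds `min_rate` and `min_freq`, and the toy corpus are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def concentration_rates(docs_by_field):
    """Return {field: {term: rate}} for a field-labelled corpus.

    Assumed definition (not from the paper):
    rate = freq(term, field) / freq(term, all fields).
    """
    freq = defaultdict(Counter)
    for field, docs in docs_by_field.items():
        for tokens in docs:
            freq[field].update(tokens)

    totals = Counter()
    for counts in freq.values():
        totals.update(counts)

    return {
        field: {term: n / totals[term] for term, n in counts.items()}
        for field, counts in freq.items()
    }


def extract_field_terms(docs_by_field, min_rate=0.8, min_freq=2):
    """Keep terms that are both frequent in a field and concentrated in it.

    min_rate and min_freq are hypothetical thresholds chosen for illustration.
    """
    freq = {
        field: Counter(tok for tokens in docs for tok in tokens)
        for field, docs in docs_by_field.items()
    }
    rates = concentration_rates(docs_by_field)
    return {
        field: sorted(
            term
            for term, rate in rates[field].items()
            if rate >= min_rate and freq[field][term] >= min_freq
        )
        for field in docs_by_field
    }


# Toy, already-tokenized corpus grouped by field (hypothetical data).
corpus = {
    "sports": [["pitcher", "game", "pitcher"], ["pitcher", "team"]],
    "politics": [["election", "vote", "game"], ["election", "team"]],
}
result = extract_field_terms(corpus)
```

In this toy corpus, 'game' and 'team' appear in both fields and fall below the rate threshold, while 'vote' is concentrated in one field but too rare to pass the frequency threshold.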
Keywords
Field-associated Term; Level of FTs; Stability Rank; Information Extraction; Document Classification; Information Retrieval