[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.1633/JIM.2011.42.4.137

Headword Finding System Using Document Expansion

Kim, Jae-Hoon (Division of IT Engineering, Korea Maritime University)
Kim, Hyung-Chul (DMC R&D Center, Samsung Electronics Co. Ltd.)

Publication Information

Journal of Information Management / v.42, no.4, 2011, pp. 137-154

Abstract

A headword finding system is defined as an information retrieval system that takes a word gloss as its query. To implement such a system, we treat each gloss as a document. A gloss is generally very short, which makes it difficult to find the most appropriate headword for a given query. To alleviate this problem, we expand the document, borrowing the concept of query expansion from information retrieval. In this paper, we use two document expansion methods: gloss expansion and similar word expansion. The former inserts into a seed document the glosses of the words that appear in that document. The latter inserts similar words into the seed document. We use a featureless clustering algorithm to obtain the similar words. The performance (r-inclusion rate) reaches almost 100% when the queries are word glosses and r is 16, and 66.9% when the queries are written by users themselves. Through several experiments, we have observed that document expansion is very useful for the headword finding system. In the future, new evaluation measures, in addition to the r-inclusion rate proposed here, are required for evaluating headword finding systems, and new evaluation sets are also needed for objective assessment.
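The two expansion steps described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration and not the authors' implementation: the toy dictionary `GLOSSES`, the hand-written `SIMILAR` table (the paper obtains similar words through featureless clustering, which is not reproduced here), the simple term-overlap ranking, and the reading of the r-inclusion rate as the fraction of queries whose correct headword appears among the top r results.

```python
# Hypothetical toy dictionary: headword -> gloss.
GLOSSES = {
    "dog": "domestic animal that barks",
    "animal": "living creature that moves",
    "bark": "sound made by a dog",
    "cat": "domestic animal that meows",
}

# Hypothetical similar-word table, standing in for the output of a
# featureless clustering step that is not implemented in this sketch.
SIMILAR = {"animal": ["beast"], "sound": ["noise"]}


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def expand_document(headword):
    """Build an expanded document (a set of terms) for one headword."""
    seed = tokenize(GLOSSES[headword])
    terms = set(seed)
    for word in seed:
        # Gloss expansion: add the gloss of each seed word that is itself a headword.
        if word in GLOSSES and word != headword:
            terms.update(tokenize(GLOSSES[word]))
        # Similar word expansion: add words listed as similar to the seed word.
        terms.update(SIMILAR.get(word, []))
    return terms


def build_index():
    """Map every headword to its expanded document."""
    return {headword: expand_document(headword) for headword in GLOSSES}


def rank(query, index):
    """Rank headwords by the number of distinct query terms they contain."""
    query_terms = set(tokenize(query))
    return sorted(index, key=lambda h: (-len(query_terms & index[h]), h))


def r_inclusion_rate(queries, index, r):
    """Fraction of (query, headword) pairs whose headword is in the top r.

    This definition is an assumption; the abstract does not define the measure.
    """
    hits = sum(1 for query, gold in queries if gold in rank(query, index)[:r])
    return hits / len(queries)


if __name__ == "__main__":
    index = build_index()
    queries = [("creature that barks", "dog"), ("noise of a beast", "bark")]
    print(rank("creature that barks", index))
    print(r_inclusion_rate(queries, index, r=1))
```

In this toy setting the query "creature that barks" matches the unexpanded gloss of "dog" on only two terms, the same as "animal"; the term "creature" reaches the "dog" document only through gloss expansion, which is the effect the abstract attributes to document expansion.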

Keywords

Featureless Clustering; Headword Finding; Document Expansion; Information Finding

