http://dx.doi.org/10.3745/KIPSTB.2004.11B.2.233

Performance Improvement by Cluster Analysis in Korean-English and Japanese-English Cross-Language Information Retrieval  

Lee, Kyung-Soon (Division of Electronics and Information Engineering, Chonbuk National University)
Abstract
This paper presents a method for implicitly resolving ambiguities using dynamic incremental clustering in Korean-to-English and Japanese-to-English cross-language information retrieval (CLIR). Its main objective is to show that document clusters can effectively resolve the ambiguities that increase tremendously in translated queries, while also taking into account the context of all the terms in a document. In the proposed framework, a query in Korean or Japanese is first translated into English by looking up bilingual dictionaries; documents are then retrieved for the translated query terms using either the vector space retrieval model or the probabilistic retrieval model. For the top-ranked retrieved documents, query-oriented document clusters are created incrementally, and the weight of each retrieved document is recalculated using the clusters. In experiments on the TREC test collection, our method achieved improvements of 39.41% and 36.79% over translated queries without ambiguity resolution in Korean-to-English CLIR, and of 17.89% and 30.46% in Japanese-to-English CLIR, on vector space retrieval and probabilistic retrieval, respectively. Compared with blind feedback in Korean-to-English CLIR, our method achieved a 12.30% improvement for all translation queries. These results indicate that cluster analysis helps to resolve ambiguity.
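The pipeline described in the abstract (dictionary translation, initial retrieval, incremental clustering of the top-ranked documents, re-weighting by cluster) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the clustering criterion or the re-weighting formula, so the single-pass centroid clustering, the `threshold` and `alpha` parameters, the linear score combination, the smoothed TF-IDF weighting, and the toy dictionary and documents are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy bilingual dictionary: every translation is kept, so the
# translated query is deliberately ambiguous (no explicit disambiguation).
BILINGUAL_DICT = {"은행": ["bank", "ginkgo"], "금리": ["interest", "rate"]}

# Hypothetical tokenized English documents.
DOCS = [
    ["bank", "interest", "rate", "loan"],
    ["bank", "rate", "deposit", "loan"],
    ["ginkgo", "tree", "leaf"],
    ["river", "bank", "water"],
]

def translate_query(source_terms, dictionary):
    """Replace each source term with all of its dictionary translations."""
    translated = []
    for term in source_terms:
        translated.extend(dictionary.get(term, []))
    return translated

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) with a smoothed IDF (assumed weighting)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def incremental_clusters(ranked_ids, vectors, threshold):
    """Single-pass clustering in rank order: join the most similar cluster
    if its centroid similarity reaches the threshold, else start a new one."""
    clusters = []
    for doc_id in ranked_ids:
        vec = vectors[doc_id]
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(doc_id)
            # Summed centroid; cosine similarity is invariant to its scale.
            for t, w in vec.items():
                best["centroid"][t] = best["centroid"].get(t, 0.0) + w
        else:
            clusters.append({"members": [doc_id], "centroid": dict(vec)})
    return clusters

def rerank(query_terms, docs, top_k=10, threshold=0.2, alpha=0.5):
    """Retrieve with cosine similarity, cluster the top_k documents, then mix
    each document's own score with its cluster's query similarity."""
    vectors, idf = tfidf_vectors(docs)
    qvec = {t: c * idf[t] for t, c in Counter(query_terms).items() if t in idf}
    base = {i: cosine(qvec, v) for i, v in enumerate(vectors)}
    top = sorted(base, key=base.get, reverse=True)[:top_k]
    scores = {}
    for cluster in incremental_clusters(top, vectors, threshold):
        cluster_sim = cosine(qvec, cluster["centroid"])
        for i in cluster["members"]:
            scores[i] = alpha * base[i] + (1 - alpha) * cluster_sim
    return sorted(scores, key=scores.get, reverse=True), scores

translated = translate_query(["은행", "금리"], BILINGUAL_DICT)
ranking, scores = rerank(translated, DOCS)
print(translated)
print(ranking)
```

In this sketch a document that shares a cluster with other query-relevant documents has its score pulled toward the cluster's similarity to the query, while a document matching only a stray translation (such as "ginkgo") sits in its own cluster and gains nothing. That is one way to read the implicit ambiguity resolution the abstract describes.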
Keywords
Implicit Ambiguity Resolution; Cross-Language Information Retrieval; Incremental Clustering; Document Context; Document Re-rank