http://dx.doi.org/10.3745/JIPS.04.0039

A Method of Chinese and Thai Cross-Lingual Query Expansion Based on Comparable Corpus  

Tang, Peili (Intelligent Information Processing Laboratory of Yunnan Province, Key Laboratory in Regional University of Yunnan Province, Kunming University of Science and Technology)
Zhao, Jing (National University of Defense Technology)
Yu, Zhengtao (Intelligent Information Processing Laboratory of Yunnan Province, Key Laboratory in Regional University of Yunnan Province, Kunming University of Science and Technology)
Wang, Zhuo (Intelligent Information Processing Laboratory of Yunnan Province, Key Laboratory in Regional University of Yunnan Province, Kunming University of Science and Technology)
Xian, Yantuan (Intelligent Information Processing Laboratory of Yunnan Province, Key Laboratory in Regional University of Yunnan Province, Kunming University of Science and Technology)
Publication Information
Journal of Information Processing Systems, vol. 13, no. 4, 2017, pp. 805-817
Abstract
Cross-lingual query expansion is usually based on relationships among monolingual words, whereas a bilingual comparable corpus captures relationships among bilingual words. This paper therefore proposes a query expansion method based on these bilingual relationships. First, word vectors that characterize the bilingual words are trained on a Chinese-Thai bilingual comparable corpus. Next, the correlation between Chinese query words and Thai words is computed from these word vectors, and Thai candidate expansion terms are selected according to the correlation value. Then, multiple groups of Thai query expansion sentences are built from the Thai candidate expansion words, based on the Chinese query sentence. Finally, the optimal sentence is selected using the Chinese and Thai query expansion method and used to perform the Thai query expansion. Experimental results show that the proposed cross-lingual query expansion method can effectively improve the accuracy of Chinese-Thai cross-language information retrieval.
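The pipeline outlined in the abstract can be illustrated with a minimal sketch. It assumes that Chinese and Thai word vectors already live in one shared space, uses cosine similarity as a stand-in for the correlation measure, and scores each candidate Thai sentence by the sum of its word-level correlations. None of these choices is confirmed by the abstract; the function names and the three-dimensional toy vectors are hypothetical and serve only to show the data flow.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity, used here as an assumed correlation measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_candidates(query_vec, thai_vectors, k=2):
    """Return the k Thai words most correlated with one Chinese query word."""
    scored = [(word, cosine(query_vec, vec)) for word, vec in thai_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def best_expansion(query_words, zh_vectors, thai_vectors, k=2):
    """Build every candidate Thai sentence and keep the highest-scoring one.

    The score (sum of word-level correlations) is a placeholder, not the
    selection criterion used in the paper.
    """
    per_word = [top_candidates(zh_vectors[w], thai_vectors, k) for w in query_words]
    best_words, best_score = None, float("-inf")
    for combo in product(*per_word):  # one candidate sentence per combination
        score = sum(s for _, s in combo)
        if score > best_score:
            best_words, best_score = [w for w, _ in combo], score
    return best_words, best_score


# Toy vectors invented for illustration; real ones would be trained
# on the Chinese-Thai comparable corpus.
zh_vectors = {"旅游": [0.9, 0.1, 0.0], "酒店": [0.1, 0.9, 0.0]}
thai_vectors = {
    "ท่องเที่ยว": [0.8, 0.2, 0.0],
    "โรงแรม": [0.2, 0.8, 0.1],
    "ฟุตบอล": [0.0, 0.1, 0.9],
}

expansion, score = best_expansion(["旅游", "酒店"], zh_vectors, thai_vectors)
```

With a purely additive score the best sentence is simply the top candidate for each query word; a sentence-level criterion that also considers how well the Thai words fit together would make the search over combinations meaningful.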
Keywords
Comparable Corpus; Cross-Language Query Expansion; Cross-Language Information Retrieval; Words Relationship