http://dx.doi.org/10.16981/kliss.52.1.202103.79

The MeSH-Term Query Expansion Models using LDA Topic Models in Health Information Retrieval  

You, Sukjin (Information studies, University of Wisconsin-Milwaukee)
Publication Information
Journal of Korean Library and Information Science Society / v.52, no.1, 2021, pp. 79-108
Abstract
Information retrieval in the health field poses several challenges. Health terminology is difficult for consumers (laypeople) to understand, and formulating a query in professional terms is hard for them because such terms are more familiar to health professionals. Automatically adding health terms related to a query would help consumers find relevant information. The proposed query expansion (QE) models show how to expand a query using MeSH terms. Documents were represented by the MeSH terms found in the full-text articles (i.e., a Bag-of-MeSH representation), and these MeSH terms were then used to generate LDA (Latent Dirichlet Allocation) topic models. A query and the top k retrieved documents were used to find MeSH terms, as topic words, related to the query. LDA topic words were filtered by threshold values of topic probability (TP) and word probability (WP). In an LDA model with a specific number of topics, the threshold values were effective in increasing IR performance in terms of infAP (inferred Average Precision) and infNDCG (inferred Normalized Discounted Cumulative Gain), which are common IR metrics for large data collections with incomplete judgments. The top k words were chosen by a word score based on TP × WP and on the ranking of the retrieved documents in an LDA model with specific thresholds. Compared with the baseline, the QE model with specific thresholds for TP and WP showed improved mean infAP and infNDCG scores in an LDA model.
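The filtering and scoring steps described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function name `expansion_terms`, the input formats, the toy probabilities, and the threshold values are all hypothetical. The abstract also does not state how the retrieved document ranking enters the word score, so the `1 / rank` weight and the summation across documents below are assumptions made only for illustration.

```python
from collections import defaultdict


def expansion_terms(doc_topics, topic_words, tp_min, wp_min, k):
    """Return up to k (mesh_term, score) pairs, highest score first.

    doc_topics  : list of {topic_id: TP}, one dict per retrieved document,
                  in ranking order (hypothetical input format).
    topic_words : {topic_id: {mesh_term: WP}} (hypothetical input format).
    tp_min      : topic probability threshold.
    wp_min      : word probability threshold.
    """
    scores = defaultdict(float)
    for rank, topics in enumerate(doc_topics, start=1):
        # Assumption: the abstract does not give the rank weighting,
        # so a simple reciprocal-rank weight is used here.
        rank_weight = 1.0 / rank
        for topic_id, tp in topics.items():
            if tp < tp_min:
                continue  # drop topics below the TP threshold
            for term, wp in topic_words[topic_id].items():
                if wp < wp_min:
                    continue  # drop topic words below the WP threshold
                # Word score based on TP * WP, summed over documents (assumption).
                scores[term] += rank_weight * tp * wp
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Toy data: two topics over MeSH terms and two retrieved documents.
topic_words = {
    0: {"Hypertension": 0.5, "Blood Pressure": 0.25, "Sodium": 0.0625},
    1: {"Diabetes Mellitus": 0.5, "Insulin": 0.25},
}
doc_topics = [
    {0: 0.75, 1: 0.25},  # document at rank 1
    {0: 0.5, 1: 0.5},    # document at rank 2
]

terms = expansion_terms(doc_topics, topic_words, tp_min=0.5, wp_min=0.125, k=3)
print(terms)
```

With these toy values, "Sodium" is removed by the WP threshold and topic 1 contributes only through the second document, because its TP in the first document falls below the TP threshold. The surviving terms would then be appended to the original query before re-running retrieval.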
Keywords
MeSH; LDA; Topic Model; Information Retrieval; Query Expansion