A Study on the Retrieval Effectiveness of KoreaMed using MeSH Search Filter and Word-Proximity Search

  • Jeong, So-Na (Medical Library, Catholic University of Korea)
  • Jeong, Ji-Na (Department of Health Management, Jeonju University)
  • Received : 2017.03.31
  • Accepted : 2017.05.12
  • Published : 2017.05.31

Abstract

This study examined a method for adding vocabulary related to "stomach neoplasms" as filters to the Medical Subject Headings (MeSH) used for searching, as well as a method for improving search efficiency through word-proximity search based on the measured distance between co-occurring terms. A total of 8,625 articles published between 2007 and 2016 with the major topic "stomach neoplasms" were downloaded from PubMed, and their titles were analyzed to determine the vocabulary to be added to the MeSH for search. Search efficiency was verified using 277 articles in KoreaMed that had "Stomach Neoplasms" indexed as MEDLINE MeSH. As a result, 973 terms were selected as candidate vocabulary. "Gastric Cancer" (2,780 appearances) was the most frequent term, and 7,376 compound words (88.51%) combined a "stomach" term with a "neoplasm"-related histological term, such as "gastric adenocarcinoma" and "gastric MALT lymphoma". A total of 5,234 compound words (70.95%) had a co-occurrence distance of two words. The matching rate between MEDLINE MeSH and the KoreaMed MeSH Indexer was 209 articles (75.5%). The matching rate improved to 263 articles (94.9%) when the search filters were added, and to 268 articles (96.7%) when word-proximity search of the co-occurring terms within 13 words was applied. This study showed that using a thesaurus for search as a means of improving search efficiency in natural-language searching can retain the advantages of controlled vocabulary. Search precision can also be improved by using word-proximity search instead of Boolean search.
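The co-occurrence distance and word-proximity search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term lists are small illustrative stand-ins for the study's vocabulary, the function names are hypothetical, and the distance is assumed to be counted as the number of words spanned by the two terms (so the adjacent pair "gastric cancer" has a distance of two, consistent with the abstract's description of two-word compounds).

```python
import re

# Illustrative term lists (assumptions, not the study's 973-term vocabulary).
STOMACH_TERMS = {"stomach", "gastric"}
NEOPLASM_TERMS = {
    "neoplasm", "neoplasms", "cancer", "cancers", "carcinoma",
    "adenocarcinoma", "lymphoma", "tumor", "tumors",
}


def tokenize(title):
    """Lowercase a title and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def cooccurrence_span(title):
    """Return the smallest number of words spanned by a stomach term and a
    neoplasm term in the title, or None if either kind of term is absent.

    Assumption: the span counts both terms, so adjacent words give 2.
    """
    tokens = tokenize(title)
    organ = [i for i, t in enumerate(tokens) if t in STOMACH_TERMS]
    tumor = [i for i, t in enumerate(tokens) if t in NEOPLASM_TERMS]
    if not organ or not tumor:
        return None
    return min(abs(i - j) for i in organ for j in tumor) + 1


def proximity_match(title, window=13):
    """True if both kinds of term co-occur within `window` words."""
    span = cooccurrence_span(title)
    return span is not None and span <= window
```

Unlike a Boolean AND, which accepts any title containing both kinds of term regardless of how far apart they are, `proximity_match` rejects pairs that fall outside the window, which is the mechanism by which a proximity search can raise precision.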

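A linguistic characteristic of the medical literature is that the names of anatomical tissues or organs are used in combination with terms for neoplasms, diseases, or infections. When searching the medical literature, using Medical Subject Headings (MeSH), the controlled-vocabulary tool provided by databases, improves search efficiency because compound words, synonyms, and related terms are searched as well. This study investigated a method for adding the stomach cancer ("Stomach Neoplasms") vocabulary as a search filter and a method for improving search efficiency with word-proximity search by measuring the distance between co-occurring terms. To determine the vocabulary to be added to the MeSH for search, 8,625 articles published from 2007 to 2016 with "Stomach Neoplasms" as the major topic were downloaded from PubMed as experimental data, and their titles were analyzed for the co-occurrence of Stomach-related and Neoplasms-related terms. Search efficiency was verified using 277 articles from MEDLINE journals searchable in KoreaMed in which "Stomach Neoplasms" was indexed as MeSH. The comparison covered whether MEDLINE MeSH, MeSH on Demand, and the KoreaMed MeSH Indexer extracted "Stomach Neoplasms" as an index term, and whether "Stomach Neoplasms" was matched when the vocabulary was applied as a search filter and when word-proximity search of the co-occurring terms was applied. The most frequent term was "Gastric Cancer", which appeared 2,780 times. There were 7,376 cases (88.51%) in which a "Stomach" term was combined with a "Neoplasms"-related histological term, such as "Gastric Adenocarcinoma" and "Gastric MALT Lymphoma". Terms with a co-occurrence distance of two words, that is, compounds of "Stomach" and "Neoplasms", numbered 5,234 (70.95%). As a result, 973 terms, excluding MeSH terms, were selected as the candidate vocabulary. The MeSH matching rate between MEDLINE MeSH and the KoreaMed MeSH Indexer was 209 articles (75.5%); it improved to 263 articles (94.9%) when the search filter was applied and to 268 articles (96.7%) when 13-word word-proximity search of the co-occurring terms was applied. This study demonstrated that using a thesaurus for search as a means of improving search efficiency in natural-language searching imposes little indexing cost and preserves both the comprehensiveness of controlled vocabulary and the term specificity of natural language. In addition, using word-proximity search rather than Boolean search can raise precision and thereby improve search efficiency.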
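As a rough illustration of applying a vocabulary as a search filter and measuring the resulting matching rate, the following Python sketch maps free-text phrases in titles to a MeSH heading. The filter entries, function names, and sample titles are hypothetical and chosen for illustration only; they are not the study's data or the KoreaMed MeSH Indexer's logic.

```python
import re

# Hypothetical filter entries mapping free-text phrases to a MeSH heading.
SEARCH_FILTER = {
    "gastric cancer": "Stomach Neoplasms",
    "stomach cancer": "Stomach Neoplasms",
    "gastric adenocarcinoma": "Stomach Neoplasms",
    "gastric malt lymphoma": "Stomach Neoplasms",
}


def normalize(text):
    """Lowercase the text and join its word tokens with single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def headings_for(title, search_filter=SEARCH_FILTER):
    """Return the MeSH headings whose filter phrases occur in the title."""
    padded = f" {normalize(title)} "  # pad so phrases match whole words only
    return {
        heading
        for phrase, heading in search_filter.items()
        if f" {phrase} " in padded
    }


def matching_rate(titles, target, search_filter=SEARCH_FILTER):
    """Percentage of titles for which the filter yields the target heading."""
    if not titles:
        return 0.0
    hits = sum(target in headings_for(t, search_filter) for t in titles)
    return 100.0 * hits / len(titles)


# Made-up sample titles for illustration.
sample_titles = [
    "Early gastric cancer: a review",
    "Peptic ulcer bleeding",
    "Gastric MALT lymphoma treatment",
    "Cancer of the stomach",
]
rate = matching_rate(sample_titles, "Stomach Neoplasms")
```

In this sample, "Cancer of the stomach" is missed because exact phrase matching cannot bridge the intervening words; this is the kind of case that a word-proximity search is intended to recover.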
