Analysis of Research Trends in Data Curation Using Text Mining Techniques

Analysis of International Data Curation Research Trends Using Text Mining

  • Choi, Jae Eun (Department of Library and Information Science, Ewha Womans University)
  • Received : 2024.08.14
  • Accepted : 2024.09.03
  • Published : 2024.09.30

Abstract

This study analyzes trends in international data curation research. A total of 1,849 scholarly records were extracted from Scopus and WoS, and 1,797 records, including journal articles and conference papers, were retained after removing duplicates. Their titles, keywords, and abstracts were preprocessed and examined through keyword frequency analysis, LDA topic modeling, and keyword network analysis. Frequent keywords such as 'research' and 'information' suggest that data curation is discussed across medical research, biomedical research, research data management, and research infrastructure. LDA topic modeling identified five main topics: improving and analyzing the quality of clinical medical data, improving the efficiency of big data management and processing systems, managing scientific data and digital repositories, annotating and modeling medical and biological data, and gene and protein database research. Network analysis showed that 'analysis' ranked high in global centrality, while 'research', 'gene', and 'system' ranked high in local centrality. These findings highlight the importance of data curation across a range of research areas.

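The purpose of this study is to analyze trends in international data curation research. To this end, 1,849 scholarly records were extracted from Scopus and WoS, and after duplicate removal and related screening, the titles, keywords, and abstracts of the final 1,797 items, including journal articles and conference papers, were taken as the objects of analysis. The preprocessed keywords were subjected to frequency analysis, major topics were derived through LDA topic modeling, and centrality was derived through network analysis of the topic keywords. The keyword frequency analysis showed that 'research', 'information', and similar terms appeared frequently, indicating that data curation takes place in diverse contexts such as medical research, biomedical research, research data management, and research infrastructure. LDA topic modeling yielded five topics: 'quality improvement and analysis of clinical medical data', 'improving the efficiency of big data management and processing systems', 'scientific data management and digital repositories', 'annotation and modeling of medical and biological data', and 'gene and protein database research'. In the keyword network analysis, 'analysis' scored high in global centrality, showing that it is discussed broadly in terms of data use, for example as analysis methods and analysis systems, while 'research', 'gene', and 'system' ranked at the top in local centrality.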
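
The duplicate removal, keyword frequency, and keyword network steps summarized in the abstract can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the study's actual procedure: the toy `records`, the title-based duplicate key, and the helper names `deduplicate`, `keyword_frequency`, and `cooccurrence_degree` are all hypothetical, and plain weighted degree stands in for the centrality measures used in the study, which the abstract does not specify in detail. LDA topic modeling is left out because it would need a third-party library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records standing in for Scopus/WoS exports (not real data).
records = [
    {"source": "Scopus", "title": "Data Curation for Gene Databases",
     "keywords": ["gene", "database", "analysis"]},
    {"source": "WoS", "title": "data curation for gene databases ",
     "keywords": ["gene", "database", "analysis"]},
    {"source": "Scopus", "title": "Research Data Management Systems",
     "keywords": ["research", "system", "analysis"]},
    {"source": "WoS", "title": "Clinical Data Quality Analysis",
     "keywords": ["clinical", "quality", "analysis", "research"]},
]


def deduplicate(recs):
    """Keep the first record for each case- and whitespace-normalized title.

    Matching on normalized titles is an assumption; the study may have
    matched on DOI or other fields.
    """
    seen, unique = set(), []
    for rec in recs:
        key = " ".join(rec["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def keyword_frequency(recs):
    """Count the number of records in which each keyword appears."""
    return Counter(kw for rec in recs for kw in set(rec["keywords"]))


def cooccurrence_degree(recs):
    """Build a keyword co-occurrence network and return weighted degree.

    An edge weight is the number of records in which two keywords co-occur;
    a keyword's weighted degree is the sum of its edge weights.
    """
    edges = Counter()
    for rec in recs:
        for a, b in combinations(sorted(set(rec["keywords"])), 2):
            edges[(a, b)] += 1
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


unique = deduplicate(records)
freq = keyword_frequency(unique)
degree = cooccurrence_degree(unique)

print(f"{len(records)} records -> {len(unique)} after duplicate removal")
print("Most frequent keyword:", freq.most_common(1))
print("Highest weighted degree:", degree.most_common(1))
```

On real exports the keyword lists would come from preprocessed titles, author keywords, and abstracts rather than hand-written lists, and the centrality step would use whichever global and local measures the analysis tool provides.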
