Research Trends in Record Management Using Unstructured Text Data Analysis

  • Received : 2023.10.16
  • Accepted : 2023.11.06
  • Published : 2023.11.30

Abstract

This study uses text mining to analyze the frequency of keywords in Korean-language abstracts, which constitute the unstructured text data of domestic record management research, and to identify domestic research trends through distance analysis between keywords. To this end, the journal institution statistics (registered journals, candidate journals) of the Korea Citation Index (KCI) were searched by major category (interdisciplinary studies) and middle category (library and information science). Of the 28 journals retrieved, 1,157 articles were extracted from 7 registered journals, and the 77,578 keywords they contained were visualized. The analysis applied t-Distributed Stochastic Neighbor Embedding (t-SNE) and Scattertext to Word2vec representations. First, a frequency analysis of the 77,578 keywords from the 1,157 articles confirmed that keywords such as "record management" (889 times), "analysis" (888 times), "archive" (742 times), "records" (562 times), and "utilization" (449 times) were treated as major topics by researchers. Second, Word2vec was used to generate vector representations of the keywords, and the similarity distances between them were examined and then visualized using t-SNE and Scattertext. In the visualization, the record management research field divided into two groups. In the first group (past), keywords such as "archiving," "national record management," "standardization," "official documents," and "record management systems" occurred frequently. In the second group (current), keywords such as "community," "data," "record information service," "online," and "digital archives" were receiving the most attention.
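The two analysis steps described in the abstract, keyword frequency counting and similarity measurement between keyword vectors, can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy corpus is hypothetical, and simple co-occurrence vectors stand in for Word2vec embeddings so that the example runs with the Python standard library alone. The t-SNE and Scattertext visualization steps are omitted.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy corpus: each inner list stands for the keywords
# extracted from one abstract (not data from the study).
abstracts = [
    ["record management", "analysis", "archive"],
    ["record management", "standardization", "official documents"],
    ["archive", "digital archives", "community"],
    ["record management", "analysis", "utilization"],
    ["digital archives", "community", "online"],
]


def keyword_frequencies(docs):
    """Count how often each keyword occurs across all documents."""
    return Counter(kw for doc in docs for kw in doc)


def cooccurrence_vectors(docs):
    """Build a sparse co-occurrence vector for every keyword.

    Assumption: this is a simple stand-in for Word2vec embeddings.
    Two keywords get similar vectors when they appear alongside
    the same other keywords.
    """
    vectors = {}
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            vectors.setdefault(a, Counter())[b] += 1
            vectors.setdefault(b, Counter())[a] += 1
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


freqs = keyword_frequencies(abstracts)
vectors = cooccurrence_vectors(abstracts)

# Most frequent keywords in the toy corpus.
print(freqs.most_common(3))

# "online" shares neighbours with "archive" but none with "standardization".
sim_near = cosine_similarity(vectors["online"], vectors["archive"])
sim_far = cosine_similarity(vectors["online"], vectors["standardization"])
print(sim_near, sim_far)
```

In a setup closer to the one the abstract describes, the co-occurrence vectors would be replaced by trained Word2vec embeddings, and the resulting high-dimensional vectors would be projected to two dimensions with t-SNE before plotting.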
