Comparison of policy perceptions between national R&D projects and standing committees using topic modeling analysis: focusing on the ICT field

  • Song, Byoungki [송병기] (Finance & Accounting Team, Korea Information Society Development Institute)
  • Kim, Sangung [김상웅] (Department of Planning & Strategy, Korea Information Society Development Institute)
  • Submitted: 2022.04.13
  • Reviewed: 2022.07.20
  • Published: 2022.07.28


In this paper, we use topic modeling to derive quantitative measures. Topic modeling is one of the data-based evaluation methodologies under discussion at various research institutes. Focusing on the ICT field, we use these measures to examine whether policy perceptions differ between two groups: national R&D projects, which are carried out by domain experts, and the National Assembly standing committees, which handle the same areas in legislation and policy practice. First, we train a HAN (Hierarchical Attention Network) model on R&D project data to build a classifier for ICT documents. We then run LDA (Latent Dirichlet Allocation) topic modeling on the ICT documents identified by this classifier. Finally, we compare the topics and topic distributions derived from the national R&D project data with those derived from the minutes of the standing committees. A total of 26 topics were derived. We examined the words in each topic and the share of documents assigned to it. The national R&D projects contained relatively more documents on specialized technical subjects, whereas the standing committees dealt relatively more with social and public-interest issues, which indicates some difference in perception between the two. Because this difference can be confirmed numerically, the study can serve as groundwork for indicators usable in future policy or project evaluation.
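The HAN classification step rests on attention pooling (Yang et al., 2016). Each hidden vector h_t in a sequence is projected as u_t = tanh(W h_t + b). It is scored against a learned context vector, and the scores are normalized with a softmax into weights α_t. The pooled output is Σ α_t h_t. HAN applies this pooling once over words to get sentence vectors and again over sentences to get a document vector, which then feeds a classifier (for example, ICT vs. non-ICT).

The sketch below shows only that untrained pooling step, in plain Python. It is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the 2-dimensional weights, and the use of raw vectors in place of the bidirectional GRU states that a real HAN pools over are all hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hs: List[Vector], w: Matrix, b: Vector,
                   context: Vector) -> Tuple[Vector, Vector]:
    """HAN-style attention pooling over a sequence of hidden vectors.

    u_t = tanh(W h_t + b); alpha = softmax(u_t . context); output = sum_t alpha_t h_t.
    Returns (pooled vector, attention weights).
    """
    us = [[math.tanh(sum(wij * hj for wij, hj in zip(row, h)) + bi)
           for row, bi in zip(w, b)] for h in hs]
    alphas = softmax([sum(ui * ci for ui, ci in zip(u, context)) for u in us])
    pooled = [sum(a * h[j] for a, h in zip(alphas, hs)) for j in range(len(hs[0]))]
    return pooled, alphas


def han_document_vector(doc: List[List[Vector]], word_params, sent_params):
    """Word-level pooling per sentence, then sentence-level pooling per document.

    `doc` is a list of sentences, each a list of word vectors. A real HAN would
    first run bidirectional GRUs and pool their hidden states instead
    (simplifying assumption of this sketch).
    """
    sentence_vecs = [attention_pool(sent, *word_params)[0] for sent in doc]
    return attention_pool(sentence_vecs, *sent_params)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    params = (identity, [0.0, 0.0], [1.0, 0.0])   # hypothetical (W, b, context)
    doc = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
    doc_vec, sent_weights = han_document_vector(doc, params, params)
    print(doc_vec, sent_weights)
```

In a trained model, W, b, and the context vectors are learned jointly with the classifier. The attention weights then show which words and sentences drove a document's ICT label.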
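The LDA step can be sketched with collapsed Gibbs sampling, one standard way to fit LDA. Each token's topic is resampled in proportion to (n_dk + α)(n_kw + β) / (n_k + Vβ). The per-document topic proportions θ and per-topic word distributions φ are then read off the final counts.

The pure-Python sketch below is a hypothetical minimal version. The abstract does not state which LDA implementation, priors, or iteration counts were used. The toy tokens and num_topics=2 are placeholders; the study itself reports 26 topics on real data.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], num_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iterations: int = 200,
              seed: int = 0) -> Tuple[List[List[float]], List[List[float]], List[str]]:
    """Fit LDA by collapsed Gibbs sampling.

    Returns (theta, phi, vocab): per-document topic proportions,
    per-topic word distributions, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab, k_topics = len(vocab), num_topics
    ndk = [[0] * k_topics for _ in docs]             # document-topic counts
    nkw = [[0] * n_vocab for _ in range(k_topics)]   # topic-word counts
    nk = [0] * k_topics                              # tokens per topic
    z = []                                           # topic of each token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(k_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    topics = range(k_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta)
                           / (nk[t] + n_vocab * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [[(ndk[d][t] + alpha) / (len(doc) + k_topics * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][v] + beta) / (nk[t] + n_vocab * beta) for v in range(n_vocab)]
           for t in topics]
    return theta, phi, vocab


def top_words(phi: List[List[float]], vocab: List[str], n: int = 3) -> List[List[str]]:
    """The n highest-probability words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


if __name__ == "__main__":
    # Toy, made-up token lists standing in for preprocessed documents.
    rd_docs = [["5g", "antenna", "spectrum", "5g"],
               ["semiconductor", "chip", "antenna", "spectrum"]]
    committee_docs = [["privacy", "platform", "consumer", "fee"],
                      ["privacy", "broadcast", "consumer", "platform"]]
    theta, phi, vocab = lda_gibbs(rd_docs + committee_docs, num_topics=2)
    print(top_words(phi, vocab))
    print([[round(p, 2) for p in row] for row in theta])
```

Fitting a single model on both corpora together keeps topic indices aligned between the two sources. This sketch assumes that setup; the abstract does not say how the authors handled it.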
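The abstract says the perception gap can be confirmed numerically but does not name the measure. One hypothetical way to quantify it is a two-step comparison. First, average the document-topic proportions θ within each corpus. Second, compare the two corpus-level distributions, topic by topic and overall with the Jensen-Shannon divergence, which is 0 for identical distributions and at most 1 bit.

The sketch assumes that the θ rows for the R&D projects and the committee minutes come from one LDA model, so that topic indices align. The numbers in the usage example are invented for illustration and are not results from the study.

```python
import math
from typing import List

Dist = List[float]


def mean_topic_share(theta: List[Dist]) -> Dist:
    """Average per-document topic proportions into one corpus-level distribution."""
    n_topics = len(theta[0])
    return [sum(row[t] for row in theta) / len(theta) for t in range(n_topics)]


def kl_divergence(p: Dist, q: Dist) -> float:
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Dist, q: Dist) -> float:
    """Jensen-Shannon divergence in bits: 0 when p == q, at most 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def share_gaps(p: Dist, q: Dist) -> Dist:
    """Per-topic share difference, corpus p minus corpus q."""
    return [pi - qi for pi, qi in zip(p, q)]


if __name__ == "__main__":
    # Invented theta rows for illustration only (3 topics), not study results.
    rd_theta = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]]
    committee_theta = [[0.1, 0.2, 0.7], [0.2, 0.2, 0.6]]
    rd, committee = mean_topic_share(rd_theta), mean_topic_share(committee_theta)
    print("per-topic gap:", [round(g, 2) for g in share_gaps(rd, committee)])
    print("JS divergence (bits):", round(js_divergence(rd, committee), 3))
```

The per-topic gaps show which topics one source emphasizes more than the other, such as specialized technical topics versus social issues. The single JS value summarizes the overall gap in one number that could be tracked as an indicator.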


