Quantitative and Qualitative Considerations to Apply Methods for Identifying Content Relevance between Knowledge Into Managing Knowledge Service

  • Yoo, Keedong (Department of Business Administration, Dankook University)
  • Received: 2021.07.26
  • Reviewed: 2021.08.20
  • Published: 2021.08.31

Abstract

Identifying associated knowledge on the basis of content relevance is a fundamental function in managing the service and security of core knowledge. This study compares two existing methods for identifying associated knowledge by content relevance, the keyword-based approach and the word-embedding approach. It evaluates how well each method constructs an associated-document network and examines which one performs better from quantitative and qualitative perspectives. The keyword-based approach performed better in identifying core documents and in representing semantic information. The word-embedding approach performed better in F1-score and accuracy, in representing the strength of associations, and in processing large volumes of documents. These findings can support more realistic management of associated-knowledge services, reflecting the needs of companies and users.
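The abstract contrasts two ways of linking documents by content relevance. It does not give the paper's keyword extraction procedure, embedding model, or linking thresholds. The following is therefore only a minimal Python sketch under stated assumptions:

- documents are already reduced to keyword lists;
- the word vectors are tiny hypothetical stand-ins for a trained model;
- the corpus, the ground-truth pairs, and the names `keyword_edges`, `embedding_edges`, `evaluate`, `min_shared`, and `threshold` are all illustrative.

The sketch shows one way a keyword-based link can carry the shared keywords as a readable label, while an embedding-based link carries a graded similarity. It also shows how F1-score and accuracy can be computed by treating every document pair as a related/unrelated decision.

```python
import math
from itertools import combinations

# Hypothetical toy corpus: each document is already reduced to a keyword list.
# The paper's own keyword extraction method is not described in the abstract.
docs = {
    "d1": ["patent", "network", "keyword"],
    "d2": ["keyword", "network", "centrality"],
    "d3": ["embedding", "vector", "sentence"],
    "d4": ["vector", "embedding", "document"],
}

# Hypothetical 3-dimensional word vectors standing in for a trained
# embedding model (e.g. Word2Vec, GloVe, or fastText).
vectors = {
    "patent": [0.9, 0.1, 0.0],
    "network": [0.8, 0.2, 0.1],
    "keyword": [0.7, 0.3, 0.0],
    "centrality": [0.8, 0.1, 0.2],
    "embedding": [0.1, 0.9, 0.3],
    "vector": [0.0, 0.8, 0.4],
    "sentence": [0.2, 0.7, 0.5],
    "document": [0.3, 0.6, 0.4],
}

# Hypothetical ground truth: the pairs a human judge considers related.
gold = {("d1", "d2"), ("d3", "d4")}


def keyword_edges(docs, min_shared=2):
    """Link two documents when they share at least `min_shared` keywords.
    The edge stores the shared keywords, i.e. a readable reason for the link."""
    edges = {}
    for a, b in combinations(sorted(docs), 2):
        shared = set(docs[a]) & set(docs[b])
        if len(shared) >= min_shared:
            edges[(a, b)] = sorted(shared)
    return edges


def mean_vector(words):
    """Represent a document as the average of its known word vectors."""
    dims = len(next(iter(vectors.values())))
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return [0.0] * dims
    return [sum(v[i] for v in known) / len(known) for i in range(dims)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / norm


def embedding_edges(docs, threshold=0.9):
    """Link two documents when the cosine similarity of their mean vectors
    reaches `threshold`. The edge stores the similarity as a graded weight."""
    doc_vecs = {d: mean_vector(ws) for d, ws in docs.items()}
    edges = {}
    for a, b in combinations(sorted(docs), 2):
        sim = cosine(doc_vecs[a], doc_vecs[b])
        if sim >= threshold:
            edges[(a, b)] = round(sim, 3)
    return edges


def evaluate(predicted, gold, docs):
    """Accuracy and F1-score over every document pair, treating each pair
    as a binary related / not-related decision."""
    pairs = list(combinations(sorted(docs), 2))
    tp = sum(1 for p in pairs if p in predicted and p in gold)
    fp = sum(1 for p in pairs if p in predicted and p not in gold)
    fn = sum(1 for p in pairs if p not in predicted and p in gold)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "f1": f1}


if __name__ == "__main__":
    for name, edges in (("keyword", keyword_edges(docs)),
                        ("embedding", embedding_edges(docs))):
        print(name, edges, evaluate(edges, gold, docs))
```

On this toy corpus both methods recover the two hypothetical gold pairs, so both reach an F1-score and accuracy of 1.0. The difference lies in what each edge carries: a list of shared keywords versus a numeric similarity. Scoring all pairs as binary decisions is one common convention; the paper's own evaluation protocol may differ.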
