A Design of Similar Video Recommendation System using Extracted Words in Big Data Cluster

(Korean title: Design of a Similar Video Recommendation System Using Extracted Morphemes in a Big Data Cluster)

  • Lee, Hyun-Sup (Department of Application Software Engineering, Dong-Eui University)
  • Kim, Jindeog (Department of Computer Engineering, Dong-Eui University)
  • Received : 2020.01.20
  • Accepted : 2020.01.29
  • Published : 2020.02.29

Abstract

Content recommendation is a key component of today's widely used video sharing services. To recommend content, such services generally use collaborative filtering, which takes into account both user preferences and video (item) similarity. They mainly serve user convenience by drawing on personal preference signals such as search keywords and viewing time, and they rank videos around the keywords assigned to each video. However, analyzing video similarity from a limited set of assigned keywords has clear limits, and the problem becomes serious when the assigned keywords do not properly reflect the item. In this paper, we propose a system that identifies the characteristics of a video directly from the video itself, without human intervention, and then analyzes the similarity between videos and recommends similar ones. The proposed system analyzes similarity using all words (keywords) with distinct meanings extracted from educational videos; because the resulting data volume and computation are large, the processing is carried out on a big data cluster. We expect the proposed system, through big data video analysis, to serve as a basic module of video sharing service platforms.
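
The word-based similarity analysis described in the abstract can be illustrated with a small sketch. It assumes that the words of each video have already been extracted (for example by a morphological analyzer) and that similarity is measured with cosine similarity over word-frequency vectors; the abstract does not name the measure, so that choice, the function names `cosine_similarity` and `recommend`, and the sample data are all hypothetical. The sketch runs on a single machine, whereas the paper distributes this work over a big data cluster.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-frequency vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty word list is similar to nothing
    return dot / (norm_a * norm_b)


def recommend(target: str,
              videos: Dict[str, List[str]],
              top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank all other videos by similarity to `target`, most similar first."""
    # Assumption: every extracted word counts, not just hand-assigned keywords.
    vectors = {vid: Counter(words) for vid, words in videos.items()}
    scores = [
        (vid, cosine_similarity(vectors[target], vec))
        for vid, vec in vectors.items()
        if vid != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical extracted words for three videos.
videos = {
    "v1": ["spark", "cluster", "data", "spark"],
    "v2": ["spark", "data", "pipeline"],
    "v3": ["guitar", "chord", "lesson"],
}

ranking = recommend("v1", videos)
print(ranking)
```

With this sample data, `v2` ranks above `v3` for `v1` because it shares the words "spark" and "data", while `v3` shares none. On a cluster, the per-video word counting and the pairwise scoring are the steps that would be parallelized.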
