The Analysis of Changes in East Coast Tourism using Topic Modeling

  • Jeong, Eun-Hee (Department of Regional Economics, Kangwon National University)
  • Received : 2020.12.07
  • Accepted : 2020.12.15
  • Published : 2020.12.30

Abstract

In a hyper-connected society shaped by the Fourth Industrial Revolution, the volume of data generated by IT devices keeps growing, and analyzing that data can create new value. This study collected a total of 1,526 articles published from 2017 to 2019 by national newspapers, economic newspapers, regional newspapers, and major broadcasters, retrieved through Bigkinds with the query "(East Coast Tourism or East Coast Travel) and Gangwon-do". Topic modeling was performed on the collected articles using an LDA algorithm implemented in the R language. Keywords were extracted for each year from 2017 to 2019, and the high-frequency keywords of each year were classified and compared. The optimal number of topics was set to 8 based on log likelihood and perplexity, and the 8 topics were then inferred by Gibbs sampling. The inferred topics were Gangneung and Beach, Goseong and Mt. Geumgang, KTX and Donghae-Bukbu Line, weekend sea tour, Sokcho and Unification Observatory, Yangyang and Surfing, experience tour, and transportation network infrastructure. Changes in the articles on East Coast tourism were analyzed using the proportions of the eight inferred topics. In 2018 compared with 2017, the proportions of the Unification Observatory and Mt. Geumgang topics showed no significant change, the proportions of the KTX and experience tour topics increased, and the proportions of the remaining topics decreased. In 2019, the proportions of the KTX and experience tour topics decreased, while the proportions of the other topics showed no significant change.
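The pipeline described in the abstract (fit LDA by Gibbs sampling, compare candidate topic counts by perplexity, then track topic proportions per year) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: the paper used an LDA implementation in R, whereas this sketch is plain Python with a hand-written collapsed Gibbs sampler. The toy corpus, the function names `gibbs_lda`, `perplexity`, and `yearly_topic_share`, and the hyperparameters `alpha`, `beta`, and `n_iter` are all hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict


def gibbs_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents.

    Returns (theta, phi, vocab): per-document topic proportions,
    per-topic word probabilities, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # document-topic counts
    nkw = [[0] * n_words for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                             # tokens assigned to each topic
    z = []                                          # topic assignment of every token
    for d, doc in enumerate(docs):                  # random initial assignment
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                # Remove the token, then resample its topic from the conditional.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(ndk, docs)
    ]
    phi = [
        [(c + beta) / (nk[k] + n_words * beta) for c in nkw[k]]
        for k in range(n_topics)
    ]
    return theta, phi, vocab


def perplexity(docs, theta, phi, vocab):
    """In-sample perplexity: exp of the negative mean per-token log likelihood."""
    wid = {w: i for i, w in enumerate(vocab)}
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][k] * phi[k][wid[w]] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def yearly_topic_share(theta, years):
    """Average the topic proportions of all documents published in each year."""
    grouped = defaultdict(list)
    for row, year in zip(theta, years):
        grouped[year].append(row)
    return {y: [sum(col) / len(rows) for col in zip(*rows)] for y, rows in grouped.items()}


# Toy corpus (hypothetical tokens, not the paper's data), one list per article.
docs = [
    ["gangneung", "beach", "sea", "beach"],
    ["ktx", "line", "station", "ktx"],
    ["yangyang", "surfing", "beach", "surfing"],
    ["ktx", "station", "gangneung", "line"],
    ["sokcho", "observatory", "sea", "observatory"],
    ["surfing", "yangyang", "sea", "beach"],
]
years = [2017, 2017, 2018, 2018, 2019, 2019]

# Compare candidate topic counts by perplexity (lower is better).
scores = {}
for k in (2, 3, 4):
    theta_k, phi_k, vocab_k = gibbs_lda(docs, k)
    scores[k] = perplexity(docs, theta_k, phi_k, vocab_k)
best_k = min(scores, key=scores.get)

theta, phi, vocab = gibbs_lda(docs, best_k)
shares = yearly_topic_share(theta, years)
```

Here `shares` maps each year to the mean proportion of every topic, which is the kind of quantity the abstract compares across 2017, 2018, and 2019. Perplexity computed on the training documents tends to favour larger topic counts, so a held-out split would be a more careful way to choose the number of topics; how the paper handled this is not stated in the abstract.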
