• Title/Abstract/Keywords: latent dirichlet allocation

214 search results

Extraction of Satisfaction Factors and Evaluation of Tourist Attractions based on Travel Site Review Comments

  • 조수현;김보섭;박민식;이기창;강필성
    • 대한산업공학회지 / Vol. 43, No. 1 / pp. 62-71 / 2017
  • In order to attract foreign tourists, it is important to understand which factors of domestic tourist spots tourists consider critical and how they evaluate those factors after a visit. However, most research on the tourism business has collected information through surveys of a small number of tourists, which leads to inaccurate and biased conclusions. In this paper, we suggest a data-driven methodology to identify tourists' satisfaction factors and estimate sentiment scores for them. To do so, we collected review comments from a popular website. Latent Dirichlet allocation (LDA) is employed to extract key factors, and elastic net is used to estimate sentiment scores. An aggregated evaluation score is then generated by combining the factors with the per-topic sentiment scores. Our proposed method can be used to recommend themed travel schedules and to discover new spots.
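The abstract does not spell out how factors and sentiment scores are combined. A minimal sketch, assuming the aggregated score is a weighted average of per-topic sentiment scores using a spot's LDA topic proportions as weights; `aggregate_score`, the topic names, and all numbers are hypothetical:

```python
def aggregate_score(topic_weights: dict[str, float],
                    sentiment_scores: dict[str, float]) -> float:
    """Weighted average of per-topic sentiment scores (assumed aggregation rule).

    topic_weights:    hypothetical share of a spot's reviews assigned to each LDA topic
    sentiment_scores: hypothetical per-topic sentiment score, e.g. from an elastic-net model
    """
    shared = topic_weights.keys() & sentiment_scores.keys()
    total_weight = sum(topic_weights[t] for t in shared)
    if total_weight <= 0:
        raise ValueError("no shared topics with positive weight")
    return sum(topic_weights[t] * sentiment_scores[t] for t in shared) / total_weight


# Hypothetical inputs for one tourist spot.
weights = {"scenery": 0.5, "food": 0.3, "transport": 0.2}
scores = {"scenery": 0.8, "food": 0.4, "transport": -0.5}
print(round(aggregate_score(weights, scores), 2))  # 0.42
```

Dividing by the total weight keeps the result on the same scale as the per-topic scores even when the weights do not sum to 1.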

Multi-Topic Sentiment Analysis using LDA for Online Review: Focusing on the TripAdvisor Case

  • 홍태호;니우한잉;임강;박지영
    • 한국정보시스템학회지:정보시스템연구 / Vol. 27, No. 1 / pp. 89-110 / 2018
  • Purpose: Customer reviews contain a great deal of information, but finding the key information in a large volume of text is not easy. Business decision makers need a model to solve this problem. In this study, we propose a multi-topic sentiment analysis approach using Latent Dirichlet Allocation (LDA) for user-generated content (UGC). Design/methodology/approach: We collected a total of 104,039 hotel reviews for seven of the world's top tourist destinations from TripAdvisor (www.tripadvisor.com) and used the LDA model to extract 30 hotel-related topics from all customer reviews. Six major dimensions (value, cleanliness, rooms, service, location, and sleep quality) were selected from the 30 extracted topics. The data were analyzed in R. Findings: This study proposes a lexicon-based sentiment analysis approach for the keyword-embedded sentences related to the six dimensions within a review. The performance of the proposed model was evaluated by comparing the sentiment analysis results for each topic with the actual attribute ratings provided by the platform. The results show that the model performs well, with high accuracy and recall. The proposed model is expected to make it possible to analyze customers' sentiments on different topics for reviews that lack detailed attribute ratings.
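A minimal sketch of the keyword-embedded-sentence idea: split a review into sentences, keep only the sentences that mention a dimension's keywords, and sum lexicon polarities over them. The study worked in R with LDA-derived keywords and its own lexicon; the Python below, the keyword sets, and the toy lexicon are illustrative assumptions only:

```python
import re

# Hypothetical keyword sets per dimension (the study derived its keywords from LDA topics).
DIMENSION_KEYWORDS = {
    "cleanliness": {"clean", "dirty", "spotless"},
    "location": {"location", "station", "walk"},
    "service": {"staff", "service", "reception"},
}
# Hypothetical toy sentiment lexicon: word -> polarity.
LEXICON = {"clean": 1, "spotless": 1, "dirty": -1, "great": 1, "friendly": 1,
           "rude": -1, "convenient": 1, "far": -1}


def dimension_sentiment(review: str) -> dict[str, int]:
    """Sum lexicon polarities over the sentences that mention each dimension."""
    scores = {dim: 0 for dim in DIMENSION_KEYWORDS}
    for sentence in re.split(r"[.!?]+", review.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        polarity = sum(LEXICON.get(tok, 0) for tok in tokens)
        for dim, keywords in DIMENSION_KEYWORDS.items():
            if keywords.intersection(tokens):  # sentence is "keyword-embedded" for dim
                scores[dim] += polarity
    return scores


review = "The room was spotless. Staff were rude! Great location, convenient to the station."
print(dimension_sentiment(review))
# {'cleanliness': 1, 'location': 2, 'service': -1}
```

Per-dimension scores like these are what would be compared against a platform's attribute ratings.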

A Development of LDA Topic Association Systems Based on Spark-Hadoop Framework

  • Park, Kiejin;Peng, Limei
    • Journal of Information Processing Systems / Vol. 14, No. 1 / pp. 140-149 / 2018
  • Social data such as users' comments are unstructured, and current technologies for analyzing such data are constrained by available storage space and processing time when fast storage and processing are required. Moreover, it is difficult to use huge amounts of dynamically generated social data to analyze user features at high speed. To solve this problem, we design and implement a topic association analysis system based on the latent Dirichlet allocation (LDA) model. LDA does not require a training process and can therefore easily analyze social users' hourly interests in different topics. The proposed system is built on the Spark framework running on top of a Hadoop cluster. It achieves high-speed processing because it minimizes hard-disk access and processes all intermediate data in main memory. In the performance evaluation, analyzing the topics of about 1 TB of test social data (SNS comments) took about 5 hours. Moreover, by analyzing the associations among topics, we can track hourly changes in social users' interests in different topics.
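The abstract does not say how topic associations are measured. One simple option, assumed here purely for illustration, is the Pearson correlation between topics' hourly share series. The topic names and shares are hypothetical, and this single-machine sketch does not use Spark:

```python
from itertools import combinations
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length series (neither may be constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def topic_associations(hourly_share: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Correlate the hourly share series of every pair of topics."""
    return {(a, b): pearson(hourly_share[a], hourly_share[b])
            for a, b in combinations(sorted(hourly_share), 2)}


# Hypothetical share of each hour's comments assigned to each topic.
shares = {
    "weather": [0.1, 0.2, 0.3, 0.4],
    "traffic": [0.2, 0.3, 0.4, 0.5],
    "sports": [0.7, 0.5, 0.3, 0.1],
}
for pair, r in topic_associations(shares).items():
    print(pair, round(r, 3))
```

A strongly positive value means two topics rise and fall together over the hours; a strongly negative one means interest shifts from one to the other.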

Falling Accidents Analysis in Construction Sites by Using Topic Modeling

  • 류한국
    • 한국융합학회논문지 / Vol. 10, No. 7 / pp. 175-182 / 2019
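  • This study used topic modeling, a machine learning technique, to classify topics of falling accidents at construction sites and analyzed the accident factors for each topic. The text data were preprocessed to apply topic modeling based on latent Dirichlet allocation, and the model was evaluated with perplexity scores to improve its reliability. Across the topics, most falling accidents occurred among day laborers at small-scale worksites. Most falling accidents were attributed to not wearing safety equipment, poor housekeeping at the site, and safety equipment failing to work properly because of its performance or the way it was worn. To prevent and reduce falling accidents, the results indicate that safety training tailored to small worksites, workplace housekeeping, and checking that personal safety equipment is properly worn and performs adequately are important.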
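Perplexity measures how well a fitted topic model predicts the words of held-out documents; lower is better. For LDA with document-topic proportions θ and topic-word probabilities φ, the probability of word w in document d is p(w|d) = Σ_k θ_dk φ_kw, and perplexity is exp(-(1/N) Σ log p(w|d)) over all N tokens. A minimal sketch with hypothetical toy parameters (in practice θ and φ come from the fitted model):

```python
from math import exp, log


def perplexity(docs: list[list[int]],
               theta: list[list[float]],
               phi: list[list[float]]) -> float:
    """exp(-(sum of log p(w | d)) / total token count).

    docs:  each document as a list of word ids
    theta: theta[d][k] = proportion of topic k in document d
    phi:   phi[k][w]   = probability of word w under topic k
    """
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += log(p_w)
            n_tokens += 1
    return exp(-log_lik / n_tokens)


# Sanity check: a model that is uniform over a 4-word vocabulary has perplexity 4.
phi_uniform = [[0.25] * 4, [0.25] * 4]
print(round(perplexity([[0, 1, 2, 3]], [[0.5, 0.5]], phi_uniform), 6))  # 4.0

# A hypothetical model concentrated on the words the document actually uses scores lower.
phi_skewed = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
print(round(perplexity([[0, 0, 0, 1]], [[1.0, 0.0]], phi_skewed), 2))
```

Comparing this value across candidate numbers of topics is one common way to choose the model.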

Topic Modeling and Sentiment Analysis of Twitter Discussions on COVID-19 from Spatial and Temporal Perspectives

  • AlAgha, Iyad
    • Journal of Information Science Theory and Practice / Vol. 9, No. 1 / pp. 35-53 / 2021
  • The study reported in this paper aimed to evaluate the topics and opinions of COVID-19 discussions on Twitter. It performed topic modeling and sentiment analysis of tweets posted during the COVID-19 outbreak and compared the results across space and time. In addition, by covering a more recent and longer period of the pandemic timeline, it revealed several patterns not previously reported in the literature. Author-pooled Latent Dirichlet Allocation (LDA) was used to generate twenty topics covering different aspects of the pandemic. Time-series analysis of the distribution of tweets over topics was performed to explore how the discussion of each topic changed over time and the potential reasons behind the changes. In addition, spatial analysis of topics was performed by comparing the percentage of tweets in each topic among the top tweeting countries. Afterward, sentiment analysis of tweets was performed at both temporal and spatial levels to analyze how sentiment differs between countries and in response to certain events. The performance of the topic model was assessed by comparing it with alternative topic modeling techniques, measuring topic coherence for each technique while varying the number of topics. Results showed that pooling tweets by author before performing LDA significantly improved the resulting topic models.
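Individual tweets are short, so word co-occurrence within a single tweet is sparse; author pooling concatenates each author's tweets into one pseudo-document before standard LDA is run. A minimal sketch of the pooling step only, with hypothetical author IDs and texts (the LDA step itself is not shown):

```python
from collections import defaultdict


def pool_by_author(tweets: list[tuple[str, str]]) -> dict[str, str]:
    """Concatenate all tweets by the same author into one pseudo-document.

    tweets:  (author_id, text) pairs
    returns: author_id -> pooled text, in first-seen author order
    """
    pooled: dict[str, list[str]] = defaultdict(list)
    for author, text in tweets:
        pooled[author].append(text)
    return {author: " ".join(texts) for author, texts in pooled.items()}


tweets = [
    ("u1", "masks required on buses"),
    ("u2", "vaccine trial results today"),
    ("u1", "school closures extended"),
]
docs = pool_by_author(tweets)
print(docs)
# {'u1': 'masks required on buses school closures extended', 'u2': 'vaccine trial results today'}
```

The pooled texts, not the individual tweets, then become the documents that LDA sees.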

Topics and Sentiment Analysis Based on Reviews of Omni-Channel Retailing

  • KIM, Soon-Hong;YOO, Byong-Kook
    • 유통과학연구 / Vol. 19, No. 4 / pp. 25-35 / 2021
  • Purpose: This study aims to use text mining to analyze the factors affecting customer satisfaction in omni-channel customer reviews posted on Internet blogs, cafes, and YouTube. Research, data, and Methodology: Frequency analysis and LDA (Latent Dirichlet Allocation) are applied to social big data to capture reviewers' reactions to the omni-channel shopping service recently opened by L Shopping Company. Additionally, based on the topic analysis, we conduct a sentiment analysis of purchase reviews and analyze the characteristics of each topic in terms of the positive or negative sentiments of omni-channel app users. Results: The topic analysis yields four main topics: delivery and events, economic value, recommendations and convenience, and product quality and brand awareness. The sentiment analysis reveals that reviewers give many positive evaluations of the price policy and product promotions, but negative evaluations of app use, delivery, and product quality. Conclusions: Retailers can establish customized marketing strategies by identifying customers' major interests through text mining. Additionally, sentiment analysis by topic can serve as an important indicator for developing the products and services customers want, by identifying areas that satisfy customers and areas that provoke negative reactions.
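One way to relate sentiment to topics, assumed here for illustration since the abstract does not give the exact procedure, is to assign each review to its dominant LDA topic and tally sentiment labels per topic. The topic names come from the abstract; the topic proportions and labels are hypothetical:

```python
from collections import Counter


def sentiment_by_topic(doc_topics: list[list[float]], labels: list[str],
                       topic_names: list[str]) -> dict[str, dict[str, int]]:
    """Assign each review to its highest-probability topic and count sentiment labels."""
    tally = {name: Counter() for name in topic_names}
    for theta, label in zip(doc_topics, labels):
        dominant = max(range(len(theta)), key=lambda k: theta[k])
        tally[topic_names[dominant]][label] += 1
    return {name: dict(counts) for name, counts in tally.items()}


topics = ["delivery and events", "economic value",
          "recommendations and convenience", "product quality and brand awareness"]
# Hypothetical per-review topic proportions (one row per review) and sentiment labels.
doc_topics = [[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.6, 0.2, 0.1],
              [0.6, 0.2, 0.1, 0.1],
              [0.1, 0.1, 0.1, 0.7]]
labels = ["negative", "positive", "negative", "negative"]
print(sentiment_by_topic(doc_topics, labels, topics))
```

Hard assignment to the dominant topic is the simplest choice; weighting each review's label by its full topic proportions is a softer alternative.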

Analyzing Technological Trends of Smart Factory using Topic Modeling

  • Hussain, Adnan;Kim, Chulhyun;Battsengel, Ganchimeg;Jeon, Jeonghwan
    • Asian Journal of Innovation and Policy / Vol. 10, No. 3 / pp. 380-403 / 2021
  • Recently, smart factories have gained significant importance with the development of the fourth industrial revolution and the rise of global industrial competition. For industries to survive and keep pace with global market trends, accurate technological planning is therefore required. Although several studies have investigated technology forecasting and its influence on the smart factory, little significant work has yet analyzed technological trends concerning the smart factory, which is the core focus here. This work analyzes the technological trends of the smart factory, followed by a detailed investigation of recent research hotspots/frontiers in the field. A well-known topic modeling technique, Latent Dirichlet Allocation (LDA), was employed for this analysis. The technological trends were further substantiated by an in-depth analysis of a smart factory case study. The findings identify technological trends with significant potential for determining technological strategies. Moreover, the results may help researchers and enterprises forecast and plan future technological evolution.
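The abstract does not describe how research hotspots were identified. A common approach, assumed here, is to fit a least-squares line to each topic's yearly share of documents and treat positive slopes as rising ("hot") topics and negative slopes as declining ("cold") ones. The topic names and shares below are hypothetical:

```python
def trend_slope(years: list[int], shares: list[float]) -> float:
    """Ordinary least-squares slope of a topic's yearly document share."""
    n = len(years)
    mean_x, mean_y = sum(years) / n, sum(shares) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


def hot_and_cold(years: list[int],
                 topic_shares: dict[str, list[float]]) -> tuple[list[str], list[str]]:
    """Split topics into rising (positive slope) and declining (negative slope) lists."""
    slopes = {topic: trend_slope(years, s) for topic, s in topic_shares.items()}
    hot = sorted((t for t in slopes if slopes[t] > 0), key=lambda t: -slopes[t])
    cold = sorted((t for t in slopes if slopes[t] < 0), key=lambda t: slopes[t])
    return hot, cold


years = [2017, 2018, 2019, 2020]
shares = {
    "digital twin": [0.05, 0.08, 0.11, 0.14],
    "edge computing": [0.02, 0.03, 0.05, 0.06],
    "legacy PLC control": [0.20, 0.18, 0.16, 0.14],
}
print(hot_and_cold(years, shares))
```

Each list is ordered from the steepest trend to the weakest.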

A Study on the Perception of Public Value from Public Corporation in the Agricultural and Rural Sector: The Case of Korea Rural Community Corporation

  • 임채환;범진우;안동환;유도일
    • 농촌계획 / Vol. 27, No. 4 / pp. 83-96 / 2021
  • This study analyzes the perception of the public value created by Korea Rural Community Corporation, a representative public corporation in the agricultural and rural sector. We categorize agricultural and rural public values as 'stable food supply,' 'conservation of national environment and nature,' 'formation and cultivation of water resources,' 'prevention of soil loss and flooding,' 'conservation of ecological system,' and 'conservation of rural tradition and culture.' For the qualitative analysis, we apply content analysis; for the quantitative analysis, we use topic modeling with Latent Dirichlet Allocation (LDA), which is widely used in text mining. Results show that the internal perceptions of value suppliers center mainly on 'stable food supply,' 'formation and cultivation of water resources,' and 'conservation of rural tradition and culture.' The external perceptions of value demanders cover all of the public values, but their evaluations and demands span various aspects, including both positive and negative opinions.

Analysis of Shipping and Logistics News Articles using Topic Modeling

  • 윤희영;곽일엽
    • 무역학회지 / Vol. 46, No. 4 / pp. 61-76 / 2021
  • This study focuses on three logistics-related news outlets (Logistics Newspaper, Korea Shipping Gazette, and Korea Shipping Newspaper) in order to present changes in logistics issues, centering on COVID-19, which has recently had the greatest impact worldwide. For data collection, news articles from 2019 and 2020 (title, article, content, date, article classification, article URL) were collected by web crawling (using Python's BeautifulSoup and requests modules) the homepages of these three representative logistics-related media companies. The data analysis methods were basic statistical analysis, Latent Dirichlet Allocation (LDA) for topic modeling, and Scattertext. The results were as follows. First, among the three logistics-related news media, the Korea Shipping Newspaper was the most active. Second, topic modeling with LDA identified eight logistics-related topics, and the keywords and significant issues of each topic were presented. Third, the keywords were visualized with Scattertext. This is the first study to present changes in the logistics field based on articles from representative logistics-related media in 2019 and 2020. In particular, 2019 and 2020 can be divided into before and after the outbreak of COVID-19, which has had a great impact not only on the logistics field but also on our lives as a whole. Future work requires a multi-faceted approach, such as comparative studies of logistics issues between countries or deriving implications from long-term time series of articles.
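The abstract does not say which LDA implementation or inference method was used. To show what fitting LDA involves, here is a minimal collapsed Gibbs sampler in pure Python. The hyperparameters (`alpha`, `beta`, `n_iter`), the toy documents, and the function names are illustrative assumptions; on real news corpora one would normally use a library such as gensim or scikit-learn, both of which use variational inference rather than Gibbs sampling:

```python
import random


def lda_gibbs(docs: list[list[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, n_iter: int = 200, seed: int = 0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (vocab, nkw) where nkw[k][v] counts tokens of word vocab[v]
    currently assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in docs]              # document-topic counts
    nkw = [[0] * n_vocab for _ in range(n_topics)]    # topic-word counts
    nk = [0] * n_topics                               # total tokens per topic
    z = []                                            # topic assignment of every token

    # Initialise every token with a random topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_vocab * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return vocab, nkw


def top_words(vocab: list[str], nkw: list[list[int]], n: int = 3) -> list[list[str]]:
    """Return the n most frequent words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in nkw]


# Hypothetical tokenized headlines.
docs = [
    ["port", "cargo", "vessel", "port", "freight"],
    ["cargo", "vessel", "freight", "port"],
    ["covid", "lockdown", "vaccine", "covid"],
    ["vaccine", "covid", "quarantine", "lockdown"],
]
vocab, nkw = lda_gibbs(docs, n_topics=2)
print(top_words(vocab, nkw))
```

The topic-word counts, normalized and smoothed by `beta`, give the φ estimates from which each topic's keywords are read off.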

Aviation Safety Mandatory Report Topic Prediction Model using Latent Dirichlet Allocation (LDA)

  • 김준환;백현진;전성진;최영재
    • 한국항공운항학회지 / Vol. 31, No. 3 / pp. 42-49 / 2023
  • Not only in the aviation industry but also in other industries, safety data plays a key role in improving safety performance. By analyzing safety data such as aviation safety reports (text data), hazards can be identified and removed before they lead to tragic accidents. However, the raw (natural language) data collected from each site must first be pre-processed before it can be used in a proactive or predictive safety management system. As air traffic volume increases, the amount of accumulated data is also rising, so there are clear limits to analyzing the data manually. In this paper, a topic prediction model for aviation safety mandatory reports is proposed, and its prediction accuracy is verified using actual aviation safety mandatory report data. The model is meaningful in that it not only effectively supports current aviation safety mandatory report analysis work but can also be applied to various data produced in the aviation safety field in the future.