• Title/Summary/Keyword: LDA Topic Model

Search Result 109, Processing Time 0.03 seconds

Analysis of Changes in Restaurant Attributes According to the Spread of Infectious Diseases: Application of Text Mining Techniques (감염병 확산에 따른 레스토랑 선택속성 변화 분석: 텍스트마이닝 기법 적용)

  • Joonil Yoo;Eunji Lee;Chulmo Koo
    • Information Systems Review
    • /
    • v.25 no.4
    • /
    • pp.89-112
    • /
    • 2023
  • In March 2020, as it was declared a COVID-19 pandemic, various quarantine measures were taken. Accordingly, many changes have occurred in the tourism and hospitality industries. In particular, quarantine guidelines, such as the introduction of non-face-to-face services and social distancing, were implemented in the restaurant industry. For decades, research on restaurant attributes has emphasized the importance of three attributes: atmosphere, service quality, and food quality. Nevertheless, to the best of our knowledge, research on restaurant attributes considering the COVID-19 situation is insufficient. To respond to this call, this study attempted an exploratory approach to classify new restaurant attributes based on understanding environmental changes. This study considered 31,115 online reviews registered in Naverplace as an analysis unit, with 475 general restaurants located in Euljiro, Seoul. Further, we attempted to classify restaurant attributes by clustering words within online reviews through TF-IDF and LDA topic modeling techniques. As a result of the analysis, the factors of "prevention of infectious diseases" were derived as new attributes of restaurants in the context of COVID-19 situations, along with the atmosphere, service quality, and food quality. This study is of academic significance by expanding the literature of existing restaurant attributes in that it categorized the three attributes presented by existing restaurant attributes and further presented new attributes. Moreover, the analysis results have led to the formulation of practical recommendations, considering both the operational aspects of restaurants and policy implications.

Prediction of Customer Satisfaction Using RFE-SHAP Feature Selection Method (RFE-SHAP을 활용한 온라인 리뷰를 통한 고객 만족도 예측)

  • Olga Chernyaeva;Taeho Hong
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.4
    • /
    • pp.325-345
    • /
    • 2023
  • In the rapidly evolving domain of e-commerce, our study presents a cohesive approach to enhance customer satisfaction prediction from online reviews, aligning methodological innovation with practical insights. We integrate the RFE-SHAP feature selection with LDA topic modeling to streamline predictive analytics in e-commerce. This integration facilitates the identification of key features-specifically, narrowing down from an initial set of 28 to an optimal subset of 14 features for the Random Forest algorithm. Our approach strategically mitigates the common issue of overfitting in models with an excess of features, leading to an improved accuracy rate of 84% in our Random Forest model. Central to our analysis is the understanding that certain aspects in review content, such as quality, fit, and durability, play a pivotal role in influencing customer satisfaction, especially in the clothing sector. We delve into explaining how each of these selected features impacts customer satisfaction, providing a comprehensive view of the elements most appreciated by customers. Our research makes significant contributions in two key areas. First, it enhances predictive modeling within the realm of e-commerce analytics by introducing a streamlined, feature-centric approach. This refinement in methodology not only bolsters the accuracy of customer satisfaction predictions but also sets a new standard for handling feature selection in predictive models. Second, the study provides actionable insights for e-commerce platforms, especially those in the clothing sector. By highlighting which aspects of customer reviews-like quality, fit, and durability-most influence satisfaction, we offer a strategic direction for businesses to tailor their products and services.

A Comparative Analysis of Travelers' Online Reviews among China, USA, and South Korea using Sentiment Analysis in the Era of the COVID-19 Pandemic (코로나19 팬데믹 상황에서 감성분석을 이용한 미국, 중국, 한국 여행자의 온라인 리뷰 비교 분석)

  • Hong, Junwoo;Hong, Taeho
    • Journal of Information Technology Services
    • /
    • v.20 no.5
    • /
    • pp.159-176
    • /
    • 2021
  • In this study, we performed a comparative analysis of the sentiment value for the tourists in USA, China, and Korea on the COVID19 pandemic era to explore and find out the features of the tourists by using online reviews. We collected a total of 243,826 online hotel reviews for metropolitan city and vacation spot in the three countries to compare the features between the business and the vacation trips. We collected the online reviews into the tow groups from Jan. 1, 2019 to Nov. 31, 2019 for before COVID19 pandemic and from Apr. 1, 2020 to Deb 28, 2021 for during COVID19. Online reviews were categorized into 6 dimensions using LDA model. Sentiment analysis were presented for 6 dimensions by utilizing a lexicon base. We proposed an approach to analyzing the importance of each attribute by applying 6-dimensional sentiment values to conjoint analysis. Our empirical analysis showed that the proposed approach could explore and find out the changed features of travelers during the COVID19 pandemic.

Technology Mining and Sentiment Analysis on Hydrogen Fuel Cell Using National R&D and Social Data (국가R&D와 소셜 데이터를 활용한 수소연료전지 기술마이닝과 감성분석)

  • Lee, Byeong-Hee;Choi, Jung-Woo;Kim, Tae-Hyun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.11a
    • /
    • pp.341-343
    • /
    • 2022
  • 온실가스 배출 문제가 세계적인 현안으로 부각되면서 수소를 에너지원으로 사용하는 수소경제가 주목받고 있다. 수소연료전지는 수소경제의 구성요소 중 하나로, 수소를 활용해 열과 전기를 생산하며 에너지 변환 효율이 높이는데 장점이 있다. 본 연구는 세계적인 온라인 커뮤니티인 레딧(Reddit)에서 수집한 수소연료전지와 관련된 소셜 데이터를 텍스트마이닝과 감성분석 기법으로 분석하였다. 분석 결과 9,211건의 댓글을 LDA(Latent Dirichlet Allocation)을 이용해 4개의 토픽 그룹으로 분류할 수 있었다. 이 중 수소연료전지와 관련이 높은 그룹을 선정해 STM(Structural Topic Model) 분석으로 10개 토픽을 추출하였고, 기후 환경, 수소 산업, 수소 차와 관련 있는 토픽 3개를 발견할 수 있었다. 이 연구 결과를 통해 수소연료전지의 세계적으로 실제적인 내용을 빠르고 효과적으로 파악하여 수소연료전지에 대한 예측하고, 우리나라의 수소연료전지 관련 국가R&D의 정책적 방향을 제시하고자 한다.

Topic Modeling Insomnia Social Media Corpus using BERTopic and Building Automatic Deep Learning Classification Model (BERTopic을 활용한 불면증 소셜 데이터 토픽 모델링 및 불면증 경향 문헌 딥러닝 자동분류 모델 구축)

  • Ko, Young Soo;Lee, Soobin;Cha, Minjung;Kim, Seongdeok;Lee, Juhee;Han, Ji Yeong;Song, Min
    • Journal of the Korean Society for information Management
    • /
    • v.39 no.2
    • /
    • pp.111-129
    • /
    • 2022
  • Insomnia is a chronic disease in modern society, with the number of new patients increasing by more than 20% in the last 5 years. Insomnia is a serious disease that requires diagnosis and treatment because the individual and social problems that occur when there is a lack of sleep are serious and the triggers of insomnia are complex. This study collected 5,699 data from 'insomnia', a community on 'Reddit', a social media that freely expresses opinions. Based on the International Classification of Sleep Disorders ICSD-3 standard and the guidelines with the help of experts, the insomnia corpus was constructed by tagging them as insomnia tendency documents and non-insomnia tendency documents. Five deep learning language models (BERT, RoBERTa, ALBERT, ELECTRA, XLNet) were trained using the constructed insomnia corpus as training data. As a result of performance evaluation, RoBERTa showed the highest performance with an accuracy of 81.33%. In order to in-depth analysis of insomnia social data, topic modeling was performed using the newly emerged BERTopic method by supplementing the weaknesses of LDA, which is widely used in the past. As a result of the analysis, 8 subject groups ('Negative emotions', 'Advice and help and gratitude', 'Insomnia-related diseases', 'Sleeping pills', 'Exercise and eating habits', 'Physical characteristics', 'Activity characteristics', 'Environmental characteristics') could be confirmed. Users expressed negative emotions and sought help and advice from the Reddit insomnia community. In addition, they mentioned diseases related to insomnia, shared discourse on the use of sleeping pills, and expressed interest in exercise and eating habits. As insomnia-related characteristics, we found physical characteristics such as breathing, pregnancy, and heart, active characteristics such as zombies, hypnic jerk, and groggy, and environmental characteristics such as sunlight, blankets, temperature, and naps.

Unsupervised Motion Learning for Abnormal Behavior Detection in Visual Surveillance (영상감시시스템에서 움직임의 비교사학습을 통한 비정상행동탐지)

  • Jeong, Ha-Wook;Chang, Hyung-Jin;Choi, Jin-Young
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.48 no.5
    • /
    • pp.45-51
    • /
    • 2011
  • In this paper, we propose an unsupervised learning method for modeling motion trajectory patterns effectively. In our approach, observations of an object on a trajectory are treated as words in a document for latent dirichlet allocation algorithm which is used for clustering words on the topic in natural language process. This allows clustering topics (e.g. go straight, turn left, turn right) effectively in complex scenes, such as crossroads. After this procedure, we learn patterns of word sequences in each cluster using Baum-Welch algorithm used to find the unknown parameters in a hidden markov model. Evaluation of abnormality can be done using forward algorithm by comparing learned sequence and input sequence. Results of experiments show that modeling of semantic region is robust against noise in various scene.

User Experience Analysis and Management Based on Text Mining: A Smart Speaker Case (텍스트 마이닝 기반 사용자 경험 분석 및 관리: 스마트 스피커 사례)

  • Dine Yeon;Gayeon Park;Hee-Woong Kim
    • Information Systems Review
    • /
    • v.22 no.2
    • /
    • pp.77-99
    • /
    • 2020
  • Smart speaker is a device that provides an interactive voice-based service that can search and use various information and contents such as music, calendar, weather, and merchandise using artificial intelligence. Since AI technology provides more sophisticated and optimized services to users by accumulating data, early smart speaker manufacturers tried to build a platform through aggressive marketing. However, the frequency of using smart speakers is less than once a month, accounting for more than one third of the total, and user satisfaction is only 49%. Accordingly, the necessity of strengthening the user experience of smart speakers has emerged in order to acquire a large number of users and to enable continuous use. Therefore, this study analyzes the user experience of the smart speaker and proposes a method for enhancing the user experience of the smart speaker. Based on the analysis results in two stages, we propose ways to enhance the user experience of smart speakers by model. The existing research on the user experience of the smart speaker was mainly conducted by survey and interview-based research, whereas this study collected the actual review data written by the user. Also, this study interpreted the analysis result based on the smart speaker user experience dimension. There is an academic significance in interpreting the text mining results by developing the smart speaker user experience dimension. Based on the results of this study, we can suggest strategies for enhancing the user experience to smart speaker manufacturers.

A study on trends and predictions through analysis of linkage analysis based on big data between autonomous driving and spatial information (자율주행과 공간정보의 빅데이터 기반 연계성 분석을 통한 동향 및 예측에 관한 연구)

  • Cho, Kuk;Lee, Jong-Min;Kim, Jong Seo;Min, Guy Sik
    • Journal of Cadastre & Land InformatiX
    • /
    • v.50 no.2
    • /
    • pp.101-115
    • /
    • 2020
  • In this paper, big data analysis method was used to find out global trends in autonomous driving and to derive activate spatial information services. The applied big data was used in conjunction with news articles and patent document in order to analysis trend in news article and patents document data in spatial information. In this paper, big data was created and key words were extracted by using LDA (Latent Dirichlet Allocation) based on the topic model in major news on autonomous driving. In addition, Analysis of spatial information and connectivity, global technology trend analysis, and trend analysis and prediction in the spatial information field were conducted by using WordNet applied based on key words of patent information. This paper was proposed a big data analysis method for predicting a trend and future through the analysis of the connection between the autonomous driving field and spatial information. In future, as a global trend of spatial information in autonomous driving, platform alliances, business partnerships, mergers and acquisitions, joint venture establishment, standardization and technology development were derived through big data analysis.

An Exploratory research on patent trends and technological value of Organic Light-Emitting Diodes display technology (Organic Light-Emitting Diodes 디스플레이 기술의 특허 동향과 기술적 가치에 관한 탐색적 연구)

  • Kim, Mingu;Kim, Yongwoo;Jung, Taehyun;Kim, Youngmin
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.4
    • /
    • pp.135-155
    • /
    • 2022
  • This study analyzes patent trends by deriving sub-technical fields of Organic Light-Emitting Diodes (OLEDs) industry, and analyzing technology value, originality, and diversity for each sub-technical field. To collect patent data, a set of international patent classification(IPC) codes related to OLED technology was defined, and OLED-related patents applied from 2005 to 2017 were collected using a set of IPC codes. Then, a large number of collected patent documents were classified into 12 major technologies using the Latent Dirichlet Allocation(LDA) topic model and trends for each technology were investigated. Patents related to touch sensor, module, image processing, and circuit driving showed an increasing trend, but virtual reality and user interface recently decreased, and thin film transistor, fingerprint recognition, and optical film showed a continuous trend. To compare the technological value, the number of forward citations, originality, and diversity of patents included in each technology group were investigated. From the results, image processing, user interface(UI) and user experience(UX), module, and adhesive technology with high number of forward citations, originality and diversity showed relatively high technological value. The results provide useful information in the process of establishing a company's technology strategy.