• Title/Abstract/Keywords: latent dirichlet allocation (LDA)

Search results: 181 items (processing time: 0.022 s)

Topic Model Analysis of Research Themes and Trends in the Journal of Economic and Environmental Geology

  • 김태용;박혜민;허준용;양민준
    • Economic and Environmental Geology
    • /
    • Vol. 54, No. 3
    • /
    • pp.353-364
    • /
    • 2021
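  • Research fields in Korean geology have developed steadily since the mid-20th century. The journal Economic and Environmental Geology is a long-established journal that represents Korean geology and publishes interdisciplinary research papers grounded in geology. This study conducts a literature review of the papers published in the journal in order to discuss the history and development of geology. The titles, keywords, and multilingual abstracts of a total of 2,571 papers published from 1968 to 2020 were collected, and Latent Dirichlet Allocation (LDA)-based topic modeling was applied to classify research topics and to identify research trends and the associations among topics. The journal's content could be classified into eight research topics ('petrology and geochemistry', 'hydrology and hydrogeology', 'economic geology', 'volcanology', 'soil contamination and remediation', 'general and structural geology', 'geophysics and geophysical exploration', and 'clay minerals'). Before 1994, the topics 'economic geology', 'volcanology', and 'general and structural geology' were actively studied; afterwards, 'hydrology and hydrogeology', 'soil contamination and remediation', 'geophysics and geophysical exploration', and 'clay minerals' became prevalent. Network analysis confirmed that the journal has published interdisciplinary research papers built on 'economic geology'. The results are significant in that they offer researchers in geology a new methodology for literature review and provide an understanding of the history of geology.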

SVD-LDA: A Combined Model for Text Classification

  • Hai, Nguyen Cao Truong;Kim, Kyung-Im;Park, Hyuk-Ro
    • Journal of Information Processing Systems
    • /
    • Vol. 5, No. 1
    • /
    • pp.5-10
    • /
    • 2009
  • Text data has always accounted for a major portion of the world's information. As the volume of information increases exponentially, the portion of text data also increases significantly. Text classification is therefore still an important area of research. LDA is an up-to-date probabilistic model that has been used in many applications across many fields. For text data, LDA also has many applications, to which various enhancements have been applied. However, it seems that no application has addressed the input to LDA. In this paper, we suggest a way to map the input space to a reduced space, which may avoid the unreliability, ambiguity, and redundancy of individual terms as descriptors. The purpose of this paper is to show that LDA performs well in a "clean and clear" space. Experiments are conducted on the 20 Newsgroups data sets. The results show that the proposed method can improve classification results when an appropriate rank for the reduced space is chosen.

LDA-based Approach for URI Disambiguation and Error Reduction

  • 김지성;김영식;함영균;황도삼;최기선
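    • Korean Institute of Information Scientists and Engineers (KIISE), Language Engineering Research Group: Conference Proceedings (Hangul and Korean Information Processing)
    • /
    • KIISE Language Engineering Research Group, 26th Conference on Hangul and Korean Information Processing, 2014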
    • /
    • pp.107-111
    • /
    • 2014
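  • URI disambiguation is the problem of selecting the single correct URI when several candidate URIs can be linked to a particular word in a given document. Various solutions are possible, but this paper follows the previously studied approach of resolving the problem through similarity between document contexts. Context-similarity methods have already been studied for English DBpedia URI spotting using TF*ICF. This paper addresses URI disambiguation using Latent Dirichlet Allocation and restricts its scope to Korean DBpedia. We analyze how well the newly proposed method solves the URI disambiguation problem and how much it improves on previous work. We also show that the existing method and the proposed method each solve cases that the other cannot, and we anticipate that combining the two methods can reach higher performance.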

Modeling Topic Extraction-based Sentiment Analysis Based on User Reviews

  • Kim, Tae-Yeun
    • Journal of Integrative Natural Science
    • /
    • Vol. 14, No. 2
    • /
    • pp.35-40
    • /
    • 2021
  • In this paper, we propose a multi-subject-level sentiment analysis model for user reviews that applies the Latent Dirichlet Allocation (LDA) method to user-generated content (UGC). Data were collected from users' online reviews of hotels in major tourist cities around the world, and 30 hotel-related topics were extracted from the full set of reviews using LDA. Six major hotel-related themes (Cleanliness, Location, Rooms, Service, Sleep Quality, and Value) were selected from the extracted topics, and the proposed model evaluated emotions for the sentences corresponding to these six themes in each review. Sentiment was analyzed using a dictionary. The performance of the proposed model was evaluated by comparing the emotional values for each theme in the user reviews with the detailed scores that users gave directly for each hotel attribute. Analysis of the accuracy and recall of the proposed sentiment analysis model indicated that it performed well.
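
The dictionary-based, per-theme scoring described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' model: the theme keywords, the sentiment lexicon, and the function name `theme_sentiment` are all hypothetical, and only three of the six themes are shown.

```python
import re

# Hypothetical theme keywords and sentiment lexicon, for illustration only.
THEME_KEYWORDS = {
    "Cleanliness": {"clean", "dirty", "spotless"},
    "Location": {"location", "station", "downtown"},
    "Service": {"staff", "service", "reception"},
}
LEXICON = {"clean": 1, "spotless": 1, "friendly": 1, "great": 1,
           "dirty": -1, "rude": -1, "noisy": -1, "far": -1}


def theme_sentiment(review: str) -> dict[str, int]:
    """Sum lexicon scores over the sentences that mention each theme."""
    scores: dict[str, int] = {}
    for sentence in re.split(r"[.!?]+", review.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        polarity = sum(LEXICON.get(t, 0) for t in tokens)
        for theme, keywords in THEME_KEYWORDS.items():
            if keywords & set(tokens):  # the sentence mentions this theme
                scores[theme] = scores.get(theme, 0) + polarity
    return scores


review = "The room was dirty. Staff were friendly and the location is great!"
print(theme_sentiment(review))
# {'Cleanliness': -1, 'Location': 2, 'Service': 2}
```

Note that a sentence mentioning two themes contributes its whole polarity to both, which is a simplification; per-theme scores like these could then be compared with the attribute ratings that users gave directly.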

Analyzing Customer Experience in Hotel Services Using Topic Modeling

  • Nguyen, Van-Ho;Ho, Thanh
    • Journal of Information Processing Systems
    • /
    • Vol. 17, No. 3
    • /
    • pp.586-598
    • /
    • 2021
  • Nowadays, users' reviews and feedback on e-commerce sites, stored as text, are a huge source of information for analyzing customers' experience with the goods and services provided by a business. In other words, collecting and analyzing this information is necessary to better understand customer needs. In this study, we first collected a corpus of 99,322 customer comments and opinions in English. From this corpus we chose the best number of topics (K) using Perplexity and Coherence Score measurements as the input parameter for the model. Finally, we conducted an experiment using the latent Dirichlet allocation (LDA) topic model with the selected K to explore the topics. The model revealed hidden topics and high-probability keyword sets that are of interest to users. Applying the empirical results from the model will support decision-making to help businesses improve products and services as well as business management and development in the field of hotel services.
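
As a rough illustration of the perplexity measure mentioned in the abstract above, the sketch below computes perplexity from given document-topic and topic-word distributions. The function name `perplexity` and the toy distributions are assumptions made for this example; in practice the distributions would come from a fitted LDA model, usually evaluated on held-out documents, and the coherence score is not shown here.

```python
import math


def perplexity(docs, doc_topic, topic_word):
    """Return exp(-(sum of log p(w | d)) / N), where
    p(w | d) = sum over topics k of doc_topic[d][k] * topic_word[k][w]."""
    log_lik, n_tokens = 0.0, 0
    for d, words in enumerate(docs):
        for w in words:
            p = sum(theta_k * phi_k[w]
                    for theta_k, phi_k in zip(doc_topic[d], topic_word))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


# Toy corpus: two documents over a four-word vocabulary (word ids 0..3).
docs = [[0, 1], [2, 3]]

# A model that spreads probability uniformly over the vocabulary.
flat = perplexity(docs, [[0.5, 0.5], [0.5, 0.5]], [[0.25] * 4, [0.25] * 4])

# A model whose two topics each explain one document.
sharp = perplexity(docs, [[1.0, 0.0], [0.0, 1.0]],
                   [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])

print(flat, sharp)  # the sharper model has the lower perplexity
```

Lower perplexity indicates a better fit, so one plausible selection rule is to fit models for several candidate values of K and prefer the K that balances low perplexity with high coherence.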

Analyzing User Feedback on a Fan Community Platform 'Weverse': A Text Mining Approach

  • Thi Thao Van Ho;Mi Jin Noh;Yu Na Lee;Yang Sok Kim
    • Smart Media Journal
    • /
    • Vol. 13, No. 6
    • /
    • pp.62-71
    • /
    • 2024
  • This study applies topic modeling to uncover user experience and app issues expressed in users' online reviews of a fan community platform, Weverse, on the Google Play Store. It allows us to identify the features that need to be improved to enhance user experience and those that should be maintained and leveraged to attract more users. To this end, we collect 88,068 first-level English online reviews of Weverse on the Google Play Store with the Google-Play-Scraper tool. After the initial preprocessing step, a dataset of 31,861 online reviews is analyzed using Latent Dirichlet Allocation (LDA) topic modeling with the Gensim library in Python. Five topics are explored in this study, which highlight significant issues such as network connection errors, delayed notifications, and incorrect translation. In addition, the results reveal the app's effectiveness in fostering not only interaction between fans and artists but also relationships among fans. Consequently, the business can strengthen user engagement and loyalty by addressing the identified drawbacks and leveraging the platform for user communication.

Paying Back to Good Deeds: A Text Mining Approach to Explore Don-jjul as Pro-consumption Behavior

  • Hojin Choo;Sue Hyun Lee
    • Asia Marketing Journal
    • /
    • Vol. 26, No. 2
    • /
    • pp.104-128
    • /
    • 2024
  • More consumers are choosing pro-consumption for social change, but scholars know little about why and how consumers engage in pro-consumption behaviors. A newly emerged pro-consumption behavior called "Don-jjul," which appeared during the COVID-19 pandemic in South Korea, refers to compensating businesses that have engaged in altruistic actions by boosting their sales. This study used Latent Dirichlet Allocation (LDA) of topic modeling, sentiment analysis, and in-depth interviews to investigate the perceptions, motivations, and emotions regarding Don-jjul. As a result, the study revealed pro-consumers' perceptions of Don-jjul as "collective pro-consumption for contributing to social well-being." Don-jjul has two main motives: "supporting underdogs with difficulties" and "compensating good businesses economically." We also found two ambivalent emotions evoked by Don-jjul: "respect for good business owners" and "concerns regarding the misuse of Don-jjul." The results contribute to pro-consumption research for social well-being, providing business opportunities for retailers and CSR managers with a deep understanding of pro-consumers.

Topic Extraction and Classification Method Based on Comment Sets

  • Tan, Xiaodong
    • Journal of Information Processing Systems
    • /
    • Vol. 16, No. 2
    • /
    • pp.329-342
    • /
    • 2020
  • In recent years, emotional text classification has been one of the essential research topics in the field of natural language processing. It has been widely used in the sentiment analysis of reviews of commodities such as hotels, and of other commentary corpora. This paper proposes an improved W-LDA (weighted latent Dirichlet allocation) topic model to address the shortcomings of traditional LDA topic models. In the Gibbs sampling of the W-LDA topic model, during topic-word sampling and the calculation of the expected word distribution, an average weighted value is adopted to prevent topic-related words from being submerged by high-frequency words and thus to improve the distinctiveness of the topics. The method further integrates support vector machine classification based on the extracted high-quality document-topic distributions and topic-word vectors. Finally, an efficient integrated method is constructed for the analysis and extraction of emotional words, topic distribution calculations, and sentiment classification. Tests on real teaching evaluation data and on the test set of a public comment set show that the proposed method has distinct advantages over two other typical algorithms in terms of topic differentiation, classification precision, and F1-measure.
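
For orientation, the Gibbs sampling step that the abstract above modifies can be sketched for plain, unweighted LDA. This is a standard collapsed Gibbs sampler, not the paper's W-LDA: the abstract does not specify the weighting scheme, so it is left out, and the function name `lda_gibbs`, the hyperparameter defaults, and the toy corpus are assumptions made for illustration.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA; docs are lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # counts n(d, k)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # counts n(k, w)
    topic_total = [0] * n_topics                              # counts n(k)
    assign = []
    for d, words in enumerate(docs):                          # random initialisation
        z_d = [rng.randrange(n_topics) for _ in words]
        assign.append(z_d)
        for w, k in zip(words, z_d):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(n_iter):
        for d, words in enumerate(docs):
            for i, w in enumerate(words):
                k = assign[d][i]                              # remove the current assignment
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k                              # record the new assignment
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed estimates of the topic-word and document-topic distributions.
    phi = [[(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
            for w in range(vocab_size)] for k in range(n_topics)]
    theta = [[(doc_topic[d][k] + alpha) / (len(docs[d]) + n_topics * alpha)
              for k in range(n_topics)] for d in range(len(docs))]
    return theta, phi


docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 0, 1], [3, 2, 2]]  # toy word-id corpus
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=4)
print(theta[0], phi[0])
```

A weighted variant would presumably change how the word counts enter `weights` and `phi`; the document-topic rows in `theta` are the kind of feature vector that a downstream classifier such as an SVM could consume.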

A Study on Leadership Trends from the Perspective of Domestic Researcher's Using BERTopic and LDA

  • Sung-Su, SHIN;Hoe-Chang, Yang
    • East Asian Journal of Business Economics
    • /
    • Vol. 11, No. 1
    • /
    • pp.53-71
    • /
    • 2023
  • Purpose - This study aims to find clues for the direction of leadership development suited to the current situation by exploring how leadership has been studied from the perspective of domestic researchers, along with organizing the leadership theories that have been studied in various ways. Research design, data, and methodology - A total of 7,425 papers were obtained from the search, and the 5,810 papers with English abstracts were used for analysis. Word frequency analysis, word clouds, and co-occurrence analysis were carried out using Python 3.7. In addition, after topics related to research trends were classified through BERTopic and LDA, trends were identified through dynamic topic modeling and OLS regression analysis. Result - BERTopic yielded 14 topics, such as 'Leadership management and performance' and 'Sports leadership'. Applying LDA to the 1,976 outliers yielded five topics. Trend analysis of topics by year confirmed that five topics, such as 'military police leadership', received relatively more attention. Conclusion - Based on these results, the study proposes research on reinterpreting past leadership studies, research on LMX with an expanded perspective, and research on integrated leadership sub-factors of modern leadership theory.
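
The OLS trend step mentioned in the abstract above can be illustrated with a simple regression of a topic's yearly share on the year. This is only a sketch: the function name `ols_trend` and the yearly shares are made up for illustration, and the abstract does not state which variables the authors actually regressed.

```python
def ols_trend(years, shares):
    """Fit shares = intercept + slope * year by ordinary least squares."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up yearly topic shares, for illustration only.
years = [2018, 2019, 2020, 2021, 2022]
shares = [0.10, 0.12, 0.14, 0.16, 0.18]
slope, intercept = ols_trend(years, shares)
print(slope)  # a positive slope suggests the topic is gaining attention
```

A topic with a clearly positive slope would be read as rising and one with a negative slope as declining; a fuller analysis would also test whether the slope is statistically significant.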

Online Shopping Research Trend Analysis Using BERTopic and LDA

  • Yoon-Hwang, JU;Woo-Ryeong, YANG;Hoe-Chang, YANG
    • 융합경영연구
    • /
    • Vol. 11, No. 1
    • /
    • pp.21-30
    • /
    • 2023
  • Purpose: As one of a series of ongoing studies on the distribution industry, this study identifies research trends in online shopping to date in order to suggest not only how online shopping companies can develop but also how online and offline retailers can coexist and how the distribution industry can grow. Research design, data and methodology: The English abstracts of 645 papers on online shopping registered in scienceON were obtained. BERTopic and LDA were then applied using Python 3.7 to identify which topics were of interest to researchers. Results: Word frequency analysis and co-occurrence analysis showed that studies related to online shopping were frequently conducted on factors such as products, services, and shopping malls. BERTopic yielded five topics, such as 'service quality' and 'sales strategy', and LDA yielded three topics, including 'purchase experience'. It was confirmed that 'Customer Recommendation' and 'Fashion Mall' attracted relatively high interest, and 'Sales Strategy' attracted relatively low interest. Conclusions: More diverse studies on online shopping mall platforms, sales content, and the factors influencing usage are needed to develop the online shopping industry.