• Title/Summary/Keyword: Latent topic model

Tweets analysis using a Dynamic Topic Modeling : Focusing on the 2019 Koreas-US DMZ Summit (트윗의 타임 시퀀스를 활용한 DTM 분석 : 2019 남북미정상회동 이벤트를 중심으로)

  • Ko, EunJi;Choi, SunYoung
    • Journal of the Korea Institute of Information and Communication Engineering, v.25 no.2, pp.308-313, 2021
  • In this study, tweets about the 2019 Koreas-US DMZ Summit were collected in time sequence and analyzed with a sequential topic modeling method, Dynamic Topic Modeling (DTM). In microblogging services such as Twitter, unstructured data mixing news and opinions about a single event is produced simultaneously and on a large scale, and information and reactions share the same message format. Therefore, to grasp a topic trend, the contextual meaning can be found only through pattern analysis that reflects the characteristics of sequential data. After Latent Dirichlet Allocation (LDA) models were evaluated using the topic coherence score, the DTM was computed; 30 topics related to news reports and opinions were derived, and the occurrence probability and keywords of each topic evolved dynamically over time. In conclusion, the study found that DTM is a suitable model for analyzing the trend of integrated topics in a specific event over time.

Convergence Study on Research Topics for Thyroid Cancer in Korea (국내 갑상선암 논문 토픽에 대한 융합연구)

  • Yang, Ji-Yeon
    • Journal of the Korea Convergence Society, v.10 no.2, pp.75-81, 2019
  • The purpose of this study was to perform a convergence study investigating the trend of research topics related to thyroid cancer in Korea. We collected related research papers from DBpia and employed an LDA-based topic model. As a result, we identified four research topics, concerning "Surgery", "Disease aggressiveness", "Survival analysis", and "Well-being of patients". With multinomial logistic regression, we found a significant time trend: the "Surgery"-related topic was popular before 2000, topics regarding "Disease aggressiveness" and "Survival analysis" were frequently addressed in the 2000s, and "Survival analysis" and especially "Well-being of patients" have been pursued since 2010. The findings would serve as a reference guide for research directions. Future work may examine whether the recent change in research topics is observed in other diseases.

Data Analysis of Dropouts of University Students Using Topic Modeling (토픽모델링을 활용한 대학생의 중도탈락 데이터 분석)

  • Jeong, Do-Heon;Park, Ju-Yeon
    • Journal of the Korea Institute of Information and Communication Engineering, v.25 no.1, pp.88-95, 2021
  • This study aims to provide implications for establishing student support policies by empirically analyzing data on university student dropouts. To this end, data on students enrolled in D University after 2017 were sampled and collected. The collected data were analyzed using Latent Dirichlet Allocation (LDA), a topic modeling technique that is a probabilistic model based on text mining. The study identified topics characteristic of dropout students, and the classification performance between groups based on these topics was also excellent. Based on these results, a specific educational support system was proposed to prevent university students from dropping out. This study is meaningful in that it demonstrates the use of text mining techniques in the education field and suggests an education policy based on data analysis.

Reviews Analysis of Korean Clinics Using LDA Topic Modeling (토픽 모델링을 활용한 한의원 리뷰 분석과 마케팅 제언)

  • Kim, Cho-Myong;Jo, A-Ram;Kim, Yang-Kyun
    • The Journal of Korean Medicine, v.43 no.1, pp.73-86, 2022
  • Objectives: In the health care industry, the influence of online reviews is growing. Because medical services are delivered mainly by providers, they have been managed by hospitals and clinics. However, direct promotion of medical services by providers is legally forbidden. For this reason, consumers such as patients and clients search many online reviews for information about hospitals, treatments, prices, and so on. Online reviews can therefore be regarded as indicators of hospital quality, and they should be analyzed for sustainable hospital marketing. Method: Using a Python-based crawler, we collected more than 14,000 reviews written by real patients who had experienced Korean medicine. To extract the most representative words, the reviews were divided into positive and negative; they were then pre-processed to keep only nouns and adjectives, from which TF (Term Frequency), DF (Document Frequency), and TF-IDF (Term Frequency - Inverse Document Frequency) were computed. Finally, to obtain topics from the reviews, the extracted words were analyzed using the LDA (Latent Dirichlet Allocation) method. To avoid overlap, the number of topics was set by Davis visualization. Results and Conclusions: Six and three topics were extracted from the positive and negative reviews, respectively, using the LDA topic model. The main factors making up the topics were 1) response to patients and customers, 2) customized treatment (consultation) and management, and 3) the hospital or clinic environment.
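
The TF, DF, and TF-IDF statistics named in the abstract above can be sketched in a few lines. This is only an illustration, not the authors' pipeline: the sample reviews are invented, the tokens are assumed to be already reduced to nouns and adjectives, and the weighting `tf * log(N / df)` is one common TF-IDF variant chosen here as an assumption.

```python
import math
from collections import Counter

# Hypothetical, already pre-processed reviews (nouns and adjectives only).
reviews = [
    ["kind", "doctor", "clean", "clinic"],
    ["kind", "staff", "long", "wait"],
    ["clean", "clinic", "kind", "kind"],
]

def term_frequency(docs):
    """Total number of occurrences of each term across all documents."""
    return Counter(term for doc in docs for term in doc)

def document_frequency(docs):
    """Number of documents that contain each term at least once."""
    return Counter(term for doc in docs for term in set(doc))

def tf_idf(docs):
    """Per-document weights using tf * log(N / df) (an assumed variant)."""
    n_docs = len(docs)
    df = document_frequency(docs)
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
    return scores

tf = term_frequency(reviews)
df = document_frequency(reviews)
weights = tf_idf(reviews)
```

With this variant, a word that appears in every review (here "kind") gets a weight of zero, which is why TF-IDF tends to surface words that distinguish one review from the others.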

A Study on the Research Topics and Trends in Korean Journal of Remote Sensing: Focusing on Natural & Environmental Disasters (토픽모델링을 이용한 대한원격탐사학회지의 연구주제 분류 및 연구동향 분석: 자연·환경재해 분야를 중심으로)

  • Kim, Taeyong;Park, Hyemin;Heo, Junyong;Yang, Minjune
    • Korean Journal of Remote Sensing, v.37 no.6_2, pp.1869-1880, 2021
  • Korean Journal of Remote Sensing (KJRS), which has led the field of remote sensing and GIS in South Korea for over 37 years, has published interdisciplinary research papers. In this study, we performed topic modeling based on Latent Dirichlet Allocation (LDA), a probabilistic generative model, to identify research topics and trends using 1) all articles and 2) specific articles related to natural and environmental disasters published in KJRS, by analyzing titles, keywords, and abstracts. The LDA results showed that 4 topics ('Polar', 'Hydrosphere', 'Geosphere', and 'Atmosphere') were identified across all articles, and the topic 'Polar' was dominant among them over time (linear slope = 3.51 × 10⁻³, p < 0.05). For the articles related to natural and environmental disasters, the optimal number of topics was 7 ('Marine pollution', 'Air pollution', 'Volcano', 'Wildfire', 'Flood', 'Drought', and 'Heavy rain'), and the topic 'Air pollution' was dominant over time (linear slope = 2.61 × 10⁻³, p < 0.05). The results of this study provide multidisciplinary researchers with the history of, and insight into, natural and environmental disaster research in KJRS.
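
The abstract above reports a linear slope for each topic over time but does not state how it was estimated. A minimal sketch, assuming an ordinary least-squares fit of yearly topic share against year; the function name and the data are hypothetical, and no p-value is computed here.

```python
def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical yearly share of one topic (not taken from the paper).
years = [2016, 2017, 2018, 2019, 2020]
topic_share = [0.10, 0.12, 0.11, 0.15, 0.16]

slope = linear_slope(years, topic_share)  # change in topic share per year
```

A positive slope means the topic's share of the corpus grows over the period; comparing slopes across topics is one way to decide which topic is gaining ground.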

Topic Model Analysis of Research Themes and Trends in the Journal of Economic and Environmental Geology (기계학습 기반 토픽모델링을 이용한 학술지 "자원환경지질"의 연구주제 분류 및 연구동향 분석)

  • Kim, Taeyong;Park, Hyemin;Heo, Junyong;Yang, Minjune
    • Economic and Environmental Geology, v.54 no.3, pp.353-364, 2021
  • Since the mid-twentieth century, geology has gradually evolved into an interdisciplinary field in South Korea. The journal Economic and Environmental Geology (EEG) has a long history of over 52 years and has published interdisciplinary articles based on geology. In this study, we performed a literature review using topic modeling based on Latent Dirichlet Allocation (LDA), an unsupervised machine learning model, to identify geological topics, historical trends (classic topics and emerging topics), and associations by analyzing the titles, keywords, and abstracts of 2,571 publications in EEG during 1968-2020. The results showed that 8 topics ('petrology and geochemistry', 'hydrology and hydrogeology', 'economic geology', 'volcanology', 'soil contaminant and remediation', 'general and structural geology', 'geophysics and geophysical exploration', and 'clay mineral') were identified in EEG. Before 1994, classic topics ('economic geology', 'volcanology', and 'general and structural geology') were the dominant research trends. After 1994, emerging topics ('hydrology and hydrogeology', 'soil contaminant and remediation', and 'clay mineral') arose, and their share has gradually increased. The association analysis showed that EEG tends to be more comprehensive, with 'economic geology' as its base. Our results provide an understanding of how geological research topics branch out and merge with other fields, using a useful literature review tool for geological research in South Korea.

Learning Similarity with Probabilistic Latent Semantic Analysis for Image Retrieval

  • Li, Xiong;Lv, Qi;Huang, Wenting
    • KSII Transactions on Internet and Information Systems (TIIS), v.9 no.4, pp.1424-1440, 2015
  • Searching for the intended images among a large number of candidates is a challenging problem. Content-based image retrieval (CBIR) is the most promising way to tackle this problem, where the most important topic is measuring the similarity of images so as to cover variance in shape, color, pose, illumination, etc. While previous works have made significant progress, their ability to adapt to a dataset has not been fully explored. In this paper, we propose a similarity learning method based on a probabilistic generative model, namely probabilistic latent semantic analysis (PLSA). The method first derives a Fisher kernel, a function over the parameters and variables, based on PLSA. Then, the parameters are determined by simultaneously maximizing the log-likelihood function of PLSA and the retrieval performance over the training dataset. The main advantages of this work are twofold: (1) deriving a similarity measure based on PLSA that fully exploits the data distribution and Bayes inference; (2) learning model parameters by simultaneously maximizing the fit of the model to the data and the retrieval performance. The proposed method (PLSA-FK) is empirically evaluated over three datasets, and the results exhibit promising performance.

WV-BTM: A Technique on Improving Accuracy of Topic Model for Short Texts in SNS (WV-BTM: SNS 단문의 주제 분석을 위한 토픽 모델 정확도 개선 기법)

  • Song, Ae-Rin;Park, Young-Ho
    • Journal of Digital Contents Society, v.19 no.1, pp.51-58, 2018
  • As the number of SNS users and the volume of SNS data have increased explosively, research based on SNS big data has become active. In social mining, Latent Dirichlet Allocation (LDA), a typical topic modeling technique, is used to identify the similarity of texts in large volumes of unclassified SNS text data and to extract trends from them. However, LDA has difficulty deducing high-level topics from short-sentence data because of the semantic sparsity caused by infrequent word occurrences. The BTM study mitigated this limitation of LDA by using combinations of two words. However, BTM is also limited in that it cannot calculate weights that consider the relation to each topic, because it is influenced more by the high-frequency word of each combined pair. In this paper, we propose a technique that improves the accuracy of the existing BTM by reflecting the semantic relations between words.
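
The abstract above describes BTM as working on combinations of two words. A minimal sketch of that pairing step only, assuming unordered word pairs taken within a single short post; the sample posts and the helper name `biterms` are hypothetical, and neither the topic inference nor the semantic weighting proposed in WV-BTM is shown.

```python
from collections import Counter
from itertools import combinations

def biterms(tokens):
    """All unordered word pairs that co-occur in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]

# Hypothetical tokenized SNS posts.
posts = [
    ["apple", "phone", "battery"],
    ["phone", "battery", "life"],
]

# Pair counts pooled over the whole corpus rather than per document,
# which is what eases the sparsity of very short texts.
counts = Counter(pair for post in posts for pair in biterms(post))
```

Here the pair ("battery", "phone") is counted twice because it occurs in both posts, even though each post on its own is too short to give reliable word statistics.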

Analysis of Research Topics and Trends on COVID-19 in Korea Using Latent Dirichlet Allocation (LDA)

  • Heo, Seong-Min;Yang, Ji-Yeon
    • Journal of the Korea Society of Computer and Information, v.25 no.12, pp.83-91, 2020
  • This study aims to identify research topics and examine the trend of COVID-19-related papers on DBpia. Applying latent Dirichlet allocation (LDA), we extracted seven research topics, concerning "International Dynamics", "Technology & Security", "Psychological Impact", "Biomedical-Related", "Economic Impact", "Online Education", and "Religion-Related". In addition, we used the multinomial logistic model to examine the trend of research topics. We found that the papers mainly covered topics related to "International Dynamics" and "Biomedical-Related" before June 2020, but the topics have become diverse since then. In particular, topics regarding "Economic Impact", "Online Education", and "Psychological Impact" have drawn increased attention from researchers. The findings would provide a guideline for collaboration in COVID-19-related research and could serve as a reference work for active research.

Modeling Topic Extraction-based Sentiment Analysis Based on User Reviews

  • Kim, Tae-Yeun
    • Journal of Integrative Natural Science, v.14 no.2, pp.35-40, 2021
  • In this paper, we propose a multi-subject-level sentiment analysis model for user reviews that applies the Latent Dirichlet Allocation (LDA) method to user-generated content (UGC). Data were collected from users' online reviews of hotels in major tourist cities around the world, and 30 hotel-related topics were extracted from the entire set of user reviews using the LDA technique. Six major hotel-related themes (Cleanliness, Location, Rooms, Service, Sleep Quality, and Value) were selected from the extracted themes, and in the proposed model, sentiment was evaluated for the sentences corresponding to the six themes in each user review. Sentiment was analyzed using a dictionary. In addition, the performance of the proposed sentiment analysis model was evaluated by comparing the sentiment values for each theme in the user reviews with the detailed scores that users gave directly to each hotel attribute. Analysis of the accuracy and recall of the proposed sentiment analysis model indicated that its efficiency was high.
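
The dictionary-based, per-theme scoring and the accuracy and recall evaluation described in the abstract above might look roughly like the sketch below. The sentiment dictionary, theme keywords, sample review, and labels are all hypothetical, and the scoring rule (summing word polarities in each sentence that mentions a theme) is an assumption rather than the paper's method.

```python
# Hypothetical sentiment dictionary and theme keyword sets.
SENTIMENT = {"clean": 1, "friendly": 1, "great": 1,
             "dirty": -1, "noisy": -1, "rude": -1}
THEMES = {
    "Cleanliness": {"clean", "dirty", "bathroom"},
    "Service": {"staff", "friendly", "rude"},
    "Sleep Quality": {"noisy", "bed", "quiet"},
}

def theme_sentiment(review):
    """Sum dictionary polarities per theme, sentence by sentence."""
    scores = {}
    for sentence in review.lower().split("."):
        words = set(sentence.split())  # each distinct word counts once
        polarity = sum(SENTIMENT.get(w, 0) for w in words)
        for theme, keywords in THEMES.items():
            if words & keywords:  # the sentence mentions this theme
                scores[theme] = scores.get(theme, 0) + polarity
    return scores

def accuracy_and_recall(predicted, actual):
    """Accuracy and recall for boolean 'positive' labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

scores = theme_sentiment("The bathroom was dirty. Staff were friendly and great.")
# Compare predicted polarity with hypothetical user-given ratings.
acc, rec = accuracy_and_recall([True, False, True, True],
                               [True, False, False, True])
```

In this toy run the review scores negative on Cleanliness and positive on Service, mirroring the idea of judging each hotel attribute separately instead of assigning one polarity to the whole review.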