• Title/Summary/Keyword: 토픽분석 (topic analysis)

Tweets analysis using a Dynamic Topic Modeling : Focusing on the 2019 Koreas-US DMZ Summit (트윗의 타임 시퀀스를 활용한 DTM 분석 : 2019 남북미정상회동 이벤트를 중심으로)

  • Ko, EunJi;Choi, SunYoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • v.25 no.2
    • pp.308-313
    • 2021
  • In this study, tweets about the 2019 Koreas-US DMZ Summit were collected along with their time sequence and analyzed with a sequential topic modeling method, Dynamic Topic Modeling (DTM). In microblogging services such as Twitter, unstructured data mixing news and opinion about a single event is produced simultaneously and on a large scale, and information and reactions share the same message format. Therefore, to grasp a topic trend, the contextual meaning can be found only through pattern analysis that reflects the characteristics of sequential data. The DTM was computed after obtaining topic coherence scores and evaluating Latent Dirichlet Allocation (LDA); as a result, 30 topics related to news reports and opinions were derived, and the probability of occurrence and the keywords of each topic evolved dynamically. In conclusion, the study found that DTM is a suitable model for analyzing the trend of integrated topics in a specific event over time.
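Sequential topic models of this kind need the documents in time order plus a count of documents per time slice. The abstract does not say how the tweets were binned or which software was used, so the following is only a sketch: the daily bin width, the timestamp format, the helper name `build_time_slices`, and the toy tweets are all assumptions made for illustration. The resulting list has the shape expected by, for example, the `time_slice` argument of gensim's `LdaSeqModel`.

```python
from collections import Counter
from datetime import datetime


def build_time_slices(tweets, fmt="%Y-%m-%d %H:%M"):
    """Sort (timestamp, text) pairs and count documents per calendar day.

    Returns the texts in chronological order, the ordered list of days,
    and the number of documents falling in each day.
    """
    parsed = sorted((datetime.strptime(ts, fmt), text) for ts, text in tweets)
    counts = Counter(ts.date() for ts, _ in parsed)
    days = sorted(counts)
    ordered_texts = [text for _, text in parsed]
    time_slice = [counts[day] for day in days]
    return ordered_texts, days, time_slice


# Invented toy tweets, not data from the study.
tweets = [
    ("2019-06-30 15:46", "leaders meet at the DMZ"),
    ("2019-06-29 09:10", "summit invitation posted"),
    ("2019-06-30 16:02", "handshake at the border"),
    ("2019-07-01 08:30", "reactions to the meeting"),
]

texts, days, time_slice = build_time_slices(tweets)
print(time_slice)  # [1, 2, 1]
```

Daily bins are only one choice; for a fast-moving event, hourly bins may be more appropriate, which would only change the key used in the `Counter`.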

An Exploratory Research Trends Analysis in Journal of the Korea Contents Association using Topic Modeling (토픽 모델링을 활용한 한국콘텐츠학회 논문지 연구 동향 탐색)

  • Seok, Hye-Eun;Kim, Soo-Young;Lee, Yeon-Su;Cho, Hyun-Young;Lee, Soo-Kyoung;Kim, Kyoung-Hwa
    • The Journal of the Korea Contents Association
    • v.21 no.12
    • pp.95-106
    • 2021
  • The purpose of this study is to derive major topics in content R&D and provide directions for academic development by exploring research trends over the past 20 years, applying topic modeling to 9,858 papers published in the Journal of the Korea Contents Association. To secure the reliability and validity of the extracted topics, both quantitative and qualitative evaluation techniques were applied step by step and repeated until a corpus at a level agreed upon by the researchers was generated, and the detailed analysis procedures are presented accordingly. The analysis extracted 8 core topics. This shows that the Korea Contents Association publishes convergent, multidisciplinary research papers in various fields without being limited to a specific academic field. Before 2012, the proportion of topics in engineering and technology was relatively high, while after 2012 the proportion of topics in the social sciences was relatively high. Specifically, the topic of 'social welfare' showed a fourfold increase in the second half compared to the first half. Through topic-specific trend analysis, we focused on the points in time at which an inflection point of the trend line appeared, explored the external variables that affected the research trend of the topic, and identified the relationship between the topic and the external variable. It is hoped that the results of this study can provide implications for active discussions in domestic content-related R&D and industrial fields.

Trend Analysis of Research Related to Personality of University Students Through Network Analysis (네트워크 분석을 통한 대학생 인성 관련 연구의 동향 분석)

  • Kim, Sei-Kyung
    • The Journal of the Korea Contents Association
    • v.21 no.12
    • pp.47-56
    • 2021
  • The purpose of this study is to use network analysis to identify trends in studies on the personality of university students and to provide implications for future research directions. To this end, 194 papers on the personality of university students published in Korean scholarly journals were analyzed. First, research began to be published in 2004, increased slightly in 2012, rose continuously from 2015, peaked in 2017, and has since shown a downward trend. Second, the main keywords in the centrality analysis were 'society' and 'cultivation'. Third, keywords concerned the cognitive side and individual dimension of personality in the first period (2004-2010), the social dimension and emotional side of personality in the second period (2011-2015), and the social level together with the cognitive, emotional, and behavioral aspects of personality in the third period (2016-2020). Fourth, Topic 2 consisted of the keywords ability, life, interpersonal, satisfaction, and adaptation, and Topic 1 consisted of competence, morality, citizens, society, and practice. Fifth, Topic 4 alone appeared in the first period, Topic 1 followed by Topic 2 in the second period, and Topic 2 followed by Topic 1 in the third period.

A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS (LDA, Top2Vec, BERTopic 모형의 토픽모델링 비교 연구 - 국외 문헌정보학 분야를 중심으로 -)

  • Yong-Gu Lee;SeonWook Kim
    • Journal of the Korean Society for Library and Information Science
    • v.58 no.1
    • pp.5-30
    • 2024
  • The purpose of this study is to extract topics from experimental data using the topic modeling methods(LDA, Top2Vec, and BERTopic) and compare the characteristics and differences between these models. The experimental data consist of 55,442 papers published in 85 academic journals in the field of library and information science, which are indexed in the Web of Science(WoS). The experimental process was as follows: The first topic modeling results were obtained using the default parameters for each model, and the second topic modeling results were obtained by setting the same optimal number of topics for each model. In the first stage of topic modeling, LDA, Top2Vec, and BERTopic models generated significantly different numbers of topics(100, 350, and 550, respectively). Top2Vec and BERTopic models seemed to divide the topics approximately three to five times more finely than the LDA model. There were substantial differences among the models in terms of the average and standard deviation of documents per topic. The LDA model assigned many documents to a relatively small number of topics, while the BERTopic model showed the opposite trend. In the second stage of topic modeling, generating the same 25 topics for all models, the Top2Vec model tended to assign more documents on average per topic and showed small deviations between topics, resulting in even distribution of the 25 topics. When comparing the creation of similar topics between models, LDA and Top2Vec models generated 18 similar topics(72%) out of 25. This high percentage suggests that the Top2Vec model is more similar to the LDA model. For a more comprehensive comparison analysis, expert evaluation is necessary to determine whether the documents assigned to each topic in the topic modeling results are thematically accurate.
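The comparison of "average and standard deviation of documents per topic" can be illustrated with a small sketch. It assumes each model's output has already been reduced to one topic label per document; the function name `docs_per_topic_stats` and the two toy label lists are hypothetical, and the numbers are not results from the study.

```python
from collections import Counter
from statistics import mean, pstdev


def docs_per_topic_stats(assignments):
    """Return (number of topics, mean docs per topic, std. dev. of docs per topic).

    `assignments` holds one topic label per document. Topics that received
    no documents do not appear in the list and are therefore not counted.
    """
    sizes = Counter(assignments)
    counts = list(sizes.values())
    return len(counts), mean(counts), pstdev(counts)


# Invented label lists for 12 documents under two hypothetical models.
coarse_model = [0] * 6 + [1] * 5 + [2] * 1           # few topics, uneven sizes
fine_model = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]    # more topics, even sizes

print(docs_per_topic_stats(coarse_model))
print(docs_per_topic_stats(fine_model))
```

A model that splits the same corpus into more topics necessarily has a lower mean, so the standard deviation is the more informative of the two figures when judging how evenly documents are spread.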

A Topic Analysis of Abstracts in Journal of Korean Data Analysis Society (한국자료분석학회지에 대한 토픽분석)

  • Kang, Changwan;Kim, Kyu Kon;Choi, Seungbae
    • Journal of the Korean Data Analysis Society
    • v.20 no.6
    • pp.2907-2915
    • 2018
  • The Journal of the Korean Data Analysis Society, founded in 1998, has played the role of a major applications journal. In this study, we examined the objective of the journal by reviewing 10 years of its abstracts. Abstract data were crawled from the online journal site (kdas.jems.or.kr) and analyzed with a topic model. As a result, we found 18 topics across 2680 abstracts, covering subjects such as nursing, marketing, economics, regression, factor analysis, data mining, and statistical inference. Topic1 (regression) was the most frequent, with 460 documents, which confirms the usefulness of regression in the applied sciences. We confirmed 10 significant association rules using Fisher's exact test. To explore the trend of topics, we also conducted the topic analysis for two periods, 2006-2011 and 2012-2016. We found that control studies became more frequent than survey studies over time, and that regression and factor analysis were frequent regardless of period.
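As a rough illustration of testing an association between two topics with Fisher's exact test: the abstract does not describe how the rules were mined or how the contingency tables were built, so the layout below (documents cross-classified by whether they carry topic X and topic Y) is an assumption, as are the function name and the toy counts. The p-value is the standard two-sided one, summing the hypergeometric probabilities of all tables no more likely than the observed table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = topic X present / absent,
    columns = topic Y present / absent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties are not lost to rounding.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Invented counts: 8 documents cross-classified by two topics.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```

With many candidate topic pairs, the resulting p-values would normally need a multiple-comparison correction before any rule is called significant.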

Topic-Network based Topic Shift Detection on Twitter (트위터 데이터를 이용한 네트워크 기반 토픽 변화 추적 연구)

  • Jin, Seol A;Heo, Go Eun;Jeong, Yoo Kyung;Song, Min
    • Journal of the Korean Society for Information Management
    • v.30 no.1
    • pp.285-302
    • 2013
  • This study identified topic shifts and patterns over time by analyzing a large volume of Twitter data, which is characterized by high accessibility and brevity. First, we extracted keywords for a certain product and used them to construct a topic network, which allows an intuitive understanding of the keywords associated with topics through nodes and edges derived from co-word analysis. We conducted temporal analysis of term co-occurrence as well as topic modeling to examine the results of the network analysis. In addition, comparing topic shifts on Twitter with the corresponding retrieval results from newspapers confirmed that Twitter responds immediately to news media and spreads negative issues quickly. Our findings suggest that companies could use the proposed technique to identify negative public opinion as quickly as possible and apply it to timely decision making and effective responses to their customers.
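The co-word step can be sketched as counting, per time period, how often two keywords appear in the same tweet; those counts then serve as edge weights of the network. This is a minimal sketch under assumptions: the abstract does not give the tokenization, the period length, or any thresholding, and the function name, keyword set, and toy tweets are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def coword_edges_by_period(docs, keywords):
    """Count keyword co-occurrences per period.

    `docs` is an iterable of (period, tokens) pairs. Each unordered keyword
    pair found in the same document adds 1 to that period's edge weight.
    """
    keywords = set(keywords)
    edges = defaultdict(Counter)
    for period, tokens in docs:
        present = sorted(keywords & set(tokens))  # sorted so pairs are canonical
        for pair in combinations(present, 2):
            edges[period][pair] += 1
    return edges


# Invented, already-tokenized toy tweets about a hypothetical product.
docs = [
    ("2012-01", ["phone", "battery", "price"]),
    ("2012-01", ["phone", "battery"]),
    ("2012-02", ["phone", "recall", "battery"]),
    ("2012-02", ["recall", "phone"]),
]
edges = coword_edges_by_period(docs, {"phone", "battery", "price", "recall"})

for period in sorted(edges):
    print(period, edges[period].most_common(1))
```

Comparing the strongest edges from one period to the next (here the pair with `recall` gaining weight in the second period) is one simple way to surface a topic shift.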

Comparative Analysis of the Keywords in Taekwondo News Articles by Year: Applying Topic Modeling Method (태권도 뉴스기사의 연도별 주제어 비교분석: 토픽모델링 적용)

  • Jeon, Minsoo;Lim, Hyosung
    • Journal of Digital Convergence
    • v.19 no.11
    • pp.575-583
    • 2021
  • This study aims to analyze Taekwondo trends in news articles by year by applying topic modeling. To examine Taekwondo trends through media reports, news articles and articles from Taekwondo-specialized media were collected through Big Kinds of the Korea Press Foundation. The search period was divided into three sections: before 2000, 2001-2010, and 2011-2020. A total of 12,124 items were selected as research data. After pre-processing, topic analysis was performed using the LDA algorithm, with Python 3 used for all analyses. First, in the analysis of media articles by year, 'World' was the most common keyword before 2000, followed by 'South and North Korea' and 'Olympic'. From 2001 to 2010, 'World' was the most common keyword, followed by 'Association' and 'World Taekwondo'. From 2011 to 2020, 'World', 'Demonstration', and 'Kukkiwon' were the most common keywords, in that order. Second, topic modeling of news articles before 2000 yielded two topics: Topic 1 was named 'South-North Korea sports exchange' and Topic 2 'Adoption of Olympic demonstration events'. Third, topic modeling of news articles from 2001 to 2010 yielded three topics: Topic 1 was named 'Taekwondo Demonstration Performance and Corruption', Topic 2 'Muju Taekwondo Park Creation', and Topic 3 'World Taekwondo Festival'. Fourth, topic modeling of news articles from 2011 to 2020 yielded three topics: Topic 1 was named 'Successful Hosting of the 2018 Pyeongchang Winter Olympics', Topic 2 'North-South Korea Taekwondo Joint Demonstration Performance', and Topic 3 '2017 Muju World Taekwondo Championships'.

Analysis of Potential Bugs using Topic Model of Open Source Project (오픈소스 프로젝트의 토픽 모델링을 통한 잠재결함 분석 연구)

  • Lee, Jung-Been;Lee, Taek;In, Hoh Peter
    • Proceedings of the Korea Information Processing Society Conference
    • 2017.04a
    • pp.551-552
    • 2017
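  • A single project contains source code with diverse functions and roles. However, existing static analysis tools do not take this characteristic into account and apply the same detection policy and priorities to all source code. In this study, source code collected from open source projects was classified into specific topics using topic modeling, and the characteristics of the high-impact potential bugs within the code belonging to each topic were analyzed. Based on these results, developers can be given guidance on which potential bugs to prioritize according to the characteristics of the source code under development.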

A Study on the Analysis of R&D Trends and the Development of Logic Models for Autonomous Vehicles (자율주행자동차 R&D 동향분석과 논리모형 개발에 대한 연구)

  • Kim, Gil-Lae
    • Journal of Digital Convergence
    • v.19 no.5
    • pp.31-39
    • 2021
  • This study collected 1,870 English news articles related to research and development of autonomous vehicles in order to identify various issues emerging in the research and development process of autonomous vehicles at home and abroad, and conducted topic modeling after data pre-processing. As a result of topic modeling, we extracted 20 topics, and we performed naming operations for topics and interpreted their meanings. A logical model for autonomous vehicle research and development projects was presented in response to the R&D process of input, activity, output, and outcome of derived topics. The analysis results of this study will be used as basic data to accurately determine the progress of domestic and foreign self-driving car research and development projects and prepare for the rapidly changing technology development.

Semantic Visualization of Dynamic Topic Modeling (다이내믹 토픽 모델링의 의미적 시각화 방법론)

  • Yeon, Jinwook;Boo, Hyunkyung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • v.28 no.1
    • pp.131-154
    • 2022
  • Research on unstructured data analysis has recently been actively conducted alongside the development of information and communication technology. In particular, topic modeling is a representative technique for discovering core topics from massive text data. In the early stages of topic modeling, most studies focused only on topic discovery. As the field matured, studies on how topics change over time began to be carried out. Accordingly, interest is also growing in dynamic topic modeling, which handles changes in the keywords constituting a topic. Dynamic topic modeling identifies major topics from the data of the initial period and tracks the change and flow of topics by using the topic information of the previous period to derive topics in subsequent periods. However, the results of dynamic topic modeling are very difficult to understand and interpret: traditional results simply reveal changes in keywords and their rankings, which is insufficient to represent how the meaning of a topic has changed. Therefore, in this study, we propose a method to visualize topics by period by reflecting the meaning of the keywords in each topic. In addition, we propose a method that can intuitively interpret changes in topics and relationships among topics. The detailed method of visualizing topics by period is as follows. In the first step, dynamic topic modeling is performed to derive the top keywords of each period and their weights from text data. In the second step, we derive vectors for the top keywords of each topic from a pre-trained word embedding model and perform dimension reduction on the extracted vectors. We then form a semantic vector for each topic by calculating the weighted sum of its keyword vectors, using the topic weight of each keyword. In the third step, we visualize the semantic vector of each topic using matplotlib and analyze the relationships among topics based on the visualized result. The change of a topic can be interpreted as follows. From the result of dynamic topic modeling, we identify the top 5 rising keywords and the top 5 descending keywords for each period to show the change of the topic. Many existing topic visualization studies visualize the keywords of each topic, whereas our approach attempts to visualize each topic itself. To evaluate the practical applicability of the proposed methodology, we performed an experiment on 1,847 abstracts of artificial intelligence-related papers, divided into three periods (2016-2017, 2018-2019, 2020-2021). We selected seven topics based on the consistency score and used a pre-trained Word2vec word embedding model trained on Wikipedia, an Internet encyclopedia. Based on the proposed methodology, we generated a semantic vector for each topic and, by reflecting the meaning of keywords, visualized and interpreted the themes by period. Through these experiments, we confirmed that the rise and fall of a keyword's topic weight can be used to interpret the semantic change of the corresponding topic and to grasp the relationships among topics. In this study, to overcome the limitations of dynamic topic modeling results, we used word embedding and dimension reduction techniques to visualize topics by period. The results are meaningful in that they broaden the scope of topic understanding through the visualization of dynamic topic modeling results. In addition, the study makes an academic contribution by laying the foundation for follow-up studies that use various word embedding and dimensionality reduction techniques to improve the performance of the proposed methodology.
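The semantic-vector step and the rising/descending keyword step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the keyword vectors are taken to be already dimension-reduced to 2-D, "rising" and "descending" are interpreted as the change in a keyword's topic weight between two periods (the abstract does not say whether weight or rank is compared), and the function names, vectors, and weights are all invented.

```python
def topic_semantic_vector(keyword_weights, embeddings):
    """Weighted sum of keyword vectors, using each keyword's topic weight.

    Keywords without an embedding are skipped.
    """
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for word, weight in keyword_weights.items():
        if word not in embeddings:
            continue
        for i, x in enumerate(embeddings[word]):
            vec[i] += weight * x
    return vec


def rising_and_descending(prev, curr, k=5):
    """Top-k keywords whose topic weight rose or fell between two periods."""
    words = set(prev) | set(curr)
    delta = {w: curr.get(w, 0.0) - prev.get(w, 0.0) for w in words}
    ranked = sorted(delta, key=lambda w: (-delta[w], w))  # largest rise first
    rising = [w for w in ranked if delta[w] > 0][:k]
    descending = [w for w in reversed(ranked) if delta[w] < 0][:k]
    return rising, descending


# Invented 2-D keyword vectors (assumed already dimension-reduced).
embeddings = {"neural": (1.0, 0.0), "network": (0.0, 1.0), "robot": (1.0, 1.0)}
# Invented topic weights for one topic in two consecutive periods.
period_1 = {"neural": 0.5, "network": 0.25, "robot": 0.25}
period_2 = {"neural": 0.25, "network": 0.25, "robot": 0.5}

v1 = topic_semantic_vector(period_1, embeddings)
v2 = topic_semantic_vector(period_2, embeddings)
rising, descending = rising_and_descending(period_1, period_2)
print(v1, v2)              # [0.75, 0.5] [0.75, 0.75]
print(rising, descending)  # ['robot'] ['neural']
```

Plotting `v1` and `v2` as two points, for instance with matplotlib as the abstract mentions, would show the topic drifting toward the keyword whose weight rose.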