• Title/Summary/Keyword: topic model (토픽모델)

A Reply Graph-based Social Mining Method with Topic Modeling (토픽 모델링을 이용한 댓글 그래프 기반 소셜 마이닝 기법)

  • Lee, Sang Yeon;Lee, Keon Myung
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.6 / pp.640-645 / 2014
  • Many people use social network services to communicate, share information, and build social relationships with others on the Internet. Twitter is a representative example, where millions of tweets are posted each day and a huge volume of data has accumulated. Social mining, which extracts meaningful information from such massive data, has been studied intensively. On Twitter, content is easily delivered and retweeted through following-follower relationships. Topic modeling of tweet data is a good tool for issue tracking in social media. However, the LDA topic model, a typical topic modeling method, is ineffective for short textual data. To overcome the restrictions imposed by short tweets, we introduce the notion of a reply graph, whose nodes correspond to users and whose edges correspond to the existence of reply and retweet messages between users. This paper introduces a topic modeling method that uses the reply graph to reduce the number of short documents and to improve the quality of mining results. The proposed model uses LDA as the topic modeling framework for tweet issue tracking. Experimental results of the proposed method are presented for a collection of Twitter data covering 7 days.
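
    The abstract above does not say how the reply graph reduces the number of short documents. As a minimal sketch of one plausible reading, the Python below merges the tweets of users who are linked by replies or retweets into a single pseudo-document, which could then be passed to an LDA implementation. Pooling by connected component, the function names, and the sample data are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def build_reply_graph(interactions):
    """Undirected graph: users are nodes; an edge means a reply or retweet exists."""
    graph = defaultdict(set)
    for src, dst in interactions:
        graph[src].add(dst)
        graph[dst].add(src)
    return graph


def connected_components(graph, users):
    """Group users into connected components with an iterative depth-first search."""
    seen, components = set(), []
    for user in sorted(users):
        if user in seen:
            continue
        stack, component = [user], []
        seen.add(user)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


def pool_tweets(tweets, interactions):
    """Merge the tweets of all users in one component into one pseudo-document.

    Assumption: component-level pooling stands in for the paper's
    (unspecified) aggregation rule.
    """
    graph = build_reply_graph(interactions)
    by_user = defaultdict(list)
    for user, text in tweets:
        by_user[user].append(text)
    users = set(by_user)
    return [
        " ".join(text for user in component for text in by_user[user])
        for component in connected_components(graph, users)
    ]


# Hypothetical sample data: (user, tweet text) and (replier, original author).
tweets = [
    ("ann", "election debate tonight"),
    ("bob", "who won the debate"),
    ("cat", "new phone launch"),
    ("ann", "polls open tomorrow"),
]
interactions = [("bob", "ann")]
pooled = pool_tweets(tweets, interactions)
```

    Here four short tweets collapse into two longer documents, one for the ann/bob conversation and one for cat, which gives a topic model more word co-occurrence evidence per document.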

Investigation of Research Trends in the D(Data)·N(Network)·A(A.I) Field Using the Dynamic Topic Model (다이나믹 토픽 모델을 활용한 D(Data)·N(Network)·A(A.I) 중심의 연구동향 분석)

  • Wo, Chang Woo;Lee, Jong Yun
    • Journal of the Korea Convergence Society / v.11 no.9 / pp.21-29 / 2020
  • Topic modeling research, a methodology for deducing keywords from literature, has become active with the explosion of data accompanying the transition to a digital society. The objective of this research is to investigate research trends in the D.N.A. (Data, Network, Artificial Intelligence) field using the DTM (Dynamic Topic Model). The DTM was applied to 1,519 research projects with SW·A.I technology classifications among ICT (Information and Communication Technology) projects over the 6 years from 2015 to 2020. As a result, the D.N.A. technology keywords (Big data, Cloud, Artificial Intelligence) and the extended keywords (Unstructured, Edge Computing, Learning, Recognition) appeared every year, from which it can be inferred that these technologies are being researched broadly across other projects. Finally, the results of this paper are expected to be useful for future policy and R&D planning and for corporate technology and marketing strategy.

Examining Suicide Tendency Social Media Texts by Deep Learning and Topic Modeling Techniques (딥러닝 및 토픽모델링 기법을 활용한 소셜 미디어의 자살 경향 문헌 판별 및 분석)

  • Ko, Young Soo;Lee, Ju Hee;Song, Min
    • Journal of the Korean BIBLIA Society for library and Information Science / v.32 no.3 / pp.247-264 / 2021
  • This study aims to build a deep learning-based model that classifies suicide tendency, using a suicide corpus constructed for the study. In addition, to analyze suicide factors, the study divided the suicide-tendency corpus into detailed topics using topic modeling, an analysis technique that automatically extracts topics. For this purpose, 2,011 documents of a suicide-related corpus collected from the social media service Naver Knowledge iN were manually annotated as suicide-tendency or non-suicide-tendency documents, based on the suicide prevention education manual issued by the Central Suicide Prevention Center. Using the annotated corpus, we then evaluated the classification performance of deep learning models (LSTM, BERT, ELECTRA). In addition, LDA, one of the topic modeling techniques, was used to identify suicide factors by grouping the documents by theme, and co-word analysis and visualization were conducted to analyze the factors in depth.

A Topic Analysis of Abstracts in Journal of Korean Data Analysis Society (한국자료분석학회지에 대한 토픽분석)

  • Kang, Changwan;Kim, Kyu Kon;Choi, Seungbae
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2907-2915 / 2018
  • The Journal of the Korean Data Analysis Society, founded in 1998, has played the role of a major application-oriented journal. In this study, we examined the objective of the journal by analyzing its abstracts over 10 years. Abstract data were crawled from the online journal site (kdas.jems.or.kr) and analyzed with a topic model. As a result, we found 18 topics in 2,680 abstracts, covering subjects such as nursing, marketing, economics, regression, factor analysis, data mining, and statistical inference. Topic 1 (regression) was the most frequent with 460 documents, which confirms the usefulness of regression in the applied sciences. We confirmed 10 significant association rules using Fisher's exact test. To explore topic trends, we also conducted the topic analysis for two periods, 2006-2011 and 2012-2016. We found that control studies became more frequent than survey studies over time, and that regression and factor analysis were frequent regardless of period.

Semantic Visualization of Dynamic Topic Modeling (다이내믹 토픽 모델링의 의미적 시각화 방법론)

  • Yeon, Jinwook;Boo, Hyunkyung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.131-154 / 2022
  • Recently, research on unstructured data analysis has been actively conducted with the development of information and communication technology. In particular, topic modeling is a representative technique for discovering core topics in massive text data. Early topic modeling studies focused only on topic discovery. As the field matured, studies on how topics change over time began to appear. Accordingly, interest is also growing in dynamic topic modeling, which handles changes in the keywords that constitute a topic. Dynamic topic modeling identifies major topics from the data of the initial period and tracks the change and flow of topics by using the topic information of the previous period to derive topics in subsequent periods. However, the results of dynamic topic modeling are very difficult to understand and interpret. Traditional dynamic topic modeling results simply reveal changes in keywords and their rankings, which is insufficient to show how the meaning of a topic has changed. Therefore, in this study, we propose a method to visualize topics by period that reflects the meaning of the keywords in each topic. In addition, we propose a method for intuitively interpreting changes in topics and the relationships among topics. The detailed method of visualizing topics by period is as follows. In the first step, dynamic topic modeling is performed to derive the top keywords of each period and their weights from the text data. In the second step, we obtain vectors for the top keywords of each topic from a pre-trained word embedding model and perform dimension reduction on the extracted vectors. We then form a semantic vector for each topic by computing the weighted sum of its keyword vectors, using the topic weight of each keyword. In the third step, we visualize the semantic vector of each topic using matplotlib and analyze the relationships among topics based on the visualized result. Changes in a topic can be interpreted as follows. From the result of dynamic topic modeling, we identify the top 5 rising keywords and the top 5 descending keywords for each period to show how the topic changes. Many existing topic visualization studies visualize the keywords of each topic, whereas our approach attempts to visualize each topic itself. To evaluate the practical applicability of the proposed methodology, we performed an experiment on 1,847 abstracts of artificial intelligence-related papers, divided into three periods (2016-2017, 2018-2019, 2020-2021). We selected seven topics based on the consistency score and used a pre-trained Word2vec word embedding model trained on Wikipedia, an Internet encyclopedia. Based on the proposed methodology, we generated a semantic vector for each topic and, by reflecting the meaning of keywords, visualized and interpreted the topics by period. Through these experiments, we confirmed that the rise and fall of a keyword's topic weight can be used to interpret the semantic change of the corresponding topic and to grasp the relationships among topics. In this study, to overcome the limitations of dynamic topic modeling results, we used word embedding and dimension reduction techniques to visualize topics by period. The results are meaningful in that they broaden the scope of topic understanding through the visualization of dynamic topic modeling results. In addition, the study makes an academic contribution in that it lays the foundation for follow-up studies that use various word embeddings and dimensionality reduction techniques to improve the performance of the proposed methodology.
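
    The second step described above (a weighted sum of keyword vectors) and the rising/descending keyword comparison can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the toy 2-D vectors (standing in for word embeddings that have already been dimension-reduced), and the keyword weights are all assumptions, not values or code from the paper.

```python
def semantic_vector(keyword_weights, keyword_vectors):
    """Weighted sum of keyword vectors, using each keyword's topic weight."""
    dim = len(next(iter(keyword_vectors.values())))
    total = [0.0] * dim
    for word, weight in keyword_weights.items():
        for i, value in enumerate(keyword_vectors[word]):
            total[i] += weight * value
    return total


def rising_and_descending(prev_weights, curr_weights, k=5):
    """Top-k keywords whose topic weight rose or fell between two periods."""
    words = set(prev_weights) | set(curr_weights)
    # A keyword absent from a period is treated as having weight 0 (assumption).
    delta = {w: curr_weights.get(w, 0.0) - prev_weights.get(w, 0.0) for w in words}
    ranked = sorted(delta, key=lambda w: (-delta[w], w))
    rising = [w for w in ranked if delta[w] > 0][:k]
    descending = [w for w in reversed(ranked) if delta[w] < 0][:k]
    return rising, descending


# Hypothetical, already dimension-reduced keyword vectors.
vectors = {"neural": [1.0, 0.0], "network": [0.0, 1.0], "vision": [1.0, 1.0]}

# Hypothetical topic weights of the same topic in two consecutive periods.
period1 = {"neural": 0.5, "network": 0.5}
period2 = {"neural": 0.25, "network": 0.25, "vision": 0.5}

vec1 = semantic_vector(period1, vectors)
vec2 = semantic_vector(period2, vectors)
rising, descending = rising_and_descending(period1, period2)
```

    Plotting `vec1` and `vec2` as points (for example with matplotlib, as the abstract mentions) would show the topic drifting as "vision" gains weight, while `rising` and `descending` name the keywords responsible for the shift.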

Technology Trend Analysis in the Automotive Semiconductor Industry using Topic Model and Patent Analysis (토픽모델 및 특허분석을 통한 차량용 반도체 기술 추세 분석)

  • Nam, Daekyeong;Choi, Gyunghyun
    • Journal of Korea Technology Innovation Society / v.21 no.3 / pp.1155-1178 / 2018
  • Future automobiles are evolving into mobile living spaces capable of eco-friendly autonomous driving. Electrically processing, controlling, and commanding the various information in the vehicle is essential to this role. Automotive semiconductors are expected to play a key role in future automobiles such as self-driving and eco-friendly vehicles. To foster the automotive semiconductor industry, it is necessary to grasp technology trends and to secure, in advance, technology and quality that reflect the requirements, thereby achieving technological innovation with industrial competitiveness. However, systematic analysis of technology trends has been lacking to date. In this study, we analyzed the technology trends of automotive semiconductors using patent analysis and a topic model, and identified technologies such as electric cars, driving assistance, and digital manufacturing. The technology trends showed that element technologies and technical characteristics change according to technology convergence, market needs, and government regulations. This research is expected to help in making R&D policy for the automotive semiconductor industry and in making decisions when establishing industrial technology strategy. In addition, by providing a detailed classification of technologies and trend analysis results, it is expected to be useful in setting detailed research directions and establishing patent strategy.

Adaptive User and Topic Modeling based Automatic TV Recommendation (적응적 사용자 및 토픽 모델링 기반의 자동 TV 프로그램 추천)

  • Kim, EunHui;Pyo, Shinjee;Kim, Munchurl
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2012.07a / pp.431-434 / 2012
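  • TV program schedules change over time, and schedule changes affect user preferences. Beyond the effect that the flow of topics driven by schedule changes has on user preferences, changes in preference that stem from individual personality differ greatly from person to person. This paper aims at a model that adapts to changes in user preference while giving additional weight to preferences in which the user shows consistent interest over time. Accordingly, the proposed model uses previous viewing preferences as the prior for each user's preference estimated from current viewing data and, taking preference change and consistency into account, presents an extensible model that can form a preference combining preferences over multiple viewing lengths rather than a single viewing length. In computing the weight for preference consistency, the model is refined through an operation that increases the probability of the overall probabilistic model. Evaluating the proposed model on the 2011 TNMS data, which records what real users watched, we confirmed that it outperforms the existing LDA and MDTM models; for weekly recommendations, it achieved a recommendation accuracy of up to 67.9% when recommending 5 programs.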

A Study on the Document Topic Extraction System Based on Big Data (빅데이터 기반 문서 토픽 추출 시스템 연구)

  • Hwang, Seung-Yeon;An, Yoon-Bin;Shin, Dong-Jin;Oh, Jae-Kon;Moon, Jin Yong;Kim, Jeong-Joon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.5 / pp.207-214 / 2020
  • Nowadays, as the use of smartphones and various electronic devices increases and the Internet and SNS grow more active, we live in a flood of information. The amount of information has grown exponentially, making it difficult to review, so more and more people want to see only the key keywords of a document, and research on extracting the topics at the core of information is becoming more important. Extracting topics and comparing them with the past to infer current trends is also an important issue. Topic modeling techniques can extract topics from a large volume of documents, and the extracted topics can be used in fields such as trend prediction and data analysis. In this paper, to analyze rapidly changing trends and keep pace with the times, we examine the topics of papers in the field of computing from the three years 2016, 2017, and 2018 using the LDA algorithm, one of the probabilistic topic model techniques. We then analyze the trends and flow of research.

Tweets analysis using a Dynamic Topic Modeling : Focusing on the 2019 Koreas-US DMZ Summit (트윗의 타임 시퀀스를 활용한 DTM 분석 : 2019 남북미정상회동 이벤트를 중심으로)

  • Ko, EunJi;Choi, SunYoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.2 / pp.308-313 / 2021
  • In this study, tweets about the 2019 Koreas-US DMZ Summit were collected in time sequence and analyzed with a sequential topic modeling method, Dynamic Topic Modeling (DTM). In microblogging services such as Twitter, unstructured data mixing news and opinions about a single event is generated simultaneously on a large scale, and information and reactions are produced in the same message format. Therefore, to grasp topic trends, contextual meaning can be found only through pattern analysis that reflects the characteristics of sequential data. After obtaining topic coherence scores to evaluate Latent Dirichlet Allocation (LDA), DTM was computed; as a result, 30 topics related to news reports and opinions were derived, and the occurrence probability and keywords of each topic evolved dynamically. In conclusion, the study found that DTM is a suitable model for analyzing trends of integrated topics in a specific event over time.