• Title/Summary/Keyword: Topic modeling analysis

Comparing Social Media and News Articles on Climate Change: Different Viewpoints Revealed

  • Kang Nyeon Lee;Haein Lee;Jang Hyun Kim;Youngsang Kim;Seon Hong Lee
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.11 / pp.2966-2986 / 2023
  • Climate change is a constant threat to human life, and it is important to understand public perception of the issue. Previous studies of climate change have relied on limited survey data. In this study, the authors used big data, namely news articles and social media posts selected with specific keywords related to climate change. Topic modeling was performed on these natural language data to analyze climate change discourse across various topics. In addition, sentiment analysis was applied before topic modeling to reveal differences between discourses on climate change, so that discourses with positive and negative tendencies could be classified. Key words extracted from each classified discourse made it possible to identify the tendency of each document. This study aims to show that topic modeling is a useful methodology for exploring discourse on platforms with big data. Moreover, the reliability of the study was increased by taking objective indicators (i.e., coherence score and perplexity) into account when performing topic modeling. Theoretically, based on the social amplification of risk framework (SARF), this study demonstrates that the diffusion of the climate change agenda in public news media leads to personal anxiety and fear on social media.
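The abstract names coherence score and perplexity as objective indicators but does not say which coherence measure was used. As a minimal sketch, assuming the UMass variant (one common choice, not necessarily the authors'), coherence for a single topic can be computed from document co-occurrence counts. The function name and the toy documents are illustrative only.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic; values closer to 0 mean more coherent.

    top_words is assumed to be ordered from most to least probable.
    documents is a list of token lists.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every given word.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i, j in combinations(range(len(top_words)), 2):
        earlier, later = top_words[i], top_words[j]
        denom = doc_freq(earlier)
        if denom == 0:
            continue  # word never occurs; skip the pair
        score += math.log((doc_freq(earlier, later) + 1) / denom)
    return score


# Toy corpus (hypothetical, not data from the study).
docs = [
    ["climate", "warming", "carbon"],
    ["climate", "warming"],
    ["climate", "policy"],
    ["football", "score"],
]
coherent = umass_coherence(["climate", "warming"], docs)
incoherent = umass_coherence(["climate", "football"], docs)
```

In practice such a score is computed for several candidate topic counts, and the count with the best trade-off between coherence and perplexity is kept.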

Semantic Visualization of Dynamic Topic Modeling (다이내믹 토픽 모델링의 의미적 시각화 방법론)

  • Yeon, Jinwook;Boo, Hyunkyung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.131-154 / 2022
  • Research on unstructured data analysis has been actively conducted with the development of information and communication technology. In particular, topic modeling is a representative technique for discovering core topics in massive text data. Early topic modeling studies focused only on topic discovery. As the field matured, studies began to examine how topics change over time. Accordingly, interest is growing in dynamic topic modeling, which handles changes in the keywords that constitute a topic. Dynamic topic modeling identifies major topics from the data of the initial period and tracks the change and flow of topics by using the topic information of the previous period to derive topics in subsequent periods. However, the results of dynamic topic modeling are very difficult to understand and interpret. Traditional dynamic topic modeling simply reveals changes in keywords and their rankings, which is insufficient to represent how the meaning of a topic has changed. Therefore, in this study, we propose a method to visualize topics by period by reflecting the meaning of the keywords in each topic. In addition, we propose a method for intuitively interpreting changes in topics and the relationships among topics. The method of visualizing topics by period is as follows. In the first step, dynamic topic modeling is performed to derive the top keywords of each period and their weights from text data. In the second step, we obtain vectors for the top keywords of each topic from a pre-trained word embedding model and perform dimension reduction on the extracted vectors. We then form a semantic vector for each topic by calculating the weighted sum of its keyword vectors, using the topic weight of each keyword. In the third step, we visualize the semantic vector of each topic using matplotlib and analyze the relationships among topics based on the visualized result. Changes in a topic can be interpreted as follows: from the result of dynamic topic modeling, we identify the top 5 rising keywords and the top 5 descending keywords for each period. Many existing topic visualization studies visualize the keywords of each topic, whereas our approach attempts to visualize each topic itself. To evaluate the practical applicability of the proposed methodology, we performed an experiment on 1,847 abstracts of artificial intelligence-related papers, divided into three periods (2016-2017, 2018-2019, 2020-2021). We selected seven topics based on the consistency score and used a pre-trained Word2vec word embedding model trained on Wikipedia. Based on the proposed methodology, we generated a semantic vector for each topic and, by reflecting the meaning of keywords, visualized and interpreted the topics by period. Through these experiments, we confirmed that the rise and fall of a keyword's topic weight can be used to interpret the semantic change of the corresponding topic and to grasp the relationships among topics. In this study, to overcome the limitations of dynamic topic modeling results, we used word embedding and dimension reduction techniques to visualize topics by period. The results are meaningful in that they broaden the scope of topic understanding through the visualization of dynamic topic modeling results. In addition, the study makes an academic contribution by laying the foundation for follow-up studies that use various word embedding and dimensionality reduction techniques to improve the performance of the proposed methodology.
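The two computations described in this abstract, the weighted-sum semantic vector and the rising and descending keywords, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the embeddings are taken to be already reduced to two dimensions, and the function names, toy vectors, and weights are all hypothetical.

```python
def topic_semantic_vector(keyword_weights, embeddings):
    """Weighted sum of keyword vectors, weighted by each keyword's topic weight."""
    dim = len(next(iter(embeddings.values())))
    vector = [0.0] * dim
    for word, weight in keyword_weights.items():
        if word not in embeddings:
            continue  # skip keywords missing from the embedding model
        for k, value in enumerate(embeddings[word]):
            vector[k] += weight * value
    return vector


def keyword_shifts(previous, current, n=5):
    """Return (rising, descending) keywords by change in topic weight."""
    words = set(previous) | set(current)
    delta = {w: current.get(w, 0.0) - previous.get(w, 0.0) for w in words}
    ranked = sorted(delta, key=lambda w: (-delta[w], w))
    rising = [w for w in ranked if delta[w] > 0][:n]
    descending = [w for w in reversed(ranked) if delta[w] < 0][:n]
    return rising, descending


# Toy 2-D embeddings, assumed to be the output of dimension reduction.
embeddings = {
    "neural": [1.0, 0.0],
    "network": [0.0, 1.0],
    "vision": [1.0, 1.0],
}
period_1 = {"neural": 0.5, "network": 0.5}
period_2 = {"neural": 0.25, "network": 0.25, "vision": 0.5}

vec_1 = topic_semantic_vector(period_1, embeddings)
vec_2 = topic_semantic_vector(period_2, embeddings)
rising, descending = keyword_shifts(period_1, period_2)
```

Plotting `vec_1` and `vec_2` as points (for example with matplotlib, as the abstract mentions) would show how the topic moves in the reduced embedding space between periods.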

Comparison of Topic Modeling Methods for Analyzing Research Trends of Archives Management in Korea: focused on LDA and HDP (국내 기록관리학 연구동향 분석을 위한 토픽모델링 기법 비교 - LDA와 HDP를 중심으로 -)

  • Park, JunHyeong;Oh, Hyo-Jung
    • Journal of Korean Library and Information Science Society / v.48 no.4 / pp.235-258 / 2017
  • The purpose of this study is to analyze research trends in archives management in Korea by comparing LDA (Latent Dirichlet Allocation) topic modeling, the best-known method in text mining, with HDP (Hierarchical Dirichlet Process) topic modeling, which extends LDA. First, we collected 1,027 articles related to archives management published from 1997 to 2016 in two archives management journals and four library and information science journals in Korea, and performed several preprocessing steps. We then conducted LDA and HDP topic modeling. For a more in-depth comparison, we used LDAvis as a topic modeling visualization tool. The results showed that LDA topic modeling was influenced by keywords that occur frequently across all topics, whereas HDP topic modeling yielded specific keywords that make the characteristics of each topic easy to identify.

Topic Modeling with Deep Learning-based Sentiment Filters (감정 딥러닝 필터를 활용한 토픽 모델링 방법론)

  • Choi, Byeong-Seol;Kim, Namgyu
    • The Journal of Information Systems / v.28 no.4 / pp.271-291 / 2019
  • Purpose: This study proposes a methodology that derives positive and negative keywords through deep learning that classifies reviews as positive or negative, and then refines the results of topic modeling using these keywords. Design/methodology/approach: We extracted topic keywords by performing LDA-based topic modeling. At the same time, we performed attention-based deep learning to identify positive and negative keywords. Finally, we refined the topic keywords using these keywords as filters. Findings: We collected and analyzed about 6,000 English reviews of Gyeongbokgung, a representative tourist attraction in Korea, from Tripadvisor, a representative travel site. Experimental results show that the proposed methodology properly identifies positive and negative keywords describing major topics.
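The abstract does not specify how the sentiment keywords are applied as filters. One plausible reading, sketched here purely as an assumption, is to split each topic's keywords by membership in the positive and negative keyword sets. The function name and all words below are hypothetical.

```python
def filter_topic_keywords(topic_keywords, positive, negative):
    """Split each topic's keywords into positive, negative, and other."""
    refined = {}
    for topic, words in topic_keywords.items():
        refined[topic] = {
            "positive": [w for w in words if w in positive],
            "negative": [w for w in words if w in negative],
            "other": [w for w in words
                      if w not in positive and w not in negative],
        }
    return refined


# Hypothetical inputs: LDA topic keywords and sentiment keyword sets
# (the latter would come from the attention-based classifier).
topics = {"visit": ["beautiful", "crowded", "palace", "guard"]}
positive_words = {"beautiful"}
negative_words = {"crowded"}

refined = filter_topic_keywords(topics, positive_words, negative_words)
```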

A study on research trends for pregnancy in adolescence: Focusing on text network analysis and topic modeling (청소년 임신에 대한 연구 동향 분석: 텍스트 네트워크 분석과 토픽 모델링)

  • Park, Seungmi;Kwak, Eunju;Park, Hye Ok;Hong, Jung Eun
    • The Journal of Korean Academic Society of Nursing Education / v.30 no.2 / pp.149-159 / 2024
  • Purpose: The aim of this study was to identify core keywords and topic groups in the "adolescent pregnancy" field of research for a better understanding of research trends in the past 10 years. Methods: Topics related to adolescent pregnancy were extracted from 3,819 articles that were published in journals between January 2013 and July 2023. Abstracts were retrieved from five databases (MEDLINE, CINAHL, Embase, RISS, and KISS). Keywords were extracted from the abstracts and cleaned using semantic morphemes. Text network analysis and topic modeling were performed using NetMiner 4.3.3. Results: The most important keywords were "health," "woman," "risk," "group," "girl," "school," "service," "family," "program," and "contraception." Topic modeling yielded five topic groups: "health service," "community program for school girls," "risks for adult women," "relationship risks," and "sexual contraceptive knowledge." Conclusion: This study used text network analysis and topic modeling to analyze keywords from abstracts of research conducted over the past decade on adolescent pregnancy. Given that adolescent pregnancy leads to physical, mental, social, and economic issues, it is imperative to provide integrated intervention programs, including prenatal/postnatal care, psychological services, proper contraception methods, and sex education, through school and community partnerships, as well as related research studies. Nurses can play a vital role by actively engaging in prevention efforts and directly supporting and educating socially disadvantaged adolescent mothers, which could significantly contribute to improving their quality of life.

A Study on the Topic Modeling Analysis of Book Reports on Personality Types and Interest Types (성격유형과 흥미유형에 따른 독서 감상문 토픽 분석 연구)

  • Jeong-Hoon Lim
    • Journal of the Korean Society for Information Management / v.40 no.1 / pp.175-198 / 2023
  • This study aimed to investigate differences in responses to reading, as shown in book reports, by personality type and interest type. For this purpose, personality type analysis data, interest type analysis data, and book reports written in subject reading activities were collected from 81 third-year students at D Science High School in Daejeon. Topic analysis was conducted on the collected book reports, and the probability of a topic being mentioned was statistically tested according to personality type (thinking type, feeling type) and interest type (investigative type, types other than investigative). Subsequently, the conceptual connection structure of words was measured by keyword network analysis, and the topic modeling results were complemented by the centrality index. The topic regression analysis showed statistically significant differences between thinking type (T) and feeling type (F) in topic 2 (understanding and studying) and topic 3 (reading and thinking), and statistically significant differences between investigative type and non-investigative type in topic 2 (understanding and studying). The results of this study can be used as a basis for tailored book recommendations and personalized reading education.
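The abstract mentions keyword network analysis and a centrality index without naming the measure. As a rough sketch, assuming degree centrality on a co-occurrence network (one of several possible centrality indices), the calculation looks like this. The function name and the keyword lists are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def degree_centrality(keyword_lists):
    """Degree centrality on a keyword co-occurrence network.

    Two keywords are linked if they appear in the same document.
    Centrality is the number of neighbors divided by (nodes - 1).
    """
    neighbors = defaultdict(set)
    for keywords in keyword_lists:
        for a, b in combinations(sorted(set(keywords)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {word: 0.0 for word in neighbors}
    return {word: len(adj) / (n - 1) for word, adj in neighbors.items()}


# Hypothetical keywords extracted from three book reports.
reports = [
    ["science", "study", "future"],
    ["science", "emotion"],
    ["study", "future"],
]
centrality = degree_centrality(reports)
```

Here "science" co-occurs with every other keyword and so receives the maximum centrality of 1.0.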

Topic Modeling Analysis of Beauty Industry using BERTopic and LDA

  • YANG, Hoe-Chang;LEE, Won-Dong
    • The Journal of Economics, Marketing and Management / v.10 no.6 / pp.1-7 / 2022
  • Purpose: The purpose of this study is to identify the research trends of degree papers related to the beauty industry and to provide information that can contribute to the development of the domestic beauty industry and guide further research on it. Research design, data and methodology: This study used 154 academic papers and 189 academic papers with English abstracts out of 299 academic papers. All of these papers were found by searching for the keyword "beauty industry" in ScienceON on August 15, 2022. For the analysis, BERTopic and LDA (Latent Dirichlet Allocation) analyses were conducted using Python 3.7. In addition, OLS regression analysis was conducted as a trend analysis to understand the annual increase or decrease of each derived topic. Results: Word frequency analysis showed that 'satisfaction', 'management', 'behavior', and 'service' occurred frequently. In addition, the word co-occurrence frequency analysis showed that 'service', 'satisfaction', and 'customer' were frequently associated with 'program' and 'relationship'. Topic modeling derived six topics: 'Beauty shop', 'Health education', 'Cosmetics', 'Customer satisfaction', 'Beauty education', and 'Beauty business'. The trend analysis confirmed that 'Beauty education' and 'Health education' are receiving more attention over time. Conclusions: Future studies must address the extreme polarization between the structure of the small beauty industry and beauty stores. Furthermore, research should explore various ways to improve the performance of internal personnel. Ways to maximize product capabilities, such as competitive cosmetics and brands, also need attention.
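The OLS trend analysis mentioned in the abstract amounts to regressing each topic's yearly share on the year and reading the sign of the slope. A minimal sketch with the closed-form simple regression follows. The topic labels and yearly shares are toy values, not results from the study.

```python
def ols_slope(years, shares):
    """Slope and intercept of a simple OLS fit of shares on years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy yearly topic shares (hypothetical).
years = [2018, 2019, 2020, 2021, 2022]
topic_a = [0.10, 0.12, 0.14, 0.16, 0.18]  # steadily rising
topic_b = [0.30, 0.25, 0.20, 0.15, 0.10]  # steadily falling

slope_a, _ = ols_slope(years, topic_a)
slope_b, _ = ols_slope(years, topic_b)
```

A positive slope marks a topic gaining attention over time and a negative slope one losing it. A full analysis would also check the statistical significance of each slope.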

A Study on Issue Tracking on Multi-cultural Studies Using Topic Modeling (토픽 모델링을 활용한 다문화 연구의 이슈 추적 연구)

  • Park, Jong Do
    • Journal of the Korean Society for Library and Information Science / v.53 no.3 / pp.273-289 / 2019
  • The goal of this study is to analyze topics discussed in academic papers on multiculture in Korea to figure out research trends in the field. In order to do topic analysis, LDA (Latent Dirichlet Allocation)-based topic modeling methods are employed. Through the analysis, it is possible to track topic changes in the field and it is found that topics related to 'social integration' and 'multicultural education in schools' are hot topics, and topics related to 'cultural identity and nationalism' are cold topics among top five topics in the field.

Online Reviews Analysis for Prediction of Product Ratings based on Topic Modeling (토픽 모델링에 기반한 온라인 상품 평점 예측을 위한 온라인 사용 후기 분석)

  • Park, Sang Hyun;Moon, Hyun Sil;Kim, Jae Kyeong
    • Journal of Information Technology Services / v.16 no.3 / pp.113-125 / 2017
  • Customers are influenced by others' opinions when they make a purchase. Thanks to advances in technology, people share their experiences, such as reviews and ratings, through online and social network services. However, although ratings are intuitive information for others, many reviews contain only text without ratings. Also, because of the huge number of reviews, customers and companies cannot read all of them, so it is hard to evaluate a product without ratings. Therefore, in this study, we propose a methodology to predict product ratings from reviews. In this methodology, we first estimate the topic-review matrix using the Latent Dirichlet Allocation technique, which is widely used in topic modeling. Next, we predict ratings from the topic-review matrix using an artificial neural network model based on the backpropagation algorithm. Through experiments with actual reviews, we find that our methodology can predict ratings based on customers' reviews, and that it performs better with reviews that include certain opinions. As a result, our study can be useful for customers and companies that want to assess a product precisely through ratings. Moreover, we hope that our study leads to future studies that combine machine learning and topic modeling.

Identification of Convergence Trend in the Field of Business Model Based on Patents (특허 데이터 기반 비즈니스 모델 분야 융합 트렌드 파악)

  • Sunho Lee;Chie Hoon Song
    • Journal of the Korean Society of Industry Convergence / v.27 no.3 / pp.635-644 / 2024
  • Although business model (BM) patents act as a creative bridge between technology and the marketplace, limited scholarly attention has been paid to the content analysis of BM patents. This study aims to contextualize converging BM patents by employing a topic modeling technique and clustering highly marketable topics, which are expressed through a topic-market impact matrix. We relied on BM patent data filed between 2010 and 2022 to derive empirical insights into the commercial potential of emerging business models. Nine topics were identified, including "Data Analytics and Predictive Modeling" and "Mobile-Based Digital Services and Advertising." The 2x2 matrix positions topics by topic growth rate and market impact, which is useful for prioritizing areas that require attention or are promising. This study differentiates itself by going beyond simple topic classification based on topic modeling and reorganizing the findings into a matrix format. The results of this study are expected to serve as a valuable reference for companies seeking to innovate their business models and enhance their competitive positioning.
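The abstract does not say how the 2x2 matrix draws its boundaries. As a sketch, assuming the median of each variable as the cut-off (an assumption, not the authors' stated rule), topics can be assigned to quadrants as follows. Topic names and values are hypothetical.

```python
from statistics import median


def position_topics(topics):
    """Map {name: (growth_rate, market_impact)} to a quadrant label."""
    growth_cut = median(g for g, _ in topics.values())
    impact_cut = median(m for _, m in topics.values())
    quadrants = {}
    for name, (growth, impact) in topics.items():
        g = "high growth" if growth > growth_cut else "low growth"
        m = "high impact" if impact > impact_cut else "low impact"
        quadrants[name] = f"{g} / {m}"
    return quadrants


# Hypothetical topics with (growth rate, market impact) pairs.
toy_topics = {
    "topic_1": (0.30, 0.9),
    "topic_2": (0.25, 0.2),
    "topic_3": (0.05, 0.8),
    "topic_4": (0.02, 0.1),
}
matrix = position_topics(toy_topics)
```

Topics landing in the high growth and high impact quadrant would be the natural candidates for the "promising" areas the abstract refers to.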