• Title/Summary/Keyword: topic analysis (토픽분석)

Feature selection for text data via topic modeling (토픽 모형을 이용한 텍스트 데이터의 단어 선택)

  • Woosol, Jang;Ye Eun, Kim;Won, Son
    • The Korean Journal of Applied Statistics / v.35 no.6 / pp.739-754 / 2022
  • Text data usually consists of many variables, some of which are closely correlated. Such multicollinearity often results in inefficient or inaccurate statistical analysis. For supervised learning, one can select features by examining the relationship between target variables and explanatory variables. For unsupervised learning, however, target variables are absent, so the feature selection procedures used in supervised learning cannot be applied. In this study, we propose a word selection procedure that employs topic models to find latent topics. We substitute topics for the target variables and select terms that show high relevance for each topic. Applying the procedure to real data, we found that the proposed word selection procedure gives clearer topic interpretation by removing high-frequency words prevalent across many topics. In addition, when the selected variables were supplied to classifiers such as naïve Bayes classifiers and support vector machines, the proposed feature selection procedure gave results comparable to those obtained by using class label information.
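The selection step described in this abstract (treat latent topics as stand-ins for target variables, then keep the terms most relevant to each topic) can be sketched roughly as follows. This is an illustration only, not the authors' procedure: the abstract does not define its relevance measure, so the score used here (a weighted mix of log probability and log lift, in the style of LDAvis), the weight `lam`, the function name `select_terms`, and the toy topic-word matrix are all assumptions.

```python
import math

def select_terms(topic_word, topic_weights, vocab, top_n=2, lam=0.3):
    """Return the union of the top_n most relevant terms per topic.

    topic_word[k][j] is p(word j | topic k); topic_weights[k] is p(topic k).
    Relevance (an assumed score, not taken from the paper):
        lam * log p(w|k) + (1 - lam) * log(p(w|k) / p(w))
    """
    n_words = len(vocab)
    # Marginal word probability p(w) = sum_k p(k) * p(w|k).
    p_w = [
        sum(topic_weights[k] * topic_word[k][j] for k in range(len(topic_word)))
        for j in range(n_words)
    ]
    selected = []
    for row in topic_word:
        scores = [
            lam * math.log(row[j]) + (1 - lam) * math.log(row[j] / p_w[j])
            for j in range(n_words)
        ]
        ranked = sorted(range(n_words), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_n]:
            if vocab[j] not in selected:
                selected.append(vocab[j])
    return selected

# Toy topic-word probabilities (hypothetical, as if taken from a fitted model).
# "data" is frequent in every topic, so it carries little topic information.
vocab = ["data", "model", "sample", "goal", "match", "vote", "party"]
topic_word = [
    [0.30, 0.30, 0.20, 0.05, 0.05, 0.05, 0.05],
    [0.30, 0.05, 0.05, 0.30, 0.20, 0.05, 0.05],
    [0.30, 0.05, 0.05, 0.05, 0.05, 0.30, 0.20],
]
topic_weights = [1 / 3, 1 / 3, 1 / 3]

selected = select_terms(topic_word, topic_weights, vocab)
print(selected)
```

With these toy numbers the shared high-frequency word "data" is not selected, while the two distinctive words of each topic are. Setting `lam=1.0` reduces the score to raw within-topic probability, in which case "data" is picked up again.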

An Analysis of Research Trends in Entrepreneurship (기업가정신에 대한 연구동향 분석)

  • Jang, Seong-Hui
    • Korean Society of Business Venturing: Conference Proceedings (한국벤처창업학회:학술대회논문집) / 2022.04a / pp.73-79 / 2022
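  • The purpose of this study is to analyze research topics and research trends in entrepreneurship through co-occurrence word analysis and topic modeling, and thereby to provide information for setting future directions for entrepreneurship research. To this end, "entrepreneurship" was set as the basic search term in the Web of Science database, and data on entrepreneurship papers were downloaded, limited to English-language papers published from 2002 to 2021. Co-occurrence word analysis was performed with the VOSviewer program, and topic modeling was performed with R. The co-occurrence word analysis yielded five clusters: entrepreneurship and innovation, entrepreneurship education, social entrepreneurship and sustainability, firm performance, and knowledge and technology transfer. The topic modeling yielded 12 topics: startup environment and economic development, international entrepreneurship, diverse forms of entrepreneurship, venture firms and financing, government policy and support, social entrepreneurship, management-related issues, regional and urban planning and development, entrepreneurship education, entrepreneurs' innovation and performance, entrepreneurship research, and entrepreneurs' start-up intention. The results are expected to be useful not only for grasping overall research trends in entrepreneurship but also, by showing which entrepreneurship-related research topics have been addressed, for improving the understanding of entrepreneurship research and suggesting directions it may take.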
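The counting step that underlies a co-occurrence word analysis can be sketched as follows. This is a minimal illustration under assumptions, not the study's workflow: the study used VOSviewer, which additionally normalizes and clusters the resulting network, and the function name `cooccurrence_counts` and the sample keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(keyword_lists):
    """Count how many documents each unordered keyword pair appears in together."""
    counts = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so each pair is counted once per document
        # and is always stored in the same order.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts

# Hypothetical keyword lists, one per paper (not data from the study).
papers = [
    ["entrepreneurship", "innovation", "performance"],
    ["entrepreneurship", "education"],
    ["entrepreneurship", "innovation"],
    ["social entrepreneurship", "sustainability", "innovation"],
]

counts = cooccurrence_counts(papers)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Pairs with high counts become the heavy edges of the co-occurrence network; clustering those edges is a separate step that this sketch does not cover.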

A Topic Analysis of SW Education Textdata Using R (R을 활용한 SW교육 텍스트데이터 토픽분석)

  • Park, Sunju
    • Journal of The Korean Association of Information Education / v.19 no.4 / pp.517-524 / 2015
  • To identify the direction of interest in SW education, this paper gathered SW education news data and analyzed its contents. The topic analysis covered news collected from July 23, 2013 to October 19, 2015. The relationships among the 20 most frequently mentioned words, obtained by web crawling with R, indicated that these words are closely related: in the co-occurrence matrix graph centered on the word 'SW education', the node sizes of the 20 words were balanced with one another. Moreover, the analysis revealed that the data mainly comprised topics about SW talent, SW support programs, the SW education mandate, SW camps, the SW industry, and job creation. These results could be used in big data analysis to identify people's thoughts and interests regarding SW education.

Analysis of Topic Changes in Metaverse Application Reviews Before and After the COVID-19 Pandemic Using Causal Impact Analysis Techniques (Causal Impact 분석 기법을 접목한 COVID-19 팬데믹 전·후 메타버스 애플리케이션 리뷰의 토픽 변화 분석)

  • Lee, Sowon;Mijin Noh;MuMoungCho Han;YangSok Kim
    • Smart Media Journal / v.13 no.1 / pp.36-44 / 2024
  • The metaverse is attracting attention owing to the development of virtual environment technology and the emergence of "untact" (contact-free) culture during the COVID-19 pandemic. In this study, we analyzed user reviews of the "Zepeto" application, which has recently attracted attention as a metaverse service, to identify changes in the requirements for the metaverse after the COVID-19 pandemic. To this end, we collected 109,662 reviews of the "Zepeto" application written on the Google Play Store from September 2018 to March 2023, extracted topics using the LDA topic modeling technique, and applied the Causal Impact technique to examine how the topics changed before and after March 11, 2020, when the COVID-19 pandemic was declared. The analysis extracted five topics: application functional problems (topic 1), security problems (topic 2), complaints about cryptocurrency (Zem) in the application (topic 3), application performance (topic 4), and personal information-related problems (topic 5). Among them, security problems (topic 2) were found to be most affected by the COVID-19 pandemic.

Investigation of Research Trends in Information Systems Domain Using Topic Modeling and Time Series Regression Analysis (토픽모델링과 시계열회귀분석을 활용한 정보시스템분야 연구동향 분석)

  • Kim, Chang-Sik;Choi, Su-Jung;Kwahk, Kee-Young
    • Journal of Digital Contents Society / v.18 no.6 / pp.1143-1150 / 2017
  • The objective of this study is to examine the trends in information systems research. The abstracts of 1,245 articles were extracted from three leading Korean journals published between 2002 and 2016: Asia Pacific Journal of Information Systems, Information Systems Review, and The Journal of Information Systems. Time series analysis and topic modeling methods were implemented. The topic modeling results showed that the research topics were mainly "systems implementation", "communication innovation", and "customer loyalty". The time series regression results indicated that "customer satisfaction", "communication innovation", "information security", and "personal privacy" were hot topics, and on the other hand, "system implementation" and "web site" were the least popular. This study also provided suggestions for future research.
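One common way to label topics as hot or cold from a time series regression is to regress each topic's yearly share on the year and look at the sign of the slope. The sketch below illustrates that idea only. The abstract does not give the paper's exact model or criteria, so the threshold `tol`, the function names, and the yearly shares are all hypothetical.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def label_trends(years, topic_shares, tol=1e-3):
    """Label each topic 'hot', 'cold', or 'flat' by the sign of its slope."""
    labels = {}
    for topic, shares in topic_shares.items():
        slope = ols_slope(years, shares)
        if slope > tol:
            labels[topic] = "hot"
        elif slope < -tol:
            labels[topic] = "cold"
        else:
            labels[topic] = "flat"
    return labels

# Hypothetical yearly topic shares (not figures from the study).
years = [2012, 2013, 2014, 2015, 2016]
topic_shares = {
    "information security": [0.05, 0.07, 0.09, 0.11, 0.13],
    "web site": [0.20, 0.17, 0.14, 0.11, 0.08],
    "other": [0.10, 0.10, 0.10, 0.10, 0.10],
}

labels = label_trends(years, topic_shares)
print(labels)
```

A fuller analysis would typically also test whether each slope is statistically significant before calling a topic hot or cold; that step is omitted here.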

Sentiment Analysis Model with Semantic Topic Classification of Reviews (리뷰의 의미적 토픽 분류를 적용한 감성 분석 모델)

  • Lim, Myung Jin;Kim, Pankoo;Shin, Ju Hyun
    • Smart Media Journal / v.9 no.2 / pp.69-77 / 2020
  • Unlike in the past, when dramas were limited to terrestrial broadcasts, many dramas are now broadcast on cable channels and the web. After watching a drama, viewers actively express their opinions through reviews, and studies analyzing these reviews are actively being conducted. Because the genre of a drama is often unclear and viewers span various age groups, reviews and ratings from other viewers help people decide which drama to watch. However, since it is difficult for viewers to check and analyze many reviews individually, a data analysis technique is required to analyze them automatically. Accordingly, this paper classifies the topics of reviews that have an important influence on drama selection and reclassifies them into semantic topics according to the similarity of words. In addition, we propose a model that splits reviews into sentences by semantic topic and performs sentiment analysis using sentiment words.

A Study on Issue Tracking on Multi-cultural Studies Using Topic Modeling (토픽 모델링을 활용한 다문화 연구의 이슈 추적 연구)

  • Park, Jong Do
    • Journal of the Korean Society for Library and Information Science / v.53 no.3 / pp.273-289 / 2019
  • The goal of this study is to analyze topics discussed in academic papers on multiculture in Korea to figure out research trends in the field. In order to do topic analysis, LDA (Latent Dirichlet Allocation)-based topic modeling methods are employed. Through the analysis, it is possible to track topic changes in the field and it is found that topics related to 'social integration' and 'multicultural education in schools' are hot topics, and topics related to 'cultural identity and nationalism' are cold topics among top five topics in the field.

Topic modeling and topic change trend analysis for advanced construction technologies (건설신기술에 대한 토픽 모델링 및 토픽 변화추이 분석)

  • Jeong, Seong Yun;Kim, Nam Gon
    • Smart Media Journal / v.10 no.4 / pp.102-110 / 2021
  • An advanced construction technology endorsement system is currently operated to promote the development of domestic construction technology. We examined the implicit meanings inherent in advanced construction technologies by analyzing the relationships between high-importance emerging vocabularies associated with the technologies endorsed through this system. For this purpose, 918 cases of advanced construction technology information were collected. Based on the endorsement year and summary of each technology, the importance of the emerging vocabularies was measured for each advanced construction technology. Then, based on the LDA model, the degree of influence between related vocabularies was evaluated for each of the four topic areas, and topics were analyzed according to technical application field. The trend of changes in highly influential vocabularies for each topic from 1990 to 2021 was inferred. Future changes in the degree of influence of the topics of environment, machinery, facilities, and maintenance and reinforcement of structures and related technology fields were predicted.

How the Journal of the Korean Association for Science Education(JKASE) Changed for the Past 44 Years?: Topic Modeling Analysis Using Latent Dirichlet Allocation (한국과학교육학회지는 44년간 어떤 주제로 어떻게 변화했는가? -잠재 디리클레 할당(LDA)을 활용한 토픽모델링 분석-)

  • Chang, Jina;Na, Jiyeon
    • Journal of The Korean Association For Science Education / v.42 no.2 / pp.185-200 / 2022
  • The purpose of this study is to understand the trends and changes in the articles published in the Journal of the Korean Association for Science Education (JKASE) over the past forty-four years. To this end, Latent Dirichlet Allocation (LDA) topic modeling analysis was performed on a total of 2,115 English abstracts of papers published in the JKASE from 1978 to 2021. The LDA topic modeling analysis extracted a total of 23 topics, and each topic was presented with its related keywords and articles. Next, to examine how these topics have changed over time, we visualized the average weight of each topic over 4-year periods using heatmaps and identified the topics that have risen or fallen. The results of this study provide new insights into science education research in Korea by revealing not only traditional research topics that have been studied consistently but also topics that have changed in response to developments in educational philosophy or research methods and to social or policy demands related to science education.
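The aggregation behind such a heatmap (average topic weight per 4-year period) can be sketched as follows. This is a minimal illustration under assumptions: the function name `period_topic_means`, the way periods are aligned to the first year, and the per-document topic weights are hypothetical, and the plotting step itself is left out.

```python
from collections import defaultdict

def period_topic_means(docs, start_year, period=4):
    """Average each topic's weight over fixed-length periods.

    docs is a list of (year, weights) pairs, where weights is a list of
    per-topic proportions for one abstract. Returns {period_start: [means]}.
    """
    sums = {}
    counts = defaultdict(int)
    for year, weights in docs:
        # First year of the period that this document falls into.
        key = start_year + ((year - start_year) // period) * period
        if key not in sums:
            sums[key] = [0.0] * len(weights)
        for k, w in enumerate(weights):
            sums[key][k] += w
        counts[key] += 1
    return {key: [s / counts[key] for s in sums[key]] for key in sorted(sums)}

# Hypothetical (year, topic weights) pairs for a two-topic model.
docs = [
    (1978, [0.8, 0.2]),
    (1980, [0.6, 0.4]),
    (1982, [0.5, 0.5]),
    (1985, [0.1, 0.9]),
]

table = period_topic_means(docs, start_year=1978)
for period_start, means in table.items():
    print(period_start, [round(m, 2) for m in means])
```

Each row of the resulting table corresponds to one column of the heatmap; a topic whose mean grows from one period to the next is a rising topic in the sense used above.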

Analysis of Changes in Discourse of Major Media on Park Issues - Focusing on Newspaper Articles Published from 1995 to 2019 - (공원 이슈에 대한 주요 언론의 담론변화분석 - 1995년부터 2019년까지 신문 기사를 중심으로 -)

  • Ko, Ha-jung
    • Journal of the Korean Institute of Landscape Architecture / v.49 no.5 / pp.46-58 / 2021
  • Parks became essential to people after the introduction of modern parks in Korea. Following mayoral elections by popular vote, issues surrounding parks, such as the creation of parks, have arisen and have been publicized by the media, allowing for the formation of discourse. Accordingly, this study conducted a topic analysis by collecting news articles from major media outlets in Korea that addressed issues related to parks since 1995, after the introduction of mayoral elections by popular vote, and analyzed changes over time in the discourse on parks through semantic network analysis. As a result of a Latent Dirichlet allocation topic modeling analysis, the following five topics were classified: urban park expansion (Topic 1), historical and cultural parks (Topic 2), use programs (Topic 3), zoo event (Topic 4), and conflicts in the park creation process (Topic 5). The park-related discourse addressed by the media is as follows. First, the creation process and conflicts regarding the quantitative expansion of parks are treated as the central discourse. Second, the names of parks appear as keywords every time a new park is created, and they are mentioned continuously from then on, thereby playing an important role in the formation of discourse. Third, 'residents' form discourse about the public nature of the park as the principal agent in park-related media. This study has significance in that it examines how parks are interpreted and how discourse is formed and changed by the media. It is expected that discourse on parks will be addressed from various perspectives in further research focusing on other media, such as regional and specialized magazines.