• Title/Summary/Keyword: topic modeling (토픽모델링)

Comparison of Topic Modeling Methods for Analyzing Research Trends of Archives Management in Korea: focused on LDA and HDP (국내 기록관리학 연구동향 분석을 위한 토픽모델링 기법 비교 - LDA와 HDP를 중심으로 -)

  • Park, JunHyeong;Oh, Hyo-Jung
    • Journal of Korean Library and Information Science Society / v.48 no.4 / pp.235-258 / 2017
  • This study analyzes research trends in archives management in Korea by comparing LDA (Latent Dirichlet Allocation) topic modeling, the best-known method in text mining, with HDP (Hierarchical Dirichlet Process) topic modeling, which extends LDA. We first collected 1,027 articles on archives management published from 1997 to 2016 in two Korean journals on archives management and four Korean journals on library and information science, and performed several preprocessing steps. We then ran both LDA and HDP topic modeling. For a more in-depth comparison, we used LDAvis as a topic modeling visualization tool. The results show that LDA topic modeling was influenced by frequent keywords across all topics, whereas HDP topic modeling produced specific keywords that made the characteristics of each topic easy to identify.

A Comparison of Author Name Disambiguation Performance through Topic Modeling (토픽모델링을 통한 저자명 식별 성능 비교)

  • Kim, Ha Jin;Jung, Hyo-jung;Song, Min
    • Proceedings of the Korean Society for Information Management Conference / 2014.08a / pp.149-152 / 2014
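  • This study uses topic modeling to disambiguate author names. Conventional topic modeling considers only term features, but this study also uses additional (third) metadata features, and evaluates and compares author name disambiguation performance for the ACT (Author-Conference Topic Model) model and DMR (Dirichlet-multinomial Regression) topic modeling. Experiments were run on a manually disambiguated dataset while varying the number of papers per author and the number of topics. The results show that DMR topic modeling outperforms the ACT model for author name disambiguation.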

A Study on the Application of Topic Modeling for the Book Report Text (독후감 텍스트의 토픽모델링 적용에 관한 탐색적 연구)

  • Lee, Soo-Sang
    • Journal of Korean Library and Information Science Society / v.47 no.4 / pp.1-18 / 2016
  • This study explores the application of topic modeling to the topic analysis of book reports. Topic modeling can be understood as one method of topic analysis. The analysis was conducted on the texts of 23 book reports using the LDA function of the "topicmodels" package for R. Topic modeling extracted 16 topics. A topic network was constructed from the relations between topics and keywords, and a book report network was constructed from the relations between book reports and topics. Centrality analysis was then conducted on both networks. The results are as follows. First, the 16 topics form a network with a single component; in other words, the 16 topics are interrelated. Second, the book reports fell into two groups: those with high centrality and those with low centrality. In terms of topics, the former group resembles other book reports, while the latter differs from them. Combined with network analysis, topic modeling results are useful for identifying the topics of book reports.
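
The centrality step in this abstract can be illustrated with a minimal sketch. The abstract does not say which centrality measure or software was used for the networks, so normalized degree centrality and the toy topic-keyword edges below are assumptions for illustration only.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return normalized degree centrality for an undirected graph.

    Each node's score is its number of distinct neighbors divided by
    (number of nodes - 1), so scores fall between 0 and 1.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical topic-keyword edges (not taken from the study's data).
edges = [
    ("topic_1", "friend"),
    ("topic_1", "family"),
    ("topic_2", "family"),
    ("topic_2", "school"),
    ("topic_3", "school"),
]
centrality = degree_centrality(edges)
```

Nodes that share keywords with several topics score higher, which is one simple way to separate well-connected items from peripheral ones.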

Research Trend Analysis on Smart healthcare by using Topic Modeling and Ego Network Analysis (토픽모델링과 에고 네트워크 분석을 활용한 스마트 헬스케어 연구동향 분석)

  • Yoon, Jee-Eun;Suh, Chang-Jin
    • Journal of Digital Contents Society / v.19 no.5 / pp.981-993 / 2018
  • Smart healthcare is the convergence of ICT and healthcare services, and interdisciplinary research on it has been actively conducted in various fields. The objective of this study is to investigate trends in smart healthcare research using topic modeling and ego network analysis. Text analysis, frequency analysis, topic modeling, word cloud analysis, and ego network analysis were conducted on the abstracts of 2,690 articles in Scopus from 2001 to April 2018. Topic modeling yielded eight topics: "AI in healthcare", "Smart hospital", "Healthcare platform", "Blockchain in healthcare", "Smart health data", "Mobile healthcare", "Wellness care", and "Cognitive healthcare". To examine the topic modeling results more deeply, we performed word cloud and ego network analyses for the eight topics. This study aims to identify trends in smart healthcare research and suggest implications for establishing future research directions.

Topic Model Augmentation and Extension Method using LDA and BERTopic (LDA와 BERTopic을 이용한 토픽모델링의 증강과 확장 기법 연구)

  • Kim, SeonWook;Yang, Kiduk
    • Journal of the Korean Society for Information Management / v.39 no.3 / pp.99-132 / 2022
  • This study proposes AET (Augmented and Extended Topics), a novel method for synthesizing LDA and BERTopic results, and applies it experimentally to recently published LIS articles. To this end, 55,442 abstracts from 85 LIS journals in the WoS database, spanning January 2001 to October 2021, were analyzed. AET first constructs a Word2Vec-based cosine similarity matrix between LDA and BERTopic results, extracts AT (Augmented Topics) by repeating the matrix reordering and segmentation procedures as long as their semantic relations remain valid, and finally determines ET (Extended Topics) by removing any LDA-related residual subtopics from the matrix and ordering the rest by F1 (BERTopic topic size rank, inverse cosine similarity rank). Comparison with the baseline LDA result shows that AT effectively concretized the original LDA topic model and that ET discovered new meaningful topics that LDA did not. In the qualitative performance evaluation, AT performs better than LDA, while ET shows similar performance except in a few cases.
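
The first AET step, a cosine similarity matrix between two sets of topics, can be sketched as follows. This is not the authors' implementation: the three-dimensional vectors are hypothetical stand-ins for topic vectors that would in practice be derived from Word2Vec embeddings of each topic's terms, and the function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(rows, cols):
    """Matrix whose entry [i][j] is the cosine similarity of rows[i] and cols[j]."""
    return [[cosine(r, c) for c in cols] for r in rows]


# Hypothetical topic vectors (assumed, not from the paper).
lda_topics = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
bertopic_topics = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 3.0]]

matrix = similarity_matrix(lda_topics, bertopic_topics)
```

Each row corresponds to one LDA topic and each column to one BERTopic topic, so high values in a row point to the BERTopic topics most closely related to that LDA topic.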

A System for Keyword Extraction and Keyword-based Sentiment Analysis for Topic Analysis in Discussion (토론 대화에서의 토픽 분석을 위한 키워드 추출 및 키워드 기반 감성분석 시스템)

  • Yong-Bin Jeong;Yu-Jin Oh;Jae-Wan Park;Sae-Mi Jang;Young-Gyun Hahm
    • Annual Conference on Human and Language Technology / 2022.10a / pp.164-169 / 2022
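  • Topic modeling is widely used in many areas, such as business analysis and technology trend monitoring. However, unsupervised methods such as LDA, the representative approach, are structured so that topic modeling works only when the number of documents is large. This paper performs topic modeling even when documents are few, by clustering keywords and keyphrases, and also presents an analysis of the topics through sentiment analysis. To this end, we implemented data construction, keyword extraction, keyword-based sentiment analysis, and keyword embedding and clustering, and confirmed through qualitative inspection of the results that the analysis is meaningful.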

A Study on Trends in Port Safety Issues at Busan Port Using Topic Modeling (토픽모델링을 활용한 부산항 항만안전성 이슈 동향에 관한 연구)

  • 이정민;하도연;김율성
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2023.11a / pp.66-67 / 2023
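  • Modern society faces a variety of unpredictable risks, and the risk burden on the port logistics industry, which is highly dependent on global conditions, has recently been increasing. To identify the factors that affect the safety of the port industry, this study examines in chronological order the issues that have affected domestic port safety from the past to the present. To do so, it applies LDA topic modeling to the text of news articles on port safety at Busan Port, Korea's representative port, and examines trends in the major port safety issues at Busan Port.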

Seasonal analysis of Beach-related Issues using Local Newspaper Articles and Topic Modeling (지역신문기사 자료와 토픽모델링을 이용한 해변 관련 계절별 현안분석)

  • Yoo, Mu-Sang;Jeong, Su-Yeon;Kim, Geon-Hu;Sohn, Chul
    • Journal of the Korean Regional Science Association / v.34 no.4 / pp.19-34 / 2018
  • This study analyzes seasonal issues using local newspaper articles containing the keyword "beach" from 2004 to 2017. Topic modeling and time series regression analysis were performed with open source programs. Topic modeling produced 35 topics in spring, 47 in summer, 36 in autumn, and 35 in winter. The common themes were 'beaches', 'festivals and events', 'accident and environmental issues', 'tourism', 'development and sale', 'administration and policy', and 'weather'. Time series regression analysis found 5 Hot-Topics and 2 Cold-Topics among the 35 spring topics, 6 Hot-Topics and 3 Cold-Topics among the 47 summer topics, 4 Hot-Topics and 3 Cold-Topics among the 36 autumn topics, and 3 Hot-Topics and 3 Cold-Topics among the 35 winter topics. For each season, topics that are neither Hot-Topics nor Cold-Topics are classified as Neutral-Topics. The study suggests that where use differs by season, as with beaches, seasonal topic modeling will yield more useful results for analyzing regional issues and enable more detailed diagnosis.
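
The Hot/Cold/Neutral labelling can be illustrated with a small trend sketch. The abstract does not state the study's regression specification or decision rule, so the version below is an assumption: it fits an ordinary least squares slope of a topic's yearly share against the year and uses a plain slope threshold in place of a significance test. The yearly shares are made-up values.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def classify_trend(years, shares, threshold=0.005):
    """Label a topic 'hot', 'cold', or 'neutral' from the slope of its yearly share.

    The threshold is a hypothetical cut-off standing in for a significance test.
    """
    slope = ols_slope(years, shares)
    if slope > threshold:
        return "hot"
    if slope < -threshold:
        return "cold"
    return "neutral"


# Made-up yearly topic shares for illustration.
years = [2004, 2005, 2006, 2007, 2008]
rising = [0.02, 0.04, 0.06, 0.08, 0.10]
falling = [0.10, 0.08, 0.06, 0.04, 0.02]
flat = [0.05, 0.05, 0.05, 0.05, 0.05]

labels = [classify_trend(years, s) for s in (rising, falling, flat)]
```

A real analysis would normally test whether the slope differs significantly from zero rather than compare it with a fixed threshold.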

Evaluation of Topic Modeling Performance for Overseas Construction Market Analysis Using LDA and BERTopic on News Articles (LDA 및 BERTopic 기반 해외건설시장 뉴스 기사 토픽모델링 성능평가)

  • Baik, Joonwoo;Chung, Sehwan;Chi, Seokho
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.6 / pp.811-819 / 2023
  • Understanding the local conditions is a crucial factor in enhancing the success potential of overseas construction projects. This can be achieved through the analysis of news articles of the target market using topic modeling techniques. In this study, the authors aimed to analyze news articles using two topic modeling methods, namely Latent Dirichlet Allocation (LDA) and BERTopic, in order to determine the optimal approach for market condition analysis. To evaluate the alignment between the generated topics and the actual themes of the news documents, the research collected 6,273 BBC news articles, created ground truth data for individual news article topics, and finally compared this ground truth with the results of the topic modeling. The F1 score for LDA was 0.011, while BERTopic achieved a score of 0.244. These results indicate that BERTopic more accurately reflected the actual topics of news articles, making it more effective for understanding the overseas construction market.
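
The F1 comparison against ground truth can be sketched as below. The abstract does not say how topics were mapped to ground-truth labels or how F1 was averaged, so this sketch assumes each article already has one predicted label and one true label and uses macro-averaged F1. The labels are hypothetical.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score for one label, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over the labels present in y_true."""
    labels = sorted(set(y_true))
    return sum(f1_for_label(y_true, y_pred, label) for label in labels) / len(labels)


# Hypothetical ground-truth and predicted topic labels for four articles.
y_true = ["economy", "economy", "politics", "politics"]
y_pred = ["economy", "politics", "politics", "politics"]

score = macro_f1(y_true, y_pred)
```

Because F1 combines precision and recall, a model that assigns most articles to the wrong topic scores close to zero even if a few assignments happen to match.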

An Analysis of Research Trends in Entrepreneurship (기업가정신에 대한 연구동향 분석)

  • Jang, Seong-Hui
    • Korean Society of Business Venturing: Conference Proceedings (한국벤처창업학회:학술대회논문집) / 2022.04a / pp.73-79 / 2022
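  • The purpose of this study is to analyze research topics and trends in entrepreneurship through co-word (co-occurrence) analysis and topic modeling, providing information for setting future directions in entrepreneurship research. To this end, "entrepreneurship" was set as the basic search term in the Web of Science database, the search was limited to English-language papers published from 2002 to 2021, and the data on entrepreneurship papers were downloaded. Co-word analysis was performed with VOSviewer, and topic modeling was performed with R. The co-word analysis identified five clusters: entrepreneurship and innovation, entrepreneurship education, social entrepreneurship and sustainability, firm performance, and knowledge and technology transfer. The topic modeling identified 12 topics: startup environment and economic development, international entrepreneurship, diverse entrepreneurship, venture firms and financing, government policy and support, social entrepreneurship, management-related issues, regional urban planning and development, entrepreneurship education, entrepreneurs' innovation and performance, entrepreneurship research, and entrepreneurs' entrepreneurial intention. The results are expected to help not only in grasping overall research trends in entrepreneurship but also, by analyzing which research topics have been addressed, in deepening understanding of entrepreneurship research and suggesting its future direction.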
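
The counting idea behind co-word analysis can be sketched in a few lines. The study itself used VOSviewer; the code below is only an illustration of pairwise co-occurrence counting, and the keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents contain each unordered keyword pair.

    Keywords are de-duplicated per document and sorted so that each pair
    is always stored in the same (alphabetical) order.
    """
    counts = Counter()
    for keywords in documents:
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword lists, one per paper.
documents = [
    ["entrepreneurship", "innovation", "education"],
    ["entrepreneurship", "innovation"],
    ["entrepreneurship", "sustainability"],
]

counts = cooccurrence_counts(documents)
```

Pairs with high counts become strong links in a co-word network, and clusters such as those reported in the abstract are then found by grouping densely linked keywords.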
