• Title/Abstract/Keywords: topic extraction

Search Result 209, Processing Time 0.024 seconds

End-to-end Neural Model for Keyphrase Extraction using Twitter Hash-tag Data (트위터 해시 태그를 이용한 End-to-end 뉴럴 모델 기반 키워드 추출)

  • Lee, Young-Hoon;Na, Seung-Hoon
    • Annual Conference on Human and Language Technology / 2018.10a / pp.176-178 / 2018
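  • Twitter is a social network service for exchanging short messages of up to 140 characters. Twitter hashtags usually link to the key words or main topics of a sentence, and this paper uses that information for keyword extraction. A sentence is passed through a Character CNN and a Bi-LSTM to obtain a sentence representation, and for each span this sentence representation is used to generate a span representation. The span representation is used to compute a score for each span, and keywords are extracted from the highest-scoring spans.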

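The span-scoring step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' model: the token vectors stand in for the Character CNN and Bi-LSTM outputs, and the boundary-concatenation span representation, the linear scorer, and the names `enumerate_spans`, `span_representation`, `score_span`, and `top_spans` are all assumptions made for the example.

```python
from typing import List, Tuple

Vector = List[float]


def enumerate_spans(n_tokens: int, max_width: int) -> List[Tuple[int, int]]:
    """Return all (start, end) spans, end inclusive, of at most max_width tokens."""
    return [
        (start, end)
        for start in range(n_tokens)
        for end in range(start, min(start + max_width, n_tokens))
    ]


def span_representation(token_vectors: List[Vector], start: int, end: int) -> Vector:
    """Concatenate the boundary token vectors (an assumed, simple choice)."""
    return token_vectors[start] + token_vectors[end]


def score_span(span_vector: Vector, weights: Vector) -> float:
    """Linear scorer: dot product of the span vector and a weight vector."""
    return sum(x * w for x, w in zip(span_vector, weights))


def top_spans(token_vectors: List[Vector], weights: Vector,
              max_width: int = 3, k: int = 2) -> List[Tuple[int, int]]:
    """Score every candidate span and return the k highest-scoring ones."""
    scored = [
        (score_span(span_representation(token_vectors, s, e), weights), (s, e))
        for s, e in enumerate_spans(len(token_vectors), max_width)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [span for _, span in scored[:k]]


# Toy input: three tokens with hypothetical 2-dimensional encoder outputs.
vectors = [[0.0, 1.0], [2.0, 0.0], [0.0, 0.0]]
weights = [1.0, 0.0, 1.0, 0.0]  # hypothetical learned weights
best = top_spans(vectors, weights, max_width=2, k=1)
```

In a trained model the weights would be learned from hashtag supervision; here they are fixed by hand only to make the ranking step visible.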

Topic Modeling to Identify Cloud Security Trends using news Data Before and After the COVID-19 Pandemic (뉴스 데이터 토픽 모델링을 활용한 COVID-19 대유행 전후의 클라우드 보안 동향 파악)

  • Soun U Lee;Jaewoo Lee
    • Convergence Security Journal / v.22 no.2 / pp.67-75 / 2022
  • Due to the COVID-19 pandemic, many companies have introduced remote work. However, remote work has increased attacks aimed at accessing companies' sensitive information, and many companies have begun to use cloud services to respond to security threats. To analyze changes in domestic cloud security trends before and after the COVID-19 pandemic, this study collected news data with the keyword 'cloud security' and applied LDA topic modeling. Before the pandemic, interest in domestic cloud security was low, so no representativeness or association could be found among the extracted topics. However, the analysis indicated that cloud adoption is necessary to provide the high computing performance required by AI, IoT, and blockchain, which are IT technologies currently being studied. In contrast, the topics extracted after the pandemic confirmed that interest in the cloud increased in Korea and that interest in cloud security increased accordingly. Therefore, security measures should be established to prepare for the ever-increasing use of cloud services.

A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS (LDA, Top2Vec, BERTopic 모형의 토픽모델링 비교 연구 - 국외 문헌정보학 분야를 중심으로 -)

  • Yong-Gu Lee;SeonWook Kim
    • Journal of the Korean Society for Library and Information Science / v.58 no.1 / pp.5-30 / 2024
  • The purpose of this study is to extract topics from experimental data using three topic modeling methods (LDA, Top2Vec, and BERTopic) and to compare the characteristics of and differences between these models. The experimental data consist of 55,442 papers published in 85 academic journals in the field of library and information science that are indexed in the Web of Science (WoS). The experiment had two stages: the first topic modeling results were obtained using the default parameters of each model, and the second were obtained by setting the same optimal number of topics for each model. In the first stage, the LDA, Top2Vec, and BERTopic models generated significantly different numbers of topics (100, 350, and 550, respectively). The Top2Vec and BERTopic models seemed to divide the topics approximately three to five times more finely than the LDA model. There were substantial differences among the models in the average and standard deviation of documents per topic. The LDA model assigned many documents to a relatively small number of topics, while the BERTopic model showed the opposite trend. In the second stage, with the same 25 topics generated for all models, the Top2Vec model tended to assign more documents per topic on average and showed small deviations between topics, resulting in an even distribution across the 25 topics. When the models were compared on the creation of similar topics, the LDA and Top2Vec models generated 18 similar topics (72%) out of 25. This high percentage suggests that the Top2Vec model is more similar to the LDA model. For a more comprehensive comparison, expert evaluation is necessary to determine whether the documents assigned to each topic are thematically accurate.
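The per-topic statistics used in the comparison above (average and standard deviation of documents per topic, and the share of similar topics) can be computed as in the sketch below. It assumes each document has already been assigned a single topic id by some model; the function names and the toy assignment list are hypothetical, and the population standard deviation is an assumed choice because the abstract does not say which variant was used.

```python
from collections import Counter
from statistics import mean, pstdev


def documents_per_topic(assignments):
    """Summarize how evenly documents are spread over topics.

    assignments: one topic id per document.
    """
    sizes = list(Counter(assignments).values())
    return {
        "topics": len(sizes),
        "mean": mean(sizes),
        "std": pstdev(sizes),  # population standard deviation (assumed)
    }


def similar_topic_share(n_similar, n_topics):
    """Fraction of topics in one model that have a similar topic in another."""
    return n_similar / n_topics


# Toy assignment of six documents to three topics (illustrative only).
summary = documents_per_topic([0, 0, 0, 1, 1, 2])

# The figure reported in the abstract: 18 similar topics out of 25.
share = similar_topic_share(18, 25)  # 0.72
```

A low standard deviation relative to the mean corresponds to the even distribution the abstract attributes to Top2Vec in the 25-topic setting.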

Keyword trends analysis related to the aviation industry during the Covid-19 period using text mining (텍스트마이닝을 활용한 Covid-19 기간 동안의 항공산업 관련 키워드 트렌드 분석)

  • Choi, Donghyun;Song, Bomi;Park, Dahyeon;Lee, Sungwoo
    • Journal of Korea Society of Industrial Information Systems / v.27 no.2 / pp.115-128 / 2022
  • The purpose of this study is to conduct a keyword trend analysis of the impact of Covid-19 on the aviation industry using article data. Related articles were extracted around the keyword "Airline" for the periods of 6 months before and after the Covid-19 outbreak, and topic modeling (LDA) was then performed. Through this, the main topics that arise during an epidemic such as Covid-19 were extracted. The results are expected to serve as primary data for predicting the impact on the aviation industry when an event like Covid-19 occurs.

Topic-Network based Topic Shift Detection on Twitter (트위터 데이터를 이용한 네트워크 기반 토픽 변화 추적 연구)

  • Jin, Seol A;Heo, Go Eun;Jeong, Yoo Kyung;Song, Min
    • Journal of the Korean Society for Information Management / v.30 no.1 / pp.285-302 / 2013
  • This study identified topic shifts and patterns over time by analyzing a large amount of Twitter data, which is characterized by high accessibility and brevity. First, we extracted keywords for a certain product and used them to represent a topic network built by co-word analysis, whose nodes and edges allow for an intuitive understanding of the keywords associated with each topic. We conducted temporal analysis of term co-occurrence as well as topic modeling to examine the results of the network analysis. In addition, comparing topic shifts on Twitter with the corresponding retrieval results from newspapers confirmed that Twitter responds immediately to news media and spreads negative issues quickly. Our findings suggest that companies could use the proposed technique to identify the public's negative opinions as quickly as possible and apply them to timely decision making and effective responses to their customers.
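The co-word analysis behind such a topic network can be sketched as counting how often two keywords appear in the same tweet, with keywords as nodes and co-occurrence counts as edge weights. This is a generic illustration rather than the authors' pipeline; the function name `coword_edges`, the `min_count` threshold, and the toy tweets are assumptions.

```python
from collections import Counter
from itertools import combinations


def coword_edges(documents, min_count=1):
    """Build weighted co-occurrence edges from tokenized documents.

    Each edge is an alphabetically ordered keyword pair; its weight is the
    number of documents in which both keywords appear.
    """
    edges = Counter()
    for tokens in documents:
        # Use a set so a keyword repeated within one document counts once.
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return {pair: n for pair, n in edges.items() if n >= min_count}


# Toy tokenized tweets about a hypothetical product.
tweets = [
    ["phone", "battery", "recall"],
    ["phone", "battery"],
    ["recall", "phone"],
]
network = coword_edges(tweets, min_count=2)
# network == {("battery", "phone"): 2, ("phone", "recall"): 2}
```

Running the same function on tweets from successive time windows and comparing the resulting edge sets is one simple way to observe the kind of topic shift the abstract describes.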

Hot Topic Prediction Scheme Considering User Influences in Social Networks (소셜 네트워크에서 사용자의 영향력을 고려한 핫 토픽 예측 기법)

  • Noh, Yeon-woo;Kim, Dae-yun;Han, Jieun;Yook, Misun;Lim, Jongtae;Bok, Kyoungsoo;Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.15 no.8 / pp.24-36 / 2015
  • Recently, interest in detecting hot topics has grown significantly, as it has become important to find and analyze meaningful information in the large amount of data flowing in from social network services. Because SNS data consist of many arbitrary posts that have not been verified in advance, the reliability of the results declines when hot topics are predicted from those posts. To solve this problem, this paper proposes a highly reliable hot topic prediction scheme that considers user influence in social networks. The proposed scheme promptly extracts a set of hot-issue keywords from Twitter using a modified TF-IDF algorithm. It improves the reliability of hot topic prediction by weighting tweets according to user influence. To show the superiority of the proposed scheme, we compare it with an existing scheme through performance evaluation. Our experimental results show that the proposed method improves precision and recall compared to the existing method.
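The idea of weighting tweets by user influence inside a TF-IDF score can be sketched as below. The abstract does not give the modified formula, so this is only one plausible reading: each term occurrence contributes the author's influence weight instead of 1, and the IDF is a common smoothed variant, ln((1 + N) / (1 + df)) + 1. The function name, the influence values, and the toy tweets are all assumptions.

```python
import math
from collections import defaultdict


def influence_weighted_tfidf(tweets):
    """Score terms over a batch of tweets.

    tweets: list of (tokens, influence) pairs, where influence is a
    non-negative weight for the tweet's author (hypothetical input).
    """
    n_tweets = len(tweets)
    doc_freq = defaultdict(int)       # number of tweets containing the term
    weighted_tf = defaultdict(float)  # influence-weighted term frequency
    for tokens, influence in tweets:
        for term in set(tokens):
            doc_freq[term] += 1
        for term in tokens:
            weighted_tf[term] += influence
    return {
        term: weighted_tf[term]
        * (math.log((1 + n_tweets) / (1 + doc_freq[term])) + 1)
        for term in weighted_tf
    }


# Toy batch: two tweets from influential users and one from a low-influence user.
batch = [
    (["earthquake", "seoul"], 2.0),
    (["earthquake"], 1.0),
    (["lunch"], 0.5),
]
scores = influence_weighted_tfidf(batch)
ranked = sorted(scores, key=scores.get, reverse=True)
```

With these toy inputs the terms posted by influential users rank above the term from the low-influence user, which is the effect the weighting is meant to produce.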

Research Trend Analysis on Living Lab Using Text Mining (텍스트 마이닝을 이용한 리빙랩 연구동향 분석)

  • Kim, SeongMook;Kim, YoungJun
    • Journal of Digital Convergence / v.18 no.8 / pp.37-48 / 2020
  • This study aimed to understand trends in living lab studies and to derive implications for the direction of future studies by using text mining. The study applied network analysis and topic modelling to keywords and abstracts from a total of 166 theses published between 2011 and November 2019. Centrality analysis showed that living lab studies had focused on keywords such as innovation, society, technology, development, and user. The topic modelling extracted 5 topics: "regional innovation and user support", "social policy program of government", "smart city platform building", "technology innovation model of company", and "participation in system transformation". Since the foundation of KNoLL in 2017, the subjects of living lab studies have diversified. Quantitative analysis using text mining provides useful results for the development of living lab studies.

Building a Korean-English Parallel Corpus by Measuring Sentence Similarities Using Sequential Matching of Language Resources and Topic Modeling (언어 자원과 토픽 모델의 순차 매칭을 이용한 유사 문장 계산 기반의 위키피디아 한국어-영어 병렬 말뭉치 구축)

  • Cheon, JuRyong;Ko, YoungJoong
    • Journal of KIISE / v.42 no.7 / pp.901-909 / 2015
  • In this paper, to build a Korean-English parallel corpus from Wikipedia, we propose a method for finding similar sentences based on language resources and topic modeling. We first applied language resources (a Wiki-dictionary, numbers, and the Daum online dictionary) to match words sequentially. We constructed the Wiki-dictionary from Wikipedia titles. To take advantage of Wikipedia, we used the translation probabilities in the Wiki-dictionary for word matching. In addition, we improved the accuracy of the sentence similarity measure by using word distributions based on topic modeling. In the experiment, a previous study achieved an F1-score of 48.4% using only language resources with linear combination, and 51.6% when topic modeling over the entire word distributions was also considered. Our proposed method, which adds sequential matching and translation probability to the language resources, achieved 58.3%, which is 9.9 percentage points better than the previous study. When the proposed sequential matching of language resources was combined with topic modeling that considers important word distributions, the proposed system achieved 59.1%, which is 7.5 percentage points better than the previous study.

An Analysis of the 2017 Korean Presidential Election Using Text Mining (텍스트 마이닝을 활용한 2017년 한국 대선 분석)

  • An, Eunhee;An, Jungkook
    • Journal of the Korea Convergence Society / v.11 no.5 / pp.199-207 / 2020
  • Recently, big data analysis has drawn attention in various fields because it can generate value from large amounts of data, and it is also used to run political campaigns and predict results. However, existing research was limited in compiling high-level information about candidates because it analyzed only specific SNS data. Therefore, this study performs news trend analysis, topic extraction, sentiment analysis, keyword analysis, and comment analysis for the 2017 presidential election of South Korea. The results show that various topics were generated and that online opinions were extracted for the trending keywords of the respective candidates. This study also shows that portal news and comments can serve as useful tools for predicting the public's opinion on social issues. This paper supports building a strategic course of action by providing a method of analyzing public opinion across various fields.

Topic Automatic Extraction Model based on Unstructured Security Intelligence Report (비정형 보안 인텔리전스 보고서 기반 토픽 자동 추출 모델)

  • Hur, YunA;Lee, Chanhee;Kim, Gyeongmin;Lim, HeuiSeok
    • Journal of the Korea Convergence Society / v.10 no.6 / pp.33-39 / 2019
  • As cyber attack methods become more intelligent, incidents such as security breaches and international crimes are increasing. To predict and respond to these cyber attacks, the characteristics, methods, and types of attack techniques should be identified. To this end, many security companies publish security intelligence reports to quickly identify various attack patterns and prevent further damage. However, the reports that each company distributes are unstructured, and the number of published intelligence reports is ever-increasing. In this paper, we propose a method to extract structured data from unstructured security intelligence reports. We also propose an automatic intelligence report analysis system that divides a large volume of reports into sub-groups based on their topics, making the report analysis process more effective and efficient.