• Title/Summary/Keyword: Topic Clustering


A Topic Analysis of Fine Particle Matter by Using Newspaper Articles (신문기사를 이용한 미세먼지 이슈의 토픽 분석)

  • Yang, Ji-Yeon
    • The Journal of the Korea Contents Association / v.22 no.6 / pp.1-14 / 2022
  • This study aims to identify topics in newspaper articles related to fine particle matter and to investigate the characteristics and time series trend of each topic. Related national newspaper articles published between 1990 and 2021 were collected from Bigkinds. A total of 18 topics were discovered using LDA, and 11 clusters were derived through clustering. Hot topics include related products/residence, overseas causes (China), power plants as a domestic cause, nationwide emergency reduction measures, international cooperation, political issues, the current situation and countermeasures in other countries, and consumption patterns. Cold topics include the concentration standard and indoor air quality improvement. These findings should be useful in inferring the political direction and strategies. In particular, consumer protection policy should be expanded as the related market grows. It will also be necessary to pursue policies that promote public safety and health and that enhance public consensus and international cooperation.

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.155-174 / 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (Coronavirus-2, a fatal respiratory syndrome) were published. The rapid increase in the number of COVID-19 papers imposes time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. Therefore, in this study, we propose a method of extracting useful information from the text of an extensive literature using LDA and the Word2vec algorithm. Papers related to the search keywords were extracted from the COVID-19 papers, and detailed topics were identified. The data came from the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic and updated weekly. The research method consists of two main parts. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers including full text. For this purpose, the number of COVID-19 publications by year was analyzed through exploratory data analysis using a Python program, and the top 10 journals under active research were identified. LDA and the Word2vec algorithm were used to derive research topics related to COVID-19, and after related words were analyzed, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, yielding a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each collected paper, detailed topics were analyzed using the LDA and Word2vec algorithms, and a clustering method based on PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy result of this study is that topics that were not derived from the topic modeling of all COVID-19 papers were derived from the topic modeling results for each research topic. For example, topic modeling of the papers related to 'vaccine' extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and is said to play an important role in the production of therapeutic agents and vaccine development. In addition, extracting topics from the papers related to 'treatment' revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells attack normal cells instead of defending against attacks. Hidden topics that could not be found across the entire corpus were uncovered by classifying papers according to keywords and performing topic modeling to find detailed topics. In this study, we proposed a method of extracting topics from a large amount of literature using the LDA algorithm and extracting similar words using Skip-gram, the Word2vec model that predicts surrounding words from the central word. The combination of the LDA model and the Word2vec model was intended to improve performance by identifying both the relationship between documents and LDA topics and the relationships captured by Word2vec. In addition, we presented a clustering method based on PCA dimension reduction that uses the t-SNE technique to group documents with similar themes intuitively and organize them into a structured collection. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of related academic papers, we hope this work will save healthcare professionals and policy makers precious time and effort and help them rapidly gain new insights. It is also expected to be used as basic data for researchers exploring new research directions.

    A Study on an Effective Event Detection Method for Event-Focused News Summarization (사건중심 뉴스기사 자동요약을 위한 사건탐지 기법에 관한 연구)

    • Chung, Young-Mee;Kim, Yong-Kwang
      • Journal of the Korean Society for Information Management / v.25 no.4 / pp.227-243 / 2008
    • This study investigates an event detection method with the aim of generating an event-focused news summary from a set of news articles on a certain event using a multi-document summarization technique. The event detection method first classifies news articles into event-related topic categories using an SVM classifier and then creates event clusters, each containing news articles on one event, with a modified single-pass clustering algorithm. The clustering algorithm applies a time penalty function as well as cluster partitioning to enhance clustering performance. The proposed event detection method showed satisfactory performance in terms of both the F-measure and the detection cost.
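
    As a rough illustration of single-pass clustering with a time penalty, the sketch below assigns each incoming article to the most similar existing cluster, scaling the similarity down as the gap since the cluster's last article grows. It is not the authors' algorithm: the linear decay in `penalty`, the `threshold` and `window_days` values, and the sample articles are all assumptions made for this example, and the SVM classification and cluster partitioning steps are omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(articles, threshold=0.3, window_days=7.0):
    """Cluster (day, term_weights) pairs in one pass, in arrival order.

    The time penalty here is a hypothetical linear decay: similarity is
    multiplied by a factor that falls from 1 to 0 over `window_days`.
    """
    clusters = []  # each: {"centroid": dict, "last_day": float, "members": [index]}
    for idx, (day, vec) in enumerate(articles):
        best, best_score = None, 0.0
        for c in clusters:
            gap = day - c["last_day"]
            penalty = max(0.0, 1.0 - gap / window_days)
            score = cosine(vec, c["centroid"]) * penalty
            if score > best_score:
                best, best_score = c, score
        if best is not None and best_score >= threshold:
            # Join the best cluster: accumulate term weights into its centroid.
            for t, w in vec.items():
                best["centroid"][t] = best["centroid"].get(t, 0.0) + w
            best["last_day"] = day
            best["members"].append(idx)
        else:
            # No cluster is similar and recent enough: start a new event.
            clusters.append({"centroid": dict(vec), "last_day": day, "members": [idx]})
    return [c["members"] for c in clusters]


# Made-up articles: two about one event on days 0-1, one similar article on day 30.
articles = [
    (0, {"quake": 1.0, "rescue": 1.0}),
    (1, {"quake": 1.0, "damage": 1.0}),
    (30, {"quake": 1.0, "rescue": 1.0}),
]
event_clusters = single_pass(articles)
```

    With these sample values, the day-30 article shares its vocabulary with the first cluster but falls outside the time window, so it starts a new event cluster.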

    Unsupervised Motion Pattern Mining for Crowded Scenes Analysis

    • Wang, Chongjing;Zhao, Xu;Zou, Yi;Liu, Yuncai
      • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.12 / pp.3315-3337 / 2012
    • Crowded scene analysis is a challenging topic in the field of computer vision. Detecting diverse motion patterns in videos of crowded scenarios is the critical yet difficult part of this problem. In this paper, we propose a novel approach to mining motion patterns that uses motion information from both long-term periods and short intervals simultaneously. To capture long-term motion effectively, we introduce the Motion History Image (MHI) representation to obtain a global perspective on the crowd motion. Combining MHI with optical flow, which provides instantaneous motion information, gives rise to discriminative spatial-temporal motion features. Benefiting from the robustness and efficiency of this motion representation, the subsequent motion pattern mining is implemented in a completely unsupervised way. The motion vectors are clustered hierarchically by an automatic hierarchical clustering algorithm built on a graphic model. This method overcomes the instability of optical flow in dealing with time continuity in crowded scenes. The clustering results reveal how motion patterns are distributed in the crowded videos. To validate the performance of the proposed approach, we conduct experimental evaluations on several challenging videos that include vehicles and pedestrians. The reliable detection results demonstrate the effectiveness of our approach.

    Malicious Codes Re-grouping Methods using Fuzzy Clustering based on Native API Frequency (Native API 빈도 기반의 퍼지 군집화를 이용한 악성코드 재그룹화 기법연구)

    • Kwon, O-Chul;Bae, Seong-Jae;Cho, Jae-Ik;Moon, Jung-Sub
      • Journal of the Korea Institute of Information Security & Cryptology / v.18 no.6A / pp.115-127 / 2008
    • The Native API is a system call that can only be accessed with administrator authentication. It can be used to detect a variety of malicious codes that can only be executed with administrator authority. Therefore, much research is being done on detection methods that use the characteristics of the Native API. Most of this research uses supervised machine learning methods. However, the classification standards of anti-virus companies do not reflect the characteristics of the Native API. As a result, the population data used in the supervised learning methods are not accurate. Therefore, more research is needed on classification standards that use the Native API for detection. This paper proposes a method for re-grouping malicious codes using fuzzy clustering with the Native API standard. The proposed re-grouping method is evaluated by using machine learning to compare its detection rates with those of previous classification methods.
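
    To make the idea of fuzzy re-grouping concrete, here is a minimal sketch of standard fuzzy c-means applied to frequency vectors. It is an illustration only, not the paper's method: the two-dimensional sample vectors stand in for Native API call frequencies and are invented, and the fuzzifier `m`, the initial centers, and the iteration count are assumptions.

```python
def fuzzy_memberships(points, centers, m=2.0):
    """Return u where u[i][j] is the membership of point i in cluster j.

    Each row sums to 1, so a sample can belong partly to several groups.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5 for c in centers]
        if 0.0 in dists:
            # The point sits exactly on one or more centers: share membership there.
            zeros = dists.count(0.0)
            memberships.append([1.0 / zeros if d == 0.0 else 0.0 for d in dists])
            continue
        memberships.append(
            [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
        )
    return memberships


def update_centers(points, u, m=2.0):
    """Recompute each center as the membership-weighted mean of all points."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
        )
    return centers


def fuzzy_c_means(points, centers, m=2.0, iterations=20):
    """Alternate membership and center updates for a fixed number of rounds."""
    for _ in range(iterations):
        u = fuzzy_memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return fuzzy_memberships(points, centers, m), centers


# Hypothetical frequency vectors: counts of two API calls per sample.
samples = [[9.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 8.0]]
u, centers = fuzzy_c_means(samples, centers=[[9.0, 1.0], [1.0, 9.0]])
```

    Unlike a hard clustering, each row of `u` gives graded memberships, which is what allows borderline samples to be examined before they are assigned to a group.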

    Soft Computing as a Methodology to Risk Engineering

    • Miyamoto Sadaaki
      • Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.05a / pp.3-6 / 2006
    • Risk engineering methods are a bundle of engineering tools, including fundamental concepts and approaches of soft computing, applied to real issues of risk management. This talk introduces the fundamental concepts and soft computing approaches of risk engineering. Because the term risk implies both advantageous and hazardous uncertainty in its origins, a fundamental theory for describing uncertainty is introduced that includes traditional probability and statistical models, fuzzy systems, and the less popular modal logic. In particular, the capability of modal logic to express various kinds of uncertainty is emphasized, and its relations with rough sets and evidence theory are described. Another topic is data mining related to problems in risk management. Some risk mining techniques, including fuzzy clustering, are introduced, and a recently developed algorithm is reviewed. A numerical example is shown.


    A Topic Classification System in cQA Services Based on Semi-Automatic Learning Using Wikipedia (위키피디아를 이용한 반자동 학습 기반의 cQA 서비스 주제 분류 시스템)

    • Kim, Taehyun
      • Annual Conference on Human and Language Technology / 2015.10a / pp.139-141 / 2015
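    • This paper introduces a system that classifies the topics of user queries in community-based question answering (cQA) services. Community-based question answering services can cover a variety of topics depending on the field, and statistical classification methods are widely used today to classify the topics of user queries. Classifying user queries with statistical methods requires a large training corpus suited to the topics. Building such a corpus by hand takes considerable time and cost. To address this problem, this paper proposes a method for semi-automatically building a training corpus by classifying Wikipedia documents by topic with the Supervised K-means Clustering technique. A support vector machine is then trained on the generated corpus to classify the topics of user queries. Although Wikipedia documents and user queries come from different domains, the proposed system classified the topics of user queries with an accuracy of 77.33%.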


    Reconstruction of Categories on the National Petition Site Using K-Means clustering and Topic Modeling (K-means 클러스터링과 토픽 모델링을 기반으로 한 국민청원 사이트의 카테고리 재구성)

    • Woo, Yun Hui;Kim, Hyon Hee
      • Proceedings of the Korea Information Processing Society Conference / 2019.05a / pp.302-305 / 2019
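    • The national petition site is receiving a great deal of public attention owing to its excellent accessibility and responsiveness. The site currently organizes petitions into 16 categories, including 'Future' and 'Growth Engines', plus an 'Other' category, but because the criteria are ambiguous, many petitions end up classified as 'Other'. This can be attributed to the use of a predefined category structure that does not clearly reflect the content of the petitions. To define more specific categories, this paper collected 1,500 petitions in order of recommendations and extracted a category structure based on their content. First, the k-means algorithm was applied to cluster the petitions and define the major categories, and topic modeling was then performed to define more specific subcategories. Because the hierarchical category structure presented in this paper consists of major categories and subcategories derived from petition content, it appears suitable for registering or classifying new petitions.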

    A Study on Cluster Topic Selection in Hierarchical Clustering (계층적 클러스터링에서 분류 대표어 선정에 관한 연구)

    • Yi, Sang-Seon;Lee, Shin-Won;An, Dong-Un;Chung, Sung-Jong
      • Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.669-672 / 2004
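    • As the volume of information grows, there are increasing attempts to apply hierarchical clustering, which automatically structures search results, to information retrieval systems. Hierarchical clustering organizes clusters into a hierarchy based on the similarity between documents, improving retrieval performance and presenting results to users in an easily understandable way. Because the hierarchy summarizes the search results, it is important to select representative terms that effectively capture the content of each cluster. To select the representative terms for each cluster, only nouns were extracted as candidates, and words already used as representative terms of an upper-level cluster were not reused in lower-level clusters, which improved the quality of the representative terms.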
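
    The two selection rules described in this abstract (nouns only, and no reuse of a parent cluster's terms in its children) can be sketched as follows. This is a minimal illustration under stated assumptions: documents are assumed to arrive already tagged as (word, part-of-speech) pairs, candidates are ranked by raw frequency, and the `docs`/`children` dictionary layout, the "NOUN" tag, and the sample data are all hypothetical rather than taken from the paper.

```python
from collections import Counter


def select_labels(cluster, used=frozenset(), top_n=1):
    """Assign representative terms to a cluster and, recursively, its children.

    cluster: {"docs": [[(word, pos), ...], ...], "children": [cluster, ...]}
    used:    terms already taken by ancestor clusters (excluded here).
    """
    counts = Counter(
        word
        for doc in cluster["docs"]
        for word, pos in doc
        if pos == "NOUN" and word not in used
    )
    labels = [word for word, _ in counts.most_common(top_n)]
    cluster["labels"] = labels
    for child in cluster.get("children", []):
        # Children may not reuse any term chosen at this level or above.
        select_labels(child, used | set(labels), top_n)
    return cluster


# Made-up tagged documents for a tiny two-level hierarchy.
doc1 = [("search", "NOUN"), ("engine", "NOUN"), ("fast", "ADJ"), ("fast", "ADJ")]
doc2 = [("search", "NOUN"), ("index", "NOUN"), ("index", "NOUN")]
doc3 = [("search", "NOUN"), ("ranking", "NOUN"), ("ranking", "NOUN")]

root = {
    "docs": [doc1, doc2, doc3],
    "children": [
        {"docs": [doc1, doc2], "children": []},
        {"docs": [doc3], "children": []},
    ],
}
select_labels(root)
```

    In this example the root takes "search", so its children fall back to their next most frequent nouns, which keeps labels at different levels of the hierarchy distinct.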


    Femtocell Networks Interference Management Approaches

    • Alotaibi, Sultan
      • International Journal of Computer Science & Network Security / v.22 no.4 / pp.329-339 / 2022
    • Small cells, particularly femtocells, are regarded as a promising solution for the limited resources available to handle increasing data demand. They usually boost wireless network capacity. While widespread use of femtocells increases network gain, it also raises several challenges, and interference is one such concern. Interference management is also seen as a main obstacle to the adoption of two-tier networks, for example when femtocells are placed in a traditional macrocell's geographic area. Interference comes in two forms: cross-tier and co-tier. Previous studies have addressed the topic of interference management. This study investigates the principles for categorizing interference management systems. Many methods exist in the literature to reduce or eliminate the impact of co-tier interference, cross-tier interference, or a combination of the two. The approaches to managing interference include FFR, Cognitive Femtocell and Cooperative Resource Scheduling, Beamforming Strategy, Transmission Power Control, and Clustering/Graph-Based methods. For each category, this work presents the approaches that have been proposed to solve the interference problem.

