• Title/Abstract/Keywords: Topic Information

분야연상어를 이용한 화제분야의 계산방법과 단락검색 (Passage Retrieval and Calculation Method of Topic Field by Using Field-Associated Terms)

  • 이상곤
    • 정보처리학회논문지B / Vol. 12B, No. 1 / pp.57-68 / 2005
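  • Passage retrieval, which uses additional information embedded in a text to split a document into its actual units of meaning, is an important technique. This paper describes a technique that separates out only the passages relevant to a document's field and extracts the passages that match the user's request. We extract field-associated terms from a document and explain, through experiments, a method for measuring how the topic field grows, shrinks, and changes from sentence to sentence. The method identifies which topics appear in a long document, measures the points at which a topic continues or shifts, and divides the document into passages by field. In experiments on 12,500 Korean newspaper articles, the method achieved 88% precision and 78% recall.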
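
The idea of following the topic field sentence by sentence can be illustrated with a minimal sketch. It is not the paper's method: the `FIELD_TERMS` dictionary, the function names, and the rule "a sentence without any field-associated term continues the current topic" are all assumptions made for illustration, and English words stand in for Korean terms.

```python
from collections import Counter

# Hypothetical field-associated terms (term -> field); not taken from the paper.
FIELD_TERMS = {
    "goal": "sports", "striker": "sports", "league": "sports",
    "election": "politics", "parliament": "politics", "vote": "politics",
}

def sentence_fields(sentence):
    """Count the field-associated terms of each field in one sentence."""
    words = sentence.lower().replace(".", " ").replace(",", " ").split()
    return Counter(FIELD_TERMS[w] for w in words if w in FIELD_TERMS)

def segment_by_field(sentences):
    """Group consecutive sentences that share a dominant field.

    Assumption: a sentence with no field-associated term continues the
    current topic. Returns a list of (field, [sentence indices]) passages.
    """
    passages = []
    current = None
    for i, sentence in enumerate(sentences):
        counts = sentence_fields(sentence)
        field = counts.most_common(1)[0][0] if counts else current
        if passages and field == current:
            passages[-1][1].append(i)      # topic continues
        else:
            passages.append((field, [i]))  # topic shifts: start a new passage
            current = field
    return passages
```

Under these assumptions, a topic shift is simply the sentence at which the dominant field changes; a real system would need weighted terms and a tolerance for brief digressions.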

동적 토픽분석을 활용한 스마트그리드 연구동향 분석 (Research Trend Analysis for Smart Grids Using Dynamic Topic Modeling)

  • 나상태;안주언;정민호;김자희
    • 전기학회논문지 / Vol. 66, No. 4 / pp.613-620 / 2017
  • The power grid has been evolving into a smart grid system that combines existing power technology with ICT to meet growing requirements for complexity, demand, reliability, security, and efficiency. This study analyzes research trends in smart grid technology since the introduction of the smart grid system and compares them with industrial trends, in order to understand the progress and characteristics of the technology and to look for ways to innovate it. To do this, we analyze research trends using dynamic topic modeling, which can track research topics over time. We then compare the resulting research trends with the industrial trends analyzed by Gartner's experts to show that smart grid research is maturing toward industrialization. Because the results come from quantitative analysis through data mining, they offer a more objective view and are expected to be useful in many settings, such as companies that want to enter the industry and government agencies that need to establish policy.

A Study of Verb-Second Phenomena in Medieval Spanish Complex Sentences

  • Cho Eun-Young
    • 한국언어정보학회지:언어와정보 / Vol. 9, No. 2 / pp.85-105 / 2005
  • This study investigates the 'verb-second' phenomena found in complex sentences of medieval Spanish. In particular, when a complex sentence consists of a preposed adverbial clause followed by its main clause, subject inversion is noticeable in the main clause. This type of inversion is motivated by the 'verb-second' structure, in which a topic occupies the first position and the verb immediately follows it. Subject inversion is thus a prerequisite for the verb to occupy the second position when the adverbial clause functions as a topic of the main clause, as is often the case in Germanic languages such as German and Dutch. By contrast, modern Spanish complex sentences do not show this phenomenon and strongly tend to place the grammatical subject in preverbal position. Medieval Spanish might therefore be typologically closer to the Germanic languages than to modern Spanish. To argue for this assumption, the formal and functional criteria by which the preposed adverbial clause can be defined as a topic NP are examined through a comparison with the left-dislocation structure.

사회연결망분석을 활용한 스마트그리드 연구동향 분석 (Research Trend Analysis for Smart Grid using Social Network Analysis)

  • 나상태;안주언;;김자희
    • 전기학회논문지 / Vol. 66, No. 12 / pp.1697-1704 / 2017
  • As the power grid changes into a smart grid, existing power technologies are evolving into convergence technologies through interdisciplinary research. Under the government policy of raising the share of renewable energy to 20% by 2030, this change seems to be accelerating. This study analyzes the relationships between research technologies in order to grasp research trends in smart grid technology. To this end, we apply social network analysis to the keywords extracted by topic modeling. The reason is that in a field with active interdisciplinary research, such as the smart grid, research topics are not independent: technologies that emerge in one topic also appear in others, so the links between research technologies can be important information. The approach can therefore contribute to research trend analysis as a tool used together with topic modeling.

다중 네트워크 분석과 토픽 모델링을 이용한 임진왜란 시기 사료에 관한 연구 (A Study on the Imjin War's Historical Materials with Multi-layer Network Analysis and Topic Modeling)

  • 조현철;송민
    • 한국비블리아학회지 / Vol. 33, No. 1 / pp.167-198 / 2022
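  • As convergence science research becomes more active, Digital Humanities research is being encouraged in the humanities as well. This study therefore proposes an exploratory study that applies text mining and entitymetrics methods to historical data. It uses the Seonjo Sillok (宣祖實錄) and Seonjo Sujeong Sillok (宣祖修正實錄), the Nanjung Jamnok (亂中雜錄), and the Jingbirok (懲毖錄), and applies network analysis and a DMR topic model to explore topic changes and shared entities in these historical sources. By confirming that quantitative analysis is applicable to text data, tracing how specific topics change over time, and presenting previously undiscovered relationships between person entities, the results suggest that this line of research can be extended.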

Comparing Social Media and News Articles on Climate Change: Different Viewpoints Revealed

  • Kang Nyeon Lee;Haein Lee;Jang Hyun Kim;Youngsang Kim;Seon Hong Lee
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 17, No. 11 / pp.2966-2986 / 2023
  • Climate change is a constant threat to human life, and it is important to understand how the public perceives it. Previous studies examining climate change have relied on limited survey data. In this study, the authors used big data, namely news articles and social media data selected with specific keywords related to climate change. Topic modeling was performed on these natural language data to analyze climate change discourse across various topics. In addition, sentiment analysis was applied before topic modeling to reveal differences between discourses, which allowed discourses with positive and negative tendencies to be classified. As a result, the tendency of each document could be identified by extracting keywords for each classified discourse. This study aims to show that topic modeling is a useful methodology for exploring discourse on platforms with big data. Moreover, the reliability of the study was increased by taking objective indicators (i.e., coherence score and perplexity) into account when performing topic modeling. Theoretically, based on the social amplification of risk framework (SARF), this study demonstrates that the diffusion of the climate change agenda in public news media leads to personal anxiety and fear on social media.

텍스트 분석 기술 및 활용 동향 (Investigations on Techniques and Applications of Text Analytics)

  • 김남규;이동훈;최호창
    • 한국통신학회논문지 / Vol. 42, No. 2 / pp.471-492 / 2017
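  • Demand for and interest in Big Data analysis, in which the sheer volume of data is itself part of the problem to be solved, have recently surged. Big data is used as a concept that covers not only conventional structured data but also unstructured data in various forms, such as images, video, and logs. Among these data types, research is especially active on the analysis of text, a representative means of expressing and conveying information. Text analysis is generally carried out in the order of document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis, and its results appear in forms such as word clouds, word networks, topic modeling, document classification, and sentiment analysis. In particular, as demand grows for identifying the main topics in the text data that is rapidly increasing through various social media, research on and applications of topic modeling, which extracts the main topics from a large volume of unstructured text documents and groups the documents that belong to each topic, are emerging in many fields. This paper therefore reviews the main techniques and research trends in text analysis and introduces case studies that used topic modeling to solve problems in various fields.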
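
The processing order named in the abstract (parsing and filtering, structuring, frequency analysis, similarity analysis) can be sketched with the Python standard library alone. This is a minimal illustration, not the survey's implementation: the stop-word list, the bag-of-words representation, and cosine similarity are assumptions chosen because they are common defaults.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list used for the filtering step.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}

def tokenize(text):
    """Parsing and filtering: lowercase, keep letter runs, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def term_frequencies(text):
    """Structuring and frequency analysis: a bag-of-words count vector."""
    return Counter(tokenize(text))

def cosine_similarity(a, b):
    """Similarity analysis between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

The frequency vectors produced here are the kind of input that word clouds, word networks, and topic models are typically built on; Korean text would additionally need a morphological analyzer in place of the regular expression.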

토픽 모델링을 이용한 방송미디어 관련 소셜 미디어 콘텐츠 분석 (Analysis of Social Media Contents about Broadcast Media through Topic Modeling)

  • 박상언
    • 한국IT서비스학회지 / Vol. 15, No. 2 / pp.81-92 / 2016
  • Many people share their TV viewing experience with other viewers on social media such as personal blogs and Twitter. This means that broadcast media, especially TV, affects responses on social media, and those responses in turn affect broadcast media ratings. Social TV has tried to exploit this relationship in marketing activities such as advertising by analyzing TV-related social behavior. However, most such efforts used only the volume of social media responses. This study uses topic modeling to analyze the subjects of social media responses to specific TV dramas, and examines the relationship between changes in popular topics and the dramas' viewer ratings over specified periods. Five representative Korean dramas of 2014 were selected, and blog content including viewer ratings about the dramas was collected from naver.com, the representative portal in South Korea. The proposed analysis framework consists of three steps: blog crawling, topic modeling, and topic trend analysis. The topic trend analysis has several implications. First, there were specific topics about the dramas on social media. Second, the topics had meaningful relationships with viewer ratings. Finally, the topics of dramas with higher viewer ratings differed from those of dramas with lower viewer ratings.

Topic Signature를 이용한 댓글 분류 시스템 (Comments Classification System using Topic Signature)

  • 배민영;차정원
    • 한국정보과학회논문지:소프트웨어및응용 / Vol. 35, No. 12 / pp.774-779 / 2008
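  • This paper describes a system that classifies comments using a Topic Signature. A topic signature is a feature selection method used in document summarization and document classification. Comments are characterized by short sentences, almost no word spacing, and many special characters. We therefore divide each comment into seven syllables and then divide these into tri-grams, which serve as the basic unit of classification. These tri-grams are used as the learning unit for the topic signature, and the learned features are classified with a Bayesian model. Comparative experiments against models based on various methods show that the implemented system outperforms existing methods.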
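
A minimal sketch of the classification half of this pipeline is given below: character tri-grams fed to a multinomial Naive Bayes classifier. It is an illustration under stated assumptions, not the authors' system. The topic-signature feature selection and the seven-syllable split are omitted, every tri-gram is kept as a feature, add-one smoothing is an assumed choice, and the class name `TrigramNaiveBayes` is hypothetical.

```python
import math
from collections import Counter, defaultdict

def char_trigrams(text):
    """Return overlapping character tri-grams after removing spaces."""
    s = text.replace(" ", "")
    return [s[i:i + 3] for i in range(len(s) - 2)]

class TrigramNaiveBayes:
    """Multinomial Naive Bayes over character tri-grams (add-one smoothing)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.gram_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(char_trigrams(text))
        self.vocab = {g for counts in self.gram_counts.values() for g in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.gram_counts[label]
            # Vocabulary size plus one extra slot for unseen tri-grams.
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(n_docs / total_docs)  # log prior
            for g in char_trigrams(text):
                score += math.log((counts[g] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Character-level tri-grams avoid any dependence on word spacing, which suits the short, unspaced comments the abstract describes. A usage example with made-up training data: `TrigramNaiveBayes().fit(["great story", "spam link"], ["ok", "spam"]).predict("great")`.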

Non-Simultaneous Sampling Deactivation during the Parameter Approximation of a Topic Model

  • Jeong, Young-Seob;Jin, Sou-Young;Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 7, No. 1 / pp.81-98 / 2013
  • Since Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) were introduced, many revised or extended topic models have appeared. Because the likelihood of these models is intractable, training any topic model requires an approximation algorithm such as variational approximation, Laplace approximation, or Markov chain Monte Carlo (MCMC). Although these approximation algorithms perform well, training a topic model is still computationally expensive given the large amount of data it requires. In this paper, we propose a new method, called non-simultaneous sampling deactivation, for efficient approximation of the parameters of a topic model. In traditional approximation algorithms, every random variable is sampled or obtained after a single predefined burn-in period; our method is instead based on the observation that the random variable nodes in a topic model converge after different periods. During the iterative approximation process, the proposed method allows each random variable node to be terminated, or deactivated, once it has converged. Compared with traditional approaches, in which every node is usually deactivated at the same time, the proposed method therefore improves inference efficiency in terms of time and memory. We do not propose a new approximation algorithm, but a new process applicable to existing approximation algorithms. Through experiments, we show the time and memory efficiency of the method and discuss the tradeoff between the efficiency of the approximation process and parameter consistency.
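
The core idea, stopping each variable as soon as it has converged instead of stopping all of them together, can be shown on a deliberately simple toy. The sketch below is not the paper's algorithm and involves no sampling or topic model: it assumes independent variables that relax deterministically toward fixed targets at different rates, and the function names and the step-size convergence test are hypothetical.

```python
def iterate_with_deactivation(targets, rates, tol=1e-6, max_iters=10_000):
    """Relax each variable toward its target; freeze it once its step < tol.

    Returns (values, number of per-variable updates performed).
    """
    values = [0.0] * len(targets)
    active = set(range(len(targets)))
    updates = 0
    for _ in range(max_iters):
        if not active:
            break
        for i in list(active):
            step = rates[i] * (targets[i] - values[i])
            values[i] += step
            updates += 1
            if abs(step) < tol:
                active.discard(i)  # this variable has converged; stop updating it
    return values, updates

def iterate_simultaneous(targets, rates, tol=1e-6, max_iters=10_000):
    """Baseline: keep updating every variable until all steps are below tol."""
    values = [0.0] * len(targets)
    updates = 0
    for _ in range(max_iters):
        steps = [r * (t - v) for r, t, v in zip(rates, targets, values)]
        values = [v + s for v, s in zip(values, steps)]
        updates += len(values)
        if all(abs(s) < tol for s in steps):
            break
    return values, updates
```

In this toy the variables are independent, so freezing one early costs nothing. In a real topic model the variables are coupled, which is where the tradeoff between efficiency and parameter consistency mentioned in the abstract would arise.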