• Title/Summary/Keyword: 토픽 정보추출

Search Result 162, Processing Time 0.031 seconds

Analysis of English abstracts in Journal of the Korean Data & Information Science Society using topic models and social network analysis (토픽 모형 및 사회연결망 분석을 이용한 한국데이터정보과학회지 영문초록 분석)

  • Kim, Gyuha;Park, Cheolyong
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.1
    • /
    • pp.151-159
    • /
    • 2015
  • This article analyzes English abstracts of the articles published in Journal of the Korean Data & Information Science Society using text mining techniques. At first, term-document matrices are formed by various methods and then visualized by social network analysis. LDA (latent Dirichlet allocation) and CTM (correlated topic model) are also employed in order to extract topics from the abstracts. Performances of the topic models are compared via entropy for several numbers of topics and weighting methods to form term-document matrices.

Analysis of Issues Related to Artificial Intelligence Based on Topic Modeling (토픽모델링을 활용한 인공지능 관련 이슈 분석)

  • Noh, Seol-Hyun
    • Journal of Digital Convergence
    • /
    • v.18 no.5
    • /
    • pp.75-87
    • /
    • 2020
  • The present study determined new value that can be created through the convergence between artificial intelligence technology (AIT) and all industries by deriving and thoroughly analyzing major issues related to artificial intelligence (AI). This study analyzes domestic articles related to AI using topic modeling method based on LDA algorithm. Keywords were extracted from 3,889 articles of eleven metropolitan newspapers, eight business newspapers and major broadcasting companies; articles were selected by searching for the keyword "artificial intelligence". Keywords were extracted by optimizing the relevance parameter λ to improve the measure of pointwise mutual information (PMI), which shows the association among the keywords of each topic, and topic names were inferred from keywords based on valid evidence. The extracted topics widely showed changes occurring throughout society, economy, industries, culture, and the support policy and vision of the government.

Subtopic Mining from the View of Dependency Structure (의존 구문 구조 관점으로 본 서브토픽 마이닝)

  • Kim, Se-Jong;Lee, Jong-Hyeok
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2012.06b
    • /
    • pp.294-296
    • /
    • 2012
  • 본 논문은 일본어 웹 문서 말뭉치로부터 의존 구문 구조 관점으로 바라본 단어들의 동시발생(co-occurrence) 정보를 사용하여 서브토픽 마이닝(subtopic mining)을 수행하는 방법론을 제안한다. 우리는 의존 구문 구조를 반영하는 간단한 패턴들을 사용하여 서브토픽들을 추출 및 생성하고, 제안한 수식을 바탕으로 순위화한다. 본 방법론은 기존의 주요 상용 검색 서비스에서 제공하는 연관 검색어 및 추천 검색어를 사용한 방법론보다 좋은 성능을 보였다.

Analysis of trends in deep learning and reinforcement learning

  • Dong-In Choi;Chungsoo Lim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.10
    • /
    • pp.55-65
    • /
    • 2023
  • In this paper, we apply KeyBERT(Keyword extraction with Bidirectional Encoder Representations of Transformers) algorithm-driven topic extraction and topic frequency analysis to deep learning and reinforcement learning research to discover the rapidly changing trends in them. First, we crawled abstracts of research papers on deep learning and reinforcement learning, and temporally divided them into two groups. After pre-processing the crawled data, we extracted topics using KeyBERT algorithm, and then analyzed the extracted topics in terms of topic occurrence frequency. This analysis reveals that there are distinct trends in research work of all analyzed algorithms and applications, and we can clearly tell which topics are gaining more interest. The analysis also proves the effectiveness of the utilized topic extraction and topic frequency analysis in research trend analysis, and this trend analysis scheme is expected to be used for research trend analysis in other research fields. In addition, the analysis can provide insight into how deep learning will evolve in the near future, and provide guidance for select research topics and methodologies by informing researchers of research topics and methodologies which are recently attracting attention.

Exploring Regional Decline Risk Areas and Factors Using Topic Modeling and Cluster Analysis (토픽모델링과 군집분석을 통한 지방 소멸 위험지역과 요인의 탐색)

  • Ji-Min Kim;Heeryon Cho
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.05a
    • /
    • pp.349-350
    • /
    • 2023
  • 우리나라는 지속적인 저출산과 고령화로 인해 지방 소멸 위험지역이 점차 늘어나고 있다. 본 연구는 지방 소멸과 관련된 다양한 요인을 '인구 소멸'이라는 키워드를 포함하는 신문 기사에 대한 토픽모델링을 통해 발견하고, 추출된 토픽과 관련된 공공 데이터를 수집하여 비슷한 특징을 가지는 지역을 묶는 군집분석을 수행한다. 그리고 지방소멸위험지수로 분류된 소멸 위험지역과 군집분석 결과를 비교한다.

Twitter Sentiment Analysis for the Recent Trend Extracted from the Newspaper Article (신문기사로부터 추출한 최근동향에 대한 트위터 감성분석)

  • Lee, Gyoung Ho;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.2 no.10
    • /
    • pp.731-738
    • /
    • 2013
  • We analyze public opinion via a sentiment analysis of tweets collected by using recent topic keywords extracted from newspaper articles. Newspaper articles collected within a certain period of time are clustered by using K-means algorithm and topic keywords for each cluster are extracted by using term frequency. A sentiment analyzer learned by a machine learning method can classify tweets according to their polarity values. We have an assumption that tweets collected by using these topic keywords deal with the same topics as the newspaper articles mentioned if the tweets and the newspapers are generated around the same time. and we tried to verify the validity of this assumption.

Extracting Korean-English Parallel Sentences based on Measure of Sentences Similarity Using Sequential Matching of Heterogeneous Language Resources (이질적인 언어 자원의 순차적 매칭을 이용한 문장 유사도 계산 기반의 위키피디아 한국어-영어 병렬 문장 추출 방법)

  • Cheon, Juryong;Ko, Youngjoong
    • Annual Conference on Human and Language Technology
    • /
    • 2014.10a
    • /
    • pp.127-132
    • /
    • 2014
  • 본 논문은 위키피디아로부터 한국어-영어 간 병렬 문장을 추출하기 위해 이질적 언어 자원의 순차적 매칭을 적용한 유사도 계산 방법을 제안한다. 선행 연구에서는 병렬 문장 추출을 위해 언어 자원별로 유사도를 계산하여 선형 결합하였고, 토픽모델을 이용해 추정한 단어의 토픽 분포를 유사도 계산에 추가로 이용함으로써 병렬 문장 추출 성능을 향상시켰다. 하지만, 이는 언어 자원들이 독립적으로 사용되어 각 언어자원이 가지는 오류가 문장 간 유사도 계산에 반영되는 문제와 관련이 적은 단어 간의 분포가 유사도 계산에 반영되는 문제가 있다. 본 논문에서는 이질적인 언어 자원들을 이용해 순차적으로 단어를 매칭함으로써 언어 자원들의 독립적인 사용으로 각 자원의 오류가 유사도에 반영되는 문제를 해결하였고, 관련이 높은 단어의 분포만을 유사도 계산에 이용함으로써 관련이 적은 단어의 분포가 반영되는 문제를 해결하였다. 실험을 통해, 언어 자원들을 이용해 순차적으로 매칭한 유사도 계산 방법은 선행 연구에 비해 F1-score 48.4%에서 51.3%로 향상된 성능을 보였고, 관련이 높은 단어의 분포만을 유사도 계산에 이용한 방법은 약 10%에서 34.1%로 향상된 성능을 얻었다. 마지막으로, 제안한 유사도 방법들을 결합함으로써 선행연구의 51.6%에서 2.7%가 향상된 54.3%의 성능을 얻었다.

  • PDF

Subtopic Mining of Two-level Hierarchy Based on Hierarchical Search Intentions and Web Resources (계층적 검색 의도와 웹 자원을 활용한 2계층 구조의 서브토픽 마이닝)

  • Kim, Se-Jong;Lee, Jong-Hyeok
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.2
    • /
    • pp.83-88
    • /
    • 2016
  • Subtopic mining is the extraction and ranking of possible subtopics, which disambiguate and specify the search intentions of an input query in terms of relevance, popularity, and diversity. This paper describes the limitations of previous studies on the utilization of web resources, and proposes a subtopic mining method with a two-level hierarchy based on hierarchical search intentions and web resources, in order to overcome these limitations. Considering the characteristics of resources provided by the official subtopic mining task, we extract various second-level subtopics reflecting hierarchical search intentions from web documents, and expand and re-rank them using other provided resources. Terms in subtopics with wider search intentions are used to generate first-level subtopics. Our method performed better than state-of-the-art methods in almost every aspect.

Design and Implementation of Topic Map Generation System based Tag (태그 기반 토픽맵 생성 시스템의 설계 및 구현)

  • Lee, Si-Hwa;Lee, Man-Hyoung;Hwang, Dae-Hoon
    • Journal of Korea Multimedia Society
    • /
    • v.13 no.5
    • /
    • pp.730-739
    • /
    • 2010
  • One of core technology in Web 2.0 is tagging, which is applied to multimedia data such as web document of blog, image and video etc widely. But unlike expectation that the tags will be reused in information retrieval and then maximize the retrieval efficiency, unacceptable retrieval results appear owing to toot limitation of tag. In this paper, in the base of preceding research about image retrieval through tag clustering, we design and implement a topic map generation system which is a semantic knowledge system. Finally, tag information in cluster were generated automatically with topics of topic map. The generated topics of topic map are endowed with mean relationship by use of WordNet. Also the topics are endowed with occurrence information suitable for topic pair, and then a topic map with semantic knowledge system can be generated. As the result, the topic map preposed in this paper can be used in not only user's information retrieval demand with semantic navigation but alse convenient and abundant information service.

A Study on Analysis of national R&D research trends for Artificial Intelligence using LDA topic modeling (LDA 토픽모델링을 활용한 인공지능 관련 국가R&D 연구동향 분석)

  • Yang, MyungSeok;Lee, SungHee;Park, KeunHee;Choi, KwangNam;Kim, TaeHyun
    • Journal of Internet Computing and Services
    • /
    • v.22 no.5
    • /
    • pp.47-55
    • /
    • 2021
  • Analysis of research trends in specific subject areas is performed by examining related topics and subject changes by using topic modeling techniques through keyword extraction for most of the literature information (paper, patents, etc.). Unlike existing research methods, this paper extracts topics related to the research topic using the LDA topic modeling technique for the project information of national R&D projects provided by the National Science and Technology Knowledge Information Service (NTIS) in the field of artificial intelligence. By analyzing these topics, this study aims to analyze research topics and investment directions for national R&D projects. NTIS provides a vast amount of national R&D information, from information on tasks carried out through national R&D projects to research results (thesis, patents, etc.) generated through research. In this paper, the search results were confirmed by performing artificial intelligence keywords and related classification searches in NTIS integrated search, and basic data was constructed by downloading the latest three-year project information. Using the LDA topic modeling library provided by Python, related topics and keywords were extracted and analyzed for basic data (research goals, research content, expected effects, keywords, etc.) to derive insights on the direction of research investment.