• Title/Summary/Keyword: 구조적 토픽모델

Search Result 22, Processing Time 0.035 seconds

Topic maps Matching and Merging Techniques based on Partitioning of Topics (토픽 분할에 의한 토픽맵 매칭 및 통합 기법)

  • Kim, Jung-Min;Chung, Hyun-Sook
    • The KIPS Transactions:PartD
    • /
    • v.14D no.7
    • /
    • pp.819-828
    • /
    • 2007
  • In this paper, we propose a topic maps matching and merging approach based on the syntactic or semantic characteristics and constraints of the topic maps. Previous schema matching approaches have been developed to enhance effectiveness and generality of matching techniques. However they are inefficient because the approaches should transform input ontologies into graphs and take into account all the nodes and edges of the graphs, which ended up requiring a great amount of processing time. Now, standard languages for developing ontologies are RDF/OWL and Topic Maps. In this paper, we propose an enhanced version of matching and merging technique based on topic partitioning, several matching operations and merging conflict detection.

A study on integration of semantic topic based Knowledge model (의미적 토픽 기반 지식모델의 통합에 관한 연구)

  • Chun, Seung-Su;Lee, Sang-Jin;Bae, Sang-Tea
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2012.06b
    • /
    • pp.181-183
    • /
    • 2012
  • 최근 자연어 및 정형언어 처리, 인공지능 알고리즘 등을 활용한 효율적인 의미 기반 지식모델의 생성과 분석 방법이 제시되고 있다. 이러한 의미 기반 지식모델은 효율적 의사결정트리(Decision Making Tree)와 특정 상황에 대한 체계적인 문제해결(Problem Solving) 경로 분석에 활용된다. 특히 다양한 복잡계 및 사회 연계망 분석에 있어 정적 지표 생성과 회귀 분석, 행위적 모델을 통한 추이분석, 거시예측을 지원하는 모의실험(Simulation) 모형의 기반이 된다. 본 연구에서는 이러한 의미 기반 지식모델을 통합에 있어 텍스트 마이닝을 통해 도출된 토픽(Topic) 모델 간 통합 방법과 정형적 알고리즘을 제시한다. 이를 위해 먼저, 텍스트 마이닝을 통해 도출되는 키워드 맵을 동치적 지식맵으로 변환하고 이를 의미적 지식모델로 통합하는 방법을 설명한다. 또한 키워드 맵으로부터 유의미한 토픽 맵을 투영하는 방법과 의미적 동치 모델을 유도하는 알고리즘을 제안한다. 통합된 의미 기반 지식모델은 토픽 간의 구조적 규칙과 정도 중심성, 근접 중심성, 매개 중심성 등 관계적 의미분석이 가능하며 대규모 비정형 문서의 의미 분석과 활용에 실질적인 기반 연구가 될 수 있다.

Mapping of Characteristics and Hierarchy between Heterogeneous Ontology Languages (이형 온톨로지 언어의 속성 및 계층구조 매핑)

  • Hong, Hyeun-Sool
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2007.10b
    • /
    • pp.131-136
    • /
    • 2007
  • 토픽맵은 RDF에 기반을 둔 OWL과 많은 유사점을 갖지만, 양자는 역사적, 기술적, 의도하는 목적에서 차이가 있다. 토픽맵은 ISO 표준이지만, OWL은 W3C의 온톨로지 개발 표준언어로서 양자는 각각의 제약언어, 데이터 모델, 그리고 일련의 구문들을 별개로 갖는다. 그러나 토픽맵과 OWL 양자는 지식을 표현하는 온톨로지 언어라는 공통적 특성을 가지며, 술어로직에 기반을 두고 있고, XML포맷이기 때문에 상호간에 매핑이 가능하다. 논문의 목적은 토픽맵과 OWL의 메타모델로부터 온톨로지 정보자원의 공유, 교환, 통합에 접근시킨다. 따라서 각각의 메타모델에서 주요 요소를 추출하고, 이들의 의미적인 측면과 구조적인 측면의 요소들의 손실이 발생되지 않도록 매핑을 수행한다.

  • PDF

Topic and Sentiment Analysis on COVID19 Research in Korea Using Text Analysis (텍스트 분석을 이용한 코로나19 관련 국내논문의 토픽 및 감성연구)

  • Heo, Seong-Min;Yang, Ji-Yeon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.07a
    • /
    • pp.329-331
    • /
    • 2021
  • 본 연구에서는 코로나19 관련 연구논문의 연구주제를 탐색하고 동향을 검토하고 있다. 또한 감성분석을 통해 부정적인 어조가 강한 경고가 되는 주제들을 알아본다. 잠재 디리슐레 할당(LDA)를 이용하여 총 8개의 토픽을 발견하 였고, 이를 구조적 토픽 모델링(STM)과 비교하여 비교적 안정적인 결과임을 확인하였다. 또한 k-means 군집 알고리즘을 통해 각 토픽별로 세부 연구주제를 발견하였고 주성분 분석을 이용하여 이를 시각적으로 표현하였다. 감성분석을 통해 각 토픽별 긍정적, 부정적인 단어들을 살펴보고 감성점수를 계산하여 연구논문의 주된 어조를 파악하였는데, 특히 생물 의학 관련, 국제적 역학관계, 심리적 영향과 관련된 연구에서 부정적인 어조가 강한 것으로 나타나 해당 부문에 대해서 주의와 관심이 요구된다. 향후 연구자들이 연구의 방향성을 탐색하고 정책결정자들이 연구지원 사업을 결정하는데 기초자료로 활용될 수 있을 것이다.

  • PDF

Text Classification using Cloze Question based on KorBERT (KorBERT 기반 빈칸채우기 문제를 이용한 텍스트 분류)

  • Heo, Jeong;Lee, Hyung-Jik;Lim, Joon-Ho
    • Annual Conference on Human and Language Technology
    • /
    • 2021.10a
    • /
    • pp.486-489
    • /
    • 2021
  • 본 논문에서는 KorBERT 한국어 언어모델에 기반하여 텍스트 분류문제를 빈칸채우기 문제로 변환하고 빈칸에 적합한 어휘를 예측하는 방식의 프롬프트기반 분류모델에 대해서 소개한다. [CLS] 토큰을 이용한 헤드기반 분류와 프롬프트기반 분류는 사전학습의 NSP모델과 MLM모델의 특성을 반영한 것으로, 텍스트의 의미/구조적 분석과 의미적 추론으로 구분되는 텍스트 분류 태스크에서의 성능을 비교 평가하였다. 의미/구조적 분석 실험을 위해 KLUE의 의미유사도와 토픽분류 데이터셋을 이용하였고, 의미적 추론 실험을 위해서 KLUE의 자연어추론 데이터셋을 이용하였다. 실험을 통해, MLM모델의 특성을 반영한 프롬프트기반 텍스트 분류에서는 의미유사도와 토픽분류 태스크에서 우수한 성능을 보였고, NSP모델의 특성을 반영한 헤드기반 텍스트 분류에서는 자연어추론 태스크에서 우수한 성능을 보였다.

  • PDF

A Multi-Strategic Mapping Approach for Distributed Topic Maps (분산 토픽맵의 다중 전략 매핑 기법)

  • Kim Jung-Min;Shin Hyo-phil;Kim Hyoung-Joo
    • Journal of KIISE:Software and Applications
    • /
    • v.33 no.1
    • /
    • pp.114-129
    • /
    • 2006
  • Ontology mapping is the task of finding semantic correspondences between two ontologies. In order to improve the effectiveness of ontology mapping, we need to consider the characteristics and constraints of data models used for implementing ontologies. Earlier research on ontology mapping, however, has proven to be inefficient because the approach should transform input ontologies into graphs and take into account all the nodes and edges of the graphs, which ended up requiring a great amount of processing time. In this paper, we propose a multi-strategic mapping approach to find correspondences between ontologies based on the syntactic or semantic characteristics and constraints of the topic maps. Our multi-strategic mapping approach includes a topic name-based mapping, a topic property-based mapping, a hierarchy-based mapping, and an association-based mapping approach. And it also uses a hybrid method in which a combined similarity is derived from the results of individual mapping approaches. In addition, we don't need to generate a cross-pair of all topics from the ontologies because unmatched pairs of topics can be removed by characteristics and constraints of the topic maps. For our experiments, we used oriental philosophy ontologies, western philosophy ontologies, Yahoo western philosophy dictionary, and Yahoo german literature dictionary as input ontologies. Our experiments show that the automatically generated mapping results conform to the outputs generated manually by domain experts, which is very promising for further work.

Semantic Dependency Link Topic Model for Biomedical Acronym Disambiguation (의미적 의존 링크 토픽 모델을 이용한 생물학 약어 중의성 해소)

  • Kim, Seonho;Yoon, Juntae;Seo, Jungyun
    • Journal of KIISE
    • /
    • v.41 no.9
    • /
    • pp.652-665
    • /
    • 2014
  • Many important terminologies in biomedical text are expressed as abbreviations or acronyms. We newly suggest a semantic link topic model based on the concepts of topic and dependency link to disambiguate biomedical abbreviations and cluster long form variants of abbreviations which refer to the same senses. This model is a generative model inspired by the latent Dirichlet allocation (LDA) topic model, in which each document is viewed as a mixture of topics, with each topic characterized by a distribution over words. Thus, words of a document are generated from a hidden topic structure of a document and the topic structure is inferred from observable word sequences of document collections. In this study, we allow two distinct word generation to incorporate semantic dependencies between words, particularly between expansions (long forms) of abbreviations and their sentential co-occurring words. Besides topic information, the semantic dependency between words is defined as a link and a new random parameter for the link presence is assigned to each word. As a result, the most probable expansions with respect to abbreviations of a given abstract are decided by word-topic distribution, document-topic distribution, and word-link distribution estimated from document collection though the semantic dependency link topic model. The abstracts retrieved from the MEDLINE Entrez interface by the query relating 22 abbreviations and their 186 expansions were used as a data set. The link topic model correctly predicted expansions of abbreviations with the accuracy of 98.30%.

An Analysis on Research Trends of Digital Humanities (디지털 인문학 분야의 국내외 연구 동향 분석)

  • Jeong, Yoo Kyung
    • Journal of the Korean Society for information Management
    • /
    • v.37 no.2
    • /
    • pp.311-331
    • /
    • 2020
  • The purpose of this study is investigate the research trend on digital humanities. Previous studies focused on analyzing representative cases and national policies, not overall research trends in the digital humanities field. To this problem, this study intends to identify the intellectual structure of the digital humanities by adopting bibliometric approach. In this study, 1,765 articles retrieved from Web of Science and 514 records from RISS were analyzed to investigate research trends. Structural topic models were applied to examine research topics and to grasp the time-series trend. The results show that humanities-based convergence studies and digitization were main research interests in both side. In Korea, research topics related to cultural contents and storytelling were prominent, while in terms of providing digitized data, library and information science field was one of the important research topic abroad.

Analyzing Changes in Consumers' Interest Areas Related to Skin under the Pandemic: Focusing on Structural Topic Modeling (팬데믹에 따른 소비자의 피부 관련 관심 영역 변화 분석: 구조적 토픽모델링을 중심으로)

  • Nakyung Kim;Jiwon Park;HyungBin Moon
    • Knowledge Management Research
    • /
    • v.25 no.1
    • /
    • pp.173-192
    • /
    • 2024
  • This study aims to understand the changes in the beauty industry due to the pandemic from the consumer's perspective based on consumers' opinions about their skin online before and after the pandemic. Furthermore, this study tries to derive strategies for companies and governments to support sustainable growth and innovation in the beauty industry. To this end, posts on social media from 2017 to 2022 that contained the keyword 'skin concerns' are collected, and after data preprocessing, 96,908 posts are used for the structural topic model. To examine whether consumers' interest areas related to skin change according to the pandemic situation, the analysis period is divided into 7 periods, and the variables that distinguish each stage are used as meta-variables for the structural topic model. As a result, it is found that consumers' interests can be divided into 22 topics, which can be categorized into four main categories: beauty manufacturing, beauty services, skin concerns, and other. The results of this study are expected to be utilized in construction of product development and marketing strategies of related companies and the establishment of economic support policies by the government in response to changes in demand in the beauty industry due to the pandemic.

A Design on Informal Big Data Topic Extraction System Based on Spark Framework (Spark 프레임워크 기반 비정형 빅데이터 토픽 추출 시스템 설계)

  • Park, Kiejin
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.11
    • /
    • pp.521-526
    • /
    • 2016
  • As on-line informal text data have massive in its volume and have unstructured characteristics in nature, there are limitations in applying traditional relational data model technologies for data storage and data analysis jobs. Moreover, using dynamically generating massive social data, social user's real-time reaction analysis tasks is hard to accomplish. In the paper, to capture easily the semantics of massive and informal on-line documents with unsupervised learning mechanism, we design and implement automatic topic extraction systems according to the mass of the words that consists a document. The input data set to the proposed system are generated first, using N-gram algorithm to build multiple words to capture the meaning of the sentences precisely, and Hadoop and Spark (In-memory distributed computing framework) are adopted to run topic model. In the experiment phases, TB level input data are processed for data preprocessing and proposed topic extraction steps are applied. We conclude that the proposed system shows good performance in extracting meaningful topics in time as the intermediate results come from main memories directly instead of an HDD reading.