• Title/Summary/Keyword: Topic Model

Generative probabilistic model with Dirichlet prior distribution for similarity analysis of research topic

  • Milyahilu, John;Kim, Jong Nam
    • Journal of Korea Multimedia Society / v.23 no.4 / pp.595-602 / 2020
  • We propose a generative probabilistic model with a Dirichlet prior distribution for topic modeling and text similarity analysis. The model assigns topics and calculates the text correlation between documents within a corpus. It also provides the posterior probability of each topic in a document, based on the prior distribution over the corpus. We then present a Gibbs sampling algorithm for inference of the posterior distribution and compute the text correlation among 50 abstracts of papers published by IEEE. We also conduct supervised learning to set a benchmark against which the performance of LDA (Latent Dirichlet Allocation) is judged. The experiments show that LDA assigns topics to documents with an accuracy of 76%. The supervised learning results show an accuracy of 61%, a precision of 93%, and an F1-score of 96%. The discussion of the experimental results provides a thorough justification of the topic assignments based on probabilities, distributions, evaluation metrics, and correlation coefficients.
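
The abstract describes Gibbs sampling for posterior inference under a Dirichlet prior, followed by a correlation between documents. As a minimal sketch, assuming standard LDA with symmetric hyperparameters `alpha` and `beta`, and assuming that "text correlation" means the Pearson correlation between two documents' topic-proportion vectors (the paper's exact model and measure may differ), a collapsed Gibbs sampler can be written in plain Python. The function names `lda_gibbs` and `pearson` are hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA with symmetric Dirichlet priors.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns theta, the posterior-mean topic proportions of each document.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]                # tokens of topic k in doc d
    nkw = [[0] * vocab_size for _ in range(n_topics)]   # tokens of word w in topic k
    nk = [0] * n_topics                                 # tokens assigned to topic k
    z = []                                              # current topic of every token

    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    return [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0
```

For example, `theta = lda_gibbs(docs, n_topics=3, vocab_size=6)` followed by `pearson(theta[0], theta[1])` compares the first two documents. Returning 0.0 for a constant vector is an arbitrary convention chosen here to avoid division by zero.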

The viewpoint-based product information modeling in collaborative product development (협업적 제품개발에서의 관점기반 제품정보 모델링)

  • 채희권;최영환;김광수
    • Proceedings of the CALSEC Conference / 2003.09a / pp.54-59 / 2003
  • Information sharing is essential for collaboration among participants in a collaborative environment, and it is necessary to reduce the time-to-market of new products. In this paper, the V2-model is proposed to support information sharing in product development. The V2-model supports collaborative product development in design and the supply chain. Through viewpoints, the V2-model supports 1) a two-level structure consisting of a private level and a public level, 2) a level-up process, and 3) the product development process. Public-level information enables product information to be shared across the collaborative supply chain and design. The viewpoints in the V2-model are divided into public viewpoints, which point to public-level information, and private viewpoints, which point to private-level information. Private viewpoints are transformed into public viewpoints. In this paper, the extended Topic Map has B-Topic, S-Topic, and View for representing the V2-model. The level-up process of the V2-model is implemented by merging S-Topics. The V2-model is implemented for a washing machine model using the extended Topic Maps. In this model, the public and private viewpoints are represented, and the level-up process, which transforms private viewpoints into public viewpoints, is implemented.

Jointly Image Topic and Emotion Detection using Multi-Modal Hierarchical Latent Dirichlet Allocation

  • Ding, Wanying;Zhu, Junhuan;Guo, Lifan;Hu, Xiaohua;Luo, Jiebo;Wang, Haohong
    • Journal of Multimedia Information System / v.1 no.1 / pp.55-67 / 2014
  • Image topic and emotion analysis is an important component of online image retrieval, which has become very popular in the rapidly growing social media community. However, because of the gap between images and text, there is very little work in the literature that detects an image's topics and emotions in a unified framework, even though topics and emotions are two levels of semantics that often work together to describe an image comprehensively. In this work, we propose a unified model, the Joint Topic/Emotion Multi-Modal Hierarchical Latent Dirichlet Allocation (JTE-MMHLDA) model, which extends the earlier LDA, mmLDA, and JST models to capture topic and emotion information simultaneously from heterogeneous data. Specifically, a two-level graphical model is built so that topics and emotions are shared across the whole document collection. Experimental results on a Flickr dataset indicate that the proposed model efficiently discovers images' topics and emotions. It significantly outperforms the text-only system by 4.4% and the vision-only system by 18.1% in topic detection, and the text-only system by 7.1% and the vision-only system by 39.7% in emotion detection.

A Process-Centered Knowledge Model for Analysis of Technology Innovation Procedures

  • Chun, Seungsu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.3 / pp.1442-1453 / 2016
  • Worldwide economic networks are expanding prodigiously in the information society, and they require structural social changes through technology innovation. This paper therefore formally defines a process-centered knowledge model for analyzing policy-making procedures on technology innovation. The eventual goal of the proposed knowledge model is to analyze a topic network built from composite keywords in natural-language documents written during technology innovation procedures. The knowledge model is created as a topic network composed of keywords derived by text mining from natural-language documents. We also show how to analyze the knowledge model and automatically generate feature keywords and relation properties for topic networks.
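
A topic network of the kind described above is often built from keyword co-occurrence. As a minimal sketch, assuming keywords (including composite keywords pre-joined into single tokens) have already been extracted and that two keywords are linked when they appear in the same sentence, the network can be represented as weighted edges. The function `keyword_network` and its `min_weight` threshold are hypothetical, not the paper's procedure.

```python
from collections import Counter
from itertools import combinations


def keyword_network(sentences, keywords, min_weight=1):
    """Build an undirected keyword co-occurrence network.

    sentences: iterable of token lists (keywords assumed already extracted).
    keywords: set of terms to keep as network nodes.
    Returns {(a, b): weight} with a < b, where weight is the number of
    sentences in which both keywords occur; weaker edges are dropped.
    """
    edges = Counter()
    for tokens in sentences:
        # Sorting gives each unordered pair a single canonical key.
        present = sorted(set(tokens) & keywords)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return {pair: w for pair, w in edges.items() if w >= min_weight}
```

Raising `min_weight` prunes incidental links so that only keyword relations that recur across sentences remain in the network.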

Mobile Content Curation Service Based on Real-Time Request/Response Model (실시간 요청/응답 모델에 기반한 모바일 콘텐츠 큐레이션 서비스)

  • Kim, Namyun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.4 / pp.1-6 / 2014
  • This paper proposes a mobile content curation service for collecting various online/offline publications. A company publishes one-time topic information to a broker server in advance, and a customer curates that topic information on a mobile device by requesting it. The main characteristics of the proposed service are that it is based on a request/response model rather than the existing publish/subscribe model, that topic information can easily be specified by an input string without QR codes or audio recognition, and that all topic information can be retrieved anywhere, anytime because it is stored on the mobile device. This service can be used for second-screen TV campaigns and various online/offline events.
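
The request/response flow described above can be sketched in memory. This assumes the broker is a simple store keyed by the input string, and it models no networking; the class and method names (`Broker.publish`, `Broker.request`, `MobileClient.curate`) are hypothetical.

```python
class Broker:
    """In-memory stand-in for the broker server (hypothetical API)."""

    def __init__(self):
        self._topics = {}

    def publish(self, key, info):
        # A company registers one-time topic information in advance.
        self._topics[key] = info

    def request(self, key):
        # Request/response: the client pulls on demand; nothing is pushed.
        return self._topics.get(key)


class MobileClient:
    """Hypothetical client that curates topic information by input string."""

    def __init__(self, broker):
        self.broker = broker
        self.saved = {}  # local store so curated items remain available later

    def curate(self, key):
        info = self.broker.request(key)
        if info is not None:
            self.saved[key] = info
        return info
```

Unlike publish/subscribe, the client holds no standing subscription: it asks for one item by its string key and keeps the answer locally.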

A Study on the Research Topics and Trends in South Korea: Focusing on Particulate Matter (토픽모델링을 이용한 국내 미세먼지 연구 분류 및 연구동향 분석)

  • Park, Hyemin;Kim, Taeyong;Kwon, Daewoong;Heo, Junyong;Lee, Juyeon;Yang, Minjune
    • Korean Journal of Remote Sensing / v.38 no.5_3 / pp.873-885 / 2022
  • Particulate matter (PM) has emerged as a hot topic around the world since PM was reported to be related to increased mortality and prevalence rates. In South Korea, the importance of PM has been recognized since the late 1990s, and various studies on PM have been conducted. This study investigated PM research topics and trends in papers (D=2,764) published in the Research Information Sharing Service (RISS) using topic modeling based on Latent Dirichlet Allocation (LDA). A total of 10 topics were identified across all papers, and the PM research topics were classified as 'PM reduction (Topic 1)', 'Government policy and management (Topic 2)', 'Characteristics of PM (Topic 3)', 'PM model (Topic 4)', 'Environmental education (Topic 5)', 'Bio (Topic 6)', 'Traffic (Topic 7)', 'Asian dust (Topic 8)', 'Indoor PM (Topic 9)', and 'Human risk (Topic 10)'. In particular, the proportion of papers on 'Government policy and management (Topic 2)', 'PM model (Topic 4)', 'Environmental education (Topic 5)', and 'Bio (Topic 6)' relative to the total number of papers increased over time (linear slope > 0). The results of this study provide a new literature review methodology for particulate matter research, along with its history and insights.
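
The abstract reports a trend as a positive linear slope of a topic's share of papers over time. Assuming that slope is an ordinary least-squares fit of the yearly proportion against the year (the paper may use a different regression), it can be computed as below. The function names are hypothetical, and the counts in the tests are illustrative, not the paper's data.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def topic_trend(topic_counts, total_counts):
    """Slope of a topic's yearly share of papers.

    topic_counts: {year: papers assigned to the topic}
    total_counts: {year: all papers that year}
    A positive result corresponds to 'linear slope > 0'.
    """
    years = sorted(total_counts)
    shares = [topic_counts.get(y, 0) / total_counts[y] for y in years]
    return ols_slope(years, shares)
```

Using shares rather than raw counts keeps the trend from simply tracking growth in the total number of PM papers.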

Topic Model Analysis of Research Trend on Spatial Big Data (공간빅데이터 연구 동향 파악을 위한 토픽모형 분석)

  • Lee, Won Sang;Sohn, So Young
    • Journal of Korean Institute of Industrial Engineers / v.41 no.1 / pp.64-73 / 2015
  • The recent emergence of spatial big data has attracted the attention of various research groups. This paper analyzes research trends in spatial big data by text mining related records from the Scopus database. We apply a topic model and network analysis to the abstracts of articles related to spatial big data. Optics, astronomy, and computer science were observed to be the major areas of spatial big data analysis. The major topics discovered in the articles relate to mobile, cloud, and smart services using spatial big data in urban settings. Trends in the discovered topics are presented by period, along with the results of the topic network analysis. We expect that areas of spatial big data research not yet covered can be explored further.

Recognizing Actions from Different Views by Topic Transfer

  • Liu, Jia
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.4 / pp.2093-2108 / 2017
  • In this paper, we describe a novel method for recognizing human actions from different views via view knowledge transfer. Our approach is characterized by two aspects: 1) We propose an unsupervised topic transfer model (TTM) that models two view-dependent vocabularies, so that the original bag-of-visual-words (BoVW) representation can be transferred into a bag-of-topics (BoT) representation. The higher-level BoT features, which can be shared across views, connect action models for different views. 2) These features make it possible to learn a discriminative action model under one view and categorize actions in another view. We tested our approach on the IXMAS data set, and the results are promising given the simplicity of the approach. In addition, we demonstrate a supervised topic transfer model (STTM), which combines transfer feature learning and discriminative classifier learning in one framework.
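
The abstract does not specify how TTM maps words to topics. As a hedged illustration of the general idea only, assume each view has its own, separately learned word-to-topic mapping p(topic | visual word) over a shared set of topics. A BoVW histogram can then be projected onto that shared space as an expected topic histogram. This projection is a generic technique, not the paper's TTM, and `bow_to_bot` is a hypothetical name.

```python
def bow_to_bot(word_hist, p_topic_given_word):
    """Project a bag-of-visual-words histogram onto shared topics.

    word_hist: counts per visual word of one view's vocabulary.
    p_topic_given_word: one row per visual word, each a list of topic
    probabilities (assumed learned per view, over shared topics).
    Returns the expected topic histogram, normalised to sum to 1.
    """
    n_topics = len(p_topic_given_word[0])
    bot = [0.0] * n_topics
    for w, count in enumerate(word_hist):
        for k, p in enumerate(p_topic_given_word[w]):
            bot[k] += count * p
    total = sum(bot)
    return [b / total for b in bot] if total else bot
```

Because both views project onto the same topics, a classifier trained on BoT vectors from one view can be applied to BoT vectors from the other, even though their visual vocabularies differ in size.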

A Method of Calculating Topic Keywords for Topic Labeling (토픽 레이블링을 위한 토픽 키워드 산출 방법)

  • Kim, Eunhoe;Suh, Yuhwa
    • Journal of Korea Society of Digital Industry and Information Management / v.16 no.3 / pp.25-36 / 2020
  • Topics computed by LDA topic modeling must be labeled separately. To label a topic, one examines the words that represent it and assigns a label accordingly. It is therefore important to first obtain a good set of words that represent the topic. This paper proposes a method for calculating a set of words representing a topic using TextRank, an algorithm that extracts the keywords of a document. The proposed method uses Relevance to select words that are both related to the topic and discriminative. It extracts topic keywords with the TextRank algorithm and connects keywords that frequently co-occur, so that the topic is expressed with higher coverage.
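
The abstract does not define "Relevance". Assuming it is the relevance measure of Sievert and Shirley (LDAvis), relevance(w, k | λ) = λ·log φ(k, w) + (1 − λ)·log(φ(k, w) / p(w)), word selection for one topic can be sketched as follows. Here `phi_k` holds the topic-word probabilities φ(k, w), `p_w` holds the corpus-wide word probabilities, and `lam` is λ; the function names and the default `lam=0.6` are illustrative choices.

```python
import math


def relevance(phi_k, p_w, lam=0.6):
    """Relevance of every word to one topic (LDAvis-style measure, assumed).

    lam=1 ranks purely by topic probability phi(k, w); lam=0 ranks purely
    by lift phi(k, w) / p(w), which favours words specific to the topic.
    All probabilities must be positive (LDA smoothing ensures this).
    """
    return [lam * math.log(pk) + (1 - lam) * math.log(pk / pw)
            for pk, pw in zip(phi_k, p_w)]


def top_words(vocab, phi_k, p_w, n=3, lam=0.6):
    """The n words with the highest relevance to the topic."""
    scores = relevance(phi_k, p_w, lam)
    ranked = sorted(range(len(vocab)), key=lambda i: scores[i], reverse=True)
    return [vocab[i] for i in ranked[:n]]
```

Lowering `lam` pushes frequent but generic words down the list, which is the kind of topic discrimination the abstract asks Relevance to provide.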

Non-Simultaneous Sampling Deactivation during the Parameter Approximation of a Topic Model

  • Jeong, Young-Seob;Jin, Sou-Young;Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.1 / pp.81-98 / 2013
  • Since Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) were introduced, many revised or extended topic models have appeared. Because the likelihood of these models is intractable, training any topic model requires an approximation algorithm such as variational approximation, Laplace approximation, or Markov chain Monte Carlo (MCMC). Although these approximation algorithms perform well, training a topic model is still computationally expensive given the large amount of data it requires. In this paper, we propose a new method, called non-simultaneous sampling deactivation, for efficient approximation of the parameters of a topic model. Whereas traditional approximation algorithms sample or compute every random variable for a single predefined burn-in period, our method is based on the observation that the random variable nodes in a topic model converge at different times. During the iterative approximation process, the proposed method allows each random variable node to be terminated, or deactivated, once it has converged. Therefore, compared with traditional approaches in which every node is usually deactivated at the same time, the proposed method improves inference efficiency in terms of time and memory. We do not propose a new approximation algorithm, but a new process applicable to existing approximation algorithms. Through experiments, we show the time and memory efficiency of the method and discuss the tradeoff between the efficiency of the approximation process and parameter consistency.
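
The abstract does not state the convergence test used for deactivation. As a minimal sketch, assume a node counts as converged once its value changes by less than `tol` for `patience` consecutive iterations. The per-node update rules are hypothetical stand-ins for per-variable sampling or variational updates, and `run_with_deactivation` is not the paper's code.

```python
def run_with_deactivation(updates, init, tol=1e-6, patience=3, max_iters=1000):
    """Iterate per-node update rules, deactivating each node once it converges.

    updates: dict node -> function(state_dict) -> new value for that node.
    A node is deactivated after its value changes by less than `tol` for
    `patience` consecutive iterations; deactivated nodes keep their last value.
    Returns (final_state, number_of_node_updates_performed).
    """
    state = dict(init)
    active = set(updates)
    calm = {n: 0 for n in updates}  # consecutive small-change iterations
    n_updates = 0
    for _ in range(max_iters):
        if not active:
            break
        for node in sorted(active):  # deterministic update order
            new = updates[node](state)
            n_updates += 1
            calm[node] = calm[node] + 1 if abs(new - state[node]) < tol else 0
            state[node] = new
        # Non-simultaneous deactivation: each node stops on its own schedule.
        active = {n for n in active if calm[n] < patience}
    return state, n_updates
```

With one fast-converging and one slow-converging node, the fast node stops after a handful of iterations while the slow one keeps updating, so the total number of updates stays well below what a shared burn-in period would require.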