• Title/Summary/Keyword: topic model

Search Result 846, Processing Time 0.022 seconds

Non-Simultaneous Sampling Deactivation during the Parameter Approximation of a Topic Model

  • Jeong, Young-Seob;Jin, Sou-Young;Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.1
    • /
    • pp.81-98
    • /
    • 2013
  • Since Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) were introduced, many revised or extended topic models have appeared. Due to the intractable likelihood of these models, training any topic model requires to use some approximation algorithm such as variational approximation, Laplace approximation, or Markov chain Monte Carlo (MCMC). Although these approximation algorithms perform well, training a topic model is still computationally expensive given the large amount of data it requires. In this paper, we propose a new method, called non-simultaneous sampling deactivation, for efficient approximation of parameters in a topic model. While each random variable is normally sampled or obtained by a single predefined burn-in period in the traditional approximation algorithms, our new method is based on the observation that the random variable nodes in one topic model have all different periods of convergence. During the iterative approximation process, the proposed method allows each random variable node to be terminated or deactivated when it is converged. Therefore, compared to the traditional approximation ways in which usually every node is deactivated concurrently, the proposed method achieves the inference efficiency in terms of time and memory. We do not propose a new approximation algorithm, but a new process applicable to the existing approximation algorithms. Through experiments, we show the time and memory efficiency of the method, and discuss about the tradeoff between the efficiency of the approximation process and the parameter consistency.

Abnormal Behavior Recognition Based on Spatio-temporal Context

  • Yang, Yuanfeng;Li, Lin;Liu, Zhaobin;Liu, Gang
    • Journal of Information Processing Systems
    • /
    • v.16 no.3
    • /
    • pp.612-628
    • /
    • 2020
  • This paper presents a new approach for detecting abnormal behaviors in complex surveillance scenes where anomalies are subtle and difficult to distinguish due to the intricate correlations among multiple objects' behaviors. Specifically, a cascaded probabilistic topic model was put forward for learning the spatial context of local behavior and the temporal context of global behavior in two different stages. In the first stage of topic modeling, unlike the existing approaches using either optical flows or complete trajectories, spatio-temporal correlations between the trajectory fragments in video clips were modeled by the latent Dirichlet allocation (LDA) topic model based on Markov random fields to obtain the spatial context of local behavior in each video clip. The local behavior topic categories were then obtained by exploiting the spectral clustering algorithm. Based on the construction of a dictionary through the process of local behavior topic clustering, the second phase of the LDA topic model learns the correlations of global behaviors and temporal context. In particular, an abnormal behavior recognition method was developed based on the learned spatio-temporal context of behaviors. The specific identification method adopts a top-down strategy and consists of two stages: anomaly recognition of video clip and anomalous behavior recognition within each video clip. Evaluation was performed using the validity of spatio-temporal context learning for local behavior topics and abnormal behavior recognition. Furthermore, the performance of the proposed approach in abnormal behavior recognition improved effectively and significantly in complex surveillance scenes.

A Semantic Aspect-Based Vector Space Model to Identify the Event Evolution Relationship within Topics

  • Xi, Yaoyi;Li, Bicheng;Liu, Yang
    • Journal of Computing Science and Engineering
    • /
    • v.9 no.2
    • /
    • pp.73-82
    • /
    • 2015
  • Understanding how the topic evolves is an important and challenging task. A topic usually consists of multiple related events, and the accurate identification of event evolution relationship plays an important role in topic evolution analysis. Existing research has used the traditional vector space model to represent the event, which cannot be used to accurately compute the semantic similarity between events. This has led to poor performance in identifying event evolution relationship. This paper suggests constructing a semantic aspect-based vector space model to represent the event: First, use hierarchical Dirichlet process to mine the semantic aspects. Then, construct a semantic aspect-based vector space model according to these aspects. Finally, represent each event as a point and measure the semantic relatedness between events in the space. According to our evaluation experiments, the performance of our proposed technique is promising and significantly outperforms the baseline methods.

Analyzing Customer Experience in Hotel Services Using Topic Modeling

  • Nguyen, Van-Ho;Ho, Thanh
    • Journal of Information Processing Systems
    • /
    • v.17 no.3
    • /
    • pp.586-598
    • /
    • 2021
  • Nowadays, users' reviews and feedback on e-commerce sites stored in text create a huge source of information for analyzing customers' experience with goods and services provided by a business. In other words, collecting and analyzing this information is necessary to better understand customer needs. In this study, we first collected a corpus with 99,322 customers' comments and opinions in English. From this corpus we chose the best number of topics (K) using Perplexity and Coherence Score measurements as the input parameters for the model. Finally, we conducted an experiment using the latent Dirichlet allocation (LDA) topic model with K coefficients to explore the topic. The model results found hidden topics and keyword sets with high probability that are interesting to users. The application of empirical results from the model will support decision-making to help businesses improve products and services as well as business management and development in the field of hotel services.

A Study on Technology Trend of Power Semiconductor Packaging using Topic model (토픽모델을 이용한 전력반도체 패키징 기술 동향 연구)

  • Park, Keunseo;Choi, Gyunghyun
    • Journal of the Microelectronics and Packaging Society
    • /
    • v.27 no.2
    • /
    • pp.53-58
    • /
    • 2020
  • Analysis of electric semiconductor packaging technology for electric vehicles was performed. Topic modeling using LDA technique was performed by collecting valid patents by deriving valid patents. It was classified into 20 topics, and the definition of technology was defined through extracted words for each topic. In order to analyze the trend of each topic, the trend of power semiconductor packaging technology was analyzed by deriving hot and cold topics by topic through regression analysis on frequency by year. The package structure technology according to the withstand voltage, the input/output-related control technology and the heat dissipation technology were derived as the hot topic technology, and the inductance reduction technology was derived as the cold topic technology.

A Prestigious University Students' Perceptions of their Educational Attainment by a Topic model (토픽모델을 활용한 명문대 재학생의 학벌에 관한 인식 분석)

  • Young Son Jung;Seung-Yun Lee
    • The Journal of the Convergence on Culture Technology
    • /
    • v.10 no.3
    • /
    • pp.503-512
    • /
    • 2024
  • This study examines the essays of academic background, written by students from a university, which is classified into prestigious universities in Korean society. By Latent Dirichlet Allocation, 172 essays were analyzed to explore the students' perspectives of the academic fractionalism. The analysis identified five topics such as, functional aspects (Topic 1), double-edged nature (Topic 2), power communities (Topic 3), symbols of victory (Topic 4), and dysfunctional aspects (Topic 5). The most frequently appearing keywords are 'individual,' 'status,' and 'means' in Topic 1, 'definition,' 'school,' and 'meaning' in Topic 2, 'people,' 'origin,' and 'power' in Topic 3, 'university,' 'ability,' and 'effort' in Topic 4, and 'academic achievement,' 'South Korea,' and 'origin' in Topic 5. By exploring the topics, we found that students regarded class reproduction by education as important social issues and they showed little interest in other factors influencing academic fractionalism, such as race or ethnicity. these findings suggest that professars, who teach the impact of education on academic fractionalism, deal with the influence of diverse factors on academic fractionalism.

Identifying Topic-Specific Experts on Microblog

  • Yu, Yan;Mo, Lingfei;Wang, Jian
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.6
    • /
    • pp.2627-2647
    • /
    • 2016
  • With the rapid growth of microblog, expert identification on microblog has been playing a crucial role in many applications. While most previous expert identification studies only assess global authoritativeness of a user, there is no way to differentiate the authoritativeness in a particular aspect of topics. In this paper, we propose a novel model, which jointly models text and following relationship in the same generative process. Furthermore, we integrate a similarity-based weight scheme into the model to address the popular bias problem, and use followee topic distribution as prior information to make user's topic distribution more precisely. Our empirical study on two large real-world datasets shows that our proposed model produces significantly higher quality results than the prior arts.

Identifying Critical Factors for Successful Games by Applying Topic Modeling

  • Kwak, Mookyung;Park, Ji Su;Shon, Jin Gon
    • Journal of Information Processing Systems
    • /
    • v.18 no.1
    • /
    • pp.130-145
    • /
    • 2022
  • Games are widely used in many fields, but not all games are successful. Then what makes games successful? The question gave us the motivation of this paper, which is to identify critical factors for successful games with topic modeling technique. It is supposed that game reviews written by experts sit on abundant insights and topics of how games succeed. To excavate these insights and topics, latent Dirichlet allocation, a topic modeling analysis technique, was used. This statistical approach provided words that implicate topics behind them. Fifty topics were inferred based on these words, and these topics were categorized by stimulation-response-desiregoal (SRDG) model, which makes a streamlined flow of how players engage in video games. This approach can provide game designers with critical factors for successful games. Furthermore, from this research result, we are going to develop a model for immersive game experiences to explain why some games are more addictive than others and how successful gamification works.

PC-SAN: Pretraining-Based Contextual Self-Attention Model for Topic Essay Generation

  • Lin, Fuqiang;Ma, Xingkong;Chen, Yaofeng;Zhou, Jiajun;Liu, Bo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.8
    • /
    • pp.3168-3186
    • /
    • 2020
  • Automatic topic essay generation (TEG) is a controllable text generation task that aims to generate informative, diverse, and topic-consistent essays based on multiple topics. To make the generated essays of high quality, a reasonable method should consider both diversity and topic-consistency. Another essential issue is the intrinsic link of the topics, which contributes to making the essays closely surround the semantics of provided topics. However, it remains challenging for TEG to fill the semantic gap between source topic words and target output, and a more powerful model is needed to capture the semantics of given topics. To this end, we propose a pretraining-based contextual self-attention (PC-SAN) model that is built upon the seq2seq framework. For the encoder of our model, we employ a dynamic weight sum of layers from BERT to fully utilize the semantics of topics, which is of great help to fill the gap and improve the quality of the generated essays. In the decoding phase, we also transform the target-side contextual history information into the query layers to alleviate the lack of context in typical self-attention networks (SANs). Experimental results on large-scale paragraph-level Chinese corpora verify that our model is capable of generating diverse, topic-consistent text and essentially makes improvements as compare to strong baselines. Furthermore, extensive analysis validates the effectiveness of contextual embeddings from BERT and contextual history information in SANs.

Method of Extracting the Topic Sentence Considering Sentence Importance based on ELMo Embedding (ELMo 임베딩 기반 문장 중요도를 고려한 중심 문장 추출 방법)

  • Kim, Eun Hee;Lim, Myung Jin;Shin, Ju Hyun
    • Smart Media Journal
    • /
    • v.10 no.1
    • /
    • pp.39-46
    • /
    • 2021
  • This study is about a method of extracting a summary from a news article in consideration of the importance of each sentence constituting the article. We propose a method of calculating sentence importance by extracting the probabilities of topic sentence, similarity with article title and other sentences, and sentence position as characteristics that affect sentence importance. At this time, a hypothesis is established that the Topic Sentence will have a characteristic distinct from the general sentence, and a deep learning-based classification model is trained to obtain a topic sentence probability value for the input sentence. Also, using the pre-learned ELMo language model, the similarity between sentences is calculated based on the sentence vector value reflecting the context information and extracted as sentence characteristics. The topic sentence classification performance of the LSTM and BERT models was 93% accurate, 96.22% recall, and 89.5% precision, resulting in high analysis results. As a result of calculating the importance of each sentence by combining the extracted sentence characteristics, it was confirmed that the performance of extracting the topic sentence was improved by about 10% compared to the existing TextRank algorithm.