• Title/Summary/Keyword: Dirichlet distribution

Learning Probabilistic Kernel from Latent Dirichlet Allocation

  • Lv, Qi;Pang, Lin;Li, Xiong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.6 / pp.2527-2545 / 2016
  • Measuring the similarity of given samples is a key problem in recognition, clustering, retrieval, and related applications. A number of approaches, e.g., kernel methods and metric learning, have been proposed for this problem. The challenge of similarity learning is to find a similarity measure that is robust to intra-class variance and simultaneously selective to inter-class characteristics. We observe that the similarity measure can be improved if the data distribution and hidden semantic information are exploited in a more sophisticated way. In this paper, we propose a similarity learning approach for retrieval and recognition. The approach, termed LDA-FEK, derives a free energy kernel (FEK) from Latent Dirichlet Allocation (LDA). First, it trains LDA and constructs the kernel using the parameters and variables of the trained model. Then, the unknown kernel parameters are learned by a discriminative learning approach. The main contributions of the proposed method are twofold: (1) the method is computationally efficient and scalable, since the kernel parameters are determined in a staged way; (2) the method exploits the data distribution and semantic-level hidden information by means of LDA. To evaluate the performance of LDA-FEK, we apply it to image retrieval on two data sets and to text categorization on four popular data sets. The results show that our method performs competitively.

Topics and Sentiment Analysis Based on Reviews of Omni-Channel Retailing

  • KIM, Soon-Hong;YOO, Byong-Kook
    • Journal of Distribution Science / v.19 no.4 / pp.25-35 / 2021
  • Purpose: This study aims to analyze the factors affecting customer satisfaction in customer reviews of omni-channel shopping posted on Internet blogs, cafes, and YouTube, using text mining analysis. Research, data, and methodology: In this study, frequency analysis is performed and LDA (Latent Dirichlet Allocation) is used to analyze social big data on reviewers' reactions to the omni-channel shopping service recently opened by L Shopping Company. Additionally, based on the topic analysis, we conduct a sentiment analysis of purchase reviews and analyze the characteristics of each topic in terms of the positive or negative sentiments of omni-channel app users. Results: The topic analysis yields four main topics: delivery and events, economic value, recommendations and convenience, and product quality and brand awareness. The sentiment analysis reveals that reviewers give many positive evaluations of price policy and product promotion, but negative evaluations of app use, delivery, and product quality. Conclusions: Retailers can establish customized marketing strategies by identifying customers' major interests through text mining analysis. Additionally, sentiment analysis by subject becomes an important indicator for developing the products and services that customers want, because it identifies the areas that satisfy customers and the areas that evoke negative reactions.

Semiparametric Bayesian Hierarchical Selection Models with Skewed Elliptical Distribution (왜도 타원형 분포를 이용한 준모수적 계층적 선택 모형)

  • 정윤식;장정훈
    • The Korean Journal of Applied Statistics / v.16 no.1 / pp.101-115 / 2003
  • Lately there has been much theoretical and applied interest in linear models with non-normal, heavy-tailed error distributions. Starting with Zellner's (1976) study, many authors have explored the consequences of non-normality and heavy-tailed error distributions. We consider hierarchical models, including selection models, under a skewed heavy-tailed error distribution originally proposed by Chen, Dey and Shao (1999) and Branco and Dey (2001), with a Dirichlet process prior (Ferguson, 1973), for use in meta-analysis. A general class of skewed elliptical distributions is reviewed and developed. We also consider the detailed computational scheme under the skew normal and skew t distributions using the MCMC method. Finally, we introduce an example using real data from Johnson (1993) and apply our proposed methodology.

Stochastic Time Duration Model with Gamma-Dirichlet Distribution for Global and Local Duration of HMM (Gamma-Dirichlet 분포에 의한 HMM의 전역 및 지역 시간지속 모델)

  • Sin, Bong-Kee
    • Proceedings of the Korean Information Science Society Conference / 2008.06c / pp.517-521 / 2008
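  • We propose a new stochastic global+local time duration segment model (GL-STDM) that improves the state duration distribution, a known weakness of HMMs. Specifically, we propose a global pattern (shape or long-term dependency) that represents the global temporal information of a time-series signal and captures the correlation between the per-state duration models and the duration information of each state. Because the proposed model breaks the Markov assumption, however, it cannot retain the simplicity and efficiency for which dynamic programming is known. We nevertheless present a way to solve the problem effectively using Monte Carlo sampling, a technique that has recently gained attention. This paper describes the concept and definition of the proposed GL-STDM, together with its inference and model evaluation methods.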

Automatic TV Program Recommendation using LDA based Latent Topic Inference (LDA 기반 은닉 토픽 추론을 이용한 TV 프로그램 자동 추천)

  • Kim, Eun-Hui;Pyo, Shin-Jee;Kim, Mun-Churl
    • Journal of Broadcast Engineering / v.17 no.2 / pp.270-283 / 2012
  • With the advent of multi-channel TV, IPTV, and smart TV services, an excessive amount of TV program content has become available to users, which makes it very difficult for TV viewers to find and consume their preferred TV programs. Automatic TV recommendation is therefore an important issue for future intelligent TV services, since it improves users' access to their preferred TV content. In this paper, we present a recommendation model based on statistical machine learning that uses a collaborative filtering concept, taking into account both public and personal preferences for TV program content. To this end, users' preferences for TV programs are modeled as a latent topic variable using LDA (Latent Dirichlet Allocation), which has recently been applied in various application domains. To apply LDA appropriately to TV recommendation, TV viewers' topics of interest are regarded as latent topics in LDA, and an asymmetric Dirichlet distribution is applied to the LDA, which can reveal the diversity of TV viewers' interests in topics based on an analysis of real TV usage history data. The experimental results show that, on the TV usage history data of user groups with similar tastes, the proposed LDA-based TV recommendation method yields an average of 66.5% with the top 5 ranked TV programs in weekly recommendation and an average precision of 77.9% with the top 5 ranked TV programs in bimonthly recommendation.

Noise reduction algorithm for an image using nonparametric Bayesian method (비모수 베이지안 방법을 이용한 영상 잡음 제거 알고리즘)

  • Woo, Ho-young;Kim, Yeong-hwa
    • The Korean Journal of Applied Statistics / v.31 no.5 / pp.555-572 / 2018
  • Noise reduction, which reduces or eliminates noise (arising from a variety of causes) in a noise-contaminated image, is an important theme in image processing. Many studies are being conducted on noise removal because of the importance of distinguishing between noise added to a pure image and the unique characteristics of the original image. The adaptive filter and the sigma filter are typical filters used to reduce or eliminate noise; however, their effectiveness depends on accurate noise estimation. This study generates a distribution of the noise contaminating an image based on a Dirichlet normal mixture model and presents a Bayesian approach to distinguish the characteristics of an image from the noise. In particular, to distinguish the distribution of noise from the distribution of image characteristics, we suggest algorithms that carry out Bayesian inference and remove the noise included in an image.

Feature Expansion based on LDA Word Distribution for Performance Improvement of Informal Document Classification (비격식 문서 분류 성능 개선을 위한 LDA 단어 분포 기반의 자질 확장)

  • Lee, Hokyung;Yang, Seon;Ko, Youngjoong
    • Journal of KIISE / v.43 no.9 / pp.1008-1014 / 2016
  • Data such as Twitter posts, Facebook posts, and customer reviews belong to the informal document group, whereas newspapers, which go through a grammar correction step, belong to the formal document group. Finding consistent rules or patterns is more difficult in informal documents than in formal documents. Hence, additional approaches are needed to improve informal document analysis. In this study, we classified Twitter data, a representative type of informal document, into ten categories. To improve performance, we revised and expanded features based on the LDA (Latent Dirichlet Allocation) word distribution. Using the LDA top-ranked words, the other words were separated or bundled, and the feature set was thus expanded repeatedly. Finally, we conducted document classification with the expanded features. Experimental results indicated that the proposed method improved the micro-averaged F1-score by 7.11 percentage points compared to the results before the feature expansion step.

Development of Simulation Method of Doppler Power Spectrum and Raw Time Series Signal Using Average Moments of Radar Wind Profiler (윈드프로파일러의 평균모멘트 값을 이용한 도플러 파워 스펙트럼 및 시계열 원시신호 시뮬레이션기법 개발)

  • Lee, Sang-Yun;Lee, Gyu-Won
    • The Journal of the Korea institute of electronic communication sciences / v.15 no.6 / pp.1037-1044 / 2020
  • Since radar wind profilers (RWPs) provide wind field data with high temporal and spatial resolution in all weather conditions, verifying their accuracy and quality is essential. Simultaneous wind measurements from rawinsondes are commonly used to evaluate wind vectors from an RWP. In this study, a simulation algorithm that produces the spectrum and raw time series (I/Q) data from the average values of the moments is presented as a step-by-step verification method for the signal processing algorithm. The feasibility of the simulation algorithm was also confirmed through comparison with the raw data of the LAP-3000. The Doppler power spectrum was generated by assuming the density function of the skew-normal distribution and using the moment values as its parameters. The simulated spectrum was generated through random numbers. In addition, the coherently averaged I/Q data were generated by random phase and the inverse discrete Fourier transform, and the raw I/Q data were generated through the Dirichlet distribution.

A Bayes Reliability Estimation from Life Test in a Stress-Strength Model

  • Park, Sung-Sub;Kim, Jae-Joo
    • Journal of the Korean Statistical Society / v.12 no.1 / pp.1-9 / 1983
  • A stress-strength model is formulated for an s-out-of-k system of identical components. We consider the estimation of system reliability from survival count data from a Bayesian viewpoint. We assume a quadratic loss and a Dirichlet prior distribution. It is shown that a Bayes sequential procedure can be established. The Bayes estimator is compared with the UMVUE obtained by Bhattacharyya and with an estimator based on the Mann-Whitney statistic.

A Penalized Likelihood Method for Model Complexity

  • Ahn, Sung M.
    • Communications for Statistical Applications and Methods / v.8 no.1 / pp.173-184 / 2001
  • We present an algorithm for reducing the complexity of a general Gaussian mixture model by using a penalized likelihood method. One of our important assumptions is that we begin with a model that is overfitted in terms of the number of components. Our main goal is therefore to eliminate redundant components in the overfitted model. As shown in the simulation results, the algorithm works well with the selected densities.
