• Title/Summary/Keyword: dirichlet distribution

Search Result 75, Processing Time 0.024 seconds

A Bayes Sequential Selection of the Least Probale Event

  • Hwang, Hyung-Tae;Kim, Woo-Chul
    • Journal of the Korean Statistical Society
    • /
    • v.11 no.1
    • /
    • pp.25-35
    • /
    • 1982
  • A problem of selecting the least probable cell in a multinomial distribution is studied in a Bayesian framework. We consider two loss components the cost of sampling and the difference in cell probabilities between the selected and the least probable cells. A Bayes sequential selection rule is derived with respect to a Dirichlet prior, and it is compared with the best fixed sample size selection rule. The continuation sets with respect to the vague prior are tabulated for certain cases.

  • PDF

Bayesian Inference for Multinomial Group Testing

  • Heo, Tae-Young;Kim, Jong-Min
    • Communications for Statistical Applications and Methods
    • /
    • v.14 no.1
    • /
    • pp.81-92
    • /
    • 2007
  • This paper consider trinomial group testing concerned with classification of N given units into one of k disjoint categories. In this paper, we propose Bayesian inference for estimating individual category proportions using the trinomial group testing model proposed by Bar-Lev et al. (2005). We compared a relative efficience (RE) based on the mean squared error (MSE) of MLE and Bayes estimators with various prior information. The impact of different prior specifications on the estimates is also investigated using selected prior distribution. The impact of different priors on the Bayes estimates is modest when the sample size and group size we large.

Topic Modeling and Sentiment Analysis of Twitter Discussions on COVID-19 from Spatial and Temporal Perspectives

  • AlAgha, Iyad
    • Journal of Information Science Theory and Practice
    • /
    • v.9 no.1
    • /
    • pp.35-53
    • /
    • 2021
  • The study reported in this paper aimed to evaluate the topics and opinions of COVID-19 discussion found on Twitter. It performed topic modeling and sentiment analysis of tweets posted during the COVID-19 outbreak, and compared these results over space and time. In addition, by covering a more recent and a longer period of the pandemic timeline, several patterns not previously reported in the literature were revealed. Author-pooled Latent Dirichlet Allocation (LDA) was used to generate twenty topics that discuss different aspects related to the pandemic. Time-series analysis of the distribution of tweets over topics was performed to explore how the discussion on each topic changed over time, and the potential reasons behind the change. In addition, spatial analysis of topics was performed by comparing the percentage of tweets in each topic among top tweeting countries. Afterward, sentiment analysis of tweets was performed at both temporal and spatial levels. Our intention was to analyze how the sentiment differs between countries and in response to certain events. The performance of the topic model was assessed by being compared with other alternative topic modeling techniques. The topic coherence was measured for the different techniques while changing the number of topics. Results showed that the pooling by author before performing LDA significantly improved the produced topic models.

Estimating dose-response curves using splines: a nonparametric Bayesian knot selection method

  • Lee, Jiwon;Kim, Yongku;Kim, Young Min
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.3
    • /
    • pp.287-299
    • /
    • 2022
  • In radiation epidemiology, the excess relative risk (ERR) model is used to determine the dose-response relationship. In general, the dose-response relationship for the ERR model is assumed to be linear, linear-quadratic, linear-threshold, quadratic, and so on. However, since none of these functions dominate other functions for expressing the dose-response relationship, a Bayesian semiparametric method using splines has recently been proposed. Thus, we improve the Bayesian semiparametric method for the selection of the tuning parameters for splines as the number and location of knots using a Bayesian knot selection method. Equally spaced knots cannot capture the characteristic of radiation exposed dose distribution which is highly skewed in general. Therefore, we propose a nonparametric Bayesian knot selection method based on a Dirichlet process mixture model. Inference of the spline coefficients after obtaining the number and location of knots is performed in the Bayesian framework. We apply this approach to the life span study cohort data from the radiation effects research foundation in Japan, and the results illustrate that the proposed method provides competitive curve estimates for the dose-response curve and relatively stable credible intervals for the curve.

Classification and Allocation method of e-mail using possibility distribution and prediction (확률 분포와 추론에 의한 이메일 분류 및 정리 방법)

  • Go, Nam-Hyeon;Kim, Ji-Yun;Choi, Man-Kyu
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2016.07a
    • /
    • pp.95-96
    • /
    • 2016
  • 본 논문에서는 디리클레 분포와 베이즈 추론 모델을 활용하여 전자우편을 분류하고 정리하는 방법을 제안한다. 과거 원치 않는 광고성 이메일인 스팸 탐지에서 시작한 전자우편 분류는 지속적인 송수신 량의 증가와 내용의 다양화로 인해 광고성과 정보성의 판단 기준이 모호해진 상태이다. 스팸 탐지와 같은 이분법적 분류 방식이 아닌 내용의 주제 별로 자동 분류할 수 있는 방법이 필요하다. 본 논문에서 다루는 제안 기법은 전자우편의 내용에서 다뤄질 수 있는 주제의 종류를 예측하기 위한 방법을 제공한다. 발신하거나 수신된 전자우편이 속한 주제를 자동으로 정할 수 있다. 본 제안 기법의 활용을 통해 전자우편의 분류만이 아닌 업무 및 시장 동향 분석과 정보보안 분야에서는 악성코드 분류에 사용될 수 있을 것으로 기대된다.

  • PDF

Semantic Dependency Link Topic Model for Biomedical Acronym Disambiguation (의미적 의존 링크 토픽 모델을 이용한 생물학 약어 중의성 해소)

  • Kim, Seonho;Yoon, Juntae;Seo, Jungyun
    • Journal of KIISE
    • /
    • v.41 no.9
    • /
    • pp.652-665
    • /
    • 2014
  • Many important terminologies in biomedical text are expressed as abbreviations or acronyms. We newly suggest a semantic link topic model based on the concepts of topic and dependency link to disambiguate biomedical abbreviations and cluster long form variants of abbreviations which refer to the same senses. This model is a generative model inspired by the latent Dirichlet allocation (LDA) topic model, in which each document is viewed as a mixture of topics, with each topic characterized by a distribution over words. Thus, words of a document are generated from a hidden topic structure of a document and the topic structure is inferred from observable word sequences of document collections. In this study, we allow two distinct word generation to incorporate semantic dependencies between words, particularly between expansions (long forms) of abbreviations and their sentential co-occurring words. Besides topic information, the semantic dependency between words is defined as a link and a new random parameter for the link presence is assigned to each word. As a result, the most probable expansions with respect to abbreviations of a given abstract are decided by word-topic distribution, document-topic distribution, and word-link distribution estimated from document collection though the semantic dependency link topic model. The abstracts retrieved from the MEDLINE Entrez interface by the query relating 22 abbreviations and their 186 expansions were used as a data set. The link topic model correctly predicted expansions of abbreviations with the accuracy of 98.30%.

Topic Extraction and Classification Method Based on Comment Sets

  • Tan, Xiaodong
    • Journal of Information Processing Systems
    • /
    • v.16 no.2
    • /
    • pp.329-342
    • /
    • 2020
  • In recent years, emotional text classification is one of the essential research contents in the field of natural language processing. It has been widely used in the sentiment analysis of commodities like hotels, and other commentary corpus. This paper proposes an improved W-LDA (weighted latent Dirichlet allocation) topic model to improve the shortcomings of traditional LDA topic models. In the process of the topic of word sampling and its word distribution expectation calculation of the Gibbs of the W-LDA topic model. An average weighted value is adopted to avoid topic-related words from being submerged by high-frequency words, to improve the distinction of the topic. It further integrates the highest classification of the algorithm of support vector machine based on the extracted high-quality document-topic distribution and topic-word vectors. Finally, an efficient integration method is constructed for the analysis and extraction of emotional words, topic distribution calculations, and sentiment classification. Through tests on real teaching evaluation data and test set of public comment set, the results show that the method proposed in the paper has distinct advantages compared with other two typical algorithms in terms of subject differentiation, classification precision, and F1-measure.

Online Shopping Research Trend Analysis Using BERTopic and LDA

  • Yoon-Hwang, JU;Woo-Ryeong, YANG;Hoe-Chang, YANG
    • The Journal of Economics, Marketing and Management
    • /
    • v.11 no.1
    • /
    • pp.21-30
    • /
    • 2023
  • Purpose: As one of the ongoing studies on the distribution industry, the purpose of this study is to identify the research trends on online shopping so far to propose not only the development of online shopping companies but also the possibility of coexistence between online and offline retailers and the development of the distribution industry. Research design, data and methodology: In this study, the English abstracts of 645 papers on online shopping registered in scienceON were obtained. For the analysis through BERTopic and LDA using Python 3.7 and identifying which topics were interesting to researchers. Results: As a result of word frequency analysis and co-occurrence analysis, it was found that studies related to online shopping were frequently conducted on factors such as products, services, and shopping malls. As a result of BERTopic, five topics such as 'service quality' and 'sales strategy' were derived, and as a result of LDA, three topics including 'purchase experience' were derived. It was confirmed that 'Customer Recommendation' and 'Fashion Mall' showed relatively high interest, and 'Sales Strategy' showed relatively low interest. Conclusions: It was suggested that more diverse studies related to the online shopping mall platform, sales content, and usage influencing factors are needed to develop the online shopping industry.

Analytical and Numerical Model Study to Predict the Temperature Distribution Around an Underground Food Cold Storage Pilot Cavern (냉동저장 공동 주변의 온도분포 예측을 위한 해석해 및 수치모델 적용에 관한 연구)

  • 이대혁;김호영
    • Tunnel and Underground Space
    • /
    • v.12 no.3
    • /
    • pp.142-151
    • /
    • 2002
  • Claesson(2001)'s analytical solution, and two numerical models with Dirichlet and Neuman interior boundary condition respectively were investigated to estimate the transient temperature distribution with distances from the Taejon underground food cold storage pilot cavern. Claesson's solution, which is based on constant temperature boundary condition at the rock wall during a temperature decline step, showed relatively good agreement with temperature measurements in the rock mass in order of average error difference, 0.89$\^{C}$ without any adjustments on laboratory thermal properties to represent the rock mass. For the numerical model with heat flux through the rock wall, a boundary condition setting technique was newly proposed to overcome the difficulty of prescribing variable convective heat tranfer coefficient and far-field air temperature inside the cavern as they may be certainly changed according to the cooling-down time. The results showed also good agreement with measurements in order of average error difference, 1.58$\^{C}$, and were compared to those of the numerical model with fixed temperature at the rock wall. Finally, the most proper procedure to precisely predict the temperature profile around a cavern was proposed as a series of analysis steps including an analytical exact solution and numerical models.

Detection of Abnormal Behavior by Scene Analysis in Surveillance Video (감시 영상에서의 장면 분석을 통한 이상행위 검출)

  • Bae, Gun-Tae;Uh, Young-Jung;Kwak, Soo-Yeong;Byun, Hye-Ran
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.12C
    • /
    • pp.744-752
    • /
    • 2011
  • In intelligent surveillance system, various methods for detecting abnormal behavior were proposed recently. However, most researches are not robust enough to be utilized for actual reality which often has occlusions because of assumption the researches have that individual objects can be tracked. This paper presents a novel method to detect abnormal behavior by analysing major motion of the scene for complex environment in which object tracking cannot work. First, we generate Visual Word and Visual Document from motion information extracted from input video and process them through LDA(Latent Dirichlet Allocation) algorithm which is one of document analysis technique to obtain major motion information(location, magnitude, direction, distribution) of the scene. Using acquired information, we compare similarity between motion appeared in input video and analysed major motion in order to detect motions which does not match to major motions as abnormal behavior.