• Title/Summary/Keyword: EM algorithm (EM 알고리즘)

Introduction to numba library in Python for efficient statistical computing (효율적인 통계 계산을 위한 파이썬 numba 라이브러리의 소개)

  • Cho, Younsang;Yu, Donghyeon;Son, Won;Park, Seoncheol
    • The Korean Journal of Applied Statistics / v.33 no.6 / pp.665-682 / 2020
  • This paper introduces the numba library for Python, which improves the computational efficiency of code written in plain Python by applying just-in-time (JIT) compilation. To apply JIT compilation, numba only requires a decorator on the target Python function. We provide implementation examples with numba for the permutation test and for parameter estimation of a Gaussian mixture distribution. We also show numerically the efficiency of numba by comparing, for each application, the total computation time of the plain-Python implementation with that of the numba implementation.
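
The decorator-based usage described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `perm_test_pvalue`, the toy data, and the fallback that runs plain Python when numba is not installed are all assumptions made here.

```python
import numpy as np

try:
    from numba import njit  # JIT-compiles the decorated function on first call
except ImportError:
    def njit(func):
        # Fallback (an assumption for portability): run as plain Python
        return func


@njit
def perm_test_pvalue(x, y, n_perm, seed):
    """Two-sided permutation test for a difference in means."""
    np.random.seed(seed)
    n = x.shape[0]
    pooled = np.concatenate((x, y))
    observed = abs(x.mean() - y.mean())
    count = 0
    for _ in range(n_perm):
        # Shuffle the pooled sample and split it into two pseudo-groups
        perm = np.random.permutation(pooled)
        diff = abs(perm[:n].mean() - perm[n:].mean())
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return (count + 1) / (n_perm + 1)


# Hypothetical, clearly separated samples
x = np.arange(10, dtype=np.float64)
y = np.arange(10, dtype=np.float64) + 100.0
p_value = perm_test_pvalue(x, y, 500, 0)
```

The explicit loop is the kind of code that benefits from JIT compilation. The first call includes compilation time, so timing comparisons are usually made on a second call.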

A comparison study for accuracy of exit poll based on nonresponse model (무응답모형에 기반한 출구조사의 예측 정확성 비교 연구)

  • Kwak, Jeongae;Choi, Boseung
    • Journal of the Korean Data and Information Science Society / v.25 no.1 / pp.53-64 / 2014
  • One of the major problems in election forecasting, especially survey-based forecasting, is nonresponse. Forecasting results may differ depending on the imputation method. Handling nonresponse is even more important in a survey on a sensitive subject, such as a presidential election. In this research, we consider a model-based method of nonresponse imputation. A model-based imputation method must be built on an assumption about the nonresponse mechanism and may produce different results depending on that mechanism. The assumed nonresponse mechanism is therefore a very important precondition for accurate forecasting. However, there is no exact way to verify the assumption. In this paper, we compare the accuracy of prediction across assumptions about the nonresponse mechanism, based on the results of a presidential election exit poll. We use maximum likelihood estimation based on the EM algorithm to fit the nonresponse model under each assumption. We also use the modified within precinct error proposed by Bautista (2007) to compare the predicted results.

A joint modeling of longitudinal zero-inflated count data and time to event data (경시적 영과잉 가산자료와 생존자료의 결합모형)

  • Kim, Donguk;Chun, Jihun
    • The Korean Journal of Applied Statistics / v.29 no.7 / pp.1459-1473 / 2016
  • In longitudinal studies, which observe subjects over time, longitudinal data and survival data are often collected simultaneously. In this case, if the missingness in the longitudinal data is non-ignorable because it is correlated with the survival data, an analysis of the longitudinal data alone, which ignores the relation between the two, gives biased estimates of the effects of the independent variables. Joint models of longitudinal and survival data have been studied as a solution to this problem; they obtain unbiased results by including a survival model for the cause of missingness. In this paper, we study a joint model of longitudinal zero-inflated count data and survival data, replacing the longitudinal part with a model for zero-inflated counts. A hurdle model is used for the longitudinal zero-inflated count data and a proportional hazards model for the survival data; the two sub-models are linked by assuming that their random effects follow a multivariate normal distribution. We used the EM algorithm to obtain the maximum likelihood estimates of the parameters, and their standard errors were estimated using the profile likelihood method. In simulation, the joint model performed better than the separate model in terms of bias and coverage probability.

A New Face Tracking and Recognition Method Adapted to the Environment (환경에 적응적인 얼굴 추적 및 인식 방법)

  • Ju, Myung-Ho;Kang, Hang-Bong
    • The KIPS Transactions:PartB / v.16B no.5 / pp.385-394 / 2009
  • Face tracking and recognition are difficult problems because the face is a non-rigid object. The main causes of tracking and recognition failures are changes in face pose and in environmental illumination. To solve these problems, we propose a nonlinear manifold framework for face pose and face illumination normalization. Specifically, to track and recognize a face in video with various pose variations, we approximate each face pose density by a single Gaussian density using PCA (Principal Component Analysis) on images sampled from training video sequences, and then construct a GMM (Gaussian Mixture Model) for each person. To solve the illumination problem, we decompose the face images into reflectance and illuminance using the SSR (Single Scale Retinex) model. To obtain the normalized reflectance, the reflectance is rescaled by histogram equalization over a defined range. We approximate the illuminance by the trained manifold, since the illuminance carries almost all of the variation caused by illumination. By combining these two features in our manifold framework, we obtained efficient face tracking and recognition results on indoor and outdoor video. To improve the video-based tracking results, we update the weight of each face pose density at each frame from the tracking result of the previous frame using the EM algorithm. Our experimental results show that our method is more efficient than other methods.

The Analysis of the Number of Donations Based on a Mixture of Poisson Regression Model (포아송 분포의 혼합모형을 이용한 기부 횟수 자료 분석)

  • Kim In-Young;Park Su-Bum;Kim Byung-Soo;Park Tae-Kyu
    • The Korean Journal of Applied Statistics / v.19 no.1 / pp.1-12 / 2006
  • The aim of this study is to analyse survey data on the number of charitable donations using a mixture of two Poisson regression models. The survey was conducted in 2002 by Volunteer 21, a nonprofit organization, on Koreans older than 20. The mixture of two Poisson distributions is used to model the number of donations based on the empirical distribution of the data. The mixture of two Poisson distributions implies that the whole population is subdivided into two groups, one with a smaller number of donations and the other with a larger number of donations. We fit the mixture of Poisson regression models to the number of donations to identify significant covariates. The expectation-maximization algorithm is employed to estimate the parameters. We computed 95% bootstrap confidence intervals based on the bias-corrected and accelerated method and used them to select significant explanatory variables. As a result, the income variable with four categories and the volunteering variable (1: experience of volunteering, 0: otherwise) turned out to be significant, with positive regression coefficients in both the smaller and the larger donation groups. However, the regression coefficients in the smaller donation group were larger than those in the larger donation group.
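
As a rough illustration of how EM separates two Poisson groups, the sketch below fits a two-component Poisson mixture without covariates. It is a simplification of the regression mixture in the abstract, and the function names, starting values, and counts are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of count k, computed on the log scale for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def em_two_poisson(counts, lam1, lam2, weight=0.5, n_iter=200):
    """EM for a mixture of two Poisson distributions (no covariates)."""
    n = len(counts)
    for _ in range(n_iter):
        # E-step: posterior probability that each count belongs to group 1
        resp = []
        for k in counts:
            a = weight * poisson_pmf(k, lam1)
            b = (1.0 - weight) * poisson_pmf(k, lam2)
            resp.append(a / (a + b))
        # M-step: update the mixing weight and the two Poisson means
        r_sum = sum(resp)
        weight = r_sum / n
        lam1 = sum(r * k for r, k in zip(resp, counts)) / r_sum
        lam2 = sum((1.0 - r) * k for r, k in zip(resp, counts)) / (n - r_sum)
    return weight, lam1, lam2


# Hypothetical donation counts: a low-frequency and a high-frequency group
counts = [0, 1, 0, 2, 1, 0, 1, 1, 9, 11, 10, 12, 8, 10]
weight, lam_low, lam_high = em_two_poisson(counts, lam1=1.0, lam2=5.0)
```

In a regression mixture, each Poisson mean would be replaced by `exp` of a linear predictor and the M-step by a weighted Poisson regression fit for each group.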

A Short-Term Traffic Information Prediction Model Using Bayesian Network (베이지안 네트워크를 이용한 단기 교통정보 예측모델)

  • Yu, Young-Jung;Cho, Mi-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.4 / pp.765-773 / 2009
  • Telematics traffic information services have diversified now that real-time traffic information can be collected through Intelligent Transport Systems. In this paper, we propose and implement a short-term traffic information prediction model intended to provide high-quality traffic information for the near future. A short-term prediction model forecasts the traffic flow of each road segment in the near future. Our prediction model gives the average speed on each segment from 5 minutes to 60 minutes ahead. We designed a Bayesian network for each segment, with causal nodes that affect the future road situation, and estimated its joint probability density function, assumed to be a GMM (Gaussian Mixture Model), using the EM (Expectation Maximization) algorithm on real-time traffic training data. To validate the precision of our prediction model, we conducted various experiments with real-time traffic data and computed the RMSE (Root Mean Square Error) between the actual speed and the predicted speed. As a result, our model gave average RMSE values of 4.5, 4.8, and 5.2 for predictions 10, 30, and 60 minutes ahead, respectively.
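
The RMSE criterion used for validation in this abstract is the square root of the mean squared difference between actual and predicted speeds. A small sketch follows, with hypothetical speed values that are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical segment speeds: observed vs. predicted
observed = [60.0, 55.0, 40.0, 35.0]
forecast = [57.0, 59.0, 40.0, 35.0]
error = rmse(observed, forecast)  # sqrt((9 + 16 + 0 + 0) / 4) = 2.5
```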

An estimation method for non-response model using Monte-Carlo expectation-maximization algorithm (Monte-Carlo expectation-maximaization 방법을 이용한 무응답 모형 추정방법)

  • Choi, Boseung;You, Hyeon Sang;Yoon, Yong Hwa
    • Journal of the Korean Data and Information Science Society / v.27 no.3 / pp.587-598 / 2016
  • Non-response is one of the major issues in predicting the outcome of an election ahead of time. A variety of non-response imputation methods may be employed to address it, but forecasting results tend to vary with the method. In this study, in order to improve electoral forecasts, we studied a model-based method of non-response imputation that applies the Monte Carlo Expectation Maximization (MCEM) algorithm introduced by Wei and Tanner (1990). The MCEM algorithm using maximum likelihood estimates (MLEs) is applied to solve the boundary solution problem under the non-ignorable non-response mechanism. We performed simulation studies to compare the estimation performance of MCEM, maximum likelihood estimation, and Bayesian estimation. The results of the simulation studies showed that the MCEM method can be a reasonable candidate for non-response model estimation. We also applied the MCEM method to the Korean presidential election exit poll data of 2012 and investigated prediction performance using the modified within precinct error (MWPE) criterion (Bautista et al., 2007).

A Study on a Model Parameter Compensation Method for Noise-Robust Speech Recognition (잡음환경에서의 음성인식을 위한 모델 파라미터 변환 방식에 관한 연구)

  • Chang, Yuk-Hyeun;Chung, Yong-Joo;Park, Sung-Hyun;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.16 no.5 / pp.112-121 / 1997
  • In this paper, we study a model parameter compensation method for noise-robust speech recognition. Compensation is performed on a sentence-by-sentence basis, and no other information is used. Parallel model combination (PMC), a well-known model parameter compensation algorithm, is implemented and used as a reference for performance comparison. We also propose a modified PMC method that tunes the model parameters with an association factor controlling the average variability of the Gaussian mixtures and the variability of a single Gaussian mixture per state, for more robust modeling. We obtain a re-estimation solution for the environmental variables based on the expectation-maximization (EM) algorithm in the cepstral domain. To evaluate the performance of the model compensation methods, we perform experiments on speaker-independent isolated word recognition. The noise sources used are white Gaussian noise and driving car noise. To obtain corrupted speech, we added noise to clean speech at various signal-to-noise ratios (SNR). We use a noise mean and variance estimated from 3 frames of noise data. In the experiments, the VTS approach is superior to the other methods. The zero-order VTS approach is similar to the modified PMC method in that it adapts the mean vector only, but its recognition rate is higher than those of the PMC and modified PMC methods based on the log-normal approximation.

Surficial Sediment Classification using Backscattered Amplitude Imagery of Multibeam Echo Sounder(300 kHz) (다중빔 음향 탐사시스템(300 kHz)의 후방산란 자료를 이용한 해저면 퇴적상 분류에 관한 연구)

  • Park, Yo-Sup;Lee, Sin-Je;Seo, Won-Jin;Gong, Gee-Soo;Han, Hyuk-Soo;Park, Soo-Chul
    • Economic and Environmental Geology / v.41 no.6 / pp.747-761 / 2008
  • To test acoustic remote classification of seabed sediment, we acquired ground-truth data (i.e. video and grab samples, etc.) and developed a post-processing procedure for automatic classification based on 300 kHz MultiBeam Echo Sounder (MBES) backscattering data, which were acquired using a Kongsberg Simrad EM3000 at Sock-Cho Port, East Sea of South Korea. The sonar signal and its classification performance were checked against geo-referenced video imagery with the aid of GIS (Geographic Information System). The depth of the research site ranged from 5 m to 22.7 m, and the backscattering amplitude ranged from -36 dB to -15 dB. The mean grain size of sediment from equally spaced sampling sites (50 m interval) varied from $2.86\phi$ to $0.88\phi$. To identify the main features for seabed classification from the MBES backscattering amplitude, we evaluated the correlation between the backscattering amplitude and the properties of the sediment samples. The performance of the proposed seabed remote classification was evaluated by comparing human expert segmentation with the results of the automatic algorithm. The cross-model perception error ratio of the automatic classification algorithm is 8.95% at rocky bottoms and 2.06% in the area representing low mean grain size.

Analysis Method for Full-length LiDAR Waveforms (라이다 파장 분석 방법론에 대한 연구)

  • Jung, Myung-Hee;Yun, Eui-Jung;Kim, Cheon-Shik
    • Journal of the Institute of Electronics Engineers of Korea CI / v.44 no.4 s.316 / pp.28-35 / 2007
  • Airborne laser altimeters have been used for high-resolution, accurate 3D topographic mapping of the Earth, the Moon, and planets. Laser altimetry is a rapidly growing remote sensing technique that measures the round-trip time of an emitted laser pulse to determine topography. The travel time from the laser scanner to the Earth's surface and back is directly related to the distance from the sensor to the ground. When there are several objects within the travel path of the laser pulse, the reflected laser pulses are distorted by surface variation within the footprint, generating multiple echoes because each target transforms the emitted pulse. The shapes of the received waveforms also contain important information about surface roughness, slope, and reflectivity. Waveform processing algorithms parameterize and model the return signal resulting from the interaction of the transmitted laser pulse with the surface, so that each of the multiple targets within the footprint can be identified. Assuming each response is Gaussian, the returns are modeled as a Gaussian mixture distribution, and the parameters of the model are estimated by the LMS method or the EM algorithm. However, each response is actually skewed to the right, with a slowly decaying tail. For applications that require more accurate analysis, the tail information must be quantified by decomposing the tail. One method for handling this problem is proposed in this study.
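
The Gaussian-mixture step described in this abstract can be sketched with an amplitude-weighted EM fit, in which each time bin is weighted by its received amplitude. This is an illustrative sketch on a synthetic two-echo waveform. It does not reproduce the paper's tail-decomposition method, and every name and parameter value in it is hypothetical.

```python
import math


def gauss(t, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_waveform(times, amps, mus, sigmas, n_iter=100):
    """Fit a Gaussian mixture to a sampled waveform; amplitudes act as weights."""
    mus, sigmas = list(mus), list(sigmas)
    k = len(mus)
    weights = [1.0 / k] * k
    total = sum(amps)
    for _ in range(n_iter):
        # E-step: share of each time bin attributed to each echo
        resp = []
        for t in times:
            dens = [weights[j] * gauss(t, mus[j], sigmas[j]) for j in range(k)]
            s = sum(dens)
            resp.append([d / s if s > 0.0 else 1.0 / k for d in dens])
        # M-step: amplitude-weighted updates of mean, spread and weight
        for j in range(k):
            mass = sum(a * r[j] for a, r in zip(amps, resp))
            mus[j] = sum(a * r[j] * t for a, r, t in zip(amps, resp, times)) / mass
            var = sum(a * r[j] * (t - mus[j]) ** 2
                      for a, r, t in zip(amps, resp, times)) / mass
            sigmas[j] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
            weights[j] = mass / total
    return weights, mus, sigmas


# Synthetic waveform with two echoes (hypothetical parameters)
times = list(range(80))
amps = [0.6 * gauss(t, 20.0, 3.0) + 0.4 * gauss(t, 50.0, 4.0) for t in times]
weights, mus, sigmas = em_waveform(times, amps, mus=[15.0, 55.0], sigmas=[5.0, 5.0])
```

Because each component here is symmetric, a fit of this kind cannot capture the right-skewed tail noted in the abstract, which is the limitation the study addresses.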