• 제목/요약/키워드: missing data estimation method

검색결과 87건 처리시간 0.024초

K-NN과 최대 우도 추정법을 결합한 소프트웨어 프로젝트 수치 데이터용 결측값 대치법 (A Missing Data Imputation by Combining K Nearest Neighbor with Maximum Likelihood Estimation for Numerical Software Project Data)

  • 이동호;윤경아;배두환
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • 제36권4호
    • /
    • pp.273-282
    • /
    • 2009
  • 소프트웨어 프로젝트 데이터를 이용한 각종 분석 예측 모델 생성시 직면하는 문제 중 하나는 데이터에 포함된 결측값이며 이에 대한 효과적인 방안은 결측값 대치 법이다. 대표적인 결측값 대치법인 K 최근접 이웃 대치법은 대치과정에서 결측값을 포함하는 인스턴스의 관측정보를 활용하지 못한다는 단점이 있다. 본 연구에서는 이러한 단점을 극복하기 위해 K 최근접 이웃 대치법과 최대 우도 추정법을 결합한 새로운 소프트웨어 프로젝트 수치 데이터용 결측값 대치법을 제안한다. 또한 결측값 대치법의 정확도를 비교하기 위한 새로운 측도를 함께 제안한다.

Estimation in the exponential distribution under progressive Type I interval censoring with semi-missing data

  • Shin, Hyejung;Lee, Kwangho
    • Journal of the Korean Data and Information Science Society
    • /
    • 제23권6호
    • /
    • pp.1271-1277
    • /
    • 2012
  • In this paper, we propose an estimation method of the parameter in an exponential distribution based on a progressive Type I interval censored sample with semi-missing observation. The maximum likelihood estimator (MLE) of the parameter in the exponential distribution cannot be obtained explicitly because the intervals are not equal in length under the progressive Type I interval censored sample with semi-missing data. To obtain the MLE of the parameter for the sampling scheme, we propose a method by which progressive Type I interval censored sample with semi-missing data is converted to the progressive Type II interval censored sample. Consequently, the estimation procedures in the progressive Type II interval censored sample can be applied and we obtain the MLE of the parameter and survival function. It will be shown that the obtained estimators have good performance in terms of the mean square error (MSE) and mean integrated square error (MISE).

EXTENSION OF FACTORING LIKELIHOOD APPROACH TO NON-MONOTONE MISSING DATA

  • Kim, Jae-Kwang
    • Journal of the Korean Statistical Society
    • /
    • 제33권4호
    • /
    • pp.401-410
    • /
    • 2004
  • We address the problem of parameter estimation in multivariate distributions under ignorable non-monotone missing data. The factoring likelihood method for monotone missing data, termed by Rubin (1974), is extended to a more general case of non-monotone missing data. The proposed method is algebraically equivalent to the Newton-Raphson method for the observed likelihood, but avoids the burden of computing the first and the second partial derivatives of the observed likelihood. Instead, the maximum likelihood estimates and their information matrices for each partition of the data set are computed separately and combined naturally using the generalized least squares method.

Variance estimation for distribution rate in stratified cluster sampling with missing values

  • Heo, Sunyeong
    • Journal of the Korean Data and Information Science Society
    • /
    • 제28권2호
    • /
    • pp.443-449
    • /
    • 2017
  • Estimation of population proportion like the distribution rate of LED TV and the prevalence of a disease are often estimated based on survey sample data. Population proportion is generally considered as a special form of population mean. In complex sampling like stratified multistage sampling with unequal probability sampling, the denominator of mean may be random variable and it is estimated like ratio estimator. In this research, we examined the estimation of distribution rate based on stratified multistage sampling, and determined some numerical outcomes using stratified random sample data with about 25% of missing observations. In the data used for this research, the survey weight was determined by deterministic way. So, the weights are not random variable, and the population distribution rate and its variance estimator can be estimated like population mean estimation. When the weights are not random variable, if one estimates the variance of proportion estimator using ratio method, then the variances may be inflated. Therefore, in estimating variance for population proportion, we need to examine the structure of data and survey design before making any decision for estimation methods.

Compressive sensing-based two-dimensional scattering-center extraction for incomplete RCS data

  • Bae, Ji-Hoon;Kim, Kyung-Tae
    • ETRI Journal
    • /
    • 제42권6호
    • /
    • pp.815-826
    • /
    • 2020
  • We propose a two-dimensional (2D) scattering-center-extraction (SCE) method using sparse recovery based on the compressive-sensing theory, even with data missing from the received radar cross-section (RCS) dataset. First, using the proposed method, we generate a 2D grid via adaptive discretization that has a considerably smaller size than a fully sampled fine grid. Subsequently, the coarse estimation of 2D scattering centers is performed using both the method of iteratively reweighted least square and a general peak-finding algorithm. Finally, the fine estimation of 2D scattering centers is performed using the orthogonal matching pursuit (OMP) procedure from an adaptively sampled Fourier dictionary. The measured RCS data, as well as simulation data using the point-scatterer model, are used to evaluate the 2D SCE accuracy of the proposed method. The results indicate that the proposed method can achieve higher SCE accuracy for an incomplete RCS dataset with missing data than that achieved by the conventional OMP, basis pursuit, smoothed L0, and existing discrete spectral estimation techniques.

Partitioning likelihood method in the analysis of non-monotone missing data

  • Kim Jae-Kwang
    • 한국통계학회:학술대회논문집
    • /
    • 한국통계학회 2004년도 학술발표논문집
    • /
    • pp.1-8
    • /
    • 2004
  • We address the problem of parameter estimation in multivariate distributions under ignorable non-monotone missing data. The factoring likelihood method for monotone missing data, termed by Robin (1974), is extended to a more general case of non-monotone missing data. The proposed method is algebraically equivalent to the Newton-Raphson method for the observed likelihood, but avoids the burden of computing the first and the second partial derivatives of the observed likelihood Instead, the maximum likelihood estimates and their information matrices for each partition of the data set are computed separately and combined naturally using the generalized least squares method. A numerical example is also presented to illustrate the method.

  • PDF

결측치가 존재하는 유전형 자료에서의 연관불균형과 일배체형을 사용한 결측치 대치 방법 (A New Method for Imputation of Missing Genotype using Linkage Disequilibrium and Haplotype Information)

  • 박윤주;김영진;박정선;김규찬;고인송;정호열
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • 제32권2호
    • /
    • pp.99-107
    • /
    • 2005
  • 본 논문에서는 단일염기변이(SNP: Single Nucleotide Polymorphism)와 같은 유전형(Rcnotype)자료에서 결측치가 발생하였을 경우 유전형 자료의 특이성을 고려해 자료 원래의 정보손실을 최소화하는 대치법인 연관불균형 기반의 대치법(linkage disequilibrium- based imputation)과 일배체형 기반의 대치법(haplotype-based imputation)을 제시한다. 이러한 결측치 대치는 실험상에서 발생하는 결측치에 의한 중요한 정보의 손실을 최소화 한다는 점에서 필요한 방법이다. 일반적으로 그동안 생물학 자료의 결측치 대치는 대부분 주형질 대치법(major allele imputation)이 활용되어왔는데 유전형 자료에서의 이 방법의 사용은 사료의 특이성으로 인하여 결측치에 대한 높은 오차율(error rate)을 보임으로서 자료의 신뢰성을 떨어뜨릴 수 있다. 본 논문에서는 유전형 자료인 단일염기변이 자료의 시뮬레이션을 통하여 기존의 주형질 대치법과 논문에서 제안된 연관불균형 기반의 대치법과 일배체형 기반의 대치법을 비교하고 그 결과를 보여 준다.

공간 데이터와 시계열 데이터로부터 유도된 공분산행렬을 결합한 강수량 결측값 추정 모형 (Development of a Model Combining Covariance Matrices Derived from Spatial and Temporal Data to Estimate Missing Rainfall Data)

  • 성찬용
    • 한국환경과학회지
    • /
    • 제22권3호
    • /
    • pp.303-308
    • /
    • 2013
  • This paper proposed a new method for estimating missing values in time series rainfall data. The proposed method integrated the two most widely used estimation methods, general linear model(GLM) and ordinary kriging(OK), by taking a weighted average of covariance matrices derived from each of the two methods. The proposed method was cross-validated using daily rainfall data at thirteen rain gauges in the Hyeong-san River basin. The goodness-of-fit of the proposed method was higher than those of GLM and OK, which can be attributed to the weighting algorithm that was designed to minimize errors caused by violations of assumptions of the two existing methods. This result suggests that the proposed method is more accurate in missing values in time series rainfall data, especially in a region where the assumptions of existing methods are not met, i.e., rainfall varies by season and topography is heterogeneous.

On statistical Computing via EM Algorithm in Logistic Linear Models Involving Non-ignorable Missing data

  • Jun, Yu-Na;Qian, Guoqi;Park, Jeong-Soo
    • 한국통계학회:학술대회논문집
    • /
    • 한국통계학회 2005년도 추계 학술발표회 논문집
    • /
    • pp.181-186
    • /
    • 2005
  • Many data sets obtained from surveys or medical trials often include missing observations. When these data sets are analyzed, it is general to use only complete cases. However, it is possible to have big biases or involve inefficiency. In this paper, we consider a method for estimating parameters in logistic linear models involving non-ignorable missing data mechanism. A binomial response and normal exploratory model for the missing data are used. We fit the model using the EM algorithm. The E-step is derived by Metropolis-hastings algorithm to generate a sample for missing data and Monte-carlo technique, and the M-step is by Newton-Raphson to maximize likelihood function. Asymptotic variances of the MLE's are derived and the standard error and estimates of parameters are compared.

  • PDF

경험적 베이지안 방법을 이용한 결측자료 연구 (Analysis of Missing Data Using an Empirical Bayesian Method)

  • 윤용화;최보승
    • 응용통계연구
    • /
    • 제27권6호
    • /
    • pp.1003-1016
    • /
    • 2014
  • 조사를 통하여 수집된 자료에 기반하여 분석을 수행하는데 있어서 결측값에 대한 적절한 대체 방법은 보다 정확한 결과를 얻기 위한 매우 중요한 절차이다. 본 연구에서는 모형에 기반하여 결측자료에 대한 대체방법과 모형 추정방법을 다루었다. 특히 최대우도추정 방법의 적용에서 발생할 수 있는 변방값 문제(bounday soluntion problem)를 해결하기 위하여 베이지안 방법을 적용하였다. 분석된 결과를 바탕으로 하여 예측을 수행한 후 결측체계에 따른 정확성 비교를 수행하여 결측체계에 따른 결측모형의 선택 문제를 다루었다. 예측의 정확도를 측정하기 위하여 Bautista 등 (2007)이 제안한 MWPE(modified within precinct error) 이용하여 비교를 수행 하였다. 본 연구에서 제시된 방법들은 2012년에 시행된 제 18대 대통령 선거 당일 시행된 출구조사의 자료를 적용하여 분석을 수행하였다. 분석 결과 임의결측체계의 가정에 따른 결과가 비임의체계 가정에 따른 결과보다 예측의 정확도가 더 높았다.