Principal Components Logistic Regression based on Robust Estimation (로버스트추정에 바탕을 둔 주성분로지스틱회귀)

  • Kim, Bu-Yong;Kahng, Myung-Wook;Jang, Hea-Won
    • The Korean Journal of Applied Statistics / v.22 no.3 / pp.531-539 / 2009
  • Logistic regression is widely used as a data mining technique for customer relationship management. The maximum likelihood estimator has highly inflated variance when multicollinearity exists among the regressors, and it is not robust against outliers. We therefore propose robust principal components logistic regression to deal with both the multicollinearity and outlier problems. A procedure based on the condition index is suggested for the selection of principal components: when a condition index is larger than the cutoff value obtained from a model constructed on the basis of conjoint analysis, the corresponding principal component is removed from the logistic model. In addition, we employ a robust estimation algorithm that dampens the effect of outliers by applying appropriate weights and factors to the leverage points and vertical outliers identified by the V-mask type criterion. The Monte Carlo simulation results indicate that the proposed procedure yields a higher rate of correct classification than the existing method.

A Parameter Estimation Method using Nonlinear Least Squares (비선형 최소제곱법을 이용한 모수추정 방법론)

  • Oh, Suna;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.26 no.3 / pp.431-440 / 2013
  • We consider the problem of estimating the parameters of heavy-tailed distributions. In general, maximum likelihood estimation (MLE) is the preferred method of parameter estimation because it has good properties such as asymptotic consistency, normality, and efficiency. However, MLE is not always the best solution because it is unstable or does not exist in some cases. This paper proposes another parameter estimation method, nonlinear least squares (NLS), and compares its performance to MLE. The NLS estimator is obtained by minimizing the sum of squared differences between the empirical cumulative distribution function (CDF) and a theoretical distribution function. We compare the NLS method to MLE using simulated data from heavy-tailed distributions. The NLS method is shown to perform better than MLE for the Burr distribution when the sample size is small; in addition, it performs well for the Fréchet distribution.
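
As an illustration of the NLS idea described in this abstract, the following minimal sketch fits a Fréchet distribution by minimizing the squared gap between the empirical CDF and the theoretical CDF. It is not the authors' implementation: the function names, the plotting position `i / (n + 1)`, the search bounds, and the coarse-to-fine grid search (used here in place of a proper numerical optimizer) are all assumptions made for the example.

```python
import math
import random


def frechet_cdf(x, shape, scale):
    """Frechet CDF exp(-(x/scale)^(-shape)) for x > 0."""
    z = -shape * math.log(x / scale)
    # Guard against overflow in exp(z); the CDF is effectively 0 there.
    return math.exp(-math.exp(z)) if z < 700 else 0.0


def nls_loss(shape, scale, xs_sorted):
    """Sum of squared gaps between the empirical CDF and the Frechet CDF."""
    n = len(xs_sorted)
    return sum(
        (i / (n + 1) - frechet_cdf(x, shape, scale)) ** 2  # i/(n+1): assumed plotting position
        for i, x in enumerate(xs_sorted, start=1)
    )


def fit_frechet_nls(xs, lower=0.1, upper=10.0, steps=30, rounds=5):
    """Coarse-to-fine grid search over (shape, scale); a stand-in for a real optimizer."""
    xs_sorted = sorted(xs)
    a_lo, a_hi, s_lo, s_hi = lower, upper, lower, upper
    best = (float("inf"), None, None)
    for _ in range(rounds):
        a_step = (a_hi - a_lo) / steps
        s_step = (s_hi - s_lo) / steps
        for i in range(steps + 1):
            a = a_lo + i * a_step
            for j in range(steps + 1):
                s = s_lo + j * s_step
                loss = nls_loss(a, s, xs_sorted)
                if loss < best[0]:
                    best = (loss, a, s)
        # Shrink the search box to two grid cells around the current best point.
        _, a_best, s_best = best
        a_lo, a_hi = max(1e-3, a_best - 2 * a_step), a_best + 2 * a_step
        s_lo, s_hi = max(1e-3, s_best - 2 * s_step), s_best + 2 * s_step
    return best[1], best[2]


def simulate_frechet(n, shape, scale, seed=0):
    """Draw Frechet variates by inverting the CDF."""
    rng = random.Random(seed)
    return [
        scale * (-math.log(max(rng.random(), 1e-300))) ** (-1.0 / shape)
        for _ in range(n)
    ]
```

Calling `fit_frechet_nls(simulate_frechet(200, shape=2.0, scale=1.5))` returns a `(shape, scale)` pair that can be compared with the values used for simulation.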

A Study for the Drivers of Movie Box-office Performance (영화흥행 영향요인 선택에 관한 연구)

  • Kim, Yon Hyong;Hong, Jeong Han
    • The Korean Journal of Applied Statistics / v.26 no.3 / pp.441-452 / 2013
  • This study analyzed the relationship between key film success factors and box-office records, based on movies released in Korea in the first quarter of 2013. An over-fitting problem can arise if too many explanatory variables are included in a regression model; in addition, there is a risk that the estimator is unstable when there is multicollinearity among the explanatory variables. For this reason, selecting the variables with high explanatory power for box-office performance is important. Among the numerous variable selection methods, LASSO estimation applied to a generalized linear model has the smallest prediction error and can efficiently and quickly identify, in order, the variables with the highest explanatory power for box-office performance.

Bayesian estimation for frequency using resampling methods (재표본 방법론을 활용한 베이지안 주파수 추정)

  • Pak, Ro Jin
    • The Korean Journal of Applied Statistics / v.30 no.6 / pp.877-888 / 2017
  • Spectral analysis is used to determine the frequency of time series data. We first determine the frequency of the series through the power spectrum or the periodogram and then calculate the period of any cycle that may exist in the time series. Bayesian techniques for estimating the frequency have been developed and proven useful; however, the Bayesian estimator of the frequency cannot be obtained analytically and must be handled numerically or computationally. In this paper, we make inferences on the Bayesian frequency estimate both by resampling the parameter with Markov chain Monte Carlo (MCMC) methods and by resampling the time series data with bootstrap methods. We take the Korean real estate price index as an example of Bayesian frequency estimation. We found a difference in the periods between the sale price index and the long-term rental price index, but the difference is not statistically significant.
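
The first step mentioned in this abstract, reading a frequency off the periodogram and converting it to a period, can be sketched as follows. This covers only the classical periodogram, not the Bayesian estimator or the resampling schemes of the paper; the function names and the direct O(n²) discrete Fourier transform are assumptions chosen to keep the example dependency-free.

```python
import cmath
import math


def periodogram(series):
    """Return (frequency, power) pairs at the Fourier frequencies k/n, k = 1..n//2."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the mean so frequency 0 does not dominate
    result = []
    for k in range(1, n // 2 + 1):
        coef = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        result.append((k / n, abs(coef) ** 2 / n))
    return result


def dominant_period(series):
    """Period (in observations per cycle) of the strongest periodogram ordinate."""
    freq, _power = max(periodogram(series), key=lambda pair: pair[1])
    return 1.0 / freq
```

For a monthly series with an annual cycle, `dominant_period` would be expected to return a value close to 12.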

Divide and conquer kernel quantile regression for massive dataset (대용량 자료의 분석을 위한 분할정복 커널 분위수 회귀모형)

  • Bang, Sungwan;Kim, Jaeoh
    • The Korean Journal of Applied Statistics / v.33 no.5 / pp.569-578 / 2020
  • By estimating conditional quantile functions of the response, quantile regression (QR) can provide comprehensive information on the relationship between the response and the predictors. In addition, kernel quantile regression (KQR) estimates a nonlinear conditional quantile function in reproducing kernel Hilbert spaces generated by a positive definite kernel function. However, it is infeasible to use KQR to analyze massive data because of the limitations of computer primary memory. We propose a divide-and-conquer-based KQR (DC-KQR) method to overcome this limitation. The proposed DC-KQR divides the entire data set into a few subsets, applies KQR to each subset, and derives a final estimator by aggregating the results from all subsets. Simulation studies are presented to demonstrate the satisfactory performance of the proposed method.
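
The divide, estimate, and aggregate pattern in this abstract can be sketched in a few lines. To stay short, the sketch swaps KQR for a plain unconditional sample quantile, and it aggregates by simple averaging; both choices, along with the function names, are assumptions and are not taken from the paper.

```python
def sample_quantile(values, tau):
    """Order-statistic estimate of the tau-th quantile (a stand-in for KQR)."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(tau * len(ordered)))
    return ordered[index]


def divide_and_conquer(data, n_subsets, estimator):
    """Split data into n_subsets interleaved chunks, estimate on each, and average."""
    chunks = [data[i::n_subsets] for i in range(n_subsets)]
    estimates = [estimator(chunk) for chunk in chunks]
    return sum(estimates) / len(estimates)


def dc_quantile(data, tau, n_subsets):
    """Divide-and-conquer quantile estimate using the sample quantile on each chunk."""
    return divide_and_conquer(data, n_subsets, lambda chunk: sample_quantile(chunk, tau))
```

Each chunk is processed independently, so only one subset needs to be held in memory at a time, which is the motivation stated in the abstract.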

The Influence of Software Engineering Levels on Defect Removal Efficiency (소프트웨어공학수준이 결함제거효율성에 미치는 영향)

  • Lee, Jong Moo;Kim, Seung Kwon;Park, Ho In
    • Journal of Korea Society of Digital Industry and Information Management / v.9 no.4 / pp.239-249 / 2013
  • The software process is becoming more important for producing good-quality software. One measure used to improve the software process is Defect Removal Efficiency (DRE). DRE measures the development team's ability to remove defects prior to release. It is calculated as the ratio of defects resolved to the total number of defects found. Software engineering levels are usually determined by the CMMI model, which is designed to help organizations improve their software product and service development, acquisition, and maintenance processes. The score of the software engineering level can be calculated by the CMMI model. The levels are composed of three groups (absent, average, and advanced). This study examines whether DRE differs among the three groups of software engineering levels. We propose one-way ANOVA to analyze the influence of software engineering levels on DRE. The bootstrap method is also used to estimate the sampling distribution because the data were not sampled randomly. The bootstrap is a statistical method for estimating the sampling distribution of an estimator by sampling with replacement from the original sample. The data were collected by survey from 106 software development projects. The results indicate that DRE differs among the groups. The higher a company's software engineering level, the better its DRE, which means that companies trying to improve their software process can improve their management performance.
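
The two calculations named in this abstract, the DRE ratio and the bootstrap, are simple enough to sketch directly. The function names below are assumptions, and the sketch bootstraps the sample mean only; it does not reproduce the study's ANOVA or its data.

```python
import random


def defect_removal_efficiency(resolved, found):
    """DRE: defects resolved divided by the total number of defects found."""
    return resolved / found


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Approximate the sampling distribution of the mean by resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return means


def percentile_interval(values, level=0.95):
    """Percentile interval taken from the sorted bootstrap replicates."""
    ordered = sorted(values)
    lower = int((1 - level) / 2 * len(ordered))
    upper = len(ordered) - 1 - lower
    return ordered[lower], ordered[upper]
```

Given a list of per-project DRE values for one group, `percentile_interval(bootstrap_means(values))` gives a rough interval for that group's mean DRE.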

Signal Enhancement of a Variable Rate Vocoder with a Hybrid domain SNR Estimator

  • Park, Hyung Woo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.962-977 / 2019
  • The human voice is a convenient means of information transfer between people, between people and machines, and between machines. With the development of information and communication technology, voice can be transmitted farther than before. To communicate, the voice is converted to another form, transmitted, and then converted back to sound. In such a communication process, a vocoder is a method of converting and reconverting voice and sound. The CELP (Code-Excited Linear Prediction) type vocoder, one of the voice codecs, has been adopted as a standard codec because it provides high-quality sound even though its transmission rate is relatively low. The EVRC (Enhanced Variable Rate CODEC) and QCELP (Qualcomm Code-Excited Linear Prediction), which are variable bit rate vocoders, are used for mobile phones in the 3G environment. In the real-time implementation of a vocoder, reduced sound quality is a typical problem. To improve the sound quality, it is important to know the size and shape of the noise. Existing sound quality improvement methods detect and use voice activity, or apply statistical methods to a large amount of data. However, these methods have the disadvantage that noise cannot be detected when the signal is continuous or when the noise changes greatly. This paper focuses on finding a better way to limit the loss of sound quality in low bit rate transmission environments. Based on simulation results, this study proposes a preprocessor that estimates the SNR (Signal-to-Noise Ratio) using the spectral SNR estimation method. The SNR estimation method adopted the IMBE (Improved Multi-Band Excitation) instead of using the SNR, which is a continuous speech signal. Finally, this preprocessor improves the quality of the vocoder by enhancing sound quality adaptively.

Bias adjusted estimation in a sample survey with linear response rate (응답률이 선형인 표본조사에서 편향 보정 추정)

  • Chung, Hee Young;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.32 no.4 / pp.631-642 / 2019
  • Many methods have been developed to solve problems found in sample surveys involving a large number of item non-responses, which cause inaccuracies in estimation. However, the non-response adjustment method used under the assumption of random non-response generates a bias when the response rate is affected by the variable of interest. Chung and Shin (2017) and Min and Shin (2018) proposed a method to improve the accuracy of estimation by appropriately adjusting the bias generated when the response rate is a function of the variable of interest. In this study, we consider the case where the response rate function is linear and the error of the superpopulation model follows a normal distribution. We also examined the effect of the number of stratum population on bias adjustment. The performance of the proposed estimator was examined through simulation studies and confirmed through actual data analysis.
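
The bias that motivates this abstract can be made concrete with a small simulation. The sketch below does not implement the proposed estimator; it only shows how the unadjusted respondent mean drifts when the response probability is a linear function `intercept + slope * y` of the variable of interest. The parameter names and the normal population are assumptions. For a population with mean μ and variance σ², the expected respondent mean works out to μ + slope·σ² / (intercept + slope·μ), provided the linear response probability stays within [0, 1].

```python
import random


def simulate_respondent_mean(n, mu, sigma, intercept, slope, seed=0):
    """Mean of y among respondents when P(respond | y) = intercept + slope * y."""
    rng = random.Random(seed)
    total = 0.0
    count = 0
    for _ in range(n):
        y = rng.gauss(mu, sigma)
        # Clip to a valid probability; with sensible parameters clipping is rare.
        p = min(1.0, max(0.0, intercept + slope * y))
        if rng.random() < p:
            total += y
            count += 1
    return total / count


def theoretical_respondent_mean(mu, sigma, intercept, slope):
    """E[y * p(y)] / E[p(y)] for a linear p(y), ignoring clipping."""
    return mu + slope * sigma ** 2 / (intercept + slope * mu)
```

With `mu=50`, `sigma=10`, `intercept=0.25`, and `slope=0.005`, the formula gives a respondent mean one unit above the population mean, and the simulated value should land close to it.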

A study on estimating rifle ammunition RSR based on truncated Weibull model (우측중도절단된 와이블 분포를 이용한 소총 탄약 소요보급률 추정 연구)

  • Park, Jaeshin;Bang, Sungwan
    • The Korean Journal of Applied Statistics / v.32 no.1 / pp.129-138 / 2019
  • Ammunition is an integral element of a weapon system and of the calculation of fighting strength. The Korea Army utilizes the basic load (B/L) concept to supply ammunition smoothly. The required supply rate (RSR) is the basis of a B/L that is estimated from real combat data, including a troop's mission and operational terrain. The current RSR, which is based on Korean War data and the sample mean, has some problems when applied to modern combat. Therefore, this study used Korea Combat Training Center (KCTC) data, which are similar to real combat data, to estimate the rifle ammunition RSR. We used a quantile of the truncated Weibull distribution to estimate the rifle ammunition RSR, considering that the rifle ammunition consumption data in KCTC are truncated. As a result, we obtained a rifle ammunition RSR that covers most ammunition consumption by reflecting the individual consumption of rifle ammunition.
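
A quantile of a truncated Weibull distribution has a closed form, sketched below. The sketch assumes right truncation at a known upper limit `upper` and known `shape` and `scale` parameters; the abstract does not state the exact truncation scheme or how the parameters were estimated, so those details and the function names are assumptions. For a Weibull CDF F truncated to [0, upper], the truncated CDF is F(x) / F(upper), and inverting it gives the quantile used in the code.

```python
import math


def weibull_cdf(x, shape, scale):
    """Weibull CDF 1 - exp(-(x/scale)^shape) for x >= 0."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def truncated_weibull_quantile(p, shape, scale, upper):
    """p-th quantile of a Weibull(shape, scale) right-truncated at `upper`."""
    f_upper = weibull_cdf(upper, shape, scale)
    # Solve F(x) / F(upper) = p for x.
    return scale * (-math.log(1.0 - p * f_upper)) ** (1.0 / shape)
```

Choosing a high `p` yields a level that covers most of the truncated consumption distribution, which is the role the quantile plays in the abstract.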

Comments on the regression coefficients (다중회귀에서 회귀계수 추정량의 특성)

  • Kahng, Myung-Wook
    • The Korean Journal of Applied Statistics / v.34 no.4 / pp.589-597 / 2021
  • In simple and multiple regression, the regression coefficients differ in meaning; not only can their estimates differ, but they can also have different signs. Understanding the relative contribution of explanatory variables in a regression model is an important part of regression analysis. In a standardized regression model, the regression coefficient can be interpreted as the change in the response variable, in units of its standard deviation, when the explanatory variable increases by one standard deviation while the values of the other explanatory variables are held fixed. However, the size of the standardized regression coefficient is not a proper measure of the relative importance of each explanatory variable. In this paper, the estimator of the regression coefficient in multiple regression is expressed as a function of the correlation coefficient and the coefficient of determination. Furthermore, it is considered in terms of the effect of an additional explanatory variable and the additional increase in the coefficient of determination. We also explore the relationship between estimates of regression coefficients and correlation coefficients in various plots. These results are specifically applied when there are two explanatory variables.
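
For the two-variable case mentioned at the end of this abstract, the standard textbook expressions are given below. The notation is an assumption and may differ from the paper's: r_{y1} and r_{y2} are the correlations of the response with each explanatory variable, r_{12} is the correlation between the two explanatory variables, and the starred coefficients are standardized.

```latex
\hat{\beta}_1^{*} = \frac{r_{y1} - r_{y2}\, r_{12}}{1 - r_{12}^{2}}, \qquad
\hat{\beta}_2^{*} = \frac{r_{y2} - r_{y1}\, r_{12}}{1 - r_{12}^{2}}, \qquad
R^{2} = \frac{r_{y1}^{2} + r_{y2}^{2} - 2\, r_{y1}\, r_{y2}\, r_{12}}{1 - r_{12}^{2}}
```

In simple regression the standardized slope on the first variable is r_{y1}, so the simple and multiple coefficients have opposite signs whenever r_{y1} and r_{y1} - r_{y2} r_{12} have opposite signs.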