• Title/Abstract/Keywords: statistical regression modeling

192 results found; processing time 0.031 seconds

A Study of Data Mining Techniques in Bankruptcy Prediction

  • Lee, Kidong
    • Journal of the Korean Operations Research and Management Science Society / Vol. 28, No. 2 / pp. 105-127 / 2003
  • In this paper, four data mining techniques, two neural networks and two statistical modeling techniques, are compared in terms of prediction accuracy in the context of bankruptcy prediction. In business settings, accurately detecting the condition of a firm has been an important issue in the literature. Among neural networks, the backpropagation (BP) network and the Kohonen self-organizing feature map are selected and compared with each other, while among statistical modeling techniques, discriminant analysis and logistic regression are performed to provide performance benchmarks for the neural network experiments. The findings suggest that the BP network is the better choice among the data mining tools compared. This paper also identifies some distinctive characteristics of the Kohonen self-organizing feature map.

Bayesian Conway-Maxwell-Poisson (CMP) regression for longitudinal count data

  • Morshed Alam; Yeongjin Gwon; Jane Meza
    • Communications for Statistical Applications and Methods / Vol. 30, No. 3 / pp. 291-309 / 2023
  • Longitudinal count data are widely collected in biomedical research, public health, and clinical trials. Models for these repeated measurements over time on the same subjects need to account for the dependence among observations. The Poisson regression model is the first choice for modeling the expected count of interest; however, it may not be appropriate when the data exhibit over-dispersion or under-dispersion. Recently, the Conway-Maxwell-Poisson (CMP) distribution has become popular because it offers the flexibility to capture a wide range of dispersion in the data. In this article, we propose a Bayesian CMP regression model to accommodate over- and under-dispersion in modeling longitudinal count data. Specifically, we develop a regression model with a random intercept and slope to capture subject heterogeneity and to allow covariate effects to differ across subjects. We implement Bayesian computation via a Hamiltonian MCMC (HMCMC) algorithm for posterior sampling. We then compute Bayesian model assessment measures for model comparison. Simulation studies are conducted to assess the accuracy and effectiveness of our methodology. The usefulness of the proposed methodology is demonstrated on a well-known epilepsy data set.
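
To illustrate how the CMP distribution captures both over- and under-dispersion, here is a minimal sketch of its probability mass function, P(Y = y) proportional to lam^y / (y!)^nu. The function names, the truncation of the normalizing series at 200 terms, and the parameter values are assumptions made for illustration; this is not the authors' Bayesian regression model.

```python
import math

def cmp_pmf_table(lam, nu, terms=200):
    """Return [P(Y=0), ..., P(Y=terms-1)] for a CMP(lam, nu) variable.

    The normalizing constant is an infinite series; truncating it after
    `terms` terms is an assumption that is adequate for moderate lam.
    """
    # Log-weights avoid overflow in lam**y and factorial(y)**nu.
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(terms)]
    m = max(log_w)
    weights = [math.exp(v - m) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]

def mean_and_variance(probs):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

if __name__ == "__main__":
    # nu = 1 is the ordinary Poisson; nu < 1 and nu > 1 change the dispersion.
    for nu in (0.5, 1.0, 2.0):
        mean, var = mean_and_variance(cmp_pmf_table(3.0, nu))
        print(f"nu={nu}: mean={mean:.3f}, variance={var:.3f}")
```

With nu = 1 the mean and variance coincide (Poisson); nu < 1 gives a variance larger than the mean and nu > 1 a variance smaller than the mean.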

Fault Detection & SPC of Batch Process using Multi-way Regression Method

  • 우경섭;이창준;한경훈;고재욱;윤인섭
    • Korean Chemical Engineering Research / Vol. 45, No. 1 / pp. 32-38 / 2007
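  • We applied statistical process control techniques to batch processes and implemented a system that diagnoses the process state more quickly and easily from ordinary batch process data. Using data from a semiconductor etching process, a representative batch process, and from a semi-batch styrene-butadiene rubber production process, we built models that identify the relationship between the process variables and the process state. We then used the model outputs to construct statistical process control charts and analyzed the process trend over time to detect faults. The multi-way batch data were unfolded into two-way data, and multivariate regression methods such as Support Vector Regression and Partial Least Squares were used for modeling. We also used error charts and variable contribution charts to compute the magnitude and type of each fault and the contribution of each variable to the faulty data. The results confirm that the approach performs well, diagnosing not only whether and when a fault occurs but also its magnitude and cause.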
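
The unfolding step mentioned above can be sketched as follows. This is a minimal illustration, assuming the data are indexed as data[batch][variable][time] and using one common convention (batch-wise unfolding with columns ordered time step by time step); the abstract does not state which ordering the authors used, and the function name is hypothetical.

```python
def unfold_batchwise(data):
    """Unfold data[batch][variable][time] into a two-way table.

    Each batch becomes one row; its columns list every variable at time 0,
    then every variable at time 1, and so on (an assumed ordering).
    """
    rows = []
    for batch in data:
        n_times = len(batch[0])
        rows.append([trajectory[t] for t in range(n_times) for trajectory in batch])
    return rows

if __name__ == "__main__":
    # Two batches, two variables, three time steps each.
    data = [
        [[1, 2, 3], [10, 20, 30]],
        [[4, 5, 6], [40, 50, 60]],
    ]
    for row in unfold_batchwise(data):
        print(row)
```

The resulting rows-by-columns table is the form that ordinary multivariate regression methods expect as input.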

Piecewise Regression Model for Solenoid Embedded Inductors Based on the Quasi-newton Method

  • Ko, Young-Don;Kim, Kil-Han;Yun, Il-Gu;Lee, Kyu-Bok;Kim, Jong-Kyu
    • Transactions on Electrical and Electronic Materials / Vol. 6, No. 6 / pp. 256-261 / 2005
  • This paper presents a model that predicts the performance characteristics of solenoid embedded inductors manufactured by the LTCC process, using a nonlinear regression model based on the quasi-Newton method. To reduce the number of experimental runs, design of experiments (DOE) was used to generate the design space. The nonlinear process models were constructed as piecewise regression models, with the quasi-Newton method used to estimate the model coefficients and the break point within statistical confidence intervals. The models were verified by checking model accuracy against the underlying statistical assumptions.

Semiparametric and Nonparametric Modeling for Matched Studies

  • Kim, In-Young;Cohen, Noah
    • Korean Statistical Society: Conference Proceedings / Proceedings of the Korean Statistical Society 2003 Autumn Conference / pp. 179-182 / 2003
  • This study describes a new graphical method for assessing and characterizing effect modification by a matching covariate in matched case-control studies. This method to understand effect modification is based on a semiparametric model using a varying coefficient model. The method allows for nonparametric relationships between effect modification and other covariates, or can be useful in suggesting parametric models. This method can be applied to examining effect modification by any ordered categorical or continuous covariates for which cases have been matched with controls. The method applies to effect modification when causality might be reasonably assumed. An example from veterinary medicine is used to demonstrate our approach. The simulation results show that this method, when based on linear, quadratic and nonparametric effect modification, can be more powerful than both a parametric multiplicative model fit and a fully nonparametric generalized additive model fit.

A Plasma-Etching Process Modeling Via a Polynomial Neural Network

  • Kim, Dong-Won;Kim, Byung-Whan;Park, Gwi-Tae
    • ETRI Journal / Vol. 26, No. 4 / pp. 297-306 / 2004
  • A plasma is a collection of charged particles that is, on average, electrically neutral. In fabricating integrated circuits, plasma etching is a key means of transferring a photoresist pattern into an underlayer material. To construct a predictive model of plasma-etching processes, a polynomial neural network (PNN) is applied. The process was characterized by a full factorial experiment, and the two attributes modeled are its etch rate and DC bias. The prediction performance of the PNN was optimized with respect to the number of input variables and the type of polynomial at each node. The performance of the PNN under various settings was compared with three types of statistical regression models and the adaptive network fuzzy inference system (ANFIS). The simulation results show that the PNN is efficient and considerably more accurate in terms of both approximation and prediction.

Input Variable Importance in Supervised Learning Models

  • Huh, Myung-Hoe;Lee, Yong Goo
    • Communications for Statistical Applications and Methods / Vol. 10, No. 1 / pp. 239-246 / 2003
  • Statisticians, or data miners, are often asked to assess the importance of input variables in a given supervised learning model. For this purpose, one may rely on separate ad hoc measures depending on the model type, such as linear regression, neural networks, or trees. Consequently, input variable importance measures lack conceptual consistency, so they cannot be used directly to compare different types of models, which is often done in data mining. In this short communication, we propose a unified approach to measuring the importance of input variables. Our method uses sensitivity analysis, which perturbs the values of the input variables and monitors the change in the output. The scope of the research is limited to models for continuous outputs, although it is not difficult to extend the method to supervised learning models for categorical outcomes.
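
The perturb-and-monitor idea can be sketched in a model-agnostic way as below. This is an illustrative sketch, not the authors' measure: the perturbation size (a fraction `scale` of each input's standard deviation), the use of the mean absolute output change as the score, and all names are assumptions.

```python
import statistics

def sensitivity_importance(model, rows, scale=0.1):
    """Score each input by the mean absolute change in the model output
    when that input alone is shifted by `scale` times its standard deviation.

    `model` is any callable mapping a list of inputs to a number, so the same
    score can be computed for regressions, neural networks, or trees.
    """
    n_inputs = len(rows[0])
    baseline = [model(list(r)) for r in rows]
    scores = []
    for j in range(n_inputs):
        step = scale * statistics.pstdev([r[j] for r in rows])
        total_change = 0.0
        for r, y0 in zip(rows, baseline):
            shifted = list(r)
            shifted[j] += step  # perturb only input j
            total_change += abs(model(shifted) - y0)
        scores.append(total_change / len(rows))
    return scores

if __name__ == "__main__":
    # Hypothetical fitted model: the second input has no effect on the output.
    def model(x):
        return 2.0 * x[0] + 0.0 * x[1] - 5.0 * x[2]

    rows = [[0, 0, 0], [1, 2, 3], [2, 4, 6], [3, 6, 9]]
    print(sensitivity_importance(model, rows))
```

Because the score only requires calling the model, it can be compared across model types, which is the consistency the abstract argues for.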

Efficient estimation and variable selection for partially linear single-index-coefficient regression models

  • Kim, Young-Ju
    • Communications for Statistical Applications and Methods / Vol. 26, No. 1 / pp. 69-78 / 2019
  • A structured model with both single-index and varying coefficients is a powerful tool in modeling high dimensional data. It has been widely used because the single-index can overcome the curse of dimensionality and varying coefficients can allow nonlinear interaction effects in the model. For high dimensional index vectors, variable selection becomes an important question in the model building process. In this paper, we propose an efficient estimation and a variable selection method based on a smoothing spline approach in a partially linear single-index-coefficient regression model. We also propose an efficient algorithm for simultaneously estimating the coefficient functions in a data-adaptive lower-dimensional approximation space and selecting significant variables in the index with the adaptive LASSO penalty. The empirical performance of the proposed method is illustrated with simulated and real data examples.

Statistical significance test of polynomial regression equation for Huff's quartile method of design rainfall

  • 박진희;이재준;이성호
    • Journal of Korea Water Resources Association / Vol. 51, No. 3 / pp. 263-272 / 2018
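  • When designing hydraulic structures, measured streamflow data are often insufficient, so instead of a frequency analysis of flood discharge it is common to collect rainfall data and use the design rainfall, computed from the rainfall-runoff relationship, to obtain the design flood for a given frequency. In the past, empirical formulas such as the rational formula were used to estimate peak discharge, but as the duration lengthens the runoff pattern deviates from actual events, so the accuracy of the temporal distribution of probable rainfall has become important. In current practice, the third quartile of Huff's quartile method is used as the temporal distribution of design rainfall, and a sixth-order polynomial is applied as the regression equation for each quartile curve because of its high accuracy over the whole duration. In statistical modeling, however, the principle of parsimony calls for a concise regression equation, and the regression coefficients should be determined on the basis of statistical significance. This study therefore tested the significance of the temporal distribution regression equations of Huff's quartile method for 69 rainfall observation stations under the Korea Meteorological Administration. The tests showed that at most stations the regression coefficients were significant up to the fourth order, so from a statistical standpoint the regression equation for Huff's quartile method need only include terms up to the fourth order.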

Modeling of Flow-Accelerated Corrosion using Machine Learning: Comparison between Random Forest and Non-linear Regression

  • 이경근;이은희;김성우;김경모;김동진
    • Corrosion Science and Technology / Vol. 18, No. 2 / pp. 61-71 / 2019
  • Flow-Accelerated Corrosion (FAC) is a phenomenon in which a protective coating on a metal surface is dissolved by the flow of fluid in a metal pipe, leading to continuous wall-thinning. Recently, many countries have developed computer codes to manage FAC in power plants, and the FAC prediction model in these codes plays an important role in their predictive performance. Herein, FAC prediction models were developed by applying a machine learning method and the conventional nonlinear regression method. The random forest, a widely used machine learning technique in predictive modeling, made it easy to calculate the FAC tendency for five input variables: flow rate, temperature, pH, Cr content, and dissolved oxygen (DO) concentration. However, the model showed significant errors under some input conditions, and it was difficult to obtain proper regression results without additional data points. In contrast, nonlinear regression analysis produced robust estimates even with relatively insufficient data by assuming an empirical equation, and the model showed better predictive power when the interaction between DO and pH was considered. The comparative analysis in this study is believed to provide important insights for developing a more sophisticated FAC prediction model.