• Title/Summary/Keyword: Sample selection model

Minimum Message Length and Classical Methods for Model Selection in Univariate Polynomial Regression

  • Viswanathan, Murlikrishna;Yang, Young-Kyu;WhangBo, Taeg-Keun
    • ETRI Journal / v.27 no.6 / pp.747-758 / 2005
  • The problem of selection among competing models has been a fundamental issue in statistical data analysis. Good fits to data can be misleading, since they can result from properties of the model that have nothing to do with it being a close approximation to the source distribution of interest (for example, overfitting). In this study we focus on the preference among models from a family of polynomial regressors. Three decades of research have spawned a number of plausible techniques for the selection of models, namely Akaike's Final Prediction Error (FPE) and Information Criterion (AIC), Schwarz's criterion (SCH), Generalized Cross Validation (GCV), Wallace's Minimum Message Length (MML), Minimum Description Length (MDL), and Vapnik's Structural Risk Minimization (SRM). The fundamental similarity among all these principles is their attempt to define an appropriate balance between the complexity of models and their ability to explain the data. This paper presents an empirical study of the above principles in the context of model selection, where the models under consideration are univariate polynomials. The paper includes a detailed empirical evaluation of the model selection methods on six target functions, with varying sample sizes and added Gaussian noise. The results of the study appear to provide strong evidence in support of the MML- and SRM-based methods over the other standard approaches (FPE, AIC, SCH, and GCV).
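
The trade-off between fit and complexity described in this abstract can be illustrated with a minimal sketch. It is not the paper's experimental setup: the quadratic target function, the noise level, the sample size, and the helper names (`fit_polynomial`, `select_degree`) are all assumptions chosen for illustration, and only AIC and a Schwarz-type criterion (BIC) are shown, in their common Gaussian residual-sum-of-squares form.

```python
import math
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    m = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(m)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(m):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][m] / rows[i][i] for i in range(m)]


def rss(xs, ys, coeffs):
    """Residual sum of squares of the fitted polynomial."""
    return sum(
        (y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )


def aic(n, k, rss_value):
    # Gaussian-error form, up to an additive constant: n*ln(RSS/n) + 2k.
    return n * math.log(rss_value / n) + 2 * k


def bic(n, k, rss_value):
    # Schwarz-type criterion: the penalty grows with ln(n).
    return n * math.log(rss_value / n) + k * math.log(n)


def select_degree(xs, ys, max_degree, criterion):
    """Return the degree minimising the given criterion."""
    n = len(xs)
    scores = {}
    for d in range(max_degree + 1):
        coeffs = fit_polynomial(xs, ys, d)
        scores[d] = criterion(n, d + 1, rss(xs, ys, coeffs))
    return min(scores, key=scores.get)


# Assumed synthetic data: a quadratic target with Gaussian noise.
random.seed(0)
n_points = 60
xs = [-1 + 2 * i / (n_points - 1) for i in range(n_points)]
ys = [1 - 2 * x + 3 * x * x + random.gauss(0, 0.2) for x in xs]

aic_degree = select_degree(xs, ys, 6, aic)
bic_degree = select_degree(xs, ys, 6, bic)
print("degree chosen by AIC:", aic_degree)
print("degree chosen by BIC:", bic_degree)
```

Because the BIC penalty per parameter exceeds the AIC penalty once the sample has eight or more points, BIC never selects a higher degree than AIC on the same data.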

How to improve oil consumption forecast using google trends from online big data?: the structured regularization methods for large vector autoregressive model

  • Choi, Ji-Eun;Shin, Dong Wan
    • Communications for Statistical Applications and Methods / v.29 no.1 / pp.41-51 / 2022
  • We forecast the US oil consumption level by taking advantage of Google Trends. Google Trends data are the search volumes of specific terms that people search for on Google. We focus on whether proper selection of Google Trends terms leads to an improvement in forecast performance for oil consumption. As forecast models, we consider the least absolute shrinkage and selection operator (LASSO) regression and the structured regularization method for the large vector autoregressive (VAR-L) model of Nicholson et al. (2017), which automatically select the Google Trends terms and the lags of the predictors. An out-of-sample forecast comparison reveals that reducing the high-dimensional Google Trends data set to a low-dimensional data set with the LASSO and VAR-L models produces better forecast performance for oil consumption than frequently used forecast models such as the autoregressive model, the autoregressive distributed lag model, and the vector error correction model.

Variable Selection for Logistic Regression Model Using Adjusted Coefficients of Determination (수정 결정계수를 사용한 로지스틱 회귀모형에서의 변수선택법)

  • Hong C. S.;Ham J. H.;Kim H. I.
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.435-443 / 2005
  • Coefficients of determination in logistic regression analysis have been defined through various statistics, and their values are relatively smaller than those for linear regression models. These coefficients of determination are not generally used to evaluate and diagnose logistic regression models. Liao and McGee (2003) proposed two adjusted coefficients of determination that are robust to the addition of inappropriate predictors and to variation in sample size. In this work, these adjusted coefficients of determination are applied to variable selection for logistic regression models and compared with the results of other methods such as forward selection, backward elimination, stepwise selection, and the AIC statistic.

An Application of the Clustering Threshold Gradient Descent Regularization Method for Selecting Genes in Predicting the Survival Time of Lung Carcinomas

  • Lee, Seung-Yeoun;Kim, Young-Chul
    • Genomics & Informatics / v.5 no.3 / pp.95-101 / 2007
  • In this paper, we consider the variable selection methods in the Cox model when a large number of gene expression levels are involved with survival time. Deciding which genes are associated with survival time has been a challenging problem because of the large number of genes and relatively small sample size (n<

An Integrated Mathematical Model for Supplier Selection

  • Asghari, Mohammad
    • Industrial Engineering and Management Systems / v.13 no.1 / pp.29-42 / 2014
  • Extensive research has been conducted in recent years on supplier evaluation and selection as a strategic and crucial component of supply chain management. However, few articles in the previous literature have been dedicated to the use of fuzzy inference systems as an aid in decision-making. Therefore, this essay attempts to demonstrate the application of this method in evaluating suppliers, based on a comprehensive framework of qualitative and quantitative factors together with the effect of gradual coverage distance. The purpose of this study is to investigate the applicability of numerous measures and metrics in a multi-objective optimization problem of supply chain network design, with the aim of managing the allocation of orders by coordinating the production lines to satisfy customers' demand. This work presents a dynamic non-linear programming model that examines the important aspects of the strategic planning of manufacturing in the supply chain. The effectiveness of the configured network is illustrated using a sample, after which an exact method is used to solve this multi-objective problem and confirm the validity of the model; finally, the results are discussed and analyzed.

Determinants of Farmers' Participation and Acceptance Level in Uiseong Traditional Agricultural Water Utilization System Conservation Activities (의성 전통 수리 농업시스템 보전 활동에 대한 농가 참여 및 수용수준의 결정요인)

  • Kim, Se-Hyuk;Lee, Se-Yeop;Kim, Tae-Kyun
    • Journal of Korean Society of Rural Planning / v.27 no.3 / pp.47-56 / 2021
  • The purpose of this study is to identify the determinants of local farmers' participation and acceptance level in traditional agricultural technology conservation activities, focusing on the traditional agricultural water utilization system in Uiseong, designated as Korea's Important Agricultural Heritage System No. 10. The Heckman sample selection model was used to correct for selection bias. The results show that willingness to participate in conservation activities increased with greater interest in conserving the agricultural ecology and environment, with use of the traditional agricultural system in Uiseong, with larger paddy field cultivation area, and for farmers in their 50s or older. The results also indicate that the acceptance time for conservation activities increased with more experience of participating in conservation of the agricultural ecology and environment, with more hours of education, with lower knowledge of the traditional agricultural system in Uiseong, and with smaller paddy field cultivation area. The results of this study may contribute to government policy on traditional agricultural technology conservation.

The Impact of Divorce on Tenure Choice for Women in Korea (자가점유로 분석한 이혼여성의 주거안정성)

  • Hwang, Jae-Hee;Lee, Seong-Woo
    • Journal of the Korean housing association / v.23 no.1 / pp.55-66 / 2012
  • The present study investigates the impact of resources and characteristics on the tenure choice of divorced women in Korea. The authors use micro data from the Korea Census (2% sample) provided by the National Statistical Office. The authors apply the bivariate probit model to eliminate the selection bias that could arise from sample selectivity in the chain of marital disruption and tenure choices. The study starts with a descriptive account of homeownership after divorce from 1985 to 2005. It concludes that divorce results in a substantial attrition of homeownership. The authors find that, for many women, divorce initiates a process of downward mobility on the housing ladder. The probability of owning housing is much lower for divorced women than for women who are not divorced. The study concludes by suggesting some policy implications for divorced women who have limited access to housing stability. The authors also suggest future studies that could compensate for the empirical limitations of the present study.

Wood Classification of Japanese Fagaceae using Partial Sample Area and Convolutional Neural Networks

  • FATHURAHMAN, Taufik;GUNAWAN, P.H.;PRAKASA, Esa;SUGIYAMA, Junji
    • Journal of the Korean Wood Science and Technology / v.49 no.5 / pp.491-503 / 2021
  • Wood identification is regularly performed by observing the wood anatomy, such as colour, texture, fibre direction, and other characteristics. The manual process, however, can be time consuming, especially when a large quantity of identification work is required. Considering this condition, a convolutional neural network (CNN)-based program is applied to improve the image classification results. The research focuses on the accuracy and efficiency of the algorithm in dealing with dataset limitations. To this end, a sample selection process is proposed that takes only a small portion of each existing image, which is still expected to represent the overall picture, so as to maintain and improve the generalisation capability of the CNN method in the classification stage. The experiments yielded an average F1 score of up to 93.4% for the medium sample area size (200 × 200 pixels) across the CNN architectures (VGG16, ResNet50, MobileNet, DenseNet121, and Xception based). The DenseNet121-based architecture was found to be the best at maintaining the generalisation of its model across the sample area sizes (100, 200, and 300 pixels). The experimental results showed that the proposed algorithm can be an accurate and reliable solution.
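
For readers unfamiliar with the metric, the following is a minimal sketch of how an averaged F1 score can be computed for a multi-class classifier. The abstract does not say which averaging scheme was used, so macro-averaging is an assumption here, and the labels and predictions are made-up toy values rather than results from the paper.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro-averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy example with three hypothetical classes.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
score = macro_f1(y_true, y_pred)
print(f"macro-averaged F1: {score:.4f}")
```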

Traffic Forecasting Model Selection of Artificial Neural Network Using Akaike's Information Criterion (AIC(AKaike's Information Criterion)을 이용한 교통량 예측 모형)

  • Kang, Weon-Eui;Baik, Nam-Cheol;Yoon, Hye-Kyung
    • Journal of Korean Society of Transportation / v.22 no.7 s.78 / pp.155-159 / 2004
  • Recently, many studies have explored artificial neural network (ANN) structures and learning methods for forecasting traffic volume. ANNs are powerful at recognizing patterns because they provide a flexible non-linear model. However, ANNs are prone to overfitting because their non-linearity involves a large number of parameters. This research applies a variety of model selection criteria to mitigate the overfitting problem. In particular, it analyzes which selection criteria mitigate overfitting and preserve the transferability of the model over time. The results of this study are as follows. First, the model selected in sample does not guarantee the best out-of-sample performance; that is, as in many existing studies, the best in-sample model bears no relationship to out-of-sample performance. Second, in terms of the stability of the model selection criteria, AIC3, AICC, and BIC are usable, but AIC4 shows large variation compared with the best model. In time-series analysis and forecasting, more quantitative data analysis and further time-series analysis are needed, because model uncertainty can affect the correlation between in-sample and out-of-sample performance.
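
The criteria named in this abstract differ mainly in how heavily they penalise the number of parameters. The sketch below assumes the commonly used definitions (AIC with a penalty of 2 per parameter, AIC3 with a penalty of 3, BIC with ln n, and AICc as AIC plus a small-sample correction); the abstract itself does not define them, so these formulas, the candidate networks, and their log-likelihoods are illustrative assumptions only.

```python
import math


def penalised_criterion(log_likelihood, k, penalty_per_param):
    """Generic form: -2 ln L + penalty * k (smaller is better)."""
    return -2.0 * log_likelihood + penalty_per_param * k


def aic(log_likelihood, k):
    return penalised_criterion(log_likelihood, k, 2.0)


def aic3(log_likelihood, k):
    # Assumed definition: same as AIC but with a penalty of 3 per parameter.
    return penalised_criterion(log_likelihood, k, 3.0)


def bic(log_likelihood, k, n):
    return penalised_criterion(log_likelihood, k, math.log(n))


def aicc(log_likelihood, k, n):
    # Small-sample correction of AIC; requires n > k + 1.
    return aic(log_likelihood, k) + 2.0 * k * (k + 1) / (n - k - 1)


# Hypothetical candidate networks: (log-likelihood, number of parameters).
n_obs = 200
candidates = {
    "small": (-310.0, 5),
    "medium": (-300.0, 12),
    "large": (-296.0, 30),
}

scores = {
    "AIC": {m: aic(ll, k) for m, (ll, k) in candidates.items()},
    "AIC3": {m: aic3(ll, k) for m, (ll, k) in candidates.items()},
    "AICc": {m: aicc(ll, k, n_obs) for m, (ll, k) in candidates.items()},
    "BIC": {m: bic(ll, k, n_obs) for m, (ll, k) in candidates.items()},
}
chosen = {name: min(vals, key=vals.get) for name, vals in scores.items()}
for name, model in chosen.items():
    print(f"{name} selects the {model} model")
```

In this toy comparison the criteria with heavier penalties (AIC3 and BIC) prefer the smallest network, while AIC and AICc accept the medium one, which is one way such criteria can disagree on the same candidates.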

Unbiasedness or Statistical Efficiency: Comparison between One-stage Tobit of MLE and Two-step Tobit of OLS

  • Park, Sun-Young
    • International Journal of Human Ecology / v.4 no.2 / pp.77-87 / 2003
  • This paper constructs statistical and econometric models grounded in economic theory in order to discuss the issue of statistical efficiency versus unbiasedness, including the problem of correcting sample selection bias. The analytical tools compared were the one-stage Tobit estimated by maximum likelihood and Heckman's two-step Tobit estimated by ordinary least squares. Regarding the adequacy of the models for analysing demand and choice, the results suggest that the explanatory variables do not differ greatly between the first selection model and the second linear probability model. Because lambda, the self-selectivity correction factor, is not statistically significant in the Type II Tobit, there is no evidence of self-selectivity in the Type II Tobit model, indicating that the Type I Tobit model, a less complicated statistical method than the Type II model, would better explain demand and choice.
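
The lambda term mentioned in this abstract is, in the standard Heckman two-step procedure, the inverse Mills ratio evaluated at the first-stage probit index; it is then added as an extra regressor in the second-stage least-squares fit, and the significance of its coefficient is used as evidence for or against selection bias. The following is a minimal sketch of that quantity only, not of the paper's model; the index values are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills_ratio(z):
    """lambda(z) = pdf(z) / cdf(z), the selectivity correction term."""
    return normal_pdf(z) / normal_cdf(z)


# Hypothetical first-stage probit index values for three selected observations.
probit_index = [-1.0, 0.0, 1.0]
lambdas = [inverse_mills_ratio(z) for z in probit_index]
for z, lam in zip(probit_index, lambdas):
    print(f"index {z:+.1f} -> lambda {lam:.4f}")
```

Observations with a low probability of being selected (a low index) receive a large lambda, so the correction matters most for the cases most affected by selection.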