• Title/Summary/Keyword: 표본선택 (sample selection)

An Alternative Parametric Estimation of Sample Selection Model: An Application to Car Ownership and Car Expense (비정규분포를 이용한 표본선택 모형 추정: 자동차 보유와 유지비용에 관한 실증분석)

  • Choi, Phil-Sun;Min, In-Sik
    • Communications for Statistical Applications and Methods / v.19 no.3 / pp.345-358 / 2012
  • In a parametric sample selection model, the distributional assumption is critical for obtaining consistent estimates. Conventionally, normality has been assumed for the error terms in both the selection and main equations. The normality assumption, however, may be too restrictive for the true underlying distribution. This study introduces the $S_U$-normal distribution as the error distribution of a sample selection model. Compared with the normal distribution, the $S_U$-normal distribution accommodates a wide range of skewness and kurtosis, and it includes the normal distribution as a limiting case. Moreover, it extends easily to multivariate dimensions. We provide the log-likelihood function and the expected value formula for a sample selection model based on a bivariate $S_U$-normal distribution. Simulation results indicate that the $S_U$-normal model outperforms the normal model in terms of the consistency of the estimators. As an empirical application, we estimate a sample selection model of the relationship between car ownership and car expenses.
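
The flexibility claimed for the $S_U$-normal distribution can be illustrated with a small simulation. This is only a sketch, assuming that "$S_U$-normal" refers to Johnson's $S_U$ family, in which $X = \xi + \lambda \sinh((Z - \gamma)/\delta)$ with $Z \sim N(0, 1)$. The function names and parameter values are hypothetical and are not taken from the paper.

```python
import math
import random


def sample_su(n, gamma, delta, xi=0.0, lam=1.0, seed=0):
    """Draw n values X = xi + lam * sinh((Z - gamma) / delta), Z ~ N(0, 1)."""
    rng = random.Random(seed)
    return [xi + lam * math.sinh((rng.gauss(0.0, 1.0) - gamma) / delta)
            for _ in range(n)]


def skew_and_excess_kurtosis(xs):
    """Sample skewness and excess kurtosis (both are 0 for a normal law)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Illustrative parameter choices (assumptions, not values from the paper).
heavy = sample_su(20000, gamma=0.0, delta=1.5)               # symmetric, heavy tails
skewed = sample_su(20000, gamma=1.0, delta=1.5)              # asymmetric
# Letting delta grow (with lam scaled alongside) approaches the normal limit.
near_normal = sample_su(20000, gamma=0.0, delta=50.0, lam=50.0)

for name, xs in [("heavy", heavy), ("skewed", skewed), ("near_normal", near_normal)]:
    skew, kurt = skew_and_excess_kurtosis(xs)
    print(f"{name}: skewness={skew:.3f}, excess kurtosis={kurt:.3f}")
```

Small values of delta produce heavy tails, a nonzero gamma produces skewness, and a large delta recovers near-normal behavior, which matches the limiting-case statement in the abstract.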

Optimum Selection Probabilities in Stratified Two-stage Sampling (층화 이단계 표본추출시 최적 선택율)

  • 신민웅;오상훈
    • The Korean Journal of Applied Statistics / v.14 no.2 / pp.429-437 / 2001
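  • For simple two-stage sampling, the optimum selection probabilities were derived by Hansen and Hurwitz (1949). However, sample surveys such as those conducted by the National Statistical Office use stratified two-stage sampling. Motivated by this practical need, we consider a stratified two-stage sample design. For stratified two-stage sampling under a given cost, we use the method of Lagrange multipliers to derive the optimum selection probability, the sampling rate, and the subsampling rate that minimize the variance of the estimator of the population total.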
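
The Lagrange-multiplier argument can be sketched on a much simpler textbook case than the stratified design treated in the paper. The sketch assumes a single stratum, equal-sized clusters, no finite population correction, a variance of the form S_b^2/n + S_w^2/(n m), and a cost of the form c1 n + c2 n m. All symbols, function names, and numbers are hypothetical illustrations, not the paper's formulas.

```python
import math


def variance(n, m, s_b2, s_w2):
    """Approximate variance of the estimated mean: between part + within part."""
    return s_b2 / n + s_w2 / (n * m)


def optimum_allocation(budget, c1, c2, s_b2, s_w2):
    """Stationary point of the Lagrangian: minimize variance at a fixed cost.

    n is the number of sampled clusters, m the subsample size per cluster.
    """
    m_opt = math.sqrt((s_w2 * c1) / (s_b2 * c2))
    n_opt = budget / (c1 + c2 * m_opt)
    return n_opt, m_opt


# Hypothetical inputs: total budget, cost per cluster, cost per element,
# between-cluster variance, within-cluster variance.
budget, c1, c2, s_b2, s_w2 = 10000.0, 40.0, 10.0, 4.0, 36.0
n_opt, m_opt = optimum_allocation(budget, c1, c2, s_b2, s_w2)
print(n_opt, m_opt, variance(n_opt, m_opt, s_b2, s_w2))

# Any other subsample size that spends the same budget gives a larger variance.
for m in (3.0, 5.0, 7.0, 12.0):
    n = budget / (c1 + c2 * m)
    print(m, variance(n, m, s_b2, s_w2))
```

The subsample size grows when clusters are expensive to reach relative to elements, or when within-cluster variation dominates between-cluster variation.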

The wage determinants applying sample selection bias (표본선택 편의를 반영한 임금결정요인 분석)

  • Park, Sungik;Cho, Jangsik
    • Journal of the Korean Data and Information Science Society / v.27 no.5 / pp.1317-1325 / 2016
  • The purpose of this paper is to explain the factors affecting the wages of vocational high school graduates. In particular, we examine the effectiveness of controlling for sample selection bias by employing the Tobit model and the Heckman sample selection model. The major results are as follows. First, the Tobit model and the Heckman sample selection model, which control for sample selection bias, are statistically significant. Hence all the independent variables seem to be statistically consistent with the theoretical model. Second, gender is statistically significant for both the probability of employment and the wage. Third, the employment probability and wage of Meister high school graduates are higher than those of all other graduates. Fourth, the higher the parents' income, the higher both the employment probability and the wage. Finally, parents' education level, high school grades, satisfaction, and the number of licenses are statistically significant for both the probability of employment and the wage.

Korean women wage analysis using selection models (표본 선택 모형을 이용한 국내 여성 임금 데이터 분석)

  • Jeong, Mi Ryang;Kim, Mijeong
    • Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1077-1085 / 2017
  • In this study, we identify the major factors affecting Korean women's wages by analyzing data from the 2015 Korea Labor Panel Survey (KLIPS). In general, wage data are difficult to analyze because random sampling is infeasible. The Heckman sample selection model is the most widely used method for analyzing data subject to sample selection. Heckman proposed two kinds of selection models: one estimated by maximum likelihood and the other the Heckman two-stage model. The Heckman two-stage model is known to be robust to departures from the normality assumption on the bivariate error terms. Recently, Marchenko and Genton (2012) proposed the Heckman selection-t model, which generalizes the Heckman two-stage model, and concluded that the selection-t model is more robust to the error assumptions. We analyze the data with these two models and compare the results.
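
The correction term at the heart of the Heckman two-stage model can be written out in a few lines. This is a minimal sketch of the textbook normal-error case, not the authors' code. It assumes a selection rule z'gamma + u > 0 and an outcome y = x'beta + e, with (u, e) bivariate normal, Var(u) = 1, Var(e) = sigma^2, and correlation rho. The function names are hypothetical.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(z):
    """lambda(z) = pdf(z) / cdf(z), the regressor added in the second stage."""
    return norm_pdf(z) / norm_cdf(z)


def expected_observed_outcome(xb, zg, rho, sigma):
    """E[y | selected] = x'beta + rho * sigma * lambda(z'gamma)."""
    return xb + rho * sigma * inverse_mills(zg)


# With rho = 0 there is no selection bias; with rho > 0 the observed mean
# lies above x'beta, most strongly for units that are unlikely to be selected.
for zg in (-1.0, 0.0, 2.0):
    print(zg, expected_observed_outcome(xb=10.0, zg=zg, rho=0.5, sigma=2.0))
```

In the two-stage procedure, a first-stage probit supplies an estimate of z'gamma, and the second stage regresses the observed outcome on x and the estimated inverse Mills ratio.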

A new sample selection model for overdispersed count data (과대산포 가산자료의 새로운 표본선택모형)

  • Jo, Sung Eun;Zhao, Jun;Kim, Hyoung-Moon
    • The Korean Journal of Applied Statistics / v.31 no.6 / pp.733-749 / 2018
  • Sample selection arises from the partial observability of the outcome of interest in a study. Heckman introduced a sample selection model to analyze such data and proposed a full maximum likelihood estimation method under the assumption of normality. Recently, sample selection models for binomial and Poisson response variables have been proposed. Based on the theory of symmetry-modulated distributions, we extend these to a model for overdispersed count data. Without sample selection, this type of data is often modeled using the negative binomial distribution. Hence we propose a sample selection model for overdispersed count data using the negative binomial distribution. We apply the model to real data. Simulation studies show that our estimation method, based on the profile log-likelihood, is stable.
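
Why the negative binomial suits overdispersed counts can be checked numerically. The sketch below assumes the common mean/dispersion parameterization, with mean mu and size r, so that the variance is mu + mu^2 / r. The paper's own parameterization may differ, and the function names are hypothetical.

```python
import math


def nb_pmf(y, mu, r):
    """Negative binomial pmf with mean mu and dispersion (size) parameter r."""
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def poisson_pmf(y, mu):
    """Poisson pmf, whose variance always equals its mean."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def mean_and_variance(pmf, upper=2000):
    """Moments by direct summation over y = 0, ..., upper - 1."""
    probs = [pmf(y) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


mu, r = 3.0, 2.0  # illustrative values only
nb_mean, nb_var = mean_and_variance(lambda y: nb_pmf(y, mu, r))
po_mean, po_var = mean_and_variance(lambda y: poisson_pmf(y, mu))
print("negative binomial:", nb_mean, nb_var)
print("Poisson:", po_mean, po_var)
```

Both distributions share the same mean, but only the negative binomial has a variance above it. As r grows, the extra term mu^2 / r vanishes and the Poisson case is recovered.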

The wage determinants of college graduates using Heckman's sample selection model (Heckman의 표본선택모형을 이용한 대졸자의 임금결정요인 분석)

  • Cho, Jangsik
    • Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1099-1107 / 2017
  • In this study, we analyze the determinants of college graduates' wages using data from the "2014 Graduates Occupational Mobility Survey" conducted by the Korea Employment Information Service. In general, wages carry two pieces of information: whether an individual is employed and the size of the wage. However, many previous studies of wage determinants fit a linear regression using only information on wage size, which tends to produce sample selection bias. To overcome this problem, we use Heckman sample selection models. The main results are summarized as follows. First, the Heckman sample selection model is found to be statistically valid. Second, men have significantly higher employment probabilities and wages than women. Third, both the probability of employment and the wage increase with age and with parents' income. Finally, both the probability of employment and the wage tend to increase with university satisfaction and with the number of certifications acquired.

Corporate Debt Choice: Application of Panel Sample Selection Model (기업의 부채조달원 선택에 관한 연구: 패널표본선택모형의 적용)

  • Lee, Ho Sun
    • The Journal of the Korea Contents Association / v.15 no.7 / pp.428-435 / 2015
  • Examining corporate financing statistics in Korea, I identify several trends. First, large enterprises use both bank loans and direct financing, such as corporate bonds, as debt. Second, small and medium-sized companies mainly use bank loans only. I therefore argue that corporate debt choice is subject to sample selection bias and that a sample selection methodology is more appropriate for analyzing it. I test a panel sample selection model using data on listed Korean firms from 1990 to 2013 and find that the panel sample selection model is appropriate.

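Specification of Prior Distributions in Explanatory Variable Selection Using Gibbs Sampling: Focusing on Linear Regression Models (깁스표본기법을 이용한 설명변수 선택문제에서 사전분포의 설정-선형회귀모형을 중심으로-)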

  • 박종선;남궁평;한숙영
    • Communications for Statistical Applications and Methods / v.4 no.2 / pp.333-343 / 1997
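  • In linear regression analysis, variable selection plays a very important role in finding the best model. George and McCulloch (1993) considered variable selection in linear regression models using a hierarchical Bayes model and Gibbs sampling. Building on the model of George and McCulloch, this paper considers how to determine, by an objective criterion, the prior probability that each explanatory variable is included in the model.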
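
The role of the prior inclusion probability can be seen in one step of the Gibbs sampler. The sketch below assumes the spike-and-slab prior of George and McCulloch (1993): a coefficient is N(0, tau^2) when excluded and N(0, c^2 tau^2) when included, with prior inclusion probability p. The function names and numerical values are hypothetical, and the objective criterion for choosing p proposed in the paper is not reproduced here.

```python
import math
import random


def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def inclusion_probability(beta, prior_p, tau, c):
    """P(indicator = 1 | beta) under the two-component normal mixture prior."""
    slab = prior_p * normal_pdf(beta, c * tau)        # variable included
    spike = (1.0 - prior_p) * normal_pdf(beta, tau)   # variable excluded
    return slab / (slab + spike)


def draw_indicator(beta, prior_p, tau, c, rng):
    """One Gibbs update of the inclusion indicator given the current beta."""
    return 1 if rng.random() < inclusion_probability(beta, prior_p, tau, c) else 0


rng = random.Random(0)
for beta in (0.0, 0.2, 1.0):
    for prior_p in (0.2, 0.5, 0.8):
        prob = inclusion_probability(beta, prior_p, tau=0.1, c=10.0)
        print(beta, prior_p, round(prob, 4), draw_indicator(beta, prior_p, 0.1, 10.0, rng))
```

For coefficients near zero the conditional inclusion probability depends strongly on the prior probability p, which is why the choice of p deserves a principled rule.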

A Study on the Application of Disaggregate Logit Models from Choice-Based Freight Data (선택기반 화물데이타를 이용한 개별로짓모형의 적용에 관한 연구)

  • Nam K.C.
    • Journal of Korean Port Research / v.7 no.1 / pp.25-42 / 1993
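  • Freight transport has changed considerably over the past 20 years or so. On the supply side, more diverse and technologically advanced transport modes have appeared; on the demand side, the introduction of the logistics concept has led shippers to demand higher levels of transport service. In modal share, there has been a marked shift of freight from rail to road. These changes have heightened interest in resolving transport issues and have driven theoretical and conceptual advances in freight demand forecasting. Among the most notable advances are behavioral models that reflect shipper behavior, together with new data collection methods and data types. Behavioral models widely used in freight and traffic studies have traditionally relied on random samples, but since the 1980s interest in non-random samples has grown. Choice-based data are a representative example. Such data are limited in the information they provide, but they are easy and inexpensive to collect. To date, only two studies in freight transport have used choice-based data. This study therefore aims to test the potential of disaggregate choice models estimated from choice-based data. We collected choice-based data for four groups of manufactured products, estimated logit models, and assessed their validity by comparing the estimates with those of earlier studies. The estimates are statistically significant and intuitively plausible, and they agree with results in the literature. This is an important finding given the need to reduce data collection costs in transport planning.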
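
A short sketch shows what a logit model computes and why choice-based samples need care. It relies on a standard result that the abstract itself does not state: for a multinomial logit with a full set of alternative-specific constants, choice-based sampling shifts each estimated constant by ln(H_j / Q_j), where H_j is the sample share and Q_j the population share of alternative j. The function names, the two modes, and all numbers are hypothetical.

```python
import math


def logit_probabilities(utilities):
    """Multinomial logit: P_j = exp(V_j) / sum_k exp(V_k)."""
    peak = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def correct_constants(estimated, sample_share, population_share):
    """Remove the ln(H_j / Q_j) shift induced by choice-based sampling."""
    return [a - math.log(h / q)
            for a, h, q in zip(estimated, sample_share, population_share)]


# Hypothetical two-mode example (rail, road) with constants only.
true_constants = [0.0, 1.0]
population_share = logit_probabilities(true_constants)
sample_share = [0.5, 0.5]  # equal numbers of shippers sampled per mode

# Constants as they would come out of the choice-based sample.
estimated = [a + math.log(h / q)
             for a, h, q in zip(true_constants, sample_share, population_share)]
corrected = correct_constants(estimated, sample_share, population_share)

print(logit_probabilities(estimated))   # reproduces the sample shares
print(logit_probabilities(corrected))   # reproduces the population shares
```

The correction requires the population shares to be known from an outside source, such as aggregate freight statistics.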

Classifier Selection using Feature Space Attributes in Local Region (국부적 영역에서의 특징 공간 속성을 이용한 다중 인식기 선택)

  • Shin Dong-Kuk;Song Hye-Jeong;Kim Baeksop
    • Journal of KIISE: Software and Applications / v.31 no.12 / pp.1684-1690 / 2004
  • This paper presents a method for classifier selection that uses distribution information of the training samples in a small region surrounding a sample. The conventional DCS-LA (Dynamic Classifier Selection - Local Accuracy) method selects a classifier dynamically by comparing the local accuracy of each classifier at test time, which inevitably requires a long classification time. In the proposed approach, by contrast, the best classifier in a local region is stored in the FSA (Feature Space Attribute) table during training, and testing is done simply by referring to the table. This enables fast classification because local accuracy does not need to be computed at test time. Two feature space attributes are used: the entropy and the density of the k training samples around each sample. Each sample in the feature space is mapped to a point in the attribute space spanned by these two attributes. The attribute space is divided into regular rectangular cells, and the local accuracy of each classifier is recorded in each cell. The cells with their associated local accuracies make up the FSA table. At test time, the cell to which a test sample belongs is determined by calculating the two attributes, and the most accurate classifier for that cell is then chosen from the FSA table. To show the effectiveness of the proposed algorithm, it is compared with the conventional DCS-LA on the Elena database. The experiments show that the accuracy of the proposed algorithm is almost the same as that of DCS-LA, while classification is about four times faster.
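
The table-building and lookup steps described above can be sketched as follows. This is not the authors' implementation. The density measure (a decreasing function of the mean distance to the k nearest training samples), the cell sizes, the fallback classifier, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def attributes(point, train_x, train_y, k):
    """Label entropy and a density proxy over the k nearest training samples."""
    nearest = sorted((math.dist(point, x), y) for x, y in zip(train_x, train_y))[:k]
    counts = Counter(label for _, label in nearest)
    entropy = -sum((n / k) * math.log2(n / k) for n in counts.values())
    mean_dist = sum(d for d, _ in nearest) / k
    return entropy, 1.0 / (1.0 + mean_dist)  # assumed density proxy


def cell_of(attrs, cell_size):
    """Index of the rectangular cell containing a point of attribute space."""
    return tuple(int(a // s) for a, s in zip(attrs, cell_size))


def build_fsa_table(train_x, train_y, correct, k, cell_size):
    """correct[c][i] is True if classifier c labels training sample i correctly."""
    hits = defaultdict(lambda: [0] * len(correct))
    totals = Counter()
    for i, x in enumerate(train_x):
        cell = cell_of(attributes(x, train_x, train_y, k), cell_size)
        totals[cell] += 1
        for c, flags in enumerate(correct):
            hits[cell][c] += flags[i]
    # Store the index of the locally most accurate classifier for each cell.
    return {cell: max(range(len(correct)), key=lambda c: hits[cell][c] / totals[cell])
            for cell in totals}


def select_classifier(point, table, train_x, train_y, k, cell_size, default=0):
    """Test-time step: compute the two attributes, then look up the table."""
    cell = cell_of(attributes(point, train_x, train_y, k), cell_size)
    return table.get(cell, default)


# Toy data: a pure-label cluster near 0 and a mixed-label cluster near 5.
train_x = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (5.1,), (5.2,), (5.3,)]
train_y = [0, 0, 0, 0, 0, 1, 0, 1]
correct = [
    [True] * 4 + [False] * 4,   # classifier 0 is right only in the pure region
    [False] * 4 + [True] * 4,   # classifier 1 is right only in the mixed region
]
cell_size = (0.5, 0.5)
table = build_fsa_table(train_x, train_y, correct, k=3, cell_size=cell_size)
print(select_classifier((0.15,), table, train_x, train_y, 3, cell_size))
print(select_classifier((5.15,), table, train_x, train_y, 3, cell_size))
```

All classifier evaluations happen while the table is built. At test time only the two attributes of the new sample are computed before a single table lookup, which is the source of the speedup over comparing local accuracies for every test sample.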