• Title/Summary/Keyword: linear probability model (선형확률모형)


An educational tool for binary logistic regression model using Excel VBA (엑셀 VBA를 이용한 이분형 로지스틱 회귀모형 교육도구 개발)

  • Park, Cheolyong;Choi, Hyun Seok
    • Journal of the Korean Data and Information Science Society / v.25 no.2 / pp.403-410 / 2014
  • Binary logistic regression analysis is a statistical technique that explains a binary response variable by quantitative or qualitative explanatory variables. In the binary logistic regression model, the probability that the response variable takes one of its two values, say 1, is explained as a transformation of a linear combination of the explanatory variables. This is one of the big barriers that non-statisticians have to overcome in order to understand the model. In this study, an educational tool is developed using Excel VBA that explains the need for binary logistic regression analysis. More precisely, the tool explains the problems that arise when the probability that the response variable equals 1 is modeled as a linear combination of explanatory variables, and then shows how these problems can be solved through transformations of the linear combination.
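
The contrast described in this abstract can be sketched in a few lines of Python. This is an illustration only, not the Excel VBA tool from the paper: the function names and the coefficients `b0` and `b1` are hypothetical, chosen so that the linear form leaves the interval [0, 1] while the logistic transform of the same linear combination does not.

```python
import math

def linear_probability(x, b0, b1):
    """Model P(y = 1) directly as a linear combination; the result can leave [0, 1]."""
    return b0 + b1 * x

def logistic_probability(x, b0, b1):
    """Apply the inverse logit to the same linear combination; the result stays in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = 0.5, 0.2
for x in (-5, 0, 5):
    print(x, linear_probability(x, b0, b1), logistic_probability(x, b0, b1))
```

With these assumed coefficients the linear form gives a negative "probability" at x = -5 and a value above 1 at x = 5, which is the kind of problem the transformation is meant to remove.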

Parameters Estimation of Probability Distributions Using Meta-Heuristic Algorithms (Meta-Heuristic Algorithms를 이용한 확률분포의 매개변수 추정)

  • Yoon, Suk-Min;Lee, Tae-Sam;Kang, Myung-Gook;Jeong, Chang-Sam
    • Proceedings of the Korea Water Resources Association Conference / 2012.05a / pp.464-464 / 2012
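  • In hydrology, the purpose of frequency analysis is to determine the magnitude of a hydrological quantity that can occur for a given return period, and its accuracy depends on the choice of an appropriate probability distribution model and on the parameter estimation method. The parameters that characterize each probability distribution are generally estimated with the method of moments, the method of probability-weighted moments, or the maximum likelihood method. The method of moments has a simple solution procedure but gives inaccurate results for data with asymmetric, skewed distributions. The method of probability-weighted moments gives relatively stable results even for small or skewed samples, but has the drawback that the probability weights are restricted to integers. The maximum likelihood method, which estimates parameters from the log-likelihood function, is known to yield the most efficient parameter estimates, but its procedure is complicated, for example requiring the Newton-Raphson method to solve the resulting system of nonlinear equations, and it sometimes fails to converge to a solution. In contrast, meta-heuristic algorithms such as the Genetic Algorithm, Ant Colony Optimization, and Simulated Annealing have recently attracted attention as efficient alternatives for complex engineering optimization problems, and Hassanzadeh et al. (2011) verified their applicability to parameter estimation for hydrological frequency analysis. The purpose of this study is to apply meta-heuristic algorithms to estimate the parameters of the probability distributions used in the frequency analysis of annual maximum precipitation data. Accordingly, the Genetic Algorithm and Harmony Search were applied as parameter estimation methods, and their results were compared with those of the maximum likelihood method. In a simulation test using the GEV distribution, the parameters estimated with the Genetic Algorithm showed distributions fairly similar to the maximum likelihood results but required excessive computation time. The parameters estimated with Harmony Search, however, not only showed distributions similar to the maximum likelihood results but also required very little computation time. In an empirical study using data from 74 rainfall gauging stations in Korea and the Gamma, Log-normal, GEV, and Gumbel distributions, parameter estimation with Harmony Search likewise provided efficient parameter estimates.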
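
As a small illustration of the simplest approach named in this abstract, the method of moments, the sketch below fits a Gumbel distribution and computes the quantile for a given return period. It does not reproduce the paper's Genetic Algorithm or Harmony Search procedures. The function names and the sample values are hypothetical; the formulas used are the standard Gumbel moment relations (scale = s√6/π, location = mean − γ·scale).

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_moments(sample):
    """Method-of-moments estimates (location, scale) for a Gumbel distribution."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    scale = sd * math.sqrt(6.0) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale

def gumbel_quantile(loc, scale, return_period):
    """Magnitude whose annual non-exceedance probability is 1 - 1/return_period."""
    p = 1.0 - 1.0 / return_period
    return loc - scale * math.log(-math.log(p))

# Hypothetical annual maximum rainfall values (mm), invented for illustration.
annual_max = [112.0, 98.5, 143.2, 120.7, 89.9, 167.4, 131.0, 105.3]
loc, scale = gumbel_moments(annual_max)
print("location:", loc, "scale:", scale)
print("100-year quantile:", gumbel_quantile(loc, scale, 100))
```

Because the estimates come straight from the sample mean and standard deviation, no iteration is needed, which is the simplicity the abstract contrasts with the Newton-Raphson iterations of maximum likelihood.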


Subset Selection in the Poisson Models - A Normal Predictors case - (포아송 모형에서의 설명변수 선택문제 - 정규분포 설명변수하에서 -)

  • 박종선
    • The Korean Journal of Applied Statistics / v.11 no.2 / pp.247-255 / 1998
  • In this paper, a new subset selection problem in the Poisson model is considered under normal predictors. It turns out that the subset model has larger variance than the Poisson model with random predictors, and this is used to derive a new subset selection method similar to Mallows' $C_p$.


A Critical Evaluation of Dichotomous Choice Responses in Contingent Valuation Method (양분선택형 조건부가치측정법 응답자료의 실증적 쟁점분석)

  • Eom, Young Sook
    • Environmental and Resource Economics Review / v.20 no.1 / pp.119-153 / 2011
  • This study reviews various aspects of the model formulation process for dichotomous choice responses in the contingent valuation method (CVM), which has been increasingly used in preliminary feasibility tests of Korean public investment projects. The theoretical review emphasizes the consistency between the WTP estimation process and the WTP measurement process. The empirical analysis suggests that two common parametric models for dichotomous choice responses (RUM and RWTP) and two commonly used probability distributions of the random components (probit and logit) resulted in almost the same empirical WTP distributions, as long as the WTP functions are specified as linear functions of the bid amounts. However, the efficiency gain of the DB response over the SB response was supported on the grounds that the two CV responses are derived from the same WTP distribution. Moreover, for the exponential WTP function, which guarantees non-negative WTP measures, the sample mean WTP was quite different from the median WTP when the scale parameter of the WTP function turned out to be large.


The wage determinants applying sample selection bias (표본선택 편의를 반영한 임금결정요인 분석)

  • Park, Sungik;Cho, Jangsik
    • Journal of the Korean Data and Information Science Society / v.27 no.5 / pp.1317-1325 / 2016
  • The purpose of this paper is to explain the factors affecting the wages of vocational high school graduates. We particularly examine the effectiveness of controlling for sample selection bias by employing the Tobit model and the Heckman sample selection model. The major results are as follows. First, the Tobit model and the Heckman sample selection model, which control for sample selection bias, are shown to be statistically significant. Hence all the independent variables seem to be statistically consistent with the theoretical model. Second, gender was statistically significant for both the probability of employment and the wage. Third, the employment probability and wage of Meister high school graduates were shown to be high compared to all other graduates. Fourth, the higher the parents' income, the higher both the employment probability and the wage. Finally, parents' education level, high school grade, satisfaction, and the number of licenses were found to be statistically significant for both the probability of employment and the wage.

Failure modeling to predict warranty cost for individual markets (자동차 부품의 시장별 품질보증 비용 예측을 위한 고장모형 수립)

  • Lee, Ho-Taek
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.6 / pp.1346-1352 / 2009
  • The warranty cost of automobile parts varies with the parts failure rate in the warranty region of each market. The parts failure rate is significantly affected by the usage rate, given that other stressors in the individual markets are similar. Accordingly, warranty cost can be predicted by failure modeling that reflects the usage rate and uses a stochastic process. In this paper, a one-dimensional approach is used, applying an accelerated failure time model on the assumption that the usage rate is linear. Such a model can explain changes in the parts failure rate as the usage rate changes, since it can be expressed as a function of the usage rate. Therefore, once the usage rate in a new market is obtained, the failure rate can be estimated even without warranty data, and the warranty cost of parts can be predicted through a renewal process in replacement cases. A case study using warranty data from two real markets is presented in the application part of this paper.

Multi-dimension Categorical Data with Bayesian Network (베이지안 네트워크를 이용한 다차원 범주형 분석)

  • Kim, Yong-Chul
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.2 / pp.169-174 / 2018
  • In general, analysis of variance (ANOVA) for continuous data and the chi-square test for discrete data are used for statistical analysis of effects and associations. For multidimensional data, analysis of the hierarchical structure is required and a statistical linear model is adopted. The structure of the linear model requires normality of the data. Multidimensional categorical data analysis methods are used for analyzing causal relations, interactions, and correlations. In this paper, a Bayesian network model using probability distributions is proposed to shorten the analysis procedure and to analyze interactions and causal relationships in categorical data analysis.

Selection of Climate Indices for Nonstationary Frequency Analysis and Estimation of Rainfall Quantile (비정상성 빈도해석을 위한 기상인자 선정 및 확률강우량 산정)

  • Jung, Tae-Ho;Kim, Hanbeen;Kim, Hyeonsik;Heo, Jun-Haeng
    • KSCE Journal of Civil and Environmental Engineering Research / v.39 no.1 / pp.165-174 / 2019
  • As nonstationarity is observed in hydrological data, various studies on nonstationary frequency analysis for hydraulic structure design have been actively conducted. Although the inherent diversity in the atmosphere-ocean system is known to be related to nonstationary phenomena, nonstationary frequency analysis is generally performed based on a linear trend. In this study, a nonstationary frequency analysis was performed using climate indices as covariates to account for climate variability and the long-term trend of extreme rainfall. For 11 weather stations where a trend was detected, the long-term trend within the annual maximum rainfall data was extracted using ensemble empirical mode decomposition. Then the correlation between the extracted data and various climate indices was analyzed. As a result, autumn-averaged AMM, autumn-averaged AMO, and summer-averaged NINO4 in the previous year significantly influenced the long-term trend of the annual maximum rainfall data at almost all stations. The selected seasonal climate indices were applied to the generalized extreme value (GEV) model and the best model was selected using the AIC. Using model diagnostics for the selected model and for the nonstationary GEV model with a linear trend, we found that the selected model could compensate for the underestimation of the rainfall quantiles.

Hydrological Applications of Nonparametric Kernel Weighting Functions (비매개변수적 Kernel 가중함수의 수문학적 응용)

  • 문영일
    • Water for future / v.33 no.5 / pp.49-55 / 2000
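  • Traditional parametric methods for estimating a target function estimate parameters under the assumption of a linear or exponential functional form over the entire range of the observed data. Methods based on nonparametric kernel weighting functions, by contrast, require no assumption about the form of the target function and approximate it locally, using the neighboring data around any estimation point of interest. As part of efforts to reduce the problems caused by "assuming the target function," a typical issue in stochastic hydrology, methods using nonparametric kernel weighting functions have been studied. This article reviews the basic theory of nonparametric probability density estimation with kernel weighting functions and its hydrological applications, including frequency analysis, regression models, and nonhomogeneous transition probabilities.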
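
The local-approximation idea in this abstract can be sketched as a kernel density estimate. This is a minimal illustration under stated assumptions, not the article's own formulation: the Gaussian kernel, the fixed bandwidth, the function names, and the sample values are all choices made here for the example.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel weighting function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_density(x, data, bandwidth):
    """Density estimate at x: average of kernels centered on each observation."""
    n = len(data)
    return sum(gaussian_kernel((x - xi) / bandwidth) for xi in data) / (n * bandwidth)

# Hypothetical observations and bandwidth, invented for illustration.
observations = [1.0, 2.0, 2.5, 4.0]
h = 0.5
for x in (0.0, 2.0, 4.0):
    print(x, kernel_density(x, observations, h))
```

Observations near the estimation point receive large weights and distant ones receive weights close to zero, so no global functional form has to be assumed. The bandwidth controls how local the estimate is, and its choice matters in practice.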


Marginal Effect Analysis of Travel Behavior by Count Data Model (가산자료모형을 기초로 한 통행행태의 한계효과분석)

  • 장태연
    • Journal of Korean Society of Transportation / v.21 no.3 / pp.15-22 / 2003
  • In general, the linear regression model has been used to estimate trip generation in the travel demand forecasting procedure. However, the model suffers from several methodological limitations. First, trips, as a non-negative integer dependent variable, follow a discrete distribution, but the model assumes that the dependent variable is continuously distributed between $-\infty$ and $+\infty$. Second, the model may produce negative estimates. Third, even if the estimated trips are within the valid range, the model offers only forecasted trips without a discrete probability distribution for them. To overcome these limitations, a Poisson model with an assumption of equidispersion has frequently been used to analyze count data such as trip frequencies. However, if the variance of the data is greater than the mean, the Poisson model tends to underestimate errors, resulting in unreliable estimates. Using an overdispersion test, this study showed that the Poisson model is not appropriate and, using the Vuong test, that the zero-inflated negative binomial model is optimal. Model reliability was checked by a likelihood test, and model accuracy by the Theil inequality coefficient. Finally, the marginal effect of changes in the socio-demographic characteristics of households on trips was analyzed.
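
The equidispersion assumption mentioned in this abstract can be checked descriptively with a variance-to-mean ratio. The sketch below is a crude illustration, not the formal overdispersion test or the Vuong test used in the study. The function names and the trip counts are hypothetical.

```python
import math
import statistics

def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 under equidispersion."""
    return statistics.variance(counts) / statistics.fmean(counts)

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical household trip counts, invented for illustration only.
trips = [0, 0, 0, 1, 1, 2, 3, 8, 10]
lam = statistics.fmean(trips)
print("dispersion index:", dispersion_index(trips))
print("observed share of zeros:", trips.count(0) / len(trips))
print("P(0 trips) under Poisson:", poisson_pmf(0, lam))
```

For this invented sample the ratio is well above 1 and the observed share of zeros exceeds what a Poisson distribution with the same mean implies, which is the kind of pattern that motivates negative binomial and zero-inflated alternatives.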