• Title/Summary/Keyword: 표본선택 모형 (sample selection model)

Search results: 119

Appropriate Sample Size for Bivariate Frequency Analysis of Rainfall Event using Peaks Over Threshold (POT) (강우사상 이변량 빈도해석을 위한 Peaks Over Threshold (POT) 방법을 이용한 적정 확률표본 선택 연구)

  • Joo, Kyungwon;Kim, Hanbeen;Ahn, Hyunjun;Heo, Jun-Haeng
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / p.304 / 2018
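  • Compared with univariate frequency analysis, which is generally applied to fixed-duration rainfall depths, bivariate frequency analysis has the advantage that rainfall duration can be treated as a random variable together with rainfall depth. However, because the dimension of the probability distribution increases, it requires a larger sample than univariate frequency analysis does. In Korea, even the oldest rainfall stations have fewer than 60 years of records, so a probability sample built from the annual maximum series may be insufficient for bivariate frequency analysis. This study therefore investigated how to select an appropriate probability sample using the Peaks Over Threshold (POT) method. All rainfall events were extracted from the rainfall record at the Seoul station of the Korea Meteorological Administration using a minimum inter-event time (the minimum dry period separating two events), and the rainfall depth and duration of each event were used as random variables. Existing POT methods and a threshold selection method based on the Anderson-Darling goodness-of-fit test were applied, and the goodness of fit of the marginal distributions and the suitability of the bivariate probability model were examined as the number of sample points changed.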

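The event extraction and POT sampling described in this abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' procedure: the input is assumed to be an hourly rainfall series in mm, and `min_dry_hours` and `threshold` are hypothetical parameters. The Anderson-Darling-based threshold selection is not reproduced.

```python
from typing import List, Tuple

Event = Tuple[float, int]  # (total depth in mm, duration in hours)


def extract_events(rain: List[float], min_dry_hours: int) -> List[Event]:
    """Split an hourly rainfall series into rainfall events.

    Wet hours belong to the same event unless they are separated by at least
    `min_dry_hours` consecutive dry hours (the minimum inter-event time).
    Duration is counted from the first to the last wet hour of the event.
    """
    events: List[Event] = []
    start = None      # index of the first wet hour of the open event
    last_wet = None   # index of the most recent wet hour
    depth = 0.0
    for i, r in enumerate(rain):
        if r <= 0:
            continue
        if start is not None and i - last_wet - 1 >= min_dry_hours:
            # The dry gap is long enough: close the open event.
            events.append((depth, last_wet - start + 1))
            start, depth = None, 0.0
        if start is None:
            start = i
        depth += r
        last_wet = i
    if start is not None:
        events.append((depth, last_wet - start + 1))
    return events


def peaks_over_threshold(events: List[Event], threshold: float) -> List[Event]:
    """Keep only events whose total depth exceeds the (hypothetical) threshold."""
    return [e for e in events if e[0] > threshold]


# Hypothetical hourly series; with a 2-hour minimum inter-event time it splits into two events.
rain = [0, 2, 3, 0, 0, 0, 1, 0, 5, 0]
events = extract_events(rain, min_dry_hours=2)
print(events)                             # [(5.0, 2), (6.0, 3)]
print(peaks_over_threshold(events, 5.5))  # [(6.0, 3)]
```

Lowering `threshold` enlarges the POT sample; how the marginal and bivariate fits respond to that change in sample size is what the study examines.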

Variable Selection for Logistic Regression Model Using Adjusted Coefficients of Determination (수정 결정계수를 사용한 로지스틱 회귀모형에서의 변수선택법)

  • Hong C. S.;Ham J. H.;Kim H. I.
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.435-443 / 2005
  • Several statistics have been defined as coefficients of determination in logistic regression analysis, and their values are relatively small compared with those for the linear regression model. These coefficients of determination are therefore not generally used to evaluate and diagnose logistic regression models. Liao and McGee (2003) proposed two adjusted coefficients of determination that are robust to the addition of inappropriate predictors and to variation in sample size. In this work, these adjusted coefficients of determination are applied to variable selection for logistic regression models, and the results are compared with those of other methods such as forward selection, backward elimination, stepwise selection, and the AIC statistic.
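To make the idea of a coefficient of determination that penalises extra predictors concrete, the sketch below uses McFadden's adjusted pseudo-R² as a stand-in. It is not one of the Liao and McGee (2003) statistics studied in the paper, and the log-likelihood values are hypothetical; in practice they would come from fitted logistic regressions.

```python
def mcfadden_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden's pseudo-R^2: 1 - lnL_M / lnL_0."""
    return 1.0 - loglik_model / loglik_null


def mcfadden_adj_r2(loglik_model: float, loglik_null: float, n_params: int) -> float:
    """McFadden's adjusted pseudo-R^2: 1 - (lnL_M - K) / lnL_0.

    K counts the estimated parameters, including the intercept, so adding a
    predictor that barely raises lnL_M can lower this value.
    """
    return 1.0 - (loglik_model - n_params) / loglik_null


def select_by_adj_r2(candidates: dict, loglik_null: float) -> str:
    """Return the candidate model with the largest adjusted pseudo-R^2.

    `candidates` maps a model label to (maximised log-likelihood, K).
    """
    return max(candidates,
               key=lambda m: mcfadden_adj_r2(candidates[m][0], loglik_null, candidates[m][1]))


# Hypothetical fits: adding x2 raises the plain R^2 but not the adjusted one.
fits = {"x1": (-80.0, 2), "x1 + x2": (-79.5, 3)}
print(mcfadden_r2(-79.5, -100.0) > mcfadden_r2(-80.0, -100.0))  # True
print(select_by_adj_r2(fits, loglik_null=-100.0))               # x1
```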

Undecided inference using bivariate probit models (이변량 프로빗모형을 이용한 미결정자 추론)

  • Hong, Chong-Sun;Jung, Mi-Yang
    • Journal of the Korean Data and Information Science Society / v.22 no.6 / pp.1017-1028 / 2011
  • When it is not easy to decide the credit score of some loan applicants, their credit evaluation is postponed and they are referred to a specialist for further evaluation. This undecided inference problem arises in most statistical models, in biostatistics and sports statistics as well as in credit evaluation. In this work, undecided inference is treated as a missing-data mechanism under the MNAR (missing not at random) assumption, and the bivariate probit model, one of the sample selection models, is used. Two undecided inference methods are proposed: one uses characteristic variables that represent the status of decided applicants, and the other collects more accurate additional information and applies these new variables. With an illustrative example, misclassification error rates for undecided applicants and for all applicants are obtained and compared across various characteristic variables, undecided intervals, and thresholds. It is found that misclassification error rates can be reduced when the undecided interval is widened and more accurate information is put into the model, since the bivariate probit model then reflects the situation of decided applicants more accurately.
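Treating undecided applicants as a selection mechanism leads to the standard probit model with sample selection: a bivariate probit in which the outcome (for example, good or bad credit) is observed only for decided cases. The sketch below evaluates its log-likelihood under the assumption that the selection index z·γ and the outcome index x·β are already computed; it does not reproduce either of the two methods proposed in the paper, and the bivariate normal CDF is obtained by simple numerical integration rather than a library routine.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def biv_norm_cdf(a: float, b: float, rho: float, lower: float = -10.0, n: int = 2000) -> float:
    """P(X <= a, Y <= b) for a standard bivariate normal with correlation rho, |rho| < 1.

    Integrates phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2)) over [lower, a]
    with composite Simpson's rule (n must be even).
    """
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n
    total = 0.0
    for i in range(n + 1):
        x = lower + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * norm_pdf(x) * norm_cdf((b - rho * x) / s)
    return total * h / 3.0


def selection_biprobit_loglik(obs, rho: float) -> float:
    """Log-likelihood of a bivariate probit with sample selection.

    Each observation is (z_gamma, x_beta, decided, y):
      decided = 0        -> only the selection equation contributes: Phi(-z_gamma)
      decided = 1, y = 1 -> Phi2(x_beta, z_gamma, rho)
      decided = 1, y = 0 -> Phi2(-x_beta, z_gamma, -rho)
    """
    ll = 0.0
    for z_gamma, x_beta, decided, y in obs:
        if not decided:
            p = norm_cdf(-z_gamma)
        elif y == 1:
            p = biv_norm_cdf(x_beta, z_gamma, rho)
        else:
            p = biv_norm_cdf(-x_beta, z_gamma, -rho)
        ll += math.log(p)
    return ll


# Hypothetical indices for one decided-good, one decided-bad and one undecided applicant.
sample = [(0.8, 0.5, 1, 1), (0.4, -0.2, 1, 0), (-0.6, 0.0, 0, None)]
print(round(selection_biprobit_loglik(sample, rho=0.3), 4))
```

Maximising this function over γ, β and ρ would give the model estimates; a nonzero ρ is what distinguishes the MNAR setting from simply fitting a probit to decided applicants.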

Nonstationary Frequency Analysis for Annual Maximum Data

  • Kim, Su-Yeong
    • Proceedings of the Korea Water Resources Association Conference / 2017.05a / p.4 / 2017
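  • Frequency analysis of hydrological data is carried out under the assumptions of independence and stationarity. However, as nonstationarity has been observed in measured hydrological data, the need for nonstationary frequency analysis has also grown. The purpose of this study is to develop nonstationary frequency analysis models for the Gumbel and GEV distributions, the distributions most widely used in hydrological frequency analysis; to this end, the parameters of the nonstationary Gumbel and GEV models were defined to vary with time. To assess the accuracy of the nonstationary Gumbel and GEV models, Monte Carlo simulations were performed with both nonstationary and stationary models under various return periods, sample sizes, and parameter settings. The errors of the nonstationary models were smallest when the sample size was relatively large. In addition, nonstationary models with complex combinations of time-varying parameters showed the smallest errors when all parameters had the same trend. For the nonstationary GEV model, a negative shape parameter had a large influence on the estimated hydrological quantiles. This study also performed simulations to determine which of the many possible nonstationary models is most appropriate for given data. Simulations with stationary and nonstationary Gumbel models comparing the widely used AIC, BIC, and likelihood ratio test showed that AIC was the most effective for selecting an appropriate model among the nonstationary candidates. When the developed nonstationary Gumbel and GEV models were applied to annual maximum rainfall data in Korea, the Gumbel model with a time term in the location parameter was most frequently selected as the best model. Therefore, among current Korean annual maximum rainfall series that exhibit a trend, a location parameter that varies with time appears to be the most common characteristic.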

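A nonstationary Gumbel model with a time-varying location parameter, the form most often selected for Korean annual maximum rainfall in this study, can be sketched as follows. The linear trend μ(t) = μ0 + μ1·t and the constant scale σ are assumptions for illustration, parameter estimation (numerically maximising the log-likelihood) is omitted, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def gumbel_ns_loglik(data: List[Tuple[float, float]], mu0: float, mu1: float, sigma: float) -> float:
    """Log-likelihood of a Gumbel model with location mu(t) = mu0 + mu1 * t.

    Each (t, x) pair is a time index and an annual maximum. With
    z = (x - mu(t)) / sigma, log f(x) = -log(sigma) - z - exp(-z).
    """
    ll = 0.0
    for t, x in data:
        z = (x - (mu0 + mu1 * t)) / sigma
        ll += -math.log(sigma) - z - math.exp(-z)
    return ll


def aic(loglik: float, n_params: int) -> float:
    """AIC = 2k - 2 ln L; the smaller value indicates the preferred model."""
    return 2 * n_params - 2 * loglik


def gumbel_ns_return_level(t: float, T: float, mu0: float, mu1: float, sigma: float) -> float:
    """Quantile exceeded with probability 1/T in year t: mu(t) - sigma * ln(-ln(1 - 1/T))."""
    return mu0 + mu1 * t - sigma * math.log(-math.log(1.0 - 1.0 / T))


# A stationary Gumbel has k = 2 (mu0, sigma); the trend model has k = 3 (mu0, mu1, sigma).
# Each model's maximised log-likelihood would be plugged into aic() for comparison.
data = [(0, 100.0), (1, 110.0), (2, 105.0)]   # hypothetical annual maxima
print(aic(gumbel_ns_loglik(data, 100.0, 2.0, 10.0), 3))
shift = (gumbel_ns_return_level(50, 100, 100.0, 2.0, 10.0)
         - gumbel_ns_return_level(0, 100, 100.0, 2.0, 10.0))
print(round(shift, 6))  # 100.0
```

Because only the location parameter moves, every T-year quantile shifts by μ1·Δt, which is why the choice among stationary and trend models matters for design values.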

A study on bias effect of LASSO regression for model selection criteria (모형 선택 기준들에 대한 LASSO 회귀 모형 편의의 영향 연구)

  • Yu, Donghyeon
    • The Korean Journal of Applied Statistics / v.29 no.4 / pp.643-656 / 2016
  • High-dimensional data, in which the number of variables exceeds the number of samples, are frequently encountered in various fields. With such data it is usually necessary to select variables in order to estimate regression coefficients and avoid overfitting. Penalized regression models perform variable selection and coefficient estimation simultaneously, which is why they are frequently used for high-dimensional data. However, a penalized regression model still requires choosing the optimal model by selecting a tuning parameter according to a model selection criterion. This study deals with the effect of the bias of LASSO regression on model selection criteria. We numerically describe this bias effect on the model selection criteria and apply the proposed correction to the identification of lung cancer biomarkers from gene expression data.
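The bias referred to here can be seen in the simplest setting, an orthonormal design (XᵀX = I), where the LASSO estimate is the OLS estimate soft-thresholded by λ, so every retained coefficient is shrunk toward zero by λ. The sketch below, under that assumption, shows how the shrinkage inflates the residual sum of squares and hence an RSS-based BIC. Refitting OLS on the selected variables is shown as one common remedy and is not necessarily the correction proposed in the paper; counting nonzero coefficients as degrees of freedom is also an assumption of this sketch.

```python
import math
from typing import List, Tuple


def soft_threshold(z: float, lam: float) -> float:
    """LASSO coefficient under an orthonormal design: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_and_refit(z: List[float], lam: float) -> Tuple[List[float], List[float]]:
    """Return (LASSO estimate, OLS refit on the LASSO-selected variables).

    With orthonormal columns, refitting OLS on the selected set simply
    restores the unshrunk OLS value z_j for every selected j.
    """
    lasso = [soft_threshold(zj, lam) for zj in z]
    refit = [zj if bj != 0.0 else 0.0 for zj, bj in zip(z, lasso)]
    return lasso, refit


def bic_orthonormal(rss_ols: float, z: List[float], beta: List[float], n: int) -> float:
    """BIC = n * log(RSS / n) + df * log(n), with df = number of nonzero coefficients.

    For X^T X = I, RSS(beta) = RSS_OLS + ||z - beta||^2.
    """
    rss = rss_ols + sum((zj - bj) ** 2 for zj, bj in zip(z, beta))
    df = sum(1 for b in beta if b != 0.0)
    return n * math.log(rss / n) + df * math.log(n)


# Hypothetical OLS estimates, OLS residual sum of squares and sample size.
z, rss_ols, n = [3.0, 0.5, -2.0], 40.0, 50
lasso, refit = lasso_and_refit(z, lam=1.0)
print(lasso, refit)  # [2.0, 0.0, -1.0] [3.0, 0.0, -2.0]
print(bic_orthonormal(rss_ols, z, lasso, n) > bic_orthonormal(rss_ols, z, refit, n))  # True
```

Both fits select the same two variables, yet the shrunken LASSO fit scores a worse BIC, so a criterion computed from biased estimates can steer the tuning-parameter choice.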

Random Utility Models and the Value of National Parks in Korea (확률효용모형 분석을 통한 국립공원의 경제적 가치 평가)

  • Kwon, Oh Sang
    • Environmental and Resource Economics Review / v.14 no.1 / pp.51-73 / 2005
  • The purpose of this study is to estimate the recreational value of the eighteen national parks in Korea. A conditional logit model and a nested logit model were estimated for this purpose. The data used in the study were collected through a nationwide off-site survey. In addition, annual aggregate data on the number of visitors to each park were combined with the survey data to derive more reliable estimates. The study finds substantial differences in preferences between mountain and marine national parks. Both the value of each park and the values of the parks' main characteristics are estimated.

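In a conditional logit site-choice model, the value of a park is commonly measured by the change in expected maximum utility (the log-sum) when that park is removed from the choice set, divided by the marginal utility of income. The sketch below shows that calculation for a single choice occasion. The utilities and travel-cost coefficient are hypothetical, and the nested logit structure, any no-trip option, and the combination with annual visitor counts used in the paper are not reproduced.

```python
import math
from typing import List


def logsum(v: List[float]) -> float:
    """ln sum_j exp(V_j), computed stably by factoring out the maximum."""
    m = max(v)
    return m + math.log(sum(math.exp(x - m) for x in v))


def choice_probs(v: List[float]) -> List[float]:
    """Conditional logit probabilities P_j = exp(V_j) / sum_k exp(V_k)."""
    ls = logsum(v)
    return [math.exp(x - ls) for x in v]


def value_of_site(v: List[float], site: int, cost_coef: float) -> float:
    """Per-occasion compensating variation from losing `site`.

    cost_coef is the (negative) travel-cost coefficient, so -cost_coef is
    the marginal utility of income.
    """
    remaining = [x for j, x in enumerate(v) if j != site]
    return (logsum(v) - logsum(remaining)) / (-cost_coef)


# Hypothetical deterministic utilities for three parks and a travel-cost coefficient.
v = [1.2, 0.4, -0.3]
print([round(p, 3) for p in choice_probs(v)])
print(round(value_of_site(v, 0, cost_coef=-0.05), 2))
```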

Social Benefits of Improved Water Quality at the Taehwa River Based on Citizen's Willingness-to-Pay (시민지불의사에 기초한 태화강 수질개선의 사회적 편익)

  • Kim, Jae-Hong
    • Journal of Environmental Policy / v.6 no.1 / pp.83-109 / 2007
  • This study evaluates citizens' willingness to pay (WTP) for the benefits of improved water quality in the Taehwa River in Ulsan, Korea, using the contingent valuation method with a double-bounded dichotomous choice format. The estimates from the bivariate probit model put WTP at 3,458.5 Korean won per household per month and 14,760 million Korean won per year for all households in Ulsan. These estimates correspond to the social value of improved water quality in the Taehwa River. The study also tests for interdependence between the two answers given to the double-bounded dichotomous choice questions, and all null hypotheses on interdependence are rejected.

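In the double-bounded dichotomous choice format, each respondent answers a first bid and then a follow-up bid that is higher after a "yes" and lower after a "no". The sketch below gives the response probabilities for the simpler interval-data model in which both answers come from a single normally distributed WTP. The bivariate probit used in the paper generalises this by allowing a separate, correlated WTP for each answer, which is what lets the two answers be tested for interdependence. All bids and the values of μ and σ below are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def db_response_prob(bid1: float, bid2: float, yes1: bool, yes2: bool,
                     mu: float, sigma: float) -> float:
    """Probability of a (first, follow-up) answer pair when WTP ~ N(mu, sigma^2).

    bid2 is the follow-up bid actually offered: above bid1 after a 'yes',
    below bid1 after a 'no'. F(b) = P(WTP < b).
    """
    def F(b: float) -> float:
        return norm_cdf((b - mu) / sigma)

    if yes1 and yes2:        # WTP >= bid2 > bid1
        return 1.0 - F(bid2)
    if yes1:                 # bid1 <= WTP < bid2
        return F(bid2) - F(bid1)
    if yes2:                 # bid2 <= WTP < bid1
        return F(bid1) - F(bid2)
    return F(bid2)           # WTP < bid2 < bid1


# Hypothetical design: first bid 3,000 won, follow-ups 6,000 (after yes) and 1,500 (after no).
mu, sigma = 3000.0, 1000.0
probs = [db_response_prob(3000, 6000, True, True, mu, sigma),
         db_response_prob(3000, 6000, True, False, mu, sigma),
         db_response_prob(3000, 1500, False, True, mu, sigma),
         db_response_prob(3000, 1500, False, False, mu, sigma)]
print([round(p, 4) for p in probs], round(sum(probs), 10))
```

Maximising the sum of log-probabilities over respondents gives estimates of μ and σ, and under this normal specification μ is the mean WTP.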

Forecasting Korean CPI Inflation (우리나라 소비자물가상승률 예측)

  • Kang, Kyu Ho;Kim, Jungsung;Shin, Serim
    • Economic Analysis / v.27 no.4 / pp.1-42 / 2021
  • The outlook for Korea's consumer price inflation has a profound impact not only on the Bank of Korea's operation of its inflation targeting framework but also on the overall economy, including the bond market, private consumption, and investment. This study presents forecasts of Korean consumer price inflation for the next three years. To this end, models are first selected based on the out-of-sample predictive power of autoregressive distributed lag (ADL) models, AR models, small-scale vector autoregressive (VAR) models, and large-scale VAR models. Because there are many potential predictors of inflation, a Bayesian variable selection technique was applied to 12 macroeconomic variables, and careful tuning was performed to improve predictive power. For the VAR models, the Minnesota prior was applied to address the curse of dimensionality. In long-term and short-term out-of-sample predictions over the last five years, the ADL model generally outperformed the competing models in both point and density forecasts. Combining the forecasts from these models, inflation is expected to remain around its current level of about 2% until the second half of 2022 and to fall to around 1% from the first half of 2023.
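The out-of-sample model comparison and forecast combination described above follow a common pattern: re-estimate each model on an expanding window, forecast one step ahead, score the forecasts, and average across models. The sketch below shows that pattern with an AR(1) model only. The paper's ADL models, Bayesian variable selection, Minnesota-prior VARs and combination weights are not reproduced, and the equal weights used here are an assumption.

```python
import math
from typing import List, Tuple


def fit_ar1(y: List[float]) -> Tuple[float, float]:
    """OLS fit of y_t = c + phi * y_{t-1} + e_t; returns (c, phi)."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi = sxz / sxx
    return mz - phi * mx, phi


def recursive_ar1_forecasts(y: List[float], first_origin: int) -> List[float]:
    """One-step-ahead forecasts of y[first_origin:], each from an expanding window.

    first_origin must be at least 3 so that every fit uses two or more pairs.
    """
    preds = []
    for t in range(first_origin, len(y)):
        c, phi = fit_ar1(y[:t])          # estimate only on data available at t - 1
        preds.append(c + phi * y[t - 1])
    return preds


def rmse(actual: List[float], pred: List[float]) -> float:
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def combine(*forecasts: List[float]) -> List[float]:
    """Equal-weight combination of several forecast paths."""
    return [sum(vals) / len(vals) for vals in zip(*forecasts)]


# Hypothetical noiseless AR(1) path, so the recursive forecasts should be exact.
y = [0.0]
for _ in range(10):
    y.append(1.0 + 0.5 * y[-1])
preds = recursive_ar1_forecasts(y, first_origin=3)
print(round(rmse(y[3:], preds), 6))  # 0.0
```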

An Exploratory Study of Psychological Characteristics of Metaverse Users (메타버스 이용자의 심리 특성 탐색 연구)

  • Hyeonjeong Kim;HyunJung Kim;Beomsoo Kim;Hwan-Ho Noh
    • Knowledge Management Research / v.24 no.4 / pp.63-85 / 2023
  • This study aims to identify the primary user group of the metaverse, a space that has grown with the increased interest during the COVID-19 era, and to explore the factors that predict metaverse adoption. To predict online activity, the study used user purposes, motivations, and relevant demographic factors as predictor variables. Data from the Korean Media Panel Survey were analyzed with the Heckman two-stage sample selection model to predict metaverse use. The analysis revealed that the key factors influencing metaverse adoption were offline activities, openness, OTT usage, and purchasing of paid content. In the second-stage model, openness, gender, and paid content purchases were significant predictors of longer metaverse usage time. These results indicate that understanding metaverse users is essential given the rising interest in online activities during the COVID-19 era, and they can provide valuable insights for companies and developers working on metaverse platforms.
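Heckman's two-step estimator adds the inverse Mills ratio λ(z·γ) = φ(z·γ)/Φ(z·γ), computed from a first-stage probit, as an extra regressor in the outcome regression fitted on the selected cases (here, metaverse users and their usage time). The sketch below assumes the first-stage probit index z·γ is already available and shows only the second step with a single hypothetical regressor; the corrected standard errors that the two-step method requires are omitted.

```python
import math
from typing import List


def inverse_mills(index: float) -> float:
    """lambda(z) = phi(z) / Phi(z) at the first-stage probit index z*gamma."""
    pdf = math.exp(-0.5 * index * index) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))
    return pdf / cdf


def solve_normal_equations(X: List[List[float]], y: List[float]) -> List[float]:
    """OLS coefficients from (X'X) b = X'y via Gauss-Jordan elimination with pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def heckman_second_stage(x, y, index, selected) -> List[float]:
    """Regress y on [1, x, lambda] using selected observations only.

    Returns [intercept, slope, coefficient on lambda]; the last estimates rho * sigma.
    """
    X, Y = [], []
    for xi, yi, zi, si in zip(x, y, index, selected):
        if si:
            X.append([1.0, xi, inverse_mills(zi)])
            Y.append(yi)
    return solve_normal_equations(X, Y)


# Hypothetical data built so that y = 1 + 2x + 0.5 * lambda exactly for the selected cases
# (in real data y is unobserved for unselected cases; here it is simply ignored).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
index = [0.1, -0.3, 0.8, 0.2, -1.0]
selected = [True, True, True, True, False]
y = [1.0 + 2.0 * xi + 0.5 * inverse_mills(zi) for xi, zi in zip(x, index)]
print([round(b, 6) for b in heckman_second_stage(x, y, index, selected)])  # [1.0, 2.0, 0.5]
```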

Multidimensional Scaling for Tracking Segment Changes (세그먼트 변화를 추적하는 다차원척도법)

  • 김주영
    • Asia Marketing Journal / v.1 no.4 / pp.1-23 / 1999
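  • The positioning map is a useful tool for establishing STP strategy, the core of marketing strategy, but drawing one has required combining several analytical tools. This paper presents a new external multidimensional scaling model that uses imperfect consumer pick-any/N data and brand attribute data to identify market segments within the model, locate their ideal points, track how those ideal points change over time, and draw the positioning map. To verify the model's performance, it was tested on synthetic data in which changes in dimensionality, market segments, brand composition, and the consumer sample were introduced artificially. For practical use, the program can be downloaded from the author's homepage.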
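In ideal-point (unfolding) models of the kind described above, a segment is more likely to pick a brand the closer that brand lies to the segment's ideal point on the map, and tracking segments over time amounts to following how their ideal points move. The sketch below illustrates those two ideas with a hypothetical logistic pick probability. It is not the author's external MDS model or estimation program, and the coordinates and the `alpha` parameter are invented for illustration.

```python
import math
from typing import Sequence


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points on the map."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pick_prob(ideal: Sequence[float], brand: Sequence[float], alpha: float) -> float:
    """Hypothetical pick-any probability: logistic in (alpha - squared distance)."""
    return 1.0 / (1.0 + math.exp(-(alpha - sq_dist(ideal, brand))))


def ideal_point_shift(before: Sequence[float], after: Sequence[float]) -> float:
    """Distance a segment's ideal point moved between two periods."""
    return math.sqrt(sq_dist(before, after))


# Hypothetical 2-D map: one segment's ideal point in two periods and two brands.
ideal_t0, ideal_t1 = (0.0, 0.0), (3.0, 4.0)
brands = {"A": (0.5, 0.5), "B": (3.0, 3.5)}
for name, pos in brands.items():
    print(name, round(pick_prob(ideal_t0, pos, alpha=2.0), 3),
          round(pick_prob(ideal_t1, pos, alpha=2.0), 3))
print(ideal_point_shift(ideal_t0, ideal_t1))  # 5.0
```

As the segment's ideal point moves from the first period to the second, brand A becomes less likely to be picked and brand B more likely, which is the kind of change a positioning map that tracks segments is meant to reveal.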