• Title/Summary/Keyword: 분위수 회귀분석

Search Result 28, Processing Time 0.021 seconds

Variable selection with quantile regression tree (분위수 회귀나무를 이용한 변수선택 방법 연구)

  • Chang, Youngjae
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.6
    • /
    • pp.1095-1106
    • /
    • 2016
  • The quantile regression method proposed by Koenker et al. (1978) focuses on conditional quantiles given by independent variables, and analyzes the relationship between response variable and independent variables at the given quantile. Considering the linear programming used for the estimation of quantile regression coefficients, the model fitting job might be difficult when large data are introduced for analysis. Therefore, dimension reduction (or variable selection) could be a good solution for the quantile regression of large data sets. Regression tree methods are applied to a variable selection for quantile regression in this paper. Real data of Korea Baseball Organization (KBO) players are analyzed following the variable selection approach based on the regression tree. Analysis result shows that a few important variables are selected, which are also meaningful for the given quantiles of salary data of the baseball players.

Penalized quantile regression tree (벌점화 분위수 회귀나무모형에 대한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Bang, Sungwan
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.7
    • /
    • pp.1361-1371
    • /
    • 2016
  • Quantile regression provides a variety of useful statistical information to examine how covariates influence the conditional quantile functions of a response variable. However, traditional quantile regression (which assume a linear model) is not appropriate when the relationship between the response and the covariates is a nonlinear. It is also necessary to conduct variable selection for high dimensional data or strongly correlated covariates. In this paper, we propose a penalized quantile regression tree model. The split rule of the proposed method is based on residual analysis, which has a negligible bias to select a split variable and reasonable computational cost. A simulation study and real data analysis are presented to demonstrate the satisfactory performance and usefulness of the proposed method.

Analysis of AI interview data using unified non-crossing multiple quantile regression tree model (통합 비교차 다중 분위수회귀나무 모형을 활용한 AI 면접체계 자료 분석)

  • Kim, Jaeoh;Bang, Sungwan
    • The Korean Journal of Applied Statistics
    • /
    • v.33 no.6
    • /
    • pp.753-762
    • /
    • 2020
  • With an increasing interest in integrating artificial intelligence (AI) into interview processes, the Republic of Korea (ROK) army is trying to lead and analyze AI-powered interview platform. This study is to analyze the AI interview data using a unified non-crossing multiple quantile tree (UNQRT) model. Compared to the UNQRT, the existing models, such as quantile regression and quantile regression tree model (QRT), are inadequate for the analysis of AI interview data. Specially, the linearity assumption of the quantile regression is overly strong for the aforementioned application. While the QRT model seems to be applicable by relaxing the linearity assumption, it suffers from crossing problems among estimated quantile functions and leads to an uninterpretable model. The UNQRT circumvents the crossing problem of quantile functions by simultaneously estimating multiple quantile functions with a non-crossing constraint and is robust from extreme quantiles. Furthermore, the single tree construction from the UNQRT leads to an interpretable model compared to the QRT model. In this study, by using the UNQRT, we explored the relationship between the results of the Army AI interview system and the existing personnel data to derive meaningful results.

Divide and conquer kernel quantile regression for massive dataset (대용량 자료의 분석을 위한 분할정복 커널 분위수 회귀모형)

  • Bang, Sungwan;Kim, Jaeoh
    • The Korean Journal of Applied Statistics
    • /
    • v.33 no.5
    • /
    • pp.569-578
    • /
    • 2020
  • By estimating conditional quantile functions of the response, quantile regression (QR) can provide comprehensive information of the relationship between the response and the predictors. In addition, kernel quantile regression (KQR) estimates a nonlinear conditional quantile function in reproducing kernel Hilbert spaces generated by a positive definite kernel function. However, it is infeasible to use the KQR in analysing a massive data due to the limitations of computer primary memory. We propose a divide and conquer based KQR (DC-KQR) method to overcome such a limitation. The proposed DC-KQR divides the entire data into a few subsets, then applies the KQR onto each subsets and derives a final estimator by aggregating all results from subsets. Simulation studies are presented to demonstrate the satisfactory performance of the proposed method.

A comparison study of multiple linear quantile regression using non-crossing constraints (비교차 제약식을 이용한 다중 선형 분위수 회귀모형에 관한 비교연구)

  • Bang, Sungwan;Shin, Seung Jun
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.5
    • /
    • pp.773-786
    • /
    • 2016
  • Multiple quantile regression that simultaneously estimate several conditional quantiles of response given covariates can provide a comprehensive information about the relationship between the response and covariates. Some quantile estimates can cross if conditional quantiles are separately estimated; however, this violates the definition of the quantile. To tackle this issue, multiple quantile regression with non-crossing constraints have been developed. In this paper, we carry out a comparison study on several popular methods for non-crossing multiple linear quantile regression to provide practical guidance on its application.

Accelerated Lifetime Data Analysis Using Quantile Regression (분위수 회귀를 이용한 가속수명시험 자료 분석)

  • Roh, Chee-Youn;Kim, Hee-Jeong;Na, Myung-Hwan
    • The Korean Journal of Applied Statistics
    • /
    • v.21 no.4
    • /
    • pp.631-638
    • /
    • 2008
  • Accelerated Lifetime Test is a method of estimation of lifetime quality characteristics under operation condition with the accelerated lifetime data obtained under accelerated stress. In this paper we propose estimation method with accelerated lifetime data using quantile regression. We apply the method to real data with Arrhenius and Inverse power model.

Multivariate quantile regression tree (다변량 분위수 회귀나무 모형에 대한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Bang, Sungwan
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.3
    • /
    • pp.533-545
    • /
    • 2017
  • Quantile regression models provide a variety of useful statistical information by estimating the conditional quantile function of the response variable. However, the traditional linear quantile regression model can lead to the distorted and incorrect results when analysing real data having a nonlinear relationship between the explanatory variables and the response variables. Furthermore, as the complexity of the data increases, it is required to analyse multiple response variables simultaneously with more sophisticated interpretations. For such reasons, we propose a multivariate quantile regression tree model. In this paper, a new split variable selection algorithm is suggested for a multivariate regression tree model. This algorithm can select the split variable more accurately than the previous method without significant selection bias. We investigate the performance of our proposed method with both simulation and real data studies.

Model selection via Bayesian information criterion for divide-and-conquer penalized quantile regression (베이즈 정보 기준을 활용한 분할-정복 벌점화 분위수 회귀)

  • Kang, Jongkyeong;Han, Seokwon;Bang, Sungwan
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.2
    • /
    • pp.217-227
    • /
    • 2022
  • Quantile regression is widely used in many fields based on the advantage of providing an efficient tool for examining complex information latent in variables. However, modern large-scale and high-dimensional data makes it very difficult to estimate the quantile regression model due to limitations in terms of computation time and storage space. Divide-and-conquer is a technique that divide the entire data into several sub-datasets that are easy to calculate and then reconstruct the estimates of the entire data using only the summary statistics in each sub-datasets. In this paper, we studied on a variable selection method using Bayes information criteria by applying the divide-and-conquer technique to the penalized quantile regression. When the number of sub-datasets is properly selected, the proposed method is efficient in terms of computational speed, providing consistent results in terms of variable selection as long as classical quantile regression estimates calculated with the entire data. The advantages of the proposed method were confirmed through simulation data and real data analysis.

Quantile Co-integration Application for Maritime Business Fluctuation (분위수 공적분 모형과 해운 경기변동 분석)

  • Kim, Hyun-Sok
    • Journal of Korea Port Economic Association
    • /
    • v.38 no.2
    • /
    • pp.153-164
    • /
    • 2022
  • In this study, we estimate the quantile-regression framework of the shipping industry for the Capesize used ship, which is a typical raw material transportation from January 2000 to December 2021. This research aims two main contributions. First, we analyze the relationship between the Capesize used ship, which is a typical type in the raw material transportation market, and the freight market, for which mixed empirical analysis results are presented. Second, we present an empirical analysis model that considers the structural transformation proposed in the Hyunsok Kim and Myung-hee Chang(2020a) study in quantile-regression. In structural change investigations, the empirical results confirm that the quantile model is able to overcome the problems caused by non-stationarity in time series analysis. Then, the long-run relationship of the co-integration framework divided into long and short-run effects of exogenous variables, and this is extended to a prediction model subdivided by quantile. The results are the basis for extending the analysis based on the shipping theory to artificial intelligence and machine learning approaches.

Comparison of estimation methods for expectile regression (평률 회귀분석을 위한 추정 방법의 비교)

  • Kim, Jong Min;Kang, Kee-Hoon
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.3
    • /
    • pp.343-352
    • /
    • 2018
  • We can use quantile regression and expectile regression analysis to estimate trends in extreme regions as well as the average trends of response variables in given explanatory variables. In this paper, we compare the performance between the parametric and nonparametric methods for expectile regression. We introduce each estimation method and analyze through various simulations and the application to real data. The nonparametric model showed better results if the model is complex and difficult to deduce the relationship between variables. The use of nonparametric methods can be recommended in terms of the difficulty of assuming a parametric model in expectile regression.