• Title/Summary/Keyword: Random regression model

Kernel-Trick Regression and Classification

  • Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods / v.22 no.2 / pp.201-207 / 2015
  • Support vector machine (SVM) is a well-known kernel-trick supervised learning tool. This study proposes a working scheme for kernel-trick regression and classification (KtRC) as an SVM alternative. KtRC fits the model on a number of random subsamples and selects the best model. Empirical examples and a simulation study indicate that KtRC's performance is comparable to that of SVM.

Predictive Model of Optimal Continuous Positive Airway Pressure for Obstructive Sleep Apnea Patients with Obesity by Using Machine Learning (비만 폐쇄수면무호흡 환자에서 기계학습을 통한 적정양압 예측모형)

  • Kim, Seung Soo;Yang, Kwang Ik
    • Journal of Sleep Medicine / v.15 no.2 / pp.48-54 / 2018
  • Objectives: The aim of this study was to develop a model predicting the optimal continuous positive airway pressure (CPAP) for obstructive sleep apnea (OSA) patients with obesity by using machine learning. Methods: We retrospectively investigated the medical records of 162 OSA patients who had obesity [body mass index (BMI) ≥ 25] and had undergone a successful CPAP titration study. We randomly divided the data into a training set (90%) and a test set (10%). Using the training set, we built a random forest model and a least absolute shrinkage and selection operator (lasso) regression model to predict the optimal pressure, and then applied our models and previously reported equations to the test set. To compare the fit of the models, we used the correlation coefficient (CC) and the mean absolute error (MAE). Results: The random forest model showed the best performance {CC 0.78 [95% confidence interval (CI) 0.43-0.93], MAE 1.20}. The lasso regression model also showed improved results [CC 0.78 (95% CI 0.42-0.93), MAE 1.26] compared to the Hoffstein equation [CC 0.68 (95% CI 0.23-0.89), MAE 1.34] and Choi's equation [CC 0.72 (95% CI 0.30-0.90), MAE 1.40]. Conclusions: Our random forest model and lasso model ($26.213 + 0.084 \times \text{BMI} + 0.004 \times \text{apnea-hypopnea index} + 0.004 \times \text{oxygen desaturation index} - 0.215 \times \text{mean oxygen saturation}$) showed improved performance compared to the previously reported equations. Further study of other subgroups or phenotypes of OSA is required.
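
The lasso equation quoted in this abstract can be evaluated directly. The following is a minimal sketch that uses only the coefficients reported above; the function names and the patient values are hypothetical, chosen for illustration, and the abstract does not state the units of the predicted pressure.

```python
def predict_cpap_lasso(bmi, ahi, odi, mean_spo2):
    """Evaluate the lasso equation with the coefficients quoted in the abstract.

    bmi: body mass index, ahi: apnea-hypopnea index,
    odi: oxygen desaturation index, mean_spo2: mean oxygen saturation.
    """
    return 26.213 + 0.084 * bmi + 0.004 * ahi + 0.004 * odi - 0.215 * mean_spo2


def mean_absolute_error(actual, predicted):
    """Mean absolute error (MAE), one of the two fit measures named in the abstract."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical patient, not taken from the study data.
example_pressure = predict_cpap_lasso(bmi=30.0, ahi=40.0, odi=35.0, mean_spo2=93.0)
print(round(example_pressure, 3))
```

Given titrated pressures and model predictions for a test set, `mean_absolute_error` would reproduce the kind of MAE comparison the abstract reports.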

Bayesian Inference for Censored Panel Regression Model

  • Lee, Seung-Chun;Choi, Byongsu
    • Communications for Statistical Applications and Methods / v.21 no.2 / pp.193-200 / 2014
  • It was recognized by some researchers that the disturbance variance in a censored regression model is frequently underestimated by the maximum likelihood method. This underestimation has implications for the estimation of marginal effects and asymptotic standard errors. For instance, the actual coverage probability of the confidence interval based on a maximum likelihood estimate can be significantly smaller than the nominal confidence level; consequently, a Bayesian estimation is considered to overcome this difficulty. The behaviors of the maximum likelihood and Bayesian estimators of disturbance variance are examined in a fixed effects panel regression model with a limited dependent variable, which is known to have the incidental parameter problem. Behavior under random effect assumption is also investigated.

System Identification of a Diesel Engine -Throttle-Smoke Response- (디젤 기관(機關)의 계통식별(系統識別) -연료주입율(燃料注入率) 대(對) 매연반응(煤煙反應)-)

  • Cho, H.K.
    • Journal of Biosystems Engineering / v.16 no.2 / pp.111-117 / 1991
  • An empirical model for diesel engine control was obtained using a system identification method. A pseudo-random binary sequence was used as the input signal. Spectral analysis was used to find the frequency response of the system. Model parameters of the transfer functions were obtained using nonlinear regression.
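
A pseudo-random binary sequence of the kind mentioned in this abstract is commonly generated with a linear feedback shift register. The sketch below is an assumption for illustration only: the abstract does not state the sequence length, the generator polynomial, or the signal levels, so the 7-bit register (polynomial x^7 + x^6 + 1, period 127) and the function names are hypothetical choices.

```python
def prbs7(n_bits, seed=0x02):
    """Generate n_bits of a PRBS7 sequence (polynomial x^7 + x^6 + 1).

    The 7-bit register cycles through all 127 non-zero states,
    so the output repeats every 127 bits.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero in its low 7 bits")
    bits = []
    for _ in range(n_bits):
        # Feedback is the XOR of the two highest register bits.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def to_levels(bits, low, high):
    """Map bits to two input levels (for example, two throttle settings)."""
    return [high if b else low for b in bits]


# Hypothetical levels; the study's actual input amplitudes are not given.
signal = to_levels(prbs7(127), low=0.2, high=0.8)
```

The nearly flat spectrum of such a two-level signal is what makes it a common excitation for frequency-response estimation.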


Genetic Parameters for Milk Yield and Lactation Persistency Using Random Regression Models in Girolando Cattle

  • Canaza-Cayo, Ali William;Lopes, Paulo Savio;da Silva, Marcos Vinicius Gualberto Barbosa;de Almeida Torres, Robledo;Martins, Marta Fonseca;Arbex, Wagner Antonio;Cobuci, Jaime Araujo
    • Asian-Australasian Journal of Animal Sciences / v.28 no.10 / pp.1407-1418 / 2015
  • A total of 32,817 test-day milk yield (TDMY) records from the first lactation of 4,056 Girolando cows, daughters of 276 sires, collected from 118 herds between 2000 and 2011, were used to estimate the genetic parameters for TDMY via random regression models (RRM) using Legendre polynomial functions whose orders varied from 3 to 5. In addition, nine measures of persistency in milk yield ($PS_i$) and the genetic trend of 305-day milk yield (305MY) were evaluated. The fit quality criteria indicated that the best model was the RRM employing Legendre polynomials of orders 3 and 5 for fitting the additive genetic and permanent environment effects, respectively. The heritability and genetic correlation for TDMY throughout the lactation, obtained with the best model, varied from 0.18 to 0.23 and from -0.03 to 1.00, respectively. The heritability and genetic correlation for persistency and 305MY varied from 0.10 to 0.33 and from -0.98 to 1.00, respectively. The use of $PS_7$ would be the most suitable option for the evaluation of Girolando cattle. The estimated breeding values for 305MY of sires and cows showed significant and positive genetic trends. Thus, the use of selection indices would be indicated in the genetic evaluation of Girolando cattle for both traits.
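
In a random regression model of this kind, each test-day record is paired with Legendre polynomial covariates evaluated at the standardized day in milk. The sketch below shows one way to build those covariates. It rests on assumptions: the lactation range of 5 to 305 days, the use of unnormalized polynomials, and the function names are all hypothetical, and the abstract does not say whether "order" counts the polynomial degree or the number of coefficients.

```python
def standardize_dim(dim, dim_min=5, dim_max=305):
    """Map days in milk onto [-1, 1], the domain of Legendre polynomials.

    dim_min and dim_max are assumed values, not taken from the study.
    """
    return 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0


def legendre_basis(x, n_terms):
    """Return [P_0(x), ..., P_{n_terms-1}(x)] using Bonnet's recurrence:
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x).
    """
    if n_terms < 1:
        raise ValueError("n_terms must be at least 1")
    values = [1.0, x]
    for n in range(1, n_terms - 1):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[:n_terms]


# Covariates for a record taken on day 155 (mid-lactation under the assumed range).
covariates = legendre_basis(standardize_dim(155), n_terms=4)
```

Many animal-breeding packages use normalized Legendre polynomials, which rescale each term by a constant; the recurrence itself is unchanged.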

Nonparametric Estimators for Percentile Regression Functions

  • Jee, Eun-Sook
    • The Mathematical Education / v.30 no.1 / pp.47-50 / 1991
  • We consider the regression model $H = h(x) + E$, where $h$ is an unknown smooth regression function and $E$ is the random error with unknown distribution $F$. In this context we present and examine the asymptotic behavior of some nonparametric estimators for the percentile functions $\zeta_p(x) = h(x) + \zeta_p$, where $0 < p < 1$ and $\zeta_p = \inf\{x : F(x) \geq p\}$.
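
The percentile definition $\zeta_p = \inf\{x : F(x) \geq p\}$ has a direct sample analogue when $F$ is replaced by the empirical distribution function. The sketch below illustrates only that definition; it is not the nonparametric estimator studied in the paper, and the function name is hypothetical.

```python
def empirical_percentile(sample, p):
    """Smallest sample value x whose empirical distribution F_n(x) is at least p."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    if not sample:
        raise ValueError("sample must be non-empty")
    ordered = sorted(sample)
    n = len(ordered)
    for i, x in enumerate(ordered, start=1):
        # At the i-th order statistic, at least i of the n points are <= x.
        if i / n >= p:
            return x
    return ordered[-1]  # not reached for p < 1; kept as a safeguard


median_like = empirical_percentile([5, 1, 4, 2, 3], 0.5)
```

Applied to residuals from a fitted regression function, a quantile of this form could be added to the fitted value to approximate a percentile function, which is the structure the model above implies.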


A Design and Implement of Efficient Agricultural Product Price Prediction Model

  • Im, Jung-Ju;Kim, Tae-Wan;Lim, Ji-Seoup;Kim, Jun-Ho;Yoo, Tae-Yong;Lee, Won Joo
    • Journal of the Korea Society of Computer and Information / v.27 no.5 / pp.29-36 / 2022
  • In this paper, we propose an efficient agricultural product price prediction model based on a dataset provided by DACON. The model uses XGBoost and CatBoost, algorithms of the gradient boosting family whose average accuracy and execution time are superior to those of the existing logistic regression and random forest. Based on these advantages, we design a machine learning model that predicts prices 1 week, 2 weeks, and 4 weeks ahead from the previous prices of agricultural products. The XGBoost model can derive the best performance by adjusting hyperparameters using the XGBoost Regressor library, which is a regression model. The implemented model is verified using the API provided by DACON, and performance evaluation is performed for each model. Because XGBoost applies its own overfitting regularization, it performs well despite the small dataset, but its temporal performance, such as learning time and prediction time, was found to be lower than that of LGBM.

Comparative study of prediction models for corporate bond rating (국내 회사채 신용 등급 예측 모형의 비교 연구)

  • Park, Hyeongkwon;Kang, Junyoung;Heo, Sungwook;Yu, Donghyeon
    • The Korean Journal of Applied Statistics / v.31 no.3 / pp.367-382 / 2018
  • Prediction models for corporate bond ratings in existing studies have been developed using various models such as linear regression, ordered logit, and random forest. Financial characteristics that are expected to be included in the rating agencies' assignment models help build these prediction models. However, the number of rating classes in existing studies varies from 5 to 20, and the prediction models were developed with samples in which the target companies and the observation periods differ. Thus, a simple comparison of the prediction accuracies in each study cannot determine the best prediction model. In order to conduct a fair comparison, this study collected corporate bond ratings and financial characteristics from 2013 to 2017 and applied prediction models to them. In addition, we applied the elastic-net penalty to the linear regression, the ordered logit, and the ordered probit. Our comparison shows that data-driven variable selection using the elastic-net improves prediction accuracy in each corresponding model, and that the random forest is the most appropriate model in terms of prediction accuracy, achieving on average 69.6% accuracy in exact rating prediction under 5-fold cross validation.

Care Cost Prediction Model for Orphanage Organizations in Saudi Arabia

  • Alhazmi, Huda N;Alghamdi, Alshymaa;Alajlani, Fatimah;Abuayied, Samah;Aldosari, Fahd M
    • International Journal of Computer Science & Network Security / v.21 no.4 / pp.84-92 / 2021
  • Care services are a significant asset in human life. Care in its overall nature focuses on human needs and covers several aspects such as health care, homes, personal care, and education. In fact, care deals with many dimensions: physical, psychological, and social interconnections. Very little information is available on estimating the cost of care services that are provided to orphans and abandoned children. Prediction of the cost of the care system delivered by governmental or non-governmental organizations to support orphans and abandoned children is increasingly needed. The purpose of this study is to analyze the care cost for orphanage organizations in Saudi Arabia in order to forecast the cost and to explore the most influential factor in the cost. Using a business analytics process that applies statistical and machine learning techniques, we propose a model that includes simple linear regression, Naive Bayes classifier, and Random Forest algorithms. The findings of our predictive model show that Naive Bayes achieved the highest accuracy, 87%, in predicting the total care cost. Our model offers a predictive approach from the perspective of business analytics.

A Study on the Development of Model for Estimating the Thickness of Clay Layer of Soft Ground in the Nakdong River Estuary (낙동강 조간대 연약지반의 지역별 점성토층 두께 추정 모델 개발에 관한 연구)

  • Seongin, Ahn;Dong-Woo, Ryu
    • Tunnel and Underground Space / v.32 no.6 / pp.586-597 / 2022
  • In this study, a model was developed for estimating the location-specific thickness of the upper clay layer, to be used for the consolidation vulnerability evaluation in the Nakdong River estuary. To estimate the ground layer thickness, we developed four spatial estimation models: three using the machine learning algorithms RF (Random Forest), SVR (Support Vector Regression), and GPR (Gaussian Process Regression), and one using the geostatistical technique of Ordinary Kriging. Among the 4,712 borehole data in the study area collected for model development, 2,948 borehole data with an upper clay layer were used, and the Pearson correlation coefficient and mean squared error were used to quantitatively evaluate the performance of the developed models. In addition, for qualitative evaluation, each model was used throughout the study area to estimate the upper clay layer, and the resulting thickness distributions were compared with each other.
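
The two quantitative measures named in this abstract, the Pearson correlation coefficient and the mean squared error, can be computed as in the following minimal sketch. The function names and the thickness values are hypothetical and are not taken from the borehole data.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical clay-layer thicknesses (observed vs. model estimate).
observed = [12.0, 18.5, 25.0, 30.5]
estimated = [13.0, 17.5, 26.0, 29.5]
print(pearson_r(observed, estimated), mean_squared_error(observed, estimated))
```

Computing both measures for each of the four models on the same held-out boreholes would give the kind of side-by-side comparison the abstract describes.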