• Title/Summary/Keyword: Random regression model

Developing a Pedestrian Satisfaction Prediction Model Based on Machine Learning Algorithms (기계학습 알고리즘을 이용한 보행만족도 예측모형 개발)

  • Lee, Jae Seung;Lee, Hyunhee
    • Journal of Korea Planning Association / v.54 no.3 / pp.106-118 / 2019
  • To develop a pedestrian navigation service that provides optimal pedestrian routes based on pedestrian satisfaction levels, a prediction model is needed that can estimate a pedestrian's satisfaction level under given conditions. The aim of the present study is therefore to develop a pedestrian satisfaction prediction model based on three machine learning algorithms: Logistic Regression, Random Forest, and Artificial Neural Network. The 2009, 2012, 2013, 2014, and 2015 Pedestrian Satisfaction Survey data from Seoul, Korea are used to train and test the machine learning models. The Random Forest model shows the best prediction performance among the three (Accuracy: 0.798, Recall: 0.906, Precision: 0.842, F1 Score: 0.873, AUC: 0.795). The Artificial Neural Network ranks second (Accuracy: 0.773, Recall: 0.917, Precision: 0.811, F1 Score: 0.868, AUC: 0.738), and the Logistic Regression model ranks third (Accuracy: 0.764, Recall: 1.000, Precision: 0.764, F1 Score: 0.868, AUC: 0.575). The precision score of the Random Forest model implies that approximately 84.2% of pedestrians may be satisfied if they walk in the areas suggested by the Random Forest model.
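
The accuracy, recall, precision, and F1 figures quoted above all derive from a binary confusion matrix. The following is a minimal sketch of those definitions, assuming "satisfied" is the positive class; the function name and the counts are hypothetical and are not taken from the survey data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, recall, precision, and F1 from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives, with "satisfied" assumed to be the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)        # share of satisfied cases that were found
    precision = tp / (tp + fp)     # share of predicted-satisfied that were right
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=40)
```

One property worth keeping in mind when reading such tables: a classifier that predicts the positive class for every case has a recall of 1.0 and a precision equal to its accuracy, so recall alone says little about discrimination.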

The Effect of Highland Weather and Soil Information on the Prediction of Chinese Cabbage Weight (기상 및 토양정보가 고랭지배추 단수예측에 미치는 영향)

  • Kwon, Taeyong;Kim, Rae Yong;Yoon, Sanghoo
    • Journal of Environmental Science International / v.28 no.8 / pp.701-707 / 2019
  • Highland farming is agriculture that takes place 400 m above sea level and typically involves both low temperatures and long sunshine hours. Most highland Chinese cabbages are harvested in Gangwon province. A Ubiquitous Sensor Network (USN) has been deployed to observe Chinese cabbage growth because few weather stations are installed in the highlands. Five representative Chinese cabbage cultivation spots were selected for USN and meteorological data collection between 2015 and 2017. The purpose of this study is to develop a weight prediction model for Chinese cabbages using the meteorological and growth data collected one week prior. Both a regression model and a random forest model were considered, with the regression assumptions being satisfied. The Root Mean Square Error (RMSE) was used to evaluate the predictive performance of the models. In the regression model, the variables influencing cabbage weight were the number of cabbage leaves, wind speed, precipitation, and soil electrical conductivity. In the random forest model, cabbage width, the number of cabbage leaves, soil temperature, precipitation, temperature, soil moisture at a depth of 30 cm, cabbage leaf width, soil electrical conductivity, humidity, and cabbage leaf length were screened. The RMSE of the random forest model was 265.478, lower than that of the regression model (404.493); this is because the random forest model can capture nonlinearity.
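
The model comparison above rests on RMSE, where a lower value means predictions sit closer to the observed weights. A minimal sketch of the calculation follows; the observed and predicted values are made up for illustration and do not come from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


# Hypothetical weights and predictions from two hypothetical models.
observed = [2100.0, 2500.0, 1800.0, 3000.0]
pred_model_a = [2000.0, 2600.0, 1900.0, 2900.0]
pred_model_b = [2400.0, 2100.0, 2000.0, 3300.0]

rmse_a = rmse(observed, pred_model_a)
rmse_b = rmse(observed, pred_model_b)
```

Because errors are squared before averaging, RMSE penalizes a few large misses more heavily than many small ones.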

COMPLETE CONVERGENCE FOR WEIGHTED SUMS OF AANA RANDOM VARIABLES AND ITS APPLICATION IN NONPARAMETRIC REGRESSION MODELS

  • Shen, Aiting;Zhang, Yajing
    • Journal of the Korean Mathematical Society / v.58 no.2 / pp.327-349 / 2021
  • In this paper, we mainly study the strong law of large numbers and complete convergence for weighted sums of asymptotically almost negatively associated (AANA, in short) random variables, by using the Marcinkiewicz-Zygmund type moment inequality and the Rosenthal type moment inequality for AANA random variables. As an application, the complete consistency of the weighted linear estimator in nonparametric regression models based on AANA errors is obtained. Finally, some numerical simulations are carried out to verify the validity of our theoretical results.
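
For readers unfamiliar with the term, the standard definition of complete convergence (due to Hsu and Robbins) can be written as follows. The symbols $X_n$, $C$, and $\varepsilon$ are generic notation chosen here for illustration and are not taken from the paper.

```latex
% A sequence of random variables {X_n, n >= 1} converges completely
% to a constant C if the tail probabilities are summable:
\[
  \sum_{n=1}^{\infty} P\bigl( |X_n - C| > \varepsilon \bigr) < \infty
  \qquad \text{for every } \varepsilon > 0 .
\]
% By the Borel-Cantelli lemma, this implies X_n -> C almost surely.
```

Complete convergence is therefore a stronger statement than almost sure convergence, which is why it yields strong consistency results for estimators.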

Prediction of New Confirmed Cases of COVID-19 based on Multiple Linear Regression and Random Forest (다중 선형 회귀와 랜덤 포레스트 기반의 코로나19 신규 확진자 예측)

  • Kim, Jun Su;Choi, Byung-Jae
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.4 / pp.249-255 / 2022
  • The COVID-19 virus appeared in 2019 and is extremely contagious, and it has had a huge impact on people's mobility. In this paper, multiple linear regression and random forest models are used to predict the number of COVID-19 cases using COVID-19 infection status data (open source data provided by the Ministry of Health and Welfare) and Google Mobility Data, which track movement across various categories of places. The data were divided into two sets. The first dataset consists of the COVID-19 infection status data and all six variables of Google Mobility Data. The second dataset consists of the COVID-19 infection status data and only two variables of Google Mobility Data: (1) retail stores and leisure facilities and (2) grocery stores and pharmacies. The models' performance was compared using the mean absolute error. We also perform a correlation analysis of the random forest model and the multiple linear regression model.

Genetic analysis of milk production traits of Tunisian Holsteins using random regression test-day model with Legendre polynomials

  • Zaabza, Hafedh Ben;Gara, Abderrahmen Ben;Rekik, Boulbaba
    • Asian-Australasian Journal of Animal Sciences / v.31 no.5 / pp.636-642 / 2018
  • Objective: The objective of this study was to estimate genetic parameters of milk, fat, and protein yields within and across lactations in Tunisian Holsteins using a random regression test-day (TD) model. Methods: A random regression multiple trait multiple lactation TD model was used to estimate genetic parameters in the Tunisian dairy cattle population. Data were TD yields of milk, fat, and protein from the first three lactations. Random regressions were modeled with third-order Legendre polynomials for the additive genetic and permanent environment effects. Heritabilities and genetic correlations were estimated by Bayesian techniques using the Gibbs sampler. Results: All variance components tended to be high at the beginning and the end of lactations. Additive genetic variances for milk, fat, and protein yields were the lowest and the least variable compared to permanent environment variances. Heritability values tended to increase with parity. Estimates of heritabilities for 305-d yield traits were low to moderate: 0.14 to 0.2, 0.12 to 0.17, and 0.13 to 0.18 for milk, fat, and protein yields, respectively. Within parity, genetic correlations among traits were up to 0.74. Genetic correlations among lactations for the yield traits were relatively high and ranged from $0.78{\pm}0.01$ to $0.82{\pm}0.03$ between the first and second parities, from $0.73{\pm}0.03$ to $0.8{\pm}0.04$ between the first and third parities, and from $0.82{\pm}0.02$ to $0.84{\pm}0.04$ between the second and third parities. Conclusion: These results are comparable to previously reported estimates on the same population, indicating that adopting a random regression TD model as the official genetic evaluation for production traits, as developed by most Interbull countries, is feasible for Tunisian Holsteins.
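
In a random regression test-day model, each animal's additive genetic and permanent environment effects are regressed on Legendre polynomial covariates evaluated at a standardized time point. The sketch below builds such covariates in plain Python. It assumes a days-in-milk range of 5 to 305, polynomials up to degree 3, and the common normalization $\sqrt{(2k+1)/2}\,P_k(x)$; none of these choices are stated in the abstract, and the function names are hypothetical.

```python
import math


def standardize(t, t_min, t_max):
    """Map a time point t in [t_min, t_max] onto [-1, 1]."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0


def legendre_covariates(x, degree):
    """Normalized Legendre polynomials phi_0..phi_degree evaluated at x.

    Uses Bonnet's recurrence:
        (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
    and the normalization phi_k(x) = sqrt((2k + 1) / 2) * P_k(x).
    """
    p = [1.0, x]
    for k in range(1, degree):
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    return [math.sqrt((2 * k + 1) / 2.0) * p[k] for k in range(degree + 1)]


# Assumed days-in-milk range (5 to 305); day 155 is the midpoint, so x = 0.
x_mid = standardize(155, 5, 305)
covariates_mid = legendre_covariates(x_mid, degree=3)
covariates_end = legendre_covariates(standardize(305, 5, 305), degree=3)
```

Each row of covariates multiplies an animal-specific vector of random regression coefficients, which is how the model lets genetic variance change along the lactation curve.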

Machine learning-based regression analysis for estimating Cerchar abrasivity index

  • Kwak, No-Sang;Ko, Tae Young
    • Geomechanics and Engineering / v.29 no.3 / pp.219-228 / 2022
  • The most widely used parameter to represent rock abrasiveness is the Cerchar abrasivity index (CAI). The CAI value can be applied to predict wear in TBM cutters. It has been extensively demonstrated that the CAI is affected significantly by cementation degree, strength, and amount of abrasive minerals, i.e., the quartz content or equivalent quartz content in rocks. The relationship between the properties of rocks and the CAI is investigated in this study. A database comprising 223 observations that includes rock types, uniaxial compressive strengths, Brazilian tensile strengths, equivalent quartz contents, quartz contents, brittleness indices, and CAIs is constructed. A linear model is developed by selecting independent variables while considering multicollinearity after performing multiple regression analyses. Machine learning-based regression methods including support vector regression, regression tree regression, k-nearest neighbors regression, random forest regression, and artificial neural network regression are used in addition to multiple linear regression. The results of the random forest regression model show that it yields the best prediction performance.

Estimation of Genetic Parameters of Body Weights in Hanwoo Steers (Korean Cattle), Bos Taurus Coreanae Using Random Regression Model (임의회귀모형을 이용한 한우 거세우 체중의 유전모수 추정)

  • Seo, Kang-Seok;Salces, Agapita J.;Yoon, Du-Hak;Lee, Hong-Gu;Kim, Sang-Hoon;Choi, Te-Jeong
    • Journal of Animal Science and Technology / v.50 no.2 / pp.151-156 / 2008
  • The study aimed to estimate genetic parameters of body weights in Hanwoo steers using a random regression model and to compare them with those from a single-trait animal model. A total of 1,372 Hanwoo steers belonging to the progeny testing program of the Hanwoo Genetic Improvement conducted at the Livestock Improvement Main Center of the National Agricultural Cooperative Federation (LIMC-NACF) in the Republic of Korea were used. The random regression model fitting a quadratic function yielded heritability values from 0.17 to 0.30 over the whole testing period of up to 800 days. The animal model yielded heritability estimates ranging from 0.24 to 0.36. Estimates of permanent environmental correlations tended to increase with increasing days on test. Unlike the direct genetic correlation, for which the early-stage estimate was slightly negative, it was 0.30 and then increased to approach unity at the later stage of the test. The results of the random regression model and the animal model did not differ much and followed a similar pattern; therefore, the random regression model could be implemented for the national genetic evaluation of Hanwoo.

Comparison of CT Exposure Dose Prediction Models Using Machine Learning-based Body Measurement Information (머신러닝 기반 신체 계측정보를 이용한 CT 피폭선량 예측모델 비교)

  • Hong, Dong-Hee
    • Journal of radiological science and technology / v.43 no.6 / pp.503-509 / 2020
  • This study aims to develop a patient-specific radiation exposure dose prediction model based on anthropometric data that can be easily measured during CT examination, to serve as basic data for DRL setting and radiation dose management systems in the future. In addition, the machine learning algorithm most suitable for predicting exposure doses is identified. The data used in this study were chest CT scan data, and a dataset was constructed that included the patients' anthropometric data. In pre-processing and sample selection, from a total of 250 samples, only chest CT scans performed without a contrast agent were retained, and 110 samples including height and weight variables were extracted. Of the 110 samples extracted, 66 were used as a training set and the remaining 44 as a test set for verification. The exposure dose was predicted with random forest, linear regression, and SVM algorithms using Orange version 3.26.0, an open-source machine learning software package. Results: The prediction accuracy of the models was R^2 0.840 for random forest, R^2 0.969 for linear regression, and R^2 0.189 for SVM. When the prediction rate of the models was verified, random forest was the highest with R^2 0.986, compared with R^2 0.973 for linear regression and R^2 0.204 for SVM, indicating that the random forest model has the best predictive power.
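
The R^2 values reported above measure the share of variance in the observed dose that a model's predictions explain. A minimal sketch of the usual coefficient of determination follows. Whether Orange computes R^2 in exactly this way is an assumption, and the numbers are made up rather than taken from the CT data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed doses and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

score = r_squared(observed, predicted)
perfect = r_squared(observed, observed)
```

A value of 1.0 means perfect prediction, while values near 0 mean the model does little better than predicting the mean; on held-out data the score can even be negative.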

A random forest-regression-based inverse-modeling evolutionary algorithm using uniform reference points

  • Gholamnezhad, Pezhman;Broumandnia, Ali;Seydi, Vahid
    • ETRI Journal / v.44 no.5 / pp.805-815 / 2022
  • The model-based evolutionary algorithms are divided into three groups: estimation of distribution algorithms, inverse modeling, and surrogate modeling. Existing inverse modeling is mainly applied to solve multi-objective optimization problems and is not suitable for many-objective optimization problems. Some inversed-model techniques, such as the inversed-model of multi-objective evolutionary algorithm, constructed from the Pareto front (PF) to the Pareto solution on nondominated solutions using a random grouping method and Gaussian process, were introduced. However, some of the most efficient inverse models might be eliminated during this procedure. Also, there are challenges, such as the presence of many local PFs and developing poor solutions when the population has no evident regularity. This paper proposes inverse modeling using random forest regression and uniform reference points that map all nondominated solutions from the objective space to the decision space to solve many-objective optimization problems. The proposed algorithm is evaluated using the benchmark test suite for evolutionary algorithms. The results show an improvement in diversity and convergence performance (quality indicators).
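
The inverse model described above is built on nondominated solutions, that is, solutions that no other solution beats on every objective. The sketch below shows only that selection step, assuming all objectives are minimized. It does not implement the paper's random forest inverse mapping or its reference-point scheme, and the function names and points are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization assumed)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated(points):
    """Return the points that no other point in the list dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points)
    ]


# Hypothetical two-objective values for five candidate solutions.
objectives = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = nondominated(objectives)
```

This pairwise filter costs quadratic time in the population size, which is acceptable for a sketch; evolutionary algorithm libraries typically use faster sorting schemes.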

Maximum likelihood estimation of Logistic random effects model (로지스틱 임의선형 혼합모형의 최대우도 추정법)

  • Kim, Minah;Kyung, Minjung
    • The Korean Journal of Applied Statistics / v.30 no.6 / pp.957-981 / 2017
  • A generalized linear mixed model is an extension of a generalized linear model that allows random effects and provides flexibility in developing a suitable model when observations are correlated or when other underlying phenomena contribute to the resulting variability. We describe maximum likelihood estimation methods for logistic regression models that include random effects: the Laplace approximation, Gauss-Hermite quadrature, adaptive Gauss-Hermite quadrature, and pseudo-likelihood. An application to a social science problem is provided by analyzing the effect of mental health and life satisfaction on volunteer activities using Korean welfare panel data; in addition, we observe that including random effects in the model leads to improved analyses with more reasonable inferences.
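
Of the methods listed above, Gauss-Hermite quadrature is the most direct to illustrate: the random effect is integrated out of each cluster's likelihood by a weighted sum over fixed nodes. The sketch below assumes a logistic model with a single covariate and a normally distributed random intercept with standard deviation `sigma`, and it hard-codes the 3-point rule to stay self-contained. Practical fits use more nodes, and all names here are hypothetical.

```python
import math

SQRT_PI = math.sqrt(math.pi)
# 3-point Gauss-Hermite rule for integrals of exp(-z^2) * f(z).
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (SQRT_PI / 6.0, 2.0 * SQRT_PI / 3.0, SQRT_PI / 6.0)


def sigmoid(eta):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def cluster_likelihood(y, x, beta, sigma):
    """Approximate marginal likelihood of one cluster.

    Model: logit P(y_j = 1 | b) = beta * x_j + b, with b ~ N(0, sigma^2).
    The substitution b = sqrt(2) * sigma * z turns the normal integral
    into the Gauss-Hermite form, giving (1 / sqrt(pi)) * sum_k w_k f(z_k).
    """
    total = 0.0
    for node, weight in zip(GH_NODES, GH_WEIGHTS):
        b = math.sqrt(2.0) * sigma * node
        conditional = 1.0
        for y_j, x_j in zip(y, x):
            p = sigmoid(beta * x_j + b)
            conditional *= p if y_j == 1 else 1.0 - p
        total += weight * conditional
    return total / SQRT_PI


# Hypothetical cluster with three binary responses.
likelihood = cluster_likelihood(y=[1, 0, 1], x=[0.5, 1.0, -0.2], beta=0.8, sigma=1.0)
```

Summing the log of this quantity over clusters gives the approximate marginal log-likelihood that an optimizer would maximize over `beta` and `sigma`. The adaptive variant recenters and rescales the nodes for each cluster.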