• Title/Summary/Keyword: Regression testing

A Logistic Regression Analysis of Two-Way Binary Attribute Data (이원 이항 계수치 자료의 로지스틱 회귀 분석)

  • Ahn, Hae-Il
    • Journal of Korean Society of Industrial and Systems Engineering / v.35 no.3 / pp.118-128 / 2012
  • This paper addresses the problem of analyzing two-way binary attribute data using the logistic regression model, in order to find a sound statistical methodology. It is demonstrated that the analysis of variance (ANOVA) may not be adequate, especially when the proportion is very low or very high. The logistic transformation of proportion data could help, but it is not sound in the statistical sense. Meanwhile, adopting the generalized least squares (GLS) method entails considerable effort in estimating the variance-covariance matrix. The logistic regression methodology, on the other hand, provides sound statistical means for estimating the related confidence intervals and testing the significance of model parameters. Based on simulated data, the efficiencies of the estimates are verified in order to demonstrate the usefulness of the methodology.

Application of artificial neural network model in regional frequency analysis: Comparison between quantile regression and parameter regression techniques.

  • Lee, Joohyung;Kim, Hanbeen;Kim, Taereem;Heo, Jun-Haeng
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.170-170 / 2020
  • Owing to advances in technology, complex computation on huge data sets is now possible on an ordinary personal computer. As a result, machine learning methods have been widely applied in hydrology, for example in regression-based regional frequency analysis (RFA). The main purpose of this study is to compare two RFA frameworks based on artificial neural network (ANN) models: the quantile regression technique (QRT-ANN) and the parameter regression technique (PRT-ANN). At the output layer of the ANN model, the QRT-ANN predicts quantiles for various return periods, whereas the PRT-ANN predicts the three parameters of the generalized extreme value distribution. Rainfall gauging sites with a record length of more than 20 years were selected, and their annual maximum rainfalls and various hydro-meteorological variables were used as the input layer of the ANN model. In fitting the ANN model, 70% of the gauging sites were used as the training set and 30% as the testing set. For each technique, the ANN model structure, such as the number of hidden layers and nodes, was determined by leave-one-out validation using the root mean square error (RMSE). To assess the performance of the two frameworks, the RMSEs of the quantiles predicted by the QRT-ANN are compared with those of the PRT-ANN.
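The leave-one-out selection step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: a k-nearest-neighbour averager stands in for the ANN, the candidate values of `k` stand in for candidate network structures, and the data points and the names `knn_predict` and `loo_rmse` are hypothetical.

```python
import math

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def loo_rmse(data, k):
    """Leave-one-out RMSE: predict each point from all the other points."""
    squared_errors = []
    for i, (x, y) in enumerate(data):
        held_in = data[:i] + data[i + 1:]  # every sample except the i-th
        squared_errors.append((knn_predict(held_in, x, k) - y) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical (x, y) pairs, not data from the study.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8), (6.0, 12.2)]

# Score each candidate "structure" and keep the one with the lowest LOO RMSE.
scores = {k: loo_rmse(data, k) for k in (1, 2, 3)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

The same loop applies to any model family: only the fitting and prediction inside `loo_rmse` would change.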

A Study on Diabetes Management System Based on Logistic Regression and Random Forest

  • ByungJoo Kim
    • International journal of advanced smart convergence / v.13 no.2 / pp.61-68 / 2024
  • To advance diabetes diagnosis, this study introduces a novel two-step machine learning approach that combines the probabilistic predictions of Logistic Regression with the classification strength of Random Forest. Diabetes, a pervasive chronic disease affecting millions globally, requires precise and early detection to mitigate long-term complications. Traditional diagnostic methods, while effective, often entail invasive testing and may not fully leverage the patterns hidden in patient data. To address this gap, our research uses Logistic Regression to estimate the likelihood that diabetes is present, and then employs Random Forest to classify individuals into diabetic, pre-diabetic, or nondiabetic categories based on the computed probabilities. This methodology not only capitalizes on the strengths of both algorithms (Logistic Regression's proficiency in estimating nuanced probabilities and Random Forest's robustness in classification) but also introduces a refined mechanism to enhance diagnostic accuracy. By applying this model to a comprehensive diabetes dataset, we demonstrate a marked improvement in diagnostic precision, as evidenced by superior performance metrics compared with other machine learning approaches. Our findings underscore the potential of integrating diverse machine learning models to improve clinical decision-making, offering a promising avenue for the early and accurate diagnosis of diabetes and potentially other complex diseases.

Predicting strength development of RMSM using ultrasonic pulse velocity and artificial neural network

  • Sheen, Nain Y.;Huang, Jeng L.;Le, Hien D.
    • Computers and Concrete / v.12 no.6 / pp.785-802 / 2013
  • Ready-mixed soil material, a kind of controlled low-strength material, is a new way of combining soil and cement. It can be used as a backfill material. In this paper, artificial neural network and nonlinear regression approaches were applied to predict the compressive strength of ready-mixed soil material containing Portland cement, slag, sand, and soil in the mixture. The data analyzed were obtained from our testing program. In the experiment, we carried out a mix design with three sand-to-soil proportions (6:4, 5:5, and 4:6). In addition, blast furnace slag partially replaced cement to improve workability, whereas the water-to-binder ratio was fixed. Samples were tested as per ASTM to estimate engineering properties such as flowability, strength, and pulse velocity. Based on the test data, an empirical pulse velocity-strength correlation was established by regression. Next, three neural network topologies, namely ANN-I, ANN-II, and ANN-III, were developed to predict the strength. The first two models are back-propagation feed-forward networks, and the third is a radial basis neural network. The results show that the compressive strength of ready-mixed soil material can be predicted well by neural networks. Among the proposed neural network models, ANN-I gives the best prediction because it is closest to the actual strength. Moreover, by combining pulse velocity with other factors, viz. curing time and material contents in the mixture, the proposed neural networks offer a better evaluation than interpolation from pulse velocity alone.

A Longitudinal Study on the Causal Association Between Smoking and Depression

  • Kang, Eun-Jeong;Lee, Jae-Hee
    • Journal of Preventive Medicine and Public Health / v.43 no.3 / pp.193-204 / 2010
  • Objectives: The objective of this study was to analyze the causal relationship between smoking and depression using longitudinal data. Methods: Two waves of the Korea Welfare Panel, collected in 2006 and 2007, were used. The sample consisted of 14 426 respondents in 2006 and 13 052 in 2007, all aged 20 or older. Smoking was measured by smoking amount (none/≥ two packs). Depression was defined as a summed CESD (Center for Epidemiological Studies Depression)-11 score greater than or equal to 16. The causal relationship between smoking and depression was tested using logistic regression. To test the causal effect of smoking on depression, depression at year 2 was regressed on smoking status at year 1, using only the sample without depression at year 1. Likewise, to test the causal effect of depression on smoking, smoking status at year 2 was regressed on depression at year 1, using only those who were not smoking at year 1. The statistical package used was Stata 10.0. Sampling weights were applied to obtain population estimates. Results: The logistic regression testing the causal effect of smoking on depression showed that smoking at year 1 was significantly related to depression at year 2. The smoking amounts associated with depression differed among age groups. In contrast, the logistic regression testing the opposite direction of the relationship found no significant association in any age group. Conclusions: The study results showed some evidence that smoking caused depression but not the other way around.

Assessment of compressive strength of high-performance concrete using soft computing approaches

  • Chukwuemeka Daniel;Jitendra Khatti;Kamaldeep Singh Grover
    • Computers and Concrete / v.33 no.1 / pp.55-75 / 2024
  • The present study introduces an optimum performance soft computing model for predicting the compressive strength of high-performance concrete (HPC) by comparing, for the first time on a common database, models based on conventional (kernel-based, covariance function-based, and tree-based), advanced machine (least square support vector machine, LSSVM, and minimax probability machine regressor, MPMR), and deep (artificial neural network, ANN) learning approaches. A compressive strength database containing the results of 1030 concrete samples has been compiled from the literature and preprocessed. For training, testing, and validation of the soft computing models, 803, 101, and 101 data points, respectively, have been selected arbitrarily from the 1005 preprocessed data points. Thirteen performance metrics, including three new metrics (a20-index, index of agreement, and index of scatter), have been implemented for each model. The performance comparison reveals that the SVM (kernel-based), ET (tree-based), MPMR (advanced), and ANN (deep) models have achieved higher performance in predicting the compressive strength of HPC. From the overall analysis of performance, accuracy, Taylor plot, accuracy metric, regression error characteristics curve, Anderson-Darling, Wilcoxon, uncertainty, and reliability, model CS4, based on the ensemble tree, has been recognized as the optimum performance model, with a correlation coefficient of 0.9352, a root mean square error of 5.76 MPa, and a mean absolute error of 4.1069 MPa. The present study also reveals that multicollinearity affects the prediction accuracy of the Gaussian process regression, decision tree, multilinear regression, and adaptive boosting regressor models, a novel finding in compressive strength prediction of HPC. The cosine sensitivity analysis reveals that the prediction of the compressive strength of HPC is highly affected by cement content, fine aggregate, coarse aggregate, and water content.
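Three of the metrics named in this abstract can be sketched in a few lines. The a20-index is written here under the commonly used definition (the share of samples whose actual-to-predicted ratio falls between 0.80 and 1.20), which is an assumption because the abstract does not define it. The strength values and function names are hypothetical; only the 803/101/101 split comes from the abstract.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def a20_index(actual, predicted):
    """Share of samples whose actual/predicted ratio lies in [0.80, 1.20] (assumed definition)."""
    hits = sum(1 for a, p in zip(actual, predicted) if 0.80 <= a / p <= 1.20)
    return hits / len(actual)

# Split sizes quoted in the abstract; they sum to the 1005 preprocessed points.
split = {"training": 803, "testing": 101, "validation": 101}
total_points = sum(split.values())

# Hypothetical compressive strengths in MPa, not values from the study.
actual = [30.0, 45.0, 60.0, 25.0]
predicted = [33.0, 44.0, 45.0, 26.0]

print(total_points, rmse(actual, predicted), mae(actual, predicted), a20_index(actual, predicted))
```

RMSE and MAE carry the units of the target (MPa here), while the a20-index is a unitless fraction, so the three are not directly comparable.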

Estimation of Genetic Parameters of Body Weights in Hanwoo Steers(Korean Cattle), Bos Taurus Coreanae Using Random Regression Model (임의회귀모형을 이용한 한우 거세우 체중의 유전모수 추정)

  • Seo, Kang-Seok;Salces, Agapita J.;Yoon, Du-Hak;Lee, Hong-Gu;Kim, Sang-Hoon;Choi, Te-Jeong
    • Journal of Animal Science and Technology / v.50 no.2 / pp.151-156 / 2008
  • The study aimed to estimate genetic parameters of body weights in Hanwoo steers using a random regression model and to compare them with those from a single-trait animal model. A total of 1,372 Hanwoo steers from the progeny testing program of the Hanwoo Genetic Improvement, conducted at the Livestock Improvement Main Center of the National Agricultural Cooperative Federation (LIMC-NACF) in the Republic of Korea, were used. The random regression model fitting a quadratic function gave heritability values from 0.17 to 0.30 over the whole testing period of up to 800 days. The animal model gave estimated heritability values ranging from 0.24 to 0.36. Estimates of permanent environmental correlations tended to increase with increasing days on test. Unlike the direct genetic correlation, for which the estimate was slightly negative at the early stage, the estimate was 0.30 and then increased to approach unity at the later stage of the test. The results of the random regression model and the animal model did not differ much and followed a similar pattern; therefore, the random regression model could be implemented for the national genetic evaluation of Hanwoo.

Estimation of S&T Knowledge Production Function Using Principal Component Regression Model (주성분 회귀모형을 이용한 과학기술 지식생산함수 추정)

  • Park, Su-Dong;Sung, Oong-Hyun
    • Journal of Korea Technology Innovation Society / v.13 no.2 / pp.231-251 / 2010
  • The numbers of SCI papers and patents in science and technology are expected to be related to the number of researchers and the knowledge stock (R&D stock, paper stock, patent stock). The results of the regression model showed that severe multicollinearity existed and that errors were made in the estimation and testing of the regression coefficients. To solve the multicollinearity problem and estimate the effects of the independent variables properly, principal component regression models were applied to three cases of S&T knowledge production. The estimated principal component regression function was transformed back into the original independent variables so that its effects could be interpreted properly. The analysis indicated that the principal component regression model was useful for estimating the effects of highly correlated production factors, and showed that the number of researchers, R&D stock, and paper or patent stock all had a positive effect on the production of papers or patents.

Characteristics of Median Frequency According to the Load During Fatiguing Isometric Exercise (등척성 운동시 운동강도에 따른 중앙주파수의 특성)

  • Lee, Su-Young;Shin, Hwa-Kyung;Cho, Sang-Hyun
    • Physical Therapy Korea / v.10 no.3 / pp.141-149 / 2003
  • Median frequency (MDF) can be regarded as a valid indicator of local muscle fatigue. As local muscle fatigue develops, the muscle fiber conduction velocity decreases, the fast twitch fibers are recruited less, and consequently the median frequency shifts toward the lower frequency range. The aim of this study was to test the characteristics of the median frequency according to exercise load (30% and 60% of MVC for the biceps brachii, 40% and 80% of MVC for the vastus lateralis) during fatiguing isometric exercise. Thirteen healthy male student volunteers from Yonsei University were recruited. After maximal voluntary isometric contraction (MVC) was tested, three variables (initial median frequency, regression slope, fatigue index) were obtained from the regression line of the MDF data at each exercise load. The results showed that the regression slope and fatigue index differed significantly between loads for the biceps brachii, but not for the vastus lateralis. Initial MDF did not differ significantly with exercise load in either muscle. The regression slope and fatigue index could monitor physiologic muscle change during fatiguing isometric exercise. The results showed that these two MDF variables reflect local muscle fatigue according to the exercise load.
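Two of the three variables in this abstract can be read directly off a least-squares line fitted to MDF against time: the intercept serves as the initial median frequency and the slope as the regression slope. The sketch below assumes that interpretation; the sample values and the name `fit_line` are hypothetical, and the fatigue index is left out because the abstract does not define it.

```python
def fit_line(times, values):
    """Ordinary least-squares line through (time, value) pairs; returns (intercept, slope)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return intercept, slope

# Hypothetical MDF samples (Hz) at 10-second intervals, not data from the study.
times_s = [0, 10, 20, 30, 40]
mdf_hz = [80.0, 76.0, 73.0, 69.0, 66.0]

initial_mdf, regression_slope = fit_line(times_s, mdf_hz)
print(initial_mdf, regression_slope)  # intercept in Hz, slope in Hz per second
```

A negative slope corresponds to the downward shift in median frequency that the abstract associates with developing fatigue.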

Test of Model Specification in Panel Regression Model with Two Error Components (이원오차성분을 갖는 패널회귀모형의 모형식별검정)

  • Song, Seuck-Heun;Kim, Young-Ji;Hwang, Sun-Young
    • The Korean Journal of Applied Statistics / v.19 no.3 / pp.461-479 / 2006
  • This paper derives joint and conditional Lagrange multiplier (LM) tests based on Double-Length Artificial Regression (DLR) for testing functional form and/or the presence of individual (time) effects in a panel regression model. The small-sample properties of these tests are assessed by a Monte Carlo study, and comparisons are made with LM tests based on the Outer Product Gradient (OPG). The results show that the proposed DLR-based LM tests have the most appropriate finite-sample performance.