• Title/Summary/Keyword: Regression Testing

A Study on Diabetes Management System Based on Logistic Regression and Random Forest

  • ByungJoo Kim
    • International journal of advanced smart convergence
    • /
    • v.13 no.2
    • /
    • pp.61-68
    • /
    • 2024
  • In the quest for advancing diabetes diagnosis, this study introduces a novel two-step machine learning approach that combines the probabilistic predictions of Logistic Regression with the classification strength of Random Forest. Diabetes, a pervasive chronic disease affecting millions globally, requires precise and early detection to mitigate long-term complications. Traditional diagnostic methods, while effective, often entail invasive testing and may not fully exploit the patterns hidden in patient data. To address this gap, our research uses Logistic Regression to estimate the likelihood of diabetes, then employs Random Forest to classify individuals into diabetic, pre-diabetic, or nondiabetic categories based on the computed probabilities. This methodology not only capitalizes on the strengths of both algorithms (Logistic Regression's proficiency in estimating nuanced probabilities and Random Forest's robustness in classification) but also introduces a refined mechanism to enhance diagnostic accuracy. Through the application of this model to a comprehensive diabetes dataset, we demonstrate a marked improvement in diagnostic precision, as evidenced by superior performance metrics when compared to other machine learning approaches. Our findings underscore the potential of integrating diverse machine learning models to improve clinical decision-making processes, offering a promising avenue for the early and accurate diagnosis of diabetes and potentially other complex diseases.

Predicting strength development of RMSM using ultrasonic pulse velocity and artificial neural network

  • Sheen, Nain Y.;Huang, Jeng L.;Le, Hien D.
    • Computers and Concrete
    • /
    • v.12 no.6
    • /
    • pp.785-802
    • /
    • 2013
  • Ready-mixed soil material, a kind of controlled low-strength material, is a new way of combining soil and cement. It can be used as a backfill material. In this paper, artificial neural network and nonlinear regression approaches were applied to predict the compressive strength of ready-mixed soil material containing Portland cement, slag, sand, and soil in the mixture. The data used for the analysis were obtained from our testing program. In the experiment, we carried out a mix design with three proportions of sand to soil (i.e., 6:4, 5:5, and 4:6). In addition, blast furnace slag partially replaced cement to improve workability, whereas the water-to-binder ratio was fixed. Testing was conducted on samples to estimate their engineering properties as per ASTM, such as flowability, strength, and pulse velocity. Based on the testing data, the empirical pulse velocity-strength correlation was established by the regression method. Next, three topologies of neural network were developed to predict the strength, namely ANN-I, ANN-II, and ANN-III. The first two models are back-propagation feed-forward networks, and the third is a radial basis neural network. The results show that the compressive strength of ready-mixed soil material can be well predicted by neural networks. Among the proposed neural network models, ANN-I gives the best prediction because it is closest to the actual strength. Moreover, by considering pulse velocity in combination with other factors, viz. curing time and material contents in the mixture, the proposed neural networks offer a better evaluation than that interpolated from pulse velocity alone.

A Longitudinal Study on the Causal Association Between Smoking and Depression

  • Kang, Eun-Jeong;Lee, Jae-Hee
    • Journal of Preventive Medicine and Public Health
    • /
    • v.43 no.3
    • /
    • pp.193-204
    • /
    • 2010
  • Objectives: The objective of this study was to analyze the causal relationship between smoking and depression using longitudinal data. Methods: Two waves of the Korea Welfare Panel collected in 2006 and 2007 were used. The sample consisted of 14,426 respondents in 2006 and 13,052 in 2007 who were aged 20 and older. Smoking was measured by smoking amount (none/≥ two packs). Depression was defined as a summed CESD (Center for Epidemiological Studies Depression)-11 score greater than or equal to 16. The causal relationship between smoking and depression was tested using logistic regression. To test the causal effect of smoking on depression, depression at year 2 was regressed on smoking status at year 1 using only the sample without depression at year 1. Likewise, to test the causal effect of depression on smoking, smoking status at year 2 was regressed on depression at year 1 using only those who were not smoking at year 1. The statistical package used was Stata 10.0. Sampling weights were applied to obtain population estimates. Results: The logistic regression testing for the causal relationship between smoking and depression showed that smoking at year 1 was significantly related to depression at year 2. The smoking amounts associated with depression differed among age groups. On the other hand, the logistic regression testing for the opposite direction of the relationship found no significant association regardless of age group. Conclusions: The study results showed some evidence that smoking caused depression but not the other way around.
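
The sample restriction described in the Methods above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis: the records and field names (`smoker_y1`, `cesd_y1`, `cesd_y2`) are invented, and a crude odds ratio stands in for the weighted logistic regression the study ran in Stata. Only the cut-off of 16 and the exclusion of respondents already depressed at year 1 come from the abstract.

```python
CESD_CUTOFF = 16  # from the abstract: depression = summed CESD-11 score >= 16


def is_depressed(cesd_score):
    """Apply the cut-off stated in the abstract."""
    return cesd_score >= CESD_CUTOFF


def at_risk_sample(records):
    """Keep only respondents who were NOT depressed at year 1."""
    return [r for r in records if not is_depressed(r["cesd_y1"])]


def crude_odds_ratio(records):
    """Odds of year-2 depression for year-1 smokers vs. non-smokers.

    A stand-in for logistic regression (assumption): no covariates,
    no sampling weights.
    """
    a = b = c = d = 0  # cells of the 2x2 table
    for r in at_risk_sample(records):
        depressed_y2 = is_depressed(r["cesd_y2"])
        if r["smoker_y1"]:
            if depressed_y2:
                a += 1
            else:
                b += 1
        else:
            if depressed_y2:
                c += 1
            else:
                d += 1
    # Adding 0.5 to each cell (Haldane-Anscombe correction) avoids division by zero.
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Hypothetical records, invented for illustration only.
records = [
    {"smoker_y1": True, "cesd_y1": 5, "cesd_y2": 18},
    {"smoker_y1": True, "cesd_y1": 8, "cesd_y2": 20},
    {"smoker_y1": True, "cesd_y1": 3, "cesd_y2": 4},
    {"smoker_y1": False, "cesd_y1": 2, "cesd_y2": 17},
    {"smoker_y1": False, "cesd_y1": 6, "cesd_y2": 5},
    {"smoker_y1": False, "cesd_y1": 7, "cesd_y2": 9},
    {"smoker_y1": True, "cesd_y1": 19, "cesd_y2": 21},  # excluded: depressed at year 1
]

odds_ratio = crude_odds_ratio(records)
```

Restricting the sample to people without the outcome at baseline is what lets the year-1 exposure precede the year-2 outcome; the reverse direction would restrict to year-1 non-smokers in the same way.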

Assessment of compressive strength of high-performance concrete using soft computing approaches

  • Chukwuemeka Daniel;Jitendra Khatti;Kamaldeep Singh Grover
    • Computers and Concrete
    • /
    • v.33 no.1
    • /
    • pp.55-75
    • /
    • 2024
  • The present study introduces an optimum performance soft computing model for predicting the compressive strength of high-performance concrete (HPC) by comparing, for the first time on a common database, models based on conventional (kernel-based, covariance function-based, and tree-based), advanced machine (least squares support vector machine, LSSVM, and minimax probability machine regressor, MPMR), and deep (artificial neural network, ANN) learning approaches. A compressive strength database containing results for 1030 concrete samples has been compiled from the literature and preprocessed. For training, testing, and validation of the soft computing models, 803, 101, and 101 data points, respectively, have been selected arbitrarily from the 1005 preprocessed data points. Thirteen performance metrics, including three new metrics, i.e., a20-index, index of agreement, and index of scatter, have been implemented for each model. The performance comparison reveals that the SVM (kernel-based), ET (tree-based), MPMR (advanced), and ANN (deep) models have achieved higher performance in predicting the compressive strength of HPC. From the overall analysis of performance, accuracy, Taylor plot, accuracy metric, regression error characteristics curve, Anderson-Darling, Wilcoxon, uncertainty, and reliability, model CS4, based on the ensemble tree, has been recognized as the optimum performance model, with a correlation coefficient of 0.9352, root mean square error of 5.76 MPa, and mean absolute error of 4.1069 MPa. The present study also reveals that multicollinearity affects the prediction accuracy of the Gaussian process regression, decision tree, multilinear regression, and adaptive boosting regressor models, a novel finding in compressive strength prediction of HPC. The cosine sensitivity analysis reveals that the prediction of compressive strength of HPC is highly affected by cement content, fine aggregate, coarse aggregate, and water content.
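
Four of the metrics named in the abstract above can be computed as in the following sketch. The abstract gives no formulas, so the definitions here are assumptions based on common usage; in particular, the a20-index is taken to be the fraction of samples whose predicted-to-measured ratio lies between 0.80 and 1.20. The strength values are invented for illustration and are not from the study's database.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the units of the data (MPa here)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def a20_index(actual, predicted):
    """Share of samples predicted within +/-20% of the measured value (assumed definition)."""
    hits = sum(1 for a, p in zip(actual, predicted) if 0.8 <= p / a <= 1.2)
    return hits / len(actual)


# Hypothetical compressive strengths in MPa, invented for illustration.
actual = [40.0, 50.0, 60.0, 80.0]
predicted = [44.0, 45.0, 75.0, 80.0]

metrics = {
    "r": pearson_r(actual, predicted),
    "rmse": rmse(actual, predicted),
    "mae": mae(actual, predicted),
    "a20": a20_index(actual, predicted),
}
```

RMSE penalizes the single large miss (60 MPa predicted as 75 MPa) more heavily than MAE does, which is one reason studies of this kind report both.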

Estimation of Genetic Parameters of Body Weights in Hanwoo Steers(Korean Cattle), Bos Taurus Coreanae Using Random Regression Model (임의회귀모형을 이용한 한우 거세우 체중의 유전모수 추정)

  • Seo, Kang-Seok;Salces, Agapita J.;Yoon, Du-Hak;Lee, Hong-Gu;Kim, Sang-Hoon;Choi, Te-Jeong
    • Journal of Animal Science and Technology
    • /
    • v.50 no.2
    • /
    • pp.151-156
    • /
    • 2008
  • The study aimed to estimate genetic parameters of body weights in Hanwoo steers using a random regression model and to compare the results with those of a single-trait animal model. A total of 1,372 Hanwoo steers belonging to the progeny testing program of the Hanwoo Genetic Improvement, conducted at the Livestock Improvement Main Center of the National Agricultural Cooperative Federation (LIMC-NACF) in the Republic of Korea, were used. The random regression model fitted with a quadratic function gave heritability values from 0.17 to 0.30 over the whole testing period of up to 800 days. The animal model gave estimated heritability values ranging from 0.24 to 0.36. Estimates of permanent environmental correlations tended to increase with increasing days on test. By contrast, the direct genetic correlation estimate was slightly negative at the early stage, was 0.30 thereafter, and then increased toward unity at the later stage of the test. The results of the random regression model and the animal model did not differ much and followed a similar pattern; therefore, the random regression model could be implemented for the national genetic evaluation of Hanwoo.

Estimation of S&T Knowledge Production Function Using Principal Component Regression Model (주성분 회귀모형을 이용한 과학기술 지식생산함수 추정)

  • Park, Su-Dong;Sung, Oong-Hyun
    • Journal of Korea Technology Innovation Society
    • /
    • v.13 no.2
    • /
    • pp.231-251
    • /
    • 2010
  • The numbers of SCI papers or patents in science and technology are expected to be related to the number of researchers and the knowledge stock (R&D stock, paper stock, patent stock). The results of the regression model showed that severe multicollinearity existed and that errors were made in the estimation and testing of the regression coefficients. To solve the multicollinearity problem and properly estimate the effects of the independent variables, a principal component regression model was applied to three cases of S&T knowledge production. The estimated principal component regression function was transformed back into the original independent variables so that its effects could be interpreted properly. The analysis indicated that the principal component regression model was useful for estimating the effects of highly correlated production factors and showed that the number of researchers, R&D stock, and paper or patent stock all had positive effects on the production of papers or patents.

Characteristics of Median Frequency According to the Load During Fatiguing Isometric Exercise (등척성 운동시 운동강도에 따른 중앙주파수의 특성)

  • Lee, Su-Young;Shin, Hwa-Kyung;Cho, Sang-Hyun
    • Physical Therapy Korea
    • /
    • v.10 no.3
    • /
    • pp.141-149
    • /
    • 2003
  • Median frequency (MDF) can be regarded as a valid indicator of local muscle fatigue. As local muscle fatigue develops, the muscle fiber conduction velocity decreases, the fast twitch fibers are recruited less, and consequently the median frequency shifts toward the lower frequency area. The aim of this study was to test the characteristics of the median frequency according to exercise load (30% and 60% of MVC on the biceps brachii, 40% and 80% of MVC on the vastus lateralis) during fatiguing isometric exercise. Thirteen healthy male volunteer students of Yonsei University were recruited. After maximal voluntary isometric contraction was tested, three variables (initial median frequency, regression slope, fatigue index) were obtained from the regression line of the MDF data at each exercise load. The results showed that the regression slope and fatigue index differed significantly according to exercise load for the biceps brachii but not for the vastus lateralis. Initial MDF did not differ significantly according to exercise load in either muscle. The regression slope and fatigue index could monitor physiologic muscle change during fatiguing isometric exercise. The results showed that these two MDF variables reflect local muscle fatigue according to the exercise load.
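
The three variables taken from the regression line of the MDF data can be illustrated with an ordinary least-squares fit. This is a sketch under stated assumptions: the abstract does not define the fatigue index, so it is taken here to be the percentage drop of the fitted line from the first to the last time point, and the MDF series is invented.

```python
def fit_line(times, values):
    """Ordinary least-squares line: returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


def mdf_variables(times, mdf):
    """Initial MDF, regression slope, and an assumed fatigue index (percent drop)."""
    slope, intercept = fit_line(times, mdf)
    initial = intercept + slope * times[0]   # fitted value at the first time point
    final = intercept + slope * times[-1]    # fitted value at the last time point
    fatigue_index = 100.0 * (initial - final) / initial  # assumed definition
    return {"initial_mdf": initial, "slope": slope, "fatigue_index": fatigue_index}


# Hypothetical series: time in seconds, median frequency in Hz.
times = [0.0, 10.0, 20.0, 30.0, 40.0]
mdf = [80.0, 76.0, 72.0, 68.0, 64.0]

result = mdf_variables(times, mdf)
```

A more negative slope indicates a faster downward shift of the median frequency, which is the fatigue signature described at the start of the abstract.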

Test of Model Specification in Panel Regression Model with Two Error Components (이원오차성분을 갖는 패널회귀모형의 모형식별검정)

  • Song, Seuck-Heun;Kim, Young-Ji;Hwang, Sun-Young
    • The Korean Journal of Applied Statistics
    • /
    • v.19 no.3
    • /
    • pp.461-479
    • /
    • 2006
  • This paper derives joint and conditional Lagrange multiplier (LM) tests based on Double-Length Artificial Regression (DLR) for testing functional form and/or the presence of individual (time) effects in a panel regression model. Small sample properties of these tests are assessed by a Monte Carlo study, and comparisons are made with LM tests based on Outer Product Gradient (OPG). The results show that the proposed DLR-based LM tests have the most appropriate finite sample performance.

Evolutionary Nonlinear Regression Based Compensation Technique for Short-range Prediction of Wind Speed using Automatic Weather Station (AWS 지점별 기상데이타를 이용한 진화적 회귀분석 기반의 단기 풍속 예보 보정 기법)

  • Hyeon, Byeongyong;Lee, Yonghee;Seo, Kisung
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.64 no.1
    • /
    • pp.107-112
    • /
    • 2015
  • This paper introduces an evolutionary nonlinear regression based compensation technique for the short-range prediction of wind speed using AWS (Automatic Weather Station) data. An efficient MOS (Model Output Statistics) is necessary to correct the systematic errors of the model, but a linear regression based MOS struggles to handle the irregular nature of weather prediction. To solve this problem, a nonlinear symbolic regression method using GP (Genetic Programming) is proposed for developing MOS wind forecast guidance. FCM (Fuzzy C-Means) clustering is also adopted to mitigate bias in the wind speed data. The purpose of this study is to evaluate the accuracy of estimation by a GP-based nonlinear MOS for 3-day prediction of wind speed in South Korean regions. The method is compared with the UM model and shows superior results. Data from 2007-2009 and 2011 are used for training, and data from 2012 are used for testing.

Goodness-of-fit tests for a proportional odds model

  • Lee, Hyun Yung
    • Journal of the Korean Data and Information Science Society
    • /
    • v.24 no.6
    • /
    • pp.1465-1475
    • /
    • 2013
  • The chi-square type test statistic is the most commonly used statistic for testing goodness-of-fit of a multinomial logistic regression model, in which grouped (binomial) data and ungrouped (binary) data are classified by covariate pattern. The chi-square type statistic is not a satisfactory gauge, however, because the ungrouped Pearson chi-square statistic does not adhere well to the chi-square distribution and is also not a satisfactory form of measurement in itself. Currently, goodness-of-fit in the ordinal setting is often assessed using the Pearson chi-square statistic and deviance tests. These tests involve creating a contingency table in which rows consist of all possible cross-classifications of the model covariates, and columns consist of the levels of the ordinal response. I examined goodness-of-fit tests for a proportional odds logistic regression model, the most commonly used regression model for an ordinal response variable. Using a simulation study, I investigated the distribution and power properties of this test and compared these with those of three other goodness-of-fit tests. The new test had lower power than the existing tests; however, it was able to detect a greater number of the different types of lack of fit considered in this study. I illustrated the ability of the tests to detect lack of fit using a study of aftercare decisions for psychiatrically hospitalized adolescents.