• Title/Abstract/Keywords: Regression Testing

Search results: 690 items; processing time: 0.027 seconds

A Study on Diabetes Management System Based on Logistic Regression and Random Forest

  • ByungJoo Kim
    • International Journal of Advanced Smart Convergence / Vol. 13, No. 2 / pp. 61-68 / 2024
  • In the quest for advancing diabetes diagnosis, this study introduces a novel two-step machine learning approach that combines the probabilistic predictions of Logistic Regression with the classification strength of Random Forest. Diabetes, a pervasive chronic disease impacting millions globally, necessitates precise and early detection to mitigate long-term complications. Traditional diagnostic methods, while effective, often entail invasive testing and may not fully leverage the patterns hidden in patient data. Addressing this gap, our research uses Logistic Regression to estimate the likelihood of diabetes presence, then employs Random Forest to classify individuals into diabetic, pre-diabetic, or nondiabetic categories based on the computed probabilities. This methodology not only capitalizes on the strengths of both algorithms (Logistic Regression's proficiency in estimating nuanced probabilities and Random Forest's robustness in classification) but also introduces a refined mechanism to enhance diagnostic accuracy. Through the application of this model to a comprehensive diabetes dataset, we demonstrate a marked improvement in diagnostic precision, as evidenced by superior performance metrics when compared to other machine learning approaches. Our findings underscore the potential of integrating diverse machine learning models to improve clinical decision-making processes, offering a promising avenue for the early and accurate diagnosis of diabetes and potentially other complex diseases.
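
The two-step flow described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the feature names, the toy data, the choice of "diabetic vs. everyone else" as the stage-1 target, and all hyperparameters are assumptions. Stage 2 is a small bagged ensemble of shallow trees on the single stage-1 probability; with only one input feature, a Random Forest's per-split feature subsampling has nothing to choose from, so plain bagging stands in for it here.

```python
import math
import random
from collections import Counter

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    # w[0] is the intercept; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Stage 1: logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, target in zip(X, y):
            err = predict_proba(w, x) - target
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(points, depth):
    """Grow a shallow tree on (probability, label) pairs; leaves are labels."""
    labels = [lab for _, lab in points]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    values = sorted({p for p, _ in points})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [pt for pt in points if pt[0] <= t]
        right = [pt for pt in points if pt[0] > t]
        if not left or not right:
            continue
        score = (len(left) * gini([lab for _, lab in left])
                 + len(right) * gini([lab for _, lab in right]))
        if best is None or score < best[0]:
            best = (score, t, left, right)
    if best is None:
        return majority
    _, t, left, right = best
    return (t, build_tree(left, depth - 1), build_tree(right, depth - 1))

def tree_predict(tree, p):
    # Internal nodes are (threshold, left, right) tuples; leaves are label strings.
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if p <= t else right
    return tree

def fit_forest(points, n_trees=25, depth=2, seed=0):
    """Stage 2: bagged trees, each grown on a bootstrap resample."""
    rng = random.Random(seed)
    return [build_tree([rng.choice(points) for _ in points], depth)
            for _ in range(n_trees)]

def forest_predict(forest, p):
    votes = Counter(tree_predict(tree, p) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical, rescaled features (glucose, BMI) and illustrative labels.
X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.30), (0.45, 0.50), (0.50, 0.40),
     (0.55, 0.55), (0.85, 0.80), (0.90, 0.70), (0.95, 0.90)]
labels = ["nondiabetic"] * 3 + ["pre-diabetic"] * 3 + ["diabetic"] * 3

w = fit_logistic(X, [1 if lab == "diabetic" else 0 for lab in labels])
probs = [predict_proba(w, x) for x in X]
forest = fit_forest(list(zip(probs, labels)))
predictions = [forest_predict(forest, p) for p in probs]
```

In practice the second stage would be trained and evaluated on held-out data; the sketch predicts on its own training rows only to keep the example short.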

Predicting strength development of RMSM using ultrasonic pulse velocity and artificial neural network

  • Sheen, Nain Y.;Huang, Jeng L.;Le, Hien D.
    • Computers and Concrete / Vol. 12, No. 6 / pp. 785-802 / 2013
  • Ready-mixed soil material, a kind of controlled low-strength material, is a new way of combining soil and cement. It can be used as backfill material. In this paper, artificial neural network and nonlinear regression approaches were applied to predict the compressive strength of ready-mixed soil material containing Portland cement, slag, sand, and soil. The data used for the analysis were obtained from our testing program. In the experiment, we carried out a mix design with three proportions of sand to soil (6:4, 5:5, and 4:6). In addition, blast furnace slag partially replaced cement to improve workability, whereas the water-to-binder ratio was fixed. Testing was conducted on samples to estimate their engineering properties as per ASTM, such as flowability, strength, and pulse velocity. Based on the testing data, an empirical pulse velocity-strength correlation was established by regression. Next, three neural network topologies were developed to predict the strength, namely ANN-I, ANN-II, and ANN-III. The first two models are back-propagation feed-forward networks, and the third is a radial basis neural network. The results show that the compressive strength of ready-mixed soil material can be predicted well by neural networks. Among the proposed neural network models, ANN-I gives the best prediction because it is closest to the actual strength. Moreover, by combining pulse velocity with other factors, viz. curing time and material contents in the mixture, the proposed neural networks offer better evaluation than interpolation from pulse velocity alone.
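
As a rough illustration of an empirical pulse velocity-strength correlation, the sketch below fits an exponential curve by ordinary least squares on the logarithm of strength. The exponential form is an assumption (the abstract does not state the functional form the authors used), and the measurements are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_exponential(velocity, strength):
    """Fit strength = a * exp(b * velocity) via a log-linear regression."""
    b, ln_a = fit_line(velocity, [math.log(s) for s in strength])
    return math.exp(ln_a), b

def predict_strength(a, b, velocity):
    return a * math.exp(b * velocity)

# Hypothetical measurements: pulse velocity (km/s) and compressive strength (MPa).
velocity_km_s = [1.2, 1.5, 1.8, 2.1, 2.4]
strength_mpa = [0.30, 0.52, 0.85, 1.30, 2.10]

a, b = fit_exponential(velocity_km_s, strength_mpa)
estimate = predict_strength(a, b, 2.0)  # strength estimate at 2.0 km/s
```

Fitting in log space weights relative rather than absolute errors; a direct nonlinear least-squares fit would give slightly different coefficients.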

A Longitudinal Study on the Causal Association Between Smoking and Depression

  • Kang, Eun-Jeong;Lee, Jae-Hee
    • Journal of Preventive Medicine and Public Health / Vol. 43, No. 3 / pp. 193-204 / 2010
  • Objectives: The objective of this study was to analyze the causal relationship between smoking and depression using longitudinal data. Methods: Two waves of the Korea Welfare Panel collected in 2006 and 2007 were used. The sample consisted of 14,426 people in 2006 and 13,052 in 2007 who were aged 20 and older. Smoking was measured by smoking amount (none/≥ two packs). Depression was defined as a summed CES-D (Center for Epidemiological Studies Depression)-11 score of 16 or higher. The causal relationship between smoking and depression was tested using logistic regression. To test the causal effect of smoking on depression, depression at year 2 was regressed on smoking status at year 1 using only the sample without depression at year 1. Likewise, to test the causal effect of depression on smoking, smoking status at year 2 was regressed on depression at year 1 using only those who were not smoking at year 1. The statistical package used was Stata 10.0. Sampling weights were applied to obtain population estimates. Results: The logistic regression testing for the causal relationship between smoking and depression showed that smoking at year 1 was significantly related to depression at year 2. The smoking amounts associated with depression differed among age groups. On the other hand, the logistic regression testing the opposite direction of the relationship found no significant association regardless of age group. Conclusions: The study results showed some evidence that smoking caused depression but not the other way around.
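
The lagged design in the Methods can be sketched as below: restrict to people not depressed at year 1, then relate year-1 smoking to year-2 depression. The sketch is a simplification under stated assumptions: it uses a single binary smoking indicator, no covariates, and no sampling weights, in which case the logistic regression odds ratio equals the cross-product ratio of the 2x2 table. The record field names and all data are hypothetical.

```python
CESD_CUTOFF = 16  # depression threshold stated in the abstract

def is_depressed(cesd_score):
    return cesd_score >= CESD_CUTOFF

def lagged_odds_ratio(records):
    """Odds ratio of year-2 depression for year-1 smokers vs. non-smokers,
    among people who were not depressed at year 1."""
    at_risk = [r for r in records if not is_depressed(r["cesd_y1"])]
    counts = {(s, d): 0 for s in (True, False) for d in (True, False)}
    for r in at_risk:
        counts[(r["smoker_y1"], is_depressed(r["cesd_y2"]))] += 1
    a, b = counts[(True, True)], counts[(True, False)]
    c, d = counts[(False, True)], counts[(False, False)]
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: empty cell in the 2x2 table")
    return (a * d) / (b * c)

def make_record(smoker_y1, cesd_y1, cesd_y2):
    return {"smoker_y1": smoker_y1, "cesd_y1": cesd_y1, "cesd_y2": cesd_y2}

# Hypothetical panel records (smoking status at year 1, CES-D at years 1 and 2).
records = (
    [make_record(True, 5, s) for s in (20, 18, 16, 5, 9)]      # smokers
    + [make_record(False, 4, s) for s in (17, 3, 4, 8, 10)]    # non-smokers
    + [make_record(True, 16, 22), make_record(False, 25, 6)]   # excluded: depressed at year 1
)

odds_ratio = lagged_odds_ratio(records)  # (3 * 4) / (2 * 1) = 6.0 for this toy table
```

The reverse direction (depression at year 1 predicting smoking at year 2) follows the same pattern with the roles of the two variables swapped and the sample restricted to year-1 non-smokers.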

Assessment of compressive strength of high-performance concrete using soft computing approaches

  • Chukwuemeka Daniel;Jitendra Khatti;Kamaldeep Singh Grover
    • Computers and Concrete / Vol. 33, No. 1 / pp. 55-75 / 2024
  • The present study introduces an optimum performance soft computing model for predicting the compressive strength of high-performance concrete (HPC) by comparing, for the first time on a common database, models based on conventional (kernel-based, covariance function-based, and tree-based), advanced machine (least square support vector machine, LSSVM; minimax probability machine regressor, MPMR), and deep (artificial neural network, ANN) learning approaches. A compressive strength database containing results for 1030 concrete samples has been compiled from the literature and preprocessed. For training, testing, and validation of the soft computing models, 803, 101, and 101 data points, respectively, have been selected arbitrarily from the 1005 preprocessed data points. Thirteen performance metrics, including three new metrics (a20-index, index of agreement, and index of scatter), have been implemented for each model. The performance comparison reveals that the SVM (kernel-based), ET (tree-based), MPMR (advanced), and ANN (deep) models have achieved higher performance in predicting the compressive strength of HPC. From the overall analysis of performance, accuracy, Taylor plot, accuracy metric, regression error characteristics curve, Anderson-Darling, Wilcoxon, uncertainty, and reliability, model CS4, based on the ensemble tree, has been recognized as the optimum performance model, with a correlation coefficient of 0.9352, root mean square error of 5.76 MPa, and mean absolute error of 4.1069 MPa. The present study also reveals that multicollinearity affects the prediction accuracy of Gaussian process regression, decision tree, multilinear regression, and adaptive boosting regressor models, a novel finding in compressive strength prediction of HPC. The cosine sensitivity analysis reveals that the prediction of compressive strength of HPC is highly affected by cement content, fine aggregate, coarse aggregate, and water content.
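
Several of the metrics named in this abstract can be computed as in the sketch below. The formulas are commonly used definitions and are assumptions here, since the abstract does not spell them out: the a20-index as the share of samples whose observed/predicted ratio lies in [0.8, 1.2], the index of agreement in Willmott's form, and the index of scatter as RMSE divided by the mean observed value. The sample values are hypothetical.

```python
import math

def regression_metrics(observed, predicted):
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    squared_error = sum(e * e for e in errors)
    rmse = math.sqrt(squared_error / n)
    mae = sum(abs(e) for e in errors) / n

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)

    # a20-index: fraction of samples within +/-20 % in ratio terms.
    a20 = sum(1 for o, p in zip(observed, predicted) if 0.8 <= o / p <= 1.2) / n

    # Index of agreement (Willmott's d), bounded between 0 and 1.
    potential_error = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
                          for o, p in zip(observed, predicted))
    ioa = 1.0 - squared_error / potential_error

    # Index of scatter: RMSE normalised by the mean observed value.
    ios = rmse / mean_o

    return {"r": r, "rmse": rmse, "mae": mae, "a20": a20, "ioa": ioa, "ios": ios}

# Hypothetical compressive strengths in MPa.
observed_mpa = [10.0, 20.0, 30.0, 40.0]
predicted_mpa = [12.0, 18.0, 33.0, 55.0]
metrics = regression_metrics(observed_mpa, predicted_mpa)
```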

임의회귀모형을 이용한 한우 거세우 체중의 유전모수 추정 (Estimation of Genetic Parameters of Body Weights in Hanwoo Steers(Korean Cattle), Bos Taurus Coreanae Using Random Regression Model)

  • 서강석;;윤두학;이홍구;김상훈;최태정
    • Journal of Animal Science and Technology / Vol. 50, No. 2 / pp. 151-156 / 2008
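  • This study estimated genetic parameters for the body weight of Hanwoo steers using a random regression model and compared them with results from a single-trait animal model. The data comprised body weight records of 1,372 Hanwoo steers from the Hanwoo progeny test conducted by the livestock improvement center of the National Agricultural Cooperative Federation (농협중앙회 가축개량사업소). With a second-order random regression model, heritability estimates ranged from 0.17 to 0.30 over the test period up to 800 days of age. Heritability estimates from the animal model ranged from 0.24 to 0.36. Correlations of permanent environmental effects between measurement days tended to increase with test age. In contrast, genetic correlations between measurement days showed a weak negative correlation of about 0.30 early in the test, but increased gradually as the test progressed and were nearly constant by the end of the test. The random regression model and the animal model showed similar trends, with no large differences between their results. The random regression model therefore appears feasible for use in the national genetic evaluation of Hanwoo.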

주성분 회귀모형을 이용한 과학기술 지식생산함수 추정 (Estimation of S&T Knowledge Production Function Using Principal Component Regression Model)

  • 박수동;성웅현
    • 기술혁신학회지 (Journal of Korea Technology Innovation Society) / Vol. 13, No. 2 / pp. 231-251 / 2010
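  • Many factors affect the production of SCI papers and patents, the representative outputs of science and technology (S&T) R&D activity: research funding, number of researchers, knowledge stock (R&D stock, paper stock, patent stock, etc.), research environment, degree of openness, human capital, GDP, and others. When an ordinary regression model is used to estimate the factors affecting paper or patent production, multicollinearity among the production factors causes estimation errors. This paper uses a principal component regression model to resolve the multicollinearity among the factors affecting S&T knowledge production. Factors affecting scientific output (with SCI papers as the output) and technological output (with patents as the output) were compared across three cases using ordinary regression and principal component regression. With ordinary regression, multicollinearity among the factors was very high, which led to errors in the estimation and testing of the regression coefficients. With principal component regression, by contrast, the multicollinearity problem was resolved and the effect of each production factor could be estimated appropriately. The proposed method for estimating the S&T knowledge production function with principal component regression should be useful in regression analyses that include a small number of strongly collinear production factors.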
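
A minimal sketch of principal component regression on strongly collinear inputs follows. It is illustrative only: it keeps just the first principal component (found by power iteration), the predictor names mirror factors listed in the abstract, and every number is hypothetical. The paper's actual component selection and data are not given in the abstract.

```python
import math

def standardize(column):
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def first_principal_component(rows, iterations=200):
    """Leading eigenvector of Z'Z by power iteration (rows of Z are standardized)."""
    p = len(rows[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        scores = [sum(z * vj for z, vj in zip(row, v)) for row in rows]
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(p)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        v = [wj / norm for wj in w]
    return v

def principal_component_regression(columns, y):
    """Regress y on the first principal component score, then map the slope
    back to coefficients on the standardized predictors."""
    rows = list(zip(*[standardize(c) for c in columns]))
    loadings = first_principal_component(rows)
    scores = [sum(z * vj for z, vj in zip(row, loadings)) for row in rows]
    mean_y = sum(y) / len(y)
    # Scores have mean zero, so the OLS slope needs no score centering.
    gamma = sum(s * (yi - mean_y) for s, yi in zip(scores, y)) / sum(s * s for s in scores)
    return loadings, [gamma * vj for vj in loadings]

# Hypothetical yearly series; all three predictors move together.
funding = [10, 12, 15, 19, 24, 30]
researchers = [100, 118, 155, 185, 245, 298]
knowledge_stock = [50, 61, 74, 96, 119, 151]
sci_papers = [200, 236, 305, 372, 488, 598]

loadings, coefficients = principal_component_regression(
    [funding, researchers, knowledge_stock], sci_papers)
```

Because the component score is a single well-conditioned regressor, the implied per-factor coefficients stay stable even though the raw predictors are nearly collinear; the price is that they are constrained to the direction of the retained component.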

등척성 운동시 운동강도에 따른 중앙주파수의 특성 (Characteristics of Median Frequency According to the Load During Fatiguing Isometric Exercise)

  • 이수영;신화경;조상현
    • 한국전문물리치료학회지 (Physical Therapy Korea) / Vol. 10, No. 3 / pp. 141-149 / 2003
  • Median frequency (MDF) can be regarded as a valid indicator of local muscle fatigue. As local muscle fatigue develops, the muscle fiber conduction velocity decreases, the fast twitch fibers are recruited less, and consequently the median frequency shifts toward lower frequencies. The aim of this study was to test the characteristics of the median frequency according to exercise load (30% and 60% of MVC on the biceps brachii, 40% and 80% of MVC on the vastus lateralis) during fatiguing isometric exercise. Thirteen healthy male volunteer students of Yonsei University were recruited. After testing the maximal voluntary isometric contraction, three variables (initial median frequency, regression slope, fatigue index) were obtained from the regression line of the MDF data at each exercise load. The results showed that the regression slope and fatigue index differed significantly with exercise load for the biceps brachii, but not for the vastus lateralis. The initial MDF did not differ significantly with exercise load in either muscle. The regression slope and fatigue index could monitor physiologic muscle change during fatiguing isometric exercise. The results showed that two MDF variables reflect local muscle fatigue according to the exercise load.
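
The regression-line variables mentioned in this abstract can be illustrated with the sketch below. It computes a median frequency from a discrete power spectrum (the first bin at which cumulative power reaches half the total) and then fits a straight line to MDF over time, taking the intercept as the initial median frequency and the slope as the regression slope. The bin-based median definition and all data are assumptions; the fatigue index is omitted because the abstract does not define it.

```python
def median_frequency(freqs_hz, power):
    """First frequency bin at which cumulative power reaches half the total."""
    half_total = sum(power) / 2.0
    cumulative = 0.0
    for f, p in zip(freqs_hz, power):
        cumulative += p
        if cumulative >= half_total:
            return f
    raise ValueError("empty spectrum")

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical MDF values (Hz) from successive EMG windows of a sustained contraction.
times_s = [0, 10, 20, 30]
mdf_hz = [80.0, 78.0, 76.0, 74.0]

regression_slope, initial_mdf = fit_line(times_s, mdf_hz)
```

A more negative slope indicates a faster downward shift of the spectrum, which is how the slope serves as a fatigue indicator across exercise loads.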

이원오차성분을 갖는 패널회귀모형의 모형식별검정 (Test of Model Specification in Panel Regression Model with Two Error Components)

  • 송석헌;김영지;황선영
    • 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 19, No. 3 / pp. 461-479 / 2006
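  • This paper derives LM test statistics for model specification in a panel regression model with two error components and computes them using the double-length artificial regression (DLR). In simulations with small samples, the LM statistic based on the outer-product-of-gradient (OPG) tended to over-reject relative to the nominal significance level, whereas the DLR-based LM statistic held the nominal significance level well and also showed high power.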

AWS 지점별 기상데이타를 이용한 진화적 회귀분석 기반의 단기 풍속 예보 보정 기법 (Evolutionary Nonlinear Regression Based Compensation Technique for Short-range Prediction of Wind Speed using Automatic Weather Station)

  • 현병용;이용희;서기성
    • 전기학회논문지 (The Transactions of the Korean Institute of Electrical Engineers) / Vol. 64, No. 1 / pp. 107-112 / 2015
  • This paper introduces an evolutionary nonlinear regression based compensation technique for short-range prediction of wind speed using AWS (Automatic Weather Station) data. An efficient MOS (Model Output Statistics) is needed to correct systematic errors of the forecast model, but a linear regression based MOS has difficulty handling the irregular nature of weather prediction. To solve this problem, a nonlinear symbolic regression method using GP (Genetic Programming) is proposed for developing MOS wind forecast guidance. FCM (Fuzzy C-Means) clustering is also adopted to mitigate bias in the wind speed data. The purpose of this study is to evaluate the accuracy of estimation by a GP-based nonlinear MOS for 3-day prediction of wind speed in South Korean regions. The method is compared with the UM model and shows superior results. Data from 2007-2009 and 2011 are used for training, and data from 2012 for testing.

Goodness-of-fit tests for a proportional odds model

  • Lee, Hyun Yung
    • Journal of the Korean Data and Information Science Society / Vol. 24, No. 6 / pp. 1465-1475 / 2013
  • The chi-square type test statistic is the most commonly used test for assessing goodness-of-fit of a multinomial logistic regression model, whose data may be grouped (binomial) or ungrouped (binary) when classified by covariate pattern. The chi-square type statistic is not a satisfactory gauge, however, because the ungrouped Pearson chi-square statistic does not adhere well to the chi-square distribution and is also not a satisfactory form of measurement in itself. Currently, goodness-of-fit in the ordinal setting is often assessed using the Pearson chi-square statistic and deviance tests. These tests involve creating a contingency table in which rows consist of all possible cross-classifications of the model covariates, and columns consist of the levels of the ordinal response. I examined goodness-of-fit tests for the proportional odds logistic regression model, the most commonly used regression model for an ordinal response variable. Using a simulation study, I investigated the distribution and power properties of this test and compared them with those of three other goodness-of-fit tests. The new test had lower power than the existing tests; however, it was able to detect a greater number of the different types of lack of fit considered in this study. I illustrated the ability of the tests to detect lack of fit using a study of aftercare decisions for psychiatrically hospitalized adolescents.
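
The contingency-table approach described in this abstract can be sketched as follows: compute the category probabilities implied by a proportional odds model for each covariate pattern, turn them into expected counts, and compare them with observed counts through the Pearson chi-square statistic. The parameterization logit P(Y <= j | x) = theta_j - beta'x is one common convention and is an assumption here, as are the parameter values and the observed counts. The sketch shows only the generic Pearson statistic, not the new test studied in the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(thresholds, beta, x):
    """Category probabilities under logit P(Y <= j | x) = theta_j - beta'x.
    thresholds must be strictly increasing; K thresholds give K + 1 categories."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    return [cumulative[0]] + [cumulative[j] - cumulative[j - 1]
                              for j in range(1, len(cumulative))]

def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells of the pattern-by-category table."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Hypothetical fitted parameters and a single binary covariate.
thresholds = [-1.0, 1.0]
beta = [0.8]
patterns = [[0.0], [1.0]]

# Hypothetical observed counts: one row per covariate pattern, one column per level.
observed = [[30, 45, 25], [15, 40, 45]]
expected = [[sum(row) * p for p in category_probabilities(thresholds, beta, x)]
            for row, x in zip(observed, patterns)]

chi_square = pearson_chi_square(observed, expected)
```

With continuous covariates almost every subject forms its own pattern, so the expected counts become tiny; that sparsity is the reason the ungrouped statistic behaves poorly, as the abstract notes.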