• Title/Summary/Keyword: Predictive Validation

Exploring Machine Learning Classifiers for Breast Cancer Classification

  • Inayatul Haq;Tehseen Mazhar;Hinna Hafeez;Najib Ullah;Fatma Mallek;Habib Hamam
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.4 / pp.860-880 / 2024
  • Breast cancer is a major health concern affecting women and men globally. Early detection and accurate classification of breast cancer are vital for effective treatment and patient survival. This study addresses the challenge of accurately classifying breast tumors using machine learning classifiers such as MLP, AdaBoostM1, LogitBoost, Bayes Net, and the J48 decision tree. The research uses a dataset publicly available on GitHub to assess the classifiers' performance in differentiating between the occurrence and non-occurrence of breast cancer. The study compares the effectiveness of 10-fold and 5-fold cross-validation, showing that 10-fold cross-validation provides superior results. It also examines the impact of varying split percentages, with a 66% split yielding the best performance. This shows the importance of selecting appropriate validation techniques for machine learning-based breast tumor classification. The results also indicate that the J48 decision tree is the most accurate classifier, providing valuable insights for developing predictive models for cancer diagnosis and advancing computational medical research.
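The difference between 5-fold and 10-fold cross-validation mentioned in this abstract can be illustrated with a minimal sketch. The helper name `k_fold_indices` and the sample count of 100 are hypothetical; this is not the paper's pipeline, only a picture of how the folds partition a data set.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Hypothetical data set of 100 samples: compare train/test sizes per fold.
for k in (5, 10):
    train_idx, test_idx = next(k_fold_indices(100, k))
    print(k, len(train_idx), len(test_idx))
```

With 10 folds each model is trained on 90% of the samples rather than 80%, which is one commonly cited reason the two schemes can give different estimates. In practice the samples would usually be shuffled (and often stratified by class) before splitting.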

Prediction of concrete compressive strength using non-destructive test results

  • Erdal, Hamit;Erdal, Mursel;Simsek, Osman;Erdal, Halil Ibrahim
    • Computers and Concrete / v.21 no.4 / pp.407-417 / 2018
  • Concrete, a composite material, is one of the most important construction materials. Compressive strength is a commonly used parameter for assessing concrete quality, so its accurate prediction is an important issue. In this study, we used an experimental procedure to assess concrete quality. First, the concrete mix was prepared as C 20 type concrete, and the slump of the fresh concrete was about 20 cm. After the fresh concrete was placed in formwork, it was compacted using a vibrating screed. After a 28-day period, a total of 100 core samples of 75 mm diameter were extracted. Pulse velocity tests and compressive strength tests were performed on the core samples; Windsor probe penetration tests and Schmidt hammer tests were also performed. After the data set was assembled, twelve artificial intelligence (AI) models were compared for predicting concrete compressive strength. These models fall into three categories: (i) Functions (i.e., Linear Regression, Simple Linear Regression, Multilayer Perceptron, Support Vector Regression), (ii) Lazy-Learning Algorithms (i.e., IBk Linear NN Search, KStar, Locally Weighted Learning), and (iii) Tree-Based Learning Algorithms (i.e., Decision Stump, Model Trees Regression, Random Forest, Random Tree, Reduced Error Pruning Tree). Four evaluation processes and four validation schemes (i.e., 10-fold cross-validation, 5-fold cross-validation, 10% split-sample validation, and 20% split-sample validation) are used to examine the performance of the predictive models. This study shows that machine learning regression techniques are promising tools for predicting the compressive strength of concrete.

A Study on Time Series Cross-Validation Techniques for Enhancing the Accuracy of Reservoir Water Level Prediction Using Automated Machine Learning TPOT (자동기계학습 TPOT 기반 저수위 예측 정확도 향상을 위한 시계열 교차검증 기법 연구)

  • Bae, Joo-Hyun;Park, Woon-Ji;Lee, Seoro;Park, Tae-Seon;Park, Sang-Bin;Kim, Jonggun;Lim, Kyoung-Jae
    • Journal of The Korean Society of Agricultural Engineers / v.66 no.1 / pp.1-13 / 2024
  • This study assessed the efficacy of improving the accuracy of reservoir water level prediction models by employing automated machine learning models and efficient cross-validation methods for time-series data. Considering the inherent complexity and non-linearity of time-series data related to reservoir water levels, we proposed an optimized approach for model selection and training. The performance of twelve models was evaluated for the Obong Reservoir in Gangneung, Gangwon Province, using the TPOT (Tree-based Pipeline Optimization Tool) and four cross-validation methods, which led to the determination of the optimal pipeline model. The pipeline model consisting of Extra Tree, Stacking Ridge Regression, and Simple Ridge Regression showed outstanding predictive performance for both training and test data, with an R² (coefficient of determination) and NSE (Nash-Sutcliffe Efficiency) exceeding 0.93. On the other hand, for predictions of water levels 12 hours later, the pipeline model selected through time-series split cross-validation accurately captured the change pattern of time-series water level data during the test period, with an NSE exceeding 0.99. The methodology proposed in this study is expected to greatly contribute to the efficient generation of reservoir water level predictions in regions with high rainfall variability.
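Two ideas from this abstract, the Nash-Sutcliffe Efficiency and time-series split cross-validation, can be sketched in a few lines. The function names are hypothetical, and the expanding-window scheme below is a simplified variant, not necessarily the one used by the authors or by any particular library.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


def time_series_splits(n_samples, n_splits):
    """Expanding-window splits: every test block lies after its training data in time."""
    fold = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = i * fold
        # The last test block absorbs any leftover samples.
        test_end = train_end + fold if i < n_splits else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))


# Hypothetical series of 10 time steps split four ways.
for train_idx, test_idx in time_series_splits(10, 4):
    print(train_idx, test_idx)
```

Unlike ordinary k-fold splitting, no test index ever precedes a training index, so the model is never evaluated on data older than what it was trained on. An NSE of 1 means a perfect match, and an NSE of 0 means the prediction is no better than the mean of the observations.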

Predictive Growth Models of Bacillus cereus on Dried Laver Pyropia pseudolinearis as Function of Storage Temperature (저장온도에 따른 마른김(Pyropia pseudolinearis)의 Bacillus cereus 성장예측모델 개발)

  • Choi, Man-Seok;Kim, Ji Yoon;Jeon, Eun Bi;Park, Shin Young
    • Korean Journal of Fisheries and Aquatic Sciences / v.53 no.5 / pp.699-706 / 2020
  • Predictive models in food microbiology are used for predicting microbial growth or death rates using mathematical and statistical tools considering the intrinsic and extrinsic factors of food. This study developed predictive growth models for Bacillus cereus on dried laver Pyropia pseudolinearis stored at different temperatures (5, 10, 15, 20, and 25℃). Primary models developed for specific growth rate (SGR), lag time (LT), and maximum population density (MPD) indicated a good fit (R²≥0.98) with the Gompertz equation. The SGR values were 0.03, 0.08, and 0.12, and the LT values were 12.64, 4.01, and 2.17 h, at the storage temperatures of 15, 20, and 25℃, respectively. Secondary models for the same parameters were determined via nonlinear regression as follows: SGR=0.0228-0.0069*T1+0.0005*T1²; LT=113.0685-9.6256*T1+0.2079*T1²; MPD=1.6630+0.4284*T1-0.0080*T1² (where T1 is the storage temperature). The appropriateness of the secondary models was validated using statistical indices, such as mean squared error (MSE<0.01), bias factor (0.99≤Bf≤1.07), and accuracy factor (1.01≤Af≤1.14). External validation was performed at three random temperatures, and the results were consistent with each other. Thus, these models may be useful for predicting the growth of B. cereus on dried laver.
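The secondary models quoted in this abstract can be evaluated directly. The sketch below reads the last term of each equation as the square of the storage temperature T1, and it adds bias-factor and accuracy-factor helpers using the common log-ratio definitions; those definitions and all function names are assumptions, since the abstract does not spell them out.

```python
import math


def sgr(t):
    """Specific growth rate from the quoted secondary model (t in degrees C)."""
    return 0.0228 - 0.0069 * t + 0.0005 * t ** 2


def lag_time(t):
    """Lag time in hours from the quoted secondary model."""
    return 113.0685 - 9.6256 * t + 0.2079 * t ** 2


def mpd(t):
    """Maximum population density from the quoted secondary model."""
    return 1.6630 + 0.4284 * t - 0.0080 * t ** 2


def bias_factor(predicted, observed):
    """Assumed definition: 10 ** mean(log10(predicted / observed))."""
    logs = [math.log10(p / o) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))


def accuracy_factor(predicted, observed):
    """Assumed definition: 10 ** mean(|log10(predicted / observed)|)."""
    logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))


for temp in (15, 20, 25):
    print(temp, round(sgr(temp), 4), round(lag_time(temp), 2), round(mpd(temp), 2))
```

Under these definitions a bias factor of 1 means no systematic over- or under-prediction, and an accuracy factor of 1 means perfect agreement, so values close to 1 such as those reported in the abstract indicate a good fit.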

Development of Hypertension Predictive Model (고혈압 발생 예측 모형 개발)

  • Yong, Wang-Sik;Park, Il-Su;Kang, Sung-Hong;Kim, Won-Joong;Kim, Kong-Hyun;Kim, Kwang-Kee;Park, No-Yai
    • Korean Journal of Health Education and Promotion / v.23 no.4 / pp.13-28 / 2006
  • Objectives: This study applied knowledge discovery and data mining algorithms to develop a hypertension predictive model for hypertension management, using the Korea National Health Insurance Corporation database (the insureds' screening and health care benefit data). Methods: The predictive power of the data mining algorithms was validated by comparing the performance of logistic regression, a decision tree, and an ensemble technique. On the basis of internal and external validation, logistic regression performed best among the three techniques. Results: The logistic regression analysis suggested that the probability of hypertension was lower for females (compared with males) (OR=0.834); higher for persons aged 60 or above (compared with those below 40) (OR=4.628); higher for obese persons (compared with normal persons) (OR=2.103); higher for persons with a high level of glucose (compared with normal persons) (OR=1.086); higher for persons with a family history of hypertension (compared with persons without one) (OR=1.512); and higher for persons who periodically drank alcohol (compared with persons who did not) (OR=1.037~1.291). Conclusions: This study identified several factors affecting the onset of hypertension using screening data. The results are expected to contribute to building a national Hypertension Management System in the near future by providing representative findings on the onset and care of hypertension.
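As a rough illustration of how odds ratios from a logistic regression translate into probabilities, the sketch below scales a baseline odds by each odds ratio. The baseline probability of 0.10 is purely hypothetical, the function name is invented for this example, and multiplying odds ratios assumes they are adjusted effects from one model with no interaction terms, which the abstract does not state.

```python
def probability_from_odds_ratios(baseline_prob, odds_ratios):
    """Scale baseline odds by each odds ratio and convert back to a probability."""
    odds = baseline_prob / (1.0 - baseline_prob)
    for ratio in odds_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical baseline risk of 0.10, combined with two odds ratios quoted
# in the abstract: age 60 or above (4.628) and obesity (2.103).
baseline = 0.10
print(round(probability_from_odds_ratios(baseline, [4.628]), 3))
print(round(probability_from_odds_ratios(baseline, [4.628, 2.103]), 3))
```

An odds ratio multiplies the odds, not the probability, so an OR of 4.628 does not mean the probability itself is 4.628 times higher.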

The Study on the Extraction of the Distribution Potential Area of Debris Landform Using Fuzzy Set and Bayesian Predictive Discriminate Model (퍼지집합과 베이지안 확률 기법을 이용한 암설사면지형 분포지역 추출에 관한 연구)

  • Wi, Nun-Sol;JANG, Dong-Ho
    • Journal of The Geomorphological Association of Korea / v.24 no.3 / pp.105-118 / 2017
  • Debris slope landforms in Korean mountains generally lie on steep slopes and are mostly covered by vegetation, which makes them difficult to investigate. A scientific method is therefore required to develop an effective field investigation plan, and the use of Remote Sensing and GIS technologies for spatial analysis is essential for this purpose. This study extracted the potential area of debris slope landform formation using the Fuzzy set and the Bayesian Predictive Discriminate Model as mathematical data integration methods. The first step was to obtain information about debris locations and their related factors. This information was verified through field investigation and then used to build a database. In the second step, a map zoning the study area by the degree of debris formation possibility was generated using the two modeling methods, and a cross-validation technique was then applied. To quantitatively analyze the accuracy of the two modeling methods, the calculated potential rate of debris formation within the study area was evaluated by plotting the SRC (Success Rate Curve) and calculating the AUC (Area Under the Curve). As a result, the prediction accuracy was 83.1% for the Fuzzy set model and 84.9% for the Bayesian Predictive Discriminate Model. This shows that the two models are accurate and reliable and can contribute to efficient field investigation and debris landform management.
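The area under a success rate curve can be approximated with the trapezoidal rule. In the sketch below the curve points are hypothetical: x is taken as the cumulative share of the study area ranked from highest to lowest predicted potential, and y as the cumulative share of known debris locations captured. The abstract does not give the authors' exact construction, so this is an assumption.

```python
def auc_trapezoid(xs, ys):
    """Area under a piecewise-linear curve given points sorted by x."""
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area


# Hypothetical success rate curve (not data from the paper).
area_share = [0.0, 0.1, 0.3, 0.6, 1.0]
captured_share = [0.0, 0.45, 0.80, 0.95, 1.0]
print(round(auc_trapezoid(area_share, captured_share), 3))
```

A curve that follows the diagonal gives an area of 0.5, which corresponds to a model that ranks locations no better than chance.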

Statistical Prediction of Used Tablet PC Transaction Price among Consumers (소비자 사이의 중고 태블릿PC 거래 가격의 통계적 예측)

  • Younghee Go;Sohyung Kim;Yujin Chung
    • Journal of Industrial Convergence / v.20 no.12 / pp.179-186 / 2022
  • This study aims to develop a predictive model to suggest a used sales price to sellers and buyers when trading used tablet PCs. For model development, we analyzed the real used tablet PC transaction data and additionally collected detailed product information. We developed several predictive models and selected the best predictive model among them. Specifically, we considered a multiple linear regression model using the used sales price as a dependent variable and other variables in the integrated data as independent variables, a multiple linear regression model including interactions, and the models from stepwise variable selection in each model. The model with the best predictive performance was finally selected through cross-validation. Through this study, we can predict the sales price of used tablet PCs and suggest appropriate used sales prices to sellers and buyers.

Prediction of Protein-Protein Interaction Sites Based on 3D Surface Patches Using SVM (SVM 모델을 이용한 3차원 패치 기반 단백질 상호작용 사이트 예측기법)

  • Park, Sung-Hee;Hansen, Bjorn
    • The KIPS Transactions:PartD / v.19D no.1 / pp.21-28 / 2012
  • Prediction of protein interaction sites for monomer structures can reduce the search space for protein docking and has been regarded as very significant for predicting unknown functions of proteins from their interacting proteins whose functions are known. On the other hand, the prediction of interaction sites has been limited by the difficulty of crystallizing weakly interacting complexes, which are transient and do not form complexes stable enough for obtaining experimental structures by crystallization or even NMR for the most important protein-protein interactions. This work reports the calculation of 3D surface patches of complex structures and their properties, and a machine learning approach that uses a support vector machine to build a predictive model for 3D surface patches in interaction and non-interaction sites. To overcome classification problems with class-imbalanced data, we employed an under-sampling technique. Nine properties of the patches were calculated from amino acid compositions and secondary structure elements. With 10-fold cross-validation, the predictive model built with SVM achieved an accuracy of 92.7% for classifying 3D patches in interaction and non-interaction sites from 147 complexes.

Validation of Adult Fall Assessment Scale Korean Version for Adult Patients in General Hospitals in Korea (한국형 낙상 위험 사정도구의 타당성 평가연구)

  • Choi, Eun Hee;Ko, Mi Suk;Lee, Shin Ae;Park, Jung Ha
    • Journal of Korean Clinical Nursing Research / v.26 no.2 / pp.265-273 / 2020
  • Purpose: The purpose of this study was to test the predictive validity of the Fall Assessment Scale-Korean version (FAS-K) and to find the most appropriate cutoff score to screen high-risk fall groups in adult patients in general hospitals in Korea. Methods: We performed a prospective evaluation study in medical and surgical ward patients at two major general hospitals in Seoul. Data were collected from Nov. 1, 2018 to Feb. 28, 2019, during which nurses performed 651 observation series. The researcher measured the fall risk assessment score by applying the FAS-K, MFS (Morse Fall Scale), and JHFRAT (Johns Hopkins Hospital Fall Risk Assessment Tool) to the patients twice a week between 10 am and 12 noon. Data were analyzed using Pearson's correlation coefficients and the sensitivity, specificity, predictive value, and area under the curve (AUC) of the three tools. Results: The FAS-K was positively correlated with the MFS (r=.70, p<.001) and the JHFRAT (r=.82, p<.001). According to the receiver operating characteristic (ROC) curve analysis of the FAS-K, sensitivity, specificity, and positive and negative predictive values were 85.3%, 49.4%, 8.5%, and 98.4%, respectively, when the FAS-K score was 4. Therefore, the cutoff score of the FAS-K to identify groups with high fall risk was 4. Conclusion: The FAS-K is a valid tool for measuring fall risk in adult inpatients. In addition, an FAS-K score of 4 can be used to identify high-risk fall groups and to determine the specific points in time at which to provide active interventions to prevent falls.
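The four screening metrics reported in this abstract all derive from a 2x2 confusion matrix at a given cutoff. The sketch below shows the calculation; the counts are hypothetical round numbers chosen only to illustrate a rare outcome, not the study's data, and the function name is invented for this example.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Screening metrics from confusion-matrix counts at one cutoff score."""
    return {
        "sensitivity": tp / (tp + fn),  # fallers correctly flagged as high risk
        "specificity": tn / (tn + fp),  # non-fallers correctly flagged as low risk
        "ppv": tp / (tp + fp),          # flagged patients who actually fell
        "npv": tn / (tn + fn),          # unflagged patients who did not fall
    }


# Hypothetical counts: 50 observations with a fall among 1000 in total.
metrics = diagnostic_metrics(tp=40, fp=450, fn=10, tn=500)
for name, value in metrics.items():
    print(name, round(value, 3))
```

When the outcome is rare, a low positive predictive value alongside a high negative predictive value is typical even for a tool with good sensitivity, because false positives from the large non-event group outnumber the true positives.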

The Consumer Perceived Value in Taekwondo Performance Spectators: Scale Development and Validation (태권도 공연 관람자의 인지된 가치 척도 개발 및 적용)

  • Jeong, Seung-Hoon
    • 한국체육학회지인문사회과학편 / v.55 no.6 / pp.417-435 / 2016
  • The purpose of this study was to develop a valid, reliable instrument to measure perceived value for Taekwondo performance spectators. The perceived value scale for Taekwondo performance spectators was developed in eight phases: (1) literature review, (2) selection of preliminary factors and items, (3) assessment of the items, (4) pilot test, (5) data sampling, (6) validity of the developed scale, (7) assessment of item reliability, and (8) predictive validity of the developed items. Based on three processes, a new perceived value scale for Taekwondo performance spectators with six factors (cultural, social, hedonic, aesthetic, moral, and utilitarian value) and 19 items was developed. The predictive validity results were as follows: first, the cultural, social, aesthetic, and moral dimensions of perceived value had a significant influence on spectators' attitude. Second, the social, hedonic, and utilitarian dimensions had a significant influence on spectators' satisfaction. Third, the cultural, social, and aesthetic dimensions had a significant influence on spectators' future consumption behavior.