• Title/Summary/Keyword: RandomForest

A Study on Diabetes Management System Based on Logistic Regression and Random Forest

  • ByungJoo Kim
    • International journal of advanced smart convergence / v.13 no.2 / pp.61-68 / 2024
  • In the quest for advancing diabetes diagnosis, this study introduces a novel two-step machine learning approach that synergizes the probabilistic predictions of Logistic Regression with the classification prowess of Random Forest. Diabetes, a pervasive chronic disease impacting millions globally, necessitates precise and early detection to mitigate long-term complications. Traditional diagnostic methods, while effective, often entail invasive testing and may not fully leverage the patterns hidden in patient data. Addressing this gap, our research harnesses the predictive capability of Logistic Regression to estimate the likelihood of diabetes presence, followed by employing Random Forest to classify individuals into diabetic, pre-diabetic or nondiabetic categories based on the computed probabilities. This methodology not only capitalizes on the strengths of both algorithms (Logistic Regression's proficiency in estimating nuanced probabilities and Random Forest's robustness in classification) but also introduces a refined mechanism to enhance diagnostic accuracy. Through the application of this model to a comprehensive diabetes dataset, we demonstrate a marked improvement in diagnostic precision, as evidenced by superior performance metrics when compared to other machine learning approaches. Our findings underscore the potential of integrating diverse machine learning models to improve clinical decision-making processes, offering a promising avenue for the early and accurate diagnosis of diabetes and potentially other complex diseases.

Enhancing Internet of Things Security with Random Forest-Based Anomaly Detection

  • Ahmed Al Shihimi;Muhammad R Ahmed;Thirein Myo;Badar Al Baroomi
    • International Journal of Computer Science & Network Security / v.24 no.6 / pp.67-76 / 2024
  • The Internet of Things (IoT) has revolutionized communication and device operation, but it has also brought significant security challenges. IoT networks are structured into four levels: devices, networks, applications, and services, each with specific security considerations. Personal Area Networks (PANs), Local Area Networks (LANs), and Wide Area Networks (WANs) are the three types of IoT networks, each with unique security requirements. Communication protocols such as Wi-Fi and Bluetooth, commonly used in IoT networks, are susceptible to vulnerabilities and require additional security measures. Apart from physical security, authentication, encryption, software vulnerabilities, DoS attacks, data privacy, and supply chain security pose significant challenges. Ensuring the security of IoT devices and the data they exchange is crucial. This paper utilizes the Random Forest Algorithm from machine learning to detect anomalous data in IoT devices. The dataset consists of environmental data (temperature and humidity) collected from IoT sensors in Oman. The Random Forest Algorithm is implemented and trained using Python, and the accuracy and results of the model are discussed, demonstrating the effectiveness of Random Forest for detecting IoT device data anomalies.

Stress Assesment based on Bio-Signals using Random Forest Algorithm (랜덤포레스트 기법을 이용한 생체 신호 기반의 스트레스 평가 방법)

  • Lim, Taegyoon;Heo, Jeongheon;Jeong, Kyuwon;Ghim, Heirhee
    • Journal of the Korean Society of Safety / v.35 no.1 / pp.62-69 / 2020
  • Most people experience stress in daily life because modern society is complex and changes quickly. Because stress can affect many kinds of physiological phenomena, it is even considered a disease. It should therefore be detected early and then relieved. When a person is under stress, several bio-signals, such as heart rate, change, and these changes can be detected using medical electronics techniques. In this paper, a stress assessment system based on the random forest algorithm is studied, using heart rate, RR interval, and galvanic skin response. The random forest model was trained and tested on a data set obtained from these bio-signals. The results indicate that the stress assessment procedure developed in this paper is useful.

A Study on Prediction Techniques through Machine Learning of Real-time Solar Radiation in Jeju (제주 실시간 일사량의 기계학습 예측 기법 연구)

  • Lee, Young-Mi;Bae, Joo-Hyun;Park, Jeong-keun
    • Journal of Environmental Science International / v.26 no.4 / pp.521-527 / 2017
  • Solar radiation forecasts are important for predicting the amount of ice on roads and the potential solar energy. In an attempt to improve solar radiation predictability in Jeju, we applied machine learning with various data mining techniques: tree models, conditional inference trees, random forest, support vector machines, and logistic regression. To validate the machine learning models, the simulation results were compared with solar radiation data observed at the Jeju observation site. According to the model assessment, solar radiation prediction using random forest is the most effective of these methods. The error rate of the random forest model is 17%.

Prediction of Paroxysmal Atrial Fibrillation using Time-domain Analysis and Random Forest

  • Lee, Seung-Hwan;Kang, Dong-Won;Lee, Kyoung-Joung
    • Journal of Biomedical Engineering Research / v.39 no.2 / pp.69-79 / 2018
  • The present study proposes an algorithm that discriminates between normal subjects and paroxysmal atrial fibrillation (PAF) patients using electrocardiogram (ECG) recordings that contain no PAF events. For this, time-domain features and a random forest classifier are used. The time-domain features are obtained from the Poincare plot, the Lorenz plot of the $\delta RR$ interval, and morphology analysis. Three features in total are then chosen through feature selection, and PAF patients and normal subjects are classified using random forest. In the classification results, sensitivity and specificity were 81.82% and 95.24% respectively, the positive predictive value and negative predictive value were 96.43% and 76.92% respectively, and accuracy was 87.04%. The proposed algorithm requires less computation than existing algorithms, which suggests its applicability to more efficient prediction of PAF.
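
The five metrics reported in this abstract all derive from a single binary confusion matrix. The sketch below shows the standard definitions. The abstract does not state the underlying counts; the values `tp=27, fn=6, tn=20, fp=1` are an assumption chosen here only because they reproduce the reported percentages, and the function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return standard binary-classification metrics as fractions in [0, 1]."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "ppv": tp / (tp + fp),                    # positive predictive value
        "npv": tn / (tn + fn),                    # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Assumed counts (NOT given in the abstract): picked so that the
# formulas above match the percentages the abstract reports.
metrics = confusion_metrics(tp=27, fn=6, tn=20, fp=1)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

With these assumed counts the printed values agree with the abstract to two decimal places, which illustrates how sensitivity and NPV can be comparatively low even when specificity and PPV are high.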

Prediction of protein binding regions in RNA using random forest (Random forest를 이용한 RNA에서의 단백질 결합 영역 예측)

  • Choi, Daesik;Park, Byungkyu;Chae, Hanju;Lee, Wook;Han, Kyungsook
    • Proceedings of the Korea Information Processing Society Conference / 2016.10a / pp.583-586 / 2016
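  • As protein-RNA interaction data have grown rapidly, many computational methods for predicting protein-RNA binding sites have been developed. However, many of these methods are limited to predicting the RNA-binding sites within proteins. This paper presents a method that uses the sequence information of both the RNA and the protein to predict the protein-binding sites in RNA, and discusses its results. The prediction model was developed with WEKA random forest (http://www.cs.waikato.ac.nz/ml/weka/), and the sequence profile and sequence composition of the RNA, together with properties of the partner protein, were encoded as features. In cross validation, the random forest approach performed best with the 1:1 model, showing 92.4% sensitivity, 92.0% specificity, and 92.2% accuracy; in the independent test it showed 72.5% sensitivity, 90.0% specificity, and 2.1% accuracy.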

A Research on Accuracy Improvement of Diabetes Recognition Factors Based on XGBoost

  • Shin, Yongsub;Yun, Dai Yeol;Moon, Seok-Jae;Hwang, Chi-gon
    • International journal of advanced smart convergence / v.10 no.2 / pp.73-78 / 2021
  • Recently, the number of people visiting hospitals due to diabetes has been increasing. According to the Korean Diabetes Association, one in seven adults aged 30 years or older in Korea has diabetes, and the figure is expected to be higher if pre-diabetes (fasting blood sugar disorder) is included. In a previous study, the validity of triglyceride and cholesterol as factors associated with diabetes was confirmed and analyzed using Random Forest. Random Forest has the disadvantage that, as the amount of data increases, it uses more memory and becomes slower. Therefore, in this paper, we compare and analyze Random Forest and XGBoost, focusing on improving learning speed and preventing memory waste, which are major concerns in machine learning. Using XGBoost, the slowdown and memory-waste problems were resolved, and the accuracy of the diabetes recognition factors was further increased.

Prediction of New Confirmed Cases of COVID-19 based on Multiple Linear Regression and Random Forest (다중 선형 회귀와 랜덤 포레스트 기반의 코로나19 신규 확진자 예측)

  • Kim, Jun Su;Choi, Byung-Jae
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.4 / pp.249-255 / 2022
  • The COVID-19 virus appeared in 2019 and is extremely contagious, and it has had a huge impact on people's mobility. In this paper, multiple linear regression and random forest models are used to predict the number of COVID-19 cases from COVID-19 infection status data (open data provided by the Ministry of Health and Welfare) and Google Mobility Data, which tracks movement across various categories of places. The data are divided into two sets. The first dataset combines the COVID-19 infection status data with all six variables of Google Mobility Data. The second dataset combines the COVID-19 infection status data with only two variables of Google Mobility Data: (1) retail stores and leisure facilities and (2) grocery stores and pharmacies. The models' performance is compared using the mean absolute error. We also conduct a correlation analysis of the random forest model and the multiple linear regression model.
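
The mean absolute error used for this comparison is the average of the absolute differences between observed and predicted values. The minimal sketch below computes it for two sets of predictions. All numbers are made up for illustration and the labels `model_a` and `model_b` are hypothetical; nothing here reflects the paper's data or results.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily case counts and two hypothetical sets of predictions.
actual = [100, 120, 90, 110]
predictions = {
    "model_a": [110, 100, 95, 130],
    "model_b": [105, 115, 92, 108],
}

scores = {name: mean_absolute_error(actual, pred)
          for name, pred in predictions.items()}
best = min(scores, key=scores.get)  # lower MAE means a closer fit

for name, score in scores.items():
    print(f"{name}: MAE = {score:.2f}")
print("lowest MAE:", best)
```

Because MAE is expressed in the same units as the target (here, cases per day), it is straightforward to compare across models trained on different feature sets.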

Analysis of Adolescent Suicide Factors based on Random Forest Machine Learning Algorithm

  • Gi-Lim HA;In Seon EO;Dong Hun HAN;Min Soo KANG
    • Korean Journal of Artificial Intelligence / v.11 no.3 / pp.23-27 / 2023
  • The purpose of this study is to identify and analyze suicide factors of adolescents using the Random Forest algorithm. According to cause-of-death statistics from the National Statistical Office in 2019, suicide was the leading cause of death in the 10-19 age group, making it a major social problem. Using machine learning algorithms, it is possible to predict whether individual adolescents think of suicide without directly surveying suicidal ideation, which can contribute to protecting adolescents, analyzing the factors that affect suicide, and establishing effective intervention measures. Prediction with the random forest algorithm confirmed the possibility of identifying and predicting suicide factors of adolescents. To increase the accuracy of the results, continued research on the factors that induce youth suicide is necessary.

SEQUENTIAL MINIMAL OPTIMIZATION WITH RANDOM FOREST ALGORITHM (SMORF) USING TWITTER CLASSIFICATION TECHNIQUES

  • J.Uma;K.Prabha
    • International Journal of Computer Science & Network Security / v.23 no.4 / pp.116-122 / 2023
  • Sentiment classification techniques are commonly divided into three major categories: Machine Learning (ML) methods, Lexicon-Based (LB) methods, and Hybrid methods. ML methods use linguistic features and apply well-known ML algorithms. In this paper, classification and identification are performed with an optimization technique called sequential minimal optimization with Random Forest (SMORF), with the aim of improving the performance and efficiency of the sentiment classification framework. Three existing classification algorithms are compared with the proposed SMORF algorithm. Simulation results in the experimental framework are reported as precision (P), recall (R), F-measure (F), and accuracy. The proposed SMORF algorithm achieves high accuracy.