• Title/Summary/Keyword: Random Forest Classification

A Novel Feature Selection Approach to Classify Breast Cancer Drug using Optimized Grey Wolf Algorithm

  • Shobana, G.;Priya, N.
    • International Journal of Computer Science & Network Security / v.22 no.9 / pp.258-270 / 2022
  • Cancer has become a common disease worldwide over the past two decades, and its incidence among women has increased significantly. Breast and ovarian cancers are especially prevalent among women. The majority of patients consult a physician only in the final stage of the disease, and early diagnosis of cancer remains a great challenge for researchers. Although new drugs are synthesized frequently, their multiple potential benefits are less investigated. With millions of synthesized drugs and their data accessible through open repositories, drug repurposing can be carried out using machine learning techniques. In this paper we propose a novel feature selection technique that generates multiple populations for the grey wolf algorithm and classifies breast cancer drugs efficiently. Three supervised machine learning models, namely the Random Forest classifier, Multilayer Perceptron, and Support Vector Machine, were applied, and the Multilayer Perceptron achieved the highest accuracy, 97.7%, for breast cancer drug classification. A leukemia drug dataset was also investigated, on which the Multilayer Perceptron achieved 96% prediction accuracy.

Study on the Estimation of Frost Occurrence Classification Using Machine Learning Methods (기계학습법을 이용한 서리 발생 구분 추정 연구)

  • Kim, Yongseok;Shim, Kyo-Moon;Jung, Myung-Pyo;Choi, In-tae
    • Korean Journal of Agricultural and Forest Meteorology / v.19 no.3 / pp.86-92 / 2017
  • In this study, a model to classify frost-occurrence and frost-free days was developed using the digital weather forecast data provided by the Korea Meteorological Administration (KMA). The minimum temperature, average wind speed, relative humidity, and dew point temperature were identified as the meteorological variables useful for classifying frost-occurrence and frost-free days. Frost-occurrence dates tended to have relatively low values of minimum temperature, dew point temperature, and average wind speed. On the other hand, relative humidity on frost-free days was higher than on frost-occurrence dates. Models based on machine learning methods, including Artificial Neural Network (ANN), Random Forest (RF), and Support Vector Machine (SVM), built with those meteorological factors had an accuracy of more than 70%. These results suggest that such models would be useful for predicting the occurrence of frost from digital weather forecast data.

Satellite Monitoring of Reclamation and Land Cover Change Neighboring Tidal Flats on the West Coast of North Korea: Comparative Approaches Using Artificial Intelligence and the Normalized Difference Water Index

  • Sanae Kang;Chul-Hee Lim
    • Korean Journal of Remote Sensing / v.39 no.4 / pp.409-423 / 2023
  • North Korea is carrying out reclamation activities in tidal flat areas distributed throughout the west coast. Previous remote sensing research on North Korean tidal flats either fails to reflect recent trends or focuses on identifying and analyzing tidal flats. This study aims to quantify the impact of recent reclamation activities in North Korea's coastal areas and contribute knowledge useful for determining the best remote sensing methods for coastal areas with limited accessibility, such as those in North Korea. Using Landsat-8 OLI images from 2014-2022, we analyzed land cover changes in an area on the west coast of Pyeonganbuk-do where reclamation activities are underway. Unsupervised classification using the normalized difference water index and the random forest classification technique were each used to divide the study area into classification groups, and changes in their areas over time were analyzed. The results show a clear decrease in the water area and a tendency to increase cultivated area, supporting the evidence that North Korea's reclamation is for agricultural land expansion. Along coasts behind seawalls, the water area decreased by nearly half, and the cultivated area increased by over 2,300%, indicating significant changes and highlighting the anthropogenic nature of the cover changes due to reclamation. Both methods demonstrated high accuracy, making them suitable for detecting cover changes caused by reclamation. It is expected that further quality research will be conducted through the use of high-resolution satellite images and by combining data from multiple satellites in the future.
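The normalized difference water index used in this abstract is commonly computed as (Green - NIR) / (Green + NIR); on Landsat-8 OLI the green and near-infrared bands are bands 3 and 5. The sketch below illustrates that formula on small nested lists of reflectance values. The function names, the sample values, and the threshold of 0.0 are assumptions for illustration; the abstract does not state which threshold or grouping rule the authors used.

```python
def ndwi(green, nir):
    """Return NDWI = (green - nir) / (green + nir) for a single pixel."""
    total = green + nir
    if total == 0:
        # Avoid division by zero for empty or masked pixels (assumed convention).
        return 0.0
    return (green - nir) / total


def water_mask(green_band, nir_band, threshold=0.0):
    """Flag pixels whose NDWI exceeds the threshold as water.

    The bands are equally sized lists of rows of reflectance values.
    The default threshold of 0.0 is an illustrative assumption.
    """
    return [
        [ndwi(g, n) > threshold for g, n in zip(green_row, nir_row)]
        for green_row, nir_row in zip(green_band, nir_band)
    ]


# Hypothetical 1 x 2 scene: the first pixel is water-like, the second is land-like.
mask = water_mask([[0.3, 0.1]], [[0.1, 0.4]])
```

Comparing the number of flagged pixels between two acquisition dates gives a simple estimate of the change in water area.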

Intelligent System for the Prediction of Heart Diseases Using Machine Learning Algorithms with Anew Mixed Feature Creation (MFC) technique

  • Rawia Elarabi;Abdelrahman Elsharif Karrar;Murtada El-mukashfi El-taher
    • International Journal of Computer Science & Network Security / v.23 no.5 / pp.148-162 / 2023
  • Classification systems can significantly assist the medical sector by allowing for the precise and quick diagnosis of diseases. As a result, both doctors and patients will save time. One possible way of identifying risk variables is to use machine learning algorithms. Non-surgical technologies, such as machine learning, are trustworthy and effective in categorizing healthy and heart-disease patients, and they save time and effort. The goal of this study is to create a medical intelligent decision support system based on machine learning for the diagnosis of heart disease. We used a mixed feature creation (MFC) technique to generate new features from the UCI Cleveland Cardiology dataset. We select the most suitable features using the Least Absolute Shrinkage and Selection Operator (LASSO), Recursive Feature Elimination with Random Forest feature selection (RFE-RF), and the best features of both LASSO and RFE-RF (BLR). Cross-validation and grid search are used to optimize the parameters of the estimator used in applying these algorithms. Classifier performance metrics, including classification accuracy, specificity, sensitivity, precision, and F1-score, are presented separately for each classification model, along with execution time and RMSE, for comparison. Our proposed work finds the best potential outcome across all available prediction models and improves the system's performance, allowing physicians to diagnose heart patients more accurately.

Improved prediction of soil liquefaction susceptibility using ensemble learning algorithms

  • Satyam Tiwari;Sarat K. Das;Madhumita Mohanty;Prakhar
    • Geomechanics and Engineering / v.37 no.5 / pp.475-498 / 2024
  • Predicting the susceptibility of soil to liquefaction using a limited set of parameters is a challenging problem, particularly when dealing with highly unbalanced databases. The current study focuses on different ensemble learning classification algorithms using highly unbalanced databases of results from in-situ tests: the standard penetration test (SPT), shear wave velocity (Vs) test, and cone penetration test (CPT). The input parameters for these datasets consist of earthquake intensity parameters, strong ground motion parameters, and in-situ soil testing parameters, with the liquefaction index serving as the binary output parameter. After a rigorous comparison with existing literature, extreme gradient boosting (XGBoost), bagging, and random forest (RF) emerge as the most efficient models for liquefaction instance classification across different datasets. Notably, for SPT and Vs-based models, XGBoost exhibits superior performance, followed by light gradient boosting machine (LightGBM) and bagging, while for CPT-based models, bagging ranks highest, followed by gradient boosting and random forest. CPT-based models demonstrate lower Gmean(error), rendering them preferable for soil liquefaction susceptibility prediction. Key parameters influencing model performance include the internal friction angle of soil (ϕ) and the percentage of fines less than 75 µm (F75) for SPT and Vs data, and the normalized average cone tip resistance (qc) and peak horizontal ground acceleration (amax) for CPT data. It was also observed that adding Vs measurements to SPT data increased the efficiency of the prediction compared with SPT data alone. Furthermore, to enhance usability, a graphical user interface (GUI) for seamless classification operations based on provided input parameters was proposed.
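The abstract reports Gmean(error) without defining it. A common definition of the G-mean for unbalanced binary classification is the geometric mean of sensitivity and specificity; whether the paper uses exactly this form is an assumption. The sketch below, with hypothetical counts, shows why such a measure is preferred over plain accuracy when one class dominates.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def accuracy(tp, fn, tn, fp):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical unbalanced data: 5 liquefied cases, 95 non-liquefied cases.
# A model that always predicts "no liquefaction" misses every positive case.
always_negative = dict(tp=0, fn=5, tn=95, fp=0)
acc = accuracy(**always_negative)      # high, despite missing every positive
gm = g_mean(**always_negative)         # zero, exposing the failure
```

Because the G-mean collapses to zero when either class is never detected, it penalizes models that ignore the minority class.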

Land Cover Classification with High Spatial Resolution Using Orthoimage and DSM Based on Fixed-Wing UAV

  • Kim, Gu Hyeok;Choi, Jae Wan
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.1 / pp.1-10 / 2017
  • A UAV (Unmanned Aerial Vehicle) is a flight system designed to conduct missions without a pilot. Compared to traditional airborne photogrammetry, UAV-based photogrammetry is inexpensive and can obtain high-spatial-resolution data quickly. In this study, we aimed to classify land cover using high-spatial-resolution images obtained with a UAV. An RGB camera was used to obtain a high-spatial-resolution orthoimage. For accurate classification, multispectral images of the same areas were obtained using a multispectral sensor. A DSM (Digital Surface Model) and a modified NDVI (Normalized Difference Vegetation Index) were generated from the images obtained with the RGB camera and multispectral sensor. Pixel-based classification was performed for twelve classes using the RF (Random Forest) method. The classification accuracy was evaluated based on the error matrix, and it was confirmed that the proposed method classified the area more effectively than supervised classification using only the RGB image.
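Accuracy assessment from an error matrix, as mentioned in this abstract, can be sketched as follows. The three-class matrix is hypothetical (the study used twelve classes), and the convention that rows hold reference labels and columns hold predicted labels is an assumption. The abstract does not say which accuracy measures were reported, so overall, producer's, and user's accuracy are shown as the usual candidates.

```python
def overall_accuracy(matrix):
    """Share of all samples on the diagonal of the error matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def producers_accuracy(matrix, k):
    """Correct samples of class k divided by all reference samples of class k."""
    return matrix[k][k] / sum(matrix[k])


def users_accuracy(matrix, k):
    """Correct samples of class k divided by all samples predicted as class k."""
    return matrix[k][k] / sum(row[k] for row in matrix)


# Hypothetical error matrix: rows = reference class, columns = predicted class.
error_matrix = [
    [50, 3, 2],
    [5, 40, 5],
    [0, 10, 35],
]
oa = overall_accuracy(error_matrix)
```

Producer's accuracy reflects omission errors for a class, while user's accuracy reflects commission errors, so the two can differ considerably for the same class.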

Classification of COVID-19 Disease: A Machine Learning Perspective

  • Kinza Sardar
    • International Journal of Computer Science & Network Security / v.24 no.3 / pp.107-112 / 2024
  • The deadly virus known as COVID-19, which emerged in Wuhan, China, in 2019, has spread all over the world. The disease affected millions of people in a very short time. Because COVID-19 has so many symptoms, identifying an infected person is a difficult task. Moreover, it is challenging to determine whether an individual will test positive or negative. We are developing a framework that uses machine learning techniques. The proposed method uses DecisionTree, KNearestNeighbors, GaussianNB, LogisticRegression, BernoulliNB, and RandomForest machine learning methods as classifiers for the diagnosis of COVID-19, with 5-fold and 10-fold cross-validation applied during classification. The experimental results showed that the best accuracy was obtained with the Decision Tree classifier. Data preprocessing techniques were applied to improve classification performance. Recall, accuracy, precision, and F-score metrics were used to evaluate the classification performance. In future work, we will apply different techniques to improve model accuracy beyond the 93 percent achieved so far.
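The 5-fold and 10-fold cross-validation mentioned in this abstract splits the data into k parts and uses each part once as the test set. The sketch below shows only the index splitting, in plain Python. The function name is hypothetical, and the contiguous, unshuffled folds are a simplifying assumption; the abstract does not say whether the authors shuffled or stratified their folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one sample. No shuffling or
    stratification is done here (a simplifying assumption).
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Hypothetical dataset of 10 samples split into 5 folds of 2 test samples each.
folds = list(k_fold_indices(10, 5))
```

A classifier would be trained on each `train_idx` and scored on the matching `test_idx`, and the k scores averaged to give the cross-validated estimate.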

A Study on Classification of Crown Classes and Selection of Thinned Trees for Major Conifers Using Machine Learning Techniques (머신러닝 기법을 활용한 주요 침엽수종의 수관급 분류와 간벌목 선정 연구)

  • Lee, Yong-Kyu;Lee, Jung-Soo;Park, Jin-Woo
    • Journal of Korean Society of Forest Science / v.111 no.2 / pp.302-310 / 2022
  • Here we aimed to classify the crown classes of the major coniferous tree species (Pinus densiflora, Pinus koraiensis, and Larix kaempferi) using tree measurement information and machine learning algorithms, in order to establish an efficient forest management plan. We used national forest monitoring information amassed over nine years as the tree measurement information, and random forest (RF), XGBoost (XGB), and Light GBM (LGBM) as the machine learning algorithms. We compared the algorithms using accuracy, precision, recall, and F1 score. The RF algorithm had the highest performance evaluation score for all tree species, with the highest scores for Pinus densiflora: an accuracy of about 65%, a precision of about 72%, a recall of about 60%, and an F1 score of about 66%. Among the crown classes, the classification accuracy for dominant trees was higher than about 80%, but that for co-dominant, intermediate, and overtopped trees was low. We consider that the results of this study can be used as reference data for decision-making in the selection of thinning trees for forest management.

Evaluating the Efficiency of Models for Predicting Seismic Building Damage (지진으로 인한 건물 손상 예측 모델의 효율성 분석)

  • Chae Song Hwa;Yujin Lim
    • The Transactions of the Korea Information Processing Society / v.13 no.5 / pp.217-220 / 2024
  • Predicting earthquake occurrences accurately is challenging, and equipping all buildings with seismic design for such random events is a difficult task. Analyzing building features to predict potential damage and reinforcing vulnerabilities based on this analysis can minimize damage even in buildings without seismic design. Therefore, research analyzing the efficiency of building damage prediction models is essential. In this paper, we compare the accuracy of earthquake damage prediction models using machine learning classification algorithms, including Random Forest, Extreme Gradient Boosting, LightGBM, and CatBoost, utilizing data from buildings damaged during the 2015 Nepal earthquake.

Machine Learning for Flood Prediction in Indonesia: Providing Online Access for Disaster Management Control

  • Reta L. Puspasari;Daeung Yoon;Hyun Kim;Kyoung-Woong Kim
    • Economic and Environmental Geology / v.56 no.1 / pp.65-73 / 2023
  • Indonesia is one of the countries most vulnerable to floods, so accurate and reliable flood forecasting is increasingly necessary. Therefore, a new prediction model using a machine learning algorithm is proposed to provide daily flood prediction in Indonesia. Data crawling was conducted to obtain daily rainfall, streamflow, land cover, and flood data from 2008 to 2021. The model was built using a Random Forest (RF) algorithm for classification to predict future floods from three days of rainfall rate, forest ratio, and streamflow. The accuracy, specificity, precision, recall, and F1-score on the test dataset using the RF algorithm are approximately 94.93%, 68.24%, 94.34%, 99.97%, and 97.08%, respectively. Moreover, the AUC (Area Under the Curve) of the ROC (Receiver Operating Characteristic) curve is 71%. The objective of this research is to provide a model that accurately predicts flood events in Indonesian regions 3 months prior to the day of the flood. As a trial, we used the month of June 2022, and the model predicted the flood events accurately. The prediction results are then published on a website as a warning system, as a form of flood mitigation.
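The five measures reported in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions. The counts are hypothetical, since the abstract does not give the underlying confusion matrix, and the function name is an assumption.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return standard binary classification metrics from confusion counts.

    tp/fn count flood days predicted correctly/incorrectly,
    tn/fp count non-flood days predicted correctly/incorrectly.
    """
    recall = tp / (tp + fn)          # sensitivity: share of floods detected
    specificity = tn / (tn + fp)     # share of non-flood days recognized
    precision = tp / (tp + fp)       # share of flood alerts that were correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, chosen only to illustrate the formulas.
scores = binary_metrics(tp=90, fp=20, tn=30, fn=10)
```

When one class dominates the data, recall can be very high while specificity stays much lower, which is why the two are usually reported together.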