• Title/Summary/Keyword: Random forest classification - SVM classification

Search Result 54, Processing Time 0.027 seconds

A Predictive Model to identify possible affected Bipolar disorder students using Naive Baye's, Random Forest and SVM machine learning techniques of data mining and Building a Sequential Deep Learning Model using Keras

  • Peerbasha, S.;Surputheen, M. Mohamed
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.5
    • /
    • pp.267-274
    • /
    • 2021
  • Medical care practices include gathering a wide range of student data that are with manic episodes and depression which would assist the specialist with diagnosing a health condition of the students correctly. In this way, the instructors of the specific students will also identify those students and take care of them well. The data which we collected from the students could be straightforward indications seen by them. The artificial intelligence has been utilized with Naive Baye's classification, Random forest classification algorithm, SVM algorithm to characterize the datasets which we gathered to check whether the student is influenced by Bipolar illness or not. Performance analysis of the disease data for the algorithms used is calculated and compared. Also, a sequential deep learning model is builded using Keras. The consequences of the simulations show the efficacy of the grouping techniques on a dataset, just as the nature and complexity of the dataset utilized.

A Study on the Performance Evaluation of Machine Learning for Predicting the Number of Movie Audiences (영화 관객 수 예측을 위한 기계학습 기법의 성능 평가 연구)

  • Jeong, Chan-Mi;Min, Daiki
    • The Journal of Society for e-Business Studies
    • /
    • v.25 no.2
    • /
    • pp.49-63
    • /
    • 2020
  • The accurate prediction of box office in the early stage is crucial for film industry to make better managerial decision. With aims to improve the prediction performance, the purpose of this paper is to evaluate the use of machine learning methods. We tested both classification and regression based methods including k-NN, SVM and Random Forest. We first evaluate input variables, which show that reputation-related information generated during the first two-week period after release is significant. Prediction test results show that regression based methods provides lower prediction error, and Random Forest particularly outperforms other machine learning methods. Regression based method has better prediction power when films have small box office earnings. On the other hand, classification based method works better for predicting large box office earnings.

Correlation Analysis of Airline Customer Satisfaction using Random Forest with Deep Neural Network and Support Vector Machine Model

  • Hong, Sang Hoon;Kim, Bumsu;Jung, Yong Gyu
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.12 no.4
    • /
    • pp.26-32
    • /
    • 2020
  • There are many airline customer evaluation data, but they are insufficient in terms of predicting customer satisfaction in practice. In particular, they are generally insufficient in case of verification of data value and development of a customer satisfaction prediction model based on customer evaluation data. In this paper, airline customer satisfaction analysis is conducted through an experiment of correlation analysis between customer evaluation data provided by Google's Kaggle. The difference in accuracy varied according to the three types, which are the overall variables, the top 4 and top 8 variables with the highest correlation. To build an airline customer satisfaction prediction model, they are applied to three classification algorithms of Random Forest, SVM, DNN and conduct a classification experiment. They are divided into training data and verification data by 7:3. As a result, the DNN model showed the lowest accuracy at 86.4%, while the SVM model at 89% and the Random Forest model at 95.7% showed the highest accuracy and performance.

Development of Galaxy Image Classification Based on Hand-crafted Features and Machine Learning (Hand-crafted 특징 및 머신 러닝 기반의 은하 이미지 분류 기법 개발)

  • Oh, Yoonju;Jung, Heechul
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.16 no.1
    • /
    • pp.17-27
    • /
    • 2021
  • In this paper, we develop a galaxy image classification method based on hand-crafted features and machine learning techniques. Additionally, we provide an empirical analysis to reveal which combination of the techniques is effective for galaxy image classification. To achieve this, we developed a framework which consists of four modules such as preprocessing, feature extraction, feature post-processing, and classification. Finally, we found that the best technique for galaxy image classification is a method to use a median filter, ORB vector features and a voting classifier based on RBF SVM, random forest and logistic regression. The final method is efficient so we believe that it is applicable to embedded environments.

A Performance Comparison of Machine Learning Classification Methods for Soil Creep Susceptibility Assessment (땅밀림 위험지 평가를 위한 기계학습 분류모델 비교)

  • Lee, Jeman;Seo, Jung Il;Lee, Jin-Ho;Im, Sangjun
    • Journal of Korean Society of Forest Science
    • /
    • v.110 no.4
    • /
    • pp.610-621
    • /
    • 2021
  • The soil creep, primarily caused by earthquakes and torrential rainfall events, has widely occurred across the country. The Korea Forest Service attempted to quantify the soil creep susceptible areas using a discriminant value table to prevent or mitigate casualties and/or property damages in advance. With the advent of advanced computer technologies, machine learning-based classification models have been employed for managing mountainous disasters, such as landslides and debris flows. This study aims to quantify the soil creep susceptibility using several classifiers, namely the k-Nearest Neighbor (k-NN), Naive Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM) models. To develop the classification models, we downscaled 292 data from 4,618 field survey data. About 70% of the selected data were used for training, with the remaining 30% used for model testing. The developed models have the classification accuracy of 0.727 for k-NN, 0.750 for NB, 0.807 for RF, and 0.750 for SVM against test datasets representing 30% of the total data. Furthermore, we estimated Cohen's Kappa index as 0.534, 0.580, 0.673, and 0.585, with AUC values of 0.872, 0.912, 0.943, and 0.834, respectively. The machine learning-based classifications for soil creep susceptibility were RF, NB, SVM, and k-NN in that order. Our findings indicate that the machine learning classifiers can provide valuable information in establishing and implementing natural disaster management plans in mountainous areas.

Development of a Classification Method for Forest Vegetation on the Stand Level, Using KOMPSAT-3A Imagery and Land Coverage Map (KOMPSAT-3A 위성영상과 토지피복도를 활용한 산림식생의 임상 분류법 개발)

  • Song, Ji-Yong;Jeong, Jong-Chul;Lee, Peter Sang-Hoon
    • Korean Journal of Environment and Ecology
    • /
    • v.32 no.6
    • /
    • pp.686-697
    • /
    • 2018
  • Due to the advance in remote sensing technology, it has become easier to more frequently obtain high resolution imagery to detect delicate changes in an extensive area, particularly including forest which is not readily sub-classified. Time-series analysis on high resolution images requires to collect extensive amount of ground truth data. In this study, the potential of land coverage mapas ground truth data was tested in classifying high-resolution imagery. The study site was Wonju-si at Gangwon-do, South Korea, having a mix of urban and natural areas. KOMPSAT-3A imagery taken on March 2015 and land coverage map published in 2017 were used as source data. Two pixel-based classification algorithms, Support Vector Machine (SVM) and Random Forest (RF), were selected for the analysis. Forest only classification was compared with that of the whole study area except wetland. Confusion matrixes from the classification presented that overall accuracies for both the targets were higher in RF algorithm than in SVM. While the overall accuracy in the forest only analysis by RF algorithm was higher by 18.3% than SVM, in the case of the whole region analysis, the difference was relatively smaller by 5.5%. For the SVM algorithm, adding the Majority analysis process indicated a marginal improvement of about 1% than the normal SVM analysis. It was found that the RF algorithm was more effective to identify the broad-leaved forest within the forest, but for the other classes the SVM algorithm was more effective. As the two pixel-based classification algorithms were tested here, it is expected that future classification will improve the overall accuracy and the reliability by introducing a time-series analysis and an object-based algorithm. It is considered that this approach will contribute to improving a large-scale land planning by providing an effective land classification method on higher spatial and temporal scales.

A Best Effort Classification Model For Sars-Cov-2 Carriers Using Random Forest

  • Mallick, Shrabani;Verma, Ashish Kumar;Kushwaha, Dharmender Singh
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.1
    • /
    • pp.27-33
    • /
    • 2021
  • The whole world now is dealing with Coronavirus, and it has turned to be one of the most widespread and long-lived pandemics of our times. Reports reveal that the infectious disease has taken toll of the almost 80% of the world's population. Amidst a lot of research going on with regards to the prediction on growth and transmission through Symptomatic carriers of the virus, it can't be ignored that pre-symptomatic and asymptomatic carriers also play a crucial role in spreading the reach of the virus. Classification Algorithm has been widely used to classify different types of COVID-19 carriers ranging from simple feature-based classification to Convolutional Neural Networks (CNNs). This research paper aims to present a novel technique using a Random Forest Machine learning algorithm with hyper-parameter tuning to classify different types COVID-19-carriers such that these carriers can be accurately characterized and hence dealt timely to contain the spread of the virus. The main idea for selecting Random Forest is that it works on the powerful concept of "the wisdom of crowd" which produces ensemble prediction. The results are quite convincing and the model records an accuracy score of 99.72 %. The results have been compared with the same dataset being subjected to K-Nearest Neighbour, logistic regression, support vector machine (SVM), and Decision Tree algorithms where the accuracy score has been recorded as 78.58%, 70.11%, 70.385,99% respectively, thus establishing the concreteness and suitability of our approach.

Effective Korean sentiment classification method using word2vec and ensemble classifier (Word2vec과 앙상블 분류기를 사용한 효율적 한국어 감성 분류 방안)

  • Park, Sung Soo;Lee, Kun Chang
    • Journal of Digital Contents Society
    • /
    • v.19 no.1
    • /
    • pp.133-140
    • /
    • 2018
  • Accurate sentiment classification is an important research topic in sentiment analysis. This study suggests an efficient classification method of Korean sentiment using word2vec and ensemble methods which have been recently studied variously. For the 200,000 Korean movie review texts, we generate a POS-based BOW feature and a feature using word2vec, and integrated features of two feature representation. We used a single classifier of Logistic Regression, Decision Tree, Naive Bayes, and Support Vector Machine and an ensemble classifier of Adaptive Boost, Bagging, Gradient Boosting, and Random Forest for sentiment classification. As a result of this study, the integrated feature representation composed of BOW feature including adjective and adverb and word2vec feature showed the highest sentiment classification accuracy. Empirical results show that SVM, a single classifier, has the highest performance but ensemble classifiers show similar or slightly lower performance than the single classifier.

Comparison of machine learning algorithms for regression and classification of ultimate load-carrying capacity of steel frames

  • Kim, Seung-Eock;Vu, Quang-Viet;Papazafeiropoulos, George;Kong, Zhengyi;Truong, Viet-Hung
    • Steel and Composite Structures
    • /
    • v.37 no.2
    • /
    • pp.193-209
    • /
    • 2020
  • In this paper, the efficiency of five Machine Learning (ML) methods consisting of Deep Learning (DL), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Gradient Tree Booting (GTB) for regression and classification of the Ultimate Load Factor (ULF) of nonlinear inelastic steel frames is compared. For this purpose, a two-story, a six-story, and a twenty-story space frame are considered. An advanced nonlinear inelastic analysis is carried out for the steel frames to generate datasets for the training of the considered ML methods. In each dataset, the input variables are the geometric features of W-sections and the output variable is the ULF of the frame. The comparison between the five ML methods is made in terms of the mean-squared-error (MSE) for the regression models and the accuracy for the classification models, respectively. Moreover, the ULF distribution curve is calculated for each frame and the strength failure probability is estimated. It is found that the GTB method has the best efficiency in both regression and classification of ULF regardless of the number of training samples and the space frames considered.

Analysis of Dimensionality Reduction Methods Through Epileptic EEG Feature Selection for Machine Learning in BCI (BCI에서 기계 학습을 위한 간질 뇌파 특징 선택을 통한 차원 감소 방법 분석)

  • Tong, Yang;Aliyu, Ibrahim;Lim, Chang-Gyoon
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.13 no.6
    • /
    • pp.1333-1342
    • /
    • 2018
  • Until now, Electroencephalography(: EEG) has been the most important and convenient method for the diagnosis and treatment of epilepsy. However, it is difficult to identify the wave characteristics of an epileptic EEG signals because it is very weak, non-stationary and has strong background noise. In this paper, we analyse the effect of dimensionality reduction methods on Epileptic EEG feature selection and classification. Three dimensionality reduction methods: Pincipal Component Analysis(: PCA), Kernel Principal Component Analysis(: KPCA) and Linear Discriminant Analysis(: LDA) were investigated. The performance of each method was evaluated by using Support Vector Machine SVM, Logistic Regression(: LR), K-Nearestneighbor(: K-NN), Decision Tree(: DR) and Random Forest(: RF). From the experimental result, PCA recorded 75% of highest accuracy in SVM, LR and K-NN. KPCA recorded 85% of best performance in SVM and K-KNN while LDA achieved 100% accuracy in K-NN. Thus, LDA dimensionality reduction is found to provide the best classification result for epileptic EEG signal.