• Title/Summary/Keyword: Ensemble Machine learning


Comparing the Performance of 17 Machine Learning Models in Predicting Human Population Growth of Countries

  • Otoom, Mohammad Mahmood
    • International Journal of Computer Science & Network Security / v.21 no.1 / pp.220-225 / 2021
  • Human population growth rate is an important parameter for real-world planning. Common approaches rely on fixed parameters such as population, mortality rate, and fertility rate, which are collected historically to determine a region's population growth rate. The literature does not provide a solution for areas with no historical data. In such areas, machine learning can solve the problem, but the multitude of machine learning algorithms makes it difficult to determine the best approach. Further, missing features are a common real-world problem. Thus, it is essential to compare machine learning techniques and select those that give the best predictions and are most robust in the presence of missing features. This study compares the performance of 17 machine learning techniques (base learners and ensemble learners) in predicting the human population growth rate of countries. Among the 17 techniques, random forest outperformed all others in both predictive performance and robustness to missing features. Thus, the study demonstrates and compares machine learning techniques for predicting the human population growth rate in settings where historical data and feature information are not available. Further, the study identifies the best machine learning algorithm for population growth rate prediction.

Developing an Ensemble Classifier for Bankruptcy Prediction (부도 예측을 위한 앙상블 분류기 개발)

  • Min, Sung-Hwan
    • Journal of Korea Society of Industrial Information Systems / v.17 no.7 / pp.139-148 / 2012
  • An ensemble of classifiers employs a set of individually trained classifiers and combines their predictions. In most cases, ensembles have been found to produce more accurate predictions than the base classifiers. Combining outputs from multiple classifiers, known as ensemble learning, is one of the standard and most important techniques for improving classification accuracy in machine learning. An ensemble of classifiers is effective only if the individual classifiers make decisions that are as diverse as possible. Bagging is the most popular ensemble learning method for generating a diverse set of classifiers. Diversity in bagging is obtained by using different training sets: the training data subsets are randomly drawn with replacement from the entire training dataset. The random subspace method is an ensemble construction technique that uses different attribute subsets. In the random subspace method, the training dataset is also modified, as in bagging, but the modification is performed in the feature space. Bagging and random subspace are well-known and popular ensemble algorithms. However, few studies have dealt with the integration of bagging and random subspace using SVM classifiers, though there is great potential for useful applications in this area. The focus of this paper is to propose methods for improving SVM performance using a hybrid ensemble strategy for bankruptcy prediction. This paper applies the proposed ensemble model to the bankruptcy prediction problem using a real data set from Korean companies.
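As an illustration of the sampling scheme described in the abstract above, the following Python sketch draws each ensemble member's training rows with replacement (bagging) and its attributes without replacement (random subspace), then combines the members by majority vote. It is a minimal sketch and not the paper's method: the paper uses SVM classifiers, whereas a nearest-centroid classifier stands in here to keep the example dependency-free, and all function names and parameter values are hypothetical.

```python
import random
from collections import Counter


def draw_bag(n_rows, n_features, n_sub_features, rng):
    """Bagging: rows drawn with replacement.
    Random subspace: feature indices drawn without replacement."""
    rows = [rng.randrange(n_rows) for _ in range(n_rows)]
    feats = sorted(rng.sample(range(n_features), n_sub_features))
    return rows, feats


def fit_centroids(X, y, rows, feats):
    """Stand-in base learner (assumption): per-class mean over the
    sampled rows, restricted to the sampled features."""
    sums, counts = {}, Counter()
    for i in rows:
        acc = sums.setdefault(y[i], [0.0] * len(feats))
        for k, j in enumerate(feats):
            acc[k] += X[i][j]
        counts[y[i]] += 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_one(centroids, feats, x):
    """Predict the class whose centroid is nearest in the feature subset."""
    def dist(c):
        return sum((x[j] - m) ** 2 for j, m in zip(feats, centroids[c]))
    return min(centroids, key=dist)


def fit_ensemble(X, y, n_estimators=15, n_sub_features=2, seed=0):
    """Train each member on its own bootstrap sample and feature subset."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_estimators):
        rows, feats = draw_bag(len(X), len(X[0]), n_sub_features, rng)
        members.append((fit_centroids(X, y, rows, feats), feats))
    return members


def predict(members, x):
    """Combine member predictions by majority vote."""
    votes = Counter(predict_one(c, f, x) for c, f in members)
    return votes.most_common(1)[0][0]
```

Each member sees a different view of the data along both axes, rows and attributes, which is the source of diversity the abstract attributes to the two methods.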

Tor Network Website Fingerprinting Using Statistical-Based Feature and Ensemble Learning of Traffic Data (트래픽 데이터의 통계적 기반 특징과 앙상블 학습을 이용한 토르 네트워크 웹사이트 핑거프린팅)

  • Kim, Junho;Kim, Wongyum;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering / v.9 no.6 / pp.187-194 / 2020
  • This paper proposes a website fingerprinting method using ensemble learning over the Tor network, which protects client anonymity and personal information. We construct a training problem for website fingerprinting from the traffic packets collected in the Tor network, and compare the performance of website fingerprinting systems using tree-based ensemble models. A training feature vector is prepared from the general information, burst, cell sequence length, and cell order extracted from the traffic sequence, and the features of each website are represented with a fixed length. For experimental evaluation, we define four learning problems (Wang14, BW, CWT, CWH) according to the use of website fingerprinting, and compare the performance with a support vector machine model using CUMUL feature vectors. In the experimental evaluation, the proposed statistical-based training feature representation is superior to the CUMUL feature representation except in the BW case.

Deep Learning-Based Brain Tumor Classification in MRI images using Ensemble of Deep Features

  • Kang, Jaeyong;Gwak, Jeonghwan
    • Journal of the Korea Society of Computer and Information / v.26 no.7 / pp.37-44 / 2021
  • Automatic classification of brain MRI images plays an important role in the early diagnosis of brain tumors. In this work, we present a deep learning-based brain tumor classification model for MRI images using an ensemble of deep features. In our proposed framework, three different deep features are extracted from a brain MR image using three different pre-trained models. The extracted deep features are then fed to the classification module. In the classification module, the three deep features are first fed into fully-connected layers individually to reduce their dimension. The output features from these fully-connected layers are then concatenated and fed into a fully-connected layer to predict the final output. To evaluate our proposed model, we use an openly accessible brain MRI dataset from the web. Experimental results show that our proposed model outperforms other machine learning-based models.
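The classification module described in the abstract above (reduce each deep feature vector separately, concatenate, then classify) can be sketched as a forward pass in plain Python. This is only an illustration of the data flow under stated assumptions: the feature dimensions, the reduced dimension, the ReLU and softmax activations, and the random placeholder weights are all hypothetical, since the abstract does not specify them, and a real model would learn its weights.

```python
import math
import random


def dense(x, weights, bias):
    """Fully-connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random placeholder weights (assumption); not trained."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_classify(features, reducers, head):
    """Reduce each feature vector with its own layer, concatenate the
    reduced vectors, then apply the final classification layer."""
    fused = []
    for x, (w, b) in zip(features, reducers):
        fused.extend(relu(dense(x, w, b)))
    w, b = head
    return softmax(dense(fused, w, b))


rng = random.Random(0)
feature_dims = [32, 48, 64]    # hypothetical sizes of the three deep features
reduced_dim, n_classes = 8, 2  # hypothetical

reducers = [init_layer(d, reduced_dim, rng) for d in feature_dims]
head = init_layer(reduced_dim * len(feature_dims), n_classes, rng)

# Stand-ins for the features extracted by three pre-trained models.
features = [[rng.random() for _ in range(d)] for d in feature_dims]
probs = fuse_and_classify(features, reducers, head)
```

Reducing each feature vector before concatenation keeps the input to the final layer at a fixed, small size regardless of how large each pre-trained model's output is.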

Ensemble Model for Urine Spectrum Analysis Based on Hybrid Machine Learning (혼합 기계 학습 기반 소변 스펙트럼 분석 앙상블 모델)

  • Choi, Jaehyeok;Chung, Mokdong
    • Journal of Korea Multimedia Society / v.23 no.8 / pp.1059-1065 / 2020
  • In hospitals, nurses subjectively assess urine status to check the kidneys and circulatory system of patients, such as patients with kidney disease, critically ill patients, nursing home residents, and patients before and after surgery. To address this problem, this paper proposes a urine spectrum analysis system that clusters urine test results based on a hybrid machine learning model consisting of unsupervised learning and supervised learning. The proposed system clusters the spectral data using unsupervised learning in the first part, and classifies them using supervised learning in the second part. The results of the proposed urine spectrum analysis system using the mixed model are evaluated against the results of pure supervised learning. This work is expected to provide patients with better services than existing medical services by alleviating the shortage of nurses, shortening examination time, and reducing subjective evaluation in hospitals.

Tree size determination for classification ensemble

  • Choi, Sung Hoon;Kim, Hyunjoong
    • Journal of the Korean Data and Information Science Society / v.27 no.1 / pp.255-264 / 2016
  • Classification is predictive modeling for a categorical target variable. Various classification ensemble methods, which predict with better accuracy by combining multiple classifiers, have become a powerful machine learning and data mining paradigm. Well-known classification ensemble methodologies are boosting, bagging, and random forest. In this article, we assume that decision trees are used as classifiers in the ensemble. Further, we hypothesize that tree size affects classification accuracy. To study how tree size influences accuracy, we performed experiments using twenty-eight data sets. We then compared the performance of the ensemble algorithms (bagging, double-bagging, boosting, and random forest) with different tree sizes in the experiments.

Prediction of Track Quality Index (TQI) Using Vehicle Acceleration Data based on Machine Learning (차량가속도데이터를 이용한 머신러닝 기반의 궤도품질지수(TQI) 예측)

  • Choi, Chanyong;Kim, Hunki;Kim, Young Cheul;Kim, Sang-su
    • Journal of the Korean Geosynthetics Society / v.19 no.1 / pp.45-53 / 2020
  • In the railway industry, there is a growing tendency to attempt predictive analysis of measurement data using machine learning techniques. In this paper, the track quality index (TQI) was predicted from vehicle acceleration data using machine learning methods. XGBoost (XGB) was the most accurate, at 85% across all data sets. Unlike the SVM model, which uses a single algorithm, the RF and XGB models, which are ensemble systems, showed good prediction performance. For the surface TQI, the z-axis acceleration is shown to be highly related to the vertical direction, in good agreement with previous studies. Therefore, because accuracy may vary with the machine learning model applied, it is appropriate to use a model with an ensemble algorithm to predict the track quality index from vehicle vibration acceleration data.

Machine Learning Framework for Predicting Voids in the Mineral Aggregation in Asphalt Mixtures (아스팔트 혼합물의 골재 간극률 예측을 위한 기계학습 프레임워크)

  • Hyemin Park;Ilho Na;Hyunhwan Kim;Bongjun Ji
    • Journal of the Korean Geosynthetics Society / v.23 no.1 / pp.17-25 / 2024
  • The Voids in the Mineral Aggregate (VMA) within asphalt mixtures play a crucial role in defining the mixture's structural integrity, durability, and resistance to environmental factors. Accurate prediction and optimization of VMA are essential for enhancing the performance and longevity of asphalt pavements, particularly in varying climatic and environmental conditions. This study introduces a novel machine learning framework leveraging an ensemble machine learning model for predicting VMA in asphalt mixtures. By analyzing a comprehensive set of variables, including aggregate size distribution, binder content, and compaction levels, our framework offers a more precise prediction of VMA than traditional single-model approaches. The use of advanced machine learning techniques not only surpasses the accuracy of conventional empirical methods but also significantly reduces the reliance on extensive laboratory testing. Our findings highlight the effectiveness of a data-driven approach in the field of asphalt mixture design, showcasing a path toward more efficient and sustainable pavement engineering practices. This research contributes to the advancement of predictive modeling in construction materials, offering valuable insights for the design and optimization of asphalt mixtures with optimal void characteristics.

A Study on Comparison of Lung Cancer Prediction Using Ensemble Machine Learning

  • NAM, Yu-Jin;SHIN, Won-Ji
    • Korean Journal of Artificial Intelligence / v.7 no.2 / pp.19-24 / 2019
  • Lung cancer is a chronic disease that ranks fourth in cancer incidence in Korea, accounting for 11 percent of total cancer incidence. To deal with such issues, the usefulness and utilization of Clinical Decision Support Systems (CDSS) based on machine learning are being actively studied. This study reviews existing work on artificial intelligence technology that can be used in diagnosing lung cancer, and examines the applicability of machine learning to lung cancer diagnosis through comparison and analysis using Azure ML provided by Microsoft. The results show that the three algorithms yield different predictions: Support Vector Machine (SVM), Two-Class Support Decision Jungle, and Multiclass Decision Jungle. This study is limited by the size of the big data used for machine learning. Although the data provided by Kaggle is the most suitable for this study, it is assumed that there is a limit to learning the data fully due to the lack of absolute figures. Therefore, if the agency's cooperation in subsequent research is used to compare and analyze various algorithms other than those used in this study, a more accurate screening machine for lung cancer could be created.

Hybrid Feature Selection Method Based on Genetic Algorithm for the Diagnosis of Coronary Heart Disease

  • Wiharto, Wiharto;Suryani, Esti;Setyawan, Sigit;Putra, Bintang PE
    • Journal of information and communication convergence engineering / v.20 no.1 / pp.31-40 / 2022
  • Coronary heart disease (CHD) is a comorbidity of COVID-19; therefore, routine early diagnosis is crucial. A large number of examination attributes in the context of diagnosing CHD is a distinct obstacle during the pandemic when the number of health service users is significant. The development of a precise machine learning model for diagnosis with a minimum number of examination attributes can allow examinations and healthcare actions to be undertaken quickly. This study proposes a CHD diagnosis model based on feature selection, data balancing, and ensemble-based classification methods. In the feature selection stage, a hybrid SVM-GA combined with fast correlation-based filter (FCBF) is used. The proposed system achieved an accuracy of 94.60% and area under the curve (AUC) of 97.5% when tested on the z-Alizadeh Sani dataset and used only 8 of 54 inspection attributes. In terms of performance, the proposed model can be placed in the very good category.