• Title/Summary/Keyword: ensemble machine learning

Search Result 229, Processing Time 0.022 seconds

Performance comparison on vocal cords disordered voice discrimination via machine learning methods (기계학습에 의한 후두 장애음성 식별기의 성능 비교)

  • Cheolwoo Jo;Soo-Geun Wang;Ickhwan Kwon
    • Phonetics and Speech Sciences
    • /
    • v.14 no.4
    • /
    • pp.35-43
    • /
    • 2022
  • This paper studies how to improve the identification rate of laryngeal disability speech data by convolutional neural network (CNN) and machine learning ensemble learning methods. In general, the number of laryngeal dysfunction speech data is small, so even if identifiers are constructed by statistical methods, the phenomenon caused by overfitting depending on the training method can lead to a decrease the identification rate when exposed to external data. In this work, we try to combine results derived from CNN models and machine learning models with various accuracy in a multi-voting manner to ensure improved classification efficiency compared to the original trained models. The Pusan National University Hospital (PNUH) dataset was used to train and validate algorithms. The dataset contains normal voice and voice data of benign and malignant tumors. In the experiment, an attempt was made to distinguish between normal and benign tumors and malignant tumors. As a result of the experiment, the random forest method was found to be the best ensemble method and showed an identification rate of 85%.

Multi-Time Window Feature Extraction Technique for Anger Detection in Gait Data

  • Beom Kwon;Taegeun Oh
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.4
    • /
    • pp.41-51
    • /
    • 2023
  • In this paper, we propose a technique of multi-time window feature extraction for anger detection in gait data. In the previous gait-based emotion recognition methods, the pedestrian's stride, time taken for one stride, walking speed, and forward tilt angles of the neck and thorax are calculated. Then, minimum, mean, and maximum values are calculated for the entire interval to use them as features. However, each feature does not always change uniformly over the entire interval but sometimes changes locally. Therefore, we propose a multi-time window feature extraction technique that can extract both global and local features, from long-term to short-term. In addition, we also propose an ensemble model that consists of multiple classifiers. Each classifier is trained with features extracted from different multi-time windows. To verify the effectiveness of the proposed feature extraction technique and ensemble model, a public three-dimensional gait dataset was used. The simulation results demonstrate that the proposed ensemble model achieves the best performance compared to machine learning models trained with existing feature extraction techniques for four performance evaluation metrics.

Developing of New a Tensorflow Tutorial Model on Machine Learning : Focusing on the Kaggle Titanic Dataset (텐서플로우 튜토리얼 방식의 머신러닝 신규 모델 개발 : 캐글 타이타닉 데이터 셋을 중심으로)

  • Kim, Dong Gil;Park, Yong-Soon;Park, Lae-Jeong;Chung, Tae-Yun
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.14 no.4
    • /
    • pp.207-218
    • /
    • 2019
  • The purpose of this study is to develop a model that can systematically study the whole learning process of machine learning. Since the existing model describes the learning process with minimum coding, it can learn the progress of machine learning sequentially through the new model, and can visualize each process using the tensor flow. The new model used all of the existing model algorithms and confirmed the importance of the variables that affect the target variable, survival. The used to classification training data into training and verification, and to evaluate the performance of the model with test data. As a result of the final analysis, the ensemble techniques is the all tutorial model showed high performance, and the maximum performance of the model was improved by maximum 5.2% when compared with the existing model using. In future research, it is necessary to construct an environment in which machine learning can be learned regardless of the data preprocessing method and OS that can learn a model that is better than the existing performance.

Shield TBM disc cutter replacement and wear rate prediction using machine learning techniques

  • Kim, Yunhee;Hong, Jiyeon;Shin, Jaewoo;Kim, Bumjoo
    • Geomechanics and Engineering
    • /
    • v.29 no.3
    • /
    • pp.249-258
    • /
    • 2022
  • A disc cutter is an excavation tool on a tunnel boring machine (TBM) cutterhead; it crushes and cuts rock mass while the machine excavates using the cutterhead's rotational movement. Disc cutter wear occurs naturally. Thus, along with the management of downtime and excavation efficiency, abrasioned disc cutters need to be replaced at the proper time; otherwise, the construction period could be delayed and the cost could increase. The most common prediction models for TBM performance and for the disc cutter lifetime have been proposed by the Colorado School of Mines and Norwegian University of Science and Technology. However, design parameters of existing models do not well correspond to the field values when a TBM encounters complex and difficult ground conditions in the field. Thus, this study proposes a series of machine learning models to predict the disc cutter lifetime of a shield TBM using the excavation (machine) data during operation which is response to the rock mass. This study utilizes five different machine learning techniques: four types of classification models (i.e., K-Nearest Neighbors (KNN), Support Vector Machine, Decision Tree, and Staking Ensemble Model) and one artificial neural network (ANN) model. The KNN model was found to be the best model among the four classification models, affording the highest recall of 81%. The ANN model also predicted the wear rate of disc cutters reasonably well.

Machine learning application to seismic site classification prediction model using Horizontal-to-Vertical Spectral Ratio (HVSR) of strong-ground motions

  • Francis G. Phi;Bumsu Cho;Jungeun Kim;Hyungik Cho;Yun Wook Choo;Dookie Kim;Inhi Kim
    • Geomechanics and Engineering
    • /
    • v.37 no.6
    • /
    • pp.539-554
    • /
    • 2024
  • This study explores development of prediction model for seismic site classification through the integration of machine learning techniques with horizontal-to-vertical spectral ratio (HVSR) methodologies. To improve model accuracy, the research employs outlier detection methods and, synthetic minority over-sampling technique (SMOTE) for data balance, and evaluates using seven machine learning models using seismic data from KiK-net. Notably, light gradient boosting method (LGBM), gradient boosting, and decision tree models exhibit improved performance when coupled with SMOTE, while Multiple linear regression (MLR) and Support vector machine (SVM) models show reduced efficacy. Outlier detection techniques significantly enhance accuracy, particularly for LGBM, gradient boosting, and voting boosting. The ensemble of LGBM with the isolation forest and SMOTE achieves the highest accuracy of 0.91, with LGBM and local outlier factor yielding the highest F1-score of 0.79. Consistently outperforming other models, LGBM proves most efficient for seismic site classification when supported by appropriate preprocessing procedures. These findings show the significance of outlier detection and data balancing for precise seismic soil classification prediction, offering insights and highlighting the potential of machine learning in optimizing site classification accuracy.

Development of Prediction Model of Chloride Diffusion Coefficient using Machine Learning (기계학습을 이용한 염화물 확산계수 예측모델 개발)

  • Kim, Hyun-Su
    • Journal of Korean Association for Spatial Structures
    • /
    • v.23 no.3
    • /
    • pp.87-94
    • /
    • 2023
  • Chloride is one of the most common threats to reinforced concrete (RC) durability. Alkaline environment of concrete makes a passive layer on the surface of reinforcement bars that prevents the bar from corrosion. However, when the chloride concentration amount at the reinforcement bar reaches a certain level, deterioration of the passive protection layer occurs, causing corrosion and ultimately reducing the structure's safety and durability. Therefore, understanding the chloride diffusion and its prediction are important to evaluate the safety and durability of RC structure. In this study, the chloride diffusion coefficient is predicted by machine learning techniques. Various machine learning techniques such as multiple linear regression, decision tree, random forest, support vector machine, artificial neural networks, extreme gradient boosting annd k-nearest neighbor were used and accuracy of there models were compared. In order to evaluate the accuracy, root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE) and coefficient of determination (R2) were used as prediction performance indices. The k-fold cross-validation procedure was used to estimate the performance of machine learning models when making predictions on data not used during training. Grid search was applied to hyperparameter optimization. It has been shown from numerical simulation that ensemble learning methods such as random forest and extreme gradient boosting successfully predicted the chloride diffusion coefficient and artificial neural networks also provided accurate result.

Parallel Implementation of A Neural Network Ensemble on the Connection Machine CM-2 (Connection Machine CM-2상에서 신경망군(群)의 병렬 구현)

  • 김대진
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.34C no.1
    • /
    • pp.28-41
    • /
    • 1997
  • This paper describes a parallel implementation of a neurla network ensemble developed for object recognition on the connection machine CM-2. The implementation ensures that multiple networks are implemented simultaneously starting from different initial weights and all training samples are applied to each network by one sample per a copy of each network. When compared with a sequential implementation, this accelerates the computation speed by O(N.m.n) where N, m, and n are the network, respectively. The speedup in the computation time and the convergence characteristics of sthe modified backpropagation learning precedure were evaluated by two-dimensional object recognition problem.

  • PDF

Improved prediction of soil liquefaction susceptibility using ensemble learning algorithms

  • Satyam Tiwari;Sarat K. Das;Madhumita Mohanty;Prakhar
    • Geomechanics and Engineering
    • /
    • v.37 no.5
    • /
    • pp.475-498
    • /
    • 2024
  • The prediction of the susceptibility of soil to liquefaction using a limited set of parameters, particularly when dealing with highly unbalanced databases is a challenging problem. The current study focuses on different ensemble learning classification algorithms using highly unbalanced databases of results from in-situ tests; standard penetration test (SPT), shear wave velocity (Vs) test, and cone penetration test (CPT). The input parameters for these datasets consist of earthquake intensity parameters, strong ground motion parameters, and in-situ soil testing parameters. liquefaction index serving as the binary output parameter. After a rigorous comparison with existing literature, extreme gradient boosting (XGBoost), bagging, and random forest (RF) emerge as the most efficient models for liquefaction instance classification across different datasets. Notably, for SPT and Vs-based models, XGBoost exhibits superior performance, followed by Light gradient boosting machine (LightGBM) and Bagging, while for CPT-based models, Bagging ranks highest, followed by Gradient boosting and random forest, with CPT-based models demonstrating lower Gmean(error), rendering them preferable for soil liquefaction susceptibility prediction. Key parameters influencing model performance include internal friction angle of soil (ϕ) and percentage of fines less than 75 µ (F75) for SPT and Vs data and normalized average cone tip resistance (qc) and peak horizontal ground acceleration (amax) for CPT data. It was also observed that the addition of Vs measurement to SPT data increased the efficiency of the prediction in comparison to only SPT data. Furthermore, to enhance usability, a graphical user interface (GUI) for seamless classification operations based on provided input parameters was proposed.

Very Short-Term Wind Power Ensemble Forecasting without Numerical Weather Prediction through the Predictor Design

  • Lee, Duehee;Park, Yong-Gi;Park, Jong-Bae;Roh, Jae Hyung
    • Journal of Electrical Engineering and Technology
    • /
    • v.12 no.6
    • /
    • pp.2177-2186
    • /
    • 2017
  • The goal of this paper is to provide the specific forecasting steps and to explain how to design the forecasting architecture and training data sets to forecast very short-term wind power when the numerical weather prediction (NWP) is unavailable, and when the sampling periods of the wind power and training data are different. We forecast the very short-term wind power every 15 minutes starting two hours after receiving the most recent measurements up to 40 hours for a total of 38 hours, without using the NWP data but using the historical weather data. Generally, the NWP works as a predictor and can be converted to wind power forecasts through machine learning-based forecasting algorithms. Without the NWP, we can still build the predictor by shifting the historical weather data and apply the machine learning-based algorithms to the shifted weather data. In this process, the sampling intervals of the weather and wind power data are unified. To verify our approaches, we participated in the 2017 wind power forecasting competition held by the European Energy Market conference and ranked sixth. We have shown that the wind power can be accurately forecasted through the data shifting although the NWP is unavailable.

Cognitive Impairment Prediction Model Using AutoML and Lifelog

  • Hyunchul Choi;Chiho Yoon;Sae Bom Lee
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.11
    • /
    • pp.53-63
    • /
    • 2023
  • This study developed a cognitive impairment predictive model as one of the screening tests for preventing dementia in the elderly by using Automated Machine Learning(AutoML). We used 'Wearable lifelog data for high-risk dementia patients' of National Information Society Agency, then conducted using PyCaret 3.0.0 in the Google Colaboratory environment. This study analysis steps are as follows; first, selecting five models demonstrating excellent classification performance for the model development and lifelog data analysis. Next, using ensemble learning to integrate these models and assess their performance. It was found that Voting Classifier, Gradient Boosting Classifier, Extreme Gradient Boosting, Light Gradient Boosting Machine, Extra Trees Classifier, and Random Forest Classifier model showed high predictive performance in that order. This study findings, furthermore, emphasized on the the crucial importance of 'Average respiration per minute during sleep' and 'Average heart rate per minute during sleep' as the most critical feature variables for accurate predictions. Finally, these study results suggest that consideration of the possibility of using machine learning and lifelog as a means to more effectively manage and prevent cognitive impairment in the elderly.