• Title/Summary/Keyword: Ensemble machine learning

Search Result 226, Processing Time 0.025 seconds

Machine Learning Methodology for Management of Shipbuilding Master Data

  • Jeong, Ju Hyeon;Woo, Jong Hun;Park, JungGoo
    • International Journal of Naval Architecture and Ocean Engineering
    • /
    • v.12 no.1
    • /
    • pp.428-439
    • /
    • 2020
  • The continuous development of information and communication technologies has resulted in an exponential increase in data. Consequently, technologies related to data analysis are growing in importance. The shipbuilding industry has high production uncertainty and variability, which has created an urgent need for data analysis techniques, such as machine learning. In particular, the industry cannot effectively respond to changes in the production-related standard time information systems, such as the basic cycle time and lead time. Improvement measures are necessary to enable the industry to respond swiftly to changes in the production environment. In this study, the lead times for fabrication, assembly of ship block, spool fabrication and painting were predicted using machine learning technology to propose a new management method for the process lead time using a master data system for the time element in the production data. Data preprocessing was performed in various ways using R and Python, which are open source programming languages, and process variables were selected considering their relationships with the lead time through correlation analysis and analysis of variables. Various machine learning, deep learning, and ensemble learning algorithms were applied to create the lead time prediction models. In addition, the applicability of the proposed machine learning methodology to standard work hour prediction was verified by evaluating the prediction models using the evaluation criteria, such as the Mean Absolute Percentage Error (MAPE) and Root Mean Squared Logarithmic Error (RMSLE).

Ensemble-By-Session Method on Keystroke Dynamics based User Authentication

  • Ho, Jiacang;Kang, Dae-Ki
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.8 no.4
    • /
    • pp.19-25
    • /
    • 2016
  • There are many free applications that need users to sign up before they can use the applications nowadays. It is difficult to choose a suitable password for your account. If the password is too complicated, then it is hard to remember it. However, it is easy to be intruded by other users if we use a very simple password. Therefore, biometric-based approach is one of the solutions to solve the issue. The biometric-based approach includes keystroke dynamics on keyboard, mice, or mobile devices, gait analysis and many more. The approach can integrate with any appropriate machine learning algorithm to learn a user typing behavior for authentication system. Preprocessing phase is one the important role to increase the performance of the algorithm. In this paper, we have proposed ensemble-by-session (EBS) method which to operate the preprocessing phase before the training phase. EBS distributes the dataset into multiple sub-datasets based on the session. In other words, we split the dataset into session by session instead of assemble them all into one dataset. If a session is considered as one day, then the sub-dataset has all the information on the particular day. Each sub-dataset will have different information for different day. The sub-datasets are then trained by a machine learning algorithm. From the experimental result, we have shown the improvement of the performance for each base algorithm after the preprocessing phase.

Neural Networks-Based Method for Electrocardiogram Classification

  • Maksym Kovalchuk;Viktoriia Kharchenko;Andrii Yavorskyi;Igor Bieda;Taras Panchenko
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.9
    • /
    • pp.186-191
    • /
    • 2023
  • Neural Networks are widely used for huge variety of tasks solution. Machine Learning methods are used also for signal and time series analysis, including electrocardiograms. Contemporary wearable devices, both medical and non-medical type like smart watch, allow to gather the data in real time uninterruptedly. This allows us to transfer these data for analysis or make an analysis on the device, and thus provide preliminary diagnosis, or at least fix some serious deviations. Different methods are being used for this kind of analysis, ranging from medical-oriented using distinctive features of the signal to machine learning and deep learning approaches. Here we will demonstrate a neural network-based approach to this task by building an ensemble of 1D CNN classifiers and a final classifier of selection using logistic regression, random forest or support vector machine, and make the conclusions of the comparison with other approaches.

Development of an Ensemble Prediction Model for Lateral Deformation of Retaining Wall Under Construction (시공 중 흙막이 벽체 수평변위 예측을 위한 앙상블 모델 개발)

  • Seo, Seunghwan;Chung, Moonkyung
    • Journal of the Korean Geotechnical Society
    • /
    • v.39 no.4
    • /
    • pp.5-17
    • /
    • 2023
  • The advancement in large-scale underground excavation in urban areas necessitates monitoring and predicting technologies that can pre-emptively mitigate risk factors at construction sites. Traditionally, two methods predict the deformation of retaining walls induced by excavation: empirical and numerical analysis. Recent progress in artificial intelligence technology has led to the development of a predictive model using machine learning techniques. This study developed a model for predicting the deformation of a retaining wall under construction using a boosting-based algorithm and an ensemble model with outstanding predictive power and efficiency. A database was established using the data from the design-construction-maintenance process of the underground retaining wall project in a manifold manner. Based on these data, a learning model was created, and the performance was evaluated. The boosting and ensemble models demonstrated that wall deformation could be accurately predicted. In addition, it was confirmed that prediction results with the characteristics of the actual construction process can be presented using data collected from ground measurements. The predictive model developed in this study is expected to be used to evaluate and monitor the stability of retaining walls under construction.

Feature Selection with Ensemble Learning for Prostate Cancer Prediction from Gene Expression

  • Abass, Yusuf Aleshinloye;Adeshina, Steve A.
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.12spc
    • /
    • pp.526-538
    • /
    • 2021
  • Machine and deep learning-based models are emerging techniques that are being used to address prediction problems in biomedical data analysis. DNA sequence prediction is a critical problem that has attracted a great deal of attention in the biomedical domain. Machine and deep learning-based models have been shown to provide more accurate results when compared to conventional regression-based models. The prediction of the gene sequence that leads to cancerous diseases, such as prostate cancer, is crucial. Identifying the most important features in a gene sequence is a challenging task. Extracting the components of the gene sequence that can provide an insight into the types of mutation in the gene is of great importance as it will lead to effective drug design and the promotion of the new concept of personalised medicine. In this work, we extracted the exons in the prostate gene sequences that were used in the experiment. We built a Deep Neural Network (DNN) and Bi-directional Long-Short Term Memory (Bi-LSTM) model using a k-mer encoding for the DNA sequence and one-hot encoding for the class label. The models were evaluated using different classification metrics. Our experimental results show that DNN model prediction offers a training accuracy of 99 percent and validation accuracy of 96 percent. The bi-LSTM model also has a training accuracy of 95 percent and validation accuracy of 91 percent.

Feature selection and prediction modeling of drug responsiveness in Pharmacogenomics (약물유전체학에서 약물반응 예측모형과 변수선택 방법)

  • Kim, Kyuhwan;Kim, Wonkuk
    • The Korean Journal of Applied Statistics
    • /
    • v.34 no.2
    • /
    • pp.153-166
    • /
    • 2021
  • A main goal of pharmacogenomics studies is to predict individual's drug responsiveness based on high dimensional genetic variables. Due to a large number of variables, feature selection is required in order to reduce the number of variables. The selected features are used to construct a predictive model using machine learning algorithms. In the present study, we applied several hybrid feature selection methods such as combinations of logistic regression, ReliefF, TurF, random forest, and LASSO to a next generation sequencing data set of 400 epilepsy patients. We then applied the selected features to machine learning methods including random forest, gradient boosting, and support vector machine as well as a stacking ensemble method. Our results showed that the stacking model with a hybrid feature selection of random forest and ReliefF performs better than with other combinations of approaches. Based on a 5-fold cross validation partition, the mean test accuracy value of the best model was 0.727 and the mean test AUC value of the best model was 0.761. It also appeared that the stacking models outperform than single machine learning predictive models when using the same selected features.

Effect of input variable characteristics on the performance of an ensemble machine learning model for algal bloom prediction (앙상블 머신러닝 모형을 이용한 하천 녹조발생 예측모형의 입력변수 특성에 따른 성능 영향)

  • Kang, Byeong-Koo;Park, Jungsu
    • Journal of Korean Society of Water and Wastewater
    • /
    • v.35 no.6
    • /
    • pp.417-424
    • /
    • 2021
  • Algal bloom is an ongoing issue in the management of freshwater systems for drinking water supply, and the chlorophyll-a concentration is commonly used to represent the status of algal bloom. Thus, the prediction of chlorophyll-a concentration is essential for the proper management of water quality. However, the chlorophyll-a concentration is affected by various water quality and environmental factors, so the prediction of its concentration is not an easy task. In recent years, many advanced machine learning algorithms have increasingly been used for the development of surrogate models to prediction the chlorophyll-a concentration in freshwater systems such as rivers or reservoirs. This study used a light gradient boosting machine(LightGBM), a gradient boosting decision tree algorithm, to develop an ensemble machine learning model to predict chlorophyll-a concentration. The field water quality data observed at Daecheong Lake, obtained from the real-time water information system in Korea, were used for the development of the model. The data include temperature, pH, electric conductivity, dissolved oxygen, total organic carbon, total nitrogen, total phosphorus, and chlorophyll-a. First, a LightGBM model was developed to predict the chlorophyll-a concentration by using the other seven items as independent input variables. Second, the time-lagged values of all the input variables were added as input variables to understand the effect of time lag of input variables on model performance. The time lag (i) ranges from 1 to 50 days. The model performance was evaluated using three indices, root mean squared error-observation standard deviation ration (RSR), Nash-Sutcliffe coefficient of efficiency (NSE) and mean absolute error (MAE). The model showed the best performance by adding a dataset with a one-day time lag (i=1) where RSR, NSE, and MAE were 0.359, 0.871 and 1.510, respectively. The improvement of model performance was observed when a dataset with a time lag up of about 15 days (i=15) was added.

Classifying the severity of pedestrian accidents using ensemble machine learning algorithms: A case study of Daejeon City (앙상블 학습기법을 활용한 보행자 교통사고 심각도 분류: 대전시 사례를 중심으로)

  • Kang, Heungsik;Noh, Myounggyu
    • Journal of Digital Convergence
    • /
    • v.20 no.5
    • /
    • pp.39-46
    • /
    • 2022
  • As the link between traffic accidents and social and economic losses has been confirmed, there is a growing interest in developing safety policies based on crash data and a need for countermeasures to reduce severe crash outcomes such as severe injuries and fatalities. In this study, we select Daejeon city where the relative proportion of fatal crashes is high, as a case study region and focus on the severity of pedestrian crashes. After a series of data manipulation process, we run machine learning algorithms for the optimal model selection and variable identification. Of nine algorithms applied, AdaBoost and Random Forest (ensemble based ones) outperform others in terms of performance metrics. Based on the results, we identify major influential factors (i.e., the age of pedestrian as 70s or 20s, pedestrian crossing) on pedestrian crashes in Daejeon, and suggest them as measures for reducing severe outcomes.

Performance comparison on vocal cords disordered voice discrimination via machine learning methods (기계학습에 의한 후두 장애음성 식별기의 성능 비교)

  • Cheolwoo Jo;Soo-Geun Wang;Ickhwan Kwon
    • Phonetics and Speech Sciences
    • /
    • v.14 no.4
    • /
    • pp.35-43
    • /
    • 2022
  • This paper studies how to improve the identification rate of laryngeal disability speech data by convolutional neural network (CNN) and machine learning ensemble learning methods. In general, the number of laryngeal dysfunction speech data is small, so even if identifiers are constructed by statistical methods, the phenomenon caused by overfitting depending on the training method can lead to a decrease the identification rate when exposed to external data. In this work, we try to combine results derived from CNN models and machine learning models with various accuracy in a multi-voting manner to ensure improved classification efficiency compared to the original trained models. The Pusan National University Hospital (PNUH) dataset was used to train and validate algorithms. The dataset contains normal voice and voice data of benign and malignant tumors. In the experiment, an attempt was made to distinguish between normal and benign tumors and malignant tumors. As a result of the experiment, the random forest method was found to be the best ensemble method and showed an identification rate of 85%.

Multi-Time Window Feature Extraction Technique for Anger Detection in Gait Data

  • Beom Kwon;Taegeun Oh
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.4
    • /
    • pp.41-51
    • /
    • 2023
  • In this paper, we propose a technique of multi-time window feature extraction for anger detection in gait data. In the previous gait-based emotion recognition methods, the pedestrian's stride, time taken for one stride, walking speed, and forward tilt angles of the neck and thorax are calculated. Then, minimum, mean, and maximum values are calculated for the entire interval to use them as features. However, each feature does not always change uniformly over the entire interval but sometimes changes locally. Therefore, we propose a multi-time window feature extraction technique that can extract both global and local features, from long-term to short-term. In addition, we also propose an ensemble model that consists of multiple classifiers. Each classifier is trained with features extracted from different multi-time windows. To verify the effectiveness of the proposed feature extraction technique and ensemble model, a public three-dimensional gait dataset was used. The simulation results demonstrate that the proposed ensemble model achieves the best performance compared to machine learning models trained with existing feature extraction techniques for four performance evaluation metrics.