• 제목/요약/키워드: ensemble machine learning

검색결과 229건 처리시간 0.031초

앙상블 기계학습 모델을 이용한 비정질 소재의 자기냉각 효과 및 전이온도 예측 (Prediction of Transition Temperature and Magnetocaloric Effects in Bulk Metallic Glasses with Ensemble Models)

  • 남충희
    • 한국재료학회지
    • /
    • 제34권7호
    • /
    • pp.363-369
    • /
    • 2024
  • In this study, the magnetocaloric effect and transition temperature of bulk metallic glass, an amorphous material, were predicted through machine learning based on the composition features. From the Python module 'Matminer', 174 compositional features were obtained, and prediction performance was compared while reducing the composition features to prevent overfitting. After optimization using RandomForest, an ensemble model, changes in prediction performance were analyzed according to the number of compositional features. The R2 score was used as a performance metric in the regression prediction, and the best prediction performance was found using only 90 features predicting transition temperature, and 20 features predicting magnetocaloric effects. The most important feature when predicting magnetocaloric effects was the 'Fe' compositional ratio. The feature importance method provided by 'scikit-learn' was applied to sort compositional features. The feature importance method was found to be appropriate by comparing the prediction performance of the Fe-contained dataset with the full dataset.

Machine Learning Methodology for Management of Shipbuilding Master Data

  • Jeong, Ju Hyeon;Woo, Jong Hun;Park, JungGoo
    • International Journal of Naval Architecture and Ocean Engineering
    • /
    • 제12권1호
    • /
    • pp.428-439
    • /
    • 2020
  • The continuous development of information and communication technologies has resulted in an exponential increase in data. Consequently, technologies related to data analysis are growing in importance. The shipbuilding industry has high production uncertainty and variability, which has created an urgent need for data analysis techniques, such as machine learning. In particular, the industry cannot effectively respond to changes in the production-related standard time information systems, such as the basic cycle time and lead time. Improvement measures are necessary to enable the industry to respond swiftly to changes in the production environment. In this study, the lead times for fabrication, assembly of ship block, spool fabrication and painting were predicted using machine learning technology to propose a new management method for the process lead time using a master data system for the time element in the production data. Data preprocessing was performed in various ways using R and Python, which are open source programming languages, and process variables were selected considering their relationships with the lead time through correlation analysis and analysis of variables. Various machine learning, deep learning, and ensemble learning algorithms were applied to create the lead time prediction models. In addition, the applicability of the proposed machine learning methodology to standard work hour prediction was verified by evaluating the prediction models using the evaluation criteria, such as the Mean Absolute Percentage Error (MAPE) and Root Mean Squared Logarithmic Error (RMSLE).

Ensemble-By-Session Method on Keystroke Dynamics based User Authentication

  • Ho, Jiacang;Kang, Dae-Ki
    • International Journal of Internet, Broadcasting and Communication
    • /
    • 제8권4호
    • /
    • pp.19-25
    • /
    • 2016
  • There are many free applications that need users to sign up before they can use the applications nowadays. It is difficult to choose a suitable password for your account. If the password is too complicated, then it is hard to remember it. However, it is easy to be intruded by other users if we use a very simple password. Therefore, biometric-based approach is one of the solutions to solve the issue. The biometric-based approach includes keystroke dynamics on keyboard, mice, or mobile devices, gait analysis and many more. The approach can integrate with any appropriate machine learning algorithm to learn a user typing behavior for authentication system. Preprocessing phase is one the important role to increase the performance of the algorithm. In this paper, we have proposed ensemble-by-session (EBS) method which to operate the preprocessing phase before the training phase. EBS distributes the dataset into multiple sub-datasets based on the session. In other words, we split the dataset into session by session instead of assemble them all into one dataset. If a session is considered as one day, then the sub-dataset has all the information on the particular day. Each sub-dataset will have different information for different day. The sub-datasets are then trained by a machine learning algorithm. From the experimental result, we have shown the improvement of the performance for each base algorithm after the preprocessing phase.

A Grey Wolf Optimized- Stacked Ensemble Approach for Nitrate Contamination Prediction in Cauvery Delta

  • Kalaivanan K;Vellingiri J
    • 자원환경지질
    • /
    • 제57권3호
    • /
    • pp.329-342
    • /
    • 2024
  • The exponential increase in nitrate pollution of river water poses an immediate threat to public health and the environment. This contamination is primarily due to various human activities, which include the overuse of nitrogenous fertilizers in agriculture and the discharge of nitrate-rich industrial effluents into rivers. As a result, the accurate prediction and identification of contaminated areas has become a crucial and challenging task for researchers. To solve these problems, this work leads to the prediction of nitrate contamination using machine learning approaches. This paper presents a novel approach known as Grey Wolf Optimizer (GWO) based on the Stacked Ensemble approach for predicting nitrate pollution in the Cauvery Delta region of Tamilnadu, India. The proposed method is evaluated using a Cauvery River dataset from the Tamilnadu Pollution Control Board. The proposed method shows excellent performance, achieving an accuracy of 93.31%, a precision of 93%, a sensitivity of 97.53%, a specificity of 94.28%, an F1-score of 95.23%, and an ROC score of 95%. These impressive results underline the demonstration of the proposed method in accurately predicting nitrate pollution in river water and ultimately help to make informed decisions to tackle these critical environmental problems.

Neural Networks-Based Method for Electrocardiogram Classification

  • Maksym Kovalchuk;Viktoriia Kharchenko;Andrii Yavorskyi;Igor Bieda;Taras Panchenko
    • International Journal of Computer Science & Network Security
    • /
    • 제23권9호
    • /
    • pp.186-191
    • /
    • 2023
  • Neural Networks are widely used for huge variety of tasks solution. Machine Learning methods are used also for signal and time series analysis, including electrocardiograms. Contemporary wearable devices, both medical and non-medical type like smart watch, allow to gather the data in real time uninterruptedly. This allows us to transfer these data for analysis or make an analysis on the device, and thus provide preliminary diagnosis, or at least fix some serious deviations. Different methods are being used for this kind of analysis, ranging from medical-oriented using distinctive features of the signal to machine learning and deep learning approaches. Here we will demonstrate a neural network-based approach to this task by building an ensemble of 1D CNN classifiers and a final classifier of selection using logistic regression, random forest or support vector machine, and make the conclusions of the comparison with other approaches.

시공 중 흙막이 벽체 수평변위 예측을 위한 앙상블 모델 개발 (Development of an Ensemble Prediction Model for Lateral Deformation of Retaining Wall Under Construction)

  • 서승환;정문경
    • 한국지반공학회논문집
    • /
    • 제39권4호
    • /
    • pp.5-17
    • /
    • 2023
  • 도심지 지하굴착 공사가 대형화되면서 공사 중 안전사고에 대한 위험요인이 더욱 증가하고 있다. 이에 따라 공사현장의 위험요소를 모니터링하고 사전에 예측할 수 있는 기술이 필요하다. 굴착으로 인한 흙막이 벽체의 변형을 예측하는 방법에는 크게 경험식과 수치해석 두 가지 방법으로 분류할 수 있으며, 최근에는 인공지능 기술의 발달과 함께 머신러닝 기법을 활용한 예측 모델이 한 가지 방법으로 자리 잡고 있다. 본 연구에서는 예측력과 효율성이 우수한 부스팅 계열 알고리즘 및 앙상블 모델을 이용하여 시공 중 흙막이 벽체 변형을 예측하는 모델을 구축하였다. 지하흙막이 공사의 설계-시공-유지관리 과정에서 도출되는 자료들을 복합적으로 활용하여 데이터베이스를 구축하고, 이 자료를 토대로 학습모델을 만들고 성능을 평가하였다. 모델 성능 평가 결과, 높은 정확도로 흙막이 벽체 변형을 예측할 수 있었으며, 지반계측 자료를 학습에 활용함으로써 실제 시공과정의 특성이 반영된 예측결과를 제시할 수 있었다. 본 연구에서 구축한 예측 모델을 활용하여 시공 중 흙막이 벽체의 안정성 평가 및 모니터링에 활용할 수 있을 것으로 기대된다.

Feature Selection with Ensemble Learning for Prostate Cancer Prediction from Gene Expression

  • Abass, Yusuf Aleshinloye;Adeshina, Steve A.
    • International Journal of Computer Science & Network Security
    • /
    • 제21권12spc호
    • /
    • pp.526-538
    • /
    • 2021
  • Machine and deep learning-based models are emerging techniques that are being used to address prediction problems in biomedical data analysis. DNA sequence prediction is a critical problem that has attracted a great deal of attention in the biomedical domain. Machine and deep learning-based models have been shown to provide more accurate results when compared to conventional regression-based models. The prediction of the gene sequence that leads to cancerous diseases, such as prostate cancer, is crucial. Identifying the most important features in a gene sequence is a challenging task. Extracting the components of the gene sequence that can provide an insight into the types of mutation in the gene is of great importance as it will lead to effective drug design and the promotion of the new concept of personalised medicine. In this work, we extracted the exons in the prostate gene sequences that were used in the experiment. We built a Deep Neural Network (DNN) and Bi-directional Long-Short Term Memory (Bi-LSTM) model using a k-mer encoding for the DNA sequence and one-hot encoding for the class label. The models were evaluated using different classification metrics. Our experimental results show that DNN model prediction offers a training accuracy of 99 percent and validation accuracy of 96 percent. The bi-LSTM model also has a training accuracy of 95 percent and validation accuracy of 91 percent.

약물유전체학에서 약물반응 예측모형과 변수선택 방법 (Feature selection and prediction modeling of drug responsiveness in Pharmacogenomics)

  • 김규환;김원국
    • 응용통계연구
    • /
    • 제34권2호
    • /
    • pp.153-166
    • /
    • 2021
  • 약물유전체학 연구의 주요 목표는 고차원의 유전 변수를 기반으로 개인의 약물 반응성을 예측하는 것이다. 변수의 개수가 많기 때문에 변수의 개수를 줄이기 위해서는 변수 선택이 필요하며, 선택된 변수들은 머신러닝 알고리즘을 사용하여 예측 모델을 구축하는데 사용된다. 본 연구에서는 400명의 뇌전증 환자의 차세대 염기서열 분석 데이터에 로지스틱 회귀, ReliefF, TurF, 랜덤 포레스트, LASSO의 조합과 같은 여러 가지 혼합 변수 선택 방법을 적용하였다. 선택된 변수들에 랜덤포레스트, 그래디언트 부스팅, 서포트벡터머신을 포함한 머신러닝 방법들을 적용했고 스태킹을 통해 앙상블 모형을 구축하였다. 본 연구의 결과는 랜덤포레스트와 ReliefF의 혼합 변수 선택 방법을 이용한 스태킹 모형이 다른 모형보다 더 좋은 성능을 보인다는 것을 보여주었다. 5-폴드 교차 검증을 기반으로 하여 적합한 최적 모형의 평균 검증 정확도는 0.727이고 평균 검증 AUC 값은 0.761로 나타났다. 또한, 동일한 변수를 사용할 때 스태킹 모델이 단일 머신러닝 예측 모델보다 성능이 우수한 것으로 나타났다.

앙상블 머신러닝 모형을 이용한 하천 녹조발생 예측모형의 입력변수 특성에 따른 성능 영향 (Effect of input variable characteristics on the performance of an ensemble machine learning model for algal bloom prediction)

  • 강병구;박정수
    • 상하수도학회지
    • /
    • 제35권6호
    • /
    • pp.417-424
    • /
    • 2021
  • Algal bloom is an ongoing issue in the management of freshwater systems for drinking water supply, and the chlorophyll-a concentration is commonly used to represent the status of algal bloom. Thus, the prediction of chlorophyll-a concentration is essential for the proper management of water quality. However, the chlorophyll-a concentration is affected by various water quality and environmental factors, so the prediction of its concentration is not an easy task. In recent years, many advanced machine learning algorithms have increasingly been used for the development of surrogate models to prediction the chlorophyll-a concentration in freshwater systems such as rivers or reservoirs. This study used a light gradient boosting machine(LightGBM), a gradient boosting decision tree algorithm, to develop an ensemble machine learning model to predict chlorophyll-a concentration. The field water quality data observed at Daecheong Lake, obtained from the real-time water information system in Korea, were used for the development of the model. The data include temperature, pH, electric conductivity, dissolved oxygen, total organic carbon, total nitrogen, total phosphorus, and chlorophyll-a. First, a LightGBM model was developed to predict the chlorophyll-a concentration by using the other seven items as independent input variables. Second, the time-lagged values of all the input variables were added as input variables to understand the effect of time lag of input variables on model performance. The time lag (i) ranges from 1 to 50 days. The model performance was evaluated using three indices, root mean squared error-observation standard deviation ration (RSR), Nash-Sutcliffe coefficient of efficiency (NSE) and mean absolute error (MAE). The model showed the best performance by adding a dataset with a one-day time lag (i=1) where RSR, NSE, and MAE were 0.359, 0.871 and 1.510, respectively. The improvement of model performance was observed when a dataset with a time lag up of about 15 days (i=15) was added.

앙상블 학습기법을 활용한 보행자 교통사고 심각도 분류: 대전시 사례를 중심으로 (Classifying the severity of pedestrian accidents using ensemble machine learning algorithms: A case study of Daejeon City)

  • 강흥식;노명규
    • 디지털융복합연구
    • /
    • 제20권5호
    • /
    • pp.39-46
    • /
    • 2022
  • 교통사고와 사회·경제적 손실 간의 연계성이 확인됨에 따라 사고 데이터에 기반을 둔 안전 정책 마련 및 중상·사망 등 그 심각도가 높은 교통사고의 절감 방안의 필요성이 제기되고 있다. 본 연구에서는 인구 대비 교통사고 사망자 비율이 높은 대전시를 대상지역으로 설정하고 보행자 교통사고 데이터를 수집한 후, 기계학습을 통해 최적알고리즘과 심각도 분류의 주요 인자를 도출하였다. 연구의 결과에 따르면, 적용한 9개 알고리즘 중 앙상블 기반의 학습 기법인 AdaBoost (Adaptive Boosting)와 RF (Random Forest)가 최적의 성능을 보여주었다. 이를 기반으로 도출된 대전시 보행자 교통사고 심각도의 주요 인자는 보행자의 연령이 70대 및 20대이거나 사고유형이 횡단사고에 의한 경우로 나타남에 따라 대전시 보행자 사고 저감 대책을 위한 고려요인으로 제안하였다.