• Title/Summary/Keyword: 앙상블 결정트리 (ensemble decision tree)

ECG-based Biometric Authentication Using Random Forest (랜덤 포레스트를 이용한 심전도 기반 생체 인증)

  • Kim, JeongKyun;Lee, Kang Bok;Hong, Sang Gi
    • Journal of the Institute of Electronics and Information Engineers / v.54 no.6 / pp.100-105 / 2017
  • This work presents an ECG biometric recognition system for biometric authentication. ECG biometric approaches fall into two major categories: fiducial-based and non-fiducial-based methods. This paper proposes a new non-fiducial framework using the discrete cosine transform (DCT) and a Random Forest (RF) classifier. With the DCT, most of the signal information tends to be concentrated in a few low-frequency components. To build the input to the Random Forest, DCT feature vectors of ECG heartbeats are constructed from the first 40 DCT coefficients. RF is based on the computation of a large number of decision trees. It is relatively fast, robust, and inherently suitable for multi-class problems. Furthermore, the trade-off threshold between admission and rejection of an ID can be set inside the RF classifier. As a result, the proposed method achieves a 99.9% recognition rate when tested on the MIT-BIH NSRDB.
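
The feature-extraction step described in this abstract (keep only the first 40 DCT coefficients of each heartbeat) can be illustrated with a minimal sketch. It is not the authors' implementation: the unnormalized DCT-II variant, the function names `dct2` and `dct_features`, and the synthetic 200-sample "heartbeat" are all assumptions made for illustration, and the Random Forest stage is omitted.

```python
import math

def dct2(signal):
    """Unnormalized DCT-II of a sequence of floats (O(n^2), for illustration)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]

def dct_features(heartbeat, n_coeffs=40):
    """Keep the first n_coeffs DCT coefficients as the feature vector."""
    return dct2(heartbeat)[:n_coeffs]

# Hypothetical smooth "heartbeat": a single low-frequency cosine, 200 samples.
beat = [math.cos(math.pi / 200 * (i + 0.5) * 6) for i in range(200)]

coeffs = dct2(beat)
features = dct_features(beat)

# Share of the signal energy captured by the 40 retained coefficients.
energy_ratio = sum(c * c for c in features) / sum(c * c for c in coeffs)
```

For a smooth signal like this one, nearly all of the energy falls in the retained low-frequency coefficients, which is the property the abstract relies on. The resulting 40-element vectors would then be passed to a classifier.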

API Feature Based Ensemble Model for Malware Family Classification (악성코드 패밀리 분류를 위한 API 특징 기반 앙상블 모델 학습)

  • Lee, Hyunjong;Euh, Seongyul;Hwang, Doosung
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.3 / pp.531-539 / 2019
  • This paper proposes training features for malware family analysis and analyzes the multi-class classification performance of ensemble models. We construct training data by extracting API and DLL information from malware executables and use the Random Forest and XGBoost algorithms, both of which are based on decision trees. API, API-DLL, and DLL-CM features for malware detection and family classification are proposed by analyzing frequently used API and DLL information from malware and converting high-dimensional features to low-dimensional features. The proposed feature selection method offers data dimension reduction and fast learning. In the performance comparison, the malware detection rate is 93.0% for Random Forest, the accuracy on the malware family dataset is 92.0% for XGBoost, and the false positive rate on the malware family dataset including benign samples is about 3.5% for both Random Forest and XGBoost.

Machine Learning Algorithms Evaluation and CombML Development for Dam Inflow Prediction (댐 유입량 예측을 위한 머신러닝 알고리즘 평가 및 CombML 개발)

  • Hong, Jiyeong;Bae, Juhyeon;Jeong, Yeonseok;Lim, Kyoung Jae
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.317-317 / 2021
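  • Research on dam inflow is essential for efficient water management. In this study, inflow to the Soyanggang Dam was predicted with various machine learning algorithms using 40 years of weather and dam inflow data, and CombML, a combination of machine learning algorithms, was developed by selecting the algorithms best suited to high-flow and low-flow prediction respectively. Decision tree (DT), multilayer perceptron (MLP), random forest (RF), gradient boosting (GB), RNN-LSTM, and CNN-LSTM algorithms were used. The most accurate model overall was combined with the model that was particularly accurate outside high-flow conditions to develop and evaluate the combined machine learning algorithm (CombML). Among the algorithms used, MLP gave the best results for dam inflow prediction, with NSE 0.812, RMSE 77.218 m³/s, MAE 29.034 m³/s, R 0.924, and R² 0.817, whereas the ensemble models (RF, GB) outperformed MLP when dam inflow was 100 m³/s or less. Therefore, taking 16 mm, the mean daily precipitation when inflow is 100 m³/s or more, as the threshold, two combined machine learning (CombML) models (RF_MLP and GB_MLP) were developed that predict dam inflow with an ensemble method (RF or GB) when precipitation is 16 mm or less and with MLP when precipitation is 16 mm or more. RF_MLP yielded NSE 0.857, RMSE 68.417 m³/s, MAE 18.063 m³/s, R 0.927, and R² 0.859, and GB_MLP yielded NSE 0.829, RMSE 73.918 m³/s, MAE 18.093 m³/s, R 0.912, and R² 0.831, so CombML was assessed as predicting dam inflow most accurately. The study confirmed that combining several machine learning algorithms while taking the river flow regime into account improves the accuracy of the prediction model, and the approach is expected to be useful for efficient water management in the future.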
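
The switching rule and the error metrics named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the two model functions are hypothetical stand-ins for trained RF/GB and MLP models, the feature key `precip_mm` is invented, and a precipitation of exactly 16 mm is assumed to go to the ensemble model because the abstract leaves that boundary case open.

```python
import math

PRECIP_THRESHOLD_MM = 16.0  # threshold stated in the abstract

def comb_ml_predict(features, ensemble_model, mlp_model):
    """Use the ensemble model at or below the threshold, otherwise the MLP."""
    # Assumption: exactly 16 mm is routed to the ensemble model.
    if features["precip_mm"] <= PRECIP_THRESHOLD_MM:
        return ensemble_model(features)
    return mlp_model(features)

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical stand-ins for trained models (not fitted to any real data).
def ensemble_stub(features):
    return 0.5 * features["precip_mm"] + 20.0

def mlp_stub(features):
    return 6.0 * features["precip_mm"]

low = comb_ml_predict({"precip_mm": 5.0}, ensemble_stub, mlp_stub)    # ensemble branch
high = comb_ml_predict({"precip_mm": 40.0}, ensemble_stub, mlp_stub)  # MLP branch
```

Routing on precipitation rather than on inflow itself keeps the rule usable at prediction time, when the inflow is not yet known.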

Evaluation of a Thermal Conductivity Prediction Model for Compacted Clay Based on a Machine Learning Method (기계학습법을 통한 압축 벤토나이트의 열전도도 추정 모델 평가)

  • Yoon, Seok;Bang, Hyun-Tae;Kim, Geon-Young;Jeon, Haemin
    • KSCE Journal of Civil and Environmental Engineering Research / v.41 no.2 / pp.123-131 / 2021
  • The buffer is a key component of an engineered barrier system that safeguards the disposal of high-level radioactive waste. Buffers are located between disposal canisters and host rock, and they can restrain the release of radionuclides and protect canisters from the inflow of groundwater. Since considerable heat is released from a disposal canister to the surrounding buffer, the thermal conductivity of the buffer is a very important parameter for overall disposal safety. For this reason, much research has been conducted on thermal conductivity prediction models that consider various factors. In this study, the thermal conductivity of a buffer is estimated using the following machine learning methods: linear regression, decision tree, support vector machine (SVM), ensemble, Gaussian process regression (GPR), neural network, deep belief network, and genetic programming. Machine learning methods such as ensemble, genetic programming, SVM with a cubic kernel parameter, and GPR performed better than the regression model, with the XGBoost ensemble and the Gaussian process regression models performing best.

Development of Type 2 Diabetes Prediction Algorithm Based on Big Data (빅데이터 기반 2형 당뇨 예측 알고리즘 개발)

  • Hyun Sim;HyunWook Kim
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.5 / pp.999-1008 / 2023
  • Early prediction of chronic diseases such as diabetes is an important issue, and improving the accuracy of diabetes prediction is especially important. Various machine learning and deep learning-based methodologies are being introduced for diabetes prediction, but these technologies require large amounts of data to outperform other methodologies, and the learning cost is high due to complex data models. In this study, we aim to verify the claim that a DNN using the Pima dataset and k-fold cross-validation reduces the efficiency of diabetes diagnosis models. Machine learning classification methods such as decision trees, SVM, random forests, logistic regression, KNN, and various ensemble techniques were used to determine which algorithm produces the best prediction results. After training and testing all classification models, the proposed system gave the best results with the XGBoost classifier combined with the ADASYN method, with an accuracy of 81%, an F1 score of 0.81, and an AUC of 0.84. Additionally, a domain adaptation method was implemented to demonstrate the versatility of the proposed system. An explainable AI approach using the LIME and SHAP frameworks was implemented to understand how the model predicts the final outcome.
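
Two of the evaluation tools mentioned in this abstract, k-fold cross-validation and the F1 score, can be sketched in a few lines. This is only an illustration: the abstract does not state the value of k, so the fold count and the sample count below are arbitrary, the folds are contiguous with no shuffling, and the function names are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative values only: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
example_f1 = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

Each sample lands in exactly one test fold, so every model is scored on data it did not see during training. Any oversampling step such as ADASYN would normally be applied to the training indices only.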

A Development of Defeat Prediction Model Using Machine Learning in Polyurethane Foaming Process for Automotive Seat (머신러닝을 활용한 자동차 시트용 폴리우레탄 발포공정의 불량 예측 모델 개발)

  • Choi, Nak-Hun;Oh, Jong-Seok;Ahn, Jong-Rok;Kim, Key-Sun
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.6 / pp.36-42 / 2021
  • With recent developments in the Fourth Industrial Revolution, the manufacturing industry has changed rapidly. Through super-connectivity and super-intelligence, key aspects of the Fourth Industrial Revolution, machine learning can be used to predict faults during the foam-making process. Polyol and isocyanate are the components of polyurethane foam. Much research indicates that the characteristics of the product depend on the specific mixture ratio and temperature. Based on these characteristics, this study collects data on each factor during the foam-making process and applies machine learning to predict faults. The algorithms used are the decision tree, kNN, and an ensemble algorithm, each trained on 5,147 cases. On 1,000 validation cases, the ensemble algorithm reaches up to 98.5% accuracy. The results therefore show that faults in currently produced parts can be identified by collecting real-time data on each factor during the foam-making process. Furthermore, controlling each of the factors may reduce the fault rate.

A Study on the Work-time Estimation for Block Erections Using Stacking Ensemble Learning (Stacking Ensemble Learning을 활용한 블록 탑재 시수 예측)

  • Kwon, Hyukcheon;Ruy, Wonsun
    • Journal of the Society of Naval Architects of Korea / v.56 no.6 / pp.488-496 / 2019
  • Estimating block erection work time at a dock is an important factor in establishing and managing the overall shipbuilding schedule. A natural approach to predicting the work time is to use existing block erection data. Generally, the work time per unit is the product of a coefficient value, a quantity, and a product value. Previously, the work time per unit was determined statistically from unit load data. In this study, however, we estimate the work time per unit from the work time coefficient values of series ships using machine learning. In machine learning, the outcome depends mainly on how the training data is organized. Therefore, we use feature engineering to determine which variables should be used as features and to check their influence on the result. To obtain the coefficient value of each block, we apply ensemble learning methods, which are widely used today. Among the many ensemble learning techniques, the final model is constructed with a stacking ensemble consisting of existing models (Decision Tree, Random Forest, Gradient Boost, Square Loss Gradient Boost, XGBoost), and accuracy is maximized by selecting three candidates among all models. Finally, the results of this study are verified against the predicted total work time for one ship in the same series.
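
The idea behind a stacking ensemble, where a meta-learner is trained on the predictions of base models, can be sketched as below. This is a deliberately small hold-out variant and not the authors' pipeline: the two base learners (a mean predictor and a one-variable least-squares line) are stand-ins for the tree-based models listed in the abstract, the meta-learner is a two-weight least-squares fit without an intercept, and the data are synthetic.

```python
def fit_mean(xs, ys):
    """Base learner 1: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Base learner 2: ordinary least-squares line in one variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_meta_weights(p1, p2, ys):
    """Meta-learner: least-squares weights (w1, w2) so that w1*p1 + w2*p2 ~ ys."""
    a11 = sum(a * a for a in p1)
    a12 = sum(a * b for a, b in zip(p1, p2))
    a22 = sum(b * b for b in p2)
    b1 = sum(a * y for a, y in zip(p1, ys))
    b2 = sum(b * y for b, y in zip(p2, ys))
    det = a11 * a22 - a12 * a12  # nonzero when p1 and p2 are not proportional
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

# Synthetic data: y = 2x + 1. Base models see the first 6 points,
# the meta-learner is fitted on their predictions for the held-out 4.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
train_x, train_y = xs[:6], ys[:6]
hold_x, hold_y = xs[6:], ys[6:]

base_mean = fit_mean(train_x, train_y)
base_line = fit_line(train_x, train_y)
w1, w2 = fit_meta_weights(
    [base_mean(x) for x in hold_x],
    [base_line(x) for x in hold_x],
    hold_y,
)

def stacked(x):
    """Final prediction: weighted combination of the base predictions."""
    return w1 * base_mean(x) + w2 * base_line(x)
```

Fitting the meta-learner on held-out predictions rather than on training predictions keeps it from simply rewarding whichever base model overfits most. Full stacking usually replaces the single hold-out split with out-of-fold predictions from cross-validation.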

Crop Yield Estimation Utilizing Feature Selection Based on Graph Classification (그래프 분류 기반 특징 선택을 활용한 작물 수확량 예측)

  • Ohnmar Khin;Sung-Keun Lee
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.6 / pp.1269-1276 / 2023
  • Crop yield estimation is essential for meeting food demand across countries, and it depends on many factors such as soil, rain, climate, atmosphere, and their interactions. Climate change affects farm yields. We use a dataset with temperature, rainfall, humidity, and other variables. The current research focuses on feature selection with a variety of classifiers to assist farmers and agriculturalists. Crop yield estimation with the feature selection approach achieves 96% accuracy. Feature selection affects a machine learning model's performance. Additionally, the current graph classifier reaches 81.5%. The random forest regressor without feature selection attains 78% accuracy, and the decision tree regressor without feature selection attains 67% accuracy. The merit of our research is that it reports experimental results with and without feature selection for the ten proposed algorithms. These findings help learners and students choose appropriate models for crop classification studies.