• Title/Abstract/Keywords: Ensemble machine learning

228 search results (processing time: 0.024 s)

A Study on the AI Model for Prediction of Demand for Cold Chain Distribution of Drugs

  • 김희영;류기환;근재;손현곤
    • 문화기술의 융합 / Vol. 9, No. 3 / pp. 763-768 / 2023
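  • This paper develops and compares a conventional statistical method (ARIMA) and a machine learning method (Informer) for forecasting drug distribution volume. For daily data, the machine-learning-based model is advantageous; for monthly forecasts, it is effective to use ARIMA first and switch to Informer as data accumulate. The prediction error (RMSE) was 26.6% lower than with the existing method, and prediction accuracy improved by 13% to 86.2%. The study finds that ensembling statistical and machine learning methods can yield the best results. In addition, the machine-learning-based AI model can produce the best available result even under irregular conditions through deep learning computation, and its performance is expected to improve after commercialization as the volume of data grows.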

Transfer Learning based DNN-SVM Hybrid Model for Breast Cancer Classification

  • Gui Rae Jo;Beomsu Baek;Young Soon Kim;Dong Hoon Lim
    • 한국컴퓨터정보학회논문지 / Vol. 28, No. 11 / pp. 1-11 / 2023
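  • Breast cancer is the disease most feared by the majority of women worldwide. With the growth of data and advances in computing, machine learning has become more effective and now plays an important role in cancer detection and diagnosis. Deep learning, a branch of machine learning based on artificial neural networks (ANN), has recently improved rapidly in performance across many fields, and its range of applications is expanding. This study proposes a DNN-SVM Hybrid model for breast cancer classification that combines the structures of a transfer-learning-based deep neural network (DNN) and a support vector machine (SVM). The proposed transfer-learning-based model is effective even with little training data, trains quickly, and improves performance by combining the strengths of the single models, DNN and SVM. In experiments on the WOBC and WDBC breast cancer datasets from the UCI Machine Learning Repository, the proposed model outperformed the single models (logistic regression, DNN, and SVM) and the ensemble model random forest on several performance measures.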

Data Correction For Enhancing Classification Accuracy By Unknown Deep Neural Network Classifiers

  • Kwon, Hyun;Yoon, Hyunsoo;Choi, Daeseon
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 9 / pp. 3243-3257 / 2021
  • Deep neural networks provide excellent performance in pattern recognition, audio classification, and image recognition. It is important that they accurately recognize input data, particularly when they are used in autonomous vehicles or for medical services. In this study, we propose a data correction method for increasing the accuracy of an unknown classifier by modifying the input data without changing the classifier. This method modifies the input data slightly so that the unknown classifier will correctly recognize the input data. It is an ensemble method that transfers to an unknown classifier by generating corrected data that are correctly recognized by several classifiers known in advance. We tested our method using MNIST and CIFAR-10 as experimental data. The experimental results show that the unknown classifier reaches a 100% correct recognition rate on the corrected data generated by the proposed method, which minimizes data distortion so that the data remain recognizable by humans.

Boosting neural networks with an application to bankruptcy prediction

  • 김명종;강대기
    • 한국정보통신학회:학술대회논문집 / 한국해양정보통신학회 2009 Spring Conference / pp. 872-875 / 2009
  • In a bankruptcy prediction model, accuracy is one of the crucial performance measures because of its significant economic impact. Ensembling is one of the widely used methods for improving the performance of classification and prediction models. Two popular ensemble methods, Bagging and Boosting, have been applied with great success to various machine learning problems, mostly with decision trees as base classifiers. In this paper, we analyze boosted neural networks as a way to improve on traditional neural networks in bankruptcy prediction tasks. Experimental results on Korean firms indicate that the boosted neural networks performed better than traditional neural networks.
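The boosting idea in this abstract can be illustrated with a small AdaBoost-style loop. This is a minimal sketch under stated assumptions, not the authors' method: the base learner here is a one-dimensional decision stump standing in for the neural networks used in the paper, and the function names (`train_stump`, `adaboost`, `predict`) and the toy data are hypothetical.

```python
import math

def stump_predict(x, thr, polarity):
    """Predict +1/-1 from a single threshold on a 1-D input."""
    return polarity if x >= thr else -polarity

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(
                w for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, thr, polarity) != y
            )
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Train `rounds` stumps, reweighting misclassified samples each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this learner
        ensemble.append((alpha, thr, pol))
        # Increase weights of misclassified samples, decrease the rest.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, thr, pol))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(a * stump_predict(x, thr, pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: no single threshold separates the two classes.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

On this toy set a single stump misclassifies two points, while the three-round weighted vote classifies all six correctly, which is the kind of gain over a single base learner that boosting aims for.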


Securing SCADA Systems: A Comprehensive Machine Learning Approach for Detecting Reconnaissance Attacks

  • Ezaz Aldahasi;Talal Alkharobi
    • International Journal of Computer Science & Network Security / Vol. 23, No. 12 / pp. 1-12 / 2023
  • Ensuring the security of Supervisory Control and Data Acquisition (SCADA) and Industrial Control Systems (ICS) is paramount to safeguarding the reliability and safety of critical infrastructure. This paper addresses the significant threat posed by reconnaissance attacks on SCADA/ICS networks and presents a methodology for enhancing their protection. The proposed approach employs imbalanced-dataset handling techniques, ensemble methods, and feature engineering to enhance the resilience of SCADA/ICS systems. Experimentation and analysis demonstrate the efficacy of the strategy, as evidenced by good precision and recall and few false negatives (FN). The practical utility of the approach is underscored through evaluation on real-world SCADA/ICS datasets, which shows superior performance compared to existing methods. Moreover, the integration of feature augmentation is shown to significantly enhance detection capabilities. This research contributes to advancing the security posture of SCADA/ICS environments in the face of evolving cyber threats.

Early Detection of Rice Leaf Blast Disease using Deep-Learning Techniques

  • Syed Rehan Shah;Syed Muhammad Waqas Shah;Hadia Bibi;Mirza Murad Baig
    • International Journal of Computer Science & Network Security / Vol. 24, No. 4 / pp. 211-221 / 2024
  • Pakistan is a top producer and exporter of high-quality rice, but traditional methods are still being used for detecting rice diseases. This research project developed an automated rice blast disease diagnosis technique based on deep learning, image processing, and transfer learning with pre-trained models such as Inception V3, VGG16, VGG19, and ResNet50. The modified skip-connection ResNet50 had the highest accuracy of 99.16%, while the other models achieved 98.16%, 98.47%, and 98.56%, respectively. In addition, a CNN and a K-nearest neighbor ensemble model were explored for disease prediction, and the study demonstrated superior performance and disease prediction using the recommended web-app approaches.

Identification of Individuals using Single-Lead Electrocardiogram Signal

  • 임서현;민경란;이종실;장동표;김인영
    • 대한의용생체공학회:의공학회지 / Vol. 35, No. 3 / pp. 42-49 / 2014
  • We propose an individual identification method using a single-lead electrocardiogram signal. Lead I ECG is measured from subjects in various physical and psychological states. As a preprocessing stage, noise reduction is applied to the lead I signal, and the denoised signal is used to obtain a representative beat waveform for each individual by ensemble averaging. Features are extracted from the P-QRS-T waves to identify individuals: 19 from duration and amplitude information, and 16 from the QRS complex obtained by applying the Pan-Tompkins algorithm to the ensemble-averaged waveform. To analyze the effect of each feature and to improve efficiency while maintaining performance, the Relief-F algorithm is used to select features from the 35 extracted. Some or all of these 35 features were used in support vector machine (SVM) training and testing. The classification accuracy using the entire feature set was 98.34%. Experimental results show that a person can be identified using features extracted from the limb lead I signal alone.
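The ensemble-averaging step mentioned in this abstract can be sketched as a sample-by-sample mean over beats. This is a minimal sketch, assuming the beats have already been segmented, aligned, and brought to equal length; the function name `ensemble_average` and the toy values are hypothetical and are not taken from the paper.

```python
from statistics import fmean

def ensemble_average(beats):
    """Average equal-length, already-aligned beats sample by sample."""
    if not beats:
        raise ValueError("need at least one beat")
    length = len(beats[0])
    if any(len(beat) != length for beat in beats):
        raise ValueError("all beats must have the same number of samples")
    # zip(*beats) groups the i-th sample of every beat together.
    return [fmean(samples) for samples in zip(*beats)]

# Hypothetical toy beats: one pulse shape with a small offset per beat.
beats = [
    [0.0, 1.0, 0.0],
    [0.2, 1.2, 0.2],
    [-0.2, 0.8, -0.2],
]
template = ensemble_average(beats)
```

Because uncorrelated noise tends to cancel in the mean while the repeating beat shape does not, the averaged waveform serves as a cleaner per-subject template from which features can then be extracted.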

Development of a Type 2 Diabetes Prediction Algorithm Based on Big Data

  • 심현;김현욱
    • 한국전자통신학회논문지 / Vol. 18, No. 5 / pp. 999-1008 / 2023
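  • Early prediction of chronic diseases such as diabetes is an important issue, and improving the accuracy of diabetes prediction in particular is critical. Various machine learning and deep learning methods have been introduced for diabetes prediction, but these techniques require large amounts of data to outperform other methods, and their complex data models make training costly. This study aims to test the claim that a DNN using the Pima dataset and k-fold cross-validation reduces the efficiency of diabetes diagnosis models. Machine learning classifiers, including decision tree, SVM, random forest, logistic regression, KNN, and various ensemble techniques, were used to determine which algorithm yields the best predictions. After training and testing all classification models, the proposed system gave the best results with the XGBoost classifier combined with the ADASYN method, with an accuracy of 81%, an F1 score of 0.81, and an AUC of 0.84. A domain adaptation method was also implemented to demonstrate the versatility of the proposed system. An explainable AI approach using the LIME and SHAP frameworks was implemented to understand how the model arrives at its final predictions.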

Analyzing Machine Learning Techniques for Fault Prediction Using Web Applications

  • Malhotra, Ruchika;Sharma, Anjali
    • Journal of Information Processing Systems / Vol. 14, No. 3 / pp. 751-770 / 2018
  • Web applications are indispensable in the software industry and continuously evolve to meet newer criteria and/or include new functionalities. However, despite quality assurance via testing, the presence of defects hinders straightforward development. Several factors contribute to defects, and they are often minimized at high expense in terms of man-hours. Thus, detecting fault proneness in the early phases of software development is important, and a fault prediction model for identifying fault-prone classes in a web application is highly desirable. In this work, we compare 14 machine learning techniques to analyse the relationship between object-oriented metrics and fault prediction in web applications. The study is carried out using various releases of the Apache Click and Apache Rave datasets. En route to the predictive analysis, the input basis set for each release is first optimized using the filter-based correlation feature selection (CFS) method. It is found that the LCOM3, WMC, NPM and DAM metrics are the most significant predictors. The statistical analysis of these metrics also conforms well with the CFS evaluation and affirms the role of these metrics in the defect prediction of web applications. The overall predictive ability of the different fault prediction models is first ranked using the Friedman technique and then statistically compared using Nemenyi post-hoc analysis. The results not only uphold the predictive capability of machine learning models for faulty classes in web applications, but also show that ensemble algorithms are the most appropriate for defect prediction in the Apache datasets. Further, we derive a consensus between the metrics selected by the CFS technique and the statistical analysis of the datasets.
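The Friedman ranking step mentioned in this abstract can be sketched as follows. This is a minimal sketch, assuming each model has one score per dataset and that higher is better; the function names and the toy scores are hypothetical, and the statistic shown is the standard form without a correction for ties, which may differ from the exact variant used in the paper.

```python
def average_ranks(scores):
    """scores[d][m] is the score of model m on dataset d (higher is better).

    Returns the mean rank of each model across datasets, where rank 1 is
    best and tied models share the mean of the positions they occupy.
    """
    n_models = len(scores[0])
    totals = [0.0] * n_models
    for row in scores:
        order = sorted(range(n_models), key=lambda m: -row[m])
        i = 0
        while i < n_models:
            j = i
            while j + 1 < n_models and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
            for k in range(i, j + 1):
                totals[order[k]] += shared
            i = j + 1
    return [total / len(scores) for total in totals]

def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic computed from average ranks."""
    k = len(avg_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )

# Hypothetical scores: 4 datasets (rows) x 3 models (columns).
scores = [
    [0.80, 0.75, 0.70],
    [0.82, 0.78, 0.71],
    [0.79, 0.81, 0.69],
    [0.85, 0.80, 0.72],
]
ranks = average_ranks(scores)
statistic = friedman_statistic(ranks, n_datasets=len(scores))
```

A lower average rank means a model was more often among the best across datasets. A post-hoc procedure such as Nemenyi then compares differences between average ranks against a critical difference, which is not computed here.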

Classifying Social Media Users' Stance: Exploring Diverse Feature Sets Using Machine Learning Algorithms

  • Kashif Ayyub;Muhammad Wasif Nisar;Ehsan Ullah Munir;Muhammad Ramzan
    • International Journal of Computer Science & Network Security / Vol. 24, No. 2 / pp. 79-88 / 2024
  • The use of social media has become part of our daily activities. Social web channels let users generate content and share their views, opinions and experiences on particular topics, and researchers use this content in various research areas. Sentiment analysis, one of the most active research areas of the last decade, is the process of extracting people's reviews, opinions and sentiments. It is applied in diverse sub-areas such as subjectivity analysis, polarity detection, and emotion detection. Stance classification has emerged as a new and interesting research area that aims to determine whether the content writer is in favor of, against, or neutral towards the target topic or issue. Stance classification is significant because it has many research applications, such as rumor stance classification, stance classification in public forums, claim stance classification, neural attention stance classification, online debate stance classification, and dialogic properties stance classification. This study explores different feature sets (lexical, sentiment-specific, and dialog-based) extracted from the standard datasets in the area. Supervised learning approaches of generative algorithms such as Naïve Bayes and discriminative machine learning algorithms such as Support Vector Machine, Naïve Bayes, Decision Tree and k-Nearest Neighbor were applied, followed by ensemble-based algorithms such as Random Forest and AdaBoost. The empirical results were evaluated using the standard performance measures of Accuracy, Precision, Recall, and F-measure.
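The performance measures named at the end of this abstract can be computed as below. This is a minimal sketch, assuming three stance labels and a one-vs-rest view for precision, recall, and F-measure; the function names, the label strings, and the toy predictions are hypothetical and do not come from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred, positive):
    """One-vs-rest precision, recall, and F1 for the `positive` label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical stance labels for six posts.
y_true = ["favor", "favor", "against", "neutral", "against", "favor"]
y_pred = ["favor", "against", "against", "neutral", "favor", "favor"]

acc = accuracy(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="favor")
```

Reporting the per-class values alongside accuracy matters when stance labels are imbalanced, since a classifier can score high accuracy while rarely predicting a minority stance.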