• Title/Abstract/Keywords: Ensemble approach

Evaluation of Ensemble Approach for O3 and PM2.5 Simulation

  • Morino, Yu;Chatani, Satoru;Hayami, Hiroshi;Sasaki, Kansuke;Mori, Yasuaki;Morikawa, Tazuko;Ohara, Toshimasa;Hasegawa, Shuichi;Kobayashi, Shinji
    • Asian Journal of Atmospheric Environment / Vol. 4, No. 3 / pp.150-156 / 2010
  • Inter-comparison of chemical transport models (CTMs) was conducted among four modeling research groups. Model performance of the ensemble approach to $O_3$ and $PM_{2.5}$ simulation was evaluated by using observational data with a time resolution of 1 or 6 hours at four sites in the Kanto area, Japan, in summer 2007. All groups applied the Community Multiscale Air Quality model. The ensemble average of the four CTMs reproduced well the temporal variation of $O_3$ (r=0.65-0.85) and the daily maximum $O_3$ concentration within a factor of 1.3. By contrast, it underestimated $PM_{2.5}$ concentrations by a factor of 1.4-2, and did not reproduce the $PM_{2.5}$ temporal variation at two suburban sites (r=~0.2). The ensemble average improved the simulation of ${SO_4}^{2-}$, ${NO_3}^-$, and ${NH_4}^+$, whose production pathways are well known. In particular, the ensemble approach effectively simulated ${NO_3}^-$, despite the large variability among CTMs (up to a factor of 10). However, the ensemble average did not improve the simulation of organic aerosols (OAs), underestimating their concentrations by a factor of 5. The contribution of OAs to $PM_{2.5}$ (36-39%) was large, so improvement of the OA simulation model is essential to improve the $PM_{2.5}$ simulation.
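
The evaluation described in this abstract (ensemble average, temporal correlation, and peak-value ratio) can be sketched as follows. This is a minimal illustration with made-up numbers, not the authors' code or data; the function names and the sample values are assumptions.

```python
import math

def ensemble_mean(member_series):
    """Average the model runs time step by time step."""
    return [sum(vals) / len(vals) for vals in zip(*member_series)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical O3 values (ppb) from four model runs and one monitoring site.
members = [
    [30.0, 45.0, 60.0, 50.0],
    [20.0, 35.0, 55.0, 40.0],
    [40.0, 50.0, 80.0, 60.0],
    [30.0, 50.0, 65.0, 50.0],
]
observed = [28.0, 48.0, 70.0, 45.0]

avg = ensemble_mean(members)
r = pearson_r(avg, observed)
# Ratio of simulated to observed maximum; "within a factor of 1.3"
# would mean this ratio lies between 1/1.3 and 1.3.
peak_ratio = max(avg) / max(observed)
```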

Ensemble Knowledge Distillation for Classification of 14 Thorax Diseases using Chest X-ray Images

  • 호티키우칸;전영훈;곽정환
    • 한국컴퓨터정보학회: Conference Proceedings / 한국컴퓨터정보학회 2021 64th Summer Conference Proceedings, Vol. 29, No. 2 / pp.313-315 / 2021
  • Timely and accurate diagnosis of lung diseases using chest X-ray images has gained much attention from the computer vision and medical imaging communities. Although previous studies have demonstrated the capability of deep convolutional neural networks by achieving competitive binary classification results, their models were seemingly unreliable at distinguishing multiple disease groups across a large number of X-ray images. In this paper, we aim to build an advanced approach, called Ensemble Knowledge Distillation (EKD), to significantly boost classification accuracy compared to traditional KD methods by distilling knowledge from a cumbersome teacher model into an ensemble of lightweight student models with parallel branches trained with ground truth labels. Learning features at different branches of the student models could therefore enable the network to learn diverse patterns and improve the quality of the final predictions through an ensemble learning solution. Experiments on the well-established ChestX-ray14 dataset showed that traditional KD improves classification over the base transfer learning approach; EKD is expected to further enhance classification accuracy and model generalization, especially given the imbalanced dataset and the interdependency among the 14 weakly annotated thorax diseases.
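
As a rough illustration of distilling a teacher into several student branches, the sketch below uses the common temperature-softened distillation loss in its single-label softmax form and averages the branch outputs. The abstract does not give the loss used in the paper, so the loss form, the `temperature` and `alpha` parameters, and all function names are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, true_index,
            temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions.
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student))
    # Cross-entropy against the ground truth label at temperature 1.
    hard = -math.log(softmax(student_logits)[true_index])
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard

def ensemble_probs(branch_logits):
    """Average the class probabilities of the student branches."""
    per_branch = [softmax(logits) for logits in branch_logits]
    return [sum(col) / len(col) for col in zip(*per_branch)]

# Hypothetical logits for a three-class toy problem.
teacher = [2.0, 0.5, -1.0]
loss_matching = kd_loss(teacher, teacher, true_index=0)
loss_mismatched = kd_loss([-1.0, 0.5, 2.0], teacher, true_index=0)
combined = ensemble_probs([[2.0, 0.5, -1.0], [0.0, 1.0, 0.0]])
```

A student that matches the teacher exactly has a zero soft-target term, so only the hard-label term remains.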

Randomized Bagging for Bankruptcy Prediction

  • 민성환
    • 한국IT서비스학회지 / Vol. 15, No. 1 / pp.153-166 / 2016
  • Ensemble classification is an approach that combines individually trained classifiers in order to improve prediction accuracy over individual classifiers. Ensemble techniques have been shown to be very effective in improving the generalization ability of the classifier. However, base classifiers need to be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Bagging is one of the most popular ensemble methods. In bagging, different training data subsets are randomly drawn with replacement from the original training dataset, and base classifiers are trained on the different bootstrap samples. In this study, we propose a new bagging variant, Randomized Bagging (RBagging), to improve on the standard bagging ensemble model. The proposed model was applied to the bankruptcy prediction problem using a real data set, and the results were compared with those of other models. The experimental results showed that the proposed model outperformed the standard bagging model.
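
The standard bagging procedure summarized in this abstract (bootstrap samples drawn with replacement, one base classifier per sample, majority vote) can be sketched as below. This shows plain bagging only, not the RBagging variant, whose details the abstract does not give. The threshold classifier, the toy data, and all names are assumptions made for illustration.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Fit a 1-D threshold classifier: predict 1 when x >= threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum((x >= threshold) != bool(y) for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagging_fit(data, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng))
            for _ in range(n_estimators)]

def bagging_predict(thresholds, x):
    """Majority vote over the base classifiers."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (risk score, bankrupt flag).
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
model = bagging_fit(data)
```

Using an odd number of base classifiers avoids ties in the binary vote.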

Tomato Crop Disease Classification Using an Ensemble Approach Based on a Deep Neural Network

  • 김민기
    • 한국멀티미디어학회논문지 / Vol. 23, No. 10 / pp.1250-1257 / 2020
  • The early detection of diseases is important in agriculture because diseases are a major threat to crop yield. The shape and color of a plant leaf change in different ways depending on the disease, so the disease can be detected and estimated by inspecting the visual features of the leaf. This study presents a vision-based leaf classification method for detecting diseases of tomato crops. The ResNet-50 model was used to extract the visual features of the leaf and classify the disease, since it showed higher accuracy than ResNet models of other depths. We propose a new ensemble approach using several DCNN classifiers that have the same structure but have been trained at different ranges in the DCNN layers. The experiments achieved an accuracy of 97.19% on the PlantVillage dataset, which validates that the proposed method effectively classifies tomato crop diseases.

Wind Prediction with a Short-range Multi-Model Ensemble System

  • 윤지원;이용희;이희춘;하종철;이희상;장동언
    • 대기 / Vol. 17, No. 4 / pp.327-337 / 2007
  • In this study, we examined a new ensemble training approach to reduce systematic error and improve wind prediction skill using the Short-range Ensemble prediction system (SENSE), a mesoscale multi-model ensemble prediction system. SENSE has 16 ensemble members based on MM5, WRF ARW, and WRF NMM. We evaluated the skill of surface wind prediction against AWS (Automatic Weather Station) observations during the summer season (June-August 2006). In the first stage, the initial state of each member was corrected with respect to the observed values; the corrected members then entered a training stage to find an adaptive weight function, which is formulated in terms of the Root Mean Square Vector Error (RMSVE). Sensitivity experiments on the training interval showed that the optimal training period was 1 day. The resulting weighted ensemble average showed smaller errors in the spatial and temporal patterns of wind speed than the simple ensemble average.
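
One way to turn training-period RMSVE into ensemble weights is sketched below. The abstract does not state the actual weight function, so the inverse-error weighting used here is an assumption, as are the function names and the sample wind vectors.

```python
import math

def rmsve(forecasts, observations):
    """Root mean square vector error over (u, v) wind pairs."""
    squared = [(uf - uo) ** 2 + (vf - vo) ** 2
               for (uf, vf), (uo, vo) in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))

def inverse_error_weights(errors):
    """Assumed weighting: proportional to 1/RMSVE (errors must be > 0)."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]

def weighted_wind(member_winds, weights):
    """Weighted average of the members' (u, v) forecasts."""
    u = sum(w * m[0] for w, m in zip(weights, member_winds))
    v = sum(w * m[1] for w, m in zip(weights, member_winds))
    return (u, v)

# Hypothetical training period: two observation times, two members.
train_obs = [(2.0, 0.0), (0.0, 2.0)]
train_a = [(3.0, 0.0), (0.0, 3.0)]   # RMSVE = 1
train_b = [(2.0, 2.0), (2.0, 2.0)]   # RMSVE = 2

errors = [rmsve(train_a, train_obs), rmsve(train_b, train_obs)]
weights = inverse_error_weights(errors)   # member A gets twice B's weight
forecast = weighted_wind([(3.0, 0.0), (6.0, 3.0)], weights)
```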

A Comparison Study of Ensemble Approach Using WRF/CMAQ Model - The High PM10 Episode in Busan

  • 김태희;김유근;손장호;정주희
    • 한국대기환경학회지 / Vol. 32, No. 5 / pp.513-525 / 2016
  • To propose an effective ensemble method for predicting $PM_{10}$ concentration, six experiments were designed using different ensemble averaging methods (e.g., non-weighted, single weighted, and cluster weighted methods). The single weighted method calculated the weights using both multiple regression analysis and singular value decomposition, and the cluster weighted method estimated the weights based on temperature, relative humidity, and wind component using multiple regression analysis. Weighted averaging performed significantly better than non-weighted averaging. The results of the weighted ensemble experiments differed according to the method used to calculate the weights. The single weighted average method using multiple regression analysis showed the highest accuracy for hourly $PM_{10}$ concentration, and the cluster weighted average method based on relative humidity showed the highest accuracy for daily mean $PM_{10}$ concentration. However, the ensemble spread analysis showed better reliability for the single weighted average method than for the cluster weighted average method based on relative humidity. Thus, the single weighted average method was the most effective method in this case study.
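
Regression-based ensemble weights can be illustrated with a two-member least-squares fit. This is a simplified sketch, not the paper's procedure: it assumes two members, no intercept, and made-up concentrations, and it omits the singular value decomposition and clustering steps mentioned in the abstract.

```python
def fit_two_member_weights(member1, member2, observed):
    """Least-squares weights w1, w2 for observed ~ w1*member1 + w2*member2."""
    s11 = sum(a * a for a in member1)
    s22 = sum(b * b for b in member2)
    s12 = sum(a * b for a, b in zip(member1, member2))
    s1y = sum(a * y for a, y in zip(member1, observed))
    s2y = sum(b * y for b, y in zip(member2, observed))
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("members are collinear; weights are not unique")
    w1 = (s1y * s22 - s12 * s2y) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    return w1, w2

# Hypothetical PM10 training values (ug/m3) for two model runs.
member1 = [10.0, 20.0, 30.0, 40.0]
member2 = [20.0, 20.0, 40.0, 60.0]
observed = [15.0, 20.0, 35.0, 50.0]

w1, w2 = fit_two_member_weights(member1, member2, observed)
# Apply the fitted weights to a new pair of member forecasts.
weighted_forecast = w1 * 50.0 + w2 * 70.0
```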

The ensemble approach in comparison with the diverse feature selection techniques for estimating NPPs parameters using the different learning algorithms of the feed-forward neural network

  • Moshkbar-Bakhshayesh, Khalil
    • Nuclear Engineering and Technology / Vol. 53, No. 12 / pp.3944-3951 / 2021
  • Several considerations, such as the no free lunch theorem, indicate that no universal feature selection (FS) technique outperforms all others. Moreover, given the large number of FS techniques, approaches such as using a synthetic dataset are tedious and time-consuming. In this study, to tackle the dependency of estimation accuracy on the selected FS technique, a methodology based on the heterogeneous ensemble is proposed. The performance of the major learning algorithms of the feed-forward neural network (i.e., the FFNN-BR and the FFNN-LM) in combination with diverse FS techniques (i.e., the NCA, the F-test, Kendall's tau, the Pearson, the Spearman, and the Relief) and different combination techniques of the heterogeneous ensemble (i.e., the Min, the Median, the Arithmetic mean, and the Geometric mean) is considered. The target parameters/transients of the Bushehr nuclear power plant (BNPP) are examined as the case study. The results show that the Min combination technique gives the most accurate estimation. Therefore, if the number of FS techniques is m and the number of learning algorithms is n, the heterogeneous ensemble may reduce the search space for acceptable estimation of the target parameters from n × m to n × 1. The proposed methodology gives a simple and practical approach for more reliable and more accurate estimation of the target parameters compared to methods such as the use of a synthetic dataset or trial and error.
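
The four combination rules named in this abstract can be written compactly as below. The sketch assumes the rules are applied to the estimates that the same learning algorithm produces with each FS technique; the abstract does not spell this out, and the function name and sample values are hypothetical.

```python
import math
from statistics import median

def combine(estimates, rule):
    """Combine per-FS-technique estimates of one target value."""
    if rule == "min":
        return min(estimates)
    if rule == "median":
        return median(estimates)
    if rule == "mean":
        return sum(estimates) / len(estimates)
    if rule == "geometric_mean":
        # Defined only for strictly positive estimates.
        return math.exp(sum(math.log(e) for e in estimates) / len(estimates))
    raise ValueError(f"unknown rule: {rule}")

# Hypothetical estimates of one parameter from four FS techniques.
estimates = [1.0, 2.0, 4.0, 8.0]
results = {rule: combine(estimates, rule)
           for rule in ("min", "median", "mean", "geometric_mean")}
```

With one combined estimate per learning algorithm, only n candidates remain to compare instead of n × m.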

Corporate Innovation and Business Performance Prediction Using Ensemble Learning

  • 안경민;이영찬
    • 한국정보시스템학회지:정보시스템연구 / Vol. 30, No. 4 / pp.247-275 / 2021
  • Purpose: This study attempted to predict corporate innovation and business performance using ensemble learning. Design/methodology/approach: Ensemble techniques combine several weak models into a robust learner to derive improved performance. In this study, XGBoost, LightGBM, and CatBoost were used among ensemble techniques, and they were compared and evaluated against traditional machine learning methods. Findings: The research results are summarized as follows. First, the type of innovation is expanding from technical innovation to non-technical areas. Second, LightGBM performed best for radical innovation prediction, and XGBoost performed best for incremental innovation prediction. Third, CatBoost performed best for firm performance prediction. Although there was no significant difference in predictive power between ensemble techniques, we found that comparative analysis was necessary to confirm better prediction performance.

Illegal Cash Accommodation Detection Modeling Using Ensemble Size Reduction

  • 이화경;한상범;지원철
    • 지능정보연구 / Vol. 16, No. 1 / pp.93-116 / 2010
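  • An ensemble approach was used to develop a detection model for illegal cash accommodation. Illegal cash accommodation affects the profits and losses of domestic credit card companies and has recently become international in scope, yet it has not been studied academically. Fraud detection models (FDMs) struggle to achieve good performance because of the data imbalance problem, and ensembles that combine multiple models have been proposed as an alternative. It is already accepted that an ensemble outperforms a single model if the diversity of its member models is ensured, and recent research indicates that selecting only suitable base models for the ensemble is preferable to using all trained base models. In this paper, we use a reduced ensemble technique for effective detection of illegal cash accommodation, selecting the base models that participate in the ensemble using accuracy and diversity measures. Diversity refers to the disagreement (or ambiguity) among the base models that make up the ensemble; considering the data imbalance problem inherent in FDMs, we focused on two aspects. First, we describe an oversampling method for the minority class and an appropriate training method for securing diversity when drawing the training data. Second, focusing on the minority class, we transformed existing diversity measures into effective ones and introduced dynamic diversity computation with forward addition and backward elimination to evaluate the base models for inclusion in the ensemble. The learning algorithms used in the experiments were neural networks, decision trees, and logit regression, and both homogeneous and heterogeneous ensembles were constructed for performance evaluation. The experiments showed no performance difference between the reduced ensemble and the ensemble containing all base models for illegal cash accommodation detection. Because the reduced ensemble lowers the complexity of ensemble construction and eases implementation, we showed that it can also be a promising modeling approach for FDMs.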
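
The notion of diversity as disagreement among base models can be illustrated with the standard pairwise disagreement measure, computed over all samples and over minority-class samples only. The paper's modified measures and its forward/backward selection are not reproduced here; the minority-only restriction is just one assumed way to focus on the minority class, and all names and data are hypothetical.

```python
from itertools import combinations

def disagreement(pred_a, pred_b):
    """Fraction of samples on which two models predict different labels."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)

def mean_pairwise_disagreement(predictions):
    """Average disagreement over all pairs of base models."""
    pairs = list(combinations(predictions, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)

def minority_only(predictions, labels, minority=1):
    """Keep only the predictions made on minority-class samples."""
    keep = [i for i, y in enumerate(labels) if y == minority]
    return [[p[i] for i in keep] for p in predictions]

# Hypothetical labels (1 = fraud, the minority class) and model outputs.
labels = [0, 0, 0, 0, 1, 1]
predictions = [
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 1],
]

overall = mean_pairwise_disagreement(predictions)
on_minority = mean_pairwise_disagreement(minority_only(predictions, labels))
```

In this toy case the models disagree far more on the minority class than overall, which is the kind of difference a minority-focused measure is meant to expose.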

Predicting stock price direction by using data mining methods : Emphasis on comparing single classifiers and ensemble classifiers

  • Eo, Kyun Sun;Lee, Kun Chang
    • 한국컴퓨터정보학회논문지 / Vol. 22, No. 11 / pp.111-116 / 2017
  • This paper proposes a data mining approach to predicting stock price direction. The stock market fluctuates due to many factors, so predicting stock price direction has become an important issue in the field of stock market analysis. In the literature, however, few studies apply data mining approaches to predicting stock price direction. To contribute to the literature, this paper compares single classifiers and ensemble classifiers. The single classifiers are logistic regression, decision tree, neural network, and support vector machine. The ensemble classifiers are AdaBoost, random forest, bagging, stacking, and vote. For the experiments, we collected a dataset from the Korea Stock Exchange (KRX) covering 2008 to 2015. Data mining experiments using WEKA revealed that random forest, one of the ensemble classifiers, shows the best results in terms of metrics such as AUC (area under the ROC curve) and accuracy.
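
The comparison between single classifiers and a vote ensemble can be sketched with a plain majority vote. The predictions below are invented for illustration and are not results from the paper; the study itself used WEKA, and the function names here are assumptions.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Return the most common label for each sample across models."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_model)]

def accuracy(predicted, truth):
    """Fraction of samples predicted correctly."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Hypothetical daily price directions and three single-classifier outputs.
truth = ["up", "down", "up", "up", "down"]
model_preds = {
    "logistic": ["up", "down", "down", "up", "down"],
    "tree": ["up", "up", "up", "up", "down"],
    "svm": ["down", "down", "up", "up", "down"],
}

voted = majority_vote(list(model_preds.values()))
single_scores = {name: accuracy(p, truth) for name, p in model_preds.items()}
vote_score = accuracy(voted, truth)
```

Each model errs on a different sample here, so the vote corrects every individual mistake; real classifiers with correlated errors would gain less.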