• Title/Summary/Keyword: 앙상블 학습 기법

Search Result 91, Processing Time 0.034 seconds

Improving SVM Classification by Constructing Ensemble (앙상블 구성을 이용한 SVM 분류성능의 향상)

  • 제홍모;방승양
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.3_4
    • /
    • pp.251-258
    • /
    • 2003
  • A support vector machine (SVM) is supposed to provide a good generalization performance, but the actual performance of a actually implemented SVM is often far from the theoretically expected level. This is largely because the implementation is based on an approximated algorithm, due to the high complexity of time and space. To improve this limitation, we propose ensemble of SVMs by using Bagging (bootstrap aggregating) and Boosting. By a Bagging stage each individual SVM is trained independently using randomly chosen training samples via a bootstrap technique. By a Boosting stage an individual SVM is trained by choosing training samples according to their probability distribution. The probability distribution is updated by the error of independent classifiers, and the process is iterated. After the training stage, they are aggregated to make a collective decision in several ways, such ai majority voting, the LSE(least squares estimation) -based weighting, and double layer hierarchical combining. The simulation results for IRIS data classification, the hand-written digit recognition and Face detection show that the proposed SVM ensembles greatly outperforms a single SVM in terms of classification accuracy.

Vacant Technology Forecasting using Ensemble Model (앙상블모형을 이용한 공백기술예측)

  • Jun, Sung-Hae
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.21 no.3
    • /
    • pp.341-346
    • /
    • 2011
  • A vacant technology forecasting is an important issue in management of technology. The forecast of vacant technology leads to the growth of nation and company. So, we need the results of technology developments until now to predict the vacant technology. Patent is an objective thing of the results in research and development of technology. We study a predictive method for forecasting the vacant technology quantitatively using patent data in this paper. We propose an ensemble model that is to vote some clustering criteria because we can't guarantee a model is optimal. Therefore, an objective and accurate forecasting model of vacant technology is researched in our paper. This model combines statistical analysis methods with machine learning algorithms. To verify our performance evaluation objectively, we make experiments using patent documents of diverse technology fields.

Evaluation of a Thermal Conductivity Prediction Model for Compacted Clay Based on a Machine Learning Method (기계학습법을 통한 압축 벤토나이트의 열전도도 추정 모델 평가)

  • Yoon, Seok;Bang, Hyun-Tae;Kim, Geon-Young;Jeon, Haemin
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.41 no.2
    • /
    • pp.123-131
    • /
    • 2021
  • The buffer is a key component of an engineered barrier system that safeguards the disposal of high-level radioactive waste. Buffers are located between disposal canisters and host rock, and they can restrain the release of radionuclides and protect canisters from the inflow of ground water. Since considerable heat is released from a disposal canister to the surrounding buffer, the thermal conductivity of the buffer is a very important parameter in the entire disposal safety. For this reason, a lot of research has been conducted on thermal conductivity prediction models that consider various factors. In this study, the thermal conductivity of a buffer is estimated using the machine learning methods of: linear regression, decision tree, support vector machine (SVM), ensemble, Gaussian process regression (GPR), neural network, deep belief network, and genetic programming. In the results, the machine learning methods such as ensemble, genetic programming, SVM with cubic parameter, and GPR showed better performance compared with the regression model, with the ensemble with XGBoost and Gaussian process regression models showing best performance.

Kernel Perceptron Boosting for Effective Learning of Imbalanced Data (불균형 데이터의 효과적 학습을 위한 커널 퍼셉트론 부스팅 기법)

  • 오장민;장병탁
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.04b
    • /
    • pp.304-306
    • /
    • 2001
  • 많은 실세계의 문제에서 일반적인 패턴 분류 알고리즘들은 데이터의 불균형 문제에 어려움을 겪는다. 각각의 학습 예제에 균등한 중요도를 부여하는 기존의 기법들은 문제의 특징을 제대로 파악하지 못하는 경우가 많다. 본 논문에서는 불균형 데이터 문제를 해결하기 위해 퍼셉트론에 기반한 부스팅 기법을 제안한다. 부스팅 기법은 학습을 어렵게 하는 데이터에 집중하여 앙상블 머신을 구축하는 기법이다. 부스팅 기법에서는 약학습기를 필요로 하는데 기존 퍼셉트론의 경우 문제에 따라 약학습기(weak learner)의 조건을 만족시키지 못하는 경우가 있을 수 있다. 이에 커널을 도입한 커널 퍼셉트론을 사용하여 학습기의 표현 능력을 높였다. Reuters-21578 문서 집합을 대상으로 한 문서 여과 문제에서 부스팅 기법은 다층신경망이나 나이브 베이스 분류기보다 우수한 성능을 보였으며, 인공 데이터 실험을 통하여 부스팅의 샘플링 경향을 분석하였다.

  • PDF

Research on Pothole Detection using Feature-Level Ensemble of Pretrained Deep Learning Models (사전 학습된 딥러닝 모델들의 피처 레벨 앙상블을 이용한 포트홀 검출 기법 연구)

  • Ye-Eun Shin;Inki Kim;Beomjun Kim;Younghoon Jeon;Jeonghwan Gwak
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2023.01a
    • /
    • pp.35-38
    • /
    • 2023
  • 포트홀은 주행하는 자동차와 접촉이 이뤄지면 차체나 운전자에게 충격을 주고 제어를 잃게 하여 도로 위 안전을 위협할 수 있다. 포트홀의 검출을 위한 국내 동향으로는 진동을 이용한 방식과 신고시스템 이용한 방식과 영상 인식을 기반한 방식이 있다. 이 중 영상 인식 기반 방식은 보급이 쉽고 비용이 저렴하나, 컴퓨터 비전 알고리즘은 영상의 품질에 따라 정확도가 달라지는 문제가 있었다. 이를 보완하기 위해 영상 인식 기반의 딥러닝 모델을 사용한다. 따라서, 본 논문에서는 사전 학습된 딥러닝 모델의 정확도 향상을 위한 Feature Level Ensemble 기법을 제안한다. 제안된 기법은 사전 학습된 CNN 모델 중 Test 데이터의 정확도 기준 Top-3 모델을 선정하여 각 딥러닝 모델의 Feature Map을 Concatenate하고 이를 Fully-Connected(FC) Layer로 입력하여 구현한다. Feature Level Ensemble 기법이 적용된 딥러닝 모델은 평균 대비 3.76%의 정확도 향상을 보였으며, Top-1 모델인 ShuffleNet보다 0.94%의 정확도 향상을 보였다. 결론적으로 본 논문에서 제안된 기법은 사전 학습된 모델들을 이용하여 각 모델의 다양한 특징을 통해 기존 모델 대비 정확도의 향상을 이룰 수 있었다.

  • PDF

Wild Bird Sound Classification Scheme using Focal Loss and Ensemble Learning (Focal Loss와 앙상블 학습을 이용한 야생조류 소리 분류 기법)

  • Jaeseung Lee;Jehyeok Rew
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.29 no.2
    • /
    • pp.15-25
    • /
    • 2024
  • For effective analysis of animal ecosystems, technology that can automatically identify the current status of animal habitats is crucial. Specifically, animal sound classification, which identifies species based on their sounds, is gaining great attention where video-based discrimination is impractical. Traditional studies have relied on a single deep learning model to classify animal sounds. However, sounds collected in outdoor settings often include substantial background noise, complicating the task for a single model. In addition, data imbalance among species may lead to biased model training. To address these challenges, in this paper, we propose an animal sound classification scheme that combines predictions from multiple models using Focal Loss, which adjusts penalties based on class data volume. Experiments on public datasets have demonstrated that our scheme can improve recall by up to 22.6% compared to an average of single models.

Classifying the severity of pedestrian accidents using ensemble machine learning algorithms: A case study of Daejeon City (앙상블 학습기법을 활용한 보행자 교통사고 심각도 분류: 대전시 사례를 중심으로)

  • Kang, Heungsik;Noh, Myounggyu
    • Journal of Digital Convergence
    • /
    • v.20 no.5
    • /
    • pp.39-46
    • /
    • 2022
  • As the link between traffic accidents and social and economic losses has been confirmed, there is a growing interest in developing safety policies based on crash data and a need for countermeasures to reduce severe crash outcomes such as severe injuries and fatalities. In this study, we select Daejeon city where the relative proportion of fatal crashes is high, as a case study region and focus on the severity of pedestrian crashes. After a series of data manipulation process, we run machine learning algorithms for the optimal model selection and variable identification. Of nine algorithms applied, AdaBoost and Random Forest (ensemble based ones) outperform others in terms of performance metrics. Based on the results, we identify major influential factors (i.e., the age of pedestrian as 70s or 20s, pedestrian crossing) on pedestrian crashes in Daejeon, and suggest them as measures for reducing severe outcomes.

Named Entity Recognition Using Distant Supervision and Active Bagging (원거리 감독과 능동 배깅을 이용한 개체명 인식)

  • Lee, Seong-hee;Song, Yeong-kil;Kim, Hark-soo
    • Journal of KIISE
    • /
    • v.43 no.2
    • /
    • pp.269-274
    • /
    • 2016
  • Named entity recognition is a process which extracts named entities in sentences and determines categories of the named entities. Previous studies on named entity recognition have primarily been used for supervised learning. For supervised learning, a large training corpus manually annotated with named entity categories is needed, and it is a time-consuming and labor-intensive job to manually construct a large training corpus. We propose a semi-supervised learning method to minimize the cost needed for training corpus construction and to rapidly enhance the performance of named entity recognition. The proposed method uses distance supervision for the construction of the initial training corpus. It can then effectively remove noise sentences in the initial training corpus through the use of an active bagging method, an ensemble method of bagging and active learning. In the experiments, the proposed method improved the F1-score of named entity recognition from 67.36% to 76.42% after active bagging for 15 times.

Korean Dependency Parsing Using Various Ensemble Models (다양한 앙상블 알고리즘을 이용한 한국어 의존 구문 분석)

  • Jo, Gyeong-Cheol;Kim, Ju-Wan;Kim, Gyun-Yeop;Park, Seong-Jin;Gang, Sang-U
    • Annual Conference on Human and Language Technology
    • /
    • 2019.10a
    • /
    • pp.543-545
    • /
    • 2019
  • 본 논문은 최신 한국어 의존 구문 분석 모델(Korean dependency parsing model)들과 다양한 앙상블 모델(ensemble model)들을 결합하여 그 성능을 분석한다. 단어 표현은 미리 학습된 워드 임베딩 모델(word embedding model)과 ELMo(Embedding from Language Model), Bert(Bidirectional Encoder Representations from Transformer) 그리고 다양한 추가 자질들을 사용한다. 또한 사용된 의존 구문 분석 모델로는 Stack Pointer Network Model, Deep Biaffine Attention Parser와 Left to Right Pointer Parser를 이용한다. 최종적으로 각 모델의 분석 결과를 앙상블 모델인 Bagging 기법과 XGBoost(Extreme Gradient Boosting) 이용하여 최적의 모델을 제안한다.

  • PDF

A Comparative Analysis of Ensemble Learning-Based Classification Models for Explainable Term Deposit Subscription Forecasting (설명 가능한 정기예금 가입 여부 예측을 위한 앙상블 학습 기반 분류 모델들의 비교 분석)

  • Shin, Zian;Moon, Jihoon;Rho, Seungmin
    • The Journal of Society for e-Business Studies
    • /
    • v.26 no.3
    • /
    • pp.97-117
    • /
    • 2021
  • Predicting term deposit subscriptions is one of representative financial marketing in banks, and banks can build a prediction model using various customer information. In order to improve the classification accuracy for term deposit subscriptions, many studies have been conducted based on machine learning techniques. However, even if these models can achieve satisfactory performance, utilizing them is not an easy task in the industry when their decision-making process is not adequately explained. To address this issue, this paper proposes an explainable scheme for term deposit subscription forecasting. For this, we first construct several classification models using decision tree-based ensemble learning methods, which yield excellent performance in tabular data, such as random forest, gradient boosting machine (GBM), extreme gradient boosting (XGB), and light gradient boosting machine (LightGBM). We then analyze their classification performance in depth through 10-fold cross-validation. After that, we provide the rationale for interpreting the influence of customer information and the decision-making process by applying Shapley additive explanation (SHAP), an explainable artificial intelligence technique, to the best classification model. To verify the practicality and validity of our scheme, experiments were conducted with the bank marketing dataset provided by Kaggle; we applied the SHAP to the GBM and LightGBM models, respectively, according to different dataset configurations and then performed their analysis and visualization for explainable term deposit subscriptions.