• Title/Abstract/Keywords: Ensemble combination

Ensemble Learning Based on Tumor Internal and External Imaging Patch to Predict the Recurrence of Non-small Cell Lung Cancer Patients in Chest CT Image

  • 이예슬;조아현;홍헬렌
    • 한국멀티미디어학회논문지 / Vol. 24, No. 3 / pp. 373-381 / 2021
  • In this paper, we propose a classification model based on a convolutional neural network (CNN) for predicting 2-year recurrence in non-small cell lung cancer (NSCLC) patients from preoperative chest CT images. Based on a region of interest (ROI) defined as the tumor's internal and external areas, the input images consist of an intratumoral patch, a peritumoral patch, and a peritumoral texture patch that focuses on the texture information of the peritumoral patch. Each patch is used to train an AlexNet pretrained on ImageNet in order to explore the usefulness and performance of the various patches. Additionally, ensemble learning over the networks trained on each patch is used to analyze the performance of different patch combinations. Across all results, the ensemble model with intratumoral and peritumoral patches achieved the best performance (ACC=98.28%, Sensitivity=100%, NPV=100%).
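
The abstract above does not say how the per-patch networks are combined. As a minimal sketch, assuming the combination is a (weighted) average of class probabilities, often called soft voting, the idea could look like the following. The function name `soft_vote` and the probability values are hypothetical and not taken from the paper.

```python
def soft_vote(prob_lists, weights=None):
    """Average class-probability vectors from several models for one sample.

    Assumption: the ensemble is a weighted mean of probabilities; the paper
    may combine its networks differently.
    """
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # equal weight per model
    n_classes = len(prob_lists[0])
    averaged = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists))
        for c in range(n_classes)
    ]
    # Predicted class is the index with the highest averaged probability
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return averaged, predicted


# Hypothetical softmax outputs [P(no recurrence), P(recurrence)] for one patient
intratumoral = [0.35, 0.65]
peritumoral = [0.55, 0.45]
averaged, label = soft_vote([intratumoral, peritumoral])
```

Here the two networks disagree, and the averaged probabilities decide the final label.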

A Study on Korean Sentiment Analysis Rate Using Neural Network and Ensemble Combination

  • Sim, YuJeong;Moon, Seok-Jae;Lee, Jong-Youg
    • International Journal of Advanced Culture Technology / Vol. 9, No. 4 / pp. 268-273 / 2021
  • In this paper, we propose a sentiment analysis model that improves performance on small-scale data and verify it through experiments. To this end, we propose Bagging-Bi-GRU, which combines Bi-GRU with the bagging technique, one of the ensemble learning methods. Bi-GRU trains a GRU in both directions; the GRU is a variant of LSTM (Long Short-Term Memory) with excellent performance on sequential data. To verify the performance of the proposed model, it is applied to both small-scale and large-scale data. Comparison with Bi-GRU, an existing machine learning algorithm, shows that the proposed model improves performance not only on small data but also on large data.
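
Bagging, as named in the abstract above, trains each base model on a bootstrap resample of the training data and combines predictions by voting. The sketch below illustrates only that mechanism: the base learner is a toy one-dimensional threshold stump standing in for Bi-GRU, and all names and data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Toy base learner (stand-in for Bi-GRU): threshold between class means.

    Assumes the positive class has the larger feature values.
    """
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:
        # Degenerate resample with one class only: always predict that class
        only = 1 if pos else 0
        return lambda x: only
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0


def bagging_fit(data, n_models, seed=0):
    """Train one base learner per bootstrap resample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    """Majority vote over the base learners."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical (feature, label) pairs
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
models = bagging_fit(data, n_models=11)
```

An odd number of models avoids tied votes in the binary case.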

A New Ensemble Machine Learning Technique with Multiple Stacking

  • 이수은;김한준
    • 한국전자거래학회지 / Vol. 25, No. 3 / pp. 1-13 / 2020
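  • Machine learning refers to techniques for building a model that can solve a specific problem through a generalization process over given data. Producing a high-performing model requires high-quality training data and a learning algorithm for the generalization process. As one way to improve performance, ensemble techniques build multiple models rather than a single model; they include bagging, boosting, and stacking. This paper proposes a Multiple Stacking Ensemble learning technique that improves on conventional stacking. Its learning structure resembles a deep learning architecture: each layer is composed of a combination of stacking models, and performance is improved by increasing the number of layers so as to minimize the misclassification rate of each layer. Experiments on four types of datasets show that the proposed technique achieves better classification performance than existing techniques.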

Improving the ESP Accuracy with Combination of Probabilistic Forecasts

  • Yu, Seung-Oh;Kim, Young-Oh
    • Water Engineering Research / Vol. 5, No. 2 / pp. 101-109 / 2004
  • Aggregating information by combining forecasts from two or more forecasting methods is an alternative to using forecasts from a single method to improve forecast accuracy. This paper describes the development and use of a monthly inflow forecast model based on an optimal linear combination (OLC) of naive, persistence, and Ensemble Streamflow Prediction (ESP) forecasts. Using cross-validation, the OLC model made 1-month-ahead probabilistic forecasts of the Chungju multi-purpose dam inflows for 15 years. For most of the verification months, the skill of the OLC forecast was superior to that of the individual forecast techniques. This study therefore demonstrates that OLC can improve the accuracy of the ESP forecast, especially during the dry season. This study also examined the value of the OLC forecasts in reservoir operations. Stochastic Dynamic Programming (SDP) derived the optimal operating policy for the Chungju multi-purpose dam, and the derived policy was simulated using the 15-year observed inflows. The simulation results showed that the SDP model that updated its probabilities from the new OLC forecast provided more efficient operation decisions than the conventional SDP model.
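
The abstract above does not spell out how the OLC weights are obtained. One common reading, assumed here, is an unconstrained least-squares fit with no intercept: choose the weights w that minimize the sum over months of (observed inflow minus the weighted sum of forecasts) squared. The sketch below solves the normal equations in plain Python; the inflow numbers are hypothetical, and the "observed" series is synthetic so that the recovered weights can be checked.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def olc_weights(forecasts, observed):
    """Least-squares combination weights (assumption: no intercept, no constraints)."""
    k = len(forecasts)
    gram = [[sum(fi * fj for fi, fj in zip(forecasts[i], forecasts[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(fi * y for fi, y in zip(forecasts[i], observed)) for i in range(k)]
    return solve_linear(gram, rhs)


# Hypothetical monthly inflow forecasts from the three methods
naive = [50.0, 60.0, 40.0, 70.0, 55.0]
persistence = [70.0, 45.0, 65.0, 35.0, 60.0]
esp = [55.0, 75.0, 30.0, 60.0, 80.0]
# Synthetic "observations" built from known weights, for checking only
true_weights = [0.2, 0.3, 0.5]
observed = [0.2 * n + 0.3 * p + 0.5 * e for n, p, e in zip(naive, persistence, esp)]
weights = olc_weights([naive, persistence, esp], observed)
```

With real inflow records the fit would not be exact, and the weights would normally be estimated on held-out years, as the cross-validation in the abstract suggests.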

Molecular Dynamics Simulation Studies of Benzene, Toluene, and p-Xylene in a Canonical Ensemble

  • Kim, Ja-Hun;Lee, Song-Hui
    • Bulletin of the Korean Chemical Society / Vol. 23, No. 3 / pp. 441-446 / 2002
  • We present the thermodynamic, structural, and dynamic properties of liquid benzene, toluene, and p-xylene in the canonical (NVT) ensemble at 293.15 K, obtained by molecular dynamics (MD) simulations. The molecular model adopted for these molecules combines a rigid-body treatment of the benzene ring with an atomistically detailed model for the methyl hydrogen atoms. The calculated pressures are too low in the NVT ensemble MD simulations. The thermodynamic properties indicate that the intermolecular interactions become stronger as the number of methyl groups attached to the benzene ring increases. The pronounced nearest-neighbor peak in the center-of-mass g(r) of liquid benzene at 293.15 K suggests that nearest neighbors tend to be perpendicular. The two self-diffusion coefficients of liquid benzene at 293.15 K, calculated from the MSD and the VAC function, are in excellent agreement with the experimental values. The self-diffusion coefficients of liquid toluene also agree well with the experimental ones for toluene in benzene and for toluene in cyclohexane.
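
For reference, the two standard routes to the self-diffusion coefficient D mentioned above are the Einstein relation for the mean square displacement (MSD) and the Green-Kubo relation for the velocity autocorrelation (VAC) function. The notation below (position r_i and velocity v_i of molecule i, angle brackets for an ensemble average) is assumed here, not taken from the paper.

```latex
% Einstein relation (MSD route)
D = \lim_{t \to \infty} \frac{1}{6t}
    \left\langle \left| \mathbf{r}_i(t) - \mathbf{r}_i(0) \right|^2 \right\rangle

% Green-Kubo relation (VAC route)
D = \frac{1}{3} \int_0^{\infty}
    \left\langle \mathbf{v}_i(0) \cdot \mathbf{v}_i(t) \right\rangle \, dt
```

The two expressions are equivalent in the long-time limit, so agreement between them is a consistency check on the simulation.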

Optimal Selection of Classifier Ensemble Using Genetic Algorithms

  • 김명종
    • 지능정보연구 / Vol. 16, No. 4 / pp. 99-112 / 2010
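  • Ensemble learning is a machine learning technique proposed to improve the performance of classification and prediction algorithms. However, it has been pointed out that when the base classifiers lack diversity, the performance gain from ensemble learning is weak, and performance may even deteriorate, because of multicollinearity. To secure diversity among the base classifiers and enhance the performance gain of ensemble learning, this study proposes a genetic algorithm-based coverage optimization technique. Applying the proposed optimization technique to a neural network ensemble for corporate bankruptcy prediction showed that diversity among the base classifiers was secured and that the performance of the neural network ensemble improved significantly.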

Development of Deep Learning Ensemble Modeling for Cryptocurrency Price Prediction : Deep 4-LSTM Ensemble Model

  • 최수빈;신동훈;윤상혁;김희웅
    • 한국IT서비스학회지 / Vol. 19, No. 6 / pp. 131-144 / 2020
  • As blockchain technology attracts attention, interest in the cryptocurrency received as a reward is also increasing. Investments and transactions currently continue on the expectation that the value of cryptocurrency will rise. Accordingly, cryptocurrency price prediction has been attempted through artificial intelligence technology and social sentiment analysis. The purpose of this paper is to develop a deep learning ensemble model for predicting the price fluctuations and one-day lag price of cryptocurrency, based on the design science research method. This paper performs predictive modeling on Ethereum to make predictions more efficiently and accurately than existing models. To that end, it collects five years of data related to the Ethereum price and pre-processes the data through customized functions. In the model development stage, four LSTM models, which are efficient for time series data processing, are used to build an ensemble model with the optimal combination of hyperparameters found in the experimental process. The superiority of the model is then evaluated against other deep learning models on the performance evaluation scale. The results make a practical contribution as a model that shows high performance and predictive rate for cryptocurrency price prediction and price fluctuations. They also make an academic contribution in that the study improves research quality by following the procedures of design science research, which solves problems by creating and evaluating new and innovative products in the field of information systems.

Accuracy Assessment of Land-Use Land-Cover Classification Using Semantic Segmentation-Based Deep Learning Model and RapidEye Imagery

  • 심우담;임종수;이정수
    • 대한원격탐사학회지 / Vol. 39, No. 3 / pp. 269-282 / 2023
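  • This study performed land-cover classification using deep learning models and aimed to select the optimal deep learning model for land-cover classification by adjusting the dataset, such as the input image size and the stride. Three deep learning models were applied: U-net and DeeplabV3+, both of which have an encoder-decoder structure, and an ensemble model combining the two. For the dataset, RapidEye satellite imagery was used as the input, and raster images built according to the six land-use categories of the Intergovernmental Panel on Climate Change were used as the ground-truth label images. To improve the accuracy of the deep learning models, the study focused on improving dataset quality, and 12 land-cover maps were produced from combinations of deep learning model (U-net, DeeplabV3+, Ensemble), input image size (64 × 64 pixels, 256 × 256 pixels), and stride (50%, 100%). In the agreement assessment between the label images and the model-based land-cover maps, the overall accuracy of the U-net and DeeplabV3+ models reached up to about 87.9% and 89.8%, respectively, with kappa coefficients of about 72% or higher for both, indicating high accuracy; the U-net model using the 64 × 64 pixel dataset had the highest accuracy. In addition, applying the ensemble and the stride to the deep learning models raised accuracy by up to about 3% and was found to reduce the boundary inconsistencies that are a weakness of semantic segmentation-based deep learning models.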
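
The overall accuracy and kappa coefficient reported above are both computed from a confusion matrix of reference labels against predicted classes. A minimal sketch of the standard Cohen's kappa calculation follows; the two-class matrix is hypothetical and is not data from the study, which uses six classes.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts pixels of reference class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of pixels on the diagonal
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix
confusion = [[40, 10],
             [5, 45]]
accuracy, kappa = overall_accuracy_and_kappa(confusion)
```

Kappa discounts the agreement expected by chance, which is why it is lower than the overall accuracy for the same map.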

Object Classification Method Using Dynamic Random Forests and Genetic Optimization

  • Kim, Jae Hyup;Kim, Hun Ki;Jang, Kyung Hyun;Lee, Jong Min;Moon, Young Shik
    • 한국컴퓨터정보학회논문지 / Vol. 21, No. 5 / pp. 79-89 / 2016
  • In this paper, we propose an object classification method using a genetic algorithm and a dynamic random forest consisting of an optimal combination of unit trees. A random forest is an ensemble classification model that combines unit decision trees based on bagging; by randomizing the training samples, feature selection, and so on assigned to each decision tree, it can ensure good generalization performance when a large number of trees is combined. However, because a random forest is composed of randomly built unit trees, it shows excellent classification performance only when a sufficient number of trees is combined. There is no quantitative method for determining the number of trees, so there is no choice but to keep repeating the random tree construction. The proposed algorithm builds a random forest from an optimal combination of trees while maintaining the generalization performance of the random forest. To achieve this, improving classification performance was cast as an optimization problem of finding the optimal tree combination, and a genetic algorithm was applied to solve it. Experiments showed that the proposed algorithm improved classification performance by about 3~5% over the existing random forest in specific cases, such as a common database and our own infrared database. In addition, the optimal tree combination was determined at about 55~60% of the maximum number of trees.
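
The idea of searching for an optimal tree combination with a genetic algorithm can be sketched as follows. This is not the authors' algorithm: the encoding (one bit per tree), the fitness (majority-vote accuracy on the same labeled samples), the one-point crossover, and all parameter values are assumptions chosen for illustration, and the tree predictions are randomly generated.

```python
import random


def vote_accuracy(mask, tree_preds, labels):
    """Majority-vote accuracy of the selected trees (binary labels; ties vote 0)."""
    chosen = [preds for keep, preds in zip(mask, tree_preds) if keep]
    if not chosen:
        return 0.0
    correct = 0
    for i, y in enumerate(labels):
        ones = sum(preds[i] for preds in chosen)
        vote = 1 if 2 * ones > len(chosen) else 0
        correct += vote == y
    return correct / len(labels)


def select_trees(tree_preds, labels, pop_size=20, generations=30,
                 mutation=0.05, seed=0):
    """Search for a tree subset with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(tree_preds)

    def fitness(mask):
        return vote_accuracy(mask, tree_preds, labels)

    # Start from the full forest plus random subsets
    population = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                              for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [parents[0][:]]  # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mutation) for bit in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Hypothetical data: 15 trees, each correct on roughly 65% of 60 samples
rng = random.Random(1)
labels = [rng.randint(0, 1) for _ in range(60)]
tree_preds = [[y if rng.random() < 0.65 else 1 - y for y in labels]
              for _ in range(15)]
best_mask = select_trees(tree_preds, labels)
```

Because the full forest is in the initial population and the best mask is always carried over, the selected subset never scores below the full forest on the fitness data. In practice the fitness would be measured on a separate validation set to avoid overfitting the selection.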

An Ensemble Approach to Detect Fake News Spreaders on Twitter

  • Sarwar, Muhammad Nabeel;UlAmin, Riaz;Jabeen, Sidra
    • International Journal of Computer Science & Network Security / Vol. 22, No. 5 / pp. 294-302 / 2022
  • Detecting fake news is a complex and challenging task. The generation of fake news is very hard to stop; only steps to control its circulation may help minimize its impact. Humans tend to believe misleading false information. Researchers started with social media sites, categorizing content as real or fake news. False information can mislead an individual or an organization, which may cause major failure and financial loss. Automatic detection of false information circulating on social media is an emerging area of research that has been gaining attention from both industry and academia since the 2016 US presidential election. Fake news has negative and severe effects on individuals and organizations, extending its hostile effects to society. Timely prediction of fake news is important. This research focuses on detecting fake news spreaders. In this context, six models in total were developed, trained, and tested with the PAN 2020 dataset. Four N-gram based approaches and the user statistics-based models are trained with different hyperparameter values. An extensive grid search with cross-validation is applied to each machine learning model. For the N-gram based models, out of numerous machine learning models, this research focused on the algorithms yielding better results, identified through close reading of state-of-the-art related work in the field. For better accuracy, the authors developed models using Random Forest, Logistic Regression, SVM, and XGBoost. All four machine learning algorithms were trained with hyperparameters from a cross-validated grid search. The advantages of this research over previous work are the user statistics-based model and the ensemble learning model, which were designed to help classify Twitter users as fake news spreaders or not with the highest reliability. The user statistics model used 17 features, on the basis of which it categorized a Twitter user as malicious. A new dataset based on the predictions of the machine learning models was constructed. Three techniques were then applied in combination with the ensemble model: simple mean, logistic regression, and random forest. Logistic regression combined in the ensemble model gave the best training and testing results, achieving an accuracy of 72%.
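
The last step described above, building a new dataset from the base models' predictions and combining them, can be sketched for the simple-mean case as follows. The model names, probability values, and the 0.5 threshold are hypothetical; the logistic regression and random forest combiners from the abstract would instead be trained on the same meta-feature rows.

```python
def build_meta_features(model_probs):
    """Turn per-model probability lists into one meta-feature row per user."""
    return [list(row) for row in zip(*model_probs)]


def simple_mean_ensemble(meta_rows, threshold=0.5):
    """Label a user as a spreader (1) if the mean probability reaches the threshold."""
    return [1 if sum(row) / len(row) >= threshold else 0 for row in meta_rows]


# Hypothetical P(fake news spreader) for four users from three base models
ngram_lr = [0.9, 0.2, 0.6, 0.4]
ngram_rf = [0.8, 0.3, 0.4, 0.3]
user_stats = [0.7, 0.1, 0.7, 0.6]

meta_rows = build_meta_features([ngram_lr, ngram_rf, user_stats])
predictions = simple_mean_ensemble(meta_rows)
```

Each row of `meta_rows` holds one user's scores from every base model, which is the form a trained meta-classifier would take as input.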