• Title/Abstract/Keywords: Ensemble models

365 search results (processing time: 0.168 seconds)

Forecasting Sow's Productivity using the Machine Learning Models

  • 이민수;최영찬
    • 농촌지도와개발 / Vol. 16, No. 4 / pp.939-965 / 2009
  • Machine learning has been identified as a promising approach to knowledge-based system development. This study examines the ability of machine learning techniques to support farmers' decision making and develops a reference model for using pig farm data. We compared five machine learning techniques: logistic regression, decision tree, artificial neural network, k-nearest neighbor, and ensemble. All models performed well in predicting sow productivity in every parity, with predictability above 87.6%. The predictability of total litter size is highest, at 91.3%, in the third parity and decreases as parity increases. The ensemble performed well in predicting sow productivity. The neural network and logistic regression were excellent classifiers for all parities, whereas the decision tree and k-nearest neighbor were not good classifiers for any parity. Performance varied across models, with up to a 104% difference in lift values. The artificial neural network and ensemble models produced the highest lift values, implying the best performance among the models.


Accounting for Uncertainty Propagation: Streamflow Forecasting using Multiple Climate and Hydrological Models

  • 권현한;문영일;박세훈;오태석
    • 한국수자원학회:학술대회논문집 / Korea Water Resources Association 2008 Conference Proceedings / pp.1388-1392 / 2008
  • Water resources management depends on dealing with inherent uncertainties stemming from climatic and hydrological inputs and models. Dealing with these uncertainties remains a challenge. Streamflow forecasts inherently contain uncertainties arising from model structure and initial conditions. Recent enhancements in climate forecasting skill and hydrological modeling provide a breakthrough for delivering improved streamflow forecasts. However, little consideration has been given to methodologies that couple multiple climate models with multiple hydrological models, increasing the pool of streamflow forecast ensemble members and accounting for cumulative sources of uncertainty. The approach proposed here integrates and couples global climate models (GCM), multiple regional climate models, and numerous hydrological models to improve streamflow forecasting and to characterize system uncertainty through the generation of ensemble forecasts.


Predictability for Heavy Rainfall over the Korean Peninsula during the Summer using TIGGE Model

  • 황윤정;김연희;정관영;장동언
    • 대기 / Vol. 22, No. 3 / pp.287-298 / 2012
  • The predictability of heavy precipitation over the Korean Peninsula is studied using THORPEX Interactive Grand Global Ensemble (TIGGE) data. The performance of six ensemble models is compared in terms of inconsistency (or jumpiness) and Root Mean Square Error (RMSE) for MSLP, T850 and H500. A Grand Ensemble (GE) is constructed from the three best ensemble models (ECMWF, UKMO and CMA) with equal weights and without bias correction. The jumpiness calculated in this study indicates that the GE is more consistent than each single ensemble model. The Brier Score (BS) of precipitation also shows that the GE outperforms the single ensemble models. The GE is used for a case study of a heavy rainfall event over the Korean Peninsula on 9 July 2009. The probability forecast of precipitation from the 90 members of the GE and the percentage of the 90 members exceeding the 90th percentile of the climatological Probability Density Function (PDF) of observed precipitation are calculated. Because the GE is excellent at detecting potential heavy rainfall, it is more skillful than a single ensemble model and can support heavy rainfall warnings in the medium range. If the performance of each single ensemble model is also improved, the GE can provide better performance.
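
The two quantities mentioned in this abstract, a member-based probability forecast and the Brier Score, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the member values and the threshold are hypothetical, and pooling the members is only one plausible reading of "equal weight".

```python
def grand_ensemble(*ensembles):
    """Pool the members of several ensembles (assumed reading of 'equal weight':
    every member counts once, with no bias correction)."""
    pooled = []
    for members in ensembles:
        pooled.extend(members)
    return pooled


def exceedance_probability(members, threshold):
    """Fraction of ensemble members whose forecast exceeds the threshold."""
    return sum(1 for value in members if value > threshold) / len(members)


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.
    Lower is better; 0 is a perfect forecast."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical precipitation forecasts (mm) from three small ensembles.
    ge = grand_ensemble([10.0, 60.0], [55.0, 20.0], [70.0, 5.0])
    # Hypothetical threshold standing in for a climatological percentile.
    print(exceedance_probability(ge, 50.0))
    print(brier_score([0.5, 1.0], [1, 1]))
```

In the paper the threshold is the 90th percentile of the observed climatological distribution and the GE has 90 members; the sketch only shows the arithmetic.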

Ensemble Methods Applied to Classification Problem

  • Kim, ByungJoo
    • International Journal of Internet, Broadcasting and Communication / Vol. 11, No. 1 / pp.47-53 / 2019
  • The idea of ensemble learning is to train multiple models, each with the objective of predicting or classifying a set of results. Most of the errors in a model's learning come from three main factors: variance, noise, and bias. By using ensemble methods, we are able to increase the stability of the final model and reduce these errors. By combining many models, we are able to reduce the variance, even when the individual models are not great. In this paper we propose an ensemble model and apply it to classification problems. On the iris, Pima Indian diabetes, and semiconductor fault detection problems, the proposed model classifies well compared to traditional single classifiers, namely logistic regression, SVM, and random forest.
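
The claim that combining imperfect models can reduce error can be illustrated with a majority vote, one common combination rule. The abstract does not say which rule the proposed ensemble uses, so this Python sketch is an assumption, and the labels are made up for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote.
    Ties go to the label that appears first among the tied ones."""
    combined = []
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Share of positions where the predicted label equals the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Hypothetical ground truth and three classifiers, each wrong on a different sample.
    truth = [1, 0, 1, 1, 0, 0]
    model_a = [0, 0, 1, 1, 0, 0]
    model_b = [1, 1, 1, 1, 0, 0]
    model_c = [1, 0, 0, 1, 0, 0]
    voted = majority_vote([model_a, model_b, model_c])
    print(accuracy(model_a, truth), accuracy(voted, truth))
```

Because the three hypothetical models err on different samples, the vote corrects each individual mistake; the benefit shrinks when the base models make correlated errors.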

Ensemble Deep Learning Features for Real-World Image Steganalysis

  • Zhou, Ziling;Tan, Shunquan;Zeng, Jishen;Chen, Han;Hong, Shaobin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 11 / pp.4557-4572 / 2020
  • The Alaska competition provides an opportunity to study the practical problems of real-world steganalysis. Participants are required to solve steganalysis tasks involving various embedding schemes, inconsistent JPEG quality factors, and various processing pipelines. In this paper, we propose a method to ensemble multiple deep learning steganalyzers. We select SRNet and RESDET as our base models. We then design a three-layer model ensemble network to fuse these base models and output the final prediction. By separating the three color channels for base model training and using a feature replacement strategy instead of simply merging features, the performance of the model ensemble is greatly improved. The proposed method won second place in the Alaska 1 competition.

An Ensemble Approach to Detect Fake News Spreaders on Twitter

  • Sarwar, Muhammad Nabeel;UlAmin, Riaz;Jabeen, Sidra
    • International Journal of Computer Science & Network Security / Vol. 22, No. 5 / pp.294-302 / 2022
  • Detecting fake news is a complex and challenging task. The generation of fake news is very hard to stop; only steps to control its circulation may help minimize its impact. Humans tend to believe misleading false information. Researchers started with social media sites to categorize news as real or fake. False information can mislead an individual or an organization and may cause major failures and financial loss. Automatic detection of false information circulating on social media is an emerging area of research that has been gaining attention from both industry and academia since the 2016 US presidential election. Fake news has negative and severe effects on individuals and organizations, extending its hostile effects to society. Timely prediction of fake news is important. This research focuses on the detection of fake news spreaders. In this context, six models in total were developed, trained, and tested with the PAN 2020 dataset. Four N-gram based approaches and user statistics-based models were trained with different hyperparameter values, and an extensive grid search with cross-validation was applied to each machine learning model. For the N-gram based models, out of numerous machine learning models, this research focused on the algorithms yielding better results, as assessed by a close reading of state-of-the-art related work in the field. For better accuracy, the authors aimed at developing models using Random Forest, Logistic Regression, SVM, and XGBoost. All four machine learning algorithms were trained with cross-validated grid search over hyperparameters. The advantages of this research over previous work are the user statistics-based model and the ensemble learning model, which were designed to help classify Twitter users as fake news spreaders or not with the highest reliability. The user statistics model used 17 features, on the basis of which it categorized a Twitter user as malicious. A new dataset based on the predictions of the machine learning models was then constructed, and three techniques (simple mean, logistic regression, and random forest) were applied in the ensemble model. Logistic regression in the ensemble model gave the best training and testing results, achieving an accuracy of 72%.

Simultaneous Optimization of a KNN Ensemble Model for Bankruptcy Prediction

  • 민성환
    • 지능정보연구 / Vol. 22, No. 1 / pp.139-157 / 2016
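  • An ensemble classifier combines multiple classifiers to achieve better performance than any individual classifier. Such ensembles are known to be very useful for improving the generalization performance of a single classifier. The random subspace ensemble technique randomly selects a subset of the original input variables for each base classifier, thereby diversifying the base classifiers. A random subspace ensemble with k-nearest neighbor (KNN) base classifiers is known to be effective in improving on the performance of a single model, and the performance of such an ensemble depends heavily on the input variable subsets randomly selected for each base classifier and on the value of the KNN parameter k. However, although there have been studies on the optimal choice of k or of the input variable subset for a single model, there has been no research on optimizing them for ensemble models that use KNN as the base classifier. To improve the performance of KNN-based ensembles, this study therefore proposes a new type of ensemble model that simultaneously optimizes the k parameter and the input variable subset of each base classifier. The proposed method uses a genetic algorithm to search for the value of k and the input variables to be used by each KNN base classifier so that the resulting ensemble performs best. To validate the proposed model, various experiments were conducted with bankruptcy prediction data from Korean companies; the results show that the proposed model is more effective than existing ensemble models at diversifying the base classifiers and improving prediction performance.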
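
The random subspace KNN ensemble that this abstract starts from can be sketched in a few lines of Python. The sketch covers only the baseline, where each base classifier gets a random feature subset and a randomly chosen k; the paper's genetic-algorithm search over those choices is not reproduced. All function names, parameters and data here are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, query, features, k):
    """Classify `query` by majority vote of its k nearest training points,
    using squared Euclidean distance on the selected features only."""
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: sum((train_x[i][f] - query[f]) ** 2 for f in features),
    )
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def build_random_subspace_ensemble(n_features, n_learners, subspace_size, k_choices, seed=0):
    """Draw a random feature subset and a k value for each base classifier.
    (The paper searches these with a genetic algorithm; here they are random.)"""
    rng = random.Random(seed)
    return [
        (sorted(rng.sample(range(n_features), subspace_size)), rng.choice(k_choices))
        for _ in range(n_learners)
    ]


def ensemble_predict(ensemble, train_x, train_y, query):
    """Combine the base KNN classifiers by majority vote."""
    votes = Counter(
        knn_predict(train_x, train_y, query, features, k) for features, k in ensemble
    )
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: label 0 = non-bankrupt, label 1 = bankrupt.
    train_x = [[0, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1],
               [10, 10, 10, 10], [9, 10, 9, 10], [10, 9, 10, 9]]
    train_y = [0, 0, 0, 1, 1, 1]
    ensemble = build_random_subspace_ensemble(4, n_learners=5, subspace_size=2, k_choices=[1, 3])
    print(ensemble_predict(ensemble, train_x, train_y, [0.5, 0.5, 0.5, 0.5]))
```

A simultaneous optimization in the paper's sense would treat each `(features, k)` pair as part of a chromosome and score candidates by ensemble performance on validation data.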

Performance Enhancement of Automatic Wood Classification of Korean Softwood by Ensembles of Convolutional Neural Networks

  • Kwon, Ohkyung;Lee, Hyung Gu;Yang, Sang-Yun;Kim, Hyunbin;Park, Se-Yeong;Choi, In-Gyu;Yeo, Hwanmyeong
    • Journal of the Korean Wood Science and Technology / Vol. 47, No. 3 / pp.265-276 / 2019
  • In our previous study, the LeNet3 model successfully classified images from the transverse surfaces of five Korean softwood species (cedar, cypress, Korean pine, Korean red pine, and larch). However, a practical limitation exists in our system stemming from the nature of the training images obtained from the transverse plane of the wood species. In real-world applications, it is necessary to utilize images from the longitudinal surfaces of lumber. Thus, we improved our model by training it with images from the longitudinal and transverse surfaces of lumber. Because the longitudinal surface has complex but less distinguishable features than the transverse surface, the classification performance of the LeNet3 model decreases when we include images from the longitudinal surfaces of the five Korean softwood species. To remedy this situation, we adopt ensemble methods that can enhance the classification performance. Herein, we investigated the use of ensemble models from the LeNet and MiniVGGNet models to automatically classify the transverse and longitudinal surfaces of the five Korean softwoods. Experimentally, the best classification performance was achieved via an ensemble model comprising the LeNet2, LeNet3, and MiniVGGNet4 models trained using input images of $128 \times 128 \times 3$ pixels via the averaging method. The ensemble model showed an F1 score greater than 0.98. The classification performance for the longitudinal surfaces of Korean pine and Korean red pine was significantly improved by the ensemble model compared to individual convolutional neural network models such as LeNet3.
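
The averaging method named in this abstract is usually understood as averaging the class probabilities output by each network and picking the class with the highest mean. The Python sketch below assumes that reading; the probability vectors are invented for illustration and do not come from the paper.

```python
def average_probabilities(model_outputs):
    """Element-wise mean of per-model class-probability vectors."""
    n_models = len(model_outputs)
    return [sum(column) / n_models for column in zip(*model_outputs)]


def predict_class(model_outputs, class_names):
    """Return the class with the highest averaged probability, plus the averages."""
    averaged = average_probabilities(model_outputs)
    best_index = max(range(len(averaged)), key=averaged.__getitem__)
    return class_names[best_index], averaged


if __name__ == "__main__":
    classes = ["cedar", "cypress", "Korean pine", "Korean red pine", "larch"]
    # Hypothetical softmax outputs of three networks for one image.
    outputs = [
        [0.0, 0.0, 0.4, 0.6, 0.0],  # this model alone would pick Korean red pine
        [0.0, 0.0, 0.7, 0.3, 0.0],
        [0.0, 0.0, 0.6, 0.4, 0.0],
    ]
    label, averaged = predict_class(outputs, classes)
    print(label, averaged)
```

In this made-up case one network confuses the two pine species, and averaging lets the other two outvote it.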

Investment, Export, and Exchange Rate on Prediction of Employment with Decision Tree, Random Forest, and Gradient Boosting Machine Learning Models

  • 이재득
    • 무역학회지 / Vol. 46, No. 2 / pp.281-299 / 2021
  • This paper analyzes the feasibility of using machine learning methods to forecast employment. Machine learning methods such as decision trees and artificial neural networks, and ensemble models such as random forest and gradient boosting regression trees, were used to forecast employment in the Busan regional economy. The main findings from comparing their predictive abilities are as follows. First, machine learning methods can predict employment well. Second, the employment forecasts of the decision tree models differed somewhat depending on the depth of the trees. Third, the artificial neural network model, however, does not show high predictive power. Fourth, ensemble models such as random forest and gradient boosting regression trees show higher predictive power. Thus, since machine learning methods can accurately predict employment, they should be used to improve the accuracy of employment forecasting.

Estimation of optimal runoff hydrograph using radar rainfall ensemble and blending technique of rainfall-runoff models

  • 이명진;강나래;김종성;김형수
    • 한국수자원학회논문집 / Vol. 51, No. 3 / pp.221-233 / 2018
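  • Damage from localized heavy rainfall and typhoons caused by climate change has been occurring frequently in recent years. Reducing such damage requires accurate rainfall prediction and flood discharge estimation. However, gauge and radar rainfall contain spatiotemporal errors, and the runoff hydrographs produced by runoff models still deviate from observed discharge even after calibration, so uncertainty exists. In this study, therefore, a probabilistic rainfall ensemble was generated to examine the uncertainty of rainfall. The uncertainty of the hydrological models was also examined through the runoff results, and a single integrated runoff hydrograph was produced using a blending technique. The generated rainfall ensemble showed large uncertainty when the radar underestimated rainfall because of rainfall intensity and topographic effects, and the optimal runoff hydrograph estimated with the blending technique greatly reduced the uncertainty of the runoff models. The results of this study are expected to help reduce damage from heavy rainfall through accurate flood discharge estimation and prediction.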