• Title/Summary/Keyword: Bagging method

Search results: 74

Development of optimization method for water quality prediction accuracy (수질예측 정확도를 위한 최적화 기법 개발)

  • Lee, Seung Jae;Kim, Hyeon Sik;Sohn, Byeong Yong;Han, Ji Hyun
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2018.05a
    • /
    • pp.41-41
    • /
    • 2018
  • Hydraulic and water quality prediction models are widely used to predict and manage the water quality of rivers and reservoirs. A water quality prediction model numerically computes pollutant transport paths and concentrations within a watershed or water body, producing water quality data at the locations and times the user requires. The accuracy of such a model is secured through calibration and validation, which demands a high level of expertise. In particular, calibrating a model by trial and error takes considerable time and effort, makes it easy to apply calibration coefficients that are too large or too small, and invites the modeler's subjectivity. In this study, to calibrate the algae components of the CE-QUAL-W2 model, we computed the rate of change in simulation results for each parameter based on a sensitivity analysis of the calibration coefficients commonly used for Chl-a and cyanobacteria cell counts, and, to reproduce seasonal trends, we combined an Ensemble-Bagging technique with machine learning so as to minimize the number of model runs. Nine parameters were selected for calibrating Chl-a; after deriving a total of 27 parameters for diatoms, cyanobacteria, and green algae through sensitivity analysis, the parameters were adjusted so that the %difference between simulated and observed values for each event was consistent with the expected rate of change. Weights were then derived by considering the frequency of each parameter across event combinations, the expected rate of change per parameter, and seasonal algal characteristics; after each single calibration, the Chl-a model run was evaluated by %difference and the procedure was repeated until a "good" grade was satisfied. For cyanobacteria cell counts, after the parameters were optimized for Chl-a, a machine learning technique was applied to CACEL, which converts cyanobacteria concentration into cell counts; the result was evaluated against a regression equation for the estimated rate of change of CACEL, and the procedure was repeated until a %difference grade of "good" or better was achieved. This study applied optimization techniques to secure the accuracy of a water quality prediction model, thereby reducing the time and effort required for model calibration, and used the ensemble and machine learning techniques to ensure objectivity in applying model calibration coefficients.
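The calibration loop described in this abstract (adjust a coefficient by its expected change rate, re-run the model, grade by %difference, repeat until "good") can be sketched in Python. This is a minimal illustration under assumptions, not the authors' code: `run_model`, the linear toy model, and the 15% "good" threshold are all stand-ins.

```python
def percent_difference(sim_mean, obs_mean):
    """%difference between simulated and observed means (metric named in the abstract)."""
    return abs(sim_mean - obs_mean) / obs_mean * 100.0

def calibrate(run_model, param, obs_mean, rate, good=15.0, max_iter=50):
    """Adjust `param` by its expected change rate `rate` until the run
    reaches the (assumed) 'good' grade of <= 15 %difference."""
    for _ in range(max_iter):
        sim_mean = run_model(param)
        if percent_difference(sim_mean, obs_mean) <= good:
            return param
        # nudge the coefficient toward the observation by its expected change rate
        param += rate if sim_mean < obs_mean else -rate
    return param

# toy model: simulated Chl-a mean responds linearly to the parameter
calibrated = calibrate(lambda p: 2.0 * p, param=1.0, obs_mean=10.0, rate=0.5)
```

In the study the same loop is driven by sensitivity-derived change rates per parameter rather than a single fixed step.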


Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.4
    • /
    • pp.123-132
    • /
    • 2013
  • As smartphones are equipped with various sensors such as the accelerometer, GPS, gravity sensor, gyroscope, ambient light sensor, proximity sensor, and so on, there has been much research on making use of these sensors to create valuable applications. Human activity recognition is one such application, motivated by various welfare applications such as support for the elderly, measurement of calorie consumption, analysis of lifestyles, analysis of exercise patterns, and so on. One of the challenges faced when using smartphone sensors for activity recognition is that the number of sensors used should be minimized to save battery power. When the number of sensors used is restricted, it is difficult to realize a highly accurate activity recognizer, or classifier, because it is hard to distinguish between subtly different activities relying on only limited information. The difficulty becomes especially severe when the number of different activity classes to be distinguished is large. In this paper, we show that a fairly accurate classifier distinguishing ten different activities can be built using the data of only a single sensor, the smartphone accelerometer. The approach we take to this ten-class problem is the ensemble of nested dichotomies (END) method, which transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the binary tree, the set of all classes is split into two subsets of classes by a binary classifier. At a child node of the tree, a subset of classes is again split into two smaller subsets by another binary classifier. Continuing in this way, we obtain a binary tree where each leaf node contains a single class. This binary tree can be viewed as a nested dichotomy that can make multi-class predictions.
Depending on how a set of classes is split into two subsets at each node, the final tree we obtain can differ. Since some classes may be correlated, a particular tree may perform better than the others; however, we can hardly identify the best tree without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning and then combining the predictions made by each tree during classification. The END method is generally known to perform well even when the base learner is unable to model complex decision boundaries. As the base classifier at each node of the dichotomy, we have used another ensemble classifier, the random forest. A random forest is built by repeatedly generating a decision tree, each time with a different random subset of features, using a bootstrap sample. By combining bagging with random feature subset selection, a random forest enjoys the advantage of having more diverse ensemble members than simple bagging. As an overall result, our ensemble of nested dichotomies can be seen as a committee of committees of decision trees that can handle a multi-class problem with high accuracy. The ten activity classes distinguished in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'. The features used for classifying these activities include not only the magnitude of the acceleration vector at each time point but also the maximum, minimum, and standard deviation of the vector magnitude within a time window covering the last 2 seconds. For experiments comparing the performance of END with that of other methods, accelerometer data was collected every 0.1 second for 2 minutes per activity from 5 volunteers.
Among the 5,900 ($=5{\times}(60{\times}2-2)/0.1$) data points collected for each activity (the data for the first 2 seconds are discarded because they lack time window data), 4,700 were used for training and the rest for testing. Although 'Walking Uphill' is often confused with other similar activities, END was found to classify all ten activities with a fairly high accuracy of 98.4%. In comparison, the accuracies achieved by a decision tree, a k-nearest neighbor classifier, and a one-versus-rest support vector machine were 97.6%, 96.5%, and 97.6%, respectively.
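The windowed accelerometer features this abstract describes (the vector magnitude at each time point plus the max, min, and standard deviation over the trailing 2-second window at 0.1 s sampling) can be sketched as follows; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def window_features(magnitude, win=20):
    """For each time point after the first `win` samples, emit the magnitude
    plus the max, min, and std over the trailing 2-second window
    (20 samples at 0.1 s sampling)."""
    rows = []
    for i in range(win, len(magnitude)):
        w = magnitude[i - win:i]  # the 2 seconds preceding time point i
        rows.append([magnitude[i], w.max(), w.min(), w.std()])
    return np.array(rows)

# 30 samples (3 s) of toy magnitude data; the first 2 s are dropped,
# mirroring how the paper discards data without a full time window
feats = window_features(np.arange(30.0))
```

This also explains the $(60{\times}2-2)/0.1$ count in the abstract: 2 seconds of each 2-minute recording produce no feature row.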

Predictive Analysis of Ethereum Uncle Block using Ensemble Machine Learning Technique and Blockchain Information (앙상블 머신러닝 기법과 블록체인 정보를 활용한 이더리움 엉클 블록 예측 분석)

  • Kim, Han-Min
    • Journal of Digital Convergence
    • /
    • v.18 no.11
    • /
    • pp.129-136
    • /
    • 2020
  • The advantages of Blockchain have established its necessity in various fields, but Blockchain also has several drawbacks. Among them, the uncle block problem can greatly hinder the value and utilization of Blockchain. Although the value of Blockchain may be degraded by the uncle block problem, previous studies paid little attention to research on uncle blocks. Therefore, this study attempts to predict the occurrence of uncle blocks in order to anticipate and prepare for the uncle block problem. The study verifies the validity of introducing new attributes and ensemble analysis techniques for accurate prediction of uncle block occurrence. As a research method, voting, bagging, and stacking ensemble techniques were applied to Ethereum, where the uncle block problem actually occurs, using Blockchain information from Ethereum and Bitcoin as analysis data. As a result, we found that the best predictions were produced when the voting and stacking ensemble techniques were applied using only Ethereum Blockchain information. The results of this study contribute to more accurately predicting the occurrence of uncle blocks and preparing for the uncle block problem of Blockchain.
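The voting and stacking ensembles this study employs are available off the shelf, for example in scikit-learn. The sketch below uses synthetic two-class data as a stand-in for the Ethereum/Bitcoin blockchain attributes, which the abstract does not enumerate, and arbitrary base learners.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in for "uncle block occurred / did not occur" data
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

base = [("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0))]

# voting: average the base learners' predicted probabilities
voting = VotingClassifier(base, voting="soft")
# stacking: a meta-learner combines the base learners' predictions
stacking = StackingClassifier(base, final_estimator=LogisticRegression(max_iter=1000))

acc = {name: clf.fit(Xtr, ytr).score(Xte, yte)
       for name, clf in [("voting", voting), ("stacking", stacking)]}
```

A bagging ensemble, the third technique the study compares, would be built analogously with `sklearn.ensemble.BaggingClassifier`.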

Simultaneous Optimization of KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.139-157
    • /
    • 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained by a randomly chosen feature subspace from the original feature set, and predictions from each ensemble member are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. 
The k parameter of the KNN base classifiers and the feature subsets selected for the base classifiers play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of the base classifiers in the ensemble. This study proposed a new ensemble method that improves the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve its prediction accuracy. The proposed model was applied to a bankruptcy prediction problem using a real dataset of Korean companies. The research data included 1800 externally non-audited firms that filed for bankruptcy (900 cases) or non-bankruptcy (900 cases). Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent-sample t-test of each financial ratio as an input variable and bankruptcy or non-bankruptcy as the output variable. Of these, 24 financial ratios were selected using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other to avoid overfitting; the prediction accuracy against the latter was used as the fitness value. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performance of the proposed model with that of other models. To evaluate its effectiveness, the classification accuracy of the proposed model was compared with that of other models, and the Q-statistic values and average classification accuracies of the base classifiers were investigated.
The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
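The random subspace KNN ensemble at the core of this paper (before the genetic-algorithm step that tunes k and the feature subsets) can be approximated with scikit-learn's `BaggingClassifier` by sampling features instead of instances. The data below is synthetic, not the Korean bankruptcy dataset; 24 features echo the paper's 24 selected financial ratios.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# synthetic stand-in for the 24-ratio bankruptcy data
X, y = make_classification(n_samples=300, n_features=24,
                           n_informative=6, random_state=1)

ensemble = BaggingClassifier(
    KNeighborsClassifier(n_neighbors=5),  # KNN is sensitive to the feature space,
    n_estimators=30,                      # which makes it a good subspace base learner
    max_features=0.5,   # each base KNN sees a random half of the features
    bootstrap=False,    # random subspace: vary features, not training instances
    random_state=1,
)

scores = cross_val_score(ensemble, X, y, cv=10)  # 10-fold CV, as in the paper
```

The paper's contribution is to replace the random choices of k and feature subsets with a genetic-algorithm search; that search wraps an ensemble like this one inside a fitness evaluation.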