• Title/Summary/Keyword: ensemble method

Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.123-132 / 2013
  • As smartphones are equipped with various sensors such as an accelerometer, GPS, gravity sensor, gyroscope, ambient light sensor, and proximity sensor, there has been much research on using these sensors to create valuable applications. Human activity recognition is one such application, motivated by welfare uses such as support for the elderly, measurement of calorie consumption, and analysis of lifestyles and exercise patterns. One challenge in using smartphone sensors for activity recognition is that the number of sensors used should be minimized to save battery power. When the number of sensors is restricted, it is difficult to build a highly accurate activity recognizer, or classifier, because subtly different activities are hard to distinguish from limited information. The difficulty becomes especially severe when the number of activity classes to be distinguished is large. In this paper, we show that a fairly accurate classifier can be built that distinguishes ten different activities using data from a single sensor, the smartphone accelerometer. Our approach to this ten-class problem is the ensemble of nested dichotomies (END) method, which transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the binary tree, the set of all classes is split into two subsets by a binary classifier. At a child node, a subset of classes is again split into two smaller subsets by another binary classifier. Continuing in this way, we obtain a binary tree in which each leaf node contains a single class. This binary tree can be viewed as a nested dichotomy that can make multi-class predictions.
Depending on how the set of classes is split into two subsets at each node, the final tree can differ. Since some classes may be correlated, a particular tree may perform better than the others. However, the best tree can hardly be identified without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning and then combining the predictions made by each tree during classification. The END method is generally known to perform well even when the base learner is unable to model complex decision boundaries. As the base classifier at each node of the dichotomy, we use another ensemble classifier, the random forest. A random forest is built by repeatedly generating decision trees, each from a bootstrap sample with a different random subset of features. By combining bagging with random feature subset selection, a random forest has more diverse ensemble members than simple bagging. Overall, our ensemble of nested dichotomies can be seen as a committee of committees of decision trees that can handle a multi-class problem with high accuracy. The ten classes of activities that we distinguish in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'. The features used for classifying these activities include not only the magnitude of the acceleration vector at each time point but also the maximum, minimum, and standard deviation of the vector magnitude within a time window of the last 2 seconds, among others. To compare the performance of END with that of other methods, accelerometer data were collected every 0.1 second for 2 minutes for each activity from 5 volunteers.
Of the 5,900 ($=5{\times}(60{\times}2-2)/0.1$) data points collected for each activity (the data for the first 2 seconds are discarded because they lack time-window data), 4,700 were used for training and the rest for testing. Although 'Walking Uphill' is often confused with other similar activities, END classified all ten activities with a fairly high accuracy of 98.4%. In comparison, the accuracies achieved by a decision tree, a k-nearest neighbor classifier, and a one-versus-rest support vector machine were 97.6%, 96.5%, and 97.6%, respectively.
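
The tree-building and voting scheme described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the base learner `CentroidBinary` is a hypothetical nearest-centroid stand-in for the random forest used in the paper, the random split (shuffle, then cut) is a simplification of how nested dichotomies are sampled, and the trees are combined by a plain majority vote. All class and method names are illustrative.

```python
import random
from collections import Counter


class CentroidBinary:
    """Stand-in binary base learner (nearest centroid), NOT the paper's random forest."""

    def fit(self, X, y):
        # y holds 0/1 labels; store one centroid per side of the dichotomy.
        self.centroids = []
        for side in (0, 1):
            rows = [x for x, t in zip(X, y) if t == side]
            self.centroids.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict(self, x):
        dist = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in self.centroids]
        return 0 if dist[0] <= dist[1] else 1


class NestedDichotomy:
    """One binary tree whose internal nodes split a set of classes in two."""

    def __init__(self, classes, rng):
        self.classes = list(classes)
        self.left = self.right = self.clf = None
        if len(self.classes) > 1:
            # Assumed random split: shuffle the classes, then cut the list once.
            shuffled = self.classes[:]
            rng.shuffle(shuffled)
            cut = rng.randint(1, len(shuffled) - 1)
            self.left = NestedDichotomy(shuffled[:cut], rng)
            self.right = NestedDichotomy(shuffled[cut:], rng)

    def fit(self, X, y):
        if self.left is None:  # leaf: a single class, nothing to learn
            return self
        # Train only on the samples whose class belongs to this node.
        members = set(self.classes)
        pairs = [(x, t) for x, t in zip(X, y) if t in members]
        Xs = [x for x, _ in pairs]
        ys = [t for _, t in pairs]
        left_set = set(self.left.classes)
        self.clf = CentroidBinary().fit(Xs, [0 if t in left_set else 1 for t in ys])
        self.left.fit(Xs, ys)
        self.right.fit(Xs, ys)
        return self

    def predict(self, x):
        # Walk from the root to a leaf, following each node's binary decision.
        node = self
        while node.left is not None:
            node = node.left if node.clf.predict(x) == 0 else node.right
        return node.classes[0]


class EnsembleOfNestedDichotomies:
    def __init__(self, n_trees=10, seed=0):
        self.n_trees = n_trees
        self.rng = random.Random(seed)

    def fit(self, X, y):
        classes = sorted(set(y))
        self.trees = [NestedDichotomy(classes, self.rng).fit(X, y)
                      for _ in range(self.n_trees)]
        return self

    def predict(self, x):
        # Majority vote over the randomly built trees (a simplification).
        votes = Counter(tree.predict(x) for tree in self.trees)
        return votes.most_common(1)[0][0]
```

The sketch assumes that every class has at least one training sample. Replacing `CentroidBinary` with a stronger binary learner changes nothing in the tree or voting logic, which is the point of the nested-dichotomy decomposition.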

Generation of radar rainfall data for hydrological and meteorological application (II) : radar rainfall ensemble (수문기상학적 활용을 위한 레이더 강우자료 생산(II) : 레이더 강우앙상블)

  • Kim, Tae-Jeong;Lee, Dong-Ryul;Jang, Sang-Min;Kwon, Hyun-Han
    • Journal of Korea Water Resources Association / v.50 no.1 / pp.17-28 / 2017
  • The recent increase in extreme weather events and flash floods associated with enhanced climate variability has led to an increase in climate-related disasters. For this reason, various studies based on high-resolution weather radar systems have been carried out. Weather radar can provide real-time precipitation estimates over a wide area, whereas ground-based rain gauges provide only point estimates in space. Weather radar is thus capable of identifying changes in rainfall structure as it moves through an ungauged basin. However, the advantage of weather radar rainfall estimates has been limited by a variety of sources of uncertainty in the radar reflectivity process, including systematic and random errors. In this study, we developed an ensemble radar rainfall estimation scheme using the multivariate copula method. The results confirm that the proposed ensemble technique can effectively reproduce rainfall statistics such as the mean, variance, and skewness (and, more importantly, the extremes) as well as the spatio-temporal structure of rainfall fields.

Tor Network Website Fingerprinting Using Statistical-Based Feature and Ensemble Learning of Traffic Data (트래픽 데이터의 통계적 기반 특징과 앙상블 학습을 이용한 토르 네트워크 웹사이트 핑거프린팅)

  • Kim, Junho;Kim, Wongyum;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering / v.9 no.6 / pp.187-194 / 2020
  • This paper proposes a website fingerprinting method using ensemble learning over a Tor network, which guarantees client anonymity and the protection of personal information. We construct a training problem for website fingerprinting from traffic packets collected in the Tor network and compare the performance of website fingerprinting systems built with tree-based ensemble models. A training feature vector is prepared from the general information, burst, cell sequence length, and cell order extracted from the traffic sequence, and the features of each website are represented with a fixed length. For experimental evaluation, we define four learning problems (Wang14, BW, CWT, CWH) according to the use of website fingerprinting and compare the performance with that of a support vector machine model using CUMUL feature vectors. In the experimental evaluation, the proposed statistical-based training feature representation is superior to the CUMUL feature representation except in the BW case.

Enhancement of Evoked Potential Waveform using Delay-compensated Wiener Filtering (지연보상 위너 필터링에 의한 유발전위 파형개선)

  • Lee, JeeEun;Yoo, Sun K.
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.12 / pp.261-269 / 2013
  • In this paper, the evoked potential (EP) is represented by an additive delay model to account for the variable, noisy response in stimulus-event synchronization. A hybrid method, delay-compensated Wiener-filtered ensemble averaging (DWEA), is proposed to reduce the EP signal distortion that occurs during the averaging procedure because of synchronization timing mismatch. The performance of DWEA was tested by surrogate simulation, composed of synthesized arbitrary delays and arbitrary levels of added noise. DWEA performs better than both Wiener-filtered ensemble averaging and conventional ensemble averaging. DWEA tolerates an added noise gain of up to 7 within a 10% mean square error limit. The experiments demonstrate that DWEA can be applied to enhance evoked potentials that have synchronization mismatch and added noise.
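
To illustrate why timing mismatch distorts an ensemble average and how delay compensation helps, here is a minimal Python sketch. It is not the DWEA method itself: the Wiener filtering stage is omitted, and estimating each trial's delay by cross-correlation against the plain ensemble average is an assumption made for illustration, since the abstract does not state how delays are estimated. The function names are hypothetical.

```python
def ensemble_average(trials):
    """Conventional ensemble average: the sample-wise mean across trials."""
    n = len(trials)
    return [sum(col) / n for col in zip(*trials)]


def shift(signal, lag):
    """Delay a signal by `lag` samples (negative lag advances it), zero-padded."""
    n = len(signal)
    return [signal[i - lag] if 0 <= i - lag < n else 0.0 for i in range(n)]


def best_lag(trial, reference, max_lag):
    """Lag at which the delayed reference correlates best with the trial."""
    def score(lag):
        return sum(a * b for a, b in zip(trial, shift(reference, lag)))
    return max(range(-max_lag, max_lag + 1), key=score)


def delay_compensated_average(trials, max_lag):
    """Align each trial to the plain average (assumed scheme), then average."""
    reference = ensemble_average(trials)
    aligned = [shift(t, -best_lag(t, reference, max_lag)) for t in trials]
    return ensemble_average(aligned)
```

With noise-free copies of one pulse delayed by different amounts, the plain average smears and lowers the peak, while the compensated average recovers the pulse shape. With real, noisy trials the delay estimates themselves become noisy, which is presumably where a filtering stage such as the paper's Wiener filter comes in.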

Analysis on the Planar Bowtie Antenna for IMT-2000 Handset (IMT-2000 핸드셋용 평면형 Bowtie 안테나 해석)

  • Lee, Hee-Suk;Kim, Nam
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.11 no.5 / pp.681-688 / 2000
  • In this paper, a small, light planar antenna is designed and analyzed for use as an IMT-2000 handset antenna. Using the Ensemble simulator, which is based on the method of moments (MoM), the design parameters that determine the resonant frequency are identified. The antenna is then analyzed with the Ensemble simulation and the FDTD numerical method so that it resonates at the frequency allocated to IMT-2000, with the antenna dimensions fixed and a wing angle of 21$^{\circ}$ as the design parameter. Although FDTD results are very accurate, the analysis introduces errors due to the staircasing approximation along the slope of the bowtie. To reduce this error, the cells that contain the perfect conductor/free space boundary are divided into four ranges. Each range is then calculated with a different equation, which modifies the H-field to account for the area and length of the part of the cell filled with free space. With the modified FDTD algorithm, the narrow return-loss bandwidth calculated with the standard FDTD algorithm can be extended to the desired range.

Effect of Application of Ensemble Method on Machine Learning with Insufficient Training Set in Developing Automated English Essay Scoring System (영작문 자동채점 시스템 개발에서 학습데이터 부족 문제 해결을 위한 앙상블 기법 적용의 효과)

  • Lee, Gyoung Ho;Lee, Kong Joo
    • Journal of KIISE / v.42 no.9 / pp.1124-1132 / 2015
  • Training a supervised machine learning algorithm requires unbiased labels and a sufficient amount of training data. However, it is difficult to collect unbiased labels and enough training data to develop an automatic English composition scoring system. In addition, English writing assessment is carried out as a multi-faceted evaluation of the overall level of the answer, so it is difficult to choose an appropriate machine learning algorithm for the task. In this paper, we show that these problems can be alleviated through ensemble learning. The experimental results indicate that the ensemble technique achieved better overall performance than the other algorithms.

Comparison between Uncertainties of Cultivar Parameter Estimates Obtained Using Error Calculation Methods for Forage Rice Cultivars (오차 계산 방식에 따른 사료용 벼 품종의 품종모수 추정치 불확도 비교)

  • Young Sang Joh;Shinwoo Hyun;Kwang Soo Kim
    • Korean Journal of Agricultural and Forest Meteorology / v.25 no.3 / pp.129-141 / 2023
  • Crop models have been used to predict yield under diverse environmental and cultivation conditions, which can support decisions on the management of forage crops. Cultivar parameters are one of the required inputs to crop models, representing the genetic properties of a given forage cultivar. The objective of this study was to compare calibration and ensemble approaches in order to minimize the uncertainty of crop yield estimates using the SIMPLE crop model. Cultivar parameters were calibrated using the log-likelihood (LL) and the Generic Composite Similarity Measure (GCSM) as objective functions for the Metropolis-Hastings (MH) algorithm. In total, 20 sets of cultivar parameters were generated for each method. Two types of ensemble approach were compared. The first was the average of the model outputs (Eem) obtained using the individual parameter sets. The second was the model output (Epm) obtained with the cultivar parameters averaged over the 20 sets. The comparison was done for each cultivar and for each error calculation method. 'Jowoo' and 'Yeongwoo', forage rice cultivars used in Korea, were subject to parameter calibration. Yield data were obtained from experimental fields at Suwon, Jeonju, Naju, and Iksan. Data for 2013, 2014 and 2016 were used for parameter calibration. For validation, yield data reported from 2016 to 2018 at Suwon were used. Initial calibration indicated that genetic coefficients obtained by LL were distributed in a narrower range than those obtained by GCSM. A two-sample t-test was performed to compare the ensemble approaches, and no significant difference was found between them. The uncertainty of GCSM can be neutralized by adjusting the acceptance probability. The second ensemble method (Epm) indicates that uncertainty can be reduced with less computation using an ensemble approach.
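
The difference between the two ensemble approaches in this abstract (Eem: average the model outputs; Epm: run the model once on the averaged parameters) can be shown with a short Python sketch. The function `toy_yield` is a hypothetical saturating curve used only as a stand-in; it is not the SIMPLE crop model, and the parameter names `gain` and `half_sat` are assumptions.

```python
from statistics import mean


def toy_yield(params, radiation):
    """Hypothetical stand-in model: yield that saturates with radiation."""
    gain, half_sat = params
    return gain * radiation / (half_sat + radiation)


def ensemble_of_outputs(model, param_sets, x):
    """Eem-style: run the model once per parameter set, then average the outputs."""
    return mean(model(p, x) for p in param_sets)


def output_of_mean_params(model, param_sets, x):
    """Epm-style: average the parameter sets first, then run the model once."""
    mean_params = tuple(mean(col) for col in zip(*param_sets))
    return model(mean_params, x)
```

For a model that is nonlinear in its parameters the two values generally differ, and the second approach needs a single model run instead of one run per parameter set, which matches the abstract's remark about reduced computation.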

In vivo Evaluation of Flow Estimation Methods for 3D Color Doppler Imaging

  • Yoo, Yang-Mo
    • Journal of Biomedical Engineering Research / v.31 no.3 / pp.177-186 / 2010
  • In 3D ultrasound color Doppler imaging (CDI), 8-16 pulse transmissions (ensembles) per scanline are used for effective clutter rejection and flow estimation, but this yields a low volume acquisition rate. In this paper, we evaluate three flow estimation methods for a small ensemble size (E=4): autoregression (AR), eigendecomposition (ED), and autocorrelation combined with adaptive clutter rejection (AC-ACR). The performance of the AR, ED, and AC-ACR methods was compared using 2D and 3D in vivo data acquired under different clutter conditions (common carotid artery, kidney, and liver). To evaluate the effectiveness of the three methods, receiver operating characteristic (ROC) curves were generated. For 2D kidney in vivo data, the AC-ACR method outperforms the AR and ED methods in terms of the area under the ROC curve (AUC) (0.852 vs. 0.793 and 0.813, respectively). Similarly, the AC-ACR method shows higher AUC values for 2D liver in vivo data than the AR and ED methods (0.855 vs. 0.807 and 0.823, respectively). For the common carotid artery data, the AR method provides higher AUC values, but it suffers from biased estimates. For 3D in vivo data acquired from a kidney transplant patient, the AC-ACR method with E=4 provides an AUC value of 0.799. These in vivo results indicate that the AC-ACR method can provide more robust flow estimates than the AR and ED methods with a small ensemble size.
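
As background for the autocorrelation part of AC-ACR, the following Python sketch shows the standard lag-one autocorrelation velocity estimator applied to one ensemble of complex (I/Q) samples. It covers only the autocorrelation step: the adaptive clutter rejection stage evaluated in the paper is not shown, and the function names, the sign convention, and the default speed of sound of 1540 m/s are assumptions made for illustration.

```python
import cmath
import math


def lag_one_autocorrelation(iq):
    """Sum of z[n+1] * conj(z[n]) over one ensemble of complex samples."""
    return sum(b * a.conjugate() for a, b in zip(iq, iq[1:]))


def doppler_velocity(iq, prf_hz, f0_hz, c=1540.0):
    """Axial velocity (m/s) from the phase of the lag-one autocorrelation."""
    phase = cmath.phase(lag_one_autocorrelation(iq))
    doppler_hz = phase * prf_hz / (2 * math.pi)  # mean Doppler shift
    return c * doppler_hz / (2 * f0_hz)          # Doppler equation
```

An ensemble of E samples yields only E-1 lag-one products, so with E=4 the phase estimate rests on three products, which is one reason small ensembles are sensitive to residual clutter.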

An Ensemble Classifier Based Method to Select Optimal Image Features for License Plate Recognition (차량 번호판 인식을 위한 앙상블 학습기 기반의 최적 특징 선택 방법)

  • Jo, Jae-Ho;Kang, Dong-Joong
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.1 / pp.142-149 / 2016
  • This paper proposes a method to detect the license plates (LPs) of vehicles in indoor and outdoor parking lots. Many conventional methods exist for detecting LPs in restricted environments. However, it is difficult to detect LPs in natural, complex scenes with background clutter, because patterns similar to text or LPs always exist in complicated backgrounds. To verify the performance of LP text detection in natural images, we apply the MB-LGP feature in combination with an ensemble machine learning algorithm in order to select a small number of optimal features from a huge pool. The feature selection is performed by an adaptive boosting algorithm, which achieves a minimal false positive detection ratio and a short computing time when combined with a cascade approach. MSER is used to provide the initial text regions of the vehicle LP. Experiments using real images show that the proposed method robustly extracts LPs in natural scenes as well as in controlled environments.

Prediction of Hindered Settling Velocity of Bidisperse Suspensions (이중 입도 분포를 가진 현탁액의 침강 속도 예측)

  • Koo, Sangkyun
    • Applied Chemistry for Engineering / v.19 no.6 / pp.609-616 / 2008
  • The present study is concerned with a simple numerical method for estimating the hindered settling velocity of noncolloidal suspensions with bidisperse size distribution of particles. The method is based on an effective-medium theory which uses the conditional ensemble averages for describing the velocity fields or other physical quantities of interest in the suspension system with the particles randomly placed. The effective-medium theory originally developed by Acrivos and Chang[1] for monodisperse suspensions is modified for the bidisperse case. Using the radial distribution functions and stream functions the hindered settling velocity of the suspended particles is calculated numerically. The predictions by the present method are compared with the previous experimental results by Davis and Birdsell[2] and Cheung et al.[3]. It is shown that the estimations by the effective-medium model of the present study reasonably agree with the experimental results.