• Title/Summary/Keyword: Weighted ensemble

Ensemble Learning of Region Based Classifiers (지역 기반 분류기의 앙상블 학습)

  • Choi, Sung-Ha;Lee, Byung-Woo;Yang, Ji-Hoon
    • The KIPS Transactions:PartB / v.14B no.4 / pp.303-310 / 2007
  • In machine learning, ensemble classifiers, which are sets of classifiers, have been introduced to achieve higher accuracy than individual classifiers. We propose a new ensemble learning method that employs a set of region-based classifiers. Since the distribution of data can differ across regions of the feature space, we split the data, generate a classifier for each region, and apply weighted voting among the classifiers. To evaluate the proposed method, we used 11 data sets from the UCI Machine Learning Repository to compare its performance with that of individual classifiers as well as existing ensemble methods such as bagging and boosting. We found that our method improved performance, particularly when the base learner is Naive Bayes or SVM.
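
The weighted voting step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are derived, so the function name `weighted_vote` and the example weights below are assumptions made purely for illustration.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Pick the label with the highest accumulated weight.
    return max(totals, key=totals.get)


# Three hypothetical region classifiers vote on one sample.
# The weights are illustrative, not taken from the paper.
print(weighted_vote(["A", "B", "B"], [0.6, 0.25, 0.25]))
```

With these weights the single high-weight classifier outvotes the two low-weight ones, which a plain majority vote would not allow.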

Noise Reduction Technique by Three-Points Ensemble Averaging in Uroflowmetry (삼점 신호 평균기법에 의한 요속신호의 잡음 축소 기법)

  • Choi, Seong-Su;Lee, In-Kwang;Lee, Sang-Bong;Park, Jun-Oh;Lee, Su-Ok;Cha, Eun-Jong;Kim, Kyung-Ah
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.8 / pp.1638-1643 / 2009
  • Uroflowmetry is a convenient clinical test to screen for benign prostatic hyperplasia (BPH), which is common in aged men. A load cell is located beneath the urine container to measure the weight of urine. However, it is sensitive to the impact of the urine stream on the bottom of the container, which can be a noise source that lowers the reliability of the system. To address this, our study proposes a noise reduction technique that computes the ensemble average of the weighted signals acquired from three load cells forming a regular triangle beneath the urine container. Simulated urination experiments were performed with three different collection methods, all of which demonstrated significant noise reduction by ensemble averaging. Furthermore, the best results can be obtained without any special urine collection devices. Thus, our novel method can be usefully applied to uroflowmetry to improve measurement accuracy and reliability.
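
As a rough illustration of ensemble averaging across three sensors, the sketch below takes the sample-wise mean of three equally long signals. It assumes a plain unweighted mean and made-up readings; the abstract does not describe the exact weighting or signal processing the authors used.

```python
def ensemble_average(signals):
    """Sample-wise mean across several equally long signals."""
    n = len(signals)
    # zip(*signals) groups the readings taken at the same time index.
    return [sum(samples) / n for samples in zip(*signals)]


# Hypothetical readings from three load cells at two time points.
cells = [
    [1.0, 2.0],
    [2.0, 3.0],
    [3.0, 4.0],
]
print(ensemble_average(cells))
```

If the noise on each cell is independent with equal variance, averaging three cells reduces the noise variance to one third.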

Kalman Filter-Based Ensemble Timescale with 3- Hydrogen Masers

  • Lee, Ho Seong;Kwon, Taeg Yong;Lee, Young Kyu;Yang, Sung-hoon;Yu, Dai-Hyuk
    • Journal of Positioning, Navigation, and Timing / v.9 no.3 / pp.261-272 / 2020
  • A Kalman filter algorithm is used to generate an ensemble timescale with three hydrogen masers maintained at KRISS. Allan deviation curves of three pairs of clocks were obtained by a three-cornered hat method and were used as reference curves for determining the parameters of the Kalman filter-based timescale. The ensemble timescale equation of a 3-clock system was established, and the clocks' phases estimated by the Kalman filter were used as the prediction time of each clock in the equation. The weight of each clock was set inversely proportional to the Allan variance calculated from the clocks' phases. The Allan deviation of the weighted mean was $1.2\times10^{-16}$ at an averaging time of 57,600 s. However, when we made fine adjustments to the clocks' weights, a minimum Allan deviation of $2\times10^{-17}$ was obtained. Further theoretical and experimental research is in progress to find the reason for this large improvement in frequency stability.
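
The weighting rule stated in this abstract (weights inversely proportional to the Allan variance) can be sketched as follows. The normalization to a sum of one and the numeric variances are assumptions for illustration; the Kalman filter and the timescale equation themselves are not reproduced here.

```python
def inverse_variance_weights(variances):
    """Weights proportional to 1/variance, normalized to sum to 1."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def weighted_mean(values, weights):
    """Weighted mean of per-clock values, given normalized weights."""
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical Allan variances for three clocks (arbitrary units).
weights = inverse_variance_weights([1.0, 2.0, 4.0])
print(weights)  # the most stable clock gets the largest weight
```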

A Multimodal Profile Ensemble Approach to Development of Recommender Systems Using Big Data (빅데이터 기반 추천시스템 구현을 위한 다중 프로파일 앙상블 기법)

  • Kim, Minjeong;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.93-110 / 2015
  • A recommender system recommends products to customers who are likely to be interested in them. Various recommender systems have been developed based on automated information filtering technology. Collaborative filtering (CF), one of the most successful recommendation algorithms, has been applied in a number of domains, such as recommending Web pages, books, movies, music, and products. However, CF is known to have a critical shortcoming. CF finds neighbors whose preferences are similar to those of the target customer and recommends the products those neighbors liked most. Thus, CF works properly only when there is a sufficient number of ratings on common products from customers. When customer ratings are scarce, CF forms an inaccurate neighborhood, which results in poor recommendations. To improve the performance of CF-based recommender systems, most related studies have focused on developing novel algorithms under the assumption that a single profile is used, created from users' item ratings, purchase transactions, or Web access logs. With the advent of big data, companies have come to collect more data and to use a wide variety of large-scale information. Many companies therefore recognize the importance of utilizing big data, because it helps them improve their competitiveness and create new value. In particular, the use of personal big data in recommender systems is an emerging issue, because personal big data facilitate more accurate identification of users' preferences and behaviors. The proposed recommendation methodology is as follows. First, multimodal user profiles are created from personal big data in order to capture the preferences and behavior of users from various viewpoints. We derive five user profiles based on personal information: rating, site preference, demographic, Internet usage, and topic in text.
Next, the similarity between users is calculated based on the profiles, and the neighbors of each user are identified from the results. One of three ensemble approaches is applied to calculate the similarity: the similarity of the combined profile, the average similarity of each profile, or the weighted average similarity of each profile. Finally, the products most preferred by the neighbors are recommended to the target users. For the experiments, we used demographic data and a very large volume of Web log transactions for 5,000 panel users of a company that specializes in analyzing the rankings of Web sites. R and SAS E-miner were used to implement the proposed recommender system and to conduct the topic analysis using keyword search, respectively. To evaluate the recommendation performance, we used 60% of the data for training and 40% for testing. Five-fold cross-validation was also conducted to enhance the reliability of our experiments. The F1 metric, a widely used combination metric that gives equal weight to recall and precision, was employed for evaluation. The evaluation showed that the proposed methodology achieved a significant improvement over the single-profile CF algorithm. In particular, the ensemble approach using weighted average similarity showed the highest performance: the improvement in F1 was 16.9 percent for the ensemble approach using weighted average similarity and 8.1 percent for the ensemble approach using the average similarity of each profile. From these results, we conclude that the multimodal profile ensemble approach is a viable solution to the problems encountered when customer ratings are scarce. This study is significant in suggesting what kinds of information can be used to create profiles in a big data environment and how they can be combined and utilized effectively.
However, our methodology requires further study before real-world application. We need to compare the differences in recommendation accuracy when the proposed method is applied to different recommendation algorithms, and then identify which combination shows the best performance.
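
Two of the ensemble approaches named in this abstract, the average and the weighted average of per-profile similarities, can be sketched as below, together with the F1 metric used for evaluation. The function names, the similarity values, and the profile weights are hypothetical; the abstract does not state how the weights were chosen.

```python
def average_similarity(similarities):
    """Plain mean of the per-profile similarities between two users."""
    return sum(similarities) / len(similarities)


def weighted_average_similarity(similarities, weights):
    """Mean of per-profile similarities, weighted by profile importance."""
    total = sum(weights)
    return sum(s * w for s, w in zip(similarities, weights)) / total


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical similarities of two users under two profiles,
# e.g. a rating profile and a site-preference profile.
sims = [0.8, 0.4]
print(average_similarity(sims))
print(weighted_average_similarity(sims, [3, 1]))  # first profile trusted more
```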

Human Action Recognition in Still Image Using Weighted Bag-of-Features and Ensemble Decision Trees (가중치 기반 Bag-of-Feature와 앙상블 결정 트리를 이용한 정지 영상에서의 인간 행동 인식)

  • Hong, June-Hyeok;Ko, Byoung-Chul;Nam, Jae-Yeal
    • The Journal of Korean Institute of Communications and Information Sciences / v.38A no.1 / pp.1-9 / 2013
  • This paper proposes a human action recognition method that uses bag-of-features (BoF) based on CS-LBP (center-symmetric local binary pattern) and a spatial pyramid, together with a random forest classifier. To construct the BoF, an image is divided into dense regular grids, and features are extracted from each patch. Code words, which form the visual vocabulary, are obtained by k-means clustering of a random subset of patches. For enhanced action discrimination, local BoF histograms from three subdivided levels of a spatial pyramid are estimated, and a weighted BoF histogram is generated by concatenating the local histograms. For action classification, a random forest, which is an ensemble of decision trees, is built to model the distribution of each action class. The random forest combined with the weighted BoF histogram is successfully applied to Stanford Action 40, which includes various human action images, and its classification performance is better than that of other methods. Furthermore, the proposed method allows action recognition to be performed in near real-time.

Performance Assessment of Weekly Ensemble Prediction Data at Seasonal Forecast System with High Resolution (고해상도 장기예측시스템의 주별 앙상블 예측자료 성능 평가)

  • Ham, Hyunjun;Won, Dukjin;Lee, Yei-sook
    • Atmosphere / v.27 no.3 / pp.261-276 / 2017
  • The main objectives of this study are to introduce the Global Seasonal forecasting system version 5 (GloSea5) of KMA and to evaluate the performance of its ensemble prediction. Since 2014, KMA has operated a seasonal forecast system developed jointly by KMA and the UK Met Office. GloSea5 is a fully coupled global climate model consisting of atmosphere (UM), ocean (NEMO), land surface (JULES), and sea ice (CICE) components linked through the OASIS coupler. The model resolution used in GloSea5 is N216L85 (~60 km in mid-latitudes) in the atmosphere and ORCA0.25L75 ($0.25^{\circ}$ on a tri-polar grid) in the ocean. In this research, we evaluate the performance of this system using RMSE, correlation, and MSSS for ensemble mean values. The forecast (FCST) and hindcast (HCST) are verified separately, and the operational data of GloSea5 from 2014 to 2015 are used. The performance skills are similar to those of a previous study. For example, the RMSE of h500 increases from 22.30 gpm for the 1-week forecast to 53.82 gpm for the 7-week forecast, but the error remains similar, at about 50~53 gpm, after the 3-week forecast. The Nino index of SST shows a high correlation (higher than 0.9) up to the 7-week forecast in the Nino 3.4 area. It can be concluded that GloSea5 performs well in seasonal prediction.

Ensemble Method for Predicting Particulate Matter and Odor Intensity (미세먼지, 악취 농도 예측을 위한 앙상블 방법)

  • Lee, Jong-Yeong;Choi, Myoung Jin;Joo, Yeongin;Yang, Jaekyung
    • Journal of Korean Society of Industrial and Systems Engineering / v.42 no.4 / pp.203-210 / 2019
  • Recently, a number of researchers have produced research and reports aimed at forecasting air quality, such as particulate matter and odor, more accurately. However, such research mainly focuses on the atmospheric diffusion models that have been used for air quality prediction in environmental engineering. Although this approach has various merits, it is limited in that it uses very few spatial attributes, such as geographical attributes. Thus, we propose a new approach to forecasting air quality using a deep learning-based ensemble model that combines temporal and spatial predictors. The temporal predictor employs an RNN LSTM, and the spatial predictor is based on a geographically weighted regression model. The ensemble model also uses an RNN LSTM, which combines the two models in a stacking structure. The ensemble model is capable of inferring the air quality of areas without an air quality monitoring station, and even of forecasting future air quality. We installed IoT sensors measuring PM2.5, PM10, H2S, NH3, and VOC at 8 stations in Jeonju to gather air quality data. The numerical results showed that our new model predicts very accurately in comparison with the measured data. This implies that spatial attributes should be considered for more accurate air quality prediction.

Probabilistic Daecheong Dam Streamflow Prediction using Weather Outlook Weighted Ensemble Streamflow Prediction (확률론적 통계분석을 이용한 대청댐 유입량 예측)

  • Lee, Sang-Jin;Kim, Jeong-Kon;Kim, Joo-Cheol;Woo, Dong-Hyeon
    • Proceedings of the Korea Water Resources Association Conference / 2011.05a / pp.303-303 / 2011
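  • For efficient water resources management, estimating intervals for predicted future hydrological data, and thereby obtaining information about data that will be observed in the future, is a difficult but important problem. In particular, because the input variables of mid- to long-term streamflow prediction are highly uncertain, prediction based on probabilistic methods is advantageous. In this study, the SSARR model was used to generate ensemble runoff scenarios that combine the current state of the basin with rainfall observed in the past. To present a probabilistic prediction scheme for the monthly inflow to Daecheong Dam, the observed ESP (Ensemble Streamflow Prediction) probabilities of past scenarios, the Croley method, and the PDF-Ratio method were applied and analyzed as weighting schemes suited to the state of weather forecast information in Korea. Verification of the accuracy of each technique for the first half of 2010 confirmed the usefulness of probabilistic prediction techniques, such as Croley and PDF-Ratio, that use the weather outlook as weights.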

Modeling and Selecting Optimal Features for Machine Learning Based Detections of Android Malwares (머신러닝 기반 안드로이드 모바일 악성 앱의 최적 특징점 선정 및 모델링 방안 제안)

  • Lee, Kye Woong;Oh, Seung Taek;Yoon, Young
    • KIPS Transactions on Software and Data Engineering / v.8 no.11 / pp.427-432 / 2019
  • In this paper, we propose three approaches to modeling Android malware. The first approach relies on human security experts to meticulously select feature sets. In the second approach, we choose the 300 features with the highest importance among the top 99% of features in terms of occurrence rate. The third approach combines multiple models and identifies malware through weighted voting. In addition, we applied a novel method of eliminating permission information, which used to be regarded as a critical factor for distinguishing malware. With our carefully generated feature sets and weighted voting by the ensemble algorithm, we reached a highest malware detection accuracy of 97.8%. We also verified that discarding the permission information led to improvements in false positive and false negative rates.

Evaluation of Multi-classification Model Performance for Algal Bloom Prediction Using CatBoost (머신러닝 CatBoost 다중 분류 알고리즘을 이용한 조류 발생 예측 모형 성능 평가 연구)

  • Juneoh Kim;Jungsu Park
    • Journal of Korean Society on Water Environment / v.39 no.1 / pp.1-8 / 2023
  • Monitoring and prediction of water quality are essential for effective river pollution prevention and water quality management. In this study, a multi-classification model was developed to predict the chlorophyll-a (Chl-a) level in rivers. The model was developed using CatBoost, a novel ensemble machine learning algorithm, with hourly field monitoring data collected from January 1 to December 31, 2015. For model development, Chl-a was classified into class 1 (Chl-a≤10 ㎍/L), class 2 (10<Chl-a≤50 ㎍/L), and class 3 (Chl-a>50 ㎍/L), for which the numbers of observations used for model training were 27,192, 11,031, and 511, respectively. The macro averages of precision, recall, and F1-score for the three classes were 0.58, 0.58, and 0.58, respectively, while the weighted averages were 0.89, 0.90, and 0.89, respectively. The model showed relatively poor performance for class 3, for which the number of observations was much smaller than for the other two classes. The imbalance in data distribution among the three classes was addressed by using the synthetic minority over-sampling technique (SMOTE) algorithm, after which the training data were evenly distributed, with 26,868 observations for each class. After SMOTE application, model performance improved, with macro averages of precision, recall, and F1-score of 0.58, 0.70, and 0.59, respectively, while the weighted averages were 0.88, 0.84, and 0.86.
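
The gap between macro and weighted averages reported in this abstract follows from how the two are defined, which the sketch below illustrates. The per-class F1 values are hypothetical, and using the training counts from the abstract as class weights is an assumption made only for illustration; in practice the weights are usually the class counts of the evaluation set.

```python
def macro_average(scores):
    """Unweighted mean over classes: every class counts equally."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Mean over classes weighted by the number of observations per class."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total


# Class sizes taken from the abstract (classes 1, 2, 3);
# the per-class F1 scores are hypothetical.
supports = [27192, 11031, 511]
f1_per_class = [0.95, 0.75, 0.05]

print(macro_average(f1_per_class))
print(weighted_average(f1_per_class, supports))
```

Because class 3 has so few observations, a poor score on it drags down the macro average while barely affecting the weighted average.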