• Title/Summary/Keyword: Ensemble Average

Illegal Cash Accommodation Detection Modeling Using Ensemble Size Reduction (신용카드 불법현금융통 적발을 위한 축소된 앙상블 모형)

  • Lee, Hwa-Kyung;Han, Sang-Bum;Jhee, Won-Chul
    • Journal of Intelligence and Information Systems / v.16 no.1 / pp.93-116 / 2010
  • An ensemble approach is applied to detection modeling of illegal cash accommodation (ICA), a well-known type of fraudulent credit card use in Far East nations that has not been addressed in the academic literature. The performance of a fraud detection model (FDM) suffers from the imbalanced data problem, which can be remedied to some extent by using an ensemble of many classifiers. It is generally accepted that an ensemble of classifiers is more accurate than a single classifier, provided there is diversity in the ensemble. Furthermore, recent research shows that it may be better to combine a selected subset of classifiers than all the classifiers at hand. For effective detection of ICA, we adopt an ensemble size reduction technique that prunes the ensemble of all classifiers using accuracy and diversity measures. Diversity in an ensemble manifests itself as disagreement or ambiguity among members. The data imbalance intrinsic to FDM affects our approach to ICA detection in two ways. First, we propose a training procedure that uses over-sampling methods to obtain diverse training data sets. Second, we use variants of the accuracy and diversity measures that focus on the fraud class. We also calculate the diversity measure dynamically during Forward Addition and Backward Elimination. In our experiments, Neural Networks, Decision Trees and Logit Regressions serve as the base models for the ensemble members, and the performance of homogeneous ensembles is compared with that of heterogeneous ensembles. The experimental results show that, averaged over the data sets tested, the reduced-size ensemble is as accurate as the non-pruned version, which brings benefits in application efficiency and reduced ensemble complexity.
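
The pruning idea in the abstract above can be illustrated with a minimal sketch of greedy forward addition. Everything specific here is an assumption made for illustration and is not taken from the paper: the score is a weighted sum of fraud-class recall and mean pairwise disagreement, members are combined by majority vote, and the names `forward_addition`, `fraud_recall`, `disagreement` and the weight `alpha` are hypothetical.

```python
from itertools import combinations

def majority_vote(preds):
    """Combine 0/1 predictions from several classifiers (ties count as 1)."""
    n = len(preds)
    return [1 if 2 * sum(col) >= n else 0 for col in zip(*preds)]

def fraud_recall(pred, y):
    """Accuracy restricted to the fraud class (label 1). Assumed measure."""
    fraud = [p for p, t in zip(pred, y) if t == 1]
    return sum(fraud) / len(fraud) if fraud else 0.0

def disagreement(preds):
    """Mean pairwise disagreement rate among members. Assumed measure."""
    pairs = list(combinations(preds, 2))
    if not pairs:
        return 0.0
    return sum(
        sum(a != b for a, b in zip(p, q)) / len(p) for p, q in pairs
    ) / len(pairs)

def forward_addition(all_preds, y, size, alpha=0.5):
    """Greedily grow a sub-ensemble of at most `size` members.

    At every step the diversity term is recomputed for each candidate
    sub-ensemble, so the measure is evaluated dynamically.
    """
    chosen = []
    remaining = list(range(len(all_preds)))
    while remaining and len(chosen) < size:
        def score(i):
            members = [all_preds[j] for j in chosen + [i]]
            return (alpha * fraud_recall(majority_vote(members), y)
                    + (1 - alpha) * disagreement(members))
        best = max(remaining, key=score)  # first index wins on ties
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Here `all_preds` is a list of per-classifier 0/1 prediction lists on a held-out set and `y` holds the true labels. Backward elimination would run the same scoring in reverse, starting from the full ensemble and dropping the member whose removal costs the least.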

Asymmetric Semi-Supervised Boosting Scheme for Interactive Image Retrieval

  • Wu, Jun;Lu, Ming-Yu
    • ETRI Journal / v.32 no.5 / pp.766-773 / 2010
  • Support vector machine (SVM) active learning plays a key role in the interactive content-based image retrieval (CBIR) community. However, regular SVM active learning is challenged by what we call "the small example problem" and "the asymmetric distribution problem." This paper attempts to integrate the merits of semi-supervised learning, ensemble learning, and active learning into interactive CBIR. Concretely, unlabeled images are exploited to facilitate boosting by helping to increase the diversity among base SVM classifiers, and the learned ensemble model is then used to identify the most informative images for active learning. In particular, a bias-weighting mechanism is developed to guide the ensemble model to pay more attention to positive images than to negative images. Experiments on 5000 Corel images show that the proposed method improves mean average precision by 0.16 compared to regular SVM active learning, and that it is more effective than some existing improved variants of SVM active learning.

Field observation of sediment suspension in the surf zone (쇄파대의 저질부유에 관한 현지관측)

  • 신승호;율산서소
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2003.05a / pp.141-146 / 2003
  • Time series of suspended sediment concentration, surface elevation and velocity were measured and analysed to investigate the role of waves, and the predominance of the infra-gravity wave component, in sediment suspension in the surf zone. For a detailed investigation, we applied cross-spectral analysis between sediment concentration and the characteristic values of the waves, and ensemble average analysis of the long-period wave component that dominates sediment suspension at the measurement point. The results are summarized as follows: 1) the relationship between sediment concentration and the characteristic wave values is stronger for the long-period standing wave components (about 60 s and 30 s) than for the long wave component (about 100 s), which carries the most energy; 2) sediment concentration increases during the phase in which the velocity component of the first-mode long-period standing wave (60 s) accelerates in the on-shore direction, that is, when the water surface on the offshore side is higher than on the on-shore side.

A Study on the Computation Method of Simple Heat Release Rate in Internal Combustion Engine (내열기관에 있어서 열발생율(熱發生率)의 산출방법(算出方法)에 관한 연구)

  • Tak, Y.J.;Ha, J.Y.
    • Transactions of the Korean Society of Automotive Engineers / v.3 no.1 / pp.129-135 / 1995
  • This study compares the heat release calculated from the ensemble average of pressure data with the heat release calculated by applying the least squares method to pressure data. The paper proposes a heat release computation method intended to be the most accurate, direct and simple way to analyse combustion phenomena. In conclusion, we found that the third-order least squares method was the best computational method for heat release calculation.

Monte Carlo Simulation on the Adsorption Properties of Methane in Zeolite L

  • 문성두;Yoshimori Miyano
    • Bulletin of the Korean Chemical Society / v.18 no.3 / pp.291-295 / 1997
  • The adsorption of methane in K+ ion-exchanged zeolite L has been studied using grand canonical ensemble Monte Carlo simulation. The average number of molecules per unit cell, the number density of molecules in the zeolite, the distribution of molecules per unit cell, the average potential per sorbate molecule, and the isosteric heats of adsorption were calculated and compared with experimental results. The simulation results agreed fairly well with the experimental ones. All methane molecules were located in the main channel, and the average potential per sorbate molecule was almost constant regardless of the average number of molecules per unit cell and the amount sorbed in the zeolite.

Noise Reduction Technique by Three-Points Ensemble Averaging in Uroflowmetry (삼점 신호 평균기법에 의한 요속신호의 잡음 축소 기법)

  • Choi, Seong-Su;Lee, In-Kwang;Lee, Sang-Bong;Park, Jun-Oh;Lee, Su-Ok;Cha, Eun-Jong;Kim, Kyung-Ah
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.8 / pp.1638-1643 / 2009
  • Uroflowmetry is a convenient clinical test for screening benign prostatic hyperplasia (BPH), which is common in elderly men. A load cell is placed beneath the urine container to measure the weight of urine. However, the load cell is sensitive to the impact of the urine stream on the bottom of the container, which can act as a noise source and lower the reliability of the system. To address this, our study proposes a noise reduction technique that computes the ensemble average of the weighted signals acquired from three load cells arranged in a regular triangle beneath the urine container. A simulated urination experiment was performed with three different collection methods, all of which showed significant noise reduction by ensemble averaging. Furthermore, the best results can be obtained without any special urine collection device. Thus, our method can be usefully applied to uroflowmetry to improve measurement accuracy and reliability.
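
Why averaging three load-cell signals reduces noise can be shown with a small simulation. This is a sketch under simplifying assumptions that are not from the paper: each cell is modelled as reporting the same true weight plus independent Gaussian noise, the average is an unweighted mean, and the ramp signal, noise level and function names are hypothetical.

```python
import math
import random

def ensemble_average(channels):
    """Average simultaneous samples from several sensors, sample by sample."""
    return [sum(samples) / len(samples) for samples in zip(*channels)]

def rms_error(signal, reference):
    """Root-mean-square deviation of a signal from a reference."""
    return math.sqrt(
        sum((s - r) ** 2 for s, r in zip(signal, reference)) / len(signal)
    )

def simulate(n_samples=2000, noise_sd=5.0, seed=0):
    """Return a hypothetical true weight ramp and three noisy cell readings."""
    rng = random.Random(seed)
    true_weight = [0.1 * t for t in range(n_samples)]  # arbitrary units
    cells = [
        [w + rng.gauss(0.0, noise_sd) for w in true_weight]
        for _ in range(3)  # three load cells with independent noise
    ]
    return true_weight, cells

if __name__ == "__main__":
    truth, cells = simulate()
    for i, cell in enumerate(cells):
        print(f"cell {i} RMS error: {rms_error(cell, truth):.2f}")
    print(f"averaged RMS error: {rms_error(ensemble_average(cells), truth):.2f}")
```

With independent noise of equal variance, the averaged signal's RMS error is expected to fall by roughly a factor of the square root of three. Impact noise in a real container is likely correlated across cells, so the gain in practice may differ.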

Simultaneous Optimization of KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.139-157 / 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained by a randomly chosen feature subspace from the original feature set, and predictions from each ensemble member are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. 
The k parameter of the KNN base classifiers and the feature subsets selected for them play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposes a new ensemble method that improves upon the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve its prediction accuracy. The proposed model was applied to a bankruptcy prediction problem using a real dataset from Korean companies. The research data included 1800 externally non-audited firms that filed for bankruptcy (900 cases) or non-bankruptcy (900 cases). Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent sample t-test of each financial ratio as an input variable and bankruptcy or non-bankruptcy as an output variable. Of these, 24 financial ratios were selected by using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other to avoid overfitting. The prediction accuracy against the latter portion was used to determine the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performances of the proposed model and other models. To evaluate the effectiveness of the proposed model, its classification accuracy was compared with that of other models. The Q-statistic values and average classification accuracies of base classifiers were investigated.
The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
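
The random subspace KNN ensemble described above can be sketched in a few lines. This illustration rests on assumptions that are not from the paper: each member is a `(k, feature subset)` pair drawn at random rather than tuned by a genetic algorithm, distances are squared Euclidean, members are combined by majority vote, and all function names are hypothetical.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k, features):
    """Classify x by majority vote of its k nearest neighbours,
    measuring squared Euclidean distance on the given features only."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def build_random_subspace(n_features, n_members, subspace_size, k_choices, seed=0):
    """Draw one (k, feature subset) pair per ensemble member.

    A genetic algorithm, as in the abstract, would search over these
    pairs; here they are simply sampled at random.
    """
    rng = random.Random(seed)
    return [
        (rng.choice(k_choices), rng.sample(range(n_features), subspace_size))
        for _ in range(n_members)
    ]

def ensemble_predict(train_X, train_y, x, members):
    """Aggregate the member predictions by majority vote."""
    votes = Counter(
        knn_predict(train_X, train_y, x, k, feats) for k, feats in members
    )
    return votes.most_common(1)[0][0]
```

Using odd values for `k` and for the number of members avoids tied votes in a two-class problem such as bankrupt versus non-bankrupt. In the optimized setting, the list of `(k, feature subset)` pairs is what a candidate solution would encode.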

Classification Algorithm for Liver Lesions of Ultrasound Images using Ensemble Deep Learning (앙상블 딥러닝을 이용한 초음파 영상의 간병변증 분류 알고리즘)

  • Cho, Young-Bok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.4 / pp.101-106 / 2020
  • In today's medical practice, ultrasound plays the role that the stethoscope once did. However, ultrasound has the disadvantage that results are uncertain because they depend on the examiner's skill level. To address this problem, this paper aims to improve the accuracy of liver lesion detection during ultrasound examination using deep learning. We compared the lesion classification accuracy of a CNN model and an ensemble model. In the experiments, classification accuracy averaged 82.33% for the CNN model and 89.9% for the ensemble model, about 7% higher. The ensemble model also scored 0.97 on the average ROC curve, about 0.4 higher than the CNN model.

Malicious Insider Detection Using Boosting Ensemble Methods (앙상블 학습의 부스팅 방법을 이용한 악의적인 내부자 탐지 기법)

  • Park, Suyun
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.2 / pp.267-277 / 2022
  • With the growing share of cloud and remote working environments, a variety of information security incidents are occurring. Insider threats have emerged as a major issue, with cases in which corporate insiders attempt to leak confidential data by accessing it remotely. In response, insider threat detection approaches based on machine learning have been developed. However, existing machine learning methods for detecting insider threats do not take bias and variance into account, which limits their performance. In this paper, boosting-type ensemble learning algorithms are applied to malicious insider detection; their performance is verified and closely analysed, and dataset imbalance is taken into account in determining the final result. Through experiments, we show that ensemble learning achieves accuracy similar to or higher than existing malicious insider detection approaches while accounting for the bias-variance tradeoff. The experimental results show that ensemble learning using bagging and boosting methods reached an accuracy of over 98%, improving malicious insider detection performance by 5.62% compared to the average accuracy of the single learning models used.

Study on Control Model Based on Signal Processing In End-Milling Process (엔드밀 공정에서의 신호처리에 따른 제어모델에 관한 연구)

  • 양우석;이건복
    • Proceedings of the Korean Society of Machine Tool Engineers Conference / 2001.04a / pp.192-196 / 2001
  • This work describes modeling of the cutting process for feedback control based on signal processing in end-milling. The cutting force is used to design control models under several signal processing schemes: moving average, ensemble average, peak value, root mean square and analog low-pass filtering. Each model is expected to offer its own particular advantage in subsequent cutting force control.
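
The digital signal processing schemes named above can be sketched as simple functions over a sampled cutting-force signal. The framing is an assumption, not the paper's implementation: the signal is taken to be sampled a fixed number of times per spindle revolution, the ensemble average is taken across revolutions at each angular position, and the function names are hypothetical. Analog low-pass filtering is a hardware stage and is left out.

```python
import math

def moving_average(x, window):
    """Mean of each run of `window` consecutive samples."""
    return [sum(x[i:i + window]) / window for i in range(len(x) - window + 1)]

def split_revolutions(x, samples_per_rev):
    """Cut the signal into whole revolutions, dropping any incomplete tail."""
    n = len(x) // samples_per_rev
    return [x[i * samples_per_rev:(i + 1) * samples_per_rev] for i in range(n)]

def ensemble_average(revs):
    """Average across revolutions at each angular position."""
    return [sum(col) / len(col) for col in zip(*revs)]

def peak_per_rev(revs):
    """Largest force sample in each revolution."""
    return [max(r) for r in revs]

def rms_per_rev(revs):
    """Root mean square of the force over each revolution."""
    return [math.sqrt(sum(v * v for v in r) / len(r)) for r in revs]

if __name__ == "__main__":
    force = [0, 3, 4, 1, 2, 3, 4, 3]  # two revolutions, four samples each
    revs = split_revolutions(force, 4)
    print(moving_average(force, 2))
    print(ensemble_average(revs))
    print(peak_per_rev(revs))
    print(rms_per_rev(revs))
```

The moving average smooths along time and returns one value per sample. The ensemble average keeps the within-revolution shape while suppressing revolution-to-revolution variation. Peak and RMS each reduce a revolution to a single feedback value.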
