• Title/Summary/Keyword: generalization error bound


Improving the Generalization Error Bound using Total margin in Support Vector Machines (서포트 벡터 기계에서 TOTAL MARGIN을 이용한 일반화 오차 경계의 개선)

  • Yoon, Min
    • The Korean Journal of Applied Statistics / v.17 no.1 / pp.75-88 / 2004
  • The Support Vector Machine (SVM) algorithm focuses on maximizing the shortest distance between the sample points and the discrimination hyperplane. This paper suggests the total margin algorithm, which considers the distances between all data points and the separating hyperplane. The method extends the existing support vector machine algorithm and, in addition, improves the generalization error bound. Numerical experiments show that the total margin algorithm provides good performance compared with previous methods.
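The quantity the total margin aggregates can be illustrated with a small sketch (not the paper's algorithm, which modifies the SVM optimization problem itself): given a separating hyperplane, a classical SVM looks only at the shortest point-to-hyperplane distance, while the total margin sums the distances of all points. The data and hyperplane below are hypothetical, assumed only for illustration.

```python
# Illustrative sketch, not the paper's method: compare the classical shortest
# margin with a "total margin" that aggregates all point-to-hyperplane distances.
import math

# Hypothetical 2-D data: label -1 on the left, +1 on the right.
points = [((-2.0, 1.0), -1), ((-1.5, -0.5), -1), ((2.0, 0.5), 1), ((1.2, -1.0), 1)]
w, b = (1.0, 0.0), 0.0  # hyperplane x1 = 0, assumed for illustration

norm_w = math.hypot(*w)
# Signed distance of each point to the hyperplane w.x + b = 0 (positive when
# the point lies on the correct side of the boundary).
dists = [y * (w[0] * x[0] + w[1] * x[1] + b) / norm_w for (x, y) in points]

print("shortest (classical SVM) margin:", min(dists))   # 1.2
print("total margin (sum over all points):", sum(dists))  # 6.7
```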

Data-Adaptive ECOC for Multicategory Classification

  • Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / v.19 no.1 / pp.25-36 / 2008
  • Error Correcting Output Codes (ECOC) can improve generalization performance when applied to multicategory classification problems. In this study we propose a new criterion for selecting the hyperparameters included in the ECOC scheme. Instead of the margins of the data, we propose to use the probability of misclassification error, since it keeps the criterion simple. Using this, we obtain an upper bound on the leave-one-out error of the OVA (one-vs-all) method. Our experiments on real and synthetic data indicate that the bound leads to good estimates of the parameters.
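As background, the OVA (one-vs-all) decoding whose leave-one-out error the paper bounds can be sketched as follows: each class gets its own binary scorer, and the predicted class is the one with the largest score. The linear scorers below are hypothetical, assumed only for illustration.

```python
# Illustrative sketch of one-vs-all (OVA) decoding, not the paper's criterion.
# Each class k has a binary scorer f_k; prediction is argmax_k f_k(x).

def make_scorer(w, b):
    # Hypothetical linear scoring function f(x) = w.x + b.
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b

scorers = {
    "A": make_scorer((1.0, 0.0), 0.0),   # favors large x1
    "B": make_scorer((-1.0, 0.0), 0.0),  # favors small x1
    "C": make_scorer((0.0, 1.0), 0.0),   # favors large x2
}

def ova_predict(x):
    return max(scorers, key=lambda k: scorers[k](x))

print(ova_predict((2.0, 0.1)))   # A
print(ova_predict((-1.0, 0.0)))  # B
print(ova_predict((0.2, 3.0)))   # C
```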


Support Vector Machines Controlling Noise Influence Effectively (서포트 벡터 기계에서 잡음 영향의 효과적 조절)

  • Kim, Chul-Eung;Yoon, Min
    • The Korean Journal of Applied Statistics / v.16 no.2 / pp.261-271 / 2003
  • Support Vector Machines (SVMs) provide powerful learning performance. However, SVMs tend to overfit. To overcome this difficulty, the soft margin was introduced. This in turn raises another difficulty: deciding the weights for the slack variables that define soft margin classifiers. In particular, the error of the soft margin algorithm can be bounded by a target margin and some norms of the slack vector. In this paper, we formulate a new soft margin algorithm that directly considers the bound on the corruption of the data by noise. Through a numerical example, we compare the proposed method with a conventional soft margin algorithm.
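The slack vector whose norms appear in such soft margin error bounds collects, for each point, how far it falls short of the target margin; this can be sketched directly. The data and hyperplane below are hypothetical.

```python
# Illustrative sketch of soft margin slack variables. For a decision function
# f(x) = w.x + b with target margin 1, the slack of a labeled point (x, y) is
# xi = max(0, 1 - y*f(x)): zero for points beyond the margin, positive for
# noisy points inside the margin or on the wrong side.
w, b = (1.0, 0.0), 0.0  # hypothetical hyperplane x1 = 0

def slack(x, y):
    f = w[0] * x[0] + w[1] * x[1] + b
    return max(0.0, 1.0 - y * f)

data = [((2.0, 0.0), 1), ((0.5, 1.0), 1), ((-0.2, 0.0), 1), ((-3.0, 0.5), -1)]
xis = [slack(x, y) for x, y in data]

print(xis)                                  # [0.0, 0.5, 1.2, 0.0]
print("1-norm of slack vector:", sum(xis))  # 1.7
```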

A Pruning Algorithm of Neural Networks Using Impact Factors (임팩트 팩터를 이용한 신경 회로망의 연결 소거 알고리즘)

  • 이하준;정승범;박철훈
    • Journal of the Institute of Electronics Engineers of Korea CI / v.41 no.2 / pp.77-86 / 2004
  • In general, small-sized neural networks show good generalization performance but tend to fail to learn the training data within a given error bound, whereas large-sized ones learn the training data easily but generalize poorly. Therefore, one way of achieving good generalization is to find the smallest network that can learn the data, called the optimal-sized neural network. This paper proposes a new scheme for network pruning based on the 'impact factor', defined as the product of the variance of a neuron's output and the square of its outgoing weight. Simulation results on function approximation problems show that the proposed method is effective in regression.
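The impact factor as stated in the abstract (output variance times squared outgoing weight) can be computed directly; connections with the smallest impact factors are the natural pruning candidates. The neuron outputs and weights below are hypothetical.

```python
# Sketch of the impact factor from the abstract: for each connection,
# variance of the source neuron's output over the training set, multiplied
# by the square of the outgoing weight. Small impact -> pruning candidate.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Hypothetical recorded outputs of two hidden neurons, and the weights
# on their outgoing connections.
neuron_outputs = [[0.1, 0.9, 0.5, 0.5], [0.49, 0.51, 0.5, 0.5]]
out_weights = [0.2, 3.0]

impact = [variance(o) * w ** 2 for o, w in zip(neuron_outputs, out_weights)]
print(impact)
# The first neuron varies a lot but carries a tiny weight; the second barely
# varies despite its large weight, so both connections have small impact.
```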

A Study on Frequency-Hopped Code Division Multiple Access for Mobile Radio (이동무선통신을 위한 주파수 도약부호 분할 다중접근에 관한 연구)

  • 한영렬
    • The Journal of Korean Institute of Communications and Information Sciences / v.14 no.3 / pp.227-234 / 1989
  • In this paper, a new receiver for a frequency-hopped multilevel FSK system for mobile communications is presented. The new receiver provides an implementation advantage by eliminating unnecessary energy detection over all the frequency channels. A performance analysis of the proposed system is carried out using the union bound. We show that, for a given number of simultaneous users, there exists an optimum number of message bits that minimizes the word error probability. This scheme is a generalization of the MFSK system that allows the number of message bits to be varied, giving designers greater flexibility of implementation. Error probabilities are calculated for the cases of fixed bandwidth and fixed tone repetition number. The effect of error-correcting coding is also considered.


Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings with statistical and machine learning techniques has been a popular research topic. The statistical techniques traditionally used in bond rating include multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis. However, one major drawback is that they rest on strict assumptions: linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). SVM in particular is recognized as a new and promising classification and regression method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically, and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur. SVM also does not require many training samples, since it builds prediction models using only a few representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance.
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not improve multi-class performance as much as SVM does for binary classification. Second, approximation algorithms (e.g. decomposition methods, the sequential minimal optimization algorithm) can be used for efficient multi-class computation to reduce computation time, but they can deteriorate classification performance. Third, the difficulty in multi-class prediction problems lies in the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers that in another class. Such data sets often cause a default classifier to be built due to the skewed boundary, reducing classification accuracy. SVM ensemble learning is one machine learning approach for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weights on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than examples that are correctly predicted. Thus boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can take account of the geometric mean-based accuracy and errors over the classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross validation, the entire data set is first partitioned into ten equal-sized sets, and each set is in turn used as the test set while the classifier trains on the other nine sets; thus the cross-validated folds are tested independently for each algorithm. Through these steps, we obtain results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds differs significantly. The results indicate that the performance of MGM-Boost differs significantly from the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
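The geometric mean-based accuracy that distinguishes this line of work from plain arithmetic accuracy can be sketched as follows: it is the geometric mean of the per-class accuracies, which stays low unless every class, including minority classes, is predicted well. The labels below are hypothetical, and the MGM-Boost update itself is not reproduced here.

```python
# Sketch of geometric mean-based accuracy (not the MGM-Boost algorithm).
import math

def geometric_mean_accuracy(y_true, y_pred):
    # Geometric mean of per-class accuracies (recalls).
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        per_class.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return math.prod(per_class) ** (1 / len(per_class))

# Hypothetical ratings: majority class "AA" perfect, minority "B" half wrong.
y_true = ["AA", "AA", "AA", "AA", "B", "B"]
y_pred = ["AA", "AA", "AA", "AA", "AA", "B"]

arith = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("arithmetic accuracy:", arith)                           # ~0.833
print("geometric mean accuracy:",
      geometric_mean_accuracy(y_true, y_pred))                 # ~0.707
```

The geometric mean drops sharply when the minority class is misclassified, which is why it is the natural criterion for imbalanced multi-class data.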