• Title/Summary/Keyword: Support vector machines(SVM)

Plant leaf Classification Using Orientation Feature Descriptions (방향성 특징 기술자를 이용한 식물 잎 인식)

  • Gang, Su Myung;Yoon, Sang Min;Lee, Joon Jae
    • Journal of Korea Multimedia Society / v.17 no.3 / pp.300-311 / 2014
  • Because the environment is changing rapidly, structured study of the ecosystem through analysis of plant leaves is needed. In particular, methods that search for and classify leaves captured with smart devices have received considerable attention in computer science and ecology. In this paper, we propose a plant leaf classification technique using a shape descriptor that combines the Scale Invariant Feature Transform (SIFT) and the Histogram of Oriented Gradients (HOG), computed from an image segmented from the background via the Graphcut algorithm. The shape descriptor is encoded with Locality-constrained Linear Coding to optimize the meaningful features from a high degree of freedom. It is then passed to Support Vector Machines (SVM) for efficient classification. The experimental results show that the proposed approach is very effective at classifying leaves that have similar color and shape.

Efficient variable selection method using conditional mutual information (조건부 상호정보를 이용한 분류분석에서의 변수선택)

  • Ahn, Chi Kyung;Kim, Donguk
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.1079-1094 / 2014
  • In this paper, we study efficient gene selection methods using conditional mutual information. We suggest gene selection methods using conditional mutual information based on semiparametric methods utilizing the multivariate normal distribution and the Edgeworth approximation. We compare our suggested methods with other methods, such as the mutual information filter, SVM-RFE, and the gene selection method of Cai et al. (2009) (MIGS-original), in SVM classification. Through these experiments, we show that gene selection methods using conditional mutual information based on semiparametric methods perform better than the mutual information filter. Furthermore, we show that they take far less computing time than the gene selection method of Cai et al. (2009) while achieving similar performance.
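
For readers unfamiliar with the quantity, conditional mutual information I(X; Y | Z) measures how much a candidate variable X still tells us about the class Y once an already selected variable Z is known. The sketch below is only a plug-in estimate for discrete data, not the semiparametric (multivariate normal or Edgeworth) estimators the abstract describes; the function name and the toy samples are assumptions made for illustration.

```python
import math
from collections import Counter


def conditional_mutual_information(samples):
    """Plug-in estimate of I(X; Y | Z) in nats.

    `samples` is a list of (x, y, z) tuples of discrete values.
    This is an illustrative estimator, not the paper's method.
    """
    n = len(samples)
    c_xyz = Counter(samples)
    c_xz = Counter((x, z) for x, _, z in samples)
    c_yz = Counter((y, z) for _, y, z in samples)
    c_z = Counter(z for _, _, z in samples)

    total = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        # p(z) p(x,y,z) / (p(x,z) p(y,z)); the 1/n factors cancel.
        ratio = (c_z[z] * n_xyz) / (c_xz[(x, z)] * c_yz[(y, z)])
        total += (n_xyz / n) * math.log(ratio)
    return total


# Toy data (hypothetical): X determines Y, and Z is unrelated to both.
informative = [(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)]
# Toy data (hypothetical): X and Y are independent given Z.
uninformative = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]

cmi_informative = conditional_mutual_information(informative)
cmi_uninformative = conditional_mutual_information(uninformative)
```

In a filter-style selection loop, a variable with a larger estimate given the variables already chosen would be preferred; how the estimate is obtained for continuous gene expression data is exactly where the abstract's semiparametric methods come in.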

Corporate Corruption Prediction Evidence From Emerging Markets

  • Kim, Yang Sok;Na, Kyunga;Kang, Young-Hee
    • Asia-Pacific Journal of Business / v.12 no.4 / pp.13-40 / 2021
  • Purpose - The purpose of this study is to predict corporate corruption in emerging markets such as Brazil, Russia, India, and China (BRIC) using different machine learning techniques. Since corruption is a significant problem that can affect corporate performance, particularly in emerging markets, it is important to correctly identify whether a company engages in corrupt practices. Design/methodology/approach - To address the research question, we employ predictive analytic techniques (machine learning methods). Using the World Bank Enterprise Survey Data, this study evaluates various predictive models generated by seven supervised learning algorithms: k-Nearest Neighbour (k-NN), Naïve Bayes (NB), Decision Tree (DT), Decision Rules (DR), Logistic Regression (LR), Support Vector Machines (SVM), and Artificial Neural Network (ANN). Findings - We find that DT, DR, SVM, and ANN create highly accurate models (over 90% accuracy). Among various factors, firm age is the most significant, while several other determinants, such as source of working capital, top manager experience, and the number of permanent full-time employees, also contribute to company corruption. Research implications or Originality - This research successfully demonstrates how machine learning can be applied to predict corporate corruption and also identifies the major causes of corporate corruption.

The Prediction of DEA based Efficiency Rating for Venture Business Using Multi-class SVM (다분류 SVM을 이용한 DEA기반 벤처기업 효율성등급 예측모형)

  • Park, Ji-Young;Hong, Tae-Ho
    • Asia pacific journal of information systems / v.19 no.2 / pp.139-155 / 2009
  • For the last few decades, many studies have tried to explore and unveil venture companies' success factors and unique features in order to identify the sources of such companies' competitive advantages over their rivals. Such venture companies have tended to give high returns to investors, generally by making the best use of information technology. For this reason, many venture companies are keen on attracting investors' attention. Investors generally make their investment decisions by carefully examining the evaluation criteria of the alternatives. To them, credit rating information provided by international rating agencies such as Standard and Poor's, Moody's, and Fitch is a crucial source on such pivotal concerns as a company's stability, growth, and risk status. But this type of information is generated only for companies issuing corporate bonds, not for venture companies. Therefore, this study proposes a method for evaluating venture businesses by presenting our recent empirical results using financial data of Korean venture companies listed on KOSDAQ in the Korea Exchange. In addition, this paper uses multi-class SVM to predict the DEA-based efficiency rating for venture businesses derived from our proposed method. Our approach sheds light on ways to locate efficient companies generating high levels of profit. Above all, in determining effective ways to evaluate a venture firm's efficiency, it is important to understand the major contributing factors of such efficiency. Therefore, this paper is built on the following two ideas for classifying which venture companies are more efficient: i) making a DEA-based multi-class rating for sample companies and ii) developing a multi-class SVM-based efficiency prediction model for classifying all companies.
First, Data Envelopment Analysis (DEA) is a non-parametric multiple input-output efficiency technique that measures the relative efficiency of decision making units (DMUs) using a linear programming-based model. It is non-parametric because it requires no assumption about the shape or parameters of the underlying production function. DEA has already been widely applied to evaluate the relative efficiency of DMUs. Recently, a number of DEA-based studies have evaluated the efficiency of various types of companies, such as internet companies and venture companies. It has also been applied to corporate credit ratings. In this study, we used DEA to sort venture companies by efficiency-based ratings. The Support Vector Machine (SVM), on the other hand, is a popular technique for solving data classification problems. In this paper, we employed SVM to classify the efficiency ratings of IT venture companies according to the results of DEA. The SVM method was first developed by Vapnik (1995). As one of many machine learning techniques, SVM is based on statistical theory. Thus far, the method has shown good performance, especially in its generalization capacity in classification tasks, resulting in numerous applications in many areas of business. SVM is basically an algorithm that finds the maximum margin hyperplane, that is, the hyperplane giving the maximum separation between classes. The support vectors are the points closest to the maximum margin hyperplane. If the classes cannot be separated linearly, a kernel function can be used. In the case of nonlinear class boundaries, the inputs are transformed into a high-dimensional feature space; that is, the original input space is mapped into a high-dimensional dot-product space. Many studies have applied SVM to bankruptcy prediction, financial time series forecasting, and credit rating estimation. In this study, we employed SVM to develop a data mining-based efficiency prediction model.
We used the Gaussian radial basis function as the kernel function of the SVM. For multi-class SVM, we adopted the one-against-one approach, which is based on binary classification, and two all-together methods proposed by Weston and Watkins (1999) and Crammer and Singer (2000), respectively. In this research, we used corporate information on 154 companies listed on the KOSDAQ market in the Korea Exchange. We obtained the companies' financial information for 2005 from KIS (Korea Information Service, Inc.). Using these data, we constructed a multi-class rating from DEA efficiency and built a data mining-based multi-class prediction model. Among the three multi-classification methods, the Weston and Watkins method had the best hit ratio on the test data set. In multi-class problems such as efficiency ratings of venture businesses, when it is difficult to find the exact class in the actual market, it is very useful for investors to know the class to within a one-class error. We therefore report accuracy within one-class errors, and the Weston and Watkins method showed 85.7% accuracy on our test samples. We conclude that the DEA-based multi-class approach for venture businesses generates more information than binary classification, notwithstanding its efficiency level. We believe this model can help investors in decision making, as it provides a reliable tool for evaluating venture companies in the financial domain. For future research, we see the need to improve such areas as the variable selection process, the parameter selection for the kernel function, generalization, and the sample size for the multi-class problem.
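
Two quantities from this abstract are easy to make concrete: the Gaussian kernel and the hit ratio counted within a one-class error. The sketch below is a minimal illustration, assuming the common parameterisation exp(-gamma * ||x - y||^2) and assuming the efficiency ratings are coded as consecutive integers; the abstract states neither, and the function names and sample ratings are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - y||^2).

    The `gamma` parameterisation is an assumption; the abstract does
    not say how the kernel width was expressed or chosen.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def hit_ratio(actual, predicted, tolerance=0):
    """Fraction of predictions within `tolerance` classes of the truth.

    Assumes ratings are ordinal and coded as consecutive integers, so
    that tolerance=1 corresponds to a one-class error.
    """
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


# Hypothetical ratings for four companies (1 = most efficient class).
actual_ratings = [1, 2, 3, 4]
predicted_ratings = [1, 3, 1, 4]

exact = hit_ratio(actual_ratings, predicted_ratings)
within_one = hit_ratio(actual_ratings, predicted_ratings, tolerance=1)
```

With these made-up ratings, only two predictions are exact, but a third is off by a single class, so the within-one-class ratio is higher than the exact hit ratio. That gap is the extra information the abstract argues is useful to investors.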

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions. Such strict assumptions include linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. Furthermore, SVM does not require too many data samples for training, since it builds prediction models using only some representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance.
First, SVM was originally proposed for solving binary classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well in multi-class classification problems as SVM does in binary classification. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class problems, but they can deteriorate classification performance. Third, multi-class prediction problems suffer from data imbalance, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built because of a skewed boundary, which reduces the classification accuracy of that classifier. SVM ensemble learning is one machine learning method for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Thus, boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training on misclassified observations of the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can take into account the geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, the cross-validated folds are tested independently for each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
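
The gap between the arithmetic and geometric figures reported above is easier to read with a small example. The sketch below assumes that both averages are taken over per-class recall (the fraction of each class predicted correctly), which is a common convention for imbalanced data; the abstract does not spell out its exact definitions, and the function names and labels here are hypothetical.

```python
import math
from collections import defaultdict


def per_class_recall(actual, predicted):
    """Return {class: fraction of that class predicted correctly}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for a, p in zip(actual, predicted):
        totals[a] += 1
        if a == p:
            correct[a] += 1
    return {c: correct[c] / totals[c] for c in totals}


def arithmetic_mean_accuracy(actual, predicted):
    """Arithmetic mean of per-class recalls (assumed definition)."""
    recalls = list(per_class_recall(actual, predicted).values())
    return sum(recalls) / len(recalls)


def geometric_mean_accuracy(actual, predicted):
    """Geometric mean of per-class recalls (assumed definition)."""
    recalls = list(per_class_recall(actual, predicted).values())
    return math.prod(recalls) ** (1.0 / len(recalls))


# Hypothetical imbalanced labels: the classifier ignores minority class "B".
actual = ["A"] * 8 + ["B"] * 2
predicted = ["A"] * 10

am = arithmetic_mean_accuracy(actual, predicted)
gm = geometric_mean_accuracy(actual, predicted)
```

Because the geometric mean is a product, a single class that is never predicted correctly drives it to zero, whereas the arithmetic mean stays at one half in this toy case. A boosting scheme that tracks the geometric mean is therefore pushed to keep every class, including the minority ones, above zero.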

Prediction of Parathyroid Hormone Signalling Potency Using SVMs

  • Yoo, Ahrim;Ko, Sunggeon;Lim, Sung-Kil;Lee, Weontae;Yang, Dae Ryook
    • Molecules and Cells / v.27 no.5 / pp.547-556 / 2009
  • Parathyroid hormone is the most important endocrine regulator of calcium concentration. Its N-terminal fragment (1-34) has sufficient activity for biological function. Recently, site-directed mutagenesis studies demonstrated that substitutions at several positions within shorter analogues (1-14) can enhance the bioactivity to greater than that of PTH (1-34). However, designing the optimal sequence combination is not simple due to complex combinatorial problems. In this study, support vector machines were introduced to predict the biological activity of modified PTH (1-14) analogues using mono-substituted experimental data and to analyze the key physicochemical properties at each position that correlated with bioactivity. This systematic approach can reduce the time and effort needed to obtain desirable molecules by bench experiments and provide useful information in the design of simpler activating molecules.

The Application of RL and SVMs to Decide Action of Mobile Robot

  • Ko, Kwang-won;Oh, Yong-sul;Jung, Qeun-yong;Hoon Heo
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.496-499 / 2003
  • Support Vector Machines (SVMs) are applied to a practical problem as one of the standard tools for machine learning. The application of Reinforcement Learning (RL) and SVMs to the actions of a mobile robot is investigated. A technique for deciding the action of an autonomous mobile robot in practice is explained in the paper. The proposed method is to find a basis for good action of the system in an unknown environment. With multi-dimensional sensor input, the most reasonable action can be decided automatically in each state by RL. Using SVMs, not only an optimal decision policy but also a generalized state in an unknown environment is obtained.

THE AKARI FIS CATALOGUE OF YSOS AND EXTRAGALACTIC OBJECTS

  • Toth, L. Viktor;Marton, Gabor;Zahorecz, Sarolta;Balazs, Lajos G.;Nagy, Andrea
    • Publications of The Korean Astronomical Society / v.32 no.1 / pp.49-53 / 2017
  • The point sources in the Bright Source Catalogue of the AKARI Far-Infrared Surveyor (FIS) were classified based on their FIR and mid-IR fluxes and colours into young stellar object (YSO) and extragalactic source types using a Quadratic Discriminant Analysis (QDA) method and Support Vector Machines (SVM). The reliability of the selection of YSO candidates is high, and the number of known YSO candidates was increased significantly, as we demonstrate for the nearby open cluster IC348. Our results show that we can separate galactic and extragalactic AKARI point sources in the multidimensional space of FIR fluxes and colours with high reliability; however, differentiating among the extragalactic sub-types needs further information.

Designed of personalized mail Filtering System using Support vector machines (멀티모델 기반의 개인화된 메일 필터링 시스템)

  • Park, You-Na;Chang, Hwan;Lee, Bog-Ju
    • Proceedings of the Korean Information Science Society Conference / 2003.10a / pp.172-174 / 2003
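  • With the growth of the Internet, e-mail has become an essential means of exchanging information. Taking advantage of its speed and ease of use, many companies and businesses readily use it as an advertising medium, which causes great harm to individuals and companies. Identifying and classifying spam mail imposes considerable mental and physical stress on individuals and organizations. In this paper, we use SVM, a statistical learning method, to classify diverse and continually changing spam mail. The experimental results show not only stable performance in spam mail classification but also high performance in separating various kinds of spam mail by category.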

A STUDY ON SPATIAL FEATURE EXTRACTION IN THE CLASSIFICATION OF HIGH RESOLUTIION SATELLITE IMAGERY

  • Han, You-Kyung;Kim, Hye-Jin;Choi, Jae-Wan;Kim, Yong-Il
    • Proceedings of the KSRS Conference / 2008.10a / pp.361-364 / 2008
  • It is well known that combining spatial and spectral information can improve land use classification from satellite imagery. High spatial resolution classification has a limitation when only using the spectral information due to the complex spatial arrangement of features and spectral heterogeneity within each class. Therefore, extracting the spatial information is one of the most important steps in high resolution satellite image classification. In this paper, we propose a new spatial feature extraction method. The extracted features are integrated with spectral bands to improve overall classification accuracy. The classification is achieved by applying a Support Vector Machines classifier. In order to evaluate the proposed feature extraction method, we applied our approach to KOMPSAT-2 data and compared the result with the other methods.
