• Title/Summary/Keyword: support vector data description


KMSVDD: Support Vector Data Description using K-means Clustering

  • Kim, Pyo-Jae;Chang, Hyung-Jin;Song, Dong-Sung;Choi, Jin-Young
    • Proceedings of the KIEE Conference / 2006.04a / pp.90-92 / 2006
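  • The conventional Support Vector Data Description (SVDD) method is of limited use on large data sets because its training time grows exponentially as the number of training samples increases. In this paper, we propose an SVDD algorithm that uses the K-means clustering algorithm to speed up training. Similar to existing decomposition methods, the proposed algorithm uses K-means clustering to divide the training data region into sub-groups and then trains each sub-group separately, which reduces the amount of computation. This sub-grouping does not damage the SVDD's characteristic of enclosing the training data with a hypersphere, and it lets the method learn from training data in small regions gathered around the cluster centers, so training is faster than with the conventional SVDD with no difference in accuracy. The effect is verified through simulations on various data sets.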

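The sub-grouping idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each sub-group is described by an approximate minimum enclosing ball (the hard-margin, linear-kernel special case of SVDD, computed here with a simple Badoiu-Clarkson style iteration) instead of a kernel SVDD solver. The function names `kmeans`, `enclosing_ball`, `fit_kmsvdd`, and `is_normal`, and the toy data, are hypothetical.

```python
import math

def dist(a, b):
    # Euclidean distance between two points given as sequences of numbers.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    # Deterministic farthest-first initialisation, then Lloyd iterations.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda j: dist(p, centers[j]))
            groups[idx].append(p)
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return [g for g in groups if g]

def enclosing_ball(points, iters=200):
    # ASSUMPTION: stand-in for SVDD training on one sub-group.
    # Repeatedly move the center toward the current farthest point with a
    # shrinking step; this approximates the minimum enclosing ball.
    c = tuple(points[0])
    for i in range(1, iters + 1):
        far = max(points, key=lambda p: dist(p, c))
        c = tuple(ci + (fi - ci) / (i + 1) for ci, fi in zip(c, far))
    radius = max(dist(p, c) for p in points)
    return c, radius

def fit_kmsvdd(points, k):
    # One ball per K-means sub-group; each sub-problem is small.
    return [enclosing_ball(g) for g in kmeans(points, k)]

def is_normal(model, x, tol=1e-9):
    # A point is accepted if it falls inside any sub-group's ball.
    return any(dist(x, c) <= r + tol for c, r in model)

# Toy data: two well-separated groups of "normal" samples.
train = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
model = fit_kmsvdd(train, k=2)
print(is_normal(model, (0.5, 0.5)), is_normal(model, (5.5, 5.5)))
```

Because each ball is fitted only on its own sub-group, the cost of the per-group step depends on the sub-group sizes rather than on the full data set, which is the source of the speed-up the abstract describes.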

New Kernel-Based Normality Recovery Method and Applications

  • Kang Dae-Sung;Park Joo-Young
    • Journal of the Korean Institute of Intelligent Systems / v.16 no.4 / pp.410-415 / 2006
  • The SVDD (support vector data description) is one of the most important one-class support vector learning methods. It relies on balls defined in the feature space to discriminate normal data from all other possible abnormal objects. This paper addresses the extension of the SVDD method to the problem of recovering the normal content from data contaminated with noise. The validity of the proposed de-noising method is shown by applying it to the recovery of high-resolution images from low-resolution images, based on high-resolution training data.
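For reference, the ball-based strategy mentioned in this abstract is usually written as the optimization problem below. This is the standard soft-margin SVDD formulation, given here as background rather than as the paper's own notation: the center $a$, radius $R$, slack variables $\xi_i$, trade-off constant $C$, and feature map $\phi$ are symbols introduced for this sketch.

$$
\min_{R,\,a,\,\xi}\; R^2 + C\sum_{i=1}^{N}\xi_i
\quad\text{subject to}\quad
\|\phi(x_i) - a\|^2 \le R^2 + \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,N .
$$

A test point $x$ is then treated as normal when $\|\phi(x) - a\|^2 \le R^2$.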

Fuzzy One Class Support Vector Machine

  • Kim, Ki-Joo;Choi, Young-Sik
    • Journal of Internet Computing and Services / v.6 no.3 / pp.159-170 / 2005
  • OC-SVM (One-Class Support Vector Machine) avoids solving a full density estimation problem and instead focuses on a simpler task: estimating quantiles of a data distribution, i.e. its support. OC-SVM seeks to estimate regions where most of the data resides and represents those regions as a function of the support vectors. Although OC-SVM is a powerful method for data description, it is difficult to incorporate subjective human importance into its estimation process. In order to integrate the importance of each point into the OC-SVM process, we propose a fuzzy version of OC-SVM. In FOC-SVM (Fuzzy One-Class Support Vector Machine), we do not treat data points equally and instead weight them according to the importance measure of the corresponding objects. That is, we scale the kernel feature vector according to the importance measure of the object, so that the kernel feature vector of a less important object contributes less to the detection process of OC-SVM. We demonstrate the performance of our algorithm on several synthesized data sets. The experimental results are promising.


A Modified Approach to Density-Induced Support Vector Data Description

  • Park, Joo-Young;Kang, Dae-Sung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.7 no.1 / pp.1-6 / 2007
  • The SVDD (support vector data description) is one of the most well-known one-class support vector learning methods, in which one tries the strategy of utilizing balls defined on the feature space in order to distinguish a set of normal data from all other possible abnormal objects. Recently, with the objective of generalizing the SVDD which treats all training data with equal importance, the so-called D-SVDD (density-induced support vector data description) was proposed incorporating the idea that the data in a higher density region are more significant than those in a lower density region. In this paper, we consider the problem of further improving the D-SVDD toward the use of a partial reference set for testing, and propose an LMI (linear matrix inequality)-based optimization approach to solve the improved version of the D-SVDD problems. Our approach utilizes a new class of density-induced distance measures based on the RSDE (reduced set density estimator) along with the LMI-based mathematical formulation in the form of the SDP (semi-definite programming) problems, which can be efficiently solved by interior point methods. The validity of the proposed approach is illustrated via numerical experiments using real data sets.

Performance Comparison of Clustering Validity Indices with Business Applications

  • Lee, Soo-Hyun;Jeong, Youngseon;Kim, Jae-Yun
    • Journal of the Korean Operations Research and Management Science Society / v.41 no.2 / pp.17-33 / 2016
  • Clustering is one of the leading methods for analyzing big data and is used in many different fields. This study deals with the Clustering Validity Index (CVI), which is used to verify the effectiveness of clustering results. We compare the performance of CVIs on business applications from various fields. The CVIs compared in this study are DU, CH, DB, SVDU, SVCH, and SVDB. The first three are well known from existing research, and the last three are based on support vector data description. The results verify the outstanding performance and the applicability of the CVIs based on support vector data description.

Support vector quantile regression for longitudinal data

  • Hwang, Chang-Ha
    • Journal of the Korean Data and Information Science Society / v.21 no.2 / pp.309-316 / 2010
  • Support vector quantile regression (SVQR) is capable of providing a more complete description of the linear and nonlinear relationships among response and input variables. In this paper we propose a weighted SVQR for longitudinal data. Furthermore, we introduce the generalized approximate cross validation function to select the hyperparameters which affect the performance of SVQR. Experimental results are then presented, which illustrate the performance of the proposed SVQR.
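Quantile regression methods such as SVQR are built on the check (pinball) loss. The sketch below shows that loss with optional per-observation weights. It is a generic illustration, not the weighting scheme of the paper: the function names `pinball` and `weighted_pinball_loss`, the uniform default weights, and the toy data are assumptions.

```python
def pinball(residual, tau):
    # Check loss: under-predictions cost tau, over-predictions cost (1 - tau).
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def weighted_pinball_loss(y, pred, tau, weights=None):
    # ASSUMPTION: uniform weights unless given; the paper's weights for
    # longitudinal data are not specified in the abstract.
    if weights is None:
        weights = [1.0] * len(y)
    return sum(w * pinball(yi - pi, tau) for w, yi, pi in zip(weights, y, pred))

# Among constant predictions taken from the sample, tau = 0.5 selects the median.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
best = min(y, key=lambda c: weighted_pinball_loss(y, [c] * len(y), tau=0.5))
print(best)  # prints 3.0
```

Changing `tau` shifts the minimizer toward the corresponding quantile, which is why the loss is used to fit quantile functions instead of the conditional mean.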

Anomaly Detection and Diagnostics (ADD) Based on Support Vector Data Description (SVDD) for Energy Consumption in Commercial Building

  • Chae, Young-Tae
    • Journal of Korean Institute of Architectural Sustainable Environment and Building Systems / v.12 no.6 / pp.579-590 / 2018
  • Anomaly detection on building energy consumption has been regarded as an effective tool to save energy in building operation and maintenance. However, it requires an energy model and an FDD expert for the quantitative model approach, or a large amount of training data for the qualitative/history data approach. Both methods need additional time and labor. This study proposes a machine learning and data science approach to define faulty conditions on hourly building energy consumption while reducing the data amount and input requirements. It suggests applying the Support Vector Data Description (SVDD) method to train on the normal condition of hourly building energy consumption, combined with hourly outdoor air temperature and an integer time index within the week (168 data points), and to identify hourly abnormal conditions in the next day. The result shows the developed model performs better when ${\nu}$ (probability of error in the training set) is 0.05 and ${\gamma}$ (radius of hyper plane) is 0.2. The model accuracy in identifying anomalous operation ranges from 70% (10% increase anomaly) to 95% (20% decrease anomaly) for the daily total (24 hours) and from 80% (10% decrease anomaly) to 10% (15% increase anomaly) for occupied hours, respectively.

A Study on Hierarchical Distributed Intrusion Detection for Secure Home Networks Service

  • Yu, Jae-Hak;Choi, Sung-Back;Yang, Sung-Hyun;Park, Dai-Hee;Chung, Yong-Wha
    • Journal of the Korea Institute of Information Security & Cryptology / v.18 no.1 / pp.49-57 / 2008
  • In this paper, we propose a novel hierarchical distributed intrusion detection system, named HNHDIDS (Home Network Hierarchical Distributed Intrusion Detection System), which is not only based on the structure of a distributed intrusion detection system but also fully considers the environment of secure home network services. The proposed system is hierarchically composed of a one-class support vector machine (support vector data description) and local agents, and it is designed to be optimized for the environment of secure home network services. We support our findings with computer experiments and analysis.

K-means Support Vector Data Description concerning Negative data

  • Song, Dong-Sung;Kim, Pyo-Jae;Chang, Hyung-Jin;Choi, Jin-Young
    • Proceedings of the KIEE Conference / 2007.04a / pp.310-312 / 2007
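  • Although SVDD is a one-class classification technique, it can also be applied to multi-class classification. In that case, an SVDD method that takes negative data into account has been used to keep data from other classes from falling inside the learned boundary of the target class. However, this method has the drawback that training time increases as the number of data points to be considered grows. In this paper, we propose a method that mitigates the increase in training time when training with negative data by using KMSVDD instead of SVDD and by excluding data located in regions where they cannot be negative data. This yields similar results in less time than training with all data outside the target class treated as negative data. The effect is verified through several simulations.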


GACV for partially linear support vector regression

  • Shim, Jooyong;Seok, Kyungha
    • Journal of the Korean Data and Information Science Society / v.24 no.2 / pp.391-399 / 2013
  • Partially linear regression is capable of providing a more complete description of the linear and nonlinear relationships among random variables. In support vector regression (SVR), the hyper-parameters are known to affect the performance of regression. In this paper we propose an iterative reweighted least squares (IRWLS) procedure to solve the quadratic problem of partially linear support vector regression with a modified loss function, which enables us to use the generalized approximate cross validation function to select the hyper-parameters. Experimental results are then presented which illustrate the performance of the partially linear SVR using the IRWLS procedure.