• 제목/요약/키워드: feature reduction

검색결과 595건 처리시간 0.029초

Classification of High Dimensionality Data through Feature Selection Using Markov Blanket

  • Lee, Junghye;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems
    • /
    • 제14권2호
    • /
    • pp.210-219
    • /
    • 2015
  • A classification task requires an exponentially growing amount of computation time and number of observations as the variable dimensionality increases. Thus, reducing the dimensionality of the data is essential when the number of observations is limited. Often, dimensionality reduction or feature selection leads to better classification performance than using the whole number of features. In this paper, we study the possibility of utilizing the Markov blanket discovery algorithm as a new feature selection method. The Markov blanket of a target variable is the minimal variable set for explaining the target variable on the basis of conditional independence of all the variables to be connected in a Bayesian network. We apply several Markov blanket discovery algorithms to some high-dimensional categorical and continuous data sets, and compare their classification performance with other feature selection methods using well-known classifiers.

Music Genre Classification Based on Timbral Texture and Rhythmic Content Features

  • Baniya, Babu Kaji;Ghimire, Deepak;Lee, Joonwhon
    • 한국정보처리학회:학술대회논문집
    • /
    • 한국정보처리학회 2013년도 춘계학술발표대회
    • /
    • pp.204-207
    • /
    • 2013
  • Music genre classification is an essential component for music information retrieval system. There are two important components to be considered for better genre classification, which are audio feature extraction and classifier. This paper incorporates two different kinds of features for genre classification, timbral texture and rhythmic content features. Timbral texture contains several spectral and Mel-frequency Cepstral Coefficient (MFCC) features. Before choosing a timbral feature we explore which feature contributes less significant role on genre discrimination. This facilitates the reduction of feature dimension. For the timbral features up to the 4-th order central moments and the covariance components of mutual features are considered to improve the overall classification result. For the rhythmic content the features extracted from beat histogram are selected. In the paper Extreme Learning Machine (ELM) with bagging is used as classifier for classifying the genres. Based on the proposed feature sets and classifier, experiment is performed with well-known datasets: GTZAN databases with ten different music genres, respectively. The proposed method acquires the better classification accuracy than the existing approaches.

Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology
    • /
    • 제40권8호
    • /
    • pp.726-732
    • /
    • 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Through the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis of applying dimensionality reduction techniques was also proposed. PCA (principal component analysis) is a popular methd of dimensionality reduction techniques for clustering problems. However, previous studies analyzed the performance of PCA for only full data sets. In this paper, to specifically and robustly evaluate the performance of PCA for clustering analysis, we exploit an improved FCBF (fast correlation-based filter) of feature selection methods for supervised clustering data sets, and employ two well-known clustering algorithms: k-means and k-medoids. Computational results from supervised data sets show that the performance of PCA is very poor for large-scale features.

Dimensionality Reduction of Feature Set for API Call based Android Malware Classification

  • Hwang, Hee-Jin;Lee, Soojin
    • 한국컴퓨터정보학회논문지
    • /
    • 제26권11호
    • /
    • pp.41-49
    • /
    • 2021
  • 악성코드를 포함한 모든 응용프로그램은 실행 시 API(Application Programming Interface)를 호출한다. 최근에는 이러한 특성을 활용하여 API Call 정보를 기반으로 악성코드를 탐지하고 분류하는 접근방법이 많은 관심을 받고 있다. 그러나 API Call 정보를 포함하는 데이터세트는 그 양이 방대하여 많은 계산 비용과 처리시간이 필요하다. 또한, 악성코드 분류에 큰 영향을 미치지 않는 정보들이 학습모델의 분류 정확도에 영향을 미칠 수도 있다. 이에 본 논문에서는 다양한 특성 선택(feature selection) 방법을 적용하여 API Call 정보에 대한 차원을 축소시킨 후, 핵심 특성 집합을 추출하는 방안을 제시한다. 실험은 최근 발표된 안드로이드 악성코드 데이터세트인 CICAndMal2020을 이용하였다. 다양한 특성 선택 방법으로 핵심 특성 집합을 추출한 후 CNN(Convolutional Neural Network)을 이용하여 안드로이드 악성코드 분류를 시도하고 결과를 분석하였다. 그 결과 특성 선택 알고리즘에 따라 선택되는 특성 집합이나 가중치 우선순위가 달라짐을 확인하였다. 그리고 이진분류의 경우 특성 집합을 전체 크기의 15% 크기로 줄이더라도 97% 수준의 정확도로 악성코드를 분류하였다. 다중분류의 경우에는 최대 8% 이하의 크기로 특성 집합을 줄이면서도 평균 83%의 정확도를 달성하였다.

전정 유모세포 통합 모델을 이용한 반강성 기전 기반 섬모번들 특성 추정에 관한 연구 (A study on Hair Bundle Feature Estimation Based on Negative Stiffness Mechanism Using Integrated Vestibular Hair Cell Model)

  • 김동영;홍기환;김규성;이상민
    • 대한의용생체공학회:의공학회지
    • /
    • 제34권4호
    • /
    • pp.218-225
    • /
    • 2013
  • In this paper hair bundle feature model and integration method for hair cell models were proposed. The proposed hair bundle feature model was based on spring-damper-mass model. Input of integrated vestibular hair cell model was frequency and output was interspike interval of hair cell that was reflected the feature of hair bundles. Irregular afferents that had a great gain variation showed reduction of negative stiffness section. Regular afferents that had a small gain variation, however, showed same feature with base negative stiffness feature. As a result, integrated vestibular hair cell model showed almost the same modeling data with experimental data in the modeled eleven frequency bands. It is verified that the proposed model is a good model for hair bundle feature modeling.

Prototype-based Classifier with Feature Selection and Its Design with Particle Swarm Optimization: Analysis and Comparative Studies

  • Park, Byoung-Jun;Oh, Sung-Kwun
    • Journal of Electrical Engineering and Technology
    • /
    • 제7권2호
    • /
    • pp.245-254
    • /
    • 2012
  • In this study, we introduce a prototype-based classifier with feature selection that dwells upon the usage of a biologically inspired optimization technique of Particle Swarm Optimization (PSO). The design comprises two main phases. In the first phase, PSO selects P % of patterns to be treated as prototypes of c classes. During the second phase, the PSO is instrumental in the formation of a core set of features that constitute a collection of the most meaningful and highly discriminative coordinates of the original feature space. The proposed scheme of feature selection is developed in the wrapper mode with the performance evaluated with the aid of the nearest prototype classifier. The study offers a complete algorithmic framework and demonstrates the effectiveness (quality of solution) and efficiency (computing cost) of the approach when applied to a collection of selected data sets. We also include a comparative study which involves the usage of genetic algorithms (GAs). Numerical experiments show that a suitable selection of prototypes and a substantial reduction of the feature space could be accomplished and the classifier formed in this manner becomes characterized by low classification error. In addition, the advantage of the PSO is quantified in detail by running a number of experiments using Machine Learning datasets.

Rank-weighted reconstruction feature for a robust deep neural network-based acoustic model

  • Chung, Hoon;Park, Jeon Gue;Jung, Ho-Young
    • ETRI Journal
    • /
    • 제41권2호
    • /
    • pp.235-241
    • /
    • 2019
  • In this paper, we propose a rank-weighted reconstruction feature to improve the robustness of a feed-forward deep neural network (FFDNN)-based acoustic model. In the FFDNN-based acoustic model, an input feature is constructed by vectorizing a submatrix that is created by slicing the feature vectors of frames within a context window. In this type of feature construction, the appropriate context window size is important because it determines the amount of trivial or discriminative information, such as redundancy, or temporal context of the input features. However, we ascertained whether a single parameter is sufficiently able to control the quantity of information. Therefore, we investigated the input feature construction from the perspectives of rank and nullity, and proposed a rank-weighted reconstruction feature herein, that allows for the retention of speech information components and the reduction in trivial components. The proposed method was evaluated in the TIMIT phone recognition and Wall Street Journal (WSJ) domains. The proposed method reduced the phone error rate of the TIMIT domain from 18.4% to 18.0%, and the word error rate of the WSJ domain from 4.70% to 4.43%.

Improved Feature Selection Techniques for Image Retrieval based on Metaheuristic Optimization

  • Johari, Punit Kumar;Gupta, Rajendra Kumar
    • International Journal of Computer Science & Network Security
    • /
    • 제21권1호
    • /
    • pp.40-48
    • /
    • 2021
  • Content-Based Image Retrieval (CBIR) system plays a vital role to retrieve the relevant images as per the user perception from the huge database is a challenging task. Images are represented is to employ a combination of low-level features as per their visual content to form a feature vector. To reduce the search time of a large database while retrieving images, a novel image retrieval technique based on feature dimensionality reduction is being proposed with the exploit of metaheuristic optimization techniques based on Genetic Algorithm (GA), Extended Binary Cuckoo Search (EBCS) and Whale Optimization Algorithm (WOA). Each image in the database is indexed using a feature vector comprising of fuzzified based color histogram descriptor for color and Median binary pattern were derived in the color space from HSI for texture feature variants respectively. Finally, results are being compared in terms of Precision, Recall, F-measure, Accuracy, and error rate with benchmark classification algorithms (Linear discriminant analysis, CatBoost, Extra Trees, Random Forest, Naive Bayes, light gradient boosting, Extreme gradient boosting, k-NN, and Ridge) to validate the efficiency of the proposed approach. Finally, a ranking of the techniques using TOPSIS has been considered choosing the best feature selection technique based on different model parameters.

Feature-Based Image Retrieval using SOM-Based R*-Tree

  • Shin, Min-Hwa;Kwon, Chang-Hee;Bae, Sang-Hyun
    • 한국산학기술학회:학술대회논문집
    • /
    • 한국산학기술학회 2003년도 Proceeding
    • /
    • pp.223-230
    • /
    • 2003
  • Feature-based similarity retrieval has become an important research issue in multimedia database systems. The features of multimedia data are useful for discriminating between multimedia objects (e 'g', documents, images, video, music score, etc.). For example, images are represented by their color histograms, texture vectors, and shape descriptors, and are usually high-dimensional data. The performance of conventional multidimensional data structures(e'g', R- Tree family, K-D-B tree, grid file, TV-tree) tends to deteriorate as the number of dimensions of feature vectors increases. The R*-tree is the most successful variant of the R-tree. In this paper, we propose a SOM-based R*-tree as a new indexing method for high-dimensional feature vectors.The SOM-based R*-tree combines SOM and R*-tree to achieve search performance more scalable to high dimensionalities. Self-Organizing Maps (SOMs) provide mapping from high-dimensional feature vectors onto a two dimensional space. The mapping preserves the topology of the feature vectors. The map is called a topological of the feature map, and preserves the mutual relationship (similarity) in the feature spaces of input data, clustering mutually similar feature vectors in neighboring nodes. Each node of the topological feature map holds a codebook vector. A best-matching-image-list. (BMIL) holds similar images that are closest to each codebook vector. In a topological feature map, there are empty nodes in which no image is classified. When we build an R*-tree, we use codebook vectors of topological feature map which eliminates the empty nodes that cause unnecessary disk access and degrade retrieval performance. We experimentally compare the retrieval time cost of a SOM-based R*-tree with that of an SOM and an R*-tree using color feature vectors extracted from 40, 000 images. The result show that the SOM-based R*-tree outperforms both the SOM and R*-tree due to the reduction of the number of nodes required to build R*-tree and retrieval time cost.

  • PDF

Supervisor reduction 과 관측함수 설계 (Supervisor redection and observation function design)

  • 조항주
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 제어로봇시스템학회 1991년도 한국자동제어학술회의논문집(국내학술편); KOEX, Seoul; 22-24 Oct. 1991
    • /
    • pp.476-481
    • /
    • 1991
  • This paper investigates the relationship between the two problems, supervisor reduction and observation function (projection) design, which arise in supervisory control of DEDS. It is shown through an example that a reduced supervisor of minimal size does not necessarily result in a maximal projection when a projection design method which uses the transition structure of a supervisor is applied. Also, if an L-realizable projection P is available and if a supervisor has a special structural feature, a cover C for supervisor reduction can be easily obtained. By investigating the control-compatibility of states of the reduced supervisor based on C, we can also check maximality of P in a simple manner.

  • PDF