• Title/Summary/Keyword: High-Dimensional Features

An SVD-Based Approach for Generating High-Dimensional Data and Query Sets (SVD를 기반으로 한 고차원 데이터 및 질의 집합의 생성)

  • 김상욱
    • The Journal of Information Technology and Database / v.8 no.2 / pp.91-101 / 2001
  • Previous research on the performance evaluation of multidimensional indexes has typically used synthetic data sets distributed uniformly or normally over multidimensional space. However, recent research has shown that these kinds of data sets hardly reflect the characteristics of multimedia database applications. In this paper, we discuss issues in generating high-dimensional data and query sets to resolve this problem. We first identify the features of data and query sets that are appropriate for fairly evaluating the performance of multidimensional indexes, and then propose HDDQ_Gen (High-Dimensional Data and Query Generator), which satisfies these features. HDDQ_Gen supports the following features: (1) clustered distributions, (2) various object distributions in each cluster, (3) various cluster distributions, (4) various correlations among different dimensions, and (5) query distributions depending on data distributions. Using these features, users are able to control the distribution characteristics of data and query sets. Our contribution is important in that HDDQ_Gen provides a benchmark environment for evaluating multidimensional indexes correctly.
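The five features listed in the abstract can be illustrated with a small generator. This is only a sketch under stated assumptions, not HDDQ_Gen itself: it performs no SVD, and the function names (`make_clustered_data`, `make_queries`) and all parameters are hypothetical. Each cluster is stretched along a few random directions so that dimensions become correlated, and queries are drawn near existing data points so that they follow the data distribution.

```python
import random

def make_clustered_data(n_points, dim, n_clusters, latent_dim=2, spread=0.05, seed=0):
    """Generate clustered points whose coordinates are correlated.

    Each cluster varies only along `latent_dim` random directions, so the
    `dim` coordinates of a point are not independent of one another.
    """
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        center = [rng.random() for _ in range(dim)]
        # Random directions along which this cluster is stretched.
        axes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(latent_dim)]
        clusters.append((center, axes))

    data, labels = [], []
    for _ in range(n_points):
        c = rng.randrange(n_clusters)
        center, axes = clusters[c]
        z = [rng.gauss(0, spread) for _ in range(latent_dim)]
        point = [center[j] + sum(z[k] * axes[k][j] for k in range(latent_dim))
                 for j in range(dim)]
        data.append(point)
        labels.append(c)
    return data, labels

def make_queries(data, n_queries, jitter=0.01, seed=1):
    """Draw query points near randomly chosen data points."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0, jitter) for x in rng.choice(data)]
            for _ in range(n_queries)]

data, labels = make_clustered_data(n_points=200, dim=16, n_clusters=4)
queries = make_queries(data, n_queries=10)
```

Changing `latent_dim` and `spread` per cluster would be one way to vary the object distribution within each cluster.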

Sparse Representation based Two-dimensional Bar Code Image Super-resolution

  • Shen, Yiling;Liu, Ningzhong;Sun, Han
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.4 / pp.2109-2123 / 2017
  • This paper presents a super-resolution reconstruction method based on sparse representation for two-dimensional bar code images. Considering the characteristics of two-dimensional bar code images, Kirsch and LBP (local binary pattern) operators are used to extract edge gradient and texture features. The feature representation is built from these two features and two additional second-order derivatives. Joint dictionary learning on low-resolution and high-resolution image patch pairs ensures that corresponding patches share the same sparse representation. In addition, a global constraint is imposed on the initial estimate of the high-resolution image, which brings the reconstructed result closer to the real one. Comparison with other reconstruction algorithms demonstrates the effectiveness of the proposed algorithm for two-dimensional bar code images.
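As a rough illustration of the LBP operator mentioned in the abstract, the sketch below computes the basic 8-neighbour local binary pattern for the interior pixels of a grayscale image stored as a list of rows. The bit ordering and the `>=` comparison are common conventions chosen here as assumptions; the paper's exact variant is not stated, and the helper names are hypothetical.

```python
# Neighbour offsets in clockwise order, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, r, c):
    """Return the 8-bit LBP code of the interior pixel at row r, column c."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # Set the bit when the neighbour is at least as bright as the center.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_image(img):
    """Compute LBP codes for all interior pixels (the border is skipped)."""
    rows, cols = len(img), len(img[0])
    return [[lbp_code(img, r, c) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]

patch = [[9, 9, 9],
         [1, 5, 1],
         [1, 1, 1]]
code = lbp_code(patch, 1, 1)  # only the three top neighbours are >= 5
```

A histogram of these codes over a patch is a typical texture descriptor.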

A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

  • Aydadenta, Husna;Adiwijaya, Adiwijaya
    • Journal of Information Processing Systems / v.14 no.5 / pp.1167-1175 / 2018
  • Microarray data play an essential role in diagnosing and detecting cancer. Microarray analysis allows the examination of gene expression levels in specific cell samples, where thousands of genes can be analyzed simultaneously. However, microarray data have very few samples and high dimensionality. Therefore, a dimensional reduction process is required to classify microarray data. Dimensional reduction can eliminate redundancy in the data, so that only features that are highly correlated with their class are used in classification. There are two types of dimensional reduction, namely feature selection and feature extraction. In this paper, we use the k-means algorithm as the clustering approach for feature selection. The proposed approach groups features that have the same characteristics into one cluster, so that redundancy in microarray data is removed. The features in each cluster are ranked using the Relief algorithm, so that the best-scoring element of each cluster is obtained. The best elements of all clusters are selected and used as features in the classification process, which uses the Random Forest algorithm. Based on the simulation, the accuracy of the proposed approach on the Colon, Lung Cancer, and Prostate Tumor datasets was 85.87%, 98.9%, and 89%, respectively. The accuracy of the proposed approach is therefore higher than that of Random Forest without clustering.
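The cluster-then-select pipeline described in the abstract can be sketched in a few lines. The sketch makes several assumptions: features are clustered with a small deterministic k-means, the Relief ranking is replaced by a much simpler stand-in score (the absolute gap between the two class means), and the Random Forest step is omitted. All function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_features(features, k, n_iter=20):
    """Cluster feature vectors (one vector per feature, one entry per sample)."""
    # Deterministic farthest-point initialisation keeps the sketch reproducible.
    centroids = [list(features[0])]
    while len(centroids) < k:
        far = max(features, key=lambda f: min(sq_dist(f, c) for c in centroids))
        centroids.append(list(far))
    assign = [0] * len(features)
    for _ in range(n_iter):
        assign = [min(range(k), key=lambda j: sq_dist(f, centroids[j]))
                  for f in features]
        for j in range(k):
            members = [f for f, a in zip(features, assign) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def class_mean_gap(feature, y):
    """Stand-in relevance score (not Relief): gap between the two class means."""
    pos = [v for v, label in zip(feature, y) if label == 1]
    neg = [v for v, label in zip(feature, y) if label == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

def select_features(features, y, k):
    """Keep the best-scoring feature index from each cluster."""
    assign = kmeans_features(features, k)
    best = {}
    for idx, cluster in enumerate(assign):
        score = class_mean_gap(features[idx], y)
        if cluster not in best or score > best[cluster][0]:
            best[cluster] = (score, idx)
    return sorted(idx for _, idx in best.values())

# Four features measured on four samples; features 0/1 and 2/3 are redundant pairs.
features = [[0, 0, 1, 1], [0, 0.1, 1, 0.9], [1, 1, 0, 0], [0.9, 1, 0, 0.1]]
y = [0, 0, 1, 1]
selected = select_features(features, y, k=2)
```

The selected indices would then define the reduced input to a classifier.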

Recognition of High Impedance Fault Patterns based on Chaotic Features (카오스 어트랙터를 이용한 전력계통의 고저항 지락사고 패턴분류)

  • Shin, Seung-Yeon;Kong, Seong-Gon
    • Proceedings of the KIEE Conference / 1998.07g / pp.2272-2274 / 1998
  • This paper presents the recognition and classification of high impedance fault (HIF) patterns in electrical power systems based on chaotic features. Chaotic features are obtained from two-dimensional chaos attractors reconstructed from the fault current waveform. A radial basis function network (RBFN) is trained with the two types of HIF data generated by the electromagnetic transient program and measured from actual faults. The RBFN successfully classifies normal and the three types of fault patterns based on the binary chaotic features.
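One common way to reconstruct a two-dimensional attractor from a single waveform is time-delay embedding, in which each sample is paired with the sample `tau` steps later. The abstract does not state which reconstruction the authors used, so the sketch below is an assumption, and `delay_embed` is a hypothetical helper name.

```python
import math

def delay_embed(signal, tau):
    """Return the 2-D trajectory of points (x[t], x[t + tau])."""
    return [(signal[t], signal[t + tau]) for t in range(len(signal) - tau)]

# A sampled sine wave stands in for a measured current waveform.
waveform = [math.sin(2 * math.pi * t / 32) for t in range(128)]
trajectory = delay_embed(waveform, tau=8)
```

For a pure sine with a quarter-period delay the trajectory traces a circle; irregular fault currents would produce more complex shapes.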

Dimension reduction for high-dimensional data via mixtures of common factor analyzers-an application to tumor classification

  • Baek, Jang-Sun
    • Journal of the Korean Data and Information Science Society / v.19 no.3 / pp.751-759 / 2008
  • Mixtures of factor analyzers (MFA) are useful for modeling the distribution of high-dimensional data on a much lower dimensional space where the number of observations is very large relative to their dimension. Mixtures of common factor analyzers (MCFA) can further reduce the number of parameters in the specification of the component covariance matrices when the number of classes is not small. Moreover, the factor scores of MCFA can be displayed in low-dimensional space to distinguish the groups. We propose the factor scores of MCFA as new low-dimensional features for the classification of high-dimensional data. Compared with conventional dimension reduction methods such as principal component analysis (PCA) and canonical covariates (CV), the proposed factor scores were shown to have higher correct classification rates for three real data sets when used in parametric and nonparametric classifiers.

GB-Index: An Indexing Method for High Dimensional Complex Similarity Queries with Relevance Feedback (GB-색인: 고차원 데이타의 복합 유사 질의 및 적합성 피드백을 위한 색인 기법)

  • Cha Guang-Ho
    • Journal of KIISE:Databases / v.32 no.4 / pp.362-371 / 2005
  • Similarity indexing and searching are well known to be difficult in high-dimensional applications such as multimedia databases. They become even more difficult when multiple features have to be indexed together. In this paper, we propose a novel indexing method called the GB-index that is designed to efficiently handle complex similarity queries as well as relevance feedback in high-dimensional image databases. To provide flexibility in controlling multiple features and query objects, the GB-index treats each dimension independently. The efficiency of the GB-index is realized by specialized bitmap indexing that represents all objects in a database as a set of bitmaps. The main contributions of the GB-index are threefold: (1) it provides a novel way to index high-dimensional data; (2) it efficiently handles complex similarity queries; and (3) it efficiently treats disjunctive queries driven by relevance feedback. Empirical results demonstrate that the GB-index achieves great speedups over the sequential scan and the VA-file.
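The idea of treating each dimension independently with bitmaps can be sketched generically as follows. This is not the GB-index itself: the equal-width binning, the assumption that values lie in [0, 1], and the function names are all illustrative choices. Each (dimension, bin) pair gets one bitmap, stored as a Python integer in which bit i is set when object i falls into that bin.

```python
def bin_of(value, n_bins):
    """Map a value in [0, 1] to an equal-width bin index."""
    return min(int(value * n_bins), n_bins - 1)

def build_bitmaps(objects, n_bins):
    """Build one bitmap per (dimension, bin); bit i marks object i."""
    dim = len(objects[0])
    bitmaps = [[0] * n_bins for _ in range(dim)]
    for i, obj in enumerate(objects):
        for d, value in enumerate(obj):
            bitmaps[d][bin_of(value, n_bins)] |= 1 << i
    return bitmaps

def range_query(bitmaps, lows, highs):
    """Return candidate objects as a bitmap: OR within a dimension, AND across."""
    n_bins = len(bitmaps[0])
    result = -1  # all bits set; narrowed by the AND below
    for d, (lo, hi) in enumerate(zip(lows, highs)):
        dim_bits = 0
        for b in range(bin_of(lo, n_bins), bin_of(hi, n_bins) + 1):
            dim_bits |= bitmaps[d][b]
        result &= dim_bits
    return result

def ids(bits):
    """List the object ids whose bits are set."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]

objects = [[0.1, 0.1], [0.9, 0.9], [0.15, 0.8], [0.5, 0.5]]
bitmaps = build_bitmaps(objects, n_bins=4)
near_origin = range_query(bitmaps, [0.0, 0.0], [0.2, 0.2])
near_corner = range_query(bitmaps, [0.8, 0.8], [1.0, 1.0])
either = near_origin | near_corner  # a disjunctive query is a bitwise OR
```

Because bins are coarse, the result is a candidate set that a real index would still refine against the exact query.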

Nonlinear PLS Monitoring Applied to An Wastewater Treatment Process

  • Bang, Yoon-Ho;Yoo, Chang-Kyoo;Park, Sang-Wook;Lee, In-Beum
    • Proceedings of the Institute of Control, Robotics and Systems Conference (제어로봇시스템학회:학술대회논문집) / 2001.10a / pp.102.1-102 / 2001
  • In this work, extensions to partial least squares (PLS) for wastewater treatment (WWT) process monitoring are discussed. Conventional data gathered by monitoring WWT systems are usually time varying, high dimensional, correlated and nonlinear. PLS has been shown to be an efficient approach for modeling and monitoring high-dimensional and correlated data. To represent dynamic and nonlinear features of the data, several kinds of dynamic nonlinear PLS (DNLPLS) models have been proposed. However, the complexity and ambiguity of the models make them unsuitable for WWT monitoring. Recently, dynamic fuzzy PLS (DFPLS) was proposed ...

Proposed large-scale modelling of the transient features of a downburst outflow

  • Lin, W.E.;Orf, L.G.;Savory, E.;Novacco, C.
    • Wind and Structures / v.10 no.4 / pp.315-346 / 2007
  • A preceding companion article introduced the slot jet approach for large-scale quasi-steady modelling of a downburst outflow. This article extends the approach to model the time-dependent features of the outflow. A two-dimensional slot jet with an actuated gate produces a gust with a dominant roll vortex. Two designs for the gate mechanism are investigated. Hot-wire anemometry velocity histories and profiles are presented. As well, a three-dimensional, subcloud numerical model is used to approximate the downdraft microphysics, and to compute stationary and translating outflows at high resolution. The evolution of the horizontal and vertical velocity components is examined. Comparison of the present experimental and numerical results with field observations is encouraging.

Feature reduction for classifying high dimensional data sets using support vector machine (고차원 데이터의 분류를 위한 서포트 벡터 머신을 이용한 피처 감소 기법)

  • Ko, Seok-Ha;Lee, Hyun-Ju
    • Proceedings of the IEEK Conference / 2008.06a / pp.877-878 / 2008
  • We suggest a feature reduction method to classify mouse function data sets, which integrate several biological data sets represented as high-dimensional vectors. To increase classification accuracy and decrease computational overhead, it is important to reduce the dimension of features. To do this, we employed the Hybrid Huberized Support Vector Machine with kernels used for a kernel logistic regression method. Compared with a support vector machine, this approach shows better accuracy with useful features for each mouse function.

Vehicle Image Recognition Using Deep Convolution Neural Network and Compressed Dictionary Learning

  • Zhou, Yanyan
    • Journal of Information Processing Systems / v.17 no.2 / pp.411-425 / 2021
  • In this paper, a vehicle recognition algorithm based on a deep convolutional neural network and a compressed dictionary is proposed. First, the network structure for fine-grained vehicle recognition based on a convolutional neural network is introduced. Then, a vehicle recognition system based on a multi-scale pyramid convolutional neural network is constructed. The contribution of each network to the recognition result is adjusted by an adaptive fusion method, which sets the proportion of each network's output in the output of the entire multi-scale network according to that network's recognition accuracy. Next, compressed dictionary learning and data dimension reduction are carried out using an effective block structure method combined with a very sparse random projection matrix, which reduces the computational complexity caused by high-dimensional features and shortens the dictionary learning time. Finally, the sparse representation classification method is used to realize vehicle type recognition. The experimental results show that the detection performance of the proposed algorithm is stable in sunny, cloudy and rainy weather, and that it adapts well to typical application scenarios such as occlusion and blurring, with an average recognition rate of more than 95%.
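The very sparse random projection mentioned in the abstract can be sketched as follows. This shows only the projection step, under the usual construction in which each matrix entry is +sqrt(s), 0 or -sqrt(s) with probabilities 1/(2s), 1 - 1/s and 1/(2s), and s is the square root of the input dimension. The paper's block structure method is not reproduced, and the function names are hypothetical.

```python
import math
import random

def very_sparse_projection_matrix(in_dim, out_dim, seed=0):
    """Build an out_dim x in_dim matrix with mostly zero entries."""
    rng = random.Random(seed)
    s = math.sqrt(in_dim)      # sparsity parameter; larger s means more zeros
    scale = math.sqrt(s)
    matrix = []
    for _ in range(out_dim):
        row = []
        for _ in range(in_dim):
            u = rng.random()
            if u < 1 / (2 * s):
                row.append(scale)
            elif u < 1 / s:
                row.append(-scale)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix

def project(x, matrix):
    """Map x to len(matrix) dimensions, skipping the zero entries."""
    k = len(matrix)
    return [sum(r * v for r, v in zip(row, x) if r) / math.sqrt(k)
            for row in matrix]

R = very_sparse_projection_matrix(in_dim=100, out_dim=8)
low_dim = project([1.0] * 100, R)
```

With an input dimension of 100, about 90% of the entries are zero on average, which is what keeps the projection cheap.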