• Title/Summary/Keyword: Non-clustering

Performance Evaluation of Nonkeyword Modeling and Postprocessing for Vocabulary-independent Keyword Spotting (가변어휘 핵심어 검출을 위한 비핵심어 모델링 및 후처리 성능평가)

  • Kim, Hyung-Soon;Kim, Young-Kuk;Shin, Young-Wook
    • Speech Sciences / v.10 no.3 / pp.225-239 / 2003
  • In this paper, we develop a keyword spotting system using a vocabulary-independent speech recognition technique and investigate several non-keyword modeling and post-processing methods to improve its performance. To model non-keyword speech segments, monophone clustering and Gaussian mixture models (GMMs) are considered. For post-processing, we employ likelihood ratio scoring to verify the recognition results, and filler models, anti-subword models, and N-best decoding results are considered as alternative hypotheses for the likelihood ratio. We also examine different methods of constructing anti-subword models. We evaluate the system on an automatic telephone exchange service task. The results show that GMM-based non-keyword modeling performs better than monophone clustering. In the post-processing experiments, the anti-keyword model based on Kullback-Leibler distance and the N-best decoding method perform better than the other methods, reducing keyword recognition errors by more than 50% at a keyword rejection rate of 5%.
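
The likelihood ratio verification described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the one-dimensional Gaussian "models", the function names, and the zero threshold are all assumptions chosen to keep the example small. A real system would score frames with HMM or GMM acoustic models.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian (stand-in for a real acoustic model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood of a segment under one model."""
    return sum(gaussian_logpdf(x, mean, var) for x in frames) / len(frames)

def verify_keyword(frames, keyword_model, alt_model, threshold=0.0):
    """Accept the hypothesized keyword if the log-likelihood ratio between
    the keyword model and the alternative hypothesis (for example a filler
    model) reaches the threshold. Models are hypothetical (mean, var) pairs."""
    llr = (avg_log_likelihood(frames, *keyword_model)
           - avg_log_likelihood(frames, *alt_model))
    return llr, llr >= threshold

# Toy usage: frames close to the keyword model are accepted.
keyword_model = (0.0, 1.0)   # assumed (mean, variance)
filler_model = (3.0, 1.0)    # assumed alternative hypothesis
score, accepted = verify_keyword([0.1, -0.2, 0.3], keyword_model, filler_model)
print(score, accepted)
```

Raising the threshold rejects more hypothesized keywords, which is the trade-off behind reporting an error reduction at a fixed rejection rate.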

Unsupervised Clustering of Multivariate Time Series Microarray Experiments based on Incremental Non-Gaussian Analysis

  • Ng, Kam Swee;Yang, Hyung-Jeong;Kim, Soo-Hyung;Kim, Sun-Hee;Anh, Nguyen Thi Ngoc
    • International Journal of Contents / v.8 no.1 / pp.23-29 / 2012
  • Multiple expression levels of genes obtained from time series microarray experiments have been exploited effectively to enhance understanding of a wide range of biological phenomena. However, microarray data usually take the form of large, high-dimensional matrices of gene expression values. Among the huge number of genes present in microarrays, only a small number are expected to be effective for performing a given task. Hence, disregarding the majority of unaffected genes is the crucial goal of gene selection to improve the accuracy of disease diagnosis. In this paper, a non-Gaussian weight matrix obtained from an incremental model is proposed to extract useful features from multivariate time series microarrays. The proposed method can automatically identify a small number of significant features by discovering hidden variables among a huge number of features. A representative unsupervised hierarchical clustering method is then used to evaluate the effectiveness of the proposed methodology. The proposed method achieves promising results in terms of the predictive accuracy of clustering compared to existing methods of analysis. Furthermore, it offers a robust approach with low memory and computation costs.

Different Way of LMP/TAP/MHC Gene Clustering in Vertebrates, Viviparity and Anti-tumor Immunity Failure

  • Bubanovic, Ivan;Najman, Stevo
    • Animal cells and systems / v.9 no.1 / pp.1-7 / 2005
  • Class I and class II MHC genes have been identified in most jawed vertebrate taxa. In all investigated bony fish species, unlike in mammals, the classical class I and class II MHC genes are not linked and are even found on different chromosomes. Linking and clustering of the class I and class II MHC genes is not the only phenomenon clearly detected in the evolution of the immune system from cartilaginous fish to mammals. In all non-mammalian classes the LMP/TAP genes are highly conserved within the class I gene region, whereas only in mammals are these genes conserved within the class II gene region. Today we know that LMP/TAP genes in mammals have a crucial role in peptide processing for presentation within class I molecules, as well as in anti-tumor immunity. For these reasons, differences in the clustering of LMP/TAP/MHC genes may be responsible for the differences in the mechanisms and efficacy of anti-tumor immunity in non-mammalian vertebrates compared to the same mechanisms in mammals. The differences in cytokine networks and anti-tumor antigen presentation among classes of vertebrates can also be explained by the peculiarity of LMP/TAP/MHC gene clustering.

The clustering of critical points in the evolving cosmic web

  • Shim, Junsup;Codis, Sandrine;Pichon, Christophe;Pogosyan, Dmitri;Cadiou, Corentin
    • The Bulletin of The Korean Astronomical Society / v.46 no.1 / pp.47.2-47.2 / 2021
  • Focusing on both small separations and baryonic acoustic oscillation scales, the cosmic evolution of the clustering properties of peak, void, wall, and filament-type critical points is measured using two-point correlation functions in ΛCDM dark matter simulations as a function of their relative rarity. A qualitative comparison to the corresponding theory for Gaussian random fields allows us to understand the following observed features: (i) the appearance of an exclusion zone at small separation, whose size depends both on rarity and signature (i.e. the number of negative eigenvalues) of the critical points involved; (ii) the amplification of the baryonic acoustic oscillation bump with rarity and its reversal for cross-correlations involving negatively biased critical points; (iii) the orientation-dependent small-separation divergence of the cross-correlations of peaks and filaments (respectively voids and walls) that reflects the relative loci of such points in the filament's (respectively wall's) eigenframe. The (cross-) correlations involving the most non-linear critical points (peaks, voids) display significant variation with redshift, while those involving less non-linear critical points seem mostly insensitive to redshift evolution, which should prove advantageous to model. The ratios of distances to the maxima of the peak-to-wall and peak-to-void over that of the peak-to-filament cross-correlation are ~√2 and ~√3, respectively, which could be interpreted as the cosmic crystal being on average close to a cubic lattice. The insensitivity to redshift evolution suggests that the absolute and relative clustering of critical points could become a topologically robust alternative to standard clustering techniques when analysing upcoming surveys such as Euclid or Large Synoptic Survey Telescope (LSST).
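
The cubic-lattice reading of the distance ratios can be checked with a short worked example. The placement used here is an illustrative assumption, not something stated in the abstract: peaks sit at the vertices of a cubic cell, filaments at edge midpoints, walls at face centres, and voids at the cell centre.

```python
import math

a = 1.0  # lattice spacing (arbitrary units, assumed)

# Assumed loci of critical points within one cubic cell.
peak = (0.0, 0.0, 0.0)               # vertex
filament = (a / 2, 0.0, 0.0)         # edge midpoint
wall = (a / 2, a / 2, 0.0)           # face centre
void = (a / 2, a / 2, a / 2)         # cell centre

d_filament = math.dist(peak, filament)
ratio_wall = math.dist(peak, wall) / d_filament   # peak-to-wall over peak-to-filament
ratio_void = math.dist(peak, void) / d_filament   # peak-to-void over peak-to-filament

print(ratio_wall, ratio_void)  # numerically sqrt(2) and sqrt(3)
```

Under this placement the face diagonal and the body diagonal of the half-cell give the two ratios directly, independent of the spacing `a`.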

Spatial Analysis of Common Gastrointestinal Tract Cancers in Counties of Iran

  • Soleimani, Ali;Hassanzadeh, Jafar;Motlagh, Ali Ghanbari;Tabatabaee, Hamidreza;Partovipour, Elham;Keshavarzi, Sareh;Hossein, Mohammad
    • Asian Pacific Journal of Cancer Prevention / v.16 no.9 / pp.4025-4029 / 2015
  • Background: Gastrointestinal tract cancers are among the most common cancers in Iran and comprise approximately 38% of all reported cases of cancer. This study aimed to describe the epidemiology and to investigate the spatial clustering of common cancers of the gastrointestinal tract across the counties of Iran using full Bayesian smoothing and Moran's I statistics. Materials and Methods: Data from the national cancer registry were used in this study. Indirect standardized rates were calculated for 371 counties of Iran and smoothed using WinBUGS 1.4 software with a full Bayesian method. Global Moran's I and local Moran's I were also used to investigate clustering. Results: According to the results, 75,644 new cases of cancer were nationally registered in Iran, among which 18,019 cases (23.8%) were esophagus, gastric, colorectal, and liver cancers. The results of the Global Moran's I test were 0.60 (P=0.001), 0.47 (P=0.001), 0.29 (P=0.001), and 0.40 (P=0.001) for esophagus, gastric, colorectal, and liver cancers, respectively. This shows clustering of the four studied cancers in Iran at the national level. Conclusions: High-level clustering of cases was seen in northern, northwestern, western, and northeastern areas for esophagus, gastric, and colorectal cancers. For liver cancer, high clustering was observed in some counties in central, northeastern, and southern areas.
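
For readers unfamiliar with the statistic, the following is a minimal sketch of global Moran's I using its standard definition. The four "counties" and their chain-shaped adjacency matrix are made-up toy data, not values from the study, and the function name is hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij:
    I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    where m is the mean of the values and W is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * num / den

# Toy example: four areas in a row, each adjacent to its neighbours.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10, 9, 2, 1]      # similar rates sit next to each other
alternating = [10, 1, 10, 1]   # high and low rates alternate

print(morans_i(clustered, adjacency))    # positive: spatial clustering
print(morans_i(alternating, adjacency))  # negative: spatial dispersion
```

Positive values such as those reported in the abstract indicate that counties with similar rates tend to be neighbours. Significance testing (the reported P values) is a separate step and is not sketched here.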

Audio signal clustering and separation using a stacked autoencoder (복층 자기부호화기를 이용한 음향 신호 군집화 및 분리)

  • Jang, Gil-Jin
    • The Journal of the Acoustical Society of Korea / v.35 no.4 / pp.303-309 / 2016
  • This paper proposes a novel approach to the problem of audio signal clustering using a stacked autoencoder. The proposed stacked autoencoder learns an efficient representation of the input signal and enables constituent signals with similar characteristics to be clustered, so that the original sources can be separated based on the clustering results. The STFT (Short-Time Fourier Transform) is performed to extract the time-frequency spectrum, and rectangular windows at all possible locations are used as input values to the autoencoder. The outputs of the middle (encoding) layer are used to cluster the rectangular windows, and the original sources are separated by Wiener filters derived from the clustering results. Source separation experiments were carried out in comparison with conventional NMF (Non-negative Matrix Factorization), and the sources estimated by the proposed method represent the characteristics of the original sources well, as shown in the time-frequency representation.
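
The final separation step can be sketched as Wiener-style masking of the mixture spectrogram. This is only an illustration under assumptions: the abstract does not say how per-source power estimates are obtained from the clusters, so the sketch simply takes them as given, and the function names are hypothetical.

```python
def wiener_masks(power_estimates, eps=1e-12):
    """Build one mask per source from per-source power spectrograms.
    Each mask value is P_k / sum_j P_j for that time-frequency bin.
    power_estimates: list of equally shaped 2-D lists (time x frequency)."""
    n_t = len(power_estimates[0])
    n_f = len(power_estimates[0][0])
    masks = []
    for p in power_estimates:
        mask = [[p[t][f] / (sum(q[t][f] for q in power_estimates) + eps)
                 for f in range(n_f)]
                for t in range(n_t)]
        masks.append(mask)
    return masks

def apply_masks(mixture, masks):
    """Multiply the mixture spectrogram (real or complex) by each mask."""
    return [[[m[t][f] * mixture[t][f] for f in range(len(mixture[0]))]
             for t in range(len(mixture))]
            for m in masks]

# Toy usage with two assumed sources on a 2 x 2 time-frequency grid.
power_a = [[4.0, 0.0], [1.0, 1.0]]
power_b = [[0.0, 4.0], [1.0, 3.0]]
mixture = [[2 + 0j, 2j], [1 + 1j, 2 + 0j]]
source_a, source_b = apply_masks(mixture, wiener_masks([power_a, power_b]))
print(source_a)
print(source_b)
```

Because the masks sum to one in every bin where some source has power, the separated spectrograms add back up to the mixture.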

Design and development of the clustering algorithm considering weight in spatial data mining (공간 데이터 마이닝에서 가중치를 고려한 클러스터링 알고리즘의 설계와 구현)

  • 김호숙;임현숙;용환승
    • Journal of Intelligence and Information Systems / v.8 no.2 / pp.177-187 / 2002
  • Spatial data mining is the process of discovering interesting relationships and characteristics that exist implicitly in a spatial database. Many spatial clustering algorithms have been developed, but few approaches focus simultaneously on clustering spatial data and assigning weights to the non-spatial attributes of objects. In this paper, we propose a new spatial clustering algorithm, called DBSCAN-W, which is an extension of the existing density-based clustering algorithm DBSCAN. DBSCAN considers only the location of objects when clustering, whereas DBSCAN-W considers not only the location of each object but also its non-spatial attributes relevant to a given application. In DBSCAN-W, each datum has a region represented as a circle of varying radius, where the radius expresses the degree of importance of the object in the application. Through experiments, we show that DBSCAN-W is effective in generating clusters that reflect the users' requirements.
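
One way to picture the idea of giving each datum a circular region is the sketch below. It is not the published DBSCAN-W algorithm: the neighbourhood rule (two objects are neighbours when the gap between their circles is at most `eps`) is an assumption made for illustration, and the function names are hypothetical. With all radii set to zero it reduces to plain DBSCAN.

```python
import math

NOISE = -1

def region_query(points, radii, i, eps):
    """Indices of objects whose circles come within eps of circle i
    (assumed rule; the object itself is always included)."""
    return [j for j in range(len(points))
            if math.dist(points[i], points[j]) - radii[i] - radii[j] <= eps]

def dbscan_weighted(points, radii, eps, min_pts):
    """DBSCAN-style clustering with a per-object radius (importance)."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, radii, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joined, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, radii, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)   # core point: keep expanding
        cluster += 1
    return labels

points = [(0, 0), (1, 0), (2, 0), (5, 0)]
print(dbscan_weighted(points, [0, 0, 0, 0], eps=1.0, min_pts=2))
print(dbscan_weighted(points, [0, 0, 0, 2.0], eps=1.0, min_pts=2))
```

In the second call the distant object is given a large radius, so under the assumed rule its region reaches the cluster and it is no longer labelled as noise.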

Practical Privacy-Preserving DBSCAN Clustering Over Horizontally Partitioned Data (다자간 환경에서 프라이버시를 보호하는 효율적인 DBSCAN 군집화 기법)

  • Kim, Gi-Sung;Jeong, Ik-Rae
    • Journal of the Korea Institute of Information Security & Cryptology / v.20 no.3 / pp.105-111 / 2010
  • We propose a practical privacy-preserving clustering protocol over horizontally partitioned data. We extend the DBSCAN clustering algorithm into a distributed protocol in which data providers mix real data with fake data to provide privacy. Our protocol is very efficient, whereas previous privacy-preserving protocols for distributed environments are not practical for real applications. Its efficiency over horizontally partitioned data is comparable to that of privacy-preserving clustering protocols in non-distributed environments.

Design and Development of Clustering Algorithm Considering Influences of Spatial Objects (공간객체의 영향력을 고려한 클러스터링 알고리즘의 설계와 구현)

  • Kim, Byung-Cheol
    • The Journal of the Korea Contents Association / v.6 no.12 / pp.113-120 / 2006
  • This paper proposes DBSCAN-SI, an algorithm for clustering that takes the influence of spatial objects into account. DBSCAN-SI, which extends the existing DBSCAN and DBSCAN-W algorithms, converts non-spatial properties into the influence of spatial objects during spatial clustering. The higher an object's influence, as determined by the properties used in clustering, the higher its probability of being included in a cluster, so that clustering reflects not only spatial distance but also the magnitude of influence. From a property-centered perspective, the proposed technique can make up for a disadvantage of existing algorithms, which exclude highly influential objects from a cluster when few objects lie close to them around the cluster.

The extension of the largest generalized-eigenvalue based distance metric $D_{ij}(\gamma_1)$ in arbitrary feature spaces to classify composite data points

  • Daoud, Mosaab
    • Genomics & Informatics / v.17 no.4 / pp.39.1-39.20 / 2019
  • Analyzing patterns in data points embedded in linear and non-linear feature spaces is a common research problem across different research areas, such as data mining, machine learning, pattern recognition, and multivariate analysis. In this paper, data points are heterogeneous sets of biosequences (composite data points). A composite data point is a set of ordinary data points (e.g., a set of feature vectors). We theoretically extend the derivation of the largest generalized eigenvalue-based distance metric $D_{ij}(\gamma_1)$ to any linear and non-linear feature spaces. We prove that $D_{ij}(\gamma_1)$ is a metric under any linear and non-linear feature transformation function. We show the sufficiency and efficiency of using the decision rule $\bar{{\delta}}_{{\Xi}i}$ (i.e., the mean of $D_{ij}(\gamma_1)$) in the classification of heterogeneous sets of biosequences compared with the decision rules $\min_{{\Xi}i}$ and $\mathrm{median}_{{\Xi}i}$. We analyze the impact of linear and non-linear transformation functions on classifying and clustering collections of heterogeneous sets of biosequences. We also show empirically how the length of a sequence in a heterogeneous sequence-set generated by simulation affects the classification and clustering results in linear and non-linear feature spaces. We propose a new concept: the limiting dispersion map of the existing clusters in heterogeneous sets of biosequences embedded in linear and non-linear feature spaces, which is based on the limiting distribution of nucleotide compositions estimated from real data sets. Finally, empirical conclusions and scientific evidence are drawn from the experiments to support the theoretical results stated in this paper.