• Title/Summary/Keyword: Clustering Coefficient


Gene Screening and Clustering of Yeast Microarray Gene Expression Data (효모 마이크로어레이 유전자 발현 데이터에 대한 유전자 선별 및 군집분석)

  • Lee, Kyung-A;Kim, Tae-Houn;Kim, Jae-Hee
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1077-1094 / 2011
  • We perform clustering analyses of yeast cell-cycle microarray expression data. To reflect the time-course nature of the data, we screen genes using test statistics based on Fourier coefficients combined with a false discovery rate (FDR) procedure. We compare the results of model-based clustering, K-means, PAM, SOM, Ward's hierarchical method, and the fuzzy method on the yeast data. As validity measures for the clustering results, connectivity, the Dunn index, and silhouette values are computed and compared. A biological interpretation based on GO analysis is also included.
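
The Dunn index named above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes Euclidean distance, takes the separation of two clusters as the distance between their closest members, and takes a cluster's diameter as its largest pairwise distance. These are common choices that the abstract does not specify. `dunn_index` is a hypothetical helper name. Larger values indicate compact, well-separated clusters.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Dunn index = min inter-cluster distance / max intra-cluster diameter.

    Assumed definition: Euclidean distance, closest-member (single-linkage)
    separation, and largest pairwise distance as a cluster's diameter.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest distance between two members of the same cluster.
    max_diameter = max(
        (dist(a, b) for members in clusters.values() for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between members of two different clusters.
    min_separation = min(
        dist(a, b)
        for c1, c2 in combinations(clusters.values(), 2)
        for a in c1
        for b in c2
    )
    if max_diameter == 0.0:
        return float("inf")  # every cluster collapses to a single point
    return min_separation / max_diameter
```

For two tight pairs of points five units apart, `dunn_index([(0, 0), (0, 1), (5, 0), (5, 1)], [0, 0, 1, 1])` returns 5.0 (separation 5, diameter 1).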

Comparative Analysis of Learning Methods of Fuzzy Clustering-based Neural Network Pattern Classifier (퍼지 클러스터링기반 신경회로망 패턴 분류기의 학습 방법 비교 분석)

  • Kim, Eun-Hu;Oh, Sung-Kwun;Kim, Hyun-Ki
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.9 / pp.1541-1550 / 2016
  • In this paper, we introduce a novel learning methodology for a fuzzy clustering-based neural network pattern classifier. The classifier describes the patterns of given classes using fuzzy rules and classifies unseen data through those rules. The least squares estimator (LSE) or the weighted least squares estimator (WLSE) is typically used to estimate the coefficients of the polynomial functions, but this study proposes a new coefficient estimation method that combines the advantages of the existing methods. The premise part of each fuzzy rule describes the input space as the "If" clause through fuzzy c-means (FCM) clustering. The consequent part describes the output space through a polynomial function, such as a linear or quadratic function, whose coefficients are estimated by the proposed local least squares estimator (LLSE)-based learning. To evaluate the performance of the proposed pattern classifier, a variety of machine learning data sets are used in experiments. A comparative performance analysis shows that the proposed LLSE-based learning method is preferable to the learning methods conventionally used in previous literature.
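
The premise part relies on FCM memberships. The sketch below covers that step only, because the proposed LLSE consequent estimation is not described in enough detail to reproduce. It uses the standard FCM membership of a point x in cluster i with fuzzifier m > 1: u_i = 1 / Σ_j (d_i / d_j)^(2/(m-1)), where d_i is the distance from x to prototype v_i. The function name `fcm_memberships` and the default m = 2 are assumptions for illustration.

```python
from math import dist


def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means membership of point x in each prototype.

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), with d_i = ||x - v_i||.
    The returned memberships sum to 1.
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    d = [dist(x, v) for v in centers]
    # A point sitting exactly on a prototype gets full membership there
    # (split evenly if several prototypes coincide with it).
    zero = [i for i, di in enumerate(d) if di == 0.0]
    if zero:
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(d))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in d) for di in d]
```

With prototypes at 0 and 3 on a line and m = 2, the point 1 gets memberships of 0.8 and 0.2. These act as the rule activation levels that weight each rule's polynomial consequent.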

Evaluating the Performance of Four Selections in Genetic Algorithms-Based Multispectral Pixel Clustering

  • Kutubi, Abdullah Al Rahat;Hong, Min-Gee;Kim, Choen
    • Korean Journal of Remote Sensing / v.34 no.1 / pp.151-166 / 2018
  • This paper compares the performance of four selection methods used in genetic algorithms (GAs) that automatically optimize multispectral pixel clusters for unsupervised classification of KOMPSAT-3 data. Selection is the focus because, alongside crossover and mutation, it is one of the three main types of operators and the driving force that determines the overall operation of clustering GAs. Experimental results demonstrate that tournament selection performs better than the other selections, especially in terms of both the number of generations and the convergence rate. However, it is computationally more expensive than elitism selection. Elitism selection has the slowest convergence rate in the comparison and a lower probability of finding optimal cluster centers than the other selections. Rank-based selection and proportional roulette wheel selection show similar performance in average Euclidean distance for pixel clustering, even though rank-based selection is computationally much more expensive than proportional roulette wheel selection. With respect to finding the global optimum, tournament selection has a higher potential to reach it than rank-based selection, which spends a lot of computational time on fitness smoothing. The tournament selection-based clustering GA successfully classifies the KOMPSAT-3 multispectral data with sufficient thematic accuracy (a Kappa coefficient of 0.923).
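
Two of the four compared selection operators can be sketched as below. These are textbook forms of roulette wheel and tournament selection, assuming that fitness is non-negative and is to be maximized. They are not taken from the paper. The function names and the default tournament size k = 2 are hypothetical. Rank-based and elitism selection are omitted.

```python
import random


def roulette_wheel_select(fitness, rng=random):
    """Fitness-proportional selection: index i is returned with
    probability fitness[i] / sum(fitness). Fitness must be non-negative."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("total fitness must be positive")
    r = rng.random() * total  # uniform point on the wheel, in [0, total)
    running = 0.0
    for i, f in enumerate(fitness):
        running += f
        if r < running:
            return i
    # Floating-point round-off fallback: last individual with positive fitness.
    return max(i for i, f in enumerate(fitness) if f > 0)


def tournament_select(fitness, k=2, rng=random):
    """Tournament selection: draw k distinct individuals uniformly at random
    and return the index of the fittest one."""
    contestants = rng.sample(range(len(fitness)), k)
    return max(contestants, key=lambda i: fitness[i])
```

Tournament selection only compares fitness values, so it is insensitive to fitness scaling. Roulette wheel selection, by contrast, depends on the raw fitness magnitudes.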

Smallest-Small-World Cellular Genetic Algorithms (최소좁은세상 셀룰러 유전알고리즘)

  • Kang, Tae-Won
    • Journal of KIISE: Software and Applications / v.34 no.11 / pp.971-983 / 2007
  • Cellular Genetic Algorithms (CGAs) are a subclass of Genetic Algorithms (GAs) in which each individual is placed in a given geographical distribution. From the viewpoint of network theory, the population space of a CGA is generally a regular network with a relatively long characteristic path length and a high clustering coefficient. A long average path length slows the genetic interaction between remote nodes. If the population's path length is shortened while the clustering coefficient is kept high, the CGA's population space will converge faster without loss of diversity. In this paper, we propose Smallest-Small-World Cellular Genetic Algorithms (SSWCGAs). In SSWCGAs, each individual lives in a population space that is highly clustered but has a shorter characteristic path length. SSWCGAs therefore promote exploration of the search space without losing the exploitation tendency that comes from clustering. Experiments on four real-valued functions and two GA-hard problems show that SSWCGAs are more effective than SGAs and CGAs.
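
The two network measures in this abstract can be computed directly. The sketch below uses hypothetical helper names. It builds a regular ring lattice, in which each node is linked to its k nearest neighbors on each side (one kind of regular network). It then measures the lattice's average clustering coefficient, using the Watts-Strogatz local definition, and its characteristic path length by breadth-first search. It does not reproduce the SSWCGA topology, which the abstract does not specify.

```python
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Regular ring: each node links to its k nearest neighbors on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            u = (v + step) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def clustering_coefficient(adj):
    """Average local clustering coefficient: for each node, the fraction of
    pairs of its neighbors that are themselves linked."""
    total = 0.0
    for v, nbrs in adj.items():
        deg = len(nbrs)
        if deg < 2:
            continue  # such nodes contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (deg * (deg - 1) / 2)
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all ordered node pairs (BFS from each
    node; the graph is assumed to be connected)."""
    total, pairs = 0, 0
    for src in adj:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in seen:
                    seen[u] = seen[v] + 1
                    queue.append(u)
        total += sum(seen.values())
        pairs += len(seen) - 1
    return total / pairs
```

For `ring_lattice(20, 2)` the clustering coefficient is 0.5 and the characteristic path length is 55/19 (about 2.89). Adding a single long-range edge such as 0-10 shortens the path length while leaving most local neighborhoods intact, which is the small-world effect the abstract relies on.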

Lossless Compression for Hyperspectral Images based on Adaptive Band Selection and Adaptive Predictor Selection

  • Zhu, Fuquan;Wang, Huajun;Yang, Liping;Li, Changguo;Wang, Sen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.8 / pp.3295-3311 / 2020
  • As hyperspectral images are more widely applied, compressing them becomes increasingly important. The conventional recursive least squares (CRLS) algorithm has great potential for lossless compression of hyperspectral images. The prediction accuracy of CRLS is closely related to the correlations between the reference bands and the current band, and to the similarity between pixels in the prediction context. Based on this characteristic, we present an improved CRLS with adaptive band selection and adaptive predictor selection (CRLS-ABS-APS). First, a k-means clustering algorithm based on the spectral vector correlation coefficient is employed to generate a clustering map. Next, an adaptive band selection strategy based on the inter-spectral correlation coefficient is adopted to select the reference bands for each band. Then, an adaptive predictor selection strategy based on the clustering map is adopted to select the optimal CRLS predictor for each pixel. In addition, a double snake scan mode is used to further improve the similarity of the prediction context, and a recursive average estimation method is used to accelerate the local average calculation. Finally, the prediction residuals are entropy-coded by an arithmetic coder. Experiments on the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) 2006 data set show that CRLS-ABS-APS achieves average bit rates of 3.28 bpp, 5.55 bpp and 2.39 bpp on the three subsets, respectively. The results indicate that CRLS-ABS-APS effectively improves compression with lower computational complexity and outperforms current state-of-the-art methods.
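
The abstract says reference bands are chosen using the inter-spectral correlation coefficient but does not give the exact rule. The sketch below works under stated assumptions. It computes the Pearson correlation between the current band and every previously coded band, then keeps the `num_refs` bands with the largest absolute correlation. The restriction to earlier bands, the use of absolute values, and the names `select_reference_bands` and `num_refs` are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant band is treated as uncorrelated
    return cov / (sx * sy)


def select_reference_bands(bands, current, num_refs=2):
    """Pick the previously coded bands most correlated with band `current`.

    `bands` holds one flattened list of pixel values per band. Assumption:
    only bands with a smaller index are candidates, so the choice can be
    signalled to a decoder that reconstructs bands in order.
    """
    ranked = sorted(
        range(current),
        key=lambda b: abs(pearson(bands[b], bands[current])),
        reverse=True,
    )
    return ranked[:num_refs]
```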

Dynamic Hysteresis Model Based on Fuzzy Clustering Approach

  • Mourad, Mordjaoui;Bouzid, Boudjema
    • Journal of Electrical Engineering and Technology / v.7 no.6 / pp.884-890 / 2012
  • A model of the hysteretic behavior of the soft magnetic materials commonly used in electrical machines and electronic devices is necessary for the numerical solution of Maxwell's equations. In this study, a new dynamic hysteresis model is presented that exploits the ability of fuzzy clustering algorithms to identify nonlinear dynamic systems from measured data. The developed model applies a Gustafson-Kessel (GK) fuzzy clustering approach to normalized data gathered from dynamic cycles measured on a C-core transformer made of 0.33 mm laminations of cold-rolled SiFe. The number of fuzzy rules is optimized using cluster validity measures such as the 'partition coefficient' and 'classification entropy'. The clustering results show that the GK approach is not only very accurate but also effective and promising for dynamic magnetic hysteresis modeling.
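
The two validity measures named here have standard definitions for a fuzzy membership matrix U with c clusters and N points. The partition coefficient is PC = (1/N) Σ_i Σ_k u_ik², which lies in [1/c, 1]. The classification entropy is CE = -(1/N) Σ_i Σ_k u_ik ln u_ik, which lies in [0, ln c]. A crisper partition gives higher PC and lower CE. The sketch below computes both. It assumes the row-per-cluster matrix layout and uses hypothetical function names. It does not implement GK clustering itself.

```python
from math import log


def partition_coefficient(u):
    """Bezdek's partition coefficient.

    `u` has one row per cluster and one column per data point, and each
    column sums to 1. Values lie in [1/c, 1]; higher means crisper.
    """
    n = len(u[0])
    return sum(uik ** 2 for row in u for uik in row) / n


def classification_entropy(u):
    """Classification entropy with the natural log; values lie in [0, ln c].

    Lower means crisper. Entries with u_ik = 0 contribute 0 (0 * ln 0 := 0).
    """
    n = len(u[0])
    return -sum(uik * log(uik) for row in u for uik in row if uik > 0) / n
```

To choose the number of rules, one would typically cluster with several candidate cluster counts and favor the count where PC is high and CE is low.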

Grouping stocks using dynamic linear models

  • Sihyeon, Kim;Byeongchan, Seong
    • Communications for Statistical Applications and Methods / v.29 no.6 / pp.695-708 / 2022
  • Recently, several studies have been conducted using state space models. In this study, a dynamic linear model in state space form is applied to stock data. The monthly returns of 135 Korean stocks are fitted to a dynamic linear model to obtain time series estimates of the time-varying β coefficient. The return is modeled with the capital asset pricing model (CAPM) formula from economics. In particular, the transition equation of the state space model is modified to satisfy the assumptions on the error term. k-shape clustering is performed to classify the 135 estimated β time series into several groups. The clustering yields four clusters, each consisting of approximately 30 stocks. The distribution differs across groups, indicating that each group has its own distinct characteristics. In addition, a common pattern that can be interpreted meaningfully is observed within each group.
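
k-shape clustering compares series with the shape-based distance (SBD), SBD(x, y) = 1 - max_w CC_w(x, y) / (‖x‖ ‖y‖). Here CC_w is the cross-correlation at shift w. The sketch below computes SBD directly in O(n²) for clarity, whereas practical k-shape implementations use the FFT. The function names are hypothetical, and the dynamic linear model that produces the β series is not sketched.

```python
from math import sqrt


def z_normalize(x):
    """Zero mean and unit (population) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    std = sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / std for v in x] if std > 0 else [0.0] * n


def shape_based_distance(x, y):
    """SBD = 1 - max over shifts of the normalized cross-correlation.

    Ranges from 0 (same shape up to a shift) to 2. For shift s, x[i] is
    paired with y[i - s] wherever both indices are valid.
    """
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    n = len(x)
    norm = sqrt(sum(v * v for v in x)) * sqrt(sum(v * v for v in y))
    if norm == 0:
        return 1.0  # assumption: an all-zero series is treated as uncorrelated
    best = max(
        sum(x[i] * y[i - s] for i in range(max(0, s), min(n, n + s)))
        for s in range(-(n - 1), n)
    )
    return 1.0 - best / norm
```

A series and a copy delayed by one month have an SBD of (numerically) zero. This shift invariance is why k-shape groups β series by the pattern of their movements rather than by their exact timing. k-shape z-normalizes each series before comparison, which `z_normalize` does here.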

Improving the Performance of Document Clustering with Distributional Similarities (분포유사도를 이용한 문헌클러스터링의 성능향상에 대한 연구)

  • Lee, Jae-Yun
    • Journal of the Korean Society for Information Management / v.24 no.4 / pp.267-283 / 2007
  • In this study, measures of distributional similarity such as KL divergence are applied to document clustering in place of the traditional cosine measure, which is the most prevalent vector similarity measure for document clustering. Three variations of KL divergence are investigated: Jensen-Shannon divergence, symmetric skew divergence, and minimum skew divergence. To verify the contribution of distributional similarities to document clustering, two experiments are designed and carried out on three test collections. In the first experiment, the clustering performance of the three divergence measures is compared with that of the cosine measure. The results showed that minimum skew divergence outperformed the other divergence measures as well as the cosine measure. In the second experiment, second-order distributional similarities are calculated with the Pearson correlation coefficient from the first-order similarity matrices. The results of the second experiment showed that second-order distributional similarities improve the overall performance of document clustering. These results suggest that minimum skew divergence should be chosen as the document vector similarity measure when both time and accuracy matter, and that second-order similarity is a good choice when only clustering accuracy matters.
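
The divergences compared here can be sketched as follows, treating each document as a term probability distribution. That representation is an assumption. The Jensen-Shannon divergence and the skew divergence s_α(q, p) = KL(p ‖ αq + (1 - α)p) follow their usual definitions. The abstract does not define the symmetric and minimum variants. Taking the average and the minimum of the two skew directions is an assumption, as are the function names and the default α = 0.99.

```python
from math import log


def kl_divergence(p, q):
    """KL(p || q) in nats; infinite if some p_i > 0 has q_i = 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return float("inf")
            total += pi * log(pi / qi)
    return total


def jensen_shannon(p, q):
    """Average KL divergence of p and q to their midpoint; lies in [0, ln 2]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def skew_divergence(p, q, alpha=0.99):
    """KL(p || alpha*q + (1 - alpha)*p); mixing in a little of p keeps it finite."""
    mix = [alpha * b + (1 - alpha) * a for a, b in zip(p, q)]
    return kl_divergence(p, mix)


def symmetric_skew(p, q, alpha=0.99):
    # Assumption: average of the two skew directions.
    return 0.5 * (skew_divergence(p, q, alpha) + skew_divergence(q, p, alpha))


def minimum_skew(p, q, alpha=0.99):
    # Assumption: the smaller of the two skew directions.
    return min(skew_divergence(p, q, alpha), skew_divergence(q, p, alpha))
```

Plain KL divergence is infinite whenever one document uses a term the other lacks, which is common for sparse term vectors. The skew and Jensen-Shannon forms avoid this by comparing against a mixture.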

A Comparative Study on Clustering Methods for Grouping Related Tags (연관 태그의 군집화를 위한 클러스터링 기법 비교 연구)

  • Han, Seung-Hee
    • Journal of the Korean Society for Library and Information Science / v.43 no.3 / pp.399-416 / 2009
  • In this study, methods for clustering related tags were examined to improve search and exploration in the tag space. The experiments were performed on 10 Delicious tags and the strongly related tags extracted from 300 documents for each tag. Hierarchical and non-hierarchical clustering methods were applied based on tag co-occurrences. To evaluate the experimental results, cluster relevance was measured. The results showed that Ward's method with the cosine coefficient, which performs well for term clustering, performed best and showed a consistent clustering tendency. Furthermore, the analysis showed that cluster membership among related tags reflects users' tagging purposes or interests and can disambiguate word senses. Therefore, tag clusters can help improve search and exploration in the tag space.
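
A cosine coefficient over tag co-occurrence vectors, the similarity that worked best here, can be sketched as below. The way the vectors are built is an assumption, since the abstract only says clustering is based on tag co-occurrences. Each tag gets a row counting how often it appears together with every other vocabulary tag across documents. The function names are hypothetical. The resulting similarities, or unit-normalized vectors, could then be fed to a hierarchical routine such as Ward's method.

```python
from math import sqrt


def cosine_coefficient(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cooccurrence_vectors(tag_sets, vocabulary):
    """Row t counts how often tag t co-occurs with each vocabulary tag across
    the documents' tag sets (self co-occurrence is left at 0)."""
    index = {t: i for i, t in enumerate(vocabulary)}
    vecs = {t: [0] * len(vocabulary) for t in vocabulary}
    for tags in tag_sets:
        present = [t for t in set(tags) if t in index]
        for a in present:
            for b in present:
                if a != b:
                    vecs[a][index[b]] += 1
    return vecs
```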

Image coding using quad-tree of wavelet coefficients (Wavelet coefficients의 quad-tree를 이용한 이미지 압축)

  • 김성탁;추형석;이태호;전희성;안종구
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2000.08a / pp.313-316 / 2000
  • The wavelet transform has properties that are well suited to image coding. The property exploited in this paper is the clustering of significant coefficients across subbands. Coefficients are classified as significant or insignificant with respect to a threshold value. EZW reduces the symbol information needed for coefficient positions by using zero-trees. However, as the threshold is lowered to increase resolution, coding the significant coefficients becomes expensive. To avoid this, this paper uses a quad-tree to represent coefficient-position information, while the magnitudes of significant coefficients are represented in the matrix used by EZW. The proposed algorithm is expected to reduce the coding cost.
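
The abstract does not specify the quad-tree symbol format, so the following is only a generic sketch of quad-tree significance coding (hypothetical function name). A block is marked 1 if any coefficient in it has magnitude at least the threshold T, and is then split into four quadrants. A block marked 0 is not split. In EZW-style coding, T usually starts at the largest power of two not exceeding the largest coefficient magnitude and is halved on each pass. The magnitudes of significant coefficients would be coded separately.

```python
def quadtree_significance(coeffs, threshold, row=0, col=0, size=None, out=None):
    """Emit a depth-first bit stream marking where |coefficient| >= threshold.

    A square block gets bit 1 if it contains any significant coefficient,
    else 0; only blocks marked 1 are split into four quadrants, down to
    single coefficients. `coeffs` must be a square list of lists whose side
    is a power of two.
    """
    if size is None:
        size = len(coeffs)
    if out is None:
        out = []
    significant = any(
        abs(coeffs[r][c]) >= threshold
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    out.append(1 if significant else 0)
    if significant and size > 1:
        half = size // 2
        # Visit quadrants in a fixed order: top-left, top-right, bottom-left, bottom-right.
        for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
            quadtree_significance(coeffs, threshold, row + dr, col + dc, half, out)
    return out
```

Take a 4x4 block whose only coefficient with magnitude of at least 32 sits in row 0, column 1. Its stream is `[1, 1, 0, 1, 0, 0, 0, 0, 0]`, which is 9 bits instead of a 16-bit flat significance map. The saving grows when significant coefficients are clustered, which is the property the paper exploits.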
