• Title/Summary/Keyword: Clustering test

Gene Screening and Clustering of Yeast Microarray Gene Expression Data (효모 마이크로어레이 유전자 발현 데이터에 대한 유전자 선별 및 군집분석)

  • Lee, Kyung-A;Kim, Tae-Houn;Kim, Jae-Hee
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1077-1094 / 2011
  • We perform clustering analyses of yeast cell-cycle microarray expression data. To reflect the characteristics of time-course data, we screen the genes using test statistics based on Fourier coefficients, applying an FDR procedure. We compare the results of model-based clustering, K-means, PAM, SOM, the hierarchical Ward method, and the fuzzy method on the yeast data. As validity measures for the clustering results, connectivity, the Dunn index, and silhouette values are computed and compared. A biological interpretation with GO analysis is also included.
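Two of the validity measures named in this abstract, silhouette values and the Dunn index, can be illustrated with a minimal sketch. It assumes Euclidean distance, at least two clusters, and the common definition of the Dunn index as the smallest between-cluster distance divided by the largest cluster diameter; the function names and the toy points are hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(points, labels):
    """Average silhouette width over all points (closer to 1 is better).

    Assumes at least two distinct cluster labels.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: its silhouette is taken as 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def dunn_index(points, labels):
    """Smallest between-cluster distance over largest within-cluster distance."""
    min_between = math.inf
    max_within = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = euclidean(points[i], points[j])
            if labels[i] == labels[j]:
                max_within = max(max_within, d)
            else:
                min_between = min(min_between, d)
    return min_between / max_within


# Toy data invented for illustration: two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]
bad_labels = [0, 1, 0, 1]
print(mean_silhouette(toy_points, good_labels), dunn_index(toy_points, good_labels))
print(mean_silhouette(toy_points, bad_labels), dunn_index(toy_points, bad_labels))
```

Both measures reward compact, well-separated clusters, so the first labeling should score higher than the second on each.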

Emergent damage pattern recognition using immune network theory

  • Chen, Bo;Zang, Chuanzhi
    • Smart Structures and Systems / v.8 no.1 / pp.69-92 / 2011
  • This paper presents an emergent pattern recognition approach based on immune network theory and hierarchical clustering algorithms. The immune network allows its components to change and learn patterns by changing the strength of connections between individual components. The presented immune-network-based approach achieves emergent pattern recognition by dynamically generating an internal image for the input data patterns. The members of the internal image (feature vectors for each data pattern) are produced by an immune network model to form a network of antibody memory cells. To assign antibody memory cells to different data patterns, hierarchical clustering algorithms are used to cluster the antibody memory cells. In addition, evaluation graphs and the L method are used to determine the best number of clusters. The presented immune-network-based emergent pattern recognition (INEPR) algorithm can automatically generate an internal image mapping to the input data patterns without the number of patterns being specified in advance. The INEPR algorithm has been tested on a benchmark civil structure. The test results show that it is able to recognize new structural damage patterns.

Development of Datamining Roadmap and Its Application to Water Treatment Plant for Coagulant Control (데이터마이닝 로드맵 개발과 수처리 응집제 제어를 위한 데이터마이닝 적용)

  • Bae, Hyeon;Kim, Sung-Shin;Kim, Ye-Jin
    • Journal of the Korea Institute of Information and Communication Engineering / v.9 no.7 / pp.1582-1587 / 2005
  • Rule extraction, one category of data mining, was performed for coagulant control of a water treatment plant. Clustering methods were applied to extract control rules from data. These control rules can be used for full automation of water treatment plants in place of the operator's knowledge of plant control. Fuzzy clustering requires several coefficients to be determined, and such coefficients, for example clustering indices, have been studied for decades. In this study, statistical indices were used to calculate the number of clusters, and seed points were found using hierarchical clustering. These statistical approaches give information about the features of the clusters, so they can reduce computing cost and increase clustering accuracy. The proposed algorithm can play an important role in data mining and knowledge discovery.

Bootstrap Analysis and Major DNA Markers of BM4311 Microsatellite Locus in Hanwoo Chromosome 6

  • Yeo, Jung-Sou;Kim, Jae-Woo;Shin, Hyo-Sub;Lee, Jea-Young
    • Asian-Australasian Journal of Animal Sciences / v.17 no.8 / pp.1033-1038 / 2004
  • LOD scores related to marbling scores and a permutation test were applied to detect quantitative trait loci (QTL), and we selected BM4311 as a major locus. K-means clustering was applied to mine the major DNA markers of the BM4311 microsatellite locus on Hanwoo chromosome 6, and clustering on five traits produced three cluster groups. The three cluster groups were then classified according to six DNA markers. Finally, a bootstrap test, which calculates confidence intervals by resampling, was adopted to find the major DNA markers. We conclude that the major markers of the BM4311 locus on Hanwoo chromosome 6 are the 100 bp and 95 bp DNA markers.
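The bootstrap step in this abstract, calculating confidence intervals by resampling, can be sketched as a percentile bootstrap. This is a generic illustration under stated assumptions, not the authors' procedure: the function name `bootstrap_ci`, the choice of the mean as the statistic, and the trait scores are all hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, stat, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values).

    Draws n_resamples samples of len(values) with replacement, evaluates
    the statistic on each, and returns the alpha/2 and 1 - alpha/2
    percentiles of the sorted estimates.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Invented trait scores for animals carrying one hypothetical marker.
scores = [3, 4, 2, 5, 4, 3, 6, 4, 5, 3]
low, high = bootstrap_ci(scores, statistics.mean)
print(f"95% bootstrap CI for the mean score: [{low:.2f}, {high:.2f}]")
```

Comparing such intervals across markers, for instance checking whether they overlap, is one way a marker could be singled out as major; the abstract does not state the exact decision rule used.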

Bootstrapping of Hanwoo Chromosome17 Based on BMS1167 Microsatellite Locus

  • Lee, Jea-Young;Lee, Yong-Won;Yeo, Jung-Sou
    • Journal of the Korean Data and Information Science Society / v.18 no.1 / pp.175-184 / 2007
  • LOD scores and a permutation test for detecting and locating quantitative trait loci (QTL) for Hanwoo economic traits are described, and we selected the major BMS1167 locus for further analysis. K-means clustering was applied to mine the major DNA markers of the BMS1167 microsatellite locus on Hanwoo chromosome 17, and clustering on four traits produced three cluster groups. The three cluster groups are classified according to eight DNA markers. Finally, we employed a bootstrap test, which calculates confidence intervals by resampling, to find the major DNA markers. We conclude that the only major marker of the BMS1167 locus on Hanwoo chromosome 17 is the 100 bp DNA marker.

Bootstrapping and DNA marker Mining of BMS941 microsatellite locus in Hanwoo chromosome 17

  • Lee, Jea-Young;Bae, Jung-Hwan;Yeo, Jung-Sou
    • Journal of the Korean Data and Information Science Society / v.18 no.4 / pp.1103-1113 / 2007
  • LOD scores and a permutation test for detecting and locating quantitative trait loci (QTL) for Hanwoo economic traits are described, and we selected the major BMS941 locus. K-means clustering analysis of eight markers in BMS941 and four traits resulted in three cluster groups. Finally, we applied a bootstrap test to calculate confidence intervals for finding the major DNA markers. We conclude that the major markers of the BMS941 locus on Hanwoo chromosome 17 are the 85 bp and 105 bp markers.

Unsupervised Speaker Adaptation Based on Sufficient HMM Statistics (SUFFICIENT HMM 통계치에 기반한 UNSUPERVISED 화자 적응)

  • Ko Bong-Ok;Kim Chong-Kyo
    • Proceedings of the KSPS conference / 2003.05a / pp.127-130 / 2003
  • This paper describes an efficient method for unsupervised speaker adaptation. The method selects a subset of speakers who are acoustically close to a test speaker and calculates adapted model parameters from the previously stored sufficient HMM statistics of the selected speakers' data. Only a small amount of unsupervised data from the test speaker is required for the adaptation, and using the stored sufficient HMM statistics makes the adaptation quick. Compared with a pre-clustering method, the proposed method can obtain a better speaker cluster because the clustering result is determined online from the test speaker's data. Experimental results show that the proposed method improves on the speaker-independent model more than MLLR does. Moreover, the proposed method uses only one unsupervised sentence utterance, while MLLR usually uses more than ten supervised sentence utterances.

Improving the Performance of Document Clustering with Distributional Similarities (분포유사도를 이용한 문헌클러스터링의 성능향상에 대한 연구)

  • Lee, Jae-Yun
    • Journal of the Korean Society for information Management / v.24 no.4 / pp.267-283 / 2007
  • In this study, measures of distributional similarity such as KL-divergence are applied to cluster documents in place of the traditional cosine measure, which is the most prevalent vector similarity measure for document clustering. Three variations of KL-divergence are investigated: Jensen-Shannon divergence, symmetric skew divergence, and minimum skew divergence. To verify the contribution of distributional similarities to document clustering, two experiments are designed and carried out on three test collections. In the first experiment, the clustering performance of the three divergence measures is compared with that of the cosine measure. The results show that minimum skew divergence outperforms the other divergence measures as well as the cosine measure. In the second experiment, second-order distributional similarities are calculated with the Pearson correlation coefficient from the first-order similarity matrices. The results of the second experiment show that second-order distributional similarities improve the overall performance of document clustering. These results suggest that minimum skew divergence should be selected as the document vector similarity measure when both time and accuracy matter, and that second-order similarity is a good choice when only clustering accuracy matters.
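The divergence measures named in this abstract can be sketched for two term distributions as follows. KL-divergence and Jensen-Shannon divergence follow their standard definitions. The skew divergence follows the usual form KL(p || αq + (1 − α)p), and the symmetric and minimum variants shown here (the average and the minimum of the two directions) are an assumption about how such variants could be defined, not a confirmed reading of the paper. All function names and the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; infinite if q is zero where p is positive."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, finite, at most ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def skew_divergence(p, q, alpha=0.99):
    """KL of p against q smoothed with a small share of p (alpha < 1)."""
    mixed = [alpha * qi + (1 - alpha) * pi for pi, qi in zip(p, q)]
    return kl_divergence(p, mixed)


def symmetric_skew_divergence(p, q, alpha=0.99):
    """Assumed definition: average of the two skew directions."""
    return 0.5 * (skew_divergence(p, q, alpha) + skew_divergence(q, p, alpha))


def minimum_skew_divergence(p, q, alpha=0.99):
    """Assumed definition: smaller of the two skew directions."""
    return min(skew_divergence(p, q, alpha), skew_divergence(q, p, alpha))


# Invented term distributions over a three-word vocabulary.
doc_a = [0.5, 0.5, 0.0]
doc_b = [0.0, 0.5, 0.5]
print(kl_divergence(doc_a, doc_b))   # infinite: doc_b lacks the first term
print(js_divergence(doc_a, doc_b))
print(minimum_skew_divergence(doc_a, doc_b))
```

Plain KL-divergence is undefined or infinite whenever one document lacks a term that the other contains, which is common for sparse document vectors; the Jensen-Shannon and skew forms avoid this by mixing the two distributions before comparing them.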

A Dimensionality Assessment for Polytomously Scored Items Using DETECT

  • Kim, Hae-Rim
    • Communications for Statistical Applications and Methods / v.7 no.2 / pp.597-603 / 2000
  • A versatile dimensionality assessment index, DETECT, was developed for binary item response data by Kim (1994). The present paper extends the use of DETECT to polytomously scored item data. A simulation study shows that DETECT performs well in differentiating multidimensional data from unidimensional data by yielding a greater value of DETECT in the case of multidimensionality. Additional investigation is needed into dimensionally meaningful clustering methods, such as HAC for binary data, that are particularly sensitive to polytomous data.

Design and Implementation of Spatial Clustering Method using Regular Grid (균등 격자를 이용한 공간 클러스터링 기법의 설계 및 구현)

  • 문상호
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference / 2003.05a / pp.485-489 / 2003
  • Several clustering methods for spatial data mining have been devised in the literature, but they share a drawback: the cost of calculating distances among objects is high. To solve this problem, we propose a spatial clustering method using regular cells. In this paper, we design and implement file structures, data structures, and algorithms to realize the proposed method, and we present experimental results obtained by applying test data to the implementation.
