• Title/Summary/Keyword: Hierarchical Clustering Analysis

Search Result 250, Processing Time 0.024 seconds

A Composite Cluster Analysis Approach for Component Classification (컴포넌트 분류를 위한 복합 클러스터 분석 방법)

  • Lee, Sung-Koo
    • The KIPS Transactions:PartD / v.14D no.1 s.111 / pp.89-96 / 2007
  • Various classification methods have been developed to support component reuse. These methods let the user access the needed components quickly and easily. Conventional classification approaches have the following problems: they require a labor-intensive domain analysis effort to build a classification structure, they have difficulty representing inter-component relationships, they are hard to maintain as the domain evolves, and they apply only to a limited domain. To solve these problems, this paper describes a composite cluster analysis approach for component classification. The approach combines a hierarchical cluster analysis method, which automatically generates a stable clustering structure, with a non-hierarchical cluster analysis concept, which automatically classifies new components. The clustering information generated by the proposed approach can support the domain analysis process.
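
The two-stage idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the average-linkage merge rule, the nearest-centroid assignment, the function names (`agglomerate`, `centroid`, `classify`), and the 2-D feature vectors are all assumptions chosen for illustration.

```python
from math import dist

def agglomerate(points, k):
    """Hierarchical stage: merge the two closest clusters until k remain.

    Average linkage is an assumption; the abstract does not name a linkage.
    """
    clusters = [[p] for p in points]

    def avg_link(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: avg_link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge j into i (i < j)
        del clusters[j]
    return clusters

def centroid(cluster):
    """Component-wise mean of the points in one cluster."""
    n = len(cluster)
    return tuple(sum(coord) / n for coord in zip(*cluster))

def classify(new_point, clusters):
    """Non-hierarchical stage: index of the cluster with the nearest centroid."""
    cents = [centroid(c) for c in clusters]
    return min(range(len(cents)), key=lambda i: dist(new_point, cents[i]))

# Hypothetical feature vectors for six existing components.
components = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
clusters = agglomerate(components, k=2)

# A new component is placed without rebuilding the hierarchy.
label = classify((4.9, 5.3), clusters)
```

Here the hierarchy is built once from the existing components, and each new component is assigned to an existing cluster by a single distance comparison per cluster.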

Feature Extraction of Concepts by Independent Component Analysis

  • Chagnaa, Altangerel;Ock, Cheol-Young;Lee, Chang-Beom;Jaimai, Purev
    • Journal of Information Processing Systems / v.3 no.1 / pp.33-37 / 2007
  • Semantic clustering is important to various fields in the modern information society. In this work we applied the Independent Component Analysis method to the extraction of features of latent concepts. We used verb and object noun information and formulated a concept as a linear combination of verbs. The proposed method is shown to be suitable for our framework, and it performs better than hierarchical clustering in latent semantic space at uncovering hidden information in the data.

QCanvas: An Advanced Tool for Data Clustering and Visualization of Genomics Data

  • Kim, Nayoung;Park, Herin;He, Ningning;Lee, Hyeon Young;Yoon, Sukjoon
    • Genomics & Informatics / v.10 no.4 / pp.263-265 / 2012
  • We developed a user-friendly, interactive program to simultaneously cluster and visualize omics data, such as DNA and protein array profiles. This program provides diverse algorithms for the hierarchical clustering of two-dimensional data. The clustering results can be interactively visualized and optimized on a heatmap. The present tool does not require any prior knowledge of scripting languages to carry out the data clustering and visualization. Furthermore, the heatmaps allow the selective display of data points satisfying user-defined criteria. For example, a clustered heatmap of experimental values can be differentially visualized based on statistical values, such as p-values. Including diverse menu-based display options, QCanvas provides a convenient graphical user interface for pattern analysis and visualization with high-quality graphics.

A Method for Comparing Multiple Bacterial Community Structures from 16S rDNA Clone Library Sequences

  • Hur, Inae;Chun, Jongsik
    • Journal of Microbiology / v.42 no.1 / pp.9-13 / 2004
  • Culture-independent approaches based on 16S rDNA sequences are extensively used in modern microbial ecology. Sequencing of the clone library generated from environmental DNA has advantages over fingerprint-based methods, such as denaturing gradient gel electrophoresis, as it provides precise identification and quantification of the phylotypes present in samples. However, to date, no method exists for comparing multiple bacterial community structures using clone library sequences. In this study, an automated method to achieve this has been developed by applying pairwise alignment, hierarchical clustering, and principal component analysis. The method has been demonstrated to be successful in comparing samples from various environments. The program, named CommCluster, was written in Java and is now freely available at http://chunlab.snu.ac.kr/commcluster/.

Assessing Throughput and Availability based on Hierarchical Clustering in Wireless Sensor Networks (계층적 클러스터링을 기반으로 하는 무선 센서 네트워크의 Throughput 과 Availability 평가)

  • Lee, Jun-Hyuk;Oh, Young-Hwan
    • Journal of Applied Reliability / v.5 no.4 / pp.465-486 / 2005
  • An unreliable network system results in unsatisfactory performance. Throughput and availability are key performance criteria of a network. One of the most compelling technological advances of this decade has been the deployment of wireless networks of heterogeneous smart sensor nodes for complex information-gathering tasks. The advancement and popularization of wireless communication technologies make network devices with wireless technology more efficient than those with wired technology. Recently, research on wireless sensor networks has been drawing much attention. In this paper, we evaluate the throughput and availability of a wireless sensor network that has a hierarchical structure based on clustering, and we estimate the maximum throughput, average throughput, and availability of the network considering several link failure patterns likely to occur in a cluster consisting of sensor nodes. We also analyze the average throughput and availability of the network as the number of sensor nodes in a cluster increases.

Performance Comparison of Clustering Techniques for Spatio-Temporal Data (시공간 데이터를 위한 클러스터링 기법 성능 비교)

  • Kang, Nayoung;Kang, Juyoung;Yong, Hwan-Seung
    • Journal of Intelligence and Information Systems / v.10 no.2 / pp.15-37 / 2004
  • With the growth in the size of datasets, data mining has recently become an important research topic. In particular, interest has grown in spatio-temporal data mining, a method for analyzing massive spatio-temporal data collected from a wide variety of applications, such as GPS data, trajectory data from surveillance systems, and earth geographic data. In earlier approaches, conventional clustering algorithms were applied as spatio-temporal data mining techniques without any modification. In this paper, we focus on SOM, the most common clustering algorithm applied to clustering analysis in data mining, and develop a spatio-temporal data mining module based on it. In addition, we analyze the clustering results of the developed SOM module and compare them with those of the K-means and agglomerative hierarchical algorithms in terms of homogeneity, separation, silhouette width, and accuracy. We also developed a specialized visualization module for more accurate interpretation of the mining results.
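
Silhouette width, one of the comparison criteria named in this abstract, can be computed as in the sketch below. This is the standard definition with Euclidean distance, not code from the paper; the function name `silhouette_width` and the sample points and labelings are hypothetical.

```python
from math import dist

def silhouette_width(points, labels):
    """Mean silhouette over all points; expects at least two clusters.

    For each point: a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters score 0.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # dist(p, p) == 0, so summing over the whole cluster and dividing
        # by (size - 1) gives the mean distance to the *other* members.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Two well-separated groups, scored under two candidate labelings.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_width(points, [0, 0, 1, 1])  # groups match the geometry
bad = silhouette_width(points, [0, 1, 0, 1])   # groups cut across it
```

A labeling that matches the geometry scores close to 1, while one that cuts across it scores below 0, which is why the measure is useful for ranking the outputs of different clustering algorithms on the same data.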

Unsupervised Clustering of Multivariate Time Series Microarray Experiments based on Incremental Non-Gaussian Analysis

  • Ng, Kam Swee;Yang, Hyung-Jeong;Kim, Soo-Hyung;Kim, Sun-Hee;Anh, Nguyen Thi Ngoc
    • International Journal of Contents / v.8 no.1 / pp.23-29 / 2012
  • Multiple expression levels of genes obtained using time series microarray experiments have been exploited effectively to enhance understanding of a wide range of biological phenomena. However, microarray data usually take the form of large, high-dimensional matrices of gene expression values. Among the huge number of genes present in microarrays, only a small number are expected to be effective for performing a given task. Hence, discounting the majority of unaffected genes is the crucial goal of gene selection to improve accuracy in disease diagnosis. In this paper, a non-Gaussian weight matrix obtained from an incremental model is proposed to extract useful features of multivariate time series microarrays. The proposed method can automatically identify a small number of significant features by discovering hidden variables among a huge number of features. A representative unsupervised hierarchical clustering method is then used to evaluate the effectiveness of the proposed methodology. The proposed method achieves promising results in terms of the predictive accuracy of clustering compared to existing methods of analysis. Furthermore, the proposed method offers a robust approach with low memory and computation costs.

Performance Evaluation of Distributed Clustering Protocol under Distance Estimation Error

  • Nguyen, Quoc Kien;Jeon, Taehyun
    • International Journal of Internet, Broadcasting and Communication / v.10 no.1 / pp.11-15 / 2018
  • The application of Wireless Sensor Networks requires wise utilization of limited energy resources. Therefore, a wide range of routing protocols designed to prolong the lifetime of a network has been proposed in recent years. Hierarchical clustering-based protocols have become the subject of a large number of studies that aim to efficiently utilize the limited energy of network components. In this paper, the effect of mismatch in parameter estimation is discussed to evaluate the robustness of a distance-based algorithm called the distributed clustering protocol in homogeneous and heterogeneous environments. For quantitative analysis, performance simulations for this protocol are carried out in terms of network lifetime, which is the main criterion of efficiency for an energy-limited system.

Identification of Unknown Cryptographic Communication Protocol and Packet Analysis Using Machine Learning (머신러닝을 활용한 알려지지 않은 암호통신 프로토콜 식별 및 패킷 분류)

  • Koo, Dongyoung
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.2 / pp.193-200 / 2022
  • Unknown cryptographic communication protocols may have the advantage of guaranteeing personal and data privacy, but when they are used for malicious purposes, they are almost impossible to identify and respond to using existing network security equipment. In particular, there is a limit to manually analyzing a huge amount of traffic in real time. Therefore, in this paper, we attempt to identify packets of unknown cryptographic communication protocols and to separate the fields comprising a packet by using machine learning techniques. Using sequential pattern analysis, hierarchical clustering, and Pearson's correlation coefficient, we found that the structure of packets can be automatically analyzed even for an unknown cryptographic communication protocol.

Statistical bioinformatics for gene expression data

  • Lee, Jae-K.
    • Proceedings of the Korean Society for Bioinformatics Conference / 2001.08a / pp.103-127 / 2001
  • Gene expression studies require statistical experimental designs and validation before laboratory confirmation. Various clustering approaches, such as hierarchical clustering, K-means, and SOM, are commonly used for unsupervised learning in gene expression data. Several classification methods, such as gene voting, SVM, or discriminant analysis, are used for supervised learning, where well-defined response classification is possible. Estimating gene-condition interaction effects requires advanced, computationally intensive statistical approaches.