• Title/Summary/Keyword: Hierarchical Clustering Analysis

A Study on Fuzzy Logic based Clustering Method for Radar Data Analysis (레이더 데이터 분석을 위한 Fuzzy Logic 기반 클러스터링 기법에 관한 연구)

  • Lee, Hansoo;Kim, Eun Kyeong;Kim, Sungshin
    • Journal of the Korean Institute of Intelligent Systems / v.25 no.3 / pp.217-222 / 2015
  • Clustering is one of the important data mining techniques, known as exploratory data analysis, and is applied in various engineering and scientific fields such as pattern recognition and remote sensing. The method organizes data by abstracting the underlying structure either as a grouping of individuals or as a hierarchy of groups. Weather radar observes atmospheric objects by using reflected signals and stores the observed data at the corresponding coordinates. To analyze radar data, precipitation and non-precipitation echoes need to be organized separately based on their similarities. This paper therefore studies the application of a clustering method to radar data. In addition, to solve the problem that arises when a precipitation echo is located close to a non-precipitation echo, the paper proposes a fuzzy logic based clustering method that considers both distance and other properties such as reflectivity and Doppler velocity. On actual cases, the proposed clustering method gives better results than the previous method when precipitation and non-precipitation echoes are located near each other.

Comparison of clustering methods of microarray gene expression data (마이크로어레이 유전자 발현 자료에 대한 군집 방법 비교)

  • Lim, Jin-Soo;Lim, Dong-Hoon
    • Journal of the Korean Data and Information Science Society / v.23 no.1 / pp.39-51 / 2012
  • Cluster analysis has proven to be a useful tool for investigating the association structure among genes and samples in a microarray data set. We applied several cluster validation measures to evaluate the performance of clustering algorithms for analyzing microarray gene expression data, including hierarchical clustering, K-means, PAM, SOM and model-based clustering. The available validation measures fall into three general categories: internal, stability and biological. The performance of the clustering algorithms is evaluated using simulated and SRBCT microarray data. Our results on the simulated data show that nearly every method performs well, recovering the same number of clusters as the number of classes in the original data. For the SRBCT data, the best choice for the number of clusters is less clear than for the simulated data. PAM, SOM and the model-based method showed results similar to those on the simulated data under the Silhouette width internal measure, as did PAM and the model-based method under the biological measure, while model-based clustering had the best value of the stability measure.
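
The Silhouette width named in the abstract above is a standard internal validation measure. The following is a minimal sketch of how it can be computed, assuming Euclidean distance and toy 2-D points; the function name `silhouette_width` and the data are illustrative and are not taken from the paper.

```python
from math import dist


def silhouette_width(points, labels):
    """Average silhouette width over all points (Euclidean distance)."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            # Singleton cluster or a single cluster overall: score taken as 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data (made up): two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_width(points, [0, 0, 1, 1])  # labels match the structure
bad = silhouette_width(points, [0, 1, 0, 1])   # labels cut across the structure
print(good, bad)
```

A labeling that matches the structure scores close to 1, while one that cuts across it scores below 0, which is why the measure can be used to compare candidate numbers of clusters.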

A Comparative Study on Statistical Clustering Methods and Kohonen Self-Organizing Maps for Highway Characteristic Classification of National Highway (일반국도 도로특성분류를 위한 통계적 군집분석과 Kohonen Self-Organizing Maps의 비교연구)

  • Cho, Jun Han;Kim, Seong Ho
    • KSCE Journal of Civil and Environmental Engineering Research / v.29 no.3D / pp.347-356 / 2009
  • This paper describes a clustering analysis for highway classification based on traffic characteristics, as a departure from existing highway functional classification methodologies. The research focuses on comparing the performance of clustering techniques based on the total within-group error and on deriving the optimal number of clusters. It analyzes statistical clustering methods (the hierarchical Ward's minimum-variance method and the nonhierarchical K-means method) and the Kohonen self-organizing maps clustering method for highway characteristic classification. The outcomes of the clustering techniques are compared in terms of the number of samples and the traffic characteristics of the subsets derived from the optimal number of clusters. Overall, the K-means method gives better results than the other methods for fewer than 12 clusters. For more than 20 clusters, Kohonen self-organizing maps give the best results. The highway characteristic classification produced by this research is expected to serve as important basic road attribute information.

A New Cluster Head Selection Technique based on Remaining Energy of Each Node for Energy Efficiency in WSN

  • Subedi, Sagun;Lee, Sang-Il;Lee, Jae-Hee
    • International journal of advanced smart convergence / v.9 no.2 / pp.185-194 / 2020
  • Designing a hierarchical clustering algorithm is one of numerous approaches to minimizing the energy consumption of Wireless Sensor Networks (WSNs). In this paper, a network of homogeneous, randomly deployed sensor nodes is considered. These sensors are energy-constrained elements. The nominal selection of the Cluster Head (CH), which falls under the clustering part of the network protocol, is studied and compared to the Low Energy Adaptive Clustering Hierarchy (LEACH) protocol. In the proposed process, CH selection is a function of the total remaining energy of each node as well as the total average energy of the whole arrangement. The algorithm considers the initial energy, the optimum value of cluster heads, and the residual energy to elect the next group of cluster heads for the network. The total remaining energy of each node is compared to the total average energy of the system, and if the result is positive, the node is eligible to become a CH in the very next round. Analysis and numerical simulations quantify the efficiency and Average Energy Ratio (AER) of the proposed system.
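
The eligibility rule in the abstract above (a node may become a CH in the next round if its remaining energy exceeds the network average) can be sketched as follows. This is only an illustration of that comparison step: the function name, node ids and energy values are hypothetical, and the paper's full election rule, which also involves the initial energy and the optimum value of cluster heads, is not reproduced here.

```python
from statistics import mean


def eligible_cluster_heads(residual_energy):
    """Return ids of nodes whose residual energy exceeds the network average.

    residual_energy maps a node id to its remaining energy (hypothetical units).
    """
    average = mean(residual_energy.values())
    # A node is eligible when (its energy - average energy) is positive.
    return sorted(node for node, e in residual_energy.items() if e - average > 0)


# Made-up energies for four nodes; the average is 0.30.
energies = {"n1": 0.50, "n2": 0.20, "n3": 0.35, "n4": 0.15}
eligible = eligible_cluster_heads(energies)
print(eligible)
```

With these values only `n1` and `n3` lie above the average, so only they would be candidates in the next round.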

FCAnalyzer: A Functional Clustering Analysis Tool for Predicted Transcription Regulatory Elements and Gene Ontology Terms

  • Kim, Sang-Bae;Ryu, Gil-Mi;Kim, Young-Jin;Heo, Jee-Yeon;Park, Chan;Oh, Berm-Seok;Kim, Hyung-Lae;Kimm, Ku-Chan;Kim, Kyu-Won;Kim, Young-Youl
    • Genomics & Informatics / v.5 no.1 / pp.10-18 / 2007
  • Numerous studies have reported that genes with similar expression patterns are co-regulated. From gene expression data, we have assumed that genes having similar expression patterns would share similar transcription factor binding sites (TFBSs). These function as the binding regions for transcription factors (TFs) and thereby regulate gene expression. In this context, various analysis tools have been developed. However, they have shortcomings in the combined analysis of expression patterns and significant TFBSs and in the functional analysis of target genes of significantly overrepresented putative regulators. In this study, we present a web-based Functional Clustering Analysis Tool for Predicted Transcription Regulatory Elements and Gene Ontology Terms (FCAnalyzer). This system integrates microarray clustering data with similar expression patterns and TFBS data in each cluster. FCAnalyzer is designed to perform two independent clustering procedures. The first process clusters gene expression profiles using the K-means clustering method, and the second process clusters predicted TFBSs in the upstream region of previously clustered genes using the hierarchical biclustering method for simultaneous grouping of genes and samples. This system offers retrieved information for predicted TFBSs in each cluster using Match™ in the TRANSFAC database. We used gene ontology term analysis for functional annotation of genes in the same cluster. We also provide the user with a combinatorial TFBS analysis of TFBS pairs. The enrichment in the TFBS analysis and GO term analysis is assessed statistically by calculating P values based on Fisher's exact test, the hypergeometric distribution and Bonferroni correction. FCAnalyzer is a web-based, user-friendly functional clustering analysis system that facilitates the transcriptional regulatory analysis of co-expressed genes. This system presents the analyses of clustered genes, significant TFBSs, significantly enriched TFBS combinations, their target genes and TFBS-TF pairs.
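
As an illustration of the enrichment statistics mentioned in the abstract above, the sketch below computes an upper-tail hypergeometric P value (which equals the one-sided Fisher's exact test for over-representation) and applies a Bonferroni correction. The function names, parameter names and counts are assumptions made for this example; they do not describe FCAnalyzer's actual implementation.

```python
from math import comb


def enrichment_p_value(hits, cluster_size, carriers, population):
    """P(X >= hits) for a hypergeometric draw.

    population   : total number of genes considered
    carriers     : genes in the population that carry the TFBS (or GO term)
    cluster_size : number of genes in the cluster
    hits         : genes in the cluster that carry the TFBS (or GO term)
    """
    upper = min(cluster_size, carriers)
    tail = sum(
        comb(carriers, i) * comb(population - carriers, cluster_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(population, cluster_size)


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up counts: 3 of 5 cluster genes carry a site found in 5 of 20 genes.
p = enrichment_p_value(hits=3, cluster_size=5, carriers=5, population=20)
adjusted = bonferroni([p, 0.5, 0.001])  # three hypothetical tests
print(p, adjusted)
```

Here `p` is 1126/15504, roughly 0.073; after correcting for three tests it no longer falls below a 0.05 threshold, which shows why the correction matters when many TFBSs or GO terms are tested at once.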

Segmenting Inpatients by Mixture Model and Analytical Hierarchical Process(AHP) Approach In Medical Service (의료서비스에서 혼합모형(Mixture model) 및 분석적 계층과정(AHP)를 이용한 입원환자의 시장세분화에 관한 연구)

  • 백수경;곽영식
    • Health Policy and Management / v.12 no.2 / pp.1-22 / 2002
  • Since the early 1980s, scholars from various academic fields have applied latent structure and other types of finite mixture models. Although the merits of the finite mixture model are well documented, attempts to apply it to medical services have been relatively rare. The researchers aim to fill this gap by introducing the finite mixture model and segmenting an inpatient database from one general hospital. In Section 2, finite mixture models are compared with clustering, chi-square analysis, and discriminant analysis based on the segmentation methodology schema of Wedel and Kamakura (2000). The mixture model yields the optimal number of segments and a fuzzy classification for each observation by means of the EM (expectation-maximization) algorithm. The finite mixture model serves to unmix the sample, to identify the groups, and to estimate the parameters of the density function underlying the observed data within each group. In Sections 3 and 4, we illustrate the results of segmenting data on 4510 patients, including nominal and ratio scales. We then show that AHP can identify the attractiveness of each segment, from which the decision maker can select the best target segment.

Estimation of Harvest Period and Cultivated Region of Commercial Green Tea by Pattern Recognition (패턴인식법에 의한 시판 녹차의 산지 및 채엽시기 추정)

  • Zhu, Hong-Mei;Kim, Jung-Sook;Park, Kyung-Lae;Cho, Cheong-Weon;Kim, Young-Sup;Kim, Jung-Woo;Ryu, Shi-Yong;Kang, Jong-Seong
    • YAKHAK HOEJI / v.53 no.2 / pp.51-59 / 2009
  • Quantitative analysis of (+)-catechin (C), (-)-epigallocatechin (EGC), (-)-epicatechin (EC), (-)-epigallocatechin gallate (EGCG), (-)-epicatechin gallate (ECG) and caffeine in commercial green tea was carried out by HPLC employing gradient elution with 0.1% acetic acid and acetonitrile on an ODS column. The optimized HPLC method provided satisfactory linearity, accuracy and precision. The relationship between the concentration of the components and the cultivated region of the commercial green tea was not significant, while the concentrations of EGCG, ECG and caffeine decreased significantly in the later-harvested green tea samples (p<0.01). Multivariate analysis of the components was performed in order to characterize and evaluate the variation related to cultivated region and harvest period. Hierarchical clustering and discriminant analysis were applied to classify the geographical and seasonal origins of the green tea samples. The classification accuracy for cultivated region and harvest period by discriminant analysis was 95% and 91%, respectively, indicating that this method could be reliable and convenient for the quality control of herbal products of different origins.

Comparative Study of the Rhei Rhizoma by Pattern Analysis (패턴분석법에 의한 대황의 비교 연구)

  • Kang, Jong-Seong;Park, Ki-Ju;Wu, En-Qi;Lee, Eun-Sil;Hwang, Gwi-Seo;Lee, Hyun-Sun;Kim, Young-Ho
    • Korean Journal of Pharmacognosy / v.39 no.3 / pp.179-185 / 2008
  • Three species, Rheum palmatum L., R. tanguticum Maxim. and R. officinale Baillon, are recognized as the source plants of Rhei Rhizoma in the Korean Pharmacopeia. However, other herbal sources such as R. undulatum L. and Rumex crispus L. have often been misused as Rhei Rhizoma. A pattern analysis method was developed to discriminate the Rhei Rhizoma of the Korean Pharmacopeia from other herbal plants using HPLC and TLC chromatograms. The multivariate peak data of the chromatograms of methanol extracts of Rhei Rhizoma were used for hierarchical clustering analysis, principal components analysis and similarity calculation. Besides the statistical analysis, the TLC patterns of the samples could be used as criteria for discrimination. The developed pattern analysis method was specific and could be readily utilized for comprehensive evaluation of Rhei Rhizoma.

Regional Extension of the Neural Network Model for Storm Surge Prediction Using Cluster Analysis (군집분석을 이용한 국지해일모델 지역확장)

  • Lee, Da-Un;Seo, Jang-Won;Youn, Yong-Hoon
    • Atmosphere / v.16 no.4 / pp.259-267 / 2006
  • In the present study, a neural network (NN) model with a cluster analysis method was developed to predict storm surge across all Korean coastal regions, with a special focus on regional extension. The model used in this study is an NN model for each cluster (CL-NN), with the clusters obtained by cluster analysis. To find the optimal clustering of the stations, the agglomerative method among hierarchical clustering methods was used. Stations were clustered with each other according to the centroid-linkage criterion, and the cluster analysis stops when the distance between merged groups exceeds a given criterion. Finally, the CL-NN can be constructed to predict storm surge in the cluster regions. To validate the model results, the predicted sea level from the CL-NN model was compared with that of conventional harmonic analysis (HA) and of the NN model in each region. The forecast values from the NN and CL-NN models agree more closely with the observed data than those of HA. In particular, statistics such as RMSE and the correlation coefficient show little difference between the CL-NN and NN model results. These results show that cluster analysis and the CL-NN model can be applied to regional storm surge prediction and to the developed forecast system.
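
The agglomerative procedure described in the abstract above (merge the two closest groups by centroid distance and stop once that distance exceeds a criterion) can be sketched as follows. This is a minimal illustration under assumed inputs: the station coordinates, the threshold value and the function names are hypothetical, and the features actually used to cluster stations in the paper are not given in the abstract.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def centroid_linkage(points, max_distance):
    """Agglomerative clustering that stops when the closest pair of
    cluster centroids is farther apart than max_distance."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest centroid distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break  # stopping criterion reached
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up 2-D positions for five stations and an assumed threshold of 2.0.
stations = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
groups = centroid_linkage(stations, max_distance=2.0)
print(groups)
```

With this threshold the two nearby pairs are merged and the isolated station stays on its own, giving three groups; a separate model could then be fitted per group, in the spirit of the CL-NN idea.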

Identification of Biomarkers for Diagnosis of Gastric Cancer by Bioinformatics

  • Wang, Da-Guang;Chen, Guang;Wen, Xiao-Yu;Wang, Dan;Cheng, Zhi-Hua;Sun, Si-Qiao
    • Asian Pacific Journal of Cancer Prevention / v.16 no.4 / pp.1361-1365 / 2015
  • Background: We aimed to discover potential gene biomarkers for gastric cancer (GC) diagnosis. Materials and Methods: Genechips of 10 GC tissues and 10 gastric mucosa (GM, para-carcinoma tissue, normal control) tissues were generated using an Affymetrix exon array containing 30,000 genes. The differentially expressed genes (DEGs) between GC tissues and normal controls were identified by the Limma package and analyzed by hierarchical clustering analysis. Gene ontology (GO) and pathway enrichment analyses were performed to investigate the functions of the DEGs. Receiver operating characteristic (ROC) analysis was performed to measure the effects of biomarker candidates for diagnosis of GC. Results: A total of 896 up-regulated and 60 down-regulated DEGs were identified between GC samples and normal controls. Hierarchical clustering analysis showed that the DEGs were highly differentially expressed and that most DEGs were up-regulated. The most significantly enriched GO-BP term was mitotic cell cycle and the most significantly enriched pathway was cell cycle. The intersection analysis showed that the most significant DEGs were cyclin B1 (CCNB1) and cyclin B2 (CCNB2). The sensitivities and specificities of CCNB1 and CCNB2 were both high (p<0.0001). The areas under the ROC curve for CCNB1 and CCNB2 were both greater than 0.9 (p<0.0001). Conclusions: CCNB1 and CCNB2, which are involved in the cell cycle, played significant roles in the progression and development of GC, and these genes may be potential biomarkers for the diagnosis and prognosis of GC.
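
The area under the ROC curve reported in the abstract above can be computed directly from two groups of scores: it equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below assumes made-up expression values; they are not data from the study, and `roc_auc` is an illustrative name.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5, which matches the area under the empirical ROC curve.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Made-up expression levels of one candidate gene in tumor and normal tissue.
tumor = [8.1, 7.4, 9.0, 6.2]
normal = [5.0, 6.5, 4.8, 5.9]
auc = roc_auc(tumor, normal)
print(auc)
```

In this toy example 15 of the 16 tumor/normal pairs are ordered correctly, so the AUC is 0.9375; a value of 0.5 would mean the gene does no better than chance at separating the groups.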