• Title/Summary/Keyword: Hierarchical Clustering Analysis


Classification of Ambient Particulate Samples Using Cluster Analysis and Disjoint Principal Component Analysis (군집분석법과 분산주성분분석법을 이용한 대기분진시료의 분류)

  • 유상준;김동술
    • Journal of Korean Society for Atmospheric Environment / v.13 no.1 / pp.51-63 / 1997
  • Total suspended particulate matter in ambient air was analyzed for eight chemical elements (Ca, Co, Cu, Fe, Mn, Pb, Si, and Zn) by X-ray fluorescence spectrometry (XRF) at the Kyung Hee University Suwon Campus from 1989 to 1994. To use these data as a basis for a source identification study, each sample was assigned membership in one of several well-defined sample groups. The data set, consisting of 83 objects and 8 variables, was initially separated into two groups: fine ($d_p < 3.3\ \mu\mathrm{m}$) and coarse ($d_p > 3.3\ \mu\mathrm{m}$) particles. A hierarchical clustering method was examined to obtain candidate members of homogeneous sample classes for each of the two groups by transforming the raw data and applying various distances. A disjoint principal component analysis was then used to define homogeneous sample classes after deleting outliers. Five homogeneous sample classes were determined for each of the fine and coarse particle groups. The data were properly classified by applying a logarithmic transformation and the Euclidean distance. After the homogeneous classes were determined, correlation coefficients among the eight chemical variables and the meteorological variables (temperature, relative humidity, wind speed, wind direction, and precipitation) were calculated within each homogeneous class to interpret in depth the environmental factors influencing the characteristics of each class in each group. According to our analysis, each class had its own distinct seasonal pattern, which was affected most sensitively by wind direction.
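The combination reported in this abstract (logarithmic transformation followed by hierarchical clustering on Euclidean distances) can be illustrated with a minimal sketch. The concentration values below are hypothetical, and average linkage is an assumption: the abstract does not say which linkage rule the authors used.

```python
import math

def log_transform(rows):
    """Apply log10 to every (strictly positive) concentration value."""
    return [[math.log10(v) for v in row] for row in rows]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(points, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest average pairwise Euclidean distance."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical concentrations (arbitrary units): 4 samples x 3 elements.
samples = [
    [1.0, 10.0, 100.0],
    [1.2, 11.0, 90.0],
    [50.0, 500.0, 2.0],
    [55.0, 450.0, 2.5],
]
groups = average_linkage(log_transform(samples), n_clusters=2)
```

The log transform keeps elements with large absolute concentrations from dominating the Euclidean distance, which is one plausible reason this combination classified the data well.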


Comparison of 12 Isoflavone Profiles of Soybean (Glycine max (L.) Merrill) Seed Sprouts from Three Different Countries

  • Park, Soo-Yun;Kim, Jae Kwang;Kim, Eun-Hye;Kim, Seung-Hyun;Prabakaran, Mayakrishnan;Chung, Ill-Min
    • Korean Journal of Crop Science / v.63 no.4 / pp.360-377 / 2018
  • The levels of 12 isoflavones were measured in soybean (Glycine max (L.) Merrill) sprouts of 68 genetic varieties from three countries (China, Japan, and Korea). The isoflavone profile differences were analyzed using data mining methods. A principal component analysis (PCA) revealed that the CSRV021 variety was separated from the others by the first two principal components. This variety appears to be most suited for functional food production due to its high isoflavone levels. Partial least squares discriminant analysis (PLS-DA) and orthogonal projections to latent structures discriminant analysis (OPLS-DA) showed that there are meaningful isoflavone compositional differences in samples that have different countries of origin. Hierarchical clustering analysis (HCA) of these phytochemicals resulted in clusters derived from closely related biochemical pathways. These results indicate the usefulness of metabolite profiling combined with chemometrics as a tool for assessing the quality of foods and identifying metabolic links in biological systems.

Reclassification of the vulnerability group of wartime equipment (군집분석을 이용한 전시장비의 취약성 그룹 재분류)

  • Lee, Hanwoo;Kim, Suhwan;Joo, Kyungsik
    • Journal of the Korean Data and Information Science Society / v.26 no.3 / pp.581-592 / 2015
  • In GORRAM, the estimation of resource requirements for wartime equipment is based on the ELCON of the USA. ELCON has 22 vulnerability groups, but it is hard to determine how these 22 groups were classified. In this research, we therefore collected 505 types of basic items used in wartime and classified them into new vulnerability groups using AHP and cluster analysis. We selected 11 variables through AHP for use in the cluster analysis. We then decided the number of vulnerability groups through hierarchical clustering and classified the 505 types of basic items into the new vulnerability groups through K-means clustering. This paper presents new vulnerability groups of 505 types of basic items fitted to Korean weapon systems. Furthermore, our approach can be applied to a new weapon system that needs to be classified into a vulnerability group. We believe that our approach will provide military practitioners with a reliable and rational method for classifying wartime equipment, and thereby with more accurate estimates of resource requirements in wartime.
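The two-stage procedure described above (hierarchical clustering to decide the number of groups, then K-means to assign items) can be sketched as follows. The abstract does not state how the number of groups was read off the hierarchy, so the "largest jump in merge height" rule, the single-linkage choice, the farthest-first seeding, and the toy 2-D points are all assumptions made for illustration.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def merge_heights(points):
    """Single-linkage agglomeration; returns the distance at each merge."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        d, i, j = min(
            (min(dist(a, b) for a in clusters[i] for b in clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] += clusters[j]
        del clusters[j]
        heights.append(d)
    return heights

def choose_k(heights):
    """Assumed rule: cut the tree just before the largest jump in merge height."""
    gaps = [heights[t + 1] - heights[t] for t in range(len(heights) - 1)]
    t = gaps.index(max(gaps))            # merges 0..t are kept
    return len(heights) + 1 - (t + 1)    # clusters left after those merges

def kmeans(points, k, iters=20):
    """Lloyd iterations with deterministic farthest-first seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

# Hypothetical items described by two variables each.
items = [(0, 0), (0, 1), (1, 0),
         (10, 10), (10, 11), (11, 10),
         (20, 0), (21, 0), (20, 1)]
k = choose_k(merge_heights(items))
labels = kmeans(items, k)
```

Splitting the work this way uses the hierarchy only to pick the number of groups, while K-means produces the final assignment, which scales better to hundreds of items than cutting the full tree.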

The Sliding Window Gene-Shaving Algorithm for Microarray Data Analysis

  • 이혜선;최대우;전치혁
    • Proceedings of the Korean Society for Bioinformatics Conference / 2002.06a / pp.139-152 / 2002
  • Gene-shaving (Hastie et al., 2000) is a useful method for identifying a meaningful group of genes when the variation in expression is large. By shaving off the genes that are weakly correlated with the leading principal component, the primary genes with a coherent expression pattern can be identified. Gene-shaving works well if expression levels vary enough, but it may miss meaningful clusters with low expression levels or different expression times, even when their patterns are coherent. The sliding window gene-shaving method, which applies gene-shaving within each sliding window after hierarchical clustering, compensates for the loss of meaningful gene sets whose variation is not large but distinct. The performance in identifying expression patterns is compared on simulated profile data with different variances and expression levels.
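A single shaving step, as described in the abstract, can be sketched as below: compute the leading principal component of the row-centered expression matrix, score each gene by the absolute value of its projection onto that component, and discard the weakest genes. This is a simplified illustration, not the authors' sliding window variant; the gene names, expression values, the `keep` fraction, and the use of power iteration are assumptions.

```python
import math

def leading_component(rows, iters=200):
    """Power iteration for the top right-singular vector of `rows`
    (genes x samples, already row-centered)."""
    n = len(rows[0])
    v = [float(i + 1) for i in range(n)]   # non-constant start vector
    for _ in range(iters):
        scores = [sum(x * y for x, y in zip(row, v)) for row in rows]            # X v
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(n)]  # X^T X v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def shave_once(genes, keep=0.5):
    """Keep the fraction of genes whose centered profiles project most
    strongly (in absolute value) onto the leading principal component."""
    centered = {g: [x - sum(r) / len(r) for x in r] for g, r in genes.items()}
    v = leading_component(list(centered.values()))
    score = {g: abs(sum(x * y for x, y in zip(r, v))) for g, r in centered.items()}
    ranked = sorted(score, key=score.get, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep))]

# Hypothetical expression profiles over four samples.
genes = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],   # anti-correlated but coherent
    "n1": [1.0, 1.1, 1.0, 1.1],
    "n2": [2.0, 2.1, 1.9, 2.0],
    "n3": [0.5, 0.4, 0.5, 0.6],
}
kept = shave_once(genes, keep=0.5)
```

Because the score depends on the size of the projection, low-variance genes are shaved first even when their pattern is coherent, which is the limitation the sliding window approach is meant to address.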


Reachability Plot for Non-monotonic Dendrograms (비단조적 덴드로그램을 위한 Reachability Plot)

  • Jeon, Yong-Kweon;Lee, Tae-Hoon;Lee, Byung-Han;Yoon, Sung-Roh
    • Proceedings of the Korean Information Science Society Conference / 2012.06b / pp.441-443 / 2012
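  • Hierarchical clustering uses global information to form clusters and is therefore used relatively often among cluster analysis methods. However, because it presents its result as a dendrogram, it is difficult to grasp the information about all clusters intuitively. To address this problem, the reachability plot was developed, which lets one inspect cluster information intuitively without significantly damaging the information in the original dendrogram. In hierarchical clustering where the dendrogram can become non-monotonic, as with centroid linkage, converting the dendrogram with the existing reachability plot method can distort the information. We therefore propose a method that solves this problem, so that clusters can be represented without distortion even for non-monotonic dendrograms.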
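How centroid linkage produces a non-monotonic dendrogram can be seen with three points. The sketch below only demonstrates the inversion itself, not the reachability plot conversion proposed in the paper; the point coordinates are hypothetical.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def centroid_linkage_heights(points):
    """Agglomerate by smallest centroid-to-centroid distance and
    return the merge height at each step."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        d, i, j = min(
            (math.dist(centroid(clusters[i]), centroid(clusters[j])), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] += clusters[j]
        del clusters[j]
        heights.append(d)
    return heights

# Nearly equilateral triangle: the base points merge first at height 2.0,
# but their centroid (1, 0) is only 1.8 away from the apex.
heights = centroid_linkage_heights([(0.0, 0.0), (2.0, 0.0), (1.0, 1.8)])
is_monotonic = all(a <= b for a, b in zip(heights, heights[1:]))
```

The second merge happens at a lower height than the first, so a plot that assumes merge heights never decrease would misrepresent this tree.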

Estimating the Number of Clusters using Hotelling's $T^2$

  • Choi, Kyung-Mee
    • Communications for Statistical Applications and Methods / v.12 no.2 / pp.305-312 / 2005
  • In cluster analysis, Hotelling's $T^2$ can be used to estimate the unknown number of clusters, based on the idea of a multiple comparison procedure. In particular, its threshold is obtained from the probability of committing a type I error. Examples are used to compare Hotelling's $T^2$ with other classical location test statistics such as the sum of squared errors and Wilks' $\Lambda$. Hierarchical clustering is used to reveal the underlying structure of the data. Related criteria are also reviewed in terms of both the between-cluster variance and the within-cluster variance.
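For reference, the standard two-sample form of Hotelling's $T^2$ for comparing the mean vectors of two clusters is written out below. This is the textbook statistic under multivariate normality with a common covariance matrix; the exact variant and stopping rule used in the paper may differ. The symbols $n_1, n_2$ (cluster sizes), $p$ (number of variables), and $\mathbf{S}_{\mathrm{pool}}$ (pooled covariance) are notation introduced here.

```latex
% Sample means and pooled covariance of clusters k = 1, 2
\bar{\mathbf{x}}_k = \frac{1}{n_k}\sum_{i=1}^{n_k}\mathbf{x}_{ki}, \qquad
\mathbf{S}_{\mathrm{pool}} = \frac{(n_1-1)\,\mathbf{S}_1 + (n_2-1)\,\mathbf{S}_2}{n_1+n_2-2}

% Two-sample Hotelling's T^2
T^2 = \frac{n_1 n_2}{n_1+n_2}\,
      (\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)^{\top}\,
      \mathbf{S}_{\mathrm{pool}}^{-1}\,
      (\bar{\mathbf{x}}_1-\bar{\mathbf{x}}_2)

% Null distribution used to set a threshold at a chosen type I error rate
\frac{n_1+n_2-p-1}{(n_1+n_2-2)\,p}\,T^2 \sim F_{p,\;n_1+n_2-p-1}
\quad \text{under } H_0:\ \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2
```

Read in a clustering context, a $T^2$ value below the threshold suggests that two candidate clusters are not distinguishable in location and could be merged, which is how a test of this kind can bound the number of clusters.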

Rapid discrimination system of Chinese cabbage (Brassica rapa) at metabolic level using Fourier transform infrared spectroscopy (FT-IR) based on multivariate analysis (배추 대사체 추출물의 FT-IR 스펙트럼 및 다변량 통계분석을 통한 계통 신속 식별 체계)

  • Ahn, Myung Suk;Lim, Chan Ju;Song, Seung Yeob;Min, Sung Ran;Lee, In Ho;Nou, Ill-Sup;Kim, Suk Weon
    • Journal of Plant Biotechnology / v.43 no.3 / pp.383-390 / 2016
  • To determine whether FT-IR spectral analysis combined with multivariate analysis could be used to discriminate Chinese cabbage breeding lines at the metabolic level, whole cell extracts of nine different breeding lines (three paternal, three maternal, and three $F_1$ lines) were subjected to Fourier transform infrared spectroscopy (FT-IR). The FT-IR spectral data of the Chinese cabbage plants were analyzed by principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and hierarchical clustering analysis (HCA). The hierarchical dendrograms based on PLS-DA for two of the three cross combinations showed that samples of the paternal, maternal, and progeny $F_1$ lines were perfectly separated into three branches in a breeding-line-dependent manner. The remaining cross combination, however, could not be fully discriminated into three branches. Thus, hierarchical dendrograms based on PLS-DA of FT-IR spectral data of Chinese cabbage breeding lines could be used to represent the most probable chemotaxonomical relationship among maternal, paternal, and $F_1$ plants. Furthermore, these metabolic discrimination systems could be applied to the rapid selection and classification of useful Chinese cabbage cultivars.

A News Video Mining based on Multi-modal Approach and Text Mining (멀티모달 방법론과 텍스트 마이닝 기반의 뉴스 비디오 마이닝)

  • Lee, Han-Sung;Im, Young-Hee;Yu, Jae-Hak;Oh, Seung-Geun;Park, Dai-Hee
    • Journal of KIISE:Databases / v.37 no.3 / pp.127-136 / 2010
  • With the rapid growth of information and computer communication technologies, the number of digital documents, including multimedia data, has recently exploded. In particular, news video databases and news video mining have become the subject of extensive research aimed at developing effective and efficient tools for manipulating and analyzing news videos, because of their information richness. However, much of this research focuses on browsing, retrieval, and summarization of news videos. To date, the discovery and analysis of the plentiful latent semantic knowledge in news videos remains at a relatively early stage. In this paper, we propose a news video mining system based on a multi-modal approach and text mining, which uses the visual-textual information of news video clips and their scripts. The proposed system automatically and systematically constructs a taxonomy of news video stories with a hierarchical clustering algorithm, which is one of the text mining methods. It then analyzes the topics of news video stories from multiple angles by means of a time-cluster trend graph, a weighted cluster growth index, and network analysis. To demonstrate the validity of our approach, we analyzed the news videos on "The Second Summit of South and North Korea in 2007".

Establishment of rapid discrimination system of leguminous plants at metabolic level using FT-IR spectroscopy with multivariate analysis (FT-IR 스펙트럼 기반 다변량통계분석기법에 의한 두과작물의 대사체 수준 식별체계 확립)

  • Song, Seung-Yeob;Ha, Tae-Joung;Jang, Ki-Chang;Kim, In-Jung;Kim, Suk-Weon
    • Journal of Plant Biotechnology / v.39 no.3 / pp.121-126 / 2012
  • To determine whether FT-IR spectroscopy of whole cell extracts combined with multivariate analysis can be used to discriminate major leguminous plants at the metabolic level, seed extracts of six leguminous plants were subjected to Fourier transform infrared spectroscopy (FT-IR). The FT-IR spectral data from the seed extracts were analyzed by principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and hierarchical clustering analysis (HCA). PCA could not fully discriminate the six leguminous plants, whereas PLS-DA discriminated them successfully. The hierarchical dendrogram based on PLS-DA separated the six leguminous plants into four branches. The first branch consisted of all three Vigna species: Vigna radiata var. radiata, Vigna angularis var. angularis, and Vigna unguiculata subsp. unguiculata. Pisum sativum var. sativum, Glycine max L., and Phaseolus vulgaris var. vulgaris were each clustered into a separate branch. The overall results showed that the metabolic discrimination system was in accordance with the known phylogenetic taxonomy. We therefore suggest that the hierarchical dendrogram based on PLS-DA of FT-IR spectral data from seed extracts represents the most probable chemotaxonomical relationship among the six leguminous plants.

Performance Comparison of Clustering using Discritization Algorithm (이산화 알고리즘을 이용한 계층적 클러스터링의 실험적 성능 평가)

  • Won, Jae Kang;Lee, Jeong Chan;Jung, Yong Gyu;Lee, Young Ho
    • Journal of Service Research and Studies / v.3 no.2 / pp.53-60 / 2013
  • Various data mining techniques have been developed for obtaining information from large data sets. In pattern recognition and machine learning, among the most actively studied areas in recent years, most existing learning algorithms build rules or decision models from categorical attributes. Real-world data, however, often consist of numeric attributes, or contain numeric attributes alongside ordinary categorical ones. In such cases, a processing step is required to convert the data into attribute values of a type suitable for learning. In this paper, the domain of each numeric attribute is divided into several segments using discretization techniques. Clustering is described together with other data mining techniques. Clustering divides a large set of records in a database into smaller groups of records with similar characteristics, partitioning the given finite set of patterns in the pattern space. A cluster is a set of patterns that lie close to one another. Patterns are extracted from the given data without specifying particular categories in advance. Clustering techniques that group similar data are described as a means of classifying the data.
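Dividing the domain of a numeric attribute into segments can be done in several ways, and the abstract does not name the discretization algorithm used. As one common possibility, the sketch below uses equal-width binning; the attribute values and the number of bins are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index 0..n_bins-1 by splitting
    the range [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:                       # all values identical
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical numeric attribute (for example, age in years).
ages = [18, 22, 25, 31, 40, 47, 58, 63]
bins = equal_width_bins(ages, 3)
```

The resulting bin indices can be treated as categorical values by learners that expect categorical attributes, or used as a coarse representation before clustering.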
