• Title/Abstract/Keywords: Multivariate clustering analysis

Search results: 79 items (processing time 0.026 s)

Unsupervised Clustering of Multivariate Time Series Microarray Experiments based on Incremental Non-Gaussian Analysis

  • Ng, Kam Swee;Yang, Hyung-Jeong;Kim, Soo-Hyung;Kim, Sun-Hee;Anh, Nguyen Thi Ngoc
    • International Journal of Contents
    • /
    • Vol. 8, No. 1
    • /
    • pp.23-29
    • /
    • 2012
  • Multiple expression levels of genes obtained from time series microarray experiments have been exploited effectively to enhance understanding of a wide range of biological phenomena. However, microarray data usually take the form of large, high-dimensional matrices of gene expression values. Among the huge number of genes present in a microarray, only a small number are expected to be relevant to a given task. Hence, discarding the majority of unaffected genes is the crucial goal of gene selection, which improves the accuracy of disease diagnosis. In this paper, a non-Gaussian weight matrix obtained from an incremental model is proposed to extract useful features of multivariate time series microarrays. The proposed method can automatically identify a small number of significant features by discovering hidden variables among a huge number of features. A representative unsupervised hierarchical clustering method is then used to evaluate the effectiveness of the proposed methodology. The proposed method achieves promising results in terms of the predictive accuracy of clustering compared with existing methods of analysis. Furthermore, the proposed method offers a robust approach with low memory and computation costs.

Projection Pursuit K-Means Visual Clustering

  • Kim, Mi-Kyung;Huh, Myung-Hoe
    • Journal of the Korean Statistical Society
    • /
    • Vol. 31, No. 4
    • /
    • pp.519-532
    • /
    • 2002
  • K-means clustering is a well-known method for partitioning multivariate observations. Recently, the method has been implemented broadly in data mining software because of its computational efficiency in handling large data sets. However, it does not yield a suitable visual display of multivariate observations, which is especially important in the exploratory stage of data analysis. The aim of this study is to develop a K-means clustering method that enables visual display of multivariate observations in a low-dimensional space, for which the projection pursuit method is adopted. We propose a computationally inexpensive and reliable algorithm and provide two numerical examples.
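
As a point of reference for the partitioning step this abstract builds on, ordinary K-means (Lloyd's algorithm) can be sketched as below. This is a minimal illustration of plain K-means only, not the authors' projection pursuit variant; the function name `kmeans`, its parameters, and the random initialization are assumptions made for the example.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples.

    Illustrative sketch only: it does not include the projection pursuit
    step described in the abstract.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen input points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance; ties go to the lower index).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers
```

Calling `kmeans(points, 2)` on a list of coordinate tuples returns one cluster label per point and the final centers. The result depends on the initial centers, so in practice the procedure is usually restarted from several seeds.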

Clustering Technique for Multivariate Data Analysis

  • Lee, Jin-Ki
    • 한국국방경영분석학회지
    • /
    • Vol. 6, No. 2
    • /
    • pp.89-127
    • /
    • 1980
  • The multivariate analysis techniques of cluster analysis are examined in this article. The theory and applications of the techniques and the computer software that implements them are discussed, and sample jobs are included. A hierarchical cluster analysis algorithm, available in the IMSL software package, is applied to a set of data extracted from a group of subjects for the purpose of partitioning a collection of 26 attributes of a weapon system into six clusters of superattributes. A nonhierarchical clustering procedure was applied to a collection of tank data consisting of twenty-four observations of ten tank attributes. The cluster analysis shows that the tanks cluster somewhat naturally by nationality. Principal component analysis and discriminant analysis show that tank weight is the single most important discriminator among nationalities, although these analyses are not shown in this article because of space restrictions. This article is part of a master's thesis in operations research.


Fuzzy k-Means Local Centers of the Social Networks

  • Woo, Won-Seok;Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 19, No. 2
    • /
    • pp.213-217
    • /
    • 2012
  • Fuzzy k-means clustering is an attractive alternative to ordinary k-means clustering for analyzing multivariate data. Fuzzy versions yield more natural output by allowing the k groups to overlap. In this study, we modify a fuzzy k-means clustering algorithm for use with undirected social networks, apply the algorithm to both real and simulated cases, and report the results.
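
To make the idea of overlapping groups concrete, the sketch below shows the membership and center updates of the standard fuzzy k-means (fuzzy c-means) formulation for ordinary multivariate data. It is not the network-specific modification proposed in the paper; the function names, the fuzzifier default `m=2.0`, and the handling of points that coincide with a center are assumptions made for the example.

```python
import math


def fuzzy_memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which point i belongs to cluster j.

    Standard fuzzy c-means rule: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)),
    where d_ij is the Euclidean distance from point i to center j and m > 1.
    """
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point lying exactly on one or more centers is shared
            # equally among those centers (assumed convention).
            zeros = dists.count(0.0)
            u.append([1.0 / zeros if d == 0.0 else 0.0 for d in dists])
        else:
            u.append([1.0 / sum((dj / dk) ** power for dk in dists)
                      for dj in dists])
    return u


def fuzzy_centers(points, u, m=2.0):
    """Recompute each center as the mean of all points weighted by u_ij ** m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers
```

Alternating the two functions until the memberships stop changing gives the usual iteration. Each row of `u` sums to one, so a point between two centers is split between them rather than forced into a single group.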

Genetic Diversity and Population Genetic Structure of Black-spotted Pond Frog (Pelophylax nigromaculatus) Distributed in South Korean River Basins

  • Park, Jun-Kyu;Yoo, Nakyung;Do, Yuno
    • Proceedings of the National Institute of Ecology of the Republic of Korea
    • /
    • Vol. 2, No. 2
    • /
    • pp.120-128
    • /
    • 2021
  • The objective of this study was to analyze the genotype of black-spotted pond frog (Pelophylax nigromaculatus) using seven microsatellite loci to quantify its genetic diversity and population structure throughout the spatial scale of basins of Han, Geum, Yeongsan, and Nakdong Rivers in South Korea. Genetic diversities in these four areas were compared using diversity index and inbreeding coefficient obtained from the number and frequency of alleles as well as heterozygosity. Additionally, the population structure was confirmed with population differentiation, Nei's genetic distance, multivariate analysis, and Bayesian clustering analysis. Interestingly, a negative genetic diversity pattern was observed in the Han River basin, indicating possible recent habitat disturbances or population declines. In contrast, a positive genetic diversity pattern was found for the population in the Nakdong River basin that had remained the most stable. Results of population structure suggested that populations of black-spotted pond frogs distributed in these four river basins were genetically independent. In particular, the population of the Nakdong River basin had the greatest genetic distance, indicating that it might have originated from an independent population. These results support the use of genetics in addition to designations strictly based on geographic stream areas to define the spatial scale of populations for management and conservation practices.
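
The abstract does not say which form of Nei's genetic distance was used. Assuming the widely used standard form, D = -ln(J_xy / sqrt(J_x J_y)), where J_x and J_y are sums of squared allele frequencies within each population and J_xy is the sum of their products, a minimal sketch looks like this. The function name and the input layout (one dictionary of allele frequencies per locus) are illustrative assumptions.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between two populations.

    freqs_x, freqs_y: lists with one dict per locus, mapping allele name to
    its frequency in that population. Both lists must cover the same loci
    in the same order.
    """
    jx = jy = jxy = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        alleles = set(locus_x) | set(locus_y)
        jx += sum(locus_x.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(locus_y.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(locus_x.get(a, 0.0) * locus_y.get(a, 0.0) for a in alleles)
    # Averaging over loci cancels in the ratio, so plain sums are enough.
    identity = jxy / math.sqrt(jx * jy)
    # Populations sharing no alleles give identity 0; math.log then raises
    # ValueError, because the distance is undefined (infinite) in that case.
    return -math.log(identity)
```

Two populations with identical allele frequencies have a distance of zero, and the distance grows as shared alleles become rarer.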

Multivariate Analysis for Classification of Smog Type during the Summer Season in Seoul, Korea

  • 홍낙기;이종범;김용국
    • 한국대기환경학회지
    • /
    • Vol. 9, No. 4
    • /
    • pp.278-287
    • /
    • 1993
  • In order to classify smog types during the summer season in Seoul, air quality and meteorological data were analyzed by multivariate analysis. Among 15 variables relating to visibility, 10 variables were selected by multiple regression analysis for clustering of smog types: total suspended particles, sulfur dioxide, ozone, nitrogen dioxide, total hydrocarbon, south-north wind component, relative humidity, precipitable water, mixing height and air temperature. Smog types were grouped into three clusters using the cubic clustering criterion, and the clusters contained 74, 28 and 16 days, respectively. The clusters were clearly separated by sulfur dioxide, precipitable water and air temperature. The first cluster was representative of high ozone concentration and prevailing meteorological conditions for ozone formation. Therefore, visibility in the first cluster was considered to be affected by photochemical smog. The third cluster showed characteristics of the sulphurous smog type, owing to a higher concentration of primary pollutants under dry conditions than in the other clusters. The second cluster, on the other hand, was less clearly characterized, but was considered to be intermediate between the photochemical and sulphurous smog types.


Simple Compromise Strategies in Multivariate Stratification

  • Park, Inho
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 20, No. 2
    • /
    • pp.97-105
    • /
    • 2013
  • Stratification (among other applications) is a popular technique used in survey practice to improve the accuracy of estimators. Its full potential benefit can be gained by the effective use, in stratification, of auxiliary variables related to the survey variables. This paper focuses on the problem of stratum formation when multiple stratification variables are available. We first review a variance reduction strategy in the case of univariate stratification. We then discuss its use for multivariate situations in convenient and efficient ways using three methods: compromised measures of size, principal components analysis and a K-means clustering algorithm. We also consider three types of compromising factors for the data when using these three methods. Finally, we compare their efficiency using data from the MU281 Swedish municipality population.

Nonnegative Matrix Factorization with Orthogonality Constraints

  • Yoo, Ji-Ho;Choi, Seung-Jin
    • Journal of Computing Science and Engineering
    • /
    • Vol. 4, No. 2
    • /
    • pp.97-109
    • /
    • 2010
  • Nonnegative matrix factorization (NMF) is a popular method for multivariate analysis of nonnegative data, which decomposes a data matrix into a product of two factor matrices with all entries restricted to be nonnegative. NMF has been shown to be useful for clustering (especially document clustering), but in some cases NMF produces results that are inappropriate for clustering problems. In this paper, we present an algorithm for orthogonal nonnegative matrix factorization, where an orthogonality constraint is imposed on the nonnegative decomposition of a term-document matrix. The result of orthogonal NMF can be clearly interpreted for clustering problems, and its clustering performance is usually better than that of NMF. We develop multiplicative updates directly from the true gradient on the Stiefel manifold, whereas existing algorithms consider additive orthogonality constraints. Experiments on several different document data sets show that our orthogonal NMF algorithms perform better in clustering than the standard NMF and an existing orthogonal NMF.

Hydrological homogeneous region delineation for bivariate frequency analysis of extreme rainfalls in Korea

  • 신주영;정창삼;주경원;허준행
    • 한국수자원학회논문집
    • /
    • Vol. 51, No. 1
    • /
    • pp.49-60
    • /
    • 2018
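  • Multivariate regional frequency analysis combines the advantages of conventional multivariate frequency analysis and regional frequency analysis, and by considering multiple variables it can provide more information about hydrological phenomena. Because multivariate regional frequency analysis has not yet been attempted with Korean hydrological data, its applicability to domestic hydrological data needs to be examined. This study focuses on the step of delineating hydrologically homogeneous regions in multivariate regional frequency analysis, and delineates such regions for bivariate hydrological data, namely annual maximum rainfall-duration data. The applicability of the regionalization method used in bivariate regional frequency analysis to Korean annual maximum rainfall-duration data was evaluated and its characteristics were analyzed. The analysis covered 71 stations of the Korea Meteorological Administration. The K-medoid method was applied for cluster analysis, and discordancy and heterogeneity measures were used to judge whether the regions were delineated appropriately. The cluster analysis divided Korea into five regions, and in all but two regions the discordancy measure of every station was below the critical value. Stations with short record lengths were found to have high discordancy measures. All delineated regions were found to be homogeneous in terms of the data of their stations, and the correlation between stations was very high.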
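
The K-medoid method named in this abstract can be illustrated with a generic alternating scheme: assign each item to its nearest medoid, then pick as the new medoid the member with the smallest total distance to the rest of its cluster. This is a minimal sketch, not the study's procedure; the function names, the precomputed distance matrix input, and the naive choice of the first `k` items as starting medoids are assumptions made for the example.

```python
def assign_to_medoids(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda j: dist[i][medoids[j]])
        for i in range(len(dist))
    ]


def k_medoids(dist, k, max_iter=100):
    """Alternating K-medoids on a symmetric n x n distance matrix.

    Returns (medoids, labels), where medoids are item indices and
    labels[i] is the position in `medoids` of item i's cluster.
    """
    medoids = list(range(k))  # naive start: the first k items (assumption)
    for _ in range(max_iter):
        labels = assign_to_medoids(dist, medoids)
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # keep the old medoid if its cluster emptied
                new_medoids.append(medoids[j])
                continue
            # New medoid: the member with the smallest total distance
            # to all other members of the same cluster.
            new_medoids.append(
                min(members, key=lambda c: sum(dist[c][i] for i in members))
            )
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign_to_medoids(dist, medoids)
```

Because medoids are always actual items, the method needs only pairwise distances, which makes it usable with any dissimilarity between stations rather than requiring coordinate means.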

Tree-Dependent Components of Gene Expression Data for Clustering

  • 김종경;최승진
    • 한국정보과학회:학술대회논문집
    • /
    • Proceedings of the 2006 Korea Computer Congress (Korean Institute of Information Scientists and Engineers), Vol.33 No.1 (A)
    • /
    • pp.4-6
    • /
    • 2006
  • Tree-dependent component analysis (TCA) is a generalization of independent component analysis (ICA), the goal of which is to model multivariate data by a linear transformation of latent variables, where the latent variables are fit by a tree-structured graphical model. In contrast to ICA, TCA allows a dependent structure among the latent variables and also considers non-spanning trees (forests). In this paper, we present a TCA-based method of clustering gene expression data. An empirical study with yeast cell cycle-related data, yeast metabolic shift data, and yeast sporulation data shows that TCA is more suitable for gene clustering than principal component analysis (PCA) and ICA.
