• Title/Summary/Keyword: Multivariate clustering analysis

Search Result 79, Processing Time 0.024 seconds

Discrimination of cultivation ages and cultivars of ginseng leaves using Fourier transform infrared spectroscopy combined with multivariate analysis

  • Kwon, Yong-Kook;Ahn, Myung Suk;Park, Jong Suk;Liu, Jang Ryol;In, Dong Su;Min, Byung Whan;Kim, Suk Weon
    • Journal of Ginseng Research
    • /
    • v.38 no.1
    • /
    • pp.52-58
    • /
    • 2014
  • To determine whether Fourier transform (FT)-IR spectral analysis combined with multivariate analysis of whole-cell extracts from ginseng leaves can be applied as a high-throughput discrimination system of cultivation ages and cultivars, a total of total 480 leaf samples belonging to 12 categories corresponding to four different cultivars (Yunpung, Kumpung, Chunpung, and an open-pollinated variety) and three different cultivation ages (1 yr, 2 yr, and 3 yr) were subjected to FT-IR. The spectral data were analyzed by principal component analysis and partial least squares-discriminant analysis. A dendrogram based on hierarchical clustering analysis of the FT-IR spectral data on ginseng leaves showed that leaf samples were initially segregated into three groups in a cultivation age-dependent manner. Then, within the same cultivation age group, leaf samples were clustered into four subgroups in a cultivar-dependent manner. The overall prediction accuracy for discrimination of cultivars and cultivation ages was 94.8% in a cross-validation test. These results clearly show that the FT-IR spectra combined with multivariate analysis from ginseng leaves can be applied as an alternative tool for discriminating of ginseng cultivars and cultivation ages. Therefore, we suggest that this result could be used as a rapid and reliable F1 hybrid seed-screening tool for accelerating the conventional breeding of ginseng.

Neural-based Blind Modeling of Mini-mill ASC Crown

  • Lee, Gang-Hwa;Lee, Dong-Il;Lee, Seung-Joon;Lee, Suk-Gyu;Kim, Shin-Il;Park, Hae-Doo;Park, Seung-Gap
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.12 no.6
    • /
    • pp.577-582
    • /
    • 2002
  • Neural network can be trained to approximate an arbitrary nonlinear function of multivariate data like the mini-mill crown values in Automatic Shape Control. The trained weights of neural network can evaluate or generalize the process data outside the training vectors. Sometimes, the blind modeling of the process data is necessary to compare with the scattered analytical model of mini-mill process in isolated electro-mechanical forms. To come up with a viable model, we propose the blind neural-based range-division domain-clustering piecewise-linear modeling scheme. The basic ideas are: 1) dividing the range of target data, 2) clustering the corresponding input space vectors, 3)training the neural network with clustered prototypes to smooth out the convergence and 4) solving the resulting matrix equations with a pseudo-inverse to alleviate the ill-conditioning problem. The simulation results support the effectiveness of the proposed scheme and it opens a new way to the data analysis technique. By the comparison with the statistical regression, it is evident that the proposed scheme obtains better modeling error uniformity and reduces the magnitudes of errors considerably. Approximatly 10-fold better performance results.

Wing Morphometric Analysis of Psylla elaeagni Complex (Homoptera : Psyllidae) (보리나무이종군의 날개에 대한 수량형태학적 분석 (동시목: 나무이과))

  • Park, Hee-Cheon;Lee, Chang-Eon;Kim, Hoon-Soo
    • Animal Systematics, Evolution and Diversity
    • /
    • no.nspc2
    • /
    • pp.243-250
    • /
    • 1988
  • The wing morphometric characters of P.elaeagni complex feeding on the genus Elaeagnus plants was analysed by the multivariate methods using clustering of generalized distance and discriminant analysis. On the clustering of the species, the effect of sexual differences, seasonal variation and geographic population sensitively appeared . However, four species of this group was precicely divided by the discriminant analysis.

  • PDF

A Comparison of Cluster Analyses and Clustering of Sensory Data on Hanwoo Bulls (군집분석 비교 및 한우 관능평가데이터 군집화)

  • Kim, Jae-Hee;Ko, Yoon-Sil
    • The Korean Journal of Applied Statistics
    • /
    • v.22 no.4
    • /
    • pp.745-758
    • /
    • 2009
  • Cluster analysis is the automated search for groups of related observations in a data set. To group the observations into clusters many techniques has been proposed, and a variety measures aimed at validating the results of a cluster analysis have been suggested. In this paper, we compare complete linkage, Ward's method, K-means and model-based clustering and compute validity measures such as connectivity, Dunn Index and silhouette with simulated data from multivariate distributions. We also select a clustering algorithm and determine the number of clusters of Korean consumers based on Korean consumers' palatability scores for Hanwoo bull in BBQ cooking method.

Orthogonal Nonnegative Matrix Factorization: Multiplicative Updates on Stiefel Manifolds (Stiefel 다양체에서 곱셈의 업데이트를 이용한 비음수 행렬의 직교 분해)

  • Yoo, Ji-Ho;Choi, Seung-Jin
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.5
    • /
    • pp.347-352
    • /
    • 2009
  • Nonnegative matrix factorization (NMF) is a popular method for multivariate analysis of nonnegative data, the goal of which is decompose a data matrix into a product of two factor matrices with all entries in factor matrices restricted to be nonnegative. NMF was shown to be useful in a task of clustering (especially document clustering). In this paper we present an algorithm for orthogonal nonnegative matrix factorization, where an orthogonality constraint is imposed on the nonnegative decomposition of a term-document matrix. We develop multiplicative updates directly from true gradient on Stiefel manifold, whereas existing algorithms consider additive orthogonality constraints. Experiments on several different document data sets show our orthogonal NMF algorithms perform better in a task of clustering, compared to the standard NMF and an existing orthogonal NMF.

Improving data reliability on oligonucleotide microarray

  • Yoon, Yeo-In;Lee, Young-Hak;Park, Jin-Hyun
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2004.11a
    • /
    • pp.107-116
    • /
    • 2004
  • The advent of microarray technologies gives an opportunity to moni tor the expression of ten thousands of genes, simultaneously. Such microarray data can be deteriorated by experimental errors and image artifacts, which generate non-negligible outliers that are estimated by 15% of typical microarray data. Thus, it is an important issue to detect and correct the se faulty probes prior to high-level data analysis such as classification or clustering. In this paper, we propose a systematic procedure for the detection of faulty probes and its proper correction in Genechip array based on multivariate statistical approaches. Principal component analysis (PCA), one of the most widely used multivariate statistical approaches, has been applied to construct a statistical correlation model with 20 pairs of probes for each gene. And, the faulty probes are identified by inspecting the squared prediction error (SPE) of each probe from the PCA model. Then, the outlying probes are reconstructed by the iterative optimization approach minimizing SPE. We used the public data presented from the gene chip project of human fibroblast cell. Through the application study, the proposed approach showed good performance for probe correction without removing faulty probes, which may be desirable in the viewpoint of the maximum use of data information.

  • PDF

Clustering-based Monitoring and Fault detection in Hot Strip Roughing Mill (군집기반 열간조압연설비 상태모니터링과 진단)

  • SEO, MYUNG-KYO;YUN, WON YOUNG
    • Journal of Korean Society for Quality Management
    • /
    • v.45 no.1
    • /
    • pp.25-38
    • /
    • 2017
  • Purpose: Hot strip rolling mill consists of a lot of mechanical and electrical units. In condition monitoring and diagnosis phase, various units could be failed with unknown reasons. In this study, we propose an effective method to detect early the units with abnormal status to minimize system downtime. Methods: The early warning problem with various units is defined. K-means and PAM algorithm with Euclidean and Manhattan distances were performed to detect the abnormal status. In addition, an performance of the proposed algorithm is investigated by field data analysis. Results: PAM with Manhattan distance(PAM_ManD) showed better results than K-means algorithm with Euclidean distance(K-means_ED). In addition, we could know from multivariate field data analysis that the system reliability of hot strip rolling mill can be increased by detecting early abnormal status. Conclusion: In this paper, clustering-based monitoring and fault detection algorithm using Manhattan distance is proposed. Experiments are performed to study the benefit of the PAM with Manhattan distance against the K-means with Euclidean distance.

Establishment of discrimination system using multivariate analysis of FT-IR spectroscopy data from different species of artichoke (Cynara cardunculus var. scolymus L.) (FT-IR 스펙트럼 데이터 기반 다변량통계분석기법을 이용한 아티초크의 대사체 수준 품종 분류)

  • Kim, Chun Hwan;Seong, Ki-Cheol;Jung, Young Bin;Lim, Chan Kyu;Moon, Doo Gyung;Song, Seung Yeob
    • Horticultural Science & Technology
    • /
    • v.34 no.2
    • /
    • pp.324-330
    • /
    • 2016
  • To determine whether FT-IR spectral analysis based on multivariate analysis for whole cell extracts can be used to discriminate between artichoke (Cynara cardunculus var. scolymus L.) plants at the metabolic level, leaves of ten artichoke plants were subjected to Fourier transform infrared(FT-IR) spectroscopy. FT-IR spectral data from leaves were analyzed by principal component analysis (PCA), partial least square discriminant analysis (PLS-DA) and hierarchical clustering analysis (HCA). FT-IR spectra confirmed typical spectral differences between the frequency regions of 1,700-1,500, 1,500-1,300 and $1,100-950cm^{-1}$, respectively. These spectral regions reflect the quantitative and qualitative variations of amide I, II from amino acids and proteins ($1,700-1,500cm^{-1}$), phosphodiester groups from nucleic acid and phospholipid ($1,500-1,300cm^{-1}$) and carbohydrate compounds ($1,100-950cm^{-1}$). PCA revealed separate clusters that corresponded to their species relationship. Thus, PCA could be used to distinguish between artichoke species with different metabolite contents. PLS-DA showed similar species classification of artichoke. Furthermore these metabolic discrimination systems could be used for the rapid selection and classification of useful artichoke cultivars.

A dimensional reduction method in cluster analysis for multidimensional data: principal component analysis and factor analysis comparison (다차원 데이터의 군집분석을 위한 차원축소 방법: 주성분분석 및 요인분석 비교)

  • Hong, Jun-Ho;Oh, Min-Ji;Cho, Yong-Been;Lee, Kyung-Hee;Cho, Wan-Sup
    • The Journal of Bigdata
    • /
    • v.5 no.2
    • /
    • pp.135-143
    • /
    • 2020
  • This paper proposes a pre-processing method and a dimensional reduction method in the analysis of shopping carts where there are many correlations between variables when dividing the types of consumers in the agri-food consumer panel data. Cluster analysis is a widely used method for dividing observational objects into several clusters in multivariate data. However, cluster analysis through dimensional reduction may be more effective when several variables are related. In this paper, the food consumption data surveyed of 1,987 households was clustered using the K-means method, and 17 variables were re-selected to divide it into the clusters. Principal component analysis and factor analysis were compared as the solution for multicollinearity problems and as the way to reduce dimensions for clustering. In this study, both principal component analysis and factor analysis reduced the dataset into two dimensions. Although the principal component analysis divided the dataset into three clusters, it did not seem that the difference among the characteristics of the cluster appeared well. However, the characteristics of the clusters in the consumption pattern were well distinguished under the factor analysis method.

Clustering of Metabolic Risk Factors and Its Related Risk Factors in Young Schoolchildren (초등학교 저학년 어린이에서의 대사위험요인 군집의 분포와 관련 위험요인)

  • Kong, Kyoung-Ae;Park, Bo-Hyun;Min, Jung-Won;Hong, Ju-Hee;Hong, Young-Sun;Lee, Bo-Eun;Chang, Nam-Soo;Lee, Sun-Hwa;Ha, Eun-Hee;Park, Hye-Sook
    • Journal of Preventive Medicine and Public Health
    • /
    • v.39 no.3
    • /
    • pp.235-242
    • /
    • 2006
  • Objectives: We wanted to determine the distribution of the clustering of the metabolic risk factors and we wanted to evaluate the related factors in young schoolchildren. Methods: A cross-sectional study of metabolic syndrome was conducted in an elementary school in Seoul, Korea. We evaluated fasting glucose, triglyceride, HDL cholesterol, blood pressures and the body mass index, and we used parent-reported questionnaires to assess the potential risk factors in 261 children (136 boys, 125 girls). We defined the metabolic risk factors as obesity or at risk for obesity ($\geqq$ 85th percentile for age and gender), a systolic or diastolic blood pressure at $\geqq90th$ percentile for age and gender, fasting glucose at $\geqq110mg/dl$, triglyceride at $\geqq110mg/dl$ and HDL cholesterol at $\leqq40mg/dl$. Results: There were 15.7% of the subjects who showed clustering of two or more metabolic risk factors, 2.3% of the subjects who showed clustering for three or more risk factors, and 0.8% of the subjects who showed clustering for four or more risk factors. A multivariate analysis revealed that a father smoking more than 20 cigarettes per day, a mother with a body mass index of = $25kg/m^2$, and the child eating precooked or frozen food more than once per day were associated with clustering of two or more components, with the odds ratios of 3.61 (95% CI=1.24-10.48), 5.50 (95% CI=1.39-21.73) and 8.04 (95% CI=1.67-38.81), respectively. Conclusions: This study shows that clustering of the metabolic risk factors is present in young schoolchildren in Korea, with the clustering being associated with parental smoking and obesity as well as the child's eating behavior. These results suggest that evaluation of metabolic risk factors and intervention for lifestyle factors may be needed in both young Korean children and their parents.