• Title/Summary/Keyword: Rand′s C statistic

Search Result 3, Processing Time 0.018 seconds

A Comparative Study of Determining the Number of Clusters with a Method Proposed (군집수의 예측에 관한 방법의 제안 및 비교)

  • Chae, Seong-San;Lim, Nam-Kyoo
    • The Korean Journal of Applied Statistics
    • /
    • v.18 no.2
    • /
    • pp.329-341
    • /
    • 2005
  • A method of determining the number of clusters is proposed based on some asymptotic results on the Rand's(1971} $C_k$, k = 2, 3, . . ., N - 1, statistic. Simulation is conducted to compare the proposed method with Chae and Warde(1991), and Huh and Lee(2004).

Recovery Levels of Clustering Algorithms Using Different Similarity Measures for Functional Data

  • Chae, Seong San;Kim, Chansoo;Warde, William D.
    • Communications for Statistical Applications and Methods
    • /
    • v.11 no.2
    • /
    • pp.369-380
    • /
    • 2004
  • Clustering algorithms with different similarity measures are commonly used to find an optimal clustering or close to original clustering. The recovery level of using Euclidean distance and distances transformed from correlation coefficients is evaluated and compared using Rand's (1971) C statistic. The C values present how the resultant clustering is close to the original clustering. In simulation study, the recovery level is improved by applying the correlation coefficients between objects. Using the data set from Spellman et al. (1998), the recovery levels with different similarity measures are also presented. In general, the recovery level of true clusters was increased by using the correlation coefficients.

A Method to Predict the Number of Clusters

  • Chae, Seong-San;Willian D. Warde
    • Journal of the Korean Statistical Society
    • /
    • v.20 no.2
    • /
    • pp.162-176
    • /
    • 1991
  • The problem of determining the number of clusters, K. is the main objective of this study. Attention is focused on the use of Rand(1971)'s $C_{k}$ statistic with some agglomerative clustering algorithms(ACA) defined in the ($\beta$, $\pi$) plane in predicting the number of clusters within the given set of data. The (k, $C_{k}$) plots for k=1, 2, …, N are explored by a Monte Carlo study. Based on its performance, the use of $C_{k}$ with the pair of ACA, (-.5, .75) and (-.25, .0), is recommended for predicting the number of clusters present within a set of data. data.

  • PDF