http://dx.doi.org/10.7465/jkdi.2015.26.2.367

A study on the ordering of PIM family similarity measures without marginal probability  

Park, Hee Chang (Department of Statistics, Changwon National University)
Publication Information
Journal of the Korean Data and Information Science Society / v.26, no.2, 2015, pp. 367-376
Abstract
Big data has become a prominent keyword; it may be defined as a collection of data sets so large and complex that they are difficult to process with traditional methods. Clustering identifies the information in a large database by assigning a set of objects to clusters so that objects in the same cluster are more similar to each other than to objects in other clusters. The similarity measures used in cluster analysis may be classified into various types depending on the nature of the data. In this paper, we compute upper and lower bounds for probabilistic interestingness measure (PIM) based similarity measures that do not involve marginal probabilities, such as the Yule I and II, Michael, Digby, Baulieu, and dispersion measures. We also compare these measures using real data and a simulation experiment. According to Warrens (2008), coefficients that have the same quantities in the numerator and denominator, that are bounded, and that are close to each other in the ordering are likely to be more similar. Thus, results on bounds provide a means of classifying the various measures. Also, knowing which coefficients are similar provides insight into the stability of a given algorithm.
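As an illustration of the kind of measure discussed here, the following minimal Python sketch computes four association measures from a 2x2 presence/absence table. The formulas are the ones commonly given in surveys of binary similarity measures and are not taken from this paper; in particular, reading "Yule I and II" as Yule's Q and Yule's Y is an assumption. The function name `pim_measures` and the cell labels `a`, `b`, `c`, `d` are hypothetical. The Digby and Baulieu measures are omitted because their definitions vary between sources.

```python
import math


def pim_measures(a, b, c, d):
    """Association measures for a 2x2 table of counts (illustrative only).

    a: both items present, b: first only, c: second only, d: both absent.
    None of the formulas uses the marginal totals a+b, a+c, c+d, or b+d.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table must contain at least one observation")
    ad, bc = a * d, b * c
    diff = ad - bc

    def ratio(num, den):
        # Return NaN instead of raising when a measure is undefined.
        return num / den if den != 0 else math.nan

    return {
        # Assumed to correspond to "Yule I" (Yule's Q).
        "yule_q": ratio(diff, ad + bc),
        # Assumed to correspond to "Yule II" (Yule's Y).
        "yule_y": ratio(math.sqrt(ad) - math.sqrt(bc),
                        math.sqrt(ad) + math.sqrt(bc)),
        "michael": ratio(4 * diff, (a + d) ** 2 + (b + c) ** 2),
        "dispersion": diff / n ** 2,
    }


measures = pim_measures(40, 10, 10, 40)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

For any table where both are defined, Yule's Y is never larger in absolute value than Yule's Q, which is one simple example of an ordering between two such coefficients.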
Keywords
Big data; cluster analysis; marginal probability; probabilistic interestingness measure; similarity measure
Citations & Related Records
Times Cited By KSCI: 7
1 Baulieu, F. B. (1989). A classification of presence/absence based dissimilarity coefficients. Journal of Classification, 6, 233-246.
2 Choi, S. S., Cha, S. H. and Tappert, C. (2010). A survey of binary similarity and distance measures. Journal on Systemics, Cybernetics and Informatics, 8, 43-48.
3 Gordon, A. D. (1999). Classification, Chapman & Hall, London-New York.
4 Kim, M., Jeon, J., Woo, K. and Kim, M. (2010). A new similarity measure for categorical attribute-based clustering. Journal of Korean Institute of Information Scientists and Engineers : Databases, 37, 71-81.
5 Lee, J. H. (2013). Big data, data mining and temporary reproduction. The Journal of Intellectual Property, 8, 93-125.
6 Lee, K. A. and Kim, J. H. (2011). Comparison of clustering with yeast microarray gene expression data. Journal of the Korean Data & Information Science Society, 22, 741-753.
7 Lim, J. S. and Lim, D. H. (2012). Comparison of clustering methods of microarray gene expression data. Journal of the Korean Data & Information Science Society, 23, 39-51.
8 Michael, E. L. (1920). Marine ecology and the coefficient of association. Journal of Animal Ecology, 8, 54-59.
9 Park, H. C. (2012). Exploration of PIM based similarity measures as association rule thresholds. Journal of the Korean Data & Information Science Society, 23, 1127-1135.
10 Park, H. C. (2014). Comparison of cosine family similarity measures in the aspect of association rule. Journal of the Korean Data Analysis Society, 16, 729-737.
11 Park, H. J. and Kim, J. T. (2013). Classification of universities in Daegu·Gyungpook by support vector cluster analysis. Journal of the Korean Data & Information Science Society, 24, 783-791.
12 Ryu, J. Y. and Park, H. C. (2013). A study on Jaccard dissimilarity measures for negative association rule generation. Journal of the Korean Data Analysis Society, 15, 3111-3121.
13 Stanfill, C. and Waltz, D. (1986). Toward memory-based reasoning. Communications of the ACM, 29, 1213-1228.
14 Warrens, M. J. (2008). Bounds of resemblance measures for binary (presence/absence) variables. Journal of Classification, 25, 195-208.
15 Yeo, I. K. (2011). Clustering analysis of Korea's meteorological data. Journal of the Korean Data & Information Science Society, 22, 941-949.
16 Yule, G. U. (1900). On the association of attributes in statistics. Philosophical Transactions of the Royal Society, 75, 257-319.
17 Yule, G. U. (1912). On the methods of measuring the association between two attributes. Journal of the Royal Statistical Society, 75, 579-652.