http://dx.doi.org/10.7465/jkdi.2015.26.1.89

A study on the ordering of similarity measures with negative matches  

Park, Hee Chang (Department of Statistics, Changwon National University)
Publication Information
Journal of the Korean Data and Information Science Society / v.26, no.1, 2015, pp. 89-99
Abstract
The World Economic Forum and the Korean Ministry of Knowledge Economy have selected big data as one of the top 10 core information technologies. The key to big data is to analyze effectively the properties inherent in the data. Cluster analysis, one of the big data techniques, assigns a set of objects to clusters so that objects in the same cluster are more similar to each other than to objects in other clusters. The similarity measures used in cluster analysis can be classified into various types depending on the nature of the data. In this paper, we study upper and lower bounds for binary similarity measures with negative matches, namely the Russell and Rao measure, the simple matching measure of Sokal and Michener, the Rogers and Tanimoto measure, the Sokal and Sneath measure, the Hamann measure, and the Baroni-Urbani and Buser measures I and II. We also compare these measures using real data and a simulation experiment.
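The measures named in the abstract are all functions of the 2×2 co-occurrence table of two binary vectors: a (both present), b and c (present in only one vector), and d (both absent, the negative matches). The Python sketch below computes them using the forms commonly cited in the binary-similarity literature; the paper's own definitions are not reproduced here, so treat these forms as assumptions. In particular, several Sokal and Sneath coefficients exist, and the variant 2(a+d)/(2(a+d)+b+c) is assumed because it uses d. The helper names `contingency`, `similarities`, and `check_orderings` are hypothetical, and the exhaustive check only confirms orderings that follow algebraically from these formulas; it does not reproduce the bounds derived in the paper.

```python
import math
from typing import Dict, Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (a, b, c, d) for two equal-length 0/1 vectors.

    a: both 1 (positive matches)   b: x = 1, y = 0
    c: x = 0, y = 1                d: both 0 (negative matches)
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))
    b = pairs.count((1, 0))
    c = pairs.count((0, 1))
    d = pairs.count((0, 0))
    return a, b, c, d


def similarities(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """Binary similarity measures that use the negative matches d.

    Commonly cited forms (an assumption; the paper's notation may differ).
    Sokal and Sneath is assumed to be the 2(a+d)/(2(a+d)+b+c) variant.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty contingency table")
    m = a + d             # all matches, positive and negative
    u = b + c             # all mismatches
    s = math.sqrt(a * d)  # geometric-mean term of Baroni-Urbani and Buser
    bub_den = s + a + u   # zero only when a = b = c = 0
    nan = math.nan        # BUB measures are left undefined (NaN) in that case
    return {
        "russell_rao": a / n,
        "simple_matching": m / n,
        "rogers_tanimoto": m / (m + 2 * u),
        "sokal_sneath": 2 * m / (2 * m + u),
        "hamann": (m - u) / n,
        "baroni_urbani_buser_1": (s + a) / bub_den if bub_den else nan,
        "baroni_urbani_buser_2": (s + a - u) / bub_den if bub_den else nan,
    }


def check_orderings(max_n: int = 10) -> None:
    """Exhaustively check orderings implied algebraically by the formulas."""
    eps = 1e-12
    for n in range(1, max_n + 1):
        for a in range(n + 1):
            for b in range(n + 1 - a):
                for c in range(n + 1 - a - b):
                    d = n - a - b - c
                    s = similarities(a, b, c, d)
                    # Russell-Rao drops d from the numerator, so it never exceeds SM.
                    assert s["russell_rao"] <= s["simple_matching"] + eps
                    # Weighting mismatches by 2, 1, 1/2 gives RT <= SM <= SS.
                    assert s["rogers_tanimoto"] <= s["simple_matching"] + eps
                    assert s["simple_matching"] <= s["sokal_sneath"] + eps
                    # Hamann rescales simple matching from [0, 1] to [-1, 1].
                    assert math.isclose(
                        s["hamann"], 2 * s["simple_matching"] - 1, abs_tol=eps
                    )
                    # BUB II rescales BUB I the same way (when both are defined).
                    if not math.isnan(s["baroni_urbani_buser_1"]):
                        assert math.isclose(
                            s["baroni_urbani_buser_2"],
                            2 * s["baroni_urbani_buser_1"] - 1,
                            abs_tol=eps,
                        )


if __name__ == "__main__":
    x = [1, 1, 0, 1, 0, 0, 1, 0]
    y = [1, 0, 0, 1, 1, 0, 1, 0]
    a, b, c, d = contingency(x, y)
    print((a, b, c, d))
    for name, value in similarities(a, b, c, d).items():
        print(f"{name}: {value:.3f}")
    check_orderings()
```

For the example vectors the table is (a, b, c, d) = (3, 1, 1, 3), which gives 0.75 for simple matching but only 0.6 for Rogers and Tanimoto, because the latter doubles the weight of mismatches. Hamann and Baroni-Urbani and Buser II carry no information beyond simple matching and Baroni-Urbani and Buser I under these definitions: each is the same measure linearly rescaled to [-1, 1].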
Keywords
Big data; cluster analysis; co-occurrence frequency; negative matches; similarity measures
Citations & Related Records
Times Cited By KSCI: 13