Privacy-Preserving Clustering on Time-Series Data Using Fourier Magnitudes  

Kim, Hea-Suk (Department of Computer Science, Kangwon National University)
Moon, Yang-Sae (Department of Computer Science, Kangwon National University)
Abstract
In this paper we propose a privacy-preserving clustering method for time-series data based on Fourier magnitudes. The previous privacy-preserving method, called the DFT coefficient method, has a critical weakness in privacy preservation itself, since the original time-series data may be reconstructed from the privacy-preserved data. In contrast, the proposed DFT magnitude method makes reconstructing the original data almost impossible, because it uses only the DFT magnitudes and discards the DFT phases. We first explain why reconstruction is easy in the DFT coefficient method and why it is difficult in the DFT magnitude method. We then propose the notion of distance-order preservation, which can be used both to estimate clustering accuracy and to select DFT magnitudes. The degree of distance-order preservation measures how many time-series keep their relative distance orders before and after the privacy-preserving transformation. Using this degree, we present greedy strategies for selecting magnitudes in the DFT magnitude method. These strategies select the DFT magnitudes that maximize the degree of distance-order preservation, and thereby achieve relatively high clustering accuracy. Finally, we show empirically that the degree of distance-order preservation reflects clustering accuracy well. Experimental results also show that the greedy strategies of the DFT magnitude method are comparable to the DFT coefficient method in clustering accuracy. These results indicate that, compared with the DFT coefficient method, the DFT magnitude method provides a much higher degree of privacy preservation with comparable clustering accuracy.
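The ideas in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`magnitudes`, `order_preservation`, `greedy_select`) are hypothetical, the degree of distance-order preservation is assumed here to be the fraction of (anchor, a, b) triples whose Euclidean distance order is unchanged, and the greedy loop is one plausible reading of "select DFT magnitudes to maximize the degree".

```python
import cmath
import math


def dft(x):
    """Unitary DFT of a real sequence (plain O(n^2) version for clarity)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) / math.sqrt(n)
        for k in range(n)
    ]


def magnitudes(x):
    """Keep only |X_k|; the phases are discarded and never published."""
    return [abs(c) for c in dft(x)]


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def sign(v):
    return (v > 0) - (v < 0)


def order_preservation(originals, published):
    """ASSUMED definition: fraction of (anchor, a, b) triples whose
    distance order is the same in the original and published data."""
    n = len(originals)
    kept = total = 0
    for o in range(n):
        others = [i for i in range(n) if i != o]
        for idx, a in enumerate(others):
            for b in others[idx + 1:]:
                before = sign(euclidean(originals[o], originals[a])
                              - euclidean(originals[o], originals[b]))
                after = sign(euclidean(published[o], published[a])
                             - euclidean(published[o], published[b]))
                kept += before == after
                total += 1
    return kept / total if total else 1.0


def greedy_select(series, m):
    """HYPOTHETICAL greedy strategy: add one magnitude index at a time,
    always taking the index that gives the highest preservation score."""
    all_mags = [magnitudes(x) for x in series]
    chosen = []
    for _ in range(m):
        best_k, best_score = None, -1.0
        for k in range(len(series[0])):
            if k in chosen:
                continue
            cand = chosen + [k]
            projected = [[mags[i] for i in cand] for mags in all_mags]
            score = order_preservation(series, projected)
            if score > best_score:
                best_k, best_score = k, score
        chosen.append(best_k)
    return chosen


# Made-up toy data, only to exercise the functions.
toy = [
    [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, 2.5, 1.5],
    [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0],
    [2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0],
    [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
]
picked = greedy_select(toy, 3)
print("selected magnitude indices:", picked)
```

A circular shift of a series changes only the DFT phases, so `magnitudes(x)` and `magnitudes(x[3:] + x[:3])` agree up to rounding. Many distinct originals therefore map to the same published vector, which is the intuition behind why reconstruction from magnitudes alone is hard.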
Keywords
Time-series data; Clustering; Privacy preserving; DFT; Fourier magnitude