An Effective Method for Dimensionality Reduction in High-Dimensional Space  

Jeong Seung-Do (Dept. of Electronics and Computer Engineering, Hanyang University)
Kim Sang-Wook (College of Information and Communications, Hanyang University)
Choi Byung-Uk (College of Information and Communications, Hanyang University)
Abstract
In multimedia information retrieval, multimedia data are represented as vectors in a high-dimensional space. A variety of indexing methods have been proposed to search these vectors effectively. However, the performance of these indexing methods degrades dramatically as dimensionality increases, a phenomenon known as the dimensionality curse. To resolve the dimensionality curse, dimensionality reduction methods have been proposed, which map feature vectors in the high-dimensional space to vectors in a low-dimensional space before the data are indexed. This paper proposes a dimensionality reduction method based on a function that approximates the Euclidean distance using the norm and angle components of a vector. First, we identify the causes of errors in the angle estimation used to approximate the Euclidean distance and discuss basic directions for reducing those errors. Then, we propose a novel dimensionality reduction method that composes a set of subvectors from a feature vector and maintains only the norm and the estimated angle of each subvector. Selecting a good reference vector is important for accurate estimation of the angle component. We present criteria for a good reference vector and propose a method that chooses one using the Levenberg-Marquardt algorithm. We also define a novel distance function and formally prove that it lower-bounds the Euclidean distance. This implies that our approach reduces dimensionality effectively without incurring any false dismissals. Finally, we verify the superiority of the proposed method through extensive experiments.
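The abstract does not state the distance function itself, so the following is only a minimal sketch of the general idea, not the authors' definition. It assumes equal-length subvectors, a fixed reference vector `r`, and a standard law-of-cosines bound: because the angle between two vectors is at least the absolute difference of their angles to a common reference, replacing the true angle with that difference can only underestimate the distance. The names `reduce_vector` and `lower_bound_distance` are hypothetical.

```python
import math


def reduce_vector(x, r, k):
    """Keep only (norm, angle to reference) for each of k equal-length subvectors.

    Assumption: len(x) == len(r) and len(x) is divisible by k.
    """
    assert len(x) == len(r) and len(x) % k == 0
    m = len(x) // k
    reduced = []
    for i in range(k):
        xs = x[i * m:(i + 1) * m]
        rs = r[i * m:(i + 1) * m]
        norm = math.sqrt(sum(v * v for v in xs))
        rnorm = math.sqrt(sum(v * v for v in rs))
        if norm == 0.0 or rnorm == 0.0:
            # Angle is undefined; 0.0 keeps the bound valid (see below).
            angle = 0.0
        else:
            c = sum(a * b for a, b in zip(xs, rs)) / (norm * rnorm)
            angle = math.acos(max(-1.0, min(1.0, c)))  # clamp rounding noise
        reduced.append((norm, angle))
    return reduced


def lower_bound_distance(a, b):
    """Lower bound on the Euclidean distance from two reduced representations.

    Per subvector: |x - y|^2 >= |x|^2 + |y|^2 - 2|x||y|cos(tx - ty),
    since angle(x, y) >= |tx - ty| and cosine decreases on [0, pi].
    """
    total = 0.0
    for (nx, tx), (ny, ty) in zip(a, b):
        term = nx * nx + ny * ny - 2.0 * nx * ny * math.cos(tx - ty)
        total += max(0.0, term)  # guard against tiny negative rounding error
    return math.sqrt(total)


def euclidean(x, y):
    """Exact Euclidean distance, used only to check the bound."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

Under these assumptions, a filter step can discard any candidate whose `lower_bound_distance` to the query already exceeds the search radius, and no true match is lost because the bound never exceeds the exact distance. How tight the bound is depends on the choice of `r`, which is the role the abstract assigns to the reference-vector selection.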
Keywords
multimedia information retrieval; high dimensional indexing; dimensionality reduction