http://dx.doi.org/10.3745/KTSDE.2022.11.4.157

A Method of Reducing the Processing Cost of Similarity Queries in Databases  

Kim, Sunkyung (Megazone Cloud Corp., DB Architect)
Park, Ji Su (Department of Computer Engineering, Jeonju University)
Shon, Jin Gon (Department of Computer Science, Korea National Open University)
Publication Information
KIPS Transactions on Software and Data Engineering, Vol. 11, No. 4, 2022, pp. 157-162
Abstract
Today, most data is stored in databases (DBs), and users query a DB to retrieve the data they want. A similarity query is a query whose predicate is defined in terms of similarity. When a similarity query is processed, however, it is difficult to use an index to narrow the range of records that must be examined, so the similarity must be computed for every record in the table each time, which is costly. To address this problem, this paper defines a lightweight similarity function: a function that filters data less accurately than the original similarity function but is cheaper to compute. We present a method that exploits these properties of the lightweight similarity function to reduce the cost of processing similarity queries. We then propose the Chebyshev distance as a lightweight similarity function for the Euclidean distance function, and compare the processing cost of a query that uses the existing similarity function with that of a query that uses the lightweight similarity function. Experiments confirm that the cost of processing similarity queries is reduced when the Chebyshev distance is applied as a lightweight similarity function for Euclidean similarity.
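The filter-and-refine idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the records are in-memory numeric coordinate tuples and that the query is a range predicate of the form "Euclidean distance to the query point ≤ r"; the function names are hypothetical. The sketch relies on the fact that the Chebyshev distance never exceeds the Euclidean distance, max_i |x_i − y_i| ≤ sqrt(Σ_i (x_i − y_i)²), so a record whose Chebyshev distance already exceeds r can be discarded without computing its Euclidean distance.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def chebyshev(p: Sequence[float], q: Sequence[float]) -> float:
    """Lightweight function: L-infinity distance (only subtraction, abs and max)."""
    return max(abs(a - b) for a, b in zip(p, q))


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Original similarity function: L2 distance (squares, a sum and a square root)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def range_query_naive(records: Sequence[Point], query: Point, radius: float) -> List[Point]:
    """Baseline: evaluate the Euclidean predicate on every record."""
    return [r for r in records if euclidean(r, query) <= radius]


def range_query_filtered(
    records: Sequence[Point], query: Point, radius: float
) -> Tuple[List[Point], int]:
    """Filter with Chebyshev, then refine the survivors with Euclidean.

    Because chebyshev(r, q) <= euclidean(r, q), a record rejected by the
    filter would also fail the Euclidean predicate, so no true answer is lost.
    Returns the answers and the number of Euclidean evaluations performed.
    """
    # Filter step: cheap test that may keep some false positives.
    candidates = [r for r in records if chebyshev(r, query) <= radius]
    # Refine step: exact Euclidean test, applied only to the candidates.
    answers = [r for r in candidates if euclidean(r, query) <= radius]
    return answers, len(candidates)


if __name__ == "__main__":
    # Hypothetical 2-D coordinates; the query asks for points within 3.0 of the origin.
    points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (10.0, 0.0), (0.5, 2.5), (2.5, 2.5)]
    origin = (0.0, 0.0)
    answers, n_exact = range_query_filtered(points, origin, 3.0)
    assert answers == range_query_naive(points, origin, 3.0)
    print(answers)               # [(0.0, 0.0), (1.0, 1.0), (0.5, 2.5)]
    print(n_exact, len(points))  # 4 6
```

In this example, (2.5, 2.5) passes the Chebyshev filter (2.5 ≤ 3.0) but is rejected by the Euclidean check (about 3.54), which illustrates the lower filtering accuracy of the lightweight function; in exchange, only 4 of the 6 records need the more expensive Euclidean computation, and the answer set is identical to the naive scan.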
Keywords
Similarity; Lightweight Similarity; Similarity Query; Database
Citations & Related Records
Times cited by KSCI: 2