k-NN Query Optimization Scheme Based on Machine Learning Using a DNN Model

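  • 위지원 (Ph.D. student, School of Information and Communication Engineering, Chungbuk National University)
  • 최도진 (School of Information and Communication Engineering, Chungbuk National University)
  • 이현병 (Ph.D. student, School of Information and Communication Engineering, Chungbuk National University)
  • 임종태 (Assistant Professor, School of Information and Communication Engineering, Chungbuk National University)
  • 임헌진 (Ph.D., School of Information and Communication Engineering, Chungbuk National University)
  • 복경수 (Assistant Professor, Department of SW Convergence, Wonkwang University)
  • 유재수 (Professor, School of Information and Communication Engineering, Chungbuk National University)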
  • Received : 2020.07.30
  • Accepted : 2020.08.31
  • Published : 2020.10.28

Abstract

In this paper, we propose an optimization scheme for k-nearest neighbor (k-NN) queries, which find the k objects closest to a query point among high-dimensional feature vectors. A k-NN query is converted into a range query and processed over a range that is likely to contain k objects. The proposed scheme uses a DNN model to derive an optimal range that reduces processing cost and speeds up the search. The overall system consists of an online module and an offline module. The online module processes a query when a client issues it. The offline module derives an optimal range for the query using the DNN model and delivers it to the online module. Various performance evaluations show that the proposed scheme outperforms existing schemes.
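The conversion of a k-NN query into a range query can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the linear scan standing in for an index, and the radius-doubling fallback are all assumptions made for illustration. The point it shows is why the initial range matters: a range that is too small forces repeated range queries, while a well-chosen one answers the k-NN query in a single pass.

```python
import math


def range_query(data, query, radius):
    """Return (distance, index) pairs for every point within `radius` of `query`.

    A linear scan is used here for simplicity; a real system would use an index.
    """
    hits = []
    for i, point in enumerate(data):
        d = math.dist(point, query)
        if d <= radius:
            hits.append((d, i))
    return hits


def knn_via_range(data, query, k, initial_radius, growth=2.0):
    """Answer a k-NN query by issuing range queries with a growing radius.

    Returns the k nearest (distance, index) pairs and the number of range
    queries issued. The growth policy is an assumption, not from the paper.
    """
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the dataset size")
    if initial_radius <= 0 or growth <= 1:
        raise ValueError("initial_radius must be > 0 and growth must be > 1")
    radius = initial_radius
    rounds = 0
    while True:
        rounds += 1
        hits = range_query(data, query, radius)
        # Once the range holds at least k points, the true k nearest
        # neighbors are guaranteed to be among the hits.
        if len(hits) >= k:
            return sorted(hits)[:k], rounds
        radius *= growth
```

Calling `knn_via_range(data, query, 5, initial_radius)` with a radius equal to the distance of the 5th nearest neighbor returns after one range query; a smaller radius costs additional rounds, and a much larger one makes each range query examine more candidates.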

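This paper proposes a k-NN query optimization method that finds the k data items closest to a query among high-dimensional feature vectors. A k-NN query is processed by converting it into a range query based on a range likely to contain k data items. To derive an optimal range that reduces processing cost and speeds up the search, we propose an optimization scheme that uses a DNN model during k-NN query processing. The proposed scheme consists of an online module and an offline module. The online module receives requests from clients and processes the actual queries. The offline module derives the optimal range with a DNN model that uses the results of past optimization as its training log, and delivers the range to the online module. Various performance evaluations are carried out to demonstrate the superiority and validity of the proposed scheme.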
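The offline module's role, learning a range from logged results, can be sketched as follows. Everything concrete here is an assumption for illustration: the abstract does not specify the DNN architecture, the input features, or how the optimal range is defined. This sketch treats the distance to the k-th nearest neighbor as the target range, uses the query vector plus a normalized k as features, and substitutes a tiny one-hidden-layer network (`TinyMLP`, a hypothetical name) trained by plain gradient descent for the paper's DNN.

```python
import math
import random


def kth_neighbor_distance(data, query, k):
    """Smallest radius whose range query contains k points (brute force)."""
    return sorted(math.dist(p, query) for p in data)[k - 1]


def build_log(data, num_queries, ks, dim, rng):
    """Hypothetical training log: features = query vector + [scaled k],
    label = distance to the k-th nearest neighbor (assumed 'optimal range')."""
    log = []
    for _ in range(num_queries):
        q = [rng.random() for _ in range(dim)]
        k = rng.choice(ks)
        log.append((q + [k / max(ks)], kth_neighbor_distance(data, q, k)))
    return log


class TinyMLP:
    """One hidden ReLU layer, full-batch gradient descent on mean squared error."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def loss(self, log):
        return sum((self.predict(x) - y) ** 2 for x, y in log) / len(log)

    def train(self, log, epochs=300, lr=0.02):
        n = len(log)
        for _ in range(epochs):
            gw1 = [[0.0] * len(row) for row in self.w1]
            gb1 = [0.0] * len(self.b1)
            gw2 = [0.0] * len(self.w2)
            gb2 = 0.0
            for x, y in log:
                h = self._hidden(x)
                pred = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
                err = 2.0 * (pred - y) / n  # d(loss)/d(pred) for this sample
                gb2 += err
                for j, hj in enumerate(h):
                    gw2[j] += err * hj
                    if hj > 0.0:  # ReLU passes gradient only where active
                        gb1[j] += err * self.w2[j]
                        for i, xi in enumerate(x):
                            gw1[j][i] += err * self.w2[j] * xi
            # Apply all updates after the gradients are fully accumulated.
            self.b2 -= lr * gb2
            for j in range(len(self.w2)):
                self.w2[j] -= lr * gw2[j]
                self.b1[j] -= lr * gb1[j]
                for i in range(len(gw1[j])):
                    self.w1[j][i] -= lr * gw1[j][i]
```

In this arrangement, an online component would call `predict` on an incoming query's features and use the result as the initial range of its range query, falling back to a larger range if fewer than k objects are returned.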
