A Batch Processing Algorithm for Moving k-Nearest Neighbor Queries in Dynamic Spatial Networks

Cho, Hyung-Ju;

doi:10.9708/jksci.2021.26.04.063

Journal of the Korea Society of Computer and Information (한국컴퓨터정보학회논문지)

Vol. 26, No. 4 / pp. 63-74 / 2021 / 1598-849X (pISSN) / 2383-9945 (eISSN)

Korean Society of Computer Information (한국컴퓨터정보학회)


Cho, Hyung-Ju (Dept. of Software, Kyungpook National University)

Received: 2021.02.18
Reviewed: 2021.03.31
Published: 2021.04.30

https://doi.org/10.9708/jksci.2021.26.04.063

Abstract

Location-based services (LBSs) are expected to process a large number of spatial queries, such as shortest path and k-nearest neighbor queries that arrive simultaneously at peak periods. Deploying more LBS servers to process these simultaneous spatial queries is a potential solution. However, this significantly increases service operating costs. Recently, batch processing solutions have been proposed to process a set of queries using shareable computation. In this study, we investigate the problem of batch processing moving k-nearest neighbor (MkNN) queries in dynamic spatial networks, where the travel time of each road segment changes frequently based on the traffic conditions. LBS servers based on one-query-at-a-time processing often fail to process simultaneous MkNN queries because of the significant number of redundant computations. We aim to improve the efficiency algorithmically by processing MkNN queries in batches and reusing sharable computations. Extensive evaluation using real-world roadmaps shows the superiority of our solution compared with state-of-the-art methods.


Ⅰ. Introduction

Currently, location-based services (LBSs), such as taxi-calling and ridesharing services, utilize real-time spatial data to find the k points of interest (POIs) closest to a query point based on the length of the shortest path from the query point to each POI. For example, a taxi client wishes to be served by available taxicabs that can reach them quickly. LBS servers based on one-query-at-a-time processing often fail to process the large number of simultaneous spatial queries that reach the servers at peak times. Hence, batch processing algorithms have been introduced to address this critical problem in LBSs [1,2].

Here, we investigate the batch processing of moving k-nearest neighbor (MkNN) queries in dynamic spatial networks, where the travel time for each road segment changes frequently based on the traffic conditions such as the traffic volume and accidents. MkNN queries in a dynamic spatial network have many potential applications for LBSs, such as ride-hailing and car parks. For example, 14 million Uber trips for ridesharing were completed each day in 2019, demonstrating the significance of scalable and efficient solutions to promptly match Uber cabs with passengers. Another example is real-time parking management, which helps drivers find parking spaces nearest to them. It is often difficult for drivers to find available parking spaces when they reach their destinations.

Figure 1 shows two snapshots at timestamps t_i and t_j of a dynamic spatial network, where a set Q of moving query points and a set P of moving data points are expressed as Q={q₁,q₂} and P={p₁,p₂,p₃}, respectively. Note that, for convenience of presentation, the two road segments \(\overline{v_{2} v_{3}}\) and \(\overline{v_{3} v_{4}}\) are drawn with double solid lines in Figure 1(b) to indicate changes in their travel times. In Figure 1(a), data point p₁ is closest to both q₁ and q₂ at timestamp t_i. However, in Figure 1(b), data point p₁ (p₂) is closest to q₂ (q₁) at timestamp t_j. A simple solution for MkNN queries uses a one-query-at-a-time method, which sequentially computes the k data points closest to each query point in Q. This solution introduces a prohibitive overhead because of redundant network traversal for adjacent query points, despite utilizing efficient kNN search algorithms [3, 4, 5, 6] for retrieving the set of k data points closest to the query point.


Fig. 1. Example of MkNN queries in a dynamic spatial network

All nearest neighbor (ANN) queries [7] are similar to MkNN queries. However, ANN queries retrieve only the one data point closest to each query point q in Q, i.e., k=1 for each q∈Q. In contrast, MkNN queries may retrieve a different number k of data points for each query point q. Furthermore, we consider a highly dynamic situation where both the query and data points move freely in dynamic spatial networks. Herein, we propose an efficient algorithm known as BANK for the batch processing of MkNN queries in dynamic spatial networks. The BANK algorithm first groups adjacent query points into a query group and then performs batch computation for the query group to avoid redundant network traversal. To the best of our knowledge, the batch computation approach has not been applied to MkNN queries in dynamic spatial networks, although the batch computation of spatial queries has received significant attention.

The primary contributions of this study are listed as follows:

● We propose an efficient algorithm called BANK for the batch processing of MkNN queries in dynamic spatial networks. To the best of our knowledge, the BANK algorithm is the first to consider the batch processing of MkNN queries in dynamic spatial networks.

● We present group computation techniques to avoid the redundant computation of network distances for adjacent query points. Furthermore, we present a theoretical analysis to prove the advantage of the BANK algorithm over one-query-at-a-time methods.

● We conduct extensive experiments using real-world roadmaps to demonstrate the efficiency of the proposed solution.

The remainder of this paper is organized as follows. In Section II, we review related studies and introduce the background of the study. In Section III, we explain the method for clustering adjacent query points into a query group and present the BANK algorithm for the batch processing of MkNN queries in dynamic spatial networks. In Section IV, we compare the BANK algorithm with conventional solutions under different setups. Conclusions are presented in Section V.

Ⅱ. Preliminaries

1. Related works

Nearest neighbor (NN) queries have been investigated extensively in spatial networks. NN query processing for spatial networks involves a high cost for computing the length of the shortest path between two points, in which graph traversal may be required. Studies regarding NN queries in spatial networks have presented various techniques to reduce the shortest-path-distance computation. Papadias et al. [4] introduced the incremental Euclidean restriction (IER) and incremental network expansion (INE). IER is based on the assumption that the length of the shortest path between two points cannot be less than their Euclidean distance. INE involves network expansion from the query point in a manner similar to Dijkstra’s algorithm and examines the data points in the sequence encountered. The distance browsing (DisBrw) algorithm [8] uses the spatially induced linkage cognizance index, which stores the shortest path distance between every pair of vertices. The route overlay and association directory (ROAD) [3] algorithm hierarchically partitions the spatial network and precomputes the shortest path distance between border vertices within each partition, where border vertices of a partition are the vertices connecting to other partitions. The G-tree [6] partitions the spatial network; however, it differs from the ROAD in terms of the tree structure and searching paradigm. The V-tree [5] employs a hierarchical structure similar to that of the G-tree; it identifies border nodes at the boundaries of subgraphs. Efficient techniques are used to answer kNN queries by maintaining the lists of data points closest to the border nodes. Abeywickrama et al. [9] performed a thorough experimental evaluation of several kNN search algorithms for spatial networks, including G-tree [6], IER [4], INE [4], DisBrw [8], and ROAD [3]. Cao et al. [10] proposed a scalable in-memory processing method to answer snapshot kNN queries over moving objects in a spatial network. 
However, the existing solutions in [9, 11, 12, 13] focus on improving the efficiency of a single kNN query and are therefore referred to as one-query-at-a-time solutions for kNN queries.
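The INE idea described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper: it assumes, for brevity, that the query point and all data points sit exactly on vertices, and the names `ine_knn`, `GRAPH`, and `DATA_AT` are hypothetical.

```python
import heapq

def ine_knn(graph, data_at, source, k):
    """Dijkstra-style expansion from `source`; returns up to k (distance, point) pairs.

    graph   : dict vertex -> list of (neighbour, weight) pairs (assumed layout)
    data_at : dict vertex -> list of data points located at that vertex (assumed)
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    result = []
    while heap and len(result) < k:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue  # stale heap entry
        settled.add(v)
        # Vertices are settled in non-decreasing distance, so data points
        # are reported in the order in which they are encountered.
        for p in data_at.get(v, []):
            result.append((d, p))
            if len(result) == k:
                break
        for u, w in graph.get(v, []):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return result

# Hypothetical four-vertex network with three data points.
GRAPH = {
    "v1": [("v2", 2.0), ("v3", 5.0)],
    "v2": [("v1", 2.0), ("v3", 1.0)],
    "v3": [("v1", 5.0), ("v2", 1.0), ("v4", 4.0)],
    "v4": [("v3", 4.0)],
}
DATA_AT = {"v1": ["p3"], "v3": ["p1"], "v4": ["p2"]}
print(ine_knn(GRAPH, DATA_AT, "v2", 2))
```

Because every query point triggers its own expansion, nearby query points repeat almost the same traversal, which is the redundancy that batch processing targets.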

ANN queries were investigated in [7]. Unlike MkNN queries, ANN queries stipulate that every query point q in Q retrieves only the one data point closest to q, which means k=1. Most studies regarding ANN queries have been conducted in Euclidean spaces. Several previous studies have solved the continuous kNN query problem in spatial networks [14, 15, 16]. Some models [14] assumed moving query points and stationary data points, whereas the models used in [15,16] assumed the opposite. These studies are orthogonal to ours and focus on the efficient maintenance of kNN results. The current study considers multiple snapshot kNN queries, as in Uber taxi services, where query and data points correspond to passengers and taxicabs, respectively, and both move freely along a dynamic spatial network.

2. Background

Definition 1. (kNN query) For a positive integer k, query point q, and set of data points P, the kNN query retrieves a set P_k(q) of k data points in P that are closest to q, such that \(dist(q, p^{+}) \leq dist(q, p^{-})\) for \(p^{+} \in P_{k}(q)\) and \(p^{-} \in P - P_{k}(q)\).

Definition 2. (MkNN query) For a set of query points Q, the MkNN query retrieves the set P_k(q) of k data points closest to each query point q in Q. When query point q_i (q_j) retrieves the k_i (k_j) data points closest to q_i (q_j), the value of k_i may differ from that of k_j for i≠j and 1 ≤ i,j ≤ |Q|. For simplicity, we assume that each query point q requires the same number k of closest data points. However, it is not difficult to handle a different value of k for each query point q, as discussed in Section III.2.

Definition 3. (Spatial network) A dynamic spatial network can be described as a dynamic weighted graph G=〈V,E,W〉, where V, E, and W indicate the vertex set, edge set, and edge distance matrix, respectively. Each edge \(\overline{v_{i} v_{j}}\) has a non-negative weight representing the network distance, such as the travel time, and this weight changes frequently.

Definition 4. (Intersection, intermediate, and terminal vertices) We categorize vertices into three categories based on their degree. (1) If the degree of a vertex is greater than or equal to three, then the vertex is an intersection vertex. (2) If the degree is two, then it is an intermediate vertex. (3) If the degree is one, then it is a terminal vertex.
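Definition 4 amounts to a degree test on each vertex. A small Python sketch follows; the adjacency-list layout and the names `classify_vertices` and `ADJACENCY` are assumptions made for illustration, and the "isolated" label for degree zero is not part of the definition.

```python
def classify_vertices(adjacency):
    """Label each vertex by its degree, following Definition 4."""
    kinds = {}
    for vertex, neighbours in adjacency.items():
        degree = len(neighbours)
        if degree >= 3:
            kinds[vertex] = "intersection"
        elif degree == 2:
            kinds[vertex] = "intermediate"
        elif degree == 1:
            kinds[vertex] = "terminal"
        else:
            kinds[vertex] = "isolated"  # degree 0: outside Definition 4
    return kinds

# Hypothetical network: a star centred on "b" with one two-edge arm.
ADJACENCY = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b"],
    "d": ["b", "e"],
    "e": ["d"],
}
KINDS = classify_vertices(ADJACENCY)
print(KINDS)
```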

The symbols and notations used in this study are listed in Table 1. To simplify the presentation, we denote \(\overline{q_{i} q_{i+1} \cdots q_{j}}\) by \(\overline{q_{i} q_{j}}\), where query points q_i,q_i+1,...,q_j are located in the same vertex sequence.

Table 1. Definitions of symbols


Ⅲ. The Proposed Scheme

1. Grouping adjacent query points

In this section, we consider an MkNN query in a spatial network, as shown in Figure 2. For Q={q₁,q₂,q₃,q₄} and P={p₁,p₂,p₃,p₄,p₅}, we consider a kNN query that retrieves the data points closest to each query point q in Q. For simplicity, we assume that q₁, q₂, q₃, and q₄ request one, two, one, and two data points closest to them, respectively, which means that k₁=k₃=1 and k₂=k₄=2.


Fig. 2. Population of query and data points at timestamps t₁ and t₂

Figure 2 shows the population of the query and data points at timestamps t₁ and t₂. Here, we assume that both the query and data points move arbitrarily along the spatial network. In this section, we focus on evaluating MkNN queries at timestamp t₁ in Figure 2(a).

Figure 3 illustrates a sample grouping of adjacent query points. Two query points, q₁and q₂, in a vertex sequence, \(\overline{v_{1} v_{4} v_{5} v_{3}}\), are transformed into a query segment, \(\overline{q_{1} q_{2}}\), and the other two query points, q₃ and q₄, in a vertex sequence, \(\overline{v_{1} v_{3}}\), are grouped into another query segment, \(\overline{q_{3} q_{4}}\). Therefore, a set of query points, Q={q₁,q₂,q₃,q₄} can be transformed into a set of query groups, \(\bar{Q}=\left\{\overline{q_{1} q_{2}}, \overline{q_{3} q_{4}}\right\}\).


Fig. 3. Grouping of adjacent query points into query segments
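The grouping step can be sketched in a few lines of Python. The sketch assumes that each query point is already tagged with the vertex sequence it lies on and with its offset from the first vertex of that sequence; this representation, the offset values, and the names `group_query_points` and `LOCATIONS` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def group_query_points(locations):
    """Group query points that share a vertex sequence into one query segment.

    locations : dict query id -> (vertex sequence as a tuple, offset along it)
    returns   : dict vertex sequence -> query ids ordered by offset
    """
    by_sequence = defaultdict(list)
    for query, (sequence, offset) in locations.items():
        by_sequence[sequence].append((offset, query))
    return {
        sequence: [query for _, query in sorted(members)]
        for sequence, members in by_sequence.items()
    }

# Vertex sequences follow the Figure 3 example; the offsets are made up.
SEQ_A = ("v1", "v4", "v5", "v3")
SEQ_B = ("v1", "v3")
LOCATIONS = {
    "q2": (SEQ_A, 3.5),
    "q1": (SEQ_A, 1.0),
    "q3": (SEQ_B, 0.5),
    "q4": (SEQ_B, 2.0),
}
SEGMENTS = group_query_points(LOCATIONS)
print(SEGMENTS)
```

With this input, the four query points collapse into two query segments, so only two batch searches are needed instead of four individual ones.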

2. BANK algorithm

Algorithm 1 describes the BANK algorithm for the MkNN search over spatial networks. The result set \(\Pi(Q)\) is initialized to an empty set (line 1). In the first step (lines 2-3), adjacent query points q_i,q_i+1,...,q_j in the same vertex sequence are grouped into a query segment \(\overline{q_{i} q_{j}}\). Therefore, the set Q of query points is converted into a set \(\bar{Q}\) of query groups. The batch kNN (BkNN) search for a query segment \(\overline{q_{i} q_{j}}\) is then performed to obtain the data points closest to each query point in \(\overline{q_{i} q_{j}}\) (line 6). The result \(\Pi\left(\overline{q_{i} q_{j}}\right)\) for \(\overline{q_{i} q_{j}}\) is added to the query result \(\Pi(Q)\), where \(\Pi\left(\overline{q_{i} q_{j}}\right)=\left\{\left\langle q, P_{k}(q)\right\rangle \mid q \in \overline{q_{i} q_{j}}\right\}\) (line 7). Finally, the query result is returned after the BkNN search has been performed for all query groups in \(\bar{Q}\) (line 8).

Algorithm 1: BANK(Q,P)
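The control flow of Algorithm 1 can be sketched as follows. This is only an outline under stated assumptions: the grouping step (lines 2-3) is assumed to have already produced a mapping from each vertex sequence to its ordered query points, and the BkNN search is passed in as a callable. The names `bank`, `stub_bknn_search`, and `SEGMENTS` are hypothetical, and the stub returns empty kNN sets purely to show how often the search is invoked.

```python
def bank(segments, bknn_search):
    """Run one batch search per query segment and merge the per-query results.

    segments    : dict vertex sequence -> ordered list of query ids (assumed layout)
    bknn_search : callable(sequence, segment) -> dict query id -> kNN set
    """
    result = {}  # Pi(Q), initially empty
    for sequence, segment in segments.items():
        # One batch search answers every query point in the segment.
        result.update(bknn_search(sequence, segment))
    return result

CALLS = []

def stub_bknn_search(sequence, segment):
    """Placeholder for the BkNN search; records the call and returns empty sets."""
    CALLS.append(sequence)
    return {query: [] for query in segment}

SEGMENTS = {
    ("v1", "v4", "v5", "v3"): ["q1", "q2"],
    ("v1", "v3"): ["q3", "q4"],
}
RESULT = bank(SEGMENTS, stub_bknn_search)
print(len(CALLS), sorted(RESULT))
```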

Algorithm 2 describes the BkNN search algorithm. The BkNN search combines the grouping of query points with batch execution to avoid redundant network traversal. This algorithm comprises two steps. First, two kNN queries are issued for the query segment \(\overline{q_{i} q_{j}}\). We carefully determine the locations of these kNN queries using the number of query segments adjacent to an intersection vertex v_l (v_m) in \(\overline{v_{l} v_{m}}\), so that the results of kNN queries can be shared among the query segments adjacent to an intersection vertex. We assume that \(\overline{q_{i} q_{j}}\) belongs to \(\overline{v_{l} v_{m}}\) and that q_i (q_j) is closer to v_l (v_m) than q_j (q_i). The location of one kNN query is either q_i or v_l, and the location of the other kNN query is either q_j or v_m. If two or more query segments are adjacent to v_l, i.e., adj_segs(v_l) ≥ 2, v_l issues a kNN query with \(K_{v_{l}}=\max \left\{k_{a}, k_{a+1}, \cdots, k_{b}\right\}\), assuming that q_a,q_a+1,...,q_b belong to query segments adjacent to v_l and have k_a,k_a+1,...,k_b, respectively; otherwise, q_i issues a kNN query with \(K_{q_{i}}=\max \left\{k_{i}, k_{i+1}, \cdots, k_{j}\right\}\), assuming that q_i,q_i+1,...,q_j constitute \(\overline{q_{i} q_{j}}\) and have k_i,k_i+1,...,k_j, respectively. Similarly, if two or more query segments are adjacent to v_m, i.e., adj_segs(v_m) ≥ 2, v_m issues another kNN query with \(K_{v_{m}}=\max \left\{k_{c}, k_{c+1}, \cdots, k_{d}\right\}\), assuming that q_c,q_c+1,...,q_d belong to query segments adjacent to v_m and have k_c,k_c+1,...,k_d, respectively; otherwise, q_j issues another kNN query with \(K_{q_{j}}=\max \left\{k_{i}, k_{i+1}, \cdots, k_{j}\right\}\). Therefore, we have the following four cases depending on the locations of the two kNN queries evaluated for a query segment \(\overline{q_{i} q_{j}}\): 〈v_l,v_m〉, 〈v_l,q_j〉, 〈q_i,v_m〉, and 〈q_i,q_j〉. Specifically, the first case 〈v_l,v_m〉 is described in lines 4-9 of the algorithm. The second case 〈v_l,q_j〉 is described in lines 10-15. The third case 〈q_i,v_m〉 is described in lines 16-21, and the fourth case 〈q_i,q_j〉 is described in lines 22-27. After these two kNN queries are evaluated for a query segment \(\overline{q_{i} q_{j}}\), their results are included in a set P_can of candidate data points for the query points in \(\overline{q_{i} q_{j}}\). Next, the kNN set P_k(q) for each query point q in \(\overline{q_{i} q_{j}}\) is retrieved from the candidate data points in P_can. Subsequently, the pair of query point q and its kNN set P_k(q) is added to the result, as follows: \(\Pi\left(\overline{q_{i} q_{j}}\right){\leftarrow} \Pi\left(\overline{q_{i} q_{j}}\right) \cup\left\{\left\langle q, P_{k}(q)\right\rangle\right\}\).

Algorithm 2: BkNN_search\(\left(\overline{q_{i} q_{j}}, P\right)\)
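The choice of the two kNN query locations and their K values, which yields the four cases above, can be sketched as follows. The sketch reads the sharing condition as "two or more adjacent query segments" (adj_segs ≥ 2). The data layout and the names `plan_knn_queries`, `K_OF`, and `ADJ_SEGS` are hypothetical, and the example network is invented rather than taken from the figures.

```python
def plan_knn_queries(segment, v_l, v_m, k_of, adj_segs):
    """Pick the two kNN query locations and K values for one query segment.

    segment  : query ids ordered from the v_l side to the v_m side
    k_of     : dict query id -> requested k
    adj_segs : dict vertex -> list of query segments adjacent to that vertex
    """
    def plan_end(vertex, end_query):
        neighbours = adj_segs.get(vertex, [])
        if len(neighbours) >= 2:
            # Shared vertex: one query at the vertex serves all adjacent segments,
            # so K must cover the largest k among their query points.
            return vertex, max(k_of[q] for seg in neighbours for q in seg)
        # Otherwise query from the outermost query point of this segment.
        return end_query, max(k_of[q] for q in segment)

    return plan_end(v_l, segment[0]), plan_end(v_m, segment[-1])

# Hypothetical layout: segments A and B both touch vertex "u"; only A touches "w".
SEG_A, SEG_B = ["qa", "qb"], ["qc"]
K_OF = {"qa": 1, "qb": 3, "qc": 5}
ADJ_SEGS = {"u": [SEG_A, SEG_B], "w": [SEG_A]}
PLAN_A = plan_knn_queries(SEG_A, "u", "w", K_OF, ADJ_SEGS)
print(PLAN_A)
```

In this example, segment A falls into the 〈v_l,q_j〉 case: the query at the shared vertex uses the largest k of both adjacent segments, while the query at the other end only needs the largest k within A.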

Algorithm 3 describes the kNN search for finding the k data points closest to query point q in \(\overline{q_{i} q_{j}}\) among the candidate data points in \(P_{K_{\alpha}}(\alpha) \cup P_{K_{\beta}}(\beta) \cup P(\overline{\alpha \beta})\). First, the set P_k(q) of kNNs of query point q is initialized to an empty set. The distance from q to each candidate data point p is computed based on the condition of p (lines 3-10). After computing dist(q,p), we determine whether p is added to the candidate kNN set P_k(q). If |P_k(q)| < k, then p is added to P_k(q) (lines 12-13). If |P_k(q)| = k and dist(q,p) < dist(q,p_kth), then p is added to P_k(q) and p_kth is removed from P_k(q), where p_kth denotes the kth data point closest to q, i.e., \(P_{k}(q) \leftarrow P_{k}(q) \cup\{p\}-\left\{p_{k t h}\right\}\) (lines 14-15). The kNN set P_k(q) for q is returned after all the candidate data points in \(P_{K_{\alpha}}(\alpha) \cup P_{K_{\beta}}(\beta) \cup P(\overline{\alpha \beta})\) have been examined (line 16).
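The selection rule in Algorithm 3 (add p while fewer than k points are held, otherwise replace the current kth point when p is closer) maps naturally onto a bounded max-heap. The Python sketch below assumes that dist(q,p) has already been computed for every candidate; how those distances are derived in lines 3-10 is not reproduced here. The names `knn_from_candidates` and `CANDIDATES` are hypothetical.

```python
import heapq

def knn_from_candidates(k, candidate_dists):
    """Keep the k candidates with the smallest distance to the query point.

    candidate_dists : dict data point -> precomputed dist(q, p) (assumed input)
    returns         : list of (distance, data point), nearest first
    """
    if k <= 0:
        return []
    heap = []  # max-heap emulated with negated distances; heap[0] is p_kth
    for point, distance in candidate_dists.items():
        if len(heap) < k:
            heapq.heappush(heap, (-distance, point))
        elif distance < -heap[0][0]:
            # p is closer than the current kth point: swap it in.
            heapq.heapreplace(heap, (-distance, point))
    return sorted((-neg, point) for neg, point in heap)

CANDIDATES = {"p1": 4.0, "p2": 1.5, "p3": 3.0, "p4": 2.0}
print(knn_from_candidates(2, CANDIDATES))
```

Each candidate costs O(log k) in the worst case, so filtering a candidate set is cheap compared with the network expansion needed to produce it.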

Furthermore, we present a theoretical analysis to show the advantage of the BANK algorithm over sequential processing. The time complexity of the BANK algorithm is \(O(|\bar{Q}| \cdot(|E|+|V| \log |V|))\), where \(|\bar{Q}|\) is the number of query segments and \(O(|E|+|V| \log |V|)\) is the time complexity for evaluating a single kNN query. Conversely, the time complexity of the simple solution based on sequential processing is \(O(|Q| \cdot(|E|+|V| \log |V|))\), where \(|Q|\) is the number of query points. This analysis shows that the BANK algorithm is superior to the simple solution when the query points exhibit a skewed distribution, because \(|\bar{Q}| \ll |Q|\) holds for highly skewed query points.
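As a rough reading of this bound (an illustration, not part of the original analysis), let \(c=|E|+|V| \log |V|\) denote the cost of one kNN query, assume that BANK issues at most two kNN queries per query segment as reported in Section IV.2, and ignore the cost of grouping and candidate filtering:

\[
\frac{T_{\mathrm{BANK}}}{T_{\mathrm{seq}}} \approx \frac{2|\bar{Q}| \cdot c}{|Q| \cdot c}=\frac{2|\bar{Q}|}{|Q|}
\]

For a hypothetical workload of \(|Q|=10{,}000\) query points grouped into \(|\bar{Q}|=500\) query segments, the ratio is 0.1, i.e., roughly a tenfold reduction in kNN evaluations. As the query points spread out and \(|\bar{Q}|\) approaches \(|Q|\), this advantage shrinks.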

Algorithm 3: kNN_search\(\left(k, q, P_{K_{\alpha}}(\alpha) \cup P_{K_{\beta}}(\beta) \cup P(\overline{\alpha \beta})\right)\)

Ⅳ. Performance study

In this section, we present the results of an empirical analysis of the BANK algorithm. We describe the experimental settings in Section IV.1 and present the experimental results in Section IV.2.

1. Experimental settings

For the performance study, we used the three real-world road networks [17,18] presented in Table 2. These real-world road networks have different sizes and are part of the road network of the US. Table 3 shows the range of each variable used in the experiments with defaults written in bold. For convenience, each dimension of the data universe was normalized independently to a unit length [0,1]. The query and data points exhibited either a centroid or uniform distribution. A centroid-based dataset was generated to resemble real-world data. First, five centroids were randomly selected for the query and data points. The points around each centroid exhibited a normal distribution, where the mean was set to the centroid, and the standard deviation was set to 1% of the side length of the data universe. Unless otherwise stated, the query points exhibited a centroid distribution, whereas the data points exhibited a uniform distribution.
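The centroid-based generation procedure can be sketched in Python as follows. Several details are assumptions rather than facts from the paper: centroids are chosen with equal probability, coordinates are clamped to the unit square, and the mapping of generated points onto road segments is omitted. The names `centroid_points` and `uniform_points` are hypothetical.

```python
import random

def centroid_points(n, num_centroids=5, sigma=0.01, seed=0):
    """Draw n points around randomly chosen centroids in the unit square.

    sigma is the standard deviation, 1% of the unit side length by default.
    """
    rng = random.Random(seed)
    centroids = [(rng.random(), rng.random()) for _ in range(num_centroids)]
    points = []
    for _ in range(n):
        cx, cy = rng.choice(centroids)  # assumption: centroids are equally likely
        # Clamp to [0, 1] so that points stay inside the normalized data universe.
        x = min(1.0, max(0.0, rng.gauss(cx, sigma)))
        y = min(1.0, max(0.0, rng.gauss(cy, sigma)))
        points.append((x, y))
    return centroids, points

def uniform_points(n, seed=0):
    """Draw n points uniformly at random in the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]

# Default setup of the experiments: centroid query points, uniform data points.
CENTROIDS, QUERY_POINTS = centroid_points(1000)
DATA_POINTS = uniform_points(1000, seed=1)
print(len(CENTROIDS), len(QUERY_POINTS), len(DATA_POINTS))
```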

Table 2. Real-world roadmaps


Table 3. Experimental parameter settings


As a benchmark for the evaluation of the BANK algorithm, we used INE [4] as a one-query-at-a-time solution, which sequentially computes the kNN set for each query point in Q. We implemented and evaluated two versions of the BANK algorithm: BANK_OPT and BANK_GRP. BANK_OPT was implemented using the proposed algorithms. In contrast, BANK_GRP grouped the query points in a road segment into a query segment and generated two kNN queries at the endpoints of the query segment. The BANK algorithm processed the query points in a batch, whereas INE processed them sequentially. In this study, we assumed that the query and data points can move freely within dynamic spatial networks. Therefore, it was not feasible to use precomputation techniques because the movements of the query and data points might frequently invalidate the precomputed distances in dynamic spatial networks. The methods were implemented in C++ in the Microsoft Visual Studio 2019 development environment and used common subroutines for similar tasks. We performed the experiments on a desktop computer running the Windows 10 operating system with 32 GB of RAM and a 3.1 GHz processor (i9-9900).

2. Experimental results

Figure 4 shows a comparison of query processing times using BANK_OPT, BANK_GRP, and INE when evaluating MkNN queries in the SF roadmap. Each chart illustrates the effects of varying one of the parameters in Table 3. The two values in parentheses in Figures 4 and 5 show the query processing times of BANK_OPT and INE. Figure 4(a) shows the query processing times using BANK_OPT, BANK_GRP, and INE when the number |Q| of query points varied between 1000 and 10000, i.e., 1000 ≤ |Q| ≤ 10000. BANK_OPT outperformed INE owing to the batch processing of query points as |Q| increased. BANK_OPT evaluated 47%, 25%, 26%, 19%, and 17% of the number of kNN queries evaluated by INE when |Q| = 1000, 3000, 5000, 7000, and 10000, respectively.


Fig. 4. Comparison of BANK_OPT, BANK_GRP, and INE for SF

Figure 4(b) shows the query processing times using the three algorithms when the number |P| of data points varied between 1000 and 10000, i.e., 1000 ≤ |P| ≤ 10000. The query processing times using BANK_OPT were up to 4.8 times shorter than those using INE. As |P| decreased, the difference in the query processing times between BANK_OPT and INE increased. BANK_OPT evaluated 17% of the number of kNN queries evaluated by INE regardless of |P|.

Figure 4(c) shows the query processing times using the three algorithms when the number k of data points requested by the query points varied between 1 and 128, i.e., 1 ≤ k ≤ 128. The query processing times increased with the k value. The query processing times using BANK_OPT were up to 2.7 times shorter than those using INE. Figure 4(d) shows the query processing times for various distributions of the query and data points, where each ordered pair (i.e., 〈C,C〉, 〈C,U〉, 〈U,C〉, and 〈U,U〉) denotes a combination of the distributions of the query and data points, with C and U denoting the centroid and uniform distributions, respectively. BANK_OPT outperformed INE for the 〈C,C〉 and 〈C,U〉 distributions, in which the query points exhibited a centroid distribution. However, BANK_OPT and INE demonstrated similar performances for the 〈U,C〉 and 〈U,U〉 distributions, in which the query points exhibited a uniform distribution.

Figure 5 shows a comparison of query processing times using BANK_OPT, BANK_GRP, and INE when evaluating MkNN queries in the COL roadmap. Figure 5(a) shows the query processing time as a function of |Q|. The query processing times using BANK_OPT were up to 4.9 times shorter than those using INE. INE evaluated |Q| kNN queries to answer MkNN queries, whereas BANK_OPT evaluated at most \(2 \times|\bar{Q}|\) kNN queries because of the batch processing. Figure 5(b) shows the query processing time as a function of |P|. BANK_OPT was superior to INE in all cases. This is because BANK_OPT utilizes batch processing of adjacent query points and requests fewer kNN queries than INE. Figure 5(c) shows the query processing time as a function of k. The query processing times using BANK_OPT were up to 5.1 times shorter than those using INE. Figure 5(d) shows the query processing times for various distributions of the query and data points. For a centroid distribution of query points, i.e., 〈C,C〉 and 〈C,U〉, the query processing times using BANK_OPT were up to 9.9 times shorter than those using INE. However, for a uniform distribution of query points, i.e., 〈U,C〉 and 〈U,U〉, BANK_OPT and INE demonstrated similar performances because the query points were widely scattered.


Fig. 5. Comparison of BANK_OPT, BANK_GRP, and INE for COL

Subsequently, we analyzed the scalability of BANK_OPT and BANK_GRP by varying the number |Q| of query points when the data points exhibited uniform and centroid distributions. We did not include the query processing times of INE in the scalability test because INE performed poorly as |Q| increased. Figure 6 shows the query processing times using BANK_OPT and BANK_GRP for \(10^{3} \leq|Q| \leq 10^{5}\). As shown in Figures 6(a) and 6(c), BANK_OPT outperformed BANK_GRP for the COL and FLA roadmaps, respectively, using the 〈C,C〉 distributions. Similarly, as shown in Figures 6(b) and 6(d), BANK_OPT outperformed BANK_GRP for the COL and FLA roadmaps, respectively, using the 〈C,U〉 distributions. In both cases, the difference in query processing times between BANK_OPT and BANK_GRP increased with |Q|. The empirical results indicate that BANK_OPT scaled with |Q| better than BANK_GRP.


Fig. 6. Scalability test for \(10^{3} \leq|Q| \leq 10^{5}\)

Ⅴ. Conclusions

In this study, we proposed a batch processing algorithm, known as BANK, to efficiently process MkNN queries in dynamic spatial networks. The BANK algorithm is the first attempt at batch processing of MkNN queries in dynamic spatial networks and aims to minimize the number of kNN queries requested for highly skewed query points. Our extensive evaluation using real-world roadmaps confirmed that the BANK algorithm clearly outperformed INE, which is based on one-query-at-a-time processing, and scaled well with the number of query points, particularly when the query points exhibited a non-uniform distribution. Notably, the BANK algorithm was up to 9.9 times faster than INE. However, BANK and INE showed similar performances when the query points exhibited a uniform distribution. For future studies, we plan to extend the batch processing approach used in this study to the processing of more sophisticated spatial queries for large dynamic spatial networks.

ACKNOWLEDGEMENT

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2020R1I1A3052713).

References

[1] T. Kim, H.-J. Cho, H. J. Hong, H. Nam, H. Cho, G. Y. Do, and P. Jeon, "Efficient processing of k-farthest neighbor queries for road networks," Journal of The Korea Society of Computer and Information, vol. 24, no. 10, pp. 79-89, 2019.
[2] F. M. Choudhury, J. S. Culpepper, Z. Bao, and T. Sellis, "Batch processing of top-k spatial-textual queries," ACM Transactions on Spatial Algorithms and Systems, vol. 3, no. 4, article ID 13, 2018.
[3] K. C. K. Lee, W.-C. Lee, B. Zheng, and Y. Tian, "ROAD: a new spatial object search framework for road networks," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 3, pp. 547-560, 2012. https://doi.org/10.1109/TKDE.2010.243
[4] D. Papadias, J. Zhang, N. Mamoulis, and Y. Tao, "Query processing in spatial network databases," In Proc. of International Conference on Very Large Data Bases, pp. 802-813, 2003.
[5] B. Shen, Y. Zhao, G. Li, W. Zheng, Y. Qin, B. Yuan, and Y. Rao, "V-tree: efficient knn search on moving objects with road-network constraints," In Proc. of International Conference on Data Engineering, pp. 609-620, 2017.
[6] R. Zhong, G. Li, K.-L. Tan, L. Zhou, and Z. Gong, "G-tree: an efficient and scalable index for spatial search on road networks," IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 8, pp. 2175-2189, 2015. https://doi.org/10.1109/TKDE.2015.2399306
[7] Y. Xu, J. Qi, R. Borovica-Gajic, and L. Kulik, "Finding all nearest neighbors with a single graph traversal," In Proc. of International Conference on Database Systems for Advanced Applications, pp. 221-238, 2018.
[8] H. Samet, J. Sankaranarayanan, and H. Alborzi, "Scalable network distance browsing in spatial databases," In Proc. of International Conference on Mobile Data Management, pp. 43-54, 2008.
[9] T. Abeywickrama and M. A. Cheema, "Efficient landmark-based candidate generation for knn queries on road networks," In Proc. of International Conference on Database Systems for Advanced Applications, pp. 425-440, 2017.
[10] B. Cao, C. Hou, S. Li, J. Fan, J. Yin, B. Zheng, and J. Bao, "SIMkNN: a scalable method for in-memory knn search over moving objects in road networks," IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 10, pp. 1957-1970, 2018. https://doi.org/10.1109/tkde.2018.2808971
[11] T. Dong, Y. Lulu, Y. Shang, Y. Ye, and L. Zhang, "Direction-aware continuous moving k-nearest-neighbor query in road networks," ISPRS International Journal of Geo-Information, vol. 8, no. 9, article ID 379, 2019.
[12] S. Luo, B. Kao, G. Li, J. Hu, R. Cheng, and Y. Zheng, "TOAIN: a throughput optimizing adaptive index for answering dynamic knn queries on road networks," PVLDB, vol. 11, no. 5, pp. 594-606, 2018.
[13] Y. Yang, H. Li, J. Wang, Q. Hu, X. Wang, and M. Leng, "A novel index method for k nearest object query over time-dependent road networks," Complexity, vol. 2019, article ID 4829164, 2019.
[14] B. Zheng, K. Zheng, X. Xiao, H. Su, H. Yin, X. Zhou, and G. Li, "Keyword-aware continuous knn query on road networks," In Proc. of International Conference on Data Engineering, pp. 871-882, 2016.
[15] U. Demiryurek, F. B. Kashani, and C. Shahabi, "Efficient continuous nearest neighbor query in spatial networks using Euclidean restriction," In Proc. of International Symposium on Advances in Spatial and Temporal Databases, pp. 25-43, 2009.
[16] K. Mouratidis, M. L. Yiu, D. Papadias, and N. Mamoulis, "Continuous nearest neighbor monitoring in road networks," In Proc. of International Conference on Very Large Data Bases, pp. 43-54, 2006.
[17] 9th DIMACS Implementation Challenge: Shortest Paths. Available online: http://www.dis.uniroma1.it/challenge9/download.shtml (accessed on 17 Feb. 2021).
[18] Real Datasets for Spatial Databases. Available online: https://www.cs.utah.edu/-lifeifei/SpatialDataset.htm (accessed on 17 Feb. 2021).
