Efficient Processing of Spatial Preference Queries in Spatial Network Databases

  • Received: 2018.12.25
  • Reviewed: 2019.01.18
  • Published: 2019.02.28

Abstract

Given a positive integer k as input, a spatial preference query finds the k best data objects based on the scores (e.g., qualities) of feature objects in their spatial neighborhoods. Several solutions have been proposed for spatial preference queries in Euclidean space. A few algorithms study spatial preference queries in undirected spatial networks where each edge is undirected and the distance between two points is the length of the shortest path connecting them. However, spatial preference queries have not been thoroughly investigated in directed spatial networks where each edge has a particular orientation that makes the distance between two points noncommutative. Therefore, in this study, we present a new method called ALPS+ for processing spatial preference queries in directed spatial networks. We conduct extensive experiments with different setups to demonstrate the superiority of ALPS+ over conventional solutions.

Keywords

1. INTRODUCTION

Recently, location-based services (LBSs) have become popular due to the rapid growth of mobile devices, availability of maps, and easy network access [1, 2]. Thus, many studies have been performed to process spatial queries, such as range queries [3], k nearest neighbor (kNN) queries [4, 5, 6], reverse k nearest neighbor queries [7], and road network distance queries [8, 9].

These spatial queries can be answered based on their distance from the query point. In this study, we investigate spatial preference queries in directed spatial networks where each edge has a particular orientation that makes the network distance noncommutative, i.e., for two points \(p_1\) and \(p_2\) in a directed graph, \(dist(p_1,\ p_2) = dist(p_2,\ p_1)\) is not guaranteed. Note that \(dist(p_1,\ p_2)\) indicates the length of the shortest path from \(p_1\) to \(p_2\), whereas \(dist(p_2,\ p_1)\) indicates the length of the shortest path from \(p_2\) to \(p_1\). A spatial preference query returns a ranked list of the k best data objects based on the scores of feature objects, such as facilities or services in the neighborhood of data objects. Spatial preference queries have a wide range of applications including spatial recommender systems and spatial decision support systems. For example, consider a real estate agent who holds a list of available apartments for lease. A customer may want to rank the available apartments with respect to the quality of their locations, quantified by aggregating nonspatial characteristics of other facilities (e.g., parks, schools, hospitals, and markets) in the spatial neighborhood of the apartment.

 


Fig. 1. Motivating example of spatial preference query in a directed spatial network.

 

Fig. 1 presents a motivating example of a spatial preference query in a directed spatial network where data objects \(d_1\), \(d_2\), and \(d_3\) are represented by triangles and indicate the available apartments for lease. Feature objects \(a_1\) and \(a_2\) are represented by hollow rectangles and indicate parks, whereas feature objects \(b_1\), \(b_2\), and \(b_3\) are represented by solid rectangles and indicate schools. The number on an edge indicates the distance between two neighboring objects, e.g., \(dist(d_1,\ n_1)=3\) and \(dist(n_1,\ b_1)=1\). The number in parentheses beside a feature object indicates its score. Consider a scenario where a customer wants to find available apartments for lease that have good parks or schools in their spatial neighborhoods. For simplicity, assume that the customer has provided a spatial constraint \(r=5\) to limit the distance from each available apartment to the eligible parks and schools. If apartments \(d_1\), \(d_2\), and \(d_3\) are sorted based on the scores of parks only, the top-1 apartment becomes \(d_2\) because the scores of \(d_1\), \(d_2\), and \(d_3\) are 0, 0.9, and 0, respectively. Similarly, if the apartments are sorted based on the scores of schools only, the top-1 apartment becomes \(d_1\) because the scores of \(d_1\), \(d_2\), and \(d_3\) are 0.8, 0.7, and 0.6, respectively. Finally, if the apartments are sorted based on the sum of the scores of parks and schools, the top-1 apartment becomes \(d_2\) because the scores of \(d_1\), \(d_2\), and \(d_3\) are 0.8, 1.6, and 0.6, respectively.

Several algorithms have been proposed to process spatial preference queries based on Euclidean distance [10, 11, 12]. However, algorithms based on Euclidean distance are not appropriate for spatial network environments. A few algorithms have been developed to evaluate spatial preference queries in undirected spatial networks where all edges are undirected. However, spatial preference queries in directed spatial networks have not yet been thoroughly investigated. Our previous work, ALPS [13], evaluates spatial preference queries only in undirected spatial networks. Therefore, we propose a new method called ALPS+ to evaluate spatial preference queries efficiently in directed spatial networks. In the proposed method, data objects in a directed segment are collected and then converted into a data segment. All pairs of data segments and feature objects are mapped to a distance-score space, and a subset of the pairs that is adequate to evaluate spatial preference queries is identified. To this end, we devise a mathematical formula that computes the minimum and maximum distances from the data segment to the feature object in directed spatial networks. Finally, we evaluate spatial preference queries efficiently using the materialization of this subset of the pairs, which makes it possible to avoid investigating redundant feature objects during query evaluation.

This study is an extended version of our previous work on spatial preference query processing in undirected spatial networks [13]. We extend the techniques in [13] to process spatial preference queries in directed spatial networks and present extensive experimental results for efficiency evaluation. The contributions of this study can be summarized as follows.

 

  • We propose a new method called ALPS+ to process spatial preference queries efficiently in directed spatial networks.
  • We present materialization strategies to improve the efficiency of the spatial preference search algorithm that exploits grouping of data objects and their skyline sets.
  • We conduct extensive experiments with different setups to demonstrate the superiority of ALPS+ over conventional solutions.

 

The remainder of this paper is organized as follows. In Section 2, we review related studies. In Section 3, we formulate the problem and define the primary terms. In Section 4, we describe the gathering of data objects in a segment and compute the distance from the segment to a point. In Section 5, we elaborate on our solutions for processing spatial preference queries in directed spatial networks. In Section 6, we empirically compare ALPS+ and conventional solutions for different setups. Finally, we conclude this paper in Section 7.

 

2. RELATED WORK

Several algorithms were developed to process spatial preference queries using Euclidean distance. Yiu et al. [11, 12] first introduced spatial preference queries based on three distinct spatial scores, i.e., range, nearest neighbor, and influence scores, and proposed different algorithms to evaluate spatial preference queries for these scores. Rocha-Junior et al. [10] developed a materialization technique to speed up the evaluation of spatial preference queries using Euclidean distance. They presented a mapping of pairs of the data object and feature object to a distance-score space. The minimal subset of the pairs that is adequate to answer spatial preference queries is materialized. However, the techniques based on Euclidean distance are not applicable to our problem concerning network distance-based queries.

A few algorithms were developed to answer spatial preference queries in undirected spatial networks. Our previous work called ALPS [13] is an attempt to evaluate spatial preference queries in undirected spatial networks. Similar to [10], ALPS exploits a materialization technique based on the distance-score space. ALPS+ extends the functionality of ALPS. Specifically, ALPS+ can evaluate spatial preference queries in directed spatial networks as well as undirected spatial networks, whereas ALPS can evaluate spatial preference queries only in undirected spatial networks. This study also presents the trade-off between query processing time and index construction time when a materialization technique is applied to process spatial preference queries. Finally, in recent years, different types of spatial queries have been studied extensively. These include range queries [3], kNN queries [4, 5, 6], spatial keyword queries [14, 15, 16], and spatial network distance queries [8, 9]. These studies have different problem settings from ours, and their solutions are not applicable to spatial preference queries.

 

3. PRELIMINARIES

3.1 Problem formulation

Given a positive integer k, a set of data objects \(D=\left\{d_{1}, d_{2}, \cdots, d_{|D|}\right\}\), and a set of m feature datasets \(F_i=\left\{f_{1}, f_{2}, \cdots, f_{|F_i|}\right\}\) for \(1\leq\ i \leq m\), the spatial preference query retrieves a ranked list of the best k data objects with the highest scores. The score of a data object d is determined using the scores of feature objects in the spatial neighborhood of the data object. Each feature object f has a score, denoted by \(\sigma(f)\), that indicates its quality, such as user evaluation score of the feature object. The scores of feature objects are normalized in the range [0, 1] and can be combined using an aggregation function to derive an overall quality rating.

The score \(\gamma(d)\) of a data object d is determined by aggregating the component scores \(\gamma_{i}(d)=\text{max}\left\{\sigma(f)|f \in F_i, \ dist(d,f)\leq r\right\}\) \((1\leq i \leq m)\), each of which is computed with respect to a range condition r and the \(i\)-th feature dataset \(F_i\). It can be formally defined as \(\gamma(d)=agg\left\{\gamma_i(d)|1\leq i \leq m\right\}\), where \(agg \in \left\{\text{sum, max, min}\right\}\). The aggregation function \(agg\) can be any monotone function. This study mainly considers the range constraint because the proposed approach can be easily extended to the nearest neighbor constraint and the influence constraint. Recall that the component score \(\gamma_i(d)\) is the highest score of feature objects \(f \in F_i\) that satisfy the range constraint of a data object \(d\).
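The definition above can be illustrated with a short sketch. It is only a brute-force illustration of the scoring rule, not the ALPS+ algorithm. The names `component_score`, `score`, and the toy distance table `DIST` are hypothetical, the distances and scores are made up, and a component score of 0 is assumed when no feature object lies within the range r, which matches the scores of 0 reported for the example in Fig. 1.

```python
from typing import Callable, Iterable, List, Tuple

Feature = Tuple[str, float]  # (feature id, score sigma(f) in [0, 1])


def component_score(d: str, features: Iterable[Feature],
                    dist: Callable[[str, str], float], r: float) -> float:
    """gamma_i(d): highest score among features with dist(d, f) <= r.

    Assumption: returns 0.0 when no feature satisfies the range constraint.
    """
    eligible = [s for fid, s in features if dist(d, fid) <= r]
    return max(eligible, default=0.0)


def score(d: str, feature_sets: List[List[Feature]],
          dist: Callable[[str, str], float], r: float, agg=sum) -> float:
    """gamma(d): aggregate (sum, max, or min) of the component scores."""
    return agg([component_score(d, fs, dist, r) for fs in feature_sets])


# Toy data (hypothetical): directed network distances from data objects
# to feature objects, looked up from a table instead of a graph search.
PARKS = [("a1", 0.9), ("a2", 0.6)]
SCHOOLS = [("b1", 0.8), ("b2", 0.7)]
DIST = {
    ("d1", "a1"): 7, ("d1", "a2"): 9, ("d1", "b1"): 4, ("d1", "b2"): 6,
    ("d2", "a1"): 3, ("d2", "a2"): 2, ("d2", "b1"): 8, ("d2", "b2"): 5,
}


def table_dist(d: str, f: str) -> float:
    return DIST[(d, f)]


# Rank the data objects by the sum of component scores with r = 5.
ranking = sorted(["d1", "d2"],
                 key=lambda d: score(d, [PARKS, SCHOOLS], table_dist, 5),
                 reverse=True)
```

Passing `agg=max` or `agg=min` switches the aggregation function without changing the component scores.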

 

3.2 Definition of terms and notations

Directed spatial network. A directed spatial network can be modeled using a weighted directed graph \(G=\langle N, E, W\rangle\), where N, E, and W indicate the node set, edge set, and edge distance matrix, respectively. Each edge has a positive weight and a direction.
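To make the noncommutative network distance concrete, the following sketch stores a small weighted directed graph as an adjacency dictionary and computes shortest-path lengths with Dijkstra's algorithm. The graph `G`, its node names, and the function name `shortest_dist` are hypothetical and chosen only for illustration.

```python
import heapq
from math import inf


def shortest_dist(graph, src, dst):
    """Length of the shortest directed path from src to dst (inf if none)."""
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > best.get(u, inf):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < best.get(v, inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return inf


# graph[u][v] = weight of the directed edge u -> v (all weights positive)
G = {"n1": {"n2": 3}, "n2": {"n3": 1}, "n3": {"n1": 5}}

forward = shortest_dist(G, "n1", "n2")   # uses the edge n1 -> n2
backward = shortest_dist(G, "n2", "n1")  # must go n2 -> n3 -> n1
```

In this toy graph the two directions differ, which is exactly the situation that rules out algorithms designed for undirected networks.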

Classification of nodes. Nodes can be divided into three categories based on the degree of the node. (1) If the degree of a node is equal to or larger than 3, the node is referred to as an intersection node. (2) If it is 2, the node is an intermediate node. (3) If it is 1, the node is a terminal node.
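The three categories can be computed directly from the edge list, as in the sketch below. It assumes that the degree of a node is the number of distinct adjacent nodes regardless of edge direction, which the text does not state explicitly for directed networks. The function name `classify_nodes` is hypothetical.

```python
from collections import defaultdict


def classify_nodes(edges):
    """Map each node to 'intersection', 'intermediate', or 'terminal'.

    edges: iterable of (u, v) pairs. Direction is ignored when counting
    the degree (an assumption made for this sketch).
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    kinds = {}
    for node, adjacent in neighbors.items():
        degree = len(adjacent)
        if degree >= 3:
            kinds[node] = "intersection"
        elif degree == 2:
            kinds[node] = "intermediate"
        else:
            kinds[node] = "terminal"
    return kinds


kinds = classify_nodes([("t", "a"), ("a", "x"), ("x", "b"), ("x", "c")])
```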

Edge sequence and segment. An edge sequence \(\overline {n_sn_{s+1}\cdots n_e}\) denotes a path between two nodes, \(n_s\) and \(n_e\), such that \(n_s\) (or \(n_e\)) is either an intersection node or a terminal node, and the other nodes in the path, \(n_{s+1}, \cdots , n_{e-1}\), are intermediate nodes. The two end nodes, \(n_s\) and \(n_e\), are referred to as boundary nodes of the edge sequence. If an edge sequence forms a cycle, the boundary nodes of the edge sequence are identical. The length of an edge sequence is the total weight of the edges in the edge sequence. A part of an edge sequence is called a segment. Note that by definition, an edge sequence is also a segment defined by the boundary nodes of the edge sequence.
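A network can be decomposed into edge sequences by starting at each boundary node and walking through intermediate nodes until the next boundary node is reached. The sketch below is one possible way to do this and is not taken from the paper. It treats edges as undirected for grouping purposes, ignores parallel edges, self-loops, and rings that contain no boundary node, and uses the hypothetical function name `edge_sequences`.

```python
from collections import defaultdict


def edge_sequences(edges):
    """Return a list of (node path, length) pairs, one per edge sequence.

    edges: iterable of (u, v, weight); direction is ignored here.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w

    # Boundary nodes are intersection (degree >= 3) or terminal (degree 1).
    boundary = {n for n in adj if len(adj[n]) != 2}
    seen = set()  # edges already assigned to an edge sequence
    result = []
    for start in sorted(boundary):
        for nxt in sorted(adj[start]):
            if frozenset((start, nxt)) in seen:
                continue
            path, length = [start, nxt], adj[start][nxt]
            seen.add(frozenset((start, nxt)))
            # Walk through intermediate nodes, which have exactly one
            # neighbor other than the node we came from.
            while path[-1] not in boundary:
                prev, cur = path[-2], path[-1]
                (following,) = (n for n in adj[cur] if n != prev)
                length += adj[cur][following]
                seen.add(frozenset((cur, following)))
                path.append(following)
            result.append((path, length))
    return result


sequences = edge_sequences(
    [("t", "a", 2), ("a", "x", 3), ("x", "b", 1), ("x", "c", 4)]
)
```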

To simplify the presentation, Table 1 presents the notations used in this paper. Our scheme works in the same manner for undirected and directed segments and an undirected segment is used for convenience to describe the proposed scheme.

 

Table 1. Summary of notations used in this paper


 

4. GROUPING AND DISTANCE COMPUTATION

4.1 Grouping of data objects in an edge sequence

The data objects in an edge sequence are gathered into a group referred to as a data segment. The data objects in a data segment are close to each other in the spatial network; therefore, it is more effective to process them together than to process each object separately. However, feature objects in an edge sequence are not grouped because their scores often vary widely.
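The grouping step can be sketched as follows. The sketch assumes that each data object is described by the identifier of its edge sequence and its offset from the start boundary node, and that the resulting data segment spans from the first to the last data object on that edge sequence. Both the representation and the name `group_into_segments` are assumptions made for illustration.

```python
from collections import defaultdict


def group_into_segments(objects):
    """Group data objects by edge sequence into data segments.

    objects: iterable of (object id, edge sequence id, offset), where the
    offset is measured from the start boundary node of the edge sequence.
    Returns {edge sequence id: {"start", "end", "objects"}}.
    """
    by_sequence = defaultdict(list)
    for obj_id, seq_id, offset in objects:
        by_sequence[seq_id].append((offset, obj_id))

    segments = {}
    for seq_id, members in by_sequence.items():
        members.sort()  # order the objects along the edge sequence
        segments[seq_id] = {
            "start": members[0][0],   # offset of the first data object
            "end": members[-1][0],    # offset of the last data object
            "objects": [obj_id for _, obj_id in members],
        }
    return segments


# Hypothetical placement of four data objects on two edge sequences.
segments = group_into_segments(
    [("d1", "s1", 1.0), ("d2", "s1", 3.0), ("d3", "s2", 2.0), ("d4", "s2", 2.5)]
)
```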

Fig. 2 shows a sample grouping of data objects in an edge sequence, which will be discussed throughout this section. As shown in Fig. 2(a), four data objects, \(d_1\) through \(d_4\), and four feature objects are in the directed spatial network. To simplify the presentation, we consider a single feature dataset that contains these four feature objects and their scores. Each feature dataset is processed independently; thus, the extension to multiple feature datasets is straightforward. The spatial network includes three edge sequences and two intersection nodes. Fig. 2(b) illustrates the result of grouping data objects in an edge sequence. Specifically, two of the data objects are grouped and transformed into one data segment, which is represented by a bold line. Similarly, the other two data objects are grouped and transformed into a second data segment. Therefore, \(D=\left\{d_1, d_2, d_3, d_4\right\}\) is transformed into a set of two data segments generated from the data objects in D.

 

Fig. 2. Grouping of data objects in an edge sequence. (a) Data objects \(d_1\) through \(d_4\) and (b) the two resulting data segments.

 

 

 

4.2 Computation of minimum and maximum distances from data segment to feature object

 

Let ⊗ denote a composite object that is generated from a pair of a data segment dseg and a feature object f, where ∈ and ∈. Here, ⊗ is represented by ⊗ ([mindist(dseg, f), maxdist(dseg, f)], ), where mindist(dseg, f) and maxdist(dseg, f) indicate the minimum and maximum distances from dseg to f, respectively. We plot each ⊗ pair to the distance-score space as shown in Fig. 3, where the x value corresponds to the distance from a data segment dseg to a feature f and the y value corresponds to the score of a feature object f.

Fig. 3. Mapping of \(dseg \otimes f\) to the distance-score space, where \(dseg \otimes f = ([mindist(dseg, f),\ maxdist(dseg, f)],\ \sigma(f))\).

In a preprocessing step, a subset of \(dseg \otimes f\) pairs is selected and indexed using an R-tree [17, 18], one of the most popular multi-dimensional access methods. To this end, the minimum and maximum distances from dseg to f must be computed. We now discuss the computation of the minimum and maximum distances from dseg to f in Fig. 2, where dseg is one of the two data segments and f is one of the four feature objects. Table 2 summarizes the computation of the minimum and maximum distances from dseg to f.
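The following sketch shows one way such bounds can be derived. It is not the formula used by ALPS+, whose details are given in Table 2 and the accompanying figures. It assumes that the feature object f does not lie on the data segment, so a point p at offset x on a segment of length L reaches f either through the start of the segment at cost x + a or through the end of the segment at cost (L - x) + b, where a and b are the network distances from the two segment ends to f. A direction that cannot be used on a directed segment is modeled by setting the corresponding distance to infinity. The function and parameter names are hypothetical.

```python
from math import inf


def segment_bounds(length, start_to_f=inf, end_to_f=inf):
    """Return (mindist, maxdist) from any point on a data segment to f.

    length:      length L of the data segment
    start_to_f:  distance a from the segment start to f (inf if a point
                 on the segment cannot leave through the start)
    end_to_f:    distance b from the segment end to f (inf if a point
                 on the segment cannot leave through the end)

    dist(p, f) = min(x + a, (L - x) + b) for x in [0, L]. This function
    is smallest at one of the segment ends and largest where the two
    branches meet, clamped to the segment.
    """
    a, b = start_to_f, end_to_f
    mindist = min(a, b)
    maxdist = min((length + a + b) / 2, length + a, length + b)
    return mindist, maxdist


undirected = segment_bounds(4, start_to_f=3, end_to_f=5)
one_way = segment_bounds(4, end_to_f=5)  # can only leave through the end
```

For the undirected case the farthest point is the one at which both exits are equally long, whereas for the one-way case the farthest point is the start of the segment.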

 

Table 2. Computation of minimum and maximum distances in Fig. 2

 

Fig. 4(a), 4(b), 4(c), and 4(d) illustrate the computations of the minimum and maximum distances from the first data segment to each of the four feature objects, respectively. In the figures, the dashed lines denote that paths from p to f are not the shortest paths for the corresponding intervals. For each of the four pairs, the quantities used in the computation are listed in Table 2, and the feature object does not lie on the data segment. The resulting minimum and maximum distances are shown in Fig. 4(a) through 4(d).

 

Fig. 4. Evaluation of mindist and maxdist from the first data segment to each of the four feature objects, shown in panels (a) through (d).

 

Fig. 5(a), 5(b), 5(c), and 5(d) illustrate the computations of the minimum and maximum distances from the second data segment to each of the four feature objects, respectively. For each of the four pairs, the quantities used in the computation are listed in Table 2. For the first three pairs, the feature object does not lie on the data segment, whereas for the fourth pair, the feature object lies on the data segment. The resulting minimum and maximum distances are shown in Fig. 5(a) through 5(d). Table 3 summarizes the minimum and maximum distances along with the scores for the \(dseg \otimes f\) pairs in Fig. 2(b).

 

Fig. 5. Evaluation of mindist and maxdist from the second data segment to each of the four feature objects, shown in panels (a) through (d).

 

Table 3. Summary of all sample \(dseg \otimes f\) pairs

 

 

 

5. PROCESSING SPATIAL PREFERENCE QUERIES IN DIRECTED SPATIAL NETWORKS

 

 

5.1 Mapping pairs of data segment and feature object to distance-score space

 

 

 

 

 

 

6. PERFORMANCE STUDY

6.1 Experimental settings

6.2 Experimental results

 

 

 

 

 

 

 

7. CONCLUSIONS

 

 

 

 

References

  1. S.Y. Kim and S.M. Cho, "A Haptic Navigation System for Visually Impaired Persons," Journal of Korea Multimedia Society, Vol. 14, No. 1, pp. 133-143, 2011. https://doi.org/10.9717/kmms.2011.14.1.133
  2. W. Park and T. Park, "An Efficient Channel Navigation Scheme Based on Patterns of Watching TV Programs," Journal of Korea Multimedia Society, Vol. 13, No. 9, pp. 1357-1364, 2010.
  3. D. Yung, M.L. Yiu, and E. Lo, "A Safe-exit Approach for Efficient Network-based Moving Range Queries," Data and Knowledge Engineering, Vol. 72, pp. 126-147, 2012. https://doi.org/10.1016/j.datak.2011.10.001
  4. T. Abeywickrama, M.A. Cheema, and D. Taniar, "k-Nearest Neighbors on Road Networks: A Journey in Experimentation and In-Memory Implementation," The Proceedings of the Very Large Data Bases Endowment, Vol. 9, No. 6, pp. 492-503, 2016.
  5. A.M. Aly, W.G. Aref, and M. Ouzzani, "Spatial Queries with Two kNN Predicates," The Proceedings of the Very Large Data Bases Endowment, Vol. 5, No. 11, pp. 1100-1111, 2012.
  6. K. Mouratidis, M.L. Yiu, D. Papadias, and N. Mamoulis, "Continuous Nearest Neighbor Monitoring in Road Networks," Proceedings of International Conference on Very Large Data Bases, pp. 43-54, 2006.
  7. S. Yang, M.A. Cheema, X. Lin, and W. Wang, "Reverse k Nearest Neighbors Query Processing: Experiments and Analysis," The Proceedings of the Very Large Data Bases Endowment, Vol. 8, No. 5, pp. 605-616, 2015.
  8. S. Peng and H. Samet, "Analytical Queries on Road Networks: An Experimental Evaluation of Two System Architectures," Proceedings of International Conference on Advances in Geographic Information Systems, pp. 1:1-1:10, 2015.
  9. S. Peng, J. Sankaranarayanan, and H. Samet, "SPDO: High-Throughput Road Distance Computations on Spark Using Distance Oracles," Proceedings of International Conference on Data Engineering, pp. 1239-1250, 2016.
  10. J.B. Rocha-Junior, A. Vlachou, C. Doulkeridis, and K. Norvag, "Efficient Processing of Top-k Spatial Preference Queries," The Proceedings of the Very Large Data Bases Endowment, Vol. 4, No. 2, pp. 93-104, 2010.
  11. M.L. Yiu, X. Dai, N. Mamoulis, and M. Vaitis, "Top-k Spatial Preference Queries," Proceedings of International Conference on Data Engineering, pp. 1076-1085, 2007.
  12. M.L. Yiu, H. Lu, N. Mamoulis, and M. Vaitis, "Ranking Spatial Data by Quality Preferences," IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 3, pp. 433-446, 2011. https://doi.org/10.1109/TKDE.2010.119
  13. H.J. Cho, S.J. Kwon, and T.S. Chung, "ALPS: an Efficient Algorithm for Top-k Spatial Preference Search in Road Networks," Knowledge and Information Systems, Vol. 42, No. 3, pp. 599-631, 2015. https://doi.org/10.1007/s10115-013-0696-9
  14. L. Chen, J. Xu, X. Lin, C.S. Jensen, and H. Hu, "Answering Why-not Spatial Keyword Top-k Queries via Keyword Adaption," Proceedings of International Conference on Data Engineering, pp. 697-708, 2016.
  15. L. Guo, J. Shao, H.H. Aung, and K.L. Tan, "Efficient Continuous Top-k Spatial Keyword Queries on Road Networks," GeoInformatica, Vol. 19, No. 1, pp. 29-60, 2015. https://doi.org/10.1007/s10707-014-0204-8
  16. J.B. Rocha-Junior and K. Norvag, "Top-k Spatial Keyword Queries on Road Networks," Proceedings of International Conference on Extending Database Technology, pp. 168-179, 2012.
  17. N. Beckmann, H.P. Kriegel, R. Schneider, and B. Seeger, "The R*-tree: an Efficient and Robust Access Method for Points and Rectangles," Proceedings of International Conference on Management of Data, pp. 322-331, 1990.
  18. A. Guttman, "R-trees: a Dynamic Index Structure for Spatial Searching," Proceedings of International Conference on Management of Data, pp. 47-57, 1984.
  19. R. Fagin, A. Lotem, and M. Naor, "Optimal Aggregation Algorithms for Middleware," Proceedings of Symposium on Principles of Database Systems, pp. 102-113, 2001.
  20. Real Datasets for Spatial Databases, https://www.cs.utah.edu/-lifeifei/SpatialDataset.htm (accessed Dec. 20, 2018).
  21. American Hotel and Lodging Association, http://www.ahla.com/ (accessed Dec. 20, 2018).
  22. D. Papadias, J. Zhang, N. Mamoulis, and Y. Tao, "Query Processing in Spatial Network Databases," Proceedings of International Conference on Very Large Data Bases, pp. 802-813, 2003.