Developing an Alias Management Method based on Word Similarity Measurement for POI Application

  • Choi, Jihye (Dept. of Geoinformatics, University of Seoul) ;
  • Lee, Jiyeong (Dept. of Geoinformatics, University of Seoul)
  • Received : 2019.03.20
  • Accepted : 2019.04.25
  • Published : 2019.04.30

Abstract

As the need to integrate administrative datasets and address information increases, there is growing interest in POI (Point of Interest) data as a source of location information across applications and platforms. The purpose of this study is to develop an alias database management method for efficient POI searching, based on POI data representing position. First, we determine the attributes of POI alias data, which individual users use in varied ways. When classifying POI aliases, we excluded typos and names written entirely in the English alphabet. The attributes of POI aliases are classified into four categories, and each category is further divided into three classes according to the strength of the attribute. We then define the quality of the POI aliases classified in this study through experiments. Based on the four POI alias attributes defined in this study, we developed a method for managing the aliases of a POI through an integrated approach composed of word embedding and similarity measurement. Experimental results of the proposed POI alias management method show that the algorithm developed in this study can be used when each POI has a small number of aliases with the appropriate attributes defined in this study.
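The search-then-register idea summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the alias table `ALIAS_DB`, the function names, and the `threshold` value of 0.8 are all hypothetical, the word-embedding step is omitted entirely, and the string similarity uses Python's `difflib.SequenceMatcher` (a gestalt-style pattern matcher) as a stand-in for whatever similarity measure the study actually applied.

```python
from difflib import SequenceMatcher

# Hypothetical alias table: canonical POI name -> known aliases (illustrative data only).
ALIAS_DB = {
    "University of Seoul": ["UOS", "Seoul City University"],
    "Seoul Station": ["Seoul Railway Station"],
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search_poi(query: str, alias_db: dict, threshold: float = 0.8):
    """Return (canonical_name, score) for the best match, or (None, score) below threshold."""
    best_name, best_score = None, 0.0
    for name, aliases in alias_db.items():
        # Compare the query with the canonical name and every stored alias.
        for candidate in [name, *aliases]:
            score = similarity(query, candidate)
            if score > best_score:
                best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        return best_name, best_score
    return None, best_score


def register_alias(query: str, alias_db: dict, threshold: float = 0.8):
    """If the query matches a POI closely enough, store it as a new alias of that POI."""
    name, _ = search_poi(query, alias_db, threshold)
    if name is not None and query != name and query not in alias_db[name]:
        alias_db[name].append(query)
    return name
```

In this sketch a query that clears the threshold is resolved to its canonical POI and remembered as an alias, so later identical queries match exactly. A real system would presumably also need rules for pruning or ranking aliases, which the sketch does not attempt.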

Keywords


Fig. 1. Matching result of CNS POI (left) and Web POI (right) (Kim et al., 2009)


Fig. 2. Architecture of the CBOW and Skip-gram models (Mikolov et al., 2013a)


Fig. 3. Illustration of the Word2Vec Skip-gram model learning


Fig. 4. Flowchart of POI search and alias database management algorithm


Fig. 5. Number of correct matches of POI search results through word embedding and similarity measurement according to the number of experiments


Fig. 6. Match rates of POI results according to the number of experiments


Fig. 7. Validation of alias database usability by attribute

Table 1. Four attributes of POI alias


Table 2. Mean and standard deviation of similarity measures by attribute


Table 3. Pseudocode for POI search and alias database management


References

  1. Elman, J.L. (1991), Distributed representations, simple recurrent networks, and grammatical structure, Machine Learning, Vol. 7, No. 2-3, pp. 195-225. https://doi.org/10.1007/BF00114844
  2. Glenberg, A.M. and Robertson, D.A. (2000), Symbol grounding and meaning: a comparison of high-dimensional and embodied theories of meaning, Journal of Memory and Language, Vol. 43, No. 3, pp. 379-401. https://doi.org/10.1006/jmla.2000.2714
  3. Gomaa, W.H. and Fahmy, A.A. (2013), A survey of text similarity approaches, International Journal of Computer Applications, Vol. 68, No. 13, pp. 13-18. https://doi.org/10.5120/11638-7118
  4. Google. (2013), Word2vec, https://code.google.com/archive/p/word2vec/ (last date accessed : 14 January 2019).
  5. Harris, Z.S. (1954), Distributional structure, Word, Vol. 10, No. 2-3, pp. 146-162. https://doi.org/10.1080/00437956.1954.11659520
  6. Kim, J.O., Huh, Y., Lee, W.H. and Yu, K.Y. (2009), Matching method of digital map and POI for geospatial web platform, Journal of Korean Society for Geospatial Information System, Vol. 17, No. 4, pp. 23-29.
  7. Ko, E.B. and Lee, J.W. (2013), Implementation of a set-based POI search algorithm supporting classifying duplicate characters, Journal of Digital Contents Society, Vol. 14, No. 4, pp. 463-469. (in Korean with English abstract) https://doi.org/10.9728/dcs.2013.14.4.463
  8. Lee, J. (2009), GIS-based geocoding methods for area-based addresses and 3D addresses in urban areas, Environment and Planning B: Planning and Design, Vol. 36, No. 1, pp. 86-106. https://doi.org/10.1068/b31169
  9. Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a), Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781.
  10. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., and Dean, J. (2013b), Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, pp. 3111-3119.
  11. OGC (Open Geospatial Consortium). (2013), Points of interest (POI) Standards Working Group Charter, https://portal.opengeospatial.org/files/?artifact_id=54800 (last date accessed : 12 April 2019).
  12. Park, J.H., Kang, H.Y., and Lee, J. (2016), A spatial-temporal POI data model for implementing location-based services, Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, Vol. 34, No. 6, pp. 609-618. https://doi.org/10.7848/ksgpc.2016.34.6.609
  13. Ratcliff, J.W. and Metzener, D.E. (1988), Pattern matching: the gestalt approach, Dr. Dobb's Journal, Vol. 13, No. 7, pp. 46.
  14. Sasaki, Y., Ishikawa, Y., Fujiwara, Y., and Onizuka, M. (2018), Sequenced Route Query with Semantic Hierarchy. Proceedings of the 21st International Conference on Extending Database Technology, 26-29 March 2019, Lisbon, Portugal, pp. 37-48.
  15. TTA. (2014), POI (Point of Interest) data model, http://www.tta.or.kr/data/ttas_view.jsp?r n=1&by=desc&r n1=Y&standard_no=TTAK.OT-10.0360&order=publish_date&publish_date=%C2%A7ion_code%3D&nowpage=1&totalSu=1&pk_num=TTAK.OT-10.0360&nowSu=1 (last date accessed : 12 April 2019).
  16. Xu, C., Li, Q., and Yong, W. (2012), The Design and Implementation of Address Matching Engine. Proceedings of the International Conference on Geo-spatial Solutions for Emergency Management and the 50th Anniversary of Chinese Academy of Surveying and Mapping, 14-16 September 2009, Beijing, China. ISPRS Archives Volume XXXVIII-7/C4, pp. 118-120.

Cited by

  1. Improving a Street-Based Geocoding Algorithm Using Machine Learning Techniques vol.10, pp.16, 2019, https://doi.org/10.3390/app10165628