Semi-automatic Data Fusion Method for Spatial Datasets

  • Yoon, Jong-chan (Department of Electrical and Computer Engineering, University of Seoul)
  • Kim, Han-joon (Department of Electrical and Computer Engineering, University of Seoul)
  • Received: 2021.07.27
  • Reviewed: 2021.09.17
  • Published: 2021.11.30

Abstract

Advances in big data technologies have made it possible to process data at volumes that were previously intractable. As a result, automating dataset selection and fusion has become a necessity rather than an option for building big data-based services. This paper proposes a semi-automatic technique that fuses datasets containing spatial information to generate meaningful new information. First, an embedding vector is generated for each dataset by applying the Node2Vec model to the dataset's keywords. Next, the semantic similarity between each pair of datasets is obtained as the cosine similarity of their embedding vectors. A person then intervenes to select, from the dataset pairs with relatively high semantic similarity, those datasets that carry one or more spatial identifiers, and fuses and visualizes the selected pairs. We show that this semi-automatic fusion process can generate meaningful fused information that cannot be obtained from any single dataset.
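The similarity step described in the abstract can be illustrated with a minimal sketch. It assumes the dataset embeddings have already been produced (in the paper, by Node2Vec over dataset keywords); the dataset names, vector values, the `rank_candidate_pairs` helper, and the 0.8 threshold are all hypothetical and chosen only for illustration, not taken from the paper.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def rank_candidate_pairs(embeddings, threshold):
    """Return (similarity, name_a, name_b) tuples at or above threshold, highest first."""
    scored = [
        (cosine_similarity(embeddings[a], embeddings[b]), a, b)
        for a, b in combinations(sorted(embeddings), 2)
    ]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


# Hypothetical embeddings standing in for Node2Vec output (assumption).
embeddings = {
    "bus_stops": [0.9, 0.1, 0.2],
    "subway_stations": [0.8, 0.2, 0.1],
    "library_loans": [0.1, 0.9, 0.3],
}

# Hypothetical threshold; the paper only says "relatively high" similarity.
candidates = rank_candidate_pairs(embeddings, threshold=0.8)
for score, name_a, name_b in candidates:
    print(f"{name_a} <-> {name_b}: {score:.3f}")
```

The ranked list is only a shortlist: in the workflow the abstract describes, a person would still inspect these pairs, keep those with spatial identifiers, and perform the fusion and visualization.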

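Acknowledgments

This research was carried out under the University ICT Research Center support program of the Ministry of Science and ICT (MSIT) and the Institute for Information & Communications Technology Promotion (IITP-2021-2018-0-01417). It was also supported in 2020 by the Institute of Information & Communications Technology Planning & Evaluation (IITP), funded by the Korean government (MSIT) (No. 2020-0-00121, Development of data improvement and dataset correction technology based on data quality assessment).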
