A Study on the Compression and Major Pattern Extraction Method of Origin-Destination Data with Principal Component Analysis

  • Kim, Jeongyun (Dept. of Civil and Environmental Engineering, KAIST) ;
  • Tak, Sehyun (Center for Connected and Automated Driving Research, The Korea Transport Institute) ;
  • Yoon, Jinwon (Dept. of Civil and Environmental Engineering, KAIST) ;
  • Yeo, Hwasoo (Dept. of Civil and Environmental Engineering, KAIST)
  • Received : 2020.06.22
  • Accepted : 2020.08.13
  • Published : 2020.08.31

Abstract

Origin-destination data have been collected and used for demand analysis and service design in fields such as public transportation and traffic operation. As big data analysis grows in importance, there is an increasing need to store raw origin-destination data. However, storing and analyzing the raw data over long periods is impractical because the data size grows with the square of the number of collection points. To overcome this storage limitation and enable long-period pattern analysis, this study proposes a methodology based on principal component analysis for compressing origin-destination data and analyzing the compressed data. The proposed methodology is applied to public transit data from Sejong and Seoul. We first measure the reconstruction error and the data size for each truncated matrix. Then, to determine the range of principal components that removes random data, we measure the level of regularity using the covariance coefficients of the demand data reconstructed with each range of principal components. Based on the distribution of the covariance coefficients, we identify the range of principal components that covers the regular demand: 1 to 60 for Sejong and 1 to 80 for Seoul.
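The compression and reconstruction steps summarized above can be illustrated with a minimal sketch. It assumes that each time slice of the origin-destination matrix is flattened into one row and that PCA is computed through a singular value decomposition; the abstract does not specify either choice. The function names, the synthetic data, and the values of `k` are hypothetical and are not taken from the paper.

```python
import numpy as np

def pca_compress(X, k):
    """Keep the first k principal components of X.

    X has one row per time slice and one column per flattened OD pair
    (an assumed layout, not necessarily the one used in the paper).
    """
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U[:, :k] * s[:k]   # projection of each time slice, shape (T, k)
    components = Vt[:k]         # principal directions, shape (k, n*n)
    return mean, scores, components

def pca_reconstruct(mean, scores, components):
    """Rebuild an approximation of X from the truncated representation."""
    return mean + scores @ components

def relative_error(X, X_hat):
    """Frobenius-norm reconstruction error relative to the original data."""
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

def stored_values(mean, scores, components):
    """Number of values that must be stored for the compressed form."""
    return mean.size + scores.size + components.size

# Synthetic stand-in for transit data: a few regular demand patterns
# plus small random trips (purely illustrative, not real OD data).
rng = np.random.default_rng(0)
n_stops, n_slices, n_patterns = 12, 200, 3
patterns = rng.random((n_patterns, n_stops * n_stops)) * 10
weights = rng.random((n_slices, n_patterns))
noise = rng.poisson(0.5, size=(n_slices, n_stops * n_stops))
X = weights @ patterns + noise

for k in (1, 3, 10, 50):
    parts = pca_compress(X, k)
    err = relative_error(X, pca_reconstruct(*parts))
    print(f"k={k:3d}  error={err:.4f}  stored={stored_values(*parts)}  raw={X.size}")
```

Increasing `k` lowers the reconstruction error but increases the number of stored values, which is the trade-off the study measures for each truncated matrix.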

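Origin-destination data are stored and used in various fields, such as public transportation and road operation, for demand analysis and service design. As big data has become more useful, demand for analyzing and using origin-destination data has grown as well. Unlike conventional traffic information data, whose volume grows in proportion to the number of collection devices n (α·n), the volume of origin-destination data grows with the square of the number of collection points n (α·n²). Because of the large storage space this requires, storing origin-destination data in raw form over long periods and using it for big data analysis is not regarded as a practical option. In addition, in the small-demand range of 0 to 10, patterned data and random data are mixed together, which makes it difficult to extract the major patterns that arise when small demands occur in groups. To overcome these limits on storage capacity and pattern analysis, this study proposes a method for compressing and analyzing public transit origin-destination data using principal component analysis. Using public transit usage data from Seoul and Sejong, we analyze mobility data and study compression and reconstruction based on principal component analysis in order to remove the highly random data contained in mobility origin-destination data. By comparing the origin-destination data decomposed by principal component analysis with the original data, we identify the major demand patterns and propose ranges of principal components that improve both the compression ratio and the reconstruction rate. The analysis shows that using principal components 1 to 80 for Seoul and 1 to 60 for Sejong makes it possible to remove the noise contained in the origin-destination data and to compress and reconstruct the data without losing the major travel data.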
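The contrast between linear growth (α·n) and quadratic growth (α·n²) can be made concrete with a short arithmetic sketch. The function names and the choice α = 1 are assumptions for illustration only; the paper does not give a value for α.

```python
def point_data_values(n, alpha=1):
    """Values per collection interval for conventional point data: alpha * n."""
    return alpha * n

def od_data_values(n, alpha=1):
    """Values per collection interval for origin-destination data: alpha * n^2."""
    return alpha * n * n

# Compare the two growth rates for a few hypothetical network sizes.
for n in (10, 100, 1000):
    print(f"n={n:5d}  point data={point_data_values(n):8d}  OD data={od_data_values(n):10d}")
```

Doubling the number of collection points doubles the volume of conventional point data but quadruples the volume of origin-destination data.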
