Big Data Analysis of Busan Civil Affairs Using the LDA Topic Modeling Technique

  • Park, Ju-Seop (Smart Governance Research Center, Dong-A University)
  • Lee, Sae-Mi (Smart Governance Research Center, Dong-A University)
  • Received : 2020.02.18
  • Accepted : 2020.04.01
  • Published : 2020.06.30

Abstract

Residents pay close attention to local issues that arise in their cities. Local governments work to resolve these issues, but it is difficult to eliminate every inconvenience, and the resulting dissatisfaction leads to civil complaints. One way to address this is for local governments to use big data to identify the characteristics of complaints and proactively provide solutions. This study applies latent Dirichlet allocation (LDA) topic modeling to analyze trends and patterns in complaints filed online. To this end, 9,625 online complaints submitted to the city of Busan from 2015 to 2017 were analyzed, and 20 complaint topics were extracted. From these topics, the key complaints were identified, and an analysis of quarterly trends in topic weight yielded four "hot" topics (bus stops, taxi drivers, praise, and administrative handling) and four "cold" topics (CCTV installation, bus routes, park facilities including parking, and festival-related complaints). The study contributes academically by presenting a big data analysis method for identifying trends in civil complaints and by encouraging follow-up research. The text mining technique used for complaint analysis can also be applied to other administrative tasks that require big data processing.
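
The abstract does not state how the quarterly weight trends were turned into "hot" and "cold" labels. As a minimal sketch of one plausible approach, the Python below fits a least-squares line to each topic's quarterly weight and labels a rising topic "hot" and a falling one "cold". The function names, the `min_slope` threshold, and the example weights are all hypothetical and are not taken from the study; in practice the weights would come from the document-topic proportions of a fitted LDA model, averaged per quarter.

```python
from statistics import mean


def trend_slope(weights):
    """Least-squares slope of topic weight against quarter index (0, 1, 2, ...)."""
    n = len(weights)
    if n < 2:
        raise ValueError("need at least two quarters to estimate a trend")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(weights)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, weights))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def classify_topics(quarterly_weights, min_slope=0.001):
    """Label each topic 'hot', 'cold', or 'neutral' from its trend slope.

    min_slope is an arbitrary illustrative threshold (an assumption);
    a significance test on the slope could be used instead.
    """
    labels = {}
    for topic, weights in quarterly_weights.items():
        slope = trend_slope(weights)
        if slope > min_slope:
            labels[topic] = "hot"
        elif slope < -min_slope:
            labels[topic] = "cold"
        else:
            labels[topic] = "neutral"
    return labels


# Made-up weights for 12 quarters (2015 Q1 to 2017 Q4); NOT data from the study.
example = {
    "Taxi drivers": [0.040 + 0.002 * q for q in range(12)],
    "CCTV installation": [0.070 - 0.003 * q for q in range(12)],
    "Other": [0.050] * 12,
}
labels = classify_topics(example)
print(labels)
```

With these invented inputs, the rising series is labeled "hot", the falling series "cold", and the flat series "neutral". A fixed slope threshold is the simplest possible criterion; with only 12 quarterly points, a regression with a significance test would be a more cautious choice.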
