A Study on the Trends of Construction Safety Accident in Unstructured Text Using Topic Modeling

Analysis of Construction Safety Accident Trends Using Topic Modeling Based on Unstructured Text

  • Lee, Sang-Gyu (Korea Institute of Civil Engineering and Building Technology)
  • Received : 2018.06.27
  • Accepted : 2018.10.05
  • Published : 2018.10.31

Abstract

To understand and track trends in construction safety accidents, this study analyzes topic trends in construction safety accidents using LDA (Latent Dirichlet Allocation)-based topic modeling. Rather than relying on the various structured-data analyses previously used to prevent safety accidents in the construction industry, it identifies the main issues of construction safety accidents through unstructured data analysis based on topic modeling. To apply this methodology, I randomly collected 540 news articles about construction accidents published between January 2017 and February 2018. Applying LDA-based topic modeling to this unstructured data, I derived 10 topics and identified key issues from the 10 keywords of each topic. I then forecast topic issues related to construction safety accidents based on a time-series analysis of the news data from January 2017 to February 2018. This research suggests how unstructured news article data can be used to anticipate directions for safety policy and research and to respond to future construction safety issues.
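The abstract does not state which software or inference procedure was used to fit the LDA model. As a hedged illustration of how LDA turns a set of articles into topics and per-topic keyword lists, the following is a minimal collapsed Gibbs sampler in pure Python. The function names (`lda_gibbs`, `top_keywords`, `doc_topic_dist`), the hyperparameter values, the use of the final sample's counts as the estimate, and the assumption that articles are already cleaned and tokenized (for Korean news, typically by a morphological analyzer) are all illustrative assumptions, not the study's implementation.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of documents, each a list of tokens (already cleaned and tokenized).
    Returns (ndk, nkw, vocab): document-topic counts, topic-word counts, vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    ndk = [[0] * K for _ in docs]      # tokens in document d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # occurrences of word v assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token

    # Start from a random topic assignment for each token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    topics = range(K)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z = t | all other assignments), up to a constant:
                # (n_dt + alpha) * (n_tv + beta) / (n_t + V * beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the resampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return ndk, nkw, vocab


def top_keywords(nkw, vocab, n=10):
    """The n most frequent words of each topic (highest topic-word counts first)."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in nkw]


def doc_topic_dist(ndk, alpha=0.1):
    """Smoothed topic proportions theta_dk = (n_dk + alpha) / (n_d + K * alpha)."""
    return [[(c + alpha) / (sum(row) + len(row) * alpha) for c in row] for row in ndk]


# Hypothetical, already-tokenized toy "articles" (not data from the study).
docs = [
    ["fall", "scaffold", "worker", "fall"],
    ["crane", "collapse", "tower", "crane"],
    ["scaffold", "fall", "height", "worker"],
    ["tower", "crane", "collapse", "collapse"],
]
ndk, nkw, vocab = lda_gibbs(docs, num_topics=2, iterations=100, seed=1)
for k, words in enumerate(top_keywords(nkw, vocab, n=3)):
    print(f"topic {k}: {words}")
```

Calling `lda_gibbs(..., num_topics=10)` and `top_keywords(..., n=10)` would give a 10 × 10 topic-keyword table of the shape the abstract describes; an analysis at the scale of 540 articles would more likely use an established topic-modeling library and compare several topic counts before settling on one.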

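This study proposes LDA (Latent Dirichlet Allocation)-based topic modeling for analyzing trends in construction safety accidents. In particular, topic modeling based on unstructured data analysis, which departs from the various structured-data analyses proposed so far for preventing safety accidents in the construction industry, makes it possible to follow how the key keywords of construction safety accidents change over time. To apply this methodology, 540 news articles related to construction safety accidents were collected. From these, key issues were derived through 10 topics and the 10 keywords within each topic, and future issues for each topic are predicted through a monthly time-series analysis of the news data from January 2017 to February 2018. This study is expected to support the preemptive prediction of diverse construction safety accident issues and, on that basis, to suggest useful directions for construction safety policy and research.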
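The monthly time-series step can be sketched as follows, assuming each article has a publication month and a topic-proportion vector (for example, the per-document θ estimates of an LDA fit). Each topic's share is averaged per month, an ordinary least-squares line is fitted over a month index, and a positive slope flags a topic gaining attention while a negative slope flags a fading one. The abstract does not say which forecasting model was used, so the linear trend and one-month-ahead extrapolation here are an assumed stand-in, and all function names are hypothetical.

```python
from collections import defaultdict


def month_index(ym):
    """Map a 'YYYY-MM' string to a consecutive integer so gaps between months are preserved."""
    year, month = map(int, ym.split("-"))
    return year * 12 + (month - 1)


def monthly_topic_share(doc_months, theta):
    """Average the topic proportions of all articles published in the same month.

    doc_months: list of 'YYYY-MM' strings, one per article.
    theta: list of per-article topic-proportion lists (same length as doc_months).
    Returns (months sorted chronologically, {month: [mean share of each topic]}).
    """
    num_topics = len(theta[0])
    totals = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for month, props in zip(doc_months, theta):
        for k, p in enumerate(props):
            totals[month][k] += p
        counts[month] += 1
    months = sorted(totals, key=month_index)
    return months, {m: [s / counts[m] for s in totals[m]] for m in months}


def ols_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, predict function)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # With a single month there is no spread in x, so no trend can be estimated.
    slope = 0.0 if sxx == 0 else sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    # Centering on mean_x avoids cancellation from the large month indices.
    return slope, lambda x: mean_y + slope * (x - mean_x)


def topic_trends(months, shares, threshold=0.0):
    """Label each topic rising/falling/flat and extrapolate its share one month ahead."""
    xs = [month_index(m) for m in months]
    report = {}
    for k in range(len(shares[months[0]])):
        ys = [shares[m][k] for m in months]
        slope, predict = ols_fit(xs, ys)
        if slope > threshold:
            label = "rising"
        elif slope < -threshold:
            label = "falling"
        else:
            label = "flat"
        # A share must stay within [0, 1], so clamp the linear extrapolation.
        next_share = min(1.0, max(0.0, predict(xs[-1] + 1)))
        report[k] = (label, slope, next_share)
    return report


# Hypothetical toy input: four articles over three months, two topics.
months, shares = monthly_topic_share(
    ["2017-01", "2017-01", "2017-02", "2017-03"],
    [[0.9, 0.1], [0.7, 0.3], [0.6, 0.4], [0.5, 0.5]],
)
for k, (label, slope, nxt) in topic_trends(months, shares).items():
    print(f"topic {k}: {label}, slope={slope:+.3f}/month, next month ~ {nxt:.2f}")
```

A straight-line trend is the simplest choice for a 14-month window such as January 2017 to February 2018; with a longer series, a model that accounts for seasonality or autocorrelation would likely be more appropriate.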
