Term Distribution Index and Word2Vec Methods for Systematic Exploring and Understanding of the Rule on Occupational Safety and Health Standards

  • Jae Ho Jeong (Department of Safety Engineering, Pukyong National University)
  • Seong Rok Chang (Department of Safety Engineering, Pukyong National University)
  • Yongyoon Suh (Department of Industrial & Systems Engineering, Dongguk University)
  • Received : 2022.10.17
  • Accepted : 2023.06.11
  • Published : 2023.06.30

Abstract

The purpose of the Rules on Occupational Safety and Health Standards (hereafter, the safety and health rules) is to prescribe the safety and health measures stipulated in the Occupational Safety and Health Act and the specific instructions necessary for their implementation. However, the safety and health rules are extensive and intricately interconnected, which makes them difficult for users to navigate. To make the rules more accessible, this study analyzed the frequency, distribution, and significance of the terms they contain. First, a term distribution index was constructed from the frequency and distribution of words extracted through text mining. The index reflects whether a word appears only in a specific chapter or across all of the rules. This allows users to distinguish terms that apply to a specific working environment from terms that must be complied with in every working environment. Next, words related to the derived terms were visualized using the Word2Vec algorithm and t-SNE. This helps users prioritize what needs to be managed first by focusing on key terms, without checking all of the rules. Overall, by showing the distribution of words and visualizing related terms, this study can help users explore the safety and health rules.
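The abstract does not give the formula for the term distribution index, so the following is only a minimal sketch of the underlying idea, under stated assumptions: each term is scored by its total frequency and by the share of chapters in which it occurs. The function name `term_distribution`, the toy chapter names, and the tokens are all hypothetical and are not taken from the paper or from the rules.

```python
from collections import Counter


def term_distribution(chapters):
    """Return {term: (total_frequency, chapter_spread)}.

    chapters: dict mapping a chapter name to its list of tokens.
    chapter_spread is the fraction of chapters containing the term.
    This is an assumed, illustrative measure, not the paper's index.
    """
    total = Counter()         # how often each term occurs overall
    chapter_hits = Counter()  # in how many chapters each term occurs
    for tokens in chapters.values():
        total.update(tokens)
        chapter_hits.update(set(tokens))  # count each chapter once per term
    n_chapters = len(chapters)
    return {t: (total[t], chapter_hits[t] / n_chapters) for t in total}


# Toy, made-up tokens standing in for text-mined words per chapter.
chapters = {
    "general": ["worker", "protective", "equipment", "worker"],
    "machinery": ["worker", "crane", "crane", "guard"],
    "chemicals": ["worker", "ventilation", "protective"],
}

index = term_distribution(chapters)

# Terms found in every chapter versus terms confined to a single chapter.
common_terms = sorted(t for t, (_, spread) in index.items() if spread == 1.0)
chapter_specific = sorted(
    t for t, (_, spread) in index.items() if spread <= 1 / len(chapters)
)
```

In this toy data, a term with a spread of 1.0 would correspond to a word that applies across the overall working environment, while a term confined to one chapter would correspond to a word tied to a specific working environment.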

Acknowledgement

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) in 2020 (No. NRF-2020R1C1C1007302).
