Group-wise Keyword Extraction of the External Audit using Text Mining and Association Rules

  • Seong, Yoonseok (Dept. of Accounting, School of Business, Dongguk University-Seoul) ;
  • Lee, Donghee (Dept. of Management, School of Business, Dongguk University-Seoul) ;
  • Jung, Uk (Dept. of Management, School of Business, Dongguk University-Seoul)
  • Received : 2022.02.01
  • Accepted : 2022.02.24
  • Published : 2022.03.31

Abstract

Purpose: Improving a company's audit quality requires in-depth analysis of audit reports, which record the details of the external audit as text documents, so that their contents can be categorized. This study introduces a systematic methodology for extracting, from audit reports collected as text documents, the keywords that distinguish groups such as 'audit plan' and 'interim audit'. Methods: The proposed methodology has four steps. First, the documents are preprocessed through text mining. Second, the documents are classified into groups using machine learning techniques, and the vocabulary that has a dominant influence on classification performance is extracted. Third, association rules are mined from each group's documents. Finally, the keywords that characterize each group are obtained by comparing the vocabulary important for classification with the vocabulary that appears in each group's association rules. Results: Unlike qualitative research methods such as literature review, expert evaluation, and the Delphi technique, this study quantitatively calculates the importance of the vocabulary used in audit reports based on machine learning. The case study shows that the extracted keywords describe the characteristics of each group well. Conclusion: This study lays a foundation for quantitative follow-up studies of key vocabulary at each stage of auditing.
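The four steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the stopword list, the thresholds, and every function name (`preprocess`, `discriminative_terms`, `rule_terms`, `group_keywords`) are hypothetical. In particular, step 2 replaces the machine-learning classifier with a simple document-frequency gap as a stand-in for feature importance, and step 3 only considers pairwise rules.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical toy corpus: (group label, text). Not data from the study.
DOCS = [
    ("audit plan", "review audit strategy and plan audit scope"),
    ("audit plan", "plan audit schedule and assess risk scope"),
    ("audit plan", "assess risk and plan audit strategy"),
    ("interim audit", "test internal control and review control design"),
    ("interim audit", "test internal control over inventory"),
    ("interim audit", "review internal control and test transactions"),
]
STOPWORDS = {"and", "of", "over", "the"}  # assumed, illustrative only


def preprocess(text):
    """Step 1: lowercase, split on whitespace, drop stopwords, keep unique terms."""
    return frozenset(t for t in text.lower().split() if t not in STOPWORDS)


def discriminative_terms(docs, group, top_n=4):
    """Step 2 (stand-in for classifier feature importance): rank terms by how
    much more often they occur in `group` documents than in the other groups."""
    inside = [terms for g, terms in docs if g == group]
    outside = [terms for g, terms in docs if g != group]

    def doc_freq(term, pool):
        # Exact fraction of documents in `pool` that contain `term`.
        return Fraction(sum(term in terms for terms in pool), len(pool))

    vocab = set().union(*inside)
    # Smallest (outside - inside) first, i.e. largest gap first; ties by term.
    ranked = sorted(vocab, key=lambda t: (doc_freq(t, outside) - doc_freq(t, inside), t))
    return set(ranked[:top_n])


def rule_terms(docs, group, min_support=0.6, min_confidence=0.8):
    """Step 3: collect terms that appear in any pairwise rule (a -> b or b -> a)
    meeting the support and confidence thresholds within the group."""
    inside = [terms for g, terms in docs if g == group]
    n = len(inside)
    term_count = Counter(t for terms in inside for t in terms)
    pair_count = Counter(p for terms in inside for p in combinations(sorted(terms), 2))
    found = set()
    for (a, b), count in pair_count.items():
        if count / n < min_support:
            continue
        if count / term_count[a] >= min_confidence or count / term_count[b] >= min_confidence:
            found.update((a, b))
    return found


def group_keywords(raw_docs, group):
    """Step 4: keep the terms that are both discriminative and rule-forming."""
    docs = [(g, preprocess(text)) for g, text in raw_docs]
    return sorted(discriminative_terms(docs, group) & rule_terms(docs, group))


if __name__ == "__main__":
    for label in ("audit plan", "interim audit"):
        print(label, "->", group_keywords(DOCS, label))
```

Intersecting the two term sets reflects the comparison in the final step: a term survives only if it both separates the group from the others and co-occurs reliably with other terms inside the group. A real pipeline would presumably use a trained classifier and a full association rule miner in place of these simplified stand-ins.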

Acknowledgement

This research was supported by the Dongguk University Research Publication Incentive Fund for the 2022 academic year.
