http://dx.doi.org/10.15207/JKCS.2021.12.4.023

An Automated Industry and Occupation Coding System using Deep Learning  

Lim, Jungwoo (Department of Computer Science and Engineering, Korea University)
Moon, Hyeonseok (Department of Computer Science and Engineering, Korea University)
Lee, Chanhee (Department of Computer Science and Engineering, Korea University)
Woo, Chankyun (Survey System Management Division)
Lim, Heuiseok (Department of Computer Science and Engineering, Korea University)
Publication Information
Journal of the Korea Convergence Society / v.12, no.4, 2021, pp. 23-30
Abstract
An automated industry and occupation coding system assigns statistical classification codes to the large volume of natural-language responses in which people describe their industry and occupation. Unlike previous studies that applied information retrieval, we propose a system that does not need an index database and assigns the proper code regardless of the classification level. We also show that our model, built on KoBERT, a deep learning model that achieves high performance on downstream natural language tasks, outperforms the baseline. Our method achieves 95.65% and 91.51% on occupation and industry code classification for the Population and Housing Census, respectively, and 97.66% on industry code classification for the Census on Basic Characteristics of Establishments. We also identify directions for future improvement through an error analysis covering both data and modeling.
Keywords
Statistic Code Convergence; Classification; Automated Industry/Occupation Coding; Deep learning; Bi-LSTM; KoBERT