http://dx.doi.org/10.15207/JKCS.2021.12.1.057

Development of Korean dataset for joint intent classification and slot filling  

Han, Seunggyu (Department of Computer Science and Engineering, Korea University)
Lim, Heuiseok (Department of Computer Science and Engineering, Korea University)
Publication Information
Journal of the Korea Convergence Society / v.12, no.1, 2021, pp. 57-63
Abstract
Spoken language understanding, which aims to understand utterances as naturally as a human would, has mostly been studied for English. In this paper, we construct a Korean dataset for spoken language understanding, based on a conversational corpus between a reservation system and its users. The conversation domain is limited to restaurant reservation. The dataset contains 6,857 sentences annotated with 7 types of slot tags and 5 types of intent tags. When a model proposed in English-based research is trained on our dataset, intent classification accuracy decreases only slightly, while the slot filling F1 score decreases significantly.
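To make the task concrete: the English-based model the abstract refers to is a BERT-style joint architecture in which a single shared encoder feeds two heads, an utterance-level intent classifier and a token-level slot tagger, trained together. The sketch below is a minimal, illustrative PyTorch rendering of that idea, not the authors' implementation; the dummy encoder, the vocabulary size, and the BIO expansion of the 7 slot types into 15 token-level tags are all our assumptions.

import torch
import torch.nn as nn

class JointIntentSlotModel(nn.Module):
    # One shared encoder, two heads trained jointly: a sentence-level
    # intent classifier and a per-token slot tagger.
    def __init__(self, encoder, hidden_size, num_intents, num_slot_tags):
        super().__init__()
        self.encoder = encoder
        self.intent_head = nn.Linear(hidden_size, num_intents)
        self.slot_head = nn.Linear(hidden_size, num_slot_tags)

    def forward(self, input_ids):
        hidden = self.encoder(input_ids)                 # (batch, seq_len, hidden_size)
        intent_logits = self.intent_head(hidden[:, 0])   # first ([CLS]) position -> intent
        slot_logits = self.slot_head(hidden)             # every position -> slot tag
        return intent_logits, slot_logits

# A dummy embedding stands in for a pretrained Korean encoder (e.g., KoBERT)
# so the sketch runs without a checkpoint; 8002 is an assumed vocabulary size.
encoder = nn.Sequential(nn.Embedding(8002, 768))
model = JointIntentSlotModel(encoder, hidden_size=768,
                             num_intents=5,     # 5 intent types, per the abstract
                             num_slot_tags=15)  # 7 slot types in BIO -> 2*7+1 tags (assumed)

utterance = torch.randint(0, 8002, (1, 12))     # one fake 12-token utterance
intent_logits, slot_logits = model(utterance)
print(intent_logits.shape, slot_logits.shape)   # [1, 5] and [1, 12, 15]

Training would simply sum a cross-entropy loss over the intent logits with a per-token cross-entropy loss over the slot logits, which is what lets the two tasks share one encoder.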
Keywords
Natural Language Processing; Spoken Language Understanding; Intent Classification; Slot Filling; Dataset; BERT