Development of Korean dataset for joint intent classification and slot filling

  • Han, Seunggyu (Department of Computer Science and Engineering, Korea University)
  • Lim, Heuiseok (Department of Computer Science and Engineering, Korea University)
  • Received : 2020.11.26
  • Accepted : 2021.01.20
  • Published : 2021.01.28

Abstract

Spoken language understanding, which aims to understand utterances as naturally as a human would, has mostly been studied in English. In this paper, we construct a Korean dataset for spoken language understanding, based on a conversational corpus between a reservation system and its users. The conversation domain is limited to restaurant reservation. The dataset contains 6,857 sentences annotated with 7 types of slot tags and 5 types of intent tags. When a model proposed in English-based research is trained on our dataset, intent classification accuracy decreases slightly, while the slot filling F1 score decreases significantly.

Systems for understanding human utterances have mostly been studied in English. In this paper, we develop a Korean dataset that can be used to train and evaluate spoken language understanding systems, built on a corpus of conversations between a system and its users, and present related statistics. The dataset requires predicting the user's utterance intent and filling slots within the fixed domain of restaurant reservation. It consists of 6,857 Korean sentences, annotated with 7 types of slot tags in total. There are 5 types of intent labels, and each sentence is annotated with up to 2 intents depending on its content. When a model developed in English-language research was applied to this dataset, intent classification accuracy dropped slightly, while the slot filling F1 score showed a large gap.
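To make the annotation scheme concrete, the sketch below shows one plausible way an example in such a dataset could be represented: each token carries a BIO-style slot tag, and each utterance carries one or two intent labels. All tag and intent names here (`DATE`, `TIME`, `PARTY_SIZE`, `MAKE_RESERVATION`) are hypothetical illustrations, not the dataset's actual label set.

```python
from dataclasses import dataclass


@dataclass
class Example:
    tokens: list      # tokenized utterance
    slot_tags: list   # one BIO tag per token
    intents: list     # one or two intent labels per utterance

    def __post_init__(self):
        # Slot tags must align one-to-one with tokens,
        # and (per the paper) a sentence has at most 2 intents.
        assert len(self.tokens) == len(self.slot_tags)
        assert 1 <= len(self.intents) <= 2


# Hypothetical restaurant-reservation utterance:
# "내일 저녁 7시에 4명 예약해 주세요" (reserve for 4 people tomorrow at 7 pm)
ex = Example(
    tokens=["내일", "저녁", "7시에", "4명", "예약해", "주세요"],
    slot_tags=["B-DATE", "B-TIME", "I-TIME", "B-PARTY_SIZE", "O", "O"],
    intents=["MAKE_RESERVATION"],
)


def extract_slots(tokens, tags):
    """Group BIO-tagged tokens into (slot_type, text) spans."""
    spans, cur_type, cur_toks = [], None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if cur_type:
                spans.append((cur_type, " ".join(cur_toks)))
            cur_type, cur_toks = tag[2:], [tok]
        elif tag.startswith("I-") and cur_type == tag[2:]:
            cur_toks.append(tok)
        else:  # "O" tag or an I- tag that does not continue the current span
            if cur_type:
                spans.append((cur_type, " ".join(cur_toks)))
            cur_type, cur_toks = None, []
    if cur_type:
        spans.append((cur_type, " ".join(cur_toks)))
    return spans


print(extract_slots(ex.tokens, ex.slot_tags))
# [('DATE', '내일'), ('TIME', '저녁 7시에'), ('PARTY_SIZE', '4명')]
```

Slot filling F1 is typically computed over such extracted spans (a span counts as correct only if both its type and boundaries match), which is why tokenization differences between Korean and English can affect the F1 score much more than intent accuracy.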
