http://dx.doi.org/10.9708/jksci.2019.24.12.017

An Automatic Data Construction Approach for Korean Speech Command Recognition  

Lim, Yeonsoo (Dept. of Computer Engineering, Kumoh National Institute of Technology)
Seo, Deokjin (Dept. of Computer Engineering, Kumoh National Institute of Technology)
Park, Jeong-sik (Dept. of English Linguistics & Language Technology, Hankuk University of Foreign Studies)
Jung, Yuchul (Dept. of Computer Engineering, Kumoh National Institute of Technology)
Abstract
One of the biggest problems in the AI field, which has become a hot topic in recent years, is how to deal with the lack of training data. Because manual data construction takes considerable time and effort, it is non-trivial for an individual to build the necessary data; automatic data construction, on the other hand, must address data quality issues. In this paper, we introduce a method that automatically extracts the data required to develop a Korean speech command recognizer from the web and automatically selects the portion that can be used as training data. In particular, we propose a modified ResNet model that shows modest performance on the automatically constructed Korean speech command data. We conducted experiments to show the applicability of the approach to command sets from the health and daily-life domains. In a series of experiments using only automatically constructed data, the accuracy was 89.5% with ResNet15 in the health domain and 82% with ResNet8 in the daily-life domain.
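As context for the ResNet8/ResNet15 models mentioned above, the following is a minimal PyTorch sketch of what a small-footprint ResNet8-style speech command classifier might look like. The class and parameter names (SpeechResNet, num_commands, channel width, pooling sizes, block count) are illustrative assumptions for this sketch, not the modified architecture proposed in the paper; inputs are assumed to be MFCC features of roughly one-second command utterances.

# Sketch of a ResNet8-style keyword-spotting network (assumed configuration,
# not the paper's exact modified ResNet).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResBlock(nn.Module):
    """Two 3x3 convolutions with batch norm and an identity skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # identity shortcut


class SpeechResNet(nn.Module):
    """Small residual CNN over MFCC 'images' of shape (1, n_mfcc, n_frames)."""
    def __init__(self, num_commands: int, channels: int = 45, num_blocks: int = 3):
        super().__init__()
        self.stem = nn.Conv2d(1, channels, 3, padding=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=(4, 3))  # downsample freq/time
        self.blocks = nn.Sequential(*[ResBlock(channels) for _ in range(num_blocks)])
        self.classifier = nn.Linear(channels, num_commands)

    def forward(self, x):
        x = self.pool(F.relu(self.stem(x)))
        x = self.blocks(x)
        x = x.mean(dim=(2, 3))  # global average pooling
        return self.classifier(x)


if __name__ == "__main__":
    # e.g. 10 hypothetical health-domain commands, 40 MFCCs x 101 frames per clip
    model = SpeechResNet(num_commands=10)
    logits = model(torch.randn(8, 1, 40, 101))
    print(logits.shape)  # torch.Size([8, 10])

In this style of model, deeper variants (e.g., a ResNet15-like network) typically add more residual blocks while keeping the same narrow channel width, trading extra computation for accuracy on the harder command set.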
Keywords
Korean Speech Command; Speech Recognition; Automatic Data Construction; ResNet; CNN;