Funding Information
This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT), Grant/Award Number: 2019-0-00004.
References
- J. H. Woo and H. Choi, Systematic review for AI-based language learning tools, arXiv Preprint, (2021), DOI 10.48550/arXiv.2111.04455.
- Y. Gong, Z. Chen, I. H. Chu, P. Chang, and J. Glass, Transformer-based multi-aspect multi-granularity non-native English speaker pronunciation assessment, (IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, Singapore), 2022, pp. 7262-7266.
- O.-W. Kwon, K.-Y. Lee, Y.-H. Roh, H.-X. Huang, S.-K. Choi, Y.-K. Kim, H. B. Jeon, Y. R. Oh, Y.-K. Lee, B. O. Kang, E. Chung, J. G. Park, and Y. Lee, GenieTutor: A computer-assisted second language learning system based on spoken language understanding, In Natural language dialog systems and intelligent assistants, Springer, Cham, Switzerland, 2015, pp. 257-262.
- Y. K. Lee and J. G. Park, Multimodal unsupervised speech translation for recognizing and evaluating second language speech, Appl. Sci. 11 (2021), 2642.
- S. Bibauw, T. François, and P. Desmet, Discussing with a computer to practice a foreign language: research synthesis and conceptual framework of dialogue-based CALL, Comput. Assist. Lang. Learn. 32 (2019), 827-877. https://doi.org/10.1080/09588221.2018.1535508
- L. Chen, K. Zechner, S. Y. Yoon, K. Evanini, X. Wang, A. Loukina, J. Tao, L. Davis, C. M. Lee, M. Ma, and R. Mundkowsky, Automated scoring of nonnative speech using the SpeechRater v. 5.0 engine, ETS Res. Rep. Ser. (2018), 1-31.
- A. Kholis, ELSA Speak app: automatic speech recognition (ASR) for supplementing English pronunciation skills, Engl. Lang. Teach. 9 (2021), 1-14.
- M. F. Sholekhah and R. Fakhrurriana, The use of ELSA Speak as a mobile-assisted language learning (MALL) towards EFL students' pronunciation, J. Educ. Lang. Innov. Appl. Linguist. 2 (2023), 93-100.
- Y. R. Oh, K. Y. Park, H. B. Jeon, and J. G. Park, Automatic proficiency assessment of Korean speech read aloud by non-natives using bidirectional LSTM based speech recognition, ETRI J. 42 (2020), 761-772. https://doi.org/10.4218/etrij.2019-0400
- Y. Hayashi, Y. Kondo, and Y. Ishii, Automated speech scoring of dialogue response by Japanese learners of English as a foreign language, Innov. Lang. Learn. Teach. (2023), 1-15.
- A. Graves, N. Jaitly, and A. R. Mohamed, Hybrid speech recognition with deep bidirectional LSTM, (IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic), 2013, pp. 273-278.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, Adv. Neural Inf. Process. Syst. 30 (2017), 5998-6008.
- S. Karita, N. E. Y. Soplin, S. Watanabe, M. Delcroix, A. Ogawa, and T. Nakatani, Improving transformer-based end-to-end speech recognition with connectionist temporal classification and language model integration, (Annual Conference of the International Speech Communication Association, Interspeech, Graz, Austria), 2019, pp. 1408-1412.
- H. Miao, G. Cheng, C. Gao, P. Zhang, and Y. Yan, Transformer-based online CTC/attention end-to-end speech recognition architecture, (IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, Barcelona, Spain), 2020, pp. 6084-6088.
- X. Chang, W. Zhang, Y. Qian, J. Le Roux, and S. Watanabe, End-to-end multi-speaker speech recognition with transformer, (IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, Barcelona, Spain), 2020, pp. 6134-6138.
- T. Hori, N. Moritz, C. Hori, and J. Le Roux, Transformer-based long context end-to-end speech recognition, (Annual Conference of the International Speech Communication Association, Interspeech, Shanghai, China), 2020, pp. 5011-5015.
- S. Watanabe, T. Hori, S. Karita, and others, ESPnet: End-to-end speech processing toolkit, (Annual Conference of the International Speech Communication Association, Interspeech, Hyderabad, India), 2018, pp. 2207-2211.
- Y. R. Oh, K. Y. Park, and J. G. Park, Fast offline transformer-based end-to-end automatic speech recognition for real-world applications, ETRI J. 44 (2022), 476-490. https://doi.org/10.4218/etrij.2021-0106
- J. U. Bang, J. G. Maeng, J. Park, S. Yun, and S. H. Kim, English-Korean speech translation corpus (EnKoST-C): construction procedure and evaluation results, ETRI J. 45 (2023), 18-27.
- T. Ochiai, S. Watanabe, T. Hori, and J. R. Hershey, Multichannel end-to-end speech recognition, (34th International Conference on Machine Learning, Sydney, Australia), 2017, pp. 2632-2641.
- T. Hori, R. Astudillo, T. Hayashi, Y. Zhang, S. Watanabe, and J. Le Roux, Cycle-consistency training for end-to-end speech recognition, (IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, Brighton, UK), 2019, pp. 6271-6275.
- L. Lamel, J.-L. Gauvain, and G. Adda, Lightly supervised and unsupervised acoustic model training, Comput. Speech Lang. 16 (2002), 115-129. https://doi.org/10.1006/csla.2001.0186
- J. Ma and R. Schwartz, Unsupervised versus supervised training of acoustic models, (Annual Conference of the International Speech Communication Association, Interspeech, Brisbane, Australia), 2008, pp. 2374-2377.
- B. Li, T. N. Sainath, R. Pang, and Z. Wu, Semi-supervised training for end-to-end models via weak distillation, (IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, Brighton, UK), 2019, pp. 2837-2841.
- B. O. Kang, H. B. Jeon, and J. G. Park, Speech recognition for task domains with sparse matched training data, Appl. Sci. 10 (2020), 6155.
- Y. Chen, W. Wang, and C. Wang, Semi-supervised ASR by end-to-end self-training, arXiv Preprint, (2020), DOI 10.48550/arXiv.2001.09128.
- A. H. Liu, W. N. Hsu, M. Auli, and A. Baevski, Towards end-to-end unsupervised speech recognition, (2022 IEEE Spoken Language Technology Workshop, SLT, Doha, Qatar), 2022, pp. 221-228.
- H. Chung, H. B. Jeon, and J. G. Park, Semi-supervised training for sequence-to-sequence speech recognition using reinforcement learning, (2020 International Joint Conference on Neural Networks, IJCNN, Glasgow, UK), 2020, pp. 1-6.
- Y. Zhang, J. Qin, D. S. Park, W. Han, C. C. Chiu, R. Pang, Q. V. Le, and Y. Wu, Pushing the limits of semi-supervised learning for automatic speech recognition, arXiv Preprint, (2020), DOI 10.48550/arXiv.2010.10504.
- C. Wang, J. Pino, and J. Gu, Improving cross-lingual transfer learning for end-to-end speech recognition with speech translation, arXiv Preprint, (2020), DOI 10.48550/arXiv.2006.05474.
- B. O. Kang, H. B. Jeon, and J. G. Park, A study on transfer learning method for speech recognition in domains with sparse speech data, (Winter Annual Conference of KICS, Kangwon, Republic of Korea), 2021.
- D. Yu, K. Yao, H. Su, G. Li, and F. Seide, KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition, (IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada), 2013, pp. 7893-7897.
- D. Povey, A. Ghoshal, G. Boulianne, and others, The Kaldi speech recognition toolkit, (IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU), 2011.
- A. Stolcke, SRILM - an extensible language modeling toolkit, (International Conference on Spoken Language Processing, ICSLP, Denver, CO, USA), 2002, pp. 901-904.
- H. B. Jeon and S. Y. Lee, Language model adaptation based on topic probability of latent Dirichlet allocation, ETRI J. 38 (2016), 487-493. https://doi.org/10.4218/etrij.16.0115.0499
- A. Kannan, Y. Wu, P. Nguyen, T. N. Sainath, Z. Chen, and R. Prabhavalkar, An analysis of incorporating an external language model into a sequence-to-sequence model, (IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, Calgary, Canada), 2018.
- A. Gulati, J. Qin, C. C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang, Conformer: convolution-augmented transformer for speech recognition, arXiv Preprint, (2020), DOI 10.48550/arXiv.2005.08100.
- Y. Peng, S. Dalmia, I. Lane, and S. Watanabe, Branchformer: parallel MLP-attention architectures to capture local and global context for speech recognition and understanding, (International Conference on Machine Learning, Baltimore, MD, USA), 2022, pp. 17627-17643.
- K. Kim, F. Wu, Y. Peng, J. Pan, P. Sridhar, K. J. Han, and S. Watanabe, E-Branchformer: Branchformer with enhanced merging for speech recognition, (2022 IEEE Spoken Language Technology Workshop, SLT, Doha, Qatar), 2023, pp. 84-91.
- A. Baevski, Y. Zhou, A. Mohamed, and M. Auli, wav2vec 2.0: a framework for self-supervised learning of speech representations, Adv. Neural Inf. Process. Syst. 33 (2020), 12449-12460.
- W. N. Hsu, B. Bolte, Y. H. Tsai, K. Lakhotia, R. Salakhutdinov, and A. Mohamed, HuBERT: self-supervised speech representation learning by masked prediction of hidden units, IEEE/ACM Trans. Audio Speech Lang. Process. 29 (2021), 3451-3460. https://doi.org/10.1109/TASLP.2021.3122291