Funding
This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (Ministry of Science and ICT) in 2019 (No. 2019-0-00004, Development of semi-supervised learning-based language intelligence technology and a Korean tutoring service for foreigners).
References
- F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," Proc. INTERSPEECH, 437-440 (2011).
- W. Chan, N. Jaitly, Q. Le, and O. Vinyals, "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition," Proc. ICASSP, 4960-4964 (2016).
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Proc. NIPS, 5998-6008 (2017).
- T. Hori, R. Astudillo, T. Hayashi, Y. Zhang, S. Watanabe, and J. Le Roux, "Cycle-consistency training for end-to-end speech recognition," Proc. ICASSP, 6271-6275 (2019).
- M.-K. Baskar, S. Watanabe, R. Astudillo, T. Hori, L. Burget, and J. Cernocky, "Semi-supervised sequence-to-sequence ASR using unpaired speech and text," Proc. ICASSP, 3790-3794 (2019).
- Q. Xie, Z. Dai, E. Hovy, M.-T. Luong, and Q. V. Le, "Unsupervised data augmentation for consistency training," arXiv:1904.12848 (2019).
- J. Li, M. L. Seltzer, X. Wang, R. Zhao, and Y. Gong, "Large-scale domain adaptation via teacher-student learning," Proc. INTERSPEECH, 2386-2390 (2017).
- Q. Xie, M.-T. Luong, E. Hovy, and Q. V. Le, "Self-training with noisy student improves ImageNet classification," Proc. CVPR, 10687-10698 (2020).
- N. Jaitly and G. E. Hinton, "Vocal tract length perturbation (VTLP) improves speech recognition," Proc. ICML, 625-660 (2013).
- D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, "SpecAugment: A simple data augmentation method for automatic speech recognition," Proc. INTERSPEECH, 2613-2617 (2019).
- X. Song, Z. Wu, Y. Huang, D. Su, and H. Meng, "SpecSwap: A simple data augmentation method for end-to-end speech recognition," Proc. INTERSPEECH, 581-585 (2020).
- D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," Proc. ICLR, 1-14 (2014).
- D. B. Paul and J. M. Baker, "The design for the Wall Street Journal-based CSR corpus," Proc. ACL, 357-362 (1992).
- V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, "LibriSpeech: An ASR corpus based on public domain audio books," Proc. ICASSP, 5206-5210 (2015).
- S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba, Y. Unno, N. E. Y. Soplin, J. Heymann, M. Wiesner, N. Chen, A. Renduchintala, and T. Ochiai, "ESPnet: End-to-end speech processing toolkit," Proc. INTERSPEECH, 2207-2211 (2018).
- L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008).