Acknowledgement
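This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (Ministry of Science and ICT) in 2022 (No. 2022-0-00621, Development of artificial intelligence technology that provides dialog-based multi-modal explainability).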
References
- A. J. Hunt and A. W. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," Proc. IEEE ICASSP, 373-376 (1996).
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," Proc. Eurospeech, 2347-2350 (1999).
- J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen, Y. Zhang, Y. Wang, R. Skerry-Ryan, R. A. Saurous, Y. Agiomyrgiannakis, and Y. Wu, "Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions," Proc. IEEE ICASSP, 4779-4783 (2018).
- Y. Ren, C. Hu, X. Tan, T. Qin, S. Zhao, Z. Zhao, and T.-Y. Liu, "FastSpeech 2: Fast and high-quality end-to-end text to speech," arXiv:2006.04558 (2021).
- A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio," arXiv:1609.03499 (2016).
- R. Yamamoto, E. Song, and J. Kim, "Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram," Proc. IEEE ICASSP, 6199-6203 (2020).
- Y. Ren, Y. Ruan, X. Tan, T. Qin, S. Zhao, Z. Zhao, and T.-Y. Liu, "FastSpeech: Fast, robust and controllable text to speech," Proc. NIPS, 3165-3174 (2019).
- A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang, "Conformer: Convolution-augmented transformer for speech recognition," Proc. Interspeech, 5036-5040 (2020).
- M. Koo, "A Korean speech recognition based on conformer" (in Korean), J. Acoust. Soc. Kr. 40, 488-495 (2021).
- P. Guo, F. Boyer, X. Chang, T. Hayashi, Y. Higuchi, H. Inaguma, N. Kamo, C. Li, D. Garcia-Romero, J. Shi, J. Shi, S. Watanabe, K. Wei, W. Zhang, and Y. Zhang, "Recent developments on ESPnet toolkit boosted by conformer," Proc. IEEE ICASSP, 5874-5878 (2021).
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Proc. NeurIPS, 1-11 (2017).
- P. Ramachandran, B. Zoph, and Q. V. Le, "Swish: A self-gated activation function," arXiv:1710.05941v1 (2017).