Funding
This work was supported by the Basic Research Program of the National Research Foundation of Korea (NRF), funded by the Korean government (Ministry of Science and ICT) in 2017 (No. NRF-2017R1E1A1A01078157).
References
- B. Ko, K. Lee, I.-C. Yoo, and D. Yook, "Korean voice conversion experiments using CC-GAN and VAW-GAN" (in Korean), Proc. Speech Communication and Signal Processing, 36, 39 (2019).
- B. Jang, H. Seo, I.-C. Yoo, and D. Yook, "CycleVAE based many-to-many voice conversion experiments using Korean speech corpus" (in Korean), J. Acoust. Soc. Suppl. 2(s) 40, 79 (2021).
- I.-C. Yoo, K. Lee, S.-G. Leem, H. Oh, B. Ko, and D. Yook, "Speaker anonymization for personal information protection using voice conversion techniques," IEEE Access, 8, 198637-198645 (2020). https://doi.org/10.1109/access.2020.3035416
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," Proc. NIPS, 2672-2680 (2014).
- D. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv:1312.6114 (2013).
- J. Zhu, T. Park, P. Isola, and A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," Proc. IEEE Int. Conf. Computer Vision, 2242-2251 (2017).
- T. Kaneko and H. Kameoka, "CycleGAN-VC: Nonparallel voice conversion using cycle-consistent adversarial networks," Proc. EUSIPCO, 2114-2118 (2018).
- T. Kaneko, H. Kameoka, K. Tanaka, and N. Hojo, "CycleGAN-VC2: Improved CycleGAN-based nonparallel voice conversion," Proc. IEEE ICASSP, 6820-6824 (2019).
- T. Kaneko, H. Kameoka, K. Tanaka, and N. Hojo, "CycleGAN-VC3: Examining and improving CycleGAN-VCs for Mel-spectrogram conversion," Proc. Interspeech, 2017-2021 (2020).
- D. Yook, I.-C. Yoo, and S. Yoo, "Voice conversion using conditional CycleGAN," Proc. Int. Conf. CSCI, 1460-1461 (2018).
- S. Lee, B. Ko, K. Lee, I.-C. Yoo, and D. Yook, "Many-to-many voice conversion using conditional cycle-consistent adversarial networks," Proc. IEEE ICASSP, 6279-6283 (2020).
- H. Kameoka, T. Kaneko, K. Tanaka, and N. Hojo, "StarGAN-VC: Non-parallel many-to-many voice conversion using star generative adversarial networks," Proc. IEEE Workshop on SLT, 266-273 (2018).
- T. Kaneko, H. Kameoka, K. Tanaka, and N. Hojo, "StarGAN-VC2: Rethinking conditional methods for StarGAN-based voice conversion," Proc. Interspeech, 679-683 (2019).
- C. Hsu, H. Hwang, Y. Wu, Y. Tsao, and H. Wang, "Voice conversion from non-parallel corpora using variational autoencoder," Proc. APSIPA, 1-6 (2016).
- A. van den Oord, O. Vinyals, and K. Kavukcuoglu, "Neural discrete representation learning," Proc. NIPS, 6309-6318 (2017).
- C. Hsu, H. Hwang, Y. Wu, Y. Tsao, and H. Wang, "Voice conversion from unaligned corpora using variational autoencoding Wasserstein generative adversarial networks," Proc. Interspeech, 3364-3368 (2017).
- H. Kameoka, T. Kaneko, K. Tanaka, and N. Hojo, "ACVAE-VC: Non-parallel voice conversion with auxiliary classifier variational autoencoder," IEEE/ACM Trans. on Audio, Speech, and Lang. Process. 27, 1432-1443 (2019).
- P. Tobing, Y. Wu, T. Hayashi, K. Kobayashi, and T. Toda, "Non-parallel voice conversion with cyclic variational autoencoder," Proc. Interspeech, 674-678 (2019).
- D. Yook, S.-G. Leem, K. Lee, and I.-C. Yoo, "Many-to-many voice conversion using cycle-consistent variational autoencoder with multiple decoders," Proc. Odyssey: The Speaker and Language Recognition Workshop, 215-221 (2020).
- B. Ko, Many-to-many voice conversion using cycle-consistency for Korean speech (in Korean), (Master's thesis, Korea University, 2020).
- M. Morise, F. Yokomori, and K. Ozawa, "WORLD: A vocoder-based high-quality speech synthesis system for real-time applications," IEICE Trans. on Information and Systems, 99, 1877-1884 (2016).
- D. Kingma and J. Ba, "Adam: A method for stochastic optimization," Proc. ICLR, 1-13 (2015).
- T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. on Audio, Speech, and Lang. Process. 15, 2222-2235 (2007). https://doi.org/10.1109/TASL.2007.907344
- S. Takamichi, T. Toda, A. Black, G. Neubig, S. Sakti, and S. Nakamura, "Postfilters to modify the modulation spectrum for statistical parametric speech synthesis," IEEE/ACM Trans. on Audio, Speech, and Lang. Process. 24, 755-767 (2016). https://doi.org/10.1109/TASLP.2016.2522655