Voice Frequency Synthesis using VAW-GAN based Amplitude Scaling for Emotion Transformation
Kwon, Hye-Jeong (Department of Computer Science, Kyonggi University)
Kim, Min-Jeong (Department of Computer Science, Kyonggi University)
Baek, Ji-Won (Department of Computer Science, Kyonggi University)
Chung, Kyungyong (Division of AI Computer Science and Engineering, Kyonggi University)