Acknowledgement
This work was supported by the National Natural Science Foundation of China (No.61901227) and the Natural Science Foundation for Colleges and Universities in Jiangsu Province, China (No.19KJB510049).
References
- F. Eyben et al., The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing, IEEE Trans. Affect. Comput. 7 (2015), 190-202. https://doi.org/10.1109/TAFFC.2015.2457417
- A. Origlia, V. Galata, and B. Ludusan, Automatic classification of emotions via global and local prosodic features on a multilingual emotional database, in Proc. Int. Conf. Speech Prosody (Chicago, IL, USA), May 2010, pp. 1-4.
- W. H. Li and L. Jiang, Analysis of common feature recognition performance of Chinese speech emotion, Intell. Comput. Appl. 7 (2017), 56-58.
- W. H. Cao, J. P. Xu, and Z. T. Liu, Speaker-independent speech emotion recognition based on random forest feature selection algorithm, in Proc. Chin. Control. Conf. (CCC), (Dalian, China), July 2017, pp. 10995-10998.
- M. Lugger and B. Yang, The relevance of voice quality features in speaker independent emotion recognition, in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (Honolulu, HI, USA), Apr. 2007. https://doi.org/10.1109/ICASSP.2007.367152
- P. Shi, Speech emotion recognition based on deep belief network, in Proc. Int. Conf. Netw., Sens. Control (Zhuhai, China), Mar. 2018, pp. 1-5.
- K. H. Lee, H. K. Choi, and B. T. Jang, A study on speech emotion recognition using a deep neural network, in Proc. Int. Conf. Inf. Commun. Technol. Converg. (Jeju, Rep. of Korea), Oct. 2019, pp. 1162-1165.
- Z. Yao et al., Speech emotion recognition using fusion of three multi-task learning-based classifiers: HSF-DNN, MS-CNN and LLD-RNN, Speech Commun. 120 (2020), 11-19. https://doi.org/10.1016/j.specom.2020.03.005
- G. Liu, W. He, and B. Jin, Feature fusion of speech emotion recognition based on deep learning, in Proc. Int. Conf. Netw. Infrastruct. Digit. Content (Guiyang, China), Aug. 2018, pp. 193-197.
- L. Chao et al., Improving generation performance of speech emotion recognition by denoising autoencoders, in Proc. Int. Symp. Chin. Spoken Lang. Process. (Singapore), Sept. 2014, pp. 341-344.
- L. Li et al., Deep factorization for speech signal, in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (Calgary, Canada), Apr. 2018, pp. 5094-5098.
- Q. Mao et al., Learning salient features for speech emotion recognition using convolutional neural networks, IEEE Trans. Multimed. 16 (2014), 2203-2213. https://doi.org/10.1109/TMM.2014.2360798
- K. Han, D. Yu, and I. Tashev, Speech emotion recognition using deep neural network and extreme learning machine, in Proc. Annu. Conf. Int. Speech Commun. Assoc., Sept. 2014, pp. 223-227.
- P. Guo, X. Wang, and Y. Han, The enhanced genetic algorithms for the optimization design, in Proc. Int. Conf. Biomed. Eng. Inf. (Yantai, China), Oct. 2010, pp. 2990-2994.
- J. Wang, Z. Han, and S. Lun, Speech emotion recognition system based on genetic algorithm and neural network, in Proc. Int. Conf. Image Anal. Signal Process. (Wuhan, China), Oct. 2011, pp. 578-582.
- Y. Wang and H. Huo, Speech recognition based on genetic algorithm optimized support vector machine, in Proc. Int. Conf. Syst. Informatics (Shanghai, China), Nov. 2019, pp. 439-444.
- L. Qin, Q. Li, and X. Guan, Pitch extraction for musical signals with modified AMDF, in Proc. Int. Conf. Multimed. Technol. (Hangzhou, China), July 2011, pp. 3599-3602.
- M. Jalil, F. A. Butt, and A. Malik, Short-time energy, magnitude, zero crossing rate and autocorrelation measurement for discriminating voiced and unvoiced segments of speech signals, in Proc. Int. Conf. Technol. Adv. Electr., Electron. Comput. Eng. (Konya, Turkey), May 2013, pp. 208-212.
- F. Richardson, D. Reynolds, and N. Dehak, Deep neural network approaches to speaker and language recognition, IEEE Signal Process. Lett. 22 (2015), 1671-1675. https://doi.org/10.1109/LSP.2015.2420092
- Y. Tian et al., Investigation of bottleneck features and multilingual deep neural networks for speaker verification, in Proc. Annu. Conf. Int. Speech Commun. Assoc. (Dresden, Germany), Sept. 2015, pp. 1151-1155.
- X. Zhou, J. Guo, and R. Bie, Deep learning based affective model for speech emotion recognition, in Proc. Int. IEEE Conf. Ubiquitous Intell. Comput. Adv. Trusted Comput. Scalable Comput. Commun. Cloud Big Data Comput. & Internet People Smart World Congr. (Toulouse, France), July 2016, pp. 841-846.
- P. Matejka et al., Neural network bottleneck features for language identification, in Proc. Odyssey 2014: Speak. Lang. Recognit. Workshop (Joensuu, Finland), June 2014, pp. 299-304.
- M. McLaren, L. Ferrer, and A. Lawson, Exploring the role of phonetic bottleneck features for speaker and language recognition, in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (Shanghai, China), Mar. 2016, pp. 5575-5579.
- Y. Lei et al., Application of convolutional neural networks to language identification in noisy conditions, in Proc. Speak. Lang. Recognit. Workshop (Joensuu, Finland), June 2014, pp. 287-292.
- H. S. Das and P. Roy, Bottleneck feature-based hybrid deep autoencoder approach for Indian language identification, Arab. J. Sci. Eng. 45 (2020), 3425-3436. https://doi.org/10.1007/s13369-020-04430-9
- A. Fischer and C. Igel, Bounding the bias of contrastive divergence learning, Neural Comput. 23 (2011), 664-673. https://doi.org/10.1162/NECO_a_00085
- L. Chen et al., Speech emotion recognition: Features and classification models, Digit. Signal Process. 22 (2012), 1154-1160. https://doi.org/10.1016/j.dsp.2012.05.007
- L. Sun, S. Fu, and F. Wang, Decision tree SVM model with Fisher feature selection for speech emotion recognition, EURASIP J. Audio Speech Music Process. 2019 (2019), 1-14. https://doi.org/10.1186/s13636-018-0144-6
- A. D. Dileep and C. C. Sekhar, GMM-based intermediate matching kernel for classification of varying length patterns of long duration speech using support vector machines, IEEE Trans. Neural Netw. Learn. Syst. 25 (2014), 1421-1432. https://doi.org/10.1109/TNNLS.2013.2293512
- S. Gupta and A. Mehra, Speech emotion recognition using SVM with thresholding fusion, in Proc. Int. Conf. Signal Process. Integr. Netw. (Noida, India), Feb. 2015, pp. 570-574.
- P. Shen, Z. Changjun, and X. Chen, Automatic speech emotion recognition using support vector machine, in Proc. Int. Conf. Electron. Mech. Eng. Inf. Technol. (Harbin, China), Aug. 2011, pp. 621-625.
- C. Torres-Valencia, M. Alvarez-Lopez, and A. Orozco-Gutierrez, SVM-based feature selection methods for emotion recognition from multimodal data, J. Multimodal User Interfaces 11 (2017), 9-23. https://doi.org/10.1007/s12193-016-0222-y
- L. M. Saini, S. K. Aggarwal, and A. Kumar, Parameter optimisation using genetic algorithm for support vector machine-based price-forecasting model in National electricity market, IET Gener. Transm. Distrib. 4 (2010), 36-49. https://doi.org/10.1049/iet-gtd.2008.0584
- L. Chen et al., Two-layer fuzzy multiple random forest for speech emotion recognition in human-robot interaction, Inf. Sci. 509 (2020), 150-163. https://doi.org/10.1016/j.ins.2019.09.005