Acknowledgement
This research was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) in 2021 (No. 2021R1A2C1014044).