References
- H. Liao, G. Pundak, O. Siohan, M. Carroll, N. Coccaro, Q.-M. Jiang, T. N. Sainath, A. Senior, F. Beaufays, and M. Bacchiani, "Large vocabulary automatic speech recognition for children," Proc. Interspeech, 1611-1615 (2015).
- P. G. Shivakumar and P. Georgiou, "Transfer learning from adult to children for speech recognition: Evaluation, analysis and recommendations," Computer Speech and Language, arXiv:1805.03322 (2020).
- L. Rumberg, H. Ehlert, U. Ludtke, and J. Ostermann, "Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning," Proc. Interspeech, 3850-3854 (2021).
- V. Kadyan, S. Shanawazuddin, and A. Singh, "Developing children's speech recognition system for low resource Punjabi language," Applied Acoustics, 178, 108002 (2021).
- R. Serizel and D. Giuliani, "Vocal tract length normalisation approaches to DNN-based children's and adults' speech recognition," Proc. IEEE SLT, 135-140 (2014).
- P. G. Shivakumar, A. Potamianos, S. Lee, and S. S. Narayanan, "Improving speech recognition for children using acoustic adaptation and pronunciation modeling," Proc. WOCCI, 15-19 (2014).
- S. S. Gray, D. Willett, J. Lu, J. Pinto, P. Maergner, and N. Bodenstab, "Child automatic speech recognition for US English: child interaction with living-room-electronic-devices," Proc. WOCCI, 21-26 (2014).
- R. Duan and N. F. Chen, "Unsupervised feature adaptation using adversarial multi-task training for automatic evaluation of children's speech," Proc. Interspeech, 3037-3041 (2020).
- Y. Cui, M. Jia, T. Y. Lin, Y. Song, and S. Belongie, "Class-balanced loss based on effective number of samples," Proc. IEEE CVPR, 9268-9277 (2019).
- A. Sellami and H. Hwang, "A robust deep convolutional neural network with batch-weighted loss for heartbeat classification," Expert Systems with Applications, 122, 75-84 (2019). https://doi.org/10.1016/j.eswa.2018.12.037
- K. R. M. Fernando and C. P. Tsokos, "Dynamically weighted balanced loss: class imbalanced learning and confidence calibration of deep neural networks," IEEE Trans. Neural Netw. Learn. Syst. 33, 2940-2951 (2021).
- S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira, "Analysis of representations for domain adaptation," Proc. NIPS, 137-144 (2006).
- Y. Ganin and V. Lempitsky, "Unsupervised domain adaptation by backpropagation," Proc. ICML, 1180-1189 (2015).
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Proc. NIPS, 5998-6008 (2017).
- L. Dong, S. Xu, and B. Xu, "Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition," Proc. IEEE ICASSP, 5884-5888 (2018).
- H. Miao, G. Cheng, C. Gao, P. Zhang, and Y. Yan, "Transformer-based online CTC/attention end-to-end speech recognition architecture," Proc. IEEE ICASSP, 6084-6088 (2020).
- Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, "Domain-adversarial training of neural networks," J. Mach. Learn. Res. 17, 2096-2030 (2016).
- S. Kullback and R. A. Leibler, "On information and sufficiency," Ann. Math. Stat. 22, 79-86 (1951). https://doi.org/10.1214/aoms/1177729694
- M. Chen, S. Zhao, H. Liu, and D. Cai, "Adversarial-learned loss for domain adaptation," Proc. AAAI, 3521-3528 (2020).
- PyTorch 1.12 documentation, https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html, (Last viewed September 16, 2022).
- AI Hub Free Conversation (General Men and Women) Dataset, https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=109, (Last viewed July 27, 2022).
- AI Hub Free Conversation (Children, Infants) Dataset, https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=108, (Last viewed July 27, 2022).
- S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba, Y. Unno, N. E. Y. Soplin, J. Heymann, M. Wiesner, N. Chen, A. Renduchintala, and T. Ochiai, "ESPnet: end-to-end speech processing toolkit," arXiv:1804.00015 (2018).
- T. Kudo and J. Richardson, "SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing," Proc. EMNLP, 66-71 (2018).
- A. Tripathi, A. Mohan, S. Anand, and M. Singh, "Adversarial learning of raw speech features for domain invariant speech recognition," Proc. IEEE ICASSP, 5959-5963 (2018).
- S. Sun, C. F. Yeh, M. Y. Hwang, M. Ostendorf, and L. Xie, "Domain adversarial training for accented speech recognition," Proc. IEEE ICASSP, 4854-4858 (2018).
- L. Van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008).