References
- Bang, J. U., Yun, S., Kim, S. H., Choi, M. Y., Lee, M. K., Kim, Y. J., Kim, D. H., & Kim, S. H. (2020). KsponSpeech: Korean spontaneous speech corpus for automatic speech recognition. Applied Sciences, 10(19), 6936. https://doi.org/10.3390/app10196936
- Boeddeker, C., Nakatani, T., Kinoshita, K., & Haeb-Umbach, R. (2020, May). Jointly optimal dereverberation and beamforming. Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 216-220). Barcelona, Spain.
- Boersma, P., & Weenink, D. (2018). Praat: Doing phonetics by computer (version 6.0.37) [Computer program]. Retrieved from http://www.praat.org/
- Chapman, W. W., Aronsky, D., Fiszman, M., & Haug, P. J. (2000). Contribution of a speech recognition system to a computerized pneumonia guideline in the emergency department. Proceedings of the AMIA Symposium (p. 131).
- Cho, B. J., Lee, J. M., & Park, H. M. (2019). A beamforming algorithm based on maximum likelihood of a complex Gaussian distribution with time-varying variances for robust speech recognition. IEEE Signal Processing Letters, 26(9), 1398-1402. https://doi.org/10.1109/lsp.2019.2932848
- Cummins, N., Scherer, S., Krajewski, J., Schnieder, S., Epps, J., & Quatieri, T. F. (2015). A review of depression and suicide risk assessment using speech analysis. Speech Communication, 71, 10-49. https://doi.org/10.1016/j.specom.2015.03.004
- Grondin, F., & Glass, J. (2019, May). SVD-PHAT: A fast sound source localization method. Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4140-4144). Brighton, UK.
- Hernandez, A., Kim, S., & Chung, M. (2020). Prosody-based measures for automatic severity assessment of dysarthric speech. Applied Sciences, 10(19), 6999. https://doi.org/10.3390/app10196999
- Higuchi, T., Ito, N., Yoshioka, T., & Nakatani, T. (2016, March). Robust MVDR beamforming using time-frequency masks for online/offline ASR in noise. Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5210-5214). Shanghai, China.
- Huang, Z., Epps, J., Joachim, D., Stasak, B., Williamson, J. R., & Quatieri, T. F. (2020). Domain adaptation for enhancing speech-based depression detection in natural environmental conditions using dilated CNNs. Interspeech 2020 (pp. 4561-4565). Shanghai, China.
- Kudo, T., & Richardson, J. (2018). SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 66-71). Brussels, Belgium.
- Kubo, Y., Nakatani, T., Delcroix, M., Kinoshita, K., & Araki, S. (2019). Mask-based MVDR beamformer for noisy multisource environments: Introduction of time-varying spatial covariance model. Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6855-6859). Brighton, UK.
- Laukka, P., Linnman, C., Ahs, F., Pissiota, A., Frans, O., Faria, V., Palmquist, A. M., & Furmark, T. (2008). In a nervous voice: Acoustic analysis and perception of anxiety in social phobics' speech. Journal of Nonverbal Behavior, 32(4), 195. https://doi.org/10.1007/s10919-008-0055-9
- Lee, Y., Shon, S., & Kim, T. (2018). Learning pronunciation from a foreign language in speech synthesis network. arXiv. Retrieved from https://arxiv.org/abs/1811.09364
- Mariani, C., Tronchi, A., Oncini, L., Pirani, O., & Murri, R. (2006). Analysis of the X-ray work flow in two diagnostic imaging departments with and without a RIS/PACS system. Journal of Digital Imaging, 19(1), 18-28. https://doi.org/10.1007/s10278-006-0858-3
- Maryn, Y., Roy, N., De Bodt, M., Van Cauwenberge, P., & Corthals, P. (2009). Acoustic measurement of overall voice quality: A meta-analysis. The Journal of the Acoustical Society of America, 126(5), 2619-2634. https://doi.org/10.1121/1.3224706
- Park, D. S., Chan, W., Zhang, Y., Chiu, C. C., Zoph, B., Cubuk, E. D., & Le, Q. V. (2019). SpecAugment: A simple data augmentation method for automatic speech recognition. Interspeech 2019 (pp. 2613-2617). Graz, Austria.
- Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., ... Vesely, K. (2011). The Kaldi speech recognition toolkit. IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. Big Island, HI.
- Seo, I., & Seong, C. (2013). Voice quality of dysarthric speakers in connected speech. Journal of the Korean Society of Speech Sciences, 5(4), 33-41. https://doi.org/10.13064/KSSS.2013.5.4.033
- Wang, D., Wang, X., & Lv, S. (2019). An overview of end-to-end automatic speech recognition. Symmetry, 11(8), 1018. https://doi.org/10.3390/sym11081018
- Weiner, J., Engelbart, M., & Schultz, T. (2017). Manual and automatic transcriptions in dementia detection from speech. Interspeech 2017 (pp. 3117-3121). Stockholm, Sweden.
- Xezonaki, D., Paraskevopoulos, G., Potamianos, A., & Narayanan, S. (2020). Affective conditioning on hierarchical attention networks applied to depression detection from transcribed clinical interviews. Interspeech 2020 (pp. 4556-4560). Shanghai, China.
- Xu, H., Stenner, S. P., Doan, S., Johnson, K. B., Waitman, L. R., & Denny, J. C. (2010). MedEx: A medication information extraction system for clinical narratives. Journal of the American Medical Informatics Association, 17(1), 19-24. https://doi.org/10.1197/jamia.M3378
- Yoshioka, T., & Nakatani, T. (2012). Generalization of multi-channel linear prediction methods for blind MIMO impulse response shortening. IEEE Transactions on Audio, Speech, and Language Processing, 20(10), 2707-2720. https://doi.org/10.1109/TASL.2012.2210879
- Yoshioka, T., & Nakatani, T. (2013, September). Dereverberation for reverberation-robust microphone arrays. 21st European Signal Processing Conference (EUSIPCO 2013) (pp. 1-5). Marrakech, Morocco.