Funding information
This study was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean Government (23ZS1100, Core Technology Research for Self-improving Integrated Artificial Intelligence Systems).
References
- J.-U. Bang, M.-Y. Choi, S.-H. Kim, and O.-W. Kwon, Automatic construction of a large-scale speech recognition database using multi-genre broadcast data with inaccurate subtitle timestamps, IEICE Trans. Inform. Syst. 103 (2020), no. 2, 406-415.
- J.-U. Bang, J.-G. Maeng, J. Park, S. Yun, and S.-H. Kim, English-Korean speech translation corpus (EnKoST-C): construction procedure and evaluation results, ETRI J. 45 (2023), no. 1, 18-27.
- J. Chun, C. Jo, J. Lee, and M.-W. Koo, Number normalization in Korean using the transformer model, J. KIISE 48 (2021), no. 5, 510-517. https://doi.org/10.5626/JOK.2021.48.5.510
- Y. Choi, Y. Jung, Y. Kim, Y. Suh, and H. Kim, An end-to-end synthesis method for Korean text-to-speech systems, Phonet. Speech Sci. 10 (2018), no. 1, 39-48.
- M. Sunkara, C. Shivade, S. Bodapati, and K. Kirchhoff, Neural inverse text normalization, (ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada), 2021, pp. 7573-7577.
- M. Mohri, Weighted finite-state transducer algorithms: an overview, Formal Lang. Appl. 2004 (2004), 551-563.
- L. Pandey, D. Paul, P. Chitkara, Y. Pang, X. Zhang, K. Schubert, M. Chou, S. Liu, and Y. Saraf, Improving data driven inverse text normalization using data augmentation, arXiv preprint, 2022, DOI 10.48550/arXiv.2207.09674
- D. Paul, Y. Pang, S.-J. Chen, and X. Zhang, Improving data driven inverse text normalization using data augmentation and machine translation, (Proc. Interspeech, Incheon, Rep. of Korea), 2022, pp. 5221-5222.
- Y. Gaur, N. Kibre, J. Xue, K. Shu, Y. Wang, I. Alphanso, J. Li, and Y. Gong, Streaming, fast and accurate on-device inverse text normalization for automatic speech recognition, (IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar), 2023, pp. 237-244.
- M. Ihori, H. Sato, T. Tanaka, R. Masumura, S. Mizuno, and N. Hojo, Transcribing speech as spoken and written dual text using an autoregressive model, (Proc. Interspeech, Dublin, Ireland), 2023, DOI 10.21437/Interspeech.2023-1655.
- M. Ihori, A. Takashima, and R. Masumura, Parallel corpus for Japanese spoken-to-written style conversion, (Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France), 2020, pp. 6346-6353.
- J. Guo, T. N. Sainath, and R. J. Weiss, A spelling correction model for end-to-end speech recognition, (ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK), 2019, pp. 5651-5655.
- O. Hrinchuk, M. Popova, and B. Ginsburg, Correction of automatic speech recognition with transformer sequence-to-sequence model, (ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain), 2020, pp. 7074-7078.
- C. Park, J. Seo, S. Lee, C. Lee, H. Moon, S. Eo, and H.-S. Lim, BTS: back transcription for speech-to-text post-processor using text-to-speech-to-text, (Proceedings of the 8th Workshop on Asian Translation (WAT2021)), 2021, pp. 106-116.
- J.-U. Bang, S. Yun, S.-H. Kim, M.-Y. Choi, M.-K. Lee, Y.-J. Kim, D.-H. Kim, J. Park, Y.-J. Lee, and S.-H. Kim, KsponSpeech: Korean spontaneous speech corpus for automatic speech recognition, Appl. Sci. 10 (2020), no. 19, DOI 10.3390/app10196936.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, Adv. Neural Inform. Process. Syst. 30 (2017).
- S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba, Y. Unno, N. E. Y. Soplin, J. Heymann, M. Wiesner, and N. Chen, ESPnet: end-to-end speech processing toolkit, arXiv preprint, 2018, DOI 10.48550/arXiv.1804.00015
- AIHub, AIHub Korean lecture speech dataset, 2020. Last accessed on August 27, 2023.
- ETRI, ETRI Korean common speech dataset, 2004. Last accessed on August 27, 2023.
- Y.-I. Jung, J.-S. Kim, S.-H. Kim, Y.-J. Lee, and A.-S. Yoon, A study on the Arabic numeral reading rules in modern Korean, (Annual Conference on Human and Language Technology. Human and Language Technology), 2002, pp. 16-23.
- M. Post, A call for clarity in reporting BLEU scores, arXiv preprint, 2018, DOI 10.48550/arXiv.1804.08771
- T. Sellam, D. Das, and A. P. Parikh, BLEURT: Learning robust metrics for text generation, (Proceedings of Annual Meeting of the Association for Computational Linguistics), 2020, DOI 10.18653/v1/2020.acl-main.704.
- DeepL SE, DeepL Translate: the world's most accurate translator, 2017. https://www.deepl.com/translator