1. Chang, X., Zhang, W., Qian, Y., Le Roux, J., & Watanabe, S. (2020, May). End-to-end multi-speaker speech recognition with transformer. Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6134-6138). Barcelona, Spain.
2. Gale, W. A., & Sampson, G. (1995). Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics, 2(3), 217-237.
3. Graves, A., Fernandez, S., Gomez, F., & Schmidhuber, J. (2006, June). Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. Proceedings of the 23rd International Conference on Machine Learning (pp. 369-376). Pittsburgh, PA.
4. James, F. (2000). Modified Kneser-Ney smoothing of n-gram models (RIACS Technical Report 00.07). Mountain View, CA: Research Institute for Advanced Computer Science. Retrieved from https://www.researchgate.net/profile/Frankie-James/publication/255479295_Modified_Kneser-Ney_Smoothing_of_n-gram_Models/links/54d156750cf28959aa7adc08/Modified-Kneser-Ney-Smoothingof-n-gram-Models.pdf
5. Kim, S., Bae, S., & Won, C. (2020). KoSpeech: open-source toolkit for end-to-end Korean speech recognition. arXiv. Retrieved from https://arxiv.org/abs/2009.03092
6. Kingma, D. P., & Ba, J. (2014). Adam: a method for stochastic optimization. arXiv. Retrieved from https://arxiv.org/abs/1412.6980
7. Koutsoukas, A., Monaghan, K. J., Li, X., & Huan, J. (2017). Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data. Journal of Cheminformatics, 9(1), 1-13.
8. Lakomkin, E., Zamani, M. A., Weber, C., Magg, S., & Wermter, S. (2019, May). Incorporating end-to-end speech recognition models for sentiment analysis. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA) (pp. 7976-7982). Montreal, QC.
9. LeCun, Y. A., Bottou, L., Orr, G. B., & Müller, K.-R. (2012). Efficient backprop. In G. Montavon, G. B. Orr, & K.-R. Müller (Eds.), Neural networks: tricks of the trade (2nd ed., Vol. 7700, pp. 9-48). Berlin, Germany: Springer.
10. Miao, H., Cheng, G., Gao, C., Zhang, P., & Yan, Y. (2020, May). Transformer-based online CTC/attention end-to-end speech recognition architecture. Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6084-6088). Barcelona, Spain.
11. Karita, S., Soplin, N. E. Y., Watanabe, S., Delcroix, M., Ogawa, A., & Nakatani, T. (2019, September). Improving transformer-based end-to-end speech recognition with connectionist temporal classification and language model integration. Proceedings of Interspeech 2019. Graz, Austria.
12. Okewu, E., Adewole, P., & Sennaike, O. (2019, July). Experimental comparison of stochastic optimizers in deep learning. Proceedings of the International Conference on Computational Science and Its Applications (pp. 704-715). Saint Petersburg, Russia.
13. Popel, M., & Bojar, O. (2018). Training tips for the transformer model. The Prague Bulletin of Mathematical Linguistics, 110(1), 43-70.
14. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. arXiv. Retrieved from https://arxiv.org/abs/1706.03762
15. Watanabe, S., Hori, T., Kim, S., Hershey, J. R., & Hayashi, T. (2017). Hybrid CTC/attention architecture for end-to-end speech recognition. IEEE Journal of Selected Topics in Signal Processing, 11(8), 1240-1253.
16. Wang, C., Wu, Y., Du, Y., Li, J., Liu, S., Lu, L., Ren, S., … Zhou, M. (2019). Semantic mask for transformer based end-to-end speech recognition. arXiv. Retrieved from https://arxiv.org/abs/1912.03010
17. Watanabe, S., Boyer, F., Chang, X., Guo, P., Hayashi, T., Higuchi, Y., Hori, T., … Zhang, W. (2020). The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans. arXiv. Retrieved from https://arxiv.org/abs/2012.13006
18. Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., Soplin, N. E. Y., … Ochiai, T. (2018). ESPnet: end-to-end speech processing toolkit. arXiv. Retrieved from https://arxiv.org/abs/1804.00015
19. Wei, C., Yu, Z., & Fong, S. (2018, February). How to build a chatbot: chatbot framework and its capabilities. Proceedings of the 2018 10th International Conference on Machine Learning and Computing (pp. 369-373). Macau, China.
20. You, Y., Li, J., Reddi, S., Hseu, J., Kumar, S., Bhojanapalli, S., & Hsieh, C. J. (2019). Large batch optimization for deep learning: training BERT in 76 minutes. arXiv. Retrieved from https://arxiv.org/abs/1904.00962