Korean speech recognition using deep learning
Lee, Suji; Han, Seokjin; Park, Sewon; Lee, Kyeongwon; Lee, Jaeyong
(Department of Statistics, Seoul National University)
1. Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473.
2. Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, 5, 157-166.
3. Blundell, C., Cornebise, J., Kavukcuoglu, K., and Wierstra, D. (2015). Weight uncertainty in neural networks, arXiv preprint arXiv:1505.05424.
4. Chan, W., Jaitly, N., Le, Q. V., and Vinyals, O. (2016). Listen, attend and spell: a neural network for large vocabulary conversational speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4960-4964. IEEE.
5. Cho, K., van Merrienboer, B., Bahdanau, D., and Bengio, Y. (2014). On the properties of neural machine translation: encoder-decoder approaches. In Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation.
6. Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078.
7. Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling, arXiv preprint arXiv:1412.3555.
8. Gal, Y. and Ghahramani, Z. (2016a). Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In International Conference on Machine Learning, 1050-1059.
9. Gal, Y. and Ghahramani, Z. (2016b). A theoretically grounded application of dropout in recurrent neural networks. In Advances in Neural Information Processing Systems, 1019-1027.
10. Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press, Cambridge.
11. Graves, A., Fernandez, S., Gomez, F., and Schmidhuber, J. (2006). Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning, 369-376. ACM.
12. Gales, M. and Young, S. (2008). The application of hidden Markov models in speech recognition, Foundations and Trends in Signal Processing, 1, 195-304.
13. Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory, Neural Computation, 9, 1735-1780.
14. Huang, X., Acero, A., and Hon, H. (2001). Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Prentice Hall PTR, New Jersey.
15. Jelinek, F. (1997). Statistical Methods for Speech Recognition, MIT Press, Cambridge.
16. Kim, S., Hori, T., and Watanabe, S. (2017). Joint CTC-attention based end-to-end speech recognition using multi-task learning. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4835-4839. IEEE.
17. Kingma, D. P. and Ba, J. (2014). Adam: a method for stochastic optimization, arXiv preprint arXiv:1412.6980.
18. Luong, M.-T., Pham, H., and Manning, C. D. (2015). Effective approaches to attention-based neural machine translation, arXiv preprint arXiv:1508.04025.
19. Kwon, O. W. and Park, J. (2003). Korean large vocabulary continuous speech recognition with morpheme-based recognition units, Speech Communication, 39, 287-300.