[1] Zhang, Ying, et al., "Towards end-to-end speech recognition with deep convolutional neural networks," arXiv preprint arXiv:1701.02720 (2017).
[2] Graves, Alex, et al., "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks," Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.
[3] Hori, Takaaki, et al., "Advances in joint CTC-attention based end-to-end speech recognition with a deep CNN encoder and RNN-LM," arXiv preprint arXiv:1706.02737 (2017).
[4] National Institute of the Korean Language (NIKL), Seoul Reading Speech Corpus ("서울말 낭독체 발화 말뭉치"), 2003. URL: https://ithub.korean.go.kr
[5] Yejin Cho, Korean Grapheme-to-Phoneme Analyzer (KoG2P), 2017. GitHub repository: https://github.com/scarletcho/KoG2P
[6] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556 (2014).
[7] Amodei, Dario, et al., "Deep Speech 2: End-to-end speech recognition in English and Mandarin," International Conference on Machine Learning, 2016.
[8] Xiong, Wayne, et al., "The Microsoft 2016 conversational speech recognition system," Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017.
[9] Sainath, Tara N., et al., "Convolutional, long short-term memory, fully connected deep neural networks," Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015.
[10] Chung, Junyoung, et al., "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv preprint arXiv:1412.3555 (2014).
[11] Xiong, Wayne, et al., "The Microsoft 2016 conversational speech recognition system," Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017.
[12] Schwarz, Petr, Pavel Matejka, and Jan Cernocky. "Towards lower error rates in phoneme recognition," International Conference on Text, Speech and Dialogue. Springer, Berlin, Heidelberg, 2004.
[13] Gales, Mark JF. "Maximum likelihood linear transformations for HMM-based speech recognition," Computer Speech & Language, Vol.12, No.2, pp.75-98, 1998.
[14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, Vol.86, No.11, pp.2278-2324, 1998.
[15] Glass, James R. "A probabilistic framework for segment-based speech recognition," Computer Speech & Language, Vol.17, No.2-3, pp.137-152, 2003.
[16] Waibel, Alexander, et al., "Phoneme recognition using time-delay neural networks," Readings in Speech Recognition, pp.393-404, 1990.
[17] Ji-Young Shin. "Phoneme and Syllable Frequencies of Korean Based on the Analysis of Spontaneous Speech Data," Communication Sciences and Disorders, Vol.13, No.2, pp.193-215, 2008.
[18] Bengio, Yoshua. "A connectionist approach to speech recognition," Advances in Pattern Recognition Systems Using Neural Network Technologies, pp.3-23, 1993.
[19] Mohamed, Abdel-rahman, George E. Dahl, and Geoffrey Hinton. "Acoustic modeling using deep belief networks," IEEE Transactions on Audio, Speech, and Language Processing, Vol.20, No.1, pp.14-22, 2012.
[20] Ardussi Mines, M., Hanson, B. F., & Shoup, J. E. "Frequency of Occurrence of Phonemes in Conversational English," Language and Speech, Vol.21, No.3, pp.221-241, 1978.
[21] Seltzer, Michael L., and Jasha Droppo. "Multi-task learning in deep neural networks for improved phoneme recognition," Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013.
[22] Graves, Alex, Navdeep Jaitly, and Abdel-rahman Mohamed. "Hybrid speech recognition with deep bidirectional LSTM," Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013.
[23] Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. "Speech recognition with deep recurrent neural networks," Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013.
[24] Minsoo Na and Minhwa Chung, "Assistive Program for Automatic Speech Transcription based on G2P Conversion and Speech Recognition," Proc. Conference on Korean Society of Speech Sciences, pp.131-132, 2016.
[25] Palaz, Dimitri, Ronan Collobert, and Mathew Magimai Doss. "End-to-end phoneme sequence recognition using convolutional neural networks," arXiv preprint arXiv:1312.2137 (2013).
[26] Heck, Michael, et al., "Ensembles of Multi-scale VGG Acoustic Models," Proc. Interspeech 2017, pp.1616-1620, 2017.
[27] Palaz, Dimitri, Mathew Magimai Doss, and Ronan Collobert. "Convolutional neural networks-based continuous speech recognition using raw speech signal," Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015.