http://dx.doi.org/10.13064/KSSS.2018.10.1.049

A knowledge-based pronunciation generation system for French  

Kim, Sunhee (NAVER Corp.)
Publication Information
Phonetics and Speech Sciences / v.10, no.1, 2018, pp. 49-55
Abstract
This paper aims to describe a knowledge-based pronunciation generation system for French. It has been reported that a rule-based pronunciation generation system outperforms most of the data-driven ones for French; however, only a few related studies are available due to existing language barriers. We provide basic information about the French language from the point of view of the relationship between orthography and pronunciation, and then describe our knowledge-based pronunciation generation system, which consists of morphological analysis, Part-of-Speech (POS) tagging, grapheme-to-phoneme generation, and phone-to-phone generation. The evaluation results show that the word error rate of POS tagging, based on a sample of 1,000 sentences, is 10.70% and that of phoneme generation, using 130,883 entries, is 2.70%. This study is expected to contribute to the development and evaluation of speech synthesis or speech recognition systems for French.
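As an illustration of the evaluation figures quoted above, the following is a minimal sketch of how an entry-level error rate for phoneme generation could be computed. It assumes that an entry counts as an error whenever its generated phoneme sequence differs from the reference in any way; the abstract does not state the exact scoring procedure, so this is an assumption. The function name `word_error_rate` and the toy entries are hypothetical and are not taken from the paper's data.

```python
from typing import Dict, Sequence


def word_error_rate(reference: Dict[str, Sequence[str]],
                    predicted: Dict[str, Sequence[str]]) -> float:
    """Fraction of reference entries whose predicted phoneme
    sequence is missing or differs from the reference sequence."""
    if not reference:
        raise ValueError("reference lexicon is empty")
    errors = sum(
        1
        for word, phones in reference.items()
        if word not in predicted or list(predicted[word]) != list(phones)
    )
    return errors / len(reference)


# Toy, hand-made entries for illustration only (assumed, not from the paper).
reference = {
    "chat": ["ʃ", "a"],
    "eau": ["o"],
    "fille": ["f", "i", "j"],
    "ville": ["v", "i", "l"],
}
predicted = {
    "chat": ["ʃ", "a"],
    "eau": ["o"],
    "fille": ["f", "i", "j"],
    # Deliberate error: "ville" is pronounced with /l/, unlike "fille".
    "ville": ["v", "i", "j"],
}

wer = word_error_rate(reference, predicted)
print(f"{wer:.2%}")  # prints "25.00%"
```

Under the same assumption, the reported rate of 2.70% over 130,883 entries would correspond to roughly 3,534 entries with at least one incorrect phoneme.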
Keywords
pronunciation generation; French; speech synthesis; speech recognition; knowledge-based