A knowledge-based pronunciation generation system for French

  • Received : 2018.02.19
  • Accepted : 2018.03.28
  • Published : 2018.03.31

Abstract

This paper describes a knowledge-based pronunciation generation system for French. It has been reported that rule-based pronunciation generation systems outperform most data-driven ones for French; however, only a few related studies are available due to the language barrier. We first provide background on French from the point of view of the relationship between orthography and pronunciation, and then describe our knowledge-based pronunciation generation system, which consists of morphological analysis, Part-of-Speech (POS) tagging, grapheme-to-phoneme generation, and phone-to-phone generation. The evaluation results show a word error rate for POS tagging of 10.70% on a sample of 1,000 sentences, and a word error rate for phoneme generation of 2.70% on 130,883 lexicon entries. This study is expected to contribute to the development and evaluation of speech synthesis and speech recognition systems for French.
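The grapheme-to-phoneme stage of such a pipeline can be illustrated with a minimal sketch. This is not the authors' system: the toy rule table, the greedy longest-match strategy, and the phone symbols (loosely X-SAMPA-like) are all assumptions for illustration only; a real French system would also condition rules on context and on the POS tags produced upstream.

```python
# Illustrative sketch of rule-based grapheme-to-phoneme conversion:
# scan the word left to right, always matching the longest grapheme
# listed in the (toy, incomplete) rule table.
RULES = {
    "eau": "o", "ou": "u", "ch": "S", "on": "o~",
    "a": "a", "b": "b", "t": "t", "u": "y", "e": "@",
}

def g2p(word: str) -> list[str]:
    """Greedy longest-match conversion of a lowercase word to phones."""
    phones, i = [], 0
    max_len = max(len(g) for g in RULES)
    while i < len(word):
        for n in range(max_len, 0, -1):   # try the longest grapheme first
            chunk = word[i:i + n]
            if chunk in RULES:
                phones.append(RULES[chunk])
                i += n
                break
        else:                             # grapheme not covered: skip it
            i += 1
    return phones

print(g2p("bateau"))   # ['b', 'a', 't', 'o'] under these toy rules
print(g2p("chou"))     # ['S', 'u']
```

Longest-match ordering is what lets the multigraph rule for "eau" fire before the single-letter rules for "e", "a", and "u" would mangle the vowel.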
