• Title/Summary/Keyword: 발음 훈련

Search Result 51, Processing Time 0.014 seconds

A Korean menu-ordering sentence text-to-speech system using conformer-based FastSpeech2 (콘포머 기반 FastSpeech2를 이용한 한국어 음식 주문 문장 음성합성기)

  • Choi, Yerin;Jang, JaeHoo;Koo, Myoung-Wan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.3
    • /
    • pp.359-366
    • /
    • 2022
  • In this paper, we present the Korean menu-ordering Sentence Text-to-Speech (TTS) system using conformer-based FastSpeech2. Conformer is the convolution-augmented transformer, which was originally proposed in Speech Recognition. Combining two different structures, the Conformer extracts better local and global features. It comprises two half Feed Forward module at the front and the end, sandwiching the Multi-Head Self-Attention module and Convolution module. We introduce the Conformer in Korean TTS, as we know it works well in Korean Speech Recognition. For comparison between transformer-based TTS model and Conformer-based one, we train FastSpeech2 and Conformer-based FastSpeech2. We collected a phoneme-balanced data set and used this for training our models. This corpus comprises not only general conversation, but also menu-ordering conversation consisting mainly of loanwords. This data set is the solution to the current Korean TTS model's degradation in loanwords. As a result of generating a synthesized sound using ParallelWave Gan, the Conformer-based FastSpeech2 achieved superior performance of MOS 4.04. We confirm that the model performance improved when the same structure was changed from transformer to Conformer in the Korean TTS.