• Title/Summary/Keyword: Singing Voice Synthesis

Singing Voice Synthesis Using HMM Based TTS and MusicXML (HMM 기반 TTS와 MusicXML을 이용한 노래음 합성)

  • Khan, Najeeb Ullah;Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information / v.20 no.5 / pp.53-63 / 2015
  • Singing voice synthesis is the generation of a song by computer from its lyrics and musical notes. Hidden Markov models (HMMs) have proven to be the models of choice for text-to-speech synthesis. HMMs have also been used in singing voice synthesis research; however, training HMMs for singing voice synthesis requires a huge database. In addition, commercially available singing voice synthesis systems use piano-roll music notation and would need to adopt easy-to-read standard music notation to be suitable for singing-learning applications. To overcome these problems, we use a speech database to train context-dependent HMMs for singing voice synthesis. Pitch and duration control methods have been devised to modify the parameters of the HMMs trained on speech so that they can serve as the synthesis units for the singing voice. This work describes a singing voice synthesis system that uses a MusicXML-based music score editor as the front-end interface for entering the notes and lyrics to be synthesized, and an HMM-based text-to-speech synthesis system as the back-end synthesizer. A perceptual test shows the feasibility of the proposed system.
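
The abstract does not spell out the pitch and duration control methods, so the following is only a minimal sketch of the score-side arithmetic such a system would need: turning a MusicXML-style note (pitch step, octave, alteration, duration in divisions) into a target fundamental frequency and a length in seconds. The function names, the 440 Hz tuning reference, and equal temperament are assumptions made for illustration, not details taken from the paper.

```python
# Hypothetical helpers, not the paper's method: map a MusicXML-style note
# to a target F0 (Hz) and a duration (seconds).

# Semitone offsets of the natural pitch steps within one octave.
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_to_hz(step, octave, alter=0, a4_hz=440.0):
    """Convert a pitch (step, octave, alter) to frequency in Hz.

    Assumes 12-tone equal temperament with A4 tuned to a4_hz.
    """
    # MIDI note number convention: C4 = 60, A4 = 69.
    midi = 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter
    return a4_hz * 2.0 ** ((midi - 69) / 12.0)


def duration_to_seconds(duration, divisions, tempo_qpm):
    """Convert a note duration to seconds.

    duration  : note length in divisions
    divisions : number of divisions per quarter note
    tempo_qpm : tempo in quarter notes per minute
    """
    quarter_notes = duration / divisions
    return quarter_notes * 60.0 / tempo_qpm


if __name__ == "__main__":
    # An eighth-note A4 at 120 quarter notes per minute.
    print(pitch_to_hz("A", 4), duration_to_seconds(2, 4, 120))
```

In a speech-trained HMM synthesizer, values like these would serve as targets for overriding the generated F0 contour and state durations; how the paper actually performs that modification is not stated in the abstract.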

Deep Learning based Singing Voice Synthesis Modeling (딥러닝 기반 가창 음성합성(Singing Voice Synthesis) 모델링)

  • Kim, Minae;Kim, Somin;Park, Jihyun;Heo, Gabin;Choi, Yunjeong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.127-130 / 2022
  • This paper studies singing voice synthesis modeling with a generator loss function. It analyzes the factors that can arise when BEGAN, a deep learning algorithm optimized for image generation, is applied to the audio domain, and conducts experiments to derive optimal quality. We focus on the problem that the L1 loss proposed in BEGAN-based models weakens the meaning of the hyperparameter gamma (𝛾), which was defined to control the diversity and quality of generated audio samples. Experiments show that the proposed method, together with finding optimal values through tuning, can contribute to improving the quality of synthesized singing.
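
For readers unfamiliar with the role of 𝛾, the sketch below shows the standard BEGAN balancing step on scalar losses: the discriminator is an autoencoder, `loss_real` and `loss_fake` are its reconstruction losses on real and generated samples, and a control variable `k` is adjusted so that the ratio of the two losses tends toward 𝛾. This is the generic BEGAN formulation, not the modified generator loss proposed in the paper; the function name and the default values are assumptions for illustration.

```python
def began_step(loss_real, loss_fake, k, gamma=0.5, lambda_k=0.001):
    """One BEGAN balancing step on scalar autoencoder losses.

    loss_real : discriminator reconstruction loss on real samples
    loss_fake : discriminator reconstruction loss on generated samples
    k         : current control variable, kept in [0, 1]
    gamma     : target ratio loss_fake / loss_real (diversity vs. quality)
    lambda_k  : step size for updating k
    """
    d_loss = loss_real - k * loss_fake      # discriminator objective
    g_loss = loss_fake                      # generator objective
    balance = gamma * loss_real - loss_fake
    # Proportional control update, clipped to [0, 1].
    k_next = min(max(k + lambda_k * balance, 0.0), 1.0)
    # Global convergence measure used to monitor training.
    convergence = loss_real + abs(balance)
    return d_loss, g_loss, k_next, convergence
```

If an extra term such as an L1 loss is added to `g_loss`, the generator is no longer driven only by `loss_fake`, so the equilibrium that 𝛾 is meant to set can shift; that is one way to read the problem the abstract describes.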

Application and Technology of Voice Synthesis Engine for Music Production (음악제작을 위한 음성합성엔진의 활용과 기술)

  • Park, Byung-Kyu
    • Journal of Digital Contents Society / v.11 no.2 / pp.235-242 / 2010
  • Unlike earlier instruments that synthesized sounds and tones, voice synthesis engines for music production have reached the level of creating music that sounds as if actual artists were singing. They use human voice samples that are connected naturally across different phonemes within the frequency range. Voice synthesis engines are not limited to music production; they are changing the cultural paradigm through derivative creations of new music types, including character music concerts, media productions, albums, and mobile services. Currently, the technology lets users input pitch, lyrics, and musical expression parameters through a score editor, and then mixes and connects voice samples drawn from a database to produce singing. The new music types derived from this development in computer music have had a large cultural impact. Accordingly, this paper examines specific case studies and the synthesis technologies so that users can understand voice synthesis engines more easily, with the aim of supporting a wider variety of music production.

A performance evaluation study of a deep learning-based voice synthesis technique using Mel-Cepstral Distortion (MCD) (멜-셉스트럴 왜곡(MCD)를 활용한 딥러닝 기반 목소리 합성 기술의 성능 평가 연구)

  • Jaesang Han;Yunseo Kang;Sangwoo Na;Hayeon Lee
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.488-489 / 2023
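  • Singing voice conversion (SVC) is one of the actively studied areas in audio processing. It aims to convert a source singer's singing voice into a target singer's voice while preserving the original melody and lyrics. This paper evaluates and compares the performance of deep learning-based SVC models using the mel-cepstral distortion metric. The results are expected to help identify and use efficient SVC models in fields such as entertainment and education.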
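
As a rough illustration of the metric, the sketch below computes mel-cepstral distortion in its commonly used form: for each frame, the Euclidean distance between reference and converted mel-cepstral coefficients (excluding the 0th, energy-related coefficient) is scaled by (10 / ln 10) · √2, and the result is averaged over frames, giving a value in dB. It assumes the two sequences are already time-aligned and of equal length (for example after dynamic time warping); the function name is hypothetical and the abstract does not say which variant of MCD the authors used.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Average mel-cepstral distortion (dB) between two aligned sequences.

    ref_frames, syn_frames : lists of frames; each frame is a list of
    mel-cepstral coefficients whose first entry (index 0) is skipped.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("sequences must be non-empty and of equal length")

    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared distance over coefficients 1..D (0th coefficient excluded).
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += math.sqrt(squared)
    return scale * total / len(ref_frames)
```

Lower values indicate that the converted voice is spectrally closer to the reference, which is how the metric can rank SVC models against one another.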