• 제목/요약/키워드: voice conversion

검색결과 64건 처리시간 0.023초

고음질을 갖는 음색변경에 관한 연구 (A Study on the Voice Conversion Algorithm with High Quality)

  • 박형빈;배명진
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2000년도 제13회 신호처리 합동 학술대회 논문집
    • /
    • pp.157-160
    • /
    • 2000
  • In the generally a voice conversion has used VQ(Vector Quantization) for partitioning the spectral feature and has performed by adding an appropriate offset vector to the source speaker's spectral vector. But there is not represented the target speaker's various characteristics because of discrete characteristics of transformed parameter. In this paper, these problems are solved by using the LMR(Linear Multivariate Regression) instead of the mapping codebook which is determined to the relationship of source and target speaker vocal tract characteristics. Also we propose the method for solved the discontinuity which is caused by applying to time aligned parameters using Dynamic Time Warping the time or pitch-scale modified speech. In our proposed algorithm for overcoming the transitional discontinuities, first of all, we don't change time or pitch scale and by using the LMR change a speaker's vocal tract characteristics in speech with non-modified time or pitch. Compared to existed methods based on VQ and LMR, we have much better voice quality in the result of the proposed algorithm.

  • PDF

단위 선택 기반의 음성 변환 (Feature Selection-based Voice Transformation)

  • 이기승
    • 한국음향학회지
    • /
    • 제31권1호
    • /
    • pp.39-50
    • /
    • 2012
  • A voice transformation (VT) method that can make the utterance of a source speaker mimic that of a target speaker is described. Speaker individuality transformation is achieved by altering three feature parameters, which include the LPC cepstrum, pitch period and gain. The main objective of this study involves construction of an optimal sequence of features selected from a target speaker's database, to maximize both the correlation probabilities between the transformed and the source features and the likelihood of the transformed features with respect to the target model. A set of two-pass conversion rules is proposed, where the feature parameters are first selected from a database then the optimal sequence of the feature parameters is then constructed in the second pass. The conversion rules were developed using a statistical approach that employed a maximum likelihood criterion. In constructing an optimal sequence of the features, a hidden Markov model (HMM) was employed to find the most likely combination of the features with respect to the target speaker's model. The effectiveness of the proposed transformation method was evaluated using objective tests and informal listening tests. We confirmed that the proposed method leads to perceptually more preferred results, compared with the conventional methods.

RawNet3 화자 표현을 활용한 임의의 화자 간 음성 변환을 위한 StarGAN의 확장 (Extending StarGAN-VC to Unseen Speakers Using RawNet3 Speaker Representation)

  • 박보경;박소민;홍현기
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • 제12권7호
    • /
    • pp.303-314
    • /
    • 2023
  • 음성 변환(Voice Conversion)은 개인의 음성 데이터를 다른 사람의 음향적 특성(음조, 리듬, 성별 등)으로 재생성할 수 있는 기술로, 교육, 의사소통, 엔터테인먼트 등 다양한 분야에서 활용되고 있다. 본 논문은 StarGAN-VC 모델을 기반으로 한 접근 방식을 제안하여, 병렬 발화(Utterance) 없이도 현실적인 음성을 생성할 수 있다. 고정된 원본(source) 및 목표(target)화자 정보의 원핫 벡터(One-hot vector)를 이용하는 기존 StarGAN-VC 모델의 제약을 극복하기 위해, 본 논문에서는 사전 훈련된 Rawnet3를 사용하여 목표화자의 특징 벡터를 추출한다. 이를 통해 음성 변환은 직접적인 화자 간 매핑 없이 잠재 공간(latent space)에서 이루어져 many-to-many를 넘어서 any-to-any 구조가 가능하다. 기존 StarGAN-VC 모델에서 사용된 손실함수 외에도, Wasserstein-1 거리를 사용하여 생성된 음성 세그먼트가 목표 음성의 음향적 특성과 일치하도록 보장했다. 또한, 안정적인 훈련을 위해 Two Time-Scale Update Rule (TTUR)을 사용한다. 본 논문에서 제시한 평가 지표들을 적용한 실험 결과에 따르면, 제한된 목소리 변환만이 가능한 기존 StarGAN-VC 기법 대비, 본 논문의 제안 방법을 통해 다양한 발화자에 대한 성능이 개선된 음성 변환을 제공할 수 있음을 정량적으로 확인하였다.

음원 모델에 기초한 합성음의 피치 조절 (Pitch Modification based on a Voice Source Model)

  • 최용진;여수진;김진영;성굉모
    • 음성과학
    • /
    • 제3권
    • /
    • pp.132-147
    • /
    • 1998
  • Previously developed methods for pitch modification have not been based on the voice source model. Therefore, the synthesized speech often sounds unnatural although it may be highly intelligible. The purpose of this paper is to analyze the alteration of a voice source signal with pitch period and to establish the pitch-modification rule based on the result of this analysis. We examine the alteration of the interval of closing phase, closed phase and open phase using the excitation waveform as the pitch increases. In comparison to the previous methods which performed directly on the speech signal, the pitch modification method based on a voice source model shows high intelligibility and naturalness. This study might benefit the application to the speaker identification and the voice color conversion. Therefore the proposed method will provide high quality synthetic speech.

  • PDF

표본화율 변환을 이용한 합성음의 운율제어 (Prosody Control of the Synthetic Speech using Sampling Rate Conversion)

  • 이현구;홍광석
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 1999년도 추계종합학술대회 논문집
    • /
    • pp.676-679
    • /
    • 1999
  • In this paper, we presents a method to control prosody of the synthetic speech using sampling rate conversion technique. In prosody control, the conventional methods perform overlap and add. So the synthetic speech has a distortion and the voice quality is not satisfied. Using sampling rate conversion technique, we can get high Qualify of the synthetic speech. Also we can control various talking speeds according to speaker's patterns.

  • PDF

카운터테너의 음성학적 분석 (Voice Analysis of Countertenors)

  • 정성민;김문정;윤선옥;신혜정;박수경;신유리;권영경
    • 대한후두음성언어의학회지
    • /
    • 제12권1호
    • /
    • pp.39-45
    • /
    • 2001
  • Background and Objectives : A post-pubescent male classical singer has lower vocal register than a female classical singer. Countertenors who can produce higher vocal register like female classical singers with their falsetto voice and head resonance are recently active. The general purpose of this study is to analyze voice of countertenors and to determine the differences with those of classical singers. Materials and Methods : Four countertenors in Korea were examined using a videostrobos-copy and their voice were analyzed using aerodynamic, acoustic and voice range profile methods. Results and Conclusion : Countertenors could produce elevated fundamental frequency, voice intensity and mean air flow rate using large pulmonary capacity and head voiced falsetto. It means the presence of greater energy in countertenor is due to the more efficient conversion of the air flow to acoustic energy. But, they had unstable amplitude perturbation per each vocal cycle. The results indicated that countertenor is the acoustic products of different laryngeal mechanism with other classical register and it can be recognized as one of the registers of male classical singers.

  • PDF

Text-to-speech 시스템에서의 화자 변환 기능 구현 (Implementation of the Voice Conversion in the Text-to-speech System)

  • 황철규;김형순
    • 한국음향학회:학술대회논문집
    • /
    • 한국음향학회 1999년도 학술발표대회 논문집 제18권 2호
    • /
    • pp.33-36
    • /
    • 1999
  • 본 논문에서는 기존의 text-to-speech(TTS) 합성방식이 미리 정해진 화자에 의한 단조로운 합성음을 가지는 문제를 극복하기 위하여, 임의의 화자의 음색을 표현할 수 있는 화자 변환(Voice Conversion) 기능을 구현하였다. 구현된 방식은 화자의 음향공간을 Gaussian Mixture Model(GMM)로 모델링하여 연속 확률 분포에 따른 화자 변환을 가능케 했다. 원시화자(source)와 목적화자(target)간의 특징 벡터의 joint density function을 이용하여 목적화자의 음향공간 특징벡터와 변환된 벡터간의 제곱오류를 최소화하는 변환 함수를 구하였으며, 구해진 변환 함수로 벡터 mapping에 의한 스펙트럼 포락선을 변환했다. 운율 변환은 음성 신호를 정현파 모델에 의해서 모델링하고, 분석된 운율 정보(피치, 지속 시간)는 평균값을 고려해서 변환했다. 성능 평가를 위해서 VQ mapping 방법을 함께 구현하여 각각의 정규화된 켑스트럼 거리를 구해서 성능을 비교 평가하였다. 합성시에는 ABS-OLA 기반의 정현파 모델링 방식을 채택함으로써 자연스러운 합성음을 생성할 수 있었다.

  • PDF

방화벽 문제를 극복한 인터넷 전화 서비스의 구현 (Implementation of an Internet Telephony Service that Overcomes the Firewall Problem)

  • 손주영
    • Journal of Advanced Marine Engineering and Technology
    • /
    • 제27권1호
    • /
    • pp.65-75
    • /
    • 2003
  • The internet telephony service is one of the successful internet application services. VoIP is the key technology for the service to come true. VoIP uses H.323 or SIP as the standard protocol for the distributed multimedia services over the internet environment, in which QoS is not guaranteed. VoIP carries the packetized voice by using the RTP/UDP/IP protocol stack. The UDP-based internet services cause the data transmission problem to the users behind the internet firewall. So does the internet telephony service. The users are not able to listen the voices of the counter-parts on the public internet or PSTN. It makes the problem more difficult that the internet telephony service addressed in this paper uses only one UDP port number to send the voice data of all sessions from gateway to terminal node. In this paper, two schemes including the usage of dummy UDP datagrams, and the protocol conversion are suggested. The implementation of one of the schemes, the protocol conversion, and the performance evaluation are described in detail.

산업용 다관절로봇 음성제어솔루션 설계 (Design of Voice Control Solution for Industrial Articulated Robot)

  • 곽광진;김대연;박정민
    • 한국인터넷방송통신학회논문지
    • /
    • 제21권2호
    • /
    • pp.55-60
    • /
    • 2021
  • 스마트 팩토리화가 진행됨에 따라 자동화 설비 및 로봇의 활용이 늘어나고 있다. 또한 IT 기술의 발달로 음성인식을 활용한 시스템의 활용도도 올라가고 있다. 음성인식 기술은 스마트홈과 각종 IoT 기술에서 두각을 나타내고 있는 기술이지만 공장의 특수성으로 공장에 적용되기 힘든 상황에 있다. 따라서 본 연구에서는 제조 현장의 상황을 고려한 음성인식 기술을 활용하여 산업용 다관절 로봇을 제어하는 방법을 설계하였다. 모바일을 통해 로봇 조작을 위한 음성명령을입력 받은 후 네트워크 프로토콜 변환 및 명령어 변환 과정을 거쳐 로봇을 제어할 수 있음을 확인하였다.

A nonlinear transformation methods for GMM to improve over-smoothing effect

  • Chae, Yi Geun
    • Journal of Advanced Marine Engineering and Technology
    • /
    • 제38권2호
    • /
    • pp.182-187
    • /
    • 2014
  • We propose nonlinear GMM-based transformation functions in an attempt to deal with the over-smoothing effects of linear transformation for voice processing. The proposed methods adopt RBF networks as a local transformation function to overcome the drawbacks of global nonlinear transformation functions. In order to obtain high-quality modifications of speech signals, our voice conversion is implemented using the Harmonic plus Noise Model analysis/synthesis framework. Experimental results are reported on the English corpus, MOCHA-TIMIT.