Search | Korea Science

A Study on Implementation of Emotional Speech Synthesis System using Variable Prosody Model (가변 운율 모델링을 이용한 고음질 감정 음성합성기 구현에 관한 연구)

Min, So-Yeon;Na, Deok-Su
- Journal of the Korea Academia-Industrial cooperation Society, v.14 no.8, pp.3992-3998, 2013
This paper describes a method for adding an emotional speech corpus to a high-quality, large-corpus-based speech synthesizer so that it can generate a variety of synthesized speech. We built the emotional speech corpus in a form usable by a waveform-concatenation speech synthesizer and implemented a synthesizer that generates varied synthesized speech through the same unit selection process as a normal speech synthesizer. A markup language is used for emotional input text. Emotional speech is generated when the input text matches an intonation phrase in the emotional speech corpus over its full length; otherwise, normal speech is generated. The break indices (BIs) of emotional speech are more irregular than those of normal speech, which makes it difficult to use the BIs generated by the synthesizer as they are. To solve this problem, we applied Variable Break [3] modeling. A Japanese speech synthesizer was used for the experiments. As a result, we obtained natural emotional synthesized speech using the break prediction module of the normal speech synthesizer.
https://doi.org/10.5762/KAIS.2013.14.8.3992

On a Duration Control Method of Speech Waveform by an Automatic Pitch Point Detection (자동 피치시점 검출에 의한 음성신호의 지속시간 조절 법에 관한 연구)

Park Won;Park HyungBin;Bae MyungJin
- Proceedings of the Acoustical Society of Korea Conference, autumn, pp.217-220, 2000
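In general, high-quality speech synthesis requires a technique that controls prosody by changing the duration of the synthesized speech. This first requires selecting a speech coding method suited to high quality and classifying the excitation source through accurate pitch and pitch-point detection. In this paper, we propose a duration control method for prosody control that applies our previously proposed automatic pitch-point detection. Because the proposed method works directly in the time domain, pitch-synchronous analysis is straightforward and no transformation to another domain is needed. As a result, relatively good results were obtained when waveform coding was combined with the duration control method based on the proposed automatic pitch-point detection.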
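As a rough illustration of time-domain, pitch-synchronous duration control, the sketch below lengthens or shortens a segment by repeating or dropping whole pitch periods. It is not the authors' algorithm: the function name `change_duration`, the `pitch_marks` input, and the nearest-period mapping are all assumptions made for illustration, and the pitch marks are taken as given rather than detected automatically.

```python
def change_duration(samples, pitch_marks, ratio):
    """Stretch or shrink a segment by repeating or dropping pitch periods.

    samples:     list of waveform samples
    pitch_marks: increasing sample indices; consecutive marks delimit one
                 pitch period (hypothetical input, assumed already detected)
    ratio:       target duration divided by original duration
    """
    # Cut the waveform into pitch periods between consecutive marks.
    periods = [samples[a:b] for a, b in zip(pitch_marks, pitch_marks[1:])]
    n_out = max(1, round(len(periods) * ratio))
    out = []
    for k in range(n_out):
        # Map each output period back to a source period (assumed mapping).
        src = min(len(periods) - 1, int(k / ratio))
        out.extend(periods[src])
    return out
```

Because whole periods are copied, the pitch itself is left unchanged and only the duration varies, which is the property the abstract relies on.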

A Study on Real Time Pitch Alteration of Speech Signal (음성신호의 실시간 피치변경에 관한 연구)

김종국;박형빈;배명진
- The Journal of the Acoustical Society of Korea, v.23 no.1, pp.82-89, 2004
This paper describes how to reduce the effect of the occupation threshold by controlling the transformation of the mixture components of HMM parameters in a hierarchical tree structure, in order to prevent over-adaptation. To reduce correlations between data elements and to remove elements with low variance, we employ PCA (principal component analysis) and ICA (independent component analysis), which give as good a representation as possible and lessen the effect of over-adaptation. When the occupation threshold is lowered and the number of transformation functions is increased, the ordinary MLLR adaptation algorithm yields a lower recognition rate than SI models, whereas the proposed MLLR adaptation algorithm improves the word recognition rate by more than 2% compared with SI models.

Scanning Attack by using SIP message and Detection Method in VoLTE (VoLTE에서의 SIP 메시지를 이용한 스캐닝 공격 및 탐지 방법)

Park, Seong Min;Cho, Jun Jyung;Kim, Se Kwon;Im, Chae Tae
- Proceedings of the Korea Information Processing Society Conference, 2014.11a, pp.449-452, 2014
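Mobile network operators have recently been working to develop and commercialize All-IP-based services, because such services use the wide bandwidth of LTE and differ markedly from existing services. VoLTE, which provides voice calls over LTE, is one of them; all three mobile operators have now commercialized it and are marketing this new high-quality audio and video communication service. However, VoLTE was commercialized without sufficient consideration of security, and it is highly vulnerable to several types of attacks that abuse SIP (Session Initiation Protocol), the protocol used in VoLTE. This paper describes the scanning attack, the most basic of the security threats to VoLTE services, and presents a method for detecting it.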
https://doi.org/10.3745/PKIPS.y2014m11a.449

On a Multiband Nonuniform Sampling Technique with a Gaussian Noise Codebook for Speech Coding (가우시안 코드북을 갖는 다중대역 비균일 음성 표본화법)

Chung, Hyung-Goue;Bae, Myung-Jin
- The Journal of the Acoustical Society of Korea, v.16 no.6, pp.110-114, 1997
When nonuniform sampling is applied to a noisy speech signal, the required data rate rises to a level comparable to or higher than that of uniform sampling such as PCM. To solve this problem, we previously proposed a waveform coding method, multiband nonuniform waveform coding (MNWC), which applies nonuniform sampling to a band-separated speech signal [7]. However, its speech quality is degraded relative to uniform sampling, because the high band is simply modeled as Gaussian noise at an average level. To overcome this drawback, this paper models the high band as one of 16 codewords with different center frequencies. While maintaining high speech quality, with an average MOS of 3.16, the proposed method achieves a compression ratio 1.5 times higher than that of the conventional nonuniform sampling method (CNSM).

Real-Time H/W Implementation of RPE-LTP Speech Coder for Digital Mobile Communications (디지틀 이동 통신용 RPE-LTP 음성 부호화기의 실시간 H/W 구현)

김선영;김재공
- The Journal of Korean Institute of Communications and Information Sciences, v.16 no.1, pp.85-100, 1991
In digital mobile communication systems, a high-quality, low-bit-rate speech coder is essential for overcoming the limited availability of radio spectrum and thereby enhancing communication services. This paper presents the implementation and performance evaluation of a 13 kbps RPE-LTP speech coder. We show a real-time full-duplex coder implemented on a single DSP chip with a DSP loading rate of 75%, and we also performed fixed-point simulations for the H/W implementation. Finally, we present an analysis of the relative bit importance of each transmitted parameter for use in channel coding.

Voice Personality Transformation Using a Probabilistic Method (확률적 방법을 이용한 음성 개성 변환)

Lee Ki-Seung
- The Journal of the Acoustical Society of Korea, v.24 no.3, pp.150-159, 2005
This paper addresses a voice personality transformation algorithm that makes one person's voice sound like another person's. In the proposed method, a person's voice is represented by LPC cepstrum, pitch period, and speaking rate, and an appropriate transformation rule is constructed for each parameter. A Gaussian Mixture Model (GMM) is used to model one speaker's LPC cepstra, and conditional probability is used to model the relationship between the two speakers' LPC cepstra. A Maximum Likelihood (ML) estimation method is employed to obtain the parameters of each probabilistic model. The transformed LPC cepstra are obtained using a Minimum Mean Square Error (MMSE) criterion. Pitch period and speaking rate serve as the parameters for prosody transformation, which is implemented using the ratio of their average values. The proposed method outperforms the previous VQ-based method on objective measures, including the average cepstrum distance reduction ratio and the likelihood increasing ratio. In the subjective test, we obtained almost the same correct identification ratio as the previous method and confirmed that high-quality transformed speech is obtained, owing to spectral contours that evolve smoothly over time.
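The prosody part of the method, scaling by the ratio of average values, can be sketched as follows. The function names and the sample values are hypothetical; the abstract does not say how the averages are measured, so this assumes they are simple arithmetic means over each speaker's pitch periods.

```python
def average_ratio(source_values, target_values):
    """Ratio of the target speaker's mean to the source speaker's mean."""
    source_mean = sum(source_values) / len(source_values)
    target_mean = sum(target_values) / len(target_values)
    return target_mean / source_mean


def scale_prosody(source_values, ratio):
    """Scale every source value (e.g. a pitch period) by a fixed ratio."""
    return [v * ratio for v in source_values]


# Hypothetical pitch periods in milliseconds for two speakers.
source_periods = [8.0, 10.0, 12.0]
target_periods = [4.0, 5.0, 6.0]
ratio = average_ratio(source_periods, target_periods)
converted = scale_prosody(source_periods, ratio)
```

The same two functions could be applied to speaking rate, since the abstract treats both prosody parameters with the same average-ratio rule.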

On the Reduction of Pitch Search Time for G.723.1 Using the Skipping Technique (G.723.1에서 Skipping Technique을 이용한 피치검색시간 단축에 관한 연구)

김정진
- Proceedings of the Acoustical Society of Korea Conference, 1998.06e, pp.285-288, 1998
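G.723.1 provides high speech quality at low bit rates, but the analysis-by-synthesis structure of CELP-type coders demands considerable processing time and computation. This paper proposes a method that applies a skipping technique to G.723.1 to reduce the computation of the pitch search and thus the overall processing time of the coder. The skipping technique is used to reduce computation in the open-loop pitch estimation step, which finds the predicted pitch. Because the correlation waveform computed during pitch estimation alternates between positive and negative lobes, the negative lobes are skipped during the computation. When the proposed pitch search was applied to real speech samples, the average encoding time decreased by about 10%. In a MOS evaluation comparing the original G.723.1 with G.723.1 using the proposed method, the original scored an average of 3.76 and the proposed method 3.73, so there was almost no subjective quality degradation.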
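The idea of skipping negative correlation lobes can be sketched as below. This is an illustrative reading of the abstract, not the G.723.1 reference code or the author's implementation: the function name `open_loop_pitch`, the `skip` step, and the normalized score are assumptions, and the default lag range of 18 to 142 samples is assumed to mirror the G.723.1 open-loop search.

```python
import math


def open_loop_pitch(signal, min_lag=18, max_lag=142, skip=2):
    """Return (best_lag, lags_evaluated) for a simple open-loop pitch search.

    When the cross-correlation at a lag is not positive, the energy
    normalization is omitted and the search jumps ahead by `skip` lags
    (assumed behaviour; skip=1 disables the jump).
    """
    best_lag, best_score = None, 0.0
    evaluated = 0
    lag = min_lag
    while lag <= max_lag and lag < len(signal):
        evaluated += 1
        corr = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
        if corr <= 0.0:
            lag += skip  # inside a negative lobe: skip ahead
            continue
        energy = sum(signal[n - lag] ** 2 for n in range(lag, len(signal)))
        score = corr * corr / energy if energy > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
        lag += 1
    return best_lag, evaluated


# Synthetic test tone with a 40-sample period (hypothetical input).
tone = [math.sin(2.0 * math.pi * n / 40.0) for n in range(240)]
full_lag, full_count = open_loop_pitch(tone, skip=1)
fast_lag, fast_count = open_loop_pitch(tone, skip=2)
```

A larger `skip` saves more correlation evaluations but risks stepping over the start of a narrow positive lobe, so the step size trades speed against search accuracy.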

A Study on the Pitch Alteration Technique by Sub-band Linear Approximation in Spectrum (서브밴드 선형근사에 의한 피치변경법에 관한 연구)

김영규;김봉영;배명진
- Proceedings of the IEEK Conference, 2003.07e, pp.2423-2426, 2003
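Speech synthesis can be classified by synthesis method into waveform coding, source coding, and hybrid coding. Synthesis based on waveform coding is particularly suitable for high-quality synthesis. However, because it processes the signal without separating the excitation and filter components, it is not well suited to syllable-unit or phoneme-unit synthesis. A pitch alteration method that changes the source pitch is therefore needed so that waveform coding can be applied to synthesis by rule. This paper proposes a method that, to minimize spectral distortion, flattens the spectrum by sub-band linear approximation and then alters the pitch by spectrum scaling. We evaluated how far the method improves on the existing LPC and cepstrum methods by computing the variance of each flattened signal as a measure of flatness. The flattened signal was normalized so that its peak is zero, and the variance was computed about a zero mean. In measurements of spectral distortion, the average spectral distortion rate stayed at or below 2.12%, and the experiments showed that the proposed method outperforms the existing methods.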
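The flatness measure used in this evaluation can be sketched as follows. The function name `flatness_variance` and the choice of a flattened log-magnitude spectrum as input are assumptions; the abstract only states that the peak is normalized to zero and that the variance is taken about a zero mean.

```python
def flatness_variance(flattened_spectrum):
    """Mean squared deviation from the peak after shifting the peak to zero.

    A perfectly flat input gives 0.0; larger values mean a less flat spectrum.
    """
    peak = max(flattened_spectrum)
    # Normalize so that the highest point becomes zero.
    shifted = [value - peak for value in flattened_spectrum]
    # Variance about an assumed mean of zero, i.e. the mean square.
    return sum(value * value for value in shifted) / len(shifted)
```

Under this reading, comparing flattening methods amounts to running each one on the same frame and preferring the method with the smaller result.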

On a Pitch Alteration Technique by Cepstrum Analysis of Flatten Excitation Spectrum (평탄화된 여기 스펙트럼에서 켑스트럼 피치 변경법에 관한 연구)

조왕래;함명규;배명진
- The Journal of the Acoustical Society of Korea, v.17 no.8, pp.82-87, 1998
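Speech synthesis can be classified by synthesis method into waveform coding, source coding, and hybrid coding. Synthesis based on waveform coding is particularly suitable for high-quality synthesis. However, because it processes the signal without separating the excitation and filter components, it is not well suited to syllable-unit or phoneme-unit synthesis. A pitch alteration method that changes the source pitch is therefore needed so that waveform coding can be applied to synthesis by rule. This paper proposes a method that uses the properties of the cepstrum to alter the pitch while minimizing spectral distortion. The method separates the excitation spectrum from the filter spectrum in the frequency domain, converts the excitation spectrum to an excitation cepstrum, alters the pitch by inserting or deleting zero values, and then reconstructs the pitch-altered spectrum in the spectral domain. In measurements of spectral distortion, the average spectral distortion rate stayed at or below 2.29%, and the subjective quality was also good, with an average score of 3.74.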