• Title/Summary/Keyword: vocoder (보코더)


Transmission Effect Analysis of security communication using MELP encoding scheme in the HF communication (HF통신에서 MELP 부호화방식을 이용한 보안통신의 전송영향 분석)

  • Lee, Hyun-Su; Hong, Jin-Keun; Han, Kun-Hee
    • Journal of the Korea Academia-Industrial cooperation Society / v.9 no.4 / pp.1000-1005 / 2008
  • The US government designed a new military-standard vocoder algorithm, called MELP, to provide robust communication performance in poor channel environments. In this paper, we analyze the transmission effects of secure communication using the MELP vocoder over an HF channel. We examine how suitably the MELP vocoder can be applied in the HF channel environment and how the MELP vocoder and channel coding behave in a wireless burst environment, and we study the performance of plaintext and secure communication through MOS evaluation and spectrum analysis.

Implementation of the Variable Bit Rate Vocoder Using G.729 Vocoder (G.729 음성 보코더를 이용한 가변 전송율 보코더 구현)

  • Ham MyungKyu; Bae MyungJin
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.73-76 / 2002
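  • In this paper, we implement a variable bit rate vocoder that can lower the bit rate to 6 kbps and 4 kbps by combining the 8 kbps ITU G.729 vocoder with the PSOLA (Pitch Synchronized Overlap-Add) algorithm. In the 4 kbps case, the proposed method applies PSOLA before G.729 encoding to cut the pitch periods of the speech in half, and then encodes the result. The encoded data is decoded by G.729, and PSOLA is applied again to double the pitch periods and synthesize the original speech. Because the speech fed to the 8 kbps G.729 coder is reduced to half its size, the bit rate drops to 4 kbps. The method was evaluated with a MOS test, and the MOS at 4 kbps was measured at about 3.37. Computation time also decreases, since the length of the speech to be processed is reduced.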
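
The bit-rate reduction described above is simple arithmetic and can be sketched as follows. This is an illustration, not the authors' implementation: the function name `effective_kbps` is hypothetical, the 10 ms / 80-bit frame figures are the standard G.729 framing, and the 0.75 time scale for the 6 kbps mode is an assumption, since the abstract only describes the 4 kbps (half-length) case.

```python
# Standard G.729 framing: one 80-bit frame per 10 ms of input, i.e. 8 kbps.
G729_FRAME_MS = 10
G729_FRAME_BITS = 80


def effective_kbps(time_scale, speech_ms=1000):
    """Bits spent per millisecond of ORIGINAL speech (numerically equal to kbps).

    time_scale is the duration of the PSOLA-compressed speech divided by the
    duration of the original speech (1.0 means no compression).
    """
    compressed_ms = speech_ms * time_scale      # what the coder actually sees
    frames = compressed_ms / G729_FRAME_MS      # number of G.729 frames
    total_bits = frames * G729_FRAME_BITS
    return total_bits / speech_ms


print(effective_kbps(1.0))   # 8.0  plain G.729
print(effective_kbps(0.5))   # 4.0  speech halved before encoding
print(effective_kbps(0.75))  # 6.0  assumed scale for the 6 kbps mode
```

The coder itself always runs at 8 kbps; the saving comes entirely from giving it less speech to encode, which is also why the computation time drops.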


A Study on Improving Pitch Search for Vocoder (보코더에서 피치검색 성능개선에 관한 연구)

  • Baek, Geum-Ran; Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.31 no.7 / pp.419-426 / 2012
  • Pitch search is a vital process in a vocoder. In general, the pitch is searched after emphasizing the periodicity of the signal: the signal is correlated with a pair of pulses whose spacing is varied, and the spacing at which the correlation peaks is taken as the pitch, because it is the most prominent repetition interval. However, the detected period may be a half, double, or triple of the true period, in which case it cannot be used as the pitch period. Many methods have been suggested to solve this problem. An inaccurate pitch can also be obtained when the signal amplitude is not constant but changes abruptly within the frame. This is usually handled by dividing the frame into several subframes and searching the pitch in each, which yields the correct value at the cost of much more computation. This paper suggests an algorithm that addresses both problems. First, before the pitch search, the signal energy level is corrected using an estimate of the overall energy change ratio in the frame, which reduces half-, double-, and triple-period errors. Second, the number of subframes is varied according to the amplitude change rate in the frame, predicted from that energy ratio. With these two methods, the pitch search time can be reduced and the overall pitch search performance improved without affecting the quality of the synthesized signal.
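
The correlation-peak search that this abstract starts from can be sketched with a plain normalized autocorrelation. This is a generic textbook search, not the dual-pulse correlation or the energy correction proposed in the paper; the function name `autocorr_pitch`, the 20 to 147 sample lag range, and the synthetic test frame are all assumptions made for illustration.

```python
import math


def autocorr_pitch(frame, min_lag=20, max_lag=147):
    """Return (lag, score) for the lag with the highest normalized autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        a = frame[:-lag]          # the frame
        b = frame[lag:]           # the frame shifted by `lag` samples
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Synthetic voiced frame: 320 samples with a true period of 80 samples
# (a fundamental plus its second harmonic).
frame = [math.sin(2 * math.pi * n / 80) + 0.5 * math.sin(4 * math.pi * n / 80)
         for n in range(320)]
lag, score = autocorr_pitch(frame)
print(lag)  # 80
```

In this toy case the double period (160 samples) lies outside the lag range, so it cannot be picked. A real search whose range covers multiples of the true period sees nearly equal peaks at each of them, which is the half/double/triple ambiguity the abstract refers to.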

Performance Comparison of State-of-the-Art Vocoder Technology Based on Deep Learning in a Korean TTS System (한국어 TTS 시스템에서 딥러닝 기반 최첨단 보코더 기술 성능 비교)

  • Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology / v.6 no.2 / pp.509-514 / 2020
  • A conventional TTS system consists of several modules, including text preprocessing, parsing analysis, grapheme-to-phoneme conversion, boundary analysis, prosody control, acoustic feature generation by an acoustic model, and synthesized speech generation. A deep-learning-based TTS system, in contrast, consists of a Text2Mel process that generates a spectrogram from text and a vocoder that synthesizes speech signals from the spectrogram. In this paper, to build an optimal Korean TTS system, we apply Tacotron2 to the Text2Mel process, introduce WaveNet, WaveRNN, and WaveGlow as vocoders, and implement them to verify and compare their performance. Experimental results show that WaveNet has the highest MOS and a trained model of hundreds of megabytes, but its synthesis time is about 50 times real time. WaveRNN shows MOS performance similar to that of WaveNet with a model of several tens of megabytes, but it also cannot run in real time. WaveGlow can run in real time, but its model is several GB in size and its MOS is the worst of the three vocoders. Based on these results, this paper presents reference criteria for selecting the appropriate method according to the hardware environment in which the TTS system is deployed.

Comparison of Korean Real-time Text-to-Speech Technology Based on Deep Learning (딥러닝 기반 한국어 실시간 TTS 기술 비교)

  • Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology / v.7 no.1 / pp.640-645 / 2021
  • A deep-learning-based end-to-end TTS system consists of a Text2Mel module that generates a spectrogram from text and a vocoder module that synthesizes speech signals from the spectrogram. Recently, applying deep learning to TTS systems has improved the intelligibility and naturalness of synthesized speech to a level comparable to human speech. However, inference for speech synthesis is very slow compared to conventional methods. Inference speed can be improved with non-autoregressive methods, which generate speech samples in parallel, independently of previously generated samples. In this paper, we introduce FastSpeech, FastSpeech 2, and FastPitch as Text2Mel technologies, and Parallel WaveGAN, Multi-band MelGAN, and WaveGlow as non-autoregressive vocoder technologies, and we implement them to verify whether they can run in real time. The measured real-time factors (RTF) show that all the presented methods are capable of real-time processing. The trained models are about tens to hundreds of megabytes in size, except for WaveGlow, so they can be applied in embedded environments with limited memory.
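
The real-time factor used as the criterion above is the synthesis time divided by the duration of the audio produced, so a value below 1 means faster than real time. A minimal sketch follows, assuming a hypothetical `synthesize` callable that returns a list of samples and an assumed 22,050 Hz sample rate; neither detail comes from the paper.

```python
import time


def real_time_factor(synthesis_seconds, n_samples, sample_rate):
    """RTF = time spent synthesizing / duration of the synthesized audio."""
    audio_seconds = n_samples / sample_rate
    return synthesis_seconds / audio_seconds


def is_real_time(rtf):
    """A system keeps up with playback only if its RTF is below 1."""
    return rtf < 1.0


def measure_rtf(synthesize, text, sample_rate):
    """Time one call to a (hypothetical) synthesize(text) -> samples function."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)


# 5 seconds of audio at 22,050 Hz generated in 0.5 seconds:
rtf = real_time_factor(0.5, 110250, 22050)
print(rtf, is_real_time(rtf))  # 0.1 True
```

A synthesis time of 50 times real time corresponds to an RTF of 50 under this definition, far outside the real-time range.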

The Research about Voice Transmission between CDMA Network and PSTN Network Using CDMA Circuit Data Service (CDMA 회선 데이터 서비스를 이용한 CDMA망과 PSTN 망간의 음성 전송에 관한 연구)

  • Park, Yong-Seok; Ahn, Jae-Hwan; Ryou, Jae-Cheol
    • The KIPS Transactions:PartC / v.15C no.5 / pp.367-374 / 2008
  • To provide voice privacy between a CDMA mobile phone and a PSTN terminal, voice frames must be transmitted transparently between the heterogeneous networks. To satisfy this requirement, we propose a method that transmits voice frames over the CDMA circuit data channel in real time. In this paper, we analyze the causes of the voice delay that occurs during voice transmission over the circuit data channel. To overcome this delay, we propose a technique for controlling the TCP control flag and a variable audio block construction algorithm that follows the vocoder output rate. Experiments with the proposed method confirmed that the transit delay improved by about 70% on average.

A Study of Real-Time Implementation of Audio/Data Processor for Digital/Analog Dual mode Mobile Phone (디지탈/아날로그 겸용 이동통신 단말기를 위한 오디오/데이타 프로세서의 실시간 구현에 관한 연구)

  • Byun, Kyung-Jin; Kim, Jong-Jae; Han, Ki-Chun; Yoo, Hah-Young; Cha, Jin-Jong; Kim, Kyung-Su
    • The Journal of the Acoustical Society of Korea / v.16 no.2 / pp.80-88 / 1997
  • In this paper, we present the implementation of an audio/data processor on the ETRI DSP to support analog mode in a digital/analog dual-mode mobile phone. In analog mode, the audio/data processor performs wideband data processing, audio signal processing, demodulation, and data rate conversion. These functions are programmed in assembly language and loaded onto the ETRI DSP together with the vocoder program for digital-mode operation. This is a very efficient implementation of the dual-mode cellular phone ASIC, since the vocoder for digital mode and the audio/data processor for analog mode are programmed together on the same hardware.


A study on the improvement of generation speed and speech quality for a granularized emotional speech synthesis system (세밀한 감정 음성 합성 시스템의 속도와 합성음의 음질 개선 연구)

  • Um, Se-Yun; Oh, Sangshin; Jang, Inseon; Ahn, Chung-hyun; Kang, Hong-Goo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2020.07a / pp.453-455 / 2020
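  • This paper proposes a method that increases the synthesis speed and improves the quality of the synthesized speech for an end-to-end emotional text-to-speech (TTS) synthesis system that generates an emotional spoken-caption service for visually impaired people. The previously used emotional speech synthesis method based on Global Style Tokens (GST) has the advantage of expressing a variety of emotions, but it takes a long time to generate speech, and its quality degrades, for example through clipping in the synthesized speech, unless the dynamic range of the training data is handled effectively. To remedy this, we introduce a new data preprocessing procedure and replace the existing WaveNet vocoder with WaveRNN, and we show improvements in both generation speed and speech quality.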


Blind Classification of Speech Compression Methods using Structural Analysis of Bitstreams (비트스트림의 구조 분석을 이용한 음성 부호화 방식 추정 기법)

  • Yoo, Hoon; Park, Cheol-Sun; Park, Young-Mi; Kim, Jong-Ho
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.1 / pp.59-64 / 2012
  • This paper presents an algorithm for blind estimation and classification of speech compression methods based on analysis of the structure of the compressed bitstreams. Various speech compression methods, including vocoders, have been developed to transmit or store speech signals at very low bit rates. A key feature of vocoders is that their bitstreams inevitably have a block structure. To classify each compression method, we use the Measure of Inter-Block Correlation (MIBC) to check whether the bitstream has a block structure and to estimate the block length. Moreover, for compression methods with the same block length, the proposed algorithm correctly identifies the compression method by exploiting the fact that each method has different correlation characteristics at each bit location. Experimental results indicate that the proposed algorithm classifies speech compression methods robustly for various types and lengths of speech signals in noisy environments.
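
The idea of detecting a block structure and estimating its length from the bitstream alone can be illustrated with a much simpler statistic than the paper's. The sketch below does not implement MIBC, whose definition is not given in this abstract. It uses a stand-in measure, the rate at which a bit agrees with the bit one candidate block length later, and every name, the 4-bit fixed header, and the tolerance are assumptions chosen for the demonstration.

```python
import random


def lag_agreement(bits, lag):
    """Fraction of positions where a bit equals the bit `lag` positions later."""
    pairs = len(bits) - lag
    same = sum(1 for i in range(pairs) if bits[i] == bits[i + lag])
    return same / pairs


def estimate_block_length(bits, min_len=2, max_len=40, tolerance=0.05):
    """Return (lag, score) for the smallest lag scoring within `tolerance` of the best.

    Multiples of the true block length score about as high as the length
    itself, so the smallest near-best lag is taken rather than the argmax.
    """
    scores = {lag: lag_agreement(bits, lag) for lag in range(min_len, max_len + 1)}
    best = max(scores.values())
    for lag in range(min_len, max_len + 1):
        if scores[lag] >= best - tolerance:
            return lag, scores[lag]


def make_framed_stream(n_blocks=2000, block_len=16, header=(1, 0, 1, 1), seed=0):
    """Toy bitstream: each block is a fixed header followed by random payload bits."""
    rng = random.Random(seed)
    bits = []
    for _ in range(n_blocks):
        bits.extend(header)
        bits.extend(rng.randint(0, 1) for _ in range(block_len - len(header)))
    return bits


length, score = estimate_block_length(make_framed_stream())
print(length)  # 16
```

An unstructured random stream would score close to 0.5 at every lag, so a clear peak above 0.5 is what signals a block structure in this toy model.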

An Algorithm on Improving a Pitch Searching by Energy Compensation in a Frame for Vocoder (보코더에서 프레임별 에너지 보상에 의한 피치검색 성능 개선에 관한 연구)

  • Baek, Geum-Ran; Min, So-Yeon; Bae, Myung-Jin
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.7 / pp.3188-3193 / 2012
  • Pitch search is an important part of a vocoder. A major drawback of vocoders is the large amount of computation required to search the pitch and the codebook. In this paper, a simple method is proposed to improve the pitch search in the pitch filter with almost no degradation in quality. In the pitch search, the periodicity of the speech signal is emphasized using the Dual Pulse technique, which is of the same type as the autocorrelation method. An incorrect pitch is sometimes obtained through halving, doubling, or tripling. To solve this, before searching the pitch we estimate the energy rate in the frame and use it to compensate the signal envelope. With the proposed algorithm, the computation required for pitch search is reduced and pitch search performance is improved.
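
One way to picture the envelope compensation step is to estimate how the energy changes across the frame and divide that trend out before the pitch search. The sketch below is only an illustration under stated assumptions: it models the amplitude trend as exponential and estimates it from the energy ratio of the two half-frames. The paper's actual estimator is not described in this abstract, and the names `half_energies` and `compensate_envelope` are hypothetical.

```python
import math


def half_energies(frame):
    """Energy of the first and second halves of the frame."""
    mid = len(frame) // 2
    e1 = sum(x * x for x in frame[:mid])
    e2 = sum(x * x for x in frame[mid:])
    return e1, e2


def compensate_envelope(frame):
    """Flatten an exponential amplitude trend estimated from the half-frame energy ratio."""
    e1, e2 = half_energies(frame)
    if e1 <= 0 or e2 <= 0:
        return list(frame)            # nothing to estimate from
    mid = len(frame) // 2
    # The energy ratio is the squared amplitude ratio over `mid` samples,
    # so the per-sample log-amplitude slope is half the log ratio over mid.
    rate = 0.5 * math.log(e2 / e1) / mid
    return [x * math.exp(-rate * n) for n, x in enumerate(frame)]


# A decaying periodic frame: period 80 samples, amplitude falling as exp(-0.005 n).
frame = [math.sin(2 * math.pi * n / 80) * math.exp(-0.005 * n) for n in range(320)]
e1, e2 = half_energies(frame)
c1, c2 = half_energies(compensate_envelope(frame))
print(e2 / e1)  # about 0.2 before compensation
print(c2 / c1)  # about 1.0 after compensation
```

With the envelope flattened, successive pitch periods have similar amplitudes, so a correlation-based search sees a cleaner peak at the true period.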