Search | Korea Science

Audio /Speech Codec Using Variable Delay MDCT/IMDCT (가변 지연 MDCT/IMDCT를 이용한 오디오/음성 코덱)

Sangkil Lee;In-Sung Lee
- The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.2 / pp.69-76 / 2023
A high-quality audio/speech codec using the MDCT/IMDCT process can perfectly restore the current frame through an overlap-add process with the previous frame. This overlap-add process introduces an algorithmic delay equal to the frame length. In this paper, we propose an MDCT/IMDCT process that reduces the algorithmic delay by using a variable phase shift. A low-delay audio/speech codec is obtained by applying the low-delay MDCT/IMDCT algorithm to the ITU-T standard codec G.729.1. The algorithmic delay of the MDCT/IMDCT process can be reduced from 20 ms to 1.25 ms. The decoded output of the audio/speech codec with the low-delay MDCT/IMDCT is evaluated with the PESQ test, an objective quality measure. Despite the reduction in transmission delay, no difference in sound quality from the conventional method was observed.
https://doi.org/10.17661/jkiiect.2023.16.2.69
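
The overlap-add reconstruction that the abstract refers to can be illustrated with a minimal sketch of the conventional MDCT/IMDCT pair. This is the textbook fixed-phase transform with a sine window, not the variable-phase-shift method proposed in the paper; the function names, the block size, and the test signal are assumptions chosen for illustration. Each inverse-transformed block contains time-domain aliasing, and only the sum of two overlapping blocks restores the signal, which is why the decoder must wait for the next frame.

```python
import math

def sine_window(length):
    """Sine window of length 2N; satisfies w[n]^2 + w[n + N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Map 2N windowed samples to N coefficients (fixed phase shift N/2 + 1/2)."""
    n = len(block) // 2
    shift = 0.5 + n / 2.0
    return [
        sum(block[t] * math.cos(math.pi / n * (t + shift) * (k + 0.5))
            for t in range(2 * n))
        for k in range(n)
    ]

def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples (2/N scaling for windowed use)."""
    n = len(coeffs)
    shift = 0.5 + n / 2.0
    return [
        (2.0 / n) * sum(coeffs[k] * math.cos(math.pi / n * (t + shift) * (k + 0.5))
                        for k in range(n))
        for t in range(2 * n)
    ]

def overlap_add_roundtrip(signal, n):
    """Analyse and resynthesise `signal` with 50%-overlapping blocks of length 2N."""
    if len(signal) % n != 0:
        raise ValueError("signal length must be a multiple of n")
    # Pad N zeros on each side so every real sample is covered by two blocks.
    padded = [0.0] * n + list(signal) + [0.0] * n
    window = sine_window(2 * n)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        block = [padded[start + i] * window[i] for i in range(2 * n)]
        aliased = imdct(mdct(block))
        # The aliasing in one block cancels against the neighbouring block.
        for i in range(2 * n):
            out[start + i] += aliased[i] * window[i]
    return out[n:n + len(signal)]

signal = [math.sin(0.3 * i) for i in range(32)]
rebuilt = overlap_add_roundtrip(signal, 8)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print("max reconstruction error:", max_error)
```

In this sketch the reconstruction error is at the level of floating-point rounding, while the output for any frame is complete only after the following block has been processed.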

A Study on Measuring the Speaking Rate of Speaking Signal by Using Line Spectrum Pair Coefficients

Jang, Kyung-A;Bae, Myung-Jin
- The Journal of the Acoustical Society of Korea / v.20 no.3E / pp.18-24 / 2001
Speaking rate represents how many phonemes a speech signal contains in a given time. It varies with the speaker and with the characteristics of each phoneme. Current speech recognition systems need preprocessing to remove the effect of speaking-rate variation before recognition, so if the speaking rate can be estimated in advance, recognition performance can be improved. Conventional speech vocoders, however, determine the transmission rate by analyzing fixed-length periods regardless of the rate of phoneme change; an advance estimate of the speaking rate is therefore also important information for speech coding, where it can improve the sound quality of the vocoder and enable a variable transmission rate. In this paper, we propose a method for representing the speaking rate as a parameter in a speech vocoder. To estimate the speaking rate, the variation of phonemes is estimated using Line Spectrum Pairs. Comparing the speaking rate obtained with the proposed algorithm against a manual method based on visual inspection, the error between the two methods is 5.38% for fast utterances and 1.78% for slow utterances, and the accuracy between the two methods is 98% for slow utterances and 94% for fast utterances at 30 dB SNR and 10 dB SNR, respectively.

The Effects of Voice and Speech Intelligibility Improvements in Parkinson Disease by Training Loudness and Pitch: A Case Study (강도 및 음도 조절을 이용한 훈련이 파킨슨병 환자의 음성 및 발화명료도 개선에 미치는 효과: 사례연구)

Lee, Ok-Bun;Jeong, Ok-Ran;Ko, Do-Heung
- Speech Sciences / v.8 no.3 / pp.173-184 / 2001
The purpose of this study was to examine the effects of manipulating loudness and pitch on the speech intelligibility and voice of a patient with Parkinson's disease. The subject, who had been diagnosed with Parkinson's disease 11 years earlier, demonstrated a severely breathy voice with low intensity. Consonant articulation was intelligible only at the single-word level, and the overall intelligibility of continuous speech was low. The results showed that the subject's articulation accuracy and speech intelligibility were significantly improved after loudness and pitch training. Habitual F0, jitter, shimmer, F0 tremor, and amplitude tremor decreased after training. In addition, the HNR value increased after training. The changes in these acoustic parameters were closely related to the decrease of breathiness in the Parkinsonian voice, and this decrease of breathiness affected speech intelligibility considerably. Based on the experimental results, it was claimed that vocal training that manipulates loudness and pitch could be highly effective in improving voice quality and speech intelligibility in Parkinson's disease.

Performance improvement and Realtime implementation in CELP Coder (CELP 보코더의 성능 개선 및 실시간 구현)

정창경
- Proceedings of the Acoustical Society of Korea Conference / 1994.06c / pp.199-204 / 1994
In this paper, we studied a CELP speech coding algorithm using efficient pseudo-stochastic block codes, an adaptive codebook, and an improved fixed-gain codebook. The pseudo-stochastic block codes are stochastically populated block codes in which adjacent codewords in the innovation codebook are not independent. The adaptive codebook was built from previously predicted speech data using a storage shift register. This CELP coding algorithm enables the coding of toll-quality speech at bit rates from 4.8 kbit/s to 9.6 kbit/s. The algorithm was implemented in real time on a TMS320C30 microprocessor.
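
The codebook search at the core of a CELP coder can be sketched as a generic analysis-by-synthesis loop. This is a simplified illustration under stated assumptions, not the implementation described in the paper: it omits perceptual weighting and the adaptive codebook, and the function names, the toy codebook, and the single LPC coefficient are hypothetical. Each candidate codeword is passed through the all-pole synthesis filter, and the codeword whose scaled output best matches the target is selected.

```python
def synthesize(excitation, lpc):
    """Filter the excitation through 1/A(z), A(z) = 1 + sum(a_k z^-k), zero initial state."""
    out = []
    for n, sample in enumerate(excitation):
        acc = sample
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out

def search_codebook(target, codebook, lpc):
    """Return (index, gain) of the codeword minimising the squared error to `target`."""
    best_index, best_gain, best_score = None, 0.0, -1.0
    for index, codeword in enumerate(codebook):
        synthesized = synthesize(codeword, lpc)
        energy = sum(v * v for v in synthesized)
        if energy == 0.0:
            continue  # an all-zero codeword cannot match anything
        correlation = sum(t * v for t, v in zip(target, synthesized))
        # With the optimal gain, the error is |target|^2 - correlation^2 / energy,
        # so maximising this score minimises the error.
        score = correlation * correlation / energy
        if score > best_score:
            best_index, best_gain, best_score = index, correlation / energy, score
    return best_index, best_gain

# Toy example: the target is exactly twice the synthesized third codeword.
lpc = [-0.5]
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
target = [2.0 * v for v in synthesize(codebook[2], lpc)]
print(search_codebook(target, codebook, lpc))
```

A codebook whose adjacent codewords overlap, as in the pseudo-stochastic block codes mentioned above, would let successive filtered outputs share computation; this sketch filters every codeword independently for clarity.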

Phonation Type Index k (발성유형지수 k)

Park Hansang
- Proceedings of the KSPS conference / 2002.11a / pp.77-80 / 2002
This study proposes phonation type index k as a descriptor of the overall spectral tilt that is free from the effects of fundamental frequency and vowel quality. The newly proposed index provides a simple, single measure of the overall spectral tilt. Phonation type index k can be applied to speech technology. It can also be used in diagnosing patients' voice qualities in speech pathology. The distribution of phonation type index k, which is speaker-dependent, may be useful in forensic phonetics and voice recognition as an indicator of speaker identity.

Improved Excitation Coding for 13 kbps Variable Rate QCELP Coder

Kang, Sangwon;Lee, Dong-Ho
- The Journal of the Acoustical Society of Korea / v.16 no.3E / pp.3-6 / 1997
This paper reports on the optimal design of the excitation codebook in the 13 kbps variable rate QCELP coder for Korean speech. We present two optimal excitation codebooks, which consist of 128 and 556 samples, respectively. A database of Korean speech is used for the design and test of the improved codebook. A quasi-Newton optimization algorithm was developed to design the codebook. The optimized codebook, which remains sparse, can produce an average gain of 0.84 dB in SNR and 0.45 dB in SEGSNR. Informal listening tests confirm the improvement in speech quality.
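
The two objective measures quoted in this abstract can be sketched as follows. This is a minimal illustration of the usual definitions of SNR and segmental SNR (SEGSNR), not the evaluation code used in the paper; the function names, the 160-sample frame length, and the rule of skipping silent frames are assumptions.

```python
import math

def snr_db(reference, decoded):
    """Overall SNR in dB: total signal energy over total error energy."""
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    return 10.0 * math.log10(signal / noise)

def segsnr_db(reference, decoded, frame_len=160, eps=1e-12):
    """Segmental SNR in dB: the mean of per-frame SNR values."""
    values = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal <= eps:
            continue  # assumption: frames with no signal energy are skipped
        values.append(10.0 * math.log10(signal / max(noise, eps)))
    return sum(values) / len(values)

# A loud frame followed by a quiet frame, both with the same absolute error.
reference = [1.0] * 160 + [0.1] * 160
decoded = [0.99] * 160 + [0.09] * 160
print(snr_db(reference, decoded), segsnr_db(reference, decoded))
```

Because SEGSNR averages in the log domain frame by frame, quiet frames weigh as much as loud ones, which is why the two measures can report different gains for the same codebook change.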

A Transcoding Algorithm from G.729A to EVRC (G.729A에서 EVRC로의 상호부호화)

곽영진;정지민;권구락;임정석;황인호;이경훈;고성제
- Proceedings of the IEEK Conference / 2003.07e / pp.2248-2251 / 2003
Communication between speech networks employing different speech codecs requires interoperability. The cascade connection of two different codecs, called tandem coding, not only degrades speech quality but also imposes a high computational load. These problems can be solved by using a transcoding algorithm. This paper presents an effective algorithm for transcoding from G.729A to EVRC, together with its simulation results.

Trends in MPEG-4 Audio Technology (MPEG-4 오디오 기술 동향)

한민수;강경옥;변경진
- Broadcasting and Media Magazine / v.4 no.1 / pp.62-79 / 1999
In this survey paper, the emerging MPEG-4 audio technology is described. In the previous MPEG-1 and MPEG-2 audio work, only natural audio and speech coding techniques were objects of standardization. In MPEG-4 audio standardization, however, not only natural audio and speech coding but also structured audio and synthetic speech techniques are included. The purpose of this expansion can be summarized as preparation for the versatile high-quality multimedia services expected to emerge in the 21st century.

Tree Coding of Speech Signals (음성신호에 대한 트리 코우딩)

김경수;이상욱
- Proceedings of the Korean Institute of Communication Sciences Conference / 1984.04a / pp.18-21 / 1984
In this paper, tree coding using the (M, L) multi-path search algorithm has been investigated. A hybrid adaptation scheme that employs block adaptation as well as sequential adaptation is described for application in the quantization and compression of speech signals. Simulation results with the hybrid adaptation scheme indicate that relatively good speech quality can be obtained at a rate of about 8 kbps. All necessary parameters, such as M, L, and the filter order, were found from simulation, and these parameters turned out to be a good compromise between complexity and overall performance.

A 4800 BPS LPC Vocoder with Improved Excitation (개선된 여기신호의 4800BPS LPC 보코우터)

은종관;성원용
- The Journal of the Acoustical Society of Korea / v.1 no.1 / pp.54-59 / 1982
We present an improved 4800 bps LPC vocoder system that virtually eliminates the buzzy effect from synthetic speech. The excitation signal in the new system is formed by adding high-pass filtered pitch pulses or random noise to a baseband residual signal that has been coded by pitch-predictive PCM. Since the baseband residual is used as part of the excitation, the system is also robust to V/UV and pitch errors. According to our informal listening tests, the synthetic speech of the new system does not have the buzzy effect. As a result, the vocoder speech quality is more natural than that of a conventional LPC vocoder.
