• Title/Summary/Keyword: Speech and audio coding

An Audio Coding Technique Employing the Inter-channel Phase Difference Skip (채널 간 위상차 파라미터 생략 기법을 이용한 오디오 부호화)

  • Kim, Hyun-Hwi;Kim, Rin-Chul
    • Journal of Broadcast Engineering / v.21 no.3 / pp.369-379 / 2016
  • This paper presents an efficient method for skipping inter-channel phase differences (IPDs) in the MPEG Surround stage of unified speech and audio coding (USAC). Based on psychoacoustic sensitivity to the IPD, we estimate an IPD threshold below which degradation of the spatial cue is not noticeable. We propose an IPD skip method in which any IPD within the threshold is set to zero and not transmitted. The proposed method saves about 38% of the bits spent on IPD. Nevertheless, in the MUSHRA test it shows no noticeable degradation in decoded audio quality.
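
The skip rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-band list layout, and the threshold value of 0.2 rad are assumptions chosen only for illustration, and the abstract does not state how the actual threshold is derived per band.

```python
import math

def wrap_phase(phi):
    """Wrap a phase angle in radians to the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi

def ipd_skip(ipds, threshold):
    """Apply a threshold-based skip to a list of per-band IPDs.

    Returns (values, transmit_flags). An IPD whose wrapped magnitude is
    within the threshold is replaced by zero and flagged as not transmitted.
    The threshold is a hypothetical input, not a value from the paper.
    """
    values, flags = [], []
    for phi in ipds:
        wrapped = wrap_phase(phi)
        if abs(wrapped) <= threshold:
            values.append(0.0)    # decoder assumes zero phase difference
            flags.append(False)   # no IPD bits spent on this band
        else:
            values.append(wrapped)
            flags.append(True)
    return values, flags

# Hypothetical per-band IPDs in radians.
band_ipds = [0.05, -0.4, 1.2, 0.1, -0.02, 3.0]
values, flags = ipd_skip(band_ipds, threshold=0.2)
```

The fraction of `False` flags gives a rough idea of how many bands stop carrying IPD data for a given threshold, although the real bit saving also depends on the entropy coding of the remaining values.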

Multi Mode Harmonic Transform Coding for Speech and Music

  • Kim, Jonghark;Shin, Jae-Hyun;Lee, Insung
    • The Journal of the Acoustical Society of Korea / v.22 no.3E / pp.101-109 / 2003
  • A multi-mode harmonic transform coding (MMHTC) scheme for speech and music signals is proposed. It is structured as a linear prediction model driven by a harmonic and transform-based excitation. The proposed coder also uses harmonic prediction and an improved quantizer for the excitation signal. To quantize the excitation of music signals efficiently, the modulated lapped transform (MLT) is introduced. In other words, the coder combines time-domain (linear prediction) and frequency-domain techniques to achieve the best perceptual quality. At a bit rate of 4 kbps, the proposed coder showed better speech quality than the 8 kbps QCELP coder.

Audio/Speech Codec Using Variable Delay MDCT/IMDCT (가변 지연 MDCT/IMDCT를 이용한 오디오/음성 코덱)

  • Sangkil Lee;In-Sung Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.2 / pp.69-76 / 2023
  • A high-quality audio/voice codec based on the MDCT/IMDCT can perfectly reconstruct the current frame through overlap-add with the previous frame, but the overlap-add introduces an algorithmic delay equal to the frame length. This paper proposes an MDCT/IMDCT process that reduces this delay by using a variable phase shift, and builds a low-delay audio/speech codec by applying it to the ITU-T standard codec G.729.1. The algorithmic delay of the MDCT/IMDCT process can be reduced from 20 ms to 1.25 ms. The decoded output of the codec with the low-delay MDCT/IMDCT is evaluated with PESQ, an objective quality measure. Despite the reduction in delay, the results confirmed no difference in sound quality from the conventional method.
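
The overlap-add reconstruction that this abstract takes as its starting point can be sketched as follows. This shows only the conventional MDCT/IMDCT with a sine window, not the variable-phase-shift variant proposed in the paper, whose details the abstract does not give. The function names, the direct O(N²) evaluation, and the zero padding at the signal edges are assumptions made for illustration.

```python
import math

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(frame, N):
    """Direct MDCT: 2N windowed samples -> N coefficients."""
    return [
        sum(frame[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]

def imdct(coeffs, N):
    """Direct IMDCT: N coefficients -> 2N time-aliased samples."""
    return [
        (2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                        for k in range(N))
        for n in range(2 * N)
    ]

def analysis_synthesis(signal, N):
    """Run MDCT -> IMDCT with 50% overlap-add.

    len(signal) must be a multiple of N. Each output block of N samples
    needs two consecutive frames, which is the source of the one-frame delay.
    """
    w = sine_window(N)
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        frame = [padded[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(frame, N), N)
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]   # synthesis window + overlap-add
    return out[N:N + len(signal)]

signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(32)]
restored = analysis_synthesis(signal, N=8)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
```

The time-domain aliasing in each frame cancels only after the next frame is added, so a block cannot be output until the following frame has been decoded.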

Preprocessing method for enhancing digital audio quality in speech communication system (음성통신망에서 디지털 오디오 신호 음질개선을 위한 전처리방법)

  • Song Geun-Bae;Ahn Chul-Yong;Kim Jae-Bum;Park Ho-Chong;Kim Austin
    • Journal of Broadcast Engineering / v.11 no.2 s.31 / pp.200-206 / 2006
  • This paper presents a preprocessing method that modifies the input audio signal of a speech coder so that the signal obtained at the decoder is enhanced. For this purpose, we introduce a noise suppression (NS) scheme and adaptive gain control (AGC), in which the audio input and its coding error are treated as a noisy signal and a noise, respectively. The coding error is suppressed from the input, and the suppressed input is then level-aligned to the original input by the AGC. Consequently, the preprocessing redistributes the spectral energy of the music input across the spectral domain so that the preprocessed music can be coded more effectively by the subsequent coder. As a side effect, the procedure needs an additional encoding pass to calculate the coding error. However, it provides a generalized formulation applicable to many existing speech coders. Preference listening tests indicated that the proposed approach significantly enhances perceived music quality.

Study on Noise Filling algorithm of Unified Speech and Audio Coding (통합 음성/오디오 부호화기의 Noise Filling 알고리즘에 대한 연구)

  • Song, Jeongook;Kang, Hong-Goo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2012.07a / pp.260-261 / 2012
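  • This paper proposes a method for setting the noise level according to the degree of sound quality distortion in the encoding process of the Noise Filling tool used in Unified Speech and Audio Coding (USAC). USAC is the latest unified speech/audio codec standardized by the Moving Picture Experts Group (MPEG) and offers the best performance among existing codecs. However, because only the decoder is standardized, sound quality varies with how the encoder is designed. JAME, a project currently under way on an open-source basis, includes several methods to overcome this quality gap and maximize the performance of the core encoder technologies used in USAC. Among them, Noise Filling is a method that perceptually improves sound quality in low bit-rate coding by inserting noise at a certain level into spectral components that are not quantized. The proposed Noise Filling encoding method reflects the degree of sound quality distortion in the current frame, allowing noise-like signal components to be coded more precisely.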
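
The general idea of noise filling can be sketched as below. This is a generic illustration, not the USAC bitstream syntax and not the distortion-adaptive level selection proposed in the paper: the uniform quantizer, the RMS-based level estimate, and all function names are assumptions.

```python
import math
import random

def quantize(spectrum, step):
    """Uniform scalar quantization of spectral coefficients to integer indices."""
    return [round(c / step) for c in spectrum]

def estimate_noise_level(spectrum, indices):
    """Encoder side: RMS of the original coefficients that were quantized to zero."""
    zeroed = [c for c, q in zip(spectrum, indices) if q == 0]
    if not zeroed:
        return 0.0
    return math.sqrt(sum(c * c for c in zeroed) / len(zeroed))

def decode_with_noise_filling(indices, step, noise_level, seed=0):
    """Decoder side: replace zero-quantized bins with random-sign noise."""
    rng = random.Random(seed)
    decoded = []
    for q in indices:
        if q == 0:
            decoded.append(noise_level * rng.choice((-1.0, 1.0)))
        else:
            decoded.append(q * step)
    return decoded

# Hypothetical spectrum with a coarse quantizer step.
spectrum = [4.1, 0.3, -0.2, 2.9, 0.1, -0.4]
indices = quantize(spectrum, step=1.0)
level = estimate_noise_level(spectrum, indices)
decoded = decode_with_noise_filling(indices, step=1.0, noise_level=level)
```

Only the single level value is sent for the zeroed bins, which is why the way the encoder chooses it matters for perceived quality.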

Implementation of a High-Quality Audio Collaboration System Over IP Networks (IP 네트워크 기반 고품질 오디오 협업 시스템)

  • Kang, Jin-Ah;Kim, Hong-Kook
    • 한국HCI학회:학술대회논문집 / 2008.02a / pp.218-223 / 2008
  • In this paper, we implement several methods to improve an audio collaboration system over IP networks and evaluate their performance. In general, speech and audio quality degrades with IP network characteristics such as jitter and packet loss. To reduce this degradation, we propose a lower bit rate audio delivery scheme using the MPEG-2 AAC (Advanced Audio Coding) audio codec, on the premise that a smaller packet size could reduce the packet loss rate. In addition, iLBC (Internet Low-Bitrate Codec) and the G.711 packet loss concealment algorithm, defined by the IETF and ITU-T, respectively, are applied to the audio collaboration system. RAT (Robust-Audio Tool)[7] is used as the baseline platform for implementing the proposed methods. The implementation shows that the MPEG-2 AAC audio codec at a bit rate of 256 kbit/s performs comparably to uncompressed audio at 512 kbit/s, and that iLBC and the G.711 packet loss concealment algorithm can improve speech quality at packet loss rates of 2 to 10%.

Feature Parameter Extraction and Analysis in the Wavelet Domain for Discrimination of Music and Speech (음악과 음성 판별을 위한 웨이브렛 영역에서의 특징 파라미터)

  • Kim, Jung-Min;Bae, Keun-Sung
    • MALSORI / no.61 / pp.63-74 / 2007
  • Discriminating music from speech in a multimedia signal is an important task in audio coding and broadcast monitoring systems. This paper deals with feature parameter extraction for music and speech discrimination. The wavelet transform is a multi-resolution analysis method that is useful for analyzing the temporal and spectral properties of non-stationary signals such as speech and audio. We propose new feature parameters extracted from the wavelet-transformed signal. First, wavelet coefficients are obtained on a frame-by-frame basis, with the analysis frame size set to 20 ms. A parameter $E_{sum}$ is then defined by adding the differences of magnitude between adjacent wavelet coefficients in each scale. The maximum and minimum values of $E_{sum}$ over a period of 2 seconds, which corresponds to the discrimination duration, are used as the feature parameters. To evaluate the proposed feature parameters, discrimination accuracy is measured for various types of music and speech signals. In the experiment, every 2-second segment is classified as music or speech, and about 93% of the music and speech segments are detected correctly.
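
One reading of the $E_{sum}$ feature can be sketched as follows. Several points are assumptions rather than details from the abstract: the Haar wavelet, the number of scales, taking the absolute value of each magnitude difference, and all function names.

```python
import math

def haar_details(frame, levels):
    """Return the detail coefficients of each scale of an orthonormal Haar DWT."""
    approx = list(frame)
    details = []
    s = math.sqrt(2.0)
    for _ in range(levels):
        if len(approx) < 2:
            break
        pairs = len(approx) // 2
        details.append([(approx[2 * i] - approx[2 * i + 1]) / s for i in range(pairs)])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / s for i in range(pairs)]
    return details

def e_sum(frame, levels=3):
    """Sum, over all scales, of |(|c[i+1]| - |c[i]|)| for adjacent coefficients."""
    total = 0.0
    for coeffs in haar_details(frame, levels):
        for a, b in zip(coeffs, coeffs[1:]):
            total += abs(abs(b) - abs(a))
    return total

def segment_features(frames, levels=3):
    """Maximum and minimum E_sum over the frames of one decision segment."""
    values = [e_sum(f, levels) for f in frames]
    return max(values), min(values)

# With 20 ms frames, a 2-second segment holds 100 frames; two toy frames here.
frames = [[1.0] * 8, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
feat_max, feat_min = segment_features(frames, levels=1)
```

The resulting pair of values would then feed whatever classifier separates music from speech; the abstract does not say which one is used.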

STRUCTURED CODEWORD SEARCH FOR VECTOR QUANTIZATION (백터양자화가의 구조적 코더 찾기)

  • 우홍체
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2000.11a / pp.467-470 / 2000
  • Vector quantization (VQ) is widely used in many high-quality and high-rate data compression applications such as speech coding, audio coding, image coding and video coding. When the size of a VQ codebook is large, the computational complexity for the full codeword search method is a significant problem for many applications. A number of complexity reduction algorithms have been proposed and investigated using such properties of the codebook as the triangle inequality. This paper proposes a new structured VQ search algorithm that is based on a multi-stage structure for searching for the best codeword. Even using only two stages, a significant complexity reduction can be obtained without any loss of quality.
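
The triangle-inequality pruning mentioned in this abstract as existing work can be sketched as below. This is not the multi-stage search the paper proposes, which the abstract does not describe in detail; the function names and the random toy codebook are assumptions. A codeword c_j is skipped when d(c_best, c_j) >= 2 d(x, c_best), because the triangle inequality then guarantees d(x, c_j) >= d(x, c_best).

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def full_search(x, codebook):
    """Exhaustive nearest-codeword search."""
    return min(range(len(codebook)), key=lambda i: dist(x, codebook[i]))

def build_distance_table(codebook):
    """Precompute all codeword-to-codeword distances (done once, offline)."""
    return [[dist(a, b) for b in codebook] for a in codebook]

def pruned_search(x, codebook, table, start=0):
    """Nearest-codeword search with triangle-inequality elimination.

    Returns (best_index, number_of_distance_evaluations).
    """
    best = start
    best_d = dist(x, codebook[best])
    evaluated = 1
    for j in range(len(codebook)):
        if j == start:
            continue
        if table[best][j] >= 2.0 * best_d:
            continue  # c_j cannot be closer to x than the current best
        d = dist(x, codebook[j])
        evaluated += 1
        if d < best_d:
            best, best_d = j, d
    return best, evaluated

# Toy codebook of 32 four-dimensional codewords.
rng = random.Random(0)
codebook = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(32)]
table = build_distance_table(codebook)
x = [rng.uniform(-1.0, 1.0) for _ in range(4)]
index, evaluated = pruned_search(x, codebook, table)
```

The pruning is exact, so the saving comes only from skipped distance computations, and it depends heavily on how good the starting codeword is.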

Modified Generic Mode Coding Scheme for Enhanced Sound Quality of G.718 SWB (G.718 초광대역 코덱의 음질 향상을 위한 개선된 Generic Mode Coding 방법)

  • Cho, Keun-Seok;Jeong, Sang-Bae
    • Phonetics and Speech Sciences / v.4 no.3 / pp.119-125 / 2012
  • This paper describes a new algorithm for encoding spectral shape and envelope in the generic mode of G.718 super-wideband (SWB). In the G.718 SWB coder, generic mode coding and sinusoidal enhancement are used to quantize modified discrete cosine transform (MDCT)-based parameters in the high-frequency band. In the generic mode, the high-frequency band is divided into sub-bands, and for each sub-band the best match under the selected similarity criterion is searched for in the coded and envelope-normalized wideband content. To improve the quantization scheme in the high-frequency region of speech/audio signals, a modified generic mode that improves on the generic mode of G.718 SWB is proposed. The proposed generic mode uses perceptual vector quantization of spectral envelopes and an increased resolution for spectral copy. The performance of the proposed algorithm is evaluated in terms of objective quality. Experimental results show that the proposed algorithm significantly improves sound quality.

Enhancement of SBR for Speech Signal Using Adaptive Noise Floor Level (가변 잡음 레벨을 이용한 음성신호에 대한 SBR 성능 항상 기술)

  • Lee, Se-Won;Oh, Seoung-Jun;Ahn, Chang-Beom;Lee, Tae-Jin;Kang, Kyoung-Ok;Park, Ho-Chong
    • The Journal of the Acoustical Society of Korea / v.28 no.2 / pp.148-154 / 2009
  • In audio coding, SBR technology synthesizes the high bands using patched time-frequency information from the low bands together with correction parameters. Since SBR transmits only the correction parameters for the high bands, it provides low-rate coding of the high bands and is used as a core module of MPEG-4 HE-AAC. SBR was originally designed for audio signals, and its performance tends to decrease for speech signals; the major reason is an excessive noise floor in the high bands caused by incorrect tonality computation. In this paper, a new method that determines the noise floor level adaptively according to the speech characteristics is proposed to solve this problem of SBR for speech signals. The proposed method maintains compatibility with standard SBR, and the subjective performance evaluation shows that it improves SBR performance compared with standard SBR, especially for male speech.