• Title/Summary/Keyword: speech signal band


A Study on TSIUVC Approximate-Synthesis Method using Least Mean Square and Frequency Division (주파수 분할 및 최소 자승법을 이용한 TSIUVC 근사합성법에 관한 연구)

  • 이시우
    • Journal of Korea Multimedia Society / v.6 no.3 / pp.462-468 / 2003
  • In a speech coding system that uses voiced and unvoiced excitation sources, speech quality is distorted when voiced and unvoiced consonants coexist in a frame. I therefore propose a method for searching for and extracting the TSIUVC (Transition Segment Including Unvoiced Consonant) so that voiced and unvoiced consonants do not coexist in a frame. This paper presents a new TSIUVC approximate-synthesis method that uses the least mean square method and frequency band division. The method obtains high-quality approximate-synthesis waveforms within the TSIUVC by using frequency information below 0.547 kHz and above 2.813 kHz. Importantly, the maximum error signal can be produced with a low-distortion approximate-synthesis waveform within the TSIUVC. The method can be applied to a new Voiced/Silence/TSIUVC speech coding scheme, to speech analysis, and to speech synthesis.


A Study on the Frequency Scaling Methods Using LSP Parameters Distribution Characteristics (LSP 파라미터 분포특성을 이용한 주파수대역 조절법에 관한 연구)

  • 민소연;배명진
    • The Journal of the Acoustical Society of Korea / v.21 no.3 / pp.304-309 / 2002
  • We propose a method for reducing the computation of the real root method, which is mainly used in CELP (Code Excited Linear Prediction) vocoders. The real root method finds the real roots of the polynomial equations, when they exist, and transforms them into LSP (Line Spectrum Pairs) parameters. However, it is computationally expensive because the root search proceeds sequentially over the frequency region. To reduce its computation time, we compare the real root method with two proposed methods. The first uses a mel scale for the search frequency region, which is linear below 1 kHz and logarithmic above. The second orders the search frequency region and the search interval according to each coefficient's distribution. We made two measurements. First, we compared the positions of the transformed LSP parameters in the proposed methods with those of the real root method. Second, we measured how much the computation time was reduced. The experimental results show that both methods reduced the search time by about 47% on average without changing the LSP parameters.
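
The mel-scale search grid described in the first method can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 2595/700 mel formula is one common approximation and is an assumption here, and `mel_search_grid` is a hypothetical helper name.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    # Common mel approximation (assumed; the abstract does not give the mapping).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_search_grid(f_max_hz: float, n_points: int) -> list:
    """Candidate root-search frequencies (Hz), uniformly spaced on the mel scale."""
    mel_max = hz_to_mel(f_max_hz)
    return [mel_to_hz(mel_max * i / (n_points - 1)) for i in range(n_points)]

# Hypothetical example: 65 search points up to 4 kHz (8 kHz sampling).
grid = mel_search_grid(4000.0, 65)
```

Such a grid places search points densely at low frequencies and sparsely at high frequencies, so a sequential root search visits fewer points overall than a uniform grid with the same low-frequency resolution.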

Intelligibility Enhancement of Multimedia Contents Using Spectral Shaping (스펙트럼 성형기법을 이용한 멀티미디어 콘텐츠의 명료도 향상)

  • Ji, Youna;Park, Young-cheol;Hwang, Young-su
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.11 / pp.82-88 / 2016
  • In this paper, we propose an intelligibility enhancement algorithm for multimedia content using spectral shaping. Dialogue signals are essential for understanding the plot of audio-visual media such as movies and TV. However, non-dialogue components such as sound effects and background music often degrade dialogue clarity. To overcome this problem, this paper aims to improve the dialogue clarity of audio soundtracks, which contain important cues for the visual scenes. In the proposed method, the dialogue components are first detected by a soft mask based on speech presence probability (SPP), which is widely used in speech enhancement. The extracted dialogue signals are then passed to the spectral shaping method, which reallocates the spectro-temporal energy of speech to enhance intelligibility. The total energy is kept unchanged by a loudness normalization process to prevent saturation. The algorithm was evaluated on modeled and real movie soundtracks, and the results show that it enhances dialogue clarity while preserving the total audio power.
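
The energy-preserving step can be illustrated with a small sketch. It assumes that "loudness normalization" can be approximated by matching total signal energy, which is a simplification (perceptual loudness models are more involved); `renormalize_energy` is a hypothetical name, not taken from the paper.

```python
import math

def renormalize_energy(original, shaped):
    """Scale `shaped` so that its total energy equals that of `original`."""
    e_in = sum(x * x for x in original)
    e_out = sum(y * y for y in shaped)
    if e_out == 0.0:
        return list(shaped)  # nothing to scale
    gain = math.sqrt(e_in / e_out)
    return [gain * y for y in shaped]

# Toy example: the shaped frame is rescaled to the energy of the original frame.
normalized = renormalize_energy([3.0, 4.0], [1.0, 0.0])
```

Because the gain is applied after shaping, the spectral balance chosen by the shaping stage is kept while the overall level returns to that of the input.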

Automatic Phonetic Segmentation of Korean Speech Signal Using Phonetic-acoustic Transition Information (음소 음향학적 변화 정보를 이용한 한국어 음성신호의 자동 음소 분할)

  • 박창목;왕지남
    • The Journal of the Acoustical Society of Korea / v.20 no.8 / pp.24-30 / 2001
  • This article is concerned with automatic segmentation of Korean speech signals. All transition cases between phonetic units are classified into three types, and a different strategy is applied to each. Type 1 is the discrimination of silence, voiced speech, and unvoiced speech. Histogram analysis of indicators consisting of wavelet coefficients and the SVF (Spectral Variation Function) of the wavelet coefficients is used for type 1 segmentation. Type 2 is the discrimination of adjacent vowels. Vowel transitions can be characterized by the spectrogram. Given a phonetic transcription and transition-pattern spectrograms, speech signals containing consecutive vowels are automatically segmented by template matching. Type 3 is the discrimination of vowels and voiced consonants. The smoothed short-time RMS energy of the wavelet low-pass component and the SVF of the cepstral coefficients are adopted for type 3 segmentation. The experiment was performed on a set of 342 word utterances gathered from 6 speakers. The results show the validity of the method.

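
The smoothed short-time RMS energy used for type 3 segmentation can be sketched roughly as below. The wavelet low-pass stage is omitted, and the frame length, hop, and moving-average smoother are assumptions chosen for illustration rather than values from the article.

```python
import math

def short_time_rms(signal, frame_len, hop):
    """RMS energy of successive frames of `signal`."""
    rms = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms

def moving_average(values, width):
    """Centered moving average; edges use only the samples available."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

# Toy signal: a loud segment followed by silence.
energy = short_time_rms([1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0], 4, 4)
smoothed = moving_average(energy, 3)
```

A drop in the smoothed contour, as between the two frames here, is the kind of cue that could mark a boundary between a vowel and a weaker voiced consonant.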

Low Rate Speech Coding Using the Harmonic Coding Combined with CELP Coding (하모닉 코딩과 CELP방법을 이용한 저 전송률 음성 부호화 방법)

  • 김종학;이인성
    • The Journal of the Acoustical Society of Korea / v.19 no.3 / pp.26-34 / 2000
  • In this paper, we propose a 4 kbps speech coder that combines harmonic vector excitation coding with time-separated transition coding. Harmonic vector excitation coding uses harmonic excitation coding for voiced frames and vector excitation coding with an analysis-by-synthesis structure for unvoiced frames. However, this two-mode coding is not effective for transition frames that mix voiced and unvoiced signals, so a method beyond unvoiced/voiced mode coding is needed. We therefore designed a time-separated transition coding method for transition frames, in which a voiced/unvoiced decision algorithm separates the unvoiced and voiced durations within a frame, and harmonic-harmonic excitation coding or vector-harmonic excitation coding is selected depending on the U/V decision of the previous frame. In the decoder, the voiced excitation signals are generated efficiently through the inverse FFT of the harmonic magnitudes, and the unvoiced excitation signals are produced by inverse vector quantization. The reconstructed speech signal is synthesized by the overlap/add method.

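
The overlap/add synthesis mentioned at the end of the abstract can be sketched as follows. The window shape (periodic Hann) and the 50% overlap are assumptions for illustration; the abstract does not specify the coder's actual window or hop.

```python
import math

def hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def overlap_add(frames, hop):
    """Window each decoded frame and sum the frames at multiples of `hop`."""
    frame_len = len(frames[0])
    window = hann(frame_len)
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for k, frame in enumerate(frames):
        for i, x in enumerate(frame):
            out[k * hop + i] += window[i] * x
    return out

# Three constant frames of length 8 with hop 4 (50% overlap).
speech = overlap_add([[1.0] * 8, [1.0] * 8, [1.0] * 8], 4)
```

In the fully overlapped region the windowed frames add back to the original level, which is why frame boundaries do not produce audible discontinuities.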

An Implementation of Wavelet-based ISA Card for Audio Compression (음성 압축용 웨이브렛 변환 ISA 카드 구현)

  • 윤상인;백승현;황희융
    • Proceedings of the KAIS Fall Conference / 2000.10a / pp.203-207 / 2000
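  • We applied the wavelet transform, which has recently been studied extensively in signal processing, and implemented hardware capable of high-speed processing using the TMS320C31 DSP (Digital Signal Processor). The ISA bus was used to maintain a constant communication bandwidth with the computer and to avoid affecting other devices. This paper examines the differences between the wavelet transform and the Fourier transform, reviews filter banks, and describes the hardware implemented to perform the wavelet transform on the DSP.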
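
The filter-bank view of the wavelet transform can be illustrated with a one-level two-channel sketch. The Haar wavelet is assumed here purely for brevity; the abstract does not state which wavelet the card implements.

```python
import math

def haar_analysis(x):
    """Split an even-length signal into low-pass (approximation) and high-pass (detail) halves."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_synthesis(approx, detail):
    """Invert haar_analysis and recover the original samples."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out

approx, detail = haar_analysis([4.0, 2.0, 5.0, 5.0])
restored = haar_synthesis(approx, detail)
```

Compression schemes built on such a filter bank typically quantize or discard small detail coefficients, such as the zero detail produced by the flat pair in this example.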

The suppression of noise-induced speech distortions for speech recognition (음성인식을 위한 잡음하의 음성왜곡제거)

  • Chi, Sang-Mun;Oh, Yung-Hwan
    • Journal of the Korean Institute of Telematics and Electronics S / v.35S no.12 / pp.93-102 / 1998
  • In noisy environments, human speech production is influenced by noise (the Lombard effect), and speech signals are contaminated. These distortions dramatically reduce the performance of speech recognition systems. This paper proposes a method of Lombard effect compensation and noise suppression to improve speech recognition performance in noisy environments. To estimate the intensity of the Lombard effect, which is a nonlinear distortion that depends on the ambient noise level, the speaker, and the phonetic unit, we formulate a measure of the Lombard effect level based on the acoustic speech signal and use this measure to compensate for the Lombard effect. The distortions of speech in noisy environments are cancelled as follows. First, spectral subtraction and band-pass filtering are used to cancel the noise. Second, energy normalization is proposed to cancel the variation of vocal intensity caused by the Lombard effect. Finally, the Lombard effect level controls the transform that converts the Lombard speech cepstrum to the clean speech cepstrum. The proposed method was validated on a 50-word Korean recognition task. Average recognition rates were 82.6%, 95.7%, and 97.6% with the proposed method, compared with 46.3%, 75.5%, and 87.4% without any compensation, at SNRs of 0, 10, and 20 dB, respectively.

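
The spectral subtraction step can be sketched on a single frame of magnitude spectra. This is a generic textbook form under stated assumptions: the noise magnitude estimate is taken as given, and the spectral floor value is a hypothetical choice, not a parameter from the paper.

```python
def spectral_subtract(noisy_mag, noise_mag, floor=0.01):
    """Subtract a noise magnitude estimate per bin, keeping a small spectral floor."""
    return [max(s - n, floor * s) for s, n in zip(noisy_mag, noise_mag)]

# Three frequency bins; the middle bin would go negative and is floored instead.
cleaned = spectral_subtract([1.0, 0.5, 2.0], [0.2, 0.6, 0.5])
```

The floor keeps bins from becoming negative when the noise estimate exceeds the observed magnitude, a common way to limit the "musical noise" artifacts of plain subtraction.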

Real-Time Implementation of the EHSX Speech Coder Using a Floating Point DSP (부동 소수점 DSP를 이용한 4kbps EHSX 음성 부호화기의 실시간 구현)

  • 이인성;박동원;김정호
    • The Journal of the Acoustical Society of Korea / v.23 no.5 / pp.420-427 / 2004
  • This paper presents a real-time implementation of the 4 kbps EHSX (Enhanced Harmonic Stochastic Excitation) speech coder, which combines harmonic vector excitation coding with time-separated transition coding. Harmonic vector excitation coding uses harmonic excitation coding for voiced frames and vector excitation coding with an analysis-by-synthesis structure for unvoiced frames. For transition frames that mix voiced and unvoiced signals, we use time-separated transition coding. We present methods for optimizing the implementation of the speech coder on the TMS320C6701® DSP. To reduce the complexity for real-time implementation, we optimize the algorithm by replacing the complex sinusoidal synthesis method with an IFFT, and we apply fully pipelined hand assembly coding after converting the source from floating point to fixed point. To generate more efficient code, we also make use of the available TMS320C6701® resources, such as the Fastest67x library and memory organization.
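
Replacing sinusoidal synthesis with an inverse transform can be illustrated with the sketch below. It assumes every harmonic falls exactly on a DFT bin (harmonic h at bin h·k0), which a real coder would not generally guarantee, and it uses a naive inverse DFT in place of an optimized IFFT routine; the function names are hypothetical.

```python
import cmath
import math

def synth_sinusoids(amps, phases, k0, n):
    """Direct synthesis: sum of cosines, harmonic h at DFT bin h*k0 of an n-point frame."""
    return [sum(a * math.cos(2.0 * math.pi * (h + 1) * k0 * t / n + p)
                for h, (a, p) in enumerate(zip(amps, phases)))
            for t in range(n)]

def synth_idft(amps, phases, k0, n):
    """Same frame built by filling harmonic bins and taking an inverse DFT."""
    spec = [0j] * n
    for h, (a, p) in enumerate(zip(amps, phases)):
        k = (h + 1) * k0                      # must stay below n/2
        spec[k] += 0.5 * a * n * cmath.exp(1j * p)
        spec[n - k] += 0.5 * a * n * cmath.exp(-1j * p)  # conjugate-symmetric bin
    # Naive O(n^2) inverse DFT; an FFT routine would be used in practice.
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

direct = synth_sinusoids([1.0, 0.5], [0.0, 0.3], 2, 16)
via_idft = synth_idft([1.0, 0.5], [0.0, 0.3], 2, 16)
```

With an FFT in place of the naive transform, the cost no longer grows with the number of harmonics per sample, which is the motivation for the substitution.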

A Novel Speech Enhancement Based on Speech/Noise-dominant Decision in Time-frequency Domain (시간-주파수 영역에서 음성/잡음 우세 결정에 의한 새로운 잡음처리)

  • 윤석현;유창동
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.48-55 / 2001
  • A novel method for reducing additive non-stationary noise is proposed. The method requires neither information about the noise nor an estimate of the noise statistics from pause regions. The enhancement is performed on a band-by-band basis for each time frame. Based on both a decision on whether a particular band in a frame is speech-dominant or noise-dominant and the masking property of the human auditory system, an appropriate amount of noise is removed using spectral subtraction. The proposed method was tested under various noise conditions (car noise, F16 noise, white Gaussian noise, pink noise, tank noise, and babble noise). Comparison of segmental SNR with the spectral subtraction method, visual inspection of the enhanced spectrograms, and listening to the enhanced speech showed that the method effectively reduces various types of noise while minimizing distortion of the speech.

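
The segmental SNR used in the evaluation can be sketched as follows. The frame length, the skipping of silent frames, and the clamping range of -10 to 35 dB are conventional choices assumed here; the abstract does not give the settings actually used.

```python
import math

def segmental_snr(clean, processed, frame_len, lo_db=-10.0, hi_db=35.0):
    """Mean of per-frame SNRs in dB, each clamped to [lo_db, hi_db]."""
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        sig = sum(x * x for x in c)
        err = sum((x - y) ** 2 for x, y in zip(c, p))
        if sig == 0.0:
            continue  # skip silent frames
        snr = hi_db if err == 0.0 else 10.0 * math.log10(sig / err)
        snrs.append(min(max(snr, lo_db), hi_db))
    if not snrs:
        raise ValueError("no non-silent frames")
    return sum(snrs) / len(snrs)

# Toy example: a constant 10% amplitude error gives 20 dB in every frame.
score = segmental_snr([1.0] * 8, [0.9] * 8, 4)
```

Averaging in the dB domain frame by frame weights quiet speech segments as heavily as loud ones, which is why this measure is often preferred to a single global SNR for enhancement results.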

Performance analysis of audio super-resolution based on neural networks (신경망 기반 오디오 초 해상도 기술 성능 분석)

  • Lim, Wootaek;Beack, Seungkwon;Sung, Jongmo;Lee, Taejin
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2020.07a / pp.337-339 / 2020
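  • Audio super-resolution is a technique that restores or generates high-resolution audio from a low-resolution audio signal. The field was previously studied under names such as frequency bandwidth extension and artificial bandwidth extension, but, driven by recent advances in deep learning and research on image super-resolution, it is now mainly studied under the name audio super-resolution. This paper describes research trends in audio super-resolution and reports experiments using the MedleyDB music database rather than the speech databases mainly used in previous papers. The experiments were performed with 4-fold cross-validation, and the results confirm that the proposed audio super-resolution method, based on a convolutional neural network architecture, improves SNR by 3.41 dB relative to the low-resolution input audio.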
