• Title/Summary/Keyword: Speech Enhancement

Search Result 340, Processing Time 0.031 seconds

Voice quality distinctions of the three-way stop contrast under prosodic strengthening in Korean

  • Jiyoung Jang;Sahyang Kim;Taehong Cho
    • Phonetics and Speech Sciences
    • /
    • v.16 no.1
    • /
    • pp.17-24
    • /
    • 2024
  • The Korean three-way stop contrast (lenis, aspirated, fortis) is currently undergoing a sound change, such that the primary cue distinguishing lenis and aspirated stops is shifting from voice onset time (VOT) to F0. Despite recent discussions of this shift, research on voice quality, traditionally considered an additional cue signaling the contrast, remains sparse. This study investigated the extent to which the associated voice quality [as reflected in the acoustic measurements of H1*-H2*, H1*- A1*, and cepstral peak prominence (CPP)] contributes to the three-way stop contrast, and how the realization is conditioned by prominence- vs. boundary-induced prosodic strengthening amid the ongoing sound change. Results for 12 native Korean speakers indicate that there was a substantial distinction in voice quality among the three stop categories with the breathiness of the vowel being the greatest after the lenis, intermediate after the aspirated, and least after the fortis stops, indicating the role of voice quality in the maintenance of the three-way stop contrast. Furthermore, prosodic strengthening has different effects on the contrast and contributes to the enhancement of the phonological contrast contingent on whether it is induced by prominence or boundary.

Spoken-to-written text conversion for enhancement of Korean-English readability and machine translation

  • HyunJung Choi;Muyeol Choi;Seonhui Kim;Yohan Lim;Minkyu Lee;Seung Yun;Donghyun Kim;Sang Hun Kim
    • ETRI Journal
    • /
    • v.46 no.1
    • /
    • pp.127-136
    • /
    • 2024
  • The Korean language has written (formal) and spoken (phonetic) forms that differ in their application, which can lead to confusion, especially when dealing with numbers and embedded Western words and phrases. This fact makes it difficult to automate Korean speech recognition models due to the need for a complete transcription training dataset. Because such datasets are frequently constructed using broadcast audio and their accompanying transcriptions, they do not follow a discrete rule-based matching pattern. Furthermore, these mismatches are exacerbated over time due to changing tacit policies. To mitigate this problem, we introduce a data-driven Korean spoken-to-written transcription conversion technique that enhances the automatic conversion of numbers and Western phrases to improve automatic translation model performance.

Acoustic Feedback and Noise Cancellation of Hearing Aids by Deep Learning Algorithm (심층학습 알고리즘을 이용한 보청기의 음향궤환 및 잡음 제거)

  • Lee, Haeng-Woo
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.14 no.6
    • /
    • pp.1249-1256
    • /
    • 2019
  • In this paper, we propose a new algorithm to remove acoustic feedback and noise in hearing aids. Instead of using the conventional FIR structure, this algorithm is a deep learning algorithm using neural network adaptive prediction filter to improve the feedback and noise reduction performance. The feedback canceller first removes the feedback signal from the microphone signal and then removes the noise using the Wiener filter technique. Noise elimination is to estimate the speech from the speech signal containing noise using the linear prediction model according to the periodicity of the speech signal. In order to ensure stable convergence of two adaptive systems in a loop, coefficient updates of the feedback canceller and noise canceller are separated and converged using the residual error signal generated after the cancellation. In order to verify the performance of the feedback and noise canceller proposed in this study, a simulation program was written and simulated. Experimental results show that the proposed deep learning algorithm improves the signal to feedback ratio(: SFR) of about 10 dB in the feedback canceller and the signal to noise ratio enhancement(: SNRE) of about 3 dB in the noise canceller than the conventional FIR structure.

The Study on Speaker Change Verification Using SNR based weighted KL distance (SNR 기반 가중 KL 거리를 활용한 화자 변화 검증에 관한 연구)

  • Cho, Joon-Beom;Lee, Ji-eun;Lee, Kyong-Rok
    • Journal of Convergence for Information Technology
    • /
    • v.7 no.6
    • /
    • pp.159-166
    • /
    • 2017
  • In this paper, we have experimented to improve the verification performance of speaker change detection on broadcast news. It is to enhance the input noisy speech and to apply the KL distance $D_s$ using the SNR-based weighting function $w_m$. The basic experimental system is the verification system of speaker change using GMM-UBM based KL distance D(Experiment 0). Experiment 1 applies the input noisy speech enhancement using MMSE Log-STSA. Experiment 2 applies the new KL distance $D_s$ to the system of Experiment 1. Experiments were conducted under the condition of 0% MDR in order to prevent missing information of speaker change. The FAR of Experiment 0 was 71.5%. The FAR of Experiment 1 was 67.3%, which was 4.2% higher than that of Experiment 0. The FAR of experiment 2 was 60.7%, which was 10.8% higher than that of experiment 0.

Front-End Processing for Speech Recognition in the Telephone Network (전화망에서의 음성인식을 위한 전처리 연구)

  • Jun, Won-Suk;Shin, Won-Ho;Yang, Tae-Young;Kim, Weon-Goo;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.4
    • /
    • pp.57-63
    • /
    • 1997
  • In this paper, we study the efficient feature vector extraction method and front-end processing to improve the performance of the speech recognition system using KT(Korea Telecommunication) database collected through various telephone channels. First of all, we compare the recognition performances of the feature vectors known to be robust to noise and environmental variation and verify the performance enhancement of the recognition system using weighted cepstral distance measure methods. The experiment result shows that the recognition rate is increasedby using both PLP(Perceptual Linear Prediction) and MFCC(Mel Frequency Cepstral Coefficient) in comparison with LPC cepstrum used in KT recognition system. In cepstral distance measure, the weighted cepstral distance measure functions such as RPS(Root Power Sums) and BPL(Band-Pass Lifter) help the recognition enhancement. The application of the spectral subtraction method decrease the recognition rate because of the effect of distortion. However, RASTA(RelAtive SpecTrAl) processing, CMS(Cepstral Mean Subtraction) and SBR(Signal Bias Removal) enhance the recognition performance. Especially, the CMS method is simple but shows high recognition enhancement. Finally, the performances of the modified methods for the real-time implementation of CMS are compared and the improved method is suggested to prevent the performance degradation.

  • PDF

Preprocessing method for enhancing digital audio quality in speech communication system (음성통신망에서 디지털 오디오 신호 음질개선을 위한 전처리방법)

  • Song Geun-Bae;Ahn Chul-Yong;Kim Jae-Bum;Park Ho-Chong;Kim Austin
    • Journal of Broadcast Engineering
    • /
    • v.11 no.2 s.31
    • /
    • pp.200-206
    • /
    • 2006
  • This paper presents a preprocessing method to modify the input audio signals of a speech coder to obtain the finally enhanced signals at the decoder. For the purpose, we introduce the noise suppression (NS) scheme and the adaptive gain control (AGC) where an audio input and its coding error are considered as a noisy signal and a noise, respectively. The coding error is suppressed from the input and then the suppressed input is level aligned to the original input by the following AGC operation. Consequently, this preprocessing method makes the spectral energy of the music input redistributed all over the spectral domain so that the preprocessed music can be coded more effectively by the following coder. As an artifact, this procedure needs an additional encoding pass to calculate the coding error. However, it provides a generalized formulation applicable to a lot of existing speech coders. By preference listening tests, it was indicated that the proposed approach produces significant enhancements in the perceived music qualities.

Performance Improvements for Silence Feature Normalization Method by Using Filter Bank Energy Subtraction (필터 뱅크 에너지 차감을 이용한 묵음 특징 정규화 방법의 성능 향상)

  • Shen, Guanghu;Choi, Sook-Nam;Chung, Hyun-Yeol
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.35 no.7C
    • /
    • pp.604-610
    • /
    • 2010
  • In this paper we proposed FSFN (Filter bank sub-band energy subtraction based CLSFN) method to improve the recognition performance of the existing CLSFN (Cepstral distance and Log-energy based Silence Feature Normalization). The proposed FSFN reduces the energy of noise components in filter bank sub-band domain when extracting the features from speech data. This leads to extract the enhanced cepstral features and thus improves the accuracy of speech/silence classification using the enhanced cepstral features. Therefore, it can be expected to get improved performance comparing with the existing CLSFN. Experimental results conducted on Aurora 2.0 DB showed that our proposed FSFN method improves the averaged word accuracy of 2% comparing with the conventional CLSFN method, and FSFN combined with CMVN (Cepstral Mean and Variance Normalization) also showed the best recognition performance comparing with others.

A Study on the Realization of Wireless Home Network System Using High-performance Speech Recognition in Variable Position (가변위치 고음성인식 기술을 이용한 무선 홈 네트워크 시스템 구현에 관한 연구)

  • Yoon, Jun-Chul;Choi, Sang-Bang;Park, Chan-Sub;Kim, Se-Yong;Kim, Ki-Man;Kang, Suk-Youb
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.4
    • /
    • pp.991-998
    • /
    • 2010
  • In realization of wireless home network system using speech recognition in indoor voice recognition environment, background noise and reverberation are two main causes of digression in voice recognition system. In this study, the home network system resistant to reverberation and background noise using voice section detection method based on spectral entropy in indoor recognition environment is to be realized. Spectral subtraction can reduce the effect of reverberation and remove noise independent from voice signal by eliminating signal distorted by reverberation in spectrum. For effective spectral subtraction, the correct separation of voice section and silent section should be accompanied and for this, improvement of performance needs to be done, applying to voice section detection method based on entropy. In this study, experimental and indoor environment testing is carried out to figure out command recognition rate in indoor recognition environment. The test result shows that command recognition rate improved in static environment and reverberant room condition, using voice section detection method based on spectral entropy.

A study on adaptive noise cancellation for enhancement of digital speech articulation (디지털음성명료도 향상을 위한 적응형 잡음제거 기법에 관한 연구)

  • Kim, Soo-Yong;Jee, Suk-Kun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.5
    • /
    • pp.961-968
    • /
    • 2007
  • Today, we can use radio communication device anywhere-anytime. Sometimes, we use the device in acoustic noise environment. The acoustic noise makes many problems in communication system. In acoustic noise environment, speaker cannot send clear information to receiver, because the received signal includes both speech signal and noise signal. A digital filter is useful to remove noise to get desired signal. One of methods is the adaptive digital filter using the adaptive noise canceller that automatically adjust filter parameters. This thesis addresses articulation algorithms against actual acoustic noises by means of two adaptive filtering methods. One is the adaptive noise canceller with two input channels and another is the spectral subtraction filter with one input channel. The experimental result from the proposed filter shows that the adaptive noise canceller is useful to reduce the non-stationary noises, while the spectral amplitude filter is effective for stationary noises.

Delayless MDCT for Scalable Speech Codec (계층구조 음성 부호화기를 위한 지연 없는 MDCT 구조)

  • Sung, Ho-Sang;Park, Ho-Chong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.3
    • /
    • pp.102-108
    • /
    • 2007
  • A high-Performance scalable speech codec generally requires a very low-rate first layer and a fine granule second layer, and this codec can be implemented with the harmonic codec and the MDCT-based transform codec for each layer. In this structure, however. each codec requires independent frequency transform and the time delay of each codec is accumulated. resulting in long time delay for the overall codec. In this paper, new MDCT structure in the second layer is Proposed. where MDCT is forced to share the look-ahead region of the first layer in order to prevent the time delay accumulation and the resulting functional error of MDCT is analyzed and removed after IMDCT The Proposed delayless MDCT requires no additional bits and Provides the equivalent coding performance with the reduced time delay, yielding a meaningful enhancement of the overall codec.