• Title/Abstract/Keyword: speech communication

Evaluation Performance of Speech Coder in Speech Signal Processing

  • Lee, Kwang-Seok
    • Journal of information and communication convergence engineering
    • /
    • v.5 no.2
    • /
    • pp.177-180
    • /
    • 2007
  • We compared the CS-ACELP and QCELP speech coders in a CDMA cellular system under channel-error conditions and measured their performance experimentally. We also identified an effective coding scheme for overcoming channel errors. The CS-ACELP speech coder, which uses an LSP vector quantizer, achieves transparent speech quality: the SD is 0.92 dB and outlier frames over 2 dB account for 2.9% of frames at a BER of 0.10%. The CS-ACELP speech coder, which uses an MA predictor, shows better SVR and SEGSNR results than the QCELP speech coder (IS-96), which adopts a DPCM-type predictor, when bit errors occur at BERs from 0.01% to 0.50%.
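
The abstract above reports SEGSNR without defining it. As a rough illustration only, segmental SNR is usually computed as the mean of per-frame SNR values in dB. In the sketch below, the 160-sample frame (20 ms at 8 kHz) and the clamping range of -10 to 35 dB are common conventions assumed here, not values taken from the paper, and `segmental_snr` is a hypothetical helper name.

```python
import math


def segmental_snr(reference, decoded, frame_len=160,
                  floor_db=-10.0, ceil_db=35.0):
    """Mean per-frame SNR in dB between a reference and a decoded signal.

    frame_len, floor_db and ceil_db are assumed conventions, not values
    from the paper.
    """
    n = min(len(reference), len(decoded))
    frame_values = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal_energy == 0.0:
            continue  # skip silent frames, they carry no SNR information
        if noise_energy == 0.0:
            snr_db = ceil_db  # error-free frame, report the upper clamp
        else:
            snr_db = 10.0 * math.log10(signal_energy / noise_energy)
        # Clamp so that a few extreme frames do not dominate the mean.
        frame_values.append(max(floor_db, min(ceil_db, snr_db)))
    if not frame_values:
        raise ValueError("no usable frames")
    return sum(frame_values) / len(frame_values)


if __name__ == "__main__":
    ref = [1.0] * 320
    dec = [0.9] * 320
    print(round(segmental_snr(ref, dec), 3))
```

Because the measure averages in the dB domain frame by frame, low-energy frames count as much as loud ones, which is why it is often preferred over a single whole-utterance SNR for coder comparisons.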

SPEECH ENHANCEMENT BY FREQUENCY-WEIGHTED BLOCK LMS ALGORITHM

  • Cho, D.H.
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1985.10a
    • /
    • pp.87-94
    • /
    • 1985
  • In this paper, enhancement of speech corrupted by additive white or colored noise is studied. The unconstrained frequency-domain block least-mean-square (UFBLMS) adaptation algorithm and its frequency-weighted version are newly applied to speech enhancement. For speech degraded by white noise, the UFBLMS algorithm outperforms the spectral subtraction method and the Wiener filtering technique by more than 3 dB in segmented frequency-weighted signal-to-noise ratio (FWSNRSEG) when the SNR of the speech is in the range of 0 to 10 dB. For speech corrupted by colored noise, the UFBLMS algorithm outperforms the spectral subtraction method by about 3 to 5 dB in FWSNRSEG. It also performs about 2 dB better in FWSNR and FWSNRSEG than the time-domain least-mean-square (TLMS) adaptive prediction filter (APF). In view of the computational complexity and the improvement in speech quality and intelligibility, the frequency-weighted UFBLMS algorithm appears to yield the best performance among the various algorithms for enhancing noisy speech corrupted by white or colored noise.

Performance Evaluation of Frame Erasure Concealment Algorithms in VoIP Coders

  • Han, Seung-Ho;Moon, Kwang;Han, Min-Soo
    • Proceedings of the KSPS conference
    • /
    • 2004.05a
    • /
    • pp.235-238
    • /
    • 2004
  • Frame erasures cause speech quality degradation in wireless communication networks and packet networks. The degradation becomes worse when consecutive frame erasures occur. Speech coders have a frame erasure concealment (FEC) mechanism to compensate for frame erasures. It is therefore meaningful to evaluate the performance of FEC mechanisms for the frame erasures that occur in communication networks. In this paper, various frame erasure conditions are designed, and the FEC algorithms of speech coders are evaluated and analyzed with the Perceptual Evaluation of Speech Quality (PESQ). It is found that performance varies with frame erasure type, frame erasure rate, and utterance length.

Applying Mobile Agent for Internet-based Distributed Speech Recognition

  • Saaim, Emrul Hamide Md;Alias, Mohamad Ashari;Ahmad, Abdul Manan;Ahmad, Jamal Nasir
    • Proceedings of the Institute of Control, Robotics and Systems Conference
    • /
    • 2005.06a
    • /
    • pp.134-138
    • /
    • 2005
  • Several applications have been developed for internet-based speech recognition. Internet-based speech recognition is a distributed application, and various techniques and methods have been used for that purpose. Currently, the client-server paradigm is one of the popular techniques used for communication in web applications. However, there is a new paradigm with the same purpose: mobile agent technology. Mobile agent technology has several advantages in distributed internet-based systems. This paper presents the application of mobile agent technology to internet-based speech recognition that is based on a client-server processing architecture.

Two-Microphone Generalized Sidelobe Canceller with Post-Filter Based Speech Enhancement in Composite Noise

  • Park, Jinsoo;Kim, Wooil;Han, David K.;Ko, Hanseok
    • ETRI Journal
    • /
    • v.38 no.2
    • /
    • pp.366-375
    • /
    • 2016
  • This paper describes an algorithm to suppress composite noise in a two-microphone speech enhancement system for robust hands-free speech communication. The proposed algorithm has four stages. The first stage estimates the power spectral density of the residual stationary noise, which is based on the detection of nonstationary signal-dominant time-frequency bins (TFBs) at the generalized sidelobe canceller output. Second, speech-dominant TFBs are identified among the previously detected nonstationary signal-dominant TFBs, and power spectral densities of speech and residual nonstationary noise are estimated. In the final stage, the bin-wise output signal-to-noise ratio is obtained with these power estimates and a Wiener post-filter is constructed to attenuate the residual noise. Compared to the conventional beamforming and post-filter algorithms, the proposed speech enhancement algorithm shows significant performance improvement in terms of perceptual evaluation of speech quality.
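
The final stage described above builds a Wiener post-filter from a bin-wise SNR. A minimal sketch of that last step, using the textbook Wiener gain G = SNR / (1 + SNR), might look as follows. The paper's actual estimators are not given in the abstract, so the three PSD inputs are assumed to be available already, and the function name `wiener_gains` and the `gain_floor` parameter are hypothetical additions for illustration.

```python
def wiener_gains(speech_psd, stationary_noise_psd, nonstationary_noise_psd,
                 gain_floor=0.1):
    """Per-bin Wiener gains from speech and residual-noise power estimates.

    gain_floor is an assumed safeguard against over-suppression; it is not
    taken from the paper.
    """
    gains = []
    for s, n_stat, n_nonstat in zip(speech_psd, stationary_noise_psd,
                                    nonstationary_noise_psd):
        noise = n_stat + n_nonstat  # composite residual noise power
        if noise <= 0.0:
            gains.append(1.0)  # nothing to attenuate in this bin
            continue
        snr = max(s, 0.0) / noise  # bin-wise output SNR (linear scale)
        gains.append(max(gain_floor, snr / (1.0 + snr)))
    return gains


if __name__ == "__main__":
    print(wiener_gains([1.0, 0.0, 3.0], [0.5, 1.0, 1.0], [0.5, 0.0, 0.0]))
```

Each gain would then multiply the corresponding time-frequency bin of the generalized sidelobe canceller output before resynthesis.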

Adaptive Kernel Function of SVM for Improving Speech/Music Classification of 3GPP2 SMV

  • Lim, Chung-Soo;Chang, Joon-Hyuk
    • ETRI Journal
    • /
    • v.33 no.6
    • /
    • pp.871-879
    • /
    • 2011
  • Because a wide variety of multimedia services are provided through personal wireless communication devices, the demand for efficient bandwidth utilization becomes stronger. This demand naturally results in the introduction of the variable bitrate speech coding concept. One exemplary work is the selectable mode vocoder (SMV) that supports speech/music classification. However, because it has severe limitations in its classification performance, a couple of works to improve speech/music classification by introducing support vector machines (SVMs) have been proposed. While these approaches significantly improved classification accuracy, they did not consider correlations commonly found in speech and music frames. In this paper, we propose a novel and orthogonal approach to improve the speech/music classification of SMV codec by adaptively tuning SVMs based on interframe correlations. According to the experimental results, the proposed algorithm yields improved results in classifying speech and music within the SMV framework.

Analysis of Mobile Application Trends for Speech and Language Therapy of Children with Disabilities in Korea

  • Lee, Youngmee;Lee, Soobok;Sung, Minkyoung
    • Phonetics and Speech Sciences
    • /
    • v.7 no.3
    • /
    • pp.153-163
    • /
    • 2015
  • This study investigated trends in mobile applications developed to promote speech and language skills in children with disabilities, and analyzed the functions and content of these applications as tools for speech and language therapy. For this analysis, twenty of 71 applications were selected according to the exclusion criteria. These applications were classified into 8 content usage types, and their functions were analyzed using the revised mobile contents evaluation standard (ease of use, value of education, interest level, and interactivity). As a result, applications for augmentative and alternative communication were found to have been developed far more than any other type. Ease of use received the highest score, whereas interest level received the lowest score in the overall evaluation. The results of this study suggest a way to evaluate applications for speech-language therapy and may contribute to developing the content and functions of mobile applications that aim to help children with disabilities improve their speech and language skills.

Audio/Speech Codec Using Variable Delay MDCT/IMDCT

  • Sangkil Lee;In-Sung Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.16 no.2
    • /
    • pp.69-76
    • /
    • 2023
  • A high-quality audio/voice codec using the MDCT/IMDCT process can perfectly restore the current frame through an overlap-add process with the previous frame. The overlap-add process introduces an algorithmic delay equal to the frame length. In this paper, we propose an MDCT/IMDCT process that reduces the algorithmic delay by using a variable phase shift. A low-delay audio/speech codec is obtained by applying the low-delay MDCT/IMDCT algorithm to the ITU-T standard codec G.729.1. The algorithmic delay in the MDCT/IMDCT process can be reduced from 20 ms to 1.25 ms. The decoded output signal of the audio/speech codec with the low-delay MDCT/IMDCT is evaluated with the PESQ test, an objective quality measure. Despite the reduction in transmission delay, it was confirmed that there is no difference in sound quality from the conventional method.
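
The overlap-add reconstruction mentioned above can be illustrated with the conventional MDCT/IMDCT pair. The sketch below is the standard fixed-delay transform with a sine window, not the variable-phase-shift variant the paper proposes, and the function names and the frame length `N = 8` are assumptions chosen for brevity. It shows why each frame can only be completed once the next block has arrived.

```python
import math


def sine_window(N):
    # Satisfies the Princen-Bradley condition w[n]^2 + w[n + N]^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(block, window):
    """Windowed MDCT: 2N input samples to N coefficients."""
    N = len(block) // 2
    return [sum(window[n] * block[n]
                * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs, window):
    """Windowed IMDCT: N coefficients to 2N time-aliased samples."""
    N = len(coeffs)
    return [window[n] * (2.0 / N)
            * sum(coeffs[k]
                  * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                  for k in range(N))
            for n in range(2 * N)]


def analysis_synthesis(signal, N):
    """Run MDCT then IMDCT with 50% overlap-add; N is assumed even."""
    window = sine_window(N)
    tail = N + (-len(signal)) % N  # zero padding so every sample overlaps
    padded = [0.0] * N + list(signal) + [0.0] * tail
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        y = imdct(mdct(padded[start:start + 2 * N], window), window)
        for n in range(2 * N):
            out[start + n] += y[n]  # aliasing cancels between neighbours
    return out[N:N + len(signal)]


if __name__ == "__main__":
    x = [math.sin(0.3 * i) for i in range(37)]
    y = analysis_synthesis(x, 8)
    print(max(abs(a - b) for a, b in zip(x, y)))
```

Each block's output contains time-domain aliasing that only cancels when the second half of one block is added to the first half of the next, which is the source of the frame-length delay the abstract refers to.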

Audio Fingerprint Retrieval Method Based on Feature Dimension Reduction and Feature Combination

  • Zhang, Qiu-yu;Xu, Fu-jiu;Bai, Jian
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.2
    • /
    • pp.522-539
    • /
    • 2021
  • To address the problems of existing audio fingerprint methods when extracting fingerprints from long speech segments, such as excessively large fingerprint dimension, poor robustness, and low retrieval accuracy and efficiency, a robust audio fingerprint retrieval method based on feature dimension reduction and feature combination is proposed. Firstly, the Mel-frequency cepstral coefficients (MFCC) and linear prediction cepstrum coefficients (LPCC) of the original speech are extracted, and the MFCC feature matrix and LPCC feature matrix are combined. Secondly, a feature dimension reduction method based on information entropy is used for column dimension reduction, and the resulting feature matrix then undergoes row dimension reduction using an energy-based feature dimension reduction method. Finally, the audio fingerprint is constructed from the combined feature matrix after dimension reduction. When a user retrieves speech, the normalized Hamming distance algorithm is used for matching. Experimental results show that the proposed method has a smaller audio fingerprint dimension and better robustness for long speech segments, and has higher retrieval efficiency while maintaining a higher recall rate and precision rate.