• Title/Summary/Keyword: speech quality

Perceptual weighting on English lexical stress by Korean learners of English

  • Goun Lee
    • Phonetics and Speech Sciences
    • /
    • v.14 no.4
    • /
    • pp.19-24
    • /
    • 2022
  • This study examined which acoustic cue(s) Korean learners of English weight in perceiving English lexical stress. We manipulated segmental and suprasegmental cues in 5 steps in the first and second syllables of the English stress minimal pair "object". A total of 27 subjects (14 native speakers of English and 13 Korean L2 learners) participated in the English stress judgment task. The results revealed that native Korean listeners used the F0 and intensity cues in identifying English stress and weighted vowel quality most strongly, as native English listeners did. These results indicate that Korean learners' experience with these cues in L1 prosody can help them attend to these cues in their L2 perception. However, L2 learners' perceptual attention is not entirely predicted by their linguistic experience with specific acoustic cues in their native language.

QoS-, Energy- and Cost-efficient Resource Allocation for Cloud-based Interactive TV Applications

  • Kulupana, Gosala;Talagala, Dumidu S.;Arachchi, Hemantha Kodikara;Fernando, Anil
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.6 no.3
    • /
    • pp.158-167
    • /
    • 2017
  • Internet-based social and interactive video applications have become major constituents of the envisaged applications for next-generation multimedia networks. However, inherently dynamic network conditions, together with varying user expectations, pose many challenges for resource allocation mechanisms for such applications. Yet, in addition to addressing these challenges, service providers must also consider how to mitigate their operational costs (e.g., energy costs, equipment costs) while satisfying the end-user quality of service (QoS) expectations. This paper proposes a heuristic solution to the problem, where the energy incurred by the applications, and the monetary costs associated with the service infrastructure, are minimized while simultaneously maximizing the average end-user QoS. We evaluate the performance of the proposed solution in terms of serving probability, i.e., the likelihood of being able to allocate resources to groups of users, the computation time of the resource allocation process, and the adaptability and sensitivity to dynamic network conditions. The proposed method demonstrates improvements in serving probability of up to 27%, in comparison with greedy resource allocation schemes, and a several-orders-of-magnitude reduction in computation time, compared to the linear programming approach, which significantly reduces the service-interrupted user percentage when operating under variable network conditions.

Design of a 4kb/s ACELP Codec Using the Generalized AbS Principle (Generalized AbS 구조를 이용한 4kb/s ACELP 음성 부호화기의 설계)

  • 성호상;강상원
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.7
    • /
    • pp.33-38
    • /
    • 1999
  • In this paper, we combine a generalized analysis-by-synthesis (AbS) structure with an algebraic excitation scheme to propose a new 4 kb/s speech codec. The codec partly reuses the structure of G.729. We design a line spectrum pair (LSP) quantizer, an adaptive codebook, and an excitation codebook to fit the 4 kb/s bit rate. The codec has a 25 ms algorithmic delay, corresponding to a 20 ms frame and a 5 ms lookahead. At bit rates below 4 kb/s, most CELP speech codecs based on the AbS principle suffer from rapid degradation of speech quality. To overcome this drawback, we use the generalized AbS structure, which is efficient for low-bit-rate speech coding. LP coefficients are converted to LSPs and quantized using a predictive two-stage VQ. A low-complexity algebraic codebook that uses a shifting method provides the fixed-codebook excitation, and the gains of the adaptive and fixed codebooks are vector-quantized. To evaluate the proposed codec, A-B preference tests were conducted against the fixed-rate 8 kb/s QCELP. The tests show that the performance of the proposed codec is similar to that of the fixed-rate 8 kb/s QCELP.

A Systematic Review on Voice Characteristics and Risk Factors of Voice Disorder of Korea Teachers (우리나라 교사의 음성 특성과 음성장애 위험 요인에 관한 체계적 문헌고찰)

  • Cha, Seulki;Byeon, Haewon
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.8
    • /
    • pp.149-154
    • /
    • 2018
  • As the range of professional voice users expands, interest in voice is increasing as well. Because teachers are an occupational group at high risk of voice disorders, it is necessary to identify the causes of their speech problems and speech disorders. The purpose of this study is to analyze the voice characteristics of teachers and to investigate the causes of voice disorders. A search covering 2000 to 2018 with a combined set of the search terms 'profession', 'Teacher', 'Professional Voice User', 'Voice', 'Voice disorders', and 'Risk' found 414 studies, of which 8 were selected for the final analysis. The qualitative evaluation was carried out with a modified quality checklist for assessing the risk of bias. The study confirmed that voice misuse frequently occurred when teachers used their voices and that this was affected by the environment. These results suggest that improving the environment that leads teachers to abuse their voices, together with consistent voice education, is necessary.

Noise Cancellation using Microphone Array in Digital Hearing Aids (디지털 보청기에서 마이크로폰 어레이를 이용한 잡음제거)

  • Bang, Dong-Hyeouck;Kil, Se-Kee;Kang, Hyun-Deok;Yoon, Gwang-Sub;Lee, Sang-Min
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.58 no.4
    • /
    • pp.857-866
    • /
    • 2009
  • This paper proposes a noise cancellation method using a microphone array for digital hearing aids. The microphone array is placed around the ear of a dummy. Speech is played from a speaker positioned in front of the dummy, and noise is played from a speaker behind it. The speech and noise mix in the air and enter the microphones. A voice activity detector (VAD) and adaptive noise cancellation (ANC) were used to remove noise from the microphone signals. Ten two-syllable words and four sentences were used as speech signals, and babble and car interior noise were used as noise signals. The performance of the proposed algorithm was evaluated by signal-to-noise ratio (SNR) and PESQ-MOS (perceptual evaluation of speech quality, mean opinion score). In the babble noise condition, SNR improved by $7.963 \pm 1.3620$ dB for words and $3.968 \pm 0.6659$ dB for sentences. In the car interior noise condition, SNR improved by $10.512 \pm 2.0665$ dB for words and $6.000 \pm 1.7642$ dB for sentences. PESQ-MOS improved by $0.1722 \pm 0.0861$ for words and $0.083 \pm 0.0417$ for sentences in babble noise, and by $0.2661 \pm 0.0335$ for words and $0.040 \pm 0.0201$ for sentences in car interior noise. These results verify that the proposed algorithm performs well in noise cancellation with a microphone array for digital hearing aids.
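
The abstract does not say which adaptive algorithm drives the ANC stage, so the following is only a minimal sketch that assumes a normalized LMS (NLMS) filter and a noise-only reference signal. The VAD stage is omitted, and the function names, the filter length, the step size, and the synthetic signals are all illustrative assumptions rather than the authors' setup.

```python
import math
import random


def nlms_cancel(primary, reference, taps=8, mu=0.1, eps=1e-8):
    """Remove from `primary` whatever is linearly predictable from `reference`."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate  # the error signal is the cleaned output
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def snr_db(clean, observed):
    """SNR in dB, treating (observed - clean) as the noise."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((o - s) ** 2 for s, o in zip(clean, observed))
    return 10.0 * math.log10(signal_power / noise_power)


def demo(n=4000, seed=0):
    """Return (SNR before, SNR after) on synthetic stand-in signals."""
    rng = random.Random(seed)
    # Assumed stand-ins: a sine for the front speech, white noise for the rear speaker.
    speech = [math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # The noise reaches the primary microphone through a short, made-up acoustic path.
    primary = [speech[i] + 0.8 * noise[i] + (0.3 * noise[i - 1] if i else 0.0)
               for i in range(n)]
    cleaned = nlms_cancel(primary, noise)
    half = n // 2  # measure after the filter has had time to converge
    before = snr_db(speech[half:], primary[half:])
    after = snr_db(speech[half:], cleaned[half:])
    return before, after
```

Calling `demo()` returns the SNR of the second half of the signal before and after cancellation. In a real microphone array the reference channel also picks up some speech, which is one reason a VAD is typically used to decide when the filter may adapt.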

Design of Channel Coding Combined with 2.4kbps EHSX Coder (2.4kbps EHSX 음성부호화기와 결합된 채널코딩 방법)

  • Lee, Chang-Hwan;Kim, Young-Joon;Lee, In-Sung
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.9
    • /
    • pp.88-96
    • /
    • 2010
  • We propose an efficient channel coding method combined with a 2.4 kbps speech coder. The channel coder has a code rate of 1/2, and the rate-1/2 convolutional code is obtained by puncturing a rate-1/3 convolutional code. The punctured convolutional coder is used for variable rate allocation. Puncturing is applied according to the importance of the source encoder's output data, which is analyzed by evaluating the bit error sensitivity of the speech parameter bits. The performance of the proposed coder is analyzed and simulated in Rayleigh fading and AWGN channels. Experimental results with the 2.4 kbps EHSX coder show that the variable-rate channel coding method yields better subjective speech quality than the non-variable channel coding method.
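
To illustrate how a rate-1/2 code can be derived from a rate-1/3 mother code by puncturing, here is a minimal sketch. The generator polynomials, the constraint length, and the puncturing matrix are hypothetical choices for illustration; the abstract does not give the ones used with the EHSX coder.

```python
from typing import List

# Hypothetical rate-1/3 mother code: constraint length 3, generators 7, 5, 7 (octal).
GENERATORS = (0b111, 0b101, 0b111)
CONSTRAINT_LENGTH = 3

# Hypothetical puncturing matrix with a period of 2 input bits.
# Rows are the three encoder outputs, columns are time steps; 1 keeps a bit, 0 drops it.
# 4 of every 6 coded bits survive, so 2 input bits map to 4 output bits (rate 1/2).
PUNCTURE = (
    (1, 1),
    (1, 0),
    (0, 1),
)


def encode_rate_one_third(bits: List[int]) -> List[List[int]]:
    """Return one group of three coded bits per input bit."""
    mask = (1 << CONSTRAINT_LENGTH) - 1
    state = 0
    groups = []
    for b in bits:
        state = ((state << 1) | b) & mask
        # Each output bit is the parity of the register taps selected by a generator.
        groups.append([bin(state & g).count("1") % 2 for g in GENERATORS])
    return groups


def puncture(groups: List[List[int]]) -> List[int]:
    """Drop coded bits according to PUNCTURE and serialize the rest."""
    period = len(PUNCTURE[0])
    kept = []
    for t, group in enumerate(groups):
        for row, bit in enumerate(group):
            if PUNCTURE[row][t % period]:
                kept.append(bit)
    return kept
```

For an input with an even number of bits, `puncture(encode_rate_one_third(bits))` has exactly twice as many bits as the input. A variable-rate scheme of the kind the abstract describes could apply a lighter puncturing matrix to error-sensitive bits and a heavier one to the rest, but that allocation is not specified here.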

Design of the LSF Parameter Quantizer for the Wideband Speech Codec (광대역 음성 부호화기용 선 스펙트럼 주파수 계수 양자화기 설계)

  • 지상현;강상원;윤병식
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.4
    • /
    • pp.29-34
    • /
    • 2001
  • In this paper, we design an LSF coefficient quantizer for a wideband speech codec that can provide high-quality speech service. To make the quantizer efficient, interframe correlation is exploited: LSF coefficients with high and low interframe correlation are quantized separately. A predictive pyramid vector quantizer (predictive PVQ) is used for the LSF coefficients with high interframe correlation, and a PVQ is used for those with low interframe correlation. Experiments show that the proposed LSF quantizer can quantize LSF information with 40 bits/frame, with an average spectral distortion (SD) of 1 dB and fewer than 3.87% of frames having an SD greater than 2 dB.
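
For reference, the spectral distortion of a single frame is commonly defined as the root-mean-square difference between the log power spectra of the unquantized and quantized LPC filters. The abstract does not state its exact definition or frequency range, so the symbols below ($P(f)$, $\hat{P}(f)$, $f_1$, $f_2$) are assumptions:

$$
\mathrm{SD} = \sqrt{\frac{1}{f_2 - f_1} \int_{f_1}^{f_2} \left[ 10 \log_{10} P(f) - 10 \log_{10} \hat{P}(f) \right]^2 \, df} \quad \text{[dB]}
$$

Here $P(f)$ is the power spectrum from the original LSF coefficients, $\hat{P}(f)$ is the one from the quantized coefficients, and $[f_1, f_2]$ is the evaluated band. The average SD is the mean of this value over all frames, and the outlier figure is the share of frames whose SD exceeds 2 dB.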

An Efficient Pitch Estimation for IMBE (Improved Multi-band Excitation) Speech Coder (개량형 다중대역 여기 (IMBE: Improved Multi-band Excitation) 음성 부호기의 피치 예측 개선)

  • Na, Hoon;Jeong, Dae-Gwon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.3
    • /
    • pp.34-41
    • /
    • 2001
  • In an IMBE (Improved Multi-band Excitation) speech coder, initial pitch estimation takes up most of the coder's total computing time because of its complex cost function and exhaustive search over candidate pitches. The use of future frames in initial pitch estimation also causes an inevitable time delay, which makes a real-time coder difficult to implement. Furthermore, pitch estimation is performed unnecessarily on unvoiced frames, just as on voiced frames. In this paper, each frame is classified as voiced or unvoiced by the Dyadic Wavelet Transform (DyWT), and initial pitch estimation is then performed only for voiced frames. Voiced and unvoiced frames are therefore handled differently, which reduces the time delay at the transmitter and receiver. Simulation results show that the relative complexity of initial pitch estimation is reduced by 23% and that the processing time decreases to between 1/10 and 1/11 of that of the IMBE coder, while speech quality is almost maintained.

Efficient Codebook Search Method for AMR Wideband Speech Codec (광대역 AMR 음성 압축기를 위한 효율적인 코드북 검색 방법)

  • 김윤희;박호종
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.4
    • /
    • pp.308-314
    • /
    • 2003
  • Wideband speech communications with 7 kHz bandwidth can provide high-quality speech services that are almost impossible with current narrowband speech communications with 3.4 kHz bandwidth, and the AMR wideband codec was recently developed for these services. The AMR wideband codec performs very well, owing to its wideband information and partly to its ACELP structure, but it has high computational complexity, especially in the codebook search. To solve this problem, this paper proposes an efficient codebook search method for the AMR wideband codec. The proposed method first determines a coarse initial codevector and then improves it by iteratively replacing a poor pulse in the codevector with a better one. Simulations show that the AMR wideband codec with the proposed codebook search method achieves higher performance at much lower computational cost than the conventional AMR wideband codec.
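
The two-step idea (coarse initial codevector, then iterative pulse replacement) can be sketched as follows. This is a simple greedy variant written for illustration, not the authors' procedure: it uses the usual ACELP criterion of squared correlation over energy, and the track layout, the toy impulse response, and all function names are assumptions.

```python
import random


def criterion(positions, signs, d, phi):
    """Squared correlation over energy for a signed-pulse codevector."""
    corr = sum(s * d[p] for p, s in zip(positions, signs))
    energy = sum(si * sj * phi[pi][pj]
                 for pi, si in zip(positions, signs)
                 for pj, sj in zip(positions, signs))
    return corr * corr / energy


def search(d, phi, tracks, max_passes=4):
    """Coarse initial pick per track, then greedy pulse replacement."""
    def sign_of(p):
        return 1 if d[p] >= 0 else -1

    # Coarse initial codevector: the position with the largest |d| in each track.
    positions = [max(track, key=lambda p: abs(d[p])) for track in tracks]
    signs = [sign_of(p) for p in positions]
    initial = best = criterion(positions, signs, d, phi)
    for _ in range(max_passes):
        improved = False
        for k, track in enumerate(tracks):
            for p in track:
                if p == positions[k]:
                    continue
                # Try moving pulse k to position p; keep the move only if it helps.
                trial_pos = positions[:k] + [p] + positions[k + 1:]
                trial_sign = signs[:k] + [sign_of(p)] + signs[k + 1:]
                q = criterion(trial_pos, trial_sign, d, phi)
                if q > best:
                    best, positions, signs = q, trial_pos, trial_sign
                    improved = True
        if not improved:
            break
    return positions, signs, initial, best


def make_problem(n=16, seed=1):
    """Build a toy correlation vector d, correlation matrix phi, and tracks."""
    rng = random.Random(seed)
    h = [0.9 ** i for i in range(n)]             # made-up impulse response
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # made-up target signal
    # Column i of the convolution matrix is h delayed by i samples.
    cols = [[h[m - i] if m >= i else 0.0 for m in range(n)] for i in range(n)]
    d = [sum(xm * cm for xm, cm in zip(x, cols[i])) for i in range(n)]
    phi = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(n)]
           for i in range(n)]
    tracks = [list(range(t, n, 4)) for t in range(4)]  # four interleaved tracks
    return d, phi, tracks
```

Running `search(*make_problem())` returns the final pulse positions and signs along with the criterion value before and after refinement. Because a replacement is accepted only when it raises the criterion, the refined value is never lower than the initial one.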

Feature Extraction by Optimizing the Cepstral Resolution of Frequency Sub-bands (주파수 부대역의 켑스트럼 해상도 최적화에 의한 특징추출)

  • 지상문;조훈영;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.1
    • /
    • pp.35-41
    • /
    • 2003
  • Feature vectors for conventional speech recognition are usually extracted from the full frequency band, so each sub-band contributes equally to the final recognition result. In this paper, feature vectors are extracted independently in each sub-band, and the cepstral resolution of each sub-band feature is controlled for optimal speech recognition. For this purpose, sub-band cepstral vectors of different dimensions are extracted using the multi-band approach, which extracts a feature vector independently for each sub-band. Speech recognition rate and clustering quality are proposed as criteria for finding the optimal combination of sub-band vector dimensions. In connected digit recognition experiments on the TIDIGITS database, the proposed method achieved a string accuracy of 99.125%, a percent correct of 99.775%, and a percent accuracy of 99.705%, which are error rate reductions of 38%, 32%, and 37%, respectively, relative to the baseline full-band feature vector.
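
As a worked reading of these figures, a relative error rate reduction compares the error left by the proposed method with the baseline error. The baseline value below is not reported in the abstract; it is only what the stated string accuracy and the 38% reduction imply, up to rounding, with $E_{\text{base}}$ and $E_{\text{prop}}$ as assumed symbols:

$$
\text{ERR} = \frac{E_{\text{base}} - E_{\text{prop}}}{E_{\text{base}}}, \qquad
E_{\text{prop}} = 100\% - 99.125\% = 0.875\%, \qquad
E_{\text{base}} \approx \frac{0.875\%}{1 - 0.38} \approx 1.41\%
$$

Under this reading the baseline string accuracy would be roughly 98.6%, so a gain of about half a percentage point in accuracy corresponds to removing more than a third of the remaining errors.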