• Title/Summary/Keyword: speech quality

Developing a Low Power BWE Technique Based on the AMR Coder (AMR 기반 저 전력 인공 대역 확장 기술 개발)

  • Koo, Bon-Kang;Park, Hee-Wan;Ju, Yeon-Jae;Kang, Sang-Won
    • The Journal of the Acoustical Society of Korea / v.30 no.4 / pp.190-196 / 2011
  • Bandwidth extension is a technique for improving speech quality and intelligibility by extending 300-3400 Hz narrowband speech to 50-7000 Hz wideband speech. This paper designs an artificial bandwidth extension (ABE) module embedded in the AMR (adaptive multi-rate) decoder, which reduces the LPC/LSP analysis and the algorithmic delay of the ABE module. We also introduce a fast-search codebook mapping method for ABE and design a low-power BWE technique based on the AMR decoder. Compared to the traditional DTE (decode then extend) method, the proposed ABE method reduces computational complexity by 28% and algorithmic delay by 20 ms. We also introduce a weighted classified codebook mapping method for constructing the spectral envelope of the wideband speech signal.
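
The codebook mapping idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain nearest-neighbour search over paired narrowband and wideband codebooks; the function names and the toy codebook values are hypothetical, and the paper's fast-search and weighted classified variants are not reproduced here.

```python
# Toy illustration of codebook mapping for bandwidth extension.
# Both codebooks hold made-up values, not data from the paper.

def nearest_index(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for i, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def map_to_wideband(nb_vector, nb_codebook, wb_codebook):
    """Look up the wideband envelope paired with the best-matching narrowband codeword."""
    return wb_codebook[nearest_index(nb_vector, nb_codebook)]


# Hypothetical paired codebooks: entry i of each describes the same class of sound.
NB_CODEBOOK = [[0.1, 0.2], [0.5, 0.6], [0.9, 0.8]]
WB_CODEBOOK = [[0.1, 0.2, 0.3], [0.5, 0.6, 0.4], [0.9, 0.8, 0.7]]

print(map_to_wideband([0.48, 0.61], NB_CODEBOOK, WB_CODEBOOK))
```

An exhaustive search like this costs one distance computation per codeword, which is the kind of cost a fast-search method would aim to reduce.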

Acoustic Characteristics on the Adolescent Period Aged from 16 to 18 Years (16~18세 청소년기 음성의 음향음성학적 특성)

  • Ko, Hye-Ju;Kang, Min-Jae;Kwon, Hyuk-Jae;Choi, Yaelin;Lee, Mi-Geum;Choi, Hong-Shik
    • Phonetics and Speech Sciences / v.5 no.1 / pp.81-90 / 2013
  • During adolescence, the mutational period is characterized by changes in the laryngeal structure, the length of the vocal cords, and the tone of voice. Adolescents usually reach an adult voice at 15 or 16, but the mutational period is sometimes delayed. Studies on the voices of adolescents aged 16 to 18, right after the mutational period, are therefore required. Accordingly, this paper attempted to provide basic normative data for patients with voice disorders in this age range by evaluating the vocal characteristics of males and females aged 16 to 18 with an objective device and comparing them by sex and age. The study was conducted on a total of 60 subjects, with 10 subjects in each sex and age group. The vocal analysis consisted of MPT (Maximum Phonation Time) measurement, sustained vowels, and sentence reading. For the sustained vowel /a/, fundamental frequency ($F_0$), jitter, shimmer, and noise-to-harmonic ratio (NHR) were measured using the Multi-Dimensional Voice Program (MDVP) in the Multi-Speech program of the Computerized Speech Lab (Kay Elemetrics). For sentence reading, mean $F_0$, maximum $F_0$, and minimum $F_0$ were measured using the Real-Time Pitch (RTP) Model 5121 in the Multi-Speech program of the Computerized Speech Lab (Kay Elemetrics). By sex, there were statistically significant differences in $F_0$, jitter, shimmer, mean $F_0$, maximum $F_0$, and minimum $F_0$; by age, there were statistically significant differences in MPT. In conclusion, the voices of adolescents aged 16 to 18 had reached adult maturity levels, but voice quality, which can be assessed on a voice-disorder scale, was still in transition to that of an adult during the mutational period.

Acoustic Features of Oral Vowels in the Esophagus Speakers (식도음성의 모음종류에 따른 음향학적 특성)

  • Yun, Eunmi;Mok, Eunhee;Minh, Phan huu Ngoc;Hong, Kihwan
    • Phonetics and Speech Sciences / v.7 no.4 / pp.85-92 / 2015
  • This study aimed to establish voice and speech characteristics of esophageal speech through fundamental frequency analysis. Eight esophageal speakers were selected, along with 10 other subjects as a control group. MDVP (Multi-Dimensional Voice Program, Model 4800, USA, 2001) and Multi-Speech (Model 3700, KayPENTAX, USA, 2008) were used as experimental equipment. The speech samples selected for evaluation were vowels and sentences (both declarative and interrogative). For acoustic analysis, F0, jitter, energy, shimmer, HNR, and the intonation patterns of the speech samples were measured. The results were as follows. First, the fundamental frequency of sustained vowels in the esophageal speech group was lower than in the normal group. In particular, the frequency difference for the high vowel /i/ was much greater than for the low vowel /a/. Second, the jitter values of the esophageal speech group were higher than those of the control group. In particular, there was a large difference between the jitter values for /a/ and /i/, with jitter being highest for /i/. Third, there was no significant difference in vocal intensity between the esophageal speech group and the control group. Fourth, the shimmer values of the esophageal speech group were higher than those of the control group. In particular, there was a large difference in shimmer values for the low vowel /a/. Fifth, the HNR values of the esophageal speech group were significantly lower than those of the control group. In particular, the largest difference in HNR values between the two groups was for the high vowel /i/. Sixth, the pitch contours of interrogative and declarative sentences in the esophageal speech group either took a different form or differed only slightly from those of the normal group, thus presenting an inconsistent pattern.

Real-time Implementation of Speech and Channel Coder on a DSP Chip for Radio Communication System (무선통신 적용을 위한 단일 DSP칩상의 음성/채널 부호화기 실시간 구현)

  • Kim Jae-Won;Sohn Dong-Chul
    • Journal of the Korea Institute of Information and Communication Engineering / v.9 no.6 / pp.1195-1201 / 2005
  • This paper deals with the procedures and results of a real-time implementation of the G.729 speech coder and a channel coder, including a convolutional codec, Viterbi decoder, and interleaver, on a fixed-point DSP chip for radio communication systems. We describe the method for real-time implementation based on integer simulation results and explain the implemented results in terms of quality performance and the complexity required for real-time operation. The required complexity was 24 MIPS and 9 MIPS in computational load, and 12K words and 4K words in execution code length, for the speech and channel coders, respectively. The functional evaluation was performed in two steps: the first was a bit-exact comparison with fixed-point C code, and the second used actual speech samples and error test vectors. Unlike other results based on separate implementations, we implemented both the speech and channel coders on a single DSP chip with 160 MIPS of computation capability and 64K words of on-chip memory. These results improve on conventional methods in terms of system complexity and implementation cost for radio communication systems.
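
The resource figures quoted in this abstract can be put in proportion with a short calculation. This is only a back-of-the-envelope sketch: the constant and function names are illustrative, and it assumes that the code length can be compared directly against the 64K words of on-chip memory, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract.
SPEECH_MIPS, CHANNEL_MIPS = 24, 9        # computational load per coder
SPEECH_KWORDS, CHANNEL_KWORDS = 12, 4    # execution code length per coder
DSP_MIPS, DSP_KWORDS = 160, 64           # capacity of the DSP chip


def utilization(used, capacity):
    """Return `used` as a percentage of `capacity`."""
    return 100.0 * used / capacity


total_mips = SPEECH_MIPS + CHANNEL_MIPS          # 33 MIPS
total_kwords = SPEECH_KWORDS + CHANNEL_KWORDS    # 16K words

cpu_pct = utilization(total_mips, DSP_MIPS)
# Assumption: code memory is drawn from the same 64K-word on-chip memory.
mem_pct = utilization(total_kwords, DSP_KWORDS)

print(f"CPU load: {total_mips} of {DSP_MIPS} MIPS ({cpu_pct:.1f}%)")
print(f"Code size: {total_kwords}K of {DSP_KWORDS}K words ({mem_pct:.1f}%)")
```

Under these assumptions both coders together use roughly a fifth of the processing capacity and a quarter of the memory, which is consistent with the claim that they fit on one chip.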

Low-Complexity Speech Enhancement Algorithm Based on IMCRA Algorithm for Hearing Aids (보청기를 위한 IMCRA 기반 저연산 음성 향상 알고리즘)

  • Jeon, Yuyong;Lee, Sangmin
    • Journal of rehabilitation welfare engineering & assistive technology / v.11 no.4 / pp.363-370 / 2017
  • In this paper, we propose a low-complexity speech enhancement algorithm based on improved minima controlled recursive averaging (IMCRA) and log minimum mean square error (logMMSE) estimation. The IMCRA algorithm tracks the minimum of the input power within buffers in a local window and identifies speech presence using the ratio between the input power and its minimum. This process requires a large number of operations. To reduce the number of operations in the IMCRA algorithm, the minimum is tracked using time-varying, frequency-dependent smoothing based on the speech presence probability. The proposed algorithm enhanced speech quality by 2.778%, 3.481%, 2.980%, and 2.162% at SNRs of 0, 5, 10, and 15 dB, respectively, and reduced computational complexity by 9.570% on average.
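
To make "time-varying, frequency-dependent smoothing based on speech presence probability" concrete, here is a minimal sketch of the recursive-averaging step used in MCRA-family noise estimators, where the smoothing factor in each frequency bin moves toward 1 as the speech presence probability rises. It illustrates that general recursion only, not the paper's minimum tracker; the function name and the default `alpha=0.95` are assumptions.

```python
def update_noise_estimate(noise_psd, frame_psd, speech_prob, alpha=0.95):
    """One recursive-averaging step, computed independently per frequency bin.

    The effective smoothing factor is alpha_tilde = alpha + (1 - alpha) * p,
    so a bin with p = 1 (speech certainly present) keeps its old estimate,
    and a bin with p = 0 is smoothed toward the new frame power with factor alpha.
    """
    updated = []
    for noise, power, p in zip(noise_psd, frame_psd, speech_prob):
        alpha_tilde = alpha + (1.0 - alpha) * p
        updated.append(alpha_tilde * noise + (1.0 - alpha_tilde) * power)
    return updated


# Hypothetical three-bin example: bin 0 is noise only, bin 2 is clearly speech.
previous = [1.0, 1.0, 1.0]
current_frame = [1.2, 3.0, 9.0]
probabilities = [0.0, 0.5, 1.0]
print(update_noise_estimate(previous, current_frame, probabilities))
```

Because the update needs only one multiply-accumulate per bin and no buffer of past minima, a recursion of this kind is cheaper than a windowed minimum search.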

A Real-time Implementation of G.729.1 Codec on an ARM Processor for the Improvement of VoWiFi Voice Quality (VoWiFi 음질 향상을 위한 G.729.1 광대역 코덱의 ARM 프로세서에의 실시간 구현)

  • Park, Nam-In;Kang, Jin-Ah;Kim, Hong-Kook
    • 한국HCI학회:학술대회논문집 / 2008.02a / pp.230-235 / 2008
  • This paper addresses issues associated with the real-time implementation of a wideband speech codec, ITU-T G.729.1, on an ARM processor in order to improve the voice quality of a VoWiFi service. The real-time implementation consists of optimizing the C source code of G.729.1 and replacing several parts of the codec algorithm with faster ones. The performance of the implementation is measured by the CPU time spent on G.729.1 on the ARM926EJ processor used in a VoWiFi phone. The experiments show that the G.729.1 codec works in real time with better voice quality than the G.729 codec conventionally used for VoIP or VoWiFi phones.

Voice hygiene habits and the characteristics of Korean Voice-Related Quality of Life (K-VRQOL) among classical singers (성악가의 음성위생 습관과 한국어판 음성관련 삶의 질(K-VRQOL) 특성)

  • Kang, Haneul;Kim, Seonhee;Yoo, Jeayeon
    • Phonetics and Speech Sciences / v.10 no.3 / pp.49-59 / 2018
  • The purpose of this study was to investigate vocal hygiene habits and Korean Voice-Related Quality of Life (K-VRQOL) among classical singers. A total of 128 classical singers completed an online voice and K-VRQOL questionnaire, and the results were analyzed. To investigate the characteristics of K-VRQOL according to age group and the presence or absence of a history of voice problems, we conducted a two-way ANOVA. The results are as follows. Of the 128 classical singers, 28 (21.9%) with a history of voice problems said that excessive conversation, singing practice, and yelling were the causes of their voice problems. The symptoms of voice problems were fatigue, loss of range, hoarseness, and breathiness. In addition, classical singers were less likely to smoke or to drink alcohol or caffeine. The K-VRQOL was highly correlated with all sub-domains. There was a statistically significant difference according to age group (p<.05) and history of voice problems (p<.01). There was no correlation between age groups and history of voice problems. Voice management is important because classical singers can ruin their voice by speaking, and the risk of voice disorder is high. Voice problems affect quality of life. Future studies need to obtain information on the subjective voice characteristics of classical singers by examining the relationship between their voice hygiene habits and VHI, SVHI, and K-VRQOL.

Acoustic Analysis of Voice Change According to Extent of Thyroidectomy (갑상선 수술범위에 따른 음성의 음향적 분석)

  • Kang, Young Ae;Koo, Bon Seok
    • Phonetics and Speech Sciences / v.7 no.4 / pp.77-83 / 2015
  • Voice complications can occur after thyroidectomy even without laryngeal nerve injury. The purpose of this study is to investigate voice changes according to the extent of thyroidectomy using acoustic analysis. Thirty-five female patients with papillary thyroid carcinoma underwent voice evaluation before thyroidectomy and at 1 month and 3 months after it. The acoustic analysis parameters were speaking fundamental frequency (SFF), min $F_0$, max $F_0$, dynamic range $F_0$, jitter, shimmer, noise-to-harmonic ratio (NHR), and cepstral peak prominence (CPP). Repeated-measures analysis of variance was applied. Time-related voice changes showed significant differences in all parameters except NHR. At 1 month after surgery, voice quality was worse and pitch had decreased, but voice quality and pitch had improved at the 3-month follow-up. Voice changes according to the extent of surgery appeared in SFF, max $F_0$, and dynamic range $F_0$. A time-by-surgery interaction in voice change existed only in min $F_0$. The results showed that the severity of voice complications depended on the extent of thyroidectomy, which had a negative impact on $F_0$-related parameters. The deterioration of voice quality at 1 month after thyroidectomy may be affected by the loss of thyroid hormone in the blood. The decrease in $F_0$-related parameters may be caused by laryngeal fixation due to surgical-site adhesion.

The Stability and Variability based on Vowels in Voice Quality Analysis (음질 분석에 있어서 모음에 따른 안정성과 변이성)

  • Choi, Seong Hee;Choi, Chul-Hee
    • Phonetics and Speech Sciences / v.7 no.1 / pp.79-86 / 2015
  • This study explored the vowel effect on acoustic perturbation measures in voice quality analysis. The perturbation parameters (%jitter, %shimmer) and a noise parameter (SNR) were measured for 7 Korean vowels (/a/, /ɛ/, /i/, /o/, /u/, /ɯ/, /ʌ/) using CSpeech in 50 normal young Korean adults (24 males and 26 females). A significant vowel effect was found only in %shimmer; in particular, the low-back vowel /a/ differed significantly from the other vowels in %shimmer. The least perturbation and the least noise were exhibited by the high-back vowel /ɯ/ and the vowel /o/, respectively. With respect to tongue height, low vowels showed significantly higher %shimmer than high vowels. In addition, back vowels (tongue advancement) and rounded vowels (lip rounding) showed significantly less perturbation and noise. The least within-individual variability of perturbation and noise across three repeated measures was found for the vowel /i/. However, there was no significant difference among the 3 token measures within a single session for any vowel tested except /o/. Consequently, the vowel /a/ commonly used in acoustic perturbation measures exhibited higher perturbation and noise, whereas higher stability and less variability were demonstrated by the high-back vowel /u/. These results suggest that the Korean high-back vowel /u/ may be more appropriate and reliable for acoustic perturbation measures.
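
For readers unfamiliar with %jitter and %shimmer, the sketch below uses one common "local" definition: the mean absolute difference between consecutive cycles, expressed as a percentage of the mean. Applied to cycle periods it gives a jitter figure, and applied to cycle peak amplitudes a shimmer figure. This is an assumed textbook-style definition with made-up sample values; the exact algorithms in CSpeech or MDVP may differ.

```python
def percent_perturbation(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical measurements from a sustained vowel.
periods_ms = [8.00, 8.05, 7.98, 8.02, 8.01]        # glottal cycle durations
peak_amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]   # per-cycle peak amplitudes

print(f"%jitter:  {percent_perturbation(periods_ms):.2f}")
print(f"%shimmer: {percent_perturbation(peak_amplitudes):.2f}")
```

Both measures are zero for a perfectly periodic signal and grow as cycle-to-cycle irregularity increases.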

The Performance Improvement of PLC by Using RTP Extension Header Data for Consecutive Frame Loss Condition in CELP Type Vocoder (CELP Type Vocoder에서 RTP 확장 헤더 데이터를 이용한 연속적인 프레임 손실에 대한 PLC 성능개선)

  • Hong, Seong-Hoon;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.29 no.1 / pp.48-55 / 2010
  • Speech quality degrades, especially when consecutive packet loss occurs, even if a vocoder implemented in a packet network has its own packet loss concealment (PLC) algorithm. PLC algorithms are divided into transmitter-based and receiver-based algorithms. A transmitter-based algorithm gives superior quality by sending additional information. However, it cannot provide mutual compatibility, and it incurs extra delay and a higher transmission rate. A receiver-based method does not require additional delay, but it is limited in how much it can improve speech quality. In this paper, we propose a new method that places extra information for PLC in the otherwise unused Extension Header Data field of the RTP header. It can solve these problems and obtain enhanced speech quality. The proposed algorithm introduces no extra delay because the receiver already has a jitter buffer to adjust for network delay. The extra information, 16 bits per frame for G.729 PLC, is allocated to the MA filter index in LP synthesis, the excitation signal, the excitation signal gain, and residual gain reconstruction. This allocation is possible because the transmitter sends speech data every 20 ms when it transfers the RTP payload. As a result, the proposed method improves performance by about 13.5%.
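
The cost of the 16 extra bits can be estimated with simple arithmetic. The abstract does not say whether "frame" means a 10 ms G.729 frame or a 20 ms RTP packet, so this sketch computes both readings as labelled assumptions; the function names are illustrative, and the overhead is compared only against the 8 kbit/s G.729 payload rate, ignoring RTP/UDP/IP headers.

```python
G729_BITRATE_BPS = 8000  # G.729 core rate: 80 bits per 10 ms frame


def overhead_bps(extra_bits, interval_ms):
    """Bit rate added by sending `extra_bits` once every `interval_ms` milliseconds."""
    return extra_bits * 1000.0 / interval_ms


def overhead_percent(extra_bits, interval_ms, codec_bps=G729_BITRATE_BPS):
    """Added bit rate as a percentage of the codec payload rate."""
    return 100.0 * overhead_bps(extra_bits, interval_ms) / codec_bps


# Assumption A: 16 bits per 20 ms RTP packet.
print(overhead_bps(16, 20), overhead_percent(16, 20))
# Assumption B: 16 bits per 10 ms G.729 frame.
print(overhead_bps(16, 10), overhead_percent(16, 10))
```

Under either reading the side information is small next to the packet headers that every RTP packet already carries.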