• Title/Summary/Keyword: speech parameter

A Temporal Decomposition Method Based on a Rate-distortion Criterion (비트율-왜곡 기반 음성 신호 시간축 분할)

  • 이기승
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.3
    • /
    • pp.315-322
    • /
    • 2002
  • In this paper, a new temporal decomposition method is proposed, which takes into consideration not only spectral distortion but also bit rate. The interpolation functions, which are one of the necessary parameters for temporal decomposition, are obtained from a training speech corpus. Since the interval between two targets uniquely defines the interpolation function, the interpolation can be represented without additional information. The locations of the targets are determined by minimizing the bit rate while keeping the maximum spectral distortion below a given threshold. The proposed method has been applied to compressing LSP coefficients, which are widely used as a spectral parameter. The simulation results show that an average spectral distortion of about 1.4 dB can be achieved at an average bit rate of about 8 bits/frame.
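
The abstract reports spectral distortion in dB without stating the formula. As a minimal sketch, assuming the commonly used log-spectral distortion (the RMS difference between two power spectra expressed in dB), the measure could be computed as follows. The function name and the sample spectra are hypothetical and are not taken from the paper.

```python
import math


def spectral_distortion_db(power_a, power_b):
    """RMS difference, in dB, between two power spectra on the same frequency bins.

    Assumption: this is the usual log-spectral distortion; the paper may use
    a different variant (for example, a restricted frequency band).
    """
    if len(power_a) != len(power_b) or not power_a:
        raise ValueError("spectra must be non-empty and of equal length")
    squared = [
        (10.0 * math.log10(a) - 10.0 * math.log10(b)) ** 2
        for a, b in zip(power_a, power_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical spectra: the decoded spectrum is 10 dB above the original in every bin.
original = [1.0, 1.0, 1.0, 1.0]
decoded = [10.0, 10.0, 10.0, 10.0]
sd = spectral_distortion_db(original, decoded)
```

In a rate-distortion setting such as the one described, a value like `sd` would be compared against the distortion threshold for each frame.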

The Stability and Variability based on Vowels in Voice Quality Analysis (음질 분석에 있어서 모음에 따른 안정성과 변이성)

  • Choi, Seong Hee;Choi, Chul-Hee
    • Phonetics and Speech Sciences
    • /
    • v.7 no.1
    • /
    • pp.79-86
    • /
    • 2015
  • This study explored the effect of vowel type on acoustic perturbation measures in voice quality analysis. The perturbation parameters (%jitter, %shimmer) and a noise parameter (SNR) were measured for 7 Korean vowels (/a/, /ɛ/, /i/, /o/, /u/, /ɯ/, /ʌ/) using CSpeech with 50 normal young Korean adults (24 males and 26 females). A significant vowel effect was found only in %shimmer; in particular, the low-back vowel /a/ differed significantly from the other vowels in %shimmer. The least perturbation and the least noise were exhibited on the high-back vowel /ɯ/ and the vowel /o/, respectively. With respect to tongue height, %shimmer was significantly higher on low vowels than on high vowels. In addition, back vowels (tongue advancement) and rounded vowels (lip rounding) showed significantly less perturbation and noise. The least within-individual variability of perturbation and noise across three repeated measures was found on the vowel /i/. However, there was no significant difference among the 3 token measures in a single session for any vowel tested except /o/. Consequently, the vowel /a/, which is commonly used in acoustic perturbation measures, exhibited higher perturbation and noise, whereas higher stability and less variability were demonstrated on the high-back vowel /u/. These results suggest that the Korean high-back vowel /u/ may be more appropriate and reliable for acoustic perturbation measures.
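
The %jitter and %shimmer measures mentioned above can be illustrated with a short sketch. It assumes the common "local" definition (mean absolute difference between consecutive cycles, divided by the mean, times 100); the exact formulas implemented in CSpeech are not given in the abstract and may differ. The function name and sample values are hypothetical.

```python
def percent_perturbation(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean value.

    Applied to pitch periods this gives a local %jitter; applied to
    peak amplitudes it gives a local %shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical cycle-by-cycle measurements from a sustained vowel.
periods_ms = [5.0, 5.1, 4.9, 5.0]   # pitch periods in milliseconds
amplitudes = [1.0, 1.1, 0.9, 1.0]   # peak amplitudes in arbitrary units

jitter_percent = percent_perturbation(periods_ms)
shimmer_percent = percent_perturbation(amplitudes)
```

Because both measures are normalized by the mean, they can be compared across speakers with different fundamental frequencies or recording levels.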

A Design of Lowpass Active Filter for ADSL Tx/Rx Stage (ADSL 송수신단용 저역통과 능동필터 설계)

  • Lee Geun-Ho
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.1
    • /
    • pp.38-42
    • /
    • 2005
  • CMOS analog lowpass filters using the speech signal bandwidth for an Asymmetric Digital Subscriber Line (ADSL) modem are presented. The designed active lowpass filters are composed of CMOS complementary high-swing cascode stages, which can increase the transconductance of an active element. Their cutoff frequencies are 138 kHz and 1,100 kHz, respectively. A low-voltage high-swing cascode integrator with improved gain and unity-gain frequency is used to design the filters. The designed filters are verified by HSPICE simulation with $0.251\,\mu\mathrm{m}$ CMOS n-well parameters and a single 2.5 V power supply.

Korean Speech Recognition using DHMM (DHMM을 이용한 한국어 음성 인식)

  • Ann, T.O.;Lee, K.S.;Yoo, H.K.;Lee, H.J.;Cho, H.J.;Byun, Y.G.;Kim, S.H.
    • The Journal of the Acoustical Society of Korea
    • /
    • v.10 no.1
    • /
    • pp.52-60
    • /
    • 1991
  • This paper describes a study on isolated word recognition using a DHMM (Dynamic Hidden Markov Model), which takes the dynamic features of the spectrum as a parameter. It discusses speech recognition experiments based on an HMM that can evaluate not only instantaneous spectral features but also dynamic spectral features. LPC cepstrum parameters are used as the static feature, and the regression coefficients of the LPC cepstrum are used as the dynamic feature. Each of the two features is quantized with its own VQ codebook. The DHMM is modeled by taking the static and dynamic vectors as input. Over the whole experiment, recognition with the DHMM achieved a rate of 92.7%, while the conventional HMM achieved 88.8%, showing that the DHMM is a useful model.
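
The dynamic feature described above, a regression coefficient computed over the cepstrum, can be sketched as follows. This assumes the standard linear-regression (delta) formula over a window of ±K frames, with edge frames replicated at the boundaries; the window size and edge handling used in the paper are not stated, so both are assumptions, and the function name is hypothetical.

```python
def delta_features(frames, K=2):
    """Regression (delta) coefficients over a window of +/-K frames.

    frames: list of equal-length feature vectors (e.g. LPC cepstra), one per frame.
    Assumption: frames beyond either end are replaced by the nearest edge frame.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    denom = sum(k * k for k in range(-K, K + 1))
    deltas = []
    for t in range(num_frames):
        acc = [0.0] * dim
        for k in range(-K, K + 1):
            idx = min(max(t + k, 0), num_frames - 1)  # clamp to valid range
            for i in range(dim):
                acc[i] += k * frames[idx][i]
        deltas.append([x / denom for x in acc])
    return deltas


# Hypothetical one-dimensional cepstral track that rises by 1.0 per frame.
static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
dynamic = delta_features(static, K=1)
```

In the setup the abstract describes, the static vectors and the dynamic vectors would then each be quantized with a separate VQ codebook before being passed to the model.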

The Tense-Lax Question and Intraoral Air Pressure in English Stops

  • Kim, Dae-Won
    • Speech Sciences
    • /
    • v.9 no.4
    • /
    • pp.113-130
    • /
    • 2002
  • Measurements were made of pressure rise time (PoRT), voice cessation time, flattened peak intraoral air pressure (Po), pressure static time (PoST), pressure-fall time, and the duration of oral closure as four English speakers uttered isolated nonsense $V_{1}CV_{2}$ words containing /b/ and /p/ ($V_{1}=V_{2}$, and the V was /$\alpha$/), with stress alternately on $V_{1}$ or $V_{2}$. The hypothesis tested was that the tense stop consonant will be characterized by a higher Po, a longer PoST, or both, compared with the lax one. Findings: (1) PoRT was significantly greater in /b/ than in /p/; (2) the voiceless stop /p/ generally produced a greater mean Po, averaged across five tokens, than its voiced counterpart /b/, but the difference was not statistically significant; and (3) altogether, across stress, tokens, and subjects, the difference in the calculated pressure static time (PoSTc), i.e., PoST + PoRT, between /p/ and /b/ was highly significant ($p \leq 0.003$). Although further investigation is needed, the results strongly support the linguistic hypothesis of a tense-lax distinction, with /b/ being lax and /p/ tense. Airflow resistance at the glottis and supraglottal air volume are assumed to be responsible for much of the difference in PoRT between /p/ and /b/. PoSTc, which reflects, although indirectly, the respiratory effort during the oral closure of a stop, was a convincing phonetic parameter of consonantal tenseness based on respiratory effort. The effects of stress on Po and PoSTc were inconsistent, and a PoRT shorter than the consonantal constriction interval was always accompanied by Po and PoST.

Acoustic Characteristics of Female Senior Citizens in Communities: The Effects of Residence and Depression (지역사회 여성 노인 음성의 음향학적 특성: 거주지 및 우울감의 영향)

  • Hwang, Jaeho;Kim, JungWan
    • Phonetics and Speech Sciences
    • /
    • v.4 no.4
    • /
    • pp.155-162
    • /
    • 2012
  • The population of Korea is ageing as the number of elderly people increases due to improvements in health care and diet. Accordingly, interest in how to live actively after retirement and how to communicate effectively is expected to increase the demand for voice improvement methods and technology. However, criteria for evaluating the voice strength and characteristics of the elderly are lacking. In this study, we analyzed the acoustic characteristics of elderly women living in the community according to residential status and mental health status (e.g., depressive mood). We selected women (n=63) above the age of 65 who were living in the Seoul metropolitan area and Daegu Gyeongbuk. The subjects were divided into two groups: a normal speaker group (n=40) and a group of speakers with depressive mood (n=23). Voice characteristics were analyzed from data collected through sustained phonation of the vowel /a/. The results showed differences in MPT, F0, jitter, shimmer, and NHR depending on location of residence, but no difference with regard to depressive mood. Therefore, location of residence must be considered a key factor in establishing voice norms for the elderly.

A Study of depression symptom in patients with voice disorders (음성장애환자에게서의 우울감 연구)

  • Kang, Young Ae;Koo, Bon Seok
    • Phonetics and Speech Sciences
    • /
    • v.7 no.2
    • /
    • pp.47-54
    • /
    • 2015
  • The objectives of this study are to examine the frequency of depression symptoms in patients with voice disorders and to investigate voice evaluation parameters associated with depression. One hundred ninety-six patients (106 males and 90 females) who had been diagnosed with a voice disorder for the first time in their lives were selected. All patients were examined by laryngeal stroboscopy. For the depression and voice study, a personal interview, acoustic and aerodynamic analysis, the voice handicap index (VHI), the reflux symptom index (RSI), and the Beck depression index (BDI) were administered. Mild to severe BDI scores were seen in 26.2% (52 patients) of all patients. The mean BDI score of female patients was $8.8 \pm 7.5$, which was higher than that of male patients ($5.6 \pm 6.6$), and the difference was statistically significant (p<0.001). In the acoustic analysis, the score of the sent_duration parameter was significantly higher in patients with depression than in patients without depression (p<0.05). In addition, the VHI and RSI scores were higher in patients with depression (p<0.001). Our findings suggest that the prevalence of depression in patients with voice disorders is related to female sex, speaking velocity, and self-questionnaire scores. This result can be used for a psychologically based approach to therapy.

Acoustic parameters that differentiate /o/ from /u/ in Seoul Korean (서울말 /ㅗ/와 /ㅜ/를 구별하는 음향변수)

  • Byun, Hi-Gyung
    • Phonetics and Speech Sciences
    • /
    • v.10 no.2
    • /
    • pp.15-24
    • /
    • 2018
  • Earlier studies reported that the /o/ and /u/ phonemes of Seoul Korean are currently merging in the F1/F2 space. However, perception tests have shown high rates of correct identification even in cases where the two vowels overlapped. This study explores whether there is another acoustic parameter that differentiates /o/ from /u/ besides the F1/F2 contrast. Seventy-five native speakers of Seoul Korean, born between 1953 and 1999, participated in a production test. The data collected were analyzed in terms of F1 and F2, H1-H2, and F0. The results show that the /o/ and /u/ of female speakers almost overlap in the F1/F2 space for all ages, while their H1-H2 values are significantly different between the two vowels regardless of age. On the other hand, the /o/ and /u/ of male speakers are largely well separated in the F1/F2 space, while the H1-H2 values of the two vowels are very close at all ages. The F0 effect is relatively small for both male and female speakers, even though the difference is statistically significant. The results of this study provide evidence that female speakers use phonation differences to distinguish /o/ from /u/, and that the F1/F2 contrast has been replaced by H1-H2 values.

The Comparisons of GRBAS Perceptual Judgments according to Levels of Utterances

  • Pyo, Hwa-Young;Sim, Hyun-Sub
    • Speech Sciences
    • /
    • v.8 no.1
    • /
    • pp.135-142
    • /
    • 2001
  • The present study was performed to investigate which levels of utterance give essential and useful information about a patient's voice, by examining the degree of correlation between each level of utterance (vowels, words, phrase, and paragraph reading) and the entire utterance including all of the levels. For this purpose, a total of 10 individual utterance samples (5 vowels, 3 words, 1 phrase, 1 paragraph reading) were collected from each of 30 patients with voice disorders, and four experienced voice therapists evaluated them using GRBAS. The results showed that the four therapists agreed highly on the 'G' parameter. The coefficient of correlation between each level of utterance and the entire utterance tended to be above 0.70. Judgements of the vowels /ɛ/ and /o/ correlated highly with the judgement of the entire utterance. Regardless of severity, the judgement of the entire utterance correlated highly with the judgements of the vowel /u/ and the paragraph reading. These results suggest that experienced voice therapists can precisely evaluate patients' voice quality in the clinic with only one sustained vowel, as is done with the entire utterance evaluation.

A Study on Trend Sharing in Segmental-feature HMM (분절 특징 은닉 마코프 모델에서의 경향 공유에 관한 연구)

  • 윤영선
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.7
    • /
    • pp.641-647
    • /
    • 2002
  • In this paper, we propose a method for reducing the number of parameters in the segmental-feature HMM (SFHMM) using trend quantization. The proposed method shares the trend information of the polynomial trajectories by quantization. A trajectory is obtained from the sequence of feature vectors of the speech signal and can be divided into trend and location information. The trend indicates the variation of consecutive frame features, while the location indicates the positional difference between trajectories. Since the trend occupies a large portion of the SFHMM, sharing the trend may decrease the number of parameters. To evaluate the proposed system, experiments are performed on the TIMIT corpus. The experimental results show that the performance of the proposed system is roughly similar to that of the previous system. Therefore, the proposed system can be considered a viable parameter reduction method.
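
The split of a trajectory into trend and location can be illustrated with a simplified sketch. It assumes a first-order (linear) polynomial fitted by least squares to one feature dimension of a segment, takes the segment mean as the location, and takes the fitted slope as the trend. The polynomial order and the exact parameterization used in the paper are not given in the abstract, so these choices and the function name are hypothetical.

```python
def linear_trend_and_location(segment):
    """Least-squares line fit to one feature dimension of a segment.

    Returns (location, trend): the segment mean and the slope per frame.
    Assumption: a linear trajectory; the paper may use higher-order polynomials.
    """
    n = len(segment)
    if n < 2:
        raise ValueError("need at least two frames")
    t_mean = (n - 1) / 2.0
    y_mean = sum(segment) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return y_mean, num / den


# Two hypothetical segments with the same shape but different offsets.
loc_a, trend_a = linear_trend_and_location([1.0, 2.0, 3.0])
loc_b, trend_b = linear_trend_and_location([11.0, 12.0, 13.0])
```

The two segments differ only in location (2.0 versus 12.0) and have the same trend, which is the kind of redundancy that sharing a quantized trend across models would exploit.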