• Title/Summary/Keyword: Speech rate

Korean listeners' mode of perceiving the durational variations of /s/ as prolongations (한국어 평마찰음 /s/ 연장음에 대한 비유창성 양상 연구)

  • Park, Jin;Go, Boksun;Park, Sohyun
    • Phonetics and Speech Sciences / v.14 no.2 / pp.67-76 / 2022
  • This study aimed to examine whether Korean listeners perceive sound duration as prolongation in a dichotomous or a continuous mode. Thirty-five Korean participants (17 men and 18 women) listened to the Korean segment /s/, which was lengthened by 0-980 ms in 20-ms increments. The participants were then asked to rate each version of the sound on a scale of 1 to 100 (the closer to 100, the more disfluent). To examine whether listeners perceived durational variations of the fricative segment dichotomously or continuously, a curve was estimated using the best-fitting regression model for the observed data, that is, the model with the highest adjusted R-squared value. The mode of perceiving durational variations of the segment was continuous (or gradient) rather than discontinuous (or dichotomous). No gender difference was found in the mode of perceiving prolongation. However, there was a significant gender difference in that men rated the most disfluent sounds higher than women did. The findings of this study were further discussed in relation to the existing literature, and clinical implications for the assessment of stuttering were presented.
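The model-selection step described in this abstract can be illustrated with a small sketch. It is not the authors' analysis: the two candidate models (a straight line for "continuous" and a two-level step for "dichotomous"), the function names, and the ratings are all assumptions made up for illustration. Only the stimulus grid (0-980 ms in 20-ms steps) and the adjusted R-squared criterion come from the abstract.

```python
# Synthetic illustration only: the ratings are invented, not the study's data.

def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - h) ** 2 for v, h in zip(y, y_hat))
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

def fit_linear(x, y):
    """Ordinary least-squares line; returns fitted values (continuous model)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * a for a in x]

def fit_step(y):
    """Best two-level step over y (assumed sorted by duration); dichotomous model."""
    best_ss, best_fit = None, None
    for k in range(1, len(y)):
        low = sum(y[:k]) / k
        high = sum(y[k:]) / (len(y) - k)
        fit = [low] * k + [high] * (len(y) - k)
        ss = sum((v - h) ** 2 for v, h in zip(y, fit))
        if best_ss is None or ss < best_ss:
            best_ss, best_fit = ss, fit
    return best_fit

durations_ms = list(range(0, 1000, 20))  # 0, 20, ..., 980 ms
# Hypothetical gradual ratings with a small deterministic wobble.
ratings = [5.0 + 0.09 * d + 3.0 * ((i % 3) - 1) for i, d in enumerate(durations_ms)]

scores = {
    "linear": adjusted_r2(ratings, fit_linear(durations_ms, ratings), 1),
    "step": adjusted_r2(ratings, fit_step(ratings), 2),  # threshold + jump
}
best_model = max(scores, key=scores.get)
print(best_model, scores)
```

With gradually rising ratings like these, the line wins, which corresponds to the "continuous" reading; ratings that jump at a threshold would favor the step model instead.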

A comparative study of coarticulation features between children with and without reading disability (읽기장애아동과 일반아동의 동시조음 특성 비교)

  • Sungsook Park;Cheoljae Seong
    • Phonetics and Speech Sciences / v.16 no.2 / pp.99-109 / 2024
  • Coarticulation is affected by the continuous movement of the articulators within a limited time and space across neighboring segments and their various overlaps. This study investigated the differences in coarticulation characteristics between children with reading disabilities and nondisabled children in CVC and VCV syllables consisting of stops, affricates, and vowels (a, i, u). The subjects were 13 children with reading disabilities and nondisabled children in the 2nd to 6th grades of elementary school. Two second-formant (F2) values were measured: one at the point where the vowel began, and the other at the midpoint of the stable section of the vowel. Regression analysis was performed with the F2 onset and the F2 of the following vowel to obtain the locus equation (LE). A three-way ANOVA was conducted on the slope of the LE by group (reading disabilities vs. nondisabled), place of articulation, and phonation type. In CVC syllables, dyslexic children showed a flatter slope than nondisabled children. With respect to place of articulation, velar and bilabial sounds showed a steeper LE slope than alveolar and palatal sounds. For VCV syllables, there were no main effects of group or phonation type, and the significant differences among places of articulation also differed from those found for the CVC syllables. This study confirmed that dyslexic children showed a different pattern of coarticulation slope depending on the syllable structure. We also found that the higher pause rate of the dyslexic children had a stronger effect on coarticulation in VCV structures.
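The locus equation mentioned in this abstract is a linear regression of F2 at vowel onset on F2 at the vowel midpoint. The sketch below shows how its slope and intercept could be computed. The function name and the formant values are hypothetical, chosen so that the fit is easy to verify by hand; they are not measurements from the study.

```python
def locus_equation(f2_vowel_hz, f2_onset_hz):
    """Least-squares fit of F2_onset = slope * F2_vowel + intercept."""
    n = len(f2_vowel_hz)
    mean_v = sum(f2_vowel_hz) / n
    mean_o = sum(f2_onset_hz) / n
    sxx = sum((v - mean_v) ** 2 for v in f2_vowel_hz)
    sxy = sum((v - mean_v) * (o - mean_o)
              for v, o in zip(f2_vowel_hz, f2_onset_hz))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_v
    return slope, intercept

# Hypothetical F2 midpoints (Hz) for one consonant before /a/, /i/, /u/.
f2_vowel = [1300.0, 2300.0, 900.0]
# Hypothetical onsets built to lie exactly on a line with slope 0.5.
f2_onset = [0.5 * v + 900.0 for v in f2_vowel]

slope, intercept = locus_equation(f2_vowel, f2_onset)
print(slope, intercept)
```

A slope closer to 1 means the F2 onset follows the vowel more closely, so a flatter slope, as reported for the dyslexic group in CVC syllables, indicates an onset that varies less with the following vowel.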

Front-End Processing for Speech Recognition in the Telephone Network (전화망에서의 음성인식을 위한 전처리 연구)

  • Jun, Won-Suk;Shin, Won-Ho;Yang, Tae-Young;Kim, Weon-Goo;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea / v.16 no.4 / pp.57-63 / 1997
  • In this paper, we study efficient feature vector extraction and front-end processing methods to improve the performance of a speech recognition system using the KT (Korea Telecommunication) database collected through various telephone channels. First, we compare the recognition performance of feature vectors known to be robust to noise and environmental variation, and verify the performance enhancement of the recognition system using weighted cepstral distance measures. The experimental results show that the recognition rate increases with both PLP (Perceptual Linear Prediction) and MFCC (Mel Frequency Cepstral Coefficient) features in comparison with the LPC cepstrum used in the KT recognition system. Among cepstral distance measures, weighted functions such as RPS (Root Power Sums) and BPL (Band-Pass Lifter) improve recognition. Applying the spectral subtraction method decreases the recognition rate because of the distortion it introduces. However, RASTA (RelAtive SpecTrAl) processing, CMS (Cepstral Mean Subtraction), and SBR (Signal Bias Removal) enhance the recognition performance. In particular, the CMS method is simple but yields a large improvement in recognition. Finally, modified methods for real-time implementation of CMS are compared, and an improved method is suggested to prevent performance degradation.
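Cepstral mean subtraction, which the abstract singles out as simple but effective, can be sketched in a few lines. This is the textbook per-utterance form, not the paper's real-time variant; the function name and the toy cepstral frames are assumptions for illustration.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]

# Toy cepstral frames (3 frames x 3 coefficients), invented for illustration.
frames = [[1.0, 0.5, -0.2], [1.4, 0.1, 0.0], [0.9, 0.3, 0.4]]

# A fixed channel adds a constant offset to every frame in the cepstral domain.
channel = [0.7, -0.3, 0.2]
shifted = [[c + b for c, b in zip(f, channel)] for f in frames]

clean = cepstral_mean_subtraction(frames)
normalized = cepstral_mean_subtraction(shifted)

# After CMS the two versions coincide: the constant channel term is removed.
print(clean)
print(normalized)
```

Because a time-invariant channel is additive in the cepstral domain, subtracting the utterance mean cancels it. The full-utterance mean is only available after the utterance ends, which is presumably why the paper compares modified versions for real-time use.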

Video Software Dealers Association v. Arnold Schwarzenegger(2009) of the United States Court of Appeals, Ninth Circuit and its Implication to the Korean Game Law (폭력성 비디오게임에 대한 미국 연방순회항소법원판결이 한국게임법제도에 주는 시사점 : Video Software Dealers Association v. Arnold Schwarzenegger(2009))

  • Park, Min;Hwang, Seung-Heum
    • Journal of Korea Game Society / v.10 no.1 / pp.65-78 / 2010
  • In Video Software Dealers Association v. Arnold Schwarzenegger, the federal 9th Circuit Court decided that a California law imposing restrictions and a labeling requirement on the sale or rental of violent video games to minors (the "Act") violated rights guaranteed by the First and Fourteenth Amendments to the United States Constitution because: (1) the state introduced insufficient evidence to support a compelling interest that video games created psychological or neurological harm, (2) the Act was not the least-restrictive alternative to negate the harm, and (3) the lower, rational basis standard applicable to commercial speech did not apply to the Act's labeling requirements because the required label did not convey factual information. In contrast, the Korean Constitutional Court decided that the "Harmful Medium to Youth" and "Preliminary Rate Classification" systems were constitutional. However, under the least-restrictive-means rule of the U.S. and Korean courts, the overlapping application of "Harmful Medium to Youth" and "Preliminary Rate Classification" could be a problem, and the stronger of the two regulations could be found unconstitutional.

Performance Comparison of Out-Of-Vocabulary Word Rejection Algorithms in Variable Vocabulary Word Recognition (가변어휘 단어 인식에서의 미등록어 거절 알고리즘 성능 비교)

  • 김기태;문광식;김회린;이영직;정재호
    • The Journal of the Acoustical Society of Korea / v.20 no.2 / pp.27-34 / 2001
  • Utterance verification is used in variable vocabulary word recognition to reject a word that is not in the vocabulary or that has not been recognized correctly. It is an important technology for designing a user-friendly speech recognition system. We propose a new utterance verification algorithm for a no-training utterance verification system based on the minimum verification error. First, using the PBW (Phonetically Balanced Words) DB (445 words), we create no-training anti-phoneme models that include many PLUs (Phoneme-Like Units), so that the anti-phoneme models have the minimum verification error. Then, for OOV (Out-Of-Vocabulary) rejection, the phoneme-based confidence measure, which uses the likelihood ratio between the phoneme model (null hypothesis) and the anti-phoneme model (alternative hypothesis), is normalized by the null hypothesis, which tends to make it more robust for OOV rejection. The word-based confidence measure, which is built on the phoneme-based confidence measure, has been shown to provide improved detection of near-misses in speech recognition as well as better discrimination between in-vocabulary words and OOVs. Using the proposed anti-model and confidence measure, we achieve a significant performance improvement: CA (Correctly Accept for In-Vocabulary) is about 89% and CR (Correctly Reject for OOV) is about 90%, an improvement of about 15-21% in ERR (Error Reduction Rate).
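The abstract does not give the exact formula for its confidence measures, so the sketch below is only one plausible reading: a log-likelihood ratio between the phoneme model and the anti-phoneme model, divided by the magnitude of the phoneme-model score, then averaged over the phonemes of a word. The function names, the averaging rule, the threshold of 0.0, and the log-likelihood values are all assumptions.

```python
def phoneme_confidence(loglik_phoneme, loglik_anti):
    """Log-likelihood ratio normalized by the null-hypothesis score (assumed form)."""
    return (loglik_phoneme - loglik_anti) / abs(loglik_phoneme)

def word_confidence(phoneme_scores):
    """Average of phoneme-level confidences (assumed combination rule)."""
    total = sum(phoneme_confidence(p, a) for p, a in phoneme_scores)
    return total / len(phoneme_scores)

def accept(phoneme_scores, threshold=0.0):
    """Accept the recognized word if its confidence exceeds a hypothetical threshold."""
    return word_confidence(phoneme_scores) > threshold

# Hypothetical (phoneme model, anti-phoneme model) log-likelihood pairs.
in_vocabulary = [(-50.0, -65.0), (-40.0, -48.0)]  # phoneme models fit better
out_of_vocab = [(-70.0, -60.0), (-55.0, -52.0)]   # anti-models fit better

print(word_confidence(in_vocabulary), accept(in_vocabulary))
print(word_confidence(out_of_vocab), accept(out_of_vocab))
```

In a real system the threshold would be tuned on held-out data to trade off CA against CR.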

Reconstruction with Radial Forearm Free Flap after Ablative Surgery for Oral Cavity and Oropharyngeal Cancers (구강암과 구인두암의 절제술 후 전완유리피판술을 이용한 재건술)

  • Cho Kwang-Jae;Chun Byung-Jun;Sun Dong-Il;Cho Seung-Ho;Kim Mn-Sik
    • Korean Journal of Head & Neck Oncology / v.19 no.1 / pp.41-46 / 2003
  • Background and Objectives: Surgical ablation of tumors in the oral cavity and the oropharynx results in a three-dimensional defect because of the need to resect the adjacent area for the surgical margin. Although a variety of techniques are available, the radial forearm free flap is known as an effective method for this defect, offering thin, pliable, and relatively hairless skin and a long vascular pedicle. We report the clinical results of our 54 consecutive radial forearm free flaps used for oral cavity and oropharynx cancers. Materials and Methods: We reviewed the medical records of patients who underwent intraoral reconstruction with a radial forearm free flap after ablative surgery for oral cavity and oropharyngeal cancers from August 1994 to February 2003 and analyzed surgical methods, flap survival rate, complications, and functional results. Among these, 20 cases were examined with a modified barium swallow to evaluate postoperative swallowing function, and another 8 cases with articulation and resonance tests for speech. We examined recovery of sensation with a two-point discrimination test in the 15 cases that received sensate flaps. Results: The primary sites were as follows: mobile tongue (18), tonsil (17), floor of mouth (4), base of tongue (2), soft palate (2), retromolar trigone (3), buccal mucosa (1), oro-hypopharynx (6), and lower lip (1). The paddles of the flaps were tailored in multilobed designs, from oval to tetralobed, and in variable sizes according to the defects after ablation. These procedures resulted in a satisfactory flap success rate (96.3%) and showed good swallowing function and social speech. Eight of the 15 cases (53.3%) that received a sensate flap showed recovery of sensation between 1 and 6 months postoperatively (average 2.6 months).
Conclusion: Reconstruction with a radial forearm free flap might be an excellent method for achieving maximal functional results after ablative surgery of oral cavity and oropharyngeal cancers that results in a multidimensional defect.

Improvement of AMR Data Compression Using the Context Tree Weighting Method (Context Tree Weighting을 이용한 AMR 음성 데이터 압축 성능 개선)

  • Lee, Eun-su;Oh, Eun-ju;Yoo, Hoon
    • Journal of Internet Computing and Services / v.21 no.4 / pp.35-41 / 2020
  • This paper proposes an algorithm to improve the compression performance of adaptive multi-rate (AMR) speech coding using the context tree weighting (CTW) method. AMR is the voice encoding standard adopted by IMT-2000 and supports 8 transmission rates from 4.75 kbit/s to 12.2 kbit/s to cope with changes in the channel condition. CTW, a kind of arithmetic coding, uses a variable-order Markov model. Considering that CTW operates bit by bit, we propose an algorithm that re-orders AMR data and compresses them with CTW. To verify the validity of the proposed algorithm, an experiment is conducted to compare it with existing compression methods, including ZIP, in terms of compression ratio. Experimental results indicate that the average additional compression rate for AMR data is about 3.21% with ZIP and about 9.10% with the proposed algorithm. Thus, our algorithm improves the compression performance on AMR data by about 5.89%.
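The reported improvement is the difference between the two average rates, so it is best read as percentage points. The short check below reproduces that arithmetic. The definition of "additional compression rate" as the fraction of bytes saved and the byte counts are assumptions used only to show how such a rate might be computed; only the 3.21% and 9.10% figures come from the abstract.

```python
def additional_compression_rate(original_bytes, compressed_bytes):
    """Percentage of bytes saved relative to the already-encoded AMR data (assumed definition)."""
    return (1.0 - compressed_bytes / original_bytes) * 100.0

# Rates reported in the abstract (percent).
zip_rate = 3.21
proposed_rate = 9.10

# Improvement in percentage points.
gain_points = round(proposed_rate - zip_rate, 2)
print(gain_points)

# Hypothetical file sizes that would yield the proposed method's reported rate.
example_rate = additional_compression_rate(10_000, 9_090)
print(example_rate)
```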

Effects of an Inverted Position on EEG and Heart Rate Variability before and after Qi-gong Training (어깨지지형 도립위(倒立位)가 기공수련(氣功修鍊) 전후(前後)의 뇌파(腦波) 및 심박변이도(心搏變移度)에 미치는 영향)

  • Lee, Sang-Nam;Kwon, Young-Kyu
    • Journal of Physiology & Pathology in Korean Medicine / v.22 no.4 / pp.918-929 / 2008
  • This study investigated the effects of an inverted position on EEG and heart rate variability before and after Bang song gong (BSG). BSG is a training method used in qi-gong and meditation in which practitioners focus their consciousness on body segments in order while silently repeating the word 'song'. The subjects were 14 university students (n=7 per group) who had no medical problems and had never practiced BSG. They practiced the two-way BSG training program for 30 minutes every other day for two weeks. While practicing BSG, group A alternated between a sitting position and a leaning sitting position, and group B alternated between an inverted position and a leaning sitting position in the same way. A two-way ANOVA (2 groups $\times$ 2 periods) with a significance level of p<0.05 was used to analyze the mean differences in EEG and HR by position change in each group before and after BSG. In group A, EEG and HR did not change regardless of position change or BSG. In group B, on the other hand, significant changes were observed in EEG (p<0.05): the $\alpha$ wave in the inverted position increased, and the $\beta$ and $\delta$ waves in the inverted position showed smaller power after two weeks of training. For HR, the variation with position change was smaller after BSG than before BSG (p<0.05). The results suggest that an inverted position may deepen meditation and is likely to be effective in reducing brain tension and sleepiness during qi-gong training. In addition, an inverted position seemed to promote control of blood pressure in the brain. Applying an inverted position to BSG should therefore help achieve deeper relaxation and obtain the desired effect of qi-gong training.

The Prosodic Changes of Korean English Learners in Robot Assisted Learning (로봇보조언어교육을 통한 초등 영어 학습자의 운율 변화)

  • In, Jiyoung;Han, JeongHye
    • Journal of The Korean Association of Information Education / v.20 no.4 / pp.323-332 / 2016
  • A robot's recognition and diagnosis of pronunciation, and its own speech, are the most important interactions in RALL (Robot Assisted Language Learning). This study aims to verify the effectiveness of robot TTS (Text to Sound) technology in helping Korean learners of English acquire a native-like accent by correcting the prosodic errors they commonly make. The F0 range and speaking rate, both prosodic variables, of 4th-grade child learners of English are measured and analyzed for changes in accent. From an acoustic-phonetic viewpoint, we compare whether a robot with currently available TTS technology is effective for 4th graders and for 1st graders who have not received formal English instruction from a native speaker. After repeating the TTS output in RALL, both groups showed changes in speaking rate rather than in F0 range.

A Study on the Spoken Korean-Digit Recognition Using the Neural Network (神經網을 利用한 韓國語 數字音 認識에 관한 硏究)

  • Park, Hyun-Hwa;Gahang, Hae Dong;Bae, Keun Sung
    • The Journal of the Acoustical Society of Korea / v.11 no.3 / pp.5-13 / 1992
  • Taking advantage of the property that each Korean digit is a monosyllabic word, we proposed a spoken Korean-digit recognition scheme using the multi-layer perceptron. The spoken Korean digit is divided into three segments (initial sound, medial vowel, and final consonant) based on the voice starting and ending points and a peak point in the middle of the vowel sound. Feature vectors such as cepstrum, reflection coefficients, $\Delta$cepstrum, and $\Delta$energy are extracted from each segment. It has been shown that the cepstrum, as an input vector to the neural network, gives a higher recognition rate than reflection coefficients. Regression coefficients of the cepstrum did not affect the recognition rate as much as we expected. We believe this is because the features were extracted from selected stationary segments of the input speech signal. With 150 cepstral coefficients obtained from each spoken digit, we achieved a correct recognition rate of 97.8%.
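The $\Delta$cepstrum and "regression coefficients of the cepstrum" in this abstract can be illustrated with one common definition of delta features: a weighted slope over neighboring frames. The window size, the edge handling, and the toy trajectory below are assumptions; the abstract does not state which variant the authors used.

```python
def delta(track, window=2):
    """Regression-based delta of one coefficient trajectory.

    d[t] = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2), for k = 1..window,
    with frames beyond the edges replaced by the nearest valid frame.
    """
    n = len(track)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        num = 0.0
        for k in range(1, window + 1):
            ahead = track[min(t + k, n - 1)]
            behind = track[max(t - k, 0)]
            num += k * (ahead - behind)
        out.append(num / denom)
    return out

# Toy trajectory of a single cepstral coefficient rising by 2.0 per frame.
ramp = [2.0 * t for t in range(8)]
deltas = delta(ramp)
print(deltas)
```

Away from the edges the delta of the ramp equals its per-frame slope. On a stationary segment the trajectory is nearly flat and the deltas are close to zero, which fits the abstract's explanation of why these features added little.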