• Title/Summary/Keyword: Speech Analysis

Search Result 1,580, Processing Time 0.04 seconds

On a Reduction of Computation Time of FFT Cepstrum (FFT 켑스트럼의 처리시간 단축에 관한 연구)

  • Jo, Wang-Rae;Kim, Jong-Kuk;Bae, Myung-Jin
    • Speech Sciences
    • /
    • v.10 no.2
    • /
    • pp.57-64
    • /
    • 2003
  • The cepstrum coefficients are the most popular feature for speech recognition or speaker recognition. The cepstrum coefficients are also used for speech synthesis and speech coding but has major drawback of long processing time. In this paper, we proposed a new method that can reduce the processing time of FFT cepstrum analysis. We use the normal ordered inputs for FFT function and the bit-reversed inputs for IFFT function. Therefore we can omit the bit-reversing process and reduce the processing time of FFT ceptrum analysis.

  • PDF

Acoustic Features of Phonatory Offset-Onset in the Connected Speech between a Female Stutterer and Non-Stutterers (연속구어 내 발성 종결-개시의 음향학적 특징 - 말더듬 화자와 비말더듬 화자 비교 -)

  • Han, Ji-Yeon;Lee, Ok-Bun
    • Speech Sciences
    • /
    • v.13 no.2
    • /
    • pp.19-33
    • /
    • 2006
  • The purpose of this paper was to examine acoustical characteristics of phonatory offset-onset mechanism in the connected speech of female adults with stuttering and normal nonfluency. The phonatory offset-onset mechanism refers to the laryngeal articulatory gestures. Those gestures are required to mark word boundaries in phonetic contexts of the connected speech. This mechanism included 7 patterns based on the speech spectrogram. This study showed the acoustic features in the connected speech in the production of female adults with stuttering (n=1) and normal nonfluency (n=3). Speech tokens in V_V, V_H, and V_S contexts were selected for the analysis. Speech samples were recorded by Sound Forge, and the spectrographic analysis was conducted using Praat. Results revealed a stuttering (with a type of block) female exhibited more laryngealization gestures in the V_V context. Laryngealization gesture was more characterized by a complete glottal stop or glottal fry both in V_H and in V_S contexts. The results were discussed from theoretical and clinical perspectives.

  • PDF

Developing the speech screening test for 4-year-old children and application of Korean speech sound analysis tool (KSAT) (4세 말소리발달 선별검사 개발과 한국어말소리분석도구(Korean Speech Sound Analysis Tool, KSAT)의 활용)

  • Soo-Jin Kim;Ki-Wan Jang;Moon-Soo Chang
    • Phonetics and Speech Sciences
    • /
    • v.16 no.1
    • /
    • pp.49-55
    • /
    • 2024
  • This study aims to develop a three-sentence speech screening test to evaluate speech development in 4-year-old children and provide standards for comparison with peers. Screening tests were conducted on 24 children each in the first and second halves of 4 years old. The screening test results showed a correlation of .7 with the existing speech disorder evaluation test results. We compared whether there was a difference between the two groups of 4-year-old in the phonological development indicators and error patterns obtained through the screening test. The developmental indicators of the children in the second half were high, but there were no statistically significant differences. The Korean Speech Sound Analysis Tool (KSAT) was used for all analyses, and the automatic analysis results and contents of the clinician's manual analysis were compared. The degree of agreement between the automatic and manual error pattern analyses was 93.63%. The significance of this study is that the standard of speech of a 4-year-old child of the speech screening test according to three sentences at the level of elicited sentences, and the applicability of the KSAT were reviewed in both clinical and research fields.

A Speech Homomorphic Encryption Scheme with Less Data Expansion in Cloud Computing

  • Shi, Canghong;Wang, Hongxia;Hu, Yi;Qian, Qing;Zhao, Hong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.5
    • /
    • pp.2588-2609
    • /
    • 2019
  • Speech homomorphic encryption has become one of the key components in secure speech storing in the public cloud computing. The major problem of speech homomorphic encryption is the huge data expansion of speech cipher-text. To address the issue, this paper presents a speech homomorphic encryption scheme with less data expansion, which is a probabilistic statistics and addition homomorphic cryptosystem. In the proposed scheme, the original digital speech with some random numbers selected is firstly grouped to form a series of speech matrix. Then, a proposed matrix encryption method is employed to encrypt that speech matrix. After that, mutual information in sample speech cipher-texts is reduced to limit the data expansion. Performance analysis and experimental results show that the proposed scheme is addition homomorphic, and it not only resists statistical analysis attacks but also eliminates some signal characteristics of original speech. In addition, comparing with Paillier homomorphic cryptosystem, the proposed scheme has less data expansion and lower computational complexity. Furthermore, the time consumption of the proposed scheme is almost the same on the smartphone and the PC. Thus, the proposed scheme is extremely suitable for secure speech storing in public cloud computing.

The Use of Phonetics in the Analysis of the Acquisition of Second Language Syntax

  • Fellbaum, Marie
    • Proceedings of the KSPS conference
    • /
    • 1996.10a
    • /
    • pp.430-431
    • /
    • 1996
  • Among the scholars of second language (L2) acquisition who have used prosodic considerations in syntactic analyses, pausing and intonation contours have been used to define utterances in the speech of second language learners (e.g., Sato, 1990). In recent research on conversational analysis, it has been found that lexically marked causal clause combining in the discourse of native speakers can be distinguished as "intonational subordination" and "intonational coordination(Couper-Kuhlen, Elizabeth, forthcoming.)". This study uses Pienemann's Processability Theory (1995) for an analysis of the speech of native speakers of Japanese (L1) learning English. In order to accurately assess the psycholinguistic stages of syntactic development, it is shown that pitch, loudness, and timing must all be considered together with the syntactic analysis of interlanguage speech production. Twelve Japanese subjects participated in eight fifteen minute interviews, ninety-six dyads. The speech analyzed in this report is limited to the twelve subjects interacting with two different non-native speaker interviews for a total of twenty-four dyads. Within each of the interviews, four different tasks are analyzed to determine the stage of acquisition of English for each subject. Initially the speech is segmented according to intonation contour arid pauses. It is then classified accoding to specific syntactic units and further analysed for pitch, loudness and timing. Results indicate that the speech must be first claasified prosodic ally and lexically, prior to beginning syntactic analysis. This analysis stinguishes three interlanguage lexical categories: discourse markers, coordinator $s_ordinators, and transfer from Japanese. After these lexical categories have been determined, the psycholinguistic stages of syntactic development can be more accurately assessed.d.

  • PDF

Differentiation of Aphasic Patients from the Normal Control Via a Computational Analysis of Korean Utterances

  • Kim, HyangHee;Choi, Ji-Myoung;Kim, Hansaem;Baek, Ginju;Kim, Bo Seon;Seo, Sang Kyu
    • International Journal of Contents
    • /
    • v.15 no.1
    • /
    • pp.39-51
    • /
    • 2019
  • Spontaneous speech provides rich information defining the linguistic characteristics of individuals. As such, computational analysis of speech would enhance the efficiency involved in evaluating patients' speech. This study aims to provide a method to differentiate the persons with and without aphasia based on language usage. Ten aphasic patients and their counterpart normal controls participated, and they were all tasked to describe a set of given words. Their utterances were linguistically processed and compared to each other. Computational analyses from PCA (Principle Component Analysis) to machine learning were conducted to select the relevant linguistic features, and consequently to classify the two groups based on the features selected. It was found that functional words, not content words, were the main differentiator of the two groups. The most viable discriminators were demonstratives, function words, sentence final endings, and postpositions. The machine learning classification model was found to be quite accurate (90%), and to impressively be stable. This study is noteworthy as it is the first attempt that uses computational analysis to characterize the word usage patterns in Korean aphasic patients, thereby discriminating from the normal group.

Conveyed Message in YouTube Product Review Videos: The discrepancy between sponsored and non-sponsored product review videos

  • Kim, Do Hun;Suh, Ji Hae
    • The Journal of Information Systems
    • /
    • v.32 no.4
    • /
    • pp.29-50
    • /
    • 2023
  • Purpose The impact of online reviews is widely acknowledged, with extensive research focused on text-based reviews. However, there's a lack of research regarding reviews in video format. To address this gap, this study aims to explore the connection between company-sponsored product review videos and the extent of directive speech within them. This article analyzed viewer sentiments expressed in video comments based on the level of directive speech used by the presenter. Design/methodology/approach This study involved analyzing speech acts in review videos based on sponsorship and examining consumer reactions through sentiment analysis of comments. We used Speech Act theory to perform the analysis. Findings YouTubers who receive company sponsorship for review videos tend to employ more directive speech. Furthermore, this increased use of directive speech is associated with a higher occurrence of negative consumer comments. This study's outcomes are valuable for the realm of user-generated content and natural language processing, offering practical insights for YouTube marketing strategies.

Analysis on Vowel and Consonant Sounds of Patent's Speech with Velopharyngeal Insufficiency (VPI) and Simulated Speech (구개인두부전증 환자와 모의 음성의 모음과 자음 분석)

  • Sung, Mee Young;Kim, Heejin;Kwon, Tack-Kyun;Sung, Myung-Whun;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.7
    • /
    • pp.1740-1748
    • /
    • 2014
  • This paper focuses on listening test and acoustic analysis of patients' speech with velopharyngeal insufficiency (VPI) and normal speakers' simulation speech. In this research, a set consisting of 50-words, vowels and single syllables is determined for speech database construction. A web-based listening evaluation system is developed for a convenient/automated evaluation procedure. The analysis results show the trend of incorrect recognition for VPI speech and the one for simulation speech are similar. Such similarity is also confirmed by comparing the formant locations of vowel and spectrum of consonant sounds. These results show that the simulation method for VPI speech is effective at generating the speech signals similar to actual VPI patient's speech. It is expected that the simulation speech data can be effectively employed for our future work such as acoustic model adaptation.

An Analysis of Acoustic Features Caused by Articulatory Changes for Korean Distant-Talking Speech

  • Kim Sunhee;Park Soyoung;Yoo Chang D.
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.2E
    • /
    • pp.71-76
    • /
    • 2005
  • Compared to normal speech, distant-talking speech is characterized by the acoustic effect due to interfering sound and echoes as well as articulatory changes resulting from the speaker's effort to be more intelligible. In this paper, the acoustic features for distant-talking speech due to the articulatory changes will be analyzed and compared with those of the Lombard effect. In order to examine the effect of different distances and articulatory changes, speech recognition experiments were conducted for normal speech as well as distant-talking speech at different distances using HTK. The speech data used in this study consist of 4500 distant-talking utterances and 4500 normal utterances of 90 speakers (56 males and 34 females). Acoustic features selected for the analysis were duration, formants (F1 and F2), fundamental frequency, total energy and energy distribution. The results show that the acoustic-phonetic features for distant-talking speech correspond mostly to those of Lombard speech, in that the main resulting acoustic changes between normal and distant-talking speech are the increase in vowel duration, the shift in first and second formant, the increase in fundamental frequency, the increase in total energy and the shift in energy from low frequency band to middle or high bands.

Use of Acoustic Analysis for Indivisualised Therapeutic Planning and Assessment of Treatment Effect in the Dysarthric Children (조음장애 환아에서 개별화된 치료계획 수립과 효과 판정을 위한 음향음성학적 분석방법의 활용)

  • Kim, Yun-Hee;Yu, Hee;Shin, Seung-Hun;Kim, Hyun-Gi
    • Speech Sciences
    • /
    • v.7 no.2
    • /
    • pp.19-35
    • /
    • 2000
  • Speech evaluation and treatment planning for the patients with articulation disorders have traditionally been based on perceptual judgement by speech pathologists. Recently, various computerized speech analysis systems have been developed and commonly used in clinical settings to obtain the objective and quantitative data and specific treatment strategies. 10 dysarthric children (6 neurogenic and 4 functional dysarthria) participated in this experiment. Speech evaluation of dysarthria was performed in two ways; first, the acoustic analysis by Visi-Pitch and a Computerized Speech Lab and second, the perceptual scoring of phonetic errors rates in 100 word test. The results of the initial evaluation served as primary guidlines for the indivisualized treatment planning of each patient's speech problems. After mean treatment period of 5 months, the follow-up data of both dysarthric groups showed increased maximum phonation time, increased alternative motion rate and decreased occurrence of articulatory deviation. The changes of acoustic data and therapeutic effects were more prominent in children with dysarthria due to neurologic causes than with functional dysarthria. Three cases including their pre- and post treatment data were illustrated in detail.

  • PDF