• Title/Summary/Keyword: Speech characteristics


Analysis of Transient Features in Speech Signal by Estimating the Short-term Energy and Inflection points (변곡점 및 단구간 에너지평가에 의한 음성의 천이구간 특징분석)

  • Choi, I.H.;Jang, S.K.;Cha, T.H.;Choi, U.S.;Kim, C.S.
    • Speech Sciences / v.3 / pp.156-166 / 1998
  • This paper proposes a segmentation method based on estimating the inflection points and the average magnitude energy of speech signals. The proposed method not only resolves the problems of segmentation by zero-crossing rate, but also makes it possible to estimate the features of the transient period once the starting point and the transient period preceding the steady state have been separated. In experiments with monosyllabic speech, the method divided the starting and ending points of the speech signals exactly, even though the speech samples exhibited a D.C. level. In addition, features such as the length of the transient period, the short-term energy, and the frequency characteristics could be compared across the speech signals.
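
The idea of segmenting by average magnitude and inflection points can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the function names, the frame length of 80 samples, the threshold ratio of 0.1, and the use of second-difference sign changes as "inflection points" are all assumptions made for the example.

```python
import math

def frame_features(samples, frame_len=80):
    """Return (average magnitude, inflection count) per non-overlapping frame.

    Hypothetical helper: frame_len and the feature definitions are assumptions.
    """
    mean = sum(samples) / len(samples)      # remove the D.C. level first
    x = [s - mean for s in samples]
    feats = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        avg_mag = sum(abs(v) for v in frame) / frame_len
        # Second difference; a sign change is counted as an inflection point.
        d2 = [frame[i + 1] - 2 * frame[i] + frame[i - 1]
              for i in range(1, frame_len - 1)]
        inflections = sum(1 for a, b in zip(d2, d2[1:]) if a * b < 0)
        feats.append((avg_mag, inflections))
    return feats

def find_endpoints(samples, frame_len=80, ratio=0.1):
    """Return (start, end) sample indices of the frames whose average
    magnitude exceeds ratio * peak average magnitude (assumed threshold)."""
    feats = frame_features(samples, frame_len)
    peak = max(m for m, _ in feats)
    active = [i for i, (m, _) in enumerate(feats) if m > ratio * peak]
    return active[0] * frame_len, (active[-1] + 1) * frame_len

# Synthetic test signal: silence, a 200 Hz tone at 8 kHz, silence,
# all riding on a D.C. offset of 500.
signal = ([500.0] * 400
          + [500.0 + 1000.0 * math.sin(2 * math.pi * k / 40) for k in range(800)]
          + [500.0] * 400)
print(find_endpoints(signal))
```

Because the mean is subtracted before the magnitude is computed, the constant offset does not mask the silent frames, which is the situation where a plain zero-crossing count gives no usable contrast.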


Acoustic Features of Phonatory Offset-Onset in the Connected Speech between a Female Stutterer and Non-Stutterers (연속구어 내 발성 종결-개시의 음향학적 특징 - 말더듬 화자와 비말더듬 화자 비교 -)

  • Han, Ji-Yeon;Lee, Ok-Bun
    • Speech Sciences / v.13 no.2 / pp.19-33 / 2006
  • The purpose of this paper was to examine the acoustic characteristics of the phonatory offset-onset mechanism in the connected speech of female adults with stuttering and with normal nonfluency. The phonatory offset-onset mechanism refers to the laryngeal articulatory gestures required to mark word boundaries in the phonetic contexts of connected speech. This mechanism comprised 7 patterns based on the speech spectrogram. The study examined the acoustic features of connected speech produced by female adults with stuttering (n=1) and with normal nonfluency (n=3). Speech tokens in V_V, V_H, and V_S contexts were selected for analysis. Speech samples were recorded with Sound Forge, and the spectrographic analysis was conducted using Praat. The results revealed that the female speaker who stutters (block type) exhibited more laryngealization gestures in the V_V context. In both the V_H and V_S contexts, the laryngealization gesture was characterized more by a complete glottal stop or glottal fry. The results are discussed from theoretical and clinical perspectives.


Real-Time Implementation of Wireless Remote Control of Mobile Robot Based-on Speech Recognition Command (음성명령에 의한 모바일로봇의 실시간 무선원격 제어 실현)

  • Shim, Byoung-Kyun;Han, Sung-Hyun
    • Journal of the Korean Society of Manufacturing Technology Engineers / v.20 no.2 / pp.207-213 / 2011
  • In this paper, we present a study on the real-time implementation of a mobile robot to which an interactive voice recognition technique is applied. Speech commands are uttered as connected-word sentences and issued through the wireless remote control system. We implement an automatic distant-speech command recognition system for interactive voice-enabled services. We construct a baseline automatic speech command recognition system in which the acoustic models are trained on speech utterances recorded through a microphone. To improve the performance of the baseline system, the acoustic models are adapted to adjust for the spectral characteristics of speech under different microphones and for the environmental mismatch between cross talking and distant speech. We illustrate the performance of the developed speech recognition system by experiments. The results show that the average recognition rate of the proposed system is above about 95%.

Algorithm for Concatenating Multiple Phonemic Units for Small Size Korean TTS Using RE-PSOLA Method

  • Bak, Il-Suh;Jo, Cheol-Woo
    • Speech Sciences / v.10 no.1 / pp.85-94 / 2003
  • In this paper, an algorithm to reduce the size of a Text-to-Speech database is proposed. The algorithm is based on the characteristics of Korean phonemic units. From the initial database, a reduced phoneme unit set is derived using the articulatory similarity of concatenating phonemes. The speech data were read by one female announcer for 1000 phonetically balanced sentences. All the recorded speech was then segmented by phoneticians. The total size of the original speech data is about 640 MB, including the laryngograph signal. To synthesize the waveform, RE-PSOLA (Residual-Excited Pitch Synchronous Overlap and Add Method) was used. The voice quality of the synthesized speech was compared with the original speech in terms of spectrographic information and objective tests. The quality of the synthesized speech was not much degraded when the size of the synthesis DB was reduced from 320 MB to 82 MB.


The Prosodic Characteristics of Pre-school Age Children-Related Adults (학령전기아동 관련 성인의 운율 특성)

  • Kim, Jiwon;Seong, Cheoljae
    • Phonetics and Speech Sciences / v.6 no.3 / pp.23-32 / 2014
  • This study presents the prosodic characteristics of 'Motherese' and 'Teacherese' (child care teachers and kindergarten teachers). 21 mothers and 24 teachers spoke to children in a child care center or kindergarten. The children were aged 4;00-6;11. Speech and articulation rate, number of accentual phrases (APs), number of intonational phrases (IPs), pitch-related factors (f0, pitch range, f0 standard deviation), and intonation slope (mean absolute, f0, q-tone slope) were measured. The 2 groups spoke 2 sentence types (interrogative: alternative question; declarative: coordinated sentence) in 2 situations (one with the children present, the other without children but pretending that they were present). The results indicate that teachers show more noticeable prosodic characteristics than mothers do.

Acoustic Characteristics of Korean Deaf Speakers

  • Lee, S.H.;Huh, M.J.;Jeoung, O.R.;Cho, T.H.
    • Speech Sciences / v.2 / pp.89-94 / 1997
  • This study attempted to analyze the acoustic characteristics of profoundly deaf students. The 59 profoundly hearing-impaired and 36 normal subjects were divided into 3 age groups: 6-10 yrs, 11-15 yrs, and 16-20 yrs. The voice was sampled in /a/ prolongation, counting, reading, and conversation using the Computerized Speech Lab (CSL). The vocal pitch of the deaf subjects was significantly higher than that of the normal subjects. Among the deaf subjects, younger speakers tended to have higher pitch and jitter values. The three age groups of deaf subjects did not differ in loudness or shimmer, except for minimum loudness. The mean pitch of males was significantly lower than that of females.


Prosody of cerebral palsic adults' speech (뇌성마비 성인 발화의 운율 특징)

  • Lee, Sook-Hyang;Ko, Hyun-Ju;Kim, Soo-Jin
    • Proceedings of the KSPS conference / 2007.05a / pp.49-51 / 2007
  • The purpose of this study is to investigate the prosodic characteristics of the speech of adults with cerebral palsy. The results showed some correlations between the speakers' articulation scores and the prosodic properties of their speech: speakers with low articulation scores showed a slower speech rate, a larger number of IPs and pauses, and longer pauses. They also showed steeper [L+H] slopes in their APs.


Characteristics of Laryngeal-Diadochokinesis (L-DDK) in Nonfluent Speakers (비유창성 화자의 후두 교호운동 특성)

  • Han, Ji-Yeon;Lee, Ok-Bun;Park, Hee-Jun;Lim, Hye-Jin
    • Speech Sciences / v.14 no.2 / pp.55-64 / 2007
  • Laryngeal DDK involves the rate, pattern, and regularity (periodicity) of the opening and closing of the vocal folds. This study aimed to investigate the characteristics of laryngeal DDK in nonfluent and fluent speakers. Two nonfluent speakers, one with ataxic dysarthria (with a cerebellar lesion) and the other with stuttering, and 13 normal speakers were evaluated. L-DDK was analyzed with MSP (Motor Speech Profile, CSL 4400). The DDK measures included DDKavr, DDKcvp, DDKjit, and DDKavp. The speaker with ataxic dysarthria and the speaker who stutters showed a more reduced rate and more aperiodic L-DDK (in both adductory and abductory movements) than the normal speakers. However, the average L-DDK period (ms) in adductory movement was more reduced in the speaker with stuttering than in the other speaker. The results of this study are preliminary. Nonetheless, the L-DDK results of the nonfluent speakers suggest a possible relation with a slow rate of phonatory initiation and connected speech. In the future, perceptual studies are needed in conjunction with acoustic and speech production studies.


A Speech Homomorphic Encryption Scheme with Less Data Expansion in Cloud Computing

  • Shi, Canghong;Wang, Hongxia;Hu, Yi;Qian, Qing;Zhao, Hong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.5 / pp.2588-2609 / 2019
  • Speech homomorphic encryption has become one of the key components of secure speech storage in public cloud computing. The major problem of speech homomorphic encryption is the huge data expansion of the speech cipher-text. To address this issue, this paper presents a speech homomorphic encryption scheme with less data expansion, which is a probabilistic statistics and addition homomorphic cryptosystem. In the proposed scheme, the original digital speech, together with some selected random numbers, is first grouped to form a series of speech matrices. Then, a proposed matrix encryption method is employed to encrypt each speech matrix. After that, the mutual information in sample speech cipher-texts is reduced to limit the data expansion. Performance analysis and experimental results show that the proposed scheme is addition homomorphic, and that it not only resists statistical analysis attacks but also eliminates some signal characteristics of the original speech. In addition, compared with the Paillier homomorphic cryptosystem, the proposed scheme has less data expansion and lower computational complexity. Furthermore, the time consumption of the proposed scheme is almost the same on the smartphone and the PC. Thus, the proposed scheme is well suited to secure speech storage in public cloud computing.
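
To make "addition homomorphic" and "data expansion" concrete, the following toy sketch shows the Paillier cryptosystem that the abstract uses as its baseline. It is not the proposed matrix scheme. The tiny primes, the fixed random values `r`, and the function names are assumptions chosen for readability, and the parameters are far too small to be secure.

```python
from math import gcd

def keygen(p=1009, q=1013):
    """Toy Paillier key generation with g = n + 1 (insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                           # modular inverse (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m, r):
    """Encrypt 0 <= m < n using randomness r, which must be coprime to n."""
    n2 = n * n
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying cipher-texts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)

n, priv = keygen()
c1 = encrypt(n, 1200, r=17)   # two non-negative speech sample values
c2 = encrypt(n, 345, r=23)
total = decrypt(n, priv, add_encrypted(n, c1, c2))
print(total)                  # 1545
```

Each cipher-text lives modulo n², so it needs roughly twice as many bits as the modulus n, while a speech sample is typically only 16 bits. Encrypting samples one by one this way is the kind of data expansion the abstract sets out to reduce.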

Prominence Detection Using Feature Differences of Neighboring Syllables for English Speech Clinics (영어 강세 교정을 위한 주변 음 특징 차를 고려한 강조점 검출)

  • Shim, Sung-Geon;You, Ki-Sun;Sung, Won-Yong
    • Phonetics and Speech Sciences / v.1 no.2 / pp.15-22 / 2009
  • Prominence of speech, which is often called 'accent,' affects the fluency of speaking American English greatly. In this paper, we present an accurate prominence detection method that can be utilized in computer-aided language learning (CALL) systems. We employed pitch movement, overall syllable energy, 300-2200 Hz band energy, syllable duration, and spectral and temporal correlation as features to model the prominence of speech. After the features for vowel syllables of speech were extracted, prominent syllables were classified by SVM (Support Vector Machine). To further improve accuracy, the differences in characteristics of neighboring syllables were added as additional features. We also applied a speech recognizer to extract more precise syllable boundaries. The performance of our prominence detector was measured based on the Intonational Variation in English (IViE) speech corpus. We obtained 84.9% accuracy which is about 10% higher than previous research.
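
The step of adding neighboring-syllable differences as extra features could look roughly like the sketch below. The abstract does not give the exact formulation, so the function name, the choice of the immediately preceding and following syllables, and the zero-difference handling at utterance boundaries are all assumptions. The SVM classification stage is omitted.

```python
def add_neighbor_differences(features):
    """Append (current - previous) and (current - next) differences to each
    syllable's feature vector. Boundary syllables use themselves as the
    missing neighbor, which yields zero differences (an assumed convention)."""
    n = len(features)
    augmented = []
    for i, cur in enumerate(features):
        prev = features[i - 1] if i > 0 else cur
        nxt = features[i + 1] if i < n - 1 else cur
        diff_prev = [c - p for c, p in zip(cur, prev)]
        diff_next = [c - x for c, x in zip(cur, nxt)]
        augmented.append(list(cur) + diff_prev + diff_next)
    return augmented

# Hypothetical per-syllable features: [energy, duration] in arbitrary units.
syllables = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
augmented = add_neighbor_differences(syllables)
print(augmented[1])   # [3.0, 30.0, 2.0, 20.0, 1.0, 10.0]
```

The augmented vectors let a classifier judge a syllable relative to its context rather than by absolute values alone, which matters because prominence is a relative property within an utterance.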
