• Title/Abstract/Keywords: speech cues

Search results: 117

Spike Train Decoding에 기반한 인공와우 어음처리기의 음성시작점 정보 전달특성 평가 (Performance Evaluation of Speech Onset Representation Characteristic of Cochlear Implants Speech Processor using Spike Train Decoding)

  • 김두희;김진호;김경환
    • 대한의용생체공학회:의공학회지 / Vol. 28 No. 5 / pp.694-702 / 2007
  • The adaptation effect originating from the chemical synapse between the inner hair cell and the auditory nerve aids the accurate representation of temporal cues in incoming speech, such as speech onset. It is therefore expected that modifying conventional cochlear implant (CI) speech processing strategies to incorporate the adaptation effect will considerably improve speech perception performance, such as consonant perception scores. Our purpose in this paper was to evaluate our new CI speech processing strategy, which incorporates the adaptation effect, by observing auditory nerve responses. By classifying the presence or absence of speech from the auditory nerve responses, i.e., spike trains, we could quantitatively compare the speech onset detection performance of the conventional and improved strategies, and we verified the effectiveness of the adaptation effect in improving the speech onset representation characteristics.
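The onset emphasis that adaptation provides can be illustrated with a toy first-order transmitter-depletion stage. This is a minimal sketch, not the paper's auditory nerve model; all constants (`dt`, `tau`, `depletion`) are illustrative assumptions.

```python
import numpy as np

def adapted_response(stimulus, dt=1e-3, tau=0.05, depletion=0.02):
    """Output rate scaled by available 'transmitter' q, which depletes
    with output and recovers toward 1 with time constant tau."""
    q = 1.0
    out = np.empty(len(stimulus))
    for i, s in enumerate(stimulus):
        r = s * q                # instantaneous output rate
        out[i] = r
        q += dt * (1.0 - q) / tau - depletion * r  # recover and deplete
        q = min(max(q, 0.0), 1.0)
    return out

stim = np.concatenate([np.zeros(100), np.ones(400)])  # "speech" onset at sample 100
resp = adapted_response(stim)
# the response peaks right at the onset and then adapts to a lower
# steady state, which is what makes the onset easy to detect
```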

한국어 음성 분할을 위한 특징 검출에 관한 연구 (A Study on the Feature Extraction for the Segmentation of Korean Speech)

  • 이극;황희융
    • 대한전기학회:학술대회논문집 / 대한전기학회 1987년도 정기총회 및 창립40주년기념 학술대회 학회본부 / pp.338-340 / 1987
  • A speech recognition system usually consists of two modules, a segmentation module and an identification module, so the performance of the system depends heavily on the segmentation accuracy and the segmentation unit. This paper is concerned with features suitable for segmentation into syllables. Total energy and two band energies (LE: 4000-5000 Hz and HE: 900-3100 Hz) are suitable cues for segmentation, and we verify this through experiments using connected digits.

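The band-energy cues described in the abstract above can be sketched with a short-time FFT. The band edges are taken from the abstract; the frame length, hop, and window are illustrative assumptions.

```python
import numpy as np

def frame_band_energies(signal, sr, lo_hz, hi_hz, frame=256, hop=128):
    """Per-frame energy in the band [lo_hz, hi_hz] Hz."""
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    window = np.hanning(frame)
    energies = []
    for start in range(0, len(signal) - frame + 1, hop):
        spec = np.abs(np.fft.rfft(signal[start:start + frame] * window))
        energies.append(float(np.sum(spec[band] ** 2)))
    return np.array(energies)

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)            # 1 kHz test tone
he = frame_band_energies(tone, sr, 900, 3100)  # "HE" band from the abstract
le = frame_band_energies(tone, sr, 4000, 5000) # "LE" band from the abstract
# a 1 kHz tone puts nearly all of its energy in the 900-3100 Hz band
```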

아동이 산출한 치조마찰음 /ㅅ/에 대한 청지각적·음향학적 연구 (A perceptual and acoustical study of /ㅅ/ in children's speech)

  • 김지연;성철재
    • 말소리와 음성과학 / Vol. 10 No. 3 / pp.41-48 / 2018
  • This study examined the acoustic characteristics of Korean alveolar fricatives produced by normally developing children. Typically developing children aged 3 and 7 produced two types of nonsense syllables containing the alveolar fricative, /sV/ and /VsV/ sequences, where V was one of the three corner vowels (/i/, /a/, /u/). Stimuli containing the speech materials from the production experiment were presented randomly to 12 speech-language pathologists (SLPs) for a perception test; the SLPs responded by selecting one of seven alternative sounds. Acoustic measures such as duration of frication noise, normalized intensity, skewness, and center of gravity were examined. There were significant differences in the acoustic measures across vowels, and comparison of syllable structures showed statistically significant differences in duration of frication noise and normalized intensity. The acoustic parameters could account for the perceptual data: relating the acoustic and perception data by means of logistic regression suggests that duration of frication noise and normalized intensity are the primary cues for perceiving Korean fricatives.
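A logistic regression relating acoustic cues to binary perceptual judgements, as in the analysis above, can be sketched as follows. The data here are synthetic, not the study's measurements, and the cue scales and effect sizes are assumptions.

```python
import numpy as np

# Synthetic cues: frication duration (s) and normalized intensity (dB),
# with an assumed true relationship to a binary perceptual response.
rng = np.random.default_rng(0)
n = 400
duration = rng.normal(0.12, 0.03, n)
intensity = rng.normal(-20.0, 4.0, n)
logit = 60.0 * (duration - 0.12) + 0.5 * (intensity + 20.0)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

# Standardize the cues, then fit by plain gradient ascent on the
# log-likelihood (no external ML library needed for a sketch).
X = np.column_stack([np.ones(n),
                     (duration - duration.mean()) / duration.std(),
                     (intensity - intensity.mean()) / intensity.std()])
w = np.zeros(3)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.5 * X.T @ (y - p) / n
# both cue weights come out positive: longer frication and higher
# intensity each push the predicted judgement the same way
```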

순환 신경망 모델을 이용한 한국어 음소의 음성인식에 대한 연구 (A Study on the Speech Recognition of Korean Phonemes Using Recurrent Neural Network Models)

  • 김기석;황희영
    • 대한전기학회논문지 / Vol. 40 No. 8 / pp.782-791 / 1991
  • In pattern recognition fields such as speech recognition, several new techniques using artificial neural network models have been proposed and implemented. In particular, the multilayer perceptron model has been shown to be effective in static speech pattern recognition. But speech has dynamic, temporal characteristics, and the most important point in implementing continuous speech recognition systems with artificial neural network models is learning the dynamic characteristics, distributed cues, and contextual effects that result from those temporal characteristics. The recurrent multilayer perceptron model is known to be able to learn sequences of patterns. In this paper, we present the results of applying the recurrent model, which can learn the temporal characteristics of speech, to phoneme recognition. The test data consist of 144 Vowel+Consonant+Vowel speech chains made up of 4 Korean monophthongs and 9 Korean plosive consonants. The input parameters of the artificial neural network model are FFT coefficients, residual error, and zero-crossing rates. The baseline model showed recognition rates of 91% for vowels and 71% for plosive consonants for one male speaker. Various further experiments yielded better recognition rates than the existing multilayer perceptron model, showing the recurrent model to be better suited to speech recognition, and the possibilities of recurrent models were explored by changing the configuration of this baseline model.
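The contextual effect the abstract emphasizes can be seen in a minimal Elman-style forward pass: because the hidden state is fed back, the response to a frame depends on the frames before it, which a plain multilayer perceptron cannot capture. The weights here are random illustrations, not trained values, and the three inputs stand in for the paper's FFT/residual/zero-crossing features.

```python
import numpy as np

rng = np.random.default_rng(1)
W_in = rng.normal(0, 1.0, (8, 3))   # 3 input features per frame
W_rec = rng.normal(0, 0.5, (8, 8))  # recurrent (hidden-to-hidden) weights

def run(frames):
    """Run the recurrence over a sequence of input frames."""
    h = np.zeros(8)
    for x in frames:
        h = np.tanh(W_in @ x + W_rec @ h)  # recurrence = memory of context
    return h

frame = np.array([0.5, -0.2, 0.1])
ctx_a = [np.zeros(3), frame]   # silence, then the frame
ctx_b = [np.ones(3), frame]    # a different preceding context
# the final hidden states differ even though the last input is identical
```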

Word class information in perception of prosodic prominence by Korean learners of English

  • Im, Suyeon
    • 말소리와 음성과학 / Vol. 11 No. 4 / pp.1-8 / 2019
  • This study aims to investigate how prosodic prominence is perceived in relation to word class information (parts of speech) by Korean learners of English, compared with native English speakers, in public speech. Two groups, Korean learners of English and native English speakers, were asked to mark the words they perceived as prominent in real time while listening to a speech. Parts of speech and three acoustic cues (max F0, mean phone duration, and mean intensity) were analyzed for each word in the speech. The results showed that content words tended to be higher in pitch and longer in duration than function words. Both groups of listeners rated prominence on content words more frequently than on function words; this tendency, however, was significantly greater for Korean learners of English than for native English speakers. Among the parts of speech of the content words, Korean learners of English were more likely than native English speakers to judge nouns and verbs as prominent. This study presents evidence that Korean learners of English consider most, if not all, content words as landing locations of prosodic prominence, in alignment with the previous study on the production of prominence.

Perception of Tamil Mono-Syllabic and Bi-Syllabic Words in Multi-Talker Speech Babble by Young Adults with Normal Hearing

  • Gnanasekar, Sasirekha;Vaidyanath, Ramya
    • Journal of Audiology & Otology / Vol. 23 No. 4 / pp.181-186 / 2019
  • Background and Objectives: This study compared the perception of mono-syllabic and bi-syllabic words in Tamil by young normal-hearing adults in the presence of multi-talker speech babble at two signal-to-noise ratios (SNRs). For this comparison, a speech perception in noise test was constructed using existing mono-syllabic and bi-syllabic word lists in Tamil. Subjects and Methods: A total of 30 participants with normal hearing, aged 18 to 25 years, participated in the study. The speech-in-noise test in Tamil (SPIN-T), constructed from mono-syllabic and bi-syllabic Tamil words, was used as stimuli. The stimuli were presented against multi-talker speech babble at two SNRs (0 dB and +10 dB). Results: The effect of noise on SPIN-T varied with SNR. All participants performed better at +10 dB SNR, the higher of the two SNRs considered. Additionally, at +10 dB SNR performance did not vary significantly for either mono-syllabic or bi-syllabic words, whereas a significant difference existed at 0 dB SNR. Conclusions: The current study indicated that a higher SNR leads to better performance. In addition, bi-syllabic words were identified with fewer errors than mono-syllabic words. Spectral cues were the most affected in the presence of noise, leading to more place-of-articulation errors for both mono-syllabic and bi-syllabic words.
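Speech-in-noise stimuli of the kind described above are typically built by scaling the babble so that the speech-to-noise power ratio equals the target SNR in dB. This is a generic sketch with synthetic placeholder signals, not the SPIN-T materials.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio is snr_db, then mix.
    Returns the mixture and the scaled noise."""
    noise = noise[:len(speech)]                       # trim to speech length
    scale = np.sqrt(np.mean(speech ** 2) /
                    (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return speech + scale * noise, scale * noise

rng = np.random.default_rng(2)
speech = np.sin(2 * np.pi * 220 * np.arange(8000) / 8000)  # placeholder "word"
babble = rng.normal(0, 0.3, 16000)                         # placeholder babble
mixed, scaled = mix_at_snr(speech, babble, 10.0)
snr = 10 * np.log10(np.mean(speech ** 2) / np.mean(scaled ** 2))
# snr equals the requested 10 dB by construction
```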

A New Method for Segmenting Speech Signal by Frame Averaging Algorithm

  • Byambajav D.;Kang Chul-Ho
    • The Journal of the Acoustical Society of Korea / Vol. 24 No. 4E / pp.128-131 / 2005
  • A new algorithm for speech signal segmentation is proposed. The algorithm finds successive similar frames belonging to a segment and represents the segment by an average spectrum. Speech is a slowly time-varying signal in the sense that, when examined over a sufficiently short period (between 10 and 100 ms), its characteristics are fairly stationary, and this approach is based on finding those fairly stationary periods. Advantages of the algorithm are accurate segment boundary decisions and simple computation. Automatic segmentation using frame averaging coincided with manually verified segmentation of the CMU ARCTIC corpus in as much as 82.20% of cases within a time range of 16 ms, and more than 90% of segment boundaries coincided within a range of 32 ms. The method can also be combined with many types of automatic segmentation (HMM-based, acoustic-cue- or feature-based, etc.).
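The frame-averaging idea above can be sketched as a single pass over per-frame spectra: while the next frame stays close to the running average of the current segment, absorb it and update the average; otherwise declare a boundary. The distance measure and threshold here are illustrative assumptions, not the paper's values.

```python
import numpy as np

def segment_frames(spectra, threshold=1.0):
    """Return the starting frame index of each detected segment."""
    bounds = [0]
    avg, count = spectra[0].astype(float), 1
    for i in range(1, len(spectra)):
        if np.linalg.norm(spectra[i] - avg) <= threshold:
            avg = (avg * count + spectra[i]) / (count + 1)  # absorb frame
            count += 1
        else:
            bounds.append(i)                # segment border found
            avg, count = spectra[i].astype(float), 1
    return bounds

# two stationary stretches of toy spectra -> one boundary at frame 5
spectra = np.vstack([np.tile([1.0, 0.0, 0.0], (5, 1)),
                     np.tile([0.0, 2.0, 0.0], (5, 1))])
bounds = segment_frames(spectra)
# bounds == [0, 5]
```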

Closure Duration and Pitch as Phonetic Cues to Korean Stop Identity in AP-medial Position: Perception Test

  • Kang, Hyun-Sook;Dilley, Laura
    • 음성과학 / Vol. 14 No. 4 / pp.25-39 / 2007
  • The present study investigated some perceptual phonetic attributes of two Korean stop types, aspirated and lax, in medial position of an accentual phrase. The intonational pattern across syllables (Jun, 1993) is argued to depend on the type of stop (aspirated vs. lax) only in the initial position of an accentual phrase. In Kang & Dilley (2007), we showed that significant differences between aspirated and lax stops in medial position of an accentual phrase exist in closure duration, voice-onset time, and fundamental frequency (F0) values for post-stop vowels. In the present perception experiment, we investigated whether these phonetic attributes contribute to the perception of these two types of stops: The closure durations and/or F0's of post-stop vowels on accentual-phrase medial words were altered and twenty native Korean speakers then judged these words as beginning with an aspirated or lax stop. Both closure duration and F0 significantly affected judgments of stop identity. These results indicate that a wider range of acoustic cues that distinguish aspirated and lax Korean stops in production also plays a role in perception. To account for these results we suggest some phonetic and phonological models of consonant-tone interactions for Korean.

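The closure-duration manipulation described in the abstract above can be sketched as a simple waveform edit: replace the closure interval with a silent interval of the desired length. The sample positions and durations here are illustrative assumptions, not the study's stimuli.

```python
import numpy as np

def set_closure_duration(signal, clo_start, clo_end, new_len):
    """Replace signal[clo_start:clo_end] (the stop closure) with new_len
    samples of silence, keeping the surrounding speech intact."""
    return np.concatenate([signal[:clo_start],
                           np.zeros(new_len),
                           signal[clo_end:]])

sr = 16000
speech = np.ones(sr)  # placeholder 1 s waveform
# lengthen a 75 ms closure (samples 4000-5200) to 150 ms (2400 samples)
longer = set_closure_duration(speech, 4000, 5200, 2400)
# the stimulus grows by the added closure: len(longer) == sr + 1200
```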

The Production of Stops by Seoul and Yanbian Korean Speakers

  • 오미라
    • 말소리와 음성과학 / Vol. 5 No. 4 / pp.185-193 / 2013
  • This study investigates dialectal differences in the acoustic properties of Korean lenis, aspirated, and tense stops in Seoul Korean (standard Korean) and Yanbian Korean (spoken in the largest Korean autonomous prefecture in China). This production study examines the main acoustic cues that each dialect uses to mark the laryngeal distinction between the three types of Korean stops. Measurements included VOT and the initial F0 of the following vowel. Data were collected from 10 young Seoul Korean speakers, 10 young Yanbian Korean speakers, and 6 older Yanbian speakers. There are two key findings: first, aspirated and lenis stops are mainly differentiated by F0 in Seoul Korean and by H1*-H2* in Yanbian Korean; second, there is no VOT merger between lenis and aspirated stops in Yanbian Korean, whereas there is in Seoul Korean. These results are discussed in terms of the phenomenon of VOT shift and the function of F0. It is argued that whether F0 can substitute for the VOT difference as a primary cue for coding the laryngeal contrast can be predicted by the pitch accent system of the language involved.