• Title/Summary/Keyword: speech rates

Search Results: 271

Evaluation of the Device Failure Using Stimulus Artifact in the Cochlear Implantee (인공와우 이식자에서 자극 잡파를 이용한 고장 평가)

  • Heo, Seung-Deok;Kim, Sang-Ryeol;Ahn, Joong-Ki;Jung, Dong-Keun;Kang, Myung-Koo
    • Speech Sciences
    • /
    • v.14 no.2
    • /
    • pp.35-42
    • /
    • 2007
  • The aim of this study is to analyze the correlation between current intensity and the amplitude of the stimulus artifact in cochlear implantees, and to obtain basic information for checking device failure. Subjects were one prelingual child and three postlingual adults with severe or worse hearing loss. Charge-balanced biphasic pulses were presented at a stimulus rate of 11 pulses per second with a pulse width of 25 μs in monopolar mode (MP1+2). Current intensities of 27.5, 33.7, 41.3, 50.5, 61.9, and 75.8 μA were delivered. Stimulus artifacts were recorded by an evoked potential system. This procedure was performed just before the initial stimulation, and the amplitude of the stimulus artifact was then compared across current intensities. The amplitude of the stimulus artifact increased significantly with current intensity (p<0.01). The results suggest that the change in stimulus artifact amplitude can serve as a useful cue for checking device failure in cochlear implantees.

  • PDF
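The study's core analysis, correlating stimulus current with recorded artifact amplitude, can be sketched in a few lines. Only the current levels below come from the abstract; the artifact amplitudes are hypothetical values invented for illustration.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Current levels from the abstract (uA); artifact amplitudes are made up.
currents = [27.5, 33.7, 41.3, 50.5, 61.9, 75.8]
amplitudes = [11.0, 13.5, 16.5, 20.2, 24.8, 30.3]
r = pearson_r(currents, amplitudes)
```

A strongly positive r across intensities would match the significant increase reported in the abstract.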

Fast Sequential Probability Ratio Test Method to Obtain Consistent Results in Speaker Verification (화자확인에서 일정한 결과를 얻기 위한 빠른 순시 확률비 테스트 방법)

  • Kim, Eun-Young;Seo, Chang-Woo;Jeon, Sung-Chae
    • Phonetics and Speech Sciences
    • /
    • v.2 no.2
    • /
    • pp.63-68
    • /
    • 2010
  • A new version of the sequential probability ratio test (SPRT), investigated for utterance-length control, is proposed to obtain uniform response results in speaker verification (SV). Although the SPRT yields fast responses in SV tests, performance can vary depending on the composition of consonants and vowels in the sentences used. In this paper, we propose a fast sequential probability ratio test (FSPRT) method that performs consistently regardless of the composition of the vocalized sentences. When generating frames, the FSPRT first runs the SV test with frames generated without any overlap; if the results do not satisfy the discrimination criteria, it then sequentially uses frames with overlap applied. Because the process proceeds this way, the test is not affected by the sentence composition, so fast responses and consistent performance are both obtained. Experimental results show that the FSPRT outperforms the SPRT method while requiring less complexity at equal error rate (EER).

  • PDF
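The sequential decision underlying both the SPRT and the FSPRT can be illustrated with Wald's classical accept/reject thresholds. This is a generic SPRT sketch, not the paper's FSPRT frame-overlap scheme, and the error-rate targets are illustrative assumptions.

```python
import math

def sprt(frame_llrs, p_miss=0.01, p_false_accept=0.01):
    """Wald SPRT sketch: accumulate per-frame log-likelihood ratios
    until one of the two decision thresholds is crossed."""
    upper = math.log((1 - p_miss) / p_false_accept)   # accept threshold
    lower = math.log(p_miss / (1 - p_false_accept))   # reject threshold
    total = 0.0
    for i, llr in enumerate(frame_llrs, start=1):
        total += llr
        if total >= upper:
            return "accept", i   # enough evidence for the claimed speaker
        if total <= lower:
            return "reject", i   # enough evidence against
    return "undecided", len(frame_llrs)

decision, frames_used = sprt([1.0] * 20)
```

The appeal of the test is visible here: with consistently informative frames it stops after a handful of frames instead of consuming the whole utterance.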

A Study on the Implementation of Signal Transmission System Within Electric Culvert (지하 전력 구내에서 신호 전송 시스템의 실현에 관한 연구)

  • 진달복;오상기;최성주;나채동
    • The Proceedings of the Korean Institute of Illuminating and Electrical Installation Engineers
    • /
    • v.7 no.3
    • /
    • pp.49-56
    • /
    • 1993
  • This paper describes the design and implementation of a signal transmission system using LCX (leaky coaxial cable) as the communication medium, which offers high reliability, easy expansion, and combined transmission of voice, data, and video signals in an electric culvert. We evaluated system performance through various transmission characteristic tests. For the voice signal, transmission loss was 5-10 dB better than the designed values at the received signal level. In the speech quality test we obtained satisfactory results of speech intensity = 3 (QSA value) and speech articulation = 4 (QRK value). For data and video signal transmission, communication success rates were 98.1% in the monitoring and control functional test. The transmission characteristic tests on the line and system show that the transmission range of the LCX communication system can reach 6 km without a repeater. This paper presents a basic construction method using an LCX communication system for total management of an electric culvert.

  • PDF

A Training Algorithm for the Transform Trellis Code with Applications to Stationary Gaussian Sources and Speech (정상 가우시안 소오스와 음성 신호용 변환 격자 코드에 대한 훈련 알고리즘 개발)

  • Kim, Dong-Youn;Park, Yong-Seo;Whang, Keum-Chan;Pearlman, William A.
    • The Journal of the Acoustical Society of Korea
    • /
    • v.11 no.1
    • /
    • pp.22-34
    • /
    • 1992
  • There exists a transform trellis code that is optimal for stationary Gaussian sources and the squared-error distortion measure at all rates. In this paper, we train an asymptotically optimal version of such a code to obtain one better matched to the statistics of real-world data. The training algorithm uses the M algorithm to search the trellis codebook and the LBG algorithm to update it. We investigate the trained transform trellis coding scheme on a first-order AR (autoregressive) Gaussian source with correlation coefficient 0.9 and on actual speech sentences. For the first-order AR source, the SNR achieved on the test sequence is 0.6 to 1.4 dB below the maximum achievable SNR given by Shannon's rate-distortion function, depending on the rate, and surpasses all previously known results for this source. For actual speech data, we use window functions and gain adaptation at a rate of 1.0 bits/sample to achieve improved performance.

  • PDF
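The LBG codebook update mentioned in the abstract is the classical split-and-refine (generalized Lloyd) procedure. Below is a minimal scalar sketch of it, not the paper's transform-trellis variant; the data and split factor are illustrative assumptions.

```python
def lbg(data, n_codewords, eps=0.01, iters=20):
    """LBG sketch: start from the global centroid, repeatedly split every
    codeword by a small perturbation, then refine with Lloyd iterations."""
    codebook = [sum(data) / len(data)]
    while len(codebook) < n_codewords:
        codebook = ([c * (1 + eps) for c in codebook] +
                    [c * (1 - eps) for c in codebook])
        for _ in range(iters):
            cells = [[] for _ in codebook]          # nearest-neighbor partition
            for x in data:
                nearest = min(range(len(codebook)),
                              key=lambda j: (x - codebook[j]) ** 2)
                cells[nearest].append(x)
            codebook = [sum(cell) / len(cell) if cell else codebook[i]
                        for i, cell in enumerate(cells)]   # centroid update
    return sorted(codebook)

cb = lbg([0.0, 0.1, -0.1, 10.0, 9.9, 10.1], 2)
```

In the paper's setting the same partition/centroid cycle operates on trellis codebook entries, with the M algorithm supplying the search step.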

A Study on Hybrid Structure of Semi-Continuous HMM and RBF for Speaker Independent Speech Recognition (화자 독립 음성 인식을 위한 반연속 HMM과 RBF의 혼합 구조에 관한 연구)

  • 문연주;전선도;강철호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.8
    • /
    • pp.94-99
    • /
    • 1999
  • Among speech recognition algorithms, hybrid structures of HMMs and neural networks (NN) show high recognition rates, combining the merits of statistical models and neural network models. In this study, we propose a new hybrid structure of a semi-continuous HMM (SCHMM) and radial basis functions (RBF), which re-estimates the weighting-coefficient probabilities affecting the observation probability after Baum-Welch estimation. The proposed method exploits the similarity between the basis functions of the RBF's hidden layer and the SCHMM's probability density functions, discriminating speech signals through the learned and estimated weighting coefficients of the RBF. Simulation results show that the recognition rates of the hybrid SCHMM/RBF structure are higher than those of the SCHMM alone in recognition experiments with unseen speakers, demonstrating that the proposed method discriminates more sensitively than the SCHMM.

  • PDF
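The structural similarity the abstract relies on is that both an RBF hidden layer and an SCHMM observation probability are weighted sums over a shared set of Gaussian densities. A minimal one-dimensional sketch of that shared form (illustrative parameters, not the paper's trained model):

```python
import math

def gaussian(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def weighted_mixture(x, centers, variances, weights):
    """Weighted sum of Gaussian basis functions -- the same form serves as
    an RBF network output and as an SCHMM observation probability, where
    the shared densities play the role of the RBF hidden layer."""
    return sum(w * gaussian(x, c, v)
               for w, c, v in zip(weights, centers, variances))

out = weighted_mixture(0.0, [0.0], [1.0], [1.0])
```

Re-estimating the weights after Baum-Welch, as the paper proposes, changes only the mixture coefficients while the shared densities stay fixed.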

Speech Recognition in Noisy environment using Transition Constrained HMM (천이 제한 HMM을 이용한 잡음 환경에서의 음성 인식)

  • Kim, Weon-Goo;Shin, Won-Ho;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.2
    • /
    • pp.85-89
    • /
    • 1996
  • In this paper, a transition-constrained hidden Markov model (HMM), in which transitions between states may occur only within a prescribed time slot, is proposed, and its performance is evaluated in noisy environments. The transition-constrained HMM can explicitly limit state durations and accurately describe the temporal structure of the speech signal simply and efficiently. It is not only superior to the conventional HMM but also requires much less computation time. To evaluate its performance, speaker-independent isolated word recognition experiments were conducted using a semi-continuous HMM with noisy speech at 20, 10, and 0 dB SNR. The results show that the proposed method is robust to environmental noise. The word recognition rates of 81.08% and 75.36% for the conventional HMM were increased by 7.31% and 10.35%, respectively, by the transition-constrained HMM when two kinds of noise were added at 10 dB SNR.

  • PDF
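The idea of allowing a state transition only inside a prescribed time slot can be sketched as a left-to-right Viterbi pass with per-transition frame windows. This is a simplified illustration of the constraint, not the paper's exact model; the window values are assumptions.

```python
def constrained_viterbi(obs_logp, windows):
    """Left-to-right Viterbi where the i -> i+1 transition is allowed
    only while frame t lies in windows[i] = (earliest, latest).
    obs_logp[t][s] is the log-probability of frame t in state s."""
    T, S = len(obs_logp), len(obs_logp[0])
    NEG = float("-inf")
    delta = [[NEG] * S for _ in range(T)]
    delta[0][0] = obs_logp[0][0]
    for t in range(1, T):
        for s in range(S):
            stay = delta[t - 1][s]                 # self-loop, always allowed
            move = NEG
            if s > 0:
                lo, hi = windows[s - 1]
                if lo <= t <= hi:                  # transition in its time slot
                    move = delta[t - 1][s - 1]
            best = max(stay, move)
            if best > NEG:
                delta[t][s] = best + obs_logp[t][s]
    return delta[T - 1][S - 1]
```

Paths whose transitions fall outside the prescribed slots simply receive probability zero, which is how the model limits state durations without extra duration densities.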

Normalized gestural overlap measures and spatial properties of lingual movements in Korean non-assimilating contexts

  • Son, Minjung
    • Phonetics and Speech Sciences
    • /
    • v.11 no.3
    • /
    • pp.31-38
    • /
    • 2019
  • The current electromagnetic articulography study analyzes several articulatory measures and examines whether, and if so, how they are interconnected, with a focus on cluster types and an additional consideration of speech rates and morphosyntactic contexts. Using articulatory data on non-assimilating contexts from three Seoul-Korean speakers, we examine how speaker-dependent gestural overlap between C1 and C2 in a low vowel context (/a/-to-/a/) and their resulting intergestural coordination are realized. Examining three C1C2 sequences (/k(#)t/, /k(#)p/, and /p(#)t/), we found that three normalized gestural overlap measures (movement onset lag, constriction onset lag, and constriction plateau lag) were correlated with one another for all speakers. Limiting the scope of analysis to C1 velar stop (/k(#)t/ and /k(#)p/), the results are recapitulated as follows. First, for two speakers (K1 and K3), i) longer normalized constriction plateau lags (i.e., less gestural overlap) were observed in the pre-/t/ context, compared to the pre-/p/ (/k(#)t/>/k(#)p/), ii) the tongue dorsum at the constriction offset of C1 in the pre-/t/ contexts was more anterior, and iii) these two variables are correlated. Second, the three speakers consistently showed greater horizontal distance between the vertical tongue dorsum and the vertical tongue tip position in /k(#)t/ sequences when it was measured at the time of constriction onset of C2 (/k(#)t/>/k(#)p/): the tongue tip completed its constriction onset by extending further forward in the pre-/t/ contexts than the uncontrolled tongue tip articulator in the pre-/p/ contexts (/k(#)t/>/k(#)p/). Finally, most speakers demonstrated less variability in the horizontal distance of the lingual-lingual sequences, which were taken as the active articulators (/k(#)t/=/k(#)p/ for K1; /k(#)t/

Machine-learning-based out-of-hospital cardiac arrest (OHCA) detection in emergency calls using speech recognition (119 응급신고에서 수보요원과 신고자의 통화분석을 활용한 머신 러닝 기반의 심정지 탐지 모델)

  • Jong In Kim;Joo Young Lee;Jio Chung;Dae Jin Shin;Dong Hyun Choi;Ki Hong Kim;Ki Jeong Hong;Sunhee Kim;Minhwa Chung
    • Phonetics and Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.109-118
    • /
    • 2023
  • Cardiac arrest is a critical medical emergency in which an immediate response is essential for patient survival. This is especially true for out-of-hospital cardiac arrest (OHCA), where the actions of emergency medical services in the early stages significantly impact outcomes. In Korea, however, a challenge arises from a shortage of dispatchers handling a large volume of emergency calls. In such situations, a machine-learning-based OHCA detection program can assist responders and improve patient survival rates. In this study, we address this challenge by developing such a program, which analyzes transcripts of conversations between responders and callers to identify instances of cardiac arrest. The proposed system includes an automatic transcription module for these conversations, a text-based cardiac arrest detection model, and the server and client components needed for deployment. The experimental results demonstrate the model's effectiveness, achieving an F1 score of 79.49% and reducing the time needed for cardiac arrest detection by 15 seconds compared to dispatchers. Despite working with a limited dataset, this research highlights the potential of a cardiac arrest detection program as a valuable tool for responders, ultimately enhancing cardiac arrest survival rates.
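The 79.49% figure is an F1 score. As a reminder of what that metric measures for a binary detector like this one, here is a from-scratch computation on toy labels (the labels are made up, not the paper's data):

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = OHCA)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

F1 is a sensible choice here because cardiac-arrest calls are rare, so plain accuracy would reward a detector that never fires.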

A New Wideband Speech/Audio Coder Interoperable with ITU-T G.729/G.729E (ITU-T G.729/G.729E와 호환성을 갖는 광대역 음성/오디오 부호화기)

  • Kim, Kyung-Tae;Lee, Min-Ki;Youn, Dae-Hee
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.45 no.2
    • /
    • pp.81-89
    • /
    • 2008
  • Wideband speech, characterized by a bandwidth of about 7 kHz (50-7000 Hz), provides a substantial quality improvement in terms of naturalness and intelligibility. Although higher data rates are required, its applications have extended to audio and video conferencing, high-quality multimedia communications over mobile links or packet-switched transmission, and digital AM broadcasting. In this paper, we present a new bandwidth-scalable coder for wideband speech and audio signals. The proposed coder splits the 8 kHz signal bandwidth into two narrow bands, and a different coding scheme is applied to each band. The lower band is coded with the ITU-T G.729/G.729E coder, and the higher band is compressed with a new algorithm based on a gammatone filter bank with an invertible auditory model. Owing to the split-band architecture and the completely independent coding schemes for each band, the decoder output can be selected to be narrowband or wideband according to the channel condition. Subjective tests showed that, for wideband speech and audio signals, the proposed coder at 14.2/18 kbit/s produces quality superior to ITU-T G.722.1 at 24 kbit/s, with shorter algorithmic delay.
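The split-band principle can be illustrated with the simplest perfectly reconstructing two-band pair, a Haar-style average/difference split. This is only a toy analogue of the architecture; the actual coder's analysis filters and gammatone bank are far more elaborate.

```python
def split_bands(x):
    """Split a signal into a low band (pairwise averages) and a high band
    (pairwise differences); each band runs at half the input rate."""
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return low, high

def merge_bands(low, high):
    """Invert split_bands exactly (perfect reconstruction)."""
    x = []
    for lo, hi in zip(low, high):
        x.extend([lo + hi, lo - hi])
    return x
```

The bandwidth-scalable behavior described in the abstract corresponds to decoding only the low band when the channel is poor, or both bands for full wideband output.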

A Study on Duration Length and Place of Feature Extraction for Phoneme Recognition (음소 인식을 위한 특징 추출의 위치와 지속 시간 길이에 관한 연구)

  • Kim, Bum-Koog;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.4
    • /
    • pp.32-39
    • /
    • 1994
  • As basic research toward a Korean speech recognition system, phoneme recognition was carried out to find 1) the place that best represents each phoneme's characteristics and 2) the duration length that yields the best recognition rates. The recognition experiments used multi-speaker-dependent recognition with a Bayesian decision rule and 21st-order cepstral coefficients as feature parameters. The best places of feature extraction for the highest recognition rates were 10-50 ms for vowels, 40-100 ms for fricatives and affricates, 10-50 ms for nasals and liquids, and 10-50 ms for plosives. A duration of about 70 ms was sufficient for recognizing all 35 phonemes.

  • PDF
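The Bayesian decision rule used in these experiments picks the class maximizing log prior plus log likelihood over the feature vector. A minimal diagonal-Gaussian sketch (two-dimensional toy features instead of 21st-order cepstra, with invented class parameters):

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def bayes_decide(x, class_params, priors):
    """Return the class maximizing log prior + log likelihood,
    assuming independent (diagonal-covariance) Gaussian features."""
    scores = {}
    for c, (means, variances) in class_params.items():
        ll = sum(gaussian_logpdf(xi, m, v)
                 for xi, m, v in zip(x, means, variances))
        scores[c] = math.log(priors[c]) + ll
    return max(scores, key=scores.get)

params = {"a": ([0.0, 0.0], [1.0, 1.0]),
          "b": ([5.0, 5.0], [1.0, 1.0])}
priors = {"a": 0.5, "b": 0.5}
```

In the paper's setup the same rule is evaluated per phoneme class, with the feature vector extracted at the per-phoneme positions and durations found to be optimal.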