• Title/Summary/Keyword: Connected speech


Utterance Verification and Substitution Error Correction In Korean Connected Digit Recognition (한국어 연결숫자 인식에서의 발화 검증과 대체오류 수정)

  • Jung Du Kyung;Song Hwa Jeon;Jung Ho-Young;Kim Hyung Soon
    • MALSORI
    • /
    • no.45
    • /
    • pp.79-91
    • /
    • 2003
  • Utterance verification aims at rejecting both out-of-vocabulary (OOV) utterances and low-confidence-scored in-vocabulary (IV) utterances. For utterance verification on a Korean connected digit recognition task, we investigate several methods of constructing filler and anti-digit models. In particular, we propose a substitution error correction method based on 2-best decoding results. In this method, when the first candidate is rejected, the second candidate is selected if it is accepted by a specific hypothesis test, instead of the utterance simply being rejected. Experimental results show that the proposed method outperforms the conventional log likelihood ratio (LLR) test method.

  • PDF
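
The 2-best decision rule described in the abstract can be sketched as follows. This is a minimal illustration only: the function name, the digit strings, and the threshold values are invented for the example, and the real system's hypothesis tests are more elaborate than a simple score comparison.

```python
# Hypothetical sketch of a 2-best substitution error correction rule:
# test the 1-best hypothesis with an LLR-style threshold, and fall back
# to the 2-best hypothesis (under a stricter test) before rejecting.

def verify_2best(candidates, accept_thresh=0.0, second_thresh=0.5):
    """candidates: list of (digit_string, confidence_score), best first.

    Returns the accepted digit string, or None if both hypotheses
    fail their confidence tests (utterance rejected, e.g. as OOV).
    """
    first, second = candidates[0], candidates[1]
    if first[1] >= accept_thresh:      # conventional test on the 1-best
        return first[0]
    if second[1] >= second_thresh:     # stricter test on the 2nd candidate
        return second[0]               # substitution error corrected
    return None                        # reject the utterance


# The 1-best fails its test, but the 2-best passes the stricter one.
result = verify_2best([("1234", -0.8), ("1284", 0.9)])
```

The point of the stricter second threshold is that accepting the runner-up is riskier than accepting the top hypothesis, so it must clear a higher confidence bar.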

Aerodynamic Characteristics of Voice Disorders (Polyp, Cyst) before and after Laryngeal Micro Surgery: Focus on Running Speech (성대폴립, 성대낭종 환자들의 Laryngeal Micro Surgery 수술 전, 후 공기역학적 비교: Running Speech 중심으로)

  • Moon, Tae-Hoon;Shim, Mi-Ran;Hwang, Yeon-Shin;Kim, Geun-Jeon;Lee, Dong-Hyeon;Sun, Dong-Il
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.30 no.2
    • /
    • pp.95-100
    • /
    • 2019
  • Background and Objectives For patients with polyps and cysts, glottal gaps resulting from the lesions have negative respiratory effects during vocalization. The Phonatory Aerodynamic System is used clinically, but its measurements are often limited to vowels, so the researchers attempted to verify its usefulness by comparing differences in respiratory characteristics and patterns measurable at the level of connected speech. Materials and Method Among subjects diagnosed through stroboscopy, there were 33 patients with polyps and 23 patients with cysts; 36 subjects with no specific findings on stroboscopy and perceptual testing were selected as the normal group. We compared respiratory characteristics and patterns, and compared vocal polyps and cysts before and after laryngeal micro surgery (LMS). Results First, differences in respiratory patterns between the normal group and the patients with polyps and cysts showed that breath groups, breath group syllables, and expiratory/inspiratory volume were significantly higher in the polyp/cyst group than in the normal group, indicating that precision was lowered during conversation due to reduced speech intelligibility and interrupted communication. Second, there were significant differences before and after LMS in maximum phonation time, mean flow rate, and subglottal pressure among the respiratory characteristics, and in breath groups, breath group syllables, and inspiratory volume, which became similar to those of the normal group. Conclusion Understanding the respiratory characteristics and patterns produced by patients in connected speech, which is most similar to natural speech, was found to be an objective and useful method for examining the characteristics of these subjects.

Text-Independent Speaker Identification System Based On Vowel And Incremental Learning Neural Networks

  • Heo, Kwang-Seung;Lee, Dong-Wook;Sim, Kwee-Bo
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2003.10a
    • /
    • pp.1042-1045
    • /
    • 2003
  • In this paper, we propose a speaker identification system that uses vowels, which carry speaker-specific characteristics. The system is divided into a speech feature extraction part and a speaker identification part. The speech feature extraction part extracts the speaker's features. Voiced speech has characteristics that distinguish speakers; for vowel extraction, formants are obtained from voiced speech through frequency analysis, and the vowel /a/, whose formants differ across speakers, is extracted from the text. Pitch, formants, intensity, log area ratio, LP coefficients, and cepstral coefficients are candidate features; the cepstral coefficients, which show the best speaker identification performance among these, are used. The speaker identification part distinguishes speakers using a neural network, with 12th-order cepstral coefficients as the learning input data. The network structure is an MLP and the learning algorithm is BP (backpropagation). Hidden nodes and output nodes are incremented. The nodes in the incremental learning neural network are interconnected via weighted links, and each node in a layer is generally connected to each node in the succeeding layer, with the output nodes providing the network's output. Through vowel extraction and incremental learning, the proposed system uses little learning data, reduces learning time, and improves the identification rate.

  • PDF
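
The hidden-node increment mentioned above can be sketched roughly as growing an MLP's hidden layer while keeping the weights already learned. This is an illustrative assumption about the mechanism, not the paper's exact scheme; the class name, layer sizes, and random initialization are all invented.

```python
import numpy as np

# Sketch: an MLP whose hidden layer can be grown in place.  Existing
# weight values are preserved; only new rows/columns are initialized.

class IncrementalMLP:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_in))   # input -> hidden
        self.W2 = rng.normal(0, 0.1, (n_out, n_hidden))  # hidden -> output

    def forward(self, x):
        h = np.tanh(self.W1 @ x)
        return self.W2 @ h

    def add_hidden_nodes(self, k, seed=1):
        """Append k new hidden nodes; previously learned weights are untouched."""
        rng = np.random.default_rng(seed)
        self.W1 = np.vstack([self.W1, rng.normal(0, 0.1, (k, self.W1.shape[1]))])
        self.W2 = np.hstack([self.W2, rng.normal(0, 0.1, (self.W2.shape[0], k))])

net = IncrementalMLP(n_in=12, n_hidden=4, n_out=3)  # 12 cepstral coefficients in
net.add_hidden_nodes(2)                             # grow hidden layer 4 -> 6
```

Growing the network this way avoids retraining from scratch when a new speaker class or more capacity is needed, which is consistent with the abstract's claim of reduced learning time.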

A DSP Implementation of Subband Sound Localization System

  • Park, Kyusik
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.4E
    • /
    • pp.52-60
    • /
    • 2001
  • This paper describes a real-time implementation of a subband sound localization system on a floating-point DSP, the TI TMS320C31. The system determines the two-dimensional location of an active speaker in a closed-room environment with real noise present. It consists of a two-microphone array connected to the TI DSP hosted by a PC. The implemented sound localization algorithm is Subband CPSP, an improved version of the traditional CPSP (Cross-Power Spectrum Phase) method. The algorithm first splits the input speech signal into an arbitrary number of subbands using subband filter banks and calculates the CPSP in each subband. It then averages the CPSP results over the subbands and computes a source location estimate. The proposed algorithm has an advantage over CPSP in that it minimizes the overall estimation error in source location by confining band-dominant noise to its own subband. As a result, it makes a robust real-time sound localization system possible. For real-time operation, the input speech is captured with the two microphones and digitized by the DSP at a sampling rate of 8192 Hz with 16 bits/sample. The source location is then estimated once per second to satisfy real-time computational constraints. The performance of the proposed system is confirmed by several real-time runs on speech at distances of 1 m, 2 m, and 3 m with various source locations, showing over 5% improvement in source location estimation accuracy.

  • PDF
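
The per-subband phase normalization and averaging described in the abstract can be sketched as below. This is a simplified FFT-bin partition rather than a true filter bank, and the band count, signal, and delay are invented for the example; only the inter-microphone delay is estimated, not the full 2-D location.

```python
import numpy as np

# Sketch of subband CPSP: normalize the cross-power spectrum to phase
# only within each subband, back-transform each band, and average the
# per-band correlation functions before picking the delay peak.

def subband_cpsp_delay(x1, x2, n_bands=4):
    """Estimate the delay of x2 relative to x1, in samples."""
    n = len(x1)
    X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
    cross = np.conj(X1) * X2                      # cross-power spectrum
    corr = np.zeros(n)
    edges = np.linspace(0, len(cross), n_bands + 1, dtype=int)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.zeros_like(cross)
        band[lo:hi] = cross[lo:hi] / (np.abs(cross[lo:hi]) + 1e-12)  # phase only
        corr += np.fft.irfft(band, n)             # per-band correlation
    corr /= n_bands                               # average over subbands
    lag = int(np.argmax(corr))
    return lag if lag <= n // 2 else lag - n      # wrap to a signed delay

rng = np.random.default_rng(0)
s = rng.normal(size=1024)
mic2 = np.roll(s, 5)                              # arrives 5 samples late
est = subband_cpsp_delay(s, mic2)
```

Because the phase is normalized within each band separately, a noise source that dominates one frequency band corrupts only that band's contribution to the averaged correlation, which is the robustness argument the abstract makes.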

Development and Evaluation of an Address Input System Employing Speech Recognition (음성인식 기능을 가진 주소입력 시스템의 개발과 평가)

  • 김득수;황철준;정현열
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.2
    • /
    • pp.3-10
    • /
    • 1999
  • This paper describes the development and evaluation of a Korean address input system employing automatic speech recognition as the user interface for entering Korean addresses, which consist of cities, provinces, and counties. The system works in a Windows 95 environment on a personal computer with a built-in sound card. In the speech recognition part, continuous density Hidden Markov Models (CHMM) are used to build phoneme-like units (PLUs), and the One Pass Dynamic Programming (OPDP) algorithm is used for recognition. For address recognition, a Finite State Automaton (FSA) suited to the structure of Korean addresses is constructed. To achieve acceptable performance against variation in speakers, microphones, and environmental noise, maximum a posteriori (MAP) estimation is implemented for adaptation, and to improve recognition speed, a fast search method using a variable pruning threshold is newly proposed. In evaluation tests on 100 connected words uttered by three males, the system showed an average recognition accuracy of 96.0% for connected words after adaptation and a recognition speed within 2 seconds, demonstrating its effectiveness.

  • PDF
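
One common form of variable-threshold pruning is to tighten the beam whenever too many hypotheses survive. The sketch below illustrates that general idea only; the function, the histogram-free tightening rule, the toy district-name hypotheses, and all scores are assumptions, not the paper's actual method.

```python
# Illustrative beam pruning with a variable threshold: hypotheses more
# than `beam` below the best score are dropped, and the beam is halved
# until at most `max_active` hypotheses remain.

def prune(hyps, beam, max_active):
    """hyps: dict mapping hypothesis -> log score.  Returns survivors."""
    best = max(hyps.values())
    survivors = {h: s for h, s in hyps.items() if s >= best - beam}
    while len(survivors) > max_active and beam > 1e-6:
        beam *= 0.5                                   # tighten the beam
        survivors = {h: s for h, s in survivors.items() if s >= best - beam}
    return survivors

# Toy address hypotheses with invented log scores.
hyps = {"구로구": -1.0, "구리시": -2.5, "군포시": -9.0, "과천시": -3.9}
kept = prune(hyps, beam=8.0, max_active=2)
```

Adapting the threshold to the number of active hypotheses keeps the search effort roughly constant per frame, which is how such schemes trade a small accuracy risk for speed.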

Coda Sounds Acquisition at Word Medial Position in Three and Four Year Old Children's Spontaneous Speech (자발화에 나타난 3-4세 아동의 어중종성 습득)

  • Woo, Hyekyeong;Kim, Soojin
    • Phonetics and Speech Sciences
    • /
    • v.5 no.3
    • /
    • pp.73-81
    • /
    • 2013
  • The coda in word-medial position plays an important role in the acquisition of speech. Accuracy of the word-medial coda is an important diagnostic indicator, since it is closely related to the degree of disorder. A word-medial coda appears only where two vowels are connected, and the sequence triggers diverse phonological processes. Word-medial codas also differ in production difficulty depending on the initial sound that follows in the sequence. Accordingly, this study examines the tendency to produce word-medial codas, taking optional phonological processes into account, in the spontaneous speech of three- and four-year-old children. Data was collected from 24 children (four groups by age) without speech and language delay. The results of the study are as follows: 1) Among word-medial codas, sonorants showed a high production frequency by manner of articulation, and alveolars by place of articulation; when the word-medial coda is connected to an initial sound at the same place of articulation, production frequency was high. 2) Word-medial codas followed by an initial alveolar stop showed a high error rate, and the error patterns were predominantly regressive assimilation. 3) The order of difficulty of word-medial codas for the children was /k̚/, /p̚/, /m/, /n/, /ŋ/, and /l/. These results suggest that when targeting word-medial codas for evaluation, we should consider optional phonological processes as well as the following initial sound. Further studies on which word-medial codas should be used for therapeutic purposes would be necessary.

A Speech Translation System for Hotel Reservation (호텔예약을 위한 음성번역시스템)

  • 구명완;김재인;박상규;김우성;장두성;홍영국;장경애;김응인;강용범
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.4
    • /
    • pp.24-31
    • /
    • 1996
  • In this paper, we present a speech translation system for hotel reservation, KT-STS (Korea Telecom Speech Translation System). KT-STS is a speech-to-speech translation system which translates a spoken utterance in Korean into one in Japanese. The system has been designed around the task of hotel reservation (dialogues between a Korean customer and a hotel reservation desk in Japan). It consists of a Korean speech recognition system, a Korean-to-Japanese machine translation system, and a Korean speech synthesis system. The Korean speech recognition system is an HMM (Hidden Markov Model)-based speaker-independent continuous speech recognizer with a vocabulary of about 300 words. A bigram language model is used as the forward language model, and dependency grammar is used as the backward language model. For machine translation, we use dependency grammar and a direct transfer method. The Korean speech synthesizer uses demiphones as the synthesis unit and a method of periodic waveform analysis and reallocation. KT-STS runs in nearly real time on a SPARC20 workstation with one TMS320C30 DSP board. We achieved a word recognition rate of 94.68% and a sentence recognition rate of 82.42% in speech recognition tests, and a translation success rate of 100% in Korean-to-Japanese translation tests. We also carried out an international joint experiment in which our system was connected over a leased line with a system developed by KDD in Japan.

  • PDF
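
A forward bigram language model of the kind mentioned above can be estimated from counts. The sketch below shows plain maximum-likelihood bigram probabilities; the toy English corpus is invented for illustration (the real system was trained on Korean hotel-reservation dialogues, and practical models also need smoothing).

```python
from collections import Counter

# Maximum-likelihood bigram estimates: P(w2 | w1) = count(w1, w2) / count(w1),
# with sentence-boundary markers <s> and </s>.

def bigram_probs(sentences):
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(words[:-1])               # every word that has a successor
        bigrams.update(zip(words[:-1], words[1:]))
    return {bg: c / unigrams[bg[0]] for bg, c in bigrams.items()}

corpus = ["i want a room", "i want a reservation"]
p = bigram_probs(corpus)
```

In a recognizer, these conditional probabilities score each word transition during the forward search pass, while a different model (here, dependency grammar) rescoring runs backward.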

Speech Activity Decision with Lip Movement Image Signals (입술움직임 영상신호를 고려한 음성존재 검출)

  • Park, Jun;Lee, Young-Jik;Kim, Eung-Kyeu;Lee, Soo-Jong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.1
    • /
    • pp.25-31
    • /
    • 2007
  • This paper describes an attempt to prevent external acoustic noise from being misrecognized as speech recognition input. To this end, the speech activity detection stage of the recognizer checks the lip movement image signal of the speaker in addition to the acoustic energy. First, successive images are obtained through a PC camera, and it is determined whether the lips are moving; the lip movement image signal data is stored in shared memory and shared with the recognition process. Then, in the speech activity detection process, the preprocessing phase of speech recognition, the data stored in shared memory is consulted to verify whether the acoustic energy comes from the speaker's speech. The speech recognition processor and the image processor were connected and tested successfully: when the user faced the camera and spoke, the speech recognition result was output normally, whereas when the user spoke without facing the camera, no result was output. That is, if acoustic energy is input but no lip movement image is identified, the input is regarded as acoustic noise.
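
The decision logic described above amounts to an AND of two per-frame cues. The sketch below illustrates that rule only; the function name, threshold, and frame values are invented, and the real system works on image and audio streams rather than precomputed flags.

```python
# Toy sketch: an acoustic-energy trigger counts as speech only when lip
# movement is detected in the same frame; otherwise it is treated as
# acoustic noise.

def speech_activity(energies, lip_moving, energy_thresh=0.5):
    """Per-frame speech/non-speech decision from energy and a lip flag."""
    return [e >= energy_thresh and lip for e, lip in zip(energies, lip_moving)]

energies  = [0.1, 0.9, 0.8, 0.7]
lips      = [False, True, True, False]   # loud noise, but no lips, in frame 4
decisions = speech_activity(energies, lips)
```

The last frame shows the case the paper targets: high acoustic energy with no lip movement is rejected instead of being passed to the recognizer.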

A study on skip-connection with time-frequency self-attention for improving speech enhancement based on complex-valued spectrum (복소 스펙트럼 기반 음성 향상의 성능 향상을 위한 time-frequency self-attention 기반 skip-connection 기법 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.2
    • /
    • pp.94-101
    • /
    • 2023
  • A deep neural network composed of encoders and decoders, such as U-Net, used for speech enhancement concatenates the encoder to the decoder through a skip-connection. The skip-connection helps reconstruct the enhanced spectrum and complement lost information, but the encoder and decoder features connected by the skip-connection are incompatible with each other. In this paper, for complex-valued spectrum based speech enhancement, a Self-Attention (SA) method is applied to the skip-connection to transform the encoder features to be compatible with the decoder features. SA is a technique in which, when generating an output sequence in a sequence-to-sequence task, a weighted average of the input is used to attend to subsets of the input; applied to speech enhancement, it has been shown to eliminate noise effectively. Three models that use the encoder and decoder features to apply SA to the skip-connection are studied. In experiments on the TIMIT database, the proposed methods show improvements in all evaluation metrics compared to the Deep Complex U-Net (DCUNET) with skip-connection only.
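
The idea of re-weighting the encoder features before concatenation can be sketched as below. This is a single-head, projection-free, real-valued simplification of the paper's complex-valued, time-frequency models; the shapes and random feature maps are invented.

```python
import numpy as np

# Scaled dot-product self-attention applied to a skip-connection: the
# encoder feature map is attention-weighted over time before being
# concatenated with the decoder feature map.

def self_attention(x):
    """x: (time, channels).  Returns attention-weighted features."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                    # (time, time) similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over time
    return weights @ x

def skip_connect(encoder_feat, decoder_feat):
    # transform the encoder features so they better match the decoder's
    return np.concatenate([self_attention(encoder_feat), decoder_feat], axis=-1)

enc = np.random.default_rng(0).normal(size=(100, 32))   # encoder feature map
dec = np.random.default_rng(1).normal(size=(100, 32))   # decoder feature map
out = skip_connect(enc, dec)
```

The attention step lets each time frame of the skip path borrow information from similar frames, rather than passing the raw encoder activations straight through.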

On a Split Model for Analysis Techniques of Wideband Speech Signal (광대역 음성신호의 분할모델 분석기법에 관한 연구)

  • Park, Young-Ho;Ham, Myung-Kyu;You, Kwang-Bock;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.7
    • /
    • pp.80-84
    • /
    • 1999
  • In this paper, a split model analysis algorithm is developed which can generate a wideband speech signal from the spectral information of a narrowband signal. The split model analysis separates the 10th-order LPC model into five cascade-connected 2nd-order models. Using the less complex 2nd-order models allows the complicated nonlinear relationships between the model parameters and the poles of the full LPC model to be excluded. The relationship between the model parameters and the corresponding analog poles is proved and applied to each 2nd-order model. The wideband speech signal is obtained by changing only the sampling rate.

  • PDF
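
The split of a 10th-order LPC model into five 2nd-order sections can be sketched by factoring the LPC polynomial at its roots and pairing each complex-conjugate pair into one real-coefficient quadratic. This shows only the factoring step, on a synthetic polynomial built from known pole pairs; the paper's analog-pole mapping is not reproduced here.

```python
import numpy as np

# Split an LPC polynomial A(z) into cascaded 2nd-order sections:
# each conjugate root pair r, conj(r) gives the real quadratic
# 1 - 2*Re(r)*z^-1 + |r|^2 * z^-2.

def split_lpc(a):
    """a: LPC coefficients [1, a1, ..., a10].  Returns five 2nd-order sections."""
    roots = np.roots(a)
    upper = sorted(roots[roots.imag > 0], key=lambda r: r.real)
    sections = []
    for r in upper:
        sections.append(np.array([1.0, -2 * r.real, abs(r) ** 2]))
    return sections

# Synthetic 10th-order polynomial from five known conjugate pole pairs
# (all at radius 0.9, at invented angles).
pairs = [0.9 * np.exp(1j * f) for f in (0.3, 0.9, 1.5, 2.1, 2.7)]
a = np.poly(pairs + [p.conjugate() for p in pairs]).real
secs = split_lpc(a)
```

Convolving the five quadratics recovers the original 10th-order polynomial, confirming that the cascade of 2nd-order models is an exact decomposition of the LPC model.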