• Title/Summary/Keyword: speech rates

Korean continuous digit speech recognition by multilayer perceptron using KL transformation (KL 변환을 이용한 multilayer perceptron에 의한 한국어 연속 숫자음 인식)

  • 박정선;권장우;권정상;이응혁;홍승홍
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.8 / pp.105-113 / 1996
  • In this paper, a new Korean digit speech recognition technique using a multilayer perceptron (MLP) is proposed. Despite the MLP's weakness in recognizing dynamic signals, it was adopted for this model because Korean syllables can provide static features, and because its structure is simple and its computation fast. The MLP's input vectors were transformed with the Karhunen-Loève transformation (KLT), which compresses the signal without losing its separability, although the signal's physical properties change. Because the proposed technique extracts static features that are not affected by changes in syllable length, it is effective for a Korean numeral recognition system. Using the KLT saves computation time and memory without decreasing classification rates. The proposed feature extraction technique extracts equal-sized feature sets from two equal parts of a syllable, the front and the end, and builds the frames from which features are extracted using windows of a single size. It can also be applied to continuous speech recognition, which has been difficult for conventional neural network recognition systems.
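
The two ideas in this abstract, a fixed-size feature taken from the front and end halves of a syllable and KLT compression of the MLP input, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the per-half window count `n_per_part`, and the use of power iteration with deflation to obtain the KLT basis are all choices made here for the example.

```python
import math

def front_end_features(frames, n_per_part):
    """Fixed-size feature from a variable-length syllable (illustrative sketch).

    frames: list of equal-length feature vectors, one per analysis frame.
    The syllable is split into a front half and an end half; each half is cut
    into n_per_part windows of (nearly) equal length and each window is averaged,
    so the output length is 2 * n_per_part * dim whatever the syllable length.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    half = len(frames) // 2
    out = []
    for part in (frames[:half], frames[half:]):
        for w in range(n_per_part):
            lo = w * len(part) // n_per_part
            hi = max(lo + 1, (w + 1) * len(part) // n_per_part)
            window = part[lo:hi]
            out.extend(sum(col) / len(window) for col in zip(*window))
    return out

def klt_basis(vectors, k, iters=100):
    """Mean and top-k eigenvectors of the sample covariance (the KLT basis)."""
    d = len(vectors[0])
    mu = [sum(col) / len(vectors) for col in zip(*vectors)]
    cov = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [a - m for a, m in zip(v, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / len(vectors)
    basis = []
    for _ in range(k):
        # Asymmetric, normalized start vector for power iteration.
        e = [1.0 + 0.1 * i for i in range(d)]
        n0 = math.sqrt(sum(x * x for x in e))
        e = [x / n0 for x in e]
        for _ in range(iters):
            w = [sum(cov[i][j] * e[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            e = [x / norm for x in w]
        lam = sum(e[i] * sum(cov[i][j] * e[j] for j in range(d)) for i in range(d))
        basis.append(e)
        # Deflate: remove the component just found before finding the next one.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= lam * e[i] * e[j]
    return mu, basis

def klt_project(v, mu, basis):
    """Compress v to len(basis) KLT coefficients."""
    return [sum((a - m) * b for a, m, b in zip(v, mu, e)) for e in basis]
```

Because every syllable yields the same feature length, the projected vectors fit a fixed-size MLP input layer; the number of retained components `k` is what trades accuracy against the time and memory savings the abstract attributes to the KLT.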

Dynamic Excitation Modeling Scheme Applied for Variable Low Bit-Rate Homomorphic Vocoder (가변 저 전송율 호모몰픽 보코더에 응용된 동적 음원 모델링 기법)

  • 정재호
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.12 / pp.2479-2488 / 1994
  • In this paper, a new dynamic excitation modeling scheme is proposed. Based on this scheme, two variable bit rate homomorphic vocoders are designed, with average bit rates of 3.8 kbps and 4.4 kbps. The performance of the proposed excitation modeling scheme is evaluated through subjective listening tests, in which the two speech coders designed in this paper are compared with the 4.8 kbps homomorphic vocoder designed by Chung and Schafer, which uses a conventional static excitation modeling scheme. The tests show that the proposed dynamic excitation modeling scheme improves synthesized speech quality while lowering the average bit rate of the speech coders.
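
The average bit rate of a variable-rate coder, such as the 3.8 kbps and 4.4 kbps vocoders above, is the total number of bits spent divided by the total duration. The sketch below assumes 20 ms frames and a hypothetical per-class bit budget (`BITS_PER_FRAME`); neither value comes from the paper, whose frame classes and bit allocations are not given in this abstract.

```python
# Hypothetical bits per 20 ms frame for each excitation class (illustrative only).
BITS_PER_FRAME = {"voiced": 96, "unvoiced": 60, "silence": 20}

def average_bit_rate_kbps(frame_classes, frame_ms=20.0, bits_per_frame=BITS_PER_FRAME):
    """Average rate of a variable-rate coder: total bits / total duration, in kbit/s."""
    total_bits = sum(bits_per_frame[c] for c in frame_classes)
    total_seconds = len(frame_classes) * frame_ms / 1000.0
    return total_bits / total_seconds / 1000.0

# Four frames: 96 + 96 + 60 + 20 = 272 bits over 80 ms.
rate = average_bit_rate_kbps(["voiced", "voiced", "unvoiced", "silence"])
```

With these made-up numbers the sequence averages 3.4 kbit/s, whereas spending the "voiced" budget on every frame would cost 4.8 kbit/s; choosing the excitation model per frame is what pulls the average down.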

Complexity-Reduction Algorithm of Speech Coder (QCELP) for CDMA Digital Cellular System (CDMA 디지틀 셀룰라용 음성 부호화기 (QCELP) 의 복잡도 감소 알고리즘)

  • 이인성
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.3 / pp.126-132 / 1996
  • In this paper, a complexity reduction method for the QCELP speech coder (IS-96) that causes no performance degradation is proposed for the vocoder of the CDMA digital cellular system. The energy terms in the pitch parameter search and codebook search routines, which require large amounts of computation, are calculated recursively by exploiting the overlapped structure of the code vectors in the adaptive codebook and the excitation codebook. Additional complexity reduction in the codebook search routine is achieved by using a simplified form of the energy term calculation when the initial codebook value is zero. At lower transmission rates such as 4, 2, and 1 kbps, the complexity reduction from the recursive calculation of the energy terms is even larger.
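
The recursive energy idea can be illustrated for a shift-structured (overlapped) codebook in which code vector k is `c_k[n] = cb[(n - k) mod M]`, so that `c_k` is `c_{k-1}` delayed by one sample with a single new sample at n = 0. Under that assumption the filtered vector satisfies `y_k[n] = y_{k-1}[n-1] + h[n] * c_k[0]`, which replaces an O(L²) convolution with an O(L) update, and when `c_k[0] = 0` the energy simply loses the last sample of `y_{k-1}`. This is a generic sketch of the technique, not the IS-96 reference code; the codebook layout, the impulse response `h`, and the function names are assumptions for illustration.

```python
def codevector(cb, k, L):
    """Overlapped codebook (assumed layout): c_k[n] = cb[(n - k) mod M]."""
    M = len(cb)
    return [cb[(n - k) % M] for n in range(L)]

def filtered(h, c):
    """Zero-state convolution of c with impulse response h, truncated to len(c)."""
    return [sum(h[j] * c[n - j] for j in range(min(n + 1, len(h)))) for n in range(len(c))]

def energies_direct(h, cb, L, K):
    """Reference: filter every code vector from scratch (O(L^2) each)."""
    return [sum(v * v for v in filtered(h, codevector(cb, k, L))) for k in range(K)]

def energies_recursive(h, cb, L, K):
    """Energies ||y_k||^2 for k = 0..K-1, reusing y_{k-1} to build y_k."""
    hp = list(h[:L]) + [0.0] * max(0, L - len(h))  # pad h to length L
    y = filtered(hp, codevector(cb, 0, L))
    energies = [sum(v * v for v in y)]
    M = len(cb)
    for k in range(1, K):
        first = cb[(-k) % M]  # c_k[0], the only new sample
        if first == 0:
            # Pure shift: y_k = [0] + y_{k-1}[:-1], so just drop the last sample's energy.
            energies.append(energies[-1] - y[-1] ** 2)
            y = [0.0] + y[:-1]
        else:
            # O(L) end-correction update instead of a full convolution.
            y = [hp[0] * first] + [y[n - 1] + hp[n] * first for n in range(1, L)]
            energies.append(sum(v * v for v in y))
    return energies
```

The zero-sample branch mirrors the kind of shortcut the abstract describes for a zero initial codebook value: the update needs only one multiplication and one subtraction.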

Enhanced Spectral Envelope Coding Scheme Using Inter-frame Correlation for G.729.1 (G.729.1 코더에서 프레임 간의 상호상관 관계를 이용한 개선된 스펙트럼 포락 코딩 방법)

  • Cho, Keun-Seok;Sung, Jong-Mo;Hahn, Min-Soo;Kim, Young-Il;Jeong, Sang-Bae
    • Phonetics and Speech Sciences / v.1 no.4 / pp.97-103 / 2009
  • This paper describes a new algorithm for encoding the spectral envelope in the time domain alias cancellation (TDAC) part of G.729.1. The TDAC part encodes the spectral envelope and the modified discrete cosine transform (MDCT) coefficients of both the weighted code-excited linear predictive (CELP) coding error in the lower band and the higher-band input signal. To reduce the bits allocated to spectral envelope coding, a new algorithm that exploits the sub-band correlation between adjacent frames is proposed. In addition, to improve the quality of the decoded signals, two strategies for reallocating the bits saved by the proposed algorithm are proposed. The performance of the proposed algorithm is evaluated in terms of objective quality and bit reduction rates. Experimental results show that the proposed algorithm significantly improves sound quality.
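
One generic way to exploit inter-frame correlation in a sub-band envelope is to quantize each band's change from the previous frame's decoded envelope with a small index range, instead of coding each band's level absolutely. The sketch below is not the G.729.1 TDAC envelope coder or the paper's algorithm: the 3 dB step, the index range [-2, 1], and the `clipped` flag (which a real coder might use to fall back to absolute coding) are illustrative assumptions.

```python
import math

def encode_envelope_diff(cur_db, prev_decoded_db, step_db=3.0, dmin=-2, dmax=1):
    """Differentially quantize sub-band levels (dB) against the previous decoded frame.

    Returns (indices, decoded levels, clipped); clipped is True if any band's
    change fell outside [dmin, dmax] steps, signalling that differential coding
    could not track this frame exactly.
    """
    idx, decoded, clipped = [], [], False
    for c, p in zip(cur_db, prev_decoded_db):
        d = round((c - p) / step_db)
        if d < dmin or d > dmax:
            clipped = True
            d = max(dmin, min(dmax, d))
        idx.append(d)
        # The decoder can rebuild the same value from p and d alone.
        decoded.append(p + d * step_db)
    return idx, decoded, clipped

def bits_per_band(dmin=-2, dmax=1):
    """Bits needed for one differential index."""
    return math.ceil(math.log2(dmax - dmin + 1))
```

With these assumed settings each band costs 2 bits when the envelope changes slowly; the bits saved relative to absolute coding are what a reallocation strategy could then spend elsewhere.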

Corpus-based evaluation of French text normalization (코퍼스 기반 프랑스어 텍스트 정규화 평가)

  • Kim, Sunhee
    • Phonetics and Speech Sciences / v.10 no.3 / pp.31-39 / 2018
  • This paper aims to present a taxonomy of non-standard words (NSWs) for developing a French text normalization system and to propose a corpus-based method for evaluating such a system. The proposed taxonomy of French NSWs consists of 13 categories, including 2 letter-based categories and 9 number-based categories. To evaluate the text normalization system, a representative test set containing NSWs from various text domains, such as news, literature, non-fiction, social networking services (SNSs), and transcriptions, is constructed, and an evaluation equation is proposed that reflects the distribution of NSW categories in the target domain to which the system is applied. The error rate on the test set is 1.64%, while the error rate on the whole corpus, reflecting the NSW distribution in the corpus, is 2.08%. The results show that the literature and SNS domains have higher error rates than the test set.
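
An evaluation equation that reflects a target domain's NSW distribution can take the form of a weighted error rate: the per-category error rates measured on the test set, weighted by each category's share of NSWs in the target domain. The abstract does not give the paper's exact equation, so this is an assumed form, and the category names below are hypothetical.

```python
def weighted_error_rate(errors_by_cat, target_counts):
    """Sum over categories of (target-domain share of category) * (category error rate).

    errors_by_cat: error rate per NSW category, measured on the test set.
    target_counts: NSW counts (or any nonnegative weights) per category in the target domain.
    """
    total = sum(target_counts.values())
    return sum(errors_by_cat[c] * n / total for c, n in target_counts.items())

# Hypothetical: a domain where number-based NSWs outnumber letter-based ones 3 to 1.
rate = weighted_error_rate({"number": 0.02, "letter": 0.10}, {"number": 3, "letter": 1})
```

Re-weighting the same per-category errors with a different domain's counts is how one test set can yield different overall error rates for different domains.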

Perception Ability of Synthetic Vowels in Cochlear Implanted Children (모음의 포먼트 변형에 따른 인공와우 이식 아동의 청각적 인지변화)

  • Huh, Myung-Jin
    • MALSORI / no.64 / pp.1-14 / 2007
  • The purpose of this study was to examine how changes in formants affect acoustic perception in profoundly hearing-impaired children with cochlear implants. The subjects were 10 children with 15 months of experience with the implant; their mean chronological age was 8.4 years (standard deviation 2.9 years). Auditory perception was assessed using acoustically synthesized vowels: F1, F2, and F3 were combined into vowels to produce 42 synthetic sounds with the Speech GUI (Graphical User Interface) program. The perception data were analyzed with cluster analysis and online analytical processing (OLAP). The results showed that the cochlear-implanted children's auditory perception scores were higher for F2-modified synthetic vowels than for F1-modified ones. It was also found that they perceived differences between vowels in terms of the distance rates between F1 and F2 in specific vowels.
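
Synthetic vowels with controlled formants are commonly produced with a cascade of second-order digital resonators driven by a pulse train. The sketch uses the resonator form from Klatt's (1980) cascade/parallel formant synthesizer; it is not the Speech GUI program used in the study, and the pitch, sampling rate, bandwidths, and formant values are illustrative assumptions.

```python
import math

def resonator(x, freq_hz, bw_hz, fs):
    """Second-order digital resonator (Klatt form) with unity gain at DC."""
    T = 1.0 / fs
    C = -math.exp(-2.0 * math.pi * bw_hz * T)
    B = 2.0 * math.exp(-math.pi * bw_hz * T) * math.cos(2.0 * math.pi * freq_hz * T)
    A = 1.0 - B - C
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = A * s + B * y1 + C * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synth_vowel(formants, f0=200.0, fs=10000, dur_s=0.3, bws=(80.0, 100.0, 120.0)):
    """Impulse train at f0 passed through one resonator per formant (F1, F2, F3)."""
    n = int(round(dur_s * fs))
    period = int(round(fs / f0))
    x = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for f, bw in zip(formants, bws):
        x = resonator(x, f, bw, fs)
    return x
```

Varying one formant while holding the others fixed gives stimulus sets of the kind compared in the study, for example `[synth_vowel((f1, 1200.0, 2500.0)) for f1 in (300.0, 500.0, 700.0)]`.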

An Implementation of Real-Time Speaker Verification System on Telephone Voices Using DSP Board (DSP보드를 이용한 전화음성용 실시간 화자인증 시스템의 구현에 관한 연구)

  • Lee Hyeon Seung;Choi Hong Sub
    • MALSORI / no.49 / pp.145-158 / 2004
  • This paper implements a real-time speaker verification system using a DSP board. The Dialog/4 board, which is based on a microprocessor and a DSP processor, was selected to easily control telephone signals and to process audio/voice signals. The speaker verification system performs signal processing and feature extraction after receiving the speech and its ID. Then, by computing the likelihood ratio of the claimed speaker model to the background model, it decides in real time whether to accept or reject the claim. For the verification experiments, a total of 15 speaker models and 6 background models were used. The experimental results show a verification accuracy rate of 99.5% when speaker models based on telephone speech are used.
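
The accept/reject decision described above is a likelihood ratio test between the claimed speaker's model and a background model. The abstract does not specify the model type, so each model here is assumed to be a single diagonal-covariance Gaussian for brevity (speaker verification systems often use Gaussian mixtures instead), and the threshold of 0 is a hypothetical default.

```python
import math

def diag_gauss_loglik(x, mean, var):
    """Log-density of feature vector x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def verify(frames, claimed, background, threshold=0.0):
    """Accept the claim if the average per-frame log-likelihood ratio
    log p(x | claimed) - log p(x | background) exceeds the threshold.

    claimed and background are (mean, var) pairs (assumed model form).
    """
    llr = sum(diag_gauss_loglik(f, *claimed) - diag_gauss_loglik(f, *background)
              for f in frames) / len(frames)
    return llr > threshold, llr
```

Averaging over frames keeps the score independent of utterance length, so a single threshold can be used for short and long inputs; in practice the threshold would be tuned on development data.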

Binary clustering network for recognition of keywords in continuous speech (연속음성중 키워드(Keyword) 인식을 위한 Binary Clustering Network)

  • 최관선;한민홍
    • Institute of Control, Robotics and Systems (ICROS) Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 1993.10a / pp.870-876 / 1993
  • This paper presents a binary clustering network (BCN) and a heuristic pitch detection algorithm for recognizing keywords in continuous speech. To classify nonlinear patterns, the BCN hierarchically separates patterns into binary clusters and links the same patterns at the root level, using both supervised and unsupervised learning. The BCN has many desirable properties, such as a flexible dynamic structure, high classification accuracy, short learning time, and short recall time. The pitch detection algorithm is a heuristic model that addresses difficulties such as scaling invariance, time warping, time-shift invariance, and redundancy. The recognition algorithm achieved recognition rates as high as 95% in both speaker-dependent and multispeaker-dependent tests.
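
The idea of separating patterns hierarchically into binary clusters can be sketched as a tree built by repeated unsupervised 2-means splits, with class labels used only to decide when to stop (a node that is label-pure becomes a leaf). This is a generic hierarchical binary clustering sketch, not the authors' BCN: the root-level linking of patterns is not modelled, and the function names, farthest-point initialization, and depth limit are assumptions.

```python
def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def _majority(samples):
    labels = [lab for _, lab in samples]
    return max(sorted(set(labels)), key=labels.count)

def _two_means(points, iters=20):
    """Unsupervised 2-means split with a deterministic farthest-point start."""
    mu = _mean(points)
    c0 = max(points, key=lambda p: _dist2(p, mu))
    c1 = max(points, key=lambda p: _dist2(p, c0))
    for _ in range(iters):
        g0 = [p for p in points if _dist2(p, c0) <= _dist2(p, c1)]
        g1 = [p for p in points if _dist2(p, c0) > _dist2(p, c1)]
        if not g0 or not g1:
            break
        c0, c1 = _mean(g0), _mean(g1)
    return c0, c1

def build_tree(samples, depth=0, max_depth=8):
    """samples: list of (vector, label). Split until a node is label-pure."""
    if len({lab for _, lab in samples}) == 1 or depth >= max_depth or len(samples) < 2:
        return {"label": _majority(samples)}
    c0, c1 = _two_means([p for p, _ in samples])
    left = [s for s in samples if _dist2(s[0], c0) <= _dist2(s[0], c1)]
    right = [s for s in samples if _dist2(s[0], c0) > _dist2(s[0], c1)]
    if not left or not right:
        return {"label": _majority(samples)}
    return {"c0": c0, "c1": c1,
            "left": build_tree(left, depth + 1, max_depth),
            "right": build_tree(right, depth + 1, max_depth)}

def classify(node, x):
    """Descend by nearest centroid until a leaf is reached."""
    while "label" not in node:
        node = node["left"] if _dist2(x, node["c0"]) <= _dist2(x, node["c1"]) else node["right"]
    return node["label"]
```

Because recall is a single root-to-leaf descent with two distance computations per level, classification stays fast as the tree grows, which is in the spirit of the short recall time the abstract reports.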

Effects of Multisensory Treatment on Phonological Processing of Reading Pronunciation for the Middle School Students with Reading Disorders (음운변동 적용 낱말 읽기치료 효과 검증)

  • Kim, Soo-Jin;Lee, Ji-Young
    • Proceedings of the KSPS conference / 2007.05a / pp.270-273 / 2007
  • The purpose of this study was to evaluate the effects of multisensory (AVK: auditory, visual, and kinesthetic) treatment on reading pronunciation involving the phonological processes of tensification, palatalization, and lateralization in middle school students with delayed language development caused by intellectual disability. The participants were three children with difficulties in applying these phonological processes in reading pronunciation. The following conclusions were reached. First, all three children improved in tensification, palatalization, and lateralization through the multisensory treatment program. Second, the multisensory treatment was effective in facilitating generalization; all three children showed prominent generalization effects for lateralization. Third, three weeks after the intervention ended, they partially maintained the performance rates they had reached in the later phase of the intervention.

Time-Domain Quantization and Interpolation of Pitch Cycle Waveform

  • Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea / v.27 no.1E / pp.11-16 / 2008
  • In this paper, a pitch cycle waveform (PCW) is extracted, quantized, and interpolated in the time domain to synthesize high-quality speech at low bit rates. A pre-alignment technique is proposed for accurate and efficient PCW extraction; it predicts the current PCW position from the previous PCW position, assuming that pitch periods evolve slowly. Since the pitch period differs from frame to frame, the original PCW is converted into a fixed-dimension PCW using a dimension-conversion method and is then quantized by code-excited linear predictive (CELP) coding. The excitation signal for the linear predictive coding (LPC) synthesis filter is generated by time-domain interpolation and interlinking of the quantized PCWs. The coder operates at 4.2 kbit/s or 3.2 kbit/s, depending on the pitch period. Informal listening tests demonstrate the effectiveness of the proposed coding scheme.
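
Three steps of the coder can be sketched in a few lines: predicting where the next pitch cycle starts (pre-alignment), converting a variable-length PCW to a fixed dimension, and interpolating between successive quantized PCWs. Linear interpolation is assumed for both dimension conversion and waveform interpolation, and the function names are hypothetical; the paper's actual dimension-conversion method and interlinking step may differ, and the CELP quantization is omitted.

```python
def predict_pcw_start(prev_start, prev_period):
    """Pre-alignment guess: with slowly evolving pitch, the next cycle
    starts about one pitch period after the previous one."""
    return prev_start + prev_period

def resample_linear(x, n_out):
    """Dimension conversion: resample one pitch cycle to n_out samples
    by linear interpolation (endpoints are preserved)."""
    n_in = len(x)
    if n_out == 1 or n_in == 1:
        return [x[0]] * n_out
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)
        j = min(int(pos), n_in - 2)
        frac = pos - j
        out.append(x[j] * (1.0 - frac) + x[j + 1] * frac)
    return out

def interpolate_pcw(prev_pcw, cur_pcw, alpha):
    """Blend two fixed-dimension PCWs; alpha runs from 0 (previous) to 1 (current)
    across the frame."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(prev_pcw, cur_pcw)]
```

At the decoder, each interpolated fixed-dimension cycle would be resampled back to the local pitch period with `resample_linear` before being concatenated into the excitation.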