• Title/Summary/Keyword: speech speed

Real-time Implementation of Multi-channel AMR Speech Coder (멀티채널 AMR 음성부호화기의 실시간 구현)

  • 지덕구;박만호;김형중;윤병식;최송인
    • The Journal of the Acoustical Society of Korea / v.20 no.8 / pp.19-23 / 2001
  • DSP-based implementations have become pervasive in wireless communication systems and handsets as high-speed, low-power programmable Digital Signal Processors (DSPs) have developed. In this paper, we present a real-time implementation of a multi-channel Adaptive Multi-rate (AMR) speech coder. The real-time implementation of the AMR algorithm is achieved using a 32-bit fixed-point TMS320C6202 DSP chip that operates at 250 MHz. For real-time operation, we cross-compiled the code and applied linear assembly optimization and TMS320C62xx assembly optimization. Furthermore, a speech data input/output function and a function for communicating with an external CPU are included in the AMR speech coder. The AMR speech coder, developed on a DSP EVM board, was evaluated in the ETRI IMT-2000 test-bed system.

SOME PROSODIC FEATURES OBSERVED IN THE PASSAGE READING BY JAPANESE LEARNERS OF ENGLISH

  • Kanzaki, Kazuo
    • Proceedings of the KSPS conference / 1996.10a / pp.37-42 / 1996
  • This study examines some prosodic features of English spoken by Japanese learners of English. It focuses on speech rates, pauses, and intonation when the learners read an English passage. Three Japanese learners of English, all male university students, were asked to read the speech material, an English passage 110 words long, at their normal reading speed. Then a native speaker of English, a male American English teacher, was asked to read the same passage. The Japanese speakers were also asked to read a Japanese passage of 286 letters (Japanese kana) so that their reading of English could be compared with their reading of Japanese. Their speech was analyzed on a computerized system (KAY Computerized Speech Lab). Waveforms, spectrograms, and F0 contours were displayed on the screen to measure the duration of pauses, phrases, and sentences and to observe intonation contours. One finding was that the changes in the learners' speech rates followed a similar pattern across their readings of the English passage. The three learners' readings of the Japanese passage likewise showed a similar pattern in the changes of speech rate. Another finding was that pauses were more frequent in the learners' speech than in the native speaker's, but that the ratio of total pause length to whole utterance length was about the same for the learners and the native speaker. A similar tendency was observed in the learners' reading of the Japanese passage, except that they used shorter pauses in mid-sentence position. As for intonation contours, the learners used a narrower pitch range than the native speaker when reading the English passage, while they used a wider pitch range when reading the Japanese passage. The learners tended to use falling intonation before pauses, whereas the native speaker used different intonation patterns. These findings are applicable to the teaching of English pronunciation at the passage level in the sense that they can show the learners, Japanese here, what their problems are and how they could be solved.
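
The two pause measures contrasted in this abstract, pause frequency and the ratio of total pause length to utterance length, can be illustrated with a minimal sketch. The segment labels and timings below are invented for illustration and are not data from the study; the function name `pause_stats` is likewise hypothetical.

```python
def pause_stats(segments):
    """Summarize pauses in a labeled utterance.

    segments: list of (label, start_s, end_s), label is "speech" or "pause".
    Returns the number of pauses and the share of total time spent pausing.
    """
    total = sum(end - start for _, start, end in segments)
    pauses = [end - start for label, start, end in segments if label == "pause"]
    ratio = sum(pauses) / total if total > 0 else 0.0
    return {"pause_count": len(pauses), "pause_ratio": ratio}


# Hypothetical readings: many short pauses versus one long pause.
learner = [("speech", 0.0, 2.0), ("pause", 2.0, 2.5), ("speech", 2.5, 4.5),
           ("pause", 4.5, 5.0), ("speech", 5.0, 8.0), ("pause", 8.0, 8.5),
           ("speech", 8.5, 10.0)]
native = [("speech", 0.0, 4.0), ("pause", 4.0, 5.5), ("speech", 5.5, 10.0)]

print(pause_stats(learner))
print(pause_stats(native))
```

In this made-up example the first reader pauses three times and the second only once, yet both spend the same share of the utterance in silence, which is the kind of pattern the abstract reports.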

Fast computation of Observation Probability for Speaker-Independent Real-Time Speech Recognition (실시간 화자독립 음성인식을 위한 고속 확률계산)

  • Park Dong-Chul;Ahn Ju-Won
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.9C / pp.907-912 / 2005
  • This paper proposes an efficient method for calculating observation probabilities in a CDHMM (Continuous Density Hidden Markov Model). The proposed algorithm, called FCOP (Fast Computation of Observation Probability), approximates the observation probabilities in the CDHMM by eliminating insignificant PDFs (Probability Density Functions), which reduces the computational load. When applied to a speech recognition system, the FCOP algorithm can reduce the instruction cycles by 20% to 30% and increase the recognition speed by about 30% while minimizing the loss in recognition rate. When implemented on a practical cellular phone, the FCOP algorithm can increase the recognition speed by about 30% at the cost of a 0.2% loss in recognition rate.
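
The abstract does not say how FCOP decides which PDFs are insignificant, so the following is only a generic sketch of the underlying idea: in a Gaussian mixture, components far from the observation contribute almost nothing to the observation probability. The function names, the `keep` parameter, and the toy mixture are all assumptions. Note that this sketch scores every component before discarding any, so it demonstrates the small approximation error rather than the computational saving.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_obs_prob(x, weights, means, variances, keep=None):
    """Mixture log-likelihood, optionally using only the `keep` best components."""
    scores = [math.log(w) + log_gauss_diag(x, m, v)
              for w, m, v in zip(weights, means, variances)]
    if keep is not None:
        # Hypothetical pruning rule: drop all but the highest-scoring components.
        scores = sorted(scores, reverse=True)[:keep]
    return logsumexp(scores)


# Toy 3-component mixture in 2 dimensions (values are illustrative only).
weights = [0.5, 0.3, 0.2]
means = [[0.0, 0.0], [5.0, 5.0], [-6.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
x = [0.0, 0.0]

full = log_obs_prob(x, weights, means, variances)
pruned = log_obs_prob(x, weights, means, variances, keep=1)
print(full, pruned)
```

A practical scheme would need a cheap way to pick the surviving components without evaluating all of them first; that selection step is where the reported cycle savings would have to come from.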

The Characteristics of Diadochokinesis in 1st and 2nd Grades of Elementary School Students (아동의 조음교대운동 특성: 광주광역시 초등학교 1, 2학년을 대상으로)

  • Choi, A Rim;Yoo, Jae Yeon
    • 재활복지 / v.22 no.2 / pp.231-246 / 2018
  • Diadochokinesis (DDK) tasks are used to evaluate oral motor ability and motor coordination ability. There are few DDK normative data on elementary school students in Korea. The purpose of this study was to investigate the characteristics of the speed and regularity of DDK in first- and second-grade elementary school students. The subjects were a total of 194 students in first grade (45 males, 50 females) and second grade (47 males, 52 females) at elementary schools in Gwangju Metropolitan City. As evaluation tasks, the alternating motion rate (AMR) tasks 'p?', 't?', and 'k?' and the sequential motion rate (SMR) task 'p?t?k?' were performed. The speed and regularity of DDK were measured using Motor Speech Profile (Model 5141, KayPENTAX) and Praat (v6.0.3.6). The results were as follows. First, there was a statistically significant difference by grade in AMR speed for 'p?', 't?', and 'k?', with faster AMR speed in the second-grade group. There was no statistically significant difference by sex. Second, AMR regularity showed a statistically significant difference by sex for 'p?', 't?', and 'k?' and was more regular in the female group. There was no significant difference in regularity by grade. Third, SMR speed for 'p?t?k?' showed a statistically significant difference by grade and was faster in the second-grade group. There was no statistically significant difference by sex. These results show that DDK performance in first- and second-grade elementary school students differed slightly according to grade and sex. Future research should investigate the correlation between articulation accuracy and linguistic intelligibility, and examine the usefulness of DDK in articulation evaluation.
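
As a rough illustration of what "speed" and "regularity" can mean for a DDK task, the sketch below derives both from a list of syllable onset times. The abstract does not state which measures the study used, so treating regularity as the coefficient of variation of the syllable periods is an assumption, and the onset times are invented.

```python
import statistics


def ddk_speed_and_regularity(onsets):
    """Return (syllables per second, coefficient of variation of periods).

    onsets: syllable onset times in seconds, ascending, at least two values.
    A lower coefficient of variation means more regular repetition.
    """
    periods = [later - earlier for earlier, later in zip(onsets, onsets[1:])]
    mean_period = statistics.mean(periods)
    rate = 1.0 / mean_period
    cv = statistics.pstdev(periods) / mean_period
    return rate, cv


# Hypothetical onset times for two children repeating one syllable.
regular = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
irregular = [0.0, 0.15, 0.4, 0.55, 0.8, 1.0]

print(ddk_speed_and_regularity(regular))
print(ddk_speed_and_regularity(irregular))
```

Both invented sequences have the same average rate, so only the regularity measure separates them, which is why the two quantities are reported separately.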

Real-time Implementation of Variable Transmission Bit Rate Vocoder Improved Speech Quality in SOLA-B Algorithm & G.729A Vocoder Using on the TMS320C5416 (TMS320C5416을 이용한 SOLA-B 알고리즘과 G.729A 보코더의 음질 향상된 가변 전송률 보코더의 실시간 구현)

  • Ham, Myung-Kyu;Bae, Myung-Jin
    • Speech Sciences / v.10 no.3 / pp.241-250 / 2003
  • In this paper, we implement a variable-rate vocoder in real time on the TMS320C5416 by applying the SOLA-B algorithm to G.729A. With SOLA-B, the duration of the speech is shortened before encoding and extended again in decoding so that it plays back at normal speed. However, simply combining the existing G.729A with SOLA-B degrades speech quality, because G.729A does not account for the change in speech length. The proposed method therefore modifies the structure of the LSP quantization table to suit speech whose length has been reduced by SOLA-B. The resulting variable-rate vocoder has a maximum complexity of 10.2 MIPS for the encoder and 2.8 MIPS for the decoder at 8 kbps, 17.3 MIPS for the encoder and 9.9 MIPS for the decoder at 6 kbps, and 18.5 MIPS for the encoder and 11.1 MIPS for the decoder at 4 kbps. Memory usage is about 9.7 kwords of program ROM, 4.69 kwords of table ROM, and 5.2 kwords of RAM. The output waveform is bit-exact with the result of the C simulator. In a MOS test of the speech quality of the real-time variable-rate vocoder, the score was about 3.68 at 4 kbps.

Improving transformer-based speech recognition performance using data augmentation by local frame rate changes (로컬 프레임 속도 변경에 의한 데이터 증강을 이용한 트랜스포머 기반 음성 인식 성능 향상)

  • Lim, Seong Su;Kang, Byung Ok;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea / v.41 no.2 / pp.122-129 / 2022
  • In this paper, we propose a method to improve the performance of Transformer-based speech recognizers using data augmentation that locally adjusts the frame rate. First, the start time and length of the part to be augmented in the original voice data are randomly selected. Then, the frame rate of the selected part is changed to a new frame rate by using linear interpolation. Experimental results on the Wall Street Journal and LibriSpeech speech databases showed that convergence took longer than for the baseline, but that recognition accuracy improved in most cases. To further improve performance, parameters such as the length and the speed of the selected parts were optimized. The proposed method achieved relative performance improvements of 11.8% and 14.9% over the baseline on the Wall Street Journal and LibriSpeech speech databases, respectively.
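
The two steps described above (pick a random segment, then resample it by linear interpolation) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the `scale` convention (output length divided by input length for the segment), and the default parameter values are all assumptions.

```python
import random


def resample_linear(segment, new_len):
    """Resample a non-empty list of feature frames to new_len frames."""
    old_len = len(segment)
    if new_len == 1 or old_len == 1:
        return [list(segment[0]) for _ in range(new_len)]
    out = []
    for i in range(new_len):
        pos = i * (old_len - 1) / (new_len - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b
                    for a, b in zip(segment[lo], segment[hi])])
    return out


def local_frame_rate_change(frames, start, length, scale):
    """Stretch (scale > 1) or compress (scale < 1) frames[start:start+length]."""
    segment = frames[start:start + length]
    new_len = max(1, round(len(segment) * scale))
    return frames[:start] + resample_linear(segment, new_len) + frames[start + length:]


def random_local_augment(frames, rng, max_fraction=0.3, scales=(0.8, 1.2)):
    """Pick a random segment and scale factor (needs at least 2 frames)."""
    length = rng.randint(2, max(2, int(len(frames) * max_fraction)))
    start = rng.randint(0, len(frames) - length)
    return local_frame_rate_change(frames, start, length, rng.choice(scales))


# Ten 1-dimensional frames; stretch frames 2..5 by a factor of 1.5.
frames = [[float(i)] for i in range(10)]
stretched = local_frame_rate_change(frames, 2, 4, 1.5)
print(len(stretched), stretched[2:8])
```

Only the selected span changes length, so the frames before and after it are left untouched, which is what distinguishes this from speed perturbation applied to the whole utterance.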

Adaptive Noise Canceler Using Fast Wavelet Transform Adaptive Algorithm (고속 웨이브렛 변환 적응알고리즘을 이용한 적응잡음제거기에 관한 연구)

  • 이채욱;박세기;오신범;강명수
    • Proceedings of the IEEK Conference / 2002.06d / pp.179-182 / 2002
  • In this paper, we propose a wavelet-based adaptive algorithm that improves the convergence speed and reduces computational complexity by making efficient use of fast running FIR filtering. We compared the performance of the proposed algorithm with that of time-domain and frequency-domain adaptive algorithms through computer simulation of an adaptive noise canceler with synthesized speech. The results show that the proposed algorithm is suitable for adaptive signal processing applications in the speech and acoustic fields.

Phonetic Tied-Mixture Syllable Model for Efficient Decoding in Korean ASR (효율적 한국어 음성 인식을 위한 PTM 음절 모델)

  • Kim Bong-Wan;Lee Yong-Jn
    • MALSORI / no.50 / pp.139-150 / 2004
  • The Phonetic Tied-Mixture (PTM) model has been proposed as a means of efficient decoding in large vocabulary continuous speech recognition (LVCSR) systems. It has been reported that the PTM model decodes better than triphones by sharing a set of mixture components among states in the same topological location[5]. In this paper we propose a Phonetic Tied-Mixture Syllable (PTMS) model, which extends the PTM technique to syllables. The proposed PTMS model shows a 13% improvement in decoding speed over PTM. Despite the difference in context-dependent modeling (PTM: cross-word context-dependent modeling; PTMS: word-internal left-phone-dependent modeling), the proposed model shows less than 1% degradation in word accuracy relative to PTM at the same beam width. With a different beam width, it achieves better word accuracy than PTM at the same or higher speed.

Acoustic Echo Canceller using Adaptive IIR Filters with Prewhitening Method and Variable Step-Size LMS Algorithm

  • Cho, Ju Pil;Hwng, Tae Jin;Baik, Heung Ki
    • The Journal of the Acoustical Society of Korea / v.16 no.2E / pp.14-20 / 1997
  • Future teleconferencing systems will need to control acoustic echo properly for convenient communication. Conventional acoustic echo cancellation algorithms involve large adaptive filters that identify the impulse response of the echo path. The use of adaptive IIR filters appears to be a reasonable way to reduce computational complexity. Effective cancellation of the acoustic echo present in teleconferencing systems requires adaptive filters with rapid convergence. One of the main problems of acoustic echo cancellation techniques is that the convergence properties degrade for a highly correlated input signal such as speech. Introducing linear prediction filters into the structure of the acoustic echo canceller is one approach to decorrelating the speech signal. In addition, the variable step-size LMS algorithm improves the convergence speed at the cost of a slight increase in computational complexity. In this paper, we apply these two methods to the acoustic echo canceller (AEC) and show that they perform better than the conventional AEC.
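
The abstract does not specify which variable step-size rule the authors use, and their canceller is built on IIR filters with prewhitening. The sketch below is therefore a deliberate simplification: an FIR echo-path identifier whose step size follows one well-known rule, in which the step size is a smoothed function of the squared error and is clipped to a fixed range. All parameter values and the toy echo path are assumptions.

```python
import random


def vss_lms(x, d, n_taps, mu_min=0.005, mu_max=0.05, alpha=0.97, gamma=0.01):
    """FIR LMS with an error-driven variable step size.

    x: far-end (reference) samples, d: microphone samples containing the echo.
    Returns the final filter weights and the residual error sequence.
    """
    w = [0.0] * n_taps
    buf = [0.0] * n_taps      # most recent input sample first
    mu = mu_max
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))    # echo estimate
        e = dn - y                                    # residual echo
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]
        # Large errors push the step size up, small errors let it decay.
        mu = min(mu_max, max(mu_min, alpha * mu + gamma * e * e))
        errors.append(e)
    return w, errors


# Toy echo path and white-noise far-end signal (illustrative values only).
rng = random.Random(1)
echo_path = [0.5, -0.3, 0.1]
x = [rng.gauss(0.0, 1.0) for _ in range(4000)]
d = [sum(h * x[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
     for n in range(len(x))]

w, errors = vss_lms(x, d, n_taps=3)
print([round(wi, 4) for wi in w])
```

White noise is the easy case; with a correlated input such as speech, plain LMS converges more slowly, which is the problem the prewhitening step in the paper is meant to address.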

Human-like Fuzzy Lip Synchronization of 3D Facial Model Based on Speech Speed (발화속도를 고려한 3차원 얼굴 모형의 퍼지 모델 기반 립싱크 구현)

  • Park Jong-Ryul;Choi Cheol-Wan;Park Min-Yong
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.05a / pp.416-419 / 2006
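  • In this paper, we propose a new lip synchronization method that takes speech speed into account. Using a database built through experiments, we established the relationship between speech speed and mouth shape and size with a fuzzy algorithm. Existing lip synchronization methods do not consider speech speed, so they show a constant lip shape and size regardless of how fast the speech is. The proposed method applies the relationship between speech speed and lip shape, which makes possible lip synchronization that is closer to that of a human. In addition, the use of fuzzy theory makes it possible to model ambiguous changes in mouth size and shape that cannot be expressed precisely in numbers. To demonstrate this, we compare the proposed lip synchronization algorithm with the existing method, build a 3D graphics platform, and apply the method to a real application program.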
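
To make the idea of a fuzzy mapping from speech speed to mouth size concrete, here is a minimal sketch with three fuzzy sets (slow, normal, fast) and weighted-average defuzzification. The abstract gives neither the membership functions nor the rules, so every number below, the syllables-per-second scale, and the assumption that faster speech means a smaller mouth opening are hypothetical.

```python
def falling(x, full, zero):
    """Membership that is 1 up to `full` and falls linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def rising(x, zero, full):
    """Membership that is 0 up to `zero` and rises linearly to 1 at `full`."""
    return 1.0 - falling(x, zero, full)


def triangle(x, left, peak, right):
    """Triangular membership with feet at left/right and apex at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def mouth_scale(syllables_per_sec):
    """Map speech speed to a mouth-opening scale factor in [0.5, 1.0]."""
    # (membership degree, rule output); outputs are illustrative only.
    rules = [
        (falling(syllables_per_sec, 3.0, 5.0), 1.0),        # slow   -> full opening
        (triangle(syllables_per_sec, 3.0, 5.0, 7.0), 0.8),  # normal -> slightly reduced
        (rising(syllables_per_sec, 5.0, 7.0), 0.5),         # fast   -> small opening
    ]
    total = sum(degree for degree, _ in rules)
    return sum(degree * out for degree, out in rules) / total


for rate in (2.0, 4.0, 5.0, 6.0, 8.0):
    print(rate, round(mouth_scale(rate), 3))
```

Because neighbouring sets overlap, the output changes smoothly with speed instead of jumping between fixed mouth shapes, which is the behaviour the abstract attributes to the fuzzy approach.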
