• Title/Summary/Keyword: speech rates

271 search results

Voiced-Unvoiced-Silence Detection Algorithm using Perceptron Neural Network (퍼셉트론 신경회로망을 사용한 유성음, 무성음, 묵음 구간의 검출 알고리즘)

  • Choi, Jae-Seung
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.6 no.2
    • /
    • pp.237-242
    • /
    • 2011
  • This paper proposes a detection algorithm that classifies each frame of a speech signal as a voiced, unvoiced, or silence section using a multi-layer perceptron neural network. The power spectrum and FFT (fast Fourier transform) coefficients computed for each frame are used as the input to the neural network, which is trained on these features. The performance of the proposed algorithm was evaluated in terms of detection rates for the voiced, unvoiced, and silence sections, using various speech signals degraded by white noise as the network's input data. The detection rates were 92% or higher for such noisy speech even when the training and evaluation data were different.
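
As a rough sketch of the frame-level setup this abstract describes, each frame's log power spectrum becomes the input vector of a small multi-layer perceptron trained on labeled voiced/unvoiced/silence frames. The frame length, window, hidden-layer size, and the scikit-learn classifier below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

FRAME_LEN = 256  # samples per frame (assumed; the paper does not specify)

def frame_features(signal, frame_len=FRAME_LEN):
    """Log power spectrum per frame, the network input described above."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    spectrum = np.fft.rfft(frames * np.hanning(frame_len), axis=1)
    return np.log1p(np.abs(spectrum) ** 2)  # log keeps the dynamic range tame

# Labels: one per frame, 0 = voiced, 1 = unvoiced, 2 = silence,
# taken from hand-annotated sections of the training speech.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
# clf.fit(frame_features(train_signal), train_labels)
# detected = clf.predict(frame_features(noisy_test_signal))
```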

Analysis of the Percentage Articulation and Voice Packet Loss over the Internet (인터넷상의 음성 패킷손실과 명료도 분석)

  • 고대식;박준석
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.23 no.8
    • /
    • pp.2090-2095
    • /
    • 1998
  • In this paper, we measured voice packet loss over the Korean Internet and analyzed how percentage articulation varies with packet loss. To do this, we reviewed real-time transmission services based on RTP/UDP/IP and methods for testing transmission quality, and implemented a real-time speech transmission system using GSM coding over UDP/IP. A monosyllable list was chosen for the percentage articulation test; each voice packet was coded and compressed by GSM and carries a sequence number used to measure packet loss and to recover out-of-order packets. In transmission tests across seven routers over the Korean Internet, loss rates reached 1.6% (unloaded) and 22.5% (loaded), and loss rates after packet recovery by resequencing and FEC ranged from 9% to 35%. Finally, the percentage articulation under varying network traffic is presented in Table 4.
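
The measurement mechanics the abstract outlines (per-packet sequence numbers used to count losses and spot reordering) can be sketched minimally; the function below illustrates that bookkeeping and is not the authors' code.

```python
def packet_loss_stats(arrival_order, n_sent):
    """Count lost and out-of-order packets from the sequence numbers
    carried in each voice packet."""
    lost = n_sent - len(set(arrival_order))
    out_of_order = sum(1 for prev, cur in zip(arrival_order, arrival_order[1:])
                       if cur < prev)  # arrived after a higher sequence number
    return {"loss_rate": lost / n_sent, "out_of_order": out_of_order}

print(packet_loss_stats([0, 1, 3, 2, 5], n_sent=6))
# {'loss_rate': 0.1666..., 'out_of_order': 1}
```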


Real-Time Implementation of the G.729.1 Using ARM926EJ-S Processor Core (ARM926EJ-S 프로세서 코어를 이용한 G.729.1의 실시간 구현)

  • So, Woon-Seob;Kim, Dae-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.33 no.8C
    • /
    • pp.575-582
    • /
    • 2008
  • In this paper, we describe the process and results of a real-time implementation of the G.729.1 wideband speech codec, standardized by SG15 of ITU-T. To run the codec on the ARM926EJ-S® processor core, we rewrote parts of the codec's C program, including the basic operations and arithmetic functions, in assembly language so that the codec operates in real time. G.729.1 is the ITU-T standard wideband speech codec with variable bit rates of 8~32 kbps; it takes quantized 16-bit PCM input at a sampling rate of 8 kHz or 16 kHz. The codec is interoperable with G.729 and G.729A, and it is a bandwidth-extended wideband (50~7,000 Hz) version of the existing narrowband (300~3,400 Hz) codec, enhancing voice quality. The implemented G.729.1 codec has a complexity of 31.2 MCPS for the encoder and 22.8 MCPS for the decoder; execution takes 11.5 ms in total on the target, 6.75 ms for encoding and 4.76 ms for decoding. The codec was also verified bit by bit against the full set of test vectors provided by ITU-T and passed them all, and it ran in real time on an Internet phone.
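
The "basic operations" singled out for assembly conversion are the ITU-T fixed-point primitives that dominate the codec's inner loops; clamping on every 16-bit add is branch-heavy in C but cheap in hand-tuned assembly on a core with DSP extensions like the ARM926EJ-S. A Python sketch of the saturating add, for illustration only:

```python
MAX_16, MIN_16 = 32767, -32768

def add(a, b):
    """Saturating 16-bit add in the style of the ITU-T reference
    basic operations: out-of-range sums clamp instead of wrapping."""
    s = a + b
    if s > MAX_16:
        return MAX_16
    if s < MIN_16:
        return MIN_16
    return s

assert add(30000, 10000) == MAX_16  # saturates rather than overflowing
```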

Real-time Implementation of the AMR Speech Coder Using OakDSPCore® (OakDSPCore®를 이용한 적응형 다중 비트 (AMR) 음성 부호화기의 실시간 구현)

  • 이남일;손창용;이동원;강상원
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.6
    • /
    • pp.34-39
    • /
    • 2001
  • The adaptive multi-rate (AMR) speech coder was adopted as a W-CDMA standard by 3GPP and ETSI. The AMR coder is based on the CELP algorithm operating at rates from 12.2 kbps down to 4.75 kbps, and it is a source-controlled codec whose rate adapts to channel error conditions and traffic loading. In this paper, we implement the DSP software of the AMR coder on the OakDSPCore®. The implementation is based on the CSD17C00A chip developed by C&S Technology, and it was verified for bit exactness using the AMR test vectors provided by ETSI. The DSP software requires 20.6 MIPS for the encoder and 2.7 MIPS for the decoder. The memory required by the AMR coder was 21.97 kwords for code, 6.64 kwords for data sections, and 15.1 kwords for data ROM. An actual sound input/output test using a microphone and speaker also demonstrated proper real-time operation without distortion or delay.
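
Bit-exact verification of the kind described here comes down to a byte-wise comparison of the codec's output against the reference files; a trivial sketch (file names are hypothetical):

```python
def bit_exact(output_path, reference_path):
    """True only if the codec output matches the reference byte for byte."""
    with open(output_path, "rb") as out, open(reference_path, "rb") as ref:
        return out.read() == ref.read()

# for name in ["T00.cod", "T01.cod"]:  # hypothetical vector file names
#     assert bit_exact(f"out/{name}", f"ref/{name}"), name
```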


On a Performance Comparison of Pitch Search Algorithms Using Correlation Properties for the CELP Vocoder (CELP 보코더의 피치 검색시간 단축법의 비교)

  • 배명진
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1993.06a
    • /
    • pp.280-287
    • /
    • 1993
  • Code-excited linear prediction (CELP) speech coders exhibit good performance at data rates as low as 4800 bps, but the major drawback of CELP-type coders is their large computational requirement. In this paper, a comparative performance study of three pitch searching algorithms for the CELP vocoder was conducted. The sequential pitch searching algorithm implemented in the QCELP vocoder served as the baseline against which each algorithm was compared. The algorithms studied were: 1) using a skip table (TABLE), 2) using the symmetry property of the autocorrelation (SYMMT), and 3) using a preprocessed autocorrelation (PREPC). Performance scores are presented for each of the three pitch searching algorithms in terms of computation speed and pitch prediction error.
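
All three speed-ups prune the same exhaustive baseline: evaluate a normalized autocorrelation score at every candidate lag and keep the maximum. A sketch of that baseline follows; the 20-147 sample lag range is an assumption in line with common QCELP practice, not a figure from the paper.

```python
import numpy as np

def pitch_search(x, min_lag=20, max_lag=147):
    """Exhaustive pitch search over all lags; the paper's three methods
    (skip table, autocorrelation symmetry, preprocessing) each try to
    avoid evaluating this score at every lag."""
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        num = np.dot(x[lag:], x[:-lag])   # cross term R(lag)
        den = np.dot(x[:-lag], x[:-lag])  # energy of the delayed signal
        score = num * num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```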


On a Performance Comparison of Pitch Search Algorithms with the Correlation Properties for the CELP Vocoder (상관관계 특성을 이용한 CELP 보코더의 피치검색시간 단축법의 비교)

  • 김대식
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06c
    • /
    • pp.188-194
    • /
    • 1994
  • Code-excited linear prediction (CELP) speech coders exhibit good performance at data rates as low as 4800 bps, but the major drawback of CELP-type coders is their large computational requirement. Therefore, in this paper, a comparative performance study of five pitch searching algorithms for the CELP vocoder was conducted. The full pitch searching algorithm implemented in the QCELP vocoder served as the baseline. The algorithms studied reduce the pitch searching time by: 1) using a skip table, 2) using the symmetry property of the autocorrelation, 3) using a preprocessed autocorrelation, 4) using the positive autocorrelation, and 5) using a preliminary pitch. Performance scores are presented for each of the five pitch searching algorithms in terms of computation speed and pitch prediction error.
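
As one concrete illustration of how a reduction in this family works, here is a sketch of a coarse-to-fine "preliminary pitch" strategy under assumed parameters; it is not the exact procedure of any of the five methods compared in the paper. A sparse grid of lags is scored first, and only the neighborhood of the preliminary winner is refined.

```python
import numpy as np

def norm_score(x, lag):
    """Normalized autocorrelation score, as in the exhaustive sketch above."""
    num = np.dot(x[lag:], x[:-lag])
    den = np.dot(x[:-lag], x[:-lag])
    return num * num / den if den > 0 else 0.0

def preliminary_pitch_search(x, min_lag=20, max_lag=147, step=4):
    """Evaluate every step-th lag, then refine around the best coarse lag.
    Cuts score evaluations roughly by a factor of step, at some risk of
    missing a narrow correlation peak."""
    coarse = max(range(min_lag, max_lag + 1, step),
                 key=lambda k: norm_score(x, k))
    lo = max(min_lag, coarse - step + 1)
    hi = min(max_lag, coarse + step - 1)
    return max(range(lo, hi + 1), key=lambda k: norm_score(x, k))
```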


Discrimination of Emotional States In Voice and Facial Expression

  • Kim, Sung-Ill;Yasunari Yoshitomi;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.2E
    • /
    • pp.98-104
    • /
    • 2002
  • The present study describes a combination method for recognizing human affective states such as anger, happiness, sadness, and surprise. For this, we extracted emotional features from voice signals and facial expressions, and then trained recognizers of emotional states using a hidden Markov model (HMM) and a neural network (NN). For voice, we used prosodic parameters such as pitch, energy, and their derivatives, which were trained with the HMM for recognition. For facial expressions, on the other hand, we used feature parameters extracted from thermal and visible images, which were trained with the NN for recognition. The recognition rates for the combined parameters from voice and facial expressions were better than those for either set of parameters in isolation. The simulation results were also compared with human questionnaire results.
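
The voice-side observation stream described here (pitch, energy, and their derivatives per frame) is easy to picture with a short sketch. The frame size and delta window are assumptions, and pitch extraction itself is left to any standard tracker.

```python
import numpy as np

def frame_energy(signal, frame_len=160):  # 20 ms at 8 kHz (assumed)
    """Log energy per frame."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def deltas(track):
    """First-order derivative of a per-frame track: the 'delta' features
    appended to pitch and energy before HMM training."""
    padded = np.pad(track, 1, mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

# One observation vector per frame, e.g.:
# obs = np.column_stack([pitch, energy, deltas(pitch), deltas(energy)])
```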

Multimodal Parametric Fusion for Emotion Recognition

  • Kim, Jonghwa
    • International journal of advanced smart convergence
    • /
    • v.9 no.1
    • /
    • pp.193-201
    • /
    • 2020
  • The main objective of this study is to investigate the impact of additional modalities on the performance of emotion recognition using speech, facial expressions, and physiological measurements. To compare different approaches, we designed a feature-based recognition system as a benchmark, which carries out linear supervised classification followed by leave-one-out cross-validation. For the classification of four emotions, bimodal fusion improved the recognition accuracy of the unimodal approach in our experiment, while the performance of trimodal fusion varied strongly from individual to individual. Furthermore, we observed extremely high disparity between single-class recognition rates, and no single modality performed best in our experiment. Based on these observations, we developed a novel fusion method, called parametric decision fusion (PDF), which builds emotion-specific classifiers and exploits the advantages of a parameterized decision process. Using the PDF scheme, we achieved a 16% improvement in accuracy for subject-dependent recognition and a 10% improvement for subject-independent recognition compared with the best unimodal results.
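
The decision-level idea can be pictured as one weight per (modality, class) pair, so each emotion gets its own weighting of the unimodal votes. The abstract does not spell out the exact parameterization, so the sketch below is an assumption-laden illustration, not the paper's PDF method itself.

```python
import numpy as np

def parametric_fusion(scores, weights):
    """Decision-level fusion with one weight per (modality, class) pair,
    illustrating the emotion-specific weighting idea.

    scores:  (n_modalities, n_classes) class scores from unimodal classifiers
    weights: (n_modalities, n_classes) learned/tuned fusion parameters
    """
    fused = np.sum(scores * weights, axis=0)  # weighted vote per class
    return int(np.argmax(fused))

speech = np.array([0.1, 0.6, 0.2, 0.1])  # hypothetical per-class scores
face   = np.array([0.4, 0.3, 0.2, 0.1])
physio = np.array([0.2, 0.2, 0.5, 0.1])
w = np.ones((3, 4))                       # uniform weights reduce to mean fusion
print(parametric_fusion(np.stack([speech, face, physio]), w))  # -> 1
```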

Effects of Speaking Rate on Korean Vowels (발화속도에 따른 한국어 모음의 음향적 특성)

  • 이숙향;고현주;한양구;김종진
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.1
    • /
    • pp.14-22
    • /
    • 2003
  • In this study, we examined the acoustic characteristics of Korean vowels through a production test under three speaking-rate conditions (slow, normal, fast). The effect of a change in speaking rate on vowel duration was found to be very strong: the faster the speaking rate, the shorter the total duration of the vowels. However, the duration ratio of the two components of a diphthong did not change significantly with speaking rate. Unlike these temporal aspects, the formant values of vowels at their steady state and the formant change ratios of semivowels were not strongly affected by changes in speaking rate.

The Effect of Audio and Visual Cues on Korean and Japanese EFL Learners' Perception of English Liquids

  • Chung, Hyun-Song
    • English Language & Literature Teaching
    • /
    • v.11 no.2
    • /
    • pp.135-148
    • /
    • 2005
  • This paper investigated the effect of audio and visual cues on Korean and Japanese EFL learners' perception of the lateral/retroflex contrast in English. In a perception experiment, the two English consonants /l/ and /r/ were embedded in initial and medial position in nonsense words in the context of the vowels /i, a, u/. Both singletons and clusters were included in the speech material, and audio and video recordings were made of a total of 108 items. The items were presented to Korean and Japanese learners of English in three conditions: audio-alone (A), visual-alone (V), and audio-visual (AV). The results showed no evidence of an audio-visual benefit for the perception of the /l/-/r/ contrast for either Korean or Japanese learners. Korean listeners showed much better identification rates for the /l/-/r/ contrast than Japanese listeners in the audio and audio-visual conditions.
