• Title/Summary/Keyword: speech error

Search Results: 581

Effect of Listening Biographies on Frequency Following Responses of Vocalists, Violinists, and Non-Musicians to Indian Carnatic Music Stimuli

  • Prajna, Bhat J;Rajalakshmi, Krishna
    • Journal of Audiology & Otology
    • /
    • v.25 no.3
    • /
    • pp.131-137
    • /
    • 2021
  • Background and Objectives: This study investigates pitch coding, measured with the frequency following response (FFR), among vocalists, violinists, and non-musicians for Indian Carnatic transition music stimuli, and assesses whether their listening biographies strengthen their F0 neural encoding for these stimuli. Subjects and Methods: Three participant groups aged 18-45 years were included: 20 trained Carnatic vocalists, 13 trained violinists, and 22 non-musicians. The stimuli consisted of three Indian Carnatic raga notes (/S-R2-G3/), sung by a trained vocalist and played by a trained violinist. The two transitions between the notes, T1=/S-R2/ and T2=/R2-G3/, were analyzed, and FFRs were recorded binaurally at 80 dB SPL using Neuroscan equipment. Results: Overall averaged responses were generated for each participant. To assess pitch tracking of the Carnatic music stimuli, the stimulus-to-response correlation (CC), pitch strength (PS), and pitch error (PE) were measured. Both the vocalists and the violinists had higher CC and PS values and lower PE values than the non-musicians for both the vocal and the violin T1 and T2 transition stimuli. Between the musician groups, the vocalists outperformed the violinists for both the vocal and the violin T1 and T2 transition stimuli. Conclusions: Listening biographies strengthened F0 neural coding at the brainstem level for the vocalists with the vocal stimulus; the violinists did not show such a preference.
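
The pitch-tracking measures named above can be made concrete with a short sketch. This is a minimal illustration, assuming frame-wise autocorrelation F0 extraction; the function names, frame sizes, and search range are mine, not the paper's:

```python
import numpy as np

def f0_contour(signal, sr, frame_len=0.04, hop=0.01, fmin=80, fmax=400):
    """Frame-wise F0 estimate via short-time autocorrelation (illustrative)."""
    n, h = int(frame_len * sr), int(hop * sr)
    lo, hi = int(sr / fmax), int(sr / fmin)      # lag search range
    f0 = []
    for start in range(0, len(signal) - n, h):
        frame = signal[start:start + n] * np.hanning(n)
        ac = np.correlate(frame, frame, mode="full")[n - 1:]
        lag = lo + np.argmax(ac[lo:hi])          # strongest period in range
        f0.append(sr / lag)
    return np.array(f0)

def pitch_measures(stimulus, response, sr):
    """Stimulus-to-response correlation (CC) and pitch error (PE)."""
    fs, fr = f0_contour(stimulus, sr), f0_contour(response, sr)
    m = min(len(fs), len(fr))
    fs, fr = fs[:m], fr[:m]
    cc = np.corrcoef(fs, fr)[0, 1]               # pitch-tracking fidelity
    pe = np.mean(np.abs(fs - fr))                # mean F0 deviation in Hz
    return cc, pe
```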

Simultaneous Speaker and Environment Adaptation by Environment Clustering in Various Noise Environments (다양한 잡음 환경하에서 환경 군집화를 통한 화자 및 환경 동시 적응)

  • Kim, Young-Kuk;Song, Hwa-Jeon;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.6
    • /
    • pp.566-571
    • /
    • 2009
  • This paper proposes a noise-robust fast speaker adaptation method based on the eigenvoice framework for various noisy environments. The proposed method focuses on de-noising and environment clustering. Since the de-noised adaptation DB still contains residual noise, environment clustering divides the noisy adaptation data into similar environments using the cepstral mean of non-speech segments as a feature vector. The adaptation data in each cluster is then used to build an environment-clustered speaker-adapted (SA) model. After selecting the multiple environment-clustered SA models most similar to the test environment, speaker adaptation is conducted through an appropriate linear combination of the clustered SA models. In our experiments, the proposed method provides an error rate reduction of 40-59% over the baseline speaker-independent model.
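
To make the clustering step concrete, here is a minimal sketch of grouping adaptation utterances by the cepstral mean of their non-speech segments. The use of k-means and scikit-learn is an assumption for illustration, not the paper's implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_environments(utterances, vad_labels, n_clusters=4):
    """Group noisy adaptation utterances by acoustic environment.

    utterances: list of (num_frames, num_ceps) cepstral feature arrays
    vad_labels: list of boolean arrays, True where a frame is speech
    """
    # Environment signature: cepstral mean over non-speech frames only.
    env_features = np.stack([
        feats[~vad].mean(axis=0) for feats, vad in zip(utterances, vad_labels)
    ])
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(env_features)
    # Each cluster's utterances then train one environment-clustered SA model.
    clusters = [np.where(km.labels_ == k)[0] for k in range(n_clusters)]
    return clusters, km.cluster_centers_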

Detecting lies through suspect's nonverbal behaviors in the investigation scene (군 수사현장에서 용의자의 비언어적 행동을 이용한 거짓말 탐지)

  • Si Up Kim;Woo Byoung Jhon;Chung Hyun Jeon
    • Korean Journal of Culture and Social Issues
    • /
    • v.12 no.2
    • /
    • pp.101-114
    • /
    • 2006
  • This study examined effective nonverbal behavioral cues for detecting suspects' lies at the investigation scene. To identify the suspects who had drunk alcohol without permission, 18 soldiers were interviewed. Eight soldiers had drunk alcohol and lied when asked (lie group); the other 10 had not drunk alcohol and told the truth (truth group). The mean frequencies of nonverbal behaviors were compared between the lie group and the truth group. The following behaviors were measured by frequency: vocal characteristics (high pitch of voice, speech hesitations, speech errors, frequency of pauses, duration of pauses, response latency), facial characteristics (gaze, smile, touching face, blinking, facial micro-expressions), and body movements (illustrators, hand and finger movements, leg and foot movements, head movements, trunk movements, shifting position). The study found that the deception cues were the duration and frequency of pauses, micro-expressions, and head movements: the lie group paused less often and for shorter durations, and showed more micro-expressions and head movements, than the truth group. However, the study did not find Othello's error cues.

Acoustic Feedback and Noise Cancellation of Hearing Aids by Deep Learning Algorithm (심층학습 알고리즘을 이용한 보청기의 음향궤환 및 잡음 제거)

  • Lee, Haeng-Woo
    • The Journal of the Korea Institute of Electronic Communication Sciences
    • /
    • v.14 no.6
    • /
    • pp.1249-1256
    • /
    • 2019
  • In this paper, we propose a new algorithm for removing acoustic feedback and noise in hearing aids. Instead of the conventional FIR structure, the algorithm uses a neural-network adaptive prediction filter, a deep learning approach, to improve feedback and noise reduction performance. The feedback canceller first removes the feedback signal from the microphone signal, and the noise is then removed with a Wiener filter technique. Noise elimination estimates the speech from the noisy signal using a linear prediction model that exploits the periodicity of the speech signal. To ensure stable convergence of the two adaptive systems operating in a loop, the coefficient updates of the feedback canceller and the noise canceller are separated, each converging on the residual error signal generated after its own cancellation. To verify the performance of the proposed feedback and noise canceller, a simulation program was written and run. Experimental results show that, compared with the conventional FIR structure, the proposed deep learning algorithm improves the signal-to-feedback ratio (SFR) by about 10 dB in the feedback canceller and the signal-to-noise-ratio enhancement (SNRE) by about 3 dB in the noise canceller.
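
A minimal sketch of the two-stage cascade with separated coefficient updates follows, substituting a plain NLMS adaptive filter for the paper's neural-network predictor; the signal names, tap count, and step size are illustrative assumptions:

```python
import numpy as np

def nlms(reference, desired, taps=32, mu=0.1, eps=1e-8):
    """One NLMS adaptive filter: predict `desired` from past `reference`
    samples; return (prediction, residual error)."""
    w = np.zeros(taps)
    pred = np.zeros_like(desired)
    err = np.zeros_like(desired)
    for n in range(taps, len(desired)):
        x = reference[n - taps:n][::-1]       # most recent samples first
        pred[n] = w @ x
        err[n] = desired[n] - pred[n]         # residual after cancellation
        w += mu * err[n] * x / (x @ x + eps)  # update driven by own residual
    return pred, err

def hearing_aid_cascade(mic, receiver):
    # Stage 1: feedback canceller -- subtract the predicted feedback path
    # (receiver -> microphone) from the microphone signal.
    _, de_fed_back = nlms(receiver, mic)
    # Stage 2: noise reduction -- a one-step linear predictor tracks the
    # periodic speech component; its prediction is the speech estimate.
    speech_estimate, _ = nlms(de_fed_back, de_fed_back)
    return speech_estimate
```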

English Phoneme Recognition using Segmental-Feature HMM (분절 특징 HMM을 이용한 영어 음소 인식)

  • Yun, Young-Sun
    • Journal of KIISE: Software and Applications
    • /
    • v.29 no.3
    • /
    • pp.167-179
    • /
    • 2002
  • In this paper, we propose a new acoustic model for characterizing segmental features, and an algorithm based on the general framework of hidden Markov models (HMMs), in order to compensate for the weaknesses of the HMM assumptions. Because a single frame feature cannot effectively represent the temporal dynamics of speech signals, the segmental features are represented as a trajectory of the observed vector sequence by a polynomial regression function. To apply the segmental features to pattern classification, we adopted the segmental HMM (SHMM), known to be an effective way to represent the trend of speech signals. The SHMM separates the observation probability of a given state into extra- and intra-segmental variations, which capture long-term and short-term variability, respectively. To incorporate the segmental characteristics into the acoustic model, we present the segmental-feature HMM (SFHMM) by modifying the SHMM. The SFHMM thus represents the external and internal variation as, respectively, the observation probability of the trajectory in a given state and the trajectory estimation error for the given segment. We conducted several experiments on the TIMIT database to establish the effectiveness of the proposed method and the characteristics of the segmental features. From the experimental results, we conclude that, although it requires more parameters than a conventional HMM, the proposed method is valuable for its flexible and informative feature representation and its performance improvement.
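
The trajectory representation can be illustrated with a short sketch: fit a polynomial regression per feature dimension over a segment and measure the trajectory estimation error. The fitting order and time normalization are my assumptions, not the paper's settings:

```python
import numpy as np

def segment_trajectory(frames, order=2):
    """Fit a polynomial regression trajectory to a segment of feature frames.

    frames: (num_frames, num_dims) observed vectors for one segment
    Returns the fitted trajectory and the trajectory estimation error,
    which plays the role of the intra-segmental (short-term) variation.
    """
    frames = np.asarray(frames, dtype=float)
    t = np.linspace(-1.0, 1.0, len(frames))       # normalized time axis
    trajectory = np.empty_like(frames)
    for d in range(frames.shape[1]):              # fit each dimension
        coeffs = np.polyfit(t, frames[:, d], order)
        trajectory[:, d] = np.polyval(coeffs, t)
    error = np.mean((frames - trajectory) ** 2)   # trajectory estimation error
    return trajectory, error
```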

A Study on the Development of Embedded Serial Multi-modal Biometrics Recognition System (임베디드 직렬 다중 생체 인식 시스템 개발에 관한 연구)

  • Kim, Joeng-Hoon;Kwon, Soon-Ryang
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.16 no.1
    • /
    • pp.49-54
    • /
    • 2006
  • Recent fingerprint recognition systems have unstable factors, such as copying of fingerprint patterns and hacking of fingerprint feature points, which may cause significant system errors. In this research, we therefore used the fingerprint as the main recognition device and implemented a serial multi-biometric recognition system combined with speech recognition, which has been widely used recently. In this system, the fingerprint recognition process runs only after the speech is successfully recognized. Among existing speech recognition algorithms (VQ, DTW, HMM, NN), the speaker-dependent DTW (Dynamic Time Warping) algorithm is used for efficient real-time processing, while the KSOM (Kohonen Self-Organizing feature Map) algorithm, an artificial intelligence method, is applied to fingerprint recognition because of its modest computational load. In our experiments, the implemented multi-biometric recognition system showed a 2-7% lower FRR (False Rejection Ratio) than single recognition systems using fingerprints or voice alone, and a zero FAR (False Acceptance Ratio), which is the most important factor in a recognition system. Moreover, its recognition time (1.5 seconds on average) differs little from that of existing single biometric recognition systems; these experiments demonstrate that the implemented multi-biometric recognition system is a more efficient security system than single recognition systems.
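
A minimal sketch of the serial decision logic follows, with classic DTW for the speech stage; the threshold value and function names are hypothetical, and the fingerprint matcher is left abstract rather than implemented as an actual KSOM:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two feature
    sequences, each of shape (num_frames, num_dims)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def serial_verify(speech_feats, speech_template, fingerprint_match,
                  speech_threshold=50.0):
    """Serial multimodal decision: run fingerprint matching only after
    the speech stage accepts (hypothetical threshold)."""
    if dtw_distance(speech_feats, speech_template) > speech_threshold:
        return False                      # speech rejected: stop here
    return fingerprint_match()            # e.g., a KSOM-based matcher
```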

Improved Decision Tree-Based State Tying In Continuous Speech Recognition System (연속 음성 인식 시스템을 위한 향상된 결정 트리 기반 상태 공유)

  • ;Xintian Wu;Chaojun Liu
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.6
    • /
    • pp.49-56
    • /
    • 1999
  • In many continuous speech recognition systems based on HMMs, decision-tree-based state tying has been used not only to improve the robustness and accuracy of context-dependent acoustic modeling but also to synthesize unseen models. To construct the phonetic decision tree, the standard method performs one-level pruning using only single-Gaussian triphone models. In this paper, two novel approaches, the two-level decision tree and the multi-mixture decision tree, are proposed to obtain better performance through more accurate acoustic modeling. The two-level decision tree performs two levels of pruning, for state tying and for mixture-weight tying; with the second level, tied states can have different mixture weights based on the similarities of their phonetic contexts. In the second approach, the phonetic decision tree is continually updated during training with mixture splitting and re-estimation, and multi-mixture Gaussian as well as single-Gaussian models are used to construct the multi-mixture decision tree. Continuous speech recognition experiments with these approaches on the BN-96 and WSJ5k data showed a reduction in word error rate compared with the standard decision-tree-based system, given a similar number of tied states.
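
One greedy node split of such a phonetic decision tree can be sketched as follows, assuming single-Gaussian (diagonal-covariance) state statistics and the usual log-likelihood gain criterion; the question set and data layout are illustrative:

```python
import numpy as np

def pooled_loglik(frames):
    """Log-likelihood of frames under a single diagonal Gaussian
    estimated from those same frames."""
    n, d = frames.shape
    var = frames.var(axis=0) + 1e-6
    return -0.5 * n * (d * np.log(2 * np.pi * np.e) + np.log(var).sum())

def best_split(states, questions):
    """One greedy node split: pick the phonetic question that maximizes
    the log-likelihood gain of splitting the pooled triphone states.

    states: list of (context, frames) pairs gathered at this node
    questions: list of (name, predicate-on-context) pairs
    """
    pooled = np.vstack([f for _, f in states])
    base = pooled_loglik(pooled)
    best = (None, 0.0)
    for name, pred in questions:
        yes = [f for c, f in states if pred(c)]
        no = [f for c, f in states if not pred(c)]
        if not yes or not no:
            continue                      # question does not split this node
        gain = (pooled_loglik(np.vstack(yes))
                + pooled_loglik(np.vstack(no)) - base)
        if gain > best[1]:
            best = (name, gain)
    return best
```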

A Study on the Spoken Korean Citynames Using Multi-Layered Perceptron of Back-Propagation Algorithm (오차 역전파 알고리즘을 갖는 MLP를 이용한 한국 지명 인식에 대한 연구)

  • Song, Do-Sun;Lee, Jae-Gheon;Kim, Seok-Dong;Lee, Haing-Sei
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.6
    • /
    • pp.5-14
    • /
    • 1994
  • This paper describes an experiment on speaker-independent automatic recognition of spoken Korean words using a Multi-Layered Perceptron and the error back-propagation algorithm. The target words are 50 city names used as D.D.D. local codes; 43 have two syllables and the remaining 7 have three syllables. The words were not segmented into syllables or phonemes; instead, feature components extracted from the words at equal intervals were fed to the neural network, making the result independent of speech duration. PARCOR coefficients calculated from the frames using linear predictive analysis were employed as feature components. The paper seeks the optimum conditions through four experiments: a comparison between total and pre-classified training, the dependency of the recognition rate on the number of frames and the PARCOR order, the change in recognition due to the number of neurons in the hidden layer, and a comparison of methods for composing the output pattern of the output neurons. As a result, a recognition rate of 89.6% was obtained.
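
The network shape described here can be sketched compactly: a fixed number of frames of PARCOR coefficients flattened into one input vector, one hidden layer, and 50 output units trained by error back-propagation. The layer sizes and the softmax output are my choices, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_frames=16, parcor_order=12, hidden=64, n_words=50):
    n_in = n_frames * parcor_order   # flattened, duration-independent input
    return {
        "W1": rng.normal(0, 0.1, (n_in, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.1, (hidden, n_words)), "b2": np.zeros(n_words),
    }

def train_step(p, x, y, lr=0.01):
    """One error back-propagation update for a single example.
    x: flattened PARCOR features, y: one-hot city-name label."""
    h = np.tanh(x @ p["W1"] + p["b1"])             # hidden layer
    z = h @ p["W2"] + p["b2"]
    out = np.exp(z - z.max()); out /= out.sum()    # softmax over 50 words
    dz = out - y                                   # output error
    dh = (p["W2"] @ dz) * (1 - h ** 2)             # back-propagated error
    p["W2"] -= lr * np.outer(h, dz); p["b2"] -= lr * dz
    p["W1"] -= lr * np.outer(x, dh); p["b1"] -= lr * dh
```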

A Study on Keyword Spotting System Using Pseudo N-gram Language Model (의사 N-gram 언어모델을 이용한 핵심어 검출 시스템에 관한 연구)

  • 이여송;김주곤;정현열
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.3
    • /
    • pp.242-247
    • /
    • 2004
  • Conventional keyword spotting systems use a connected-word recognition network composed of keyword models and filler models, and therefore cannot effectively construct language models of word appearance for detecting keywords in a large-vocabulary continuous speech recognition system with large text data. To solve this problem, we propose a keyword spotting system that uses a pseudo N-gram language model for detecting keywords, and we investigate its performance as the appearance frequencies of keywords and filler models change. When the unigram probabilities of the keywords and filler models were set to 0.2 and 0.8, respectively, the experimental results showed a CA (correct acceptance of in-vocabulary words) of 91.1% and a CR (correct rejection of out-of-vocabulary words) of 91.7%, meaning that the proposed system achieves a 14% improvement in average CA-CR performance over conventional methods in terms of ERR (Error Reduction Rate).
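
How fixed pseudo-unigram probabilities might weight a keyword/filler decision can be sketched as follows, assuming per-model acoustic log-likelihoods are already available; the scoring form and names are illustrative, not the paper's decoder:

```python
import numpy as np

# Pseudo-unigram language model: every keyword shares P=0.2 and every
# filler shares P=0.8, regardless of training-text frequencies.
LM = {"keyword": np.log(0.2), "filler": np.log(0.8)}

def spot_keyword(acoustic_loglik, lm_weight=1.0):
    """Decide keyword vs. filler for one segment.

    acoustic_loglik: dict mapping model name -> acoustic log-likelihood,
    e.g. {"keyword:seoul": -512.3, "filler:f1": -498.0, ...}
    """
    best_name, best_score = None, -np.inf
    for name, ll in acoustic_loglik.items():
        kind = "keyword" if name.startswith("keyword:") else "filler"
        score = ll + lm_weight * LM[kind]    # acoustic + pseudo-unigram LM
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```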