• Title/Summary/Keyword: Speech Recognition Error


The Speaker Recognition System using the Pitch Alteration (피치변경을 이용한 화자인식 시스템)

  • Jung JongSoon; Bae MyungJin
    • Proceedings of the Acoustical Society of Korea Conference, spring, pp. 115-118, 2002
  • Parameters used in a speaker recognition system should fully express the speaker's characteristics contained in speech. In other words, a feature whose inter-speaker variance is larger than its intra-speaker variance is useful for distinguishing between speakers. In addition, minimizing errors between speakers requires improved recognition techniques as well as distinctive features. Recent simulation results show that more accurate performance is obtained by using dynamic characteristics together with constant characteristics arising from speaking habits. We therefore propose the following approach. Prosodic information is used as a feature vector of speech. The feature vectors generally used in speaker recognition systems model spectral information and perform well in noise-free conditions. However, these feature vectors are distorted in noisy conditions, which reduces the recognition rate. In this paper, we alter the pitch contour, divided into segments from which dynamic characteristics can be estimated, and use it as a recognition feature. Simulations confirmed that this dynamic characteristic is very robust in noisy conditions. Acceptance or rejection is decided by comparison with the test pattern, and the proposed algorithm improves the recognition rate over using spectral and prosodic information. In particular, simulations show that a stable recognition rate can be obtained in noisy conditions.

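
The abstract describes dividing the pitch contour into segments to capture dynamic characteristics, but it does not say how each segment is parameterized. A minimal sketch, assuming fixed-length segments summarized by their mean F0 and a least-squares slope, and a simple distance threshold for the accept/reject decision, could look like this. The functions `segment_pitch_features` and `accept` and the threshold are hypothetical illustrations, not the authors' algorithm.

```python
import math

def segment_pitch_features(f0, seg_len):
    """Split an F0 contour (Hz per frame) into fixed-length segments.

    Returns one (mean, slope) pair per full segment; the slope is a least-squares
    line fit in Hz per frame, a hypothetical stand-in for the segment-wise
    dynamic pitch feature described in the abstract.
    """
    if seg_len < 2:
        raise ValueError("seg_len must be at least 2 to estimate a slope")
    feats = []
    for start in range(0, len(f0) - seg_len + 1, seg_len):
        seg = f0[start:start + seg_len]
        t_mean = (seg_len - 1) / 2
        y_mean = sum(seg) / seg_len
        num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(seg))
        den = sum((t - t_mean) ** 2 for t in range(seg_len))
        feats.append((y_mean, num / den))
    return feats

def accept(test_feats, ref_feats, threshold):
    """Hypothetical accept/reject rule: Euclidean distance to a stored reference pattern."""
    dist = math.sqrt(sum((a - b) ** 2
                         for t, r in zip(test_feats, ref_feats)
                         for a, b in zip(t, r)))
    return dist <= threshold

ref = segment_pitch_features([100, 102, 104, 106, 120, 119, 118, 117], 4)
print(ref)  # [(103.0, 2.0), (118.5, -1.0)]
```

Note that `zip` silently truncates to the shorter pattern; a real system would first align test and reference patterns of different lengths.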

The Proposal of the Fuzzed Lyapunov Dimension at Speech Signal (음성에 대한 퍼지-리아프노프 차원의 제안)

  • In, Joon-Hawn; Yoo, Byong-Wook; Ryu, Seok-Han; Jung, Myong-Jin; Kim, Chang-Seok
    • Journal of the Korean Institute of Telematics and Electronics T, v.36T no.4, pp. 30-37, 1999
  • This study proposes the Fuzzy Lyapunov dimension, which evaluates the quantitative variation of an attractor, and uses it for speaker recognition. The proposed Fuzzy Lyapunov dimension is shown to discriminate well between standard reference pattern attractors, and, with respect to the test pattern attractor, it is verified to be a speaker recognition parameter that absorbs pattern variation. To evaluate the Fuzzy Lyapunov dimension as a speaker recognition parameter, misrecognition caused by discrimination errors was estimated for each speaker and standard reference pattern, and the validity of the parameter was verified experimentally. The speaker recognition experiment yielded a recognition rate of 97.0%, confirming that the Fuzzy Lyapunov dimension is suitable as a speaker recognition parameter.

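
The abstract does not define the fuzzy extension itself. For background only, the classical (non-fuzzy) Lyapunov dimension, also called the Kaplan-Yorke dimension, is computed from Lyapunov exponents sorted as λ_1 ≥ λ_2 ≥ ... by D = j + (λ_1 + ... + λ_j) / |λ_{j+1}|, where j is the largest index whose partial sum is non-negative. The sketch below implements only this standard formula; estimating the exponents from a speech attractor and any fuzzy weighting are outside its scope.

```python
def kaplan_yorke_dimension(exponents):
    """Classical Lyapunov (Kaplan-Yorke) dimension from a list of Lyapunov exponents."""
    lam = sorted(exponents, reverse=True)
    partial = 0.0
    for j, value in enumerate(lam):
        if partial + value < 0:
            # The first j exponents have a non-negative sum; interpolate into exponent j+1.
            return j + partial / abs(value)
        partial += value
    # The partial sums never turn negative: the dimension saturates at the number of exponents.
    return float(len(lam))

# Commonly quoted exponents of the Lorenz attractor give a dimension of about 2.06.
print(round(kaplan_yorke_dimension([0.906, 0.0, -14.572]), 3))
```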

An adaptive time-delay recurrent neural network for temporal learning and prediction (시계열패턴의 학습과 예측을 위한 적응 시간지연 회귀 신경회로망)

  • 김성식
    • The Journal of Korean Institute of Communications and Information Sciences, v.21 no.2, pp. 534-540, 1996
  • This paper presents an Adaptive Time-Delay Recurrent Neural Network (ATRN) for learning and recognizing the temporal correlations of temporal patterns. The ATRN employs adaptive time-delays and recurrent connections, both inspired by neurobiology. The adaptive time-delays let the ATRN choose optimal time-delay values for the temporal location of important information in the input patterns, and the recurrent connections enable the network to encode and integrate temporal information of sequences with arbitrary interval times and arbitrary lengths of temporal context. The ATRN described in this paper, the ATNN proposed by Lin, and the TDNN introduced by Waibel were simulated and applied to chaotic time-series prediction of the Mackey-Glass delay-differential equation. The simulation results show that the normalized mean square error (NMSE) of the ATRN is 0.0026, while the NMSE values of the ATNN and TDNN are 0.014 and 0.0117, respectively, and that in temporal learning, employing recurrent links in the network is more effective than putting multiple time-delays into the neurons. The best performance is attained by the ATRN. The ATRN should be well suited to temporally continuous domains such as speech recognition, moving object recognition, motor control, and time-series prediction.

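
The benchmark and error measure named in the abstract can be reproduced in a few lines. This sketch generates the Mackey-Glass series by simple Euler integration, assuming the commonly used parameters (beta = 0.2, gamma = 0.1, n = 10, tau = 17) and a constant initial history, and computes NMSE as the mean squared error divided by the variance of the targets. The ATRN itself is not reproduced; a naive persistence predictor stands in only to show how NMSE is applied, and the paper's exact normalization and integration scheme may differ.

```python
def mackey_glass(n_samples, tau=17.0, beta=0.2, gamma=0.1, n=10, dt=0.1, x0=1.2, step=10):
    """Mackey-Glass series dx/dt = beta*x(t-tau)/(1+x(t-tau)**n) - gamma*x(t), Euler-integrated.

    Every `step`-th integration point is returned, so with dt=0.1 and step=10
    consecutive samples are 1 time unit apart.
    """
    lag = round(tau / dt)
    x = [x0] * (lag + 1)  # constant history on [-tau, 0]
    for _ in range(n_samples * step):
        x_t, x_lag = x[-1], x[-1 - lag]
        x.append(x_t + dt * (beta * x_lag / (1.0 + x_lag ** n) - gamma * x_t))
    return x[lag + step::step]

def nmse(targets, predictions):
    """Mean squared error normalized by the variance of the targets."""
    m = sum(targets) / len(targets)
    var = sum((t - m) ** 2 for t in targets) / len(targets)
    mse = sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)
    return mse / var

series = mackey_glass(1000)
# Persistence baseline: predict x(t+1) = x(t).
print(nmse(series[1:], series[:-1]))
```

An NMSE of 1.0 corresponds to always predicting the target mean, which puts the reported values of 0.0026 to 0.014 in perspective.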

VR-simulated Sailor Training Platform for Emergency (긴급상황에 대한 가상현실 선원 훈련 플랫폼)

  • Park, Chur-Woong; Jung, Jinki; Yang, Hyun-Seung
    • Proceedings of the Korean Institute of Navigation and Port Research Conference, 2015.10a, pp. 175-178, 2015
  • This paper presents a VR-simulated sailor training platform for emergencies, aimed at preventing the human errors that cause 60-80% of domestic and overseas marine accidents. Using virtual reality technology, the proposed platform provides an interaction method for practicing emergency procedures and a crowd control method for controlling crowd agents in a virtual ship environment. The interaction method uses speech recognition and gesture recognition to enhance the immersiveness and efficiency of the training. The crowd control method produces natural simulations of crowd agents by applying a behavior model that reflects human social behavior. To examine the efficiency of the proposed platform, a prototype whose virtual training scenario depicts the outbreak of a fire on a ship was implemented as a standalone system.


Pulse-Coded Train and QRS Feature extraction Using Linear Prediction (선형예측법을 이용한 심전도 신호의 부호화와 특징추출)

  • Song, Chul-Gyu; Lee, Byung-Chae; Jeong, Kee-Sam; Lee, Myoung-Ho
    • Proceedings of the KOSOMBE Conference, v.1992 no.05, pp. 175-178, 1992
  • This paper proposes applying linear prediction, a high-performance technique in digital speech processing, to the analysis of digital ECG signals. Several significant properties indicate that an important feature of ECG signals lies in the residual error signal obtained after processing with Durbin's linear prediction algorithm, so the ECG signal classification emphasizes this residual error signal. For each ECG QRS complex, the recognition feature is obtained through a nonlinear transformation that converts each residual error signal into a three-state pulse-code train relative to the original ECG signal. The pulse-code train has the advantage of being easy to implement in digital hardware circuits for automated ECG diagnosis. The algorithm performs feature extraction for arrhythmia detection very well: using this method, our studies indicate that PVC (premature ventricular contraction) detection achieves a sensitivity of at least 90 percent on arrhythmia data.

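
The abstract relies on Durbin's recursion to obtain the prediction residual but does not specify the three-state mapping. A minimal sketch, assuming the predictor form predicted x[n] = a_1 x[n-1] + ... + a_p x[n-p] and a hypothetical symmetric threshold that maps each residual sample to +1, 0, or -1, could look like this; `pulse_code` and its threshold are illustrative only, not the authors' transformation.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a sequence."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Durbin's recursion: predictor coefficients a_1..a_p, reflection coefficients, final error."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    reflection = []
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        reflection.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # prediction error power shrinks at each order
    return a[1:], reflection, err

def residual(x, coeffs):
    """Prediction error e[n] = x[n] - sum_k a_k x[n-k], for n >= p."""
    p = len(coeffs)
    return [x[n] - sum(coeffs[k] * x[n - 1 - k] for k in range(p)) for n in range(p, len(x))]

def pulse_code(e, threshold):
    """Hypothetical three-state mapping of residual samples to +1 / 0 / -1."""
    return [1 if v > threshold else -1 if v < -threshold else 0 for v in e]

# An AR(1)-shaped autocorrelation: the second coefficient comes out as zero.
coeffs, refl, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, refl, err)
```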

An Enhancement of Learning Speed of the Error - Backpropagation Algorithm (오류 역전도 알고리즘의 학습속도 향상기법)

  • Shim, Bum-Sik; Jung, Eui-Yong; Yoon, Chung-Hwa; Kang, Kyung-Sik
    • The Transactions of the Korea Information Processing Society, v.4 no.7, pp. 1759-1769, 1997
  • The Error BackPropagation (EBP) algorithm for multi-layered neural networks is widely used in areas such as associative memory, speech recognition, pattern recognition, and robotics. Nevertheless, many researchers have continued to publish improvements over the original EBP algorithm, mainly because EBP is exceedingly slow when the number of neurons and the size of the training set are large. In this study, we developed new learning-speed acceleration methods using a variable learning rate, a variable momentum rate, and a variable slope for the sigmoid function. These parameters are adjusted continuously during learning according to the total error of the network, and these methods were shown to reduce learning time significantly compared with the original EBP. To show the efficiency of the proposed methods, we first used binary data generated by a random number generator and showed vast improvements in terms of epochs. We also applied our methods to the binary-valued Monk's data, the 4-, 5-, 6-, and 7-bit parity checkers, and the real-valued Iris data, which are well-known benchmark training sets for machine learning.


Speech Recognition on Korean Monosyllable using Phoneme Discriminant Filters (음소판별필터를 이용한 한국어 단음절 음성인식)

  • Hur, Sung-Phil; Chung, Hyun-Yeol; Kim, Kyung-Tae
    • The Journal of the Acoustical Society of Korea, v.14 no.1, pp. 31-39, 1995
  • In this paper, we construct phoneme discriminant filters (PDF) based on the linear discriminant function. These discriminant filters do not follow heuristic rules from experts; instead, they are obtained mathematically through iterative learning. The proposed system is based on a piecewise linear classifier and an error-correction learning method. Speech segmentation and phoneme classification are carried out simultaneously by the PDF. Because each filter operates independently, some speech intervals may have multiple outputs, so we introduce unified coefficients through an output unification process. However, the output sometimes has a region with no response (an insensitive region), so we propose time windows and median filters to remove such problems. We trained the system on 549 monosyllables, each uttered 3 times by 3 male speakers. After the endpoints of the speech signal are detected using a threshold value and the zero-crossing rate, vowels and consonants are separated by the PDF, and the selected phoneme then passes through the following PDF. Finally, the system unifies the outputs in competitive or insensitive regions using a time window and a median filter.

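
Two preprocessing steps named in the abstract, endpoint detection from an energy threshold and the zero-crossing rate, and median filtering to clean up frame-level decisions, can be sketched as follows. The frame sizes, thresholds, and the rule combining energy with zero-crossing rate are hypothetical choices for illustration; the phoneme discriminant filters themselves are not shown.

```python
from statistics import median

def frames(x, frame_len, hop):
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def energy(frame):
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    return sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:])) / (len(frame) - 1)

def median_filter(values, width=3):
    """Odd-width median filter with edge replication; removes isolated label flips."""
    half = width // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(values))]

def speech_frames(x, frame_len=100, hop=100, energy_thr=0.01, zcr_thr=0.5):
    """Label each frame 1 (speech) or 0 (silence), then smooth the labels.

    Hypothetical rule: a frame is speech if its energy exceeds energy_thr, or if it
    has a high zero-crossing rate (as weak fricatives do) and at least a tenth of that energy.
    """
    labels = []
    for f in frames(x, frame_len, hop):
        e, z = energy(f), zero_crossing_rate(f)
        labels.append(1 if e > energy_thr or (z > zcr_thr and e > energy_thr / 10) else 0)
    return median_filter(labels)
```

For example, `median_filter([0, 1, 0, 0, 1, 1, 1, 0, 1])` removes the isolated spike at index 1 and the dropout at index 7.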

Effective Combination of Temporal Information and Linear Transformation of Feature Vector in Speaker Verification (화자확인에서 특징벡터의 순시 정보와 선형 변환의 효과적인 적용)

  • Seo, Chang-Woo; Zhao, Mei-Hua; Lim, Young-Hwan; Jeon, Sung-Chae
    • Phonetics and Speech Sciences, v.1 no.4, pp. 127-132, 2009
  • The feature vectors used in conventional speaker recognition (SR) systems may be strongly correlated with their neighbors. To improve SR performance, many researchers have adopted linear transformation methods such as principal component analysis (PCA). In general, the linear transformation is applied to the concatenation of the static features and their dynamic features. However, because of the high order of the concatenated features, a linear transformation based on both the static and dynamic features is more complex than one based on the static features alone. To overcome this problem, we propose an efficient method that combines linear transformation with the temporal information of the features to reduce complexity and improve performance in speaker verification (SV). The proposed method first performs a linear transformation using PCA coefficients, and the delta parameters for temporal information are then obtained from the transformed features. The proposed method requires a covariance matrix only 1/4 the size of that needed when the static and dynamic features are concatenated for computing PCA coefficients. In addition, the delta parameters are extracted from the linearly transformed features after the dimension of the static features has been reduced. Compared with PCA and conventional methods in terms of equal error rate (EER) in SV, the proposed method shows better performance while requiring less storage and lower complexity.

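
The ordering described in the abstract (project the static features first, then compute deltas from the projected features) and the 1/4 covariance-size figure can be illustrated as follows. With static dimension d, concatenating static and delta features before PCA needs a 2d x 2d covariance matrix (4d^2 entries), whereas PCA on the static features alone needs a d x d matrix (d^2 entries). This sketch assumes the PCA basis and mean have already been estimated elsewhere and uses the common regression formula for deltas with a window of N = 2; the helper names and the example dimension are hypothetical.

```python
def project(features, basis, mean):
    """Project mean-removed static frames onto precomputed principal axes (assumed given)."""
    return [[sum(b * (x - m) for b, x, m in zip(axis, frame, mean)) for axis in basis]
            for frame in features]

def deltas(features, N=2):
    """Regression deltas: d_t = sum_n n*(c[t+n] - c[t-n]) / (2*sum_n n^2), edges replicated."""
    T = len(features)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return [[sum(n * (features[min(t + n, T - 1)][d] - features[max(t - n, 0)][d])
                 for n in range(1, N + 1)) / denom
             for d in range(len(features[0]))]
            for t in range(T)]

def proposed_features(static, basis, mean):
    """Hypothetical pipeline: PCA on static features first, then append deltas of the result."""
    reduced = project(static, basis, mean)
    return [s + d for s, d in zip(reduced, deltas(reduced))]

def covariance_entries(dim):
    return dim * dim

d = 13  # example static dimension (an assumption)
print(covariance_entries(d) / covariance_entries(2 * d))  # 0.25
```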

Continuous Speech Recognition based on Parametric Trajectory Segmental HMM (모수적 궤적 기반의 분절 HMM을 이용한 연속 음성 인식)

  • 윤영선; 오영환
    • The Journal of the Acoustical Society of Korea, v.19 no.3, pp. 35-44, 2000
  • In this paper, we propose a new trajectory model for characterizing segmental features and their interactions, based on the general framework of hidden Markov models. Each segment, a sequence of vectors, is represented by a trajectory of the observed sequence. This trajectory is obtained by applying a new design matrix that includes transitional information on contiguous frames, and it is characterized as a polynomial regression function. To apply the trajectory to the segmental HMM, the frame features are replaced with the trajectory of a given segment. We also propose the likelihood of a given segment and an estimation method for the trajectory parameters. The observation probability of a given segment is expressed through the relation between the segment likelihood and the estimation error of the trajectories. The estimation error of a trajectory is treated as the weight of the likelihood of a given segment in a state; this weight represents the probability of how well the corresponding trajectory characterizes the segment. The proposed model can be regarded as a generalization of both the conventional HMM and the parametric trajectory model. Experimental results on the TIMIT corpus show that performance improves significantly over the conventional HMM.

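
The polynomial-regression trajectory described in the abstract can be sketched for a single segment: time is normalized to [0, 1] across the segment, a design matrix Z with rows [1, t, t^2] is built, the coefficients are found by least squares (solving Z^T Z b = Z^T y for each feature dimension), and the mean squared residual serves as the trajectory's estimation error. The quadratic order, the time normalization, and the helper names are assumptions; the paper's own design matrix also encodes transitional information across contiguous frames, which is not modeled here.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense linear system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_trajectory(segment, order=2):
    """Fit a polynomial trajectory to a segment (list of frames, each a list of features).

    Requires more frames than the polynomial order. Returns per-dimension coefficients
    [b_0, ..., b_order] and the mean squared residual (the estimation error).
    """
    L = len(segment)
    times = [i / (L - 1) for i in range(L)]
    Z = [[t ** p for p in range(order + 1)] for t in times]
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(order + 1)] for i in range(order + 1)]
    dims = len(segment[0])
    coeffs = []
    for d in range(dims):
        Zty = [sum(z[i] * frame[d] for z, frame in zip(Z, segment)) for i in range(order + 1)]
        coeffs.append(solve(ZtZ, Zty))
    sq = 0.0
    for z, frame in zip(Z, segment):
        for d in range(dims):
            pred = sum(c * zi for c, zi in zip(coeffs[d], z))
            sq += (frame[d] - pred) ** 2
    return coeffs, sq / (L * dims)
```

A segment that follows an exact quadratic, such as 1 + 2t + 3t^2 sampled at five frames, is recovered with coefficients close to [1, 2, 3] and a residual near zero.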

Speech Recognition Using Formant Bandwidth Normalization (포만트 밴드폭 정규화를 이용한 음성인식)

  • 홍종진; 강석건; 박군작; 박규태
    • The Journal of Korean Institute of Communications and Information Sciences, v.16 no.5, pp. 458-467, 1991
  • In this paper, the cause of linear prediction error is analyzed, and a theoretical basis for normalizing the formant bandwidth to 0 is given and its validity verified. The formants and bandwidths corresponding to the pole positions of the AR filter are measured to analyze the relation between pole position and formant bandwidth. By changing the glottis reflection coefficient to 1, the effect of the glottis is eliminated, and new linear prediction coefficients are obtained by normalizing the formant bandwidth of the signal to 0. Since these coefficients are symmetrical, their standard deviation is larger than that of coefficients computed with a fixed glottis reflection coefficient. The bit rate for speech coding can be reduced by a factor of 2 without any loss of information. In computer simulations, the proposed algorithm achieved a recognition rate of 96.7% in recognizing 5 Korean vowels in a noisy environment.

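
The effect of setting the last (glottis) reflection coefficient to 1 can be checked with the standard step-up recursion, which builds A(z) = 1 + a_1 z^-1 + ... + a_p z^-p from reflection coefficients k_1..k_p via a_j <- a_j + k_m a_{m-j} and a_m <- k_m. With k_p = 1 the coefficients become symmetric, the poles of 1/A(z) lie on the unit circle, and the usual bandwidth estimate B = -(f_s / pi) ln|z| of each pole becomes 0, consistent with the abstract's normalization of the formant bandwidth to 0. The sign convention (texts differ on the sign of k), the second-order example, and the 8 kHz sampling rate are assumptions for illustration; higher orders would need a general polynomial root finder.

```python
import cmath
import math

def step_up(reflection):
    """Reflection coefficients k_1..k_p -> [1, a_1, ..., a_p] of A(z) = 1 + sum_j a_j z^-j."""
    a = [1.0]
    for k in reflection:
        m = len(a)  # model order after this step
        prev = a + [0.0]
        a = [prev[j] + k * prev[m - j] for j in range(m + 1)]
    return a

def quadratic_poles(a):
    """Roots of z^2 + a_1 z + a_2, i.e. the poles of 1/A(z) for a second-order model."""
    _, a1, a2 = a
    disc = cmath.sqrt(a1 * a1 - 4 * a2)
    return [(-a1 + disc) / 2, (-a1 - disc) / 2]

def formant(pole, fs):
    """Frequency (Hz) and approximate 3 dB bandwidth (Hz) of a pole."""
    freq = abs(cmath.phase(pole)) * fs / (2 * math.pi)
    bandwidth = -fs / math.pi * math.log(abs(pole))
    return freq, bandwidth

fs = 8000.0  # assumed sampling rate
for k_glottis in (0.9, 1.0):
    a = step_up([0.3, k_glottis])
    print(k_glottis, a, [formant(p, fs) for p in quadratic_poles(a)])
```

With k_2 = 0.9 the poles sit inside the unit circle and have a positive bandwidth; with k_2 = 1.0 the coefficients are [1.0, 0.6, 1.0] and the bandwidth collapses to 0.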