• Title/Summary/Keyword: MFCC (Mel Frequency Cepstral Coefficient)

Search results: 54

Isolated-Word Speech Recognition in Telephone Environment Using Perceptual Auditory Characteristic (인지적 청각 특성을 이용한 고립 단어 전화 음성 인식)

  • Choi, Hyung-Ki;Park, Ki-Young;Kim, Chong-Kyo
    • Journal of the Institute of Electronics Engineers of Korea TE, v.39 no.2, pp.60-65, 2002
  • In this paper, we propose the GFCC (gammatone filter frequency cepstrum coefficient) parameter, which is based on auditory characteristics, to achieve a better speech recognition rate. Speech recognition experiments are performed on isolated words acquired over a telephone network. To compare the GFCC parameter with other parameters, recognition experiments are also carried out using the MFCC and LPCC parameters. For each parameter, CMS (cepstral mean subtraction) is either applied or omitted in order to compensate for channel distortion in the telephone network. The experimental results show that the recognition rate obtained with the GFCC parameter is better than with the other parameters.
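The CMS step mentioned above is a standard channel-compensation technique: a stationary channel adds a constant offset to every frame in the cepstral domain, so subtracting each coefficient's per-utterance mean removes it. A minimal NumPy sketch, assuming cepstra arrive as a (frames × coefficients) array:

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Remove the per-utterance mean of each cepstral dimension.

    cepstra: array of shape (num_frames, num_coeffs), e.g. MFCC or GFCC
    frames. A stationary channel adds a constant offset to every frame
    in the cepstral domain, so subtracting the time average compensates it.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# A constant per-coefficient offset (a simulated channel) disappears:
clean = np.random.default_rng(0).normal(size=(100, 13))
channel = np.linspace(1.0, 2.0, 13)   # fake convolutional distortion
distorted = clean + channel
```

After CMS, the distorted and clean cepstra coincide, which is exactly why the technique helps across different telephone channels.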

Sound Model Generation using Most Frequent Model Search for Recognizing Animal Vocalization (최대 빈도모델 탐색을 이용한 동물소리 인식용 소리모델생성)

  • Ko, Youjung;Kim, Yoonjoong
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology, v.10 no.1, pp.85-94, 2017
  • In this paper, I propose a sound-model generation algorithm and a most-frequent-model search algorithm for recognizing animal vocalizations. The sound-model generation algorithm produces an optimal set of models by repeating the training process, the Viterbi search process, and the most-frequent-model search process while adjusting the HMM (Hidden Markov Model) structure to improve the global recognition rate. The most-frequent-model search algorithm scans the list of models produced by the Viterbi search algorithm for the most frequent model and makes it the final decision of the recognition process. The system is implemented using MFCC (Mel Frequency Cepstral Coefficient) for the sound features, HMMs for the models, and the C# programming language. To evaluate the algorithm, a set of animal sounds for 27 species was prepared; the experiment showed that the sound-model generation algorithm generates 27 HMM models with a recognition rate of 97.29 percent.
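The most-frequent-model search described above amounts to a majority vote over per-segment Viterbi decisions. A minimal sketch of that final-decision rule (the animal labels here are made up for illustration; the paper's implementation is in C#):

```python
from collections import Counter

def most_frequent_model(viterbi_labels):
    """Return the model label that appears most often among the
    per-segment Viterbi decisions; ties break toward the label seen
    first, mirroring a simple final-decision rule."""
    if not viterbi_labels:
        raise ValueError("no Viterbi decisions to vote over")
    # most_common sorts by count (stable), so the first-seen label
    # wins among ties
    return Counter(viterbi_labels).most_common(1)[0][0]

# e.g. per-segment decisions for one recording (hypothetical labels)
decisions = ["cuckoo", "sparrow", "cuckoo", "owl", "cuckoo"]
```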

Feature-Vector Normalization for SVM-based Music Genre Classification (SVM에 기반한 음악 장르 분류를 위한 특징벡터 정규화 방법)

  • Lim, Shin-Cheol;Jang, Sei-Jin;Lee, Seok-Pil;Kim, Moo-Young
    • Journal of the Institute of Electronics Engineers of Korea SC, v.48 no.5, pp.31-36, 2011
  • In this paper, Mel-Frequency Cepstral Coefficient (MFCC), Decorrelated Filter Bank (DFB), Octave-based Spectral Contrast (OSC), Zero-Crossing Rate (ZCR), and Spectral Contrast/Roll-Off features are combined into a set of multiple feature vectors for a music genre classification system based on the Support Vector Machine (SVM) classifier. In the conventional system, feature vectors over the entire set of genre classes are normalized for SVM model training and classification. In this paper, however, only the feature vectors of the classes compared by each One-Against-One (OAO) SVM classifier are used for normalization. Using OSC as a single feature vector and using the multiple feature vectors, we obtain genre classification rates of 60.8% and 77.4%, respectively, with the conventional normalization method. With the proposed normalization method, the classification rates increase by 8.2% and 3.3% for OSC and the multiple feature vectors, respectively.
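One reading of the normalization change above: instead of z-scoring features over all genre classes, statistics are computed only from the two classes that a given one-against-one SVM separates. A minimal NumPy sketch under that assumption (the data and labels are made up; this is not the authors' code):

```python
import numpy as np

def pairwise_zscore(features, labels, class_a, class_b):
    """Z-score normalize features using statistics from only the two
    classes that a one-against-one SVM will separate.

    features: (num_samples, num_dims) array
    labels:   length-num_samples array of class ids
    Returns the selected normalized samples and their labels.
    """
    labels = np.asarray(labels)
    mask = (labels == class_a) | (labels == class_b)
    pair = np.asarray(features, dtype=float)[mask]
    mean = pair.mean(axis=0)
    std = pair.std(axis=0) + 1e-12   # guard against zero variance
    return (pair - mean) / std, labels[mask]

# hypothetical feature vectors for three genre ids 0, 1, 2
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
y = np.repeat([0, 1, 2], 10)
X_pair, y_pair = pairwise_zscore(X, y, 0, 2)
```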

Musical Genre Classification System based on Multiple-Octave Bands (다중 옥타브 밴드 기반 음악 장르 분류 시스템)

  • Byun, Karam;Kim, Moo Young
    • Journal of the Institute of Electronics and Information Engineers, v.50 no.12, pp.238-244, 2013
  • For musical genre classification, various types of feature vectors are utilized. Mel-frequency cepstral coefficient (MFCC), decorrelated filter bank (DFB), and octave-based spectral contrast (OSC) features are widely used as short-term features, and their long-term variations are also utilized. In this paper, OSC features are extracted not only in the single-octave band domain but also in the multiple-octave band domain, to capture the correlation between octave bands. As a baseline, we select the genre classification system that won fourth place in the 2012 music information retrieval evaluation exchange (MIREX) contest. By applying the OSC features based on multiple octave bands, we improve classification accuracy by 0.40% and 3.15% on the GTZAN and Ballroom databases, respectively.

Same music file recognition method by using similarity measurement among music feature data (음악 특징점간의 유사도 측정을 이용한 동일음원 인식 방법)

  • Sung, Bo-Kyung;Chung, Myoung-Beom;Ko, Il-Ju
    • Journal of the Korea Society of Computer and Information, v.13 no.3, pp.99-106, 2008
  • Recently, digital music retrieval has been used in many fields (web portals, audio service sites, etc.). In existing systems, music metadata are used for digital music retrieval. If the metadata are incorrect or missing, it is hard to obtain highly accurate retrieval results. Content-based information retrieval, which uses the music itself, has been researched to solve this problem. In this paper, we propose a same-music recognition method using similarity measurement. Feature data of digital music are extracted from the music waveform using a simplified MFCC (Mel Frequency Cepstral Coefficient). Similarity between digital music files is measured using DTW (Dynamic Time Warping), which is used in the vision and speech recognition fields. To prove the proposed same-music recognition method, we succeeded in all 500 trials over 1,000 songs randomly collected from the same genre; the 500 digital music files were made by mixing different compression codecs and bit rates from 60 digital audio sources. We proved that similarity measurement using DTW can recognize the same music.
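The DTW step above aligns two feature sequences of different lengths by minimizing the accumulated frame-to-frame distance along a warping path. A minimal NumPy sketch of the classic dynamic-programming recurrence (not the authors' implementation), over frame-wise feature vectors such as MFCCs:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two feature sequences.

    a, b: arrays of shape (len_a, dims) and (len_b, dims), e.g. the
    frame-wise MFCC vectors of two music files. Smaller = more similar.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# A time-stretched copy of a sequence still matches with zero cost:
seq = np.array([[0.0], [1.0], [2.0]])
stretched = np.array([[0.0], [0.0], [1.0], [2.0], [2.0]])
```

The warping is what makes DTW robust to the small timing differences introduced by different codecs and bit rates.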


Improvement of Speech Reconstructed from MFCC Using GMM (GMM을 이용한 MFCC로부터 복원된 음성의 개선)

  • Choi, Won-Young;Choi, Mu-Yeol;Kim, Hyung-Soon
    • MALSORI, no.53, pp.129-141, 2005
  • The goal of this research is to improve the quality of reconstructed speech in the Distributed Speech Recognition (DSR) system. For the extended DSR, we estimate the variable Maximum Voiced Frequency (MVF) from the Mel-Frequency Cepstral Coefficients (MFCC) based on a Gaussian Mixture Model (GMM), to implement a realistic harmonic-plus-noise model for the excitation signal. For the standard DSR, we also make the voiced/unvoiced decision from the MFCC based on the GMM, because pitch information is not available in that case. A perceptual test reveals that speech reconstructed by the proposed method is preferred to that of the conventional methods.
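The voiced/unvoiced decision from MFCC can be read as a two-model likelihood test: score each frame under a voiced model and an unvoiced model and pick the larger. A toy sketch with single diagonal-covariance Gaussians standing in for the GMMs (all parameters below are illustrative, not from the paper):

```python
import numpy as np

def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of frames x under a diagonal-covariance Gaussian."""
    x, mean, var = (np.asarray(v, dtype=float) for v in (x, mean, var))
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var,
                         axis=-1)

def voiced_unvoiced(frames, voiced_model, unvoiced_model):
    """Classify each MFCC frame as voiced (True) or unvoiced (False) by
    comparing log-likelihoods under the two (mean, var) models."""
    lv = diag_gauss_loglik(frames, *voiced_model)
    lu = diag_gauss_loglik(frames, *unvoiced_model)
    return lv > lu

# Illustrative parameters only; a real system trains GMMs on labeled data.
voiced_model = (np.zeros(3), np.ones(3))
unvoiced_model = (np.full(3, 4.0), np.ones(3))
frames = np.array([[0.1, -0.2, 0.0],
                   [4.2, 3.9, 4.1]])
flags = voiced_unvoiced(frames, voiced_model, unvoiced_model)
```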


Channel-attentive MFCC for Improved Recognition of Partially Corrupted Speech (부분 손상된 음성의 인식 향상을 위한 채널집중 MFCC 기법)

  • 조훈영;지상문;오영환
    • The Journal of the Acoustical Society of Korea, v.22 no.4, pp.315-322, 2003
  • We propose a channel-attentive Mel frequency cepstral coefficient (CAMFCC) extraction method to improve the recognition performance of speech that is partially corrupted in the frequency domain. The method introduces weighting terms both at the filter-bank analysis step and in the output-probability calculation of the decoding step. The weights are obtained for each frequency channel of the filter bank such that more reliable channels are emphasized by higher weight values. Experimental results on the TIDIGITS database corrupted by various frequency-selective noises indicate that the proposed CAMFCC method makes good use of the uncorrupted speech information, improving recognition performance by 11.2% on average compared with a multi-band speech recognition system.
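The filter-bank weighting step can be sketched as follows: per-channel reliability weights scale the log filter-bank energies before the DCT that produces the cepstral coefficients, so corrupted channels contribute less. A minimal sketch under that reading (the weights are assumed inputs here; the paper derives them from channel reliability):

```python
import numpy as np

def weighted_cepstrum(log_fbank, weights):
    """Apply per-channel weights to log filter-bank energies, then a
    DCT-II to obtain cepstral coefficients.

    log_fbank: (num_frames, num_channels) log mel filter-bank energies
    weights:   length-num_channels reliability weights in [0, 1]
    """
    log_fbank = np.asarray(log_fbank, dtype=float)
    weights = np.asarray(weights, dtype=float)
    emphasized = log_fbank * weights   # de-emphasize unreliable channels
    n = emphasized.shape[1]
    k = np.arange(n)
    # DCT-II basis, as commonly used for MFCC
    basis = np.cos(np.pi * np.outer(k, 2 * np.arange(n) + 1) / (2 * n))
    return emphasized @ basis.T

# A channel with weight 0 cannot affect the output, however corrupted:
rng = np.random.default_rng(2)
log_fbank = rng.normal(size=(5, 8))          # fake log mel energies
clean_w = np.ones(8)
drop_ch3 = clean_w.copy()
drop_ch3[3] = 0.0                            # distrust channel 3
corrupted = log_fbank.copy()
corrupted[:, 3] += 100.0                     # frequency-selective noise
```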

Identification of Underwater Ambient Noise Sources Using MFCC (MFCC를 이용한 수중소음원의 식별)

  • Hwang, Do-Jin;Kim, Jea-Soo
    • Proceedings of the Korea Committee for Ocean Resources and Engineering Conference, 2006.11a, pp.307-310, 2006
  • Underwater ambient noise originating from geophysical, biological, and man-made acoustic sources contains much information on the sources and on the ocean environment, and it affects the performance of sonar equipment. In this paper, a set of feature vectors of the ambient noises based on MFCC is proposed and extracted to form a database for the purpose of identifying the noise sources. The developed pattern recognition algorithm is applied to observed ocean data, and the initial results are presented and discussed.
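The MFCC feature vectors used throughout these papers follow a standard pipeline: power spectrum → mel-spaced triangular filter bank → log → DCT. A compact NumPy sketch of that pipeline for a single frame (filter counts, frame length, and sample rate below are arbitrary choices, not from the paper):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, fft_size, sample_rate):
    """Triangular filters with centers equally spaced on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                          num_filters + 2)
    bins = np.floor((fft_size + 1) * mel_to_hz(mel_pts)
                    / sample_rate).astype(int)
    fbank = np.zeros((num_filters, fft_size // 2 + 1))
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc_frame(frame, sample_rate, num_filters=26, num_ceps=13):
    """MFCCs of one windowed frame: |FFT|^2 -> mel bank -> log -> DCT."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    fbank = mel_filterbank(num_filters, len(frame), sample_rate)
    energies = np.log(fbank @ spectrum + 1e-10)
    basis = np.cos(np.pi * np.outer(np.arange(num_ceps),
                                    2 * np.arange(num_filters) + 1)
                   / (2 * num_filters))
    return basis @ energies

# One windowed frame of noise as a stand-in signal:
rng = np.random.default_rng(3)
frame = rng.normal(size=512) * np.hanning(512)
ceps = mfcc_frame(frame, sample_rate=16000)
```

The zeroth coefficient tracks overall log energy, so a louder signal raises it, while the higher coefficients summarize the spectral envelope shape.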


Speech Emotion Recognition with SVM, KNN and DSVM

  • Hadhami Aouani;Yassine Ben Ayed
    • International Journal of Computer Science & Network Security, v.23 no.8, pp.40-48, 2023
  • Speech emotion recognition has become an active research theme in speech processing and in applications based on human-machine interaction. In this work, our system is a two-stage approach, namely feature extraction and a classification engine. First, two feature sets are investigated: the first extracts only 13 Mel-frequency Cepstral Coefficients (MFCC) from emotional speech samples, and the second fuses the MFCC features with three others: Zero Crossing Rate (ZCR), Teager Energy Operator (TEO), and Harmonic to Noise Rate (HNR). Second, we use two classification techniques, Support Vector Machines (SVM) and the k-Nearest Neighbor (k-NN), to compare their performance. Beyond that, we investigate recent advances in machine learning, including deep kernel learning. A large set of experiments is conducted on the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset for seven emotions. The results of our experiments show good accuracy compared with previous studies.
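Of the fused features above, the zero-crossing rate is the simplest to state: the fraction of adjacent sample pairs whose signs differ, a cheap proxy for how noise-like a frame is. A minimal sketch:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs in the frame that change sign."""
    frame = np.asarray(frame, dtype=float)
    signs = np.signbit(frame)
    return np.mean(signs[1:] != signs[:-1])
```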

Parts-Based Feature Extraction of Spectrum of Speech Signal Using Non-Negative Matrix Factorization

  • Park, Jeong-Won;Kim, Chang-Keun;Lee, Kwang-Seok;Koh, Si-Young;Hur, Kang-In
    • Journal of information and communication convergence engineering, v.1 no.4, pp.209-212, 2003
  • In this paper, we propose a new speech feature parameter obtained through parts-based feature extraction of the speech spectrum using Non-Negative Matrix Factorization (NMF). NMF can effectively reduce the dimensionality of multi-dimensional data through matrix factorization under non-negativity constraints, and the dimensionally reduced data present parts-based features of the input data. For speech feature extraction, we applied Mel-scaled filter-bank outputs as inputs to NMF and then used the NMF outputs as inputs to the speech recognizer. The recognition experiments confirm that the proposed feature parameter is superior in recognition performance to the mel frequency cepstral coefficient (MFCC) that is generally used.
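The NMF step above factors a non-negative spectral matrix V ≈ W·H, where the columns of W act as parts-based basis spectra and H holds their activations. A minimal sketch using the classic Lee-Seung multiplicative updates for the Euclidean objective (a standard NMF variant, not necessarily the authors' exact configuration):

```python
import numpy as np

def nmf(V, rank, num_iters=200, seed=0):
    """Factor non-negative V (features x frames) into W (features x rank)
    and H (rank x frames) with multiplicative updates, which keep both
    factors non-negative at every step."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    for _ in range(num_iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-10)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-10)
    return W, H

# An exactly rank-2 non-negative matrix is recovered almost perfectly:
rng = np.random.default_rng(4)
V = rng.random((8, 2)) @ rng.random((2, 20))
W, H = nmf(V, rank=2, num_iters=500)
```

In the paper's setting, V would hold Mel-scaled filter-bank outputs per frame, and the low-rank activations H would serve as the reduced feature vectors fed to the recognizer.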