Search | Korea Science

Speaker Recognition using LPC cepstrum Coefficients and Neural Network (LPC 켑스트럼 계수와 신경회로망을 사용한 화자인식)

Choi, Jae-Seung
- Journal of the Korea Institute of Information and Communication Engineering / v.15 no.12 / pp.2521-2526 / 2011
This paper proposes a speaker recognition algorithm that uses a perceptron neural network and LPC (Linear Predictive Coding) cepstrum coefficients. The algorithm first detects the voiced sections in each frame. It then applies linear predictive analysis to the detected voiced sections to obtain the LPC cepstrum coefficients, which carry speaker characteristics. A neural network is trained on these coefficients to classify them. In the experiments, the performance of the proposed algorithm was evaluated by the recognition rates obtained with the LPC cepstrum coefficients and the neural network.
https://doi.org/10.6109/jkiice.2011.15.12.2521
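The feature-extraction step described in this abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes the autocorrelation method with the Levinson-Durbin recursion and the standard LPC-to-cepstrum recursion, and the function names, the order of 12, and the number of cepstral coefficients are all hypothetical choices. Voiced-section detection and the neural network are omitted.

```python
def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Convention: x[n] is predicted as sum_k a[k] * x[n - k], k = 1..order.
    Returns (a, err); a[0] is unused and set to 0.0.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # frame is (numerically) perfectly predictable
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err           # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients a[1..p] to cepstrum c[1..n_ceps]."""
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


def lpc_cepstrum(frame, order=12, n_ceps=12):
    """LPC cepstrum of one (already voiced) frame; order is an assumed value."""
    r = autocorrelation(frame, order)
    if r[0] == 0.0:             # silent frame: nothing to analyse
        return [0.0] * n_ceps
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)
```

In a pipeline like the one the abstract outlines, each voiced frame would be passed to `lpc_cepstrum` and the resulting vector used as a classifier input.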

An SNR Scalable Video Coding using Linearly Combined Motion Vectors

Ryu, Chang-Hoon;Byoungjun Han;Park, Kwang-Pyo;Yoon, Eung-Sik;Lee, Keun-Young
- Proceedings of the IEEK Conference / 2002.07a / pp.50-53 / 2002
There is a growing need to deliver multimedia streams over heterogeneous networks. Given the range of network environments and user equipment, video streaming must be scalable. There are several kinds of scalable video coding: spatial, temporal, SNR, and hybrid. In SNR scalable coding, the layers share the same temporal and spatial resolution but differ in SNR quality. A 1-layer SNR scalable encoder produces SNR scalable video streams with ease, but it suffers from a drift problem. The modified 1-layer approach does not have this problem, but it is less efficient in coding and is not MPEG-compliant. The existing MPEG-compliant 2-layer encoder was introduced to reduce the coding rate, but it still uses only the base layer to encode all layers. In this paper, we propose an adaptive MPEG-compliant 2-layer encoder. Using a linear combination algorithm, the encoder uses one motion vector to encode the sequences efficiently. By doing this, we can achieve coding efficiency in SNR scalable coding.

Network Coding-Based Fault Diagnosis Protocol for Dynamic Networks

Jarrah, Hazim;Chong, Peter Han Joo;Sarkar, Nurul I.;Gutierrez, Jairo
- KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.4 / pp.1479-1501 / 2020
Dependable functioning of dynamic networks is essential for delivering ubiquitous services. Faults are the root causes of network outages. The comparison diagnosis model, which automates fault identification, is one of the leading approaches to attaining network dependability. Most of the existing research has focused on stationary networks. Nonetheless, the time-free comparison model imposes no time constraints on the system under consideration, and it suits most of the diagnosis requirements of dynamic networks. This paper presents a novel protocol that diagnoses faulty nodes in diagnosable dynamic networks. The proposed protocol comprises two stages: a testing stage, which uses the time-free comparison model to diagnose faulty neighbour nodes, and a disseminating stage, which leverages a Random Linear Network Coding (RLNC) technique to disseminate the partial views of nodes. We analysed and evaluated the performance of the proposed protocol under various scenarios, considering two metrics: communication overhead and diagnosis time. The simulation results revealed that the proposed protocol diagnoses different types of faults in dynamic networks. Compared with most related protocols, our proposed protocol has very low communication overhead and diagnosis time. These results demonstrate that the proposed protocol is energy-efficient, scalable, and robust.
https://doi.org/10.3837/tiis.2020.04.005
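To illustrate the RLNC idea that the disseminating stage relies on, here is a minimal sketch of encoding and decoding over GF(2). The abstract does not state the field size or the packet layout, so GF(2), the integer bit-vector representation, and the function names are all assumptions; each source packet stands in for one node's partial view.

```python
import random


def rlnc_encode(packets, rng):
    """Return (coeffs, payload): one random GF(2) combination of the packets.

    Bit i of coeffs says whether source packet i was XORed into payload.
    """
    n = len(packets)
    coeffs = 0
    while coeffs == 0:                 # skip the useless all-zero combination
        coeffs = rng.getrandbits(n)
    payload = 0
    for i in range(n):
        if (coeffs >> i) & 1:
            payload ^= packets[i]
    return coeffs, payload


def rlnc_decode(coded, n):
    """Recover n source packets by Gaussian elimination over GF(2).

    Returns the packets in order, or None if the received combinations
    do not yet have full rank.
    """
    pivots = {}                        # pivot bit -> (coeffs, payload)
    for coeffs, payload in coded:
        # Clear every known pivot bit from the incoming row.
        for bit, (pc, pp) in pivots.items():
            if (coeffs >> bit) & 1:
                coeffs ^= pc
                payload ^= pp
        if coeffs == 0:                # linearly dependent: no new information
            continue
        bit = coeffs.bit_length() - 1  # new pivot position
        # Clear the new pivot bit from the rows stored so far.
        for b, (pc, pp) in list(pivots.items()):
            if (pc >> bit) & 1:
                pivots[b] = (pc ^ coeffs, pp ^ payload)
        pivots[bit] = (coeffs, payload)
    if len(pivots) < n:
        return None
    return [pivots[i][1] for i in range(n)]
```

A receiver keeps collecting coded packets and calls `rlnc_decode` until it returns a list; any full-rank set of combinations is enough, regardless of which ones arrived.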

Artificial speech bandwidth extension technique based on opus codec using deep belief network (심층 신뢰 신경망을 이용한 오푸스 코덱 기반 인공 음성 대역 확장 기술)

Choi, Yoonsang;Li, Yaxing;Kang, Sangwon
- The Journal of the Acoustical Society of Korea / v.36 no.1 / pp.70-77 / 2017
Bandwidth extension is a technique that improves speech quality, intelligibility, and naturalness by extending 300 to 3,400 Hz narrowband speech to 50 to 7,000 Hz wideband speech. In this paper, an Artificial Bandwidth Extension (ABE) module embedded in the Opus audio decoder is designed using the information of the narrowband speech, in order to reduce the computational complexity of LPC (Linear Prediction Coding) and LSF (Line Spectral Frequencies) analysis and the algorithmic delay of the ABE module. We propose a spectral envelope extension method using a DBN (Deep Belief Network), a deep learning technique, and the proposed scheme produces a better extended spectrum than the traditional codebook mapping method.
https://doi.org/10.7776/ASK.2017.36.1.070

Speaker Verification Using Hidden LMS Adaptive Filtering Algorithm and Competitive Learning Neural Network (Hidden LMS 적응 필터링 알고리즘을 이용한 경쟁학습 화자검증)

Cho, Seong-Won;Kim, Jae-Min
- The Transactions of the Korean Institute of Electrical Engineers D / v.51 no.2 / pp.69-77 / 2002
Speaker verification can be classified into two categories: text-dependent speaker verification and text-independent speaker verification. In this paper, we discuss text-dependent speaker verification. A text-dependent speaker verification system determines whether the voice characteristics of the speaker match those of a specific person. We acquire the speaker data with a sound card under various noisy conditions, apply a new Hidden LMS (Least Mean Square) adaptive algorithm to it, and extract LPC (Linear Predictive Coding) cepstrum coefficients as feature vectors. Finally, we use a competitive learning neural network for speaker verification. The proposed hidden LMS adaptive filter, which uses a neural network, reduces noise and enhances features under various noisy conditions. We construct a separate neural network for each speaker, so the whole network does not need to be retrained for a newly added speaker and the system is easy to expand. We show experimentally that the proposed method improves speaker verification performance.
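For orientation, the conventional LMS adaptive filter that the Hidden LMS algorithm builds on can be sketched as below. This is the standard LMS update only; the abstract does not describe the hidden-layer variant, so nothing here should be read as the authors' method. The function name, tap count, and step size are illustrative assumptions.

```python
def lms_filter(x, d, n_taps, mu):
    """Standard LMS adaptive FIR filter.

    x: input samples, d: desired samples, mu: step size.
    Returns (weights, errors), where errors[n] = d[n] - y[n].
    """
    w = [0.0] * n_taps
    errors = []
    for n in range(len(x)):
        # Most recent n_taps inputs, newest first, zero-padded at the start.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))    # filter output
        e = d[n] - y                                # instantaneous error
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]  # LMS weight update
        errors.append(e)
    return w, errors
```

The step size `mu` trades convergence speed against stability; it has to stay well below 2 divided by the total input power across the taps.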

Speech Recognition of Multi-Syllable Words Using Soft Computing Techniques (소프트컴퓨팅 기법을 이용한 다음절 단어의 음성인식)

Lee, Jong-Soo;Yoon, Ji-Won
- Transactions of the Society of Information Storage Systems / v.6 no.1 / pp.18-24 / 2010
The performance of speech recognition depends mainly on uncertain factors such as the speaker's condition and environmental effects. The present study deals with the recognition of a number of multi-syllable isolated Korean words using soft computing techniques such as a back-propagation neural network, a fuzzy inference system, and a fuzzy neural network. Feature patterns for speech recognition are extracted as 12th-order coefficients over thirty normalized frames, using linear predictive coding and cepstrum analysis. Using four speech recognizer models, experiments are conducted for both single-speaker and multiple-speaker cases. In this study, the recognizer that combines fuzzy logic with a back-propagation neural network and the fuzzy neural network recognizer show better recognition performance.

Electroencephalogram-based Driver Drowsiness Detection System Using AR Coefficients and SVM (AR계수와 SVM을 이용한 뇌파 기반 운전자의 졸음 감지 시스템)

Han, Hyungseob;Chong, Uipil
- Journal of the Korean Institute of Intelligent Systems / v.22 no.6 / pp.768-773 / 2012
One of the main causes of serious road accidents is driving while drowsy. For this reason, drowsiness detection and warning systems for drivers have recently become a very important issue. Monitoring physiological signals makes it possible to detect features of driver drowsiness and fatigue. Among the effective signals to measure are electroencephalogram (EEG) and electrooculogram (EOG) signals. The aim of this study is to extract drowsiness-related features from a set of EEG signals and to classify the features into three states: alertness, drowsiness, and sleepiness. This paper proposes a drowsiness detection system using Linear Predictive Coding (LPC) coefficients and a Support Vector Machine (SVM). Samples of EEG data from each predefined state were used to train the SVM program with the proposed feature extraction algorithms. The trained SVM program was tested on unclassified EEG data, and the results were then reviewed against manual classification. The classification rate of the proposed system is over 96.5% with only a very small number of samples (250 ms, 64 samples). Therefore, it can be applied to real driving incidents that occur within a split second.
https://doi.org/10.5391/JKIIS.2012.22.6.768

Action Recognition with deep network features and dimension reduction

Li, Lijun;Dai, Shuling
- KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.832-854 / 2019
Action recognition has been studied in the computer vision field for years. We present an effective approach to recognizing actions using a dimension reduction method, which is applied as a crucial step to reduce the dimensionality of the feature descriptors after feature extraction. We use a sparse matrix and a randomized kd-tree to modify Local Fisher Discriminant Analysis, yielding a modified method (mLFDA) that greatly reduces the required memory and accelerates the standard Local Fisher Discriminant Analysis. For feature encoding, we propose an encoding method called mix encoding, which combines Fisher vector encoding and locality-constrained linear coding to obtain the final video representations. To add more meaningful features to the action recognition process, a convolutional neural network is combined with mix encoding to produce the deep network feature. Experimental results show that our algorithm is competitive on the KTH, HMDB51, and UCF101 datasets when all these methods are combined.
https://doi.org/10.3837/tiis.2019.02.019

Performance Evaluation of Wavelet-based ECG Compression Algorithms over CDMA Networks (CDMA 네트워크에서의 ECG 압축 알고리즘의 성능 평가)

김병수;유선국
- The Transactions of the Korean Institute of Electrical Engineers D / v.53 no.9 / pp.663-669 / 2004
Mobile tele-cardiology is a new research area that supports ubiquitous health care based on mobile telecommunication networks. Although many studies present modeling concepts for a GSM-based mobile telemedical system, practical applications need to consider both compression performance and error corruption in the mobile environment. This paper evaluates three wavelet-based ECG compression algorithms over CDMA networks. The three selected methods are Rajoub's method using EPE thresholding, Embedded Zerotree Wavelet (EZW), and Wavelet transform Higher Order Statistics Coding (WHOSC) with linear prediction. All three methods protected the more significant information using Forward Error Correction coding, and we measured not only compression performance under noise-free conditions but also error robustness and delay profile in the CDMA environment. In addition, from the field test we analyzed the PRD with respect to movement speed and the features of CDMA 1X. The test results show that Rajoub's method has low robustness under high error rates, and that EZW makes more efficient use of variable bandwidth under high error rates. WHOSC has high robustness across the overall BER range but loses performance on particular abnormal ECG signals.
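The PRD figure mentioned in this abstract is a distortion measure commonly used for ECG compression, usually expanded as percent root-mean-square difference. The abstract gives neither the expansion nor the formula, so the sketch below assumes the basic definition, 100 * sqrt(sum((x - y)^2) / sum(x^2)); some studies subtract the signal mean or a baseline offset in the denominator, which gives different values. The function name is hypothetical.

```python
import math


def prd(original, reconstructed):
    """Percent root-mean-square difference between two equal-length signals.

    Assumed basic form: 100 * sqrt(sum((x - y)^2) / sum(x^2)).
    """
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    num = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    den = sum(x * x for x in original)
    if den == 0.0:
        raise ValueError("original signal has zero energy")
    return 100.0 * math.sqrt(num / den)
```

A lower PRD means the decompressed ECG is closer to the original; in a study like this one it would be computed per record after transmission and decoding.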

A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments (네트워크 환경에서 서버용 음성 인식을 위한 MFCC 기반 음성 부호화기 설계)

Lee, Gil-Ho;Yoon, Jae-Sam;Oh, Yoo-Rhee;Kim, Hong-Kook
- MALSORI / no.54 / pp.27-43 / 2005
Existing standard speech coders can provide high-quality speech communication, but they degrade the performance of speech recognition systems that use the speech reconstructed by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for speech recognition performance. For example, the mel-frequency cepstral coefficient (MFCC) is generally known to provide better speech recognition performance than the linear prediction coefficient (LPC), which is a typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. The main difficulty in using MFCC, however, is developing an efficient MFCC quantization at a low bit rate. First, we explore the interframe correlation of MFCCs, which leads to predictive quantization of the MFCCs. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed speech coder has speech quality comparable to that of 8 kbps G.729, while speech recognition using the proposed coder performs better than that using G.729.
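The combination of predictive quantization and a safety net can be illustrated with a toy scalar example. This is only a sketch under strong assumptions: the paper's quantizer is not described in the abstract, so the first-order predictor, the uniform quantizers, the index range, and all names and parameter values here are hypothetical. For each frame the encoder tries both a predictive quantizer (fine step, applied to the prediction residual) and a memoryless safety-net quantizer (coarse step, applied to the value itself), and sends whichever reconstructs the frame better, plus a one-bit flag.

```python
def quantize(value, step, levels):
    """Uniform scalar quantizer: integer index clamped to [-levels, levels]."""
    return max(-levels, min(levels, round(value / step)))


def sn_encode(frames, rho, step_pred, step_safe, levels):
    """Encode a sequence of scalar features as (flag, index) pairs.

    flag 0: predictive quantizer on the residual x - rho * previous_reconstruction.
    flag 1: safety-net (memoryless) quantizer on x itself.
    """
    prev = 0.0
    symbols = []
    for x in frames:
        idx_p = quantize(x - rho * prev, step_pred, levels)
        rec_p = rho * prev + idx_p * step_pred
        idx_s = quantize(x, step_safe, levels)
        rec_s = idx_s * step_safe
        if abs(x - rec_p) <= abs(x - rec_s):
            symbols.append((0, idx_p))
            prev = rec_p
        else:
            symbols.append((1, idx_s))
            prev = rec_s
    return symbols


def sn_decode(symbols, rho, step_pred, step_safe):
    """Mirror of sn_encode: rebuild the feature sequence from (flag, index)."""
    prev = 0.0
    out = []
    for flag, idx in symbols:
        prev = idx * step_safe if flag else rho * prev + idx * step_pred
        out.append(prev)
    return out
```

Because a safety-net frame does not depend on the previous reconstruction, any decoder state corrupted by an earlier channel error is reset when such a frame arrives, which is the robustness argument usually made for this kind of scheme.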
