Search | Korea Science

Speaker Recognition using LPC cepstrum Coefficients and Neural Network

Choi, Jae-Seung
- Journal of the Korea Institute of Information and Communication Engineering
- v.15 no.12
- pp.2521-2526
- 2011
This paper proposes a speaker recognition algorithm using a perceptron neural network and LPC (Linear Predictive Coding) cepstrum coefficients. The proposed algorithm first detects the voiced sections in each frame. LPC cepstrum coefficients, which carry speaker characteristics, are then obtained by linear predictive analysis of the detected voiced sections. A neural network is trained on these LPC cepstrum coefficients to classify them. In the experiments, the performance of the proposed algorithm was evaluated in terms of the speaker recognition rates obtained with the LPC cepstrum coefficients and the neural network.
https://doi.org/10.6109/jkiice.2011.15.12.2521
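The feature-extraction step in this abstract (linear predictive analysis followed by conversion to LPC cepstrum coefficients) can be sketched as below. This is a minimal illustration, not the author's implementation: it assumes the autocorrelation method with the Levinson-Durbin recursion and the standard LPC-to-cepstrum recursion, and it leaves out voiced-section detection and the perceptron classifier. The function names and the analysis order are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..p] such that x[n] ~ sum_k a[k] * x[n-k].

    Returns (a, prediction_error_power); a is returned 0-indexed (a[0] is a_1).
    """
    a = [0.0] * (order + 1)  # a[0] unused during the recursion
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1 / (1 - sum_k a_k z^-k).

    The log-gain term c[0] is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]  # a[n-k-1] holds a_{n-k}
        c[n] = acc
    return c[1:]
```

For a frame following x[n] = 0.9^n, the first-order predictor coefficient comes out close to 0.9 and the cepstrum matches the closed form 0.9^n / n for a single real pole, which is a quick sanity check of the recursion.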

A LSF Quantizer for the Wideband Speech Using the Predictive VQ-Pyramid VQ

이강은;이인성;강상원
- The Journal of the Acoustical Society of Korea
- v.23 no.4
- pp.333-339
- 2004
This paper proposes a vector quantizer-pyramid vector quantizer (VQ-PVQ) structure. Both a predictive structure and the safety-net concept are combined with the VQ-PVQ to quantize the LPC parameters of a wideband speech codec. Its performance is compared with that of the LPC vector quantizer used in AMR-WB (ITU-T G.722.2), demonstrating reductions in both spectral distortion and encoding memory.
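The abstract does not spell out the pyramid VQ stage. A pyramid vector quantizer restricts codevectors to integer vectors whose absolute values sum to a fixed pulse count K. The sketch below shows a common greedy pulse-allocation search for such a codevector; it is an illustrative assumption, not the paper's VQ-PVQ quantizer, and it is not guaranteed to find the optimal codevector. The first-stage VQ, the predictive/safety-net switching, the gain, and the choice of K are omitted, and `pvq_search` is a hypothetical name.

```python
def pvq_search(x, K):
    """Greedy search for an integer y with sum(|y_i|) == K pointing close to x."""
    n = len(x)
    ax = [abs(v) for v in x]
    total = sum(ax)
    if total == 0:
        y = [0] * n
        y[0] = K  # any valid pyramid point will do for a zero input
        return y
    # Initial projection onto the pyramid; flooring never exceeds K pulses.
    y = [int(K * v / total) for v in ax]
    pulses = sum(y)
    corr = sum(a * b for a, b in zip(ax, y))  # <|x|, y>
    energy = sum(b * b for b in y)            # ||y||^2
    while pulses < K:
        # Add the pulse that maximizes the normalized correlation <|x|,y>^2 / ||y||^2.
        best, best_score = 0, -1.0
        for i in range(n):
            c = corr + ax[i]
            e = energy + 2 * y[i] + 1
            score = c * c / e
            if score > best_score:
                best, best_score = i, score
        corr += ax[best]
        energy += 2 * y[best] + 1
        y[best] += 1
        pulses += 1
    # Restore the signs of the input.
    return [b if v >= 0 else -b for b, v in zip(y, x)]
```

The unit-norm codevector would then be y / ||y||, scaled by a separately quantized gain.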

Design of a Lossless Audio Coding Using Cholesky Decomposition and Golomb-Rice Coding

Cheong, Cheon-Dae;Shin, Jae-Ho
- Journal of Korea Multimedia Society
- v.11 no.11
- pp.1480-1490
- 2008
Designing a linear predictor and matching it with an entropy coder lies at the heart of lossless audio coding. In this paper, we use the covariance method and the Cholesky decomposition to calculate the linear prediction coefficients instead of the autocorrelation method and the Levinson-Durbin recursion. The resulting predictor is compared with a polynomial predictor, and whichever of the two yields the smaller prediction error is selected. For entropy coding, we use a Golomb-Rice coder with either a block-based parameter estimation method or a sequential adaptation method (LOCO-I and RLGR). The proposed predictor combined with block-based parameter estimation improves the compression ratio by 0.3413% to 2.2879% compared with the FLAC lossless audio coder, which uses the autocorrelation method and the Levinson-Durbin recursion. The proposed predictor with the LOCO-I adaptation method also improved the compression ratio by 0.3413% to 2.2879%, while the proposed predictor with the RLGR adaptation method gave better results for specific signals.
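A Golomb-Rice coder with Rice parameter k writes the quotient n >> k in unary and the low k bits of n in binary. The sketch below assumes signed prediction residuals are first mapped to non-negative integers with a zig-zag mapping, and it picks k per block by brute-force minimization of the total code length. That is one simple form of block-based parameter estimation, not necessarily the estimator used in the paper, and it does not implement the LOCO-I or RLGR sequential adaptation. Function names are hypothetical, and bits are kept as a '0'/'1' string for readability rather than packed.

```python
def zigzag(e):
    """Map a signed residual to a non-negative integer: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * e if e >= 0 else -2 * e - 1


def unzigzag(m):
    """Inverse of zigzag()."""
    return m >> 1 if m % 2 == 0 else -((m + 1) >> 1)


def rice_encode(value, k):
    """Golomb-Rice codeword for a non-negative integer, as a '0'/'1' string."""
    bits = "1" * (value >> k) + "0"  # unary quotient terminated by a 0
    if k > 0:
        bits += format(value & ((1 << k) - 1), "0{}b".format(k))  # k-bit remainder
    return bits


def rice_decode(bits, k):
    """Decode one codeword from the start of bits; return (value, bits_consumed)."""
    i = 0
    while bits[i] == "1":
        i += 1
    q = i
    i += 1  # skip the terminating 0
    r = int(bits[i:i + k], 2) if k > 0 else 0
    return (q << k) | r, i + k


def best_block_k(residuals, max_k=15):
    """Rice parameter that minimizes the total code length of one block."""
    mapped = [zigzag(e) for e in residuals]

    def cost(k):
        return sum((m >> k) + 1 + k for m in mapped)

    return min(range(max_k + 1), key=cost)
```

A block is then coded by choosing k with `best_block_k`, sending k as side information, and concatenating `rice_encode(zigzag(e), k)` for each residual.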

Non-linear Predictive Method using Simplified Morphological Polynomial Transform and Morphological Interpolation

김수현;한헌수;홍민철;차형태
- Proceedings of the Korean Society of Broadcast Engineers Conference
- 2002.11a
- pp.81-84
- 2002
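In this paper, we propose a nonlinear prediction method that uses a simplified Morphological Polynomial Transform and Morphological Interpolation. The morphological polynomial transform uses morphological operations to represent data as coefficients of structuring functions, and morphological interpolation interpolates using the coefficients produced by the morphological polynomial transform. We simplified the morphological polynomial transform so that it can be applied using integer arithmetic only, and we use a prediction method based on a morphological interpolation that is better suited to images. Experiments verify that, with the proposed prediction method and Huffman coding, images can be stored losslessly using fewer bits.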

Speech Recognition Using Formant Bandwidth Normalization

홍종진;강석건;박군작;박규태
- The Journal of Korean Institute of Communications and Information Sciences
- v.16 no.5
- pp.458-467
- 1991
In this paper, the cause of linear prediction error is analysed, a theoretical basis for normalizing the formant bandwidth to 0 is given, and its validity is verified. The formant frequency and bandwidth are measured as functions of the position of the poles of the AR filter in order to analyse the relation between pole position and formant bandwidth. By changing the glottis reflection coefficient to 1, the effect of the glottis is eliminated, and as a result new linear prediction coefficients are obtained by normalizing the formant bandwidth of the signal to 0. Since these coefficients are symmetrical, their standard deviation is larger than that of the coefficients obtained with a fixed glottis reflection coefficient. The bit rate for speech coding can be reduced by a factor of 2 without any loss of information. Through computer simulation, a recognition rate of 96.7% is obtained with the proposed algorithm when recognizing 5 Korean vowels in a noisy environment.
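The pole-position analysis in this abstract rests on the usual mapping for a complex pole z = r e^{jθ} of an all-pole filter at sampling rate fs: formant frequency F = θ fs / (2π) and bandwidth B ≈ -(fs/π) ln r. A pole on the unit circle (r = 1) therefore has zero bandwidth, which corresponds to a formant bandwidth normalized to 0. The bandwidth formula is the common narrow-bandwidth approximation, and the helper names below are hypothetical.

```python
import cmath
import math


def second_order_poles(a1, a2):
    """Poles of the AR section 1 / (1 - a1 z^-1 - a2 z^-2), i.e. roots of z^2 - a1 z - a2."""
    disc = cmath.sqrt(a1 * a1 + 4 * a2)
    return (a1 + disc) / 2, (a1 - disc) / 2


def pole_to_formant(pole, fs):
    """Return (formant_hz, bandwidth_hz) for one complex pole, with fs in Hz."""
    r = abs(pole)
    theta = cmath.phase(pole)
    formant = abs(theta) * fs / (2 * math.pi)  # pole angle -> frequency
    bandwidth = -math.log(r) * fs / math.pi     # pole radius -> bandwidth (0 when r == 1)
    return formant, bandwidth
```

For example, a resonator built from a pole at radius 0.95 and angle 2π·500/8000 at fs = 8000 Hz reports a 500 Hz formant with a bandwidth of about 131 Hz; moving the pole onto the unit circle drives the bandwidth to 0.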

Least Squares Based Adaptive Motion Vector Prediction Algorithm for Video Coding

Kim, Ji-hee;Jeong, Jong-woo;Hong, Min-Cheol
- The Journal of Korean Institute of Communications and Information Sciences
- v.29 no.9C
- pp.1330-1336
- 2004
This paper addresses an adaptive motion vector prediction algorithm to improve the performance of video encoders. Block-based motion vectors have non-stationary local statistics, so the coefficients of an LS (Least Squares) based linear motion vector predictor can be optimized. However, this optimization is computationally very expensive. The proposed algorithm uses the LS approach with a spatially varying, motion-directed property to adaptively control the coefficients of the motion predictor, reducing both the computational cost and the motion prediction error. Experimental results demonstrate the effectiveness of the proposed algorithm.
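A plain least-squares motion vector predictor can be sketched as follows: one component of the current block's motion vector is predicted as a weighted sum of three neighbouring motion vector components (assumed here to be left, above, and above-right), with weights fitted by least squares over previously coded blocks. The neighbourhood, the training window, the small ridge term, and all function names are assumptions for illustration. The paper's spatially varying, motion-directed control of the coefficients, which is what reduces its computational cost, is not reproduced.

```python
def solve_linear(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def ls_weights(neighbours, targets, ridge=1e-6):
    """Weights minimizing sum (target - w . neighbour_row)^2 via the normal equations.

    neighbours: one row per training block, e.g. [left, above, above_right] components.
    targets: the actual motion vector component of each training block.
    ridge: small diagonal term that keeps the system solvable (an assumption).
    """
    n = len(neighbours[0])
    A = [[sum(x[i] * x[j] for x in neighbours) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(x[i] * y for x, y in zip(neighbours, targets)) for i in range(n)]
    return solve_linear(A, b)


def predict(weights, neighbour_row):
    """Predicted motion vector component for the current block."""
    return sum(w * v for w, v in zip(weights, neighbour_row))
```

The horizontal and vertical motion vector components would each be fitted and predicted separately with these functions.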

Estimation of Weight Coefficients of Residual DPCM based on L1 Regularization in HEVC Format Range Extension

Ryu, Su-Kyung;Kang, Je-Won
- Proceedings of the Korean Society of Broadcast Engineers Conference
- 2016.06a
- pp.373-374
- 2016
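Residual Differential Pulse-Code Modulation (RDPCM) refers to a technique that, in video compression, uses neighboring pixels to remove additional redundancy from the residual signal left after temporal and spatial prediction. In this paper, we first build a prediction model for the residual signal as a linear weighted sum of neighboring pixels, and we then propose an algorithm that achieves more efficient coding performance by estimating each weight through a cost function that includes $L_1$ regularization.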
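In HEVC RExt, horizontal RDPCM predicts each residual sample from the residual to its left, and this paper generalizes the prediction to a weighted sum of neighbouring residuals. The sketch below shows forward and inverse weighted RDPCM with two neighbours (left and above), assuming lossless coding so that the decoder sees exactly the residuals the encoder used. The weights are passed in directly; the L1-regularized estimation that the paper proposes for choosing them is not shown, and the function names are hypothetical. With `w_left=1` and `w_above=0` the sketch reduces to horizontal RDPCM.

```python
def rdpcm_forward(block, w_left, w_above):
    """Replace each residual by its difference from a weighted sum of its left and above residuals."""
    h, w = len(block), len(block[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            left = block[i][j - 1] if j > 0 else 0   # outside the block counts as 0
            above = block[i - 1][j] if i > 0 else 0
            out[i][j] = block[i][j] - (w_left * left + w_above * above)
    return out


def rdpcm_inverse(diff, w_left, w_above):
    """Rebuild the residual block in raster order from already reconstructed neighbours."""
    h, w = len(diff), len(diff[0])
    rec = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            left = rec[i][j - 1] if j > 0 else 0
            above = rec[i - 1][j] if i > 0 else 0
            rec[i][j] = diff[i][j] + (w_left * left + w_above * above)
    return rec
```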

Quality Improvement of Low Bitrate HE-AAC using Linear Prediction Pre-processor

Lee, Jae-Seong;Lee, Gun-Woo;Park, Young-Chul;Youn, Dae-Hee
- The Journal of Korean Institute of Communications and Information Sciences
- v.34 no.8C
- pp.822-829
- 2009
This paper proposes a new method of improving the quality of High Efficiency Advanced Audio Coding (HE-AAC). HE-AAC encodes the input by allocating bits to each scalefactor band according to the psychoacoustic properties of the human ear. As a result, too few bits are assigned to bands with relatively low energy, and this imbalance between bands can degrade sound quality, for example by introducing musical noise. In the proposed system, a Linear Prediction (LP) module is combined with HE-AAC as a pre-processor to improve sound quality by distributing bits more evenly. To model human psychoacoustic properties accurately, the psychoacoustic model computes the masking threshold from the Fast Fourier Transform (FFT) spectrum of the original input signal. In the implementation, the masking threshold of the psychoacoustic model is normalized by the LP spectral envelope prior to quantization of the LP residual. Experimental results show that the proposed algorithm allocates bits appropriately when bits are scarce and improves the performance of HE-AAC.
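The two LP quantities this pre-processor relies on are the LP residual, obtained by inverse-filtering the input with A(z) = 1 - sum_k a_k z^-k, and the LP spectral envelope G / |A(e^{jω})| used to normalize the masking threshold. The sketch below computes both for given predictor coefficients. How the coefficients are estimated, the gain G, and the exact normalization rule inside HE-AAC are not specified in the abstract, so they are left out, and the function names are hypothetical.

```python
import cmath


def lp_residual(x, a):
    """Inverse-filter x with A(z) = 1 - sum_k a_k z^-k, assuming zero initial state."""
    p = len(a)
    return [x[n] - sum(a[k] * x[n - k - 1] for k in range(p) if n - k - 1 >= 0)
            for n in range(len(x))]


def lp_envelope(a, omega, gain=1.0):
    """Magnitude of the LP synthesis filter gain / A(e^{j*omega}) at angular frequency omega."""
    A = 1 - sum(ak * cmath.exp(-1j * omega * (k + 1)) for k, ak in enumerate(a))
    return gain / abs(A)
```

Feeding an exact first-order AR signal through `lp_residual` with its own coefficient leaves only the initial impulse, which is the "whitening" that makes the residual's spectrum flatter than the input's.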

PSNR Comparison of DCT-domain Image Resizing Methods

Kim Do nyeon;Choi Yoon sik
- The Journal of Korean Institute of Communications and Information Sciences
- v.29 no.10C
- pp.1484-1489
- 2004
Given a video frame in terms of its 8${\times}$8 block-DCT coefficients, we wish to obtain a downsized or upsized version of this frame, also in terms of 8${\times}$8 block-DCT coefficients. Because the DCT is a linear unitary transform, it is distributive over matrix multiplication. This fact has been used for downsampling video frames in the DCT domain in Dugad's, Mukherjee's, and Park's methods. The downsampling and upsampling schemes combined together preserve all the low-frequency DCT coefficients of the original image. This yields large savings when coding the difference between the original frame (unsampled image) and its prediction (the upsampled image), which is desirable for many applications based on scalable video encoding. In this paper, we extend the earlier works to various DCT sizes when an image is downsampled and then upsampled by a factor of two. Experiments show that the PSNR values improve as the DCT block size increases. However, complexity also increases, so there is a tradeoff. The experimental results provide useful data for developing fast algorithms for compressed-domain image/video resizing.
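The property this abstract relies on (downsampling then upsampling by two in the DCT domain keeps every low-frequency DCT coefficient of the original) can be checked with a one-dimensional sketch: keep the low half of an N-point orthonormal DCT-II to downsample, zero-pad to upsample, rescale by √2 so the signal level is preserved, and score the result with PSNR = 10 log10(peak² / MSE). This is a simplified whole-signal illustration, not Dugad's, Mukherjee's, or Park's block-based method; it uses a direct O(N²) DCT for clarity, and the function names are hypothetical.

```python
import math


def _scale(k, n):
    """Orthonormal DCT-II normalization factor for coefficient k of an n-point transform."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)) for k in range(n)]


def idct(X):
    """Inverse of the orthonormal DCT-II (the orthonormal DCT-III)."""
    n = len(X)
    return [sum(_scale(k, n) * X[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n)) for i in range(n)]


def dct_downsample(x):
    """Halve the length of x by keeping the low-frequency half of its DCT."""
    X = dct(x)
    half = len(x) // 2
    return idct([c / math.sqrt(2) for c in X[:half]])


def dct_upsample(y):
    """Double the length of y by zero-padding its DCT."""
    Y = dct(y)
    return idct([c * math.sqrt(2) for c in Y] + [0.0] * len(y))


def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical signals."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return float("inf") if mse == 0 else 10 * math.log10(peak * peak / mse)
```

After `dct_upsample(dct_downsample(x))`, the first half of the DCT matches that of `x` and the second half is zero, so the PSNR loss comes entirely from the discarded high-frequency coefficients.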

2.4kbps Speech Coding Algorithm Using the Sinusoidal Model

백성기;배건성
- The Journal of Korean Institute of Communications and Information Sciences
- v.27 no.3A
- pp.196-204
- 2002
Sinusoidal Transform Coding (STC) is a vocoding scheme based on a sinusoidal model of the speech signal. Low-bit-rate speech coding based on the sinusoidal model represents and synthesizes speech using the fundamental frequency and its harmonics, the spectral envelope, and the phase in the frequency domain. In this paper, we propose a 2.4 kbps low-rate speech coding algorithm using the sinusoidal model of a speech signal. In the proposed coder, the pitch frequency is estimated by choosing the frequency that minimizes the mean squared error between speech synthesized from all spectral peaks and speech synthesized from the chosen frequency and its harmonics. The spectral envelope is estimated using the SEEVOC (Spectral Envelope Estimation VOCoder) algorithm and the discrete all-pole model. The phase information is obtained from the time of pitch pulse occurrence, i.e., the onset time, together with the phase of the vocal tract system. Experimental results show that the synthetic speech preserves both the formant and phase information of the original speech very well. The coder was evaluated with a MOS test based on informal listening and achieved a MOS score above 3.1.
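The harmonic part of the sinusoidal model described here can be sketched as a sum of cosines at multiples of the fundamental frequency f0, each with its own amplitude and phase. In the coder, the amplitudes would come from the SEEVOC/all-pole spectral envelope and the phases from the onset time and the vocal-tract phase; in the sketch below they are simply passed in, frame overlap-add and parameter quantization are omitted, and the function name is hypothetical.

```python
import math


def harmonic_synthesis(f0, amplitudes, phases, fs, n_samples):
    """Synthesize one frame as sum_l A_l * cos(l * w0 * n + phi_l), with w0 = 2*pi*f0/fs."""
    w0 = 2 * math.pi * f0 / fs
    out = []
    for n in range(n_samples):
        s = 0.0
        for l, (amp, phi) in enumerate(zip(amplitudes, phases), start=1):
            if l * f0 >= fs / 2:
                break  # drop harmonics at or above the Nyquist frequency
            s += amp * math.cos(l * w0 * n + phi)
        out.append(s)
    return out
```

At fs = 8000 Hz and f0 = 1000 Hz, only the first three harmonics fall below the 4000 Hz Nyquist limit, so any further amplitudes supplied are ignored.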
