• Title/Summary/Keyword: normalized codebook (정규화 코드북)

Search results: 10

A Method of Adaptive ISF Split Vector Quantization Using Normalized Codebook (정규화 코드북을 이용한 분할 벡터 구조의 ISF 적응적 양자화 기법)

  • Piao, Zhigang;Lim, Jong-Ha;Hong, Gi-Bong;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea / v.30 no.5 / pp.265-272 / 2011
  • In most ISF (or LSF) based real-time speech codecs, split vector quantization (SVQ) is used to reduce quantizer complexity and codebook memory size. Its drawback is that the correlation between code vectors cannot be exploited across the vector splits. This paper presents a new adaptive ISF vector quantization method that compensates for this drawback of SVQ-structured quantizers in wideband speech codecs. In each frame, the proposed method exploits the correlation between the split vectors by adaptively changing the codebook distribution according to the ordering property of ISF. The algorithm is evaluated in AMR-WB and shows an improvement of about 1.5 bits per frame.
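As a rough illustration of the idea in this abstract, the following is a minimal sketch of plain split vector quantization, not the paper's adaptive method. The toy codebooks, the split sizes, and the helper names (`nearest`, `svq_encode`, `svq_decode`, `is_ordered`) are all assumptions made for the example. It shows that quantizing each sub-vector independently allows index combinations that break the ascending order an ISF vector must have, which is the kind of redundancy an ordering-aware quantizer could exploit.

```python
def nearest(codebook, vec):
    """Return the index of the codevector closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((c - v) ** 2 for c, v in zip(code, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def svq_encode(vec, splits, codebooks):
    """Quantize each sub-vector independently with its own codebook."""
    indices, start = [], 0
    for size, codebook in zip(splits, codebooks):
        indices.append(nearest(codebook, vec[start:start + size]))
        start += size
    return indices


def svq_decode(indices, codebooks):
    """Concatenate the selected codevectors back into one vector."""
    out = []
    for idx, codebook in zip(indices, codebooks):
        out.extend(codebook[idx])
    return out


def is_ordered(vec):
    """True if the vector is strictly ascending, as ISF/LSF vectors are."""
    return all(a < b for a, b in zip(vec, vec[1:]))


# Toy data (hypothetical): a 4-dimensional vector split into two 2-dimensional parts.
splits = [2, 2]
codebooks = [
    [[0.1, 0.2], [0.3, 0.5]],  # codebook for the first split
    [[0.4, 0.6], [0.7, 0.9]],  # codebook for the second split
]
vec = [0.28, 0.52, 0.66, 0.88]

indices = svq_encode(vec, splits, codebooks)
decoded = svq_decode(indices, codebooks)

# The index pair [1, 0] decodes to a vector that violates the ordering,
# so that entry of the second codebook is unusable once the first index is 1.
invalid = svq_decode([1, 0], codebooks)
```

In this toy setup, once the first split selects its second codevector, half of the second codebook can never be chosen for a valid ISF vector; how the paper redistributes that unused capacity is not specified in the abstract.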

Speech Recognition Improvement Using Extraction Selective Observation in DHMM (선별적인 관측열 추출을 통한 DHMM 음성인식의 성능 개선)

  • 김우창;조선호;고수정;이정현
    • Proceedings of the Korean Information Science Society Conference / 2000.10b / pp.374-376 / 2000
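  • DHMM, one of the algorithms used in speech recognition systems, uses a codebook to extract the features of speech frames as an observation sequence, and then performs training and recognition on speech patterns. However, because voiced and unvoiced sounds differ greatly in their characteristics, using a single codebook means that codebook error causes codebook indices of an entirely different nature to be used as the DHMM observation sequence. In this paper, voiced and unvoiced speech are separated so that a different codebook is built for each and the observation sequences are extracted from them. A distance comparison between the preceding and the current observation is then used to normalize the time axis of the observations, and the resulting observation sequence is used for speech recognition. In experiments, the proposed recognition method achieved a 5.33% improvement over the existing recognition method.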

Vector Quantizer Based Speaker Normalization for Continuous Speech Recognition (연속음성 인식기를 위한 벡터양자화기 기반의 화자정규화)

  • Shin Ok-keun
    • The Journal of the Acoustical Society of Korea / v.23 no.8 / pp.583-589 / 2004
  • This paper proposes a vector quantizer based speaker normalization method for continuous speech recognition (CSR) systems that makes no use of acoustic information. The proposed method, an improvement on a previously reported speaker normalization scheme for a simple digit recognizer, builds a canonical codebook by training it iteratively, starting from a relatively small initial size and increasing the codebook size after each iteration. Once the codebook is established, the warp factors of the speakers are estimated by exhaustively comparing warped versions of each speaker's utterances with the codebook. Two sets of phones are used to estimate the warp factors: one consisting of vowels only, and the other of all the phonemes. A piecewise linear warping function corresponding to the estimated warp factor is adopted to warp the power spectrum of the utterance. The warped feature vectors are then extracted and used to train and test the speech recognizer. The effectiveness of the proposed method is investigated in a set of recognition experiments using the TIMIT corpus and the HTK speech recognition toolkit. The experimental results showed a recognition rate improvement comparable to that of the formant based warping method.
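The exhaustive warp factor search described in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: the paper warps the power spectrum with a piecewise linear function before feature extraction, whereas this sketch applies a hypothetical `linear_warp` directly to toy two-dimensional frequency features. The candidate grid, the codebook values, and all function names are invented for the example.

```python
def vq_distortion(frames, codebook):
    """Mean squared distance from each frame to its nearest codevector."""
    total = 0.0
    for frame in frames:
        total += min(
            sum((f - c) ** 2 for f, c in zip(frame, code)) for code in codebook
        )
    return total / len(frames)


def estimate_warp_factor(frames, codebook, candidates, warp):
    """Try every candidate factor and keep the one with the lowest VQ distortion."""
    return min(
        candidates,
        key=lambda alpha: vq_distortion([warp(f, alpha) for f in frames], codebook),
    )


def linear_warp(frame, alpha):
    """Hypothetical stand-in for spectral warping: scale every frequency by alpha."""
    return [alpha * x for x in frame]


# Toy canonical codebook (hypothetical formant-like frequencies in Hz).
codebook = [[500.0, 1500.0], [300.0, 2300.0]]

# Toy speaker whose frequencies sit 10% above the canonical ones.
frames = [[550.0, 1650.0], [330.0, 2530.0]]

candidates = [0.80, 0.85, 0.90, 0.95, 1.00, 1.05, 1.10, 1.15, 1.20]
best = estimate_warp_factor(frames, codebook, candidates, linear_warp)
```

On this toy data the search selects the grid point closest to 1/1.1, since that factor maps the speaker's frames nearest to the canonical codevectors.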

Quantization Based Speaker Normalization for DHMM Speech Recognition System (DHMM 음성 인식 시스템을 위한 양자화 기반의 화자 정규화)

  • 신옥근
    • The Journal of the Acoustical Society of Korea / v.22 no.4 / pp.299-307 / 2003
  • There have been many studies on speaker normalization, which aims to minimize the effect of a speaker's vocal tract length on the recognition performance of speaker independent speech recognition systems. In this paper, we propose a simple vector quantizer based linear warping speaker normalization method, motivated by the observation that a vector quantizer can be used successfully for speaker verification. For this purpose, we first generate an optimal codebook to serve as the basis of the speaker normalization, and then extract the warping factor of the unknown speaker by comparing the feature vectors with the codebook. Finally, the extracted warping factor is used to linearly warp the Mel scale filter bank used in the MFCC calculation. To test the performance of the proposed method, a series of recognition experiments is conducted on a discrete HMM with thirteen monosyllabic Korean number utterances. The results showed that the word error rate can be reduced by about 29%, and that the proposed warping factor extraction method is useful because of its simplicity compared to other line search warping methods.

A Method For Improvement Of Split Vector Quantization Of The ISF Parameters Using Adaptive Extended Codebook (적응적인 확장된 코드북을 이용한 분할 벡터 양자화기 구조의 ISF 양자화기 개선)

  • Lim, Jong-Ha;Jeong, Gyu-Hyeok;Hong, Gi-Bong;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea / v.30 no.1 / pp.1-8 / 2011
  • This paper presents a method for improving the performance of an ISF coefficient quantizer by compensating for the defect of split-structure vector quantization using the ordering property of ISF coefficients, and designs an ISF coefficient quantizer for a wideband speech codec using the proposed method. To reduce complexity and codebook size, the wideband speech codec uses a split-structure vector quantizer, which cannot fully use the correlation between ISF coefficients. The proposed algorithm uses the ordering property of ISF coefficients to overcome this defect. Using the ordering property, the codebook redundancy can be identified. The redundant part of the codebook is replaced by an adaptive-extended codebook, built using the ordering property, ISF coefficient prediction, and interpolation of the existing codebook, to improve quantizer performance. As a result, the adaptive-extended codebook algorithm gains about 2 bits over the existing split-structure ISF quantizer of AMR-WB (G.722.2) in terms of spectral distortion.

Wavelet Lifting based ECG Signal Compression Using Multi-Stage Vector Quantization (다단계 벡터 양자화를 이용한 웨이브렛 리프팅 기반 ECG 압축)

  • Park, Seo-Young;Jeong, Gyu-Hyeok;Kim, Young-Ju;Lee, In-Sung;Joo, Gi-Ho
    • Journal of the Institute of Electronics Engineers of Korea SC / v.43 no.6 s.312 / pp.76-82 / 2006
  • In this paper, a biomedical signal compression method that combines multi-stage vector quantization with the wavelet lifting scheme is proposed. It exploits the property of wavelet coefficients that the emphasis lies on the approximation coefficients. The transmitted codebook index consists of the code vectors obtained from the wavelet lifting coefficients of the ECG and from the error signals, respectively, using a block length of 1024. Each codebook is adaptively updated by comparing the distance between input codevectors and candidate codevectors against a predefined threshold value. The proposed compression method achieved a PRD below 3% and a CDR of 276.62 bits/sec.
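For readers unfamiliar with the PRD figure quoted above, the following minimal sketch computes the percent root-mean-square difference in its common non-mean-removed form, 100 * sqrt(sum((x - y)^2) / sum(x^2)). The abstract does not say which PRD variant the paper uses, so this choice, the function name `prd`, and the sample values are assumptions made for illustration.

```python
import math


def prd(original, reconstructed):
    """Percent root-mean-square difference between a signal and its reconstruction."""
    error_energy = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    signal_energy = sum(x ** 2 for x in original)
    return 100.0 * math.sqrt(error_energy / signal_energy)


# Toy samples (hypothetical): the second sample is reconstructed with a small error.
original = [3.0, 4.0]
reconstructed = [3.0, 3.7]
value = prd(original, reconstructed)  # about 6 percent for this toy pair
```

A lower PRD means the reconstructed signal is closer to the original, so a PRD below 3% indicates that the error energy is under 0.09% of the signal energy in this formulation.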

Speech Recognition Using MSVQ/TDRNN (MSVQ/TDRNN을 이용한 음성인식)

  • Kim, Sung-Suk
    • The Journal of the Acoustical Society of Korea / v.33 no.4 / pp.268-272 / 2014
  • This paper presents a method for speech recognition using multi-section vector quantization (MSVQ) and a time-delay recurrent neural network (TDRNN). The MSVQ generates the codebook from normalized uniform sections of the voice signal, and the TDRNN performs the speech recognition using the MSVQ codebook. The TDRNN is a time-delay recurrent neural network classifier with two different representations of dynamic context: the time-delayed input nodes represent local dynamic context, while the recursive nodes can represent the long-term dynamic context of the voice signal. Cepstral PLP coefficients were used as speech features. In the speech recognition experiments, the MSVQ/TDRNN speech recognizer shows a 97.9% word recognition rate for speaker independent recognition.

Isolated-Word Speech Recognition using Variable-Frame Length Normalization (가변프레임 길이정규화를 이용한 단어음성인식)

  • Sin, Chan-Hu;Lee, Hui-Jeong;Park, Byeong-Cheol
    • The Journal of the Acoustical Society of Korea / v.6 no.4 / pp.21-30 / 1987
  • Length normalization by variable frame size is proposed as a novel approach to the problem that variation in the length of a spoken word lowers recognition accuracy. Compared with length normalization by a fixed frame size, this method reduces the number of frames that make up a word and therefore shortens recognition time in the recognition stage. In this paper, variable frame length normalization is applied to multisection vector quantization, and the efficiency of the method is evaluated in terms of recognition time and accuracy through practical recognition experiments.

RGB-D Image Feature Point Extraction and Description Method for 3D Object Recognition (3차원 객체 인식을 위한 RGB-D 영상 특징점 추출 및 특징 기술자 생성 방법)

  • Park, Noh-Young;Jang, Young-Kyoon;Woo, Woon-Tack
    • Proceedings of the Korean Information Science Society Conference / 2012.06c / pp.448-450 / 2012
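  • This paper proposes a method that uses a Kinect-type RGB-D image sensor to extract, from the depth image, surface normal vectors representing the geometric information of a 3D object and to render the result as an image. Feature points and feature descriptors of the depth image are then extracted from the generated image to improve 3D object recognition performance. The paper also proposes a recognition method based on learning a codebook that can separate the generated RGB-D feature descriptors on a per-object basis, which raises object recognition performance. The proposed RGB-D based feature extraction and learning method was shown experimentally to be robust to environmental changes such as the presence or absence of texture and camera rotation and translation. It can be applied to 3D object/space recognition and tracking that uses Kinect-type RGB-D images, or to augmented reality systems built on them.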

Performance of Space Time Block Coded-Spatial Multiplexing Systems in Limited Feedback Channel (제한된 귀환채널에서 시공간블록부호화를 적용한 다중화 시스템의 성능)

  • Hwang, Hyeon-Chyeol;Shin, Seung-Hoon;Lim, Jong-Kyoung;Kim, Seok-Ho;Kwak, Kyung-Sup
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.9A / pp.772-780 / 2005
  • In this paper, an efficient pre-processing scheme for space time block coded-spatial multiplexing systems is presented. The pre-processing scheme is designed empirically by extending the diagonally weighted orthogonal space time-block coded diversity system to a spatial multiplexing system. Simulation results show that the proposed scheme outperforms both the precoder using predefined codebooks and the typical antenna selection scheme at moderate Doppler frequency in a limited feedback channel.