Search | Korea Science

Rapid Speaker Adaptation Based on Eigenvoice Using Weight Distribution Characteristics (가중치 분포 특성을 이용한 Eigenvoice 기반 고속화자적응)

박종세;김형순;송화전
- The Journal of the Acoustical Society of Korea / v.22 no.5 / pp.403-407 / 2003
The eigenvoice approach has recently been widely used for rapid speaker adaptation. However, even with eigenvoices, the performance gain from a very small amount of adaptation data is relatively small compared with that from a somewhat larger amount, because the eigenvoice weights are difficult to estimate reliably. In this paper, we propose an eigenvoice-based rapid speaker adaptation method that uses the weight distribution characteristics to improve performance with small amounts of adaptation data. In experiments on a vocabulary-independent word recognition task (using the PBW 452 database), the weight threshold method alleviates the relatively low performance obtained with very small amounts of adaptation data. When a single adaptation word is used, the weight threshold method reduces the word error rate by about 9-18%.
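As a rough illustration of the idea in this abstract (not the authors' implementation), an eigenvoice model represents a new speaker's mean supervector as a mean voice plus a weighted sum of eigenvoices. The sketch below assumes orthonormal eigenvoices, estimates the weights by plain projection rather than by maximum likelihood, and clips each weight to a range. The function names and the clipping bounds are hypothetical stand-ins for the weight distribution characteristics mentioned in the abstract.

```python
def estimate_weights(observed, mean_voice, eigenvoices):
    """Project (observed - mean_voice) onto each eigenvoice (assumed orthonormal)."""
    diff = [o - m for o, m in zip(observed, mean_voice)]
    return [sum(d * e for d, e in zip(diff, ev)) for ev in eigenvoices]


def clip_weights(weights, bounds):
    """Limit each weight to [lo, hi]. The bounds are hypothetical, e.g. taken
    from the range of weights observed for the training speakers."""
    return [min(max(w, lo), hi) for w, (lo, hi) in zip(weights, bounds)]


def adapt(mean_voice, eigenvoices, weights):
    """Return mean_voice + sum_k weights[k] * eigenvoices[k]."""
    adapted = list(mean_voice)
    for w, ev in zip(weights, eigenvoices):
        adapted = [a + w * e for a, e in zip(adapted, ev)]
    return adapted


# Toy 3-dimensional "supervectors" with two eigenvoices.
mean_voice = [1.0, 2.0, 3.0]
eigenvoices = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
observed = [4.0, 2.5, 3.0]  # noisy estimate from very little data

weights = estimate_weights(observed, mean_voice, eigenvoices)
clipped = clip_weights(weights, [(-1.0, 1.0), (-1.0, 1.0)])
adapted = adapt(mean_voice, eigenvoices, clipped)
```

Clipping keeps an unreliable weight estimate from pulling the adapted model far outside the region covered by the training speakers, which is the kind of safeguard a threshold on the weights can provide when only one word of adaptation data is available.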

Efficient Rapid Speaker Adaptation Using Merging Eigenvoices (Eigenvoice 병합을 이용한 효율적인 고속 화자 적응)

Choi Dong-jin;Oh Yung-Hwan
- Proceedings of the Acoustical Society of Korea Conference / autumn / pp.115-118 / 2004
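In speech recognition, various efforts have been made to bring the performance of speaker-independent systems close to that of speaker-dependent systems through speaker adaptation. In particular, interest is growing in rapid speaker adaptation, which uses a very small amount of adaptation data (less than 30 seconds). Eigenvoice-based adaptation is well suited to rapid speaker adaptation, but constructing the eigenvoices requires too much computation and memory. In this paper, we propose a method that merges separately computed eigenvoices so that they are almost as accurate as eigenvoices constructed all at once, and uses the merged eigenvoices for rapid speaker adaptation. With this method, when training data is added, eigenvoices are computed only for the added data and then merged, instead of recomputing all eigenvoices from scratch, which substantially reduces computation and memory. In experiments, memory and computation decreased according to the number of added speaker-dependent models, with almost no loss in performance.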

Performance Improvement of Rapid Speaker Adaptation Using Bias Compensation and Mean of Dimensional Eigenvoice Models (바이어스 보상과 차원별 Eigenvoice 모델 평균을 이용한 고속화자적응의 성능향상)

박종세;김형순;송화전
- The Journal of the Acoustical Society of Korea / v.23 no.5 / pp.383-389 / 2004
In this paper, we propose bias compensation methods and an eigenvoice method using the mean of dimensional eigenvoice models to improve the performance of eigenvoice-based rapid speaker adaptation under mismatch between training and test environments. Experimental results on a vocabulary-independent word recognition task (using the PBW 452 DB) show that the proposed methods yield improvements for small amounts of adaptation data. We obtained about 22∼30% relative improvement with the bias compensation methods as the amount of adaptation data varied from 1 to 50, and a 41% relative improvement in error rate with the eigenvoice method using the mean of dimensional eigenvoice models with only a single adaptation word.

Performance Improvement of Fast Speaker Adaptation Based on Dimensional Eigenvoice and Adaptation Mode Selection (차원별 Eigenvoice와 화자적응 모드 선택에 기반한 고속화자적응 성능 향상)

송화전;이윤근;김형순
- The Journal of the Acoustical Society of Korea / v.22 no.1 / pp.48-53 / 2003
The eigenvoice method is known to be well suited to fast speaker adaptation, but it shows hardly any additional improvement as the amount of adaptation data increases. To deal with this problem, we propose a modified method that estimates the eigenvoice weights separately in each feature vector dimension. We also propose an adaptation mode selection scheme in which the best-performing of several adaptation methods is selected according to the amount of adaptation data. We used the POW DB to construct the speaker-independent model and eigenvoices; from the PBW 452 DB, 1 to 50 utterances were used for adaptation and the remaining 400 utterances for evaluation. As the amount of adaptation data increased, the proposed dimensional eigenvoice method outperformed both the conventional eigenvoice method and MLLR. Adaptation mode selection between the eigenvoice and dimensional eigenvoice methods reduced the word error rate by up to 26% compared with the conventional eigenvoice method.

Rapid Speaker Adaptation Based on MAPLR with Adaptive Hybrid Priors Estimated from Reference Speakers (참조화자로부터 추정된 적응적 혼성 사전분포를 이용한 MAPLR 고속 화자적응)

Song, Young-Rok;Kim, Hyung-Soon
- The Journal of the Acoustical Society of Korea / v.30 no.6 / pp.315-323 / 2011
This paper proposes two methods of estimating the prior distribution to improve the performance of rapid speaker adaptation based on maximum a posteriori linear regression (MAPLR). In general, the prior distribution of the transformation matrix used in MAPLR adaptation is estimated from all of the training speakers used to construct the speaker-independent model, and it is applied identically to all new speakers. In this paper, we propose a method in which the prior distribution is estimated from a group of reference speakers, selected using the adaptation data, so that the acoustic characteristics of the selected reference speakers are similar to those of the new speaker. Additionally, for MAPLR adaptation with a block-diagonal transformation matrix, we propose a method in which the mean matrix and covariance matrix of the prior distribution are estimated from two groups of transformation matrices obtained from the same training speakers, respectively. To evaluate the proposed methods, we examine word accuracy as a function of the number of adaptation words in an isolated word recognition task. Experimental results show that, for very limited adaptation data, a statistically significant performance improvement is obtained over conventional MAPLR adaptation.
https://doi.org/10.7776/ASK.2011.30.6.315

Simultaneous Speaker and Environment Adaptation by Environment Clustering in Various Noise Environments (다양한 잡음 환경하에서 환경 군집화를 통한 화자 및 환경 동시 적응)

Kim, Young-Kuk;Song, Hwa-Jeon;Kim, Hyung-Soon
- The Journal of the Acoustical Society of Korea / v.28 no.6 / pp.566-571 / 2009
This paper proposes a noise-robust fast speaker adaptation method based on the eigenvoice framework for various noisy environments. The proposed method focuses on de-noising and environment clustering. Since the de-noised adaptation DB still contains residual noise, environment clustering divides the noisy adaptation data into similar environments by a clustering method that uses the cepstral mean of non-speech segments as the feature vector. The adaptation data in each cluster is then used to build an environment-clustered speaker-adapted (SA) model. After selecting multiple environment-clustered SA models that are similar to the test environment, speaker adaptation is performed using an appropriate linear combination of the clustered SA models. In our experiments, the proposed method provides an error rate reduction of $40{\sim}59\%$ over the baseline speaker-independent model.
https://doi.org/10.7776/ASK.2009.28.6.566
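The environment-clustering step described in this abstract can be pictured with a small sketch. It assumes the cluster centroids are already known and assigns an utterance to the nearest centroid by squared Euclidean distance. The function names, the toy two-dimensional cepstra, and the distance measure are all illustrative assumptions; the abstract does not say which clustering method the paper uses.

```python
def cepstral_mean(frames):
    """Average the cepstral vectors of the non-speech frames."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def nearest_cluster(feature, centroids):
    """Index of the centroid closest to `feature` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda c: sq_dist(feature, centroids[c]))


# Two hypothetical environment centroids and one utterance's non-speech frames.
centroids = [[0.0, 0.0], [5.0, 5.0]]
non_speech_frames = [[4.0, 6.0], [6.0, 4.0]]

feature = cepstral_mean(non_speech_frames)
cluster = nearest_cluster(feature, centroids)
```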

Stereophonic Acoustic Echo Canceler using Fast Affine Projection Algorithm (고속 Affine Projection 알고리듬을 이용한 스테레오 음향 반향 제거기)

조영민;이원철
- The Journal of the Acoustical Society of Korea / v.17 no.1 / pp.86-97 / 1998
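This paper proposes a fast affine projection algorithm for a stereophonic acoustic echo canceler. Stereo teleconferencing systems have recently attracted much attention because they make teleconferencing more realistic. However, acoustic echo arises from the cross-coupling between the far-end talker and the microphones in the conference room. If this echo signal reaches the receiving room without being removed, speech quality degrades. To prevent this, an acoustic echo canceler is essential: it generates an estimate of the echo signal and removes the echo without loss of communication quality. Unlike a single-channel acoustic echo canceler, an acoustic echo canceler in a stereo environment suffers from several problems, such as performance degradation caused by environmental changes in the transmission room and a mismatch between the echo paths and the impulse responses of the adaptive filters used to estimate them. In this paper, we propose a pre-processing block that generates mutually uncorrelated input signals and compensates for the performance degradation caused by environmental changes in the transmission room. With it, we propose a new type of stereophonic acoustic echo canceler that improves on conventional methods by about 3-10 dB and converges quickly with a low computational load.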
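For orientation, the following is a minimal sketch of the standard single-channel affine projection update, w ← w + μ Xᵀ(XXᵀ + δI)⁻¹e, on which fast affine projection variants are built. It is not the fast algorithm or the stereo pre-processing block proposed in the paper. The names `apa_identify`, `solve`, and the parameters `mu` and `delta` are illustrative, and the echo path `h` is a made-up toy example.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def apa_identify(x, d, taps, order=2, mu=0.5, delta=1e-6):
    """Adapt an FIR filter w so that sum_k w[k] * x[n-k] tracks d[n]."""
    w = [0.0] * taps
    for n in range(taps + order - 2, len(x)):
        # Rows are the `order` most recent input vectors, newest first.
        X = [[x[n - p - k] for k in range(taps)] for p in range(order)]
        # A-priori errors for the same `order` time instants.
        e = [d[n - p] - sum(wk * xk for wk, xk in zip(w, X[p]))
             for p in range(order)]
        # Regularised Gram matrix X X^T + delta * I.
        G = [[sum(a * b for a, b in zip(X[i], X[j])) + (delta if i == j else 0.0)
              for j in range(order)] for i in range(order)]
        g = solve(G, e)
        for k in range(taps):
            w[k] += mu * sum(g[p] * X[p][k] for p in range(order))
    return w


# Toy echo path and noise-free echo signal.
random.seed(0)
h = [0.5, -0.3, 0.1]
x = [random.uniform(-1.0, 1.0) for _ in range(400)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]

w_est = apa_identify(x, d, taps=len(h))
```

With `order=1` the update reduces to normalized LMS; larger orders reuse more past input vectors per update, which is what speeds up convergence for correlated signals such as speech at the cost of solving a small linear system each step.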

Development and Evaluation of an Address Input System Employing Speech Recognition (음성인식 기능을 가진 주소입력 시스템의 개발과 평가)

김득수;황철준;정현열
- The Journal of the Acoustical Society of Korea / v.18 no.2 / pp.3-10 / 1999
This paper describes the development and evaluation of a Korean address input system that uses automatic speech recognition as the user interface for entering Korean addresses. An address consists of cities, provinces, and counties. The system runs in a Windows 95 environment on a personal computer with a built-in sound card. In the speech recognition part, continuous-density hidden Markov models (CHMMs) are used to model phoneme-like units (PLUs), and the One Pass Dynamic Programming (OPDP) algorithm is used for recognition. For address recognition, a finite state automaton (FSA) suited to the structure of Korean addresses is constructed. To achieve acceptable performance despite variation in speakers, microphones, and environmental noise, maximum a posteriori (MAP) estimation is used for adaptation. To improve recognition speed, a fast search method using a variable pruning threshold is newly proposed. In evaluation tests on 100 connected words uttered by 3 male speakers, the system achieved an average recognition accuracy above 96.0% for connected words after adaptation, with recognition times within 2 seconds, showing the effectiveness of the system.
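The abstract does not give the details of its MAP adaptation, so the sketch below only shows the textbook MAP update of a single Gaussian mean, in which the prior mean and the adaptation frames are interpolated with a prior weight. The function name and the parameter `tau` are illustrative, and every frame is assumed to belong fully to this Gaussian (no soft occupancy counts).

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP estimate of a Gaussian mean: (tau * prior + sum of frames) / (tau + N).

    `tau` controls how strongly the prior (speaker-independent) mean is
    trusted; with few frames the result stays close to the prior.
    """
    n = len(frames)
    dim = len(prior_mean)
    sums = [sum(f[i] for f in frames) for i in range(dim)]
    return [(tau * prior_mean[i] + sums[i]) / (tau + n) for i in range(dim)]


prior = [0.0, 0.0]
frames = [[1.0, 2.0], [3.0, 2.0]]  # adaptation frames assigned to this Gaussian
adapted_mean = map_adapt_mean(prior, frames, tau=2.0)
```

Here the sample mean of the frames is [2.0, 2.0]; with `tau=2.0` and two frames, the adapted mean lands halfway between the prior and the sample mean.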

Fast Speaker Adaptation Using Sub-Stream Based Eigenvoice (Sub-Stream 기반의 Eigenvoice를 이용한 고속 화자적응)

Song, Hwa-Jeon;Lee, Jong-Seok;Kim, Hyung-Soon
- MALSORI / v.55 / pp.93-102 / 2005
In this paper, a sub-stream based eigenvoice method is proposed to overcome the weak points of the conventional eigenvoice and dimensional eigenvoice methods. In the proposed method, sub-streams are constructed automatically by a statistical clustering analysis that uses the correlation between dimensions. To obtain a reliable distance matrix from the covariance matrix for dividing the features into optimal sub-streams, the MAP adaptation technique is applied to the covariance matrix of the training data and the sample covariance of the adaptation data. In our experiments, the proposed method shows a $41\%$ error rate reduction when the number of adaptation data is 50.

Fast Speaker Adaptation Based on Eigenspace-based MLLR Using Artificially Distorted Speech in Car Noise Environment (차량 잡음 환경에서 인위적 왜곡 음성을 이용한 Eigenspace-based MLLR에 기반한 고속 화자 적응)

Song, Hwa-Jeon;Jeon, Hyung-Bae;Kim, Hyung-Soon
- Phonetics and Speech Sciences / v.1 no.4 / pp.119-125 / 2009
This paper proposes a fast speaker adaptation method for telematics terminals in car noise environments, based on eigenspace-based maximum likelihood linear regression (ES-MLLR) and using artificially distorted speech. The artificially distorted speech is built by adding various car noise signals, collected from a driving car, to speech signals collected in an idling car. Then, for every environment, a transformation matrix is estimated by ES-MLLR using the artificially distorted speech corresponding to that noise environment. In test mode, an online model is built from a weighted sum of the environment transformation matrices, depending on the driving condition. In a 3k-word recognition task on the telematics terminal, the proposed method outperforms ES-MLLR, even when ES-MLLR uses adaptation data collected in the driving condition.
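The "weighted sum of the environment transformation matrices" in this abstract can be illustrated with a very small sketch. How the weights are derived from the driving condition is not stated in the abstract, so here they are simply passed in and assumed to sum to one; the function name, the environment names, and the 2x2 matrices are hypothetical.

```python
def combine_transforms(transforms, weights):
    """Element-wise weighted sum of equally sized matrices (lists of rows).

    The weights are assumed to be supplied by the caller and to sum to one.
    """
    rows, cols = len(transforms[0]), len(transforms[0][0])
    return [[sum(w * t[r][c] for w, t in zip(weights, transforms))
             for c in range(cols)] for r in range(rows)]


# Hypothetical per-environment transformation matrices.
idling = [[1.0, 0.0], [0.0, 1.0]]
highway = [[3.0, 0.0], [0.0, 3.0]]

online = combine_transforms([idling, highway], weights=[0.5, 0.5])
```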
