Search | Korea Science

Acoustic Model Transformation Method for Speech Recognition Employing Gaussian Mixture Model Adaptation Using Untranscribed Speech Database (미전사 음성 데이터베이스를 이용한 가우시안 혼합 모델 적응 기반의 음성 인식용 음향 모델 변환 기법)

Kim, Wooil
- Journal of the Korea Institute of Information and Communication Engineering, v.19 no.5, pp.1047-1054, 2015
This paper presents an acoustic model transformation method that uses an untranscribed speech database to improve speech recognition. In the presented method, an adapted GMM is obtained with a conventional adaptation method, and the most similar Gaussian component is selected from the adapted GMM. The bias vector between the mean vectors of the clean GMM and the adapted GMM is used to update the mean vectors of the HMM. The presented GAMT, combined with MAP or MLLR, improves speech recognition performance in car noise and speech babble conditions compared with MAP or MLLR used alone. The experimental results show that the presented method effectively utilizes an untranscribed speech database for acoustic model adaptation to increase speech recognition accuracy.
https://doi.org/10.6109/jkiice.2015.19.5.1047
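
The mean update described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the toy numbers, and the use of squared Euclidean distance to the clean GMM means as the similarity measure are all assumptions made for illustration, and the clean and adapted GMM components are assumed to correspond by index.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def transform_hmm_means(hmm_means, clean_gmm_means, adapted_gmm_means):
    """Shift each HMM mean by the bias of its most similar GMM component."""
    transformed = []
    for mu in hmm_means:
        # Assumption: "most similar" means the closest clean GMM mean.
        k = min(
            range(len(clean_gmm_means)),
            key=lambda i: squared_distance(mu, clean_gmm_means[i]),
        )
        # Bias vector between the adapted and clean means of component k.
        bias = [a - c for a, c in zip(adapted_gmm_means[k], clean_gmm_means[k])]
        transformed.append([m + b for m, b in zip(mu, bias)])
    return transformed


# Hypothetical toy data: two GMM components and two HMM mean vectors.
clean = [[0.0, 0.0], [10.0, 10.0]]
adapted = [[1.0, -1.0], [12.0, 11.0]]
hmm = [[0.5, 0.5], [9.0, 9.5]]
result = transform_hmm_means(hmm, clean, adapted)
```

Each HMM mean moves by the same offset that adaptation applied to its nearest GMM component, which is why no transcription of the adaptation speech is needed in this sketch.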

Performance Comparison and Duration Model Improvement of Speaker Adaptation Methods in HMM-based Korean Speech Synthesis (HMM 기반 한국어 음성합성에서의 화자적응 방식 성능비교 및 지속시간 모델 개선)

Lee, Hea-Min;Kim, Hyung-Soon
- Phonetics and Speech Sciences, v.4 no.3, pp.111-117, 2012
In this paper, we compare the performance of several speaker adaptation methods for an HMM-based Korean speech synthesis system with small amounts of adaptation data. According to objective and subjective evaluations, a hybrid of constrained structural maximum a posteriori linear regression (CSMAPLR) and maximum a posteriori (MAP) adaptation performs better than the other methods when only five minutes of adaptation data are available for the target speaker. In the objective evaluation, we find that the duration models are not adapted to the target speaker as well as the spectral envelope and pitch models are. To alleviate this problem, we propose a duration rectification method and a duration interpolation method. Both the objective and subjective evaluations show that incorporating the two proposed methods into the conventional speaker adaptation method effectively improves duration model adaptation.
https://doi.org/10.13064/KSSS.2012.4.3.111

Emotion recognition in speech using hidden Markov model (은닉 마르코프 모델을 이용한 음성에서의 감정인식)

김성일;정현열
- Journal of the Institute of Convergence Signal Processing, v.3 no.3, pp.21-26, 2002
This paper presents a new approach to identifying human emotional states such as anger, happiness, normal, sadness, or surprise. This is accomplished by using discrete duration continuous hidden Markov models (DDCHMM). For this, the emotional feature parameters are first defined from input speech signals. In this study, we used prosodic parameters such as pitch signals, energy, and their derivatives, which were then used to train the HMMs for recognition. Speaker-adapted emotional models based on maximum a posteriori (MAP) estimation were also considered for speaker adaptation. In simulations, the recognition rate of vocal emotion gradually increased as the number of adaptation samples increased.
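
The dependence on the number of adaptation samples can be illustrated with the standard MAP estimate of a single Gaussian mean. This is a generic sketch, not the model used in the paper: the function name, the prior weight `tau`, and the toy data are assumptions, and every sample is assumed to belong to the one Gaussian being adapted.

```python
def map_adapt_mean(prior_mean, samples, tau=10.0):
    """MAP estimate of a Gaussian mean with a conjugate prior.

    prior_mean: mean of the speaker-independent model (the prior).
    samples:    adaptation vectors assigned to this Gaussian.
    tau:        prior weight (hypothetical default); larger values trust the prior more.
    """
    n = len(samples)
    dim = len(prior_mean)
    sums = [sum(x[d] for x in samples) for d in range(dim)]
    # Weighted average of the prior mean and the sample sum.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


# With few samples the estimate stays near the prior mean (0.0);
# with more samples it approaches the sample mean (2.0).
few = map_adapt_mean([0.0], [[2.0]] * 10)
many = map_adapt_mean([0.0], [[2.0]] * 90)
```

With no adaptation data the function returns the prior mean unchanged, so the adapted model degrades gracefully when little speech is available.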

Improvement of the Environmental Conservation Value Assessment Map (ECVAM) by Complement of the Vegetation Community Stability Item (식생 군집구조 안정성 평가항목 보완을 통한 국토환경성평가지도 개선방안 연구)

Jeon, Seong-Woo;Song, Won-Kyong;Lee, Moung-Jin;Kang, Byung-Jin
- Journal of the Korean Society of Environmental Restoration Technology, v.13 no.2, pp.114-123, 2010
The Environmental Conservation Value Assessment Map (ECVAM) is a five-grade assessment map created with nationally integrated environmental information and environmental values. The map is made through the evaluation of 67 items, including greenbelt area and biodiversity. The ECVAM assesses the stability of the community using forest maps. However, the existing assessment method is problematic because the assessment grades are evaluated using higher than practical values, in part because it uses even-valued overlay and minimal indicator methods. This study was performed to suggest an integrated assessment method that could complement the stability evaluation based on existing methods. Accordingly, this study added forest type information, including whether the forest was natural or artificial, to the overlay method using forest diameter maps and forest density maps. As a result, the proposed ECVAM indicated a drastic grade change. After applying the method in South Korea, Grade I areas decreased 12.1%, from 52.6% to 40.6%, Grade II areas increased 11.9%, from 17.4% to 29.2%, and Grade III areas increased 0.2%, from 17.1% to 17.4%. From the results of the field survey, we found differences between natural forest and planted forest with regard to mortality, shrub species, and vine cover. This means that natural forests are more stable than planted forests. This study suggests an improved assessment methodology to complement the existing ECVAM method. The results are expected to be used in environmental evaluations and forest conservation value assessments in ecology and environmental fields.

Flexible Speaker Adaptation Reflecting the Quality of Adaptation Data (Adaptation Data의 Quality를 고려한 강인한 화자 적응)

Pyo Hyun-A;Kim Se-Hyun;Oh Yung-Hwan
- Proceedings of the Acoustical Society of Korea Conference, spring, pp.37-40, 2002
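Recently, research on speaker adaptation has been actively pursued to improve the performance of speech recognition systems. For speaker adaptation that modifies the model parameters of an HMM-based recognition system, most research has focused on the MAP and MLLR methods. The two methods perform differently depending on the amount of adaptation data. In this paper, we define the quality of adaptation data and propose a method that performs speaker adaptation by using this quality as a weight for the two existing methods. Applying the proposed method to a Korean 500-word city-name recognition system built at the KAIST communication research laboratory improved its performance.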

Fast Speaker Adaptation Using Sub-Stream Based Eigenvoice (Sub-Stream 기반의 Eigenvoice를 이용한 고속 화자적응)

Song, Hwa-Jeon;Lee, Jong-Seok;Kim, Hyung-Soon
- MALSORI, v.55, pp.93-102, 2005
In this paper, a sub-stream based eigenvoice method is proposed to overcome the weak points of the conventional eigenvoice and dimensional eigenvoice methods. In the proposed method, sub-streams are automatically constructed by a statistical clustering analysis that uses the correlation information between dimensions. To obtain a reliable distance matrix from the covariance matrix for dividing into optimal sub-streams, the MAP adaptation technique is applied to the covariance matrix of the training data and the sample covariance of the adaptation data. According to our experiments, the proposed method shows a 41% error rate reduction when the number of adaptation data is 50.

Adaptation of Wavelet Algorithm for Obtaining a Human Brain's Function Map (뇌의 기능적 영역 추출을 위한 Wavelet 변환 알고리즘의 적용)

이상민;장두봉;김동희;김광열;이건기;신태민
- Proceedings of the IEEK Conference, 2001.06e, pp.203-206, 2001
fMRI, which can express brain function as an MR image, is now being actively studied. Functional imaging studies have generally been performed with 4-tesla-class MRI, but if the gradient echo imaging method could be used, 1.5-tesla-class MRI might be exploited to its full potential. However, the lack of adequate image post-processing software prevents it from being used as widely as it could be. For post-processing of functional images, the subtraction method and several statistical methods are used, and new methods continue to be introduced. In this paper, we suggest adapting a wavelet algorithm to obtain a more reliable brain function map.

A Study on Speaker Adaptation of Large Continuous Spoken Language Using back-off bigram (Back-off bigram을 이용한 대용량 연속어의 화자적응에 관한 연구)

최학윤
- The Journal of Korean Institute of Communications and Information Sciences, v.28 no.9C, pp.884-890, 2003
In this paper, we studied speaker adaptation methods that improve a speaker-independent recognition system. For the independent speakers, we compared the results between bigram and back-off bigram, and between MAP and MLLR. Because the back-off bigram uses the unigram probability and a back-off weight as the bigram probability value, it has the effect of adding a small weight to the bigram probability value. We ran the experiments using 39-dimensional feature vectors consisting of 12 MFCCs, log energy, and their delta and delta-delta parameters. For this recognition experiment, we constructed a system based on CHMMs, triphone recognition units, and bigram and back-off bigram language models.
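
The back-off idea mentioned in this abstract, where an unseen bigram falls back to a weighted unigram probability, can be sketched in Python as follows. The abstract does not state which discounting scheme was used, so the absolute discount of 0.5, the function name, and the toy corpus are assumptions chosen only to make the example self-contained.

```python
from collections import Counter


def train_backoff_bigram(tokens, discount=0.5):
    """Return prob(w1, w2) for a back-off bigram model.

    Assumption: absolute discounting with a fixed discount reserves
    probability mass that is redistributed to unseen bigrams in
    proportion to their unigram probabilities.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    history = Counter(tokens[:-1])  # how often each word has a successor
    total = len(tokens)

    def p_unigram(w):
        return unigrams[w] / total

    def prob(w1, w2):
        if history[w1] == 0:
            # Unknown history: fall back to the unigram probability.
            return p_unigram(w2)
        if bigrams[(w1, w2)] > 0:
            # Seen bigram: discounted relative frequency.
            return (bigrams[(w1, w2)] - discount) / history[w1]
        seen = [b for (a, b) in bigrams if a == w1]
        reserved = discount * len(seen) / history[w1]
        unseen_mass = 1.0 - sum(p_unigram(b) for b in seen)
        if unseen_mass <= 0:
            return 0.0
        # Unseen bigram: back-off weight times the unigram probability.
        return reserved / unseen_mass * p_unigram(w2)

    return prob


# Hypothetical toy corpus.
prob = train_backoff_bigram("a b a c a b".split())
vocab = ["a", "b", "c"]
row_sum = sum(prob("a", w) for w in vocab)
```

In this sketch the unseen pair ("a", "a") still receives a nonzero probability, and the probabilities for a given history sum to one over the training vocabulary.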

A Study on the Speaker Adaptation in CDHMM (CDHMM의 화자적응에 관한 연구)

Kim, Gwang-Tae
- Journal of the Institute of Electronics Engineers of Korea SP, v.39 no.2, pp.116-127, 2002
A new approach is proposed to improve the speaker adaptation algorithm for a CDHMM speech recognizer by using a variable number of observation density functions. The proposed method uses an observation density function with more than one mixture in each state to represent speech characteristics in detail. The number of mixtures in each state is determined by the number of frames and the determinant of the variance, respectively. Each MAP parameter is extracted for every mixture determined by these two methods. In addition, the state segmentation method required for speaker adaptation can segment the adapting speech more precisely by using a speaker-independent model, trained on a sufficient database, as a priori knowledge. The state duration distribution is used for adapting the speech duration information that arises from the speaker's utterance habits and speed. The recognition rates of the proposed methods are significantly higher than that of the conventional method using one mixture in each state.

Self-Organizing Feature Map with Constant Learning Rate and Binary Reinforcement (일정 학습계수와 이진 강화함수를 가진 자기 조직화 형상지도 신경회로망)

조성원;석진욱
- Journal of the Korean Institute of Telematics and Electronics B, v.32B no.1, pp.180-188, 1995
A modified version of Kohonen's self-organizing feature map (SOFM) algorithm, which has a binary reinforcement function and a constant learning rate, is proposed. In contrast to the time-varying adaptation gain of the original Kohonen SOFM algorithm, the proposed algorithm uses a constant adaptation gain and adds a binary reinforcement function to compensate for the lowered learning ability of the SOFM caused by the constant learning rate. Since the proposed algorithm does not require complicated multiplication, its digital hardware implementation is much easier than that of the original SOFM.
