Search | Korea Science

Vocal Tract Length Normalization for Speech Recognition (음성인식을 위한 성도 길이 정규화)

지상문
- Journal of the Korea Institute of Information and Communication Engineering / v.7 no.7 / pp.1380-1386 / 2003
Speech recognition performance is degraded by variation in vocal tract length among speakers. In this paper, we use a vocal tract length normalization method in which the frequency axis of the short-time spectrum of a speaker's speech is scaled to minimize the effect of the speaker's vocal tract length on recognition performance. To normalize vocal tract length, we tried several frequency warping functions, including linear and piecewise linear functions. A variable-interval piecewise linear warping function is proposed to model effectively the variation in frequency axis scale caused by large variation in vocal tract length. Experimental results on TIDIGITS connected digits showed a dramatic reduction in word error rate, from 2.15% to 0.53%, with the proposed vocal tract normalization.
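As a rough illustration of piecewise linear frequency warping, the sketch below scales the frequency axis by a warping factor up to a break frequency and then uses a second linear segment so that the band edge maps to itself. This is a generic two-segment warp, not the variable-interval function proposed in the paper; the function name, the 4000 Hz band edge, and the 3000 Hz break frequency are assumptions chosen for the example.

```python
def piecewise_linear_warp(freq, alpha, f_max=4000.0, f_break=3000.0):
    """Map a frequency in Hz to its warped value (hypothetical helper).

    Below f_break the axis is scaled by alpha. Above f_break a second
    linear segment is used so that f_max maps to f_max, which keeps the
    warped axis inside the original bandwidth.
    """
    if not 0.0 <= freq <= f_max:
        raise ValueError("freq must lie in [0, f_max]")
    if alpha * f_break >= f_max:
        raise ValueError("alpha too large for this break frequency")
    if freq <= f_break:
        return alpha * freq
    # Slope of the upper segment, chosen so the two segments meet at
    # f_break and the warp ends exactly at f_max.
    slope = (f_max - alpha * f_break) / (f_max - f_break)
    return alpha * f_break + slope * (freq - f_break)


if __name__ == "__main__":
    for f in (0.0, 1000.0, 3000.0, 3500.0, 4000.0):
        print(f, piecewise_linear_warp(f, alpha=1.1))
```

In a VTLN system the warping factor is typically chosen per speaker, for example by searching a small grid of candidate values and keeping the one that scores best under the acoustic model; that search is not shown here.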

Comparison of Adult and Child's Speech Recognition of Korean (한국어에서의 성인과 유아의 음성 인식 비교)

Yoo, Jae-Kwon;Lee, Kyoung-Mi
- The Journal of the Korea Contents Association / v.11 no.5 / pp.138-147 / 2011
While most Korean speech databases have been developed for adult speech rather than children's speech, various children's speech databases exist for other languages. Because children's and adults' speech differ widely in acoustic and linguistic characteristics, a children's speech database needs to be developed. In this paper, to find the differences between them in Korean, we built speech recognizers using HMMs and tested them according to gender, age, and the presence of VTLN (Vocal Tract Length Normalization). The results show that the recognizer built from children's speech has a much higher recognition rate than the one built from adults' speech, and that using VTLN helps improve the recognition rate in Korean.
https://doi.org/10.5392/JKCA.2011.11.5.138

A Study on the Document Length Normalization of Extended Vector Model Using the Information of Location (위치 정보를 이용한 확장 벡터 모델의 문서 길의 정규화에 관한 연구)

Kim, Kwang-Young;Seo, Jerry;Lee, Min-Ho;Joo, Won-Kyun;Jeong, Chang-Hoo;You, Beom-Jong
- Proceedings of the Korea Information Processing Society Conference / 2003.05c / pp.1623-1626 / 2003
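With the growth of the Internet and the rapid increase in the number of Internet users, the need for information retrieval systems is growing. At the same time, it is becoming increasingly difficult for users to find exactly the information they want in large document collections. Most current retrieval systems apply document length normalization, and document length information contributes to retrieval performance. In general, when retrieval performance is evaluated on TREC or HANTEC2.0, applying document length normalization gives better results than not applying it. In this paper, we propose a document length normalization method that uses location information, and we report experiments on it using KISTAL2000.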

Normalized Mean Field Annealing Algorithm for Module Orientation Problem (모듈 방향 결정 문제 해결을 위한 정규화된 평균장 어닐링 알고리즘)

Chong, Kyun-Rak
- Journal of KIISE:Computer Systems and Theory / v.27 no.12 / pp.988-995 / 2000
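Even after the position of each module has been determined by a placement algorithm, the efficiency and connectivity of the circuit can be improved by flipping modules about the vertical or horizontal axis or by rotating them. The module orientation problem, one step in VLSI circuit design, is to determine the orientation of each module so that the total length of the wires connecting the modules is minimized. Recently, mean field annealing has been applied to combinatorial optimization problems with good results. Mean field annealing combines the fast convergence of neural networks with the ability of simulated annealing to produce good solutions. In this paper, we solve the module orientation problem using normalized mean field annealing and compare the results experimentally with the existing Hopfield network method and with simulated annealing. The total wire length reduction rates of simulated annealing, normalized mean field annealing, and the Hopfield network were 19.86%, 19.85%, and 19.03%, respectively. Normalized mean field annealing ran about 1.1 times faster than the Hopfield network and about 11.4 times faster than simulated annealing.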

An Index Interpolation-based Subsequence Matching Algorithm supporting Normalization Transform in Time-Series Databases (시계열 데이터베이스에서 인덱스 보간법을 기반으로 정규화 변환을 지원하는 서브시퀀스 매칭 알고리즘)

No, Ung-Gi;Kim, Sang-Uk;Hwang, Gyu-Yeong
- Journal of KIISE:Databases / v.28 no.2 / pp.217-232 / 2001
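This paper proposes a subsequence matching algorithm that supports normalization transform in time-series databases. Normalization transform is useful for retrieving time-series data whose constituent values follow a similar pattern of relative change, regardless of the absolute Euclidean distance between the time series. If an existing subsequence matching algorithm is applied to normalization-transform subsequence matching without extension, false dismissals occur: some subsequences that should be returned as query results are not found. In addition, existing whole matching algorithms that support normalization transform must build one index for each possible query sequence length, so the overhead in storage space and in data sequence insertion and deletion is severe. This paper solves these problems using index interpolation. Index interpolation is a technique that builds indexes for only a subset, taken at suitable intervals, of all the cases that require an index, and uses them to perform searches for every case that needs one. The proposed algorithm builds an index for only a few query sequence lengths and then uses these indexes to search for query sequences of all possible lengths. We prove that no false dismissals occur. At query time, the proposed algorithm selects the most appropriate of the existing indexes according to the length of the given query sequence and performs the search with it. The more indexes have been built, the better the search performance. By varying the number of indexes as needed, the trade-off between search performance and storage space can be adjusted flexibly. In experiments with indexes built for five lengths in the query sequence length range 256 to 512, the search performance of the proposed algorithm improved over sequential scan by 2.40 times on average when the selectivity was $10^{-2}$ and by 14.6 times on average when the selectivity was $10^{-5}$. Because the search performance of the proposed algorithm improves further as the selectivity decreases, we judge it to be highly useful in real database applications.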
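To make the idea of normalization transform concrete, the sketch below assumes the common definition (subtract the mean, divide by the standard deviation); the abstract does not spell out the exact formula, so this definition and the helper names are assumptions. It shows two sequences that are far apart in raw Euclidean distance but coincide after normalization because they share the same relative shape.

```python
import math


def normalize(seq):
    """Shift a sequence to zero mean and scale it to unit standard deviation."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    if std == 0.0:
        raise ValueError("a constant sequence cannot be normalized")
    return [(x - mean) / std for x in seq]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 2.0]
    b = [15.0, 25.0, 35.0, 25.0]  # same shape as a, scaled by 10 and shifted by 5
    print("raw distance:", euclidean(a, b))
    print("normalized distance:", euclidean(normalize(a), normalize(b)))
```

In subsequence matching, the transform is applied to each candidate window of the data sequence as well as to the query, which is why an index built for one window length cannot be reused directly for another.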

Isolated-Word Speech Recognition using Variable-Frame Length Normalization (가변프레임 길이정규화를 이용한 단어음성인식)

Sin, Chan-Hu;Lee, Hui-Jeong;Park, Byeong-Cheol
- The Journal of the Acoustical Society of Korea / v.6 no.4 / pp.21-30 / 1987
Length normalization by variable frame size is proposed as a novel approach to the problem that variation in the length of spoken words lowers recognition accuracy. Compared with length normalization by a fixed frame size, this method reduces the number of frames that make up a word and therefore shortens recognition time in the recognition stage. In this paper, variable frame length normalization is applied to multisection vector quantization, and the efficiency of the method is evaluated in terms of recognition time and accuracy through practical recognition experiments.

Emotion Robust Speech Recognition using Speech Transformation (음성 변환을 사용한 감정 변화에 강인한 음성 인식)

Kim, Weon-Goo
- Journal of the Korean Institute of Intelligent Systems / v.20 no.5 / pp.683-687 / 2010
This paper studies methods that use frequency warping, one of the speech transformation methods, to develop a speech recognition system that is robust to emotional variation. For this purpose, the effect of emotional variation on the speech signal was studied using a speech database containing various emotions. It was observed that the speech spectrum is affected by emotional variation and that this effect is one of the reasons the performance of the speech recognition system degrades. A new training method that uses frequency warping in the training process is presented to reduce the effect of emotional variation, and a speech recognition system based on vocal tract length normalization is developed for comparison with the proposed system. Experimental results from isolated word recognition using HMMs showed that the new training method reduced the error rate of the conventional recognition system on speech signals containing various emotions.
https://doi.org/10.5391/JKIIS.2010.20.5.683

Robust Speech Recognition using Vocal Tract Normalization for Emotional Variation (성도 정규화를 이용한 감정 변화에 강인한 음성 인식)

Kim, Weon-Goo;Bang, Hyun-Jin
- Journal of the Korean Institute of Intelligent Systems / v.19 no.6 / pp.773-778 / 2009
This paper studies training methods that are less affected by emotional variation, with the aim of developing a robust speech recognition system. For this purpose, the effect of emotional variation on the speech signal was studied using a speech database containing various emotions. The performance of a speech recognition system trained on speech containing no emotion deteriorates when the test speech contains emotion, because of the emotional mismatch between the test and training data. In this study, it was observed that the vocal tract length of the speaker is affected by emotional variation and that this effect is one of the reasons the performance of the speech recognition system degrades. A vocal tract normalization method is therefore used to develop a speech recognition system that is robust to emotional variation. Experimental results from isolated word recognition using HMMs showed that the vocal tract normalization method reduced the error rate of the conventional recognition system by 41.9% when emotional test data was used.
https://doi.org/10.5391/JKIIS.2009.19.6.773

Robust Speech Parameters for the Emotional Speech Recognition (감정 음성 인식을 위한 강인한 음성 파라메터)

Lee, Guehyun;Kim, Weon-Goo
- Journal of the Korean Institute of Intelligent Systems / v.22 no.6 / pp.681-686 / 2012
This paper studies speech parameters that are less affected by human emotion, with the aim of developing a robust emotional speech recognition system. For this purpose, the effect of emotion on the speech recognition system and robust speech parameters for the system were studied using a speech database containing various emotions. The feature parameters used were the mel-cepstral coefficient, delta-cepstral coefficient, RASTA mel-cepstral coefficient, root-cepstral coefficient, PLP coefficient, and the frequency-warped mel-cepstral coefficient from the vocal tract length normalization method. CMS (Cepstral Mean Subtraction) and SBR (Signal Bias Removal) were used as signal bias removal techniques. Experimental results showed that the HMM-based speaker-independent word recognizer performed best when it used the frequency-warped RASTA mel-cepstral coefficient from the vocal tract length normalization method, its derivatives, and CMS for signal bias removal.
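Cepstral Mean Subtraction, named above as one of the bias removal techniques, can be sketched in a few lines. The version below is the standard per-utterance form (subtract the time average of each cepstral dimension); whether the paper uses exactly this variant is an assumption, and the function name and toy data are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean over all frames of one utterance.

    frames: list of equal-length cepstral vectors (one list per frame).
    A constant offset added to every frame, such as a fixed channel
    bias in the cepstral domain, is removed by this step.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]


if __name__ == "__main__":
    clean = [[1.0, 2.0], [3.0, 6.0]]
    biased = [[1.5, 2.25], [3.5, 6.25]]  # clean plus a constant offset
    print(cepstral_mean_subtraction(clean))
    print(cepstral_mean_subtraction(biased))
```

Both calls print the same vectors, which is the point of the technique: features become insensitive to a stationary additive bias in the cepstral domain.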
https://doi.org/10.5391/JKIIS.2012.22.6.681

3D building modeling from airborne Lidar data by building model regularization (건물모델 정규화를 적용한 항공라이다의 3차원 건물 모델링)

Lee, Jeong Ho;Ga, Chill Ol;Kim, Yong Il;Lee, Byung Gil
- Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.30 no.4 / pp.353-362 / 2012
3D building modeling from airborne Lidar without model regularization may cause positional errors or topological inconsistency in building models. Regularization of 3D building models, on the other hand, restricts the types of models that can be reconstructed. To resolve these issues, this paper models 3D buildings from airborne Lidar using a building model regularization that accommodates a wider variety of building types. Building points are first segmented into roof planes by clustering in feature space and segmentation in object space. Then, 3D building models are reconstructed by consecutive adjustment of planes, lines, and points to satisfy parallelism, symmetry, and consistency between model components. The experimental results demonstrated that the method can produce a wider variety of regular 3D building models. The effects of regularization on the positional accuracy of the models were also analyzed quantitatively.
https://doi.org/10.7848/ksgpc.2012.30.4.353

Search Result 39, Processing Time 0.026 seconds
