Integrated Search | Korea Science

English Phoneme Recognition using Segmental-Feature HMM

윤영선
- 한국정보과학회논문지:소프트웨어및응용 / Vol. 29, No. 3 / pp. 167-179 / 2002
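This paper proposes a segmental feature representation built from multiple frame features, and develops an acoustic model and its algorithms within the HMM framework in order to relax the independent-observation assumption, which is regarded as a weakness of HMMs. Because a single-frame feature cannot adequately represent the temporal dynamics of the speech signal, the proposed representation describes speech features using multiple frames. A segmental feature is represented as a trajectory of observation vectors by a polynomial regression function, and to use this feature for pattern classification we employ the segmental HMM (SHMM), which effectively represents trajectories of the speech signal. The SHMM divides the observation probability in a state into extra-segmental variation and intra-segmental variation; the extra-segmental variation represents long-term change and the intra-segmental variation represents short-term change. To take segmental characteristics into account in the acoustic model, we propose the segmental-feature HMM (SFHMM), a modification of the SHMM in which the extra-segmental variation is expressed as a probability distribution of the segment and the intra-segmental variation as the estimation error of the trajectory. In the SFHMM, the observation probability of a segment is expressed through the relationship between the segment likelihood and the trajectory estimation error, and the estimation error can be regarded as a weight on the likelihood of the segment in a given state. To examine the validity of the proposed method and the properties of segmental features, we carried out several experiments on the TIMIT data. The results suggest that, although the proposed method has more parameters than the conventional HMM, it is worthwhile in that performance improves and the proposed feature is flexible and carries more information.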
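
The idea of describing a segment by a polynomial trajectory can be illustrated with a small least-squares fit. This is only a sketch under stated assumptions, not the paper's algorithm: the function name `fit_trajectory`, the normalization of time to [0, 1], the default order, and the use of the mean squared residual as the "estimation error" are all hypothetical choices made for illustration.

```python
def fit_trajectory(frames, order=2):
    """Least-squares polynomial fit of one feature dimension over a segment.

    frames: list of floats, one value per frame of the segment.
    Returns (coeffs, error). coeffs[k] multiplies t**k, with t normalized
    to [0, 1] (an assumption of this sketch); error is the mean squared
    residual of the fit.
    """
    n = len(frames)
    if n < 2 or n <= order:
        raise ValueError("need at least 2 frames and more frames than the order")
    ts = [i / (n - 1) for i in range(n)]
    m = order + 1

    # Normal equations A c = b with A[j][k] = sum t^(j+k), b[j] = sum y * t^j.
    a = [[sum(t ** (j + k) for t in ts) for k in range(m)] for j in range(m)]
    b = [sum(y * t ** j for t, y in zip(ts, frames)) for j in range(m)]

    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * m
    for j in reversed(range(m)):
        tail = sum(a[j][k] * coeffs[k] for k in range(j + 1, m))
        coeffs[j] = (b[j] - tail) / a[j][j]

    # Mean squared residual between the frames and the fitted trajectory.
    fitted = [sum(c * t ** k for k, c in enumerate(coeffs)) for t in ts]
    error = sum((y - f) ** 2 for y, f in zip(frames, fitted)) / n
    return coeffs, error
```

In this sketch the coefficients play the role of the segmental feature and the residual plays the role of the trajectory estimation error; how the SFHMM actually combines the two is described in the paper, not here.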

Combination Tandem Architecture with Segmental Features for Robust Speech Recognition

윤영선;이윤근
- 대한음성학회지:말소리 / No. 62 / pp. 113-131 / 2007
Previous studies report that recognition systems based on segmental features perform better than systems based on conventional features. Separately, various studies have combined neural networks and hidden Markov models within a single system, expecting to combine the advantages of both. Building on these studies, the tandem approach was introduced, which uses a neural network as the classifier and hidden Markov models as the decoder. In this paper, we apply the trend information of segmental features to the tandem architecture and use the posterior probabilities output by the neural network as inputs to the recognition system. Experiments are performed on the Aurora database to examine the potential of the trend-feature-based tandem architecture. The results show that the proposed system performs better in very low SNR environments. Consequently, we argue that trend information in a tandem architecture can be used in addition to traditional MFCC features.

Acoustic Analysis for Natural Pronunciation Programs

Lim Un
- 대한음성학회지:말소리 / No. 44 / pp. 1-14 / 2002
Because accuracy and fluency are the essence of English speaking, both are very important in English teacher training and in-service English training programs. To achieve accuracy and fluency, the causes and phenomena of unnatural pronunciation have to be diagnosed. Consequently, the problematic and unnatural pronunciation of Korean elementary and secondary English teachers should be analyzed using acoustic analysis tools such as CSL, Multi-Speech, and Praat. In addition, an attempt was made to pinpoint the causes of unnatural pronunciation. Next, a procedure and steps were proposed for in-service training programs that would cultivate fluency and accuracy. For elementary teachers, unnatural pronunciation was frequent in both segmental and suprasegmental features. Therefore, segmental features should be emphasized at the beginning of pronunciation training courses, followed by suprasegmental features. For secondary teachers, unnatural pronunciation was frequent in suprasegmental features. Therefore, segmental and suprasegmental features have to be addressed at the same time. In other words, word-level features should be the first focus for elementary English teachers, while word-level features and features beyond the word level should be trained together for secondary English teachers.

Component dynamics in miscible polymer blends: A review of recent findings

Watanabe, Hiroshi;Urakawa, Osamu
- Korea-Australia Rheology Journal / Vol. 21, No. 4 / pp. 235-244 / 2009
Miscible polymer blends still have heterogeneity in their component chain concentration in the segmental length scale because of the chain connectivity (that results in the self-concentration of the segments of respective chains) as well as the dynamic fluctuation over various length scales. As a result, the blend components feel different dynamic environments to exhibit different temperature dependence in their segmental relaxation rates. This type of dynamic heterogeneity often results in a broad glass transition (sometimes seen as two separate transitions), a broad distribution of the local (segmental) relaxation modes, and the thermo-rheological complexity of this distribution. Furthermore, the dynamic heterogeneity also affects the global dynamics in the miscible blends if the component chains therein have a large dynamic asymmetry. Thus, the superficially simple miscible blends exhibit interesting dynamic behavior. This article gives a brief summary of the features of the segmental and global dynamics in those blends.

Unilateral segmental odontomaxillary hypoplasia: an unusual case report

Pandey, Sushma;Pai, Keerthilatha M.;Nayak, Ajay G.;Vineetha, Ravindranath
- Imaging Science in Dentistry / Vol. 41, No. 1 / pp. 39-42 / 2011
Facial asymmetry is not an uncommon occurrence in day to day dental practice. It can be caused by various etiologic factors ranging from facial trauma to serious hereditary conditions. Here, we report a rare case of non-syndromic facial asymmetry in a young female, who was born with this condition but was not aware of the progression of asymmetry. No relevant family history was recognized. She was also deficient in both deciduous and permanent teeth in the corresponding region of maxilla. Hence, the cause of this asymmetry was believed to be a segmental odontomaxillary hypoplasia of left maxilla accompanied by agenesis of left maxillary premolars and molars and disuse atrophy of corresponding facial musculature. This report briefly discussed the comparative features of segmental odontomaxillary hypoplasia, hemimaxillofacial dysplasia, and segmental odontomaxillary dysplasia and justified the differences between segmental odontomaxillary hypoplasia and the other two conditions.
https://doi.org/10.5624/isd.2011.41.1.39

Synthesis and Evaluation of Prosodically Exaggerated Utterances

윤규철
- 말소리와 음성과학 / Vol. 1, No. 3 / pp. 73-85 / 2009
This paper introduces the technique of synthesizing and evaluating human utterances with exaggerated or atypical prosody. Prosody exaggeration can be implemented by manipulating either the fundamental frequency (F0) contour, the segmental durations, or the intensity contour of an utterance. Of these three prosodic elements, two or more can be exaggerated at the same time. The algorithms of synthesis and evaluation were suggested. Learner utterances exaggerated in each of the three prosodic features were evaluated with respect to their original native versions in terms of the differences in their F0 contours, the segmental durations, and the intensity contours. The measure of differences was the Euclidean distance metric between the matching points in their F0 and intensity contours. The measure was calculated after the exaggerated learner utterances were aligned by the segments and rendered identical to their native version in terms of their segmental durations. For the evaluation of the segmental durations, no prior modifications were made in durations and the same measure was used. The results from the pilot experiment suggest the viability of this measure in the evaluation of learner utterances with atypical prosody with respect to their native versions.
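
The distance measure mentioned in the abstract can be sketched as a plain Euclidean distance over matching contour points. This is a minimal illustration, assuming the two contours have already been aligned and sampled at the same number of points; the function name `contour_distance` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def contour_distance(learner, native):
    """Euclidean distance between two contours sampled at matching points.

    learner, native: equal-length sequences of floats (for example F0 in Hz
    or intensity in dB), assumed to be time-aligned beforehand.
    """
    if len(learner) != len(native):
        raise ValueError("contours must have the same number of points")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(learner, native)))


# Hypothetical F0 samples (Hz) for a learner and a native utterance.
learner_f0 = [180.0, 200.0, 190.0]
native_f0 = [183.0, 204.0, 190.0]
print(contour_distance(learner_f0, native_f0))
```

A smaller value would indicate that the learner contour lies closer to the native one under this measure; the same function applies to segmental durations if they are given as matching lists.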

SWAPPING NATIVE AND NON-NATIVE SPEAKERS' PROSODY USING THE PSOLA ALGORITHM

Yoon Kyu-Chul
- 대한음성학회:학술대회논문집 / 대한음성학회 2006 Spring Conference Proceedings / pp. 77-81 / 2006
This paper presents a technique of imposing the prosodic features of a native speaker's utterance onto the same sentence uttered by a non-native speaker. Three acoustic aspects of the prosodic features were considered: the fundamental frequency (F0) contour, segmental durations, and the intensity contour. The fundamental frequency contour and the segmental durations of the native speaker's utterance were imposed on the non-native speaker's utterance by using the PSOLA (pitch-synchronous overlap and add) algorithm [1] implemented in Praat[2]. The intensity contour transfer was also done in Praat. The technique of transferring one or more of these prosodic features was elaborated and its implications in the area of language education were discussed.

The Role of Prosody in Dialect Synthesis and Authentication

Yoon, Kyu-Chul
- 말소리와 음성과학 / Vol. 1, No. 1 / pp. 25-31 / 2009
The purpose of this paper is to examine the viability of synthesizing Masan dialect from Seoul dialect and to examine the role of prosody in the authentication of the synthesized Masan dialect. The synthesis was performed by transferring one or more of the prosodic features of the Masan utterance onto the Seoul utterance. The hypothesis is that, given an utterance composed of phonemes shared by both dialects, the more prosodic features of the Masan utterance are transferred onto the Seoul utterance, the more authentic a Masan utterance it will be judged to be. The prosodic features involved were the fundamental frequency contour, the segmental durations, and the intensity contour. The synthesized Masan utterances were evaluated by thirteen native speakers of Masan dialect. The results showed that the fundamental frequency contour and the segmental durations had main effects on the perceptual shift from Seoul to Masan dialect.

Focal Segmental Glomerulosclerosis in a Child with Prader-Willi Syndrome : A Case of Obesity-associated Focal Segmental Glomerulosclerosis

Cho Hee-Yeon;Chung Dae-Lim;Kang Ju-Hyung;Ha Il-Soo;Cheong Hae-Il;Choi Yong
- Childhood Kidney Diseases / Vol. 8, No. 2 / pp. 244-249 / 2004
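Obesity-associated focal segmental glomerulosclerosis is a disease characterized by obesity, nephrotic-range proteinuria without edema, and glomerular hypertrophy and sclerosis, and it is known to progress to renal failure in many patients. By renal biopsy, we observed focal segmental glomerulosclerosis accompanied by glomerular hypertrophy and mesangial proliferation in a 14-year-old girl with Prader-Willi syndrome who presented with severe proteinuria and hypoalbuminemia without edema. We therefore report that children with Prader-Willi syndrome are also at risk of progression to renal failure due to obesity-associated focal segmental glomerulosclerosis.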

An Implementation of the Baseline Recognizer Using the Segmental K-means Algorithm for the Noisy Speech Recognition Using the Aurora DB

김희근;정용주
- 대한음성학회지:말소리 / No. 57 / pp. 113-122 / 2006
Recently, many studies have addressed speech recognition in noisy environments. In particular, the Aurora DB was built as a common database for comparing various feature extraction schemes. In general, however, the recognition models as well as the features have to be modified for effective noisy speech recognition. Because the structure of the HTK is very complex, the recognition engine is not easy to modify. In this paper, we implement a baseline recognizer based on the segmental K-means algorithm whose performance is comparable to that of the HTK despite its simple implementation.

Search results: 71 (processing time 0.027 s)
