• 제목/요약/키워드: phonetic data

검색결과 200건 처리시간 0.021초

음성학적 지식 기반 변이음 모델을 이용한 가변 어휘 단어 인식기 (Variable Vocabulary Word Recognizer using Phonetic Knowledge-based Allophone Model)

  • 김회린;이항섭
    • 한국음향학회지
    • /
    • 제16권2호
    • /
    • pp.31-35
    • /
    • 1997
  • 본 논문에서는 훈련용 음성 데이터와 무관한 임의의 새로운 어휘를 인식해 낼 수 있는 가변 어휘 단어 인식기 개발에 대하여 기술한다. 가변 어휘 단어 인식기를 구현하기 위해서는, 인식 대상이 될 새로운 어휘를 즉시 발음 사전으로 변환시키는 on-line 발음 사전 생성기가 필요하고, 발음 사전 출력을 가지고 각 단어를 모델링할 수 있는 신뢰성 있는 음소 및 변이음 모델이 필요하다. 이와 같은 신뢰성 있는 음소 및 변이음 모델은 생성시키기 위하여 본 연구에서는, 각 음소의 전후 음소들의 음성학적 자질을 고려하여 3 음소열을 집단화(clustering)하여 변이음을 정의하고 이를 당 연구실이 보유하고 있는 POW(Phonetically Optimized Words) 3,848개 단어에 적용하여 1,548개의 변이음 모델을 생성시켰다. 이를 토대로 가변 어휘 단어 인식기를 구현하고 이를 POW 3,848 DB, PBW 445 DB 및 호텔 예약용 244 단어 DB 등에 적용하여 그 성능을 평가하였다. 평가 결과, POW DB에 대해서는 79.6%, PBW DB에 대해서는 445 단어 사전의 경우 79.4%, 100 단어 사전의 경우 88.9%의 성능을 보여 주었고, 호텔 예약 DB에 대해서는 71.4%의 성능을 보여 주었다.

  • PDF

모방 발화의 음향음성학적 연구(3) -전문 성대 모사자의 자료를 중심으로- (An Acoustic Study on the Voice Imitation(3) - Based on a professional voice imitator′s speech -)

  • 안병섭;박미영
    • 대한음성학회지:말소리
    • /
    • 제52호
    • /
    • pp.1-14
    • /
    • 2004
  • In this study, we investigated acoustic characteristics of imitated utterances by a professional voice imitator, focusing on prosodic properties such as vowel formants and f0 distribution. To see the patterns of a voice imitation by a professional voice imitator, we compared the imitator's voice data with target speakers' voice data. The professional imitator, Mr. Bae produced utterances imitating the former President Kim's, the comedian Choi's, and the singer Bae's voices. Auditorily, the imitator was judged to imitate all the target speakers' voices successfully. However, acoustic examination showed that the imitator was better at imitating the singer Bae's voice in that the imitator's and the singer Bae's voices are more alike with respect to vowel formants and f0 distribution. We infer this is because the imitator's normal voice is very similar to the singer Bae's voice. On the other hand, the imitator's voice data showed that the patterns of vowel formants and f0 distribution found in the imitator's imitation voices of the other two target speakers were different from those of target speakers' voices.

  • PDF

상태레벨 공유를 이용한 MLLR 적응화의 회귀클래스 생성에 관한 연구 (A Study on Regression Class Generation of MLLR Adaptation Using State Level Sharing)

  • 오세진;성우창;김광동;노덕규;송민규;정현열
    • 한국음향학회지
    • /
    • 제22권8호
    • /
    • pp.727-739
    • /
    • 2003
  • 본 논문에서는 HM-Net (Hidden Markov Network)을 다양한 태스크에의 적용과 화자의 특성을 효과적으로 나타내기 위해 HM-Net 음성인식 시스템에 MLLR (Maximum Likelihood Linear Regression) 적응방법을 도입하였으며, HM-Net 학습 알고리즘을 개량하여 회귀클래스 생성방법을 제안한다. 제안방법은 PDT-SSS (Phonetic Decision Tree-based Successive State Splitting)알고리즘의 문맥방향 상태분할에 의한 상태레벨 공유를 이용한 방법이다. 즉, 문맥방향의 각 상태에 적응화자 음성데이터에 포함된 문맥정보를 분할하여 적응화될 음소환경을 결정하는 것이다. 따라서 제안방법은 새로운 화자로부터 문맥정보와 적응화 데이터의 발성 양에 의존하여 결정된 많은 적응 파라미터들을 (평균, 분산) 자유롭게 제어할 수 있게 된다. 제안방법의 유효성을 확인하기 위해 국어공학센터 (KLE) 452 데이터와 항공편 예약관련 (YNU200) 연속음성을 대상으로 인식실험을 수행한 결과, 음소인식, 단어인식, 연속음성인식에 대해서, 평균 34∼37%, 평균 9%, 평균 20%의 성능 향상을 각각 보였다. 또한 적응화 데이터의 양에 따른 인식성능 비교에서 제안방법을 적용한 인식 시스템이 적응 데이터의 양이 적은 경우에도 향상된 인식률을 보여 MLLR 적응방법의 특성을 만족하였다. 따라서 MLLR 적응방법을 도입한 HM-Net 음성인식 시스템에 제안한 회귀클래스 생성방법이 유효함을 확인할 수 있었다.

VoiceXML 기반 음성인식시스템을 이용한 서비스 개발 (The Interactive Voice Services based on VoiceXML)

  • 김학균;김은향;김재인;구명완
    • 대한음성학회지:말소리
    • /
    • 제43호
    • /
    • pp.113-125
    • /
    • 2002
  • As there are needs to search the Web information via wire or wireless telephones, VoiceXML forum was established to develop and promote the Voice eXtensible Markup Language (VoiceXML). VoiceXML simplifies the creation of personalized interactive voice response services on the Web, and allows voice and phone access to information on Web sites, call center databases. Also, it can utilize the Web-based technologies, such as CGI(Common Gateway Interface) scripts. In this paper, we have developed the voice portal service platform based on VoiceXML called TeleGateway. It enables integration of voice services with data services using the Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engines. Also, we have showed the various services on voice portal services.

  • PDF

음성인식을 이용한 자동 호 분류 철도 예약 시스템 (A Train Ticket Reservation Aid System Using Automated Call Routing Technology Based on Speech Recognition)

  • 심유진;김재인;구명완
    • 대한음성학회지:말소리
    • /
    • 제52호
    • /
    • pp.161-169
    • /
    • 2004
  • This paper describes the automated call routing for train ticket reservation aid system based on speech recognition. We focus on the task of automatically routing telephone calls based on user's fluently spoken response instead of touch tone menus in an interactive voice response system. Vector-based call routing algorithm is investigated and mapping table for key term is suggested. Korail database collected by KT is used for call routing experiment. We evaluate call-classification experiments for transcribed text from Korail database. In case of small training data, an average call routing error reduction rate of 14% is observed when mapping table is used.

  • PDF

다 모델 방식과 모델보상을 통한 잡음환경 음성인식 (A Multi-Model Based Noisy Speech Recognition Using the Model Compensation Method)

  • 정용주;곽성우
    • 대한음성학회지:말소리
    • /
    • 제62호
    • /
    • pp.97-112
    • /
    • 2007
  • The speech recognizer in general operates in noisy acoustical environments. Many research works have been done to cope with the acoustical variations. Among them, the multiple-HMM model approach seems to be quite effective compared with the conventional methods. In this paper, we consider a multiple-model approach combined with the model compensation method and investigate the necessary number of the HMM model sets through noisy speech recognition experiments. By using the data-driven Jacobian adaptation for the model compensation, the multiple-model approach with only a few model sets for each noise type could achieve comparable results with the re-training method.

  • PDF

한국어 원거리 음성의 모음의 음향적 특성 (Acoustic Characteristics of Vowels in Korean Distant-Talking Speech)

  • 이숙향;김선희
    • 대한음성학회지:말소리
    • /
    • 제55권
    • /
    • pp.61-76
    • /
    • 2005
  • This paper aims to analyze the acoustic effects of vowels produced in a distant-talking environment. The analysis was performed using a statistical method. The influence of gender and speakers on the variation was also examined. The speech data used in this study consist of 500 distant-talking words and 500 normal words of 10 speakers (5 males and 5 females). Acoustic features selected for the analysis were the duration, the formants (Fl and F2), the fundamental frequency and the total energy. The results showed that the duration, F0, F1 and the total energy increased in the distant-talking speech compared to normal speech; female speakers showed higher increase in all features except for the total energy and the fundamental frequency. In addition, speaker differences were observed.

  • PDF

Annotation of a Non-native English Speech Database by Korean Speakers

  • Kim, Jong-Mi
    • 음성과학
    • /
    • 제9권1호
    • /
    • pp.111-135
    • /
    • 2002
  • An annotation model of a non-native speech database has been devised, wherein English is the target language and Korean is the native language. The proposed annotation model features overt transcription of predictable linguistic information in native speech by the dictionary entry and several predefined types of error specification found in native language transfer. The proposed model is, in that sense, different from other previously explored annotation models in the literature, most of which are based on native speech. The validity of the newly proposed model is revealed in its consistent annotation of 1) salient linguistic features of English, 2) contrastive linguistic features of English and Korean, 3) actual errors reported in the literature, and 4) the newly collected data in this study. The annotation method in this model adopts the widely accepted conventions, Speech Assessment Methods Phonetic Alphabet (SAMPA) and the TOnes and Break Indices (ToBI). In the proposed annotation model, SAMPA is exclusively employed for segmental transcription and ToBI for prosodic transcription. The annotation of non-native speech is used to assess speaking ability for English as Foreign Language (EFL) learners.

  • PDF

성별 및 연령에 따른 비음치 비교 (Age and Sex Differences in Nasalance Scores)

  • 김민정;임성은;최홍식
    • 대한후두음성언어의학회지
    • /
    • 제11권2호
    • /
    • pp.141-145
    • /
    • 2000
  • Background and Objectives : The nasalance score measured by Nasometer is a supplementary data for the perceptually rated nasality by a trained speech pathologist. The nasalance score varies with subjects. The objective of the present study was to examine whether there are differences in nasalance scores as a function of age and sex. Materials and Method : This study used 20 normal chidren aged from 3 to 8 and 40 normal adults aged from 21 to 37(male : female= 1 : 1) as subjects. The nasalance scores were analyzed in 3 different phonetic contests(nasal, /i/ vowel,/a/ vowel) and 4 different sentence lengths(1, 2, 4, 8 syllable). Results : The children had significantly higher nasalance scores in short sentences an the adults. The female subjects had significantly higher nasalance scores in nasal sentences and in short sentences than the male subjects. Conclusion : These results may indicate that sex and age differences should be considered in the interpretation of the nasalance score in nasal sentences or in short sentences.

  • PDF

N-gram 기반의 유사도를 이용한 대화체 연속 음성 언어 모델링 (Spontaneous Speech Language Modeling using N-gram based Similarity)

  • 박영희;정민화
    • 대한음성학회지:말소리
    • /
    • 제46호
    • /
    • pp.117-126
    • /
    • 2003
  • This paper presents our language model adaptation for Korean spontaneous speech recognition. Korean spontaneous speech is observed various characteristics of content and style such as filled pauses, word omission, and contraction as compared with the written text corpus. Our approaches focus on improving the estimation of domain-dependent n-gram models by relevance weighting out-of-domain text data, where style is represented by n-gram based tf/sup */idf similarity. In addition to relevance weighting, we use disfluencies as Predictor to the neighboring words. The best result reduces 9.7% word error rate relatively and shows that n-gram based relevance weighting reflects style difference greatly and disfluencies are good predictor also.

  • PDF