• Title/Abstract/Keywords: phonetic system

313 search results, processing time 0.019 s

Effective Noise Reduction in Mobile Communication Environment using Adaptive Comb Filtering

  • 박정식;정규준;오영환
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 May 2003 Conference Proceedings
    • /
    • pp.203-206
    • /
    • 2003
  • In this paper, we employ adaptive comb filtering for effective noise reduction in a mobile communication environment. Adaptive comb filtering is a well-known noise reduction method, but it requires the correct pitch period and must be applied only to voiced speech frames. To satisfy these requirements, we use two kinds of information extracted from speech packets: the pitch period, which is measured precisely by the speech coder, and the frame rate, which is related to the decision on whether a frame is speech or silence. Experiments on a speech recognition system confirm the efficiency of this method. Feature parameters obtained with this method give better performance in noisy environments than those extracted directly from the output speech.
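
The idea in this abstract can be illustrated with a minimal sketch. It assumes a fixed three-tap comb filter rather than the authors' adaptive design, and the function name `comb_filter`, the tap weights, and the frame tuple layout are hypothetical. Each voiced frame is averaged with copies of the signal shifted by one pitch period, while frames without a pitch period (unvoiced or silence) pass through unchanged.

```python
import math


def comb_filter(signal, frames, weights=(0.25, 0.5, 0.25)):
    """Apply a simple comb filter to the voiced frames of a signal.

    signal  : list of float samples
    frames  : list of (start, end, pitch_period) tuples; pitch_period is in
              samples, or None/0 for unvoiced or silence frames (assumed layout)
    weights : tap weights for the shifts -K..K pitch periods (assumed values)
    """
    half = len(weights) // 2
    out = list(signal)
    for start, end, period in frames:
        if not period:
            # Unvoiced or silence frame: leave the samples untouched.
            continue
        for n in range(start, end):
            acc = 0.0
            norm = 0.0
            for k, w in enumerate(weights):
                idx = n + (k - half) * period
                # Skip taps that fall outside the signal and renormalize.
                if 0 <= idx < len(signal):
                    acc += w * signal[idx]
                    norm += w
            out[n] = acc / norm
    return out


# A periodic "voiced" signal with period 8, plus a disturbance that flips
# sign every pitch period and is therefore attenuated by the averaging.
clean = [math.sin(2 * math.pi * n / 8) for n in range(64)]
noisy = [c + 0.3 * (-1) ** (n // 8) for n, c in enumerate(clean)]
filtered = comb_filter(noisy, [(0, 64, 8)])
```

In this sketch the pitch period is passed in directly, which mirrors the abstract's point that the period is taken from the speech coder rather than re-estimated from noisy speech.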

Rich Transcription Generation Using Automatic Insertion of Punctuation Marks

  • 김지환
    • 대한음성학회지:말소리
    • /
    • No. 61
    • /
    • pp.87-100
    • /
    • 2007
  • A punctuation generation system that combines prosodic information with acoustic and language model information is presented. Experiments were first conducted on reference text transcriptions. In these experiments, prosodic information was shown to be more useful than language model information. When these information sources were combined, an F-measure of up to 0.7830 was obtained for adding punctuation to a reference transcription. This method of punctuation generation can also be applied to the 1-best output of a speech recogniser. The 1-best output is first time-aligned, and prosodic features are generated from the time alignment information. As in the approach applied to reference transcriptions, the best sequence of punctuation marks for this 1-best output is found using the prosodic feature model and a language model trained on texts that contain punctuation marks.
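
As a rough illustration of the metric quoted above, the sketch below computes precision, recall and a balanced F-measure over punctuation marks. The abstract does not state the exact scoring rules, so the balanced F1 definition, the `(word_index, mark)` representation and the function name `punctuation_f_measure` are assumptions.

```python
def punctuation_f_measure(reference, hypothesis):
    """Return (precision, recall, f_measure) for punctuation placement.

    reference, hypothesis: iterables of (word_index, mark) pairs, where a
    pair means "this mark follows the word at word_index" (assumed format).
    A hypothesis mark counts as correct only if both position and mark match.
    """
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)
    precision = correct / len(hyp) if hyp else 0.0
    recall = correct / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


reference = [(3, "."), (7, ","), (12, ".")]
hypothesis = [(3, "."), (7, "."), (12, "."), (15, ",")]
precision, recall, f_measure = punctuation_f_measure(reference, hypothesis)
```

Here two of the four hypothesized marks match the reference exactly, so precision is 2/4 and recall is 2/3.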

A Study on the Context-dependent Speaker Recognition Adopting the Method of Weighting the Frame-based Likelihood Using SNR

  • 최홍섭
    • 대한음성학회지:말소리
    • /
    • No. 61
    • /
    • pp.113-123
    • /
    • 2007
  • Environmental differences between training and testing are generally considered the critical factor in the performance degradation of speaker recognition systems. In particular, speaker recognition systems generally try to use speech that is as clean as possible to train the speaker model, but speech in the real testing phase is not clean because of environmental and channel noise. To cope with this problem, this paper proposes a new method of weighting the frame-based likelihood according to the frame SNR. The method makes use of the strong correlation between speech SNR and the speaker discrimination rate. To verify its usefulness, the proposed method is applied to a context-dependent speaker identification system. Experimental results on the cellular phone speech DB designed by ETRI for Korean speaker recognition show that the proposed method is effective and increases identification accuracy by up to 11%.
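
The weighting idea can be sketched as follows. The abstract does not give the weighting function, so the sigmoid mapping from SNR to weight, its `midpoint_db` and `slope` parameters, and all function names here are assumptions chosen only to show how high-SNR frames can count more than low-SNR frames when scoring a speaker.

```python
import math


def snr_weight(snr_db, midpoint_db=10.0, slope=0.3):
    """Map a frame SNR in dB to a weight in (0, 1); assumed sigmoid form."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def weighted_score(frame_log_likelihoods, frame_snrs_db):
    """SNR-weighted average of per-frame log-likelihoods for one speaker."""
    weights = [snr_weight(s) for s in frame_snrs_db]
    total = sum(weights)
    return sum(w * ll for w, ll in zip(weights, frame_log_likelihoods)) / total


def identify(log_likelihoods_by_speaker, frame_snrs_db):
    """Return the speaker whose model has the highest weighted score."""
    return max(
        log_likelihoods_by_speaker,
        key=lambda spk: weighted_score(log_likelihoods_by_speaker[spk], frame_snrs_db),
    )


# Two clean frames (25 dB) favour speaker A; two noisy frames (0 dB) favour B.
frame_snrs_db = [25.0, 25.0, 0.0, 0.0]
log_likelihoods = {
    "A": [-1.0, -1.0, -6.0, -6.0],
    "B": [-3.0, -3.0, -3.5, -3.5],
}
best = identify(log_likelihoods, frame_snrs_db)
```

With equal weights the summed log-likelihoods would favour speaker B (-13 against -14); with the SNR weights the clean frames dominate and speaker A is selected.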

An Acoustic Study on the Generational Difference of the Monophthongs in the Daegu Dialect

  • 장혜진;신지영
    • 대한음성학회지:말소리
    • /
    • No. 57
    • /
    • pp.15-30
    • /
    • 2006
  • This paper investigates differences between generations in the vowel system of the Daegu dialect in terms of F1 and F2 of the monophthongs. Three groups of subjects participated in the study: 20 female native speakers of the Daegu dialect (10 in their 20s and 10 in their 40s) and 10 female native speakers of the Seoul dialect as a control group. It has been assumed that the Daegu dialect has six vowels. However, the younger generation appears to have seven vowels, unlike the older generation. The results show that the vowel system of the Daegu dialect differs between generations: speakers in their 40s have six vowels and speakers in their 20s have seven. These differences seem to be attributable to the influence of the Seoul dialect.

Robust Histogram Equalization Using Compensated Probability Distribution

  • Kim, Sung-Tak;Kim, Hoi-Rin
    • 대한음성학회지:말소리
    • /
    • Vol. 55
    • /
    • pp.131-142
    • /
    • 2005
  • A mismatch between training and test conditions often causes a drastic decrease in the performance of speech recognition systems. In this paper, non-linear transformation techniques based on histogram equalization in the acoustic feature space are studied for reducing this mismatch. The purpose of histogram equalization (HEQ) is to convert the probability distribution of test speech into the probability distribution of training speech. Conventional histogram equalization methods consider only the probability distribution of the test speech, but for noise-corrupted test speech this probability distribution is itself distorted. A transformation function obtained from the distorted probability distribution may mis-transform feature vectors, which degrades the performance of histogram equalization. Therefore, this paper proposes a new method of calculating a noise-removed probability distribution, under the assumption that the CDF of noisy speech feature vectors consists of a speech feature component and a noise feature component, and this compensated probability distribution is used in the HEQ process. In the AURORA-2 framework, the proposed method reduced the error rate by over 44% in the clean training condition compared to the baseline system. In the multi-condition training condition, the proposed method also outperforms the baseline system.
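
The conventional HEQ step described in this abstract can be sketched for a single feature dimension as below. This shows only the baseline CDF matching, not the paper's compensated distribution, and the function name `equalize` and the use of empirical (rank-based) CDFs are assumptions.

```python
import bisect


def equalize(test_values, train_values):
    """Map each test value onto the training distribution by CDF matching.

    For a test value x, its empirical CDF value in the test data is
    rank / n_test. The output is the training value at the same quantile,
    i.e. the inverse training CDF evaluated at that probability.
    """
    test_sorted = sorted(test_values)
    train_sorted = sorted(train_values)
    n_test, n_train = len(test_sorted), len(train_sorted)
    equalized = []
    for x in test_values:
        # Number of test values <= x, so rank is between 1 and n_test.
        rank = bisect.bisect_right(test_sorted, x)
        # Integer form of ceil(rank * n_train / n_test) - 1.
        idx = (rank * n_train + n_test - 1) // n_test - 1
        equalized.append(train_sorted[idx])
    return equalized


# Test features with a shifted and scaled distribution are mapped back
# onto the range of the training features, preserving their order.
train = [0.0, 1.0, 2.0, 3.0, 4.0]
test = [10.0, 30.0, 20.0, 50.0, 40.0]
mapped = equalize(test, train)
```

Because the mapping depends only on the test CDF, any noise-induced distortion of that CDF carries over into the transformation, which is the weakness the abstract sets out to address.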

Chinese Prosody Generation Based on C-ToBI Representation for Text-to-Speech

  • 김승원 (Seungwon Kim);정옥 (Yu Zheng);이근배 (Gary Geunbae Lee);김병창 (Byeongchang Kim)
    • 대한음성학회지:말소리
    • /
    • No. 53
    • /
    • pp.75-92
    • /
    • 2005
  • Prosody modeling is critical in developing text-to-speech (TTS) systems, where speech synthesis is used to automatically generate natural speech. In this paper, we present a prosody generation architecture based on the Chinese Tone and Break Index (C-ToBI) representation. ToBI is a multi-tier representation system based on linguistic knowledge for transcribing events in an utterance. TTS systems that adopt ToBI as an intermediate representation are known to exhibit higher flexibility, modularity and domain/task portability than TTS systems with direct prosody generation. However, corpus preparation is very expensive for practical-level performance, because ToBI-labeled corpora have been constructed manually by many prosody experts and accurate statistical prosody modeling normally requires a large amount of data. This paper proposes a new method that transcribes C-ToBI labels automatically in Chinese speech. We model Chinese prosody generation as a classification problem and apply conditional Maximum Entropy (ME) classification to it. We empirically verify the usefulness of various natural language and phonology features in order to build well-integrated features for the ME framework.

A Multimodal Interface for Telematics based on Multimodal middleware

  • 박성찬;안세열;박성수;구명완
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2007 Joint Conference with 한국음성과학회, Proceedings
    • /
    • pp.41-44
    • /
    • 2007
  • In this paper, we introduce a system in which a multimodal interface based on multimodal middleware is plugged into a car navigation scenario. In a map-based system, the combination of speech and pen input/output modalities can offer users better expressive power. To achieve multimodal tasks in car environments, we have chosen SCXML (State Chart XML), a W3C-standard multimodal authoring language, to control modality components such as XHTML, VoiceXML and GPS. In the Network Manager, GPS signals from the navigation software are converted to the EMMA meta language and sent to the MultiModal Interaction Runtime Framework (MMI). MMI not only handles GPS signals and the user's multimodal I/O but also combines them with device information, user preferences and reasoned RDF to give the user intelligent or personalized services. A self-simulation test has shown that the middleware accomplishes a navigational multimodal task for multiple users in car environments.

Acquisition of English Voiced Stop in Word Initial Position : Correlation with Vowel Height

  • Yoon, Su-yeon;Seo, Min-kyong;Song, Yoon-Kyoung
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 July 2000 Conference Proceedings
    • /
    • pp.199-199
    • /
    • 2000
  • Korean stops form a three-way system (aspirated, fortis, lenis), whereas English stops form a two-way system (voiced, voiceless). Because the Korean lenis stop is realized as a slightly aspirated voiceless stop, Korean speakers are likely to produce English word-initial voiced stops as voiceless stops. We divide subjects into three groups (native, experienced, inexperienced) and investigate differences between the groups. The VOT of the experienced group is the same as that of the native group, but the VOT of the inexperienced group is longer, 1.8 times that of the native group. We also examine whether the height of the following vowel influences the VOT of the initial stop. For all groups, VOT before a low vowel is shorter than VOT before a high vowel, but this tendency is more salient in the inexperienced group. Before a high vowel, the VOT of the inexperienced group is 2.05 times that of the native group, whereas before a low vowel it is only 1.55 times. Inexperienced speakers thus pronounce English word-initial voiced stops better before a low vowel than before a high vowel. Samples are divided into two groups according to the type of coda consonant (nasal and voiceless stop), but the average VOT is similar and there is no significant difference between the two groups, so the type of coda consonant has no influence. The average for phrases is compared with the average for isolated words. For the native and experienced groups, there is no significant difference between phrases and words, but for the inexperienced group, VOT in phrases is shorter than in isolated words. Even so, the VOT of the inexperienced group remains longer than that of the native group.

Implementation of the Automatic Segmentation and Labeling System

  • 성종모;김형순
    • 한국음향학회지
    • /
    • Vol. 16, No. 5
    • /
    • pp.50-59
    • /
    • 1997
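  • In this paper, we implement an automatic speech segmentation and labeling system that automatically extracts phoneme boundaries for building a Korean speech database. The system is built on existing speech segmentation and labeling techniques, and a graphical user interface was developed in the Hangul Motif environment so that users can inspect the automatically segmented phoneme boundaries and easily correct them. The system targets speech sampled at 16 kHz, and the labeling units consist of 45 phoneme-like units and one silence unit. Phonemic transcription and orthographic transcription are used as input methods for linguistic information, and hidden Markov models (HMMs) are used for pattern matching. Each phoneme model was trained on a phonetically balanced 445-word database that had been manually segmented into phoneme units. To evaluate the system, automatic segmentation experiments were carried out on a sentence database not used in training. In the results, 74.7% of the boundaries fell within 20 ms of the manually segmented phoneme boundary positions, and 92.8% fell within 40 ms.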
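
The evaluation figures above are tolerance-based boundary accuracies. A minimal sketch of how such a figure could be computed is shown below; it assumes the automatic and manual boundaries are already paired one-to-one and given in milliseconds, and the function name `boundary_accuracy` and the sample values are hypothetical.

```python
def boundary_accuracy(auto_ms, manual_ms, tolerance_ms):
    """Fraction of automatic boundaries within tolerance of the manual ones.

    auto_ms, manual_ms: equal-length lists of boundary times in milliseconds,
    where auto_ms[i] and manual_ms[i] mark the same phoneme boundary.
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must be paired one-to-one")
    if not manual_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tolerance_ms)
    return hits / len(manual_ms)


# Illustrative values only: absolute errors are 5, 25, 5 and 80 ms.
manual = [100.0, 250.0, 400.0, 620.0]
auto = [105.0, 275.0, 395.0, 700.0]
within_20 = boundary_accuracy(auto, manual, 20.0)
within_40 = boundary_accuracy(auto, manual, 40.0)
```

Reporting the same measure at two tolerances, as the abstract does with 20 ms and 40 ms, shows both how often the boundaries are close and how often they are at least roughly right.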

The Formation and Alternation of Sino-Korean Pronunciation

  • 정광
    • 인문언어
    • /
    • Vol. 7
    • /
    • pp.31-56
    • /
    • 2005
  • In most Asian areas, Chinese writing and characters were used as the sole means of recording. The circumstances surrounding this writing system can be explained in two ways. First, the peoples living around Chinese territory spoke languages of a different type from Chinese (agglutinative rather than isolating) and had not established independent writing systems of their own. Second, these peoples accepted the Chinese writing system through frequent cultural and economic exchange with the Chinese people. The same linguistic situation was pervasive on the Korean peninsula. The native inhabitants, having learned to use Chinese writing, applied their knowledge of this second language to record matters related to the management of the country. However, the grammatical structures of Chinese writing and the native language contrasted sharply, so the people of the peninsula had to manage a particular writing system marked by a discrepancy between language and writing. This difference had a huge influence on the way Chinese writing and characters were used in Korea. Some scholars of Korean historical linguistics regard the alternation of the Chinese writing system and characters as "the procedure of nativization", in which a large phonological gap arose between the characters that flowed into Korean and the same characters as they continued to be used in China. The method of reading Chinese characters came to be called Sino-Korean Pronunciation. In the categorization of pronunciations of Chinese characters, Sino-Korean Pronunciation was also categorized as the Eastern Pronunciation (東音). It denotes the sound of Chinese characters as historically adapted to the phonological system of the Korean language. The main point of this paper is to survey the process by which Chinese writing and characters were received, and the process by which the Korean phonetic features of Chinese writing and characters were established and altered.
