Integrated Search | Korea Science

MPEG-4 TTS: Current Status and Prospects

한민수
- 전자공학회지 / Vol. 24, No. 9 / pp. 91-98 / 1997
Text-to-Speech (TTS) technology has attracted considerable interest among speech engineers because of its potential applications, such as talking computers, spoken emergency alarm systems, and speech output devices for the speech-impaired. Researchers have therefore made significant progress in speech synthesis techniques for their own languages, and as a result the quality of current speech synthesizers is believed to be acceptable to ordinary users. These are some of the reasons why the MPEG group decided to include TTS technology as one of its MPEG-4 functionalities. ETRI has made major contributions to the current MPEG-4 TTS as it appears in various MPEG-4 documents, with relatively minor contributions from AT&T and NW. The main MPEG-4 TTS functionalities presently available are: 1) use of original prosody for synthesized speech output, 2) trick mode functions for general users without breaking synthesized speech prosody, 3) interoperability with Facial Animation (FA) tools, and 4) dubbing a moving/animated picture with lip-shape pattern information.

Corpus-Based Unrestricted Vocabulary Mandarin TTS

하주홍; 김병창; 이근배
- 대한음성학회: Conference Proceedings / 대한음성학회 October 2003 Conference Proceedings / pp. 175-179 / 2003
To produce high-quality synthesized speech (in both intelligibility and naturalness), accurate grapheme-to-phoneme conversion and an accurate prosody model are essential. In this paper, we analyze Chinese text using word segmentation, POS tagging, and unknown word recognition. We present a grapheme-to-phoneme conversion method that combines dictionary-based and rule-based approaches. We construct a prosody model using a probabilistic method and a decision tree-based error correction method. Based on the results of this analysis, we can successfully select and concatenate the exact syllable synthesis units from the Chinese Synthesis DB.
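The combination of dictionary-based and rule-based grapheme-to-phoneme conversion described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the lexicon, the per-character default readings, and the names `LEXICON`, `CHAR_DEFAULTS`, and `g2p` are all hypothetical, and the "rule" here is reduced to a per-character fallback for words missing from the dictionary.

```python
# Hypothetical toy lexicon: segmented word -> pinyin syllables with tone numbers.
LEXICON = {
    "你好": ["ni3", "hao3"],
    "银行": ["yin2", "hang2"],  # 行 is read "hang2" inside this word
}

# Hypothetical fallback rule: default reading for each single character.
CHAR_DEFAULTS = {
    "你": "ni3",
    "好": "hao3",
    "银": "yin2",
    "行": "xing2",
    "不": "bu4",
}


def g2p(words):
    """Convert already-segmented words to syllables.

    Dictionary lookup is tried first; words not in the lexicon fall back
    to per-character default readings, with "<unk>" for unknown characters.
    """
    syllables = []
    for word in words:
        if word in LEXICON:
            syllables.extend(LEXICON[word])
        else:
            syllables.extend(CHAR_DEFAULTS.get(ch, "<unk>") for ch in word)
    return syllables


if __name__ == "__main__":
    print(g2p(["银行", "不", "行"]))
```

The dictionary entry lets the polyphonic character 行 receive its in-word reading, while the same character in isolation falls back to its default reading.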

'Because of Doing' and 'Because of Happening': A Corpus-based Analysis of Korean Causal Conjunctives, -nula(ko) and -nun palamey

Oh, Sang-Suk
- 한국언어정보학회지: 언어와정보 / Vol. 8, No. 2 / pp. 131-147 / 2004
the two Korean causal conjunctive suffixes, -nula(ko) and -nun palamey, based on corpus linguistic analysis. Many of the linguistic accounts available, both in pedagogical reference and in the literature on linguistics, provide incomplete analyses of these suffixes, based on fabricated linguistic data. Using naturally occurring, real linguistic data, this paper examines the syntactic and semantic structures of the two causal suffixes through a consideration of three areas of corpus linguistic analysis: token frequencies, collocations, and semantic prosody. An analysis based on concordance data reveals that the two causal connectives, -nula(ko) and -nun palamey, have more differences than similarities in terms of syntactic and semantic constraints. The idiosyncratic structures of the two suffixes are discussed in terms of same subject condition, verb selection, same agent condition, synchronicity condition, and negative semantic prosody.

Automatic Synthesis Method Using Prosody-Rich Database

김상훈
- 한국음향학회: Conference Proceedings / 한국음향학회 1998 15th Workshop on Speech Communication and Signal Processing (KSCSP 98, Vol. 15, No. 1) / pp. 87-92 / 1998
In general, synthesis unit databases have been constructed by recording isolated words. In that case, each word boundary carries a typical prosodic pattern, such as falling intonation or pre-boundary lengthening. To obtain natural synthetic speech from this kind of database, the original speech must be artificially modified. However, this artificial processing instead results in unnatural, unintelligible synthetic speech because of excessive prosodic modification of the speech signal. To overcome these problems, we gathered thousands of sentences for the synthesis database. To build phone-level synthesis units, we trained a speech recognizer on the recorded speech and then segmented phone boundaries automatically. In addition, we used a laryngograph for epoch detection. From the automatically generated synthesis database, we chose the best phone and concatenated it directly without any prosody processing. To select the best phone among multiple candidates, we used prosodic information such as the break strength of word boundaries, phonetic context, cepstrum, pitch, energy, and phone duration. A pilot test yielded some positive results.
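Selecting the best phone among multiple candidates, as this abstract describes, can be sketched as minimizing a weighted distance to a target specification. This is an illustrative simplification, not the author's method: it uses only pitch, energy, and duration (omitting break strength, phonetic context, and cepstrum), and the `Unit` class, the `WEIGHTS` values, and the function names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    phone: str
    pitch_hz: float
    energy_db: float
    duration_ms: float


# Assumed feature weights; a real system would tune these.
WEIGHTS = {"pitch_hz": 1.0, "energy_db": 1.0, "duration_ms": 0.5}


def target_cost(candidate, target):
    """Weighted sum of absolute feature differences between candidate and target."""
    return sum(
        weight * abs(getattr(candidate, feature) - getattr(target, feature))
        for feature, weight in WEIGHTS.items()
    )


def select_unit(candidates, target):
    """Return the candidate with the same phone label and the lowest target cost."""
    matching = [c for c in candidates if c.phone == target.phone]
    if not matching:
        raise ValueError(f"no candidate for phone {target.phone!r}")
    return min(matching, key=lambda c: target_cost(c, target))


if __name__ == "__main__":
    database = [
        Unit("a", 120.0, 60.0, 80.0),
        Unit("a", 200.0, 65.0, 120.0),
        Unit("o", 120.0, 60.0, 80.0),
    ]
    wanted = Unit("a", 125.0, 61.0, 90.0)
    print(select_unit(database, wanted))
```

Because the chosen unit is already close to the desired prosody, it can be concatenated without further signal modification, which is the motivation stated in the abstract.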

MPEG-4 TTS (Text-to-Speech)

한민수
- 대한전자공학회: Conference Proceedings / 대한전자공학회 1999 Summer Conference Proceedings / pp. 699-707 / 1999
It cannot be disputed that speech is the most natural interface between humans and machines. To realize acceptable speech interfaces, highly advanced speech recognizers and synthesizers are indispensable. Text-to-Speech (TTS) technology has attracted considerable interest among speech engineers because of its potential applications, such as talking computers, spoken emergency alarm systems, and speech output devices for the speech-impaired. Researchers have therefore made significant progress in speech synthesis techniques for their own languages, and as a result the quality of currently available speech synthesizers is believed to be acceptable to ordinary users. These are some of the reasons why the MPEG group decided to include TTS technology as one of its MPEG-4 functionalities. ETRI has made major contributions to the current MPEG-4 TTS among the various MPEG-4 functionalities. The TTS functionalities are: 1) use of original prosody for synthesized speech output, 2) trick mode functions for general users without breaking synthesized speech prosody, 3) interoperability with Facial Animation (FA) tools, and 4) dubbing a moving/animated picture with lip-shape pattern information.

Korean English Learners' Prosodic Disambiguation in English Relative Clause Attachment

전윤실; 신지영; 김기호
- 대한음성학회: Conference Proceedings / 대한음성학회 Spring 2006 Conference Proceedings / pp. 67-70 / 2006
Prosody can be used to resolve the syntactic ambiguity of a sentence. An English relative clause construction with a complex NP (the sequence N1, N2, and RC) is syntactically ambiguous: the clause can be interpreted as modifying N1 (high attachment) or N2 (low attachment). Speakers and listeners can disambiguate such sentences based on prosody. In this paper, we investigate Korean English learners' production of the prosodic structure of English relative clause constructions. The production experiment shows that beginner learners frequently use phrasing, whereas advanced learners depend on both phrasing and accent. One characteristic of Korean English learners' intonation is that the Korean accentual phrase tone pattern LHa is transferred to their production.

Development of an Algorithm for the Control of Prosodic Factors to Synthesize Unlimited Isolated Words in the Time Domain

강찬희
- 전자공학회논문지C / Vol. 35C, No. 7 / pp. 59-68 / 1998
This paper develops an algorithm for unlimited Korean speech synthesis. We present the results of controlling prosodic factors in the time domain, using isolated words as the basic synthesis unit. Using a new pitch-synchronous, parametric speech synthesis method in the time domain, we mainly present the results of controlling prosodic factors such as pitch periods, energy envelopes, and durations, together with an evaluation of synthetic speech quality. In synthesis, connected words can be synthesized by controlling a continuous, unified prosody, which improves naturalness. The experimental results also show reduced pitch discontinuities and less zeroing of energy at the junctions of speech waveforms. In particular, it becomes possible to synthesize speech with unlimited durations and tones. The method also improves noisiness and clarity by reducing the degradation caused by phase distortion due to discontinuities at waveform junctions.

Considering Dynamic Non-Segmental Phonetics

Fujino, Yoshinari
- 대한음성학회: Conference Proceedings / 대한음성학회 July 2000 Conference Proceedings / pp. 312-320 / 2000
This presentation aims to explore some possibilities of non-segmental phonetics, which is usually ignored in phonetics education. In pedagogical phonetics, especially ESL/EFL-oriented phonetics, speech sounds tend to be classified under two criteria: 1) 'pronunciation', which deals with segments, and 2) 'prosody' or 'suprasegmentals', which deals with non-segmental elements such as stress and intonation. However, speech involves more dynamic processing. It is non-linear and multi-dimensional in spite of the linear sequence of symbols in phonetic/phonological transcriptions. No word is without pitch or voice quality apart from its segmental characteristics, whether it is spoken in isolation or cut out from continuous speech. This shows that the dichotomy of pronunciation and prosody is merely a useful convention, and that there is room to consider dynamic non-segmental phonetics. Examples of non-segmental phonetic investigation are examined, including some analyses conducted within the framework of Firthian Prosodic Analysis, especially of the relation between vowel variants and foot types, and we see what kind of auditory phonetic training is required to understand the impressionistic transcriptions that lie behind non-segmental phonetics.

The Application of the Bodysonic System to L2 Learning

Suzuki, Kaoru
- 대한음성학회: Conference Proceedings / 대한음성학회 July 2000 Conference Proceedings / pp. 96-104 / 2000
The Bodysonic system was invented on the basis of 'Bone Conduction Theory,' which states that people feel sounds with their whole body. The Bodysonic system is used for L2 (English) learning at Aichi Women's Junior College. In recent years we have developed some unique methodology related to the use of the Bodysonic system. In Japan it is difficult for adult L2 learners to acquire the prosody of a foreign language. A language laboratory using the Bodysonic system has been suggested as one way to overcome such adult L2 problems. The Bodysonic system changes sounds into vibrations. It makes it easier for learners to acquire the prosody of a foreign language because humans can convey information through their tactile organs. In addition, this system was originally designed to help people relax, so it can also help minimize learner anxiety. The effect of Bodysonic vibrations on language learning has already been demonstrated in some experiments. The Bodysonic system appears to be an ideal teaching method for adults learning a foreign language.

Grapheme-to-Phoneme Conversion and Prosody Modeling for Korean Conversational Style TTS

이진식; 김승원; 김병창; 이근배
- 대한음성학회: Conference Proceedings / 대한음성학회 Fall 2006 Conference Proceedings / pp. 135-138 / 2006
In this paper, we introduce a method for extracting grapheme-to-phoneme conversion rules from the transcriptions of a speech synthesis database, and a prosody modeling method using the light version of ToBI, for a Korean conversational style TTS. We focused on representing the characteristics of the conversational speech style, and the experimental results show that our proposed methods are suitable for developing a Korean conversational style TTS.