• Title/Abstract/Keywords: prosody pattern

Search results: 30 (processing time 0.019 s)

Perception of sentences varying with prosody pattern, sound intensity, and signal-to-noise ratio

  • 장선아;장은주;장재진
    • 말소리와 음성과학 / Vol. 9, No. 2 / pp.119-124 / 2017
  • This study investigates how the perception of easy sentences varies with prosody pattern, sound intensity, and signal-to-noise ratio (SNR) in young adults in their 20s with normal hearing. The results showed that a proper prosody pattern increased the correct perception rate of the target sentences, and that the lower the intensity and SNR, the lower the sentence perception scores. SNR had a greater effect on the sentence perception scores than sound intensity. Perception scores decreased significantly starting at 15 dB and +3 SNR for sentences with a prosody pattern, and starting at 18 dB and +6 SNR for sentences without one, ending in very poor perception scores as sound intensity and SNR got lower. In several listening conditions of sound intensity and SNR, perception scores for sentences with a prosody pattern differed significantly between the 20-year-old group and the group aged 21 or older.

Prosody in Spoken Language Processing

  • Schafer Amy J.;Jun Sun-Ah
    • Acoustical Society of Korea: Conference Proceedings / Proceedings of the Acoustical Society of Korea 2000 Summer Conference, Vol. 19, No. 1 / pp.7-10 / 2000
  • Studies of prosody and sentence processing have demonstrated that prosodic phrasing can exhibit strong effects on processing decisions in English. In this paper, we tested Korean sentence fragments containing syntactically ambiguous Adj-N1-N2 strings in a cross-modal naming task. Four accentual phrasing patterns were tested: (a) the default phrasing pattern, in which each word forms an accentual phrase; (b) a phrasing biased toward N1 modification; (c) a phrasing biased toward complex-NP modification; and (d) a phrasing used with adjective focus. Patterns (b) and (c) are disambiguating phrasings; the other two are commonly found with both interpretations and are thus ambiguous. The results showed that the naming time of items produced in a prosody contradicting the semantic grouping is significantly longer than that of items produced in either default or supporting prosody. We claim that, as in English, prosodic information in Korean is parsed into a well-formed prosodic representation during the early stages of processing. The partially constructed prosodic representation produces incremental effects on syntactic and semantic processing decisions and is retained in memory to influence reanalysis decisions.

A Study Using Acoustic Measurement and Perceptual Judgment to Identify Prosodic Characteristics of English as Spoken by Koreans

  • 구희산
    • 음성과학 / Vol. 2 / pp.95-108 / 1997
  • The purpose of this experimental study was to investigate prosodic characteristics of English as spoken by Koreans. Test materials were four English words, a sentence, and a paragraph. Six female Korean speakers and five native English speakers participated in acoustic and perceptual experiments. Pitch and duration of word syllables were measured from signals and spectrograms made by the Signalize 3.04 software program for Power Mac 7200. In the perceptual experiment, accent position, intonation patterns, rhythm patterns and phrasing were evaluated by the five native English speakers. Preliminary results from this limited study show that prosodic characteristics of Koreans include (1) pitch on the first part of a word and sentence is lower than that of English speakers, but the pitch on the last part is the opposite; (2) word prosody is quite similar to that of an English speaker, but sentence prosody is quite different; (3) the weakest point of sentence prosody spoken by Koreans is in the rhythmic pattern.

The Study on Korean Prosody Generation using Artificial Neural Networks

  • 민경중;임운천
    • Acoustical Society of Korea: Conference Proceedings / Proceedings of the Acoustical Society of Korea 2004 Spring Conference, Vol. 23, No. 1 / pp.337-340 / 2004
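  • Korean text-to-speech (TTS) systems build prosody generation algorithms and apply them to increase the naturalness of synthesized speech. Prosodic rules are applied to speech synthesis systems on the basis of linguistic information about each language or knowledge of prosody obtained from natural speech. However, rules obtained this way cannot include every prosodic rule present in natural speech, and if an extracted rule is wrong, the naturalness and intelligibility of the synthesized speech will fall, which can hinder the practical use of TTS. With this in mind, this paper proposes a prosody-generation neural network, an artificial neural network that can learn the prosody inherent in natural speech. In the training stage, the phoneme string of a Korean sentence is shifted step by step across the input layer, and the network is trained to output the prosodic information of the phoneme at the center of the input layer; through supervised learning with target patterns, it learns the prosody inherent in natural speech. In the evaluation stage, phoneme strings of sentences were input and the estimation rate was measured, showing that the artificial neural network can learn and generate the prosody inherent in Korean sentences.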

Implementation of Korean TTS System based on Natural Language Processing

  • 김병창;이근배
    • 대한음성학회지:말소리 / No. 46 / pp.51-64 / 2003
  • To produce high-quality synthesized speech, it is very important to obtain accurate grapheme-to-phoneme conversion and an accurate prosody model from texts using natural language processing. Robust preprocessing for non-Korean characters is also required. In this paper, we analyzed Korean texts using a morphological analyzer, a part-of-speech tagger, and a syntactic chunker. We present a new grapheme-to-phoneme conversion method for Korean that combines a phonetic pattern dictionary with CCV (consonant vowel) LTS (letter to sound) rules, for unlimited-vocabulary Korean TTS. We constructed a prosody model using a probabilistic method and a decision tree-based method. The probabilistic method alone usually suffers from performance degradation due to inherent data sparseness problems, so we adopted tree-based error correction to overcome these training data limitations.

Prosodic characteristics of French language in conversational discourse

  • 고영림;윤애선
    • 음성과학 / Vol. 8, No. 2 / pp.165-180 / 2001
  • In this paper, prosodic characteristics of French are analysed using a corpus of radio interviews. Intonation patterns are interpreted in terms of a rising pattern, a focal rising pattern, and a falling pattern. Accentual prominence is classified into two types, rhythmic accent and focal accent. Focal accent helps explain cohesion within an utterance or between two utterances. Pauses, as a prosodic variable of discourse, are described by their form of realization (filled pause, silent pause, hesitation, etc.), their distribution, and their function in the utterance.

Acoustic correlates of L2 English stress - Comparison of Japanese English and Korean English

  • Konishi, Takayuki;Yun, Jihyeon;Kondo, Mariko
    • 말소리와 음성과학 / Vol. 10, No. 1 / pp.9-14 / 2018
  • This study compared the relative contributions of intensity, F0, duration and vowel spectra to L2 English lexical stress produced by Japanese and Korean learners of English. Recordings of Japanese, Korean and native English speakers reading eighteen two- to four-syllable words in a carrier sentence were analyzed using multiple regression to investigate the influence of each acoustic correlate in determining whether a vowel was stressed. The relative contribution of each correlate was calculated by converting the coefficients to percentages. The Japanese learner group showed transfer of L1 phonology to L2 lexical prosody and relied mostly on F0 and duration in manifesting L2 English stress. This is consistent with the results of previous studies. However, advanced Japanese speakers in the group showed less reliance on F0 and more use of intensity, which is another parameter used in native English stress accents. On the other hand, F0 had little influence on L2 English stress produced by the Korean learners, probably due to the transfer of the Korean intonation pattern to L2 English prosody. Hence, this study shows that L1 transfer happens at the prosodic level for Japanese learners of English and at the intonational level for Korean learners.
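
The abstract does not give the formula used to convert regression coefficients to percentages. One common reading, sketched here as an assumption, is to divide each coefficient's absolute value by the sum of all absolute values. The function name and the example coefficients below are hypothetical and are not taken from the paper.

```python
def relative_contributions(coefficients):
    """Convert regression coefficients to percentage shares of their total magnitude.

    Assumption: the share is |coefficient| / sum of |coefficients| * 100.
    The paper's exact procedure is not stated in the abstract.
    """
    total = sum(abs(c) for c in coefficients.values())
    if total == 0:
        raise ValueError("all coefficients are zero")
    return {name: 100.0 * abs(c) / total for name, c in coefficients.items()}


# Made-up standardized coefficients, for illustration only.
example = {"intensity": 0.5, "F0": 1.0, "duration": 2.0, "vowel_spectra": 0.5}
shares = relative_contributions(example)
print(shares)
```

This normalization only makes sense when the predictors are on a comparable scale, for example after standardization.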

Prosodic Contour Generation for Korean Text-To-Speech System Using Artificial Neural Networks

  • Lim, Un-Cheon
    • The Journal of the Acoustical Society of Korea / Vol. 28, No. 2E / pp.43-50 / 2009
  • To get more natural synthetic speech from a Korean TTS (Text-To-Speech) system, we have to know all the possible prosodic rules of spoken Korean. We should derive these rules from linguistic and phonetic information or from real speech. In general, all of these rules should be integrated into a prosody-generation algorithm in a TTS system. But such an algorithm cannot cover all the possible prosodic rules of a language and is not perfect, so the naturalness of synthesized speech cannot be as good as we expect. ANNs (Artificial Neural Networks) can be trained to learn the prosodic rules of spoken Korean. To train and test ANNs, we need to prepare the prosodic patterns of all the phonemic segments in a prosodic corpus. A prosodic corpus should include meaningful sentences that represent all the possible prosodic rules. Sentences in the corpus were made by picking a series of words from a list of PB (Phonetically Balanced) isolated words. These sentences were read by speakers, recorded, and collected as a speech database. By analyzing the recorded speech, we can extract the prosodic pattern of each phoneme and assign these patterns as target and test patterns for ANNs. ANNs can learn the prosody of natural speech and, when the phoneme strings of a sentence are given as input stimuli, generate the prosodic pattern of the central phonemic segment in each phoneme string as the output response.
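
The input arrangement described above, where the network sees a string of phonemes and outputs the prosody of the central one, can be sketched as a sliding window over the sentence. This is a minimal illustration of the data preparation only, not the paper's implementation: the window width, the padding symbol, and the example phonemes and prosody values are all assumptions.

```python
def sliding_windows(phonemes, prosody, width=5, pad="<pad>"):
    """Pair each phoneme-centered window with the prosody value of its central phoneme.

    `width` (odd) and `pad` are hypothetical choices; the abstract does not
    specify the window size or how sentence edges are handled.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the window has a center")
    if len(phonemes) != len(prosody):
        raise ValueError("need one prosody value per phoneme")
    half = width // 2
    padded = [pad] * half + list(phonemes) + [pad] * half
    # Window i covers padded[i : i + width]; its center is phonemes[i].
    return [(padded[i:i + width], prosody[i]) for i in range(len(phonemes))]


# Made-up phoneme string and prosody targets (for example, normalized pitch).
phonemes = ["a", "n", "j", "o"]
prosody = [0.1, 0.2, 0.3, 0.4]
pairs = sliding_windows(phonemes, prosody)
for window, target in pairs:
    print(window, target)
```

Each pair would then serve as one supervised training example, with the window encoded as the network input and the prosody value as the target pattern.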

MPEG-4 TTS: Current Status and Prospects

  • 한민수
    • 전자공학회지 / Vol. 24, No. 9 / pp.91-98 / 1997
  • Text-to-Speech (TTS) technology has been attracting a lot of interest among speech engineers because of its benefits, namely its possible application areas: talking computers, emergency alarm systems in speech, speech output devices for the speech-impaired, and so on. Hence, many researchers have made significant progress in speech synthesis techniques for their own languages, and as a result, the quality of current speech synthesizers is believed to be acceptable to normal users. These are partly why the MPEG group decided to include TTS technology as one of its MPEG-4 functionalities. ETRI has made major contributions to the current MPEG-4 TTS appearing in various MPEG-4 documents, with relatively minor contributions from AT&T and NW. The main MPEG-4 functionalities presently available are: 1) use of original prosody for synthesized speech output, 2) trick mode functions for general users without breaking synthesized speech prosody, 3) interoperability with Facial Animation (FA) tools, and 4) dubbing a moving/animated picture with lip-shape pattern information.

Automatic Synthesis Method Using Prosody-Rich Database

  • 김상훈
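    • Acoustical Society of Korea: Conference Proceedings / Proceedings of the Acoustical Society of Korea 1998 15th Workshop on Speech Communication and Signal Processing (KSCSP 98, Vol. 15, No. 1) / pp.87-92 / 1998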
  • In general, synthesis unit databases have been constructed by recording isolated words. In that case, each word boundary has a typical prosodic pattern, such as falling intonation or preboundary lengthening. To get natural synthetic speech from this kind of database, the original speech must be artificially distorted. However, that artificial process instead results in unnatural, unintelligible synthetic speech because of excessive prosodic modification of the speech signal. To overcome these problems, we gathered thousands of sentences for the synthesis database. To make phone-level synthesis units, we trained a speech recognizer on the recorded speech and then segmented phone boundaries automatically. In addition, we used a laryngograph for epoch detection. From the automatically generated synthesis database, we chose the best phone and concatenated it directly without any prosody processing. To select the best phone among multiple candidates, we used prosodic information such as the break strength of word boundaries, phonetic context, cepstrum, pitch, energy, and phone duration. The pilot test gave some positive results.
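
The abstract lists the features used to choose the best phone but not how they are combined. A minimal sketch, assuming a weighted sum of absolute differences between each candidate and a desired target, might look like the following. The function name, the feature subset, the weights, and all numeric values are hypothetical.

```python
def select_best_phone(target, candidates, weights):
    """Return the candidate whose weighted feature distance to the target is smallest.

    Assumption: the cost is a weighted sum of absolute differences over numeric
    features. The paper's actual selection criterion is not given in the abstract.
    """
    def cost(candidate):
        return sum(w * abs(candidate[f] - target[f]) for f, w in weights.items())

    return min(candidates, key=cost)


# Made-up target and candidate units for one phone (pitch in Hz, energy in dB,
# duration in ms); none of these values come from the paper.
target = {"pitch": 120.0, "energy": 60.0, "duration": 80.0}
candidates = [
    {"id": "a1", "pitch": 150.0, "energy": 62.0, "duration": 85.0},
    {"id": "a2", "pitch": 118.0, "energy": 59.0, "duration": 90.0},
]
weights = {"pitch": 1.0, "energy": 1.0, "duration": 0.5}

best = select_best_phone(target, candidates, weights)
print(best["id"])
```

Categorical features named in the abstract, such as phonetic context or break strength, would need their own distance definitions and are left out of this sketch.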