• Title/Summary/Keyword: Prosody

A study on the Suprasegmental Parameters Exerting an Effect on the Judgment of Goodness or Badness on Korean-spoken English (한국인 영어 발음의 좋음과 나쁨 인지 평가에 영향을 미치는 초분절 매개변수 연구)

  • Kang, Seok-Han;Rhee, Seok-Chae
    • Phonetics and Speech Sciences / v.3 no.2 / pp.3-10 / 2011
  • This study investigates how suprasegmental features affect the intelligibility of Korean-spoken English, as judged good or bad by Korean and native English raters. The hypotheses were that Korean raters would evaluate the speech differently from native English raters and that the effect would vary with the type of suprasegmental factor. Four Korean raters and four native English raters evaluated the English speech of 14 Korean subjects, who read a given paragraph aloud. The results show that the two groups evaluate 'intelligibility' differently and that the difference stems from how they perceive L2 English suprasegmentals.

On A Pitch Alteration using the Waveform Symmetry with Time - Frequency Conversion (시간 - 주파수 변환에 의한 파형 대칭 피치변경법)

  • 박형빈
    • Proceedings of the Acoustical Society of Korea Conference / 1998.06c / pp.147-150 / 1998
  • In speech synthesis, high-quality waveform coding is used mainly for synthesis by analysis. Because the parameters of this coding method are not separated into excitation and vocal tract parameters, it is difficult to apply waveform coding to synthesis by rule. Applying it there requires pitch alteration for prosody control. In conventional PSOLA-based synthesis, a symmetric window function is applied to an asymmetric speech waveform, which produces an energy imbalance that depends on how much the segments overlap when the pitch interval is adjusted. To overcome this energy imbalance, this paper proposes a method that converts the asymmetric waveform into a symmetric one by time-frequency conversion. The method yields an average spectral distortion ratio of 6.38% with respect to the pitch alteration ratio.
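
The overlap dependence described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' method: it only sums symmetric Hann windows placed at a fixed spacing, as in a simplified overlap-add, and reports how far the summed gain drifts from a constant when the spacing differs from the original pitch period. The function names, the choice of a Hann window, and the sample values are assumptions made for illustration.

```python
import math


def hann(n_len):
    """Periodic Hann window of n_len samples (a symmetric window shape)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]


def window_gain(period, hop, n_frames=20):
    """Sum two-period Hann windows spaced `hop` samples apart.

    Returns (min, max) of the summed window over the steady-state region.
    `period` is the assumed original pitch period in samples; `hop` is the
    spacing used at synthesis time (hop < period raises the pitch).
    """
    win = hann(2 * period)
    total = [0.0] * (hop * (n_frames - 1) + 2 * period)
    for k in range(n_frames):
        start = k * hop
        for i, w in enumerate(win):
            total[start + i] += w
    # Ignore the edges, where fewer windows overlap.
    steady = total[2 * period:len(total) - 2 * period]
    return min(steady), max(steady)
```

Under these assumptions, `window_gain(100, 100)` gives a flat gain (the windows sum to a constant), while `window_gain(100, 80)` gives a gain that ripples from sample to sample, which is one simple way to picture an energy imbalance that depends on the degree of overlap.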

Prosody Boundary Index Prediction Model for Continuous Speech Recognition and Speech Synthesis (연속음성 인식 및 합성을 위한 운율 경계강도 예측 모델)

  • 강평수
    • Proceedings of the Acoustical Society of Korea Conference / 1998.06c / pp.99-102 / 1998
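  • This study proposes a boundary strength prediction model for continuous speech recognition and speech synthesis. In speech synthesis, prosodic boundary strength contributes to the naturalness of synthesized speech by controlling the length of pauses between prosodic phrases; in continuous speech recognition, it serves as a feature for selecting among the candidate sentences that arise during recognition and so plays a large role in improving the recognition rate. Phonetically, an uttered sentence can be viewed at the level of large boundary units as a sequence of prosodic phrases, and the phrase boundaries can be related to the grammatical features of the sentence. This paper uses four levels of prosodic boundary strength and predicts the strength by combining grammatical features with a bigram model: the depth of right-branch modification (rd) determined by a tree-structure method, and the number of syllables (syl) and link distance (torig) determined by the link grammar method. A multiple regression model and a Markov model are proposed as prediction models. In experiments on 200 read-speech sentences, these models predicted boundary strength at a rate of 76%.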

A Study on the Architecture and Learning of the Artificial Neural Networks for Prosody Generation of Korean Sentence (한국어 운율 발생용 인공신경망의 구조 및 학습에 관한 연구)

  • Min Kyung-Joong;Lim Un-Cheon
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.135-138 / 2004
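  • Speech processing is one of the key technologies of the information age. Within it, research on speech synthesis has advanced actively with the development of digital signal processing and computers. However, although the quality of synthesized speech has improved considerably in intelligibility, it has not reached a satisfactory level of naturalness. Solving this problem requires linguistic information that can be applied in varied ways and accurate prosodic information, which determines the naturalness of synthesized speech. Yet the extracted prosodic information cannot cover every prosodic rule present in natural speech, and if an extracted prosodic rule is wrong, the synthesized speech loses naturalness or intelligibility, which hinders the practical use of speech synthesis systems. As one way to improve naturalness, which is the problem in Korean speech synthesis, this paper proposes an artificial neural network that can efficiently learn the prosodic variation inherent in natural speech.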

Improvements on Phrase Breaks Prediction Using CRF (Conditional Random Fields) (CRF를 이용한 운율경계추성 성능개선)

  • Kim Seung-Won;Lee Geun-Bae;Kim Byeong-Chang
    • MALSORI / no.57 / pp.139-152 / 2006
  • In this paper, we present a phrase break prediction method using CRFs (Conditional Random Fields), which perform well on classification problems. In our research, the phrase break prediction problem was mapped onto a classification problem. We trained the CRF on various linguistic features extracted from the POS (Part Of Speech) tag, the lexicon, the length of the word, and the location of the word in the sentence. Combined linguistic features were used in the experiments, and we identified the linguistic features that yield good performance in phrase break prediction. The experimental results show that the proposed method improves on previous methods. Additionally, because the linguistic features in our research are independent of each other, the proposed method is more flexible than other methods.
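
The feature sources named in this abstract (POS tag, lexicon, word length, word location) can be sketched as a per-word feature function in Python. This is a hedged illustration, not the authors' feature set: the feature names, the neighbouring-tag features, the boundary markers `<BOS>` and `<EOS>`, and the toy sentence in the usage note are all assumptions.

```python
def token_features(words, pos_tags, i):
    """Build a feature dict for the word at index i.

    Feature names are hypothetical; they mirror the sources listed in the
    abstract (POS tag, lexicon, word length, position in the sentence).
    The prev/next POS features are an added assumption.
    """
    n = len(words)
    return {
        "word": words[i],                            # lexical identity
        "pos": pos_tags[i],                          # part-of-speech tag
        "length": len(words[i]),                     # word length in characters
        "position": i / (n - 1) if n > 1 else 0.0,   # 0.0 = first word, 1.0 = last
        "prev_pos": pos_tags[i - 1] if i > 0 else "<BOS>",
        "next_pos": pos_tags[i + 1] if i < n - 1 else "<EOS>",
    }


def sentence_features(words, pos_tags):
    """Feature dicts for every word; a break/no-break label would pair with each."""
    return [token_features(words, pos_tags, i) for i in range(len(words))]
```

For a toy input such as `sentence_features(["we", "trained", "the", "model"], ["PRP", "VBD", "DT", "NN"])`, each word yields one dict, and a sequence classifier would then assign a break or no-break label to each position.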

A Study on the Prosody Generation of Korean Sentences (한국어 문장 단위운율 발생에 관한 연구)

  • 민경중
    • Proceedings of the Acoustical Society of Korea Conference / 1998.06e / pp.419-423 / 1998
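  • Rule-based synthesis systems come in many varieties, depending on the synthesis unit, the synthesizer, and the synthesis method. Concatenative systems, which are not purely rule-based but generate speech by joining basic synthesis units, fail to produce smooth changes in the synthesis parameters between concatenated units and across the sentence, so their output lacks naturalness. To improve naturalness by generating prosody closer to that of natural speech, we first construct neural network input patterns that take into account the factors affecting prosody. To account for segmental effects, the three preceding and following phonemes are input together; to account for syntactic effects within the sentence, the position of the phoneme in the sentence and information about the prosodic phrase are included in the input pattern. The language material for training the network consists of isolated word sets, phonetically balanced sentence sets, and connected words with inserted syllables. The network was trained on a specific speaker and generated synthetic prosody similar to that of natural speech.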
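
The input pattern described in this abstract can be sketched in Python as a context window of phonemes plus position information. This is only an illustration under stated assumptions: the abstract does not give the encoding, so the one-hot coding, the toy phoneme inventory, the reading of the context as three phonemes on each side, and the two normalized position values are all hypothetical.

```python
PHONEMES = ["<pad>", "a", "k", "n", "m", "i"]  # hypothetical toy inventory


def one_hot(symbol):
    """One-hot vector for a phoneme symbol from the toy inventory."""
    vec = [0.0] * len(PHONEMES)
    vec[PHONEMES.index(symbol)] = 1.0
    return vec


def input_pattern(phonemes, i, phrase_index, n_phrases, context=3):
    """Input vector for the phoneme at index i.

    Concatenates one-hot codes for the phoneme and `context` neighbours on
    each side (padded at sentence edges), then appends the relative position
    of the phoneme in the sentence and the relative index of its prosodic
    phrase. Both position features are assumed encodings.
    """
    vec = []
    for j in range(i - context, i + context + 1):
        symbol = phonemes[j] if 0 <= j < len(phonemes) else "<pad>"
        vec.extend(one_hot(symbol))
    vec.append(i / max(len(phonemes) - 1, 1))          # position in the sentence
    vec.append(phrase_index / max(n_phrases - 1, 1))   # position of the prosodic phrase
    return vec
```

With the six-symbol toy inventory and `context=3`, each pattern has 7 × 6 + 2 = 44 values; a network trained on such patterns would output prosodic targets such as pitch or duration for the centre phoneme.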

Discrimination of Emotional States In Voice and Facial Expression

  • Kim, Sung-Ill;Yasunari Yoshitomi;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea / v.21 no.2E / pp.98-104 / 2002
  • The present study describes a combined method for recognizing human affective states such as anger, happiness, sadness, or surprise. We extracted emotional features from voice signals and facial expressions and used them to train a hidden Markov model (HMM) and a neural network (NN) to recognize emotional states. For voice, we used prosodic parameters such as pitch signals, energy, and their derivatives, which were used to train the HMM. For facial expressions, on the other hand, we used feature parameters extracted from thermal and visible images, which were used to train the NN. The recognition rate for the combined parameters obtained from voice and facial expressions was better than that of either set of parameters alone. The simulation results were also compared with the results of a human questionnaire.
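
The prosodic parameters mentioned in this abstract (pitch, energy, and their derivatives) can be assembled as in the following minimal Python sketch. The abstract does not say how the derivatives were computed, so the regression-style delta formula, the window width, and the function names are assumptions.

```python
def delta(track, width=2):
    """Frame-wise derivative of a feature track.

    Uses the common regression formula
        d[t] = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2),  n = 1..width,
    repeating the first and last frames at the edges.
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(track) - 1
    out = []
    for t in range(len(track)):
        num = 0.0
        for n in range(1, width + 1):
            ahead = track[min(t + n, last)]
            behind = track[max(t - n, 0)]
            num += n * (ahead - behind)
        out.append(num / denom)
    return out


def prosodic_features(pitch, energy, width=2):
    """One [pitch, energy, delta-pitch, delta-energy] vector per frame."""
    d_pitch = delta(pitch, width)
    d_energy = delta(energy, width)
    return [list(row) for row in zip(pitch, energy, d_pitch, d_energy)]
```

Each frame vector produced this way could serve as one observation in a sequence model such as an HMM; for a pitch track that rises linearly, the delta equals the slope away from the edges.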

Prosodic Strengthening in Speech Production and Perception: The Current Issues

  • Cho, Tae-Hong
    • Speech Sciences / v.14 no.4 / pp.7-24 / 2007
  • This paper discusses some current issues regarding how prosodic structure is manifested in fine-grained phonetic details, how prosodically conditioned articulatory variation is explained in terms of speech dynamics, and how such phonetic manifestation of prosodic structure may be exploited in spoken word recognition. Prosodic structure is phonetically manifested at prosodically important landmark locations such as prosodic domain-final position, domain-initial position, and stressed or accented syllables. The paper discusses how each of the prosodic landmarks engenders particular phonetic patterns, how articulatory variation in such locations is dynamically accounted for, and how prosodically driven fine-grained phonetic detail is exploited by listeners in speech comprehension.

'Hanmal' Korean Language Diphone Database for Speech Synthesis

  • Chung, Hyun-Song
    • Speech Sciences / v.12 no.1 / pp.55-63 / 2005
  • This paper introduces the 'Hanmal' Korean language diphone database for speech synthesis, which has been publicly available on the MBROLA website since 1999 but has never been properly published in a journal. The diphone database is compatible with the MBROLA programme of high-quality multilingual speech synthesis systems. The paper presents the usefulness of the diphone database and describes its phonetic and phonological structure, showing the process of creating the text corpus. A machine-readable Korean SAMPA convention for the control data input to the MBROLA application is also suggested. Diphone concatenation and prosody manipulation are performed using the MBR-PSOLA algorithm. A set of segment duration models can be applied to the diphone synthesis of Korean.
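
The control data input mentioned in this abstract can be pictured with a short Python sketch that writes phoneme lines in the plain-text layout MBROLA reads: a phoneme symbol, a duration in milliseconds, and optional pairs of position (percent of the duration) and pitch (Hz). The helper names are hypothetical, and the symbols in the usage note are placeholders rather than entries from the Korean SAMPA convention the paper proposes.

```python
def pho_line(symbol, duration_ms, pitch_points=()):
    """Format one phoneme line: symbol, duration, then (position %, F0 Hz) pairs."""
    fields = [symbol, str(duration_ms)]
    for position_pct, f0_hz in pitch_points:
        fields += [str(position_pct), str(f0_hz)]
    return " ".join(fields)


def to_pho(segments):
    """Join segments given as (symbol, duration_ms[, pitch_points]) into .pho text."""
    return "\n".join(pho_line(*segment) for segment in segments)
```

For example, `to_pho([("_", 100), ("a", 120, [(0, 120), (100, 110)])])` produces a silence of 100 ms followed by a 120 ms segment whose pitch falls from 120 Hz at its start to 110 Hz at its end.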

A Learning Method of French Prosodic Rhythm for Korean Speakers using CSL (CSL를 이용한 한국인의 프랑스어 운율학습 방안)

  • Lee, E.Y.;Lee, M.K.;Lee, J.H.
    • Speech Sciences / v.6 / pp.83-101 / 1999
  • The aim of this study is to provide a method for Taegu North Kyungsang Korean speakers to learn French prosodic rhythm more effectively. The rhythmic properties of spoken French and the Taegu North Kyungsang dialect of Korean differ from each other. We therefore try to provide a basic rhythmic model of the two languages, divided into three parts: syllable, rhythmic unit and accent, and intonation. To do so, we recorded the French of Taegu Kyungsang Korean speakers, then analysed and compared the rhythmic properties of Korean and French with a spectrograph. We tried to find rhythmic mistakes in their French pronunciation and then established a learning model to correct them. After training with the CSL Macro learning model, we observed the output result. However, although learners understand the proposed method, a well-developed learning programme must also provide effective repeated practice before the method can be used in direct verbal communication. Hence, this study may play an important role at the preparatory stage of setting up an effective rhythmic learning programme.