• Title/Summary/Keyword: Part of speech


Synthesis-by-rule of Korean: Part II - Speech Synthesis Using the Units of Demisyllables (우리말 규칙합성에 관한 연구 (II) - 반음절 단위의 음성합성)

  • Cheon, Kang-Sik;Lee, Sung-Jun;Lee, Jae-Hong
    • Proceedings of the KIEE Conference / 1988.07a / pp.29-32 / 1988
  • A new set of demi-syllable units is presented for Korean speech synthesis. The demi-syllable set is compared with a set of syllable units in terms of the quality of the speech synthesized with each set and the amount of computer memory each set occupies. The demi-syllable set achieves comparable speech quality and occupies less memory than the syllable set.


Phoneme Frequency of 3 to 8-year-old Korean Children (3세~8세 아동의 자유 발화 분석을 바탕으로 한 한국어 말소리의 빈도 관련 정보)

  • Sin, Ji-Yeong
    • Proceedings of the KSPS conference / 2005.04a / pp.15-19 / 2005
  • The aim of this study is to provide information on the frequencies of occurrence of Korean phonemes and syllables by analysing spontaneous speech spoken by 3 to 8-year-old Korean children. 49 Korean children (7~10 children for each age) served as subjects. Speech data were recorded and phonemically transcribed. 120 utterances were selected for analysis for each child, except for one child who produced only 91 utterances. The data set comprised 5,971 utterances, 51,554 syllables, and 105,491 phonemes. Among the 19 consonants, /n/ showed the highest frequency. Among the 18 vowels, /a/ was the most frequent, and /i/ and /ʌ/ were second and third, respectively. The frequency rate of these four consonants was over 50% for all age groups. Frequently occurring syllable types were parts of grammatical words in most cases. Only 5~6% of syllable types covered 50% of speech.

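
The coverage figure in the abstract above (a small share of syllable types accounting for half of all syllable tokens) can be computed from a frequency table. The following is a minimal sketch, not the study's procedure; the function name `types_needed_for_coverage` and the toy syllable list are hypothetical and do not come from the paper.

```python
from collections import Counter

def types_needed_for_coverage(tokens, target=0.5):
    """Return (k, share): the number k of most frequent types whose counts
    together reach `target` of all tokens, and k as a fraction of all types."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no tokens to analyse")
    covered = 0
    k = 0
    # Walk the types from most to least frequent until the target is reached.
    for _, count in counts.most_common():
        covered += count
        k += 1
        if covered >= target * total:
            break
    return k, k / len(counts)

# Toy syllable stream, for illustration only (not data from the study).
toy_syllables = ["ga"] * 6 + ["na"] * 2 + ["da", "ra"]
k, share = types_needed_for_coverage(toy_syllables)
```

Applied to a transcribed corpus, `share` is the proportion of syllable types needed to cover half of the tokens, the same kind of quantity as the 5~6% reported in the abstract.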

Speech Feature Extraction Using Auditory Model (청각모델을 이용한 음성신호의 특징 추출 방법에 관한 연구)

  • Park, Kyu-Hong;Kim, Young-Ho;Jung, Sang-Kuk;Rho, Seung-Yong
    • Proceedings of the KIEE Conference / 1998.07g / pp.2259-2261 / 1998
  • Auditory models capable of achieving human performance would provide a basis for realizing effective speech processing systems. Perceptual invariance to adverse signal conditions (noise, microphone and channel distortions, room reverberation) may provide a basis for robust speech recognition and high-efficiency speech coding. An auditory model that simulates the auditory periphery up through the auditory nerve level is described, together with a new distance measure defined as the angle between vectors.

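
The abstract above defines its distance measure as the angle between vectors. A minimal sketch of such a measure follows, assuming plain real-valued feature vectors; the function name `angle_distance` is hypothetical, and the paper's actual feature representation is not given in the abstract.

```python
import math

def angle_distance(u, v):
    """Angle in radians between two equal-length, non-zero feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)
```

Because the angle ignores vector length, two feature vectors that differ only by an overall gain are at distance zero, which is one reason an angular measure can be attractive under varying signal level.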

An HMM-based Korean TTS synthesis system using phrase information (운율 경계 정보를 이용한 HMM 기반의 한국어 음성합성 시스템)

  • Joo, Young-Seon;Jung, Chi-Sang;Kang, Hong-Goo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2011.07a / pp.89-91 / 2011
  • In this paper, phrase boundaries in a sentence are predicted, and the resulting phrase break information is applied to an HMM-based Korean text-to-speech synthesis system. Synthesis with phrase break information increases the naturalness of the synthetic speech and the comprehensibility of sentences. To predict these phrase boundaries, context-dependent information is used: the forward/backward POS (part-of-speech) of an eojeol, the position of the eojeol in the sentence, the length of the eojeol, and the presence or absence of punctuation marks. The experimental results show that the naturalness of synthetic speech increases when phrase break information is used.


The Acoustic Characteristics in Women Diver's Soombijil Sound (해녀의 숨비질소리에 대한 음향특징)

  • Han, Ji-Yeon;Park, Hyun-Ja;Jeong, Ok-Ran
    • Proceedings of the KSPS conference / 2007.05a / pp.176-179 / 2007
  • This study examined the acoustic characteristics of women divers' Soombijil sound. A total of 18 women divers participated in the study. Acoustic analysis was performed with Praat. Soombijil sounds were classified into three types according to pitch variation in the beginning, middle, and ending parts. Type I showed an increasing-decreasing-flat shape. Type II was identified by a flat-flat-increasing shape. Type III showed an increasing-decreasing-increasing shape. The mean duration of the Soombijil sound was 1.48 s. The frequency range was 1591.54~4477.13 Hz. FFT analysis showed that frequencies were concentrated in the 500~2000 Hz range. Types I and II showed two peaks, at 500 Hz and at 1500~2000 Hz. Type III had one peak below 500 Hz.


The Smoothing Method of the Concatenation Parts in Speech Waveform by using the Forward/Backward LPC Technique (전, 후방향 LPC법에 의한 음성 파형분절의 연결부분 스므딩법)

  • 이미숙
    • Proceedings of the Acoustical Society of Korea Conference / 1991.06a / pp.15-20 / 1991
  • In a text-to-speech system, sound units (e.g., phonemes, words, or phrases) can be concatenated to produce the required utterance. The quality of the resulting speech depends on factors including the phonological/prosodic contour, the quality of the basic concatenation units, and how well the units join together. Thus, even if the quality of each basic sound unit is high, a discontinuity at a concatenation point degrades the quality of the synthesized speech. To solve this problem, a smoothing operation should be carried out at the concatenation points. A major problem, however, is that as yet no method of parameter smoothing is available for joining the segments together.


Voice Activity Detection with Run-Ratio Parameter Derived from Runs Test Statistic

  • Oh, Kwang-Cheol
    • Speech Sciences / v.10 no.1 / pp.95-105 / 2003
  • This paper describes a new parameter for voice activity detection, which serves as a front end for automatic speech recognition systems. The new parameter, called the run-ratio, is derived from the runs test statistic used in statistical tests for the randomness of a given sequence. For a random sequence, the run-ratio takes values of about 1. To apply the run-ratio to voice activity detection, the samples of the input audio signal are converted to binary sequences of positive and negative values. The silence regions of the audio signal can then be regarded as random sequences, so their run-ratio values are about 1. The run-ratio is far lower than 1 for voiced regions and higher than 1 for fricative sounds. The run-ratio parameter can therefore discriminate speech signals from background sounds. The proposed voice activity detector outperformed the conventional energy-based detector in terms of error mean and variance, small deviation from true speech boundaries, and low chance of missing real utterances.

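
The abstract above does not give the formula for the run-ratio. One reading consistent with its description (about 1 for random sequences, below 1 for voiced speech, above 1 for fricative-like signals) is the observed number of sign runs divided by the number of runs expected under randomness in the Wald-Wolfowitz runs test. The sketch below assumes that definition; `run_ratio` and the three test signals are illustrative and are not taken from the paper.

```python
import math
import random

def run_ratio(samples):
    """Observed sign runs divided by the runs expected for a random sequence.

    Assumed definition (not confirmed by the abstract): the expected count
    is the runs-test mean 2*n_pos*n_neg/(n_pos + n_neg) + 1.
    """
    signs = [s >= 0 for s in samples]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        # Degenerate case: a single run. Treated here as maximally
        # non-random, which is a choice made for this sketch.
        return 0.0
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2.0 * n_pos * n_neg / (n_pos + n_neg) + 1.0
    return runs / expected

# Synthetic stand-ins for the three signal classes named in the abstract.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(10000)]            # silence-like
slow_sine = [math.sin(2 * math.pi * 5 * i / 1000 + 0.1)
             for i in range(1000)]                                # voiced-like
alternating = [1.0, -1.0] * 500                                   # rapid sign changes
```

Under this assumed definition, `run_ratio(noise)` is close to 1, `run_ratio(slow_sine)` is far below 1, and `run_ratio(alternating)` is above 1, matching the qualitative behaviour the abstract reports.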

A Study on TSIUVC Approximate-Synthesis Method using Least Mean Square (최소 자승법을 이용한 TSIUVC 근사합성법에 관한 연구)

  • Lee, See-Woo
    • The KIPS Transactions:PartB / v.9B no.2 / pp.223-230 / 2002
  • In a speech coding system that uses voiced and unvoiced excitation sources, the speech waveform is distorted when voiced and unvoiced consonants coexist in a frame. This paper presents a new method of TSIUVC (Transition Segment Including Unvoiced Consonant) approximate-synthesis using Least Mean Square. The TSIUVC extraction is based on the zero crossing rate and an IPP (Individual Pitch Pulses) extraction algorithm that uses the residual signal of the FIR-STREAK digital filter. As a result, the method obtains a high-quality approximate-synthesis waveform using Least Mean Square. Importantly, the frequency signals in the maximum error signal can be produced with a low-distortion approximate-synthesis waveform. The method can be applied to a new Voiced/Silence/TSIUVC speech coding scheme, as well as to speech analysis and speech synthesis.
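
The TSIUVC extraction described above relies in part on the zero crossing rate. A minimal sketch of that one ingredient follows; it does not cover the IPP extraction or the FIR-STREAK filter, and the function name `zero_crossing_rate` is hypothetical rather than taken from the paper.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs in a frame whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)
```

Unvoiced consonants tend to give a high zero crossing rate and voiced segments a low one, so a per-frame value like this can help flag frames where the two coexist.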

Implementation of Korean TTS System based on Natural Language Processing (자연어 처리 기반 한국어 TTS 시스템 구현)

  • Kim Byeongchang;Lee Gary Geunbae
    • MALSORI / no.46 / pp.51-64 / 2003
  • To produce high-quality synthesized speech, it is very important to obtain accurate grapheme-to-phoneme conversion and an accurate prosody model from texts using natural language processing. Robust preprocessing for non-Korean characters is also required. In this paper, we analyzed Korean texts using a morphological analyzer, a part-of-speech tagger, and a syntactic chunker. We present a new grapheme-to-phoneme conversion method for unlimited-vocabulary Korean TTS, a hybrid method that combines a phonetic pattern dictionary with CCV (consonant vowel) LTS (letter to sound) rules. We constructed a prosody model using a probabilistic method and a decision tree-based method. The probabilistic method alone usually suffers from performance degradation due to inherent data sparseness problems, so we adopted tree-based error correction to overcome these training data limitations.


A Phonetic Analysis of Yodel Singing by the Electroglottographic(EGG) Measurement (요들송에 대한 전기성문파형검사(EGG)를 이용한 발성학적 접근)

  • Suh, D.;Choi, H.S.
    • Speech Sciences / v.7 no.2 / pp.113-126 / 2000
  • A comparative phonetic analysis of Yodel singing and Belcanto singing was carried out with three singers using electroglottographic (EGG) measurement. The subjects were a professional tenor (SDI) who is also well trained in Yodel singing, a yodeler (KWS) who is not well trained in Belcanto singing, and a tenor in training (CSK) who is not well trained in either Yodel or Belcanto singing. Closed quotient (CQ), speed quotient (SQ), and fundamental frequency (F0) were measured at the initial modal part (I), middle falsetto part (M), and final modal part (F) of the same phrase using an EGG machine and program (Kay model 4338). In the middle part, both CQ and SQ were much smaller in Yodel singing than in Belcanto singing for all three singers. However, the accuracy of the parameters in the Belcanto singing of the yodeler (KWS), and in both the Yodel and Belcanto singing of the tenor in training (CSK), was inferior to that of the trained tenor (SDI). Possible advantages of Yodel singing training under EGG feedback control for hyperfunctional voice disorders such as vocal nodules are discussed.
