• Title/Summary/Keyword: Diphone

An Automatic Diphone Segmentation for Korean Speech Synthesis-by-Rule (한국어 규칙 합성을 위한 다이폰의 자동 추출)

  • 정인종;경연정;김한우;이양희
    • The Journal of the Acoustical Society of Korea / v.12 no.2E / pp.63-72 / 1993
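  • This paper proposes an algorithm that automatically extracts diphones, the speech units used for unrestricted speech generation, from naturally spoken two-syllable words. The input speech is analyzed with improved cepstrum parameters, and the diphone-extraction parameters are derived from them. The proposed parameters are:
    - the dynamic change of the 0th-order cepstral coefficient, which represents the energy level;
    - the time variation of the spectrum;
    - the zero-crossing rate;
    - the Euclidean distance between cepstra.
    Some phoneme boundaries, such as those in vowel sequences, lie where the spectral envelope changes only slowly. To detect these more effectively, the time variation of the spectrum is split into a fine component and a coarse (envelope) component, and each is used as a separate parameter. Experiments used two-syllable words of the types VV (vowel sequence), VCV (C: semivowel or consonant) and VCCV. Phoneme boundaries were detected with about 85% accuracy, even though the words included vowel sequences. Listening tests on speech synthesized with the extracted diphones confirmed high intelligibility.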
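
The boundary parameters listed above can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes the following:
- cepstral vectors have already been computed for each frame, with index 0 holding c0;
- the "coarse" and "fine" parts of the spectral change are treated as distances over low- and high-quefrency coefficients, split at a hypothetical `lifter` index;
- boundary candidates are local peaks of the cepstral distance above a hypothetical `threshold`.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def delta_c0(cepstra, i):
    """Change of c0 (the energy-level term) from frame i-1 to frame i."""
    return cepstra[i][0] - cepstra[i - 1][0]

def band_distance(c1, c2, lo, hi):
    """Euclidean distance over cepstral coefficients lo .. hi-1."""
    return math.sqrt(sum((c1[k] - c2[k]) ** 2 for k in range(lo, hi)))

def spectral_change(c1, c2, lifter):
    """Split spectral change into a coarse (envelope, low-quefrency) part and
    a fine (high-quefrency) part; c0 is excluded from both (assumed split)."""
    return band_distance(c1, c2, 1, lifter), band_distance(c1, c2, lifter, len(c1))

def boundary_candidates(cepstra, threshold):
    """Frames whose cepstral distance (c0 excluded) to the previous frame
    is a local peak above threshold; first and last frames are skipped."""
    n = len(cepstra[0])
    d = [0.0] + [band_distance(cepstra[i - 1], cepstra[i], 1, n)
                 for i in range(1, len(cepstra))]
    return [i for i in range(1, len(d) - 1)
            if d[i] > threshold and d[i] >= d[i - 1] and d[i] >= d[i + 1]]
```

In practice, a boundary decision would combine these cues, for example a large energy change together with a zero-crossing jump for consonant onsets, and the coarse spectral change for vowel-to-vowel transitions. The weighting used in the paper is not given in the abstract.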

A Study on the Rejection Algorithm Using Generic Word Model Based on Diphone Subword Unit (다이폰 기반의 Generic Word Model을 이용한 거절 알고리즘)

  • Chung, Ik-Joo;Chung, Hoon
    • Speech Sciences / v.10 no.2 / pp.15-25 / 2003
  • In this paper, we propose a two-stage algorithm for OOV (out-of-vocabulary) rejection. In the first stage, the algorithm rejects OOVs using a generic word model. In the second stage, to further reduce false acceptance, it rejects words that have low similarity to the candidate, measured as the distance between HMM models. For the experiment, we chose 20 in-vocabulary words from the PBW445 DB distributed by ETRI. When only the first stage is applied, false acceptance is 3% at 100% correct acceptance. When both stages are applied, false acceptance falls to 1% at 100% correct acceptance.
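
A two-stage decision of this kind can be sketched as below. The following details are assumptions made for illustration:
- the log-likelihood scores;
- the margin test against the generic word model in stage 1;
- the caller-supplied `distance_to_candidate` function in stage 2, which stands in for whatever HMM-to-HMM distance is used.

The abstract does not specify how the scores or distances are computed.

```python
def two_stage_reject(word_scores, generic_score, distance_to_candidate,
                     llr_threshold, max_distance):
    """Return the accepted word, or None if the input is rejected as OOV.

    word_scores: dict of in-vocabulary word -> log-likelihood (assumed given)
    generic_score: log-likelihood of the generic (diphone-based) word model
    distance_to_candidate: callable(word) -> HMM distance (hypothetical)
    """
    best = max(word_scores, key=word_scores.get)
    # Stage 1: the best word must beat the generic word model by a margin.
    if word_scores[best] - generic_score < llr_threshold:
        return None
    # Stage 2: reject if the candidate is too dissimilar in HMM space.
    if distance_to_candidate(best) > max_distance:
        return None
    return best
```

Raising `llr_threshold` or lowering `max_distance` trades correct acceptance for fewer false acceptances. The figures reported in the abstract correspond to operating points that keep correct acceptance at 100%.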

A Study on Phoneme-Based PSOLA Speech Synthesis Using LSP (LSP를 이용한 음소단위 PSOLA 음성합성에 관한 연구)

  • 권혁제;조순계;김종교
    • The Journal of the Acoustical Society of Korea / v.17 no.2 / pp.3-10 / 1998
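  • In this paper, phoneme-based PSOLA synthesis of Korean was carried out by adjusting LSP lines and analyzing consonants and vowels. Synthesis units commonly used in speech synthesis, such as triphones, diphones and demisyllables, are collected in a variety of phonological contexts so that the synthetic speech sounds natural. However, this approach requires a great deal of time and memory. This paper instead uses 33 phonemes as synthesis units: 17 consonants and 16 vowels. All synthesis data were collected from a single speaker. Consonants were segmented from CV syllables whose vowel is /i/, and vowels were taken from monosyllabic monophthongs and diphthongs. In addition, two analyses were carried out:
    - how consonant frequencies vary with the following vowel, in CV syllables spoken by 10 speakers;
    - how vowel formants vary with the adjacent consonant, in CV+VC and CV+CV sequences.
    Based on these results, vowels were synthesized by PSOLA with adjusted LSP lines, and consonants were concatenated with the target vowel. The listening (recognition) rate for six synthesized words was 65%.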
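
For readers unfamiliar with PSOLA, here is a minimal time-domain PSOLA (TD-PSOLA) pitch-modification sketch in its generic form. It is not the paper's system:
- the LSP-line adjustment for vowels is not shown;
- pitch marks are assumed to be given;
- each grain is assumed to span two local pitch periods under a Hann window.

```python
import math

def hann(n):
    """Hann window of length n."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def td_psola(signal, marks, factor):
    """Scale pitch by `factor` (>1 raises it) while keeping duration.

    signal: list of samples; marks: increasing pitch-mark indices (at least two).
    """
    out = [0.0] * len(signal)
    t = float(marks[0])
    while t < marks[-1]:
        # Analysis mark nearest to the current synthesis time.
        i = min(range(len(marks) - 1), key=lambda k: abs(marks[k] - t))
        period = marks[i + 1] - marks[i]
        win = hann(2 * period)
        centre = int(round(t))
        # Overlap-add a two-period windowed grain centred on the new mark.
        for k in range(2 * period):
            src = marks[i] - period + k
            dst = centre - period + k
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                out[dst] += win[k] * signal[src]
        # Next synthesis mark: local period divided by the pitch factor.
        t += period / factor
    return out
```

Because the synthesis marks cover the same time span as the analysis marks, duration is roughly preserved. Doubling `factor` halves the spacing between grains, which doubles F0.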

Text-to-Speech System Using Variable Synthesis Units (가변합성단위를 사용한 문서 음성 변환 시스템)

  • 조관선;이철희
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 1998.06a / pp.99-102 / 1998
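  • This paper proposes a synthesis system that uses variable synthesis units to produce natural speech. Conventional systems that use small units, such as phonemes or diphones, create many concatenation points when joining speech segments. Large units, such as words or multi-phoneme units, reduce the number of concatenation points and improve quality. However, the resulting growth in the number of units makes unrestricted synthesis difficult. To reduce concatenation points while achieving better quality with a moderate amount of memory, this paper selectively combines large units (eojeol and CVC) with small units (demisyllables). Experiments on selected sentences compared two kinds of speech:
    - speech synthesized from demisyllables alone and from CVC units alone;
    - speech synthesized by mixing each of these with eojeol units.
    The speech synthesized with variable units was found to be relatively natural.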
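
The selection idea can be sketched as a fallback from large units to small ones. The following are hypothetical:
- the inventories;
- the fallback order (whole eojeol, then CVC syllable, then a pair of demisyllables);
- the unit labels.

The point is that every boundary between units is a concatenation point, so larger units mean fewer joins.

```python
def select_units(eojeols, word_inventory, cvc_inventory):
    """Choose the largest stored unit for each eojeol (space-delimited word).

    Fallback order (assumed): whole eojeol -> CVC syllable -> two demisyllables.
    Each Hangul character is treated as one syllable.
    """
    units = []
    for word in eojeols:
        if word in word_inventory:
            units.append(("eojeol", word))
            continue
        for syllable in word:
            if syllable in cvc_inventory:
                units.append(("cvc", syllable))
            else:
                # Initial demisyllable (onset + vowel) and final one (vowel + coda).
                units.append(("demi-initial", syllable))
                units.append(("demi-final", syllable))
    return units

def count_joins(units):
    """Every boundary between consecutive units is a concatenation point."""
    return max(len(units) - 1, 0)
```

Suppose 안녕 is stored as a whole eojeol and 합 as a CVC unit. For 안녕 합니다, the sketch then yields six units and five joins. Using demisyllables alone gives ten units and nine joins. The sketch also counts joins at word boundaries, where a real system might insert a pause instead.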

Diphone-based Intonation and VoiceXML document Generation using Multi-dimensional Linguistic Information (다양한 언어 정보를 이용한 음소 단위 억양 및 VoiceXML 문서 생성)

  • Lee, Hwa-Jin;Park, Jong-C.
    • Annual Conference on Human and Language Technology / 2002.10e / pp.69-76 / 2002
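  • Recent work has tried to use contextual information in speech synthesis, since it is the linguistic information that best reflects the speaker's intention. Because this information is given little weight, however, it has not greatly improved naturalness. This study uses syntactic and semantic information in intonation generation. It also analyzes the relationship between contextual information and speech information from speech data, so that diverse contextual information can be reflected in the synthesis process. In addition, it proposes a morpheme-based method for handling the various intonation contour types found in Korean more efficiently, and uses it to implement a natural intonation generation system. The system's output is then applied through a phoneme-level intonation generator and VoiceXML, and the results are discussed.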
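
The sketch below illustrates only the VoiceXML side. It wraps phrases in SSML `<prosody>` elements inside a VoiceXML 2.0 `<prompt>`. It assumes the following:
- the per-phrase pitch offsets come from an intonation model, which is not shown;
- pitch is expressed as a simplified relative percentage change.

The paper's phoneme-level intonation generator is not reproduced.

```python
import xml.etree.ElementTree as ET

def build_vxml(phrases, lang="ko-KR"):
    """Build a VoiceXML 2.0 document whose prompt carries SSML prosody.

    phrases: list of (text, pitch_change_percent) pairs with integer percents;
    the values are assumed to come from an external intonation model.
    """
    vxml = ET.Element("vxml", {
        "version": "2.0",
        "xmlns": "http://www.w3.org/2001/vxml",
        "xml:lang": lang,
    })
    block = ET.SubElement(ET.SubElement(vxml, "form"), "block")
    prompt = ET.SubElement(block, "prompt")
    for text, pitch in phrases:
        # Relative pitch change such as "+10%" or "-5%".
        prosody = ET.SubElement(prompt, "prosody", {"pitch": f"{pitch:+d}%"})
        prosody.text = text
    return ET.tostring(vxml, encoding="unicode")
```

A phrase-level percentage is much coarser than a phoneme-level contour. A finer-grained system would need either several `<prosody>` spans per phrase or a synthesizer that accepts its own contour format.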

Vertebrate Fauna, Speciation and Geological History in the Cheju Island (제주도의 척추동물상과 종분화 및 지사학적 역사)

  • 심재한;박병상
    • Korean Journal of Environment and Ecology / v.12 no.1 / pp.42-57 / 1998
  • The vertebrate fauna of Cheju Island comprised:
    - freshwater fishes (Pisces): 9 orders, 12 families and 24 species;
    - amphibians: 2 orders, 6 families and 9 species;
    - reptiles: 2 suborders, 5 families and 10 species;
    - birds (Aves): 18 orders, 49 families and 236 species;
    - mammals: 6 orders, 9 families and 16 species.
    In total, the vertebrate fauna comprised 35 orders, 2 suborders, 80 families, 4 subfamilies and 295 species. The endemic taxa of Cheju Island were Mustela sibirica quelpartis, Apodemus agrarius chejuensis, Micromys minutus hertigi, Crocidura russula quelpartis, Aegithalos caudatus trivirgatus, Sitta europaea bedfordi, Eophona personata personata, Dendrocopos leucotos quelpartis, Troglodytes troglodytes fumigatus, Parus major minor, Cettia diphone cantans and Hynobius leechii quelpartis. In particular, Sibynophis collaris and Anguilla mauritiana inhabited only Cheju Island. The island was formed over a period extending from the Pliocene to the Pleistocene. Species differentiation continued through geological isolation over 0.3 million years of repeated glacial and interglacial epochs.

Adaptive Korean Continuous Speech Recognizer to Speech Rate (발화속도 적응적인 한국어 연속음 인식기)

  • Kim, Jae-Beom;Park, Chan-Kyu;Han, Mi-Sung;Lee, Jung-Hyun
    • The Transactions of the Korea Information Processing Society / v.4 no.6 / pp.1531-1540 / 1997
  • In this paper, we present an automatic Korean continuous speech recognizer improved by speech-rate estimation and compensation methods. Continuous speech recognition is significantly harder than isolated word recognition because of coarticulatory effects and variations in speech rate, so both must be modeled. Here, the speech rate is measured by the change of formants. It is compensated by extracting relatively more feature vectors from fast speech. Coarticulatory effects are modeled with a set of 514 Korean diphones, and ETRI's 445-word DB is used as training material. Combining these methods, we implement an automatic Korean continuous speech recognizer based on a DHMM (Discrete Hidden Markov Model), which shows an improved recognition rate.
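
The rate-compensation idea can be sketched by shortening the frame shift for fast speech, so that more feature vectors are extracted from it. None of the following come from the paper; they are assumptions for illustration:
- the rate measure (large formant jumps per second);
- the reference rate;
- the 10 ms base shift and the 5 ms floor.

```python
def estimate_rate(formants, frame_ms, jump_hz):
    """Assumed rate measure: formant jumps per second.

    formants: list of (F1, F2) per frame in Hz; a jump is a frame-to-frame
    change |dF1| + |dF2| larger than jump_hz.
    """
    jumps = sum(1 for (a1, a2), (b1, b2) in zip(formants, formants[1:])
                if abs(b1 - a1) + abs(b2 - a2) > jump_hz)
    seconds = len(formants) * frame_ms / 1000.0
    return jumps / seconds

def frame_shift_ms(rate, ref_rate, base_ms=10.0, min_ms=5.0):
    """Shorter shift for faster speech, never longer than base_ms."""
    if rate <= ref_rate:
        return base_ms
    return max(base_ms * ref_rate / rate, min_ms)

def num_frames(duration_ms, win_ms, shift_ms):
    """Number of analysis frames that fit in the utterance."""
    if duration_ms < win_ms:
        return 0
    return 1 + int((duration_ms - win_ms) // shift_ms)
```

At twice the reference rate, the shift halves from 10 ms to 5 ms. A one-second utterance with a 20 ms window then yields 197 frames instead of 99. This gives a discrete HMM roughly as many observations per phone as it would see at a normal rate.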

Analysis of Changes on the Forest Environment and the Bird Community in Terms of ‘Guild’ (길드에 의한 산림환경과 조류군집 변화 분석)

  • Lee, Woo-Shin;Park, Chandra
    • The Korean Journal of Ecology / v.18 no.3 / pp.397-408 / 1995
  • This study analyzed the breeding bird community in the Mt. Baekwoon Research Forest of Seoul National University using the guild concept. The bird community was surveyed by the line-transect method during the breeding season in 1982, and the results were compared with the 1993 work of Park et al. Guilds were defined by nesting and foraging sites as follows:
    - nesting guilds: (H) hole, (C) canopy, (B) bush and (E) edge;
    - foraging guilds: (o) outside, (c) canopy and (b) bush.
    Several bush-nesting species had disappeared after ten years: the Tricolor Flycatcher (Ficedula zanthopygia), Blue-and-White Flycatcher (Cyanoptila cyanomelana), Stonechat (Saxicola torquata), Bush Warbler (Cettia diphone) and Skylark (Alauda arvensis). Outside-foraging species such as the Common Buzzard (Buteo buteo), Sparrow Hawk (Accipiter nisus) and Carrion Crow (Corvus corone) were also not observed. Species richness of the bush-nesting, canopy-foraging and bush-foraging guilds decreased sharply compared with ten years earlier. These decreases indicate that the forest environment of the area changed over the ten years. They also show that the guild concept used in this study is useful for elucidating changes in bird communities that follow changes in the forest environment.

Playback Experiments on Bush Warblers (Cettia diphone): Their Song Recognition of Intra- and Inter-Population (휘파람새의 Intra-and Inter-Population Songs 인식에 관한 Playback실험)

  • 박시룡;박대식;김수일;윤무부
    • The Korean Journal of Zoology / v.38 no.4 / pp.443-448 / 1995
  • Playback experiments were performed on Bush Warblers at Cheongwon, Chungbuk, to clarify how well they recognize intra- and inter-population songs. Six territorial males responded strongly to inter-population as well as intra-population songs. Their responses to the two kinds of song did not differ significantly in any measure: latency time, staying time or closest distance. This result implies that Bush Warblers in the region did not discriminate between intra- and inter-population songs. A possible reason is that males in the region keep long individual distances and therefore exchange little song with their neighbors. To investigate the signal value of song as a species-recognition releaser, partial songs prepared from two distinct regional populations of the species were presented to males of the study area. Two kinds of partial song were presented: a whistle portion only, and a complex syllable portion only. Territorial males responded more strongly to the complex syllable portion than to the whistle portion, indicating that the complex syllable portion conveys more species-recognition information. This agrees with the previously proposed 'releaser' hypothesis: the complex syllable portion of Bush Warbler song appears to be the part that carries most species-identifying information. The results of these playback experiments therefore support the releaser hypothesis.

Improvement of Synthetic Speech Quality using a New Spectral Smoothing Technique (새로운 스펙트럼 완만화에 의한 합성 음질 개선)

  • 장효종;최형일
    • Journal of KIISE:Software and Applications / v.30 no.11 / pp.1037-1043 / 2003
  • This paper describes a speech synthesis technique that uses diphones as the unit phonemes. Speech synthesis is basically performed by concatenating unit phonemes, and its major problem is the discontinuity at the junctions between them. To solve this problem, this paper proposes a new spectral smoothing technique. The technique reflects not only formant trajectories but also the distribution characteristics of the spectrum and human auditory characteristics. First, it decides the amount and extent of smoothing at each junction by considering human auditory characteristics. It then performs spectral smoothing using weights calculated along the time axis at the border of the two diphones. This reduces the discontinuity while minimizing the distortion caused by the smoothing itself. For performance evaluation, the technique was tested on 500 diphones extracted from 20 sentences, using ETRI voice DB samples and individually self-recorded samples.
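
A simplified boundary-smoothing step is sketched below. It is not the paper's technique. It assumes the following:
- each frame is a vector of spectral parameters (for example formant frequencies in Hz);
- a hypothetical just-noticeable-difference threshold `jnd` stands in for the auditory criterion that decides whether, and over how many frames, to smooth;
- linearly decaying weights are applied on both sides, so that the two diphones meet at their midpoint.

```python
import math

def smooth_boundary(left, right, jnd, max_frames):
    """Smooth the spectral jump between two diphones.

    left, right: lists of frames, each a list of spectral parameters
    (e.g. formant frequencies in Hz). jnd: hypothetical perceptual threshold.
    Returns smoothed copies; the inputs are not modified.
    """
    diff = [b - a for a, b in zip(left[-1], right[0])]
    gap = max(abs(d) for d in diff)
    new_left = [list(f) for f in left]
    new_right = [list(f) for f in right]
    if gap <= jnd:
        return new_left, new_right  # jump assumed inaudible: leave as is
    # Larger jumps are smoothed over more frames, up to max_frames.
    n = min(max_frames, len(left), len(right), math.ceil(gap / jnd))
    for k in range(n):
        w = 0.5 * (n - k) / n  # 0.5 at the joint, decaying away from it
        for j, d in enumerate(diff):
            new_left[-1 - k][j] += w * d
            new_right[k][j] -= w * d
    return new_left, new_right
```

Consider a 200 Hz jump from 500 Hz to 700 Hz, with `jnd=50` and `max_frames=2`. The last two left frames become 550 Hz and 600 Hz, and the first two right frames become 600 Hz and 650 Hz. The hard step is thus replaced by a short ramp.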