• Title/Summary/Keyword: Diphone

Morphological analysis of spoken Korean using Viterbi search (Viterbi 검색 기법을 이용한 한국어 음성 언어의 형태소 분석)

  • 김병창
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1995.06a
    • /
    • pp.200-203
    • /
    • 1995
  • This paper proposes a spoken Korean processing model that is extensible to large-vocabulary continuous spoken Korean systems. Integrating phoneme-level speech recognition with natural language processing supports sophisticated phonological and morphological analysis. The model consists of a diphone speech recognizer, a Viterbi dictionary searcher, and a morpheme connectivity information checker. Two-level hierarchical TDNNs recognize newly defined Korean diphones. The Viterbi dictionary searcher segments the diphone sequences and converts them to the most probable morpheme sequences. The morpheme sequences are then examined by the morpheme connectivity information checker, and the correct morpheme sequence with the greatest probability is selected. The experiments show that morphological analysis of spoken Korean achieves an 80.6% success rate on 328 Eojeols.
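
The dictionary search step in the abstract above can be illustrated with a minimal sketch. It assumes a toy dictionary that maps each morpheme to a unit sequence and a probability; the entries, the unit labels, and the function name `viterbi_segment` are all hypothetical and do not come from the paper. The morpheme connectivity check that the paper applies afterwards is omitted.

```python
import math

# Hypothetical toy dictionary: morpheme -> (unit sequence, probability).
# The units stand in for recognized diphone labels; every entry is invented.
DICTIONARY = {
    "hak": (("h", "a", "k"), 0.4),
    "gyo": (("g", "yo"), 0.3),
    "hakgyo": (("h", "a", "k", "g", "yo"), 0.2),
    "e": (("e",), 0.1),
}


def viterbi_segment(units):
    """Return (log_prob, morphemes) for the most probable segmentation."""
    n = len(units)
    # best[i] holds the best log-probability of units[:i] and a backpointer
    # (start index, morpheme) for the last morpheme of that segmentation.
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for morpheme, (seq, prob) in DICTIONARY.items():
            start = end - len(seq)
            if start < 0 or tuple(units[start:end]) != seq:
                continue
            score = best[start][0] + math.log(prob)
            if score > best[end][0]:
                best[end] = (score, (start, morpheme))
    if n > 0 and best[n][1] is None:
        return -math.inf, []  # no dictionary segmentation covers the input
    # Follow the backpointers from the end of the input to the start.
    morphemes = []
    pos = n
    while pos > 0:
        start, morpheme = best[pos][1]
        morphemes.append(morpheme)
        pos = start
    morphemes.reverse()
    return best[n][0], morphemes


score, morphemes = viterbi_segment(["h", "a", "k", "g", "yo", "e"])
print(morphemes, score)
```

With these invented probabilities the single long morpheme wins over the two shorter ones, because 0.2 is larger than 0.4 × 0.3. A real system would score paths with recognizer likelihoods as well as dictionary probabilities.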

Speech Synthesis using Diphone Clustering and Improved Spectral Smoothing (다이폰 군집화와 개선된 스펙트럼 완만화에 의한 음성합성)

  • Jang, Hyo-Jong;Kim, Kwan-Jung;Kim, Gye-Young;Choi, Hyung-Il
    • The KIPS Transactions:PartB
    • /
    • v.10B no.6
    • /
    • pp.665-672
    • /
    • 2003
  • This paper describes a speech synthesis technique that concatenates unit phonemes. A major problem with this approach is that discontinuities arise at the joins between unit phonemes, especially between units recorded by different speakers. To solve this problem, the paper uses clustered diphones and proposes a spectral smoothing technique that uses the formant trajectory and the distribution characteristics of the spectrum and also reflects human auditory characteristics. Specifically, the proposed technique clusters unit phonemes using the distribution characteristics of the spectrum at the joins between unit phonemes, decides the amount and scope of smoothing by considering human auditory characteristics at those joins, and then performs spectral smoothing using weights calculated along the time axis at the boundary of two diphones. The proposed technique removes the discontinuity and minimizes the distortion that spectral smoothing can introduce. For performance evaluation, we test on five hundred diphones extracted from twenty sentences recorded by five speakers and present the experimental results.
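
The idea of smoothing with weights that vary along the time axis at a diphone boundary can be sketched as follows. This is a simplified illustration, not the paper's method: it assumes a fixed smoothing width and linearly decaying weights, whereas the paper derives the amount and scope of smoothing from auditory characteristics. The function name `smooth_boundary` and the frame representation (plain lists of spectral values) are assumptions.

```python
def smooth_boundary(left, right, width):
    """Smooth the spectra around the join between two units.

    left, right: lists of frames; each frame is a list of spectral values.
    width: number of frames on each side of the join to adjust (assumed
           fixed here; the paper chooses it adaptively).
    """
    width = min(width, len(left), len(right))
    new_left = [frame[:] for frame in left]
    new_right = [frame[:] for frame in right]
    if width == 0:
        return new_left, new_right
    # Target spectrum at the join: midpoint of the two frames that meet.
    target = [(a + b) / 2.0 for a, b in zip(left[-1], right[0])]
    for k in range(width):
        # Weight is 1 at the join and decays linearly with distance k.
        w = 1.0 - k / width
        li = len(left) - 1 - k
        new_left[li] = [(1 - w) * x + w * t for x, t in zip(left[li], target)]
        new_right[k] = [(1 - w) * x + w * t for x, t in zip(right[k], target)]
    return new_left, new_right


left = [[0.0], [0.0], [0.0], [0.0]]
right = [[4.0], [4.0], [4.0], [4.0]]
print(smooth_boundary(left, right, width=2))
```

In this toy case the jump from 0 to 4 at the join becomes a ramp of 1-unit steps (0, 1, 2 on the left, then 2, 3, 4 on the right). Frames further than `width` from the join are left untouched, which limits the distortion introduced by smoothing.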

Design of A Speech Recognition System using Hidden Markov Models (은닉 마코프 모델을 이용한 음성 인식 시스템 설계)

  • Lee, Chul-Won;Lim, In-Chil
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.1
    • /
    • pp.108-115
    • /
    • 1996
  • This paper proposes an algorithm and a model topology for connected speech recognition using discrete hidden Markov models. The proposed model uses diphone and triphone models, chosen with regard to recognition rate and recognizable vocabulary. For more exact inter-phoneme segmentation and faster execution, the diphone model has 4 states: the first and last states are steady states, and the other two are transient states. The triphone model has 7 states, refined into 3 steady states and 4 transition states. The proposed speech recognition algorithm is also designed to detect inter-phoneme segment boundaries during recognition.
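
The 4-state diphone topology described above (steady, transient, transient, steady) can be written down as a left-to-right transition matrix. The sketch below is only an illustration: the self-loop probabilities are placeholders rather than values from the paper, and modelling "steady" as a high self-loop probability and "transient" as a low one is an assumption. The function name `left_to_right_transitions` is hypothetical.

```python
def left_to_right_transitions(kinds, stay_steady=0.8, stay_transient=0.4):
    """Build a left-to-right transition matrix for a state layout.

    kinds: string of 'S' (steady) and 'T' (transient) states, in order.
    stay_steady, stay_transient: illustrative self-loop probabilities,
        not values taken from the paper.
    """
    n = len(kinds)
    matrix = [[0.0] * n for _ in range(n)]
    for i, kind in enumerate(kinds):
        stay = stay_steady if kind == "S" else stay_transient
        if i == n - 1:
            matrix[i][i] = 1.0  # last state loops until the model is exited
        else:
            matrix[i][i] = stay
            matrix[i][i + 1] = 1.0 - stay
    return matrix


# Diphone model from the abstract: steady, transient, transient, steady.
DIPHONE = left_to_right_transitions("STTS")
for row in DIPHONE:
    print(row)
```

The abstract does not give the order of the 3 steady and 4 transition states in the triphone model, so no layout is assumed for it here; any 7-character string of `S` and `T` can be passed to try one.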

Designing a large recording script for open-domain English speech synthesis

  • Kim, Sunhee;Kim, Hojeong;Lee, Yooseop;Kim, Boryoung;Won, Yongkook;Kim, Bongwan
    • Phonetics and Speech Sciences
    • /
    • v.13 no.3
    • /
    • pp.65-70
    • /
    • 2021
  • This paper proposes a method for designing a large recording script for open-domain English speech synthesis. For read-aloud style text, 12 domains and 294 sub-domains were designed using text contained in five different news media publications. For conversational style text, 4 domains and 36 sub-domains were designed using movie subtitles. The final script consists of 43,013 sentences, 27,085 read-aloud style sentences, and 15,928 conversational style sentences, consisting of 549,683 tokens and 38,356 types. The completed script is analyzed using four criteria: word coverage (type coverage and token coverage), high-frequency vocabulary coverage, phonetic coverage (diphone coverage and triphone coverage), and readability. The type coverage of our script reaches 36.86% despite its low token coverage of 2.97%. The high-frequency vocabulary coverage of the script is 73.82%, and the diphone coverage and triphone coverage of the whole script are 86.70% and 38.92%, respectively. The average readability of whole sentences is 9.03. The results of analysis show that the proposed method is effective in producing a large recording script for English speech synthesis, demonstrating good coverage in terms of unique words, high-frequency vocabulary, phonetic units, and readability.
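
Diphone coverage of a script can be computed along the following lines. This is a sketch under stated assumptions: the reference set is taken to be every ordered pair of phones in the inventory, which is not necessarily the reference set the paper uses, and the function names and toy data are hypothetical.

```python
def diphones(phones):
    """Return the adjacent phone pairs of one utterance."""
    return list(zip(phones, phones[1:]))


def diphone_coverage(utterances, inventory):
    """Fraction of reference diphones seen at least once in the script.

    utterances: list of phone sequences, one per sentence.
    inventory: list of phones; the reference set is assumed to be all
        ordered pairs of these phones.
    """
    seen = set()
    for phones in utterances:
        seen.update(diphones(phones))
    possible = {(a, b) for a in inventory for b in inventory}
    return len(seen & possible) / len(possible)


# Toy example with a two-phone inventory.
print(diphone_coverage([["a", "b", "a"]], ["a", "b"]))
```

Triphone coverage follows the same pattern with triples instead of pairs. Because the number of possible triples grows much faster than the number of pairs, a script of a given size will usually cover a smaller share of triphones than of diphones.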

Does the Bush Warbler (Cettia diphone) Defend Its Territory through a Particular Song Mode or a Mode Sequence? (텃새권 방어와 관련된 휘파람새의 Song Mode와 Mode Sequence의 이용)

  • 박대식;박시룡
    • The Korean Journal of Zoology
    • /
    • v.39 no.3
    • /
    • pp.282-291
    • /
    • 1996
  • The song of the bush warbler, Cettia diphone, consists of an introductory whistle portion and a complex ending syllable portion. In bush warblers, a song with two or fewer notes in the whistle portion is classified as an $\alpha$ song mode, while a song with three or more notes in the whistle portion is classified as a $\beta$ song mode. Although some variations occur in mode selection by individuals and populations, the proportion of $\alpha$ mode songs to total songs is 55% (range 51.6-58.7%) on average. The $\alpha$ mode has a higher dominant frequency in the whistle portion than does the $\beta$ mode, but the number of syllables in the complex ending syllable portion is fewer. Bush warbler mode sequences are defined as $\alpha\alpha$, $\alpha\beta$, $\beta\alpha$ and $\beta\beta$ mode sequences. In order to test the hypothesis that song modes and mode sequences play a role in the defence of territory in Jeju and Wando populations in the south-coastal geographic song variation group, playback experiments were executed. Mode sequences differed between naturally produced songs and songs produced in response to playback for two populations. In particular, for birds in the Wando populations our results indicate that the use of song modes may be affected by habitat, singing site and type of territory, and further propose that particular mode sequences may play a more important role than song mode in vocal interactions.

A Korean Text-to-Speech Conversion System: Garasadae (한국어 문자음성 변환시스템 : 가라사대)

  • 권철홍;정원국;구준모;김형순
    • Information and Communications Magazine
    • /
    • v.11 no.9
    • /
    • pp.17-25
    • /
    • 1994
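  • This paper describes Garasadae (가라사대), the first commercial unlimited-vocabulary Korean speech synthesis system developed in Korea. First, the algorithms used at each stage of the speech synthesis process are explained. Sentence analysis is performed sequentially by rules for sentence preprocessing, parsing, and conversion to phonetic transcription. After sentence analysis, the factors that control prosody, such as stress, intonation, and duration, are computed, and the speech signal is generated by concatenating speech signals of extended diphone units. Next, the hardware and software configuration of Garasadae is described. The paper examines the hardware, which is implemented with a general-purpose digital signal processing IC, as well as the structure and role of both the Garasadae software and the software running on the PC.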

Concatenative Speech Synthesis based on Diphone Clustering using improved spectral smoothing (개선된 스펙트럼 스무딩을 이용한 다이폰 클러스터링 기반의 연결 음성합성)

  • 장효종;김계영;최형일
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.04b
    • /
    • pp.499-501
    • /
    • 2002
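  • A well-known problem of recent speech synthesis methods that concatenate synthesis units is that discontinuities occur at the joins. This paper proposes an improved spectral smoothing method to remove the spectral discontinuities that appear when speech is synthesized. To obtain better smoothing results, context-sensitive clustered diphones are used as the synthesis unit. The smoothing method considers the distribution of the spectra on both sides of the diphone boundary in the concatenation region and applies weights that vary over time. In addition, the weights are determined with a B-spline function, a nonlinear function, which made it possible to generate a more natural spectrum.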

Development of a Diphone-Based Audiotex System (다이폰단위의 합성방법을 이용한 오디오텍스 시스템의 구현에 관한 연구)

  • 이승훈
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06c
    • /
    • pp.99-102
    • /
    • 1994
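  • The early audiotex system developed in our laboratory was an unlimited-vocabulary Korean speech synthesizer using LSP parameters, with a synthesis database of 640 demisyllables. However, the sound quality of this system was too poor to offer a speech synthesis service to general users. We attempted to improve the quality by modifying the excitation source model and adjusting the energy contour, but did not reach a satisfactory level. We therefore implemented a new audiotex system in which the synthesis unit was changed to the diphone. The diphone-based audiotex system consists of 1228 synthesis units, chosen to account for the various phonological environments of Korean, and adopts a synthesis method using LSP parameters. In addition, a modified LF model is used for excitation generation to improve the intelligibility and naturalness of consonants, and the system was implemented using a TMS320C30 DSP chip, an MC68020 CPU, high-speed memory devices, and VRTOS. Listening tests showed improved sound quality in terms of naturalness and intelligibility compared with the previous synthesis method.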
