• Title/Summary/Keyword: Speech Synthesis

An Implementation of Speech Recognition and Synthesis System using Japanese-Korean Phonetic Transcription (일한 음차 변환을 이용한 음성인식 및 합성기의 구현)

  • 이용주;이현구;윤재선;양원렬;홍광석
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2000.10b
    • /
    • pp.401-403
    • /
    • 2000
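  • This paper implements a speech recognition and synthesis system that uses Japanese-to-Korean phonetic transcription. For speech recognition, CV, VCCV, VCV, VV, and VC units are used, and the recognition system is built by combining models constructed in advance for each unit. With Japanese-to-Korean transcription applied, a Japanese word to be recognized can therefore be handled by converting it to its Hangul pronunciation and then generating the corresponding model. For speech synthesis, the Korean speech database needed for synthesis is built, and its units are concatenated according to the input text to generate synthesized speech. When Japanese is input, the Japanese-to-Korean transcription rules convert the Japanese pronunciation into Hangul before it is passed to the synthesizer, so synthesized speech can be generated without a separate Japanese synthesizer.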


An Implementation of Unlimited Speech Recognition and Synthesis System using Transcription of Roman to Hangul (영한 음차 변환을 이용한 무제한 음성인식 및 합성기의 구현)

  • 양원렬;윤재선;홍광석
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2000.08a
    • /
    • pp.181-184
    • /
    • 2000
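  • This paper implements a speech recognition and synthesis system that uses English-to-Korean phonetic transcription. For speech recognition, CV (consonant-vowel), VCCV, VCV, VV, and VC units are used, and an unlimited-vocabulary recognition system is built by combining models constructed in advance for each unit. With English-to-Korean transcription, an English word to be recognized can therefore be handled by converting it to its Hangul pronunciation and then generating the corresponding model. For speech synthesis, the Korean speech database needed for synthesis is built, and its units are concatenated according to the input text to generate synthesized speech. When English is input, English-to-Korean transcription converts the English pronunciation into Hangul before it is passed to the synthesizer, so synthesized speech can be generated without a separate English synthesizer.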


Real-Time Implementation of a SBC Codec Using a NEC 7720 DSP (NEC 7720 DSP를 이용한 SBC codec의 실시간 구현)

  • Oh, Soo Hwan;Lee, Sang Uk
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.23 no.4
    • /
    • pp.429-438
    • /
    • 1986
  • In this paper we design and implement a real-time, full-duplex SBC (sub-band coding) codec at 16 kbps using a high-speed digital signal processor, the NEC 7720. The SBC codec employs a QMF (quadrature mirror filter) bank based on a tree structure of two-band analysis-synthesis pairs to partition the speech signal into 4 octave bands. Computer simulation was carried out to investigate the effect of the fixed-point computation of the NEC 7720. Three performance measures were used in this simulation: the conventional signal-to-noise ratio, an informal listening test, and an LPC (linear predictive coding) distance measure. The necessary parameters were optimized through the simulation. The developed hardware and software were tested in real-time operation using a hardware emulator.
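The tree-structured split described in this abstract can be sketched as follows. This is only a minimal illustration, not the codec from the paper: it assumes the 2-tap Haar pair as a stand-in for the (unspecified) QMF filters, uses floating point rather than NEC 7720 fixed-point arithmetic, and the function names and the uniform quantizer are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def analyze(x):
    """Split x (even length) into half-rate low and high bands (Haar QMF pair)."""
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high


def synthesize(low, high):
    """Inverse of analyze(): interleave the reconstructed even and odd samples."""
    x = []
    for l, h in zip(low, high):
        x.append((l + h) / SQRT2)
        x.append((l - h) / SQRT2)
    return x


def octave_split(x, levels=3):
    """Repeatedly split the low band; 3 levels give 4 octave bands.

    Bands are returned from the highest octave to the lowest.
    """
    if len(x) % (2 ** levels) != 0:
        raise ValueError("length must be a multiple of 2**levels")
    bands, low = [], list(x)
    for _ in range(levels):
        low, high = analyze(low)
        bands.append(high)
    bands.append(low)
    return bands


def octave_merge(bands):
    """Rebuild the signal from the bands produced by octave_split()."""
    low = bands[-1]
    for high in reversed(bands[:-1]):
        low = synthesize(low, high)
    return low


def quantize(band, step):
    """Uniform quantizer, a crude stand-in for coding a band with few bits."""
    return [round(v / step) * step for v in band]


def snr_db(reference, test):
    """Conventional signal-to-noise ratio in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    return float("inf") if noise == 0 else 10.0 * math.log10(signal / noise)
```

Splitting only the low band at each level is what makes the bands octave-spaced: each band covers half the bandwidth of the one above it. Quantizing each band with its own step size and measuring `snr_db` on the merged output mimics, very roughly, the kind of simulation the abstract describes.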


Korean Pause Prediction Model based on Dialogue Context (대화 맥락에 기반한 한국어 휴지 예측 모델)

  • Joung Lee;Jeongho Na;Jeongbeom Jeong;Maengsik Choi;Chunghee Lee;Seung-Hoon Na
    • Annual Conference on Human and Language Technology
    • /
    • 2023.10a
    • /
    • pp.404-408
    • /
    • 2023
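  • As demand for voice user interfaces grows, inserting pauses at appropriate positions so that a speech synthesis system mimics natural spoken utterances has become a major task. Given the continuity of dialogue, building a natural voice-based interface requires understanding the dialogue context and inserting pauses at appropriate positions. This study therefore proposes a pause prediction model based on a Long-Input Transformer that inserts pauses at appropriate positions according to the dialogue context, and reports validation results on a Korean dialogue dataset.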


A quantitative study on the minimal pair of Korean phonemes: Focused on syllable-initial consonants (한국어 음소 최소대립쌍의 계량언어학적 연구: 초성 자음을 중심으로)

  • Jung, Jieun
    • Phonetics and Speech Sciences
    • /
    • v.11 no.1
    • /
    • pp.29-40
    • /
    • 2019
  • The paper investigates the minimal pairs of Korean phonemes quantitatively. To achieve this goal, I calculated the number of consonant minimal pairs in the syllable-initial position as both raw counts and relative counts, and analyzed the part-of-speech relations of the two words in each minimal pair. "Urimalsaem" was chosen as the object of this study because minimal pair analysis should be done through a dictionary and it is the largest among Korean dictionaries. The results of the study are summarized as follows. First, there were 153 types of minimal pairs out of 337,135 examples. The ranking of phoneme pairs from highest to lowest was 'ㅅ-ㅈ, ㄱ-ㅅ, ㄱ-ㅈ, ㄱ-ㅂ, ㄱ-ㅎ, ..., ㅆ-ㅋ, ㄸ-ㅋ, ㅉ-ㅋ, ㄹ-ㅃ, ㅃ-ㅋ'. The phonemes that played a major role in the formation of minimal pairs were /ㄱ, ㅅ, ㅈ, ㅂ, ㅊ/, in that order, which showed a high proportion of palatals. The correlation between the raw count and the relative count of minimal pairs was quite high (r = 0.937). Second, 87.91% of the minimal pairs shared the same part of speech (syntactic category). The most frequently observed type was the 'noun-noun' pair (70.25%), followed by the 'verb-verb' pair (14.77%). This indicates that minimal pairs can be grouped into similar categories in terms of semantics. The results of this study can serve as basic data on Korean phonemes for research in Korean linguistics, speech-language pathology, language education, language acquisition, speech synthesis, and artificial intelligence and machine learning.
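Counting syllable-initial minimal pairs of this kind can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's procedure: words are given as precomposed Hangul strings, two words count as a minimal pair when they differ only in the initial consonant of one syllable, and the null onset ㅇ is skipped. The function names are hypothetical; the decomposition arithmetic is the standard Unicode layout of Hangul syllables.

```python
from collections import Counter, defaultdict
from itertools import combinations

# The 19 syllable-initial jamo in Unicode order.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"


def split_initial(syllable):
    """Return (initial consonant, index of the vowel+final part) of a syllable."""
    offset = ord(syllable) - 0xAC00
    if not 0 <= offset < 11172:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    # Each initial consonant owns 21 vowels * 28 finals = 588 code points.
    return INITIALS[offset // 588], offset % 588


def count_minimal_pairs(words):
    """Count, per unordered consonant pair, the word pairs that contrast them."""
    groups = defaultdict(set)
    for word in set(words):
        for pos, syllable in enumerate(word):
            initial, rest = split_initial(syllable)
            if initial == "ㅇ":  # assumption: the null onset is not a phoneme
                continue
            # Words sharing this key differ only in one initial consonant.
            groups[(word[:pos], rest, word[pos + 1:])].add(initial)
    counts = Counter()
    for initials in groups.values():
        for pair in combinations(sorted(initials), 2):
            counts[pair] += 1
    return counts
```

With ㅇ excluded there are 18 initial consonants, which gives 18 × 17 / 2 = 153 possible unordered pairs, the same number of pair types reported in the abstract. For a toy word list such as `["사다", "자다", "가다", "자리", "사리"]`, the ㅅ-ㅈ contrast is found twice and the ㄱ-ㅅ and ㄱ-ㅈ contrasts once each.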

Efficient Harmonic-CELP Based Low Bit Rate Speech Coder (효율적인 하모닉-CELP 구조를 갖는 저 전송률 음성 부호화기)

  • 최용수;김경민;윤대희
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.5
    • /
    • pp.35-47
    • /
    • 2001
  • This paper describes an efficient harmonic-CELP speech coder that combines the advantages of harmonic and CELP coders. According to the frame voicing decision, the proposed harmonic-CELP coder uses the RP-VSELP coder as a fast CELP coder for unvoiced frames and an improved harmonic coder for voiced frames. The main features of the proposed coder are simple pitch detection, fast harmonic estimation, variable-dimension harmonic vector quantization, perceptual weighting reflecting frequency resolution, fast harmonic synthesis, naturalness control using band voicing, and multi-mode operation. These features give the proposed coder very low complexity compared with the HVXC coder. To demonstrate the performance of the proposed coder, a 2.4 kbps coder was implemented and compared with reference coders. In informal listening tests, the proposed coder showed good quality while requiring low delay and complexity.


A Very Low-Bit-Rate Analysis-by-Synthesis Speech Coder Using Zinc Function Excitation (Zinc 함수 여기신호를 이용한 분석-합성 구조의 초 저속 음성 부호화기)

  • Seo Sang-Won;Kim Jong-Hak;Lee Chang-Hwan;Jeong Gyu-Hyeok;Lee In-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.6
    • /
    • pp.282-290
    • /
    • 2006
  • This paper proposes a new digital reverberator that models the analog helical coil spring reverberator used in guitar amplifiers. While conventional digital reverberators have been proposed to provide a better sound field mainly on the basis of room acoustics, no algorithm or analysis has been proposed for digital reverberators that model the helical coil spring reverberator. Considering that approximately 70 to 80 percent of guitar amplifiers still use a helical coil spring reverberator, the research was based not on room acoustics but on the helical coil spring reverberator itself as an effector. Simulations with the proposed algorithm confirmed that the resulting digital reverberator provides a response perceptually equivalent to that of conventional analog helical coil spring reverberators.

A Study on the Pitch Extraction Improvement Using LSP for the Synthesis of High Speech Quality (고음질 음성합성을 위한 LSP를 이용한 피치검출 성능향상에 관한 연구)

  • Seo, Ji-Ho;Kim, Jong-Kuk;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.1
    • /
    • pp.69-75
    • /
    • 2010
  • In this paper, the pitch is detected after formant components are eliminated by flattening the spectrum in the frequency domain. To remove the influence of formants and transition frequencies on the signal spectrum, the frequency band is divided into several segments based on LSP, points are detected in each sub-band, and a formant envelope is built by linear interpolation between those points; the spectrum of the speech signal is then compensated by the inverse of this linearly interpolated envelope. The experimental results show that the proposed method outperforms the LPC, cepstrum, and lifter methods. It reduced the gross error rate by 1.30% relative to the LPC method, which performed best among the other methods. The proposed method also showed a low error rate in noisy environments.
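The abstract does not define its gross error rate, so the following is only a sketch of one common convention: a voiced frame counts as a gross error when the estimated pitch deviates from the reference by more than a relative tolerance. The 20% tolerance, the use of 0 Hz to mark unvoiced reference frames, and the function name are all assumptions made for illustration.

```python
def gross_error_rate(reference_hz, estimated_hz, tolerance=0.2):
    """Percentage of voiced frames whose pitch estimate is grossly wrong.

    Frames with a reference pitch of 0 are treated as unvoiced and ignored.
    """
    voiced = [(r, e) for r, e in zip(reference_hz, estimated_hz) if r > 0]
    if not voiced:
        raise ValueError("no voiced reference frames")
    # A frame is a gross error when |estimate - reference| exceeds tolerance * reference.
    gross = sum(1 for r, e in voiced if abs(e - r) > tolerance * r)
    return 100.0 * gross / len(voiced)
```

Pitch doubling and halving both land far outside such a tolerance, so they dominate this measure; small deviations inside the tolerance are usually reported separately as fine errors.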

Wavelet-based Pitch Detector for 2.4 kbps Harmonic-CELP Coder (2.4 kbps 하모닉-CELP 코더를 위한 웨이블렛 피치 검출기)

  • 방상운;이인성;권오주
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.8
    • /
    • pp.717-726
    • /
    • 2003
  • This paper presents a method for designing a wavelet-based pitch detector for a 2.4 kbps Harmonic-CELP coder, and a method for achieving effective waveform interpolation by deciding the window shape in the transition region. A waveform interpolation coder encodes one pitch-period-sized segment of speech, a prototype segment, for each frame and generates a smooth waveform by interpolating between the prototype segments of voiced frames. However, when pitch halving or doubling occurs, harmonic synthesis of the prototype waveforms between the previous frame and the current frame produces not only waveform errors but also discontinuities at the frame boundary. In addition, because the waveform interpolation coder synthesizes the excitation waveform in the transition region by overlap-add with a triangular window, the Harmonic-CELP coder fails to model rapidly increasing speech, and the synthesized waveform increases only linearly. To detect the pitch period precisely, we first use a hybrid first-stage pitch detector and then increase the precision with a second-stage ACF pitch detector. Next, to modify the excitation window, we detect the onset and offset of the frame using GCI. As a result, pitch doubling is removed, the pitch error rate decreases by 5.4% compared with ACF and by 2.66% compared with the wavelet detector, and the MOS improves by 0.13 in the transition region.
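For reference, a plain autocorrelation (ACF) pitch detector of the kind used as a baseline here can be sketched as follows. This is a generic textbook version, not the detector from the paper: the normalized autocorrelation, the 60 to 400 Hz search range, and the function name are assumptions.

```python
import math


def acf_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the pitch (Hz) of one frame from its normalized autocorrelation.

    Returns 0.0 when no lag in the search range correlates positively.
    """
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        head = frame[:len(frame) - lag]  # samples n
        tail = frame[lag:]               # samples n + lag
        energy = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
        if energy == 0.0:
            continue
        score = sum(a * b for a, b in zip(head, tail)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0
```

For a periodic signal, every integer multiple of the true period correlates about as well as the period itself, so whenever such a multiple falls inside the search range the detector can pick it and report half (or a third) of the true pitch. This is one way the halving and doubling errors mentioned in the abstract arise, and it is why practical detectors add a second stage or extra decision logic.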

Performance Comparison of State-of-the-Art Vocoder Technology Based on Deep Learning in a Korean TTS System (한국어 TTS 시스템에서 딥러닝 기반 최첨단 보코더 기술 성능 비교)

  • Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology
    • /
    • v.6 no.2
    • /
    • pp.509-514
    • /
    • 2020
  • A conventional TTS system consists of several modules, including text preprocessing, parsing analysis, grapheme-to-phoneme conversion, boundary analysis, prosody control, acoustic feature generation by an acoustic model, and synthesized speech generation. A deep-learning TTS system, in contrast, is composed of a Text2Mel process that generates a spectrogram from text and a vocoder that synthesizes speech signals from the spectrogram. In this paper, to build an optimal Korean TTS system, we apply Tacotron2 to the Text2Mel process and, as vocoders, implement WaveNet, WaveRNN, and WaveGlow to verify and compare their performance. Experimental results show that WaveNet has the highest MOS and a trained model of hundreds of megabytes, but its synthesis time is about 50 times real time. WaveRNN shows MOS performance similar to that of WaveNet with a model size of several tens of megabytes, but it also cannot run in real time. WaveGlow can run in real time, but its model is several gigabytes in size and its MOS is the lowest of the three vocoders. Based on these results, the paper presents reference criteria for selecting the appropriate method according to the hardware environment in which the TTS system is deployed.
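The two quantities compared in this abstract, synthesis speed relative to real time and MOS, reduce to simple arithmetic. The sketch below uses hypothetical function names and assumes the usual 1 to 5 opinion scale for MOS, which the abstract does not state.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Synthesis time divided by the duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def is_real_time(rtf):
    """A vocoder keeps up with playback when its real-time factor is at most 1."""
    return rtf <= 1.0


def mean_opinion_score(ratings):
    """Average of listener ratings (assumed to be on a 1-to-5 scale)."""
    if not ratings:
        raise ValueError("no ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)
```

A vocoder running at about 50 times real time, as reported for WaveNet, would need roughly 500 seconds to produce a 10-second utterance, so `real_time_factor(500, 10)` gives 50.0 and `is_real_time` returns False for it.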