Search | Korea Science

The Error Pattern Analysis of the HMM-Based Automatic Phoneme Segmentation (HMM기반 자동음소분할기의 음소분할 오류 유형 분석)

Kim Min-Je;Lee Jung-Chul;Kim Jong-Jin
- The Journal of the Acoustical Society of Korea / v.25 no.5 / pp.213-221 / 2006
Phone segmentation of the speech waveform is especially important for concatenative text-to-speech synthesis, which uses segmented corpora to construct synthesis units, because the quality of the synthesized speech depends critically on the accuracy of the segmentation. Originally, phone segmentation was performed manually, but this requires enormous effort and causes long delays. HMM-based approaches adopted from automatic speech recognition are the most widely used for automatic segmentation in speech synthesis, providing a consistent and accurate phone labeling scheme. Even though the HMM-based approach has been successful, it may place a phone boundary at a position different from the expected one. In this paper, we categorized adjacent phoneme pairs and analyzed the mismatches between hand-labeled transcriptions and HMM-based labels. We then described the dominant error patterns that must be addressed for speech synthesis. For the experiment, a hand-labeled standard Korean speech DB from ETRI was used as the reference DB. A time difference larger than 20 ms between a hand-labeled phoneme boundary and the auto-aligned boundary is treated as an automatic segmentation error. Our experimental results for a female speaker revealed that plosive-vowel, affricate-vowel, and vowel-liquid pairs showed high accuracies of 99%, 99.5%, and 99%, respectively, whereas stop-nasal, stop-liquid, and nasal-liquid pairs showed very low accuracies of 45%, 50%, and 55%. The results for a male speaker showed a similar tendency.
https://doi.org/10.7776/ASK.2006.25.5.213
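The evaluation criterion in this abstract (a boundary counts as an error when the hand-labeled and auto-aligned times differ by more than 20 ms, with accuracy reported per adjacent phoneme-class pair) can be illustrated with a minimal sketch. The function name `pair_accuracy`, the tuple layout, and the sample boundary times are hypothetical and are not taken from the paper; only the 20 ms threshold comes from the abstract.

```python
from collections import defaultdict

THRESHOLD_MS = 20  # tolerance stated in the abstract


def pair_accuracy(boundaries, threshold_ms=THRESHOLD_MS):
    """Return accuracy per (left_class, right_class) pair.

    boundaries: iterable of (left_class, right_class, hand_ms, auto_ms),
    where hand_ms is the hand-labeled boundary time and auto_ms is the
    automatically aligned boundary time, both in milliseconds.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for left, right, hand_ms, auto_ms in boundaries:
        pair = (left, right)
        totals[pair] += 1
        # A difference larger than the threshold is a segmentation error.
        if abs(hand_ms - auto_ms) <= threshold_ms:
            hits[pair] += 1
    return {pair: hits[pair] / totals[pair] for pair in totals}


# Made-up boundaries, for illustration only.
sample = [
    ("plosive", "vowel", 100, 105),  # 5 ms off: correct
    ("plosive", "vowel", 350, 362),  # 12 ms off: correct
    ("stop", "nasal", 500, 530),     # 30 ms off: error
    ("stop", "nasal", 800, 810),     # 10 ms off: correct
]
accuracy = pair_accuracy(sample)
```

Grouping by class pair rather than by individual phoneme mirrors the way the abstract reports its results.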

Trends in Synthetic Speech Quality Evaluation in Japan (일본의 합성음 품질 평가 동향)

이용주
- Proceedings of the KSPS conference / 2000.03a / pp.32-50 / 2000

Features of LG TTS and a Proposal for Synthetic Speech Evaluation (LG TTS의 특징 및 합성음 평가를 위한 제안)

이윤근
- Proceedings of the KSPS conference / 2000.03a / pp.90-94 / 2000

A Short-term and Long-term Usability Testing of the Speech Synthesizer for the People with Visual Impairments (시각장애인용 음성합성기에 대한 장/단기 사용성 평가)

Lee, H.Y.;Hong, K.H.
- Journal of rehabilitation welfare engineering & assistive technology / v.9 no.1 / pp.53-60 / 2015
We conducted long-term and short-term usability tests of the built-in speech synthesizer of a screen reader for people with visual impairments. A total of 20 persons with visual impairments participated in the short-term usability test, and 10 of them participated in the long-term usability test. The naturalness and clarity of the synthetic speech were evaluated by MOS scores, preference among various synthetic voices was examined through a preference test, and the users' satisfaction level and other requirements for the synthetic speech were collected through open feedback. We also examined naturalness, clarity, preference, and user requirements for the synthetic speech in the long-term usability test. We then compare and contrast the long-term and short-term usability test results.

Improving LD-CELP using frame classification and modified synthesis filter (프레임 분류와 합성필터의 변형을 이용한 적은 지연을 갖는 음성 부호화기의 성능)

임은희;이주호;김형명
- The Journal of Korean Institute of Communications and Information Sciences / v.21 no.6 / pp.1430-1437 / 1996
A low-delay code-excited linear predictive (LD-CELP) speech coder at bit rates under 8 kbps is considered. We try to improve the performance of the speech coder with a frame-type-dependent modification of the synthesis filter. We first classify frames into three groups: voiced, unvoiced, and onset. For voiced and unvoiced frames, the spectral envelope of the synthesis filter is adapted to the phonetic characteristics. For transition frames from unvoiced to voiced, a synthesis filter interpolated with the bias filter is used. The proposed vocoder produced clearer sound than existing LD-CELP vocoders at a similar delay.
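The three-way frame classification mentioned in this abstract (voiced, unvoiced, and onset, where onset is the transition from unvoiced to voiced) might be sketched as follows. The abstract does not say how the authors decide voicing, so the short-time energy and zero-crossing-rate test, the threshold values, and all function names here are assumptions chosen only for illustration.

```python
import math


def energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frames(frames, energy_min=0.01, zcr_max=0.25):
    """Label each frame as 'voiced', 'unvoiced', or 'onset'.

    Assumed voicing rule: enough energy and a low zero-crossing rate.
    A voiced frame that follows a non-voiced frame is labeled 'onset';
    a voiced first frame is therefore also labeled 'onset'.
    """
    labels = []
    prev_voiced = False
    for frame in frames:
        voiced = energy(frame) >= energy_min and zero_crossing_rate(frame) <= zcr_max
        if voiced and not prev_voiced:
            labels.append("onset")
        elif voiced:
            labels.append("voiced")
        else:
            labels.append("unvoiced")
        prev_voiced = voiced
    return labels


# Synthetic 20 ms frames at 8 kHz: a weak alternating signal stands in for
# an unvoiced frame, a 100 Hz sine stands in for a voiced frame.
noise_like = [0.05 if n % 2 == 0 else -0.05 for n in range(160)]
tone = [0.5 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(160)]
labels = classify_frames([noise_like, tone, tone, noise_like])
```

In a coder of the kind described, each label would then select how the synthesis filter is modified for that frame; that step is not shown because the abstract gives no details.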

Multiple-Speech Synthesis System according to Various Utterance (다양한 발성에 따른 다중음성 합성 시스템)

Park, Hyun-Young;Kim, Myoung;Bae, Myoung-Jin
- Proceedings of the Korean Society for Emotion and Sensibility Conference / 2003.11a / pp.151-154 / 2003
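Speech synthesis is defined as the automatic generation of speech waveforms using mechanical devices, electrical circuits, or computer simulation. Research on speech synthesis began earlier than research on other speech-related technologies. As PCs become more widespread and the telecommunications market grows, the application areas of speech synthesizers are steadily expanding, and a variety of speech synthesis techniques are being studied. In general, speech produced in natural conversation or when reading text contains prosodic information such as pitch, duration, and energy. Therefore, when sentences are synthesized, reflecting this prosodic information in the synthesized speech enables clearer delivery of meaning and a variety of utterance conversions. In this paper, we implemented a multiple-speech synthesizer for various utterance conversions by using pitch modification and duration modification based on the time-domain PSOLA synthesis method.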
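The pitch and duration modification described here can be illustrated with a heavily simplified time-domain PSOLA sketch. It assumes a constant, known pitch period in samples, so pitch marks are simply spaced one period apart; a real system would need pitch-mark detection on natural speech. The function name `psola` and its parameters are hypothetical and do not come from the paper.

```python
import math


def psola(signal, period, pitch_factor=1.0, duration_factor=1.0):
    """Simplified TD-PSOLA for a signal with a constant pitch period.

    Two-period Hann-windowed grains are cut around each analysis pitch
    mark and overlap-added at a new spacing of period / pitch_factor.
    The output length is len(signal) * duration_factor. Output gain is
    not renormalized when the grain spacing changes.
    """
    marks = list(range(period, len(signal) - period, period))  # analysis marks
    window = [0.5 - 0.5 * math.cos(math.pi * k / period) for k in range(2 * period)]
    out_len = int(len(signal) * duration_factor)
    out = [0.0] * out_len
    new_period = max(1, round(period / pitch_factor))
    t = period  # first synthesis mark; keeps t - period >= 0
    while t + period < out_len:
        # Choose the analysis grain closest to the matching input time.
        src_time = t / duration_factor
        idx = min(range(len(marks)), key=lambda i: abs(marks[i] - src_time))
        centre = marks[idx]
        for k in range(2 * period):
            out[t - period + k] += signal[centre - period + k] * window[k]
        t += new_period
    return out


# Synthetic test tone with a pitch period of 80 samples.
period = 80
signal = [math.sin(2 * math.pi * n / period) for n in range(1600)]
stretched = psola(signal, period, duration_factor=2.0)  # twice as long
raised = psola(signal, period, pitch_factor=1.25)       # higher pitch, same length
identity = psola(signal, period)                        # no modification
```

With both factors left at 1.0 the overlapping Hann windows sum to one, so the interior of the output reproduces the input, which is a convenient sanity check for the grain placement.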

Isolation, Identification, and Activity of Bacteria Resistant to Recalcitrant ABS (난분해성 ABS 내성균의 분리, 동정 및 그 활성)

하현필;홍순덕
- Proceedings of the Korean Society for Applied Microbiology Conference / 1978.10a / pp.209.3-209 / 1978
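ABS-resistant bacteria with a high capacity to degrade the recalcitrant ABS (alkyl benzene sulfonate) contained in synthetic detergents sold in Korea were isolated from the sewers of an apartment complex, and the isolated strain was identified. We investigated the effects of synthetic detergent concentration and pH on the strain, as well as the degradation rate as a function of anionic surfactant structure and concentration. We also determined the maximum growth limit in the presence of metal ions, compared ABS degradation rates under shaking and static culture, and used electron microscopy to observe morphological changes in the isolate cultured in media containing synthetic detergent at various concentrations.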

A Basic Study on Development of Orchestra Blasting Method - About the Application of Rhythm - (연주식 발파공법 개발에 대한 기초적 연구 - 리듬감 부여에 관하여 -)

Yoon, Ji-Sun;Choi, Sung-Hyun;Bae, Sang-Hun
- Explosives and Blasting / v.26 no.1 / pp.39-48 / 2008
Using electronic detonators, which are well known for controlling vibration, we have for many years been studying the Orchestra Blasting Method (OBM), which aims to transform unpleasant blasting sound into a more agreeable sound at job sites such as tunneling and bench blasting that must take place near structures requiring great care. In this study, we focus on the sense of rhythm. First, we acquired the individual waveform from a single shot. Analysis with a program named Program Blasting Wave (PBW) found that the best delay time was 34 ms and that 50 ms was acceptable. The delay times were then fitted to a piece of music that was selected after analyzing its rhythm. As a result, the blasting sound played along with the music felt comfortable, as if the music were accompanied by a bass drum.

Sound Synthesis of Piri by Asymmetric Frequency Modulation (비대칭 FM합성방식을 이용한 피리 소리의 합성)

Pyoun, Joong-Bae;Cho, Sang-Jin;Chong, Ui-Pil
- Proceedings of the Korea Institute of Convergence Signal Processing / 2006.06a / pp.37-40 / 2006
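FM (frequency modulation) sound synthesis has been studied for a long time. It is useful for producing a variety of effects and can synthesize rich harmonics with relatively modest resources, but most of its applications have focused on Western instruments. In this paper, we analyze the piri, a Korean traditional wind instrument, by dividing its spectrum into a low-frequency region and a high-frequency region, use the results to extract FM parameters, and synthesize a first-stage sound with the asymmetric FM synthesis method. We then designed two filters, one for inverse filtering of each of the analyzed low-frequency and high-frequency characteristics, passed the synthesized sound through them, and summed the two outputs, which yielded a sound closer to the original.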
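For readers unfamiliar with asymmetric FM, the sketch below uses one commonly cited formulation, in which an extra parameter r skews the sideband amplitudes around the carrier: s(t) = exp((I/2)(r - 1/r) cos(w_m t)) * sin(w_c t + (I/2)(r + 1/r) sin(w_m t)). The abstract does not state which equation or parameter values the authors used, so the formula choice, the function name `asymmetric_fm`, and the example frequencies are assumptions, not piri parameters from the paper.

```python
import math


def asymmetric_fm(fc, fm, index, r, sample_rate=16000, n_samples=160):
    """Generate n_samples of an asymmetric FM tone.

    fc: carrier frequency in Hz, fm: modulator frequency in Hz,
    index: modulation index I, r: asymmetry parameter (r > 0).
    With r = 1 the envelope term is 1 and the signal reduces to
    ordinary FM, sin(w_c t + I sin(w_m t)).
    """
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        wm_t = 2 * math.pi * fm * t
        envelope = math.exp(0.5 * index * (r - 1 / r) * math.cos(wm_t))
        phase = 2 * math.pi * fc * t + 0.5 * index * (r + 1 / r) * math.sin(wm_t)
        out.append(envelope * math.sin(phase))
    return out


# Illustrative values only.
skewed = asymmetric_fm(fc=440.0, fm=220.0, index=2.0, r=1.5)
symmetric = asymmetric_fm(fc=440.0, fm=220.0, index=2.0, r=1.0)
```

Values of r above 1 emphasize the upper sidebands and values below 1 emphasize the lower ones, which is what makes the method attractive for matching an instrument whose spectrum is not symmetric around the carrier.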

A Study on Multi-Pulse Speech Coding Method by using Selected Information in a Frequency Domain (주파수 영역의 선택정보를 이용한 멀티펄스 음성부호화 방식에 관한 연구)

Lee See-Woo
- Journal of Internet Computing and Services / v.7 no.4 / pp.57-66 / 2006
In this paper, I propose a new multi-pulse speech coding method (FBD-MPC: Frequency Band Division MPC) that uses TSIUVC (Transition Segment Including UnVoiced Consonant) searching, extraction, and approximation-synthesis in the frequency domain. The TSIUVC extraction rates are 84.8% (plosive), 94.9% (fricative), and 92.3% (affricate) for female voices and 88% (plosive), 94.9% (fricative), and 92.3% (affricate) for male voices. I also obtain high-quality approximation-synthesis waveforms within the TSIUVC by using frequency information below 0.547 kHz and above 2.813 kHz. I evaluate MPC, which uses voiced/unvoiced switching information, and FBD-MPC, which uses voiced/silence/TSIUVC switching information. As a result, I found that the synthesized speech of FBD-MPC was of better quality than that of MPC.

