• Title/Summary/Keyword: Phonetic segmentation

Automatic Phonetic Segmentation of Korean Speech Signal Using Phonetic-acoustic Transition Information (음소 음향학적 변화 정보를 이용한 한국어 음성신호의 자동 음소 분할)

  • 박창목;왕지남
    • The Journal of the Acoustical Society of Korea, v.20 no.8, pp.24-30, 2001
  • This article is concerned with the automatic segmentation of Korean speech signals. All transitions between phonetic units are classified into three types, and a different strategy is applied to each type. Type 1 is the discrimination of silence, voiced speech and unvoiced speech. Histogram analysis of indicators consisting of wavelet coefficients and the SVF (Spectral Variation Function) of the wavelet coefficients is used for type 1 segmentation. Type 2 is the discrimination of adjacent vowels. Vowel transitions can be characterized by their spectrograms: given the phonetic transcription and the transition-pattern spectrograms, speech signals containing consecutive vowels are automatically segmented by template matching. Type 3 is the discrimination of vowels and voiced consonants. The smoothed short-time RMS energy of the wavelet low-pass component and the SVF of the cepstral coefficients are adopted for type 3 segmentation. Experiments were performed on a set of 342 word utterances gathered from 6 speakers. The results show the validity of the method.
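
Types 1 and 3 both rely on an SVF to locate candidate boundaries. The sketch below assumes one simple SVF variant, SVF(t) = 1 - cos(angle between the feature vectors at frames t-d and t+d), and picks local peaks above a threshold as boundary candidates. Published SVF definitions differ (some subtract a local mean first); the function names, `d` and `threshold` are hypothetical, and this is not the authors' implementation.

```python
import math

def svf(frames, d=1):
    """Spectral variation per frame: 1 - cosine similarity of frames t-d and t+d.

    `frames` is a list of equal-length feature vectors (e.g. cepstral or
    wavelet-derived coefficients). Edge frames without both neighbours get 0.0.
    """
    n = len(frames)
    out = [0.0] * n
    for t in range(d, n - d):
        a, b = frames[t - d], frames[t + d]
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        if na > 0 and nb > 0:  # leave 0.0 when either vector is all zeros
            out[t] = 1.0 - dot / (na * nb)
    return out

def boundary_candidates(scores, threshold=0.5):
    """Frames where the SVF curve has a local peak above `threshold`."""
    peaks = []
    for t in range(1, len(scores) - 1):
        # ">=" on the left lets the last frame of a flat-topped peak qualify
        if scores[t] > threshold and scores[t] >= scores[t - 1] and scores[t] > scores[t + 1]:
            peaks.append(t)
    return peaks

# Two stationary stretches with an abrupt spectral change at frame 5
frames = [[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 5
print(boundary_candidates(svf(frames)))  # → [5]
```

Using the cosine rather than a Euclidean distance makes the measure insensitive to overall level, so loudness changes alone do not produce peaks.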

Variable Rate CELP Coding with Phonetic Segmentation using LPC Vector Quantization (LPC 벡터 양자화를 이용한 가변률 CELP 음성코딩에 관한 연구)

  • 정영호
    • Proceedings of the Acoustical Society of Korea Conference, 1994.06c, pp.205-209, 1994
  • This paper presents a variable-rate speech coding method with phonetic segmentation, called PSVXC. Multiple-access techniques, which require efficient speech encoding to achieve capacity improvements, are currently emerging in cellular telephone systems. A variable-rate speech coder reduces the average data rate required to transmit conversational speech. Each frame of active speech is classified into one of four phonetic classes, and a distinct coding configuration and bit rate are applied to each class. In addition, split vector quantization is used to accurately quantize the LPC information using LSP parameters.
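
Split vector quantization cuts the LSP vector into sub-vectors and quantizes each against its own smaller codebook. The sketch below shows encoding, decoding and the resulting bits per frame. The split sizes and toy codebooks are hypothetical; the abstract does not give the LSP order, the split or the codebook sizes used in PSVXC.

```python
import math

def split_vq_encode(lsp, split_sizes, codebooks):
    """Quantize an LSP vector sub-vector by sub-vector.

    Each sub-vector is matched to the nearest codeword (squared Euclidean
    distance) in its own codebook. Returns one index per sub-vector.
    """
    if sum(split_sizes) != len(lsp):
        raise ValueError("split sizes must cover the whole vector")
    indices, start = [], 0
    for size, book in zip(split_sizes, codebooks):
        sub = lsp[start:start + size]
        best = min(range(len(book)),
                   key=lambda i: sum((x - c) ** 2 for x, c in zip(sub, book[i])))
        indices.append(best)
        start += size
    return indices

def split_vq_decode(indices, codebooks):
    """Concatenate the selected codewords back into a full LSP vector."""
    out = []
    for idx, book in zip(indices, codebooks):
        out.extend(book[idx])
    return out

def bits_per_frame(codebooks):
    """Bits needed to send one index per codebook."""
    return sum(math.ceil(math.log2(len(book))) for book in codebooks)

# Toy 5-dimensional example split as 2 + 3 (hypothetical codebooks)
books = [
    [[0.10, 0.20], [0.30, 0.45]],
    [[0.30, 0.40, 0.50], [0.60, 0.70, 0.80], [0.20, 0.25, 0.35], [0.55, 0.65, 0.90]],
]
lsp = [0.12, 0.21, 0.58, 0.72, 0.79]
idx = split_vq_encode(lsp, [2, 3], books)
print(idx, split_vq_decode(idx, books), bits_per_frame(books))
```

The usual motivation for splitting is search and storage cost: a single full-vector codebook with the same bit budget would need 2^B entries, which becomes impractical at realistic budgets, whereas the split codebooks grow only with the bits assigned to each part.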

A Study on Consonant/Vowel/Unvoiced Consonant Phonetic Value Segmentation and Recognition of Korean Isolated Word Speech (한국어 고립 단어 음성의 자음/모음/유성자음 음가 분할 및 인식에 관한 연구)

  • Lee, Jun-Hwan;Lee, Sang-Beom
    • The Transactions of the Korea Information Processing Society, v.7 no.6, pp.1964-1972, 2000
  • Because of its peculiar properties, spoken Korean acoustically takes the form of phonetic values (sounds as actually pronounced) rather than phonemes. Therefore, an extended recognition system for understanding Korean should be built on a study of Korean rule-based systems before it can be used for post-processing in a Korean recognition system. In this paper, a text-based Korean rule system incorporating Korean-specific sound-change rules is constructed. Based on the phonetic values that this system produces from text, preliminary phonetic-value segmentation boundary points delimiting non-uniform blocks are extracted from Korean isolated-word speech. By merging and recognizing the non-uniform blocks between the extracted boundary points, the possibility of recognizing Korean speech in the form of phonetic values is investigated.
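
A text-based rule system maps spelling to pronounced phonetic values. The sketch below is a tiny hypothetical converter covering only two well-known Korean sound changes: liaison (a final consonant moves onto a following vowel-initial syllable, 음악 → 으막) and nasalization of final ㄱ/ㄷ/ㅂ before ㄴ/ㅁ (국물 → 궁물). It is not the authors' rule set; palatalization, ㅎ-related rules, consonant clusters and many others are omitted. The syllable arithmetic follows the standard Unicode Hangul composition formula.

```python
BASE, N_MEDIAL, N_FINAL = 0xAC00, 21, 28

# final-consonant index -> initial-consonant index (single-consonant finals only;
# ㅎ and consonant clusters are deliberately left out of this sketch)
FINAL_TO_INITIAL = {1: 0, 2: 1, 4: 2, 7: 3, 8: 5, 16: 6, 17: 7, 19: 9,
                    20: 10, 22: 12, 23: 14, 24: 15, 25: 16, 26: 17}
# nasalization of obstruent finals before ㄴ/ㅁ: ㄱ->ㅇ, ㄷ->ㄴ, ㅂ->ㅁ
NASALIZE = {1: 21, 7: 4, 17: 16}
INITIAL_NIEUN, INITIAL_MIEUM, INITIAL_IEUNG = 2, 6, 11

def decompose(ch):
    """Return [initial, medial, final] indices, or None for non-Hangul characters."""
    code = ord(ch) - BASE
    if not 0 <= code < 19 * N_MEDIAL * N_FINAL:
        return None
    return [code // (N_MEDIAL * N_FINAL), (code // N_FINAL) % N_MEDIAL, code % N_FINAL]

def compose(i, m, f):
    return chr(BASE + (i * N_MEDIAL + m) * N_FINAL + f)

def to_phonetic(text):
    """Apply liaison and nasalization left to right over adjacent syllables."""
    syl = [decompose(ch) for ch in text]
    for k in range(len(syl) - 1):
        cur, nxt = syl[k], syl[k + 1]
        if cur is None or nxt is None or cur[2] == 0:
            continue
        if nxt[0] == INITIAL_IEUNG and cur[2] in FINAL_TO_INITIAL:
            # liaison: the final consonant becomes the next syllable's initial
            nxt[0] = FINAL_TO_INITIAL[cur[2]]
            cur[2] = 0
        elif nxt[0] in (INITIAL_NIEUN, INITIAL_MIEUM) and cur[2] in NASALIZE:
            cur[2] = NASALIZE[cur[2]]
    return "".join(compose(*s) if s else ch for s, ch in zip(syl, text))

for word in ["음악", "국물", "합니다", "한국어"]:
    print(word, "->", to_phonetic(word))
```

The output strings are the kind of phonetic-value transcription from which expected segment sequences, and hence preliminary boundary points, could be derived.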

Segmentation of the Korean speech signals into phonetic units using the super resolution pitch determination (고해상 피치검출을 이용한 한국어 음성신호의 음소분리)

  • 이응구;이두수
    • The Journal of Korean Institute of Communications and Information Sciences, v.18 no.2, pp.270-278, 1993
  • This paper presents a phonetic segmentation algorithm for Korean speech signals that finds the exact pitch using super-resolution pitch determination and compares the cross-correlation of each pitch period against a threshold. The proposed algorithm features infinite resolution and high reliability, and it can also separate transient or silent segments. The algorithm is useful for speech processing applications that require vector quantization and speech recognition. It was implemented in 386-MATLAB on a 386/DX PC and verified to find the exact pitch period and to segment speech signals phonetically.
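
The core test compares adjacent pitch periods by normalized cross-correlation. The sketch below picks the integer lag with the highest correlation at each analysis position and labels the position periodic or transient/silent by thresholding that correlation. It is a simplification under stated assumptions: super-resolution pitch determination additionally refines the lag to fractional precision, which is omitted here, and the names, hop size and 0.8 threshold are hypothetical.

```python
import math

def ncc(x, start, period):
    """Normalized cross-correlation between two adjacent candidate pitch periods."""
    a = x[start:start + period]
    b = x[start + period:start + 2 * period]
    ea = sum(v * v for v in a)
    eb = sum(v * v for v in b)
    if ea < 1e-12 or eb < 1e-12:
        return 0.0  # silent stretch: treat as non-periodic
    return sum(p * q for p, q in zip(a, b)) / math.sqrt(ea * eb)

def best_period(x, start, t_min, t_max):
    """Integer lag in [t_min, t_max] maximizing the correlation at `start`.
    Raises ValueError if no lag fits inside the signal."""
    lags = [t for t in range(t_min, t_max + 1) if start + 2 * t <= len(x)]
    best = max(lags, key=lambda t: ncc(x, start, t))
    return best, ncc(x, start, best)

def periodic_frames(x, hop, t_min, t_max, threshold=0.8):
    """Label each analysis position periodic (True) or transient/silent (False)."""
    labels = []
    for start in range(0, len(x) - 2 * t_max + 1, hop):
        _, score = best_period(x, start, t_min, t_max)
        labels.append(score >= threshold)
    return labels

# 500 samples of a 50-sample-period tone followed by 500 samples of silence
x = [math.sin(2 * math.pi * n / 50) for n in range(500)] + [0.0] * 500
print(best_period(x, 0, 20, 80)[0])
print(periodic_frames(x, 100, 20, 80))
```

Runs of `False` mark where period-to-period similarity breaks down, which is where the abstract's transient or silent segments would be separated.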

Segmentation and Labeling in Creation of Speech Corpus (음성 코퍼스 구축에서 분절과 레이블링의 문제)

  • Um Yongnam;Lee Yong-Ju
    • Proceedings of the KSPS conference, 2002.11a, pp.27-32, 2002
  • This paper discusses what should be taken into consideration with respect to segmentation and labeling when creating a speech corpus: which levels of annotation and what kinds of content should be included, what acoustic information should be checked during segmentation, and so on.

Improvement of an Automatic Segmentation for TTS Using Voiced/Unvoiced/Silence Information (유/무성/묵음 정보를 이용한 TTS용 자동음소분할기 성능향상)

  • Kim Min-Je;Lee Jung-Chul;Kim Jong-Jin
    • MALSORI, no.58, pp.67-81, 2006
  • For large corpora of time-aligned data, HMM-based approaches are the most widely used for automatic segmentation, providing a consistent and accurate phone labeling scheme. There are two methods for training HMMs. The flat-start method minimizes human intervention but has low accuracy. The bootstrap method is highly accurate but requires manual segmentation. In this paper, a new algorithm is proposed to minimize manual work and improve the performance of automatic segmentation. In the first phase, voiced/unvoiced/silence classification is performed for each frame of speech data. In the second phase, the phoneme sequence is aligned dynamically to the voiced/unvoiced/silence sequence according to acoustic-phonetic rules. Finally, using these segmented speech data as a bootstrap, HMM-based phoneme model parameters are trained. For performance testing, the hand-labeled ETRI speech DB was used. The experimental results showed that our algorithm achieved a 10% improvement in segmentation accuracy within a 20 ms tolerance. For unvoiced consonants in particular, it showed a 30% improvement.
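
The two phases can be sketched as a per-frame V/U/S classifier followed by a dynamic-programming alignment of the phoneme sequence to the frame labels. The assumptions here are loud: the classifier uses RMS energy and zero-crossing rate with placeholder thresholds, each phoneme is given one expected class ("V", "U" or "S"), and the alignment minimizes the number of frames whose label disagrees with that class. The abstract does not state the paper's features or acoustic-phonetic rules, so `classify_frame` and `align` are hypothetical.

```python
import math

def classify_frame(frame, energy_thr=0.01, zcr_thr=0.3):
    """Label one frame 'S' (silence), 'U' (unvoiced) or 'V' (voiced).
    Thresholds are placeholder values, not taken from the paper."""
    rms = math.sqrt(sum(v * v for v in frame) / len(frame))
    if rms < energy_thr:
        return "S"
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    zcr = crossings / (len(frame) - 1)
    return "U" if zcr > zcr_thr else "V"

def align(phone_classes, frame_labels):
    """Split the frames into len(phone_classes) contiguous runs, one per phoneme
    in order, minimizing frames whose V/U/S label disagrees with the phoneme's
    expected class. Returns the start frame of each phoneme."""
    P, N = len(phone_classes), len(frame_labels)
    if N < P:
        raise ValueError("need at least one frame per phoneme")
    INF = float("inf")
    cost = [[INF] * (N + 1) for _ in range(P + 1)]
    starts_here = [[False] * (N + 1) for _ in range(P + 1)]
    cost[0][0] = 0
    for i in range(1, P + 1):
        for j in range(1, N + 1):
            miss = 0 if frame_labels[j - 1] == phone_classes[i - 1] else 1
            begin, stay = cost[i - 1][j - 1], cost[i][j - 1]
            # frame j-1 either starts phoneme i or extends it
            if begin <= stay:
                cost[i][j], starts_here[i][j] = begin + miss, True
            else:
                cost[i][j] = stay + miss
    starts, i, j = [0] * P, P, N
    while i > 0:  # backtrack from the last frame
        if starts_here[i][j]:
            starts[i - 1] = j - 1
            i -= 1
        j -= 1
    return starts

# Hypothetical word "pause-unvoiced-voiced-pause" with one misclassified frame (index 7)
print(align(["S", "U", "V", "S"], list("SSUUUVVUVSS")))
```

Because every phoneme must keep its order and cover at least one frame, an isolated classification error is absorbed rather than creating a spurious boundary; the resulting start frames are the kind of rough segmentation that could seed bootstrap HMM training.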

Sentence design for speech recognition database

  • Zu Yiqing
    • Proceedings of the KSPS conference, 1996.10a, pp.472-472, 1996
  • The material of a speech recognition database should include as many phonetic phenomena as possible. At the same time, such material should be phonetically compact, with low redundancy [1, 2]. The phonetic phenomena of continuous speech are the key problem in speech recognition. This paper describes the processing of a set of sentences collected from the 1993 and 1994 database of the "People's Daily" (a Chinese newspaper), covering news, politics, economics, arts, sports, etc. These sentences include both phonetic phenomena and sentence patterns. In continuous speech, phonemes always appear as allophones shaped by coarticulation effects, so the design of a speech database must consider both intra-syllabic and inter-syllabic allophone structures. In our experiments, read speech contains 404 syllables, 415 inter-syllabic diphones, 3050 merged inter-syllabic triphones and 2161 merged final-initial structures. Statistics on the "People's Daily" database give an evaluation of all possible phonetic structures. In this sentence set, we first consider the phonetic balance among syllables, inter-syllabic diphones, inter-syllabic triphones and semi-syllables with their junctures. Syllabic balance ensures coverage of intra-syllabic phenomena such as phonemes, initials/finals and consonants/vowels; the remaining units describe the inter-syllabic junctures. The 1560 sentences cover 96% of syllables without tones (the absent syllables are used only in spoken language), 100% of inter-syllabic diphones and 67% of inter-syllabic triphones (87% of those that appear in the People's Daily). There are roughly 17 kinds of sentence patterns in our sentence set. By taking the transitions between syllables into account, Chinese speech recognition systems have achieved significantly high recognition rates [3, 4]. The following figure shows the process of collecting sentences.
[People's Daily database] -> [sentence segmentation] -> [word-group segmentation] -> [transliteration of the text into Pinyin] -> [statistics of phonetic phenomena & selection of useful paragraphs] -> [manual modification of the selected sentences] -> [phonetically compact sentence set]
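
The selection step trades coverage against compactness. A common automated stand-in, assumed here rather than taken from the paper (which also relies on statistics and manual editing), is greedy set cover: repeatedly pick the sentence that adds the most not-yet-covered units. For brevity the units are only toneless syllables and inter-syllabic diphones; triphones and final-initial structures are ignored, and `units` and `greedy_select` are hypothetical names.

```python
def units(syllables):
    """Syllables plus inter-syllabic diphones (adjacent syllable pairs)."""
    return set(syllables) | set(zip(syllables, syllables[1:]))

def greedy_select(sentences, max_sentences):
    """Repeatedly pick the sentence adding the most not-yet-covered units.
    Returns the chosen sentence indices and the set of covered units."""
    covered, chosen = set(), []
    remaining = list(range(len(sentences)))
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda k: len(units(sentences[k]) - covered))
        if not units(sentences[best]) - covered:
            break  # no remaining sentence contributes anything new
        chosen.append(best)
        covered |= units(sentences[best])
        remaining.remove(best)
    return chosen, covered

# Sentences already converted to Pinyin syllables (toy data)
sentences = [["ni", "hao"], ["ni", "hao", "ma"], ["wo", "hao"]]
chosen, covered = greedy_select(sentences, max_sentences=10)
print(chosen, len(covered))
```

The first sentence is never chosen because everything it contains is already covered by the second, which is exactly the redundancy a phonetically compact set is meant to avoid.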

Performance Improvement of Automatic Speech Segmentation and Labeling System (자동 음성분할 및 레이블링 시스템의 성능향상)

  • Hong Seong Tae;Kim Je-U;Kim Hyeong-Sun
    • MALSORI, no.35_36, pp.175-188, 1998
  • A database segmented and labeled down to the phoneme level plays an important role in phonetic research and speech engineering. However, building one usually requires manual segmentation and labeling, which is time-consuming and may also yield inconsistent results. Automatic segmentation and labeling can be introduced to solve these problems. In this paper, we investigate a method to improve the performance of an automatic segmentation and labeling system, considering the Spectral Variation Function (SVF), modification of the silence model, and the use of energy variations in a postprocessing stage. SVF is applied in three ways: (1) as an addition to the feature parameters, (2) for postprocessing of phoneme boundaries, and (3) to restrict the Viterbi path so that the resulting phoneme boundaries are located in frames around SVF peaks. In the postprocessing stage, the positions of greatest energy variation during the transitional period between silence and other phonemes were used to modify boundaries. To evaluate the system, we used a database of 452 phonetically balanced words (PBW) for training phoneme models and a phonetically balanced sentence (PBS) database for testing. In our experiments, 83.1% (a 6.2% improvement) and 95.8% (a 0.9% improvement) of phoneme boundaries were within 20 ms and 40 ms of the manually segmented boundaries, respectively.
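
Figures such as "83.1% within 20 ms" are tolerance-based boundary accuracies. A minimal sketch of the computation follows, assuming the automatic and manual boundaries are paired one-to-one (as in forced alignment, where the phoneme sequence is fixed) and that a deviation exactly equal to the tolerance counts as a hit; `boundary_accuracy` is a hypothetical name, and the boundary times below are toy values, not data from the paper.

```python
def boundary_accuracy(reference_ms, hypothesis_ms, tolerance_ms):
    """Percentage of automatic boundaries within `tolerance_ms` of the
    corresponding manual boundary (boundaries paired one-to-one)."""
    if len(reference_ms) != len(hypothesis_ms):
        raise ValueError("boundary lists must be paired one-to-one")
    hits = sum(1 for r, h in zip(reference_ms, hypothesis_ms)
               if abs(r - h) <= tolerance_ms)
    return 100.0 * hits / len(reference_ms)

ref = [100, 250, 400]  # manual boundaries (ms), toy values
hyp = [110, 280, 460]  # automatic boundaries (ms): off by 10, 30 and 60 ms
print(boundary_accuracy(ref, hyp, 20), boundary_accuracy(ref, hyp, 40))
```

Reporting two tolerances, as the abstract does, separates small timing offsets from gross misplacements.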

Phonetic Acoustic Knowledge and Divide And Conquer Based Segmentation Algorithm (음성학적 지식과 DAC 기반 분할 알고리즘)

  • Koo, Chan-Mo;Wang, Gi-Nam
    • The KIPS Transactions:PartB, v.9B no.2, pp.215-222, 2002
  • This paper presents a reliable, fully automatic labeling system that suits languages with well-developed syllables, such as Korean. The ASL system uses a segmentation algorithm based on DAC (Divide and Conquer), a control mechanism, to exploit phonetic and acoustic information more efficiently. The segmentation algorithm divides speech signals into speechlets, which are localized pieces of the speech signal, and then segments each speechlet to locate speech boundaries. While the HMM method offers uniform, well-defined performance, the suggested method provides a framework in which specific acoustic knowledge can be steadily developed and improved as a component. Instead of a statistical method such as HMM, the new method uses only phonetic-acoustic information. It is therefore fast, can be consistently extended with specific acoustic-knowledge components, and can be applied efficiently. Experimental results verifying the suggested method are presented at the end.
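
The divide-and-conquer control idea can be illustrated with a recursive splitter: find the strongest change point in a stretch of frames, split there if the evidence is strong enough, and handle each half independently. This is a hypothetical illustration, not the authors' algorithm: the abstract does not say how speechlets are formed or which knowledge components score boundaries, so the per-frame `scores` (e.g. SVF values), `min_len` and `threshold` are all assumptions.

```python
def dac_segment(scores, lo, hi, min_len=3, threshold=0.5):
    """Recursively split frames [lo, hi) at the strongest change point.

    `scores[t]` is any per-frame boundary-evidence score. A stretch is split
    only if its best candidate exceeds `threshold` and both halves keep at
    least `min_len` frames. Returns the boundary frames in ascending order.
    """
    candidates = range(lo + min_len, hi - min_len + 1)
    if not candidates:
        return []
    t = max(candidates, key=lambda k: scores[k])
    if scores[t] <= threshold:
        return []
    left = dac_segment(scores, lo, t, min_len, threshold)
    right = dac_segment(scores, t, hi, min_len, threshold)
    return left + [t] + right

# 40 frames with two clear change points (toy scores)
scores = [0.0] * 40
scores[10], scores[25] = 0.9, 0.8
print(dac_segment(scores, 0, len(scores)))
```

Each recursive call sees only its own local piece of the signal, which is what allows different knowledge components to be plugged in per piece without retraining a global model.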

The role of prosodic phrasing in Korean word segmentation (음운 구조가 한국어 단어 분절에 미치는 영향)

  • Kim, Sa-Hyang
    • Proceedings of the KSPS conference, 2007.05a, pp.114-118, 2007
  • The current study investigates the degree to which various prosodic cues at the boundaries of a Korean prosodic phrase (the Accentual Phrase, AP) contribute to word segmentation. Since most phonological words in Korean are produced as one AP, it was hypothesized that detecting acoustic cues at AP boundaries would facilitate word segmentation. The prosodic characteristics of Korean APs include initial strengthening at the beginning of the phrase and a pitch rise and final lengthening at the end. A perception experiment revealed that cues conforming to these prosodic characteristics facilitated listeners' word segmentation. The results also showed that duration and amplitude cues were more helpful for segmentation than pitch cues, and that a pitch cue that did not conform to the Korean AP pattern interfered with segmentation.
