• Title/Abstract/Keywords: Phonetic segmentation

24 results; processing time: 0.021 s

Automatic Phonetic Segmentation of Korean Speech Signal Using Phonetic-acoustic Transition Information

  • 박창목;왕지남
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 20, No. 8
    • /
    • pp.24-30
    • /
    • 2001
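  • This paper addresses automatic phonetic segmentation of speech signals when the phonetic transcription is given. Phoneme boundaries are classified into three types according to their phonetic-acoustic transition characteristics, and a segmentation algorithm suited to each type is developed. Type 1 is the boundary among silence, voiced, and unvoiced sounds: an initial segmentation is made with a threshold obtained by histogram analysis and is then refined using the SVF (Spectral Variation Function) of wavelet coefficients. Type 2 is the boundary between consecutive vowels: the characteristics of each vowel transition are built into templates that are used for segmentation. Type 3 is the boundary between a vowel and a voiced consonant or a consonant that has become voiced: candidate intervals are determined from amplitude changes in characteristic frequency bands, and the final segmentation is performed using the SVF of cepstral coefficients. To test segmentation performance, 342 words from the Korean PBWSpeech DB were segmented automatically and the results were compared with manual segmentation. Overall, automatic segmentation achieved 81.5% accuracy within 20 msec.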

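The abstract above relies on a Spectral Variation Function but does not define it. The following is a minimal sketch of one common formulation, offered as an illustration and not as the authors' implementation: for each frame, the mean feature vector of the frames just before it is compared with the mean of the frames just after it, and the value is (1 - cosine) / 2. The function name `spectral_variation`, the `half_window` parameter, and the choice to remove the utterance-wide mean are assumptions made here; the feature vectors could be cepstral or wavelet coefficients.

```python
import math


def spectral_variation(frames, half_window=2):
    """Return one spectral-variation value per frame, in the range [0, 1].

    frames: list of equal-length feature vectors (for example cepstra).
    A value near 1 means the features just before and just after the frame
    point in opposite directions, which suggests a boundary candidate.
    """
    n = len(frames)
    dim = len(frames[0])
    # Assumption: subtract the utterance-wide mean so that the cosine
    # measures change rather than a constant offset.
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    centered = [[f[d] - mean[d] for d in range(dim)] for f in frames]

    def average(block):
        return [sum(v[d] for v in block) / len(block) for d in range(dim)]

    values = [0.0] * n
    for t in range(half_window, n - half_window + 1):
        left = average(centered[t - half_window:t])
        right = average(centered[t:t + half_window])
        norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
        if norm == 0.0:
            continue  # flat context on one side: leave the value at 0.0
        cosine = sum(x * y for x, y in zip(left, right)) / norm
        cosine = max(-1.0, min(1.0, cosine))  # guard against rounding
        values[t] = (1.0 - cosine) / 2.0
    return values
```

Peaks of the returned sequence can then be treated as boundary candidates, for instance by keeping local maxima above a threshold chosen on held-out data.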

Variable Rate CELP Coding with Phonetic Segmentation using LPC Vector Quantization

  • 정영호
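    • The Acoustical Society of Korea: Conference Proceedings
    • /
    • The Acoustical Society of Korea, Proceedings of the 11th Workshop on Speech Communication and Signal Processing, 1994 (SCAS Vol. 11, No. 1)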
    • /
    • pp.205-209
    • /
    • 1994
  • This paper presents a variable-rate speech coding method with phonetic segmentation, called PSVXC. Multiple access techniques that require efficient speech encoding to improve capacity are currently emerging in cellular telephone systems. A variable-rate speech coder reduces the average data rate required to transmit conversational speech. Each frame of active speech is classified into one of four phonetic classes, and a distinct coding configuration and bit rate is applied to each class. In addition, split vector quantization is used to accurately quantize the LPC information using LSP parameters.


A Study on Consonant/Vowel/Unvoiced Consonant Phonetic Value Segmentation and Recognition of Korean Isolated Word Speech

  • 이준환;이상범
    • The Transactions of the Korea Information Processing Society
    • /
    • Vol. 7, No. 6
    • /
    • pp.1964-1972
    • /
    • 2000
  • Acoustically, Korean speech is realized not as phonemes but as phonetic values that differ from them, owing to properties peculiar to the language. Building an extended recognition system for understanding Korean therefore requires a study of a Korean rule-based system before it can be used for post-processing in a Korean recognition system. In this paper, a text-based Korean rule-based system featuring the sound-change rules peculiar to Korean is constructed. Based on the text-based phonetic values produced by this system, preliminary phonetic-value segmentation border points with non-uniform blocks are extracted from Korean isolated-word speech. By merging and recognizing the non-uniform blocks between the extracted border points, the possibility of recognizing Korean speech in the form of phonetic values is investigated.


Segmentation of the Korean speech signals into phonetic units using the super resolution pitch determination

  • 이응구;이두수
    • The Journal of the Korean Institute of Communication Sciences
    • /
    • Vol. 18, No. 2
    • /
    • pp.270-278
    • /
    • 1993
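  • This paper proposes an algorithm that finds accurate pitch using super resolution pitch determination and segments Korean speech signals into phonetic units by comparing the correlation function at each pitch period with a threshold. The proposed algorithm is accurate and highly reliable, and it can also identify transition intervals and silent intervals. It is applicable to speech recognition and to vector quantization in which codebooks are designed from speech separated into phonetic units. Run with 386/MATLAB on a PC386/DX, the proposed algorithm found pitch periods accurately and was able to separate the speech phoneme by phoneme.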

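The idea of comparing a per-period correlation value with a threshold can be sketched as follows. This is a simplified, integer-lag illustration and not the paper's super resolution method: it correlates a segment with the segment that immediately follows it, picks the lag with the highest normalized correlation, and reports no pitch when that correlation falls below a threshold. The function names, the lag range, and the 0.8 threshold are assumptions made for this example.

```python
import math


def normalized_cross_correlation(signal, start, lag):
    """Correlate signal[start:start+lag] with the next `lag` samples."""
    first = signal[start:start + lag]
    second = signal[start + lag:start + 2 * lag]
    dot = sum(x * y for x, y in zip(first, second))
    norm = math.sqrt(sum(x * x for x in first) * sum(y * y for y in second))
    return dot / norm if norm > 0.0 else 0.0


def estimate_period(signal, start, min_lag, max_lag, threshold=0.8):
    """Return (period_in_samples, correlation).

    The period is None when the best correlation stays below `threshold`,
    which this sketch treats as "not periodic" (silence or a transition).
    """
    best_lag, best_rho = None, -1.0
    for lag in range(min_lag, max_lag + 1):
        if start + 2 * lag > len(signal):
            break  # not enough samples left for two adjacent segments
        rho = normalized_cross_correlation(signal, start, lag)
        if rho > best_rho:
            best_lag, best_rho = lag, rho
    if best_rho < threshold:
        return None, best_rho
    return best_lag, best_rho
```

Sliding `start` through the signal gives a track of correlation values; points where the correlation drops below the threshold, or where the estimated period changes abruptly, are natural candidates for segment boundaries.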

Segmentation and Labeling in Creation of Speech Corpus

  • 엄용남;이용주
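    • The Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings
    • /
    • The Korean Society of Phonetic Sciences and Speech Technology, Proceedings of the November 2002 Conference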
    • /
    • pp.27-32
    • /
    • 2002
  • This paper discusses what should be taken into consideration with respect to segmentation and labeling when creating a speech corpus: what levels of annotation and what kinds of content should be included, what kinds of acoustic information are checked during segmentation, and so on.


Improvement of an Automatic Segmentation for TTS Using Voiced/Unvoiced/Silence Information

  • 김민제;이정철;김종진
    • Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori
    • /
    • No. 58
    • /
    • pp.67-81
    • /
    • 2006
  • For a large corpus of time-aligned data, HMM-based approaches are most widely used for automatic segmentation, providing a consistent and accurate phone labeling scheme. There are two methods for training HMMs. The flat-start method minimizes human intervention but has low accuracy. The bootstrap method has high accuracy but requires manual segmentation. In this paper, a new algorithm is proposed to minimize manual work and to improve the performance of automatic segmentation. In the first phase, each speech frame is classified as voiced, unvoiced, or silence. In the second phase, the phoneme sequence is dynamically aligned to the voiced/unvoiced/silence sequence according to acoustic-phonetic rules. Finally, using the resulting segmented speech data as a bootstrap, HMM-based phoneme model parameters are trained. For the performance test, the hand-labeled ETRI speech DB was used. The experimental results showed that the algorithm achieved a 10% improvement in segmentation accuracy within a 20 ms error tolerance. For unvoiced consonants in particular, it showed a 30% improvement.

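The first phase described in the abstract above, frame-level voiced/unvoiced/silence classification, can be illustrated with a classic energy and zero-crossing-rate rule. The abstract does not say which features or thresholds the authors used, so the rule, the threshold values, and the names `classify_frame` and `label_runs` are all assumptions made for this sketch.

```python
def classify_frame(samples, energy_silence=1e-4, zcr_unvoiced=0.3):
    """Label one frame as "S" (silence), "U" (unvoiced) or "V" (voiced).

    Assumed rule: very low energy means silence; otherwise a high
    zero-crossing rate means unvoiced and a low one means voiced.
    Expects at least two samples per frame.
    """
    energy = sum(s * s for s in samples) / len(samples)
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = crossings / (len(samples) - 1)
    if energy < energy_silence:
        return "S"
    if zcr > zcr_unvoiced:
        return "U"
    return "V"


def label_runs(labels):
    """Collapse per-frame labels into (label, start_frame, end_frame) runs.

    end_frame is exclusive, so consecutive runs share a boundary index.
    """
    runs = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((labels[start], start, i))
            start = i
    return runs
```

The resulting run sequence is the kind of coarse V/U/S segmentation to which a phoneme sequence could then be aligned, for example by dynamic programming under acoustic-phonetic constraints, before any HMM training.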

Sentence design for speech recognition database

  • Zu Yiqing
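    • The Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings
    • /
    • The Korean Society of Phonetic Sciences and Speech Technology, Proceedings of the October 1996 Conference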
    • /
    • pp.472-472
    • /
    • 1996
  • The material of a database for speech recognition should include as many phonetic phenomena as possible. At the same time, such material should be phonetically compact with low redundancy[1, 2]. The phonetic phenomena in continuous speech are the key problem in speech recognition. This paper describes the processing of a set of sentences collected from the 1993 and 1994 database of "People's Daily" (a Chinese newspaper), which consists of news, politics, economics, arts, sports, etc. These sentences include both phonetic phenomena and sentence patterns. In continuous speech, phonemes always appear in the form of allophones, which result in co-articulatory effects. The task of designing a speech database should be concerned with both intra-syllabic and inter-syllabic allophone structures. In our experiments, there are 404 syllables, 415 inter-syllabic diphones, 3050 merged inter-syllabic triphones and 2161 merged final-initial structures in read speech. Statistics on the database from "People's Daily" give an evaluation of all the possible phonetic structures. In this sentence set, we first consider the phonetic balance among syllables, inter-syllabic diphones, inter-syllabic triphones and semi-syllables with their junctures. The syllabic balance ensures coverage of intra-syllabic phenomena such as phonemes, initial/final and consonant/vowel; the rest describes the inter-syllabic juncture. The 1560 sentences contain 96% of the syllables without tones (the absent syllables are used only in spoken language), 100% of the inter-syllabic diphones, and 67% of the inter-syllabic triphones (87% of which appear in People's Daily). There are roughly 17 kinds of sentence patterns in our sentence set. By taking the transitions between syllables into account, Chinese speech recognition systems have achieved significantly high recognition rates[3, 4]. The following figure shows the process of collecting sentences.
[People's Daily database] -> [segmentation of sentences] -> [segmentation of word groups] -> [translate the text into Pinyin] -> [gather statistics on phonetic phenomena & select useful paragraphs] -> [modify the selected sentences by hand] -> [phonetically compact sentence set]


Performance Improvement of Automatic Speech Segmentation and Labeling System

  • 홍성태;김제우;김형순
    • Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori
    • /
    • No. 35-36
    • /
    • pp.175-188
    • /
    • 1998
  • A database segmented and labeled down to the phoneme level plays an important role in phonetic research and speech engineering. However, it usually requires manual segmentation and labeling, which is time-consuming and may also lead to inconsistent results. Automatic segmentation and labeling can be introduced to solve these problems. In this paper, we investigate a method to improve the performance of an automatic segmentation and labeling system, considering the Spectral Variation Function (SVF), modification of the silence model, and the use of energy variations in the postprocessing stage. SVF is applied in three ways: (1) addition to the feature parameters, (2) postprocessing of phoneme boundaries, and (3) restricting the Viterbi path so that the resulting phoneme boundaries are located in frames around SVF peaks. In the postprocessing stage, positions with the greatest energy variation during the transitional period between silence and other phonemes were used to modify boundaries. To evaluate the performance of the system, we used the 452 phonetically balanced word (PBW) database for training phoneme models and the phonetically balanced sentence (PBS) database for testing. According to our experiments, 83.1% (6.2% improved) and 95.8% (0.9% improved) of phoneme boundaries were within 20 ms and 40 ms of the manually segmented boundaries, respectively.

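Several abstracts in this list report the share of automatic boundaries that fall within 20 ms or 40 ms of manual ones. A minimal sketch of that measure follows. It assumes the automatic and manual boundary lists have the same length and order, which holds when both follow the same known transcription; the function name `boundary_accuracy` and the sample times are made up for illustration.

```python
def boundary_accuracy(auto_ms, manual_ms, tolerance_ms=20.0):
    """Fraction of boundaries placed within `tolerance_ms` of the manual ones.

    auto_ms, manual_ms: boundary times in milliseconds, paired by position.
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must be paired one-to-one")
    if not manual_ms:
        return 0.0
    hits = sum(
        1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tolerance_ms
    )
    return hits / len(manual_ms)


# Hypothetical boundary times (ms); the deviations are 5, 15, 30 and 10 ms.
auto = [100.0, 215.0, 330.0, 480.0]
manual = [105.0, 200.0, 300.0, 470.0]
within_20 = boundary_accuracy(auto, manual, tolerance_ms=20.0)
within_40 = boundary_accuracy(auto, manual, tolerance_ms=40.0)
```

With these made-up times, three of four boundaries fall within 20 ms and all four within 40 ms, which mirrors how a pair of figures at two tolerances is usually reported.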

Phonetic Acoustic Knowledge and Divide And Conquer Based Segmentation Algorithm

  • 구찬모;왕지남
    • The KIPS Transactions: Part B
    • /
    • Vol. 9B, No. 2
    • /
    • pp.215-222
    • /
    • 2002
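  • This paper proposes a reliable, fully automated labeling system for Korean, a language with well-developed syllables. It is a DAC-based segmentation algorithm: to make maximal use of phonological and acoustic information and to reduce segmentation errors, the DAC concept is used as a control mechanism to divide speech into speechlets, and labeling is then attempted on the divided speech segments. Whereas the HMM method has uniform and fixed performance, the main significance of the proposed method is that it offers a framework in which specialized phonetic knowledge can be developed as components, added, and continually improved. This new method, which uses only phonological and acoustic knowledge rather than statistical methods such as HMM, is expected to be consistent and effective in terms of execution speed and as the specialized phonetic knowledge components are extended. Experimental results are presented to verify the proposed method.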

The role of prosodic phrasing in Korean word segmentation

  • 김사향
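    • The Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings
    • /
    • The Korean Society of Phonetic Sciences and Speech Technology, Proceedings of the 2007 Joint Conference with the Korean Association of Speech Sciences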
    • /
    • pp.114-118
    • /
    • 2007
  • The current study investigates the degree to which various prosodic cues at the boundaries of a prosodic phrase in Korean (the Accentual Phrase, AP) contribute to word segmentation. Since most phonological words in Korean are produced as one AP, it was hypothesized that the detection of acoustic cues at AP boundaries would facilitate word segmentation. The prosodic characteristics of Korean APs include initial strengthening at the beginning of the phrase, and pitch rise and final lengthening at the end. A perception experiment revealed that cues conforming to these prosodic characteristics facilitated listeners' word segmentation. Results also showed that duration and amplitude cues were more helpful in segmentation than pitch. Further, a pitch cue that did not conform to the Korean AP interfered with segmentation.
