• Title/Summary/Keyword: Phonetic Approach

Search Result 78, Processing Time 0.027 seconds

An Introduction to English Intonational Phonology (영어 억양음운론의 소개)

  • Kim, Kee-Ho
    • Speech Sciences
    • /
    • v.6
    • /
    • pp.119-143
    • /
    • 1999
  • In this paper, the development of English Intonational Phonology is introduced. The existing representation systems of intonation are largely divided into the American structuralist school and the British school, which describe intonation by means of 'levels' and 'configurations' respectively. Both representation systems have some theory-internal problems, however. As for the American school, there is no way to represent pitches much lower than the reference line, while the system of intonation in the British school is limited in that intonation is described in a phonetic impressionistic way rather than from a phonological perspective. Intonational Phonology, a real phonological approach, which has grown out of the basic assumptions of autosegmental-metrical(AM) theory has been suggested by Pierrehumbert(1980). In her approach, an intonational tune is made up of one or more pitch accents, followed by an obligatory phrase accent and an obligatory boundary tone, and interestingly 22 combinations are possible. Intonational Phonology has been revised from Beckman & Pierrehumbert(1986) in developing ToBI(Tones & Break Indices), a proposed standard for labelling prosodic features of digital speech databases in English.

  • PDF

Semantic-oriented Error Correction for Spoken Query Processing (음성 질의 처리를 위한 의미 기반 오류 수정)

  • Jeong Minwoo;Kim Byeongchang;Lee Gary Geunbae
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.153-156
    • /
    • 2003
  • Voice input is often required in many new application environments such as telephone-based information retrieval, car navigation systems, and user-friendly interfaces, but the low success rate of speech recognition makes it difficult to extend its application to new fields. Popular approaches to increase the accuracy of the recognition rate have been researched by post-processing of the recognition results, but previous approaches were mainly lexical-oriented ones in post error correction. We suggest a new semantic-oriented approach to correct both semantic level and lexical errors, which is also more accurate for especially domain-specific speech error correction. Through extensive experiments using a speech-driven in-vehicle telematics information application, we demonstrate the superior performance of our approach and some advantages over previous lexical-oriented approaches.

  • PDF

Efficient Triphone Clustering Using Monophone Distance (모노폰 거리를 이용한 트라이폰 클러스터링 방법 연구)

  • Bang Kyu-Seop;Yook Dong-Suk
    • Proceedings of the KSPS conference
    • /
    • 2006.05a
    • /
    • pp.41-44
    • /
    • 2006
  • The purpose of state tying is to reduce the number of models and to use relatively reliable output probability distributions. There are two approaches: one is top down clustering and the other is bottom up clustering. For seen data, the performance of bottom up approach is better than that of top down approach. In this paper, we propose a new clustering technique that can enhance the undertrained triphone clustering performance. The basic idea is to tie unreliable triphones before clustering. An unreliable triphone is the one that appears in the training data too infrequently to train the model accurately. We propose to use monophone distance to preprocess these unreliable triphones. It has been shown in a pilot experiment that the proposed method reduces the error rate significantly.

  • PDF

A Study on the Continuous Speech Recognition for the Automatic Creation of International Phonetics (국제 음소의 자동 생성을 활용한 연속음성인식에 관한 연구)

  • Kim, Suk-Dong;Hong, Seong-Soo;Shin, Chwa-Cheul;Woo, In-Sung;Kang, Heung-Soon
    • Journal of Korea Game Society
    • /
    • v.7 no.2
    • /
    • pp.83-90
    • /
    • 2007
  • One result of the trend towards globalization is an increased number of projects that focus on natural language processing. Automatic speech recognition (ASR) technologies, for example, hold great promise in facilitating global communications and collaborations. Unfortunately, to date, most research projects focus on single widely spoken languages. Therefore, the cost to adapt a particular ASR tool for use with other languages is often prohibitive. This work takes a more general approach. We propose an International Phoneticizing Engine (IPE) that interprets input files supplied in our Phonetic Language Identity (PLI) format to build a dictionary. IPE is language independent and rule based. It operates by decomposing the dictionary creation process into a set of well-defined steps. These steps reduce rule conflicts, allow for rule creation by people without linguistics training, and optimize run-time efficiency. Dictionaries created by the IPE can be used with the speech recognition system. IPE defines an easy-to-use systematic approach that can obtained 92.55% for the recognition rate of Korean speech and 89.93% for English.

  • PDF

Noise Robust Speaker Verification Using Subband-Based Reliable Feature Selection (신뢰성 높은 서브밴드 특징벡터 선택을 이용한 잡음에 강인한 화자검증)

  • Kim, Sung-Tak;Ji, Mi-Kyong;Kim, Hoi-Rin
    • MALSORI
    • /
    • no.63
    • /
    • pp.125-137
    • /
    • 2007
  • Recently, many techniques have been proposed to improve the noise robustness for speaker verification. In this paper, we consider the feature recombination technique in multi-band approach. In the conventional feature recombination for speaker verification, to compute the likelihoods of speaker models or universal background model, whole feature components are used. This computation method is not effective in a view point of multi-band approach. To deal with non-effectiveness of the conventional feature recombination technique, we introduce a subband likelihood computation, and propose a modified feature recombination using subband likelihoods. In decision step of speaker verification system in noise environments, a few very low likelihood scores of a speaker model or universal background model cause speaker verification system to make wrong decision. To overcome this problem, a reliable feature selection method is proposed. The low likelihood scores of unreliable feature are substituted by likelihood scores of the adaptive noise model. In here, this adaptive noise model is estimated by maximum a posteriori adaptation technique using noise features directly obtained from noisy test speech. The proposed method using subband-based reliable feature selection obtains better performance than conventional feature recombination system. The error reduction rate is more than 31 % compared with the feature recombination-based speaker verification system.

  • PDF

Speaker-Dependent Emotion Recognition For Audio Document Indexing

  • Hung LE Xuan;QUENOT Georges;CASTELLI Eric
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.92-96
    • /
    • 2004
  • The researches of the emotions are currently great interest in speech processing as well as in human-machine interaction domain. In the recent years, more and more of researches relating to emotion synthesis or emotion recognition are developed for the different purposes. Each approach uses its methods and its various parameters measured on the speech signal. In this paper, we proposed using a short-time parameter: MFCC coefficients (Mel­Frequency Cepstrum Coefficients) and a simple but efficient classifying method: Vector Quantification (VQ) for speaker-dependent emotion recognition. Many other features: energy, pitch, zero crossing, phonetic rate, LPC... and their derivatives are also tested and combined with MFCC coefficients in order to find the best combination. The other models: GMM and HMM (Discrete and Continuous Hidden Markov Model) are studied as well in the hope that the usage of continuous distribution and the temporal behaviour of this set of features will improve the quality of emotion recognition. The maximum accuracy recognizing five different emotions exceeds $88\%$ by using only MFCC coefficients with VQ model. This is a simple but efficient approach, the result is even much better than those obtained with the same database in human evaluation by listening and judging without returning permission nor comparison between sentences [8]; And this result is positively comparable with the other approaches.

  • PDF

A Study on the Language Independent Dictionary Creation Using International Phoneticizing Engine Technology (국제 음소 기술에 의한 언어에 독립적인 발음사전 생성에 관한 연구)

  • Shin, Chwa-Cheul;Woo, In-Sung;Kang, Heung-Soon;Hwang, In-Soo;Kim, Suk-Dong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.1E
    • /
    • pp.1-7
    • /
    • 2007
  • One result of the trend towards globalization is an increased number of projects that focus on natural language processing. Automatic speech recognition (ASR) technologies, for example, hold great promise in facilitating global communications and collaborations. Unfortunately, to date, most research projects focus on single widely spoken languages. Therefore, the cost to adapt a particular ASR tool for use with other languages is often prohibitive. This work takes a more general approach. We propose an International Phoneticizing Engine (IPE) that interprets input files supplied in our Phonetic Language Identity (PLI) format to build a dictionary. IPE is language independent and rule based. It operates by decomposing the dictionary creation process into a set of well-defined steps. These steps reduce rule conflicts, allow for rule creation by people without linguistics training, and optimize run-time efficiency. Dictionaries created by the IPE can be used with the Sphinx speech recognition system. IPE defines an easy-to-use systematic approach that can lead to internationalization of automatic speech recognition systems.

Classification of Consonants by SOM and LVQ (SOM과 LVQ에 의한 자음의 분류)

  • Lee, Chai-Bong;Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.6 no.1
    • /
    • pp.34-42
    • /
    • 2011
  • In an effort to the practical realization of phonetic typewriter, we concentrate on the classification of consonants in this paper. Since many of consonants do not show periodic behavior in time domain and thus the validity for Fourier analysis of them are not convincing, vector quantization (VQ) via LBG clustering is first performed to check if the feature vectors of MFCC and LPCC are ever meaningful for consonants. Experimental results of VQ showed that it's not easy to draw a clear-cut conclusion as to the validity of Fourier analysis for consonants. For classification purpose, two kinds of neural networks are employed in our study: self organizing map (SOM) and learning vector quantization (LVQ). Results from SOM revealed that some pairs of phonemes are not resolved. Though LVQ is free from this difficulty inherently, the classification accuracy was found to be low. This suggests that, as long as consonant classification by LVQ is concerned, other types of feature vectors than MFCC should be deployed in parallel. However, the combination of MFCC/LVQ was not found to be inferior to the classification of phonemes by language-moded based approach. In all of our work, LPCC worked worse than MFCC.

A Study on the Characteristics of Segmental-Feature HMM (분절특징 HMM의 특성에 관한 연구)

  • Yun Young-Sun;Jung Ho-Young
    • MALSORI
    • /
    • no.43
    • /
    • pp.163-178
    • /
    • 2002
  • In this paper, we discuss the characteristics of Segmental-Feature HMM and summarize previous studies of SFHMM. There are several approaches to reduce the number of parameters in the previous studies. However, if the number of parameters decreased, the performance of systems also fell. Therefore, we consider the fast computation approach with preserving the same number of parameters. In this paper, we present the new segment comparison method to speed up the computation of SFHMM without loss of performance. The proposed method uses the three-frame calculation rather than the full(five) frames in the given segment. The experimental results show that the performance of the proposed system is better than that of the previous studies.

  • PDF

Channel Compensation for Cepstrum-Based Detection of Laryngeal Diseases (켑스트럼 기반의 후두암 감별을 위한 채널보상)

  • Kim Young Kuk;Kim Su Mi;Kim Hyung Soon;Wang Soo-Geun;Jo Cheol-Woo;Yang Byung-Gon
    • MALSORI
    • /
    • no.50
    • /
    • pp.111-122
    • /
    • 2004
  • Automatic detection of laryngeal diseases by voice is attractive because of its non-intrusive nature. Cepstrum based approach to detect laryngeal cancer shows reliable performance even when the periodicity of voice signals is severely lost, but it has a drawback that it is not robust to channel mismatch due to different microphone characteristics. In this paper, to deal with mismatched training and test microphone conditions, we investigate channel compensation techniques such as Cepstral Mean Subtraction (CMS) and Pole Filtered CMS (PFCMS). According to our experiments, PFCMS yields better performance than CMS. By using PFCMS, we obtained 12% and 40% error reduction over baseline and CMS, respectively.

  • PDF