• Title/Summary/Keyword: speech process

Search results: 526 items (processing time: 0.029 s)

Building English-to-Korean Transliteration Dictionary Based on Pronouncing Dictionary (발음 사전에 기반한 영.한 음차 표기 사전의 구축)

  • Lee, Do-Gil
    • Phonetics and Speech Sciences
    • /
    • Vol. 1, No. 3
    • /
    • pp.103-108
    • /
    • 2009
  • This paper proposes a method for building a transliteration dictionary based on pronunciation information extracted from two kinds of existing dictionaries. It also proposes a method for transforming the pronunciation information into Korean transliterated words. To express the pronunciation information, we define the Phoman code system. Phonetic estimation of English words is the most important problem in transliteration; to avoid it, the proposed method uses the pronunciation information extracted from the existing dictionaries. Therefore, unlike previous approaches, the proposed method does not need an imperfect phonetic estimation step, so it can produce accurate transliteration results. The proposed method has been fully implemented.


Speaker Identification Using Augmented PCA in Unknown Environments (부가 주성분분석을 이용한 미지의 환경에서의 화자식별)

  • Yu, Ha-Jin
    • MALSORI
    • /
    • No. 54
    • /
    • pp.73-83
    • /
    • 2005
  • The goal of our research is to build a text-independent speaker identification system that can be used in any condition without any additional adaptation process. The performance of speaker recognition systems can be severely degraded under unknown, mismatched microphone and noise conditions. In this paper, we show that PCA (principal component analysis) can improve performance in such situations. We also propose an augmented PCA process, which augments the original feature vectors with class-discriminative information before the PCA transformation and selects the best direction for each pair of highly confusable speakers. The proposed method reduced the recognition error by 21% in relative terms.

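As a rough illustration of the PCA step mentioned in the abstract above, the following sketch finds the first principal component of a set of feature vectors by power iteration and projects a vector onto it. This is plain PCA only; the augmented variant with class-discriminative information is not reproduced here, and the function names `first_principal_component` and `project` are hypothetical.

```python
def first_principal_component(rows, iters=100):
    """Return (mean, unit direction of largest variance); needs at least 2 rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalise.
    # The asymmetric start vector is an arbitrary choice for this sketch.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # start vector orthogonal to all variance; give up
            break
        v = [x / norm for x in w]
    return mean, v


def project(row, mean, v):
    """Coordinate of one feature vector along the principal direction."""
    return sum((x - m) * c for x, m, c in zip(row, mean, v))
```

For data lying on the line y = x, such as `[[0, 0], [1, 1], [2, 2], [3, 3]]`, the returned direction is the unit vector along that line.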

DYNAMIC TIME WARPING METHOD AND ITS APPLICATION

  • Youn Sang-Youn;Kim Woo Youl
    • Journal of the military operations research society of Korea
    • /
    • Vol. 17, No. 1
    • /
    • pp.105-129
    • /
    • 1991
  • Dynamic Time Warping (DTW) is a sequence comparison method that is widely used in human speech recognition. The timing difference between two speech patterns to be compared is removed by warping their time axes so as to minimise the time-normalised distance between them. The minimum time-normalised distance can be found efficiently by dynamic programming. This paper describes the concept of the dynamic time warping method, its mathematical formulation, and an application.

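The dynamic-programming search described in the abstract above can be sketched as follows. This is a minimal textbook formulation, not the paper's own: it assumes 1-D sequences, an absolute-difference local distance, three allowed steps (insertion, deletion, match), and normalisation by the sum of the two lengths. The name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Time-normalised DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimum accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance (assumed)
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    # Normalising by n + m is one common choice, assumed here.
    return cost[n][m] / (n + m)
```

Two patterns that differ only in timing, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, come out at distance zero because the warping absorbs the repeated sample.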

Spectral subtraction based on speech state and masking effect

  • 김우일;강선미;고한석
    • Proceedings of the IEEK Conference
    • /
    • Proceedings of the IEEK 1998 Summer Conference
    • /
    • pp.599-602
    • /
    • 1998
  • In this paper, a speech enhancement method based on phonemic properties and the masking effect is proposed. It is a modified type of spectral subtraction in which a spectral sharpening process is applied in the unvoiced state, taking the phonemic properties into account. The masking threshold is used to remove the residual noise. In terms of SNR, the proposed spectral subtraction performs similarly to the classical spectral subtraction method. With the proposed scheme, however, the unvoiced sound region exhibits relatively less signal distortion in the enhanced speech.

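For reference, the classical spectral subtraction that the abstract above takes as its baseline can be sketched for a single frame as follows. The spectral sharpening and masking-threshold steps of the proposed method are not reproduced. The function name, the over-subtraction factor `alpha`, and the spectral floor `floor` are assumptions made for illustration.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, floor=0.01):
    """Subtract a noise magnitude estimate from one frame's magnitude spectrum."""
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        diff = y - alpha * n
        # Flooring keeps every bin non-negative when the noise
        # estimate exceeds the observed magnitude.
        cleaned.append(max(diff, floor * y))
    return cleaned
```

The enhanced frame is normally resynthesised by combining these magnitudes with the phase of the noisy signal.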

Speech Feature Extraction for Isolated Word in Frequency Domain (주파수 영역에서의 고립단어에 대한 음성 특징 추출)

  • 조영훈;박은명;강홍석;박원배
    • Proceedings of the IEEK Conference
    • /
    • Proceedings of the IEEK 2000 Summer Conference (4)
    • /
    • pp.81-84
    • /
    • 2000
  • In this paper, a new technique is proposed for extracting features of the speech signal of an isolated word by analysis in the frequency domain. This technique can be applied efficiently to a limited speech domain. To extract the features of the speech signal, the number of peaks is calculated and the frequency value of a peak is used. The difference between the maximum peak and the second peak is then also considered to distinguish among the words in the limited domain. By implementing this process hierarchically, the features of the speech signal can be extracted more quickly.

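The three features named in the abstract above (peak count, frequency of a peak, difference between the two largest peaks) might be computed from a magnitude spectrum roughly as follows. The abstract does not define a peak or say whether the difference is taken in amplitude or in frequency; this sketch assumes a strict local maximum and an amplitude difference, and `peak_features` is a hypothetical name.

```python
def peak_features(mag):
    """Return (peak count, bin of the largest peak, amplitude gap to the second peak)."""
    # A peak is a bin strictly greater than both of its neighbours (assumed).
    peaks = [i for i in range(1, len(mag) - 1)
             if mag[i - 1] < mag[i] > mag[i + 1]]
    if not peaks:
        return 0, None, None
    ranked = sorted(peaks, key=lambda i: mag[i], reverse=True)
    top = ranked[0]
    gap = mag[top] - mag[ranked[1]] if len(ranked) > 1 else None
    return len(peaks), top, gap
```

A bin index converts to a frequency in Hz as `bin * sample_rate / fft_size`.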

On a Study of Measurement Method of Utterance Velocity for the Reduction of Transmission Rate in CELP Vocoder. (LSP 파라미터를 이용한 발성측정법)

  • 장경아;배명진
    • Proceedings of the IEEK Conference
    • /
    • Proceedings of the IEEK 2000 Fall Conference (4)
    • /
    • pp.199-202
    • /
    • 2000
  • Speaking rate varies with the situation and the habits of the speaker. It has been studied extensively in speaker recognition. In speech recognition, speaking rate is an important consideration when recognizing speakers, and for accuracy it is measured using large speech databases and complicated estimation. Conventional vocoders encode and transmit the speech signal without regard to speaking rate, so applying speaking rate to a vocoder calls for a simpler algorithm with less computation than the conventional speaking-rate methods used in speech recognition. We propose a speaking-rate algorithm that uses a simple parameter derived from the Line Spectrum Pair (LSP). The proposed method measures speaking rate from the LSP information in speech. We measured the rate of variation for utterances spoken at different velocities. The results show a distinct difference in the rate of variation between fast and slow utterances: the rate is 42.8% higher for fast utterances than for slow ones.


On the Research of a Speech Coder Using a Multi-Level Amplitude Codebook (다중레벨 진폭 코드북을 이용한 음성 부호화기에 관한 연구)

  • 홍성훈;김정진;박영호;배명진
    • Proceedings of the IEEK Conference
    • /
    • Proceedings of the IEEK 1998 Fall Conference
    • /
    • pp.1219-1222
    • /
    • 1998
  • This paper analyzes the dynamic sparse algebraic codebook used to model a residual signal and proposes a new algebraic codebook structure as well as a search process with improved performance. The proposed algorithm overcomes a disadvantage of the algebraic codebook without increasing computation. First, it makes it possible to select various pulse amplitudes, unlike the conventional method, which simply looks up the sign bit. In addition, two pulses can be selected on the same track. For telephone-line speech, a 5.6kbps speech coder using the proposed algorithm was equivalent to the 6.3kbps MP-MLQ in terms of subjective speech quality. However, slight speech degradation occurred compared to the MP-MLQ where MNRU 1=15dB.


A Study on Design and Implementation of Embedded System for speech Recognition Process

  • Kim, Jung-Hoon;Kang, Sung-In;Ryu, Hong-Suk;Lee, Sang-Bae
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • Vol. 14, No. 2
    • /
    • pp.201-206
    • /
    • 2004
  • This study attempted to develop a speech recognition module applied to a wheelchair for the physically handicapped. In the proposed module, the TMS320C32 was used as the main processor, and a 12th-order mel-cepstrum was applied in the pre-processing step to increase the recognition rate in a noisy environment. DTW (Dynamic Time Warping) was used for the speaker-dependent recognition part and was shown to give excellent results. To use this algorithm more effectively, the reference data was compressed to 1/12 using vector quantization so as to reduce memory use. The various necessary techniques (end-point detection, DMA processing, etc.) were handled so that the speech recognition system could be used in real time.
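
The vector quantization step mentioned in the abstract above can be illustrated by its encoding half: each reference feature vector is replaced by the index of its nearest codeword, so only small integers need to be stored. The abstract does not say how the codebook was trained or how the 1/12 ratio was obtained; the codebook is assumed to be given, and `quantize` is a hypothetical name.

```python
def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codeword."""
    def sqdist(u, v):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(u, v))

    return [min(range(len(codebook)), key=lambda k: sqdist(f, codebook[k]))
            for f in frames]
```

With the codebook `[[0, 0], [10, 10]]`, the frames `[[1, 1], [9, 8], [0, 2]]` are stored as the three indices 0, 1, 0 instead of six numbers.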

Listener's Age Estimation by Prosody Manipulation (운율 변조 양상에 따른 청자의 연령 지각)

  • Kim, Jiyoun;Seong, Cheoljae
    • Phonetics and Speech Sciences
    • /
    • Vol. 6, No. 2
    • /
    • pp.81-88
    • /
    • 2014
  • The normal aging process affects speech production, and these changes are perceived by listeners. This study examined whether age perception changed under various conditions of prosodic manipulation in normal listeners, comparing the prosodic changes according to age and sex in adulthood. The older and younger voices were resynthesized by manipulating the speaking rate and pitch to shift the perceived age of the groups toward each other. Two-way repeated-measures ANOVAs were conducted to determine whether the prosodic type of the resynthesized cue resulted in a significant shift in the perceived age of young and old voices. The manipulation of the speaking rate resulted in a significant shift in perceived age for the older and younger groups. A significant shift in age estimates was not observed for the younger male group when pitch was manipulated. There were significant gender-by-age group interactions for prosodic manipulation type. Age-related changes in the prosodic properties of speech may ultimately influence speech perception.

Acoustic, Intraoral Air Pressure and EMG Studies of Vowel Devoicing in Korean

  • Kim, Hyun-Gi;Niimi, Sei-Ji
    • Speech Sciences
    • /
    • Vol. 10, No. 1
    • /
    • pp.3-13
    • /
    • 2003
  • Vowel devoicing is a phonological process in which the contrast in sonority is lost or reduced in a particular phonetic environment. Phonetically, the vocal fold vibration originates from the abduction/adduction of the glottis in relation to supraglottal articulatory movements. The purpose of this study is to investigate Korean vowel devoicing by means of experimental instruments. The interrelated laryngeal adjustments and aerodynamic effects for this voicing can clarify the redundant articulatory gestures relevant to the distinctive feature of sonority. Five test words were selected, each composed of the high vowel /i/ between a fricative and a strongly aspirated or lenis affricate consonant. The subjects uttered the test words successively at a normal or a faster speed. The EMG, the Gaeltec S7b sensing tube, the High-Speech Analysis system, and MSL II were used in these studies. Acoustically, three different types of speech waveforms and spectrograms were classified, based on the voicing variation. The intraoral air pressure curves showed differences, depending on the voicing variations. The activity patterns of the PCA and the CT for devoiced vowels differed from those for partially devoiced vowels and voiced vowels.
