• Title/Summary/Keyword: 무성음 (unvoiced sound)

Search Result 122, Processing Time 0.023 seconds

An Experiment of a Spoken Digits-Recognition System (숫자음성 자동 인식에 관한 일실험)

  • ;安居院猛
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.15 no.6
    • /
    • pp.23-28
    • /
    • 1978
  • This paper describes a speech recognition system for ten isolated spoken digits. In this system, acoustic parameters such as zero crossing rate, log energy, and three formant frequencies estimated by the linear prediction method were extracted for classification and recognition purposes. The zero crossing rate and log energy were used to classify unvoiced consonants, and the formant frequencies were used to recognize vowels and voiced consonants. Promising recognition results were obtained in this experiment for ten digit utterances spoken by a male speaker.

  • PDF
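
The two frame-level parameters named in the abstract above, zero crossing rate and log energy, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the energy floor, and both thresholds in `looks_unvoiced` are assumptions chosen for the example.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def log_energy(frame, floor=1e-12):
    """Natural log of the mean squared amplitude (floored to avoid log(0))."""
    energy = sum(x * x for x in frame) / len(frame)
    return math.log(max(energy, floor))


def looks_unvoiced(frame, zcr_threshold=0.3, energy_threshold=-3.0):
    """Hypothetical rule: many sign changes combined with low energy."""
    return (zero_crossing_rate(frame) > zcr_threshold
            and log_energy(frame) < energy_threshold)


# A weak, rapidly alternating frame versus a strong low-frequency tone.
noise_like = [0.1, -0.1] * 50
tone = [math.sin(2 * math.pi * (k + 0.5) / 100) for k in range(200)]
print(looks_unvoiced(noise_like), looks_unvoiced(tone))
```

Unvoiced consonants tend to be noise-like, so they usually show a high zero crossing rate and low energy, which is why these two parameters suit the classification step while formants suit vowels.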

Spectral Subtraction Using Whitening Filter for Reducing Residual Noise (잔류잡음 감소를 위한 백색화 스펙트럼 차감법)

  • 오태호
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.06e
    • /
    • pp.411-414
    • /
    • 1998
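  • Among the various methods for speech enhancement, spectral subtraction has a low computational cost and is therefore currently the most suitable method for real-time speech enhancement. However, it has the major drawback of creating new noise that was not present in the original input speech, and much research has been devoted to removing it. Most of this research reduces the newly created spiky noise by blunting peak values through averaging with neighboring frames or neighboring frequency components. Such methods introduce a new drawback in that the information in the speech itself is also averaged, and this effect becomes especially severe in unvoiced segments. This paper proposes a method that builds a whitening filter from LPC analysis of the input speech, applies spectral subtraction to the residual signal obtained by passing the speech through this filter, and then inverse-filters the resulting new residual (synthesis filter) to obtain enhanced speech. Because the proposed algorithm preserves more formant information during spectral subtraction, it can reduce residual noise. Listening tests confirmed that the proposed method reduces residual noise more than the conventional method.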

  • PDF
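
For orientation, the conventional magnitude spectral subtraction step that the abstract above builds on can be sketched as below. This is not the proposed whitening-filter method: it shows only the basic per-bin subtraction, and the function name, the spectral floor, and the example values are assumptions.

```python
def spectral_subtract(noisy_mag, noise_mag, floor=0.01):
    """Subtract a noise magnitude estimate from each frequency bin.

    Bins that would fall below `floor * noisy` are clamped to that value
    instead of going negative.
    """
    return [
        max(noisy - noise, floor * noisy)
        for noisy, noise in zip(noisy_mag, noise_mag)
    ]


# One frame of magnitude spectrum and a flat noise estimate.
noisy_frame = [1.0, 0.5, 0.2, 0.35]
noise_estimate = [0.3, 0.3, 0.3, 0.3]
print(spectral_subtract(noisy_frame, noise_estimate, floor=0.1))
```

Bins where the noise estimate is close to the noisy magnitude survive as small isolated values, which is one way the spiky residual noise described in the abstract can arise.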

A study on the perception of POA and voicing in relation to the release and nonrelease in the English word-final stops (영어 어말 폐쇄음 파열 유무에 따른 위치성 및 유.무성성 인식에 관한 연구)

  • Rhee Seok-Chae;Kang Sooha;Park Jihyun;Hwang Sunmin
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.43-49
    • /
    • 2003
  • This study reveals the perceptual role of the stop release burst in Koreans' recognition of POA (place of articulation) and voicing in English word-final stops. Ten Korean subjects participated in a perception experiment in which the stimuli were prepared according to the amount of acoustic information, including the release burst. The results show that i) the release burst plays an important role in the recognition of POA, in the order of velar, alveolar, and bilabial stops, and ii) the release burst improves the correct recognition of voiceless stops more than that of voiced stops. These results lead us to conclude that the role of the stop release burst differs with the POA and voicing of the stops, possibly because release intensity differs by voicing and by POA.

  • PDF

An Experimental Studies on Vowel Duration Differences before Consonant Clusters and unreleased stops of coda-position (영어 어말 자음군 구성에 따른 선행모음 길이 변화 및 어말 자음 비파열 현상에 대한 실험음성학적 연구 -무성 폐쇄음을 중심으로-)

  • Shin Dong-Jin
    • Proceedings of the KSPS conference
    • /
    • 2006.05a
    • /
    • pp.55-58
    • /
    • 2006
  • The aim of this paper is to investigate the effects of postvocalic consonant clusters (contrasting nasal-stop clusters with single stops) on vowel duration. In particular, we focused on the vowel duration ratio within the words (Experiment I) and on the tendency for word-final voiceless stops to be unreleased (Experiment II). The results of Experiment I showed that vowels preceding single voiceless stops were significantly longer than those preceding their nasal-stop counterparts, and that the ratio for native English speakers was greater than that for Korean learners of English. Experiment II indicated that unreleased stops occurred more frequently with single voiceless stops than with nasal-stop clusters, and that Korean learners of English produced unreleased stops more frequently than native English speakers.

  • PDF

A study on the release burst spectra of the voiceless plosives from the English and Korean spontaneous speech corpus (영어와 한국어 자연발화 코퍼스에서의 무성 폐쇄음 개방 파열 스펙트럼 연구)

  • Hwang, Sunmi;Yoon, Kyuchul
    • Phonetics and Speech Sciences
    • /
    • v.9 no.4
    • /
    • pp.27-34
    • /
    • 2017
  • The purpose of this work is to examine the English and Korean voiceless plosives from the Buckeye [15] and Seoul [16] corpora in terms of their static spectral characteristics. The plosives were automatically extracted by a Praat script. To estimate the percentage of correctly classified plosives, discriminant analyses were performed, trained on four spectral moments, i.e., the center of gravity, variance, skewness, and kurtosis, as suggested in [6]. Another set of discriminant analyses was based on the spectral tilts. In the last set of analyses, both the spectral moments and the tilts were used in training. Results showed that the correct classification rate did not exceed about 65% even in the best case, which suggests that phonetic cues other than the release burst, including dynamic spectral aspects and vowel-onset cues, would be necessary.
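
The four spectral moments named in the abstract above can be computed from a power spectrum roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, the spectrum is treated as a discrete distribution weighted by power, and kurtosis is reported as excess kurtosis (fourth standardized moment minus 3). The exact definitions used in the paper or in its Praat script may differ.

```python
def spectral_moments(freqs, power):
    """Return (center of gravity, variance, skewness, excess kurtosis)."""
    total = sum(power)
    weights = [p / total for p in power]  # normalize power to sum to 1

    cog = sum(f * w for f, w in zip(freqs, weights))
    variance = sum((f - cog) ** 2 * w for f, w in zip(freqs, weights))
    sd = variance ** 0.5
    skewness = sum((f - cog) ** 3 * w for f, w in zip(freqs, weights)) / sd ** 3
    kurtosis = sum((f - cog) ** 4 * w for f, w in zip(freqs, weights)) / sd ** 4 - 3
    return cog, variance, skewness, kurtosis


# A small symmetric spectrum: frequencies in Hz with relative power.
print(spectral_moments([1000.0, 2000.0, 3000.0], [1.0, 2.0, 1.0]))
```

A symmetric spectrum gives zero skewness, so a nonzero skewness indicates that burst energy is concentrated toward one end of the frequency range.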

An Experimental Studies on Vowel Duration Differences before Voiced and Voiceless Consonants pronounced by Korean Learners of English - From Fricatives and Affricates sounds - (한국인 영어학습자의 영어 어말자음 유/무성에 따른 모음길이 변화현상에 대한 실험음성학적 연구 - 마찰음, 폐찰음 중심으로 한 발성실험을 통하여 -)

  • Shin, Dong-Jin;Sa, Jae-Jin
    • Proceedings of the KSPS conference
    • /
    • 2005.11a
    • /
    • pp.91-95
    • /
    • 2005
  • The aim of this paper is to investigate the effects of postvocalic voicing (contrasting voiceless fricatives and affricates with voiced fricatives and affricates) on vowel duration. In particular, we focused on the durational differences between vowels followed by voiceless and voiced consonants across three groups of speakers: English speakers, English bilinguals, and Korean learners of English. The results of Experiment I showed that vowels preceding voiced fricatives and affricates, as well as voiced stops, are significantly longer than those preceding their voiceless counterparts. Experiment II indicated that the longer the subjects had been exposed to an English-speaking society, the more closely their pronunciation resembled that of native English speakers.

  • PDF

An Active Region Detection Method for The Speech Playback-speed Control (음성재생 속도 제어를 위한 활성화 영역 검출방법)

  • Yoo, Deok-Hyeon;Kim, Dong-Hyeok;Jeon, Joon-Hyeon
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.49 no.3
    • /
    • pp.98-105
    • /
    • 2012
  • This paper describes a new method for high-quality speech playback speed control. The proposed method provides an adaptive threshold filtering solution for detecting the active regions of a speech signal according to the playback speed. For a given playback speed, the threshold value is adaptively determined from the statistics (mean and standard deviation) of each speech frame and is used to select only the active blocks within the current frame. To minimize the quality degradation (i.e., pitch degradation) caused by high-speed playback, the threshold filtering first eliminates relatively low-activity blocks, both voiced and unvoiced. Simulation results show that the proposed scheme provides playback speed control with higher quality than the SOLA (Synchronized OverLap Add) method, which uses pitch extraction of speech.
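
The adaptive thresholding idea in the abstract above (a threshold derived from the mean and standard deviation of a frame, used to keep only active blocks) might look like the sketch below. Everything specific here is an assumption for illustration: the function name, the use of mean absolute amplitude as the activity measure, and the scale factor `k`, which stands in for whatever dependence on playback speed the paper defines.

```python
import statistics


def select_active_blocks(frame, block_size, k=0.0):
    """Return indices of blocks whose activity reaches mean + k * std."""
    blocks = [
        frame[i:i + block_size]
        for i in range(0, len(frame) - block_size + 1, block_size)
    ]
    # Activity of a block: mean absolute amplitude (assumed measure).
    activity = [sum(abs(x) for x in b) / block_size for b in blocks]
    threshold = statistics.mean(activity) + k * statistics.pstdev(activity)
    return [i for i, a in enumerate(activity) if a >= threshold]


# One frame made of four blocks with different activity levels.
frame = [0.0] * 4 + [1.0] * 4 + [0.1] * 4 + [0.9] * 4
print(select_active_blocks(frame, block_size=4, k=0.0))
print(select_active_blocks(frame, block_size=4, k=1.0))
```

Raising `k` raises the threshold, so fewer blocks survive; in this sketch that is how a faster playback speed would discard more low-activity material.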

Real-time implementation of the 2.4kbps EHSX Speech Coder Using a TMS320C6701™ DSP Core (TMS320C6701™을 이용한 2.4kbps EHSX 음성 부호화기의 실시간 구현)

  • 양용호;이인성;권오주
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.7C
    • /
    • pp.962-970
    • /
    • 2004
  • This paper presents an efficient implementation of the 2.4 kbps EHSX (Enhanced Harmonic Stochastic Excitation) speech coder on a TMS320C6701™ floating-point digital signal processor. The EHSX speech codec models the excitation signal with harmonic coding or CELP (Code Excited Linear Prediction), depending on whether the frame is voiced or unvoiced speech, respectively. In this paper, we present optimization methods that reduce complexity for real-time implementation. The complexity of the filtering in the CELP algorithm, which accounts for the main part of the EHSX algorithm's complexity, can be reduced by converting the program from floating-point variables to fixed-point variables. We also present efficient optimization methods, including code allocation that takes the DSP architecture into account and a low-complexity harmonic/pitch search algorithm in the encoder. Finally, a speech quality test using PESQ (Perceptual Evaluation of Speech Quality, ITU-T Recommendation P.862) gave a MOS of 3.28, and the goal of real-time operation of the EHSX codec was achieved.

Automatic Phonetic Segmentation of Korean Speech Signal Using Phonetic-acoustic Transition Information (음소 음향학적 변화 정보를 이용한 한국어 음성신호의 자동 음소 분할)

  • 박창목;왕지남
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.8
    • /
    • pp.24-30
    • /
    • 2001
  • This article is concerned with automatic segmentation of Korean speech signals. All transition cases between phonetic units are classified into three types, and a different strategy is applied to each type. Type 1 is the discrimination of silence, voiced speech, and unvoiced speech. Histogram analysis of indicators consisting of wavelet coefficients and the SVF (Spectral Variation Function) of the wavelet coefficients is used for type 1 segmentation. Type 2 is the discrimination of adjacent vowels. Vowel transition cases can be characterized by the spectrogram. Given a phonetic transcription and a transition pattern spectrogram, a speech signal containing consecutive vowels is automatically segmented by template matching. Type 3 is the discrimination of vowels and voiced consonants. The smoothed short-time RMS energy of the wavelet low-pass component and the SVF of the cepstral coefficients are adopted for type 3 segmentation. The experiment was performed on a 342-word utterance set, with speech data gathered from 6 speakers. The results show the validity of the method.

  • PDF

Speaker Identification Using Higher-Order Statistics In Noisy Environment (고차 통계를 이용한 잡음 환경에서의 화자식별)

  • Shin, Tae-Young;Kim, Gi-Sung;Kwon, Young-Uk;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.6
    • /
    • pp.25-35
    • /
    • 1997
  • Most speech analysis methods developed to date are based on second-order statistics, and one of their biggest drawbacks is that they show dramatic performance degradation in noisy environments. In contrast, methods using higher-order statistics (HOS), which have the property of suppressing Gaussian noise, enable robust feature extraction in noisy environments. In this paper, we propose a text-independent speaker identification system using higher-order statistics and compare its performance with that of the conventional second-order-statistics-based method in both white and colored noise environments. The proposed speaker identification system is based on the vector quantization approach and employs an HOS-based voiced/unvoiced detector in order to extract feature parameters from voiced speech only, which has a non-Gaussian distribution and is known to contain most of the speaker-specific characteristics. Experimental results using a 50-speaker database show that the higher-order-statistics-based method gives better identification performance than the conventional second-order-statistics-based method in noisy environments.

  • PDF