• Title/Summary/Keyword: acoustic silence

Reinterpretation of the Perception of Place Cues in the Reduced Closure Duration of Stop Consonant Clusters (폐쇄자음군의 폐쇄구간 축소에 따른 위치성 지각에 대한 재해석)

  • 이석재
    • MALSORI / no.45 / pp.1-14 / 2003
  • This paper disputes S. Kim (1992) and argues that the perception of place cues in reduced stop consonant clusters (where 'reducing' means cutting out the acoustic silence in the cluster) depends largely on acoustic characteristics such as formant transitions and the noise frequency distribution of the stop burst, rather than on closure duration as S. Kim (1992) proposed. The claim is based on a perception test of 111 stimuli with 10 subjects. The finding is that, when the closure duration is cut to the point where only one stop is perceived, the place of the second stop, not the first, is perceived in most cases, regardless of the places of the two stops. It is likely that the place cues of the stop in prevocalic position mask those of the stop in postvocalic position.

A Study on the Equalization for Low Power Underwater Acoustic Communication (저전력 수중음향통신을 위한 등화기에 관한 연구)

  • Lee, Tae-Jin;Kim, Ki-Man
    • Journal of Navigation and Port Research / v.36 no.3 / pp.169-173 / 2012
  • In this paper, we propose an equalizer that minimizes inter-symbol interference when the PSSK (Phase Silence Shift Keying) technique is applied to low-power underwater acoustic communication. PSSK combines QPSK (Quadrature Phase Shift Keying) modulation with PPM (Pulse Position Modulation) and was proposed for low-power communication. However, it performs poorly because of the delay spread of the underwater channel. We therefore propose a decision feedback equalizer to minimize the error in the PSSK receiver. A sea trial was performed to evaluate the proposed method. The BER of PSSK was $4.36 \times 10^{-2}$ before the equalizer was applied and $3.95 \times 10^{-4}$ after.
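
To illustrate how a decision feedback equalizer removes inter-symbol interference, here is a minimal sketch. It is not the authors' receiver: it assumes plain BPSK symbols instead of PSSK, a hypothetical noiseless two-tap channel, and an LMS update with one feedforward and one feedback tap. The names `run_dfe`, `mu`, `n_train`, and the channel values are all illustrative.

```python
import random


def run_dfe(n_train=500, n_data=2000, mu=0.05, seed=0):
    """Return (BER without equalizer, BER with DFE) on a toy ISI channel."""
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(n_train + n_data)]

    # Hypothetical channel: the delayed echo (1.2) is stronger than the
    # direct path (1.0), so a plain sign detector fails often.
    received = []
    prev = 0.0
    for s in symbols:
        received.append(s + 1.2 * prev)
        prev = s

    w = 0.0              # feedforward tap on the current received sample
    b = 0.0              # feedback tap on the previous decided symbol
    last_decision = 0.0
    raw_errors = 0
    dfe_errors = 0

    for n, (s, r) in enumerate(zip(symbols, received)):
        y = w * r - b * last_decision            # equalizer output
        decision = 1.0 if y >= 0 else -1.0
        training = n < n_train
        # Known symbols during training, own decisions afterwards.
        reference = s if training else decision
        error = reference - y
        # LMS updates for both taps.
        w += mu * error * r
        b -= mu * error * last_decision
        if not training:
            raw_decision = 1.0 if r >= 0 else -1.0
            raw_errors += int(raw_decision != s)
            dfe_errors += int(decision != s)
        last_decision = reference
    return raw_errors / n_data, dfe_errors / n_data


ber_raw, ber_dfe = run_dfe()
print(f"BER without equalizer: {ber_raw:.3f}, with DFE: {ber_dfe:.3f}")
```

The feedback tap subtracts the echo of the previously decided symbol, which is why the equalized error rate drops well below that of the raw sign detector in this toy setting. The figures it prints say nothing about the sea-trial results reported in the abstract.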

Improvement of an Automatic Segmentation for TTS Using Voiced/Unvoiced/Silence Information (유/무성/묵음 정보를 이용한 TTS용 자동음소분할기 성능향상)

  • Kim Min-Je;Lee Jung-Chul;Kim Jong-Jin
    • MALSORI / no.58 / pp.67-81 / 2006
  • For a large corpus of time-aligned data, HMM-based approaches are the most widely used for automatic segmentation, providing a consistent and accurate phone labeling scheme. There are two methods for training the HMMs. The flat-start method minimizes human intervention but has low accuracy. The bootstrap method has high accuracy but requires manual segmentation. In this paper, a new algorithm is proposed to minimize manual work and to improve the performance of automatic segmentation. In the first phase, each speech frame is classified as voiced, unvoiced, or silence. In the second phase, the phoneme sequence is dynamically aligned to the voiced/unvoiced/silence sequence according to acoustic-phonetic rules. Finally, using the segmented speech data as a bootstrap, HMM-based phoneme model parameters are trained. For the performance test, the hand-labeled ETRI speech DB was used. The experimental results showed that our algorithm achieved a 10% improvement in segmentation accuracy within a 20 ms error tolerance. For unvoiced consonants in particular, it showed a 30% improvement.

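
The first phase described in the abstract above, per-frame voiced/unvoiced/silence labeling, can be sketched with a common textbook heuristic based on short-time energy and zero-crossing rate. The abstract does not say which features or thresholds the authors used, so `classify_frame`, `energy_floor`, and `zcr_threshold` are assumptions made for illustration only.

```python
import math
import random


def classify_frame(frame, energy_floor=1e-4, zcr_threshold=0.25):
    """Label one frame as 'silence', 'voiced', or 'unvoiced'."""
    energy = sum(x * x for x in frame) / len(frame)
    if energy < energy_floor:
        return "silence"
    # Fraction of adjacent sample pairs whose sign differs.
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    zcr = crossings / (len(frame) - 1)
    # Noise-like (unvoiced) frames cross zero far more often than periodic ones.
    return "unvoiced" if zcr > zcr_threshold else "voiced"


def classify_signal(samples, frame_len=320):
    """Label consecutive non-overlapping frames of a sample list."""
    return [classify_frame(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


# Synthetic test signal at an assumed 16 kHz sampling rate:
# one silent frame, one 150 Hz tone frame, one low-level noise frame.
rng = random.Random(1)
silence = [0.0] * 320
voiced = [0.5 * math.sin(2 * math.pi * 150 * i / 16000) for i in range(320)]
unvoiced = [rng.uniform(-0.1, 0.1) for _ in range(320)]
labels = classify_signal(silence + voiced + unvoiced)
print(labels)
```

A label sequence of this kind is what the second phase would align against the expected phoneme sequence. Real speech needs thresholds tuned to the recording conditions.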

Performance of music section detection in broadcast drama contents using independent component analysis and deep neural networks (ICA와 DNN을 이용한 방송 드라마 콘텐츠에서 음악구간 검출 성능)

  • Heo, Woon-Haeng;Jang, Byeong-Yong;Jo, Hyeon-Ho;Kim, Jung-Hyun;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.10 no.3 / pp.19-29 / 2018
  • We propose using independent component analysis (ICA) and a deep neural network (DNN) to detect music sections in broadcast drama content. Drama content mainly comprises silence, noise, speech, music, and mixed (speech+music) sections. The silence section is detected by signal activity detection. To detect the music section, we train noise, speech, music, and mixed models with a DNN. In computer experiments, we used the MUSAN corpus to train the acoustic model and tested on 3 hours of Korean drama content. Because the mixed section includes music signals, it was counted as a music section. The segmentation error rate (SER) of music section detection was 19.0%. When stereo mixed signals were separated into music signals using ICA, the SER fell to 11.8%.

Acoustic Modeling and Energy-Based Postprocessing for Automatic Speech Segmentation (자동 음성 분할을 위한 음향 모델링 및 에너지 기반 후처리)

  • Park Hyeyoung;Kim Hyungsoon
    • MALSORI / no.43 / pp.137-150 / 2002
  • Speech segmentation at the phoneme level is important for corpus-based text-to-speech synthesis. In this paper, we examine acoustic modeling methods to improve the performance of an automatic speech segmentation system based on the Hidden Markov Model (HMM). We compare monophone and triphone models and evaluate several model training approaches. In addition, we employ an energy-based postprocessing scheme to correct frequent boundary location errors between silence and speech sounds. Experimental results show that our system places 71.3% and 84.2% of boundaries correctly within tolerances of 10 ms and 20 ms, respectively.

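
The tolerance-based figure quoted in the abstract above (share of boundaries within 10 ms or 20 ms of the reference) can be computed as in the following sketch. It assumes that reference and automatic boundaries are already paired one-to-one, as they are when both come from the same phone sequence. The function name and the sample values are made up for illustration.

```python
def boundary_accuracy(ref_ms, hyp_ms, tolerance_ms):
    """Fraction of hypothesized boundaries within tolerance of the reference."""
    if len(ref_ms) != len(hyp_ms):
        raise ValueError("boundary lists must be paired one-to-one")
    hits = sum(abs(r - h) <= tolerance_ms for r, h in zip(ref_ms, hyp_ms))
    return hits / len(ref_ms)


# Hypothetical boundary times in milliseconds (not data from the paper).
reference = [100, 250, 400, 520]
automatic = [105, 262, 385, 521]

print(boundary_accuracy(reference, automatic, 10))
print(boundary_accuracy(reference, automatic, 20))
```

Widening the tolerance can only keep or raise the score, which is why such results are reported per tolerance value.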

A Study On The Automatic Discrimination Of The Korean Alveolar Stops (한국어 파열음의 자동 인식에 대한 연구 : 한국어 치경 파열음의 자동 분류에 관한 연구)

  • Choi, Yun-Seok;Kim, Ki-Seok;Hwang, Hee-Yeung
    • Proceedings of the KIEE Conference / 1987.11a / pp.330-333 / 1987
  • This paper studies the automatic discrimination of the Korean alveolar stops. An automatic speech recognition system for Korean must discriminate aspirated and tense plosives, because Korean distinguishes lax, tense, and aspirated plosives. To find acoustic cues for automatic recognition of [ㄲ, ㄸ, ㅃ], we experimented with the discrimination of [ㄷ,ㄸ,ㅌ]. As acoustic parameters, we used temporal cues such as VOT and silence duration, and energy cues such as the ratio of high-frequency energy to low-frequency energy. VCV speech data, where V is one of the 8 simple vowels and C is one of the 3 alveolar stops, were used for the experiments. Experiments on 192 speech samples yielded recognition rates of about 82%-95%.

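
One of the energy cues named in the abstract above, the ratio of high-frequency to low-frequency energy, might be computed as in this sketch. The abstract does not give the band edge, frame length, or sampling rate, so `split_hz`, the 8 kHz rate, and the 256-sample frame are assumptions. A naive DFT is used to keep the example dependency-free.

```python
import cmath
import math


def band_energy_ratio(frame, sample_rate, split_hz):
    """Ratio of spectral energy at or above split_hz to energy below it."""
    n = len(frame)
    low = high = 0.0
    for k in range(1, n // 2 + 1):               # skip DC, go up to Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        power = abs(coeff) ** 2
        if k * sample_rate / n < split_hz:
            low += power
        else:
            high += power
    return high / (low + 1e-12)                  # small constant avoids 0/0


def tone(freq_hz, sample_rate=8000, n=256):
    """Pure sine test tone (illustrative input, not speech)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


low_tone_ratio = band_energy_ratio(tone(500), 8000, 2000)
high_tone_ratio = band_energy_ratio(tone(3000), 8000, 2000)
print(low_tone_ratio < 1.0, high_tone_ratio > 1.0)
```

For real recordings an FFT routine would replace the quadratic-time loop, and the ratio would be measured over the burst or aspiration interval rather than a whole tone.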

Application of Seo Dongil's Voice Technique in Patient with Adductor Spasmodic Dysphonia: A Case Study (내전형 경련성 발성장애인에서 서동일 음성치료 기법의 적용 1례)

  • Seo, Dong-Il;Yoo, Jae-Yeon;Jeong, Ok-Ran;Choi, Hong-Shik
    • Speech Sciences / v.9 no.4 / pp.39-47 / 2002
  • The purpose of this study was to investigate the effects of Seo Dongil's voice technique on voice quality in a patient with adductor spasmodic dysphonia. One patient participated in the study. The subject was assessed acoustically (Ave Fo, Ave Int, percent speech time, percent silence time, percent voice time, percent voiceless time) and perceptually (GRBAS scales) in the first and last sessions. Dr. Speech (version 4.0, Tiger-DRS) was used to compare pre- and post-treatment acoustic parameters. Seo Dongil's voice technique consisted of relaxation, breathing exercises, and phonation exercises. The results were as follows. First, Seo Dongil's voice technique tended to be effective in decreasing voice breaks and voice stoppages in the patient with adductor spasmodic dysphonia. Second, the GRBAS scales showed that Seo Dongil's voice technique was effective in improving the voice quality of the patient with adductor spasmodic dysphonia.

Noise Characteristic Analysis of 3-phase SRM for Traction Applications (견인용 3상 SRM의 소음특성 해석)

  • 안진우;이동희;안영주
    • The Transactions of the Korean Institute of Electrical Engineers B / v.53 no.4 / pp.224-228 / 2004
  • The switched reluctance motor (SRM) drive system provides good adjustable speed and torque characteristics. The SRM can maintain full power over a wide speed range, so it is being tried in applications ranging from home appliances to industrial machinery and tools. The traction drive in particular is a good application of the SRM because of its torque characteristics. However, the switching mechanism causes noise and vibration, which makes the SRM difficult to adopt in appliances that demand silence. Noise simulations and tests of 3-phase 6/4, 6/8 and 12/8 SRMs were carried out in order to compare them. The test results show that the 12/8 SRM has good noise characteristics.

Realization a Text Independent Speaker Identification System with Frame Level Likelihood Normalization (프레임레벨유사도정규화를 적용한 문맥독립화자식별시스템의 구현)

  • 김민정;석수영;김광수;정현열
    • Journal of the Institute of Convergence Signal Processing / v.3 no.1 / pp.8-14 / 2002
  • In this paper, we implemented a real-time text-independent speaker recognition system using a Gaussian mixture model (GMM) and applied the frame-level likelihood normalization method, which has proven effective in verification systems. The system has three parts: front end, training, and recognition. In the front end, cepstral mean normalization and silence removal were applied to account for variations in the speaker's speech. In training, a GMM was used to model each speaker's acoustic features, and maximum likelihood estimation was used to optimize the GMM parameters. In recognition, likelihood scores were calculated at the frame level from the speaker models and the test data. Text-independent sentences were used as test sentences. The ETRI 445 and KLE 452 databases were used for training and testing, and cepstral coefficients and regression coefficients were used as feature parameters. The experimental results show that the frame-level likelihood method achieves a higher recognition rate than the conventional method, regardless of the number of registered speakers.

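
The idea of normalizing likelihoods frame by frame before accumulating a speaker score can be sketched as follows. The abstract does not state the exact normalization formula, so this uses one common form as an assumption: each frame's log-likelihood under a speaker model is reduced by the log of the average likelihood under the competing models. The per-frame log-likelihoods would come from trained GMMs; here they are hypothetical numbers.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def normalized_scores(frame_loglik):
    """frame_loglik[s][t] is the log-likelihood of frame t under speaker s.

    Requires at least two speaker models.
    """
    n_speakers = len(frame_loglik)
    n_frames = len(frame_loglik[0])
    scores = []
    for s in range(n_speakers):
        total = 0.0
        for t in range(n_frames):
            others = [frame_loglik[j][t] for j in range(n_speakers) if j != s]
            # Normalize this frame against the competing speaker models.
            total += frame_loglik[s][t] - log_mean_exp(others)
        scores.append(total)
    return scores


def identify(frame_loglik):
    """Index of the speaker with the highest normalized score."""
    scores = normalized_scores(frame_loglik)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical log-likelihoods: 3 speaker models, 3 frames.
example = [
    [-1.0, -2.0, -1.0],
    [-3.0, -2.5, -4.0],
    [-5.0, -1.9, -6.0],
]
print(identify(example))
```

Normalizing per frame limits the influence of individual frames that every model scores poorly, such as noisy frames, compared with summing raw log-likelihoods.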

Automatic Phonetic Segmentation of Korean Speech Signal Using Phonetic-acoustic Transition Information (음소 음향학적 변화 정보를 이용한 한국어 음성신호의 자동 음소 분할)

  • 박창목;왕지남
    • The Journal of the Acoustical Society of Korea / v.20 no.8 / pp.24-30 / 2001
  • This article is concerned with automatic segmentation of Korean speech signals. All transition cases between phonetic units are classified into 3 types, and a different strategy is applied to each type. Type 1 is the discrimination of silence, voiced speech, and unvoiced speech. Histogram analysis of indicators consisting of wavelet coefficients and the SVF (Spectral Variation Function) of the wavelet coefficients is used for type 1 segmentation. Type 2 is the discrimination of adjacent vowels. Vowel transitions can be characterized by the spectrogram. Given the phonetic transcription and transition-pattern spectrograms, speech signals containing consecutive vowels are automatically segmented by template matching. Type 3 is the discrimination of vowels and voiced consonants. The smoothed short-time RMS energy of the wavelet low-pass component and the SVF of the cepstral coefficients are adopted for type 3 segmentation. The experiment was performed on a set of 342 word utterances gathered from 6 speakers. The results show the validity of the method.

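
The smoothed short-time RMS energy mentioned for type 3 segmentation can be sketched as below. For simplicity the sketch works on raw samples and omits the wavelet low-pass filtering described in the abstract. The frame length, hop size, and smoothing width are assumed values, and the function names are illustrative.

```python
import math


def short_time_rms(samples, frame_len=160, hop=80):
    """RMS energy of overlapping frames."""
    return [math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
            for i in range(0, len(samples) - frame_len + 1, hop)]


def moving_average(values, width=3):
    """Centered moving average; the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out


# Synthetic step: a silent stretch followed by a constant-amplitude stretch.
samples = [0.0] * 800 + [0.5] * 800
contour = moving_average(short_time_rms(samples))
print(contour[0], contour[-1])
```

On this step input the contour rises smoothly from the silent level to the loud level. A boundary detector would look for such rises or falls in the contour of real speech.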