• Title/Summary/Keyword: speech waveform


On Altering the Pitch of Speech Signals in Waveform Coding -Alteration Method by the LPC and the Pitch Halving- (음성 파형코딩 음원피치 변경에 관한 연구 -LPC와 주기반분법에 의한 피치변경법-)

  • 배명진;윤희상;안수길
    • The Journal of the Acoustical Society of Korea / v.10 no.5 / pp.11-19 / 1991
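  • Among speech synthesis techniques, waveform coding is widely used as an analysis-by-synthesis method because of its superior sound quality. However, because it stores the waveform itself after removing only its redundancy, without separating the characteristics of the excitation source and the vocal tract, it is difficult to use for synthesis by rule. This paper proposes a pitch halving method that can halve the pitch of speech waveforms stored with linear PCM coding, giving a new pitch alteration method that changes the pitch period without separating the excitation source from the waveform. Synthesis by rule can therefore be performed with waveform coding synthesis, which has excellent sound quality.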


On the Use of Pre- and Post-Filters in Speech Waveform Coding (PRE-FILTER와 POST-FILTER를 사용하여 음성파형 부호화 방법에 관하여)

  • 조동호;은종관;김제우
    • The Journal of the Acoustical Society of Korea / v.4 no.3 / pp.33-41 / 1985
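  • This paper analyzes the performance obtained when adaptive pre-filters and post-filters that minimize the frequency-weighted MSE are applied to speech waveform coders. First, the noise shaping effects of various pre-filters and post-filters are shown theoretically. Then, using a frequency-weighted SNR measure, the performance gain from adaptive pre- and post-filters is derived theoretically. Results of applying adaptive pre- and post-filters to ADM and ADPCM coders show that the performance of speech waveform coders can be improved by about 3 dB in terms of the $\mathrm{FWSNR}_{\mathrm{SEG}}$ measure. In addition, pre- and post-filters effectively reduce the quantization noise between 1 kHz and 3 kHz, the band that matters most perceptually.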
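As an illustration of the kind of segmental measure the abstract refers to, the following is a minimal sketch of a plain segmental SNR in Python. It is not the authors' FWSNR_SEG: the frequency weighting is omitted, and the function name, the frame length of 160 samples, and the rule for skipping silent frames are all assumptions made for this example.

```python
import math


def segmental_snr_db(clean, decoded, frame_len=160, eps=1e-12):
    """Average the per-frame SNR (in dB) between a clean signal and its decoded version.

    Hypothetical helper: no frequency weighting is applied, unlike the
    FWSNR_SEG measure named in the abstract.
    """
    n = min(len(clean), len(decoded))
    frame_snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal_energy <= eps:
            continue  # skip silent frames so they do not dominate the average
        frame_snrs.append(10.0 * math.log10(signal_energy / max(noise_energy, eps)))
    if not frame_snrs:
        raise ValueError("no non-silent full frames to evaluate")
    return sum(frame_snrs) / len(frame_snrs)


# Usage: a decoder that scales every sample by 0.9 leaves an error of 10% per sample.
clean = [math.sin(0.1 * i) for i in range(320)]
decoded = [0.9 * x for x in clean]
snr = segmental_snr_db(clean, decoded)
```

Averaging in the dB domain per frame, rather than over the whole utterance, keeps loud segments from masking the noise in quiet ones, which is the usual motivation for segmental measures.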


Noise Spectral Shaping in Speech Waveform Coding (음성파형 부호화에서의 잡음 SPECTRUM 변형에 관한 연구)

  • 이황수;은종관
    • The Journal of the Acoustical Society of Korea / v.3 no.2 / pp.69-90 / 1984
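  • This paper studies the performance of APCM, ADPCM, and ADM speech coders with noise spectral shaping. Two noise spectral shaping approaches are considered: for APCM and ADPCM, a noise feedback filter that minimizes the C-message-weighted quantization noise is adopted, and for ADM, part of the in-band noise is moved outside the signal band. The performance of the APCM and ADPCM coders was measured using the frequency-weighted signal-to-noise ratio and the segmental FWSQNR. Simulation results with real speech show that coders with noise spectral shaping perform about 0.5 to 3 dB better than those without. Although this improvement is quantitatively rather small, listening to the actual speech reveals a marked improvement in sound quality.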


On a Duration Control Method of Speech Waveform by an Automatic Pitch Point Detection (자동 피치시점 검출에 의한 음성신호의 지속시간 조절 법에 관한 연구)

  • Park Won;Park HyungBin;Bae MyungJin
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.217-220 / 2000
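  • In general, high-quality speech synthesis requires a technique for controlling prosody by changing the duration of the synthesized speech. To this end, a high-quality speech coding method must first be selected, and the excitation source must be classified through accurate pitch and pitch point detection. This paper proposes a duration control method, needed for prosody control, that applies the proposed automatic pitch point detection. Because the proposed method operates directly in the time domain, pitch-synchronous analysis is easy and no transformation to another domain is required. As a result, applying waveform coding together with the proposed duration control method based on automatic pitch point detection yielded relatively good results.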


The Computation Reduction Algorithm Independent of the Language for CELP Vocoders (각국 언어 특성에 독립적인 CELP 계열 보코더에서의 계산량 단축 알고리즘)

  • Ju, Sang-Gyu
    • Proceedings of the KAIS Fall Conference / 2010.05a / pp.257-260 / 2010
  • In this paper, we propose methods for reducing the computation of the LSP (line spectrum pairs) transformation that is mainly used in CELP vocoders. To decrease the computation time of the real root method, four algorithms are proposed, with the following characteristics. The first scheme reduces the LSP transformation time by using the mel scale. The second scheme controls the search order according to the distribution characteristics of the LSP parameters. The third scheme reduces the LSP transformation time by using voice characteristics. The fourth scheme controls both the search interval and the search order according to the distribution characteristics of the LSP parameters. The methods were evaluated in terms of search time, computational amount, transformed LSP parameters, SNR, MOS test, waveform of the synthesized speech, and spectrogram analysis. The search time is reduced by about 37.5%, 46.21%, 46.3%, and 51.29% on average, and the computational amount is reduced by about 44.76%, 49.44%, 47.03%, and 57.40%. The transformed LSP parameters of the proposed methods were nevertheless the same as those of the real root method.


Noise Reduction for Korean Connected Digit Recognition through Telephone Channel (전화망 환경에서 한국어 숫자음 인식을 위한 잡음처리)

  • Kim Kyuhong;Kim Hoirin
    • Proceedings of the KSPS conference / 2003.05a / pp.211-214 / 2003
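  • In general, speech recognition performance degrades under the influence of noise. Korean connected digit recognition over the telephone network is one of the difficult areas of speech recognition, because coarticulation lowers the recognition rate and because the telephone channel distorts the spectral envelope and limits the bandwidth of the speech signal. To reduce the influence of noise, this paper uses 2WF (2-stage Wiener Filter), SWP (SNR-dependent Waveform Processing), and CMN (Cepstrum Mean Normalization). 2WF reduces not only the overall additive noise but also dynamic additive noise while causing little distortion to the formant structure of the speech signal. SWP improves the overall SNR by emphasizing the portions of the speech waveform with relatively high SNR. CMN improves recognition performance by normalizing the effect of channel noise in the feature vectors. Experiments on a Korean connected digit database recorded over the telephone network showed that these methods reduce the influence of noise while minimizing distortion of the speech signal, improving digit recognition performance over the telephone network.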
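Of the three techniques named above, CMN is the simplest to illustrate. The following is a minimal Python sketch of per-utterance cepstral mean normalization, assuming the features arrive as a list of equal-length cepstral vectors (one per frame). The function name and data layout are assumptions for this example; the abstract does not describe the authors' implementation.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over the utterance from every frame.

    `frames` is a list of equal-length cepstral vectors (hypothetical layout).
    A stationary channel adds a constant offset to each cepstral coefficient,
    so removing the utterance mean removes that offset.
    """
    if not frames:
        return []
    dim = len(frames[0])
    means = [sum(frame[k] for frame in frames) / len(frames) for k in range(dim)]
    return [[frame[k] - means[k] for k in range(dim)] for frame in frames]


# Usage: the same utterance seen through a channel that shifts each coefficient.
original = [[1.0, 2.0], [3.0, 4.0]]
channel_offset = [5.0, -1.0]
shifted = [[c + o for c, o in zip(frame, channel_offset)] for frame in original]

normalized_original = cepstral_mean_normalize(original)
normalized_shifted = cepstral_mean_normalize(shifted)
```

In this toy case both normalized sequences coincide, which is the property that makes the features less sensitive to the telephone channel.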


Embedded Waveform Coding of Speech (음성 파형의 Embedded 부호화에 관한 연구)

  • 이형호;은종관
    • Journal of the Korean Institute of Telematics and Electronics / v.21 no.3 / pp.73-83 / 1984
  • The performances of embedded adaptive differential pulse code modulation (ADPCM), embedded adaptive delta modulation (ADM), and the same systems with a delayed-decision scheme have been studied with real speech over a wide dynamic range. The embedded ADPCM and ADM coders have been obtained by modifying the conventional ADPCM and ADM coders. The basic scheme of the embedded ADPCM coder is based on the ADPCM originally proposed by Cummiskey et al. For embedded ADM systems, we have modified continuously variable slope DM (CVSD) and hybrid companding DM (HCDM) systems. Among these embedded coders, the performance of the embedded HCDM is superior to that of the other coders over a wide range of transmission rates from 16 to 64 kbits/s. When the delayed-decision scheme is applied to the embedded ADPCM, the performance is improved significantly at all transmission rates. In the embedded ADM systems with a 16 kHz sampling rate, however, the performance improvement resulting from delayed decision is not as drastic as in the embedded ADPCM with the same number of delayed samples.


Effects of Motor Learning Guided Laryngeal Motor Control Therapy for Muscle Misuse Dysphonia (운동학습이론에 기초한 발성운동조절법이 근오용성 발성장애의 음성에 미치는 효과)

  • Seo, In-Hyo;Lee, Ok-Bun;Lee, Sang-Joon;Chung, Phil-Sang
    • Phonetics and Speech Sciences / v.3 no.3 / pp.133-140 / 2011
  • Muscle misuse dysphonia (MMD) is defined as a behavioral voice disorder resulting from inappropriate contractions of intrinsic and/or extrinsic laryngeal muscles. The purpose of this study was to investigate the effect of motor learning guided laryngeal motor control therapy (MLG-LMCT), which is designed to improve on existing laryngeal manual therapy (LMT) and to provide more effective voice treatment for people with muscle misuse dysphonia. Forty-six people with MMD (M:F=16:30) participated in this study. Voice samples of the participants were recorded before and after the voice therapy to investigate the effect of MLG-LMCT. The voice samples were analyzed with an electroglottograph (EGG). Contact quotient (CQ), speed quotient (SQ), and waveform were reported. In addition, perceptual and acoustic evaluations were conducted to determine the change in voice after treatment. The experimenter massaged the tensed muscles around the neck. To help subjects find more appropriate phonation, the experimenter showed them their EGG waveforms, which indicated whether they were moving the vocal folds to the appropriate position. The EGG waveforms thus served as a type of visual feedback. Using the waveform, the experimenter helped subjects move the vocal folds and laryngeal muscles to find more appropriate voice production. The sensory stimuli from the experimenter were gradually faded out. A paired t-test revealed significant differences in CQ between pre- and post-therapy. Perceptually, the overall, rough, breathy, strain, and transition ratings were significantly reduced. Acoustically, there were significant differences in F0, jitter, shimmer, and NHR. After MLG-LMCT, most of the subjects showed improvements in voice quality. The results of this study led us to the following conclusion: MLG-LMCT reduces muscle misuse dysphonia. These results may occur because visual feedback from the EGG waveform can maintain the muscle tension reduction achieved by laryngeal manual therapy. For people with MMD whose muscle tension was reduced by LMT but who do not appropriately manipulate the position of the larynx or adduct the vocal folds, MLG-LMCT might be an alternative therapy approach.


Same music file recognition method by using similarity measurement among music feature data (음악 특징점간의 유사도 측정을 이용한 동일음원 인식 방법)

  • Sung, Bo-Kyung;Chung, Myoung-Beom;Ko, Il-Ju
    • Journal of the Korea Society of Computer and Information / v.13 no.3 / pp.99-106 / 2008
  • Recently, digital music retrieval has come into use in many fields (Web portals, audio service sites, etc.). In existing systems, music metadata is used for digital music retrieval. If the metadata is incorrect or missing, it is hard to obtain accurate retrieval results. Content-based information retrieval, which uses the music itself, has been studied to solve this problem. In this paper, we propose a same-music recognition method using similarity measurement. Feature data of digital music are extracted from the music waveform using simplified MFCC (Mel Frequency Cepstral Coefficients). Similarity between digital music files is measured using DTW (Dynamic Time Warping), which is used in the vision and speech recognition fields. To verify the proposed method, we ran 500 experiments on 1000 randomly collected songs of the same genre, and all 500 succeeded. The 500 digital music files were made from 60 digital audio sources by mixing different compression codecs and bit rates. We showed that similarity measurement using DTW can recognize the same music.
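The DTW step mentioned in the abstract can be sketched in a few lines of Python. This is the textbook dynamic-programming formulation with a Euclidean local cost, not necessarily the variant the authors used; the function name and the representation of each track as a list of feature vectors are assumptions for this example.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic DTW cost between two sequences of equal-dimension feature vectors.

    Hypothetical helper: uses Euclidean local cost and the standard
    (insert, delete, match) step pattern, with no window constraint.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch seq_b
                                     cost[i][j - 1],      # stretch seq_a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Usage: the second sequence repeats one frame, as a time-stretched copy would.
track = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [1.0], [1.0], [2.0]]
same = dtw_distance(track, stretched)
different = dtw_distance([[0.0], [1.0]], [[0.0], [2.0]])
```

Because the alignment may repeat frames, two encodings of the same song that differ slightly in timing can still reach a cost near zero, while unrelated material accumulates a larger cost.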


Comparative Study on Acoustic Characteristics of Vocal Fold Paralysis and Benign Mucosal Disorders of Vocal Fold (성대마비와 양성 성대점막질환의 음향학적 특성비교)

  • Kong, Il-Seung;Cho, Young-Ju;Lee, Myung-Hee;Kim, Jong-Seung;Yang, Yun-Su;Hong, Ki-Hwan
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.18 no.2 / pp.122-128 / 2007
  • This study aims to analyze, from the standpoint of acoustic phonetics, the voices of patients with voice disorders including vocal fold paralysis, vocal fold cyst, and vocal nodule/polyp. It intends to collect supporting acoustic data for speech treatment and for the standardization of vocal disorders. Subjects and Methods: The subjects of this study were 64 adult patients who underwent indirect laryngoscopy and laryngostroboscopy and were diagnosed with vocal fold paralysis, vocal fold cyst, or vocal nodule/polyp. The experimental group consisted of 20 patients who were diagnosed with vocal fold paralysis; 21 patients who were diagnosed with vocal fold cyst and had an average age of 42.0 $({\pm}10.03)$; and 23 patients who were diagnosed with vocal nodule/polyp and had an average age of 40.9 $({\pm}13.75)$. The patients were asked to sit in a comfortable position with a distance of 10 cm between the mouth and a microphone, and then to phonate the vowel /e/ for the maximum phonation time with natural tone and volume. The sound was recorded directly on a computer. During recording, the sampling rate was set to 44,100 Hz, and a 1-second stable segment of each patient's /e/ waveform, excluding the initial and final portions, was analyzed. Results: First, there was no statistically significant difference in jitter and shimmer between vocal fold paralysis and vocal fold cyst, while there was a highly statistically significant difference in them between vocal fold paralysis and vocal nodule/polyp. Second, looking at the mean values of the noise-related measures NNE, HNR, and SNR, the disease showing the most abnormal characteristics was vocal fold paralysis, followed by cyst and then nodule/polyp. For NNE, there was a statistically significant difference between vocal nodule/polyp and cyst or paralysis. In other words, the NNE of vocal nodule/polyp was weaker than that of cyst or paralysis. HNR and SNR showed the same pattern: there was a statistically significant difference between vocal fold paralysis and vocal fold cyst or nodule/polyp, and the HNR and SNR values of vocal fold paralysis were lower than those of vocal fold cyst or nodule/polyp. Conclusion: For vocal fold paralysis, the abnormal values of acoustic parameters associated with frequency, amplitude, and noise ratio were statistically significantly higher than those of vocal fold cyst and nodule/polyp. This finding suggests that the voices of patients with vocal fold paralysis are the most severely impaired, owing to less stable vocal fold movement, asymmetry, and incomplete glottic closure. In addition, there was no statistically significant difference in the acoustic parameters of tremor among vocal fold paralysis, vocal fold cyst, and vocal nodule/polyp. Further studies are needed to identify appropriate acoustic parameters for various vocal disorders and to clarify the correlation between acoustics-based objective tools and subjective evaluations.
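The abstract reports jitter and shimmer without defining them. As a rough illustration, the sketch below computes the commonly used "local" versions in Python: the mean absolute difference between consecutive cycle values, relative to the mean value, in percent. The abstract does not say which definitions or analysis software the authors used, so the formulas, function names, and input format (lists of per-cycle periods and peak amplitudes) are assumptions.

```python
def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference divided by the mean value, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def jitter_percent(periods):
    """Local jitter: perturbation of consecutive pitch periods (hypothetical helper)."""
    return local_perturbation_percent(periods)


def shimmer_percent(amplitudes):
    """Local shimmer: perturbation of consecutive peak amplitudes (hypothetical helper)."""
    return local_perturbation_percent(amplitudes)


# Usage: a perfectly periodic voice versus one whose period alternates.
steady = jitter_percent([10.0, 10.0, 10.0, 10.0])
alternating = jitter_percent([10.0, 12.0, 10.0, 12.0])
```

Higher values indicate less stable vocal fold vibration, which is consistent with the abstract's reading of elevated perturbation measures as abnormal.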
