• Title/Abstract/Keywords: speech waveform

Search results: 135 items (processing time: 0.022 s)

고음질 운율조절용 시간-주파수 혼성영역 피치변경법 (On a Pitch Alteration Technique in Time-Frequency Hybrid Domain for High Quality Prosody Control of Speech Signal)

  • 이상효;배명진
    • 한국음향학회지 / Vol. 16, No. 4 / pp.106-109 / 1997
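  • In speech synthesis, waveform-coding synthesis can preserve the naturalness and intelligibility of the synthesized speech. Applying it to rule-based synthesis, however, requires altering the pitch of the speech for prosody control. In this paper, we propose a new time-frequency hybrid pitch alteration method that compensates for the phase distortion of the cepstral pitch alteration method by means of a time-domain, time-scale-modification pitch alteration method. The method removes the phase-spectrum distortion that can arise at the junctions between waveforms in consecutive frames, and it keeps the magnitude-spectrum distortion at or below 1.18% even for a 200% pitch alteration.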

An Applicability of Teager Energy Operator and Energy Separation Algorithm for Waveform Distortion Analysis : Harmonics, Inter-harmonics and Frequency Variation

  • Cho, Soo-Hwan;Hur, Jin;Chung, Il-Yop
    • Journal of Electrical Engineering and Technology / Vol. 9, No. 4 / pp.1210-1216 / 2014
  • This paper deals with the application of the Teager Energy Operator (TEO) and the Energy Separation Algorithm (ESA) to detect and characterize various voltage waveform distortions such as harmonics, inter-harmonics, and frequency variation. Because the TEO and the discrete ESA (DESA) were initially proposed for speech and communication signal analysis, their applications in the power quality analysis area are limited to some types of waveform. For example, an undistorted voltage signal is similar to a pure sinusoid. From the viewpoint of signal theory, a voltage fluctuation is very similar to an amplitude-modulated signal, and a continuous frequency variation is similar to a frequency-modulated signal, also known as a chirp signal. This paper shows that the TEO and DESA can be used to detect occurrences of the representative waveform distortions and to determine their instantaneous amplitude and frequency.
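
To make the operator in this abstract concrete, the following is a minimal sketch of the discrete Teager energy operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), combined with the DESA-2 variant of energy separation. The function names, the 3840 Hz sampling rate, and the 60 Hz test tone are illustrative assumptions, not values taken from the paper, and the paper may use a different DESA variant.

```python
import numpy as np

def teager(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x, fs):
    """DESA-2 estimate of instantaneous amplitude and frequency (Hz).

    Valid for frequencies below fs / 4; this is a sketch, not the
    paper's implementation.
    """
    x = np.asarray(x, dtype=float)
    psi_x = teager(x)              # energy of x, samples n = 1 .. N-2
    y = x[2:] - x[:-2]             # symmetric difference y[n] = x[n+1] - x[n-1]
    psi_y = teager(y)              # energy of y, samples n = 2 .. N-3
    psi_x = psi_x[1:-1]            # align psi_x with psi_y
    omega = 0.5 * np.arccos(np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0))
    amp = 2.0 * psi_x / np.sqrt(psi_y)
    return amp, omega * fs / (2.0 * np.pi)

# Hypothetical test signal: an undistorted 60 Hz voltage with amplitude 1.5.
fs = 3840.0
n = np.arange(256)
x = 1.5 * np.cos(2.0 * np.pi * 60.0 * n / fs + 0.4)
amp, freq = desa2(x, fs)
```

For a pure sinusoid both estimates are constant over time; amplitude or frequency modulation would show up as variation in `amp` or `freq`.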

피치 변경 발성에 따른 모음의 음향적 특성 (Acoustic characteristics of Korean vowels on pitch alteration utterance)

  • 조창수;홍광석
    • 대한전자공학회: Conference Proceedings / Proceedings of the 대한전자공학회 2003 Summer Conference, Vol. Ⅳ / pp.2439-2442 / 2003
  • In this paper, we examine the acoustic characteristics of Korean vowels in pitch-altered utterances. Prosody is known as an indicator of the acoustic characteristics of emotions. In addition, speech differs acoustically according to emotional and environmental variation, even when the speaker utters the same speech. We analyzed the spectral envelopes and formants from the voiced regions as data points on the speech waveform.

Performance Evaluation of Novel AMDF-Based Pitch Detection Scheme

  • Kumar, Sandeep
    • ETRI Journal / Vol. 38, No. 3 / pp.425-434 / 2016
  • A novel average magnitude difference function (AMDF)-based pitch detection scheme (PDS) is proposed to achieve better performance in speech quality. A performance evaluation of the proposed PDS is carried out through both a simulation and a real-time implementation of a speech analysis-synthesis system. The proposed PDS is compared with PDSs based on the cepstrum, autocorrelation function (ACF), AMDF, and circular AMDF (CAMDF) methods using the following measures: percentage gross pitch error (%GPE); a subjective listening test; an objective speech quality assessment; a speech intelligibility test; the synthesized speech waveform; computation time; and memory consumption. The proposed PDS results in lower %GPE and better synthesized speech quality and intelligibility for different speech signals than the cepstrum-, ACF-, AMDF-, and CAMDF-based PDSs. The computation time of the proposed PDS is also less than that of the cepstrum-, ACF-, and CAMDF-based PDSs. Moreover, the total memory consumed by the proposed PDS is less than that of the ACF- and cepstrum-based PDSs.
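
For orientation, here is a minimal sketch of the conventional AMDF pitch estimator that the abstract uses as one of its baselines; it is not the novel scheme proposed in the paper. The AMDF at lag τ is the mean of |x(n) − x(n+τ)|, and the pitch period is taken as the lag that minimizes it. The function name, the search range of 150 to 400 Hz, and the synthetic test frame are assumptions made for illustration.

```python
import numpy as np

def amdf_pitch(frame, fs, fmin=150.0, fmax=400.0):
    """Estimate the pitch (Hz) of one voiced frame with the basic AMDF."""
    frame = np.asarray(frame, dtype=float)
    lag_min = int(fs / fmax)           # shortest candidate period in samples
    lag_max = int(fs / fmin)           # longest candidate period in samples
    n = len(frame) - lag_max           # samples compared at every lag
    lags = np.arange(lag_min, lag_max + 1)
    # AMDF: average magnitude of the difference between the frame and
    # a copy of itself delayed by each candidate lag.
    d = np.array([np.mean(np.abs(frame[:n] - frame[lag:lag + n]))
                  for lag in lags])
    return fs / lags[np.argmin(d)]

# Hypothetical voiced frame: 200 Hz fundamental plus its second harmonic.
fs = 8000.0
t = np.arange(480) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t + 0.3)
f0 = amdf_pitch(frame, fs)
```

The search range is deliberately narrower than one octave below the true pitch, because the AMDF of a periodic signal also dips at integer multiples of the period; practical detectors need extra logic to avoid such octave errors.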

한국인 표준 음성 DB 구축(II) (Developing a Korean standard speech DB (II))

  • 신지영;김경화
    • 말소리와 음성과학 / Vol. 9, No. 2 / pp.9-22 / 2017
  • The purpose of this paper is to report the whole process of developing the Korean Standard Speech Database (KSS DB). This project was supported by an SPO (Supreme Prosecutors' Office) research grant for three years, from 2014 to 2016. KSS DB is designed to provide speech data for acoustic-phonetic and phonological studies and for speaker recognition systems. For the samples to represent spoken Korean, sociolinguistic factors such as region (9 regional dialects), age (5 age groups over 20), and gender (male and female) were considered. The goal of the project was to collect speech from over 3,000 male and female speakers of nine regional dialects and five age groups, employing direct and indirect methods. Speech samples of 3,191 speakers (2,829 speakers and 362 speakers using direct and indirect methods, respectively) were collected and compiled into the database. KSS DB is designed to collect read and spontaneous speech samples from each speaker through 5 speech tasks: three (pseudo-)spontaneous speech tasks (producing prolonged simple vowels, 28 blanked sentences, and spontaneous talk) and two read speech tasks (reading 55 phonetically and phonologically rich sentences and reading three short passages). KSS DB includes a 16-bit, 44.1 kHz speech waveform file and an orthographic file for each speech task.

피치 검출을 위한 스펙트럼 평탄화 기법 (Flattening Techniques for Pitch Detection)

  • 김종국;조왕래;배명진
    • 대한전자공학회: Conference Proceedings / Proceedings of the 대한전자공학회 2002 Summer Conference (4) / pp.381-384 / 2002
  • In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. However, it is very difficult to detect the pitch from a speech signal because of the effects of formants and transition amplitudes. Therefore, in this paper, we propose a pitch detection method using spectrum flattening techniques. Spectrum flattening eliminates the effects of formants and transition amplitudes. In the time domain, positive center clipping is applied in order to emphasize the pitch period with a glottal component from which the vocal tract characteristic has been removed. A rough formant envelope is then computed by peak-fitting the spectrum of the original speech signal in the frequency domain. As a result, we obtain the flattened harmonics waveform from the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. Finally, we obtain a residual signal from which the vocal tract component has been removed. The performance was compared with the LPC, cepstrum, and ACF methods. With this algorithm, the accuracy of pitch detection improved, and the gross error rate was reduced in voiced speech regions and in transition regions where the phoneme changes.
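
The time-domain step mentioned in this abstract can be illustrated with a small sketch of positive center clipping. The abstract does not give the exact clipping rule, so the version below is an assumption: samples above a threshold are kept (shifted down by the threshold) and everything else, including all negative samples, is set to zero. The function name and the threshold ratio of 0.5 are hypothetical.

```python
import numpy as np

def positive_center_clip(x, ratio=0.5):
    """Keep only the part of the waveform above ratio * max(x).

    Assumed rule: y[n] = x[n] - c if x[n] > c, else 0, where c is a
    fraction of the largest positive sample in the frame.
    """
    x = np.asarray(x, dtype=float)
    threshold = ratio * max(float(np.max(x)), 0.0)
    return np.where(x > threshold, x - threshold, 0.0)

# Small worked example with a clipping level of 0.5.
clipped = positive_center_clip([0.1, 0.8, -0.9, 1.0, 0.3])
```

Only the strongest positive excursions survive, which tends to leave one dominant pulse per pitch period and suppresses the lower-level oscillations caused by formants.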

음성 하모닉스 스펙트럼의 피크-피팅을 이용한 피치검출에 관한 연구 (A Study on the Pitch Detection of Speech Harmonics by the Peak-Fitting)

  • 김종국;조왕래;배명진
    • 음성과학 / Vol. 10, No. 2 / pp.85-95 / 2003
  • In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. If the pitch is detected accurately, it can be used in analysis to obtain the vocal tract parameters properly. It can also be used in speech synthesis to change or maintain the naturalness and intelligibility of the speech quality, and in speech recognition to eliminate speaker individuality for speaker independence. In this paper, we propose a new pitch detection algorithm. First, positive center clipping is applied in the time domain, using the slope of the speech signal, in order to emphasize the pitch period with a glottal component from which the vocal tract characteristic has been removed. A rough formant envelope is then computed by peak-fitting the spectrum of the original speech signal in the frequency domain. From the rough formant envelope, the smoothed formant envelope is obtained by linear interpolation. We then obtain the flattened harmonics waveform from the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. The inverse fast Fourier transform (IFFT) is applied to these flattened harmonics. Finally, we obtain a residual signal from which the vocal tract component has been removed. The performance was compared with the LPC, cepstrum, and ACF methods. With this algorithm, the accuracy of pitch detection improved, and the gross error rate was reduced in voiced speech regions and in transition regions where the phoneme changes.
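
The frequency-domain steps listed in this abstract (peak-fitting, linear interpolation, subtraction, IFFT) can be sketched roughly as follows. This is an illustration under stated assumptions rather than the authors' algorithm: "peak-fitting" is approximated by picking local maxima of the log-magnitude spectrum, the Hann window and the reuse of the original phase are choices made here, and all names are hypothetical.

```python
import numpy as np

def flatten_spectrum(frame):
    """Flatten the harmonic spectrum of one frame and return the residual."""
    frame = np.asarray(frame, dtype=float)
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spec) + 1e-12)

    # Rough envelope: local maxima of the log-magnitude spectrum.
    inner = log_mag[1:-1]
    peaks = np.flatnonzero((inner > log_mag[:-2]) & (inner >= log_mag[2:])) + 1

    # Smoothed envelope: linear interpolation between the picked peaks.
    bins = np.arange(len(log_mag))
    envelope = np.interp(bins, peaks, log_mag[peaks])

    # Flattened spectrum: difference between spectrum and envelope.
    flat = log_mag - envelope

    # Back to the time domain, keeping the original phase (an assumption).
    residual = np.fft.irfft(np.exp(flat) * np.exp(1j * np.angle(spec)),
                            n=len(frame))
    return flat, envelope, residual, peaks

# Hypothetical frame: ten harmonics of 125 Hz with decaying amplitudes.
fs = 8000
n = np.arange(512)
frame = sum((0.8 ** k) * np.sin(2 * np.pi * 125 * k * n / fs)
            for k in range(1, 11))
flat, envelope, residual, peaks = flatten_spectrum(frame)
```

After flattening, every picked peak sits at zero on the log scale, so the spectral tilt imposed by the formant envelope no longer favors some harmonics over others when the pitch is estimated from the residual.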

Low Bit Rate을 고려한 LMS-MPC 방식에 관한 연구 (A Study on LMS-MPC Method Considering Low Bit Rate)

  • 이시우
    • 디지털융복합연구 / Vol. 10, No. 5 / pp.233-238 / 2012
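  • In speech coding schemes that use voiced and unvoiced excitation sources, distortion appears in the speech waveform when a vowel and an unvoiced consonant fall within the same frame. To solve this problem, this paper presents LMS-MPC, which applies individual pitch and LMS (Least Mean Square). An evaluation of the SNRseg of the conventional MPC and of LMS-MPC confirmed that LMS-MPC improved it by 1.5 dB for male speech and 1.3 dB for female speech. Because the SNRseg of LMS-MPC is improved relative to MPC, the distortion of the speech waveform could be controlled, and the method is expected to be useful for schemes that encode speech signals using low-bit-rate sources, such as cellular phones and smartphones.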
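
The SNRseg figure quoted in this abstract is a segmental signal-to-noise ratio. A common definition, assumed here because the abstract does not spell one out, averages the per-frame SNR in dB: 10·log10(Σ s² / Σ (s − ŝ)²) over all frames. The function name, the 160-sample frame length, and the test signals below are illustrative.

```python
import numpy as np

def segmental_snr(clean, coded, frame_len=160):
    """Mean of per-frame SNRs in dB between a clean and a coded signal."""
    clean = np.asarray(clean, dtype=float)
    coded = np.asarray(coded, dtype=float)
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = s - coded[start:start + frame_len]
        sig, noise = np.sum(s ** 2), np.sum(e ** 2)
        # Skip silent or error-free frames, where the ratio is undefined.
        if sig > 0 and noise > 0:
            snrs.append(10.0 * np.log10(sig / noise))
    return float(np.mean(snrs))

# Hypothetical check: a coder that scales the waveform by 0.9 leaves an
# error of 10% of the signal amplitude, i.e. 20 dB in every frame.
n = np.arange(1600)
clean = np.sin(2 * np.pi * 200 * n / 8000)
snr_seg = segmental_snr(clean, 0.9 * clean)
```

Because the average is taken over frame-level values in dB, a few badly coded frames lower SNRseg more than they would lower a single whole-utterance SNR, which is why it is often preferred for comparing speech coders.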

정상 성인에서 청성유발 피부전위 (Auditory Evoked Skin Potential in Normal Subjects)

  • 허승덕;정동근;서덕준;김광년;김기련;강명구;김리석
    • 음성과학 / Vol. 12, No. 2 / pp.81-88 / 2005
  • Electrodermal activity (EDA) is a bio-electric signal that occurs at the skin surface during sweating. EDA reflects the activity of the sympathetic axis of the autonomic nervous system and is associated with the eccrine sweat glands at the palmar and plantar surfaces. This study aimed to characterize the relationship between EDA and auditory stimulus intensity. The acoustic stimuli used in this study were narrow-band noises at 500 Hz, 1 kHz, and 2 kHz, which are representative of speech frequencies in the audible range. The stimulus intensity was varied between 90 and 30 dB in 10 dB steps within the dynamic range. The minimum stimulus intensity that elicited a skin potential (the threshold of skin potential) was determined first. The latency and amplitude were then derived from the waveform of the skin potential and compared with the stimulus intensity. The waveforms of the skin potential were recorded stably, and the threshold of skin potential appeared near the hearing threshold level of the participant. The latency decreased and the amplitude increased as the stimulus intensity increased. These results suggest that the auditory evoked skin potential can be applied as a tool for auditory assessment and audiological diagnosis.

CSpeech (Version 3.1)

  • Sik, Choe-Hong
    • 대한음성언어의학회: Conference Proceedings / 대한음성언어의학회 1995 4th Conference Symposium and Workshop / pp.141-153 / 1995
  • CSpeech is a software package that implements an audio waveform/speech analysis workstation on an IBM Personal Computer or hardware-compatible computer. Features include digitizing audio waveforms on single or multiple channels, displaying the digitized waveforms, playing back audio waveforms from selected intervals of single channels, saving and retrieving waveforms from binary-format disk files, and analysing audio waveforms for their temporal and spectral properties. The distinguishing characteristics of CSpeech are its support for multiple channels, minimal restrictions on sample rate and waveform duration, support for a variety of hardware configurations, fast graphics display, and its user-extensible, menu-based command structure.
