• Title/Abstract/Keywords: Speech signal processing

331 search results (processing time: 0.026 s)

A User-friendly Remote Speech Input Method in Spontaneous Speech Recognition System

  • Suh, Young-Joo;Park, Jun;Lee, Young-Jik
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 17, No. 2E
    • /
    • pp.38-46
    • /
    • 1998
  • In this paper, we propose a remote speech input device, a new method of user-friendly speech input for spontaneous speech recognition systems. We define user-friendliness in terms of hands-free operation and microphone independence in speech recognition applications. Our method adopts two algorithms: automatic speech detection and microphone-array delay-and-sum beamforming (DSBF)-based speech enhancement. The automatic speech detection algorithm is composed of two stages: the detection of speech portion candidates, and the classification of each detected candidate as speech or nonspeech using pitch information. The DSBF algorithm adopts the time-domain cross-correlation method for time delay estimation. In the performance evaluation, the speech detection algorithm shows within-200 ms start point accuracy of 93%, 99% under 15 dB, 20 dB, and 25 dB signal-to-noise ratio (SNR) environments, respectively, and the corresponding end point accuracies are 72%, 89%, and 93%. The classification of speech and nonspeech for the region of the input signal where a start point was detected is performed by the pitch information-based method. The percentages of correct classification for speech and nonspeech input are 99% and 90%, respectively. The eight-microphone array-based speech enhancement using the DSBF algorithm shows a maximum SNR gain of 6 dB over a single microphone and an error reduction of more than 15% in the spontaneous speech recognition domain.

  • PDF
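
The delay-and-sum idea described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `estimate_delay` and `delay_and_sum`, the integer-sample delays, and the toy three-microphone signal are all assumptions made for illustration. Each channel is aligned to the first one using the lag that maximizes the time-domain cross-correlation, and the aligned channels are then averaged.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) that maximizes the time-domain
    cross-correlation between a reference channel and another channel."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(sig):
                score += ref[i] * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, max_lag):
    """Align every channel to channel 0 and average the aligned samples."""
    ref = channels[0]
    out = [0.0] * len(ref)
    for ch in channels:
        lag = estimate_delay(ref, ch, max_lag)
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(ch):
                out[i] += ch[j]
    return [v / len(channels) for v in out]


# Toy data (hypothetical): one short pulse reaching three microphones
# with different integer delays.
pulse = [0.0, 1.0, 2.0, 1.0, 0.0]


def delayed(d, n=16):
    x = [0.0] * n
    for k, v in enumerate(pulse):
        x[4 + d + k] = v
    return x


mics = [delayed(0), delayed(2), delayed(3)]
lags = [estimate_delay(mics[0], m, max_lag=5) for m in mics]
enhanced = delay_and_sum(mics, max_lag=5)
print(lags)
```

With real recordings the delays are generally fractional and the channels are noisy, so a practical system would need interpolation and a more robust correlation measure; the sketch only shows the align-then-average structure.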

Detection of Glottal Closure Instant for Voiced Speech Using Wavelet Transform

  • 배건성
    • Speech Sciences (음성과학)
    • /
    • Vol. 7, No. 3
    • /
    • pp.153-165
    • /
    • 2000
  • During the phonation of voiced sounds, there are instants at which the glottis opens or closes due to the periodic vibration of the vocal cords. The instant at which it closes is called the glottal closure instant (GCI) or epoch. The correct detection of the GCI is one of the important problems in speech processing for pitch detection, pitch-synchronous analysis, and so on. Recently, it has been shown that the local maxima of the wavelet-transformed speech signal correspond to the GCIs of the speech signal. In this paper, we investigate the accuracy of GCIs estimated from this wavelet-transformed speech signal. For this purpose, we compare them with the negative peaks of the differentiated EGG signal, which represent the actual GCIs of the speech signal.

  • PDF

A Study on the Pitch Detection of Speech Harmonics by the Peak-Fitting

  • 김종국;조왕래;배명진
    • Speech Sciences (음성과학)
    • /
    • Vol. 10, No. 2
    • /
    • pp.85-95
    • /
    • 2003
  • In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. In analysis, an accurately detected pitch helps to obtain the vocal tract parameters properly. In synthesis, it can be used to change or maintain naturalness and intelligibility, and in recognition, to remove speaker individuality for speaker independence. In this paper, we propose a new pitch detection algorithm. First, positive center clipping is applied in the time domain, using the slope of the speech signal, in order to emphasize the pitch period of the glottal component with the vocal tract characteristic removed. Next, a rough formant envelope is computed by peak-fitting the spectrum of the original speech signal in the frequency domain. A smoothed formant envelope is then obtained from the rough envelope by linear interpolation. The flattened harmonics waveform is obtained as the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. The inverse fast Fourier transform (IFFT) of this flattened harmonics spectrum finally yields a residual signal from which the vocal tract component has been removed. The performance was compared with the LPC, cepstrum, and ACF methods. With this algorithm, pitch detection accuracy improved and the gross error rate was reduced in voiced speech regions and in transition regions where the phoneme changes.

  • PDF
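
As a rough illustration of the time-domain step mentioned in the abstract above, the sketch below applies positive center clipping and then picks the pitch lag with a plain autocorrelation function (ACF), which the abstract names only as a comparison method. It is not the paper's algorithm: the fixed clipping ratio, the function names, the search range, and the synthetic test frame are assumptions, and the peak-fitting and spectrum-flattening stages are not reproduced.

```python
import math


def positive_center_clip(frame, ratio=0.3):
    """Keep only the part of each sample above a fraction of the frame peak.
    The fixed ratio is an assumption, not the paper's slope-based rule."""
    threshold = ratio * max(frame)
    return [x - threshold if x > threshold else 0.0 for x in frame]


def autocorrelation(frame, lag):
    """Unnormalized autocorrelation of the frame at the given lag."""
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))


def acf_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate pitch (Hz) as the lag with the largest ACF of the clipped frame."""
    clipped = positive_center_clip(frame)
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag = max(range(lag_min, lag_max + 1),
                   key=lambda k: autocorrelation(clipped, k))
    return sample_rate / best_lag


# Synthetic voiced-like frame (hypothetical): 125 Hz fundamental plus one harmonic.
sample_rate = 8000
f0 = 125.0
frame = [math.sin(2 * math.pi * f0 * t / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 2 * f0 * t / sample_rate)
         for t in range(512)]
estimate = acf_pitch(frame, sample_rate)
print(round(estimate, 1))
```

Clipping suppresses the low-amplitude structure between the main pulses, so the autocorrelation peak at the pitch period stands out more clearly than it would for the raw frame.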

Image Data Compression Using Laplacian Pyramid Processing and Vector Quantization

  • 박광훈;차일환;윤대희
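    • The Korean Institute of Electrical Engineers: Conference Proceedings (대한전기학회:학술대회논문집)
    • /
    • Proceedings of the Korean Institute of Electrical Engineers 1987 Electrical and Electronic Engineering Conference (II)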
    • /
    • pp.1347-1351
    • /
    • 1987
  • This thesis studies Laplacian pyramid vector quantization, which keeps the compression algorithm simple and remains stable across various kinds of image data. To this end, images are divided into two groups according to their statistical characteristics. At 0.860 bits/pixel and 0.360 bits/pixel, respectively, Laplacian pyramid vector quantization is compared with existing spatial-domain vector quantization and transform coding under the same conditions, using both objective and subjective measures. Laplacian pyramid vector quantization is much more stable against the statistical characteristics of images than the existing vector quantization and transform coding.

  • PDF

Fractal Dimension Method for Connected-digit Recognition

  • 김태식
    • Speech Sciences (음성과학)
    • /
    • Vol. 10, No. 2
    • /
    • pp.45-55
    • /
    • 2003
  • A strange attractor can be used as a representation method in signal processing. Fractal dimension is a well-known method for extracting features from an attractor. Even though the method provides powerful capabilities for speech processing, it has a drawback that should be solved in advance. Normally, the raw signal must be long enough for processing with the fractal dimension method. However, in connected-digit problems, syllable- or semi-syllable-based processing is normally applied. In this case, it is unclear whether there is sufficient data to extract the characteristics of the attractor. This paper discusses the relationship between the size of the signal data and the computed fractal dimension, and also discusses an efficient way to apply the method to connected-digit recognition.

  • PDF

Flattening Techniques for Pitch Detection

  • 김종국;조왕래;배명진
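    • The Institute of Electronics Engineers of Korea: Conference Proceedings (대한전자공학회:학술대회논문집)
    • /
    • Proceedings of the Institute of Electronics Engineers of Korea 2002 Summer Conference (4)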
    • /
    • pp.381-384
    • /
    • 2002
  • In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. However, pitch is very difficult to detect from a speech signal because of the effects of formants and transition amplitude. Therefore, in this paper, we propose a pitch detection method using spectrum flattening techniques. Spectrum flattening eliminates the effects of formants and transition amplitude. In the time domain, positive center clipping is applied in order to emphasize the pitch period of the glottal component with the vocal tract characteristic removed. A rough formant envelope is then computed by peak-fitting the spectrum of the original speech signal in the frequency domain. As a result, we get the flattened harmonics waveform as the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. Finally, we obtain a residual signal from which the vocal tract component has been removed. The performance was compared with the LPC, cepstrum, and ACF methods. With this algorithm, pitch detection accuracy improved and the gross error rate was reduced in voiced speech regions and in transition regions where the phoneme changes.

  • PDF

A Study on the Visible Speech Processing System for the Hearing Impaired

  • 김원기;김남현
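    • The Korean Society of Medical and Biological Engineering: Journal of Biomedical Engineering Research (대한의용생체공학회:의공학회지)
    • /
    • Vol. 11, No. 1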
    • /
    • pp.75-82
    • /
    • 1990
  • The purpose of this study is to support speech training for the hearing impaired with a visible speech processing system. In brief, this system converts the features of speech signals into graphics on a monitor, and helps adjust the speech features of the hearing impaired toward normal ones. The features used in this system are formant and pitch. They are extracted using digital signal processing techniques such as the linear predictive method or the AMDF (Average Magnitude Difference Function). To train the abnormal speech of the hearing impaired effectively, easily visible features have been studied.

  • PDF
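
The AMDF mentioned in the abstract above can be sketched in a few lines. This is a generic textbook-style illustration, not the system described in the paper: the function names, the pitch search range, and the synthetic test frame are assumptions. The AMDF averages the absolute difference between the frame and a delayed copy of itself, and it dips toward zero when the delay equals the pitch period.

```python
import math


def amdf(frame, lag):
    """Average Magnitude Difference Function of the frame at the given lag."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def amdf_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate pitch (Hz) as the lag with the smallest AMDF in the search range."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag = min(range(lag_min, lag_max + 1), key=lambda k: amdf(frame, k))
    return sample_rate / best_lag


# Synthetic frame (hypothetical): 100 Hz fundamental plus its second harmonic.
sample_rate = 8000
f0 = 100.0
frame = [math.sin(2 * math.pi * f0 * t / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 2 * f0 * t / sample_rate)
         for t in range(400)]
estimate = amdf_pitch(frame, sample_rate)
print(round(estimate, 1))
```

Because the AMDF also dips at integer multiples of the pitch period, the search range here is deliberately limited so that only one period falls inside it; a real system would need some guard against such octave errors.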

Speech Signal Processing in the Auditory System

  • 이재혁;심재성;백승화;박상희
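    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집)
    • /
    • Institute of Control, Robotics and Systems, Proceedings of the 1987 Korean Automatic Control Conference; Korea Institute of Technology, Chungnam; 16-17 Oct. 1987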
    • /
    • pp.680-683
    • /
    • 1987
  • Speech signal processing in the auditory system can be analyzed based on two representations: average discharge rate and temporal discharge pattern. However, the average discharge rate representation is restricted to a narrow dynamic range because of rate saturation and two-tone suppression, and the temporal discharge pattern representation needs sophisticated frequency analysis and a synchrony measure. In this paper, a simple representation is proposed: using a model that considers the interaction of cochlear fluid and basilar membrane (BM) movement together with a hair cell model, the features of speech signals (formant frequency and pitch of vowels) are easily estimated from the Average Synchronized Rate.

  • PDF

Speech Signal Processing for Analysis of Chaos Pattern

  • 김태식
    • Speech Sciences (음성과학)
    • /
    • Vol. 8, No. 3
    • /
    • pp.149-157
    • /
    • 2001
  • Based on chaos theory, this paper presents a new method of representing speech signals. The method can be used for pattern matching tasks such as speaker recognition. Attractors are represented very well by logistic maps, which exhibit chaotic behavior. In the speaker recognition field, a speaker's vocal habits could be a very important matching parameter. The attractor configuration, built from the change in value of the speech signal, can be used to analyze how voice undulation at one point on the vocal loudness scale influences the next point. The attractors arranged by this method could be used in speech recognition research because they also contain information unique to each speaker.

  • PDF
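
To make the terms in the abstract above concrete, the following sketch iterates the logistic map and builds a simple return map that pairs each sample with the next one, which is one common way to visualize how a value at one point influences the following point. It is only an assumed illustration: the function names, the parameter values `r=4.0` and `x0=0.2`, and the choice of a one-sample delay are not taken from the paper.

```python
def logistic_map(r, x0, steps):
    """Iterate x[n+1] = r * x[n] * (1 - x[n]) and return the whole orbit."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def return_map(signal):
    """Pair each sample with its successor: points (x[n], x[n+1])."""
    return list(zip(signal[:-1], signal[1:]))


# r = 4.0 is in the chaotic regime of the logistic map.
orbit = logistic_map(r=4.0, x0=0.2, steps=200)
points = return_map(orbit)
print(len(points))
```

For the logistic map every point of the return map lies on a parabola; applying `return_map` to a sampled speech frame instead gives a point cloud whose shape depends on the signal, which is the kind of attractor picture the abstract refers to.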

Subspace Speech Enhancement Using Subband Whitening Filter

  • 김종욱;유창동
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 22, No. 3
    • /
    • pp.169-174
    • /
    • 2003
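  • In this paper, we propose a new subspace speech enhancement method using a subband whitening filter. Conventional subspace approaches either assume white noise or use a whitening filter as preprocessing for colored noise. By splitting the whitening filter into subbands, the proposed method is designed to minimize the upper bound of the signal distortion while reducing residual noise. In addition, introducing the subband whitening filter improves the frequency resolution in the Karhunen-Loeve (KL) domain, which is regarded as one of the weaknesses of subspace speech enhancement. Experimental results show that the proposed method outperforms the subspace speech enhancement method proposed by Ephraim and the spectral subtraction method proposed by Boll in terms of segmental signal-to-noise ratio (SNRseg) and perceptual evaluation of speech quality (PESQ).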