• Title/Abstract/Keywords: Speech detection

469 search results (processing time: 0.025 s)

Otsu's method for speech endpoint detection

  • 고유;장한;정길도
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2009년도 정보 및 제어 심포지움 논문집 / pp.40-42 / 2009
  • This paper presents an algorithm based on Otsu's method for accurate and robust endpoint detection for speech recognition in noisy environments. The features are extracted in the time domain, and an optimal threshold is then selected by minimizing the discriminant criterion, so as to maximize the separability of the speech part and the environment part. The simulation results show that the method performs well in detection accuracy.
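
The thresholding idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the time-domain features or the exact criterion, so the sketch assumes short-time frame energy as the feature and uses the standard Otsu formulation (choosing the threshold that maximizes between-class variance, which is equivalent to minimizing within-class variance). The function names, the frame length, and the bin count are hypothetical.

```python
def short_time_energy(signal, frame_len):
    """Energy of consecutive non-overlapping frames (assumed feature)."""
    return [sum(x * x for x in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def otsu_threshold(values, bins=64):
    """Standard Otsu threshold over a histogram of the feature values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # all values identical: no separation possible
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # unnormalized between-class variance
        if between > best_var:
            best_var, best_t = between, lo + (i + 1) * width
    return best_t


def detect_endpoints(signal, frame_len=80):
    """Return (start, end) sample indices of the above-threshold region, or None."""
    energies = short_time_energy(signal, frame_len)
    threshold = otsu_threshold(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

Because the threshold is derived from the data, no fixed noise-floor constant is needed, which is presumably what makes the approach attractive in noisy conditions.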

A Study on Pitch Period Detection Algorithm Based on Rotation Transform of AMDF and Threshold

  • 서현수;김남호
    • 융합신호처리학회논문지 / Vol. 7, No. 4 / pp.178-183 / 2006
  • Much research on speech signal processing is being carried out owing to the recent rapid development of information and communication technology, and the pitch period is an important element in speech applications such as speech recognition, speaker identification, speech analysis, and speech synthesis. A variety of time-domain and frequency-domain algorithms for pitch period detection have been suggested. One time-domain pitch detection algorithm, the AMDF (average magnitude difference function), uses the distance between two valley points as the calculated pitch period. However, selecting the valley points for pitch period detection makes the algorithm complex. Therefore, in this paper we propose a modified AMDF (M-AMDF) algorithm that uses the rotation transform of the AMDF to recognize the global minimum valley point as the pitch period of the speech signal. In addition, a threshold is set at the beginning portion of speech so that it can be used as the selection criterion for the pitch period. The proposed algorithm is compared with conventional ones by simulation and shows better properties.
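
For orientation, the conventional AMDF that this abstract starts from can be sketched as follows. This shows only the baseline (picking the deepest valley within a lag range), not the rotation transform or the threshold of the proposed M-AMDF, whose details the abstract does not give. The function names and the lag search range are assumptions.

```python
def amdf(frame, max_lag):
    """Average magnitude difference function for lags 0..max_lag."""
    n = len(frame)
    return [sum(abs(frame[i] - frame[i + lag]) for i in range(n - lag)) / (n - lag)
            for lag in range(max_lag + 1)]


def pitch_period(frame, min_lag, max_lag):
    """Lag (in samples) of the deepest AMDF valley within [min_lag, max_lag]."""
    d = amdf(frame, max_lag)
    return min(range(min_lag, max_lag + 1), key=lambda lag: d[lag])
```

The lag range must exclude lag 0 (where the AMDF is trivially zero) and should cover the expected pitch range; multiples of the true period also produce deep valleys, which is one reason valley selection is hard in practice.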

Pitch Detection by the Analysis of Speech and EGG Signals

  • 신무용;김정철;배건성
    • 한국음향학회지 / Vol. 15, No. 5 / pp.5-12 / 1996
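  • In this study, we propose a two-channel pitch detection algorithm that uses speech and EGG signals. Using the EGG signal, which provides accurate information about vocal fold vibration, minimizes the problems that accompany pitch detection from the speech signal, while distortion and irregular variation in the EGG signal are compensated for through analysis of the speech signal, so that accurate pitch markers synchronized with the speech signal are detected in the time domain. Experiments show that the two-channel pitch detection algorithm yields a more accurate and improved pitch contour than conventional pitch detection algorithms that use only the speech signal, and it can therefore be used for the objective comparison and evaluation of newly developed pitch detection algorithms.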

A Speaker Change Detection Experiment that Uses a Statistical Method

  • 이경록;김진영
    • 음성과학 / Vol. 8, No. 4 / pp.59-72 / 2001
  • In this paper, we experiment with speaker change detection using a statistical method for an NOD (News On Demand) service. In news data, a change of speaker indicates a change in content, so analyzing speaker changes helps identify the content of each portion of the speech. Speaker change detection acts as a preprocessor that divides input speech by speaker, and it is an important preprocessing phase for speaker tracking. We detected speaker changes using GLR (generalized likelihood ratio) distance-based segmentation and BIC (Bayesian information criterion)-based segmentation, which are among the metric-based methods. In the experiment, the speech was first divided into speaker units with the GLR distance-based method, and the speaker change points were then verified with BIC-based segmentation. In the experimental results, at an MDR (Missed Detection Rate) of around 15%, the FAR (False Alarm Rate) was 63.29 in the high-noise environment and 54.28 in the low-noise environment.
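
The BIC verification step mentioned in this abstract can be illustrated with the widely used delta-BIC test for a single candidate change point. This is a sketch under simplifying assumptions, not the paper's setup: it models a one-dimensional feature with a single Gaussian per segment (real systems typically use multidimensional cepstral features), and `penalty_weight` and the small variance floor are hypothetical tuning choices. The first term is the GLR-style likelihood gain from modelling the two sides separately; the second is the BIC penalty for the extra mean and variance.

```python
import math
from statistics import pvariance


def delta_bic(features, split, penalty_weight=1.0):
    """Delta-BIC for splitting a 1-D feature sequence at index `split`.

    A positive value favors two separate Gaussians, i.e. a speaker change.
    """
    left, right = features[:split], features[split:]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("each side of the split needs at least two samples")
    n = len(features)
    eps = 1e-12  # variance floor to avoid log(0) on constant segments
    gain = 0.5 * (n * math.log(pvariance(features) + eps)
                  - len(left) * math.log(pvariance(left) + eps)
                  - len(right) * math.log(pvariance(right) + eps))
    # A 1-D Gaussian has two parameters (mean, variance).
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty
```

In a two-pass scheme like the one described, candidate points from a cheaper distance measure would be passed to this test, and only those with a positive value kept.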

Performance Analysis of Automatic Mispronunciation Detection Using Speech Recognizer

  • 강효원;이상필;배민영;이재강;권철홍
    • 대한음성학회:학술대회논문집 / 대한음성학회 2003년도 10월 학술대회지 / pp.29-32 / 2003
  • This paper proposes an automatic pronunciation correction system that provides users with correction guidelines for each pronunciation error. For this purpose, we develop an HMM speech recognizer that automatically classifies the pronunciation errors Koreans make when speaking a foreign language. We also collect a speech database of native and nonnative speakers using phonetically balanced word lists. We analyze the types of mispronunciation based on automatic mispronunciation detection experiments using the speech recognizer.

Voice Activity Detection with Run-Ratio Parameter Derived from Runs Test Statistic

  • Oh, Kwang-Cheol
    • 음성과학 / Vol. 10, No. 1 / pp.95-105 / 2003
  • This paper describes a new parameter for voice activity detection, which serves as a front end for automatic speech recognition systems. The new parameter, called the run-ratio, is derived from the runs test statistic, which is used in the statistical test for randomness of a given sequence. The run-ratio has the property that its value for a random sequence is about 1. To apply the run-ratio to voice activity detection, the samples of the input audio signal are converted to binary sequences of positive and negative values. The silence regions of the audio signal can then be regarded as random sequences, so their run-ratio values are about 1. The run-ratio is far lower than 1 for voiced regions and higher than 1 for fricative sounds. Therefore, the run-ratio parameter can discriminate speech signals from background sounds. The proposed voice activity detector outperformed the conventional energy-based detector in terms of error mean and variance, with small deviation from the true speech boundaries and a low chance of missing real utterances.
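
The behavior described in this abstract can be reproduced with a small sketch. The abstract does not give the exact formula, so the definition below is an assumption: the observed number of sign runs divided by the number expected for a random sequence under the runs test, 2·n₊·n₋/(n₊+n₋) + 1. It has the stated properties (about 1 for random signs, well below 1 for slowly varying signals, above 1 for rapidly alternating ones), but the paper's own run-ratio may be defined differently.

```python
def run_ratio(samples):
    """Observed sign runs divided by the runs expected under randomness.

    Assumed definition; zero-valued samples are counted as positive.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    signs = [x >= 0 for x in samples]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    # A run ends wherever two neighboring samples differ in sign.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2 * n_pos * n_neg / (n_pos + n_neg) + 1
    return runs / expected
```

A detector built on this would compute the ratio per frame and flag frames whose value departs from 1 by more than some margin; the margin itself would need tuning on real data.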

Time-Frequency Domain Impulsive Noise Detection System in Speech Signal

  • 최민석;신호선;황영수;강홍구
    • 한국음향학회지 / Vol. 30, No. 2 / pp.73-79 / 2011
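  • In this paper, we propose a new algorithm that detects the locations of impulsive noise arising while speech signals are recorded. The proposed method reflects the frequency-domain characteristics of impulsive noise, achieving higher detection accuracy than existing methods while solving the problem of failing to distinguish the pitch of speech from impulsive noise. In addition, we propose a time-frequency domain impulsive noise detection system that minimizes the false-alarm problem by letting the time-domain and frequency-domain parameters compensate for each other's weaknesses. In experiments using actually recorded impulsive noise, the proposed time-frequency domain impulsive noise detector showed the highest detection accuracy, 99.33%, and the lowest false-alarm rate, 1.49%.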

A Speech Waveform Forgery Detection Algorithm Based on Frequency Distribution Analysis

  • 허희수;소병민;양일호;유하진
    • 말소리와 음성과학 / Vol. 7, No. 4 / pp.35-40 / 2015
  • We propose a speech waveform forgery detection algorithm based on the flatness of the frequency distribution. We devise a new measure of flatness that emphasizes local changes in the frequency distribution. Our measure calculates the sum of the differences between the energies of neighboring frequency bands. We compare the proposed measure with conventional flatness measures using a large set of test sounds. We also compare the proposed method with conventional detection algorithms based on spectral distances. The results show that the proposed method gives a lower equal error rate on the test set than the conventional methods.
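
The measure described in this abstract, a sum of differences between the energies of neighboring frequency bands, can be sketched as below. Several details are assumptions because the abstract does not state them: the differences are taken as absolute values, the bands are equal-width groups of DFT bins (DC excluded), and no normalization is applied. The function names are hypothetical, and the naive DFT is only meant for short illustrative frames.

```python
import cmath
import math


def band_energies(frame, n_bands):
    """Energy in equal-width bands of the one-sided DFT power spectrum."""
    n = len(frame)
    power = []
    for k in range(1, n // 2 + 1):  # skip the DC bin
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    per_band = len(power) // n_bands
    return [sum(power[b * per_band:(b + 1) * per_band]) for b in range(n_bands)]


def local_flatness(energies):
    """Sum of absolute energy differences between neighboring bands.

    Zero for a perfectly flat distribution; larger when energy changes
    sharply from one band to the next.
    """
    return sum(abs(b - a) for a, b in zip(energies, energies[1:]))
```

Unlike a global flatness ratio, this quantity reacts to where the energy changes, not only to how unevenly it is spread overall.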

A Fixed Rate Speech Coder Based on the Filter Bank Method and the Inflection Point Detection

  • Iem, Byeong-Gwan
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 16, No. 4 / pp.276-280 / 2016
  • A fixed-rate speech coder based on a filter bank and a non-uniform sampling technique is proposed. The non-uniform sampling is achieved by detecting inflection points (IPs). A speech block is band-pass filtered by the filter bank, the subband signals are processed by the IP detector, and the detected IP patterns are compared with the entries of an IP database. For each subband signal, the address of the closest member of the database and the energy of the IP pattern are transmitted through the channel. In the receiver, the decoder recovers the subband signals using the received addresses and the energy information, and reconstructs the speech via filter bank summation. As a result, the coder has a fixed data rate, in contrast to existing speech coders based on non-uniform sampling. The usefulness of the proposed technique is confirmed through computer simulation. The signal-to-noise ratio (SNR) performance of the proposed method is comparable to that of uniformly sampled pulse code modulation (PCM) at data rates below 20 kbps.

Performance of music section detection in broadcast drama contents using independent component analysis and deep neural networks

  • 허운행;장병용;조현호;김정현;권오욱
    • 말소리와 음성과학 / Vol. 10, No. 3 / pp.19-29 / 2018
  • We propose to use independent component analysis (ICA) and deep neural network (DNN) to detect music sections in broadcast drama contents. Drama contents mainly comprise silence, noise, speech, music, and mixed (speech+music) sections. The silence section is detected by signal activity detection. To detect the music section, we train noise, speech, music, and mixed models with DNN. In computer experiments, we used the MUSAN corpus for training the acoustic model, and conducted an experiment using 3 hours' worth of Korean drama contents. As the mixed section includes music signals, it was regarded as a music section. The segmentation error rate (SER) of music section detection was observed to be 19.0%. In addition, when stereo mixed signals were separated into music signals using ICA, the SER was reduced to 11.8%.