• Title/Summary/Keyword: voice activity detection (음성구간검출)

Search results: 158

Statistical Model-Based Voice Activity Detection Using Spatial Cues for Dual-Channel Noisy Speech Recognition (이중채널 잡음음성인식을 위한 공간정보를 이용한 통계모델 기반 음성구간 검출)

  • Shin, Min-Hwa;Park, Ji-Hun;Kim, Hong-Kook;Lee, Yeon-Woo;Lee, Seong-Ro
    • Phonetics and Speech Sciences
    • /
    • v.2 no.3
    • /
    • pp.141-148
    • /
    • 2010
  • In this paper, a voice activity detection (VAD) method that employs spatial cues is proposed for dual-channel noisy speech recognition. In the proposed method, a probability model for speech presence/absence is constructed using spatial cues obtained from the dual-channel input signal, and speech activity intervals are detected through this probability model. In particular, the spatial cues are composed of the interaural time differences and interaural level differences of the dual-channel speech signals, and the probability model for speech presence/absence is based on a Gaussian kernel density. To evaluate the performance of the proposed VAD method, speech recognition is performed on speech segments that include only the speech intervals detected by the proposed method. The performance of the proposed method is compared with those of an SNR-based method, a direction of arrival (DOA) based method, and a phase vector based method. The speech recognition experiments show that the proposed method outperforms the conventional methods, providing relative word error rate reductions of 11.68%, 41.92%, and 10.15% compared with the SNR-based, DOA-based, and phase vector based methods, respectively.

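The speech presence/absence model described in the abstract above can be illustrated with a minimal sketch. It assumes each frame yields one (ITD, ILD) pair, fits a Gaussian kernel density to labeled speech and noise cues, and marks a frame as speech when the posterior probability exceeds 0.5. The training cues, kernel bandwidths, equal priors, and the 0.5 decision threshold are illustrative assumptions, not values from the paper.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Build a 2-D Gaussian kernel density over (ITD, ILD) pairs.

    samples   : list of (itd, ild) tuples used as kernel centers
    bandwidth : (h_itd, h_ild) kernel widths (hypothetical tuning values)
    """
    h0, h1 = bandwidth
    norm = 1.0 / (len(samples) * 2.0 * math.pi * h0 * h1)

    def density(cue):
        total = 0.0
        for s in samples:
            z0 = (cue[0] - s[0]) / h0
            z1 = (cue[1] - s[1]) / h1
            total += math.exp(-0.5 * (z0 * z0 + z1 * z1))
        return norm * total

    return density

def speech_posterior(cue, p_speech, p_noise, prior_speech=0.5):
    """Posterior probability of speech presence for one frame's spatial cue."""
    num = prior_speech * p_speech(cue)
    den = num + (1.0 - prior_speech) * p_noise(cue)
    return num / den if den > 0.0 else 0.0

# Toy training cues (assumed): ITD in ms, ILD in dB.
speech_cues = [(0.0, 0.0), (0.05, 0.5), (-0.04, -0.3), (0.02, 0.2)]
noise_cues = [(0.5, 6.0), (0.45, 5.0), (0.6, 7.0), (-0.5, -6.0)]
bandwidth = (0.1, 1.0)

p_speech = gaussian_kde(speech_cues, bandwidth)
p_noise = gaussian_kde(noise_cues, bandwidth)

frontal_frame = speech_posterior((0.02, 0.3), p_speech, p_noise)
lateral_frame = speech_posterior((0.5, 6.0), p_speech, p_noise)
is_speech = [p > 0.5 for p in (frontal_frame, lateral_frame)]
```

With these toy cues, a frame whose cue lies near the speech cluster is labeled as speech and a frame near the noise cluster is not. A real system would estimate the densities from training data and would likely smooth the frame decisions over time.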

On an Improving Performance of Low Bit-Rate Speech Coder (저전송율 보코더의 성능개선에 관한 연구)

  • 박영호;홍성훈;배명진
    • The Journal of the Acoustical Society of Korea
    • /
    • v.17 no.7
    • /
    • pp.101-107
    • /
    • 1998
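  • In this paper, we analyze the dynamic sparse algebraic codebook used to model the residual signal and propose a new algebraic codebook structure and search procedure with improved performance. The proposed algorithm remedies the shortcomings of the algebraic codebook without increasing the computational load. First, whereas the conventional approach searches only the sign bits, the proposed approach allows a choice among various pulse amplitudes. It also allows two pulses to be selected on the same track, and it uses an unvoiced-to-voiced transition detector that requires no additional computation to minimize the LP delay that arises in transition segments during LSF interpolation. In subjective quality tests using telephone-line quality speech samples, the 5.6 kbps speech coder using the proposed algorithm was equivalent to the 6.3 kbps MP-MLQ coder, while at MNRU Q=15 dB it showed slight quality degradation compared with MP-MLQ.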


Evaluation of a Rapid Diagnostic Antigen Test Kit Ribotest Mycoplasma® for the Detection of Mycoplasma pneumoniae (Mycoplasma pneumoniae 감염의 신속 항원 검사 키트 "Ribotest Mycoplasma®"의 진단적 평가)

  • Yang, Song I;Han, Mi Seon;Kim, Sun Jung;Lee, Seong Yeon;Choi, Eun Hwa
    • Pediatric Infection and Vaccine
    • /
    • v.26 no.2
    • /
    • pp.81-88
    • /
    • 2019
  • Purpose: Early detection of Mycoplasma pneumoniae is important for appropriate antimicrobial therapy in children with pneumonia. This study aimed to evaluate the diagnostic value of a rapid antigen test kit in detecting M. pneumoniae from respiratory specimens in children with lower respiratory tract infection (LRTI). Methods: A total of 215 nasopharyngeal aspirates (NPAs) were selected from a pool of NPAs that had been obtained from children admitted for LRTI from August 2010 to August 2018. The specimens had been tested for M. pneumoniae by culture and stored at -70°C until use. Tests with Ribotest Mycoplasma® were performed and interpreted independently by two investigators who were blinded to the culture results. Results: Among the 215 NPAs, 119 were culture positive for M. pneumoniae and 96 were culture negative. Of the culture-positive specimens, 74 (62.2%) were positive for M. pneumoniae by Ribotest Mycoplasma®, and 92 of the 96 (95.8%) culture-negative specimens were negative for M. pneumoniae by Ribotest Mycoplasma®. When culture was used as the standard test, the sensitivity and specificity of Ribotest Mycoplasma® were 62.2% and 95.8%, respectively. Additionally, the positive predictive value, negative predictive value, and overall agreement rates with Ribotest Mycoplasma® were 94.9%, 67.2%, and 77.2%, respectively. Conclusions: A positive test result of Ribotest Mycoplasma® suggests a high likelihood of culture-positive M. pneumoniae infection. However, a negative test result should be interpreted with caution because nearly one-third of negative test results reveal culture-positive M. pneumoniae infections.
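The percentages reported in the abstract above follow from the 2x2 counts it gives (74 of 119 culture-positive specimens tested positive, 92 of 96 culture-negative specimens tested negative). The short sketch below recomputes them; the function and variable names are illustrative and do not come from the paper.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard 2x2 diagnostic accuracy measures, with culture as the reference."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),        # positives detected among culture-positive
        "specificity": tn / (tn + fp),        # negatives detected among culture-negative
        "ppv": tp / (tp + fp),                # positive predictive value
        "npv": tn / (tn + fn),                # negative predictive value
        "agreement": (tp + tn) / total,       # overall agreement with culture
    }

# Counts taken from the abstract: 119 culture-positive, 96 culture-negative.
metrics = diagnostic_metrics(tp=74, fn=119 - 74, tn=92, fp=96 - 92)
percent = {name: round(100.0 * value, 1) for name, value in metrics.items()}
# percent: sensitivity 62.2, specificity 95.8, ppv 94.9, npv 67.2, agreement 77.2

# Share of negative test results that were culture-positive (1 - NPV).
missed_share = 1.0 - metrics["npv"]  # about 0.33, i.e. nearly one-third
```

The last line makes the caution in the conclusion explicit: 45 of the 137 negative test results came from culture-positive specimens.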

Target signal detection using MUSIC spectrum in noise environments (MUSIC 스펙트럼을 이용한 잡음환경에서의 목표 신호 구간 검출)

  • Park, Sang-Jun;Jeong, Sang-Bae
    • Phonetics and Speech Sciences
    • /
    • v.4 no.3
    • /
    • pp.103-110
    • /
    • 2012
  • In this paper, a target signal detection method using the multiple signal classification (MUSIC) algorithm is proposed. The MUSIC algorithm is a subspace-based direction of arrival (DOA) estimation method. Using the inverse of the eigenvalue-weighted eigen spectra, the algorithm detects the DOAs of multiple sources. To apply the algorithm to target signal detection for GSC-based beamforming, we utilize its spectral response for the DOA of the target source in noisy conditions. The performance of the proposed target signal detection method is compared with those of the normalized cross-correlation (NCC), fixed beamforming, and power ratio methods. Experimental results show that the proposed algorithm significantly outperforms the conventional ones in terms of receiver operating characteristic (ROC) curves.

Performance of music section detection in broadcast drama contents using independent component analysis and deep neural networks (ICA와 DNN을 이용한 방송 드라마 콘텐츠에서 음악구간 검출 성능)

  • Heo, Woon-Haeng;Jang, Byeong-Yong;Jo, Hyeon-Ho;Kim, Jung-Hyun;Kwon, Oh-Wook
    • Phonetics and Speech Sciences
    • /
    • v.10 no.3
    • /
    • pp.19-29
    • /
    • 2018
  • We propose to use independent component analysis (ICA) and a deep neural network (DNN) to detect music sections in broadcast drama content. Drama content mainly comprises silence, noise, speech, music, and mixed (speech+music) sections. The silence section is detected by signal activity detection. To detect the music section, we train noise, speech, music, and mixed models with a DNN. In computer experiments, we used the MUSAN corpus to train the acoustic model and conducted an experiment using 3 hours of Korean drama content. As the mixed section includes music signals, it was regarded as a music section. The segmentation error rate (SER) of music section detection was observed to be 19.0%. In addition, when stereo mixed signals were separated into music signals using ICA, the SER was reduced to 11.8%.

An Integrated Acoustic Echo and Noise Cancellation System for Hands-Free Telephony (핸즈프리 전화통신을 위하여 통합된 음향 반향 및 잡음 제거 시스템)

  • 박선준;조점군;이충용;윤대희
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.26 no.6B
    • /
    • pp.760-766
    • /
    • 2001
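  • In this paper, we propose an acoustic echo and background noise canceller for in-car hands-free telephony. The proposed system includes a new residual echo cancellation technique and a double-talk detector suited to real-time implementation. For residual echo cancellation, in intervals where no near-end talker is present, a linear predictor removes the correlation between adjacent samples of the residual echo signal, and the result is used as the input to the noise canceller. Because the speech characteristics of the residual echo signal are removed, the noise canceller can effectively reduce the power of the residual echo along with the background noise. To allow integration with commercial low bit-rate speech coders, the proposed system uses the noise canceller included in IS-127 (EVRC). In a hands-free environment inside a car travelling at a constant 90 km/h, the proposed system achieved an interference suppression of more than 30 dB. The proposed system was implemented in real time on a low-cost DSP with 16-bit fixed-point arithmetic.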


An Active Region Detection Method for The Speech Playback-speed Control (음성재생 속도 제어를 위한 활성화 영역 검출방법)

  • Yoo, Deok-Hyeon;Kim, Dong-Hyeok;Jeon, Joon-Hyeon
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.49 no.3
    • /
    • pp.98-105
    • /
    • 2012
  • This paper describes a new method for high-quality speech playback speed control. The proposed method provides an adaptive threshold filtering solution that detects the active regions of a speech signal according to the playback speed. For a given playback speed, the threshold value is adaptively determined from the statistics (mean and standard deviation) of each speech frame and is used to select only the active blocks within the current frame. To minimize the quality degradation (i.e., pitch degradation) caused by high-speed playback, the threshold filtering first eliminates relatively low-activity blocks, both voiced and unvoiced. Simulation results show that the proposed scheme provides playback speed control with higher quality than the SOLA (Synchronized Overlap-Add) method using pitch extraction of speech.
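The adaptive threshold described in the abstract above can be sketched as follows. The sketch assumes that block activity is the mean absolute amplitude of a block and that the threshold is the frame mean plus `k` times the frame standard deviation. Both choices, along with the parameter `k` (which would be tied to the playback speed), are assumptions for illustration; the abstract does not give the exact rule.

```python
import statistics

def block_activity(frame, block_len):
    """Mean absolute amplitude of each non-overlapping block in a frame."""
    return [
        sum(abs(x) for x in frame[i:i + block_len]) / block_len
        for i in range(0, len(frame) - block_len + 1, block_len)
    ]

def select_active_blocks(frame, block_len, k):
    """Return indices of blocks whose activity reaches the adaptive threshold.

    The threshold is mean + k * std of the block activities in this frame.
    k is a hypothetical tuning parameter: a larger k keeps fewer blocks,
    which corresponds to a faster playback speed.
    """
    activity = block_activity(frame, block_len)
    threshold = statistics.mean(activity) + k * statistics.pstdev(activity)
    return [i for i, a in enumerate(activity) if a >= threshold]

# Toy frame with four blocks: quiet, loud, quiet, moderately loud.
frame = (
    [0.01, -0.01, 0.01, -0.01]
    + [0.8, -0.7, 0.9, -0.8]
    + [0.02, -0.02, 0.02, -0.02]
    + [0.5, -0.6, 0.4, -0.5]
)

moderate_speed = select_active_blocks(frame, block_len=4, k=0.0)  # keeps blocks 1 and 3
high_speed = select_active_blocks(frame, block_len=4, k=1.0)      # keeps block 1 only
```

Raising `k` discards more low-activity blocks, so the retained signal is shorter and plays back faster without the time-scale modification that SOLA applies.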

Subband Based Spectrum Subtraction Algorithm (서브밴드에 기반한 스펙트럼 차감 알고리즘)

  • Choi, Jae-Seung
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.8 no.4
    • /
    • pp.555-560
    • /
    • 2013
  • This paper first proposes a classification algorithm that detects voiced, unvoiced, and silence signals at each frame using a distance measure, logarithmic power, and root mean square methods, and then proposes a spectrum subtraction algorithm based on a subband filter. The proposed algorithm subtracts the spectra of white noise and street noise from the noisy signal at each frame based on the subband filter. The experimental results for the proposed spectrum subtraction algorithm are demonstrated using the speech and noise data of the Aurora-2 database. Based on measurements of the signal-to-noise ratio (SNR), the experiments confirm that the proposed algorithm is effective for speech contaminated by noise. The output SNR improved by approximately 2.1 dB for white noise and 1.91 dB for street noise.
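As a rough illustration of per-subband spectrum subtraction, the sketch below subtracts an estimated noise power from the noisy power in each subband of one frame. It assumes the subband powers are already available and uses a spectral floor to avoid negative power. The floor and its default value are common practice in spectral subtraction but are assumptions here, not details from the paper.

```python
def subtract_subbands(noisy_power, noise_power, floor=0.01):
    """Subtract estimated noise power from noisy power in each subband.

    noisy_power : per-subband power of the noisy frame
    noise_power : per-subband noise power estimate (e.g. from silence frames)
    floor       : fraction of the noisy power kept as a lower bound (assumed)
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("subband counts must match")
    return [max(p - n, floor * p) for p, n in zip(noisy_power, noise_power)]

# Toy frame with three subbands and a flat (white-like) noise estimate.
enhanced = subtract_subbands([4.0, 1.0, 0.5], [1.0, 1.0, 1.0])
# Subband 0 keeps 3.0; subbands 1 and 2 fall back to the floor.
```

The frame classification step in the abstract would supply the silence frames from which a noise estimate such as `noise_power` could be updated.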

Design of A Speech Recognition System using Hidden Markov Models (은닉 마코프 모델을 이용한 음성 인식 시스템 설계)

  • Lee, Chul-Won;Lim, In-Chil
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.1
    • /
    • pp.108-115
    • /
    • 1996
  • This paper proposes an algorithm and a model topology for connected speech recognition using discrete hidden Markov models. The proposed model uses diphone and triphone models chosen with the recognition rate and the recognizable vocabulary in mind. For more exact inter-phoneme segmentation and faster execution of the algorithm, the diphone model has 4 states, where the first and last states are steady states and the other states are transient states. The triphone model has 7 states, specified as 3 steady states and 4 transition states. In addition, the proposed speech recognition algorithm is designed to detect the inter-phoneme segmentation during recognition.


An Automatic Method of Detecting Audio Signal Tampering in Forensic Phonetics (법음성학에서의 오디오 신호의 위변조 구간 자동 검출 방법 연구)

  • Yang, Il-Ho;Kim, Kyung-Wha;Kim, Myung-Jae;Baek, Rock-Seon;Heo, Hee-Soo;Yu, Ha-Jin
    • Phonetics and Speech Sciences
    • /
    • v.6 no.2
    • /
    • pp.21-28
    • /
    • 2014
  • We propose a novel scheme for digital audio authentication of audio files that have been edited by inserting small audio segments from different environmental sources. The purpose of this research is to detect the inserted sections in a given audio file. We expect that the proposed method will assist human investigators by flagging suspected audio sections that are considered to have been recorded or transmitted in different environments. GMM-UBM and GSV-SVM are applied to model the dominant environment of a given audio file. Four kinds of likelihood-ratio-based scores and an SVM score are used to measure the likelihood for a dominant environment model. We also use an ensemble score that combines the aforementioned five kinds of scores. In the experiments, the proposed method shows the lowest average equal error rate when the ensemble score is used. Even when the dominant environments are unknown, the proposed method gives similar accuracy.