• Title/Summary/Keyword: Speech detection

Statistical Voice Activity Detection Using Probabilistic Non-Negative Matrix Factorization (확률적 비음수 행렬 인수분해를 사용한 통계적 음성검출기법)

  • Kim, Dong Kook;Shin, Jong Won;Kwon, Kisoo;Kim, Nam Soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.8 / pp.851-858 / 2016
  • This paper presents a new statistical voice activity detection (VAD) method based on the probabilistic interpretation of nonnegative matrix factorization (NMF). The objective function of NMF with the Kullback-Leibler divergence coincides, up to terms that do not depend on the factor matrices, with the negative log-likelihood of the data when the data, given the basis and encoding matrices, are modeled as Poisson distributed. Based on this probabilistic NMF, the VAD is constructed as a likelihood ratio test under the assumption that speech and noise follow Poisson distributions. Experimental results show that the proposed approach outperformed conventional Gaussian model-based and NMF-based methods in simulations at signal-to-noise ratios of 0 to 15 dB.
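
The likelihood ratio test described above can be sketched for a single spectral frame as follows. This is a minimal illustration, not the authors' implementation: it assumes the speech and noise bases (`W_speech`, `W_noise`) have already been trained, fits the activations under each hypothesis with the standard multiplicative update for KL-divergence NMF (bases held fixed), and compares the two Poisson log-likelihoods. All function names and the `threshold` value are hypothetical.

```python
import math

def kl_encode(x, W, iters=300, eps=1e-12):
    """Nonnegative activations h minimising KL(x || W h) with the basis W held fixed."""
    F, K = len(W), len(W[0])
    col_sums = [sum(W[f][k] for f in range(F)) + eps for k in range(K)]
    h = [1.0] * K
    for _ in range(iters):
        approx = [sum(W[f][k] * h[k] for k in range(K)) + eps for f in range(F)]
        ratio = [x[f] / approx[f] for f in range(F)]
        # Multiplicative update for KL-divergence NMF, applied to h only.
        h = [h[k] * sum(W[f][k] * ratio[f] for f in range(F)) / col_sums[k] for k in range(K)]
    return h

def poisson_rate(W, h, eps=1e-12):
    """Poisson mean of each frequency bin, i.e. the NMF reconstruction W h."""
    return [sum(w * hk for w, hk in zip(row, h)) + eps for row in W]

def log_likelihood_ratio(x, rate1, rate0):
    """log p(x | rate1) - log p(x | rate0) for independent Poisson bins; log(x!) cancels."""
    return sum(xf * math.log(r1 / r0) - (r1 - r0) for xf, r1, r0 in zip(x, rate1, rate0))

def nmf_vad(x, W_speech, W_noise, threshold=1.0):
    """Return (is_speech, llr) for one nonnegative spectral frame x (hypothetical API)."""
    W_both = [s_row + n_row for s_row, n_row in zip(W_speech, W_noise)]  # stack columns
    rate0 = poisson_rate(W_noise, kl_encode(x, W_noise))   # H0: noise only
    rate1 = poisson_rate(W_both, kl_encode(x, W_both))     # H1: speech plus noise
    llr = log_likelihood_ratio(x, rate1, rate0)
    return llr > threshold, llr
```

Because the speech-plus-noise model contains the noise-only model, the ratio stays near zero for frames the noise bases already explain and grows when the speech bases are needed; in practice the threshold would be tuned on development data.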

Double Talk Detection before the Convergence of Echo Canceller (반향제거기의 수렴전 동시통화검출)

  • Yoo, Jae-Ha;Kim, Soo-Chan;Kim, Dong-Yon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.5 / pp.203-208 / 2013
  • In this paper, we propose a method to improve the performance of a double-talk detector that can operate before the echo canceller converges. The microphone input signal is filtered by a linear prediction filter, and the filtered signal is used for detection. The coefficients of the linear prediction filter are derived from the far-end talker signal. During single talk, the filtered signal has low power because the characteristics of the echo signal are similar to those of the far-end talker signal. During double talk, however, the filtered signal does not have low power because the microphone signal contains a signal with different characteristics. Double talk is detected from this difference. Simulations using real speech signals verified that the proposed method outperformed conventional methods.
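
A minimal sketch of this detection principle, assuming frame-based processing: linear prediction coefficients are computed from the far-end frame with the Levinson-Durbin recursion, the microphone frame is passed through the resulting prediction-error filter, and the ratio of residual power to microphone power is compared with a threshold. The power-ratio statistic, the prediction order, the threshold, and the synthetic AR(1) test signals (with the echo path modelled as a plain attenuation) are illustrative assumptions, not details from the paper.

```python
import random

def autocorr(x, order):
    """Biased autocorrelation r[0..order] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Prediction-error filter a = [1, a1, ..., ap], so that e[n] = sum_j a[j] * x[n - j]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction-error power
    return a

def double_talk(far_end, mic, order=2, threshold=0.5):
    """Return (is_double_talk, residual-to-mic power ratio) for one frame pair."""
    r = autocorr(far_end, order)
    if r[0] <= 0.0:
        return False, 0.0                   # silent far end: nothing to whiten against
    r[0] *= 1.0 + 1e-6                      # tiny white-noise correction for numerical safety
    a = levinson_durbin(r, order)
    # Whiten the mic frame with the far-end filter; start at n = order so every
    # output sample uses a full filter history.
    res = [sum(a[j] * mic[n - j] for j in range(order + 1)) for n in range(order, len(mic))]
    p_mic = sum(v * v for v in mic[order:]) + 1e-12
    ratio = sum(v * v for v in res) / p_mic
    return ratio > threshold, ratio

def ar1(coef, n, rng):
    """Hypothetical test signal: first-order autoregressive process with unit Gaussian input."""
    y, prev = [], 0.0
    for _ in range(n):
        prev = coef * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y

rng = random.Random(0)
far = ar1(0.95, 2000, rng)                  # low-pass stand-in for the far-end talker
echo = [0.6 * v for v in far]               # assumed echo path: a plain attenuation
near = ar1(-0.9, 2000, rng)                 # high-pass stand-in for the near-end talker
print(double_talk(far, echo))                                  # single talk
print(double_talk(far, [e + s for e, s in zip(echo, near)]))   # double talk
```

In single talk the ratio stays close to the prediction error of the far-end signal itself, whereas a near-end component with a different spectrum is not suppressed by the far-end whitening filter, so the ratio rises.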

Blockchain Technology for Combating Deepfake and Protect Video/Image Integrity

  • Rashid, Md Mamunur;Lee, Suk-Hwan;Kwon, Ki-Ryong
    • Journal of Korea Multimedia Society / v.24 no.8 / pp.1044-1058 / 2021
  • Tampered electronic content has multiplied in the last few years, driven by the emergence of sophisticated artificial intelligence (AI) algorithms. Deepfakes (fake footage, photos, speech, and videos) can be a frightening and destructive phenomenon, capable of distorting facts and damaging reputations by presenting a fake reality. Proof of ownership or authentication of digital material is crucial for combating the influx of fabricated content we face today. Current solutions lack the capacity to track the history and provenance of digital media. Because of the rise of misrepresentation created by technologies such as deepfakes, detection algorithms are required to verify the integrity of digital content. Blockchain's authentication capabilities have been claimed to benefit many real-world scenarios. Despite scattered efforts around such remedies, relatively little research has examined where blockchain technology can be used to tackle the deepfake problem. Recent blockchain-based innovations such as smart contracts and Hyperledger Fabric can play a vital role against the manipulation of digital content. The goal of this paper is to summarize and discuss ongoing research on blockchain's capabilities for authenticating digital content. We also propose a blockchain (smart contract)-based framework that can preserve the integrity of original content and thus prevent deepfakes. This study further discusses how blockchain technology can be used more effectively for deepfake prevention and highlights the current state of deepfake video detection research, including the generation process, various detection algorithms, and existing benchmarks.
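
The integrity idea behind such a framework can be illustrated with a toy hash chain: the SHA-256 digest of the original content is recorded in an append-only chain of blocks, and a later copy is accepted only if its digest matches a registered entry and the chain itself is unaltered. This is a deliberately simplified sketch; `ProvenanceLedger` is a hypothetical in-memory class and does not model smart contracts, consensus, or Hyperledger Fabric.

```python
import hashlib
import json

class ProvenanceLedger:
    """Toy append-only hash chain standing in for a blockchain (hypothetical)."""

    def __init__(self):
        genesis = {"index": 0, "content_hash": None, "owner": None, "prev": "0" * 64}
        genesis["hash"] = self._block_hash(genesis)
        self.blocks = [genesis]

    @staticmethod
    def _block_hash(block):
        # Hash every field except the block's own hash, in a canonical JSON form.
        payload = {k: block[k] for k in ("index", "content_hash", "owner", "prev")}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def register(self, content: bytes, owner: str) -> str:
        """Record the digest of original content and link it to the previous block."""
        block = {
            "index": len(self.blocks),
            "content_hash": hashlib.sha256(content).hexdigest(),
            "owner": owner,
            "prev": self.blocks[-1]["hash"],
        }
        block["hash"] = self._block_hash(block)
        self.blocks.append(block)
        return block["content_hash"]

    def chain_is_valid(self) -> bool:
        """Every block must link to its predecessor and match its own stored hash."""
        for prev, cur in zip(self.blocks, self.blocks[1:]):
            if cur["prev"] != prev["hash"] or cur["hash"] != self._block_hash(cur):
                return False
        return True

    def verify(self, content: bytes):
        """Return the registered owner if this exact content was registered, else None."""
        if not self.chain_is_valid():
            return None
        digest = hashlib.sha256(content).hexdigest()
        for block in self.blocks[1:]:
            if block["content_hash"] == digest:
                return block["owner"]
        return None
```

An exact cryptographic hash rejects any modification, including benign re-encoding or resizing, so a practical system would also need to register legitimate derivatives or rely on a perceptual hash.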

Wiener filtering-based ambient noise reduction technique for improved acoustic target detection of directional frequency analysis and recording sonobuoy (Directional frequency analysis and recording 소노부이의 표적 탐지 성능 향상을 위한 위너필터링 기반 주변 소음 제거 기법)

  • Hong, Jungpyo;Bae, Inyeong;Seok, Jongwon
    • The Journal of the Acoustical Society of Korea / v.41 no.2 / pp.192-198 / 2022
  • As an effective weapon system for anti-submarine warfare, the Directional Frequency Analysis and Recording (DIFAR) sonobuoy detects underwater targets via beamforming with three channels: one omnidirectional and two directional channels. However, ambient noise degrades the detection performance of the DIFAR sonobuoy in specific directions (0°, 90°, 180°, 270°). Thus, an ambient noise reduction technique is proposed to improve the acoustic target detection performance of the DIFAR sonobuoy. The proposed method uses Order Truncate Average (OTA), widely used in sonar signal processing, for ambient noise estimation, and Wiener filtering, widely used in speech signal processing, for noise reduction. For evaluation, we compared the mean square errors of the target bearing estimates of the conventional and proposed methods and confirmed that the proposed method is effective under 0 dB signal-to-noise ratio conditions.
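
The two ingredients named in the abstract can be sketched on a single power spectrum: an order truncate average (OTA) over neighbouring frequency bins, which discards the strongest bins so that narrowband target lines do not inflate the noise estimate, followed by a Wiener-type gain ξ/(1+ξ). The window width, the fraction of bins kept, the gain floor, and the power-subtraction estimate of the a priori SNR ξ are assumptions for illustration; the paper's exact parameters are not given here.

```python
def ota_noise_estimate(power, half_window=8, keep_fraction=0.5):
    """Order Truncate Average: sort neighbouring bins, drop the largest, average the rest."""
    n = len(power)
    est = []
    for k in range(n):
        lo, hi = max(0, k - half_window), min(n, k + half_window + 1)
        neigh = sorted(power[lo:hi])
        kept = neigh[: max(1, int(len(neigh) * keep_fraction))]  # keep the smallest values
        est.append(sum(kept) / len(kept))
    return est

def wiener_gain(power, noise, floor=0.05):
    """Per-bin Wiener-type gain xi / (1 + xi), with xi from power subtraction (assumption)."""
    gains = []
    for p, nz in zip(power, noise):
        xi = max(p / max(nz, 1e-12) - 1.0, 0.0)   # a priori SNR estimate
        gains.append(max(xi / (1.0 + xi), floor))  # floor limits over-suppression
    return gains
```

In a DIFAR processing chain, gains like these would be applied to the short-time spectra before beamforming; how the paper applies them across the three channels is not stated in the abstract.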

Voice Activity Detection Based on SVM Classifier Using Likelihood Ratio Feature Vector (우도비 특징 벡터를 이용한 SVM 기반의 음성 검출기)

  • Jo, Q-Haing;Kang, Sang-Ki;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.26 no.8 / pp.397-402 / 2007
  • In this paper, we apply a support vector machine (SVM) that incorporates an optimized nonlinear decision rule over different sets of feature vectors to improve the performance of statistical model-based voice activity detection (VAD). The conventional method performs VAD by building statistical models under the speech-absence and speech-presence hypotheses and comparing the geometric mean of the likelihood ratios (LRs) of the individual frequency bands of the input signal with a given threshold. We propose a novel SVM-based VAD technique that treats the LRs computed in each frequency bin as the elements of a feature vector and minimizes the classification error probability, instead of applying the conventional geometric-mean decision rule. In experiments, the SVM-based VAD using the proposed feature outperformed previously reported VADs in various noise environments.
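
Under the Gaussian statistical model commonly used for this kind of VAD, the log likelihood ratio of frequency bin k is log Λ_k = γ_k ξ_k / (1 + ξ_k) - log(1 + ξ_k), where γ_k is the a posteriori SNR and ξ_k the a priori SNR. The sketch below computes these per-bin values, which form the feature vector described above, and the conventional geometric-mean decision they replace. The maximum-likelihood estimate ξ_k = max(γ_k - 1, 0) and the threshold are assumptions (decision-directed estimation is a common alternative), and the SVM itself is left out.

```python
import math

def log_lr_per_bin(power, noise_psd, prior_snr=None):
    """Per-bin log likelihood ratios under the Gaussian model (one value per bin)."""
    feats = []
    for k, (p, lam) in enumerate(zip(power, noise_psd)):
        gamma = p / lam                                   # a posteriori SNR
        # ML a priori SNR unless one is supplied (assumption for this sketch).
        xi = prior_snr[k] if prior_snr is not None else max(gamma - 1.0, 0.0)
        feats.append(gamma * xi / (1.0 + xi) - math.log(1.0 + xi))
    return feats

def geometric_mean_vad(log_lrs, threshold):
    """Conventional rule: mean of per-bin log LRs is the log of their geometric mean."""
    return sum(log_lrs) / len(log_lrs) > threshold
```

The list returned by `log_lr_per_bin` is the kind of feature vector that could be passed to an off-the-shelf SVM classifier (for example scikit-learn's `SVC`) trained on labelled speech and non-speech frames.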

A Study on the Automatic Howling Signal Detection Algorithm for Speech Sound Reinforcement (음성 확성을 위한 하울링 신호 자동 검출기법 연구)

  • Kim, Kyung-Taek;Kim, Dong-Gyu;Roh, Yong-Wan;Hong, Kwang-Seok
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2005.11a / pp.246-249 / 2005
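  • In sound reinforcement systems, howling is the main factor that reduces speech intelligibility because it limits the achievable speech level. The usual remedy is to lower the gain in the howling frequency band and thereby minimize acoustic feedback, so locating the howling frequency is the key element of howling control. This paper therefore presents a technique for detecting the howling frequency automatically. The technique defines a parameter called the 'howling index', which expresses how closely an externally input audio signal matches the characteristics of a howling signal, uses it to decide whether howling has occurred, and outputs the frequency with the maximum amplitude in a signal judged to be howling as the howling frequency. To verify the technique, we built an automatic howling detection program and ran experiments, in which it detected more than 95% of all howling signals.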
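
The abstract does not define the 'howling index', so the sketch below substitutes a simple stand-in: the ratio of the strongest spectral bin's power to the mean power across bins, which is large for the nearly pure tone that acoustic feedback produces. A frame is flagged when this ratio exceeds a threshold, and the frequency of the maximum-amplitude bin is reported as the howling frequency, as the abstract describes. The index definition, the threshold, and the naive DFT are assumptions for illustration.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (fine for short illustrative frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def detect_howling(frame, fs, threshold=20.0):
    """Return (is_howling, index, peak_frequency_hz); the index is a hypothetical stand-in."""
    power = [m * m for m in magnitude_spectrum(frame)]
    peak_bin = max(range(len(power)), key=power.__getitem__)
    mean_power = sum(power) / len(power)
    index = power[peak_bin] / mean_power if mean_power > 0 else 0.0  # peak-to-average ratio
    freq = peak_bin * fs / len(frame)
    return index > threshold, index, freq
```

A real detector would typically also require the peak to persist or grow over several frames before cutting gain, to avoid reacting to sustained musical notes.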

Speaker Adaptation Performance Evaluation in Keyword Spotting System (500단어급 핵심어 검출기에서 화자적응 성능 평가)

  • Seo Hyun-Chul;Lee Kyong-Rok;Kim Jin-Young;Choi Seung-Ho
    • MALSORI / no.43 / pp.151-161 / 2002
  • This study presents a performance analysis of speaker adaptation for a keyword spotting system. We implemented the MLLR (Maximum Likelihood Linear Regression) method in our medium-vocabulary keyword spotting system, which was developed for university and college directory services. Experimental results show that speaker adaptation reduces the false alarm rate to one third while preserving the miss-detection rate. This improvement is achieved when speaker adaptation is applied not only to the keyword models but also to the non-keyword models.
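
MLLR adapts each Gaussian mean μ with an affine transform, μ' = Aμ + b, estimated from the adaptation data. The toy version below uses one-dimensional features, a single global transform, and a variance shared by all Gaussians; under those assumptions the maximum-likelihood estimate reduces to an occupancy-weighted least-squares fit. Real systems use vector features and regression classes, and the input format (state index, observation, occupancy) is a hypothetical simplification.

```python
def estimate_mllr_1d(means, frames):
    """Estimate (a, b) for mu' = a * mu + b from (state_index, observation, occupancy) tuples."""
    # Accumulate G = sum gamma * xi xi^T and k = sum gamma * o * xi, with xi = [1, mu].
    g00 = g01 = g11 = k0 = k1 = 0.0
    for state, obs, gamma in frames:
        mu = means[state]
        g00 += gamma
        g01 += gamma * mu
        g11 += gamma * mu * mu
        k0 += gamma * obs
        k1 += gamma * obs * mu
    det = g00 * g11 - g01 * g01
    if abs(det) < 1e-12:
        raise ValueError("need adaptation data from at least two distinct means")
    # Solve the 2x2 system G [b, a]^T = k in closed form.
    b = (g11 * k0 - g01 * k1) / det
    a = (g00 * k1 - g01 * k0) / det
    return a, b

def adapt_means(means, a, b):
    """Apply the shared transform to every mean (keyword and filler models alike)."""
    return [a * mu + b for mu in means]
```

Applying the same transform to the non-keyword (filler) model means as well as the keyword models mirrors the condition under which the abstract reports the false-alarm reduction.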

A Study on the Development of Real-Time Speech Detection on a PC (PC를 이용한 실시간 음성검출 알고리즘에 관한 연구)

  • Chung, Hoon;Chung, Kwon;Chung, Ik-joo
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06c / pp.129-132 / 1994
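  • This paper introduces a real-time speech detection algorithm studied while developing "voice access", a speech recognition software package for Windows. The algorithm uses the frame energy and frame zero-crossing rate, computed over 200-sample frames, together with the speech duration as its detection parameters. The threshold for each parameter is set by considering the mean of the signal, the standard deviation of the noise, the median standard deviation, and the phonetic characteristics of Korean; because the thresholds adapt to the surrounding environment, detection remains robust to changes in the ambient noise. Because speech is detected in real time, the method is also highly practical. Detection operates on 16-bit signals sampled at 8 kHz through an ordinary sound card. Analysis is performed on 200-sample frames with a 100-sample overlap. All analysis for speech detection was implemented in real time on a 486D or faster PC without the help of a dedicated DSP.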
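
A simplified version of the described detector, using the stated 200-sample frames with a 100-sample overlap at 8 kHz, is sketched below. The abstract does not give the threshold formulas, so the rules here are assumptions: thresholds come from the mean and standard deviation of a few leading frames assumed to be silence, the noise energy mean is updated by exponential smoothing during non-speech frames, the zero-crossing rate admits quieter fricative-like frames, and runs shorter than the hypothetical `min_frames` are discarded. The median-based statistics and Korean-specific tuning mentioned in the abstract are not modelled.

```python
import math

FRAME, HOP = 200, 100  # frame length and hop from the abstract (8 kHz input)

def frame_features(x):
    """(energy, zero-crossing rate) for each 200-sample frame with 100-sample overlap."""
    feats = []
    for start in range(0, len(x) - FRAME + 1, HOP):
        f = x[start:start + FRAME]
        energy = sum(v * v for v in f) / FRAME
        zcr = sum((a >= 0) != (b >= 0) for a, b in zip(f, f[1:])) / (FRAME - 1)
        feats.append((energy, zcr))
    return feats

def mean_std(values):
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def detect_speech(x, noise_frames=5, k=3.0, min_frames=3, alpha=0.95):
    """Return (start_sample, end_sample) speech segments; thresholds are illustrative."""
    feats = frame_features(x)
    e_mean, e_std = mean_std([e for e, _ in feats[:noise_frames]])
    z_mean, z_std = mean_std([z for _, z in feats[:noise_frames]])
    flags = []
    for energy, zcr in feats:
        e_thr = max(e_mean + k * e_std, 4.0 * e_mean)
        # Loud frames are speech; quieter frames count only if their ZCR looks fricative-like.
        is_speech = energy > e_thr or (energy > 0.5 * e_thr and zcr > z_mean + k * z_std)
        if not is_speech:
            e_mean = alpha * e_mean + (1 - alpha) * energy  # adapt to slowly changing noise
        flags.append(is_speech)
    segments, run_start = [], None
    for i, flag in enumerate(flags + [False]):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start >= min_frames:  # enforce the minimum speech length
                segments.append((run_start * HOP, (i - 1) * HOP + FRAME))
            run_start = None
    return segments
```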

A Study on Speech Recognition in a running automobile (주행중인 자동차 환경에서의 음성인식 연구)

  • 유봉근
    • Proceedings of the Acoustical Society of Korea Conference / 1998.06c / pp.47-50 / 1998
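  • To secure both convenience and safety in automobiles, this paper enables continuous speech input and output without operating an auxiliary switch, and uses a band-pass filter to perform accurate, automatic end point detection (EPD) of speech segments in noisy environments. The Dynamic Multi-Section (DMS) [1] model was used for the reference patterns, and a model robust to the noise environment is selected automatically according to the vehicle speed. Thirteenth-order Perceptual Linear Predictive (PLP) coefficients and One Stage Dynamic Programming (OSDP) were used as the speech feature parameters and the recognition algorithm, respectively. For 33 vehicle-control commands frequently used in a moving vehicle (30 to 70 km/h), recognition rates of 92.98% (speaker-independent) and 94.44% (speaker-dependent) were obtained. In addition, to reduce accidents caused by car-phone and mobile-phone use while driving, a voice dialing function that places calls by voice was also implemented.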
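
The role of the band-pass filter in end point detection can be sketched as follows: low-frequency engine and road noise is attenuated before frame energies are compared with a threshold. The second-order band-pass design follows the widely used RBJ audio-EQ cookbook formulas (constant 0 dB peak gain); the centre frequency, Q, frame length, and the rule of setting the threshold to ten times the quietest frame's energy are illustrative assumptions, since the abstract does not give the filter specification.

```python
import math

def bandpass_coeffs(f0, fs, q):
    """Second-order band-pass (RBJ audio-EQ cookbook, 0 dB peak gain), normalised by a0."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [alpha / a0, 0.0, -alpha / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def biquad(x, b, a):
    """Direct-form I biquad filter."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for v in x:
        out = b[0] * v + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, v, y1, out
        y.append(out)
    return y

def endpoints(x, fs, f0=1000.0, q=1.0, frame=200, ratio=10.0):
    """Return (start_sample, end_sample) of the active region after band-pass filtering."""
    y = biquad(x, *bandpass_coeffs(f0, fs, q))
    energies = [sum(v * v for v in y[i:i + frame]) / frame
                for i in range(0, len(y) - frame + 1, frame)]
    thr = ratio * min(energies)  # assumption: the quietest frame approximates the noise floor
    active = [i for i, e in enumerate(energies) if e > thr]
    if not active:
        return None
    return active[0] * frame, (active[-1] + 1) * frame
```

Without the filter, a strong low-frequency rumble would dominate every frame's energy and hide the speech region; after filtering, its contribution falls well below that of in-band speech.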

Performance Improvement of Classification Between Pathological and Normal Voice Using HOS Parameter (HOS 특징 벡터를 이용한 장애 음성 분류 성능의 향상)

  • Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
    • MALSORI / no.66 / pp.61-72 / 2008
  • This paper proposes a method to improve pathological and normal voice classification performance by combining multiple features, such as auditory-based and higher-order statistics (HOS) features. Their performance is measured with Gaussian mixture models (GMMs) and linear discriminant analysis (LDA). The combination of multiple features with the proposed frame-based LDA method is shown to be effective for pathological and normal voice classification, achieving an 87.0% classification rate. This is a noticeable improvement, corresponding to a 17.72% error reduction compared with the MFCC-based GMM algorithm.
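
Higher-order statistics such as skewness (third order) and excess kurtosis (fourth order) can be computed per frame as sketched below; frame-level vectors like these could then be concatenated with auditory-based features before GMM or LDA classification. The frame length, hop size, and the choice of these two statistics are assumptions; the abstract does not list the exact HOS parameters used.

```python
def hos_features(frame):
    """Skewness and excess kurtosis of one analysis frame."""
    n = len(frame)
    mean = sum(frame) / n
    dev = [v - mean for v in frame]
    m2 = sum(d ** 2 for d in dev) / n
    if m2 == 0.0:
        return 0.0, 0.0  # constant frame: statistics undefined, report zeros
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def frame_hos(signal, frame_len=256, hop=128):
    """HOS feature pair for each overlapping frame of the signal."""
    return [hos_features(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]
```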