• Title/Summary/Keyword: non-speech detection (비음성 탐지)

A Parametric Voice Activity Detection Based on the SPD-TE for Nonstationary Noises (비정체성 잡음을 위한 SPD-TE 기반 계수형 음성 활동 탐지)

  • Koo, Boneung
    • The Journal of the Acoustical Society of Korea, v.34 no.4, pp.310-315, 2015
  • A single-channel VAD (Voice Activity Detection) algorithm for nonstationary noise environments is proposed in this paper. Threshold values of the feature parameter used for the VAD decision are updated adaptively from estimates of the means and standard deviations of past non-speech frames. The feature parameter, SPD-TE (Spectral Power Difference-Teager Energy), is obtained by applying the Teager energy to the WPD (Wavelet Packet Decomposition) coefficients; it was previously reported to be a noise-robust feature for VAD. Experimental results using the TIMIT speech and NOISEX-92 noise databases show that the decision accuracy of the proposed algorithm is comparable to that of several typical VAD algorithms, including standardized ones, at SNRs ranging from 10 to -10 dB.
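
The adaptive threshold described above can be sketched as follows. This is a minimal sketch, not the paper's update rule: it assumes the first `init_frames` frames contain only noise, sets the threshold to the noise mean plus `k` standard deviations, and tracks those statistics with exponential smoothing (factor `alpha`) using only frames judged non-speech. The function name `adaptive_vad` and all parameter values are hypothetical.

```python
import math
from typing import List, Sequence


def adaptive_vad(features: Sequence[float], init_frames: int = 10,
                 k: float = 3.0, alpha: float = 0.95) -> List[bool]:
    """Return one decision per frame: True = speech, False = non-speech.

    Hypothetical rule: threshold = mean + k * std of past non-speech features.
    """
    if len(features) < init_frames:
        raise ValueError("need at least init_frames frames")
    # Assumption: the leading frames are noise only; seed the statistics with them.
    head = features[:init_frames]
    mean = sum(head) / init_frames
    var = sum((x - mean) ** 2 for x in head) / init_frames
    decisions = [False] * init_frames
    for x in features[init_frames:]:
        threshold = mean + k * math.sqrt(var)
        is_speech = x > threshold
        decisions.append(is_speech)
        if not is_speech:
            # Update noise statistics only from non-speech frames so the
            # threshold follows slowly changing (nonstationary) noise.
            mean = alpha * mean + (1 - alpha) * x
            var = alpha * var + (1 - alpha) * (x - mean) ** 2
    return decisions
```

Updating only on non-speech frames keeps speech energy from inflating the noise estimate, which is the point of basing the threshold on past non-speech frames.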

A Single Channel Voice Activity Detection for Noisy Environments Using Wavelet Packet Decomposition and Teager Energy (웨이블렛 패킷 변환과 Teager 에너지를 이용한 잡음 환경에서의 단일 채널 음성 판별)

  • Koo, Boneung
    • The Journal of the Acoustical Society of Korea, v.33 no.2, pp.139-145, 2014
  • In this paper, a feature parameter is obtained by applying the Teager energy to the WPD (Wavelet Packet Decomposition) coefficients. The decision threshold is derived from the means and standard deviations of non-speech frames. Experimental results using the TIMIT speech and NOISEX-92 noise databases show that the proposed algorithm outperforms a typical VAD algorithm. ROC (Receiver Operating Characteristic) curves are used to compare the performance of the VADs at SNRs ranging from 10 to -10 dB.
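
As a rough illustration of the feature, the sketch below applies the discrete Teager energy operator, `Psi[x(n)] = x(n)^2 - x(n-1) x(n+1)`, to every subband of a full wavelet packet decomposition. It assumes a Haar wavelet, a depth of 2, and the per-subband mean as the frame feature; the paper's wavelet choice and the exact SPD-TE combination are not given in the abstract, so this is not SPD-TE itself. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def haar_split(x: List[float]) -> Tuple[List[float], List[float]]:
    """One Haar analysis step: (low-pass, high-pass) halves of the input."""
    s = 1 / math.sqrt(2)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return low, high


def wavelet_packet(x: List[float], level: int) -> List[List[float]]:
    """Full packet tree: both low and high branches are split at every level.
    Subbands come out in natural (Paley) order, not sorted by frequency."""
    nodes = [list(x)]
    for _ in range(level):
        nodes = [band for node in nodes for band in haar_split(node)]
    return nodes


def teager_energy(x: List[float]) -> List[float]:
    """Discrete Teager energy operator: x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def wpd_te_feature(frame: List[float], level: int = 2) -> List[float]:
    """Mean Teager energy of each wavelet packet subband of one frame."""
    feats = []
    for band in wavelet_packet(frame, level):
        te = teager_energy(band)
        feats.append(sum(te) / len(te) if te else 0.0)
    return feats
```

For a pure tone `A cos(Omega n + phi)` the operator returns the constant `A^2 sin^2(Omega)`, so it responds to both amplitude and frequency, which is one reason it is used as a noise-robust VAD feature.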

A Study on a Non-Voice Section Detection Model among Speech Signals using CNN Algorithm (CNN(Convolutional Neural Network) 알고리즘을 활용한 음성신호 중 비음성 구간 탐지 모델 연구)

  • Lee, Hoo-Young
    • Journal of Convergence for Information Technology, v.11 no.6, pp.33-39, 2021
  • Speech recognition technology is being combined with deep learning and is developing rapidly. In particular, voice recognition services are connected to various devices such as AI speakers, in-vehicle voice recognition systems, and smartphones, and the technology is used across many industries rather than in a few specific areas. Accordingly, research aimed at meeting the high expectations placed on the technology is also active. In the field of natural language processing (NLP), research is needed on removing ambient noise and unnecessary voice signals, which strongly affect the speech recognition rate. Many domestic and foreign companies already apply the latest AI technology to this problem, and research using convolutional neural networks (CNNs) is especially active. The purpose of this study is to identify non-speech sections within a user's speech using a convolutional neural network. Voice files (wav) from 5 speakers were collected to generate training data, and a CNN-based classification model was built to discriminate speech sections from non-speech sections. An experiment detecting non-speech sections with the resulting model achieved an accuracy of 94%.
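
The abstract says training data was generated from wav files of 5 speakers. One plausible first step, assumed here rather than taken from the paper, is to cut each file into fixed-length segments that can then be labeled speech or non-speech and converted into CNN inputs. The sketch uses only Python's standard `wave` module; `split_wav`, `make_silent_wav`, and the segment length are hypothetical.

```python
import io
import wave
from typing import BinaryIO, List, Union


def split_wav(source: Union[str, BinaryIO], segment_seconds: float) -> List[bytes]:
    """Cut a PCM WAV file into equal-length chunks of raw sample bytes.
    A trailing chunk shorter than segment_seconds is dropped."""
    with wave.open(source, "rb") as wf:
        frames_per_segment = int(wf.getframerate() * segment_seconds)
        bytes_per_frame = wf.getsampwidth() * wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    step = frames_per_segment * bytes_per_frame
    if step <= 0:
        raise ValueError("segment_seconds is too short")
    return [raw[i:i + step] for i in range(0, len(raw) - step + 1, step)]


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory 16-bit mono WAV of silence (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(rate * seconds))
    buf.seek(0)
    return buf
```

For example, `split_wav(make_silent_wav(1.0), 0.25)` yields four 0.25 s segments of 8000 bytes each at 16 kHz, 16-bit mono.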

A Study on Lip Detection based on Eye Localization for Visual Speech Recognition in Mobile Environment (모바일 환경에서의 시각 음성인식을 위한 눈 정위 기반 입술 탐지에 대한 연구)

  • Gyu, Song-Min;Pham, Thanh Trung;Kim, Jin-Young;Taek, Hwang-Sung
    • Journal of the Korean Institute of Intelligent Systems, v.19 no.4, pp.478-484, 2009
  • Automatic speech recognition (ASR) is an attractive technology in today's pursuit of a more convenient life. Although many approaches to ASR have been proposed, its performance is still poor in noisy environments. State-of-the-art ASR therefore uses not only audio information but also visual information. In this paper, we present a novel lip detection method for visual speech recognition in a mobile environment. Applying visual information to speech recognition requires extracting accurate lip regions. Because eye detection is easier than lip detection, we first detect the positions of the left and right eyes and then roughly locate the lip region. We then apply K-means clustering to divide that region into groups, and detect the two lip corners and the lip center by choosing the biggest of the clustered groups. Finally, experiments on the Samsung AVSR database show the effectiveness of the proposed method.
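
The eye-to-lip pipeline can be sketched as below. The geometry ratios in `lip_region` are hypothetical assumptions, and the sketch assumes candidate lip pixels have already been selected inside the rough region (for example by a color threshold, which the abstract does not describe). It then clusters their coordinates with plain K-means, keeps the biggest cluster, and takes its leftmost and rightmost points as lip corners and its centroid as the lip center. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lip_region(left_eye: Point, right_eye: Point) -> Tuple[float, float, float, float]:
    """Rough lip box (x0, y0, x1, y1) below the eyes; image y grows downward.
    The 1.1 / 0.75 / 0.4 ratios are hypothetical face-geometry assumptions."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    d = ((rx - lx) ** 2 + (ry - ly) ** 2) ** 0.5  # inter-ocular distance
    cx, cy = (lx + rx) / 2, (ly + ry) / 2 + 1.1 * d
    return cx - 0.75 * d, cy - 0.4 * d, cx + 0.75 * d, cy + 0.4 * d


def _dist2(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[List[Point]]:
    """Plain Lloyd's K-means on 2-D points with a deterministic start."""
    pts = sorted(points)
    centers = [pts[i * len(pts) // k] for i in range(k)]
    clusters: List[List[Point]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: _dist2(p, centers[c]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep empty clusters in place.
        centers = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
                   if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


def lip_landmarks(candidates: List[Point], k: int = 3) -> Tuple[Point, Point, Point]:
    """Left corner, right corner and center from the biggest cluster."""
    biggest = max(kmeans(candidates, k), key=len)
    left, right = min(biggest), max(biggest)  # extreme x (ties broken by y)
    cx = sum(x for x, _ in biggest) / len(biggest)
    cy = sum(y for _, y in biggest) / len(biggest)
    return left, right, (cx, cy)
```

Locating the eyes first shrinks the search area, so the clustering only has to separate lip pixels from a small amount of surrounding clutter.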

The Design and Implementation of Autoencoder-Based FTAE for Real-Time Audio Monitoring (실시간 음성 모니터링을 위한 오토인코더 기반 FTAE 설계 및 구현)

  • Jin-Hwan Yang;Hyuk-Soon Choi;Jeong-hyeon park;Sung-Sik Kim;Nammee Moon
    • Proceedings of the Korea Information Processing Society Conference, 2024.05a, pp.741-744, 2024
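  • In this study, we design and implement FTAE (Fourier Transform Auto Encoder) to address a drawback of the Fourier transform, an audio preprocessing technique: its high time complexity demands substantial computational resources. FTAE is an autoencoder-based neural network designed to take audio data as input and output an Early Fusion feature map. After training, FTAE reached a final training loss of 0.1479. In a comparison with the existing Fourier-transform-based Early Fusion method, FTAE achieved an accuracy of 0.905, an F1-score of 0.905, and a detection time of 17 seconds. Compared with the Early Fusion method, FTAE's accuracy and F1-score were 0.065 lower, but its detection was about 72 times faster.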
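
The reported figures imply the baseline's numbers. Assuming the 0.065 drop applies to both metrics and taking "about 72 times faster" at face value, the Fourier-transform-based Early Fusion (EF) method would have scored roughly:

```latex
\begin{aligned}
\mathrm{Acc}_{\text{EF}} &= 0.905 + 0.065 = 0.970,\\
\mathrm{F1}_{\text{EF}}  &= 0.905 + 0.065 = 0.970,\\
T_{\text{EF}} &\approx 72 \times 17\ \text{s} \approx 1224\ \text{s}\ (\approx 20\ \text{min}).
\end{aligned}
```

These are back-of-the-envelope values derived from the abstract, not figures the authors state.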

Zigbee Communication Based Wireless System for Measuring Lap Time on a Sprints (지그비 통신에 기반한 단거리 육상경기 기록측정 시스템)

  • Jeong, Seung-Hyun;Choi, Deuk-sung
    • Journal of the Korea Academia-Industrial cooperation Society, v.19 no.2, pp.86-89, 2018
  • This paper introduces a ZigBee network-based, four-lane lap time measurement system that can be set up for short-distance races. When the start button is pushed, the start-point node announces "Ready-Set-Go", and each runner steps on a foot switch installed at the exit-point node to record the lap time. The start-point and exit-point nodes exchange time synchronization packets over the ZigBee network, which keeps the exit-point node's local time synchronized to within 10 ms. The system requires no expensive measurement equipment and records lap times more conveniently than conventional lap time measurement methods.
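
The abstract does not say how the nodes synchronize. One common approach, shown here as an assumed stand-in, is an NTP-style two-way exchange: the exit-point node timestamps a request, the start-point node timestamps its receipt and its reply, and the clock offset is estimated assuming symmetric link delay. The function names are hypothetical.

```python
from typing import Tuple


def clock_offset(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Two-way exchange between the exit node and the start node.

    t1: request sent (exit clock)    t2: request received (start clock)
    t3: reply sent (start clock)     t4: reply received (exit clock)
    Returns (start clock minus exit clock, round-trip delay), assuming the
    one-way delay is the same in both directions.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def lap_time(start_on_start_clock: float, finish_on_exit_clock: float,
             offset: float) -> float:
    """Convert the finish time to the start node's clock, then subtract."""
    return (finish_on_exit_clock + offset) - start_on_start_clock
```

If the true offset is 0.5 s and the one-way delay is 2 ms, the timestamps `(10.0, 10.502, 10.503, 10.005)` recover an offset of 0.5 s and a 4 ms round trip; asymmetric delays would show up directly as offset error, which is what a 10 ms bound has to absorb.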

Endpoint Detection of Speech Signal Using Lyapunov Exponent (리아프노프 지수를 이용한 음성신호 종점 탐색 방법)

  • Zang, Xian;Kim, Jeong-Yeon;Chong, Kil-To
    • Journal of the Institute of Electronics Engineers of Korea SC, v.46 no.1, pp.28-33, 2009
  • In speech recognition research, locating the beginning and end of a speech utterance against background noise is of great importance. Conventional speech endpoint detection methods rely on two simple time-domain measurements, short-time energy and short-time zero-crossing rate, which cannot guarantee precise results in low signal-to-noise ratio environments. This paper proposes a novel approach based on the Lyapunov exponent of the time-domain waveform. The proposed method does not require frequency-domain parameters, such as the Mel-scale features used in other work, for endpoint detection. The algorithm therefore has low complexity and is suitable for digital isolated-word recognition systems.
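
For contrast with the proposed method, the conventional baseline the abstract mentions can be sketched: short-time energy and zero-crossing rate per frame, with endpoints taken from the first and last frames whose energy exceeds a threshold. This is not the Lyapunov-exponent method; frame size, hop, threshold, and function names are hypothetical. In classic detectors the zero-crossing rate is typically used to refine boundaries around low-energy unvoiced sounds; here it is only computed.

```python
from typing import List, Optional, Tuple


def frames(x: List[float], size: int, hop: int) -> List[List[float]]:
    """Split a signal into (possibly overlapping) frames; drop the short tail."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, hop)]


def short_time_energy(frame: List[float]) -> float:
    return sum(s * s for s in frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def energy_endpoints(x: List[float], size: int, hop: int,
                     energy_thr: float) -> Optional[Tuple[int, int]]:
    """First and last sample index covered by frames above the energy threshold."""
    active = [i for i, f in enumerate(frames(x, size, hop))
              if short_time_energy(f) > energy_thr]
    if not active:
        return None
    return active[0] * hop, active[-1] * hop + size - 1
```

Both measures depend only on sample amplitudes, so additive noise at low SNR raises the energy floor and the crossing count of silent frames, which is the weakness the abstract points to.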

Stateful Virtual Proxy Server for Attack Detection based on SIP Protocol State Monitoring Mechanism (SIP 프로토콜 상태정보 기반 공격 탐지 기능을 제공하는 가상 프록시 서버 설계 및 구현)

  • Lee, Hyung-Woo
    • Journal of Internet Computing and Services, v.9 no.6, pp.37-48, 2008
  • VoIP service transmits voice data over IP-based networks using the SIP protocol. SIP has many advantages, such as providing IP-based voice communication and multimedia services at low cost, and has therefore spread very quickly. However, SIP exposes new vulnerabilities to malicious attacks such as message flooding and protocol parsing attacks, and it also inherits many existing vulnerabilities of IP-based protocols. In this paper, we propose a new virtual proxy server, placed in front of the existing proxy server, that detects anomalous SIP attacks and manages SIP sessions statefully with enhanced security. Using the stateful virtual proxy server, our solution shows promising SIP message flooding attack verification and detection performance with minimal added latency in SIP packet transmission.
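
A minimal sketch of flooding detection at a proxy, assuming a simple sliding-window rate limit per source and SIP method. The paper's stateful mechanism is not described in the abstract; `FloodDetector`, the window, and the limit are hypothetical. Responses (status lines beginning with `SIP/2.0`) are not counted.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class FloodDetector:
    """Flag a source that sends more than `limit` requests of one SIP method
    within `window` seconds (hypothetical policy, illustrative defaults)."""

    def __init__(self, window: float = 1.0, limit: int = 50) -> None:
        self.window = window
        self.limit = limit
        self.seen: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def observe(self, src: str, message: str, now: float) -> bool:
        """Record one message; return True if the source exceeds the limit."""
        method = message.split("\r\n", 1)[0].split(" ", 1)[0]
        if method == "SIP/2.0":  # status line of a response, not a request
            return False
        times = self.seen[(src, method)]
        times.append(now)
        # Drop timestamps that have slid out of the window.
        while now - times[0] > self.window:
            times.popleft()
        return len(times) > self.limit
```

Keying the counters by `(source, method)` lets a burst of `INVITE` requests be flagged without penalizing the same client's normal `REGISTER` or `BYE` traffic.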

Intelligent Abnormal Event Detection Algorithm for Single Households at Home via Daily Audio and Vision Patterns (지능형 오디오 및 비전 패턴 기반 1인 가구 이상 징후 탐지 알고리즘)

  • Jung, Juho;Ahn, Junho
    • Journal of Internet Computing and Services, v.20 no.1, pp.77-86, 2019
  • As the number of single-person households increases, a person living alone who is seriously injured at home may find it difficult to ask for help. This paper detects abnormal events in which a member of a single-person household is seriously injured at home. It proposes a vision detection algorithm that analyzes and recognizes patterns in videos collected from home CCTV, and an audio detection algorithm that analyzes and recognizes patterns of household sounds captured by smartphones. Each algorithm used alone has shortcomings and has difficulty detecting situations such as serious injuries over a wide area, so we propose a fusion method that effectively combines the two. We evaluated the detection performance of each algorithm and of the proposed fusion method.
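
The abstract does not specify the fusion rule. As an assumed example, a weighted late fusion of the two detectors' confidences, falling back to a single modality when the other has no observation (for instance, the person is outside the camera's view), could look like this. `fuse`, the weights, and the threshold are hypothetical.

```python
from typing import Optional


def fuse(audio: Optional[float], vision: Optional[float],
         w_audio: float = 0.5, threshold: float = 0.6) -> bool:
    """Return True if the fused abnormal-event score reaches the threshold.
    Scores are detector confidences in [0, 1]; None means no observation."""
    if audio is None and vision is None:
        return False
    if audio is None:
        score = vision
    elif vision is None:
        score = audio
    else:
        score = w_audio * audio + (1.0 - w_audio) * vision
    return score >= threshold
```

The fallback branches are what address the coverage gap: a fall outside the camera's field of view can still be caught from sound alone, and a silent fall in view from video alone.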

Study of a underpass inundation forecast using object detection model (객체탐지 모델을 활용한 지하차도 침수 예측 연구)

  • Oh, Byunghwa;Hwang, Seok Hwan
    • Proceedings of the Korea Water Resources Association Conference, 2021.06a, pp.302-302, 2021
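  • Underpasses are usually flooded when local or flash floods occur. Nevertheless, on July 23, 2020, when overnight heavy rain exceeding 80 mm per hour fell in Busan, water quickly filled an underpass up to its ceiling, and because preemptive vehicle control was not carried out in time, an accident occurred involving three drivers who were unable to escape. Managing disasters, including water-related disasters, quickly requires moving beyond the existing one-way, government-led disaster response toward integrated collection and analysis of big data, encompassing both structured and unstructured data. In this study, we conducted object detection research using CCTV data (as sensors) from underground tunnels adjacent to underpasses in Busan, in order to provide information that minimizes casualties when a disaster occurs. We used CCTV footage from Busan, where underground tunnel flooding occurred, and re-encoded the footage to remove its audio track, which reduced the size of the video files to be loaded. To detect objects entering the underpass, we used YOLO (You Only Look Once), one of the fastest object detection algorithms; specifically, we applied YOLOv3, which can run at 170 frames per second on a modern GPU, together with Darknet-53, which provides higher classification performance. YOLOv3 detects objects faster and more accurately than earlier object detection models, and its speed and accuracy can be traded off simply by changing the model size, without retraining. We divided the CCTV footage into morning (normal) and afternoon (flooding) periods, applied the YOLO algorithm to classify cars, buses, trucks, and people, and performed object detection near the underground tunnel; the results on the CCTV data confirmed high detection accuracy.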
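
YOLO-style detectors are usually followed by confidence filtering and non-maximum suppression (NMS). The sketch below applies both, per class, to detections restricted to the four classes the study used. It assumes raw detections are already available as `(label, confidence, box)` tuples from a YOLOv3 model (inference is not shown); the thresholds, the lowercase COCO-style labels, and the function names are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2
Detection = Tuple[str, float, Box]       # class label, confidence, box

TARGET_CLASSES = {"car", "bus", "truck", "person"}


def _area(r: Box) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def postprocess(dets: List[Detection], conf_thr: float = 0.5,
                iou_thr: float = 0.45) -> List[Detection]:
    """Keep target classes above conf_thr, then greedy class-wise NMS."""
    candidates = sorted((d for d in dets
                         if d[0] in TARGET_CLASSES and d[1] >= conf_thr),
                        key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for d in candidates:
        # Suppress d if a higher-scoring box of the same class overlaps it.
        if all(k[0] != d[0] or iou(k[2], d[2]) < iou_thr for k in kept):
            kept.append(d)
    return kept
```

Counting the surviving boxes per class in each time period would then give the kind of morning versus afternoon comparison the study describes.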
