• Title/Summary/Keyword: acoustic event detection (음향 이벤트 검출)


Noise Robust Baseball Event Detection with Multimodal Information (멀티모달 정보를 이용한 잡음에 강인한 야구 이벤트 시점 검출 방법)

  • Young-Ik Kim;Hyun Jo Jung;Minsoo Na;Younghyun Lee;Joonsoo Lee
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.11a / pp.136-138 / 2022
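  • Efficiently detecting the time points of specific events in sports broadcast and media data is an important technology for information retrieval, highlight generation, summarization, and similar applications. This paper proposes a method that fuses audio and video information to robustly detect the time points of batting and catching events following a pitch in baseball broadcast data. Audio-based event detection is computationally inexpensive and highly accurate, but it often faces ambiguities that are hard to resolve without the help of visual information. In baseball broadcast data in particular, visual information about when the pitcher delivers the pitch can be used to further improve the accuracy of batting and catching event detection. This paper proposes an audio-based deep-learning model for detecting event time points together with a video-based correction method, and describes their application to actual KBO baseball broadcast data along with experimental results.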
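
The fusion idea can be illustrated with a small sketch, assuming the audio model outputs candidate event times in seconds and the video side outputs pitch-release times. The function `correct_with_pitch_times` and its 0.2 s / 1.5 s delay window are hypothetical illustrations, not the paper's correction method: an audio candidate is accepted only if it falls within a plausible delay after a pitch, and the earliest such candidate per pitch is kept.

```python
from bisect import bisect_left


def correct_with_pitch_times(audio_events, pitch_times, min_delay=0.2, max_delay=1.5):
    """Keep, for each pitch, the earliest audio candidate in [pitch + min_delay, pitch + max_delay].

    Hypothetical rule for illustration: the delay window is an assumption,
    not a value reported in the paper.
    """
    candidates = sorted(audio_events)
    corrected = []
    for pitch in sorted(pitch_times):
        # First candidate at or after the earliest plausible bat/glove contact.
        i = bisect_left(candidates, pitch + min_delay)
        if i < len(candidates) and candidates[i] <= pitch + max_delay:
            corrected.append(candidates[i])
    return corrected


# Audio peaks at 3.0 s (no pitch nearby) and 10.9 s (second peak after the same pitch) are discarded.
print(correct_with_pitch_times([3.0, 10.4, 10.9, 25.0], [10.0, 24.0]))  # [10.4, 25.0]
```

The video cue here only gates and disambiguates the audio candidates; the audio model still supplies the precise time point.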


Overlapping Sound Event Detection Using NMF with K-SVD Based Dictionary Learning (K-SVD 기반 사전 훈련과 비음수 행렬 분해 기법을 이용한 중첩음향이벤트 검출)

  • Choi, Hyeonsik;Keum, Minseok;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.34 no.3 / pp.234-239 / 2015
  • Non-Negative Matrix Factorization (NMF) updates a dictionary and its gains in an alternating manner. Because it is easy to implement and intuitive to interpret, NMF is widely used to detect and separate overlapping sound events. However, the non-negativity constraints in NMF yield a parts-based representation, and this property produces a dictionary containing fragmented acoustic events. The resulting shared bases degrade performance in both the separation and the detection of overlapping sound events. In this paper, we propose a method that uses a K-Singular Value Decomposition (K-SVD) based dictionary to mitigate the parts-based representation issue during the dictionary learning step. The gains are then computed with NMF in the sound event detection step. Our evaluation confirms that the proposed method outperforms the conventional method, which uses an NMF-based dictionary, in overlapping sound event detection.
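
The detection step, where the dictionary is held fixed and only the gains are estimated with NMF, can be sketched with the standard Lee-Seung multiplicative update for the Euclidean cost, H ← H ⊙ (WᵀV) ⊘ (WᵀWH). This is a minimal sketch under assumptions: the toy dictionary stands in for a learned one, the K-SVD learning step is not reproduced, and the paper's cost function may differ.

```python
def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def nmf_gains(V, W, n_iter=200, eps=1e-9):
    """Estimate non-negative gains H with V ≈ W H while keeping dictionary W fixed.

    V: (freq x frames) magnitude spectrogram, W: (freq x atoms) dictionary.
    """
    n_atoms, n_frames = len(W[0]), len(V[0])
    H = [[1.0] * n_frames for _ in range(n_atoms)]  # positive initialisation
    Wt = transpose(W)
    WtV = matmul(Wt, V)   # constant because W is fixed
    WtW = matmul(Wt, W)
    for _ in range(n_iter):
        WtWH = matmul(WtW, H)
        # Multiplicative update keeps every entry non-negative.
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps) for j in range(n_frames)]
             for i in range(n_atoms)]
    return H


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 frequency bins, 2 event atoms
V = matmul(W, [[2.0, 1.0], [1.0, 3.0]])      # mixture of both events in 2 frames
print(nmf_gains(V, W))                       # close to [[2, 1], [1, 3]]
```

Thresholding each row of the gain matrix over time then gives frame-level activity for the corresponding event.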

Performance analysis of weakly-supervised sound event detection system based on the mean-teacher convolutional recurrent neural network model (평균-교사 합성곱 순환 신경망 모델을 이용한 약지도 음향 이벤트 검출 시스템의 성능 분석)

  • Lee, Seokjin
    • The Journal of the Acoustical Society of Korea / v.40 no.2 / pp.139-147 / 2021
  • This paper introduces and implements a Sound Event Detection (SED) system based on weakly-supervised learning, in which only part of the data is labeled, and analyzes the effect of its parameters. An SED system estimates the classes and onset/offset times of events in an acoustic signal. Training such a model requires both the event class and the onset/offset times, but the onset/offset times are difficult to label exactly. Therefore, in the weakly-supervised task, the SED model is trained on "strongly labeled data" containing event classes and activations, "weakly labeled data" containing only event classes, and "unlabeled data" without any labels. Recently, SED systems based on the mean-teacher model have been widely used for this task. These systems have several parameters that should be chosen carefully because they may affect performance. In this paper, the effects of the feature, the moving average parameter, the weight of the consistency cost function, the ramp-up length, and the maximum learning rate were analyzed using the DCASE 2020 Task 4 data, and the effects and optimal values of these parameters are discussed.
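
Three of the analyzed parameters (moving average parameter, consistency cost weight, ramp-up length) appear in the core mean-teacher updates, sketched below on flat weight lists. The `exp(-5(1 - t)^2)` ramp-up is the schedule commonly used in mean-teacher implementations; the default values here are illustrative assumptions, not the paper's tuned values.

```python
import math


def ema_update(teacher, student, alpha=0.999):
    """Teacher weights <- alpha * teacher + (1 - alpha) * student (moving average parameter alpha)."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def sigmoid_rampup(step, rampup_length):
    """Ramp-up factor exp(-5 (1 - t)^2) with t = step / rampup_length clipped to [0, 1]."""
    if rampup_length == 0:
        return 1.0
    t = min(max(step / rampup_length, 0.0), 1.0)
    return math.exp(-5.0 * (1.0 - t) ** 2)


def consistency_weight(step, max_weight=2.0, rampup_length=1000):
    """Weight on the consistency cost, ramped up from near zero to max_weight."""
    return max_weight * sigmoid_rampup(step, rampup_length)


def consistency_cost(student_probs, teacher_probs):
    """Mean squared error between student and teacher outputs (usable on unlabeled clips)."""
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / len(student_probs)


print(consistency_weight(0), consistency_weight(500), consistency_weight(1000))
```

The total loss per step would be the supervised loss on labeled clips plus `consistency_weight(step) * consistency_cost(...)`, followed by `ema_update` on the teacher.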

Development of Sound Event Detection for Home with Limited Computation Power (제한된 계산량으로 가정내 음향 상황을 검출하는 사운드 이벤트 검출 시스템 개발)

  • Jang, Dalwon;Lee, Jaewon;Lee, JongSeol
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2019.06a / pp.257-258 / 2019
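  • This paper describes the development of a system that performs sound event detection for acoustic situations in the home. The sound event detection system extracts features from the microphone input signal and then classifies, based on those features, whether or not an event has occurred. Development assumed a standalone device placed in the home. Acoustic situations that can occur in a home were assumed, and a dataset was recorded accordingly. Features and a classifier were developed from this dataset, and the feature set was simplified so that it could be used on a standalone device that must produce results with little computation. The system was tested by playing sounds recorded in a home living-room environment through a loudspeaker; further development covering a wider variety of acoustic situations is still needed.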
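
The feature-extraction-then-classification pipeline can be sketched with a deliberately small feature set, in the spirit of a low-computation standalone device. This is a hypothetical sketch: RMS energy and zero-crossing rate are assumed features, and the fixed RMS threshold stands in for the paper's trained classifier, which is not specified here.

```python
import math


def frame_signal(x, frame_len, hop):
    """Split a sample list into overlapping frames (trailing partial frame dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def frame_features(frame):
    """Two inexpensive per-frame features: RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return rms, crossings / (len(frame) - 1)


def detect_events(x, frame_len=400, hop=200, rms_threshold=0.05):
    """Flag frames whose RMS exceeds a fixed threshold.

    The threshold rule is a hypothetical stand-in for a trained classifier;
    the ZCR feature would be fed to such a classifier and is unused here.
    """
    return [frame_features(f)[0] > rms_threshold for f in frame_signal(x, frame_len, hop)]


# 50 ms of silence followed by 50 ms of a 440 Hz tone, sampled at 16 kHz.
x = [0.0] * 800 + [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(800)]
print(detect_events(x))  # [False, False, False, True, True, True, True]
```

Dropping features from the per-frame vector is the simplest way to trade accuracy for computation on such a device.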


A study on training DenseNet-Recurrent Neural Network for sound event detection (음향 이벤트 검출을 위한 DenseNet-Recurrent Neural Network 학습 방법에 관한 연구)

  • Hyeonjin Cha;Sangwook Park
    • The Journal of the Acoustical Society of Korea / v.42 no.5 / pp.395-401 / 2023
  • Sound Event Detection (SED) aims to identify not only the category but also the time interval of target sounds in an audio waveform. It is a critical technique for acoustic surveillance and monitoring systems. Recently, various models have been introduced through Detection and Classification of Acoustic Scenes and Events (DCASE) Task 4. This paper explores how to choose optimal parameters for a DenseNet-based model, an architecture that has achieved outstanding performance in other recognition systems. In the experiments, the SED model, DenseRNN, consists of DenseNet-BC and bi-directional Gated Recurrent Units (GRU), and is trained with the mean-teacher method. Performance is evaluated with the event-based F-score under the DCASE Task 4 assessment protocol while varying parameters related to both the model architecture and model training. Experimental results show that performance increases and then saturates near its best value. They also show that DenseRNN is trained more effectively without dropout.
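
The event-based F-score can be illustrated with a simplified matcher. Assumptions: a greedy one-to-one match requiring the same label, onsets within a 200 ms collar, and offsets within max(200 ms, 20 % of the reference duration), with a micro average over all events. The official DCASE evaluation uses the `sed_eval` toolbox and reports a class-wise macro average, so this sketch will not reproduce official numbers exactly.

```python
def event_based_f1(reference, estimated, t_collar=0.2, pct_of_length=0.2):
    """Micro-averaged event-based F1 with onset/offset collars.

    Events are (label, onset, offset) tuples in seconds; matching is greedy and one-to-one.
    """
    matched_est = set()
    tp = 0
    for label, on, off in reference:
        off_collar = max(t_collar, pct_of_length * (off - on))
        for j, (e_label, e_on, e_off) in enumerate(estimated):
            if j in matched_est or e_label != label:
                continue
            if abs(e_on - on) <= t_collar and abs(e_off - off) <= off_collar:
                matched_est.add(j)
                tp += 1
                break
    fp = len(estimated) - tp   # unmatched detections
    fn = len(reference) - tp   # missed reference events
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


ref = [("Dog", 1.0, 2.0), ("Speech", 3.0, 6.0)]
est = [("Dog", 1.1, 2.1), ("Speech", 3.5, 6.0), ("Cat", 7.0, 8.0)]
print(event_based_f1(ref, est))  # Speech onset is 0.5 s late, so only Dog matches
```

Because onsets must fall within 200 ms, the event-based score is much stricter than a segment-based one, which is why event-based F1 values in DCASE Task 4 are typically lower.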

A study on the waveform-based end-to-end deep convolutional neural network for weakly supervised sound event detection (약지도 음향 이벤트 검출을 위한 파형 기반의 종단간 심층 콘볼루션 신경망에 대한 연구)

  • Lee, Seokjin;Kim, Minhan;Jeong, Youngho
    • The Journal of the Acoustical Society of Korea / v.39 no.1 / pp.24-31 / 2020
  • In this paper, a deep convolutional neural network for sound event detection is studied. In particular, an end-to-end neural network that generates detection results directly from the input audio waveform is studied for the weakly supervised problem, which involves weakly labeled and unlabeled datasets. The proposed system is based on a network structure of deeply stacked one-dimensional convolutional neural networks, enhanced with skip connections and a gating mechanism. The system is further enhanced by sound event detection post-processing, and a training step using the mean-teacher model is added to handle the weakly supervised data. The proposed system was evaluated on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 4 dataset, and it achieved F1-scores of 54 % (segment-based) and 32 % (event-based).
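
The abstract does not say which post-processing is used; a common choice in DCASE Task 4 systems is median filtering of frame-level probabilities followed by thresholding. The sketch below assumes that approach, with a hypothetical 3-frame window, a 0.5 threshold, and a 100 ms hop, and turns one class's probability track into (onset, offset) events.

```python
def median_filter(probs, window):
    """Sliding median with an odd window; edges use a truncated window and the lower median."""
    half = window // 2
    out = []
    for i in range(len(probs)):
        seg = sorted(probs[max(0, i - half): i + half + 1])
        out.append(seg[(len(seg) - 1) // 2])
    return out


def frames_to_events(probs, threshold=0.5, window=3, hop_seconds=0.1):
    """Smooth, threshold, and merge consecutive active frames into (onset, offset) pairs in seconds."""
    smoothed = median_filter(probs, window)
    events, start = [], None
    # A trailing 0.0 closes an event that runs to the end (assumes threshold > 0).
    for i, p in enumerate(smoothed + [0.0]):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            events.append((start * hop_seconds, i * hop_seconds))
            start = None
    return events


probs = [0.1, 0.9, 0.1, 0.1, 0.8, 0.9, 0.9, 0.8, 0.1, 0.1]
print(frames_to_events(probs))  # the isolated spike at frame 1 is removed; one event from 0.4 s to 0.8 s
```

Larger windows suppress more spurious short detections at the cost of blurring onsets, which matters most for the event-based score.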

Frequency-Cepstral Features for Bag of Words Based Acoustic Context Awareness (Bag of Words 기반 음향 상황 인지를 위한 주파수-캡스트럴 특징)

  • Park, Sang-Wook;Choi, Woo-Hyun;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.33 no.4 / pp.248-254 / 2014
  • Among acoustic signal analysis tasks, acoustic context awareness is one of the most formidable in terms of complexity, since it requires a sophisticated understanding of individual acoustic events. Conventional context awareness methods employ individual acoustic event detection or recognition to make a decision about the current context. However, this approach may perform poorly in practical situations, because events may occur simultaneously or may be acoustically similar and difficult to distinguish from each other. In particular, babble noise occurring in a bus or subway environment can confuse the context awareness task, since babble sounds similar in any environment. Therefore, this paper proposes a frequency-cepstral feature vector to mitigate this confusion in a binary situation awareness task: bus or metro. With a Support Vector Machine (SVM) as the classifier, the proposed feature vector scheme is shown to outperform the conventional scheme.
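
One plausible reading of a "frequency-cepstral" feature is a vector that concatenates log band energies (frequency domain) with cepstral coefficients obtained from them by a DCT-II. The sketch below assumes exactly that; the paper's actual construction, band layout, and coefficient count may differ, and the SVM classification step is not shown.

```python
import math


def dct2(x, n_coeffs):
    """Unnormalised DCT-II: c_k = sum_n x_n * cos(pi / N * (n + 0.5) * k)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(n_coeffs)]


def frequency_cepstral_feature(band_energies, n_ceps=4, floor=1e-10):
    """Concatenate log band energies with their first n_ceps cepstral coefficients.

    Hypothetical construction for illustration; floor avoids log(0).
    """
    log_e = [math.log(max(e, floor)) for e in band_energies]
    return log_e + dct2(log_e, n_ceps)


feat = frequency_cepstral_feature([1.0, 4.0, 2.0, 0.5, 0.25, 0.1, 0.1, 0.05])
print(len(feat))  # 8 log energies + 4 cepstral coefficients = 12
```

Keeping the frequency-domain part alongside the cepstrum preserves band-specific detail that the low-order cepstral coefficients smooth away.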

Performance Improvement of Mean-Teacher Models in Audio Event Detection Using Derivative Features (차분 특징을 이용한 평균-교사 모델의 음향 이벤트 검출 성능 향상)

  • Kwak, Jin-Yeol;Chung, Yong-Joo
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.16 no.3 / pp.401-406 / 2021
  • Recently, mean-teacher models based on convolutional recurrent neural networks (CRNNs) have been widely used in audio event detection. The mean-teacher model consists of two parallel CRNNs, which can be trained effectively on weakly labeled and unlabeled audio data by applying a consistency learning criterion to the outputs of the two networks. In this study, we tried to improve the performance of the mean-teacher model by using additional derivative features of the log-mel spectrum. In audio event detection experiments using the training and test data from Task 4 of the DCASE 2018/2019 Challenges, the mean-teacher model with the proposed derivative features achieved a relative reduction in error rate (ER) of up to 8.1%.
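
Derivative features of a log-mel spectrum are usually computed with the regression formula d_t = Σₙ n (c_{t+n} − c_{t−n}) / (2 Σₙ n²) over n = 1..N, with edge frames replicated. The sketch below assumes that formula with N = 2 and stacks first- and second-order derivatives onto the static features; whether the paper uses exactly this window or both orders is an assumption.

```python
def delta(frames, N=2):
    """Regression-based derivative of each feature dimension over time (edge frames replicated).

    frames: list of T frames, each a list of D values (e.g. log-mel energies).
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(frames[0])):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][d]
                prv = frames[max(t - n, 0)][d]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def add_derivative_features(logmel, N=2):
    """Stack static, first-order, and second-order derivative features per frame."""
    d1 = delta(logmel, N)
    d2 = delta(d1, N)
    return [s + a + b for s, a, b in zip(logmel, d1, d2)]


ramp = [[float(t)] for t in range(6)]
print(delta(ramp))  # slope 1.0 in the interior, 0.5 at the replicated edges
```

The stacked vector triples the input dimension of the CRNN, so the first convolutional layer sees static and dynamic spectral information together.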

Polyphonic sound event detection using multi-channel audio features and gated recurrent neural networks (다채널 오디오 특징값 및 게이트형 순환 신경망을 사용한 다성 사운드 이벤트 검출)

  • Ko, Sang-Sun;Cho, Hye-Seung;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.36 no.4 / pp.267-272 / 2017
  • In this paper, we propose an effective method of applying multichannel audio features to Gated Recurrent Neural Networks (GRNNs) for polyphonic sound event detection. Real-life sounds often overlap with each other, which makes them difficult to distinguish using mono-channel audio features. The proposed method aims to improve polyphonic sound event detection performance by using multichannel audio features. In addition, we apply a gated recurrent neural network, which is simpler than Long Short-Term Memory (LSTM), currently the best-performing recurrent neural network. The experimental results show that the proposed method achieves better sound event detection performance than other existing methods.
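
A GRU is "simpler" than an LSTM because it has two gates (update z and reset r) and no separate cell state. The sketch below shows one GRU step using the Cho et al. convention h' = (1 − z) ⊙ h + z ⊙ ñ, where ñ is the candidate state, and feeds it a frame formed by concatenating per-channel features. The concatenation scheme and the parameter dictionary keys (`Wz`, `Uz`, `bz`, and so on) are illustrative assumptions, not the paper's exact design.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gru_step(x, h, p):
    """One GRU step: update gate z, reset gate r, candidate state n."""
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
    n = [math.tanh(a + b + c) for a, b, c in zip(matvec(p["Wn"], x), matvec(p["Un"], rh), p["bn"])]
    return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def multichannel_input(channel_feats):
    """Concatenate one frame's features from every channel into a single input vector."""
    return [v for feats in channel_feats for v in feats]


def run_gru(sequence, p, hidden_size):
    """Run the GRU over a sequence of input vectors and return every hidden state."""
    h = [0.0] * hidden_size
    outputs = []
    for x in sequence:
        h = gru_step(x, h, p)
        outputs.append(h)
    return outputs


# Toy parameters: 2 channels x 1 feature -> 2 inputs, hidden size 1.
p = {k: [[0.1, -0.2]] for k in ("Wz", "Wr", "Wn")}
p.update({k: [[0.3]] for k in ("Uz", "Ur", "Un")})
p.update({k: [0.0] for k in ("bz", "br", "bn")})
frames = [multichannel_input([[1.0], [0.5]]), multichannel_input([[0.2], [0.9]])]
print(run_gru(frames, p, hidden_size=1))
```

For event detection, a sigmoid output layer per class would sit on top of each hidden state; it is omitted here.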

Home monitoring system based on sound event detection for the hard-of-hearing (청각장애인을 위한 사운드 이벤트 검출 기반 홈 모니터링 시스템)

  • Kim, Gee Yeun;Shin, Seung-Su;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.38 no.4 / pp.427-432 / 2019
  • In this paper, we propose a home monitoring system for the hard-of-hearing that uses sound event detection based on a bidirectional gated recurrent neural network. First, the proposed system uses packet loss concealment to recover lost signals captured through wireless sensor networks, and selects reliable channels using the multichannel cross-correlation coefficient for effective sound event detection. The detected sound event is converted into text and into haptic signals, the latter through a harmonic/percussive sound source separation method, and provided to hearing-impaired people. Experimental results show that the proposed sound event detection method outperforms conventional methods, and that source separation allows the sound to be expressed as a detailed haptic signal.
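
The channel-selection step can be illustrated with a simplified stand-in. The sketch ranks channels by their mean zero-lag Pearson correlation with the other channels and keeps the top k, on the reasoning that a channel corrupted by packet loss or local noise correlates poorly with the rest. This is an assumed proxy, not the multichannel cross-correlation coefficient (MCCC) as formally defined, and `select_reliable_channels` is a hypothetical name.

```python
import math


def pearson(a, b):
    """Zero-lag Pearson correlation between two equal-length signals (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def select_reliable_channels(channels, k):
    """Return indices of the k channels with the highest mean correlation to the others."""
    scores = []
    for i, ci in enumerate(channels):
        others = [pearson(ci, cj) for j, cj in enumerate(channels) if j != i]
        scores.append(sum(others) / len(others))
    ranked = sorted(range(len(channels)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


mics = [
    [1.0, 2.0, 3.0, 4.0, 5.0],   # channel 0
    [1.1, 2.0, 3.1, 3.9, 5.0],   # channel 1: similar to channel 0
    [5.0, 1.0, 4.0, 2.0, 3.0],   # channel 2: corrupted
]
print(select_reliable_channels(mics, 2))  # [0, 1]
```

A real system would compute this over short windows and allow for inter-microphone delays, which zero-lag correlation ignores.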