• Title/Summary/Keyword: audio feature extraction (오디오 특징 추출)

Design of Cough Detection System Based on Multimodal Learning & Wearable Sensor to Predict the Spread of Influenza (독감 확산 예측을 위한 멀티모달 학습과 웨어러블 센서 기반의 기침 감지 시스템 설계)

  • Kang, Jae-Sik;Back, Moon-Ki;Choi, Hyung-Tak;Lee, Kyu-Chul
    • Proceedings of the Korea Information Processing Society Conference / 2018.05a / pp.428-430 / 2018
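  • This paper proposes a cough detection model that uses wearable sensors to predict the spread of influenza. The model is built on a multimodal DNN, which takes heterogeneous cough-related body signals as input and learns from data rather than relying on a hand-implemented cough detection algorithm. Real-life cough audio data and three-axis cough acceleration data were collected through wearable sensors. Feature vectors were extracted with MFCC and FFT, respectively, so that training for detection remains possible with only one of the two data types.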
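
The abstract says FFT is used to turn three-axis acceleration data into a feature vector but gives no details. The following is a minimal sketch of one common way to do this, not the authors' implementation. The function name `fft_features`, the window length, the 100 Hz sampling rate, and the choice to keep the first 16 magnitude bins per axis are all assumptions made for illustration.

```python
import numpy as np

def fft_features(accel_window, n_bins=16):
    """Build a feature vector from one window of 3-axis acceleration.

    accel_window: array of shape (n_samples, 3) holding x, y, z samples.
    Returns a vector of length 3 * n_bins (x bins, then y bins, then z bins).
    """
    accel_window = np.asarray(accel_window, dtype=float)
    # Remove the per-axis mean (gravity, sensor offset) so bin 0 does not dominate.
    centered = accel_window - accel_window.mean(axis=0)
    # Taper the window edges to reduce spectral leakage.
    windowed = centered * np.hanning(len(centered))[:, None]
    # Magnitude spectrum per axis, shape (n_samples // 2 + 1, 3).
    spectrum = np.abs(np.fft.rfft(windowed, axis=0))
    # Keep the lowest n_bins per axis and concatenate the three axes.
    return spectrum[:n_bins].T.reshape(-1)

# Synthetic example: a 10 Hz vibration on the x axis only (hypothetical data).
fs = 100  # assumed sampling rate in Hz
t = np.arange(128) / fs
window = np.stack(
    [np.sin(2 * np.pi * 10 * t), np.zeros_like(t), np.ones_like(t)], axis=1
)
features = fft_features(window)
print(features.shape)
```

With 128 samples at 100 Hz each bin is about 0.78 Hz wide, so the x-axis energy concentrates near bin 13, while the constant y and z axes contribute nothing after mean removal. A vector like this could be fed to a classifier alongside, or instead of, audio features.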

Design and Implementation of Illegal Content Tracking System Using Hybrid Content Recognition (하이브리드 인식을 이용한 불법 콘텐츠 추적시스템 설계 및 구현)

  • Kim, Won-Gyum;Park, Kyung-Soo;Kim, Sang-Jin;Yu, Won-Young
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.1555-1558 / 2011
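  • This paper introduces a technique for tracking content that has been illegally distributed on the Internet, using content-based recognition of multimedia data. Content-based recognition identifies content by extracting a content-based hash or a compact feature vector from the original signal, and it is widely used in copyright protection to filter illegal works. An illegal content tracking system is a copyright protection system that searches for works widely circulated on the Internet, recognizes them based on their content, determines whether they are illegal, and then automatically carries out follow-up measures such as sending takedown emails or halting retransmission. This paper proposes an active tracking system that performs content-based recognition on audio, video, text, and game content, determines whether it is illegal, and takes action to halt retransmission. The proposed system runs recognition independently for each type of work collected by the search module.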

Subdivision Ensemble Model for Highlight Detection (하이라이트 검출을 위한 구간 분할 앙상블 모델)

  • Lee, Hansol;Lee, Gyemin
    • Journal of Broadcast Engineering / v.25 no.4 / pp.620-628 / 2020
  • Automatically predicting video highlights is an important task for the media industry and streaming platform providers, because it saves the time and cost of manual video editing. We propose a new ensemble model that combines multiple highlight detectors, each focusing on a different part of a highlight event. As a result, our model can capture the more information-rich sections of events. Furthermore, the proposed model can extract improved features for highlight detection, particularly when the training video set is small. We evaluate our model on e-sports and baseball videos.

Music Genre Classification using Time Delay Neural Network (시간 지연 신경망을 이용한 음악 장르 분류)

  • 이재원;조찬윤;김상균
    • Journal of Korea Multimedia Society / v.4 no.5 / pp.414-422 / 2001
  • This paper proposes a music genre classifier using a time delay neural network (TDNN) for an audio data retrieval system. The classifier considers eight genres: Blues, Country, Hard Core, Hard Rock, Jazz, R&B (Soul), Techno, and Thrash Metal. The unit of comparison for classifying genres is the melody between bars. The melody pattern is extracted based on the snare drum sound, which represents the periodicity of the rhythm effectively. The classifier is built with the TDNN and takes the Fourier-transformed feature vector of the melody as its input pattern. We tested the classifier on eighty training samples (ten pieces of music per genre) and forty test samples (five pieces per genre), and obtained correct classification rates of 92.5% and 60%, respectively.

A New Tempo Feature Extraction Based on Modulation Spectrum Analysis for Music Information Retrieval Tasks

  • Kim, Hyoung-Gook
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.6 no.2 / pp.95-106 / 2007
  • This paper proposes an effective tempo feature extraction method for music information retrieval. The tempo information is modeled by the narrow-band temporal modulation components, which are decomposed into a modulation spectrum via joint frequency analysis. In the implementation, the tempo feature is extracted directly from the modified discrete cosine transform coefficients, which are the output of a partial MP3 (MPEG 1 Layer 3) decoder. Different features are then extracted from the amplitudes of the modulation spectrum and applied to different music information retrieval tasks. The logarithmic-scale modulation frequency coefficients are employed in automatic music emotion classification and music genre classification, and the classification precision of both systems improves significantly. The bit vectors derived from the adaptive modulation spectrum are used in an audio fingerprinting task, where they prove highly robust. The experimental results in these tasks validate the effectiveness of the proposed tempo feature.
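
To make the idea of a modulation spectrum concrete, here is a minimal sketch under stated assumptions. It uses a short-time Fourier transform to obtain subband envelopes, whereas the paper works on MDCT coefficients from a partial MP3 decoder, so this is a stand-in technique and not the proposed method. The function name `modulation_spectrum`, the frame length, the hop size, and the synthetic test signal are all hypothetical.

```python
import numpy as np

def modulation_spectrum(signal, sample_rate, frame_len=256, hop=128):
    """Return (modulation frequencies in Hz, modulation spectrum).

    The spectrum has shape (n_mod_freqs, n_bands): for each acoustic band,
    the magnitude of the Fourier transform of its envelope over time.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    # Subband envelopes: STFT magnitude, shape (n_frames, n_bands).
    envelopes = np.abs(np.fft.rfft(frames, axis=1))
    # Remove each band's mean so the 0 Hz modulation term does not dominate.
    envelopes = envelopes - envelopes.mean(axis=0)
    # Second transform along time yields the modulation spectrum.
    mod_spec = np.abs(np.fft.rfft(envelopes, axis=0))
    frame_rate = sample_rate / hop
    mod_freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate)
    return mod_freqs, mod_spec

# Synthetic example: a 440 Hz tone whose loudness pulses at 2 Hz (120 BPM).
sr = 8000
t = np.arange(8 * sr) / sr
beat_hz = 2.0
test_signal = (1 + np.cos(2 * np.pi * beat_hz * t)) * np.sin(2 * np.pi * 440 * t)
mod_freqs, mod_spec = modulation_spectrum(test_signal, sr)
dominant = mod_freqs[np.argmax(mod_spec.sum(axis=1))]
print(f"dominant modulation frequency: {dominant:.2f} Hz")
```

Summing the modulation spectrum across bands and picking the strongest modulation frequency recovers a value close to the 2 Hz pulse rate, which is the kind of periodicity a tempo feature is meant to capture.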

Shooting sound analysis using convolutional neural networks and long short-term memory (합성곱 신경망과 장단기 메모리를 이용한 사격음 분석 기법)

  • Kang, Se Hyeok;Cho, Ji Woong
    • The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.312-318 / 2022
  • This paper proposes a model that classifies the type of gun and information about the sound source location using a deep neural network. The proposed classification model is composed of convolutional neural networks (CNN) and long short-term memory (LSTM). To train and test the model, we use the Gunshot Audio Forensic Dataset generated by a project supported by the National Institute of Justice (NIJ). The acoustic signals are transformed into Mel-spectrograms, which serve as training and test data for the proposed model. The model is compared with a control model consisting of convolutional neural networks only. The proposed model achieves an accuracy of more than 90%.
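
Several entries in this list feed a Mel-spectrogram to a neural network. The sketch below shows one standard way to compute a log Mel-spectrogram with NumPy only. It is an illustration, not the preprocessing used in any of the listed papers: the function names, the FFT size, the hop, the 40 Mel bands, and the common 2595 * log10(1 + f/700) Mel formula are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sample_rate, n_fft, n_mels):
    """Triangular filters spaced evenly on the Mel scale from 0 Hz to Nyquist."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    )
    bank = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank, edges[1:-1]  # filters and their center frequencies in Hz

def log_mel_spectrogram(signal, sample_rate, n_fft=512, hop=256, n_mels=40):
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft//2+1)
    bank, centers = mel_filterbank(sample_rate, n_fft, n_mels)
    mel_power = power @ bank.T                        # (n_frames, n_mels)
    return np.log(mel_power + 1e-10), centers         # small offset avoids log(0)

# Synthetic example: one second of a 1 kHz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
log_mel, centers = log_mel_spectrogram(tone, sr)
peak_band = int(np.argmax(log_mel.mean(axis=0)))
print(log_mel.shape, centers[peak_band])
```

The resulting array has one row per frame and one column per Mel band, which is the two-dimensional, image-like input that CNN-based models typically consume. For the test tone, the strongest band is the one centered near 1 kHz.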

The Vocabulary Recognition Optimize using Acoustic and Lexical Search (음향학적 및 언어적 탐색을 이용한 어휘 인식 최적화)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of Korea Multimedia Society / v.13 no.4 / pp.496-503 / 2010
  • Speech recognition systems are usually developed as standalone systems. When used on a mobile terminal, they show a low recognition rate because of limited memory size and audio compression. This study proposes a system that improves vocabulary recognition performance by separating acoustic search from lexical search. Acoustic search is carried out on the mobile terminal, and lexical search is carried out on a server. Feature vectors of the speech signal are extracted and phonemes are recognized using GMM; the recognized phoneme list is then transmitted to the server, which performs lexical search using the Lexical Tree Search algorithm. The system achieved a vocabulary-dependent recognition rate of 98.01%, a vocabulary-independent recognition rate of 97.71%, and a recognition speed of 1.58 seconds.

A Digital Watermark Scheme for Rational Bezier Curves (유리 베지에곡선을 위한 디지털워터마크 기법)

  • Kim, Tae-Wan;Kwon, Song-Hwa;Moon, Hwan-Pyo;Choi, Hyeong-In;Wee, Nam-Sook
    • Proceedings of the Korea Information Processing Society Conference / 2002.04a / pp.625-628 / 2002
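  • Digital watermarking is a copyright protection solution for digital content and is currently studied mainly for images, audio, video, and text. With rapid advances in computer hardware, networks, and application software, together with the build-out of national high-speed network infrastructure, interest in digital watermarking of 3D polygons, curves, and surfaces is growing. This paper presents a method for digitally watermarking rational Bezier curves. Instead of the usual degree elevation for Bezier curves, the degree is raised by multiplying the numerator and denominator of the rational expression by a common polynomial. The mark to be hidden is embedded in, and extracted from, the cross ratio of the roots of this common polynomial. The proposed algorithm is shape preserving: embedding the watermark does not change the shape of the curve at all. Another important property is that it resists reparameterization by Möbius transformation, one of the reparameterizations that commonly occur when curves are used in CAD systems. Finally, the results of examples produced with the proposed method are shown.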
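
The resistance to Möbius reparameterization rests on a standard fact: the cross ratio of four points is invariant under Möbius transformations. The snippet below is a small numeric check of that fact only, not the paper's embedding or extraction procedure. The four example roots and the coefficients a, b, c, d are arbitrary values chosen for illustration.

```python
import numpy as np

def cross_ratio(z1, z2, z3, z4):
    """Cross ratio (z1, z2; z3, z4) of four distinct complex numbers."""
    return ((z1 - z3) * (z2 - z4)) / ((z1 - z4) * (z2 - z3))

def mobius(z, a, b, c, d):
    """Moebius transformation z -> (a z + b) / (c z + d), with ad - bc != 0."""
    return (a * z + b) / (c * z + d)

# Hypothetical roots of a common polynomial factor (arbitrary example values).
roots = np.array([0.5 + 1.0j, -1.2 + 0.3j, 2.0 - 0.7j, -0.4 - 1.5j])

# Arbitrary transformation coefficients; here ad - bc = 5.5, so it is valid.
a, b, c, d = 2.0, 1.0, 0.5, 3.0
mapped = mobius(roots, a, b, c, d)

before = cross_ratio(*roots)
after = cross_ratio(*mapped)
print(before, after, np.isclose(before, after))
```

Because the value survives the transformation, information encoded in a cross ratio can still be read after the curve has been reparameterized this way, which is consistent with the robustness property described in the abstract.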

Sound event detection based on multi-channel multi-scale neural networks for home monitoring system used by the hard-of-hearing (청각 장애인용 홈 모니터링 시스템을 위한 다채널 다중 스케일 신경망 기반의 사운드 이벤트 검출)

  • Lee, Gi Yong;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.39 no.6 / pp.600-605 / 2020
  • In this paper, we propose a sound event detection method using multi-channel multi-scale neural networks for a sound-sensing home monitoring system for the hearing impaired. In the proposed system, the two channels with the highest signal quality are selected from several wireless microphone sensors in the home. Three features extracted from the sensor signals (time difference of arrival, pitch range, and the outputs obtained by applying a multi-scale convolutional neural network to the log Mel-spectrogram) are fed to a classifier based on a bidirectional gated recurrent neural network to further improve sound event detection performance. The detected sound event is converted into text, together with the sensor position of the selected channel, and provided to the hearing impaired. The experimental results show that the sound event detection method of the proposed system outperforms the existing method and can effectively deliver sound information to the hearing impaired.

Acceleration signal-based haptic texture recognition according to characteristics of object surface material using conformer model (Conformer 모델을 이용한 물체 표면 재료의 특성에 따른 가속도 신호 기반 햅틱 질감 인식)

  • Hyoung-Gook Kim;Dong-Ki Jeong;Jin-Young Kim
    • The Journal of the Acoustical Society of Korea / v.42 no.3 / pp.214-220 / 2023
  • In this paper, we propose a method to improve texture recognition performance from haptic acceleration signals, which represent the texture characteristics of object surface materials, by using a Conformer model that combines the advantages of a convolutional neural network and a transformer. In the proposed method, the three-axis acceleration signals generated by impact sound and vibration while a person touches the surface of the object material with a tool such as a stylus are combined into one-dimensional acceleration data, and the logarithmic Mel-spectrogram is extracted from the haptic acceleration signal in the same way as from an audio signal. The Conformer is then applied to the extracted logarithmic Mel-spectrogram to learn the main local and global frequency features for recognizing the texture of various object materials. Experiments on the Lehrstuhl für Medientechnik (LMT) haptic texture dataset, which consists of 60 materials, showed that the proposed method recognizes the texture of object surface materials better than the existing methods.