• Title/Summary/Keyword: 사운드 분류 (sound classification)

Search results: 60

Sound event classification using deep neural network based transfer learning (깊은 신경망 기반의 전이학습을 이용한 사운드 이벤트 분류)

  • Lim, Hyungjun;Kim, Myung Jong;Kim, Hoirin
    • The Journal of the Acoustical Society of Korea, v.35 no.2, pp.143-148, 2016
  • Deep neural networks that effectively capture the characteristics of data have been widely used in various applications. However, the amount of available sound data is often insufficient to train a deep neural network properly, resulting in overfitting problems. In this paper, we propose a transfer learning framework that can effectively train a deep neural network even with insufficient sound event data by exploiting rich speech or music data. A series of experimental results verifies that the proposed method performs significantly better than a baseline deep neural network trained only on the small sound event dataset.
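
To make the transfer idea concrete, the sketch below pretrains a small feed-forward network on abundant source-domain data (standing in for the speech/music corpus) and then fine-tunes its hidden layers on a small sound event set. It only illustrates the general technique; the feature dimension, layer sizes, and placeholder arrays are assumptions, not the authors' configuration.

```python
# Minimal transfer-learning sketch: pretrain on a rich source domain,
# then reuse the hidden layers for the small sound event task.
import numpy as np
from tensorflow.keras import layers, models

FEAT_DIM = 40          # e.g. log-mel filterbank features (assumed)
N_SOURCE_CLASSES = 10  # speech/music pretraining targets (assumed)
N_EVENT_CLASSES = 5    # target sound event classes (assumed)

# --- placeholder data; replace with real features/labels ---
x_src = np.random.randn(2000, FEAT_DIM).astype("float32")
y_src = np.random.randint(0, N_SOURCE_CLASSES, 2000)
x_evt = np.random.randn(200, FEAT_DIM).astype("float32")
y_evt = np.random.randint(0, N_EVENT_CLASSES, 200)

# 1) Pretrain on the rich source domain (speech/music stand-in).
inputs = layers.Input(shape=(FEAT_DIM,))
h = layers.Dense(256, activation="relu")(inputs)
h = layers.Dense(256, activation="relu")(h)
src_out = layers.Dense(N_SOURCE_CLASSES, activation="softmax")(h)
source_model = models.Model(inputs, src_out)
source_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
source_model.fit(x_src, y_src, epochs=3, batch_size=64, verbose=0)

# 2) Attach a new output head to the shared hidden layers and
#    fine-tune on the small sound event dataset.
evt_out = layers.Dense(N_EVENT_CLASSES, activation="softmax")(h)
event_model = models.Model(inputs, evt_out)
event_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])
event_model.fit(x_evt, y_evt, epochs=10, batch_size=32, verbose=0)
```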

Polyphonic sound event detection using multi-channel audio features and gated recurrent neural networks (다채널 오디오 특징값 및 게이트형 순환 신경망을 사용한 다성 사운드 이벤트 검출)

  • Ko, Sang-Sun;Cho, Hye-Seung;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea, v.36 no.4, pp.267-272, 2017
  • In this paper, we propose an effective method of applying multi-channel audio features to GRNNs (Gated Recurrent Neural Networks) for polyphonic sound event detection. Real-life sounds often overlap with one another, so it is difficult to distinguish them using mono-channel audio features. The proposed method therefore uses multi-channel audio features to improve the performance of polyphonic sound event detection. In addition, it applies a gated recurrent neural network, which is simpler than LSTM (Long Short-Term Memory), the best-performing of current recurrent neural networks. The experimental results show that the proposed method achieves better sound event detection performance than existing methods.
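
As a rough illustration of this type of detector (not the authors' exact setup), the sketch below feeds channel-stacked frame features into stacked GRU layers with a per-frame sigmoid output so that overlapping events can be active at the same time; all shapes and hyperparameters are assumed.

```python
# GRU-based polyphonic sound event detection sketch with
# multi-channel features concatenated per frame.
import numpy as np
from tensorflow.keras import layers, models

N_FRAMES, N_CHANNELS, N_MELS, N_EVENTS = 256, 2, 40, 6  # assumed sizes
feat_dim = N_CHANNELS * N_MELS  # channels concatenated per frame

model = models.Sequential([
    layers.Input(shape=(N_FRAMES, feat_dim)),
    layers.GRU(128, return_sequences=True),
    layers.GRU(128, return_sequences=True),
    # sigmoid (not softmax) so several events can be active in one frame
    layers.TimeDistributed(layers.Dense(N_EVENTS, activation="sigmoid")),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# placeholder batch: (batch, frames, channel-stacked log-mel features)
x = np.random.randn(8, N_FRAMES, feat_dim).astype("float32")
y = (np.random.rand(8, N_FRAMES, N_EVENTS) > 0.8).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
activations = model.predict(x)  # frame-wise event probabilities
```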

Analysis on Topics in Soundscape Research based on Topic Modeling (토픽 모델링을 이용한 사운드스케이프 연구 주제어 분석)

  • Choe, Sou-Hwan
    • The Journal of the Korea Contents Association, v.19 no.7, pp.427-435, 2019
  • Soundscape provides important resources for understanding the social and cultural aspects of our society; however, research frameworks for recording, conserving, categorizing, and analyzing soundscapes are still in their infancy. Topic modeling is an automatic approach to discovering hidden themes dispersed across unstructured documents, and it is robust enough to find latent topics, such as research trends, behind a collection of documents. The purpose of this paper is to discover the topics of current soundscape research using topic modeling and, furthermore, to discuss the possibility of designing a metadata system for sound archives and of improving the Soundscape Ontology, which is currently under development.
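
For readers unfamiliar with the technique, the snippet below shows a generic LDA topic-modeling run of the kind applied here; the toy corpus and the number of topics are placeholders, not the study's data or settings.

```python
# Generic LDA topic-modeling run on a toy soundscape-themed corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "urban soundscape noise perception survey",
    "acoustic ecology natural soundscape recording archive",
    "soundscape ontology metadata classification of sound archives",
    "noise annoyance urban park soundscape quality",
]

vec = CountVectorizer(stop_words="english")
dtm = vec.fit_transform(docs)                      # document-term matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {top}")                     # top keywords per latent topic
```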

Sound event detection based on multi-channel multi-scale neural networks for home monitoring system used by the hard-of-hearing (청각 장애인용 홈 모니터링 시스템을 위한 다채널 다중 스케일 신경망 기반의 사운드 이벤트 검출)

  • Lee, Gi Yong;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea, v.39 no.6, pp.600-605, 2020
  • In this paper, we propose a sound event detection method using multi-channel multi-scale neural networks for a sound-sensing home monitoring system for the hearing impaired. In the proposed system, the two channels with the highest signal quality are selected from several wireless microphone sensors in the home. Three features extracted from the sensor signals (time difference of arrival, pitch range, and the outputs obtained by applying a multi-scale convolutional neural network to the log-mel spectrogram) are fed to a classifier based on a bidirectional gated recurrent neural network to further improve the performance of sound event detection. The detected sound event is converted into text, together with the sensor position of the selected channel, and provided to the hearing-impaired user. The experimental results show that the sound event detection method of the proposed system is superior to existing methods and can effectively deliver sound information to the hearing impaired.
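
A loose sketch of the "multi-scale CNN features plus bidirectional GRU classifier" pipeline named in the abstract is given below; the kernel sizes, the way the TDOA/pitch features are appended, and every hyperparameter are assumptions rather than the paper's settings.

```python
# Multi-scale CNN front end (parallel kernel sizes) + bidirectional GRU
# classifier over frame-level features.
from tensorflow.keras import layers, models

N_FRAMES, N_MELS, N_EVENTS = 128, 64, 5  # assumed shapes

spec = layers.Input(shape=(N_FRAMES, N_MELS, 1))   # log-mel spectrogram
aux = layers.Input(shape=(N_FRAMES, 2))            # e.g. TDOA + pitch per frame (assumed)

# multi-scale convolutions: parallel branches with different kernel sizes
branches = []
for k in (3, 5, 7):
    b = layers.Conv2D(16, (k, k), padding="same", activation="relu")(spec)
    b = layers.MaxPooling2D((1, 4))(b)             # pool along frequency only
    branches.append(b)
x = layers.Concatenate()(branches)
x = layers.Reshape((N_FRAMES, -1))(x)              # back to (time, features)
x = layers.Concatenate()([x, aux])                 # append auxiliary frame features

x = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)
out = layers.TimeDistributed(layers.Dense(N_EVENTS, activation="sigmoid"))(x)

model = models.Model([spec, aux], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```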

Towards the Generation of Language-based Sound Summaries Using Electroencephalogram Measurements (뇌파측정기술을 활용한 언어 기반 사운드 요약의 생성 방안 연구)

  • Kim, Hyun-Hee;Kim, Yong-Ho
    • Journal of the Korean Society for Information Management, v.36 no.3, pp.131-148, 2019
  • This study constructed a cognitive model of information processing for understanding the topic of a sound material and its characteristics. It then proposed methods for generating sound summaries by incorporating the anterior-posterior N400/P600 components of the event-related potential (ERP) response into the language representation of the cognitive model. To this end, research hypotheses were established and verified through ERP experiments, showing that P600 is crucial for screening topic-relevant shots from topic-irrelevant shots. The results of this study can be applied to the design of classification algorithms, which can then be used to generate content-based metadata such as generic or personalized sound summaries and video skims.
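
Purely as a toy illustration of how a late-positive (P600-like) measure could screen topic-relevant shots, the snippet below averages amplitude in a 500-800 ms window and thresholds it; the epoch array, window, and threshold are invented for the example and do not reflect the study's electrodes or statistics.

```python
# Toy P600-style screening: threshold the mean late-window amplitude
# of shot-locked ERP epochs.
import numpy as np

FS = 250                           # sampling rate in Hz (assumed)
epochs = np.random.randn(40, FS)   # 40 shot-locked 1-second ERP epochs (placeholder)

# mean amplitude in a 500-800 ms post-stimulus window, a common late-positivity measure
win = slice(int(0.5 * FS), int(0.8 * FS))
p600_amp = epochs[:, win].mean(axis=1)

threshold = p600_amp.mean()        # illustrative cut-off, not a validated criterion
topic_relevant = np.where(p600_amp > threshold)[0]
print("shots flagged as topic-relevant:", topic_relevant)
```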

Recognition of Overlapped Sound and Influence Analysis Based on Wideband Spectrogram and Deep Neural Networks (광역 스펙트로그램과 심층신경망에 기반한 중첩된 소리의 인식과 영향 분석)

  • Kim, Young Eon;Park, Gooman
    • Journal of Broadcast Engineering, v.23 no.3, pp.421-430, 2018
  • Many voice recognition systems use methods such as MFCC and HMM to recognize the human voice. These methods are designed to analyze only a targeted sound, typically the exchange between a human and a device. However, their recognition capability is limited when a mixture of sounds spans a wider frequency range, such as dog barking combined with indoor sounds. The frequency content of such overlapped sound covers a wide range, up to 20 kHz, which is higher than that of the voice. This paper proposes a new recognition method that covers this wider frequency range by combining a Wideband Sound Spectrogram (WSS) with a DNN-based Keras Sequential Model (KSM). The wideband sound spectrogram is adopted to analyze and verify diverse sounds across a wide frequency range, since it is designed both to extract features and to support classification. The KSM performs pattern recognition on the features extracted from the WSS to improve sound recognition quality. The experiments verified that the proposed WSS and KSM classify the targeted sound well in a noisy environment of overlapped sounds such as dog barking and indoor sounds. Furthermore, the paper presents a stage-by-stage analysis and comparison of how each factor influences recognition and of the recognition characteristics at various noise levels.
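
The abstract names a wideband sound spectrogram front end paired with a Keras Sequential Model; the sketch below shows one plausible minimal pairing, with the sampling rate, FFT size, class set, and layer sizes all assumed.

```python
# Wide-band spectrogram features (up to fs/2) fed to a Keras Sequential classifier.
import numpy as np
from scipy import signal
from tensorflow.keras import layers, models

FS = 44100          # sampling rate high enough to capture content up to ~22 kHz
N_CLASSES = 4       # e.g. dog barking, indoor sounds, speech, noise (assumed)

def wideband_spectrogram(wave, fs=FS):
    """Log-magnitude STFT spectrogram over the full 0 - fs/2 range."""
    _, _, spec = signal.stft(wave, fs=fs, nperseg=1024, noverlap=512)
    return np.log1p(np.abs(spec)).T            # (frames, freq_bins)

# placeholder clips; replace with real labelled recordings
clips = [np.random.randn(FS) for _ in range(32)]
labels = np.random.randint(0, N_CLASSES, 32)
feats = np.stack([wideband_spectrogram(c).mean(axis=0) for c in clips])  # clip-level features

model = models.Sequential([
    layers.Input(shape=(feats.shape[1],)),
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(feats, labels, epochs=3, verbose=0)
```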

Information Gathering Agent System using XML (이동에이전트를 이용한 XML 정보의 수집 및 분류)

  • 서효정;방대욱
    • Proceedings of the Korean Information Science Society Conference, 1999.10b, pp.131-133, 1999
  • Today, searching for information on the web means facing the problem of having to collect, organize, and manage an enormous amount of material. In addition to conventional text, the Internet contains many other types of material that users want, such as images, sound, and databases. On the web, however, retrieval, collection, and classification are performed mainly on text. To address this problem, XML can be used to collect information regardless of its type. This paper presents an information retrieval model based on mobile agents, in which the mobile agent uses XML as the representation format for the information it carries. The hierarchical nature of XML also makes it possible to classify and merge XML documents, so the collected information can easily be obtained in an organized form.
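
As a small illustration of classifying gathered documents by their XML element hierarchy, the snippet below groups documents by root tag and direct child tags; the element names and grouping rule are illustrative, not taken from the paper.

```python
# Group collected XML documents by a simple structural key
# derived from their element hierarchy.
import xml.etree.ElementTree as ET
from collections import defaultdict

docs = [
    "<news><title>Flood warning</title><body>...</body></news>",
    "<news><title>Market report</title><body>...</body></news>",
    "<media><sound format='wav'>bark.wav</sound></media>",
    "<media><image format='png'>map.png</image></media>",
]

groups = defaultdict(list)
for xml_text in docs:
    root = ET.fromstring(xml_text)
    # use the root tag plus its direct child tags as the structural key
    key = (root.tag, tuple(sorted(child.tag for child in root)))
    groups[key].append(xml_text)

for key, members in groups.items():
    print(key, "->", len(members), "document(s)")
```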

A Study on Relation of Visual/Auditory Factors in Video Communication. (영상 커뮤니케이션의 시각과 청각의 연관성에 관한 연구)

  • Ham, Gi-Hun;Jeong, Seong-Hwan;Jo, Dong-Min
    • Proceedings of the Korean Society for Emotion and Sensibility Conference, 2009.11a, pp.111-114, 2009
  • In the multimedia era, social interaction through messages, that is, communication, relies not only on visual elements such as color, form, time, and movement but also on auditory elements as means of conveying a message, because recognition is much higher when a message is delivered through sight and hearing combined than through either sense alone. This study therefore sought to identify the relationship between the visual and auditory elements of video communication. Today we readily encounter informative and persuasive video messages in multimedia such as TV, film, and the Internet. Among these, advertising video, which shows the role of the video message most dramatically, was examined through its visual element, typography, and its auditory element, sound, to study the relationship between audiovisual elements. Various advertising videos were first categorized by appeal method and content, and the frequency and type of audiovisual elements in each category were surveyed: typography according to its delivery method and sound according to its usage type. Based on the finding that appropriate use of visual and auditory elements in video strongly affects viewers' preference and recognition, the distribution of audiovisual elements in domestic and foreign advertising videos was surveyed and analyzed. The study then proposes a direction for producing advertising videos that takes the relationship between visual and auditory elements into account for effective video communication.
