• Title/Summary/Keyword: Acoustic Scene Classification (음향 장면 분류)

Search Results: 5

Light weight architecture for acoustic scene classification (음향 장면 분류를 위한 경량화 모형 연구)

  • Lim, Soyoung; Kwak, Il-Youp
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.979-993 / 2021
  • Acoustic scene classification (ASC) categorizes an audio recording by the environment in which it was recorded, a task long studied in the Detection and Classification of Acoustic Scenes and Events (DCASE) challenges. This study addresses a constraint that ASC faces in real-world applications: the deployed model must have low complexity. We compared several models that apply lightweight techniques. First, a base CNN model was proposed using log mel-spectrogram, delta, and delta-delta features. Then depthwise separable convolutions and linear bottleneck inverted residual blocks were applied to the convolutional layers, and quantization was applied to the models to reduce their complexity. The low-complexity models performed similarly to, or slightly below, the base model, but the model size was reduced substantially, from 503 KB to 42.76 KB.
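
A minimal sketch of two ingredients named in this abstract: the three-channel input built from a log mel-spectrogram plus its deltas and delta-deltas, and a depthwise separable convolution block. All file names and parameter values are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import librosa
import torch.nn as nn

def asc_input(path="scene.wav", n_mels=128):
    """Stack log mel-spectrogram, deltas, and delta-deltas as CNN channels."""
    y, sr = librosa.load(path, sr=None)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)                # log mel-spectrogram
    delta = librosa.feature.delta(log_mel)            # first-order deltas
    delta2 = librosa.feature.delta(log_mel, order=2)  # delta-deltas
    return np.stack([log_mel, delta, delta2])         # shape (3, n_mels, frames)

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (one filter per channel) followed by a 1x1 pointwise
    conv; uses far fewer parameters than a standard convolution."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```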

Indoor Scene Classification based on Color and Depth Images for Automated Reverberation Sound Editing (자동 잔향 편집을 위한 컬러 및 깊이 정보 기반 실내 장면 분류)

  • Jeong, Min-Heuk; Yu, Yong-Hyun; Park, Sung-Jun; Hwang, Seung-Jun; Baek, Joong-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.3 / pp.384-390 / 2020
  • The reverberation applied to sound when producing movies or VR content is a very important factor in realism and liveliness. The recommended reverberation time for each kind of space is given by a standard called RT60 (Reverberation Time 60 dB). In this paper, we propose a scene recognition technique for automatic reverberation editing. To this end, we devised a classification model that trains on color images and predicted depth images independently within the same model. Training on color information alone limits indoor scene classification, because interior structures look similar across scenes; a deep learning based depth estimation technique is therefore used to exploit spatial depth information. Ten scene classes were constructed based on RT60, and model training and evaluation were conducted on them. The proposed SCR + DNet (Scene Classification for Reverb + Depth Net) classifier achieves 92.4% accuracy, higher than conventional CNN classifiers.
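
A hypothetical sketch of a two-branch classifier in the spirit of SCR + DNet: one CNN branch for the color image, one for a separately predicted depth map, fused before a 10-way RT60-based class output. The branch widths, input sizes, and fusion scheme here are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ColorDepthNet(nn.Module):
    """Two independent CNN branches (RGB and depth) fused by concatenation."""
    def __init__(self, n_classes=10):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.color = branch(3)   # RGB image
        self.depth = branch(1)   # predicted depth map
        self.head = nn.Linear(64 + 64, n_classes)

    def forward(self, rgb, depth):
        # Each branch produces a 64-d feature vector; concatenate and classify.
        return self.head(torch.cat([self.color(rgb), self.depth(depth)], dim=1))

# usage sketch:
# logits = ColorDepthNet()(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
```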

A Study on Recognizing faces in broadcast video (영상에서의 얼굴인식)

  • Han Jun-hee; Nam Kee-hwan; Joung Youn-sook; Jeong Joo-byeong; Ra Sang-dong; Bae Cheol-soo
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.339-342 / 2004
  • Systems for storing and retrieving video material have been widely studied in recent years. The aim of this paper is to digitize large volumes of video into files, build a database of information about each video, retrieve the required video over a network using keywords, and make it available for editing. A prerequisite for building such a video database is segmenting the video at every continuous or semantically meaningful scene. In this paper, we analyzed MPEG bitstreams to find scene-change points automatically in experiments run on a workstation, and based on those experiments we implemented a video retrieval system on a PC. The system analyzes videos from a range of domains, including news, dramas, and security footage, finds scene-change points, stores a representative image for each scene, and allows the videos to be searched in a network environment.
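
The paper locates cuts by analyzing the MPEG bitstream itself; as a simple decoded-frame stand-in, the sketch below flags a scene change when consecutive frames have a large color-histogram distance and keeps the first frame of each shot as its representative image. The threshold and file name are assumptions.

```python
import cv2

def detect_shots(path="news.mp4", threshold=0.5):
    """Return (frame_index, representative_frame) pairs, one per detected shot."""
    cap = cv2.VideoCapture(path)
    shots, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Hue histogram of the frame, normalized for comparison.
        hist = cv2.calcHist([cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)],
                            [0], None, [64], [0, 180])
        cv2.normalize(hist, hist)
        # A large Bhattacharyya distance suggests a scene change.
        if prev_hist is None or cv2.compareHist(
                prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
            shots.append((idx, frame))  # representative frame of the new shot
        prev_hist, idx = hist, idx + 1
    cap.release()
    return shots
```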

Listenable Explanation for Heatmap in Acoustic Scene Classification (음향 장면 분류에서 히트맵 청취 분석)

  • Suh, Sangwon; Park, Sooyoung; Jeong, Youngho; Lee, Taejin
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2020.07a / pp.727-731 / 2020
  • Analyzing the reasons behind a neural network's predictions is a necessary step toward trusting the model. In computer vision, interpretation methods have accordingly been proposed that visualize, as saliency maps or heatmaps, the evidence on which a model based its prediction. In audio, however, a visual interpretation over a spectrogram is not intuitive, and it is hard to understand which actual sounds the model relied on. This study therefore proposes a system for listening to heatmaps and uses it to run heatmap-listening experiments on an acoustic scene classification model, examining whether the network's predictions can be explained in a form humans can understand.
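
A minimal sketch of one way to make a heatmap listenable, assuming a relevance heatmap in [0, 1] (e.g., from Grad-CAM) already resized to the spectrogram grid: weight the magnitude spectrogram by the heatmap and invert it with Griffin-Lim, so that only the regions the model relied on remain audible. This illustrates the idea, not the authors' system.

```python
import numpy as np
import librosa

def listen_to_heatmap(y, sr, heatmap, n_fft=1024, hop=512):
    """heatmap: relevance in [0, 1], same shape as the STFT magnitude."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    masked = S * heatmap  # suppress energy the model did not attend to
    # Griffin-Lim estimates a phase so the masked magnitude becomes a waveform.
    return librosa.griffinlim(masked, n_fft=n_fft, hop_length=hop)
```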

Salient Region Detection Algorithm for Music Video Browsing (뮤직비디오 브라우징을 위한 중요 구간 검출 알고리즘)

  • Kim, Hyoung-Gook; Shin, Dong
    • The Journal of the Acoustical Society of Korea / v.28 no.2 / pp.112-118 / 2009
  • This paper proposes a rapid salient-region detection algorithm for a music video browsing system that can run on mobile devices and digital video recorders (DVRs). The input music video is decomposed into music and video tracks. For the music track, the music highlight, including the musical chorus, is detected by structure analysis using energy-based peak position detection. Using emotional models trained with an SVM-AdaBoost learning algorithm, the music signal of each music video is automatically classified into one of the predefined emotional classes. For the video track, face scenes showing the singer or actors are detected with a boosted cascade of simple features. Finally, the salient region is generated by aligning the boundaries of the music highlight and the visual face scenes. Users first select their favorite music videos on a mobile device or DVR using each video's emotion information, and can then quickly browse the 30-second salient region produced by the proposed algorithm. A mean opinion score (MOS) test on a database of 200 music videos compared the detected salient region with a predefined manual segment; the salient region detected by the proposed method performed much better than the manual segment chosen without audiovisual processing.
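
A rough sketch of the energy-based peak picking used in the music-highlight step: find the most energetic frame of the music track and cut a 30-second window around it. The RMS feature, window placement, and file name are assumptions standing in for the paper's full structure analysis.

```python
import numpy as np
import librosa

def salient_region(path="clip.wav", length_s=30.0):
    """Return (start, end) in seconds of a 30-second window around the energy peak."""
    y, sr = librosa.load(path, sr=None)
    rms = librosa.feature.rms(y=y)[0]             # frame-level energy envelope
    peak_frame = int(np.argmax(rms))              # energy-based peak position
    center = librosa.frames_to_time(peak_frame, sr=sr)
    start = max(0.0, center - length_s / 2)       # center the window on the peak
    return start, start + length_s
```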