Non-Dialog Section Detection for the Descriptive Video Service Contents Authoring

  • Jang, Inseon (Affective Media Research Lab, Realistic Broadcasting Media Research Department, ETRI)
  • Ahn, ChungHyun (Affective Media Research Lab, Realistic Broadcasting Media Research Department, ETRI)
  • Jang, Younseon (Dept. of Electronic Engineering, Chungnam National University)
  • Received : 2014.03.10
  • Accepted : 2014.04.30
  • Published : 2014.05.30

Abstract

This paper addresses the problem of non-dialog section detection for descriptive video service (DVS) authoring, the goal of which is to find meaningful sections of the broadcast audio where audio description can be inserted. Because broadcast audio contains many kinds of sound, the method first discriminates between speech and non-speech for each audio frame. The proposed method jointly exploits the inter-channel structure and the speech source characteristics of stereo broadcast audio. Rule-based post-processing is then applied to detect non-dialog sections whose length is appropriate for audio description. The proposed method provides more accurate detection than the conventional method, and experimental results on real broadcast content show its qualitative superiority.
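The rule-based post-processing step can be illustrated with a minimal sketch. The abstract does not state the actual rules, so the two rules below (relabelling very short speech runs, then keeping only non-speech runs above a minimum length) and every name and threshold (`find_non_dialog_sections`, `frame_sec`, `min_section_sec`, `max_speech_blip_sec`) are assumptions made for illustration, not the authors' method.

```python
from typing import List, Tuple


def find_non_dialog_sections(
    is_speech: List[bool],
    frame_sec: float = 0.02,
    min_section_sec: float = 2.0,
    max_speech_blip_sec: float = 0.1,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of sufficiently long non-dialog runs.

    is_speech holds one speech/non-speech decision per audio frame.
    All thresholds are hypothetical defaults, not values from the paper.
    """
    labels = list(is_speech)
    n = len(labels)

    # Assumed rule 1: a very short speech run with non-speech on both sides
    # is treated as a false alarm and relabelled as non-speech.
    max_blip = int(round(max_speech_blip_sec / frame_sec))
    i = 0
    while i < n:
        if labels[i]:
            j = i
            while j < n and labels[j]:
                j += 1
            if j - i <= max_blip and i > 0 and j < n:
                for k in range(i, j):
                    labels[k] = False
            i = j
        else:
            i += 1

    # Assumed rule 2: keep only non-speech runs that are long enough
    # to hold an audio description.
    sections: List[Tuple[float, float]] = []
    i = 0
    while i < n:
        if not labels[i]:
            j = i
            while j < n and not labels[j]:
                j += 1
            if (j - i) * frame_sec >= min_section_sec:
                sections.append((i * frame_sec, j * frame_sec))
            i = j
        else:
            i += 1
    return sections
```

Calling `find_non_dialog_sections(decisions)` on a list of per-frame decisions returns the candidate insertion intervals. In practice the minimum length would be tuned to the shortest description an author is willing to record.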

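This paper presents a method for detecting non-dialog sections in broadcast audio for the insertion of audio description. To classify the dialog and non-dialog sections of broadcast audio, voice activity must first be detected in a stereo signal in which dialog, background music, and various other sounds are mixed. By examining how broadcast audio is produced, we apply an analysis of the signal's channel characteristics to the detection of dialog voice activity. The proposed method uses the energy ratio between the center channel and the surround component of the broadcast audio as an additional audio feature and combines it with the voice activity of the center channel to improve performance. In addition, rule-based post-processing, with rules derived from an analysis of actual descriptive video service programs, detects the non-dialog sections into which audio description can be inserted. The method is verified through experiments on real broadcast content.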
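The center-to-surround energy ratio feature can be sketched as follows. The abstract does not say how the center and surround components are obtained from the stereo signal or how the feature is combined with voice activity. This sketch assumes a simple sum/difference decomposition, center = (L + R) / 2 and surround = (L - R) / 2, and a logical AND with a center-channel voice activity decision supplied by the caller. The function names, frame length, and 6 dB threshold are all hypothetical.

```python
import math
from typing import List, Sequence


def frame_energy(x: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(v * v for v in x) / max(len(x), 1)


def center_surround_ratio_db(
    left: Sequence[float],
    right: Sequence[float],
    frame_len: int = 320,
    eps: float = 1e-12,
) -> List[float]:
    """Per-frame energy ratio (dB) between assumed center and surround parts."""
    n = min(len(left), len(right))
    ratios: List[float] = []
    for start in range(0, n - frame_len + 1, frame_len):
        l = left[start:start + frame_len]
        r = right[start:start + frame_len]
        # Assumption: sum/difference signals approximate center and surround.
        center = [(a + b) / 2 for a, b in zip(l, r)]
        surround = [(a - b) / 2 for a, b in zip(l, r)]
        ratio = (frame_energy(center) + eps) / (frame_energy(surround) + eps)
        ratios.append(10 * math.log10(ratio))
    return ratios


def dialog_frames(
    ratios_db: Sequence[float],
    center_vad: Sequence[bool],
    ratio_threshold_db: float = 6.0,
) -> List[bool]:
    """Mark a frame as dialog only if both cues agree (assumed AND rule)."""
    return [
        bool(vad) and r >= ratio_threshold_db
        for r, vad in zip(ratios_db, center_vad)
    ]
```

The intuition behind the feature is that dialog is typically mixed equally into both stereo channels, so frames dominated by dialog show a high center-to-surround ratio, while wide stereo music or ambience lowers it.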
