• Title/Abstract/Keywords: audio identification

Search results: 53

Audio Fingerprinting Based on Constant Q Transform for TV Commercial Advertisement Identification

  • Ryu, Sang-Hyun; Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / Vol. 33, No. 3 / pp. 210-215 / 2014
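  • Audio fingerprinting must identify the source audio reliably even when it is distorted by noise, echo, and similar degradations. We apply audio fingerprinting to the identification of TV commercials. This paper proposes a robust audio fingerprinting scheme for TV commercial identification. In the proposed method, salient audio peak-pair fingerprints extracted from a Constant-Q transform improve the accuracy of the audio fingerprinting system in a variety of real noisy environments while keeping computational complexity low. Experimental results show that, compared with conventional audio fingerprinting schemes, the proposed scheme provides stable and reliable retrieval accuracy across a variety of noise conditions.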
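
The Constant-Q transform spaces its bins geometrically, so every bin has the same ratio of center frequency to bandwidth (the Q factor). A minimal sketch of that bin layout using the standard textbook definitions follows; the values of `f_min`, `bins_per_octave`, `n_bins`, and the sample rate are illustrative assumptions, not the parameters used in the paper.

```python
import math


def cqt_layout(f_min, bins_per_octave, n_bins, sample_rate):
    """Return (center frequencies, Q, window lengths) for a constant-Q filterbank.

    Standard constant-Q definitions (illustrative parameters, not the paper's):
      f_k = f_min * 2**(k / B)
      Q   = 1 / (2**(1 / B) - 1)
      N_k = ceil(Q * sample_rate / f_k)
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
    # Lower bins need longer analysis windows to keep the frequency/bandwidth ratio fixed.
    lengths = [math.ceil(q * sample_rate / f) for f in freqs]
    return freqs, q, lengths


freqs, q, lengths = cqt_layout(f_min=110.0, bins_per_octave=12, n_bins=25, sample_rate=8000)
print(f"Q = {q:.2f}")
print(f"bin 12: {freqs[12]:.1f} Hz, window = {lengths[12]} samples")
```

The geometric spacing gives equal resolution per musical semitone, which is one reason constant-Q representations are attractive for music-heavy material such as commercials.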

Background music monitoring framework and dataset for TV broadcast audio

  • Hyemi Kim; Junghyun Kim; Jihyun Park; Seongwoo Kim; Chanjin Park; Wonyoung Yoo
    • ETRI Journal / Vol. 46, No. 4 / pp. 697-707 / 2024
  • Music identification is widely regarded as a solved problem for music searching in quiet environments, but its performance tends to degrade in TV broadcast audio owing to the presence of dialogue or sound effects. In addition, constructing an accurate dataset for measuring the performance of background music monitoring in TV broadcast audio is challenging. We propose a framework for monitoring background music by automatic identification and introduce a background music cue sheet. The framework comprises three main components: music identification, music-speech separation, and music detection. In addition, we introduce the Cue-K-Drama dataset, which includes reference songs, audio tracks from 60 episodes of five Korean TV drama series, and corresponding cue sheets that provide the start and end timestamps of background music. Experimental results on the constructed and existing datasets demonstrate that the proposed framework, which incorporates music identification with music-speech separation and music detection, effectively enhances TV broadcast audio monitoring.
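
The framework's output is a cue sheet of background-music start and end timestamps. One plausible way to turn frame-level results into such entries is to merge consecutive frames that the detector marks as music and the identifier assigns to the same reference song. The sketch below assumes a hypothetical per-frame input (a song ID, or `None` when no music is detected), a fixed hop size, and a hypothetical `min_duration_s` filter; it is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CueEntry:
    song_id: str
    start_s: float
    end_s: float


def build_cue_sheet(frame_labels: List[Optional[str]], hop_s: float,
                    min_duration_s: float = 0.0) -> List[CueEntry]:
    """Merge runs of identical per-frame song IDs into cue-sheet entries.

    frame_labels[i] is a hypothetical per-frame result: the identified song ID
    when music was detected in frame i, or None for speech, effects, or silence.
    """
    entries: List[CueEntry] = []
    run_id: Optional[str] = None
    run_start = 0
    # A trailing None sentinel flushes the final run.
    for i, label in enumerate(list(frame_labels) + [None]):
        if label != run_id:
            if run_id is not None:
                start, end = run_start * hop_s, i * hop_s
                if end - start >= min_duration_s:
                    entries.append(CueEntry(run_id, start, end))
            run_id, run_start = label, i
    return entries


labels = [None, "songA", "songA", "songA", None, "songB", "songB"]
for entry in build_cue_sheet(labels, hop_s=0.5):
    print(entry)
```

A minimum-duration filter is a simple way to suppress spurious one-frame matches; a real system would likely also bridge short gaps caused by dialogue.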

Robust Person Identification Using Optimal Reliability in Audio-Visual Information Fusion

  • Tariquzzaman, Md.; Kim, Jin-Young; Na, Seung-You; Choi, Seung-Ho
    • The Journal of the Acoustical Society of Korea / Vol. 28, No. 3E / pp. 109-117 / 2009
  • Reliable identity recognition in real environments is a key issue in human-computer interaction (HCI). In this paper, we present a robust person identification system based on a score-based optimal reliability measure for the audio and visual modalities. We propose an extension of the modified reliability function by introducing optimizing parameters for both the audio and visual modalities. To degrade the visual signals, we applied JPEG compression to the test images. In addition, to create a mismatch between the enrollment and test sessions, acoustic babble noise and artificial illumination were added to the test audio and visual signals, respectively. Local PCA was applied to both modalities to reduce the dimension of the feature vectors. We used a swarm intelligence algorithm, particle swarm optimization (PSO), to optimize the parameters of the modified reliability function. The person identification experiments were performed on the VidTIMIT database. Experimental results show that the proposed optimal reliability measures improved identification accuracy by 7.73% and 8.18% under varying illumination direction for the visual signal and babble noise for the audio signal, respectively, compared with the best classifier in the fusion system, while preserving the modality reliability statistics; this supports the consistency of the proposed extension.
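
Score-level fusion weights each modality by how reliable its scores look for the current test sample. A minimal sketch follows, assuming a common reliability heuristic (the margin between the best and second-best normalized scores) in place of the paper's optimized reliability function, whose exact form is not given here. The exponents `alpha_a` and `alpha_v` are hypothetical stand-ins for the kind of tunable parameters that PSO would optimize.

```python
from typing import Dict


def minmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize classifier scores to [0, 1] (higher = better match)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}


def reliability(scores: Dict[str, float], alpha: float = 1.0) -> float:
    """Margin between the best and second-best scores, raised to alpha.

    A large margin suggests the modality is confident for this sample.
    """
    top = sorted(scores.values(), reverse=True)
    margin = top[0] - top[1] if len(top) > 1 else top[0]
    return margin ** alpha


def fuse(audio: Dict[str, float], visual: Dict[str, float],
         alpha_a: float = 1.0, alpha_v: float = 1.0) -> str:
    """Return the identity with the highest reliability-weighted fused score.

    Both score dictionaries are assumed to cover the same enrolled identities.
    """
    a, v = minmax(audio), minmax(visual)
    ra, rv = reliability(a, alpha_a), reliability(v, alpha_v)
    wa = ra / (ra + rv) if ra + rv > 0 else 0.5  # fall back to equal weights
    wv = 1.0 - wa
    fused = {k: wa * a[k] + wv * v[k] for k in a}
    return max(fused, key=fused.get)


# Noisy audio barely separates p2 from p3; the visual scores are more decisive.
audio = {"p1": 0.0, "p2": 1.0, "p3": 0.95}
visual = {"p1": 1.0, "p2": 0.6, "p3": 0.0}
print(fuse(audio, visual))
```

With both exponents set to 0 every margin becomes 1, which reduces the sketch to equal-weight fusion; in this example that changes the decision, illustrating why reliability weighting helps under modality-specific degradation.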

Frequency-Temporal Filtering for a Robust Audio Fingerprinting Scheme in Real-Noise Environments

  • Park, Man-Soo; Kim, Hoi-Rin; Yang, Seung-Hyun
    • ETRI Journal / Vol. 28, No. 4 / pp. 509-512 / 2006
  • In real environments, sound recordings are commonly distorted by channel and background noise, which are the main causes of degraded audio identification performance. Recently, Philips introduced a robust and efficient audio fingerprinting scheme that applies a differential (high-pass) filter to the frequency-time sequence of perceptual filter-bank energies. In practice, however, robustness in real environments remains a concern. In this letter, we introduce alternative frequency-temporal filtering combinations that extend Philips' audio fingerprinting scheme to improve robustness to channel and background noise under real conditions. Our experimental results show that the proposed filtering combination improves noise robustness in audio identification.
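
For reference, the Philips baseline that this letter extends derives each sub-fingerprint bit from the sign of a frequency difference followed by a time difference of band energies (33 bands yield 32 bits per frame in the original description). A minimal sketch of that baseline rule on a precomputed energy matrix follows; band-energy extraction is omitted, and the letter's alternative filter combinations are not reproduced.

```python
from typing import List

Matrix = List[List[float]]


def philips_bits(energy: Matrix) -> List[List[int]]:
    """Binary sub-fingerprints from band energies E[n][m] (frame n, band m).

    Bit (n, m) = 1 if (E[n][m] - E[n][m+1]) - (E[n-1][m] - E[n-1][m+1]) > 0.
    A global gain scales every difference by the same positive factor, so the
    bits are unchanged; with log energies, any time-invariant per-band offset
    (e.g., a fixed equalizer) cancels exactly.
    """
    bits = []
    for n in range(1, len(energy)):
        cur, prev = energy[n], energy[n - 1]
        row = []
        for m in range(len(cur) - 1):
            d = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            row.append(1 if d > 0 else 0)
        bits.append(row)
    return bits


def hamming(a: List[int], b: List[int]) -> int:
    """Number of differing bits between two equal-length sub-fingerprints."""
    return sum(x != y for x, y in zip(a, b))


E = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [1.0, 2.0, 3.0]]
print(philips_bits(E))
```

Matching then compares query and reference bit blocks by Hamming distance, so the choice of frequency and temporal filters directly determines which distortions leave the bits unchanged.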

Robust Audio Fingerprinting Method Using Prominent Peak Pair Based on Modulated Complex Lapped Transform

  • Kim, Hyoung-Gook; Kim, Jin Young
    • ETRI Journal / Vol. 36, No. 6 / pp. 999-1007 / 2014
  • The robustness of an audio fingerprinting system in an actual noisy environment is a major challenge for audio-based content identification. This paper proposes a high-performance audio fingerprint extraction method for use in portable consumer devices. In the proposed method, a salient audio peak-pair fingerprint, based on a modulated complex lapped transform, improves the accuracy of the audio fingerprinting system in actual noisy environments with low computational complexity. Experimental results confirm that the proposed method is quite robust in different noise conditions and achieves promising preliminary accuracy results.
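
Peak-pair fingerprints are generally formed by picking local maxima in a time-frequency representation (an MCLT magnitude in this paper) and hashing pairs of nearby peaks. A minimal sketch of that generic landmark idea on a precomputed magnitude matrix follows; the neighborhood `radius`, `max_dt`, and `fan_out` are hypothetical parameters, and the paper's salience criteria for selecting prominent peaks are not reproduced.

```python
from typing import List, Tuple

Peak = Tuple[int, int]  # (frame index, frequency bin)


def find_peaks(mag: List[List[float]], radius: int = 1, min_level: float = 0.0) -> List[Peak]:
    """Return (t, f) points that are strict maxima within a (2r+1) x (2r+1) window."""
    peaks = []
    n_t, n_f = len(mag), len(mag[0])
    for t in range(n_t):
        for f in range(n_f):
            v = mag[t][f]
            if v <= min_level:
                continue
            neighbors = [
                mag[tt][ff]
                for tt in range(max(0, t - radius), min(n_t, t + radius + 1))
                for ff in range(max(0, f - radius), min(n_f, f + radius + 1))
                if (tt, ff) != (t, f)
            ]
            if all(v > x for x in neighbors):
                peaks.append((t, f))
    return peaks


def peak_pairs(peaks: List[Peak], max_dt: int = 10,
               fan_out: int = 3) -> List[Tuple[Tuple[int, int, int], int]]:
    """Pair each anchor peak with up to fan_out later peaks within max_dt frames.

    Returns (hash, anchor_time) tuples with hash = (f1, f2, dt). The hash is
    independent of absolute time; the anchor time is kept for offset voting.
    """
    ordered = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(ordered):
        count = 0
        for t2, f2 in ordered[i + 1:]:
            dt = t2 - t1
            if dt == 0:
                continue  # skip peaks in the same frame
            if dt > max_dt or count >= fan_out:
                break  # peaks are time-sorted, so later ones are farther away
            out.append(((f1, f2, dt), t1))
            count += 1
    return out


mag = [[1, 5, 1, 1], [1, 1, 1, 1], [1, 1, 1, 7], [1, 1, 1, 1]]
print(peak_pairs(find_peaks(mag)))
```

Matching would then count, per reference track, how many query hashes agree on the same difference of anchor times; keeping only a few salient peaks per region is what keeps this cheap enough for portable devices.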

Robust Audio Identification Using Spectro-Temporal Subband Centroids

  • Seo, Jin-Soo; Lee, Seung-Jae
    • The Journal of the Acoustical Society of Korea / Vol. 27, No. 5 / pp. 239-243 / 2008
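  • This paper proposes an audio identification method that combines the frequency-direction and time-direction characteristics of the spectrum. To model the shape of the spectrum, the spectrum is divided into subbands, and the centroids along the frequency and time directions are computed, normalized, and used as identification features. The centroids represent the shape of the spectrum well while remaining compact, which reduces the size of the feature database used by the identifier. The identification performance of the frequency-direction and time-direction subband centroids was compared on a collection of several thousand songs. The experiments showed that combining frequency- and time-direction features improves identification performance in a complementary way, and a method that merges the two into a single feature using a linear transform is proposed.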
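
Subband centroids summarize where the energy sits inside each band, along frequency within one frame or along time across a block of frames. A minimal sketch follows, assuming a power spectrogram given as frames of bins, hypothetical band `edges`, and normalization of each centroid to [0, 1] within its band or block; the paper's band layout, normalization, and linear combining transform are not reproduced.

```python
from typing import List, Sequence


def freq_centroids(frame: Sequence[float], edges: Sequence[int]) -> List[float]:
    """Frequency-direction centroid of each subband [edges[b], edges[b+1]),
    normalized to [0, 1] within the band (0 = lowest bin, 1 = highest bin)."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = frame[lo:hi]
        total = sum(band)
        if total <= 0 or hi - lo < 2:
            out.append(0.5)  # degenerate band: fall back to the midpoint
            continue
        c = sum(i * p for i, p in enumerate(band)) / total
        out.append(c / (hi - lo - 1))
    return out


def time_centroids(block: Sequence[Sequence[float]], edges: Sequence[int]) -> List[float]:
    """Time-direction centroid of each subband's energy over a block of frames,
    normalized to [0, 1] (0 = first frame, 1 = last frame)."""
    n = len(block)
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        energies = [sum(frame[lo:hi]) for frame in block]
        total = sum(energies)
        if total <= 0 or n < 2:
            out.append(0.5)  # degenerate block: fall back to the midpoint
            continue
        c = sum(t * e for t, e in enumerate(energies)) / total
        out.append(c / (n - 1))
    return out


print(freq_centroids([0, 0, 1, 0, 1, 1, 1, 1], [0, 4, 8]))
print(time_centroids([[1, 0], [0, 0], [0, 1]], [0, 1, 2]))
```

One scalar per subband per direction is what makes the feature database compact; the two centroid vectors could then be concatenated and projected by a learned linear transform, as the abstract describes.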

Towards automatic inspection of nuclear fuel elements in spent fuel pools: Audio analysis

  • Sergio Segovia; Angel Ramos; David Izard; Doroteo T. Toledano
    • Nuclear Engineering and Technology / Vol. 56, No. 10 / pp. 4062-4067 / 2024
  • In this article, we propose and explore a novel step in digitizing the map of the spent fuel pool of nuclear power plants, in which the audio signal from the operator's microphone is used to obtain the identification codes of the components in each cell of the pool. This yields not only an acquisition system but also a verification system that can be combined with the results of video signal analysis. We developed an algorithm built around one of the latest multilingual Automatic Speech Recognition models to transcribe the audio signal, and we build the identification codes of fuel heads and other components by post-processing the timed transcriptions. Results show very high accuracy on real recordings from Spanish nuclear facilities, and the proposed methodology is easily extensible to other nuclear facilities worldwide.
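
The post-processing step turns timed ASR output into component identification codes. The actual code format and rules are not described in the abstract, so the sketch below is purely illustrative: it assumes hypothetical word-level timestamps, handles only Spanish digit words (real codes likely also contain letters), and starts a new code whenever the pause between digit words exceeds a hypothetical `max_gap_s` threshold.

```python
from typing import List, Optional, Tuple

# Spanish digit words 0-9.
DIGITS = {
    "cero": "0", "uno": "1", "dos": "2", "tres": "3", "cuatro": "4",
    "cinco": "5", "seis": "6", "siete": "7", "ocho": "8", "nueve": "9",
}

Word = Tuple[str, float, float]  # (text, start_s, end_s) from a timed transcript


def codes_from_words(words: List[Word], max_gap_s: float = 1.0) -> List[str]:
    """Group digit words separated by short pauses into digit strings."""
    codes: List[str] = []
    current = ""
    last_end: Optional[float] = None
    for text, start, end in words:
        digit = DIGITS.get(text.lower().strip(".,"))
        if digit is None:
            continue  # non-digit words are ignored in this simplified sketch
        if current and last_end is not None and start - last_end > max_gap_s:
            codes.append(current)  # long pause: close the current code
            current = ""
        current += digit
        last_end = end
    if current:
        codes.append(current)
    return codes


words = [("tres", 0.0, 0.3), ("siete", 0.35, 0.7), ("Cero.", 0.75, 1.1),
         ("siguiente", 2.0, 2.5), ("dos", 3.0, 3.2), ("nueve", 3.3, 3.6)]
print(codes_from_words(words))
```

Using the word timestamps, rather than the flat transcript text, is what lets pauses delimit one component's code from the next.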

Robust Audio Fingerprinting Using Compressed-Domain Features

  • Seo, Jin-Soo; Lee, Seung-Jae
    • The Journal of the Acoustical Society of Korea / Vol. 28, No. 4 / pp. 375-382 / 2009
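  • This paper proposes an audio fingerprinting method that uses compressed-domain features. Working in the compressed domain substantially reduces computation and processing time. Specifically, the method uses the MDCT domain, which is widely used in audio compression: the MDCT coefficients are divided into subbands, and a fingerprint is derived from each of three representative moment features, namely energy, centroid, and flatness. The extracted features are differentially filtered and their signs are taken to obtain a binary fingerprint. The identification performance of the fingerprints obtained from these MDCT-domain features was compared experimentally on several thousand songs under a variety of signal transformations, and subband energy yielded the best fingerprinting performance.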
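
The three subband moment features and the difference-and-sign binarization can be sketched as follows. This assumes MDCT coefficients are already available per frame (for example, parsed from a compressed bitstream), uses hypothetical band `edges`, and applies a simple first-order time difference as the differential filter; the paper's exact filter and band layout may differ.

```python
import math
from typing import List, Sequence, Tuple


def band_features(coeffs: Sequence[float],
                  edges: Sequence[int]) -> Tuple[List[float], List[float], List[float]]:
    """Per-subband energy, spectral centroid (bin index), and flatness."""
    energy, centroid, flatness = [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        power = [c * c + 1e-12 for c in coeffs[lo:hi]]  # small floor avoids log(0)
        total = sum(power)
        energy.append(total)
        centroid.append(sum((lo + i) * p for i, p in enumerate(power)) / total)
        # Flatness = geometric mean / arithmetic mean (1.0 for a perfectly flat band).
        geo = math.exp(sum(math.log(p) for p in power) / len(power))
        flatness.append(geo / (total / len(power)))
    return energy, centroid, flatness


def binarize(feature_frames: List[List[float]]) -> List[List[int]]:
    """Sign of the frame-to-frame difference of each subband feature."""
    return [
        [1 if c - p > 0 else 0 for c, p in zip(cur, prev)]
        for prev, cur in zip(feature_frames[:-1], feature_frames[1:])
    ]


e, c, f = band_features([1, 1, 1, 1, 2, 0, 0, 0], [0, 4, 8])
print(e, c, f)
```

Because only signs of differences are kept, each feature contributes one bit per subband per frame, and any of the three features can be fed to `binarize` to compare their robustness as the paper does.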

A Robust Audio Fingerprinting Method Based on Segmentation Boundaries

  • Seo, Jin-Soo
    • The Journal of the Acoustical Society of Korea / Vol. 31, No. 4 / pp. 260-265 / 2012
  • A robust audio fingerprinting method is presented based on segmentation boundaries. In order to obtain robustness against linear speed changes, fingerprint extraction and matching are synchronized with the segmentation boundaries. Experimental results show that the proposed method is also robust against other common audio processing steps including low bit-rate compression, equalization, and time-scale modification.
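
The key observation is that a linear speed change stretches every segment by the same factor, so extracting features relative to detected boundaries, and resampling each segment to a fixed length, keeps query and reference fingerprints aligned. A minimal sketch of that normalization follows, assuming a hypothetical novelty curve with a simple local-maximum boundary detector and a hypothetical segment length `n`; the paper's actual segmentation method is not reproduced.

```python
from typing import List, Sequence


def boundaries(novelty: Sequence[float], threshold: float) -> List[int]:
    """Frame indices where the novelty curve has a local maximum above threshold."""
    return [
        i for i in range(1, len(novelty) - 1)
        if novelty[i] > threshold
        and novelty[i] >= novelty[i - 1] and novelty[i] > novelty[i + 1]
    ]


def resample(seg: Sequence[float], n: int) -> List[float]:
    """Linearly interpolate a segment to exactly n points (n >= 2)."""
    if len(seg) == 1:
        return [float(seg[0])] * n
    out = []
    for j in range(n):
        x = j * (len(seg) - 1) / (n - 1)
        i = min(int(x), len(seg) - 2)
        frac = x - i
        out.append(seg[i] * (1 - frac) + seg[i + 1] * frac)
    return out


def segment_fingerprints(feature: Sequence[float], bounds: List[int],
                         n: int = 8) -> List[List[int]]:
    """Resample each boundary-to-boundary segment to n points, then binarize by
    the sign of successive differences (a fixed-length code per segment)."""
    codes = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        seg = resample(feature[a:b + 1], n)
        codes.append([1 if y > x else 0 for x, y in zip(seg[:-1], seg[1:])])
    return codes


orig = [0, 1, 2, 3, 4, 3, 2, 1, 0]
stretched = resample(orig, 17)  # the same content played at half speed
print(segment_fingerprints(orig, [0, 8], n=5))
print(segment_fingerprints(stretched, [0, 16], n=5))
```

Both calls produce the same code because the boundaries move with the content; a fingerprint computed on a fixed frame grid would instead drift out of alignment as the speed change accumulates.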

The Effect of Audio and Visual Cues on Korean and Japanese EFL Learners' Perception of English Liquids

  • Chung, Hyun-Song
    • English Language & Literature Teaching (영어어문교육) / Vol. 11, No. 2 / pp. 135-148 / 2005
  • This paper investigated the effect of audio and visual cues on Korean and Japanese EFL learners' perception of the lateral/retroflex contrast in English. In a perception experiment, the two English consonants /l/ and /r/ were embedded in initial and medial positions in nonsense words in the context of the vowels /i, a, u/. Singletons and clusters were included in the speech material. Audio and video recordings were made of a total of 108 items. The items were presented to Korean and Japanese learners of English in three conditions: audio-alone (A), visual-alone (V), and audio-visual presentation (AV). The results showed no evidence of an AV benefit for the perception of the /l/-/r/ contrast for either Korean or Japanese learners of English. Korean listeners showed much better identification rates for the /l/-/r/ contrast than Japanese listeners in the audio and audio-visual conditions.