• Title/Summary/Keyword: Audio identification

Audio Fingerprinting Based on Constant Q Transform for TV Commercial Advertisement Identification (TV 광고 식별을 위한 Constant-Q 변환 기반의 오디오 핑거프린팅 방식)

  • Ryu, Sang Hyeon;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea, v.33 no.3, pp.210-215, 2014
  • Despite distortion caused by noise and echo, an audio fingerprinting technique must still identify an audio source successfully. Such a technique is applied here to TV commercial advertisement identification. In this paper, we propose a robust audio fingerprinting method for TV commercial advertisement identification. In the proposed method, a prominent audio peak-pair fingerprint based on the constant Q transform improves the accuracy of the audio fingerprinting system in real noisy environments. Experimental results confirm that the proposed method is more robust than the previous audio fingerprinting method under different noise conditions and achieves promising accuracy.

Robust Person Identification Using Optimal Reliability in Audio-Visual Information Fusion

  • Tariquzzaman, Md.;Kim, Jin-Young;Na, Seung-You;Choi, Seung-Ho
    • The Journal of the Acoustical Society of Korea, v.28 no.3E, pp.109-117, 2009
  • Reliable identity recognition in a real environment is a key issue in human-computer interaction (HCI). In this paper, we present a robust person identification system that uses a score-based optimal reliability measure of the audio and visual modalities. We propose an extension of the modified reliability function that introduces optimizing parameters for both the audio and the visual modality. To degrade the visual signals, we applied JPEG compression to the test images. In addition, to create a mismatch between the enrollment and test sessions, acoustic babble noise and artificial illumination were added to the test audio and visual signals, respectively. Local PCA was used on both modalities to reduce the dimension of the feature vector. We applied a swarm intelligence algorithm, particle swarm optimization, to optimize the parameters of the modified convection function. The person identification experiments were performed using the VidTimit DB. Experimental results show that the proposed optimal reliability measures improved identification accuracy by 7.73% and 8.18% under different illumination directions for the visual signal and the corresponding babble noise for the audio signal, respectively, compared with the best classifier in the fusion system. They also maintained the modality reliability statistics in terms of performance, which verifies the consistency of the proposed extension.

Frequency-Temporal Filtering for a Robust Audio Fingerprinting Scheme in Real-Noise Environments

  • Park, Man-Soo;Kim, Hoi-Rin;Yang, Seung-Hyun
    • ETRI Journal, v.28 no.4, pp.509-512, 2006
  • In a real environment, sound recordings are commonly distorted by channel and background noise, which are the main causes of degraded audio identification performance. Recently, Philips introduced a robust and efficient audio fingerprinting scheme that applies differential (high-pass) filtering to the frequency-time sequence of the perceptual filter-bank energies. In practice, however, the robustness of this scheme in a real environment remains an important concern. In this letter, we introduce alternatives to the frequency-temporal filtering combination as an extension of Philips' audio fingerprinting scheme, in order to achieve robustness to channel and background noise under real conditions. Our experimental results show that the proposed filtering combination improves noise robustness in audio identification.
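
The differential filtering mentioned in this abstract can be illustrated with a minimal sketch of the baseline Philips-style bit derivation: each bit is the sign of the energy difference between adjacent bands, differenced again between adjacent frames. This is not the alternative filter combination the letter proposes; the function names and the list-of-lists input layout are assumptions made for illustration.

```python
def fingerprint_bits(energies):
    """Derive sign bits from filter-bank energies.

    energies: list of frames, each a list of per-band energies
    (hypothetical layout). Returns one row of bits per frame
    transition, with one bit per adjacent band pair.
    """
    bits = []
    for n in range(1, len(energies)):
        prev, cur = energies[n - 1], energies[n]
        row = []
        for m in range(len(cur) - 1):
            # Difference along frequency, then along time.
            d = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            row.append(1 if d > 0 else 0)
        bits.append(row)
    return bits


def bit_error_rate(a, b):
    """Fraction of differing bits between two equally sized fingerprints."""
    pairs = [(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(x != y for x, y in pairs) / len(pairs)
```

Because only the sign of the double difference is kept, multiplying every energy by the same positive gain leaves the bits unchanged, which is one reason this kind of filtering tolerates level changes.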

Robust Audio Fingerprinting Method Using Prominent Peak Pair Based on Modulated Complex Lapped Transform

  • Kim, Hyoung-Gook;Kim, Jin Young
    • ETRI Journal, v.36 no.6, pp.999-1007, 2014
  • The robustness of an audio fingerprinting system in an actual noisy environment is a major challenge for audio-based content identification. This paper proposes a high-performance audio fingerprint extraction method for use in portable consumer devices. In the proposed method, a salient audio peak-pair fingerprint, based on a modulated complex lapped transform, improves the accuracy of the audio fingerprinting system in actual noisy environments with low computational complexity. Experimental results confirm that the proposed method is quite robust in different noise conditions and achieves promising preliminary accuracy results.

Robust Audio Identification Using Spectro-Temporal Subband Centroids (부밴드 스펙트럼의 무게중심을 이용한 강인한 오디오 인식기)

  • Seo, Jin-Soo;Lee, Seung-Jae
    • The Journal of the Acoustical Society of Korea, v.27 no.5, pp.239-243, 2008
  • This paper proposes a new audio identification method based on a combination of the instantaneous and dynamic spectral features of the audio spectrum. In particular, we propose spectro-temporal subband centroids, which are easy to compute and effective at summarizing the instantaneous and dynamic spectral variations. Experimental results demonstrate that identification performance can be greatly improved by combining the spectral and the temporal subband centroids.
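
As a rough illustration of the spectral half of this feature, the sketch below computes a generic power-weighted centroid for each subband of one spectrum frame. The abstract does not define the temporal centroid or the band layout, so the `band_edges` argument, the bin-index units, and the midpoint fallback for silent bands are all assumptions.

```python
def subband_centroids(power, band_edges):
    """Power-weighted mean bin index within each subband.

    power: list of non-negative power values, one per frequency bin.
    band_edges: bin indices delimiting the subbands (hypothetical
    layout); band i covers bins band_edges[i] .. band_edges[i+1]-1.
    """
    centroids = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        total = sum(power[lo:hi])
        if total == 0:
            # Assumed convention: a silent band reports its midpoint.
            centroids.append((lo + hi - 1) / 2)
        else:
            weighted = sum(k * power[k] for k in range(lo, hi))
            centroids.append(weighted / total)
    return centroids
```

For example, `subband_centroids([0, 1, 0, 0, 1, 1, 1, 1], [0, 4, 8])` places the first centroid on the single active bin and the second in the middle of its uniformly filled band.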

Robust Audio Fingerprinting Using Compressed-Domain Features (압축 도메인 특징을 이용한 강인한 오디오 핑거프린팅)

  • Seo, Jin-Soo;Lee, Seung-Jae
    • The Journal of the Acoustical Society of Korea, v.28 no.4, pp.375-382, 2009
  • This paper proposes a new audio fingerprinting method based on compressed-domain features. Operating in the compressed domain greatly improves the computational efficiency of the proposed method. In particular, we work in the MDCT domain, which is widely employed in audio compression, and extract three kinds of subband features: energy, centroid, and flatness. Binary audio fingerprints are obtained by taking the sign of each feature after differential filtering. The identification performance of the three kinds of fingerprints is compared experimentally. Among the compressed-domain subband features considered, the subband energy showed the best performance for fingerprinting.
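
Two of the three subband features named here can be sketched with their usual textbook definitions: energy as the sum of squared coefficients, and flatness as the ratio of the geometric mean to the arithmetic mean of the power values. Treating squared MDCT coefficients as power, and the small `eps` guard against taking the log of zero, are assumptions; the paper's exact definitions may differ.

```python
import math


def subband_energy(coeffs):
    """Sum of squared coefficients in one subband."""
    return sum(c * c for c in coeffs)


def subband_flatness(coeffs, eps=1e-12):
    """Geometric mean over arithmetic mean of the subband power.

    Values near 1 indicate a noise-like (flat) band, values near 0
    a band dominated by a few strong coefficients.
    """
    power = [c * c + eps for c in coeffs]  # eps avoids log(0)
    geo = math.exp(sum(math.log(p) for p in power) / len(power))
    arith = sum(power) / len(power)
    return geo / arith
```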

A Robust Audio Fingerprinting Method Based on Segmentation Boundaries

  • Seo, Jin-Soo
    • The Journal of the Acoustical Society of Korea, v.31 no.4, pp.260-265, 2012
  • A robust audio fingerprinting method is presented based on segmentation boundaries. In order to obtain robustness against linear speed changes, fingerprint extraction and matching are synchronized with the segmentation boundaries. Experimental results show that the proposed method is also robust against other common audio processing steps including low bit-rate compression, equalization, and time-scale modification.

The Effect of Audio and Visual Cues on Korean and Japanese EFL Learners' Perception of English Liquids

  • Chung, Hyun-Song
    • English Language & Literature Teaching, v.11 no.2, pp.135-148, 2005
  • This paper investigated the effect of audio and visual cues on Korean and Japanese EFL learners' perception of the lateral/retroflex contrast in English. In a perception experiment, the two English consonants /l/ and /r/ were embedded in initial and medial position in nonsense words in the context of the vowels /i, a, u/. Singletons and clusters were included in the speech material. Audio and video recordings were made using a total of 108 items. The items were presented to Korean and Japanese learners of English in three conditions: audio-alone (A), visual-alone (V) and audio-visual presentation (AV). The results showed that there was no evidence of AV benefit for the perception of the /l/-/r/ contrast for either Korean or Japanese learners of English. Korean listeners showed much better identification rates of the /l/-/r/ contrast than Japanese listeners when presented in audio or audio-visual conditions.

Audio Fingerprint Binarization by Minimizing Hinge-Loss Function (경첩 손실 함수 최소화를 통한 오디오 핑거프린트 이진화)

  • Seo, Jin Soo
    • The Journal of the Acoustical Society of Korea, v.32 no.5, pp.415-422, 2013
  • This paper proposes a robust binary audio fingerprinting method based on minimizing a hinge-loss function. In the proposed method, the fingerprints are binary, which helps reduce the size of the fingerprint DB. In general, binarizing features for fingerprinting degrades the performance of a fingerprinting system, in particular its robustness and discriminability, so this performance loss must be minimized. Since the similarity between two audio clips is represented by a hinge-like function, we propose a method that derives a binary fingerprint by minimizing a hinge-loss function. The derived hinge-loss function is minimized using minimal loss hashing. Experiments over thousands of songs demonstrate that the identification performance of binary fingerprinting can be improved by minimizing the proposed hinge-loss function.

Development of Audio Watermarking Technique using Group Quantization (그룹 양자화를 이용한 오디오 워터마킹 기술 개발)

  • Shin Seungwon;Park Changmok;Kim Jongweon;Choi Jonguk
    • Proceedings of the Acoustical Society of Korea Conference, spring, pp.323-326, 2002
  • In this paper, we propose a watermarking technique that makes it possible to filter out illegal content from the content scattered across the internet. Identification is performed using a unique content ID embedded by the watermarking technique. The proposed technique withstands A/D-D/A conversion and a range of lossy compression formats such as MP3, AAC, WMA and Real Audio. Watermark robustness is achieved through group quantization, selection of the watermark insertion point, and an error correction code. Test results show that the correct extraction rate is about 90% and the SNR is above 50~60 dB. These figures mean that the proposed technique is able to extract the encoded information at least once per audio clip and that it is very difficult to distinguish a watermarked audio clip from the original.
