• Title/Summary/Keyword: audio identification

Search results: 50 (processing time: 0.022 seconds)

The Effect of Visual Cues in the Identification of the English Consonants /b/ and /v/ by Native Korean Speakers (한국어 화자의 영어 양순음 /b/와 순치음 /v/ 식별에서 시각 단서의 효과)

  • Kim, Yoon-Hyun;Koh, Sung-Ryong;Valerie, Hazan
    • Phonetics and Speech Sciences
    • /
    • v.4 no.3
    • /
    • pp.25-30
    • /
    • 2012
  • This study investigated whether native Korean listeners can use visual cues in identifying the English consonants /b/ and /v/. Both audio-only and audiovisual tokens of minimal word pairs, with the target phonemes in word-initial or word-medial position, were used. Participants were asked to decide which consonant they heard under 2×2 conditions: cue (audio-only, audiovisual) and location (word-initial, word-medial). Mean identification scores were significantly higher in the audiovisual condition than in the audio-only condition, and in word-initial than in word-medial position. Following signal detection theory, sensitivity (d') and response bias (c) were also calculated from the hit and false alarm rates. These measures showed that the higher identification rate in the audiovisual condition was associated with an increase in sensitivity; there were no significant differences in response bias across conditions. The results suggest that native Korean speakers can use visual cues when identifying confusable non-native phonemic contrasts, and that visual cues can enhance non-native speech perception.
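The d' and c measures mentioned in the abstract follow the standard signal detection theory formulas, computed from z-transformed hit and false-alarm rates. A minimal sketch (the example rates are illustrative, not the study's data):

```python
from statistics import NormalDist

def dprime_and_bias(hit_rate, fa_rate):
    """Sensitivity d' and response bias c from hit and false-alarm rates,
    using the standard signal detection theory definitions:
      d' = z(H) - z(F),   c = -(z(H) + z(F)) / 2
    where z is the inverse of the standard normal CDF."""
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    c = -(z(hit_rate) + z(fa_rate)) / 2
    return d_prime, c

# e.g. a listener with 80% hits and 30% false alarms
d, c = dprime_and_bias(0.80, 0.30)  # d ≈ 1.37, c ≈ -0.16
```

A higher d' with unchanged c is exactly the pattern the study reports for the audiovisual condition: better discrimination without a shift in response criterion.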

Improvement of Reliability based Information Integration in Audio-visual Person Identification (시청각 화자식별에서 신뢰성 기반 정보 통합 방법의 성능 향상)

  • Tariquzzaman, Md.;Kim, Jin-Young;Hong, Joon-Hee
    • MALSORI
    • /
    • no.62
    • /
    • pp.149-161
    • /
    • 2007
  • In this paper we propose a modified reliability function for improving bimodal speaker identification (BSI) performance. The conventional reliability function, used by N. Fox [1], is extended by introducing an optimization factor. A BSI system based on GMM was implemented and tested on the VidTIMIT database. Speaker identification experiments verified the usefulness of the proposed method, showing improved performance, i.e., a 39% reduction in error rate.
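The general idea of reliability-based integration is to weight the audio and visual classifier scores by how trustworthy each stream currently is. The abstract does not give the paper's exact reliability function, so the sketch below uses a generic weighted log-likelihood fusion, with `alpha` standing in (hypothetically) for the proposed optimization factor:

```python
def fuse_scores(audio_ll, visual_ll, reliability, alpha=1.0):
    """Reliability-weighted fusion of per-speaker audio and visual
    log-likelihoods.  `reliability` in [0, 1] weights the audio stream;
    `alpha` is a placeholder for the paper's optimization factor (the
    actual reliability function is not specified in the abstract)."""
    w = reliability ** alpha  # hypothetical use of the factor
    return [w * a + (1.0 - w) * v for a, v in zip(audio_ll, visual_ll)]

def identify(audio_ll, visual_ll, reliability):
    """Pick the enrolled speaker with the highest fused score."""
    fused = fuse_scores(audio_ll, visual_ll, reliability)
    return max(range(len(fused)), key=fused.__getitem__)
```

With a reliable audio stream the audio scores dominate; when audio reliability drops (e.g. noisy conditions), the decision shifts toward the visual evidence.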

Speaker Change Detection Based on a Graph-Partitioning Criterion

  • Seo, Jin-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.2
    • /
    • pp.80-85
    • /
    • 2011
  • Speaker change detection involves identifying the time indices of an audio stream at which the identity of the speaker changes. In this paper, we propose novel measures for speaker change detection based on a graph-partitioning criterion over the pairwise distance matrix of the feature-vector stream. Experiments on both synthetic and real-world data show that the proposed approach yields promising results compared with conventional statistical measures.
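The graph-partitioning view treats the frames in an analysis window as graph nodes with edge weights derived from the pairwise distance matrix; a speaker change is a boundary that cuts the graph into two weakly connected halves. The paper's own measures are not given in the abstract, so the sketch below uses a normalized-cut criterion as an illustrative stand-in:

```python
import math

def change_score(dist, t):
    """Normalized-cut score for splitting a window of frames at index t,
    given their pairwise distance matrix `dist`.  Distances become edge
    weights via w = exp(-d).  A true speaker change gives weak cross-
    boundary edges, so it shows up as a local *minimum* of this score.
    (Illustrative criterion; the paper defines its own measures.)"""
    n = len(dist)
    w = [[math.exp(-dist[i][j]) for j in range(n)] for i in range(n)]
    left, right = range(0, t), range(t, n)
    cut = sum(w[i][j] for i in left for j in right)
    assoc_l = sum(w[i][j] for i in left for j in range(n))
    assoc_r = sum(w[i][j] for i in right for j in range(n))
    return cut / assoc_l + cut / assoc_r
```

Sliding the window along the stream and picking boundaries where the score dips below a threshold yields candidate change points.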

Development of Subwoofer for Car Audio System (자동차 오디오용 서브우퍼 개발)

  • Park, Seok-Tae
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference
    • /
    • 2004.11a
    • /
    • pp.166-169
    • /
    • 2004
  • In this paper, computational analysis and experiments on a subwoofer for a car audio speaker system were performed to analyze its acoustical behavior. A ported enclosure system with a subwoofer was manufactured for test and simulation purposes. Subwoofers with single and double voice coils were characterized by linear and nonlinear parameter identification methods for the loudspeaker parameters. For high power inputs to the subwoofer, sound pressure levels obtained with the linear and nonlinear loudspeaker models were compared across input powers. For the subwoofer system at high power, the nonlinear speaker model was shown to be adequate for describing the behavior of the loudspeaker.

Audio Fingerprinting Using a Robust Hash Function Based on the MCLT Peak-Pair (MCLT 피크쌍 기반의 강인한 해시 함수를 이용한 오디오 핑거프린팅)

  • Lee, Jun-Yong;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.34 no.2
    • /
    • pp.157-162
    • /
    • 2015
  • In this paper, we propose an audio fingerprinting method using a robust hash based on MCLT (Modulated Complex Lapped Transform) peak pairs. Existing methods fail to generate robust audio fingerprints under various distortions such as time-scaling, pitch-shifting, and equalization. To address this, our fingerprinting uses the MCLT spectrum, an adaptive thresholding method for detecting prominent peaks, and a novel hash function. Experimental results show that the proposed method is highly robust under various distortions and achieves better identification rates than other methods.
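In peak-pair (landmark-style) fingerprinting, each hash encodes an anchor peak's frequency bin, a target peak's frequency bin, and their time offset. The paper's exact hash function over MCLT peaks is not given in the abstract; the sketch below shows the generic bit-packing idea with illustrative field widths:

```python
def peak_pair_hash(f1, f2, dt, f_bits=9, dt_bits=6):
    """Pack a spectral peak pair into one integer hash: anchor bin f1,
    target bin f2, and frame offset dt between them.  Field widths are
    illustrative, not the paper's (its MCLT-based hash is not specified
    in the abstract)."""
    assert 0 <= f1 < (1 << f_bits) and 0 <= f2 < (1 << f_bits)
    assert 0 <= dt < (1 << dt_bits)
    return (f1 << (f_bits + dt_bits)) | (f2 << dt_bits) | dt
```

At query time the same hashes are computed from the distorted audio and looked up in an index; because each hash depends only on relative peak positions, many survive moderate spectral distortion.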

Hand-held Multimedia Device Identification Based on Audio Source (음원을 이용한 멀티미디어 휴대용 단말장치 판별)

  • Lee, Myung Hwan;Jang, Tae Ung;Moon, Chang Bae;Kim, Byeong Man;Oh, Duk-Hwan
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.19 no.2
    • /
    • pp.73-83
    • /
    • 2014
  • Thanks to the development of diverse audio editing technology, audio files can be easily altered, which can cause social problems such as forgery. Digital forensic technology is actively studied to address these problems. In this paper, a hand-held device identification method, an area of digital forensic technology, is proposed. It exploits noise characteristics of each device, caused by its design and integrated circuits, that are inaudible to listeners. A Wiener filter is used to extract the device noise, acoustic features of the noise are extracted via MIRtoolbox, and these features are used to train a multi-layer neural network. To evaluate the proposed method, we use 5-fold cross-validation on recordings collected from 6 mobile devices; the experiments show an accuracy of 99.9%. We also examine whether the noise features of the mobile devices remain useful after the data are uploaded to UCC, obtaining an accuracy of 99.8% on the UCC data.
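The noise-extraction step works by subtracting a denoised version of the recording from the original, leaving the device-specific residual. A minimal sketch, with a moving-average smoother standing in for the Wiener filter the paper uses (an assumption for illustration only):

```python
def noise_residual(signal, win=5):
    """Estimate the device-noise component of a recording as the
    residual after denoising: noise = signal - smoothed(signal).
    A moving-average smoother stands in here for the Wiener filter
    used in the paper."""
    half = win // 2
    residual = []
    for i in range(len(signal)):
        seg = signal[max(0, i - half):i + half + 1]
        residual.append(signal[i] - sum(seg) / len(seg))
    return residual
```

Acoustic features computed over this residual (rather than over the recording itself) are what make the classifier device-specific instead of content-specific.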

A Study on the Variable Transmission of xHE-AAC Audio Frame (xHE-AAC 오디오 프레임의 가변 전송에 관한 연구)

  • Lee, Bongho;Yang, Kyutae;Lim, Hyoungsoo;Hur, Namho
    • Journal of Broadcast Engineering
    • /
    • v.21 no.3
    • /
    • pp.357-368
    • /
    • 2016
  • In DAB+, the HE-AAC v2 codec is used for fixed-rate transmission of the audio stream. With the xHE-AAC codec, which includes USAC, higher efficiency is expected when variable-length frames are used within the same bandwidth, compared to fixed-frame transmission. For this to be realized, audio streams need to be multiplexed in a sub-channel before transmission, and a method is then required to identify the border of each audio frame. In this paper, a toggled sync byte and an additional identification field, placed sequentially between access unit (AU) borders, are proposed to handle AU border identification. In addition, a Reed-Solomon based error correction code compliant with DAB+ is proposed.
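The toggled-sync-byte idea can be sketched as follows: each variable-length AU in the sub-channel is preceded by a sync byte that alternates between two values, so a decoder can verify it is still aligned on AU borders. The byte values and the 2-byte length field below are illustrative stand-ins for the paper's additional identification field, which the abstract does not detail:

```python
SYNC = (0xB4, 0x4B)  # illustrative toggled sync-byte values

def mux_aus(aus):
    """Multiplex variable-length access units (AUs) into one sub-channel
    byte stream.  Each AU is preceded by an alternating sync byte and a
    2-byte big-endian length field (a stand-in for the paper's
    identification field)."""
    out = bytearray()
    for i, au in enumerate(aus):
        out.append(SYNC[i % 2])
        out += len(au).to_bytes(2, "big")
        out += au
    return bytes(out)

def demux_aus(stream):
    """Recover AU borders by walking the stream and checking that the
    sync byte toggles as expected."""
    aus, pos, i = [], 0, 0
    while pos < len(stream):
        assert stream[pos] == SYNC[i % 2], "lost sync"
        n = int.from_bytes(stream[pos + 1:pos + 3], "big")
        aus.append(stream[pos + 3:pos + 3 + n])
        pos += 3 + n
        i += 1
    return aus
```

Because the sync byte toggles, a decoder that lands mid-stream can detect misalignment rather than silently parsing garbage, which is the point of the border-identification mechanism.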

Comparison of McGurk Effect across Three Consonant-Vowel Combinations in Kannada

  • Devaraju, Dhatri S;U, Ajith Kumar;Maruthy, Santosh
    • Journal of Audiology & Otology
    • /
    • v.23 no.1
    • /
    • pp.39-48
    • /
    • 2019
  • Background and Objectives: The influence of the visual stimulus on the auditory component in the perception of auditory-visual (AV) consonant-vowel syllables has been demonstrated in different languages. Inherent properties of the unimodal stimuli are known to modulate AV integration. The present study investigated how the magnitude of the McGurk effect (an outcome of AV integration) varies across three different consonant combinations in the Kannada language, and examined the role of unimodal syllable identification in the magnitude of the McGurk effect. Subjects and Methods: Twenty-eight individuals performed an AV identification task with ba/ga, pa/ka, and ma/ṇa consonant combinations in AV congruent, AV incongruent (McGurk combination), audio-alone, and visual-alone conditions. Cluster analysis on the identification scores for the incongruent stimuli was used to classify the individuals into two groups, one with high and the other with low McGurk scores, and the audio-alone and visual-alone scores of the two groups were compared. Results: McGurk scores were significantly higher for ma/ṇa than for the ba/ga and pa/ka combinations in both the high and low McGurk score groups; no significant difference was found between ba/ga and pa/ka in either group. Identification of /ṇa/ in the visual-alone condition correlated negatively with higher McGurk scores. Conclusions: The results suggest that the final percept following AV integration is not exclusively explained by unimodal identification of the syllables; other factors may also contribute to the final percept.

Audio Fingerprint Based on Combining Binary Fingerprints (이진 핑거프린트의 결합에 의한 강인한 오디오 핑거프린트)

  • Jang, Dal-Won;Lee, Seok-Pil
    • Journal of Broadcast Engineering
    • /
    • v.17 no.4
    • /
    • pp.659-669
    • /
    • 2012
  • This paper proposes a method for extracting a binary audio fingerprint by combining several base binary fingerprints. The proposed fingerprint is determined by majority voting over the base fingerprints, which are designed by mimicking the fingerprint used in the Philips fingerprinting system. In the matching stage, the base fingerprints are extracted from the query, and the distance is computed as the sum of their individual distances. In the experiments, the proposed fingerprint outperforms the base binary fingerprints. The method can be used to enhance an existing binary fingerprint or to design a new one.
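The combination step described above can be sketched directly: the combined fingerprint takes each bit by majority vote over the base fingerprints (with an odd number of bases there are no ties), and matching accumulates Hamming distances. Representation as bit lists is an illustrative simplification:

```python
def majority_fingerprint(base_fps):
    """Combine equal-length binary base fingerprints (lists of 0/1)
    into one fingerprint by per-bit majority voting."""
    n = len(base_fps)
    return [1 if sum(bits) * 2 > n else 0 for bits in zip(*base_fps)]

def hamming(a, b):
    """Bit-error count between two binary fingerprints."""
    return sum(x != y for x, y in zip(a, b))

def match_distance(query_base_fps, ref_base_fps):
    """Matching as the abstract describes it: extract the base
    fingerprints from the query and sum their individual distances
    to the reference's base fingerprints."""
    return sum(hamming(q, r) for q, r in zip(query_base_fps, ref_base_fps))
```

Majority voting suppresses bit errors that affect only a minority of the base fingerprints, which is why the combined fingerprint can be more robust than any single base.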