• Title/Summary/Keyword: Audio processing

Interval-based Audio Integrity Authentication Algorithm using Reversible Watermarking (가역 워터마킹을 이용한 구간 단위 오디오 무결성 인증 알고리즘)

  • Yeo, Dong-Gyu;Lee, Hae-Yeoun
    • The KIPS Transactions:PartB
    • /
    • v.19B no.1
    • /
    • pp.9-18
    • /
    • 2012
  • Many audio watermarking methods that have been adapted for content authentication cannot recover the original media after the watermark is removed. Reversible watermarking is therefore an effective way to ensure the integrity of audio data in applications that require highly confidential audio content. Reversible watermarking inserts a watermark into digital media while preserving perceptual transparency, and it enables restoration of the original media from the watermarked one without any loss of quality. This paper presents a new interval-based audio integrity authentication algorithm that can detect malicious tampering. To provide complete reversibility, we use differential histogram-based reversible watermarking. To authenticate the audio in parts rather than all at once, the proposed algorithm divides the audio into intervals and verifies authentication within each interval. Through experiments on multiple kinds of test data, we show that the presented algorithm provides an authentication rate above 99%, complete reversibility, and high perceptual quality while keeping the induced distortion low.
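
The interval-wise authentication flow can be pictured with a small sketch. This is only an illustration: it splits an audio signal into fixed-length intervals and attaches a keyed per-interval authentication code, using a plain hash as a stand-in for the paper's differential histogram-based reversible watermark; the interval length, key, and hash choice are assumptions, not taken from the paper.

```python
import hashlib
import numpy as np

def interval_codes(audio, interval_len=44100, key=b"secret"):
    """Split audio into fixed-length intervals and compute a keyed code
    per interval (hash used here as a stand-in for the paper's
    reversible differential-histogram watermark)."""
    codes = []
    for start in range(0, len(audio), interval_len):
        chunk = audio[start:start + interval_len].tobytes()
        codes.append(hashlib.sha256(key + chunk).hexdigest())
    return codes

def verify_intervals(audio, codes, interval_len=44100, key=b"secret"):
    """Return indices of intervals whose codes no longer match,
    i.e. the tampered regions."""
    current = interval_codes(audio, interval_len, key)
    return [i for i, (a, b) in enumerate(zip(current, codes)) if a != b]

# Example: tamper with a sample in the second interval and localize it.
audio = np.random.randint(-2**15, 2**15, 5 * 44100, dtype=np.int16)
codes = interval_codes(audio)
audio[50000] ^= 1                        # malicious modification
print(verify_intervals(audio, codes))    # expected: [1]
```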

Design and Implementation of Emergency Recognition System based on Multimodal Information (멀티모달 정보를 이용한 응급상황 인식 시스템의 설계 및 구현)

  • Kim, Eoung-Un;Kang, Sun-Kyung;So, In-Mi;Kwon, Tae-Kyu;Lee, Sang-Seol;Lee, Yong-Ju;Jung, Sung-Tae
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.2
    • /
    • pp.181-190
    • /
    • 2009
  • This paper presents a multimodal emergency recognition system based on visual, audio, and gravity sensor information. It consists of a video processing module, an audio processing module, a gravity sensor processing module, and a multimodal integration module. The video processing module and the gravity sensor processing module each detect actions such as moving, stopping, and fainting and pass them to the multimodal integration module. The multimodal integration module detects an emergency by fusing the transferred information and verifies it by asking a question and recognizing the answer over the audio channel. The experimental results show that the recognition rate of the video processing module alone is 91.5% and that of the gravity sensor processing module alone is 94%, but when both kinds of information are combined the recognition rate reaches 100%.
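
A hedged sketch of how such per-module decisions might be fused: an emergency is suspected when either module reports "fainting" and is then confirmed over the audio channel. The fusion rule and function names are illustrative assumptions, not the authors' implementation.

```python
def fuse_decisions(video_action, gravity_action, ask_via_audio):
    """Combine per-module action labels ('moving', 'stopping', 'fainting')
    and confirm a suspected emergency over the audio channel."""
    suspected = "fainting" in (video_action, gravity_action)
    if not suspected:
        return "normal"
    # Verification step: ask a question and recognize the spoken answer.
    answer = ask_via_audio("Are you okay?")
    return "normal" if answer == "yes" else "emergency"

# Illustrative use with a stubbed audio channel.
print(fuse_decisions("fainting", "stopping", lambda q: "no"))   # -> emergency
print(fuse_decisions("moving", "stopping", lambda q: "yes"))    # -> normal
```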

A Study on Immersive Audio Improvement of FTV using an effective noise (유효 잡음을 활용한 FTV 입체음향 개선방안 연구)

  • Kim, Jong-Un;Cho, Hyun-Seok;Lee, Yoon-Bae;Yeo, Sung-Dae;Kim, Seong-Kweon
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.10 no.2
    • /
    • pp.233-238
    • /
    • 2015
  • In this paper, we propose a method that uses effective noise to improve the sense of immersion in free-viewpoint TV (FTV) services. In a basketball court, we monitored frequency spectra by acquiring continuous audio of the players and referee with shotgun and wireless microphones. By analyzing these spectra for the case in which a user zooms in, we determined whether a given component constitutes effective noise. Accordingly, when FTV users zoom in toward an object, we propose retaining this seemingly unnecessary noise rather than removing it, which can be useful for implementing immersive audio in FTV.
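
One way to read the proposal is that, on zoom-in, the ambient "effective" noise is mixed back in with a zoom-dependent gain rather than being suppressed. The sketch below illustrates only that idea; the gain law, the noise floor, and the signal names are assumptions.

```python
import numpy as np

def render_zoom_audio(object_track, ambient_noise, zoom, noise_floor=0.2):
    """Mix the zoom target's audio with ambient ('effective') noise
    instead of removing the noise; zoom is in [0, 1]."""
    object_gain = 0.5 + 0.5 * zoom             # emphasize the zoom target
    noise_gain = max(noise_floor, 1.0 - zoom)  # keep some court ambience
    return object_gain * object_track + noise_gain * ambient_noise

fs = 48000
obj = np.random.randn(fs).astype(np.float32)   # stand-in for player/referee audio
amb = np.random.randn(fs).astype(np.float32)   # stand-in for crowd/court noise
mix = render_zoom_audio(obj, amb, zoom=0.8)
```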

Audio Watermarking through Modification of Tonal Maskers

  • Lee, Hee-Suk;Lee, Woo-Sun
    • ETRI Journal
    • /
    • v.27 no.5
    • /
    • pp.608-616
    • /
    • 2005
  • Watermarking has become a technology of choice for a broad range of multimedia copyright protection applications. This paper proposes an audio watermarking scheme that uses a modified tonal masker as an embedding carrier for imperceptible and robust audio watermarking. The embedding method selects one of the tonal maskers using a secret key and then modifies the frequency signals that constitute the tonal masker without changing the sound pressure level. The modified tonal masker can be found using the same secret key without the original sound, and the embedded information can then be extracted. The results show that the frequency signals are stable enough to preserve the embedded watermark under various common signal processing operations, and that the proposed scheme performs robustly.
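
A rough sketch of the embedding idea: pick a tonal peak with a key-seeded generator and re-balance the energy of its neighbouring FFT bins while keeping their combined power (and hence sound pressure level) unchanged. The peak-picking rule, bin layout, and modification rule below are simplifications and assumptions, not the paper's psychoacoustic model.

```python
import numpy as np

def find_tonal_peaks(mag):
    """Very simplified tonal-masker detection: local maxima that exceed
    their wider neighbours by 7 dB (a stand-in for the MPEG rule)."""
    db = 20 * np.log10(mag + 1e-12)
    return [k for k in range(2, len(mag) - 2)
            if db[k] > db[k - 1] and db[k] > db[k + 1]
            and db[k] - max(db[k - 2], db[k + 2]) > 7]

def embed_bit(frame, bit, key=1234):
    """Embed one bit by re-balancing the two bins beside a key-selected
    tonal peak while preserving their combined power."""
    spec = np.fft.rfft(frame)
    peaks = find_tonal_peaks(np.abs(spec))
    if not peaks:
        return frame                      # nothing tonal to carry the bit
    k = peaks[np.random.default_rng(key).integers(len(peaks))]
    p = np.abs(spec[k - 1])**2 + np.abs(spec[k + 1])**2   # power to preserve
    ratio = 0.7 if bit else 0.3           # which side gets more energy
    spec[k - 1] *= np.sqrt(ratio * p) / (np.abs(spec[k - 1]) + 1e-12)
    spec[k + 1] *= np.sqrt((1 - ratio) * p) / (np.abs(spec[k + 1]) + 1e-12)
    return np.fft.irfft(spec, n=len(frame))

frame = np.random.randn(1024)
marked = embed_bit(frame, bit=1)
```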

A public key audio watermarking using patchwork algorithm

  • Hong, Doo-Gun;Park, Se-Hyoung;Shin, Jaeho
    • Proceedings of the IEEK Conference
    • /
    • 2002.07a
    • /
    • pp.160-163
    • /
    • 2002
  • This paper presents a statistical technique for audio watermarking. We describe the application of a promising public-key watermarking method to the patchwork algorithm. Its detection process requires neither the original content nor the secret key used in the embedding process. Special attention is given to a statistical method that works in the frequency domain. We present a solution for robust watermarking of audio data: an extension of patchwork audio watermarking that enables public detection of the watermark. Experimental results show good robustness of the approach against MP3 compression and other common signal processing manipulations.
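
For reference, classic (secret-key) patchwork embedding works roughly as sketched below: two key-selected sample sets are pushed apart by a small offset, and detection tests the difference of their means. The public-key extension that the paper actually proposes is not reproduced here, and the offset and set sizes are illustrative only.

```python
import numpy as np

def patchwork_embed(signal, key, n=2000, d=0.01):
    """Classic patchwork: add +d to one key-selected set of samples
    and -d to another."""
    rng = np.random.default_rng(key)
    idx = rng.choice(len(signal), size=2 * n, replace=False)
    marked = signal.copy()
    marked[idx[:n]] += d
    marked[idx[n:]] -= d
    return marked

def patchwork_detect(signal, key, n=2000, d=0.01):
    """Detect by testing whether mean(A) - mean(B) is near 2*d
    (threshold set halfway between 0 and 2*d)."""
    rng = np.random.default_rng(key)
    idx = rng.choice(len(signal), size=2 * n, replace=False)
    diff = signal[idx[:n]].mean() - signal[idx[n:]].mean()
    return diff > d

audio = np.random.randn(100000) * 0.1
print(patchwork_detect(patchwork_embed(audio, key=42), key=42))  # expected: True
print(patchwork_detect(audio, key=42))                           # expected: False
```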

An Improved Digital Filter Design for the DSD Encoder with Multi-rate PCM Input (다중 표본화율의 PCM 입력을 위한 개선된 DSD 인코더용 디지털 필터 설계)

  • Moon, Dong-Wook;Kim, Lark-Kyo
    • Proceedings of the KIEE Conference
    • /
    • 2005.10b
    • /
    • pp.358-360
    • /
    • 2005
  • The DSD (Direct Stream Digital) encoder, the standard for the SACD (Super Audio Compact Disc) proposed by Sony and Philips, uses a 1-bit representation with a sampling frequency of 2.8224 MHz (64 × 44.1 kHz). For multi-rate PCM (Pulse Code Modulation) input in the range of 8–192 kHz, an external sample-rate converter is normally required in front of the DSD encoder. This paper proposes a digital filter structure composed of a sample-rate converter and an interpolation filter for a DSD encoder accepting multi-rate (8–192 kHz) PCM input without an external sample-rate converter.
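
A very small sketch of the signal chain the abstract implies: upsample PCM to 64 × 44.1 kHz and convert it to a 1-bit stream with a sigma-delta modulator. A first-order modulator and a crude zero-order-hold upsampler are used purely for illustration; the paper's contribution is an improved interpolation/sample-rate-conversion filter structure, which is not reproduced here.

```python
import numpy as np

DSD_RATE = 64 * 44100          # 2.8224 MHz

def upsample_zoh(pcm, in_rate, out_rate=DSD_RATE):
    """Crude zero-order-hold upsampling; only works for integer factors.
    A real encoder would use a proper sample-rate converter and
    interpolation filter, which is exactly what the paper addresses."""
    factor = out_rate // in_rate
    return np.repeat(pcm, factor)

def sigma_delta_1bit(x):
    """First-order sigma-delta modulator producing a +/-1 bitstream."""
    out = np.empty(len(x))
    integ = 0.0
    for n, sample in enumerate(x):
        integ += sample - (out[n - 1] if n else 0.0)
        out[n] = 1.0 if integ >= 0 else -1.0
    return out

pcm = 0.5 * np.sin(2 * np.pi * 1000 * np.arange(4410) / 44100)  # 0.1 s of 1 kHz
dsd_bits = sigma_delta_1bit(upsample_zoh(pcm, in_rate=44100))
```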

Comparisons of stream activation mechanisms in computer based teleconferencing systems for low delay (지연 축소를 위한 컴퓨터 영상회의 시스템의 스트림 동작 구조 비교)

  • Lee, Gyeong-Hui;Kim, Du-Hyeon;Gang, Min-Gyu;Jeong, Chan-Geun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.2
    • /
    • pp.363-376
    • /
    • 1997
  • In this paper, we present a hardware architecture and a software architecture for computer-based teleconferencing systems, and we analyze their stream activation mechanisms from the viewpoint of delay. MuX, a multimedia I/O server, provides various processing elements for data I/O, synchronization, interleaving, and mixing. We describe methods to build teleconferencing systems with these elements and compare the technique that uses a master clock with the technique that uses self clocks. In the phase of data input, the self-clock technique performs better than the master-clock technique. When we generate an interleaved stream from the audio and video streams and activate channel objects with the periodic audio stream as the activation clock, the delay from the input audio stream to the interleaved stream is reduced, but the delay for the video stream is not reduced as much as for the audio stream.
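
The audio-clocked interleaving idea can be pictured with a toy loop: each arriving audio frame (the activation clock) is immediately paired with the most recent video frame, so audio never waits while video may be reused. This only illustrates the activation concept; MuX's actual channel objects and APIs are not modelled, and the frame rates are invented.

```python
from collections import deque

def interleave_audio_clocked(audio_frames, video_frames):
    """Pair each audio frame with the most recent available video frame,
    using the periodic audio stream as the activation clock."""
    video = deque(video_frames)
    latest_video = None
    interleaved = []
    for a in audio_frames:                  # activation driven by audio
        while video and video[0]["t"] <= a["t"]:
            latest_video = video.popleft()  # keep only the newest video frame
        interleaved.append((a, latest_video))
    return interleaved

audio = [{"t": t / 50} for t in range(10)]   # 50 Hz audio frames
video = [{"t": t / 15} for t in range(3)]    # 15 Hz video frames
print(interleave_audio_clocked(audio, video)[:3])
```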

On-Line Audio Genre Classification using Spectrogram and Deep Neural Network (스펙트로그램과 심층 신경망을 이용한 온라인 오디오 장르 분류)

  • Yun, Ho-Won;Shin, Seong-Hyeon;Jang, Woo-Jin;Park, Hochong
    • Journal of Broadcast Engineering
    • /
    • v.21 no.6
    • /
    • pp.977-985
    • /
    • 2016
  • In this paper, we propose a new method for on-line genre classification using a spectrogram and a deep neural network. For on-line processing, the proposed method takes an audio signal of 1 second and classifies it into one of 3 genres: speech, music, and effect. To keep the processing general, it uses the spectrogram as the feature vector instead of MFCC, which has been widely used for audio analysis. We measure genre classification performance on real TV audio signals and confirm that the proposed method outperforms the conventional method for all genres. In particular, it reduces the rate of classification errors between music and effect, which occur frequently with the conventional method.
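
The 1-second pipeline can be sketched as: frame the signal, take magnitude FFTs to form a spectrogram, flatten it, and feed a small feed-forward network with three outputs (speech, music, effect). The frame length, layer sizes, sampling rate, and the untrained random weights below are placeholders, not the paper's trained model.

```python
import numpy as np

def spectrogram(x, frame=512, hop=256):
    """Log-magnitude spectrogram of a 1-second signal."""
    frames = [x[i:i + frame] * np.hanning(frame)
              for i in range(0, len(x) - frame, hop)]
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

def classify(x, weights):
    """Tiny feed-forward classifier over the flattened spectrogram."""
    h = np.maximum(0, spectrogram(x).ravel() @ weights["W1"] + weights["b1"])
    logits = h @ weights["W2"] + weights["b2"]
    return ["speech", "music", "effect"][int(np.argmax(logits))]

fs = 16000                        # placeholder sampling rate
x = np.random.randn(fs)           # one second of audio
spec_dim = spectrogram(x).size
rng = np.random.default_rng(0)    # untrained placeholder weights
weights = {"W1": rng.standard_normal((spec_dim, 32)) * 0.01, "b1": np.zeros(32),
           "W2": rng.standard_normal((32, 3)) * 0.01, "b2": np.zeros(3)}
print(classify(x, weights))
```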

Speech Recognition by Integrating Audio, Visual and Contextual Features Based on Neural Networks (신경망 기반 음성, 영상 및 문맥 통합 음성인식)

  • 김명원;한문성;이순신;류정우
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.41 no.3
    • /
    • pp.67-77
    • /
    • 2004
  • Recent research has focused on the fusion of audio and visual features for reliable speech recognition in noisy environments. In this paper, we propose a neural network based model for robust speech recognition that integrates audio, visual, and contextual information. The Bimodal Neural Network (BMNN) is a multi-layer perceptron with 4 layers, each of which performs a certain level of abstraction of the input features. In BMNN, the third layer combines the audio and visual features of speech to compensate for the loss of audio information caused by noise. To further improve recognition accuracy in noisy environments, we also propose a post-processing step based on contextual information, namely the sequential patterns of words spoken by a user. Our experimental results show that the model outperforms any single-modality model. In particular, when the contextual information is used, we obtain over 90% recognition accuracy even in noisy environments, a significant improvement over the state of the art in speech recognition. Our research demonstrates that diverse sources of information need to be integrated to improve the accuracy of speech recognition, particularly in noisy environments.
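
The two ideas in the abstract, mid-level audio-visual fusion and contextual post-processing, can be caricatured as below: audio and visual features are combined at a hidden layer, and the resulting word scores are rescored with a bigram over the previous word. The layer sizes, vocabulary, and bigram table are invented placeholders, not BMNN's actual parameters.

```python
import numpy as np

VOCAB = ["open", "close", "stop"]

def bmnn_scores(audio_feat, visual_feat, params):
    """Toy 4-layer perceptron: one hidden layer per modality, a fusion
    layer that combines them, then an output layer (one score per word)."""
    ha = np.tanh(audio_feat @ params["Wa"])
    hv = np.tanh(visual_feat @ params["Wv"])
    fused = np.tanh(np.concatenate([ha, hv]) @ params["Wf"])   # audio-visual fusion
    return fused @ params["Wo"]

def contextual_rescore(scores, prev_word, bigram, alpha=1.0):
    """Post-processing: bias word scores with log-bigram context."""
    prior = np.array([np.log(bigram.get((prev_word, w), 1e-3)) for w in VOCAB])
    return VOCAB[int(np.argmax(scores + alpha * prior))]

rng = np.random.default_rng(0)
params = {"Wa": rng.standard_normal((13, 8)), "Wv": rng.standard_normal((20, 8)),
          "Wf": rng.standard_normal((16, 8)), "Wo": rng.standard_normal((8, len(VOCAB)))}
bigram = {("please", "open"): 0.6, ("please", "close"): 0.3, ("please", "stop"): 0.1}
scores = bmnn_scores(rng.standard_normal(13), rng.standard_normal(20), params)
print(contextual_rescore(scores, "please", bigram))
```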

Effect on Audio Play Latency for Real-Time HMD-Based Headphone Listening (HMD를 이용한 오디오 재생 기술에서 Latency의 영향 분석)

  • Son, Sangmo;Jo, Hyun;Kim, Sunmin
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference
    • /
    • 2014.10a
    • /
    • pp.141-145
    • /
    • 2014
  • The acceptable time delay of audio data processing is investigated for rendering virtual sound source directions in a real-time head-tracking environment under headphone listening. An angular mismatch of less than 3.7 degrees should be maintained in order to keep the desired sound source directions virtually fixed while listeners rotate their heads in the horizontal plane. The angular mismatch is proportional to the speed of head rotation and the data processing delay. For a head rotation of 20 degrees/s, which is a relatively slow head movement, a total data processing delay of less than 63 ms should be targeted.
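
The stated proportionality (mismatch = rotation speed × processing delay) allows a quick sanity check of the quoted numbers; the snippet below only restates that relationship for the 20 degrees/s, 63 ms case.

```python
def angular_mismatch(rotation_deg_per_s, delay_s):
    """Angular mismatch accumulated during the processing delay."""
    return rotation_deg_per_s * delay_s

budget_deg = 3.7
print(angular_mismatch(20, 0.063))               # 1.26 degrees at 20 deg/s and 63 ms
print(angular_mismatch(20, 0.063) < budget_deg)  # stays inside the 3.7-degree budget
```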
