Search | Korea Science

Retrieval of Broadcast News Using Audio Content Analysis

Kim, Hyoung-Gook
- The Journal of the Acoustical Society of Korea, v.26 no.3E, pp.74-79, 2007
In this paper, we report our recent work on an indexing and retrieval system for broadcast news using audio content analysis. The work addresses two major parts of the audio indexing system: anchorperson detection based on audio segmentation, and phone-based spoken document retrieval, both developed in the framework of the emerging MPEG-7 standard. Experiments are conducted on a database of British broadcast news videos. We discuss the development of the retrieval system and the evaluation of each part and of the system as a whole.

A Study on Setting the Minimum and Maximum Distances for Distance Attenuation in MPEG-I Immersive Audio

Lee, Yong Ju;Yoo Jae-hyoun;Jang, Daeyoung;Kang, Kyeongok;Lee, Taejin
- Journal of Broadcast Engineering, v.27 no.7, pp.974-984, 2022
In this paper, we introduce methods for setting the minimum and maximum distances used in geometric distance attenuation processing, which is one of the spatial sound reproduction methods. In general, sound attenuation by distance follows the 1/r law, that is, the level is inversely proportional to distance. When the relative distance between the user and the audio object is very short or very long, however, exceptional processing may be performed by setting a minimum or maximum distance. While MPEG-I Immersive Audio's RM0 uses fixed values for the minimum and maximum distances, this study proposes effective methods for setting these distances that consider the signal gain of an audio object. The proposed methods were verified through simulation and through experiments using the RM0 renderer.
https://doi.org/10.5909/JBE.2022.27.7.974
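
The 1/r law with minimum and maximum distance limits described in the abstract above can be illustrated with a minimal sketch. The function name `distance_gain`, the reference distance `r_ref`, and the default limits `r_min` and `r_max` are illustrative assumptions; they are neither the fixed values used by RM0 nor the setting methods proposed in the paper.

```python
def distance_gain(r, r_min=0.1, r_max=100.0, r_ref=1.0):
    """Linear gain under the 1/r law with the distance clamped.

    All parameter values are hypothetical and chosen for illustration;
    they are not taken from RM0 or from the paper.
    """
    if r_min <= 0 or r_max < r_min:
        raise ValueError("require 0 < r_min <= r_max")
    # Below r_min the gain stops growing; beyond r_max it stops decaying.
    r_eff = min(max(r, r_min), r_max)
    return r_ref / r_eff


if __name__ == "__main__":
    for r in (0.0, 0.5, 1.0, 2.0, 500.0):
        print(f"r = {r:6.1f} m -> gain = {distance_gain(r):.4f}")
```

Clamping the distance keeps the gain bounded when the listener is very close to the object and holds it constant once the listener is farther away than `r_max`.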

A Novel Audio Watermarking Algorithm for Copyright Protection of Digital Audio

Seok, Jong-Won;Hong, Jin-Woo;Kim, Jin-Woong
- ETRI Journal, v.24 no.3, pp.181-189, 2002
Digital watermark technology is now drawing attention as a new method of protecting digital content from unauthorized copying. This paper presents a novel audio watermarking algorithm to protect against unauthorized copying of digital audio. The proposed watermarking scheme includes a psychoacoustic model of MPEG audio coding to ensure that the watermarking does not affect the quality of the original sound. After embedding the watermark, our scheme extracts copyright information without access to the original signal by using a whitening procedure for linear prediction filtering before correlation. Experimental results show that our watermarking scheme is robust against common signal processing attacks and it introduces no audible distortion after watermark insertion.

Constructing a Noise-Robust Speech Recognition System using Acoustic and Visual Information (청각 및 시각 정보를 이용한 강인한 음성 인식 시스템의 구현)

Lee, Jong-Seok;Park, Cheol-Hoon
- Journal of Institute of Control, Robotics and Systems, v.13 no.8, pp.719-725, 2007
In this paper, we present an audio-visual speech recognition system for noise-robust human-computer interaction. Unlike conventional speech recognition systems, our system uses the visual signal containing the speaker's lip movements along with the acoustic signal to obtain speech recognition performance that is robust against environmental noise. The procedures of acoustic speech processing, visual speech processing, and audio-visual integration are described in detail. Experimental results demonstrate that, by exploiting the complementary nature of the two signals, the constructed system significantly enhances recognition performance in noisy conditions compared to acoustic-only recognition.
https://doi.org/10.5302/J.ICROS.2007.13.8.719

Implementation of the Audio CODEC for Digital Audio Broadcasting Service (디지털 오디오 방송 서비스를 위한 오디오 코덱의 구현)

장대영;홍진우
- Journal of Broadcast Engineering, v.6 no.1, pp.66-71, 2001
This paper introduces an implementation of an MPEG-2 AAC codec system for digital audio broadcasting. The system consists of an encoder and a decoder, and includes MPEG-2 system multiplexing and demultiplexing modules for interfacing to the ETRI-DAB system. Four DSPs are adopted for the encoder and three DSPs for the decoder. The DSPs handle system control, I/O control, audio signal processing, multiplexing, and demultiplexing. This paper also discusses some near-future estimations related to the DAB system and its services. Currently a stereo audio codec is available, but a multi-channel audio codec and an MPEG-4 audio codec will also be implemented.

Improving Low Frequency Signal Reproduction in TV Audio (TV 스피커의 저주파수 신호 재생 개선)

Arora Manish;Oh Yoonhark;Kim SeoungHun;Lee Hyuckjae;Jang Seongcheol
- Proceedings of the Acoustical Society of Korea Conference, spring, pp.275-278, 2004
In TV sound systems, loudspeakers are subject to severe size constraints. The small size of the transducer degrades the low-frequency performance of the system. Bass performance contributes significantly to the sound quality perceived by the user, so good bass reproduction is essential. Increasing the sound energy in the bass range is not a viable solution, since the gain required is exceedingly high and signal distortion occurs because of speaker overload. Recently, methods have been proposed that evoke a low-frequency illusion using the psychoacoustic phenomenon of the missing fundamental. This paper proposes a simple and effective signal processing method to create a bass illusion in TV speakers using the missing fundamental effect, at a complexity of 12 MIPS on a Motorola 56371 audio DSP.

A Blind Audio Watermarking using the Tonal Characteristic (토널 특성을 이용한 브라인드 오디오 워터마킹)

이희숙;이우선
- Journal of Korea Multimedia Society, v.6 no.5, pp.816-823, 2003
In this paper, we propose a blind audio watermarking method using the tonal characteristic. First, we explain the perceptual effect of tonal components based on existing research, and show experimentally that the tonal characteristic is more stable under several kinds of signal processing than other characteristics used in previous watermarking studies. On the basis of this result, we propose a blind audio watermarking method that uses the relation among the frequency-domain signals that compose a tonal masker. To evaluate the sound quality of the watermarked audio, we used SDG (Subjective Diff-Grades) and obtained an average SDG of 0.27. This result indicates that watermarking based on the perceptual effect of tonal components is usable from the viewpoint of imperceptibility. We also detected the watermark bits from watermarked audio that had been altered by several kinds of signal processing; the detection ratios, with the exception of time-shift processing, were over 98%. For time-shift processing, we applied a new method that searches for the most suitable position in the time domain, and then detected the watermark bits at a ratio of 90%.

The Digital Redundancy Design for Back-up Mode Operation of Aviation Intercom (항공용 인터콤의 백업 모드 운용을 위한 디지털 방식의 이중화 설계)

Jeong, Seong-jae;Cho, Kyung-hak;Kim, Dong-hyouk;Lee, Seong-woo
- Journal of Advanced Navigation Technology, v.26 no.5, pp.358-364, 2022
The Inter Communication System for avionics is in charge of processing all voice signals: internal calls between pilot and co-pilot, internal calls between pilots and crew, external calls through communication equipment such as the Ultra/Very High Frequency Receiver/Transmitter (U/VHF RT), audio signal monitoring for navigation and mission equipment such as VHF Omnidirectional Range/Instrument Landing System (VOR/ILS) and Tactical Air Navigation (TACAN), audio signal output for voice recording to the Flight Data Recorder (FDR) and Data Transfer System (DTS), and generation of warning/caution audio signals about the status of and threats to the aircraft. Because analog audio signals in the Inter Communication System are sensitive to noise, a redundant design that can protect audio signals from electromagnetic noise inside and outside the aircraft is required for the missions of pilots and crew. This paper describes the Normal/Back-up operation modes and a digital redundancy design plan for the digital Inter Communication System for avionics, together with manufacturing and verification results.
https://doi.org/10.12673/jant.2022.26.5.358

Finite Alphabet Control and Estimation

Goodwin, Graham C.;Quevedo, Daniel E.
- International Journal of Control, Automation, and Systems, v.1 no.4, pp.412-430, 2003
In many practical problems in signal processing and control, the signal values are often restricted to belong to a finite number of levels. These questions are generally referred to as "finite alphabet" problems. There are many applications of this class of problems including: on-off control, optimal audio quantization, design of finite impulse response filters having quantized coefficients, equalization of digital communication channels subject to intersymbol interference, and control over networked communication channels. This paper will explain how this diverse class of problems can be formulated as optimization problems having finite alphabet constraints. Methods for solving these problems will be described and it will be shown that a semi-closed form solution exists. Special cases of the result include well known practical algorithms such as optimal noise shaping quantizers in audio signal processing and decision feedback equalizers in digital communication. Associated stability questions will also be addressed and several real world applications will be presented.

Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

Liu, Min;Tang, Jun
- Journal of Information Processing Systems, v.17 no.4, pp.754-771, 2021
In the task of continuous dimensional emotion recognition, the parts that highlight emotional expression are not the same in each modality, and the influence of each modality on the emotional state also differs. Therefore, this paper studies the fusion of the two most important modalities in emotion recognition (voice and visual expression), and proposes a bimodal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio signal and the video signal, the first step is to use prior knowledge to extract audio features. Then, facial expression features are extracted by the improved AlexNet network. Finally, a multimodal attention mechanism is used to fuse the facial expression features and audio features, and an improved loss function is used to address the missing-modality problem, so as to improve the robustness of the model and the performance of emotion recognition. The experimental results show that the concordance correlation coefficients of the proposed model in the two dimensions of arousal and valence were 0.729 and 0.718, respectively, which are superior to those of several comparative algorithms.
https://doi.org/10.3745/JIPS.02.0161
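
The concordance correlation coefficient reported in the abstract above has a standard definition, sketched here for reference. The function name `concordance_cc` and the use of population (divide-by-n) statistics are assumptions of this sketch; the abstract does not state how the authors computed the metric.

```python
from statistics import fmean


def concordance_cc(x, y):
    """Concordance correlation coefficient between two equal-length sequences.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (divide-by-n) variance and covariance.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = fmean(x), fmean(y)
    var_x = fmean([(a - mx) ** 2 for a in x])
    var_y = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    denom = var_x + var_y + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


if __name__ == "__main__":
    labels = [0.1, 0.4, 0.35, 0.8]       # hypothetical ground-truth arousal
    predictions = [0.2, 0.45, 0.3, 0.7]  # hypothetical model output
    print(concordance_cc(labels, predictions))
```

Unlike the Pearson correlation, this measure penalizes a constant offset between predictions and labels, which is why it is commonly used for continuous arousal and valence estimates.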
