Search | Korea Science

Constructing a Noise-Robust Speech Recognition System using Acoustic and Visual Information (청각 및 시가 정보를 이용한 강인한 음성 인식 시스템의 구현)

Lee, Jong-Seok;Park, Cheol-Hoon
- Journal of Institute of Control, Robotics and Systems
- /
- v.13 no.8
- /
- pp.719-725
- /
- 2007
In this paper, we present an audio-visual speech recognition system for noise-robust human-computer interaction. Unlike usual speech recognition systems, our system utilizes the visual signal containing speakers' lip movements along with the acoustic signal to obtain robust speech recognition performance against environmental noise. The procedures of acoustic speech processing, visual speech processing, and audio-visual integration are described in detail. Experimental results demonstrate the constructed system significantly enhances the recognition performance in noisy circumstances compared to acoustic-only recognition by using the complementary nature of the two signals.
https://doi.org/10.5302/J.ICROS.2007.13.8.719 인용 PDF KSCI

A Digital Audio Watermark Using Wavelet Transform and Masking Effect (웨이브릿과 마스킹 효과를 이용한 디지털 오디오 워터마킹)

Hwang, Won-Young;Kang, Hwan-Il;Han, Seung-Soo;Kim, Kab-Il;Kang, Hwan-Soo
- Proceedings of the IEEK Conference
- /
- 2003.11b
- /
- pp.243-246
- /
- 2003
In this paper, we propose a new digital audio watermarking technique with the wavelet transform. The watermark is embedded by eliminating unnecessary information of audio signal based on human auditory system (HAS). This algorithm is an audio watermarking method, which does not require any original audio information in watermark extraction process. In this paper, the masking effect is used for audio watermarking, that is, post-tempera] masking effect. We construct the window with the synchronization signal and we extract the best frame in the window by using the zero-crossing rate (ZCR) and the energy of the audio signal. The watermark may be extracted by using the correlation of the watermark signal and the portion of the frame. Experimental results show good robustness against MPEG1-layer3 compression and other common signal processing manipulations. All the attacks are made after the D/A/D conversion.
PDF

Compensation of the Non-linearity of the Audio Power Amplifier Converged with Digital Signal Processing Technic (디지털 신호 처리 기술을 융합한 음향 전력 증폭기의 비선형 보상)

Eun, Changsoo;Lee, Yu-chil
- Journal of the Korea Convergence Society
- /
- v.7 no.3
- /
- pp.77-85
- /
- 2016
We propose a digital signal processing technic that can compensate the non-linearity inherent in audio amplifiers, and present the result of the simulation. The inherent non-linearity of the audio power amplifier arising from analog devices is compensated via a digital signal processing technic consisting of indirect learning architecture and an adaptive filter. The simulation results show that the compensator can be realized using a third-order polynomial and compensates odd-order non-linearity efficiently. The even-oder non-linearity is mainly due to the dc offset at the output, which is difficult to eliminate with the proposed method. Care must be taken in designing the bias circuit to avoid the DC offset at the output. The proposed technic has significance in that digital signal processing technic can compensate for the impairment that is an inherent characteristic of an analog system.
https://doi.org/10.15207/JKCS.2016.7.3.077 인용 PDF KSCI

Audio Enhancement Algorithm Using Adaptive Perceptual Filter (적응 지각 필터를 이용한 오디오 음질 개선 알고리즘)

엄혜영;한헌수;홍민철;차형태
- The Journal of the Acoustical Society of Korea
- /
- v.22 no.8
- /
- pp.687-693
- /
- 2003
In this paper, a new adaptive audio signal enhancement algorithm is proposed. In order to remove a broadband noise from a noisy signal, a filter is designed and applied adaptively to noisy audio signal. The noisy signal is first transformed to frequency domain and divided into bark domain to calculate excitation energy. A filter will be calculated to eliminate the noise by using the excitation energy and noisy energy which is obtained from a silent area. The filter is adaptively adjusted and continuously applied until the threshold point is met. The algorithm also works well even though the noise's energy change all of a sudden. SNR, NMR comparison and MOS Test are performed to show the effectiveness of the proposed algorithm.
PDF KSCI

DCT and DWT Based Robust Audio Watermarking Scheme for Copyright Protection

Deb, Kaushik;Rahman, Md. Ashikur;Sultana, Kazi Zakia;Sarker, Md. Iqbal Hasan;Chong, Ui-Pil
- Journal of the Institute of Convergence Signal Processing
- /
- v.15 no.1
- /
- pp.1-8
- /
- 2014
Digital watermarking techniques are attracting attention as a proper solution to protect copyright for multimedia data. This paper proposes a new audio watermarking method based on Discrete Cosine Transformation (DCT) and Discrete Wavelet Transformation (DWT) for copyright protection. In our proposed watermarking method, the original audio is transformed into DCT domain and divided into two parts. Synchronization code is applied on the signal in first part and 2 levels DWT domain is applied on the signal in second part. The absolute value of DWT coefficient is divided into arbitrary number of segments and calculates the energy of each segment and middle peak. Watermarks are then embedded into each middle peak. Watermarks are extracted by performing the inverse operation of watermark embedding process. Experimental results show that the hidden watermark data is robust to re-sampling, low-pass filtering, re-quantization, MP3 compression, cropping, echo addition, delay, and pitch shifting, amplitude change. Performance analysis of the proposed scheme shows low error probability rates.
PDF KSCI

Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

Liu, Min;Tang, Jun
- Journal of Information Processing Systems
- /
- v.17 no.4
- /
- pp.754-771
- /
- 2021
In the task of continuous dimension emotion recognition, the parts that highlight the emotional expression are not the same in each mode, and the influences of different modes on the emotional state is also different. Therefore, this paper studies the fusion of the two most important modes in emotional recognition (voice and visual expression), and proposes a two-mode dual-modal emotion recognition method combined with the attention mechanism of the improved AlexNet network. After a simple preprocessing of the audio signal and the video signal, respectively, the first step is to use the prior knowledge to realize the extraction of audio characteristics. Then, facial expression features are extracted by the improved AlexNet network. Finally, the multimodal attention mechanism is used to fuse facial expression features and audio features, and the improved loss function is used to optimize the modal missing problem, so as to improve the robustness of the model and the performance of emotion recognition. The experimental results show that the concordance coefficient of the proposed model in the two dimensions of arousal and valence (concordance correlation coefficient) were 0.729 and 0.718, respectively, which are superior to several comparative algorithms.
https://doi.org/10.3745/JIPS.02.0161 인용 PDF KSCI

A Design of real sound recommendation service based-on User's preference, emotion and circumstance (사용자 취향, 감성 및 상황인지 기반 음원 추천 서비스 구현)

Jung, Jong-Jin;Lim, Tae-Beom;Lee, Seok-Pil
- Proceedings of the Korea Information Processing Society Conference
- /
- 2011.04a
- /
- pp.689-691
- /
- 2011
Due to the rapid development of Information and communication, the technology of multimedia presentation technology is evolving into the service that user can actively, realistically enjoy and play based on user's preference and taste not only for User's passive service. Especially, the industry related the realistic multimedia service that supports targeting Human emotion with the property of Human hearing is expected to be formed of the high value-added premium market. Audio technology is affected on human's emotion and the viewing environment around than video technology. Also the audio technology compared to video technology is a research part that appeals to human emotion and emphasize on psychological aspects. With this viewpoint, the development of intelligent and realistic audio technology needs highly specialty. In this study, "intelligent real-sound presentation technology" that support high quality and realistic audio and the "core technologies" that are composing of this will be introduced.
https://doi.org/10.3745/PKIPS.y2011m04a.689 인용 PDF

DNN based Speech Detection for the Media Audio (미디어 오디오에서의 DNN 기반 음성 검출)

Jang, Inseon;Ahn, ChungHyun;Seo, Jeongil;Jang, Younseon
- Journal of Broadcast Engineering
- /
- v.22 no.5
- /
- pp.632-642
- /
- 2017
In this paper, we propose a DNN based speech detection system using acoustic characteristics and context information of media audio. The speech detection for discriminating between speech and non-speech included in the media audio is a necessary preprocessing technique for effective speech processing. However, since the media audio signal includes various types of sound sources, it has been difficult to achieve high performance with the conventional signal processing techniques. The proposed method improves the speech detection performance by separating the harmonic and percussive components of the media audio and constructing the DNN input vector reflecting the acoustic characteristics and context information of the media audio. In order to verify the performance of the proposed system, a data set for speech detection was made using more than 20 hours of drama, and an 8-hour Hollywood movie data set, which was publicly available, was further acquired and used for experiments. In the experiment, it is shown that the proposed system provides better performance than the conventional method through the cross validation for two data sets.
https://doi.org/10.5909/JBE.2017.22.5.632 인용 PDF KSCI KPUBS

Representative Melodies Retrieval using Waveform and FFT Analysis of Audio (오디오의 파형과 FFT 분석을 이용한 대표 선율 검색)

Chung, Myoung-Bum;Ko, Il-Ju
- Journal of KIISE:Software and Applications
- /
- v.34 no.12
- /
- pp.1037-1044
- /
- 2007
Recently, we extract the representative melody of the music and index the music to reduce searching time at the content-based music retrieval system. The existing study has used MIDI data to extract a representative melody but it has a weak point that can use only MIDI data. Therefore, this paper proposes a representative melody retrieval method that can be use at all audio file format and uses digital signal processing. First, we use Fast Fourier Transform (FFT) and find the tempo and node for the representative melody retrieval. And we measure the frequency of high value that appears from PCM Data of each node. The point which the high value is gathering most is the starting point of a representative melody and an eight node from the starting point is a representative melody section of the audio data. To verity the performance of the method, we chose a thousand of the song and did the experiment to extract a representative melody from the song. In result, the accuracy of the extractive representative melody was 79.5% among the 737 songs which was found tempo.
PDF KSCI

Interval-based Audio Integrity Authentication Algorithm using Reversible Watermarking (가역 워터마킹을 이용한 구간 단위 오디오 무결성 인증 알고리즘)

Yeo, Dong-Gyu;Lee, Hae-Yeoun
- The KIPS Transactions:PartB
- /
- v.19B no.1
- /
- pp.9-18
- /
- 2012
Many audio watermarking researches which have been adapted to authenticate contents can not recover the original media after watermark removal. Therefore, reversible watermarking can be regarded as an effective method to ensure the integrity of audio data in the applications requiring high-confidential audio contents. Reversible watermarking inserts watermark into digital media in such a way that perceptual transparency is preserved, which enables the restoration of the original media from the watermarked one without any loss of media quality. This paper presents a new interval-based audio integrity authentication algorithm which can detect malicious tampering. To provide complete reversibility, we used differential histogram-based reversible watermarking. To authenticate audio in parts, not the entire audio at once, the proposed algorithm processes audio by dividing into intervals and the confirmation of the authentication is carried out in each interval. Through experiments using multiple kinds of test data, we prove that the presented algorithm provides over 99% authenticating rate, complete reversibility, and higher perceptual quality, while maintaining the induced-distortion low.
https://doi.org/10.3745/KIPSTB.2012.19B.1.009 인용 PDF KSCI

Search Result 459, Processing Time 0.03 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)