• Title/Summary/Keyword: audio signal (오디오신호)

Reduction of Computation Algorithm for Adaptive Perceptual Filter Using Enhanced Noise Estimation (향상된 잡음 추정을 이용한 적응 지각필터의 연산량 개선 알고리즘)

  • Seo, Bo-Kug;Cha, Hyung-Tai;Ryu, Il-Hyun;Koo, Kyo-Sik
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2005.11a / pp.264-267 / 2005
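  • This paper proposes an algorithm that reduces the computational load of an adaptive perceptual filter by using a preprocessing technique that estimates the noise in every frame. The proposed preprocessing noise estimation algorithm estimates the noise from the noise-degraded bands and applies it to the adaptive perceptual filter, which reduces computation and improves the quality of the audio signal at the same time. Because the noise is estimated from the noise-degraded bands according to the signal segment being processed, the resulting estimate is closer to the initial noise estimate. As a result, the computational load of the adaptive perceptual filter is reduced effectively. To evaluate performance, we compare the SSNR and NMR of the enhanced signals obtained with the perceptual filter and with the proposed algorithm, together with the number of adaptive perceptual filter applications and the running time, and confirm the improvement.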

A Remote Control Method Using Bluetooth Under Embedded Linux System (블루투스를 이용한 가전기기 원격제어 시스템)

  • 이우중;황우식;김정선
    • Proceedings of the Korean Information Science Society Conference / 2003.10c / pp.403-405 / 2003
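  • Wireless control of home appliances over infrared communication cannot amplify the signal or send control signals over long distances, and because the control signal cannot pass through physical barriers, it places many restrictions on the user. Sending control signals by radio frequency (RF) resolves these drawbacks of infrared, but because devices use different frequency bands, compatibility and extensibility are lacking. To overcome these problems, we propose a remote control system for home appliances based on Bluetooth, a short-range wireless communication protocol. This paper proposes a wireless control system for a home appliance commonly found in homes and offices (an audio system), consisting of a control client (wall plate) and a main controller that receives control signals from the client and controls the appliance, and describes its implementation.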

Audio Watermarking Using Quantization Index Modulation on Significant Peaks in Frequency Domain (주파수 영역에서 주요 피크에 QIM을 적용한 오디오 워터마킹)

  • Kang, Jung-Sun;Cho, Sang-Jin
    • The Journal of the Acoustical Society of Korea / v.30 no.6 / pp.303-307 / 2011
  • This paper describes an audio watermarking method that applies Quantization Index Modulation (QIM) to significant peaks in the frequency domain. The audio signal is split into non-overlapping frames of L samples using a rectangular window. The zero-crossing rate of each frame is computed to decide whether the frame is suitable for watermarking. If it is, the frequency magnitude response is computed with the discrete Fourier transform. For the QIM, we set the quantization step size from the maximum value of the frequency magnitude response and select n significant peaks together with w samples around each of them in the frequency domain, for a total of $n \times (w+1)$ samples. Finally, the watermark is embedded. The decoder extracts the watermark based on Euclidean distance, which makes detection blind. The proposed method is robust against many of the attacks in watermark benchmarks.
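
The embedding and blind extraction steps described in this abstract can be illustrated with a minimal scalar QIM sketch. It is not the authors' implementation: the function names, the sample magnitudes, and the rule `step = 0.1 * max(magnitudes)` are assumptions made for illustration, and frame selection by zero-crossing rate and peak picking are omitted. The decoder is also assumed to know the same step size.

```python
def qim_embed(value: float, bit: int, step: float) -> float:
    """Move `value` to the nearest point of the lattice assigned to `bit`."""
    offset = bit * step / 2.0  # lattice for bit 1 is shifted by half a step
    return round((value - offset) / step) * step + offset


def qim_extract(value: float, step: float) -> int:
    """Blind detection: choose the bit whose lattice point is closest."""
    distances = [abs(value - qim_embed(value, b, step)) for b in (0, 1)]
    return distances.index(min(distances))


# Hypothetical magnitudes of selected frequency-domain samples.
magnitudes = [3.7, 1.2, 9.4, 5.05]
bits = [1, 0, 1, 1]
step = 0.1 * max(magnitudes)  # assumed rule: step derived from the maximum

marked = [qim_embed(m, b, step) for m, b in zip(magnitudes, bits)]
noisy = [m + 0.1 for m in marked]  # perturbation smaller than step / 4
recovered = [qim_extract(m, step) for m in noisy]
print(recovered)
```

In this sketch the bits survive any perturbation smaller than a quarter of the step size, while the embedding itself changes each sample by at most half a step, so the step size trades robustness against distortion.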

Polyphonic sound event detection using multi-channel audio features and gated recurrent neural networks (다채널 오디오 특징값 및 게이트형 순환 신경망을 사용한 다성 사운드 이벤트 검출)

  • Ko, Sang-Sun;Cho, Hye-Seung;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.36 no.4 / pp.267-272 / 2017
  • In this paper, we propose an effective method of applying multi-channel audio features to GRNNs (Gated Recurrent Neural Networks) for polyphonic sound event detection. Real-life sounds often overlap with each other, so it is difficult to distinguish them using mono-channel audio features. The proposed method improves the performance of polyphonic sound event detection by using multi-channel audio features. It further improves performance by applying a gated recurrent neural network, which is simpler than the LSTM (Long Short-Term Memory) network that currently shows the highest performance among recurrent neural networks. The experimental results show that the proposed method achieves better sound event detection performance than existing methods.

Time-Scale Modification of Polyphonic Audio Signals Using Sinusoidal Modeling (정현파 모델링을 이용한 폴리포닉 오디오 신호의 시간축 변화)

  • 장호근;박주성
    • The Journal of the Acoustical Society of Korea / v.20 no.2 / pp.77-85 / 2001
  • This paper proposes a method of time-scale modification of polyphonic audio signals based on a sinusoidal model. The signals are modeled with a sinusoidal component and a noise component. A multiresolution filter bank is designed that splits the input signal into six octave-spaced subbands without aliasing, and sinusoidal modeling is applied to each subband signal. To alleviate smearing of transients in time-scale modification, a dynamic segmentation method is applied to the subbands; it determines the analysis-synthesis frame size adaptively to fit the time-frequency characteristics of the subband signal. To extract the sinusoidal components and calculate their parameters, a matching pursuit algorithm is applied to each analysis frame of the subband signal. In accordance with spectrum analysis, a psychoacoustic model implementing the effect of frequency masking is incorporated into matching pursuit to provide a reasonable stopping condition for the iteration and to reduce the number of sinusoids. The noise component, obtained by subtracting the signal synthesized from the sinusoidal components from the original signal, is modeled by a line-segment model of the short-time spectral envelope. Simulation results for various polyphonic audio signals show that the proposed sinusoidal modeling can synthesize the original signal without loss of perceptual quality and, because transients are represented without perceptual loss, gives more robust and higher-quality time-scale modification for large scale factors.

On-Line Audio Genre Classification using Spectrogram and Deep Neural Network (스펙트로그램과 심층 신경망을 이용한 온라인 오디오 장르 분류)

  • Yun, Ho-Won;Shin, Seong-Hyeon;Jang, Woo-Jin;Park, Hochong
    • Journal of Broadcast Engineering / v.21 no.6 / pp.977-985 / 2016
  • In this paper, we propose a new method for on-line genre classification using a spectrogram and a deep neural network. For on-line processing, the proposed method takes a 1-second audio segment as input and classifies it into one of three genres: speech, music, or effect. To keep the processing general, it uses the spectrogram as the feature vector instead of MFCC, which has been widely used for audio analysis. We measure the performance of genre classification using real TV audio signals and confirm that the proposed method performs better than the conventional method for all genres. In particular, it reduces the rate of classification errors between music and effect, which occur often in the conventional method.
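
As a rough illustration of using a spectrogram of a 1-second segment as the feature, the following sketch computes a log-magnitude spectrogram with a direct DFT. The sample rate (1000 Hz), frame length (100 samples), hop size (50 samples), and Hann window are assumptions chosen to keep the example small; the abstract does not give the paper's analysis parameters, and the neural network classifier is not shown.

```python
import cmath
import math


def log_spectrogram(signal, frame_len, hop):
    """Return a list of frames, each a list of log-magnitude DFT bins."""
    # Periodic Hann window (an assumed choice of analysis window).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(math.log(abs(acc) + 1e-9))  # small floor avoids log(0)
        frames.append(bins)
    return frames


sample_rate = 1000  # hypothetical, kept low so the direct DFT stays fast
one_second = [math.sin(2 * math.pi * 100 * t / sample_rate)
              for t in range(sample_rate)]  # 100 Hz test tone
spec = log_spectrogram(one_second, frame_len=100, hop=50)

# Flattening the time-frequency grid gives one feature vector per second.
feature_vector = [value for frame in spec for value in frame]
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

With these assumed parameters each bin spans 10 Hz, so the 100 Hz test tone peaks in bin 10 of every frame.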

Research on Open Source Encoding Technology for MPEG Unified Speech and Audio Coding (MPEG 통합 음성/오디오 코덱을 위한 오픈 소스 부호화 기술에 관한 연구)

  • Song, Jeongook;Lee, Joonil;Kang, Hong-Goo
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.1 / pp.86-96 / 2013
  • Unified Speech and Audio Coding (USAC) is the speech/audio codec with the best quality, approved as a Final Draft International Standard (FDIS) at an MPEG meeting in 2011. Since MPEG conventionally standardizes only the decoder, it is not easy to study the encoder technologies. Furthermore, the Reference Model (RM) shows extremely poor performance. To solve these problems, the open source project (JAME) proposes methods that improve the performance of the main encoder technologies in USAC. In particular, this paper introduces the following encoder modules: the signal classifier for selective operation between the two coders, the psychoacoustic model in the frequency domain, and the window transition technology. Finally, the results of the verification test for the FDIS and the performance of the Common Encoder are appended.

Low delay window switching modified discrete cosine transform for speech and audio coder (음성 및 오디오 부호화기를 위한 저지연 윈도우 스위칭 modified discrete cosine transform)

  • Kim, Young-Joon;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea / v.37 no.2 / pp.110-117 / 2018
  • In this paper, we propose a low-delay window switching MDCT (Modified Discrete Cosine Transform) method for speech/audio coders. The window switching algorithm is used to reduce the degradation of sound quality in non-stationary transient segments and to reduce the algorithmic delay by using low-delay TDAC (Time Domain Aliasing Cancellation). While conventional window switching algorithms use overlap-add with different lengths, the proposed method uses a fixed overlap-add length. This halves the algorithmic delay and, because only 2 window types are used, saves 1 bit of frame indication information. To evaluate the performance, we apply the proposed algorithm to the MDCT-based G.729.1 codec. The proposed method halves the algorithmic delay while maintaining the same speech quality as the conventional method.
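
For background on TDAC, the following sketch shows a conventional MDCT with a sine window and 50% overlap-add, in which the time-domain aliasing of adjacent blocks cancels. It is the textbook scheme only, not the low-delay window switching method proposed in the paper; the block size `n_half = 8` and the test signal are arbitrary assumptions.

```python
import math


def sine_window(n_half):
    """Sine window of length 2N; it satisfies w[n]^2 + w[n+N]^2 = 1."""
    size = 2 * n_half
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]


def mdct(block, window):
    """Windowed MDCT: 2N input samples -> N coefficients."""
    n_half = len(block) // 2
    return [
        sum(window[n] * block[n]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]


def imdct(coeffs, window):
    """Windowed inverse MDCT: N coefficients -> 2N aliased samples."""
    n_half = len(coeffs)
    return [
        window[n] * (2.0 / n_half)
        * sum(coeffs[k]
              * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
              for k in range(n_half))
        for n in range(2 * n_half)
    ]


n_half = 8  # hop size N; each block holds 2N samples
window = sine_window(n_half)
signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t)
          for t in range(5 * n_half)]

output = [0.0] * len(signal)
for start in range(0, len(signal) - 2 * n_half + 1, n_half):
    block = signal[start:start + 2 * n_half]
    restored = imdct(mdct(block, window), window)
    for n, value in enumerate(restored):
        output[start + n] += value  # overlap-add cancels the aliasing

# Only samples covered by two blocks are reconstructed exactly.
max_error = max(abs(a - b)
                for a, b in zip(signal[n_half:-n_half], output[n_half:-n_half]))
print(max_error)
```

In this conventional scheme a sample can only be output once the following block has arrived, which is the overlap-related delay that low-delay designs aim to shorten.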

Analyses on limitations of binaural sound based on the first order Ambisonics for virtual reality audio (1차 Ambisonics에 의해 생성되는 가상현실 오디오용 양이 사운드의 한계에 대한 분석)

  • Chang, Ji-Ho;Cho, Wan-Ho
    • The Journal of the Acoustical Society of Korea / v.38 no.6 / pp.637-650 / 2019
  • This paper analyzes the limitations of binaural sound that is reproduced with headphones based on Ambisonics for Virtual Reality (VR) audio. VR audio can be provided with binaural sound that compensates for the head rotation of a listener. Ambisonics is widely used for recording and reproducing ambient sound fields around a listener in VR audio, and First Order Ambisonics (FOA) is still used for VR audio because of its simplicity. However, the maximum frequency achievable at this order is too low to reproduce the ear signals perfectly, so the binaural reproduction has inherent limitations in terms of spectrum and sound localization. This paper investigates these limitations by comparing the signals arriving at the ear positions in the reference field and in the reproduced field. An incident wave is defined as the reference field and reproduced over virtual loudspeakers. Frequency responses, inter-aural level differences, and inter-aural phase differences are compared. The results show that, above the maximum cut-off frequency, the reproduced levels generally decrease and horizontal localization can be provided only around the forward direction.
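
The two binaural cues compared in this abstract can be computed at a single frequency as in the sketch below. It only illustrates the definitions of the inter-aural level difference (ILD) and inter-aural phase difference (IPD). The ear signals here are synthetic sinusoids with an assumed amplitude ratio and phase lag, not Ambisonics-rendered signals, and the sign convention (left relative to right) is also an assumption.

```python
import cmath
import math


def single_bin_dft(signal, freq_hz, sample_rate):
    """Complex amplitude of `signal` at one analysis frequency."""
    return sum(x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
               for n, x in enumerate(signal))


def ild_ipd(left, right, freq_hz, sample_rate):
    """Return (ILD in dB, IPD in radians), left ear relative to right ear."""
    l_bin = single_bin_dft(left, freq_hz, sample_rate)
    r_bin = single_bin_dft(right, freq_hz, sample_rate)
    ild_db = 20.0 * math.log10(abs(l_bin) / abs(r_bin))
    ipd_rad = cmath.phase(l_bin * r_bin.conjugate())
    return ild_db, ipd_rad


sample_rate = 8000
freq_hz = 500.0
num_samples = 160  # exactly 10 periods of the 500 Hz tone

# Hypothetical ear signals: the right ear is 6 dB quieter and lags by 0.4 rad.
left = [math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(num_samples)]
right = [0.5 * math.sin(2 * math.pi * freq_hz * n / sample_rate - 0.4)
         for n in range(num_samples)]

ild_db, ipd_rad = ild_ipd(left, right, freq_hz, sample_rate)
print(round(ild_db, 3), round(ipd_rad, 3))
```

Repeating the calculation over a range of frequencies for the reference and reproduced fields would give the kind of per-frequency comparison the abstract describes.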

Design of Low Bits Rate Transform Excitation Wide Band Speech and Audio Coder of Analysis-by-Synthesis Structure (분석/합성 구조의 저 전송률 변환여기 광대역 음성/오디오 부호화기 설계)

  • Jang, Sunghoon;Hong, Kibong;Lee, Insung
    • The Journal of the Acoustical Society of Korea / v.31 no.7 / pp.472-479 / 2012
  • This paper designs a 9.2 kbps low-bit-rate transform excitation coder that targets speech and audio signals. To achieve the low bit rate, we use band selection in the frequency domain, gain-shape quantization, and an analysis-by-synthesis (AbS) structure. To reduce the heavy computation of the AbS structure, we apply the IDFT and synthesis to each band separately. We also improve performance by inserting comfort noise into the non-transmitted bands. The proposed coder has a lower bit rate than the original 10.4 kbps AMR-WB+ TCX mode and similar performance.
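
Gain-shape quantization, one of the techniques named in this abstract, can be sketched as follows. This is a simplified, assumed variant in which the gain (the band's norm) and the shape (the unit-norm direction) are quantized independently. The toy codebook, the gain levels, and the example band are hypothetical, and the paper's band selection and analysis-by-synthesis search are not modeled.

```python
import math


def gain_shape_quantize(band, shape_codebook, gain_levels):
    """Return (gain_index, shape_index) for one band of coefficients."""
    gain = math.sqrt(sum(x * x for x in band))
    shape = [x / gain for x in band] if gain > 0 else [0.0] * len(band)
    # Shape: the unit-norm codeword with the largest correlation.
    shape_index = max(
        range(len(shape_codebook)),
        key=lambda i: sum(s * c for s, c in zip(shape, shape_codebook[i])),
    )
    # Gain: the nearest scalar level.
    gain_index = min(range(len(gain_levels)),
                     key=lambda i: abs(gain - gain_levels[i]))
    return gain_index, shape_index


def gain_shape_dequantize(gain_index, shape_index, shape_codebook, gain_levels):
    """Rebuild the band as quantized gain times shape codeword."""
    return [gain_levels[gain_index] * c for c in shape_codebook[shape_index]]


r = 1.0 / math.sqrt(2.0)
shape_codebook = [[1.0, 0.0], [0.0, 1.0], [r, r], [r, -r]]  # toy unit-norm codebook
gain_levels = [0.5, 1.0, 2.0, 4.0]                          # toy gain levels

band = [1.9, 2.1]  # hypothetical coefficients of one selected band
gain_index, shape_index = gain_shape_quantize(band, shape_codebook, gain_levels)
decoded = gain_shape_dequantize(gain_index, shape_index,
                                shape_codebook, gain_levels)
print(gain_index, shape_index, decoded)
```

Only the two indices are transmitted per band, which is how this kind of quantizer keeps the bit rate low.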