• Title/Summary/Keyword: auditory signal (청각신호)

An Adaptive Speech Enhancement System Using Lateral Inhibition and Time-Delay Neural Network (상호억제와 시간지연 신경회로망을 사용한 적응적인 음성강조시스템)

  • Choi, Jae-Seung
    • Journal of the Institute of Electronics Engineers of Korea SP / v.45 no.2 / pp.95-102 / 2008
  • This paper proposes an adaptive speech enhancement system, modeled on the auditory system, for speech degraded by various background noises. For each input frame, the system detects voiced and unvoiced sections, adaptively adjusts the coefficients of both the lateral inhibition and the amplitude component according to the detected section, and then reduces the noise using a time-delay neural network. Signal-to-noise ratio measurements confirm that the proposed system is effective for speech degraded by various noises.
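
The abstract evaluates enhancement by signal-to-noise ratio but does not say which SNR variant was measured. As a minimal sketch, assuming the plain global definition (clean-signal energy over residual-error energy, in dB), the measurement could look like the following. The function name `snr_db` is hypothetical and not taken from the paper.

```python
import math


def snr_db(clean, processed):
    """Global SNR in dB: clean-signal energy over residual-error energy.

    Assumption: the paper may use a different variant (e.g. segmental SNR).
    """
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    signal_power = sum(s * s for s in clean)
    if signal_power == 0:
        raise ValueError("clean signal has zero energy")
    # Residual error between the clean reference and the processed signal.
    noise_power = sum((s - p) ** 2 for s, p in zip(clean, processed))
    if noise_power == 0:
        return math.inf  # processed signal matches the reference exactly
    return 10.0 * math.log10(signal_power / noise_power)
```

Computing `snr_db` once for the noisy input and once for the enhanced output, both against the same clean reference, gives the SNR improvement in dB.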

New Echo Embedding Technique for Robust Audio Watermarking (강인한 오디오 워터마킹을 위한 새로운 반향 커널 설계)

  • 오현오;김현욱;윤대희;석종원;홍진우
    • The Journal of the Acoustical Society of Korea / v.20 no.2 / pp.66-76 / 2001
  • Conventional echo watermarking techniques often exhibit an inherent trade-off between imperceptibility and robustness. This paper proposes a new echo embedding technique that can embed high-energy echoes without degrading the host audio quality, making the watermark robust to common signal processing modifications and resistant to tampering. This is possible because the echo kernels are designed from psychoacoustic analyses. In addition, we propose some novel techniques to improve robustness against signal processing attacks. Subjective and objective evaluations confirmed that the proposed method improves robustness without perceptible distortion.
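
For context, the conventional scheme the abstract contrasts itself with can be sketched as a single-echo kernel, where each bit selects one of two echo delays. This is not the paper's psychoacoustically designed kernel. The function name `embed_bits` and all parameter names are hypothetical.

```python
def embed_bits(host, bits, frame_len, delay_zero, delay_one, gain):
    """Conventional single-echo embedding (illustrative, not the paper's kernel).

    Each frame of `frame_len` samples carries one bit. The bit selects the
    echo delay, and the echo is a scaled, delayed copy of the host signal:
        y[n] = x[n] + gain * x[n - delay]
    """
    if len(bits) * frame_len > len(host):
        raise ValueError("host signal is too short for the given bits")
    out = [float(x) for x in host]
    for i, bit in enumerate(bits):
        delay = delay_one if bit else delay_zero
        start = i * frame_len
        for n in range(start, start + frame_len):
            if n - delay >= 0:
                # The delayed sample comes from the original host, so an
                # echo may reach back into the previous frame.
                out[n] += gain * host[n - delay]
    return out
```

Raising `gain` makes the echo easier to detect but also more audible, which is the imperceptibility versus robustness trade-off named in the abstract.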

Optimization of Multi-time Scale Loss Function Suitable for DNN-based Audio Coder (심층신경망 기반 오디오 부호화기를 위한 Multi-time Scale 손실함수의 최적화)

  • Shin, Seung-Min;Byun, Joon;Park, Young-Cheol;Beack, Seung-kwon;Sung, Jong-mo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.1315-1317 / 2022
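  • Recently, deep neural network (DNN)-based audio coders have been actively studied. DNN-based audio coders are structurally simpler than conventional audio coders, but it is difficult to expect perceptual performance improvements without increasing network complexity. To address this problem, techniques have been introduced that use psychoacoustic-model-based loss functions exploiting human auditory characteristics. Audio coders trained with psychoacoustic-model-based loss functions control quantization noise well, but perceptual improvement is still needed. This paper proposes optimizing the window size of the local loss function in the Multi-time Scale loss function for DNN-based audio coders. The window size used to compute the local loss function is varied to determine a window size suitable for audio coding. A network was trained with the optimal Multi-time Scale loss function obtained experimentally, and subjective evaluation confirmed that it delivers better speech quality than the existing psychoacoustic-model-based loss function.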

Enhanced Adjustment Strategy of Masking Threshold for Speech Signals in Low Bit-Rate Audio Coding (저전송률 오디오 부호화에서 음성 신호의 성능 개선을 위한 마스킹 임계값 적응기법 향상)

  • Lee, Chang-Heon;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea / v.29 no.1 / pp.62-68 / 2010
  • This paper proposes a new masking threshold adjustment strategy to improve the performance of low bit-rate audio coding for speech signals. After the formant regions are determined, the masking threshold is adjusted using the ratio of each sub-band's energy to the average energy of each formant. More quantization noise is allowed in bands with relatively large energy, while less distortion is allowed in spectral valley regions by allocating more bits, which reflects the concept of perceptual weighting widely used in speech coding. Objective speech quality measurements verify that the proposed method improves quality for speech input signals compared with the conventional method.

Korean isolated word recognizer using new time alignment method of speech signal (새로운 시간축 정규화 방법을 이용한 한국어 고립단어 인식기)

  • Nam, Myeong-U;Park, Gyu-Hong;No, Seung-Yong
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.5 / pp.567-575 / 2001
  • This paper proposes a new method for obtaining fixed-size parameters from voice signals of different lengths. The performance of a speech recognizer depends on how it compares the similarity (the distance between patterns) of parameters extracted from the voice signal. However, variation in the voice signal and differences in speaking rate make it difficult to extract fixed-size parameters. The proposed method represents the parameters as a spectrogram and then normalizes them to a fixed size using the two-dimensional DCT (Discrete Cosine Transform). To validate the method, parameters extracted from a 32-channel auditory filter bank (which estimates auditory nerve firing probabilities) are processed by the 2-D DCT and used as the input to a neural network. For comparison, we also used one conventional method that solves the time alignment problem. The results show better performance and faster recognition than the conventional method in both speaker-dependent and speaker-independent isolated word recognition.
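
One plausible reading of the fixed-size normalization is to take the 2-D DCT of the spectrogram and keep only a fixed block of low-order coefficients, so that utterances with different frame counts map to the same feature shape. The sketch below assumes that reading and an orthonormal DCT-II. The abstract does not state which coefficients are kept or how they are scaled, and the names `dct_1d` and `fixed_size_features` are hypothetical.

```python
import math


def dct_1d(x, keep):
    """First `keep` coefficients of the orthonormal DCT-II of sequence x."""
    n = len(x)
    if not 0 < keep <= n:
        raise ValueError("keep must be between 1 and len(x)")
    out = []
    for k in range(keep):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, v in enumerate(x)))
    return out


def fixed_size_features(spectrogram, keep_time, keep_freq):
    """Map a (frames x channels) spectrogram to keep_time x keep_freq values.

    Assumption: only the low-order 2-D DCT coefficients are retained.
    """
    # DCT along the channel axis of every frame.
    rows = [dct_1d(frame, keep_freq) for frame in spectrogram]
    # DCT along the time axis of every retained channel coefficient.
    cols = [dct_1d([row[j] for row in rows], keep_time)
            for j in range(keep_freq)]
    # Transpose back to time-major order.
    return [[cols[j][i] for j in range(keep_freq)] for i in range(keep_time)]
```

A 5-frame and a 9-frame spectrogram both yield a 3 x 2 block with `fixed_size_features(s, 3, 2)`, which is what lets a fixed-input neural network consume utterances of different durations.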

An Adaptive Microphone Array with Linear Phase Response (선형 위상 특성을 갖는 적응 마이크로폰 어레이)

  • Kang, Hong-Gu;Youn, Dae-Hui;Cha, Il-Hwan
    • The Journal of the Acoustical Society of Korea / v.11 no.3 / pp.53-60 / 1992
  • Many adaptive beamforming methods have been studied for interference cancellation and speech signal enhancement in teleconferences and auditoriums. Adaptive beamforming for speech signal processing differs from that for radar, sonar, and seismic signal processing because the desired output signal must suit the human ear. Considering that the human ear is largely insensitive to the phase of speech, Sondhi proposed a nonlinear constrained optimization technique whose constraint was on the magnitude transfer function from the source to the output. In real environments, however, the phase response of the speech signal does affect the human auditory system, so a linear-phase system is desirable. This paper proposes a linear-phase beamformer, together with a sample processing algorithm for real-time operation. Simulation results show that the proposed algorithm yields more consistent beam patterns and deeper nulls in the noise direction than Sondhi's.

An Image Watermarking Method for Embedding Copyrighter's Audio Signal (저작권자의 음성 삽입을 위한 영상 워터마킹 방법)

  • Choi Jae-Seung;Kim Chung-Hwa;Koh Sung-Shik
    • The Journal of the Acoustical Society of Korea / v.24 no.4 / pp.202-209 / 2005
  • The rapid development of digital media and communication networks has created an urgent need for data certification technology to protect intellectual property rights (IPR). This paper proposes a new image watermarking method that embeds the owner's audio signal. Because the watermark is an audio signal, ownership can be claimed aurally. In addition, by applying our LBX (Linear Bit-expansion) interleaving, the method can restore an audio signal that has been modified or, in particular, removed by attacks that remove part of the image. The watermarking has three basic stages: 1) encode the owner's analogue audio signal by PCM to create a new digital audio watermark; 2) interleave the audio watermark using our LBX; and 3) embed the interleaved audio watermark in the low-frequency band of the DHWT (Discrete Haar Wavelet Transform) of the image. The experimental results show that the method is resistant to lossy JPEG compression, a standard image compression method, and especially to cropping and rotation, which remove part of the image.

Adaptive Noise Reduction using Standard Deviation of Wavelet Coefficients in Speech Signal (웨이브렛 계수의 표준편차를 이용한 음성신호의 적응 잡음 제거)

  • 황향자;정광일;이상태;김종교
    • Science of Emotion and Sensibility / v.7 no.2 / pp.141-148 / 2004
  • This paper proposes a new time-adapted threshold based on the standard deviations of wavelet coefficients obtained by a frame-by-frame wavelet transform. The threshold is set using the sum of the standard deviations of the wavelet coefficients in cA3 and in the weighted cD1. The cA3 coefficients represent low-frequency voiced sounds, and the cD1 coefficients represent high-frequency unvoiced sounds. Simulation results demonstrate that the proposed algorithm improves SNR and MSE performance more than the wavelet transform and the wavelet packet transform do. Moreover, the signals reconstructed by the proposed algorithm resemble the original signal in plosive, fricative, and affricate sounds, whereas the wavelet transform and wavelet packet transform severely attenuate those sounds.
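
The per-frame threshold described above can be sketched as follows, under several assumptions the abstract does not confirm: a Haar wavelet, a three-level decomposition whose final approximation is cA3, a cD1 weight of 0.5, and soft thresholding as the shrinkage rule. The names `haar_step`, `frame_threshold`, and `soft_threshold` are hypothetical.

```python
import math
import statistics


def haar_step(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def frame_threshold(frame, weight=0.5):
    """Threshold for one frame: std(cA3) + weight * std(cD1).

    Assumptions: Haar wavelet and weight value are illustrative only.
    """
    if len(frame) == 0 or len(frame) % 8 != 0:
        raise ValueError("frame length must be a positive multiple of 8")
    a1, d1 = haar_step(frame)   # level 1: cD1 holds the highest band
    a2, _ = haar_step(a1)       # level 2
    a3, _ = haar_step(a2)       # level 3: cA3 holds the lowest band
    return statistics.pstdev(a3) + weight * statistics.pstdev(d1)


def soft_threshold(coeffs, thr):
    """Shrink coefficients toward zero by thr (assumed shrinkage rule)."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]
```

Because the threshold is recomputed for every frame, it follows the local signal statistics, which is the sense in which the abstract calls it time-adapted.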

Tonality Detection based on Spectrum Energy in Perceptual Audio Coder (지각 오디오 부호화기에서의 스펙트럼 에너지 기반 톤 성분 검출 알고리듬)

  • 이근섭;연규철;박영철;윤대희
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.6C / pp.770-776 / 2004
  • The goal of a perceptual audio coder is to reduce the redundancy and irrelevancy of an audio signal based on the concept of masking. Several studies of the masking effect reveal that the masking threshold varies with the noise-like or tone-like nature of the audio signal. The tonality of the audio signal therefore significantly influences the quality and efficiency of a perceptual audio coder. In this paper, we propose a new, effective algorithm for measuring tonality using spectrum energy. Since the proposed algorithm consists of a few transcendental functions and simple operations, it has lower complexity than MPEG psychoacoustic model II. The proposed algorithm was tested with several audio signals, and a DSP implementation showed that it can be implemented with 3 MIPS. These results illustrate the efficiency of the proposed algorithm in both performance and complexity.
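
The abstract does not give the proposed tonality formula. As a stand-in illustration of an energy-based tonality measure, the sketch below uses the classic spectral flatness measure (SFM) with the commonly cited -60 dB reference, which is not the algorithm proposed in this paper. The function names are hypothetical.

```python
import math


def spectral_flatness(power):
    """Geometric mean over arithmetic mean of a power spectrum (0 to 1]."""
    if not power or any(p <= 0 for p in power):
        raise ValueError("power spectrum must be non-empty and positive")
    log_mean = sum(math.log(p) for p in power) / len(power)
    arith_mean = sum(power) / len(power)
    return math.exp(log_mean) / arith_mean


def tonality_index(power, sfm_db_max=-60.0):
    """Tonality in [0, 1]: 0 for a flat (noise-like) spectrum, 1 for tone-like.

    Stand-in measure: min(SFM_dB / sfm_db_max, 1), not the paper's method.
    """
    sfm_db = 10.0 * math.log10(spectral_flatness(power))
    return min(sfm_db / sfm_db_max, 1.0)
```

A flat spectrum gives an index of 0 and a spectrum dominated by a single bin saturates at 1. A coder would use such an index to choose between tone-masking-noise and noise-masking-tone offsets when setting the masking threshold.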

Evaluation on the stress using HRV according to elapsed time of MRI noise (HRV를 이용한 자기공명영상 소음의 시간 변화에 따른 스트레스 평가)

  • Ye, Soo-Young;Kim, Dong-Hyun
    • Journal of the Institute of Convergence Signal Processing / v.16 no.2 / pp.50-55 / 2015
  • MRI scanning produces noise as loud as 100 dB, which has intense psychological and physiological effects on the human body. ECG signals were first measured for 15 minutes in a stable state while the subjects wore earplugs, and then for 30 minutes while the subjects listened to sound of about 100 dB in MRI equipment. In this study, the heart rate variability (HRV) of men and women was analyzed by frequency analysis according to the MRI noise stress level. With MRI noise at about 100 dB, the HRV analysis showed an imbalance between the sympathetic and parasympathetic nervous systems. The maximum stress state appeared during the period from the resting state up to 10 minutes. This study should encourage MRI workers to take an interest in hearing protection for patients and help establish objective indicators of MRI noise.