• Title/Summary/Keyword: speech signal processing (음성 신호 처리)

Wiener filtering-based ambient noise reduction technique for improved acoustic target detection of directional frequency analysis and recording sonobuoy (Directional frequency analysis and recording 소노부이의 표적 탐지 성능 향상을 위한 위너필터링 기반 주변 소음 제거 기법)

  • Hong, Jungpyo;Bae, Inyeong;Seok, Jongwon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.2
    • /
    • pp.192-198
    • /
    • 2022
  • As an effective weapon system for anti-submarine warfare, the Directional Frequency Analysis and Recording (DIFAR) sonobuoy detects underwater targets via beamforming with three channels: one omni-directional channel and two directional channels. However, ambient noise degrades the detection performance of the DIFAR sonobuoy in specific directions (0°, 90°, 180°, 270°). Thus, an ambient noise reduction technique is proposed to improve the acoustic target detection performance of the DIFAR sonobuoy. The proposed method uses Order Truncate Average (OTA), which is widely used in sonar signal processing, for ambient noise estimation, and Wiener filtering, which is widely used in speech signal processing, for noise reduction. For evaluation, we compare the mean square errors of the target bearing estimates of the conventional and proposed methods, and confirm that the proposed method is effective under 0 dB signal-to-noise ratio.
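
The two building blocks named in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the window size, the truncation ratio, the spectral floor, and the function names `ota_noise_estimate` and `wiener_gain` are all assumptions chosen for illustration, and the OTA variant shown (median-based truncation followed by averaging) is a simplified form.

```python
from statistics import median


def ota_noise_estimate(power, half_window=8, ratio=2.0):
    """Simplified order-truncate-average noise estimate per frequency bin.

    For each bin, neighbouring bins above `ratio` times the local median
    are discarded (truncated) and the remaining bins are averaged.
    `half_window` and `ratio` are illustrative assumptions.
    """
    n = len(power)
    noise = []
    for k in range(n):
        lo, hi = max(0, k - half_window), min(n, k + half_window + 1)
        window = power[lo:hi]
        threshold = ratio * median(window)
        kept = [p for p in window if p <= threshold]  # drop strong peaks
        noise.append(sum(kept) / len(kept))
    return noise


def wiener_gain(power, noise, floor=0.0):
    """Per-bin Wiener gain (P - N) / P, limited below by `floor`."""
    return [
        max(floor, (p - n) / p) if p > 0 else floor
        for p, n in zip(power, noise)
    ]


if __name__ == "__main__":
    # Flat noise floor with one narrowband target line at bin 16.
    spectrum = [1.0] * 32
    spectrum[16] = 100.0
    noise = ota_noise_estimate(spectrum)
    gains = wiener_gain(spectrum, noise)
    print(gains[16], gains[0])
```

In this toy spectrum the target line is excluded from the noise average, so its gain stays close to one while noise-only bins are attenuated.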

Nonlinear Prediction of Nonstationary Signals using Neural Networks (신경망을 이용한 비정적 신호의 비선형 예측)

  • Choi, Han-Go;Lee, Ho-Sub;Kim, Sang-Hee
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.35S no.10
    • /
    • pp.166-174
    • /
    • 1998
  • Neural networks, which have highly nonlinear dynamics by virtue of their distributed nonlinearities and learning ability, have the potential for adaptive prediction of nonstationary signals. This paper describes the nonlinear prediction of these signals in two ways: using a nonlinear module alone, and using a cascade combination of nonlinear and linear modules. Fully connected recurrent neural networks (RNNs) and a conventional tapped-delay-line (TDL) filter are used as the nonlinear and linear modules, respectively. The dynamic behavior of the proposed predictors is demonstrated for chaotic time series and speech signals. To compare prediction performance, the proposed predictors are evaluated against a conventional ARMA linear prediction model. Experimental results show that the neural-network-based adaptive predictor significantly outperforms the traditional linear scheme. We also find that the cascade combination predictor is well suited to predicting time series that contain large variations in signal amplitude.
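
As a rough illustration of the linear module only, a tapped-delay-line one-step predictor can be sketched as below. The adaptation rule (normalized LMS), the tap count, the step size, and the name `nlms_predict` are assumptions; the abstract does not state how the TDL filter is trained, and the recurrent nonlinear module is not reproduced here.

```python
import math


def nlms_predict(signal, taps=4, mu=0.5, eps=1e-8):
    """One-step-ahead prediction with a tapped-delay-line filter.

    The weights are adapted with normalized LMS (an assumed rule).
    Returns the prediction error for every predicted sample.
    """
    weights = [0.0] * taps
    errors = []
    for n in range(taps, len(signal)):
        # Delay line: the most recent `taps` samples, newest first.
        delay_line = [signal[n - k] for k in range(1, taps + 1)]
        prediction = sum(w * x for w, x in zip(weights, delay_line))
        error = signal[n] - prediction
        energy = sum(x * x for x in delay_line) + eps
        weights = [
            w + mu * error * x / energy for w, x in zip(weights, delay_line)
        ]
        errors.append(error)
    return errors


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 0.2 * n) for n in range(400)]
    errs = nlms_predict(tone)
    print(abs(errs[0]), abs(errs[-1]))
```

On a stationary sinusoid the error shrinks quickly; the abstract's point is that nonstationary signals such as speech are harder for a purely linear scheme of this kind.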

A Study on Frequency-Time Plane Analysis of Wavelet (웨이브렛의 주파수-시간 평면 해석에 관한 연구)

  • Bae, Sang-Bum;Ryu, Ji-Goo;Kim, Nam-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • v.9 no.2
    • /
    • pp.451-454
    • /
    • 2005
  • Recently, many methods for analyzing signals have been proposed; the representative ones are the Fourier transform and the wavelet transform. The Fourier transform represents a signal as a combination of cosines and sines at all locations in the frequency domain. However, it provides no information about when a particular frequency occurs in the signal and depends only on the global features of the signal. To overcome these limitations, the wavelet transform, which is capable of multiresolution analysis, has been applied to many fields such as speech processing, image processing, and computer vision. The wavelet transform uses a window that changes with the scale parameter and therefore provides time-frequency localization. In this paper, we propose a new approach that uses a wavelet of cosine and sine type, and we analyze the features of a signal at a limited point of the frequency-time plane.
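
The idea of probing a single point of the frequency-time plane with a cosine/sine pair can be sketched as follows. The Gaussian window, its width `sigma`, and the name `local_coefficient` are assumptions for illustration (this is a Gabor/Morlet-style atom); the abstract does not specify the wavelet actually used.

```python
import cmath
import math


def local_coefficient(signal, center, freq, sigma=8.0):
    """Correlate `signal` with a Gaussian-windowed cosine/sine pair.

    `center` is the sample index and `freq` the frequency (cycles per
    sample) of the point being probed in the frequency-time plane.
    """
    total = 0j
    for n, x in enumerate(signal):
        window = math.exp(-((n - center) ** 2) / (2.0 * sigma ** 2))
        # exp(-j*phase) = cos(phase) - j*sin(phase): cosine and sine atoms.
        total += x * window * cmath.exp(-2j * math.pi * freq * (n - center))
    return total


if __name__ == "__main__":
    # A short burst at 0.1 cycles/sample, present only for 100 <= n < 156.
    burst = [
        math.cos(2 * math.pi * 0.1 * n) if 100 <= n < 156 else 0.0
        for n in range(256)
    ]
    print(abs(local_coefficient(burst, 128, 0.1)))  # inside the burst
    print(abs(local_coefficient(burst, 30, 0.1)))   # before the burst
    print(abs(local_coefficient(burst, 128, 0.3)))  # wrong frequency
```

Unlike a global Fourier coefficient, the magnitude is large only when both the time and the frequency of the probe match the burst.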

Comparison of Korean Real-time Text-to-Speech Technology Based on Deep Learning (딥러닝 기반 한국어 실시간 TTS 기술 비교)

  • Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology
    • /
    • v.7 no.1
    • /
    • pp.640-645
    • /
    • 2021
  • A deep learning based end-to-end TTS system consists of a Text2Mel module that generates a spectrogram from text and a vocoder module that synthesizes speech signals from the spectrogram. Recently, applying deep learning to TTS systems has improved the intelligibility and naturalness of synthesized speech to a level comparable to human speech. However, inference for synthesizing speech is very slow compared with conventional methods. Inference speed can be improved by applying a non-autoregressive method, which generates speech samples in parallel, independently of previously generated samples. In this paper, we introduce FastSpeech, FastSpeech 2, and FastPitch as Text2Mel technologies, and Parallel WaveGAN, Multi-band MelGAN, and WaveGlow as non-autoregressive vocoder technologies, and we implement them to verify whether they can run in real time. Experimental results show that, according to the measured RTF, all of the presented methods are capable of real-time processing. In addition, the trained models are about tens to hundreds of megabytes in size, except for WaveGlow, so they can be applied to embedded environments where memory is limited.
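
The abstract does not define RTF; assuming the usual definition of real-time factor (synthesis time divided by the duration of the generated audio), the real-time check can be sketched as below. The function names and the example numbers are hypothetical and are not results from the paper.

```python
def real_time_factor(synthesis_seconds, num_samples, sample_rate):
    """Return synthesis time divided by the duration of the audio produced."""
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def is_real_time(rtf):
    """An RTF below 1 means audio is generated faster than it plays back."""
    return rtf < 1.0


if __name__ == "__main__":
    # Hypothetical run: 0.5 s of compute for 2 s of 22.05 kHz audio.
    rtf = real_time_factor(0.5, 44100, 22050)
    print(rtf, is_real_time(rtf))
```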

The Technique of Spectrum Flattening by Algorithm for Minimized Harmonics Variance Value (Harmonic 분산값 최소화 알고리즘에 의한 주파수 영역 평탄화 기법)

  • Min, So-Yeon;Kim, Young-Kyu
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.9
    • /
    • pp.3558-3562
    • /
    • 2010
  • Accurate extraction of the fundamental frequency (pitch) is important in speech signal processing. However, accurate pitch extraction from a speech signal is very difficult because of the effects of formants and transitional amplitude. In this paper, the pitch is therefore detected after the spectrum is flattened in the frequency domain by the proposed algorithm, which minimizes the variance of the harmonics. Experimental results show that the proposed method outperforms the conventional LPC and cepstrum methods.

A Study of Subjective Speech Quality Measurement in VoIP (VoIP 음질의 주관적 평가에 관한 연구)

  • 강영도;강진석;최연성;김장형
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.5 no.2
    • /
    • pp.279-287
    • /
    • 2001
  • In this paper, we discuss a scale for subjective speech quality measurement over a VoIP (Voice over IP) network, which is a component of broadband networks. Objective parameters of multimedia services, such as PSNR or jitter, can easily be measured and defined, but they do not readily match the user's perception. We propose a speech quality measurement scale based on subjective measurement of end-to-end speech quality, which is composed of sender-side quality, transmission quality, and receiver-side quality. These represent, respectively, how correctly the speaker is represented, the degree of impairment caused by various factors, and how well the processed speech is recognized. We also tested the proposed method and verified its applicability.

The Design of Remote Control System using Bluetooth Wireless Technology (블루투스 무선기술을 응용한 원격제어 시스템의 설계)

  • 전형준;이창희
    • Journal of the Korea Computer Industry Society
    • /
    • v.4 no.4
    • /
    • pp.547-552
    • /
    • 2003
  • In this thesis, interference in Bluetooth networks that require security was minimized, and the security of the piconet was strengthened by assigning an identical PIN code to the Bluetooth devices that establish a specific piconet during the authentication stage. To establish a Bluetooth piconet system, a unique ID was assigned to each Bluetooth device, communication algorithms with different data formats between devices were designed, and an embedded hardware module using an ARM processor and the uCOS-II RTOS was implemented. CPU efficiency in the module increased by about 30% after functions with blocking parameters were modified to work in non-blocking mode; with the increased efficiency of the whole piconet, the module could be used as an access point. The module could transmit up to 10 image frames as well as an audio signal by switching packets effectively according to the channel condition. Through this process, video, audio, and data could be transmitted reliably by the Bluetooth management program, which suggests the feasibility of a commercial remote control system using Bluetooth technology.

A simulation study of speech perception enhancement for cochlear implant patients using companding in noisy environment (잡음 환경에서 압신을 이용한 인공 와우 환자의 언어 인지 향상 시뮬레이션 연구)

  • Lee Young-Woo;Ji Yoon-Sang;Lee Jong-Shil;Kim In-Young;Kim Sun-I.;Hong Sung-Hwa;Lee Sang-Min
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.43 no.5 s.311
    • /
    • pp.79-87
    • /
    • 2006
  • In this study, we evaluated the performance of a companding strategy as a preprocessing step for speech enhancement and noise reduction. The proposed algorithm is based on two-tone suppression, a characteristic of human hearing. The algorithm enhances the spectral peaks of the speech signal and reduces background noise; however, it involves a trade-off between speech distortion and noise reduction because of the limited number of channels and the nonlinear block. Therefore, we designed two companding structures that differ in their balance of noise reduction and speech distortion, and identified the suitable structure according to each individual's speech perception ability in noisy environments. On this basis, we propose a method for enhancing the speech perception of cochlear implant users in noisy environments with low SNR. The performance of the proposed algorithm was evaluated with five normal-hearing listeners using a noise-band simulation. Speech perception improved for all subjects, and each subject preferred a different type of companding structure.

Improve Communication Between Different PBX system using H.323 Research (이 기종간의 H.323 프로토콜상의 상호연동을 위한 Signaling 호환성 증대방안 연구)

  • Kim, Jung-Hoon;Choi, Hyon-Young;Min, Sung-Gi
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2007.05a
    • /
    • pp.1221-1224
    • /
    • 2007
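  • Companies have begun to adopt VoIP systems to reduce telephone costs and to use various VoIP supplementary services. As they try to make full use of the functions of VoIP phones, problems have begun to arise from implementation differences in the H.323 protocol among H.323 gateways, which currently account for 90% of the VoIP market. When VoIP gateways communicate over H.323, these implementation differences in protocol connection can cause calls to terminate abnormally, audio to be inaudible, or supplementary phone services to be unavailable. To solve these problems, this paper analyzes the operation of the H.323 protocol and, for cases where H.323 signals are not compatible between heterogeneous systems, handles the H.245 signaling of H.323 with an RFC 2833 DTMF-compliant protocol implemented on a media gateway server. The paper shows that this improves the compatibility of the Call Transfer, Hold, and Conference functions between heterogeneous systems.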
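
For readers unfamiliar with RFC 2833, the 4-byte telephone-event payload it defines for carrying DTMF digits in RTP can be sketched as follows. This shows only the payload layout (event code, end bit, reserved bit, volume, duration); the RTP header and the paper's media gateway logic are not reproduced, and the helper names are hypothetical.

```python
import struct

# Event codes for DTMF digits: 0-9, then * = 10, # = 11, A-D = 12-15.
DTMF_EVENTS = {str(d): d for d in range(10)}
DTMF_EVENTS.update({"*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15})


def encode_dtmf_event(digit, duration, volume=10, end=False):
    """Pack one telephone-event payload.

    Layout: event (8 bits), E bit, R bit, volume (6 bits), duration
    (16 bits, in RTP timestamp units). The reserved R bit is left at 0.
    """
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    flags_volume = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", DTMF_EVENTS[digit], flags_volume, duration)


def decode_dtmf_event(payload):
    """Unpack a 4-byte telephone-event payload into its fields."""
    event, flags_volume, duration = struct.unpack("!BBH", payload)
    return {
        "event": event,
        "end": bool(flags_volume & 0x80),
        "volume": flags_volume & 0x3F,
        "duration": duration,
    }


if __name__ == "__main__":
    # Digit "5", final packet of the event, 800 timestamp units
    # (100 ms at an 8 kHz clock).
    payload = encode_dtmf_event("5", 800, volume=10, end=True)
    print(payload.hex(), decode_dtmf_event(payload))
```

Because the digit travels as a named event rather than as in-band audio, it can be interpreted consistently by gateways from different vendors.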

A Study on a Reduction of the Transmission Bit Rate by the U/V Decision Using LSP in the CELP Vocoder (LSP를 이용한 음성신호의 성분분리에 의한 CELP 보코더의 전송률 감소에 관한 연구)

  • Na DuckSu;Park YoungHo;Jeong Chan Jung;Bae MyungJin
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.61-64
    • /
    • 1999
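  • Conventional CELP vocoders process unvoiced speech in the same way as voiced speech, with no separate handling. Although voiced and unvoiced sounds differ in the speech production model, being excited by an impulse train and by random noise respectively, treating them identically degrades the quality of the synthesized speech and wastes computation and transmission rate. Moreover, when a U/V (unvoiced/voiced) classifier is used, the degree of quality degradation in the synthesized speech varies greatly with the classifier's performance. This paper proposes a method that reduces the transmission rate of a CELP vocoder by using a U/V classifier that minimizes the error rate and the preprocessing computation. A CELP vocoder extracts spectral information as LPC parameters and then converts them into LSP (Line Spectrum Pair) parameters for transmission. The new U/V classifier uses these LSP parameters: it makes the U/V decision from the distribution of the LSP parameters in the frequency domain and the spacing between them. The proposed method was applied to 5.3 kbps ACELP for performance evaluation. The experiments showed that the transmission rate could be reduced by 5.6% (280 bps) without degrading speech quality.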
