Search | Korea Science

Speech Enhancement using RNN Phoneme based VAD (음소기반의 순환 신경망 음성 검출기를 이용한 음성 향상)

Lee, Kang;Kang, Sang-Ick;Kwon, Jang-woo;Lee, Samgmin
- Journal of the Institute of Electronics and Information Engineers, v.54 no.5, pp.85-89, 2017
In this paper, we apply high-performance hardware and a machine learning algorithm to build an advanced voice activity detection (VAD) algorithm for speech enhancement. Because speech is a sequence of phonemes, a recurrent neural network (RNN), which takes previous data into account, is a suitable way to build a speech model. It is impossible to learn every noise found in the real world, so our algorithm is built on phoneme-based training. We detect voice-present frames in a noisy speech signal and enhance the speech signal. The phoneme-based RNN model performs well on speech signals, which have high correlation between frames. To verify the performance of the proposed algorithm, we compare the VAD results with labeled data, and we compare the speech enhancement results in various noise environments with those of a previous speech enhancement algorithm.
https://doi.org/10.5573/ieie.2017.54.5.85

Robust Speech Reinforcement Based on Gain-Modification incorporating Speech Absence Probability (음성 부재 확률을 이용한 음성 강화 이득 수정 기법)

Choi, Jae-Hun;Chang, Joon-Hyuk
- Journal of the Institute of Electronics Engineers of Korea SP, v.47 no.1, pp.175-182, 2010
In this paper, we propose a robust speech reinforcement technique that enhances the intelligibility of speech degraded by ambient noise, based on a soft decision scheme that incorporates a speech absence probability (SAP) into the speech reinforcement gains. Because ambient noise significantly decreases the intelligibility of the speech signal, the speech reinforcement approach was proposed to amplify the clean speech signal estimated from the noisy background, improving the intelligibility and clarity of the corrupted speech. To estimate a reinforcement gain that is more robust than that of the conventional method across speech-active periods, nonspeech periods, and transient intervals, we propose a speech reinforcement algorithm based on soft decision that applies the SAP to the estimation of the reinforcement gains. The proposed algorithm is evaluated under various ambient noise environments with the Comparison Category Rating (CCR), a measure for subjective determination of transmission quality in ITU-T P.800, and shows better performance than the conventional method.

Matching Pursuit Sinusoidal Modeling with Damping Factor (Damping 요소를 첨가한 매칭 퍼슈잇 정현파 모델링)

Jeong, Gyu-Hyeok;Kim, Jong-Hark;Lim, Joung-Woo;Joo, Gi-Ho;Lee, In-Sung
- Journal of the Institute of Electronics Engineers of Korea SP, v.44 no.1, pp.105-113, 2007
In this paper, we propose matching pursuit with damping factors, a new sinusoidal model that improves on matching pursuit, for codecs based on sinusoidal modeling. The proposed model defines damping factors using the correlation of parameters between the current and adjacent frames, estimates the sinusoidal parameters of the analysis frame more accurately by running matching pursuit according to the damping factor, and synthesizes the final signal. This makes efficient modeling possible without interpolation schemes. The proposed sinusoidal model delivers better speech quality, without additional delay, than the conventional sinusoidal model with interpolation methods. We compare the performance of our model with that of matching pursuit using interpolation methods in terms of signal-to-noise ratio (SNR), Mean Opinion Score (MOS), Itakura-Saito likelihood ratio (LR), and cepstral distance (CD).

Quality Improvement of Low Bitrate HE-AAC using Linear Prediction Pre-processor (저 전송률 환경에서 선형예측 전처리기를 사용한 HE-AAC의 성능 향상)

Lee, Jae-Seong;Lee, Gun-Woo;Park, Young-Chul;Youn, Dae-Hee
- The Journal of Korean Institute of Communications and Information Sciences, v.34 no.8C, pp.822-829, 2009
This paper proposes a new method for improving the quality of High Efficiency Advanced Audio Coding (HE-AAC). HE-AAC encodes the input source by allocating bits to each scalefactor band according to the psychoacoustic properties of the human ear. As a result, insufficient bits are assigned to bands with relatively low energy. This imbalance between bands of different energy can degrade sound quality, for example by producing musical noise. In the proposed system, a Linear Prediction (LP) module is combined with HE-AAC as a pre-processor to improve sound quality through a more even bit distribution. To apply the psychoacoustic properties accurately, the psychoacoustic model uses the Fast Fourier Transform (FFT) spectrum of the original input signal to compute the masking threshold. In the implementation, the masking threshold of the psychoacoustic model is normalized by the LP spectral envelope prior to quantization of the LP residual. Experimental results show that the proposed algorithm allocates bits appropriately when bits are scarce and improves the performance of HE-AAC.

Improved Harmonic-CELP Speech Coder with Dual Bit-Rates(2.4/4.0 kbps) (이중 전송률(2.4/4.0 kbps)을 갖는 개선된 하모닉-CELP 음성부호화기)

김경민;윤성완;최용수;박영철;윤대희;강태익
- The Journal of Korean Institute of Communications and Information Sciences, v.28 no.3C, pp.239-247, 2003
This paper presents a dual-rate (2.4/4.0 kbps) Improved Harmonic-CELP (IHC) speech coder based on the Efficient Harmonic-CELP (EHC) coder previously presented by the authors. The proposed IHC employs harmonic coding for voiced segments and CELP for unvoiced segments. In the IHC, an initial voiced/unvoiced (V/UV) estimate is obtained from the pitch gain and energy, and the final V/UV mode is then decided using the frame energy contour. A new harmonic estimation method combining peak picking and delta adjustment is more reliable than the estimation used in the EHC. In addition, a noise mixing scheme, used together with an improved band voicing measurement, improves the naturalness of the synthesized speech. To demonstrate its performance, the proposed IHC coder was implemented and compared with the 2.0/4.0 kbps HVXC (Harmonic Vector eXcitation Coding) coder standardized by MPEG-4. Subjective evaluation results showed that the proposed IHC coder produces better speech quality than the HVXC with only 40% of its complexity.

A Gain Control Algorithm of Low Computational Complexity based on Voice Activity Detection (음성 검출 기반의 저연산 이득 제어 알고리즘)

Kim, Sang-Kuyn;Cho, Woo-Hyeong;Jeong, Min-A;Kwon, Jang-Woo;Lee, Sangmin
- The Journal of Korean Institute of Communications and Information Sciences, v.40 no.5, pp.924-930, 2015
In this paper, we propose a novel, low-complexity approach to improving the speech quality of small acoustic devices in noisy environments. The conventional gain control algorithm suppresses the noise in the input signal, after which the wide dynamic range compression (WDRC) stage amplifies the undesired signal. The proposed algorithm controls the gain of hearing aids according to the speech presence probability, using the output of a voice activity detector (VAD). The proposed scheme is evaluated under various noise conditions using objective measures and yields better results than the conventional algorithm.
https://doi.org/10.7840/kics.2015.40.5.924
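
The idea of driving the gain from a speech presence probability can be illustrated with a minimal sketch. It is not the authors' algorithm: the linear interpolation rule, the function names `vad_controlled_gain` and `apply_gain`, and the default gain values are all assumptions chosen for illustration.

```python
def vad_controlled_gain(speech_prob, noise_gain_db=0.0, speech_gain_db=20.0):
    """Return a per-frame gain in dB, interpolated by speech presence probability.

    Hypothetical rule: frames judged to be noise (probability near 0) receive
    noise_gain_db, frames judged to be speech (probability near 1) receive
    speech_gain_db. The default values are illustrative assumptions.
    """
    if not 0.0 <= speech_prob <= 1.0:
        raise ValueError("speech_prob must lie in [0, 1]")
    return noise_gain_db + speech_prob * (speech_gain_db - noise_gain_db)


def apply_gain(frame, gain_db):
    """Scale the samples of one frame by a gain given in dB."""
    scale = 10.0 ** (gain_db / 20.0)
    return [scale * sample for sample in frame]


# Usage: a frame with a high speech presence probability is amplified more
# than a frame that the detector considers to be mostly noise.
speech_frame = apply_gain([0.1, -0.2, 0.05], vad_controlled_gain(0.9))
noise_frame = apply_gain([0.1, -0.2, 0.05], vad_controlled_gain(0.1))
```

A single multiply per sample keeps the cost low, which matches the low-complexity goal stated in the abstract, although the actual mapping from probability to gain used in the paper may differ.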

Frequency Domain Acoustic Echo Suppression Based on Boundary Condition (주파수 영역에서 구간조건을 이용한 음향학적 반향 제거)

Lee, Kyu-Ho;Chang, Joon-Hyuk
- Journal of the Institute of Electronics Engineers of Korea SP, v.46 no.5, pp.162-166, 2009
In this paper, we propose a novel acoustic echo cancellation (AEC) algorithm that applies the suppression parameter of a parametric Wiener filter (PWF) differently depending on the condition of the current period. The PWF uses the suppression parameter to compensate for uncertainty in the estimate of the acoustic echo signal. The existing PWF method, which uses a fixed suppression parameter, distorts the near-end signal during double-talk. To solve this problem, a boundary condition is devised from the decisions of a double-talk detection (DTD) algorithm and a voice activity detector (VAD). The boundary condition makes it possible to treat single-talk and double-talk differently. Experimental results show that the proposed approach, using the boundary condition, is effective for acoustic echo cancellation.
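
A minimal sketch of a parametric Wiener gain whose suppression parameter depends on the talk state is given below. It uses a common textbook form of the filter, not necessarily the one in the paper; the function names `select_alpha` and `pwf_gain`, the assumption that the VAD flags far-end activity, and the parameter values are hypothetical.

```python
def select_alpha(double_talk, far_end_active, alpha_single=4.0, alpha_double=1.0):
    """Pick the suppression parameter from the DTD and VAD decisions.

    Assumed rule: suppress aggressively when only the far end is talking,
    and gently during double-talk to limit near-end distortion.
    """
    if not far_end_active:
        return 0.0  # no far-end speech, so there is no echo to suppress
    return alpha_double if double_talk else alpha_single


def pwf_gain(mic_power, echo_power, alpha):
    """Parametric Wiener gain for one frequency bin.

    mic_power is the microphone power spectrum value and echo_power is the
    estimated echo power in the same bin. The near-end power is approximated
    by power subtraction, floored at zero.
    """
    near_power = max(mic_power - echo_power, 0.0)
    denom = near_power + alpha * echo_power
    return near_power / denom if denom > 0.0 else 1.0


# Usage: the same bin is attenuated more in single-talk than in double-talk.
gain_single = pwf_gain(4.0, 1.0, select_alpha(False, True))
gain_double = pwf_gain(4.0, 1.0, select_alpha(True, True))
```

Switching the parameter by state is the point of the sketch: a fixed value has to trade echo leakage in single-talk against near-end distortion in double-talk.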

An Audio Watermarking Method Using the Attribute of the Tonal Masker (토널 마스커 특성을 이용한 오디오 워터마킹)

이희숙;이우선
- The Journal of the Acoustical Society of Korea, v.22 no.5, pp.367-374, 2003
In this paper, we propose an audio watermarking method that uses the attributes of the tonal masker. First, the attributes of the tonal masker that are relevant to audio watermarking are analyzed. According to existing research, the energies of the frequencies that compose a tonal masker can be modulated imperceptibly. Moreover, when the relation between the tone energy and the energy of the left or right neighboring frequency is compared before and after various signal processing operations, very few changes are observed. We propose an audio watermarking method that uses these attributes. A watermark bit is embedded by modulating the difference between the energies of the two frequencies neighboring a tone. In detection, the modulated tonal masker is located using the key used in the embedding, without the original audio, and the embedded watermark bit is detected. After attacks of noise insertion, band-pass filtering, re-sampling, compression, echo transform, and equalization, the detection error ratios of the proposed method averaged 0.11% and 1.26% for classical and pop music, respectively. The SDG (Subjective Diff-Grades) evaluation of the sound quality of the watermarked audio resulted in an average SDG of -0.31.

A study on Biz Models Through the T-DMB Total Solution Developed by the Convergence of Communication and Broadcasting Technologies (통신.방송 융합기술 지상파 DMB Total Solution 비즈 모델 연구)

Eun, Jong-Won
- Journal of Satellite, Information and Communications, v.6 no.2, pp.15-19, 2011
T-DMB (Terrestrial Digital Multimedia Broadcasting), which was developed through the convergence of digital broadcasting and communication technologies, provides CD-quality music and delivers TV services even on a high-speed train such as the KTX, which travels at over 300 km/h. T-DMB is spreading worldwide as a technology that can provide various convergent broadcasting and communication services through mobile phones, PDAs, dedicated terminals, and so on. In this paper, a business model needed for the worldwide diffusion of T-DMB was established and used to expand T-DMB into Vietnam. In addition, this paper describes not only some prediction methods for the technological valuation of the T-DMB Total Solution, but also a case study on the marketing involved in establishing a T-DMB system to provide paid services in Vietnam. Finally, a couple of business models needed to expand T-DMB globally are provided.

Improvement of Synthetic Speech Quality using a New Spectral Smoothing Technique (새로운 스펙트럼 완만화에 의한 합성 음질 개선)

장효종;최형일
- Journal of KIISE:Software and Applications, v.30 no.11, pp.1037-1043, 2003
This paper describes a speech synthesis technique that uses the diphone as the unit phoneme. Speech synthesis is basically accomplished by concatenating unit phonemes, and its major problem is discontinuity at the junction between unit phonemes. To solve this problem, this paper proposes a new spectral smoothing technique that reflects not only formant trajectories but also the distribution characteristics of the spectrum and human auditory characteristics. That is, the proposed technique decides the amount and extent of smoothing by considering human auditory characteristics at the junction of unit phonemes, and then performs spectral smoothing using weights calculated along the time axis at the border of two diphones. The proposed technique reduces the discontinuity and minimizes the distortion caused by spectral smoothing. For performance evaluation, we tested five hundred diphones extracted from twenty sentences, using ETRI Voice DB samples and individually self-recorded samples.

Search Result 353, Processing Time 0.019 seconds
