Search | Korea Science

Sinusoidal Modeling of Polyphonic Audio Signals Using Dynamic Segmentation Method (동적 세그멘테이션을 이용한 폴리포닉 오디오 신호의 정현파 모델링)

장호근;박주성
- The Journal of the Acoustical Society of Korea / v.19 no.4 / pp.58-68 / 2000
This paper proposes a sinusoidal modeling method for polyphonic audio signals. Sinusoidal modeling, which has been applied successfully to speech and monophonic signals, cannot be applied directly to polyphonic signals because no single window size for sinusoidal analysis suits the entire signal. In addition, a high-quality synthesized signal must preserve transient parts such as attacks, which determine the timbre of a musical instrument. In this paper, a multiresolution filter bank is designed that splits the input signal into six octave-spaced subbands without aliasing, and sinusoidal modeling is applied to each subband signal. To alleviate the smearing of transients in sinusoidal modeling, a dynamic segmentation method is applied to the subbands; it determines the analysis-synthesis frame size adaptively to fit the time-frequency characteristics of each subband signal. An improved dynamic segmentation method is also proposed, which handles transients better and requires less computation. Simulation results for various polyphonic audio signals show that the proposed sinusoidal modeling can model polyphonic audio signals without loss of perceptual quality.
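As a rough illustration of what "six octave-spaced subbands" means, the sketch below computes nominal band edges by repeatedly halving the Nyquist frequency. The sample rate, the function name `octave_band_edges`, and the choice to let the lowest band extend down to 0 Hz are assumptions made for illustration; the abstract does not state them, and this is not the paper's filter design.

```python
def octave_band_edges(sample_rate_hz, n_bands=6):
    """Return (low, high) edges in Hz for octave-spaced subbands, lowest band first.

    Assumption: the lowest band is a lowpass band starting at 0 Hz, and every
    other band spans exactly one octave below the previous upper edge.
    """
    hi = sample_rate_hz / 2.0          # start from the Nyquist frequency
    edges = []
    for _ in range(n_bands - 1):
        lo = hi / 2.0                  # one octave down
        edges.append((lo, hi))
        hi = lo
    edges.append((0.0, hi))            # remaining lowpass band
    return edges[::-1]


if __name__ == "__main__":
    # 44100 Hz is a hypothetical sample rate chosen only for the demo.
    for lo, hi in octave_band_edges(44100):
        print(f"{lo:10.3f} Hz .. {hi:10.3f} Hz")
```

Because each lower band is half as wide as the one above it, a longer analysis frame can be used in the low bands and a shorter one in the high bands, which is the motivation for per-subband frame sizes.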

A Study on the Enhanced Time Domain Aliasing Cancellation Transform of the AC-3 Algorithm (AC-3오디오 알고리듬의 시간축 영역 에일리어징 제거 변환부 성능향상에 관한 연구)

김준성;강현철;변윤식
- The Journal of the Acoustical Society of Korea / v.19 no.2 / pp.13-18 / 2000
This paper presents a technique for enhancing time domain aliasing cancellation (TDAC) in the AC-3 algorithm. To reduce block boundary noise without degrading transform coding performance, we propose new special windows that remedy a defect of the AC-3 algorithm, namely that it cannot properly cancel aliasing during transient periods. In addition, a fast MDCT calculation algorithm based on the fast Fourier transform is adopted.
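For readers unfamiliar with TDAC, the following minimal sketch shows the basic mechanism: each windowed block is transformed with the MDCT, and the time-domain aliasing introduced by one block is canceled when overlapping blocks are added together. It uses a plain sine window and a direct O(N^2) transform purely for illustration; it does not reproduce the AC-3 window, the special windows proposed in the paper, or its FFT-based fast MDCT. All function names are hypothetical.

```python
import math


def sine_window(N):
    """Length-2N sine window; it satisfies w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def _basis(n, k, N):
    # Shared cosine kernel of the MDCT and its inverse.
    return math.cos(math.pi / N * (n + 0.5 + N / 2.0) * (k + 0.5))


def mdct(block, N):
    """Direct MDCT: 2N time samples -> N coefficients."""
    return [sum(block[n] * _basis(n, k, N) for n in range(2 * N)) for k in range(N)]


def imdct(coeffs, N):
    """Direct inverse MDCT: N coefficients -> 2N time-aliased samples.

    The 2/N scaling is the one that pairs with windowing at both analysis
    and synthesis.
    """
    return [2.0 / N * sum(coeffs[k] * _basis(n, k, N) for k in range(N))
            for n in range(2 * N)]


def tdac_roundtrip(x, N):
    """Analyze and resynthesize x with 50% overlapped blocks of length 2N."""
    if len(x) % N:
        raise ValueError("len(x) must be a multiple of N in this sketch")
    w = sine_window(N)
    padded = [0.0] * N + list(x) + [0.0] * N   # so every sample is covered twice
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        windowed = [w[n] * padded[start + n] for n in range(2 * N)]
        y = imdct(mdct(windowed, N), N)
        for n in range(2 * N):
            out[start + n] += w[n] * y[n]      # overlap-add cancels the aliasing
    return out[N:N + len(x)]


if __name__ == "__main__":
    signal = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
    rebuilt = tdac_roundtrip(signal, 4)
    print(max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

The cancellation relies on neighboring blocks using matching window halves, which is why window choice around transients, the topic of this paper, matters.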

Power Signal Inter-harmonics Detection using Adaptive Predictor Notch Characteristics (적응예측기 노치특성을 이용한 전력신호 중간고조파 검출)

Bae, Hyeon Deok
- The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.10 no.5 / pp.435-441 / 2017
Detecting an inter-harmonic accurately is not easy, because its magnitude is small and its frequency is not an integer multiple of the fundamental frequency. In this paper, a new method using a filter bank system and an adaptive predictor is proposed. The filter bank system decomposes the input signal into subbands. The adaptive predictor takes a decomposed subband signal as input and produces the error signal as output, in which the inter-harmonic is detected. In this scheme, the input-output characteristic of the adaptive predictor is that of a notch filter: the predicted harmonic is canceled in the error signal, which makes the inter-harmonic detectable. The magnitude and frequency of the detected inter-harmonic are estimated by a recursive algorithm. The performance of the proposed method is evaluated on a sinusoidal signal model synthesized with harmonics and inter-harmonics, and the validity of the method is demonstrated by comparing its inter-harmonic detection results with those of MUSIC and ESPRIT.
https://doi.org/10.17661/jkiiect.2017.10.5.435
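The notch behavior described in this abstract can be sketched with a generic one-step adaptive linear predictor. The example below uses a normalized LMS (NLMS) update, which is an assumption: the abstract does not say which adaptation rule the authors use, and the filter bank stage and the recursive magnitude/frequency estimator are not shown. The function name, tap count, and step size are hypothetical. A single sinusoid is exactly predictable from its two previous samples, so after convergence it is removed from the error output.

```python
import math


def nlms_predictor_error(x, taps=2, mu=0.5, eps=1e-8):
    """One-step-ahead NLMS linear predictor.

    Returns (error_signal, final_weights). The error signal is the part of
    x that the predictor cannot predict from its past samples.
    """
    w = [0.0] * taps
    err = []
    for n in range(len(x)):
        # Past samples x[n-1], x[n-2], ... (zeros before the signal starts).
        u = [x[n - 1 - i] if n - 1 - i >= 0 else 0.0 for i in range(taps)]
        prediction = sum(wi * ui for wi, ui in zip(w, u))
        e = x[n] - prediction
        norm = eps + sum(ui * ui for ui in u)
        w = [wi + mu * e * ui / norm for wi, ui in zip(w, u)]
        err.append(e)
    return err, w


if __name__ == "__main__":
    omega = 2 * math.pi * 0.1                     # hypothetical normalized frequency
    x = [math.sin(omega * n) for n in range(2000)]
    err, w = nlms_predictor_error(x)
    tail = err[-200:]
    print("residual power:", sum(e * e for e in tail) / len(tail))
    print("weights:", w, "ideal:", [2 * math.cos(omega), -1.0])
```

With two taps the weights approach 2cos(ω) and -1, the coefficients of the recurrence x[n] = 2cos(ω)x[n-1] - x[n-2], so the transfer function from input to error has a zero at the sinusoid's frequency.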

Voice Source Modeling Using Weighted Sum-of-Basis-Functions Model (기저함수의 가중합을 이용한 음원의 모델링)

강상기
- Proceedings of the Acoustical Society of Korea Conference / 1998.06c / pp.171-174 / 1998
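This paper examines the problem of voice source modeling in speech synthesis and coding systems. To overcome several problems of existing voice source modeling systems, we propose a new technique that models the voice source as a weighted sum of basis functions. In the proposed method, the voice source waveform is represented as a weighted sum of basis functions based on a filter bank. To obtain voice source parameters that effectively represent diverse voice source characteristics, we investigate a structure based on EM (estimate maximize). Experiments were carried out on a variety of voiced sounds using the proposed method. The results show that the proposed estimation and modeling methods estimate the voice source waveform more accurately than existing methods and can represent diverse voice source characteristics. The method is also expected to improve voice quality in speech synthesis and coding.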

Design of Over-sampled Channelized DRFM Structure in order to Remove Interference and Prevent Spurious Signal (간섭 제거 및 스퓨리어스 방지를 위한 오버샘플링 된 채널화 DRFM 구조 설계)

Kim, Yo-Han;Hong, Sang-Guen;Seo, Seung-Hun;Jo, Jung-Hun
- Journal of the Korea Institute of Information and Communication Engineering / v.26 no.8 / pp.1213-1221 / 2022
In electronic warfare, the need for jamming systems that protect our location information from enemy radar is constantly increasing. A jamming system normally uses a wide-band DRFM (Digital Radio Frequency Memory) that processes the entire bandwidth at once. However, jamming is difficult if a CW (Continuous Wave) interference signal is present in the band. Recently, instead of wide-band signal processing, a structure has been proposed that uses a filter bank to divide the entire band into several sub-bands and process each sub-band independently. Although the filter bank structure makes it possible to handle interference signals, a spurious signal occurs when a signal is received at a boundary frequency between sub-bands. The spurious signal spreads the output power of the jamming signal, resulting in a lower JSR (Jamming to Signal Ratio) and a weaker jamming effect. This paper proposes an over-sampled channelized DRFM structure that can respond to interference and prevents spurious signals for inputs at sub-band boundary frequencies.
https://doi.org/10.6109/jkiice.2022.26.8.1213

Voice-to-voice conversion using transformer network (Transformer 네트워크를 이용한 음성신호 변환)

Kim, June-Woo;Jung, Ho-Young
- Phonetics and Speech Sciences / v.12 no.3 / pp.55-63 / 2020
Voice conversion can be applied to various voice processing applications. It can also play an important role in data augmentation for speech recognition. The conventional method combines voice conversion with a speech synthesis architecture, using the Mel filter bank as the main parameter. The Mel filter bank is well suited to fast neural network computation but cannot be converted into a high-quality waveform without the aid of a vocoder. Further, it is not effective for obtaining data for speech recognition. In this paper, we focus on performing voice-to-voice conversion using only the raw spectrum. We propose a deep learning model based on the transformer network, which quickly learns the voice conversion properties using an attention mechanism between source and target spectral components. The experiments were performed on TIDIGITS data, a series of numbers spoken by an English speaker. The converted voices were evaluated for naturalness and similarity using the mean opinion score (MOS) obtained from 30 participants. Our final results yielded 3.52±0.22 for naturalness and 3.89±0.19 for similarity.
https://doi.org/10.13064/KSSS.2020.12.3.055

A DCT Adaptive Subband Filter Algorithm Using Wavelet Transform (웨이브렛 변환을 이용한 DCT 적응 서브 밴드 필터 알고리즘)

Kim, Seon-Woong;Kim, Sung-Hwan
- The Journal of the Acoustical Society of Korea / v.15 no.1 / pp.46-53 / 1996
The adaptive LMS algorithm has been used in many application areas because of its low complexity. In this paper, the input signal is transformed into subbands of arbitrary bandwidth. In each subband the dynamic range can be reduced, so independent filtering in each subband converges faster than the full-band system. DCT transform-domain LMS adaptive filtering has a whitening effect on the input signal in each band, which greatly speeds up convergence by decreasing the eigenvalue spread. Finally, the filtered signals in each subband are synthesized so that the output signal has full frequency components. In this procedure, the wavelet filter bank guarantees perfect reconstruction of the signal without any inter-spectral interference. In simulations with speech signals corrupted by additive white Gaussian noise, the proposed algorithm performs better than the conventional NLMS algorithm at high SNR.

Speech synthesis using acoustic Doppler signal (초음파 도플러 신호를 이용한 음성 합성)

Lee, Ki-Seung
- The Journal of the Acoustical Society of Korea / v.35 no.2 / pp.134-142 / 2016
In this paper, a method for synthesizing speech signals from 40 kHz ultrasonic signals reflected from the articulatory muscles is introduced and its performance is evaluated. When ultrasound signals are radiated toward the articulating face, Doppler effects caused by movements of the lips, jaw, and chin are observed: signals with frequencies different from that of the transmitted signal are found in the received signals. These ADS (Acoustic-Doppler Signals) were used to estimate the speech parameters in this study. Prior to synthesizing speech signals, a quantitative correlation analysis between the ADS and speech signals was carried out for each frequency bin. The results validated the feasibility of ADS-based speech synthesis. ADS-to-speech transformation was achieved by joint Gaussian mixture model-based conversion rules. The experimental results from 5 subjects showed that filter bank energies and LPC (Linear Predictive Coefficient) cepstrum coefficients are the optimal features for ADS and speech, respectively. In the subjective evaluation, where synthesized speech signals were obtained using excitation sources extracted from the original speech signals, the ADS-to-speech conversion method yielded an average recognition rate of 72.2 %.
https://doi.org/10.7776/ASK.2016.35.2.134

A New Wideband Speech/Audio Coder Interoperable with ITU-T G.729/G.729E (ITU-T G.729/G.729E와 호환성을 갖는 광대역 음성/오디오 부호화기)

Kim, Kyung-Tae;Lee, Min-Ki;Youn, Dae-Hee
- Journal of the Institute of Electronics Engineers of Korea SP / v.45 no.2 / pp.81-89 / 2008
Wideband speech, characterized by a bandwidth of about 7 kHz (50-7000 Hz), provides a substantial quality improvement in terms of naturalness and intelligibility. Although higher data rates are required, its applications have extended to audio and video conferencing, high-quality multimedia communications over mobile links or packet-switched transmissions, and digital AM broadcasting. In this paper, we present a new bandwidth-scalable coder for wideband speech and audio signals. The proposed coder splits the 8 kHz signal bandwidth into two narrow bands, and a different coding scheme is applied to each band. The lower-band signal is coded using the ITU-T G.729/G.729E coder, and the higher-band signal is compressed using a new algorithm based on the gammatone filter bank with an invertible auditory model. Owing to the split-band architecture and the completely independent coding schemes for each band, the output speech of the decoder can be selected to be either narrowband or wideband according to the channel condition. Subjective tests showed that, for wideband speech and audio signals, the proposed coder at 14.2/18 kbit/s produces quality superior to ITU-T G.722.1 at 24 kbit/s, with a shorter algorithmic delay.

Sustained Vowel Modeling using Nonlinear Autoregressive Method based on Least Squares-Support Vector Regression (최소 제곱 서포트 벡터 회귀 기반 비선형 자귀회귀 방법을 이용한 지속 모음 모델링)

Jang, Seung-Jin;Kim, Hyo-Min;Park, Young-Choel;Choi, Hong-Shik;Yoon, Young-Ro
- Journal of the Korean Institute of Intelligent Systems / v.17 no.7 / pp.957-963 / 2007
In this paper, a Nonlinear Autoregressive (NAR) method based on Least Squares-Support Vector Regression (LS-SVR) is introduced and tested for nonlinear sustained vowel modeling. On a database of 43 sustained vowels from benign vocal fold lesion cases, which have aperiodic waveforms, this nonlinear synthesizer reproduced chaotic sustained vowels almost perfectly and preserved the naturalness of the sound, such as jitter, whereas Linear Predictive Coding does not preserve this naturalness. However, the results for some phonations differ considerably from the original sounds. This is presumed to be because a single-band model cannot control and decompose the high-frequency components. Therefore, a multi-band model with a wavelet filterbank is adopted in place of the single-band model. As a result, the multi-band model shows improved stability. Finally, nonlinear sustained vowel modeling using NAR based on LS-SVR can successfully synthesize sounds that closely resemble the original voiced sounds.
https://doi.org/10.5391/JKIIS.2007.17.7.957

Search Result 36, Processing Time 0.023 seconds
