Search | Korea Science

Variable Time-Scale Modification of Speech Using Transient Information (천이구간 정보를 이용한 음성의 가변적인 시간축 변환)

Lee, Sung-Joo;Kim, Hee-Dong;Kim, Hyung-Soon
- Journal of the Korean Institute of Telematics and Electronics S / v.35S no.6 / pp.147-155 / 1998
Conventional time-scale modification methods have the problem that, as the modification rate increases, the time-scale modified speech becomes less intelligible, because they ignore the effect of articulation rate on speech characteristics. In this paper, we propose a variable time-scale modification method based on the knowledge that the timing information of the transient portions of a speech signal plays an important role in speech perception. After identifying the transient and steady portions of the speech signal, the method modifies the time scale of the steady portions only. The results of a subjective preference test indicate that the proposed method outperforms the conventional SOLA method.
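The duration bookkeeping behind modifying only the steady portions can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes the signal has already been segmented into labeled `"transient"` and `"steady"` spans with known durations (the segmentation itself is not shown), and the function names `steady_rate` and `segment_rates` are hypothetical. Transient spans keep their original timing (factor 1.0), and one shared factor for the steady spans is solved so that the total duration reaches the target.

```python
from typing import List, Tuple


def steady_rate(transient_dur: float, steady_dur: float, target_ratio: float) -> float:
    """Stretch factor for the steady spans so that the whole utterance lasts
    target_ratio * (original duration) while transient spans are left unchanged."""
    if steady_dur <= 0:
        raise ValueError("no steady portion to modify")
    target_total = target_ratio * (transient_dur + steady_dur)
    rate = (target_total - transient_dur) / steady_dur
    if rate <= 0:
        raise ValueError("target is shorter than the transient portions alone")
    return rate


def segment_rates(segments: List[Tuple[str, float]], target_ratio: float) -> List[float]:
    """Per-segment time-scale factors: 1.0 for transient spans, one shared factor for steady spans."""
    transient = sum(d for label, d in segments if label == "transient")
    steady = sum(d for label, d in segments if label == "steady")
    r = steady_rate(transient, steady, target_ratio)
    return [1.0 if label == "transient" else r for label, _ in segments]


# Hypothetical segmentation (durations in seconds) of a 1.0 s utterance, slowed to 2x.
segs = [("steady", 0.4), ("transient", 0.3), ("steady", 0.3)]
rates = segment_rates(segs, 2.0)
print(rates)
```

Each factor would then drive a conventional TSM routine such as SOLA on its own span. Because the steady spans absorb all of the change, they are stretched by about 2.43x here rather than 2x, which is the price of keeping transient timing fixed.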

Speech Modification and Concatenative Speech Synthesis by using Analysis-By-Synthesis/OverLap-Add (ABS/OLA) Sinusoidal Model (Analysis-By-Synthesis/OverLap-Add (ABS/OLA) Sinusoidal Model을 이용한 음성변환과 연결음성합성)

구자형
- Proceedings of the Acoustical Society of Korea Conference / 1998.08a / pp.339-343 / 1998
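The sinusoidal model is applied across a wide range of speech signal processing tasks; it can produce high-quality synthetic speech and is easy to manipulate. In this paper, time-scale modification and pitch modification are performed using the Analysis-by-Synthesis/Overlap-Add (ABS/OLA) sinusoidal model. In particular, to improve speech quality, steady and transitional segments are distinguished during time-scale modification and different time-scale modification rates are applied to each, and for pitch modification the Improved Cepstrum, which estimates the spectral envelope better than the conventional LPC method, is used. In addition, to overcome the phase mismatch that arises when concatenating speech units obtained from different contexts, synthesis is performed after shifting the time axis so that the fundamental-frequency components are aligned. Experimental results show that the methods applied in this paper yield improved speech quality compared with conventional approaches.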
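Shifting a unit along the time axis so that the fundamental components line up at the join reduces to a short phase calculation. As a hedged sketch (the abstract does not give the exact procedure), assume the fundamental of the preceding unit has phase `phi_a` at the join, and the following unit's fundamental is modeled as cos(2π·f0·t + φ_b) with t = 0 at its start. Delaying that unit by τ makes its phase at the join φ_b − 2π·f0·τ, so τ = ((φ_b − φ_a) mod 2π) / (2π·f0). The helper name `alignment_shift` is hypothetical.

```python
import math


def alignment_shift(phi_a: float, phi_b: float, f0: float) -> float:
    """Delay tau (seconds, 0 <= tau < 1/f0) such that the delayed fundamental
    cos(2*pi*f0*(t - tau) + phi_b) has phase phi_a at the join (t = 0)."""
    if f0 <= 0:
        raise ValueError("f0 must be positive")
    dphi = (phi_b - phi_a) % (2.0 * math.pi)  # wrap the phase gap into [0, 2*pi)
    return dphi / (2.0 * math.pi * f0)


# Example: 120 Hz fundamental, phases (radians) measured at the unit boundary.
tau = alignment_shift(phi_a=0.4, phi_b=2.1, f0=120.0)
print(f"delay next unit by {tau * 1000:.3f} ms")
```

Restricting the delay to less than one pitch period keeps the shift small; harmonics above f0 are then aligned only approximately.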

A Fast Normalized Cross-Correlation Computation for WSOLA-based Speech Time-Scale Modification (WSOLA 기반의 음성 시간축 변환을 위한 고속의 정규상호상관도 계산)

Lim, Sangjun;Kim, Hyung Soon
- The Journal of the Acoustical Society of Korea / v.31 no.7 / pp.427-434 / 2012
The waveform-similarity-based overlap-add (WSOLA) method is known to be an efficient, high-quality algorithm for time scaling of speech signals. The computational load of WSOLA is concentrated in the repeated normalized cross-correlation (NCC) calculations used to evaluate the similarity between two signal waveforms. To reduce the computational complexity of WSOLA, this paper proposes a fast NCC computation method in which NCC is obtained from pre-calculated sum tables, eliminating the redundancy of repeated NCC calculations over adjacent regions. While the denominator of NCC has much redundancy regardless of the time-scale factor, the numerator has less redundancy, and its amount depends on both the time-scale factor and the optimal shift value, so fast computation of the numerator requires a more sophisticated algorithm. Simulation results show that the proposed method reduces WSOLA execution time by about 40%, 47%, and 52% for time-scale compression and for 2x and 3x time-scale expansion, respectively, while producing exactly the same speech quality as conventional WSOLA.
https://doi.org/10.7776/ASK.2012.31.7.427
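The sum-table idea for the NCC denominator can be illustrated with prefix sums. This minimal sketch covers only the denominator: the energy of each candidate window is read from a cumulative table in O(1), while the numerator is still computed directly (the paper's more elaborate numerator scheme is not reproduced). The function names and the brute-force reference `ncc_direct` are illustrative, and the search runs over a plain list of candidate start positions rather than WSOLA's full tolerance-window logic.

```python
import numpy as np


def ncc_direct(x, y, shifts):
    """Reference NCC: recompute both energies for every candidate shift."""
    L = len(x)
    out = []
    for k in shifts:
        seg = y[k:k + L]
        den = np.sqrt(np.dot(x, x) * np.dot(seg, seg))
        out.append(np.dot(x, seg) / den if den > 0 else 0.0)
    return np.array(out)


def ncc_sum_table(x, y, shifts):
    """NCC whose denominator uses a pre-calculated table of cumulative energies."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    L = len(x)
    # csum[n] = sum(y[m]**2 for m < n); energy of y[k:k+L] is csum[k+L] - csum[k].
    csum = np.concatenate(([0.0], np.cumsum(y * y)))
    ex = float(np.dot(x, x))  # template energy is computed once
    out = []
    for k in shifts:
        ey = csum[k + L] - csum[k]
        num = float(np.dot(x, y[k:k + L]))  # numerator: still direct in this sketch
        den = np.sqrt(ex * max(ey, 0.0))  # guard against tiny negative rounding
        out.append(num / den if den > 0 else 0.0)
    return np.array(out)


rng = np.random.default_rng(0)
template = rng.standard_normal(64)
signal = rng.standard_normal(400)
signal[123:187] = template  # plant an exact copy of the template
scores = ncc_sum_table(template, signal, range(300))
best = int(np.argmax(scores))  # best-matching shift
```

Both functions return the same scores; only the denominator cost changes, which matches the abstract's observation that the denominator is the easy, highly redundant part.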

A Study on Real-time Implementing of Time-Scale Modification (음성 신호 시간축 변환의 실시간 구현에 관한 연구)

Han, Dong-Chul;Lee, Ki-Seung;Cha, Il-Hawan;Youn, Dae-Hee
- The Journal of the Acoustical Society of Korea / v.14 no.2 / pp.50-61 / 1995
A time-scale modification method that changes the speaking rate while preserving the characteristics of speech was implemented in real time on a general-purpose digital signal processor. Because time-scale modification changes only the speaking rate, it introduces a time difference between the input signal and the modified signal, which makes a real-time implementation impossible. In this paper, a system was implemented that removes the time difference between the input and modified signals. Speech signals slowed down or sped up by a physical time-scale modification method, such as adjusting the motor speed of a cassette tape recorder, were used as the input signal. Such physical modification, which controls only the speed of the cassette tape player, distorts the pitch period of the original speech. In this study, a real-time system was implemented in which the pitch-distorted speech is restored to its original pitch by fractional-sampling pitch shifting with an FIR filter, and the restored signal is then time-scale modified with SOLA to match the motor speed of the cassette tape recorder. In experiments on speech signals modified by the proposed method, the 16-bit ADSP2101 implementation and a floating-point computer simulation yielded nearly the same average frame signal-to-noise ratio of about 20 dB.
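Restoring the pitch of tape-speed-altered speech amounts to reading the signal at fractional sample positions. The sketch below is a hedged, floating-point illustration (the fixed-point ADSP2101 filter design is not described in the abstract): it assumes a Hann-windowed sinc FIR interpolator, and the names `fractional_resample`, `step`, and `half_taps` are hypothetical. If the tape played `s` times too fast, reading with `step = 1/s` brings the pitch back to the original while lengthening the signal; SOLA would then shorten it again to follow the tape speed.

```python
import numpy as np


def fractional_resample(y, step, half_taps=16):
    """Read y at positions 0, step, 2*step, ... with a windowed-sinc FIR interpolator.

    step < 1 lengthens the signal and lowers its pitch; step > 1 shortens it and
    raises its pitch (the cutoff is then lowered to limit aliasing).
    """
    y = np.asarray(y, dtype=float)
    cutoff = min(1.0, 1.0 / step)  # normalized cutoff relative to the input Nyquist
    taps = np.arange(-half_taps + 1, half_taps + 1)
    n_out = int(np.floor((len(y) - 1) / step)) + 1
    out = np.zeros(n_out)
    for i in range(n_out):
        pos = i * step
        base = int(np.floor(pos))
        d = taps - (pos - base)  # distance of each tap from the fractional position
        h = cutoff * np.sinc(cutoff * d) * (0.5 + 0.5 * np.cos(np.pi * d / half_taps))
        idx = base + taps
        ok = (idx >= 0) & (idx < len(y))  # samples beyond the edges are treated as zeros
        out[i] = np.dot(h[ok], y[idx[ok]])
    return out


fs = 8000
n = np.arange(800)
sped_up = np.sin(2 * np.pi * 250 * n / fs)  # a 200 Hz tone played 1.25x too fast
restored = fractional_resample(sped_up, 1 / 1.25)  # back to 200 Hz, 1.25x longer
```

A windowed sinc is only one possible FIR design; shorter filters are cheaper on a DSP at the cost of more interpolation error.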

Blind Video Fingerprinting Using Temporal Wavelet Transform (시간축 웨이블릿 변환을 이용한 블라인드 비디오 핑거프린팅)

Kang Hyun-Ho;Park Ji-Hwan;Lee Hye-Joo;Hong Jin-Woo
- Journal of Korea Multimedia Society / v.7 no.9 / pp.1263-1272 / 2004
In this paper, we present a novel video fingerprinting method for identifying the source of illegal copies. Fingerprinting is achieved by inserting uniformly distributed random numbers, generated from the seller's and buyer's identification keys, into the wavelet coefficients obtained by a temporal wavelet transform of the video. The proposed fingerprint can still be detected uniquely after the video content has been distorted by collusion attacks and MPEG2 compression. In particular, we exploit the characteristics of the temporal wavelet transform to assign each user's embedding area. Experimental results show that unauthorized distribution of video content can be traced and that the method is robust to various collusion attacks and MPEG2 compression.
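A toy version of embedding keyed, uniformly distributed random numbers in temporal wavelet coefficients is sketched below. It makes several assumptions that the abstract does not state: a one-level temporal Haar transform, embedding in the temporal high-pass (detail) band of every pixel, a random generator seeded from hypothetical integer seller and buyer keys, and correlation-based blind detection. It does not reproduce the paper's per-user embedding-area assignment or its collusion resistance.

```python
import numpy as np


def temporal_haar(frames):
    """One-level Haar transform along the time axis of a (T, H, W) array, T even."""
    a = (frames[0::2] + frames[1::2]) / np.sqrt(2.0)  # temporal low-pass
    d = (frames[0::2] - frames[1::2]) / np.sqrt(2.0)  # temporal high-pass
    return a, d


def inverse_temporal_haar(a, d):
    frames = np.empty((2 * a.shape[0],) + a.shape[1:])
    frames[0::2] = (a + d) / np.sqrt(2.0)
    frames[1::2] = (a - d) / np.sqrt(2.0)
    return frames


def embed(frames, seller_key, buyer_key, strength=2.0):
    """Add a keyed uniform random pattern to the temporal detail coefficients."""
    a, d = temporal_haar(np.asarray(frames, dtype=float))
    rng = np.random.default_rng([seller_key, buyer_key])  # hypothetical key-to-seed mapping
    mark = rng.uniform(-1.0, 1.0, size=d.shape)
    return inverse_temporal_haar(a, d + strength * mark), mark


def detect(suspect, mark):
    """Blind detection: correlate the suspect's detail band with a candidate mark."""
    _, d = temporal_haar(np.asarray(suspect, dtype=float))
    return float(np.corrcoef(d.ravel(), mark.ravel())[0, 1])


scene = np.random.default_rng(7).uniform(0, 255, size=(16, 16))
video = np.repeat(scene[None], 8, axis=0)  # 8 identical frames: a static shot
marked, buyer_mark = embed(video, seller_key=11, buyer_key=42)
_, other_mark = embed(video, seller_key=11, buyer_key=43)
print(detect(marked, buyer_mark), detect(marked, other_mark))
```

The static test clip makes the host's detail band zero, so detection is easy here; with real motion the host's temporal detail coefficients act as noise against the mark, and the output is not re-quantized to 8-bit pixels.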

Video Fingerprinting based on the Temporal Wavelet Transform (시간축 웨이블릿 변환에 의한 비디오 핑거프린팅)

강현호;박지환;이혜주;홍진우
- Proceedings of the Korea Multimedia Society Conference / 2003.11a / pp.36-39 / 2003
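In this paper, we present a technique that embeds fingerprinting information containing both owner and buyer information into video content, so that the distributor of illegally distributed fingerprinted content can be traced. In particular, the temporal wavelet transform presented in [1] is used to separate the region in which the fingerprinting information is embedded, and through the inverse transform the information ends up embedded in the video frames across the entire region. As a result, the method is robust against various existing collusion attacks that exploit differences among fingerprinted copies. Moreover, owing to the characteristics of video content, the illegal distributor can still be traced after MPEG2 compression.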

Real-time Voice Change System using Pitch Change (피치 변환을 사용한 실시간 음성 변환 시스템)

Kim, Weon-Goo
- Journal of the Korean Institute of Intelligent Systems / v.14 no.6 / pp.759-763 / 2004
In this paper, a real-time voice change method based on pitch modification is proposed to transform one speaker's voice into a different-sounding voice. For this purpose, a sampling-rate change method using the DFT (Discrete Fourier Transform) and a time-scale modification method using SOLA (Synchronized Overlap and Add) are combined to change the pitch. To evaluate the performance of the proposed method, voice transformation experiments were conducted. Experimental results showed that the original speech signal was transformed into speech in which the original speaker's identity is difficult to recognize. The system was implemented on a TI TMS320C6711 DSK board to verify that it runs in real time.
https://doi.org/10.5391/JKIIS.2004.14.6.759
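The DFT-based sampling-rate change can be sketched by truncating or zero-padding the spectrum. In a combined scheme like the one described, one would first stretch the signal with SOLA by a factor β (duration times β, pitch unchanged) and then resample it back to the original length, so that playback at the original rate multiplies the pitch by β. The block shows only the resampling half; `dft_resample` is a hypothetical name, treating the whole signal as one DFT frame is a simplification (a real-time system would work block by block), and Nyquist-bin subtleties are ignored.

```python
import numpy as np


def dft_resample(x, n_out):
    """Change the number of samples of a real signal by truncating or
    zero-padding its DFT (band-limited resampling of one periodic frame)."""
    x = np.asarray(x, dtype=float)
    X = np.fft.rfft(x)
    Y = np.zeros(n_out // 2 + 1, dtype=complex)
    k = min(len(X), len(Y))
    Y[:k] = X[:k]  # keep the shared low-frequency bins, drop or pad the rest
    return np.fft.irfft(Y, n=n_out) * (n_out / len(x))


# A frame that SOLA has (hypothetically) stretched by beta = 1.25: 400 samples, 5 cycles.
stretched = np.sin(2 * np.pi * 5 * np.arange(400) / 400)
shifted = dft_resample(stretched, 320)  # same 5 cycles in 320 samples: pitch x 1.25
```

Keeping the number of cycles while shrinking the number of samples is exactly what raises the pitch when both versions are played at the same sampling rate.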

An approach to VOD traffic modeling using discrete wavelet transform (이산 웨이블릿 변환을 활용한 VOD 트래픽 모델링 방법)

이호석
- Proceedings of the Korean Information Science Society Conference / 2000.10c / pp.481-483 / 2000
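This paper introduces VOD traffic modeling that exploits the scalability of the discrete wavelet transform. VOD is a system that provides video data in response to user requests. Video data has several characteristics. The first is that the volume of data is very large, and that this volume varies considerably along the time axis over which the video is delivered. The second is that video data must be transmitted with virtually no interruption over time. For these reasons, accurately modeling VOD traffic has been considered quite difficult. The discrete wavelet transform is a means of approximating functions. Its strength is that the approximation is easy and flexible; in other words, the precision of the approximation can be adjusted readily. Another strength is that it can approximate a function in both time and space. This paper models the traffic between a VOD server and client using the scalability of the discrete wavelet transform, and shows that this enables more effective network traffic control between the server and the client.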
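The adjustable precision attributed to the discrete wavelet transform can be illustrated with a multi-level Haar decomposition of a traffic trace. In this hedged sketch the trace values are hypothetical per-interval byte counts, only the Haar wavelet is used, and detail coefficients are simply discarded; the paper's actual model is not described in the abstract. Keeping only the level-j approximation replaces each block of 2**j intervals by its mean, so the number of levels acts as the precision knob.

```python
import numpy as np


def haar_approximation(trace, levels):
    """Reconstruct a trace from its level-`levels` Haar approximation only.

    Each level halves the time resolution; with all detail coefficients set to
    zero, the result is the mean of every block of 2**levels samples.
    """
    a = np.asarray(trace, dtype=float)
    if len(a) % (2 ** levels):
        raise ValueError("trace length must be a multiple of 2**levels")
    for _ in range(levels):  # forward transform, keeping approximations
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    for _ in range(levels):  # inverse transform with zero details
        up = np.empty(2 * len(a))
        up[0::2] = a / np.sqrt(2.0)
        up[1::2] = a / np.sqrt(2.0)
        a = up
    return a


# Hypothetical bytes transmitted per interval by a VOD server.
trace = np.array([120, 80, 300, 260, 90, 110, 400, 380], dtype=float)
for j in range(4):
    err = np.sqrt(np.mean((trace - haar_approximation(trace, j)) ** 2))
    print(j, err)
```

Every level preserves the total transferred volume while smoothing out short-term variation, which is the trade-off a traffic controller would tune.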

Study on the Improvement of Speech Recognizer by Using Time Scale Modification (시간축 변환을 이용한 음성 인식기의 성능 향상에 관한 연구)

이기승
- The Journal of the Acoustical Society of Korea / v.23 no.6 / pp.462-472 / 2004
In this paper, a method is proposed for compensating for the performance degradation of automatic speech recognition (ASR) that is mainly caused by speaking-rate variation. Before the new method is introduced, the performance of an HMM-based ASR system is first analyzed quantitatively as a function of speaking rate. This analysis showed that performance often degrades significantly for rapidly spoken speech. A quantitative measure that represents speaking rate is then introduced. Time-scale modification (TSM) is employed to compensate for the speaking-rate difference between input speech signals and training speech signals. Finally, a compensation method is proposed in which TSM is applied selectively according to the speaking rate. ASR experiments on 10-digit mobile phone numbers confirmed that the error rate was reduced by 15.5% when the proposed method was applied to speech signals with a high speaking rate.
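The selective use of TSM can be summarized as a small decision rule. This is an illustrative sketch only: the abstract does not define its speaking-rate measure, so the rate here is assumed to be recognition units (for example, digits) per second, and the reference rate `train_rate` and the threshold `1.2` are hypothetical parameters.

```python
def tsm_factor(n_units: int, duration_s: float, train_rate: float, threshold: float = 1.2) -> float:
    """Time-scale factor to apply before recognition (>1 slows the input down).

    TSM is applied only when the utterance is noticeably faster than the
    training data; otherwise the input is left unchanged (factor 1.0).
    """
    if duration_s <= 0 or train_rate <= 0:
        raise ValueError("duration and reference rate must be positive")
    rate = n_units / duration_s
    if rate > threshold * train_rate:
        return rate / train_rate  # stretch so the rate matches the training data
    return 1.0


# A 10-digit number spoken in 2.0 s vs. a hypothetical training average of 3.5 digits/s.
print(tsm_factor(10, 2.0, 3.5))
```

Slowing only fast utterances reflects the finding that degradation was concentrated in rapidly spoken speech; slower input passes through unmodified.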

A Study on the Enhanced Time Domain Aliasing Cancellation Transform of the AC-3 Algorithm (AC-3오디오 알고리듬의 시간축 영역 에일리어징 제거 변환부 성능향상에 관한 연구)

김준성;강현철;변윤식
- The Journal of the Acoustical Society of Korea / v.19 no.2 / pp.13-18 / 2000
This paper presents a technique to enhance the time-domain aliasing cancellation (TDAC) transform in the AC-3 algorithm. To reduce block-boundary noise without degrading transform-coding performance, we propose new special windows that remedy a weakness of the AC-3 algorithm: aliasing is not properly cancelled in transient periods. In addition, a fast MDCT algorithm based on the fast Fourier transform is adopted.
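Time-domain aliasing cancellation can be checked numerically with a direct MDCT/IMDCT pair. This sketch uses the textbook O(N²) definitions rather than the FFT-based fast algorithm the paper adopts, and a sine window satisfying the Princen-Bradley condition w[n]² + w[n+N]² = 1 rather than AC-3's own windows or the new windows proposed in the paper. With 50% overlap-add, the aliasing terms of adjacent blocks cancel and the input is recovered.

```python
import numpy as np


def mdct(block, window):
    """Direct MDCT of one windowed block of 2N samples -> N coefficients."""
    N = len(block) // 2
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ (window * block)


def imdct(coeffs, window):
    """Direct IMDCT -> 2N windowed samples that still contain time-domain aliasing."""
    N = len(coeffs)
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return window * (basis @ coeffs) * (2.0 / N)  # 2/N scaling pairs with a Princen-Bradley window


def tdac_roundtrip(x, N):
    """Analyze x with 50%-overlapping sine-windowed blocks of 2N samples, then
    overlap-add the IMDCT outputs. len(x) must be a multiple of N."""
    window = np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))
    padded = np.concatenate([np.zeros(N), np.asarray(x, dtype=float), np.zeros(N)])
    out = np.zeros(len(padded))
    for start in range(0, len(padded) - 2 * N + 1, N):
        out[start:start + 2 * N] += imdct(mdct(padded[start:start + 2 * N], window), window)
    return out[N:N + len(x)]


x = np.random.default_rng(1).standard_normal(64)
y = tdac_roundtrip(x, 16)
```

Each single block's IMDCT output is aliased on its own; only the sum of two overlapping blocks reproduces the input, which is why window shape near transients matters.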
