• Title/Summary/Keyword: speech codec (음성 코덱)


Efficient TTS Database Compression Based on AMR-WB Speech Coder (AMR-WB 음성 부호화기를 이용한 TTS 데이터베이스의 효율적인 압축 기법)

  • Lim, Jong-Wook;Kim, Ki-Chul;Kim, Kyeong-Sun;Lee, Hang-Seop;Park, Hae-Young;Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea / v.28 no.3 / pp.290-297 / 2009
  • This paper presents an improved adaptive multi-rate wideband (AMR-WB) algorithm for efficient Text-To-Speech (TTS) database compression. The proposed algorithm removes the unnecessary common bit-stream (CBS) and combines parameter delta coding with speaker-dependent Huffman coding to reduce the required bit-rate without any quality degradation. We also propose lossy coding schemes that maximize bit-rate reduction with negligible quality degradation. Compared with the 12.65 kbps AMR-WB mode, the proposed lossless algorithm, including CBS removal, reduces the bit-rate by 12.40% without quality degradation. The proposed lossy algorithm reduces the bit-rate by 20.00% with a PESQ degradation of 0.12.
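
A minimal sketch of how delta coding followed by Huffman coding can shrink a slowly varying parameter track without loss. It is not the paper's algorithm: the parameter values are hypothetical, the Huffman table is built from the sequence itself rather than trained per speaker, and the cost of storing the table is ignored.

```python
import heapq
from collections import Counter
from itertools import count


def delta_encode(values):
    """Keep the first value, then store successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def huffman_decode(bits, code):
    """Decode a bit string using the prefix property of the code."""
    inverse = {b: s for s, b in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical quantizer indices for one codec parameter across frames.
params = [40, 41, 41, 42, 42, 42, 43, 42, 42, 41, 41, 41]
deltas = delta_encode(params)
code = huffman_code(deltas)
bits = "".join(code[d] for d in deltas)
restored = delta_decode(huffman_decode(bits, code))
print(len(bits), "bits after delta + Huffman coding; lossless:", restored == params)
```

Because neighbouring frames tend to have similar parameter values, the differences cluster around zero, and a variable-length code can then give the frequent small differences short codewords.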

Implementation of a Noise-Cancelling Voice Input and Storage Device Using an Adaptive Filter (적응형 필터를 이용한 잡음제거 음성입력 및 저장장치의 구현)

  • Ji, Yoo-Kang;Moon, Dae-Wong;Kim, Sa-Wung;Park, Soo-Bong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.05a / pp.147-150 / 2008
  • Guided explanations at major tourist sites across the country help visitors understand those sites. However, the area in which a tour guide works is too large to be covered by the unamplified voice alone. Because guides move around a great deal, most sightseeing explanations rely on a microphone and speaker attached to the guide's clothing, which is cumbersome. This paper therefore uses a fixed, installed microphone and implements an on-board system, built around an MCU (ATmega128) and the MSM7731-02 Oki dual codec, that processes the input voice and minimizes noise with an adaptive filter. A control program communicates with the filter codec over a serial link. The resulting system handles acoustic echo of up to 59 ms and line echo of up to 27 ms, attenuates echo by up to 35 dB, and removes noise adaptively.


Hardware Design of Enhanced Real-Time Sound Direction Estimation System (향상된 실시간 음원방향 인지 시스템의 하드웨어 설계)

  • Kim, Tae-Wan;Kim, Dong-Hoon;Chung, Yun-Mo
    • The Journal of the Acoustical Society of Korea / v.30 no.3 / pp.115-122 / 2011
  • In this paper, we present a method for accurate real-time estimation of sound source direction based on the time delay of arrival, using generalized cross correlation with four microphones in a cross arrangement. Existing systems have two disadvantages: they are hard to embed because signal processing requires data acquisition from the microphone input, and real-time processing becomes difficult on DSP processors as the number of channels used for direction estimation grows. To address these disadvantages, this paper proposes a hardware design for enhanced real-time processing using microphone array signal processing. Accurate direction estimation and reduced design time are achieved through an efficient hardware design that uses spatial segmentation methods and verification techniques. Finally, we develop a system suitable for embedded use, built from a sound codec and an FPGA chip. According to experimental results, the system processes data much faster in real time than either PC-based systems or DSP-based implementations.
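
The time-delay step can be sketched for a single microphone pair as follows. This is an assumption-laden illustration, not the paper's hardware design: it uses plain time-domain cross correlation (the unweighted special case of generalized cross correlation), a synthetic pulse, and a hypothetical sampling rate and microphone spacing.

```python
import math


def cross_correlation_delay(x, y, max_lag):
    """Return the lag in samples (|lag| <= max_lag) at which y best matches x.

    A positive result means y is a delayed copy of x.
    """
    n = len(x)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(x[i] * y[i + lag] for i in range(n) if 0 <= i + lag < n)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def arrival_angle(lag, fs, spacing, c=343.0):
    """Convert a delay in samples to an arrival angle in degrees (far-field model)."""
    s = c * (lag / fs) / spacing
    s = max(-1.0, min(1.0, s))  # clamp against rounding outside [-1, 1]
    return math.degrees(math.asin(s))


# Hypothetical setup: 16 kHz sampling, microphones 0.1 m apart.
fs, spacing = 16000, 0.1

# Synthetic test signal: a short pulse, and a copy delayed by 3 samples.
x = [0.0] * 64
for i, v in enumerate([1.0, -0.5, 0.25, 0.8]):
    x[20 + i] = v
y = [0.0] * 3 + x[:-3]

lag = cross_correlation_delay(x, y, max_lag=8)
angle = arrival_angle(lag, fs, spacing)
print("delay:", lag, "samples; angle:", round(angle, 1), "degrees")
```

With four microphones in a cross arrangement, two such pair estimates along perpendicular axes could be combined to resolve direction over the full circle.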

Low-Delay LSF FEC Technique Robust in Lossy VoIP Environment (VoIP 손실 환경에 강인한 저지연 LSF FEC 기법)

  • Yang, Hae-Yong;Lee, Kyung-Hoon;Hwang, In-Ho
    • Journal of the Institute of Electronics Engineers of Korea SP / v.39 no.6 / pp.687-695 / 2002
  • Media-specific FEC techniques, proposed to counter speech packet loss in VoIP, improve speech quality at the cost of one additional frame of delay. In this paper, we propose a new media-specific FEC, the LSF FEC technique, which improves speech quality with a much shorter additional delay. In the proposed technique, the LSF parameters of the future frame are used to recover a lost packet. To evaluate its performance, we use the ITU-T G.723.1 and G.729 codecs, apply the Gilbert packet loss model, and estimate the MOS at each packet loss rate using the PESQ speech quality estimation algorithm. Compared with existing media-specific FEC techniques, the proposed technique shortens the delay by 6.5 ms to 27 ms. Simulation results comparing reconstructed speech quality show that the technique improves the MOS by more than 0.1 in a practical lossy environment with a 5% packet loss rate.
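
The Gilbert packet loss model mentioned above is a two-state Markov chain: packets are delivered in the Good state and lost in the Bad state, so losses arrive in bursts. A minimal sketch follows. The transition probabilities are hypothetical, chosen only so that the long-run loss rate is 5%; the abstract does not give the paper's settings.

```python
import random


def gilbert_losses(n_packets, p_good_to_bad, p_bad_to_good, seed=0):
    """Simulate a simple Gilbert channel; True marks a lost packet."""
    rng = random.Random(seed)
    lost = []
    bad = False  # start in the Good state
    for _ in range(n_packets):
        if bad:
            # Stay in the Bad state unless the chain recovers.
            bad = rng.random() >= p_bad_to_good
        else:
            bad = rng.random() < p_good_to_bad
        lost.append(bad)
    return lost


# The stationary loss rate is p / (p + q). Pick q, then solve for p.
target_loss = 0.05
q = 0.5                                   # hypothetical Bad -> Good probability
p = target_loss * q / (1.0 - target_loss)  # Good -> Bad probability

losses = gilbert_losses(100_000, p, q)
loss_rate = sum(losses) / len(losses)
print("simulated loss rate:", round(loss_rate, 4))
```

Lowering `q` while keeping `p / (p + q)` fixed gives longer loss bursts at the same average loss rate, which is the condition under which single-frame recovery schemes are stressed most.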

A Method of Adaptive ISF Split Vector Quantization Using Normalized Codebook (정규화 코드북을 이용한 분할 벡터 구조의 ISF 적응적 양자화 기법)

  • Piao, Zhigang;Lim, Jong-Ha;Hong, Gi-Bong;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea / v.30 no.5 / pp.265-272 / 2011
  • In most ISF (or LSF) based real-time speech codecs, split vector quantization (SVQ) is used to reduce quantizer complexity and codebook memory size. Its drawback is that the correlation between the split sub-vectors cannot be exploited. This paper presents a new adaptive ISF vector quantization method that compensates for this drawback of SVQ-structured quantizers in wideband speech codecs. In each frame, the proposed method exploits the correlation between the split vectors by adaptively changing the codebook distribution according to the ordering property of the ISFs. The algorithm is evaluated in AMR-WB and shows an improvement of about 1.5 bits per frame.

A Research on Quality Improvement of Software-based Video Teleconferencing on the Tactical Communication Networks Less Than 1Mbps (1Mbps 이하 전술통신망에서의 소프트웨어 방식 화상회의 품질향상 연구)

  • Kim, Gwon-Hee
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.1C / pp.63-75 / 2012
  • This paper studies how to operate software-based video teleconferencing on tactical communication networks below 1 Mbps. Such networks have limited bandwidth and suffer frequent data loss and transmission delay because they are unstable. In addition, the bandwidth available for video teleconferencing is even smaller because the Army Tactical Command Information System (ATCIS) has priority in using the bandwidth. This paper analyzes these restrictions, presents methods to improve the quality of software video teleconferencing on tactical communication networks, and reports actual experiments. The first measures applied are retransmitting lost packets and reducing the image size to cut data traffic. Nothing serves video teleconferencing better than providing enough bandwidth for every user. However, on tactical communication networks with limited bandwidth, video teleconferencing can be improved by optimizing the compression rate of image data, the number of image frames, the audio codec, and the usage of audio compensation data.

Deep Learning based Raw Audio Signal Bandwidth Extension System (딥러닝 기반 음향 신호 대역 확장 시스템)

  • Kim, Yun-Su;Seok, Jong-Won
    • Journal of IKEEE / v.24 no.4 / pp.1122-1128 / 2020
  • Bandwidth extension refers to restoring and expanding a narrowband (NB) signal, which has been damaged in the encoding and decoding process because of limited channel capacity or the characteristics of the codec in a mobile communication device, into a wideband (WB) signal. Bandwidth extension research mainly focuses on voice signals and works in the frequency domain, as in SBR (Spectral Band Replication) and IGF (Intelligent Gap Filling), restoring lost or damaged high bands through complex feature extraction processes. In this paper, we propose an autoencoder-based deep learning model that outputs a bandwidth-extended signal. Using residual connections in one-dimensional convolutional neural networks (CNNs), the model extends the bandwidth from a fixed-length time-domain input signal without complicated pre-processing. In addition, we confirm that the damaged high band can be restored even when training on a dataset containing various types of sound sources, including music, rather than speech alone.

The Trend of Internet Telephony(VoIP) Technology for BcN (BcN 인터넷전화(VoIP) 기술 동향)

  • Kang, T.G.;Kim, D.Y.;Kim, Y.S.
    • Electronics and Telecommunications Trends / v.19 no.6 s.90 / pp.66-73 / 2004
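  • According to the Ministry of Information and Communication's IT839 strategy of 2004, Internet telephony (VoIP) was selected as one of the 8 new services and the Broadband convergence Network (BcN) as one of the 3 advanced infrastructures. This article therefore describes technology trends that link VoIP and BcN for shared use across the 9 new growth engines. With VoIP services based on SIP and H.323 now being offered, network architecture has begun to change. The integrated wired/wireless network (BcN), which unifies wired networks, wireless mobile networks, and the Internet, will be built on VoIP technology. This article describes the characteristics of VoIP technology, analyzes VoIP standardization trends, and then describes the direction in which BcN VoIP will develop. VoIP technology comprises circuit and Internet interfacing technologies; protocol technologies such as H.323, SIP, and MEGACO; and codec technology, which determines VoIP quality. VoIP standards are being developed by 3GPP, 3GPP2, and ITU around the SIP specification established by the IETF, taking network characteristics into account. BcN VoIP will advance existing VoIP technology so that voice quality improves to audio level and, in conjunction with control technology that integrates wired, wireless, and broadcast networks, will provide real-time multimedia services.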

Performance Comparison of AMR Codec Mode Allocations in Downlink WCDMA System (순방향 WCDMA 채널에서 AMR 음성 코덱 모드 할당방식에 대한 성능 비교)

  • Jeong, S.H.;Hong, J.W.;Lee, S.C.;Lie, C.H.
    • Journal of Korean Institute of Industrial Engineers / v.31 no.4 / pp.349-357 / 2005
  • The Adaptive Multi-Rate (AMR) speech codec is mandatory for voice service in WCDMA systems. By adjusting its service rate, the AMR codec can provide a balanced trade-off between capacity and voice quality. In this paper, three AMR mode allocation schemes for the WCDMA downlink are evaluated. To evaluate user satisfaction efficiently, a new system performance measure and analytic models are proposed. The proposed analytic models can be used to find optimal mode allocations while considering both system capacity and voice quality. Numerical examples illustrate how to find optimal parameters for given traffic loads and compare the performance of the three mode allocation schemes.

Design of a variable rate speech codec for the W-CDMA system (W-CDMA 시스템을 위한 가변율 음성코덱 설계)

  • 정우성
    • Proceedings of the Acoustical Society of Korea Conference / 1998.08a / pp.142-147 / 1998
  • Recently, the 8 kb/s CS-ACELP coder G.729 was standardized by ITU-T SG15, and its speech quality has been reported to be better than or equal to that of 32 kb/s ADPCM. However, G.729 is a fixed-rate speech coder and does not take into account voice activity in two-way conversation. By exploiting voice activity, the average bit rate can be halved without any degradation of speech quality. In this paper, we propose an efficient variable rate algorithm for G.729. The variable rate algorithm consists of two main parts: the rate determination algorithm and the sub-rate coder design. For the rate determination algorithm, we combine the energy-thresholding method, a phonetic segmentation method that integrates various feature parameters obtained through the analysis procedure, and the variable hangover period method. Based on an analysis of noise features, a 1 kb/s sub-rate coder is designed for coding the background noise signal. In addition, we design a 4 kb/s sub-rate coder for the unvoiced parts. The performance of the variable rate algorithm is evaluated by comparing its speech quality and average bit rate with those of G.729. Subjective quality is also assessed with a MOS test. The results verify that the proposed variable rate CS-ACELP coder produces the same speech quality as G.729 at an average bit rate of 4.4 kb/s.

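
A rough sketch of energy-threshold rate determination with a hangover period, and of how the average bit rate follows from the per-frame rates. It is a simplification: the paper also uses phonetic segmentation and a variable hangover, while this sketch uses energy alone with a fixed hangover. The thresholds, the frame energies, and the use of 8 kb/s for high-energy frames are all assumptions made for illustration.

```python
def assign_rates(energies_db, noise_db=-50.0, voiced_db=-30.0, hangover=2):
    """Pick a rate in kb/s for each frame from its energy in dB.

    Frames at or above voiced_db get 8 kb/s, frames at or above noise_db
    get 4 kb/s, and quieter frames get 1 kb/s once the hangover has expired.
    """
    rates = []
    hold = 0  # remaining hangover frames after the last active frame
    for energy in energies_db:
        if energy >= voiced_db:
            rate, hold = 8.0, hangover
        elif energy >= noise_db:
            rate, hold = 4.0, hangover
        elif hold > 0:
            # Hangover: keep coding at 4 kb/s to avoid clipping speech tails.
            rate, hold = 4.0, hold - 1
        else:
            rate = 1.0
        rates.append(rate)
    return rates


# Hypothetical frame energies in dB: silence, a speech burst, then silence.
energies = [-60, -58, -25, -20, -22, -40, -55, -57, -59, -61]
rates = assign_rates(energies)
average = sum(rates) / len(rates)
print("per-frame rates:", rates)
print("average bit rate (kb/s):", average)
```

The average depends entirely on how much of the signal falls into each class, which is why a coder that spends 1 kb/s on background noise can roughly halve the average rate of a fixed 8 kb/s coder in conversational speech.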