• Title/Summary/Keyword: Speech Codec

Search Result 128, Processing Time 0.025 seconds

A Study on the Reduction of LSP(Line Spectrum Pair) Transformation Time in Speech Coder for CDMA Digital Cellular System (이동통신용 음성부호화기에서의 LSP 계산시간 감소에 관한 연구)

  • Min, So-Yeon
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.8 no.3
    • /
    • pp.563-568
    • /
    • 2007
  • We propose the computation reduction method of real root method that is used in the EVRC(Enhanced Variable Rate Codec) system. The real root method is that if polynomial equations have the real roots, we are able to find those and transform them into LSP. However, this method takes much time to compute, because the root searching is processed sequentially in frequency region. But, the important characteristic of LSP is that most of coefficients are occurred in specific frequency region. So, to reduce the computation time of real root, we used the met scale that is linear below 1kHz and logarithmic above. In order to compare real root method with proposed method, we measured the following two. First, we compared the position of transformed LSP(Line Spectrum Pairs) parameters in the proposed method with these of real root method. Second, we measured how long computation time is reduced. The experimental result is that the searching time was reduced by about 48% in average without the change of LSP parameters.

  • PDF

A Preprocessing Approach to Improving the Quality of the Music Produced by the EVRC (EVRC 코덱으로 재생하는 음악의 품질을 개선하기 위한 전처리 기법)

  • 남영한;하태균;전윤호;김재수;박섭형
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.28 no.5C
    • /
    • pp.476-485
    • /
    • 2003
  • This paper proposers a preprocessing approach to improving the quality of the music produced by the EVRC(enhanced variable rate codec) which is one of the CDMA(Code Division Multiple Access) voice codecs. Since the EVRC is optimized only for speech signals, it can deteriorate the quality of the music passed through it. One of the problems with the EVRC-coded music is time-clipping, which usually occurs when subsequent frames are encoded at Rate l/8. Since the EVRC determines the bit rate for an input frame based on the long-term prediction gain, we increase the long-term prediction gain in order for the most of the frames to be encoded at Rate 1 or Rate 1/2. Experimental results show that the approach works well on music signals and the number of time-clipped frames is considerably reduced.

A Method For Improvement Of Split Vector Quantization Of The ISF Parameters Using Adaptive Extended Codebook (적응적인 확장된 코드북을 이용한 분할 벡터 양자화기 구조의 ISF 양자화기 개선)

  • Lim, Jong-Ha;Jeong, Gyu-Hyeok;Hong, Gi-Bong;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.1
    • /
    • pp.1-8
    • /
    • 2011
  • This paper presents a method for improving the performance of ISF coefficients quantizer through compensating the defect of the split structure vector quantization using the ordering property of ISF coefficients. And design the ISF coefficients quantizer for wideband speech codec using proposed method. The wideband speech codec uses split structure vector quantizer which could not use the correlation between ISF coefficients fully to reduce complexity and the size of codebook. The proposed algorithm uses the ordering property of ISF coefficients to overcome the defect. Using the ordering property, the codebook redundancy could be figured out. The codebook redundancy is replaced by the adaptive-extended codebook to improve the performance of the quantizer through using the ordering property, ISF coefficient prediction and interpolation of existing codebook. As a result, the proposed algorithm shows that the adaptive-extended codebook algorithm could get about 2 bit gains in comparison with the existing split structure ISF quantizer of AMR-WB (G.722.2) in the points of spectral distortion.

Improved ErtPS Scheduling Algorithm for AMR Speech Codec with CNG Mode in IEEE 802.16e Systems (IEEE 802.16e 시스템에서의 CNG 모드 AMR 음성 코덱을 위한 개선된 ErtPS 스케줄링 알고리즘)

  • Woo, Hyun-Je;Kim, Joo-Young;Lee, Mee-Jeong
    • The KIPS Transactions:PartC
    • /
    • v.16C no.5
    • /
    • pp.661-668
    • /
    • 2009
  • The Extended real-time Polling Service (ErtPS) is proposed tosupport QoS of VoIP service with silence suppression which generates variable size data packets in IEEE 802.16e systems. If the silence is suppressed, VoIP should support Comfort Noise Generation (CNG) which generates comfort noise for receiver's auditory sense to notify the status of connection to the user. CNG mode in silent-period generates a data with lower bit rate at long packet transmission intervals in comparison with talk-spurt. Therefore, if the ErtPS, which is designed to support service flows that generate data packets on a periodic basis, is applied to silent-period, resources of the uplink are used inefficiently. In this paper, we proposed the Improved ErtPS algorithm for efficient resource utilization of the silent-period in VoIP traffic supporting CNG. In the proposed algorithm, the base station allocates bandwidth depending on the status of voice at the appropriate interval by havingthe user inform the changes of voice status. The Improved ErtPS utilizes the Cannel Quality Information Channel (CQICH) which is an uplink subchannel for delivering quality information of channel to the base station on a periodic basis in 802.16e systems. We evaluated the performance of proposed algorithm using OPNET simulator. We validated that proposed algorithm improves the bandwidth utilization of the uplink and packet transmission latency

Voice Synthesis Detection Using Language Model-Based Speech Feature Extraction (언어 모델 기반 음성 특징 추출을 활용한 생성 음성 탐지)

  • Seung-min Kim;So-hee Park;Dae-seon Choi
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.3
    • /
    • pp.439-449
    • /
    • 2024
  • Recent rapid advancements in voice generation technology have enabled the natural synthesis of voices using text alone. However, this progress has led to an increase in malicious activities, such as voice phishing (voishing), where generated voices are exploited for criminal purposes. Numerous models have been developed to detect the presence of synthesized voices, typically by extracting features from the voice and using these features to determine the likelihood of voice generation.This paper proposes a new model for extracting voice features to address misuse cases arising from generated voices. It utilizes a deep learning-based audio codec model and the pre-trained natural language processing model BERT to extract novel voice features. To assess the suitability of the proposed voice feature extraction model for voice detection, four generated voice detection models were created using the extracted features, and performance evaluations were conducted. For performance comparison, three voice detection models based on Deepfeature proposed in previous studies were evaluated against other models in terms of accuracy and EER. The model proposed in this paper achieved an accuracy of 88.08%and a low EER of 11.79%, outperforming the existing models. These results confirm that the voice feature extraction method introduced in this paper can be an effective tool for distinguishing between generated and real voices.

Transcoding Algorithm for AMR and EVRC Vocoders Via Direct Parameter Transformation (AMR과 EVRC 음성부호화기를 위한 파라미터 직접 변환 방식의 상호부호화 알고리듬)

  • Lee, Sun-Il;Yu, Chang-Dong
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.39 no.6
    • /
    • pp.696-708
    • /
    • 2002
  • In this paper, a novel transcoding algorithm for the Adaptive Multi Rate(AMR) and the Enhanced Variable Rate Codec(EVRC) vocoders via direct parameter transformation is proposed. In contrast to the conventional tandem transcoding algorithm, the proposed algorithm converts the parameters of one coder to the other without going through the decoding and encoding processes. The proposed algorithm consists of the parameter decoding, frame classification, mode decision, and transcoders for two frame types. The transcoders convert the parameters such as LSP, frame energy, pitch delay for the adaptive codebook, fixed codebook vector, and codebook gains. Evaluation results show that while exhibiting better computational and delay characteristics, the proposed algorithm produces equivalent speech quality to that produced by the tandem transcoding algorithm.

MPEG Audio New Standard: USAC Technology (MPEG 오디오 최신 표준: USAC 기술)

  • Lee, Tae-Jin;Kang, Kyeong-Ok;Kim, Whan-Woo
    • Journal of Broadcast Engineering
    • /
    • v.16 no.5
    • /
    • pp.693-704
    • /
    • 2011
  • As mobile devices become multi-functional, and converge into a single platform, there is a strong need for a codec that is able to provide consistent quality for speech and music contents. MPEG-D USAC standardization activities started at the 82nd MPEG meeting with a CfP and approved Study on DIS at the 96th MPEG meeting. MPEG-D USAC is converged technology of AMR-WB+ and HE-AAC V2. Specifically, USAC utilizes three core codecs (AAC, ACELP, and TCX) for low frequency regions, SBR for high frequency regions, the MPEG Surround for stereo information, and window transition technology for smoothing transition between various core coder. USAC can provide consistent sound quality for both speech and music contents and can be applied to various applications such as multi-media download to mobile devices, digital radio, mobile TV and audio books.

Same music file recognition method by using similarity measurement among music feature data (음악 특징점간의 유사도 측정을 이용한 동일음원 인식 방법)

  • Sung, Bo-Kyung;Chung, Myoung-Beom;Ko, Il-Ju
    • Journal of the Korea Society of Computer and Information
    • /
    • v.13 no.3
    • /
    • pp.99-106
    • /
    • 2008
  • Recently, digital music retrieval is using in many fields (Web portal. audio service site etc). In existing fields, Meta data of music are used for digital music retrieval. If Meta data are not right or do not exist, it is hard to get high accurate retrieval result. Contents based information retrieval that use music itself are researched for solving upper problem. In this paper, we propose Same music recognition method using similarity measurement. Feature data of digital music are extracted from waveform of music using Simplified MFCC (Mel Frequency Cepstral Coefficient). Similarity between digital music files are measured using DTW (Dynamic time Warping) that are used in Vision and Speech recognition fields. We success all of 500 times experiment in randomly collected 1000 songs from same genre for preying of proposed same music recognition method. 500 digital music were made by mixing different compressing codec and bit-rate from 60 digital audios. We ploved that similarity measurement using DTW can recognize same music.

  • PDF

Real-Time Implementation of the AMR-WB Speech Codec Using TMS320VC5510 DSP (TMS320VC5510을 이용한 AMR-WB 음성부호화기의 실시간 구현)

  • Joh Jae Min;Kim Jun;Kim Jung Min;Bae Keun Sung
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.57-60
    • /
    • 2004
  • 본 연구에서는 ETSI 및 3GPP에 의해 개발된 광대역 음성부호화의 표준안인 AMR-WB 알고리즘을 분석하고, TMS320VC5510 DSK를 이용한 실시간 구현 결과를 제시하였다. AMR-WB 음성부호화기의 실시간 구현을 위해 프로그램 최적화 작업을 수행하였고, 구현된 음성 부호화기의 성능을 평가하기 위해서 프로그램 메모리와 데이터 메모리의 크기, 그리고 한 프레임당 수행 시간을 측정하였다. 구현된 시스템의 프로그램 메모리는 약65.6 kbytes, 데이터 메모리는 약 73.8kbytes 정도의 크기를 나타내었으며, 한 프레임인 20 msec를 처리하는데 소요되는 cycle 수가 평균 1,247,115 정도로 약 6.24 msec 내에 처리 할 수 있음을 보였다. 마지막으로 DSP로 구현한 AMR-WB 음성부호화기의 결과가 PC에서 시뮬레이션 한 결과와 일치함을 검증하였고 실시간으로 동작됨을 확인하였다.

  • PDF

Design of a Variable half rate speech codec (가변율 half rate 음성 부호화기의 설계)

  • 성호상
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.06e
    • /
    • pp.293-296
    • /
    • 1998
  • 본 논문에서는 다양한 멀티미디어 서비스를 위해 가변율 half rate 음성 부호화기를 설계하였다. 유, 무성음과 묵음의 구분을 위해 본 논문에서는 프레임 에너지와 음성 파라메터들을 이용한 효과적인 voicing 결정 알고리즘을 사용하였다. 유성음을 위한 half rate 음성 부호화기는 저속에서 좋은 특성을 보이는 generalized AbS구조를 이용하였다. LPC 계수는 LSP 계수로 변환한 후 predictive 2-stage VQ를 통해서 양자화하며, 여기 신호는 음질저하를 최소화하며 복잡도를 감소시킨 shift 방식의 대수적 고정 코드북 구조를 사용하고, 적응코드북과 여기코드북의 이득은 VQ로 양자화 하였다. 무성음을 위한 부호화기는 대부분이 유성음을 위한 부호화기와 동일하지만, 무성음에서는 피치간 상관도가 매우 낮으므로 피치 보간 방법을 사용하지 않고 개루프로 피치 lag를 찾은 후 전체 프레임에 사용한다. 1 kb/s 부호화기는 묵음 구간과 주변소음 구간에 사용되며 이 구간의 신호를 피치 성분이 미약한 주변소음들로 제한하고 이에 최적인 부음성 부호화기를 설계하였다. 최종적으로 완성된 가변율 half rate 부호화기는 voice activity factor(VAF)가 0.47인 시험음성에서 약 2.6 kb/s의 평균 전송률을 보였다. 주관적 음질 평가의 일환으로 IS-96 표준 코덱인 가변율 8 kb/s QCELP와 A-B preference 시험을 실시하였다. 시험 결과 평균전송률이 약 2배인 가변율 8 kb/s QCELP 보다 우수한 음질 성능을 보였다.

  • PDF