Search | Korea Science

Real-time Implementation of AMR-WB Speech Coder Using TMS320C5509 DSP (TMS320C5509 DSP를 이용한 AMR-WB 음성부호화기의 실시간 구현)

Choi Song-In;Jee Deock-Gu
- The Journal of the Acoustical Society of Korea / v.24 no.1 / pp.52-57 / 2005
The adaptive multi-rate wideband (AMR-WB) speech coder has an extended audio bandwidth from 50 Hz to 7 kHz and operates at nine speech coding bit rates from 6.6 to 23.85 kbit/s. In this paper, we present a real-time implementation of the AMR-WB speech coder on the 16-bit fixed-point TMS320C5509, which has dual MAC units. We first implemented the AMR-WB speech coder in C using intrinsics and then optimized it in assembly language. The computational complexity of the implemented AMR-WB coder at 23.85 kbit/s is 42.9 Mclocks. The coder requires 15.1 kwords of program memory, 9.2 kwords of data ROM, and 13.9 kwords of data RAM.
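The nine bit rates named in the AMR-WB abstract correspond to fixed frame sizes, because the codec works on 20 ms frames. The following is a minimal sketch of that arithmetic only. The mode list and frame length are taken from the AMR-WB standard rather than from the abstract, and the names `AMR_WB_RATES_BPS` and `bits_per_frame` are illustrative.

```python
# Illustrative names; the mode list and 20 ms frame come from the AMR-WB
# standard, not from the abstract above.
AMR_WB_RATES_BPS = [6600, 8850, 12650, 14250, 15850, 18250, 19850, 23050, 23850]
FRAME_MS = 20


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Number of coded bits produced for one frame at the given bit rate."""
    return rate_bps * frame_ms // 1000


if __name__ == "__main__":
    for rate in AMR_WB_RATES_BPS:
        print(f"{rate / 1000:5.2f} kbit/s -> {bits_per_frame(rate)} bits per frame")
```

At the 23.85 kbit/s mode discussed in the abstract, each 20 ms frame carries 477 bits.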

Limitations of Spectrogram Analysis for Smartphone Voice Recording File Forgery Detection (스마트폰 음성 녹음 파일 위변조 검출을 위한 스펙트로그램 분석의 한계점)

Sangmin Han;Yeongmin Son;Jae Wan Park
- The Journal of the Convergence on Culture Technology / v.9 no.2 / pp.545-551 / 2023
As digital information has become readily available to everyone, digital evidence is increasingly being admitted. However, with the spread of voice file editing tools, it is virtually impossible to determine whether a voice recording file that has gone through sophisticated editing is authentic or forged. This study aims to show that forgeries that are difficult to distinguish from the original file can be produced by applying insertion, deletion, concatenation, and synthesis editing techniques to voice recording files. It demonstrates the difficulty of detecting forgery when a forged voice file is encoded with the same extension as the original. In addition, it shows that forgery detection becomes impossible when transition-band deletion and secondary encoding are additionally applied, which was done only for the experiments in which distinguishing features appeared. This study is expected to contribute to establishing more stringent admissibility criteria for voice recording files as digital evidence.
https://doi.org/10.17703/JCCT.2023.9.2.545

A Noble Decoding Algorithm Using MLLR Adaptation for Speaker Verification (MLLR 화자적응 기법을 이용한 새로운 화자확인 디코딩 알고리듬)

김강열;김지운;정재호
- The Journal of the Acoustical Society of Korea / v.21 no.2 / pp.190-198 / 2002
In general, the Viterbi algorithm from speech recognition is used for decoding. A decoder for speaker verification, however, has to recognize the same word differently for each speaker. In this paper, we propose a novel decoding algorithm that can replace the typical Viterbi algorithm in a speaker verification system. The proposed algorithm uses speaker adaptation algorithms from speech recognition, which transform feature vectors into the region of the client's characteristics. Of the many adaptation algorithms available, we adopt MLLR (Maximum Likelihood Linear Regression) and MAP (Maximum A Posteriori) adaptation. Using the proposed algorithm instead of the typical Viterbi algorithm improved the EER (Equal Error Rate) by about 30%.
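The speaker verification abstract reports its result as an equal error rate, the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below shows one common way to approximate it from verification scores. It is not the authors' evaluation code: the function name `equal_error_rate` and the convention that higher scores mean "more client-like" are assumptions.

```python
import numpy as np


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumes higher scores indicate the claimed (client) speaker.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))

    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostors wrongly accepted
        frr = np.mean(genuine < t)    # clients wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

With a finite score set the two error curves rarely cross exactly, so the function returns the average of the two rates at the closest threshold.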

On a Pitch Point Detection by Preserving the Phase Component of the Autocorrelation Function (자기상관함수에서 위상 성분의 보존에 의한 피치 시점 검출에 관한 연구)

함명규;최성영;박종철;배명진
- Proceedings of the IEEK Conference / 2000.09a / pp.799-802 / 2000
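In speech signal processing, accurately detecting the fundamental frequency of a speech signal reduces speaker-dependent effects in speech recognition and thus improves recognition accuracy, and in speech synthesis it makes it easy to change or preserve naturalness and speaker individuality. In addition, pitch-synchronous analysis yields accurate vocal tract parameters from which the influence of the glottis has been removed. Because pitch detection is this important, a variety of pitch detection methods have been proposed [1]. In this paper, we study a method for detecting pitch points in unstable segments during speech analysis. The conventional autocorrelation function method has the advantage of emphasizing periodicity, but it has the drawback of not preserving the phase component. We therefore propose an algorithm that uses the autocorrelation function while preserving the phase component. In experiments, the method achieved about 98% accuracy compared with manually located pitch points. As these results show, an autocorrelation function that preserves the phase component can be useful in speech synthesis, coding, and recognition.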
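For context on the pitch detection abstract, the sketch below shows the conventional autocorrelation approach it starts from, not the phase-preserving algorithm the paper proposes. It estimates the pitch period of one frame from the strongest autocorrelation peak, which illustrates the stated limitation: the result gives the period but not where in the frame each pitch point falls. The function name and the 60 to 400 Hz search range are assumptions.

```python
import numpy as np


def estimate_f0_autocorr(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the fundamental frequency of one frame (conventional method).

    The search range [f0_min, f0_max] is an assumed default for speech.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()

    # Autocorrelation for non-negative lags 0 .. N-1.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]

    # Restrict the peak search to lags that correspond to plausible pitch.
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(acf) - 1)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))

    # The lag gives the period only; phase (pitch point position) is lost.
    return sample_rate / best_lag
```

For example, calling `estimate_f0_autocorr(frame, 8000)` on a 50 ms voiced frame returns a single frequency estimate for that frame.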

Comparative study of Korean speech recognition based on SpecAugment and Kaldi (SpecAugment와 Kaldi기반 한국어 음성인식 비교 연구)

Lee, Seounghoon;Park, Chanjun;Seo, Jaehyung;Kim, Gyeongmin;Lim, Heuiseok
- Annual Conference on Human and Language Technology / 2021.10a / pp.152-157 / 2021
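Kaldi is an open-source speech recognition platform, and many companies use it for business and research. However, detailed Korean-language descriptions of Kaldi's modules and their usage are still lacking. This paper gives a detailed description of each Kaldi module and applies the data augmentation technique SpecAugment to a Korean speech recognition system to verify whether performance improves. We also vary Kaldi's acoustic and language models to determine which combination of modules yields the best Korean speech recognition model, and compare how practical the configurations are for real-time decoding.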
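The SpecAugment technique named in the Kaldi abstract augments training data by masking parts of the input spectrogram. The sketch below covers only the frequency-masking and time-masking steps, one mask of each kind; SpecAugment also defines time warping, which is omitted here. It is independent of Kaldi, and the function name, mask widths, and zero fill value are assumptions.

```python
import numpy as np


def spec_augment(spec, max_freq_mask=8, max_time_mask=20, rng=None):
    """Return a copy of `spec` (n_freq x n_frames) with one frequency mask
    and one time mask applied. Mask widths are assumed defaults."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(spec, dtype=float, copy=True)
    n_freq, n_frames = out.shape

    # Frequency mask: zero out `f` consecutive frequency bins.
    f = min(int(rng.integers(0, max_freq_mask + 1)), n_freq)
    f_start = int(rng.integers(0, n_freq - f + 1))
    out[f_start:f_start + f, :] = 0.0

    # Time mask: zero out `t` consecutive frames.
    t = min(int(rng.integers(0, max_time_mask + 1)), n_frames)
    t_start = int(rng.integers(0, n_frames - t + 1))
    out[:, t_start:t_start + t] = 0.0

    return out
```

Passing a seeded generator, for example `spec_augment(spec, rng=np.random.default_rng(0))`, makes the masks reproducible across runs.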

Network Coding-based Delay Reduction for Voice Traffic in Large-scale Wireless Sensor Networks (대규모 무선 센서네트워크에서 네트워크 코딩 기반의 음성 트래픽을 위한 딜레이 감소 방안)

Kim, Kyoung-Hwan;Joe, In-Whee
- Proceedings of the KAIS Fall Conference / 2010.11a / pp.438-442 / 2010
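As wireless sensor network technology advances, small-scale wireless sensor networks are giving way to large-scale ones, and various studies are under way on managing large-scale wireless sensor networks efficiently. In this paper, we use a clustering technique that manages large-scale wireless sensor networks efficiently. We also propose a method that uses network coding to transmit voice information, reducing the delay in delivering the collected data to the destination.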
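The sensor network abstract does not say which network coding scheme is used. As a general illustration of why network coding can save transmissions, and therefore delay, the sketch below shows the textbook XOR case: a relay combines two equal-length packets into one, and a node that already holds either packet can recover the other. The function name and the packet contents are hypothetical.

```python
def xor_packets(p, q):
    """Combine two equal-length packets with bytewise XOR."""
    if len(p) != len(q):
        raise ValueError("packets must have the same length")
    return bytes(x ^ y for x, y in zip(p, q))


if __name__ == "__main__":
    packet_a = b"voiceA01"  # hypothetical voice payloads
    packet_b = b"voiceB02"

    # The relay sends one coded packet instead of forwarding two.
    coded = xor_packets(packet_a, packet_b)

    # A node that already has packet_a recovers packet_b, and vice versa.
    print(xor_packets(coded, packet_a) == packet_b)
    print(xor_packets(coded, packet_b) == packet_a)
```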

Tree Coding Combined with TDHS for Speech Coding (트리코딩과 시영역 하모닉 스케일링을 결합한 음성 부호화)

이인성;구본응
- The Journal of the Acoustical Society of Korea / v.17 no.2 / pp.50-55 / 1998
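We propose 6.4 and 4.8 kbit/s speech coders that combine tree coding with time-domain harmonic scaling (TDHS). The coders are fully backward adaptive, and because of the harmonic scaling they are not low-delay. To improve the error performance of the coders, we propose a new adaptive pitch predictor, an adaptive gain function, and a short-term adaptive prediction algorithm for the tree coder. The new code tree and adaptive gain function, the new backward adaptive pitch predictor, and the noise-robust short-term adaptive prediction algorithm were each evaluated over an ideal channel and a noisy channel. In paired-comparison listening tests using two sentences per pair, the speech quality of the 6.4 kbit/s coder (2-to-1 TDHS, 2 bits/sample tree coding) was comparable to that of 6-bit log-PCM sampled at 6400 samples/s.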

Conversational Quality Measurement System for Mobile VoIP Speech Communication (모바일 VoIP 음성통신을 위한 대화음질 측정 시스템)

Cho, Jae-Man;Kim, Hyoung-Gook
- The Journal of The Korea Institute of Intelligent Transport Systems / v.10 no.4 / pp.71-77 / 2011
In this paper, we propose a conversational quality measurement (CQM) system that provides an objective QoS measure for high-quality mobile VoIP voice communication. To measure conversational quality, a VoIP communication system is implemented on two smartphones connected over VoIP. The VoIP communication system consists of echo cancellation, noise reduction, speech encoding/decoding, packet generation with RTP (Real-time Transport Protocol), jitter buffer control, and POS (Play-out Schedule) with LC (Loss Concealment). The CQM system is connected to the microphone and speaker of each smartphone. The voice signal of each speaker is recorded and used to measure CE (Conversational Efficiency), CS (Conversational Symmetry), PESQ (Perceptual Evaluation of Speech Quality), and the CE-CS-PESQ correlation. We validate the CQM system by measuring CE, CS, and PESQ under various SNR, delay, and loss conditions caused by the IP network environment.

Language Specific CTC Projection Layers on Wav2Vec2.0 for Multilingual ASR (다국어 음성인식을 위한 언어별 출력 계층 구조 Wav2Vec2.0)

Lee, Won-Jun;Lee, Geun-Bae
- Annual Conference on Human and Language Technology / 2021.10a / pp.414-418 / 2021
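Multilingual speech recognition is more difficult than monolingual speech recognition. For a single model to perform multilingual speech recognition, performance can be improved by letting the model learn the phonetic characteristics that different languages share. This study modifies the structure of the deep-learning speech recognition model Wav2Vec2.0 and presents a method for training Korean and English speech with a single model. In the Wav2Vec2.0 architecture, which uses the CTC (Connectionist Temporal Classification) loss function, a separate CTC output layer is provided for each language and a per-language lexicon is applied, which prevents speech input from being confused with another language in the first place. With the proposed Wav2Vec2.0 structure, we resolve the drop in recognition rate caused by misclassifying Korean and English, and we confirm that on the Korean speech dataset (KsponSpeech) a model trained on Korean and English together achieves a higher recognition rate than a model trained on Korean alone. Finally, we use prefix decoding to improve recognition performance with a language model.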
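The idea of language-specific CTC output layers in the Wav2Vec2.0 abstract can be pictured as one shared encoder whose frame features are projected by a different linear head per language, each with its own vocabulary. The sketch below is a toy NumPy illustration with greedy CTC decoding, not the authors' model: the encoder is replaced by a given feature matrix, and `heads`, `vocabs`, and both function names are hypothetical. The paper's lexicon and prefix decoding are not reproduced.

```python
import numpy as np

BLANK = 0  # index of the CTC blank symbol in every vocabulary (assumption)


def ctc_greedy_decode(logits, vocab):
    """Greedy CTC decoding: take the best symbol per frame, collapse
    consecutive repeats, then drop blanks."""
    best = logits.argmax(axis=1)
    out, prev = [], BLANK
    for idx in best:
        if idx != prev and idx != BLANK:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)


def decode_with_language_head(features, heads, vocabs, lang):
    """Project shared encoder features with the head of one language only,
    so symbols of the other language can never be emitted."""
    weight, bias = heads[lang]
    return ctc_greedy_decode(features @ weight + bias, vocabs[lang])
```

Because each head has its own output vocabulary, selecting the head fixes the script of the transcript regardless of what the shared features look like.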

Design of EVRC LSP Codebooks with Korean (한국어에 의한 EVRC LSP 코드북 설계)

이진걸
- The Journal of the Acoustical Society of Korea / v.21 no.2 / pp.167-172 / 2002
The EVRC (Enhanced Variable Rate Codec) is currently in service as a speech codec in digital cellular systems in North America and Korea. In the EVRC, the LSPs (Line Spectral Pairs), which relate to the energy distribution of speech signals in the frequency domain, are coded by weighted split vector quantization. Given that the LSP codebooks were probably trained on the language of the country where they were developed, or on English, codebooks trained on Korean are expected to improve performance for communication in Korean. In this paper, the EVRC LSP codebooks are designed with Korean speech using vector quantization based on the LBG algorithm. The improvement in vector quantization performance is demonstrated by spectral distortion measurements, and the accompanying improvement in speech quality by SNR and SegSNR measurements.
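The LBG algorithm mentioned in the EVRC abstract trains a vector quantization codebook by repeatedly splitting codewords and refining them with nearest-neighbour and centroid updates. The sketch below is a plain, unweighted version over whole vectors; it does not reproduce EVRC's weighted split vector quantization or the paper's training data. The function name, the additive split offset `eps`, and the fixed iteration count are assumptions.

```python
import numpy as np


def lbg_codebook(train, size, eps=0.01, n_iter=20):
    """Train a codebook with `size` codewords (a power of two) from
    `train`, an array of shape (n_vectors, dim)."""
    train = np.asarray(train, dtype=float)

    # Start from a single codeword: the centroid of all training vectors.
    codebook = train.mean(axis=0, keepdims=True)

    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = np.concatenate([codebook + eps, codebook - eps])

        for _ in range(n_iter):
            # Assign each training vector to its nearest codeword.
            dists = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            nearest = dists.argmin(axis=1)

            # Move each codeword to the centroid of its assigned vectors.
            for k in range(len(codebook)):
                members = train[nearest == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook
```

In the setting of the abstract, `train` would hold LSP vectors extracted from Korean speech, and each split of the LSP vector would get its own codebook.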

Search Result 127, Processing Time 0.02 seconds
