Search | Korea Science

Real-time implementation of speaker dependent speech recognition hardware module using the TMS320C32 DSP (TMS320C32 DSP를 이용한 실시간 화자종속 음성인식 하드웨어 모듈 구현)

Chung, Hoon;Chung, Ik-joo
- Proceedings of the Acoustical Society of Korea Conference / 1998.08a / pp.14-22 / 1998
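In this study, we developed a real-time speaker-dependent speech recognition hardware module using the TMS320C32, a low-cost floating-point digital signal processor from Texas Instruments. The hardware module consists of a 40 MHz TMS320C32, a TLC32044 14-bit codec, memory such as EPROM and SRAM, and logic circuits for the host interface. We also developed a PC interface board and software for evaluating the hardware module on a PC. The computation speed of the speech recognition algorithm was greatly improved through optimization in C and assembly. The current recognition rate is above 95% for 30 words in a typical office environment, and the module also works well in noisy environments such as background music or car noise.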

Design of a Variable half rate speech codec (가변율 half rate 음성 부호화기의 설계)

성호상
- Proceedings of the Acoustical Society of Korea Conference / 1998.06e / pp.293-296 / 1998
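In this paper, we design a variable-rate half rate speech coder for various multimedia services. To distinguish voiced, unvoiced, and silent segments, we use an effective voicing decision algorithm based on frame energy and speech parameters. The half rate coder for voiced speech uses a generalized AbS (analysis-by-synthesis) structure, which performs well at low bit rates. The LPC coefficients are converted to LSP coefficients and quantized with a predictive 2-stage VQ. The excitation signal uses a shift-based algebraic fixed codebook structure that reduces complexity while minimizing quality degradation, and the gains of the adaptive codebook and the excitation codebook are quantized with VQ. The coder for unvoiced speech is largely the same as the coder for voiced speech, but because the correlation between pitch periods is very low in unvoiced speech, it does not use pitch interpolation; instead, it finds the pitch lag in open loop and uses it for the entire frame. The 1 kb/s coder is used for silence and ambient noise segments; the signal in these segments is restricted to ambient noise with weak pitch components, and a coder optimized for such signals is designed. The completed variable-rate half rate coder showed an average bit rate of about 2.6 kb/s on test speech with a voice activity factor (VAF) of 0.47. As part of the subjective quality evaluation, an A-B preference test was conducted against the variable-rate 8 kb/s QCELP, the IS-96 standard codec. The results showed better speech quality than the variable-rate 8 kb/s QCELP, whose average bit rate is about twice as high.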

Improvement of VAD Performance for the Reduction of the Bit Rate Under the Noise Environment in the G.723.1 (잡음 환경에서의 전송률 감소를 위한 G.723.1 음성활동 검출기 성능 개선에 관한 연구)

김정진;장경아;배명진
- The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.42-47 / 2001
This paper improves the performance of the VAD (Voice Activity Detector) in the G.723.1 Annex A 6.3 kbps/5.3 kbps dual-rate speech coder, which was developed for Internet phone and videoconferencing. The VAD decision is based on a three-level energy threshold. We evaluate processing time, speech quality, and bit rate. Processing time is reduced because the VAD decision on silence periods is more accurate. In the subjective quality test there is almost no difference compared with G.723.1. To measure the bit rate, we count the active speech frames (VAD=1); the more silence periods the input contains, the more the bit rate can be reduced.
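
The frame-counting step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the three-level thresholds, so the sketch uses a single hypothetical energy threshold, and it assumes that silent frames cost no bits at all, which ignores any comfort-noise frames a real coder would send. The function names are made up for the example.

```python
FRAME_LEN = 240      # one G.723.1 frame: 30 ms at 8 kHz sampling
ACTIVE_KBPS = 6.3    # high-rate mode of G.723.1


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def vad_flags(samples, threshold):
    """Return one flag per full frame: 1 = active speech, 0 = silence.

    `threshold` is a hypothetical fixed energy threshold, not the
    three-level scheme of the paper.
    """
    flags = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        flags.append(1 if frame_energy(frame) > threshold else 0)
    return flags


def average_kbps(flags, active_kbps=ACTIVE_KBPS):
    """Average bit rate, assuming silent frames cost no bits (simplification)."""
    if not flags:
        return 0.0
    return active_kbps * sum(flags) / len(flags)


# Synthetic input: two loud frames followed by two silent frames.
signal = [1000, -1000] * FRAME_LEN + [0] * (2 * FRAME_LEN)
flags = vad_flags(signal, threshold=100.0)
rate = average_kbps(flags)
```

With half of the frames flagged as silence, the estimated average rate is half of the active rate, which is the relationship the abstract relies on when it counts VAD=1 frames.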

Real-time implementation of the G.723.1 coder using TMS320C5409 (TMS320C5409를 이용한 G.723.1음성 코덱의 실시간 구현)

Lee Dong-Won;Son Chang-Yong;Kim Ji-Saeng;Cho Jang-Hyung;Kang Sang-Won
- Proceedings of the Acoustical Society of Korea Conference / autumn / pp.23-26 / 2000
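In this paper, the G.723.1 speech coding system, adopted by the international telecommunication standardization body ITU-T for Internet phone and videoconferencing, is implemented in real time on the TMS320C5409, with the entire process written in assembly language. The implemented G.723.1 coder has a maximum complexity of 25.75 MIPS for the encoder and 1.99 MIPS for the decoder at 6.3 kbps, and 17.69 MIPS for the encoder and 1.9 MIPS for the decoder at 5.3 kbps. Memory usage is about 11 kwords of program ROM, 9.45 kwords of data ROM (tables), and 2.8 kwords of RAM, and the speech output processed in real time showed the same quality as the C simulation. The implemented G.723.1 coder passed all 17 test vectors provided by the ITU-T bit-exactly.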

Speech/Mixed Content Signal Classification Based on GMM Using MFCC (MFCC를 이용한 GMM 기반의 음성/혼합 신호 분류)

Kim, Ji-Eun;Lee, In-Sung
- Journal of the Institute of Electronics and Information Engineers / v.50 no.2 / pp.185-192 / 2013
This paper proposes a method to improve the classification of speech and mixed content signals for the MPEG USAC (Unified Speech and Audio Coding) standard, using MFCC features with a GMM-based probability model. For effective pattern recognition, the Gaussian mixture model (GMM) probability model is used. For the optimal GMM parameter extraction, we use the expectation maximization (EM) algorithm. The proposed classification algorithm is divided into two significant parts. The first one extracts the optimal parameters for the GMM. The second distinguishes between speech and mixed content signals using MFCC feature parameters. The proposed classification algorithm performs better than the conventionally implemented USAC scheme.
https://doi.org/10.5573/ieek.2013.50.2.185
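
The two parts named in this abstract (EM-based GMM parameter estimation, then likelihood-based classification) can be sketched in a few lines. This is a minimal sketch under assumptions, not the authors' implementation: scalar values stand in for MFCC vectors, each class gets a two-component one-dimensional GMM, and all function names, the initialisation, and the training data are hypothetical.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm(data, k=2, iters=50, var_floor=1e-3):
    """Fit a 1-D GMM with EM; returns a list of (weight, mean, variance)."""
    srt = sorted(data)
    n = len(srt)
    # Deterministic initialisation: means spread across the sorted data.
    means = [srt[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    mu = sum(data) / n
    var0 = max(sum((x - mu) ** 2 for x in data) / n, var_floor)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gauss_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)
    return list(zip(weights, means, variances))


def log_likelihood(gmm, features):
    """Total log-likelihood of the features under the GMM."""
    return sum(
        math.log(sum(w * gauss_pdf(x, m, v) for w, m, v in gmm) + 1e-300)
        for x in features
    )


def classify(features, speech_gmm, mixed_gmm):
    """Pick the class whose GMM gives the higher log-likelihood."""
    if log_likelihood(speech_gmm, features) >= log_likelihood(mixed_gmm, features):
        return "speech"
    return "mixed"


# Hypothetical scalar features standing in for MFCC vectors.
speech_train = [-0.2, 0.0, 0.1, 0.2, 0.9, 1.0, 1.1, 1.2]
mixed_train = [4.8, 5.0, 5.1, 5.2, 5.9, 6.0, 6.1, 6.3]
speech_gmm = fit_gmm(speech_train)
mixed_gmm = fit_gmm(mixed_train)
label_a = classify([0.05, 0.95, 0.15], speech_gmm, mixed_gmm)
label_b = classify([5.05, 6.05], speech_gmm, mixed_gmm)
```

A real system would use multivariate Gaussians over MFCC vectors and more mixture components; the one-dimensional case is kept here only so the EM update equations stay readable.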

Carving deleted voice data in mobile (삭제된 휴대폰 음성 데이터 복원 방법론)

Kim, Sang-Dae;Byun, Keun-Duck;Lee, Sang-Jin
- Journal of the Korea Institute of Information Security & Cryptology / v.22 no.1 / pp.57-65 / 2012
People leave voicemails or record phone conversations in their daily cell phone use. Sometimes important voice data is deleted by the user accidentally, or purposely to cover up criminal activity. In these cases, the deleted voice data must be recoverable for forensic purposes, since it can be used as evidence in a criminal case. Because cell phones store data in flash memory, where it is easily fragmented, voice data recovery is very difficult. However, if the deleted voice data has identifiable patterns, a significant amount of it can be recovered by analyzing images of it. Voice data comes in several formats, such as QCP, AMR, and MP4. This study investigates data recovery methods for the EVRC and AMR codecs in QCP files, Qualcomm's voice data format for cell phones.
https://doi.org/10.13089/JKIISC.2012.22.1.57
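
As a rough illustration of pattern-based carving, the sketch below scans a raw memory image for two well-known file signatures: the `#!AMR\n` header of the AMR storage format, and the RIFF container with form type `QLCM` used by QCP files. This is a simplified stand-in, not the method of the paper: the abstract does not state which patterns the authors use, and deleted, fragmented data may have lost its file header entirely. The function names and the sample image are hypothetical.

```python
AMR_MAGIC = b"#!AMR\n"   # header of the AMR storage format
RIFF_MAGIC = b"RIFF"     # RIFF container signature
QCP_FORM = b"QLCM"       # RIFF form type used by QCP files


def find_all(image, pattern):
    """Return every offset at which `pattern` occurs in `image`."""
    offsets = []
    pos = image.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(pattern, pos + 1)
    return offsets


def carve_candidates(image):
    """Return sorted (offset, kind) pairs for candidate voice-file starts."""
    hits = [(off, "AMR") for off in find_all(image, AMR_MAGIC)]
    for off in find_all(image, RIFF_MAGIC):
        # Layout: "RIFF", 4-byte little-endian size, then the form type.
        if image[off + 8:off + 12] == QCP_FORM:
            hits.append((off, "QCP"))
    return sorted(hits)


# Hypothetical image: filler, an AMR header, a QCP header, and a RIFF
# chunk of another type that must not be reported.
image = (
    b"\xff" * 16
    + AMR_MAGIC
    + b"\x00" * 10
    + b"RIFF" + b"\x10\x00\x00\x00" + b"QLCM"
    + b"\xff" * 8
    + b"RIFFxxxxWAVE"
)
hits = carve_candidates(image)
```

Each hit is only a candidate start offset; a practical tool would go on to validate the frames that follow before treating the region as recoverable audio.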

Real-time Implementation of the AMR Speech Coder Using OakDSPCore® (OakDSPCore®를 이용한 적응형 다중 비트 (AMR) 음성 부호화기의 실시간 구현)

이남일;손창용;이동원;강상원
- The Journal of the Acoustical Society of Korea / v.20 no.6 / pp.34-39 / 2001
An adaptive multi-rate (AMR) speech coder was adopted as a standard of W-CDMA by 3GPP and ETSI. The AMR coder is based on the CELP algorithm operating at rates ranging from 12.2 kbps down to 4.75 kbps, and it is a source-controlled codec that adapts to the channel error conditions and the traffic loading. In this paper, we implement the DSP S/W of the AMR coder using OakDSPCore. The implementation is based on the CSD17C00A chip developed by C&S Technology, and it is tested for bit-exactness using the AMR speech codec test vectors provided by ETSI. The DSP S/W requires 20.6 MIPS for the encoder and 2.7 MIPS for the decoder. The memory required by the AMR coder is 21.97 kwords, 6.64 kwords, and 15.1 kwords for the code section, data section, and data ROM, respectively. In addition, an actual sound input/output test using a microphone and speaker demonstrates proper real-time operation without distortion or delay.

Modified Generic Mode Coding Scheme for Enhanced Sound Quality of G.718 SWB (G.718 초광대역 코덱의 음질 향상을 위한 개선된 Generic Mode Coding 방법)

Cho, Keun-Seok;Jeong, Sang-Bae
- Phonetics and Speech Sciences / v.4 no.3 / pp.119-125 / 2012
This paper describes a new algorithm for encoding spectral shape and envelope in the generic mode of G.718 super-wide band (SWB). In the G.718 SWB coder, generic mode coding and sinusoidal enhancement are used for the quantization of modified discrete cosine transform (MDCT)-based parameters in the high frequency band. In the generic mode, the high frequency band is divided into sub-bands, and for every sub-band the most similar match under the selected similarity criteria is searched from the coded and envelope-normalized wideband content. To improve the quantization scheme in the high frequency region of speech/audio signals, a modified generic mode that improves on the generic mode of G.718 SWB is proposed. The proposed generic mode uses perceptual vector quantization of spectral envelopes and an increased resolution for spectral copy. The performance of the proposed algorithm is evaluated in terms of objective quality. Experimental results show that the proposed algorithm significantly improves sound quality.
https://doi.org/10.13064/KSSS.2012.4.3.119
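
The sub-band matching step of the generic mode can be pictured with the sketch below. It is an assumption-laden illustration, not the G.718 procedure: the abstract does not name the similarity criterion, so normalized cross-correlation is assumed here, and the coefficient values and the function name `best_match` are hypothetical.

```python
import math


def best_match(lowband, target):
    """Return (offset, score) of the low-band segment most similar to `target`.

    Similarity is normalized cross-correlation (an assumed criterion), so a
    segment that is a positively scaled copy of the target scores 1.0.
    """
    m = len(target)
    t_norm = math.sqrt(sum(t * t for t in target))
    best_offset, best_score = -1, float("-inf")
    for off in range(len(lowband) - m + 1):
        seg = lowband[off:off + m]
        s_norm = math.sqrt(sum(s * s for s in seg))
        if s_norm == 0.0 or t_norm == 0.0:
            continue  # correlation is undefined for an all-zero segment
        score = sum(s * t for s, t in zip(seg, target)) / (s_norm * t_norm)
        if score > best_score:
            best_offset, best_score = off, score
    return best_offset, best_score


# Hypothetical envelope-normalized low-band coefficients and one
# high-band sub-band that is a scaled copy of the segment at offset 4.
lowband = [0.0, 1.0, 0.0, -1.0, 2.0, 3.0, -2.0, 1.0, 0.0, 0.0]
target = [4.0, 6.0, -4.0]
offset, score = best_match(lowband, target)
```

Because the score is normalized, the search picks the segment with the best spectral shape regardless of its gain; the gain itself would be coded separately as part of the envelope.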

Developing of Multipoint Video Chatting System using H.323 (H.323을 이용한 다자간 영상 채팅 시스템 개발)

나형용;김상길;최영식;이상홍;최연성;박한엽;신현숙;김동호;김동욱
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2000.05a / pp.331-335 / 2000
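Video chatting moves beyond conventional text-centered chatting: it is a multimedia-based chat system that supports video and voice, in keeping with the multimedia era, and can also transfer files. In particular, this multimedia chat system makes video calls possible and can also be used for international and long-distance calls. In this paper, we develop a high-quality multimedia chatting system that is differentiated from existing text-based chat services. In a chat with five or more participants over the network, the participants' voice and text are transmitted in real time, and video can be sent and received simultaneously at 15 frames per second or more. Unlike existing methods, the system is based on a modified use of H.323, the ITU standard for videoconferencing, while the video and audio codecs, which matter most in practice, faithfully follow the recommendation.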

Implementation of GSM Full Rate vocoder for the GSM mobile modem chip (GSM방식 단말기용 모뎀칩을 위한 GSM Full Rate 보코더 구현)

Lee Dong-Won
- Proceedings of the Acoustical Society of Korea Conference / autumn / pp.9-12 / 2001
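In this paper, the GSM Full Rate (FR) vocoder algorithm [1], adopted by SMG11 of ETSI, the European telecommunications standardization body, is implemented in real time using the Teak DSP Core. The GSM FR vocoder is the standard codec [2] for the full-rate Traffic Channel (TCH) of GSM, the communication system used in Europe, and, together with GSM HR, GSM EFR, and GSM AMR, it is an essential voice service built into the modem chip. The implemented GSM FR has a bit rate of 13.05 kbps and, in addition to the encoder and decoder, also implements supplementary functions such as the voice activity detection (VAD) [3] block and the DTX [4] block. The Teak [5] used for the implementation is a 16-bit fixed-point DSP core from DSP Group that delivers up to 140 MIPS and is equipped with a 40-bit ALU and two MACs, which makes it well suited to real-time processing of speech and channel coders. The encoder and decoder of the implemented GSM FR have a complexity of about 2.35 MIPS and 1.19 MIPS, respectively, and the memory used is 3.9 kwords of program ROM, 396 words of data ROM (tables), and 932 words of RAM.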

