• Title/Summary/Keyword: speech quality

Performance Improvement of the QCELP using an Efficient LSF Coding (효율적인 LSF 양자화기를 이용한 QCELP 성능개선)

  • Kim, Hae-Jin; Kang, Sang-Won
    • The Journal of the Acoustical Society of Korea / v.16 no.1 / pp.10-15 / 1997
  • This paper proposes an efficient LSF quantizer, named improved PSVQ (IPSVQ), for the 8 kbps QCELP speech coder. Using the 27-bit IPSVQ instead of the 40-bit DPCM quantizer saves 13 bits per frame, which are reallocated to the codebook gain and pitch gain parameters to improve the overall performance of the QCELP codec. The enhanced QCELP improves SNR by 0.9 dB and SEGSNR by 0.4 dB. Informal listening tests also confirm the improvement in speech quality.
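
The two objective measures quoted in this abstract, SNR and segmental SNR (SEGSNR), can be computed roughly as follows. This is a minimal sketch, not the authors' evaluation code: the function names are hypothetical, and the default frame length of 160 samples (20 ms at 8 kHz) is an assumption. Practical SEGSNR implementations often also clamp each frame's value to a fixed range, which is omitted here.

```python
import math


def snr_db(reference, decoded):
    """Overall SNR in dB between a reference signal and its decoded version."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0:
        return math.inf  # identical signals: no coding noise at all
    return 10.0 * math.log10(signal / noise)


def segsnr_db(reference, decoded, frame_len=160):
    """Segmental SNR: the mean of per-frame SNR values in dB.

    frame_len=160 is an assumed default (20 ms at 8 kHz sampling).
    Frames with zero signal energy or zero noise energy are skipped.
    """
    values = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal > 0 and noise > 0:
            values.append(10.0 * math.log10(signal / noise))
    if not values:
        raise ValueError("no frame could be scored")
    return sum(values) / len(values)
```

Because SEGSNR averages in the dB domain, a few badly coded frames lower it more than they lower the overall SNR, which is one reason the two figures are usually reported together.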


Study on optimal number of latent source in speech enhancement based Bayesian nonnegative matrix factorization (베이지안 비음수 행렬 인수분해 기반의 음성 강화 기법에서 최적의 latent source 개수에 대한 연구)

  • Lee, Hye In; Seo, Ji Hun; Lee, Young Han; Kim, Je Woo; Lee, Seok Pil
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2015.07a / pp.418-420 / 2015
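  • This paper describes how speech enhancement performance varies with the number of latent sources assigned to the speech and noise components in a speech enhancement method based on Bayesian nonnegative matrix factorization (BNMF). BNMF-based speech enhancement decomposes the input signal into a sum of sub-signals and then removes the noise components, and it is known to outperform conventional NMF-based methods. Its drawbacks are a high computational cost and performance that varies with the number of latent sources. To address these drawbacks, we ran experiments to find the optimal number of latent sources for BNMF-based speech enhancement. The experiments varied the noise type, the speech type, the numbers of latent sources for speech and noise, and the SNR, and used PESQ (perceptual evaluation of speech quality) as the performance measure. The results show that the number of speech latent sources does not affect performance, whereas performance improves as the number of noise latent sources increases.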


Development of Digital Endoscopic Data Management System (디지탈 내시경 데이터 management system의 개발)

  • Song, C.G.; Lee, S.M.; Lee, Y.M.; Kim, W.K.
    • Proceedings of the KOSOMBE Conference / v.1996 no.11 / pp.304-306 / 1996
  • Endoscopy has become a crucial diagnostic and therapeutic procedure in clinical areas. Over the past three years, we have developed a computerized system to record and store clinical data pertaining to endoscopic surgery: laparoscopic cholecystectomy, pelviscopic endometriosis, and surgical arthroscopy. In this study, we developed a computer system composed of a frame grabber, sound board, VCR control board, LAN card, and EDMS (endoscopic data management software). The computer system controls peripheral instruments such as a color video printer and video cassette recorder, as well as the endoscopic input/output signals (image and doctor's speech). We also developed a one-body camera control unit that includes an endoscopic miniature camera and light source. Our system offers unsurpassed image quality in terms of resolution and color fidelity. The digital endoscopic data management system is based on an open architecture and a set of widely available industry standards, namely Windows 3.1 as the operating system, TCP/IP as the network protocol, and a time-sequence-based database that handles both the image and the doctor's speech synchronized with the endoscopic image.


Speech enhancement based on reinforcement learning (강화학습 기반의 음성향상기법)

  • Park, Tae-Jun; Chang, Joon-Hyuk
    • Proceedings of the Korea Information Processing Society Conference / 2018.05a / pp.335-337 / 2018
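  • Speech enhancement removes the noise and reverberation contained in speech. Because a speech signal captured by a microphone is distorted by noise and reverberation, speech enhancement is a core technology for speech signal processing applications such as speech recognition and speech communication. Previously, statistical-model-based speech enhancement, which exploits statistical information about the speech and noise signals, was mainly used, but its performance degrades severely in non-stationary noise environments, unlike in stationary ones. Recently, deep neural networks (DNNs), a machine learning technique, have been introduced and show excellent performance in speech enhancement. DNN-based speech enhancement uses multiple hidden layers and hidden nodes to model the nonlinear relationship between noisy speech and clean speech. We apply reinforcement learning, one way of improving DNN-based speech enhancement, and improve performance over the existing DNN. Reinforcement learning, best known for its use in Google's AlphaGo, learns over a very large number of cases which action to take under which policy to move to the next state and receive the highest reward from a given state, so that the optimal action can be selected. In this paper, we design the reward based on a composite measure and achieve higher speech recognition performance than a technique whose reward is based on PESQ (Perceptual Evaluation of Speech Quality).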

Experimental Results of SSB Modem in Shallow Sea (천해에서 SSB 모뎀의 실험결과 분석)

  • Ju, Hyng-Jun; Han, Jung-Woo; Kim, Ki-Man
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.6 / pp.990-998 / 2008
  • This paper evaluates experimental data obtained with SSB (single-sideband) modulation in the ocean. Current research in underwater communication applies digital modulation, OFDM, and MIMO systems; however, commercial modems used in the ocean still rely on analog modulation techniques. We therefore carried out an experiment toward developing a high-quality modem suited to the characteristics of the seas around South Korea. The experiment used SSB analog modulation in shallow water off the Jin-hae shore. The transmitted data were tonal and LFM signals, used to obtain the underwater channel characteristics, and female Korean speech, used for speech communication.
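
An LFM (linear frequency modulated) probe signal of the kind mentioned in this abstract can be generated as sketched below. The abstract gives no sweep parameters, so the start and end frequencies, duration, and sample rate in the usage lines are hypothetical values, as is the function name `lfm_chirp`.

```python
import math


def lfm_chirp(f0_hz, f1_hz, duration_s, fs_hz):
    """Return samples of a linear chirp sweeping from f0_hz to f1_hz.

    The instantaneous frequency is f0 + k*t with sweep rate
    k = (f1 - f0) / duration, so the phase is 2*pi*(f0*t + 0.5*k*t**2).
    """
    n = int(round(duration_s * fs_hz))
    k = (f1_hz - f0_hz) / duration_s
    samples = []
    for i in range(n):
        t = i / fs_hz
        samples.append(math.cos(2.0 * math.pi * (f0_hz * t + 0.5 * k * t * t)))
    return samples


# Hypothetical parameters, chosen for illustration only.
probe = lfm_chirp(f0_hz=1000.0, f1_hz=2000.0, duration_s=0.01, fs_hz=48000.0)
```

Correlating the received signal with the transmitted chirp gives a sharp peak for each propagation path, which is why chirps are a common choice for estimating channel characteristics.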

Perioperative Management of the Voice in Thyroid Cancer (갑상선암 수술과 수술 전후 음성관리)

  • Yoon, So Yeon; Hong, Hyun Jun
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.31 no.2 / pp.49-55 / 2020
  • Evaluating the patient's voice before thyroidectomy is useful for identifying patients with asymptomatic vocal cord paralysis, identifying other voice abnormalities, and determining whether these are related to voice disorders that may occur after surgery. Voice evaluation after thyroid surgery likewise helps in the diagnosis, treatment, rehabilitation, and follow-up of voice disorders that occur without clear nerve damage. It also supports rapid recovery through active early rehabilitation for patients who complain of speech impairment without paralysis. In particular, neck exercise can relieve adhesion at the surgical site, increase the range of motion of the neck, and reduce subjective neck discomfort. In addition, hearing, voice, and breathing functions should be improved, and voice hygiene education and counseling should be provided. Vocal cord injection is the first treatment option for unilateral vocal cord palsy. Establishing a protocol for voice disorders before and after thyroid surgery and providing appropriate treatment can improve patients' quality of life.

Current status and evolution of microsurgical tongue reconstructions, part I

  • Choi, Jong-Woo; Alshomer, Feras; Kim, Young-Chul
    • Archives of Craniofacial Surgery / v.23 no.4 / pp.139-151 / 2022
  • Reconstructive surgery in the management of head and neck cancer has evolved to include structure-specific approaches in which organ-specific treatment algorithms help optimize outcomes. Tongue cancer management and reconstruction are surgical challenges for which well-executed reconstructive plans should be completed promptly to avoid delaying any subsequently planned oncologic treatment. Crucial considerations in tongue cancer resection are the significant functional morbidity associated with surgical defects, particularly in terms of speech and swallowing, and the consequent negative impact on patients' quality of life. With the evolution of microsurgical techniques and the development of the perforator flap concept, flap options can be tailored to the characteristics of various tongue defects. This has allowed the implementation of pliable flaps that can help restore tongue mobility and yield subsequent functional outcomes. Using an evolutional framework, we present this series of reviews related to tongue reconstruction. The first part of the review summarizes flap options and flap-related factors, such as volume and tissue characteristics. Related functional aspects are also presented, including tongue mobility, speech, and swallowing, as well as ways to evaluate and optimize these outcomes.

An efficient transcoding algorithm for AMR and G.723.1 speech coders and performance evaluation (AMR과 G.723.1 음성부호화기를 위한 효율적인 상호부호화 알고리듬 및 성능평가)

  • 최진규; 윤성완; 강홍구; 윤대희
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.4 / pp.121-130 / 2004
  • In applications that require interoperability between different networks, such as VoIP and wireless communication systems, two speech codecs must work together in a cascaded (tandem) connection. Tandem operation has several problems, including long delay, high complexity, and quality degradation, because the signal goes through two complete encoding/decoding processes. Transcoding is one of the best solutions to these problems. The transcoding algorithm depends on the structures of the source and target coders. This paper proposes a transcoding algorithm comprising LSP conversion, pitch estimation, and a new perceptual weighting filter to reduce complexity and improve quality. The algorithms are applied to the AMR and G.723.1 pair. With the proposed algorithms in the transcoder, complexity is reduced by about 20%-58% and quality is improved compared with tandem.
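
One thing LSP conversion between these two coders has to bridge is the frame-rate mismatch: AMR uses 20 ms frames and G.723.1 uses 30 ms frames, so three AMR frames cover the same 60 ms as two G.723.1 frames. The sketch below resamples per-frame LSP vectors onto the target frame grid by linear interpolation. The interpolation rule, the frame-centre timing, and the function name `convert_lsp` are assumptions for illustration, not the conversion rule proposed in the paper.

```python
def convert_lsp(src_lsps, src_ms=20.0, dst_ms=30.0):
    """Resample per-frame LSP vectors from src_ms frames to dst_ms frames.

    Each source vector is treated as sampled at the centre of its frame;
    each target vector is linearly interpolated at the centre of the
    target frame. Only whole target frames are produced.
    """
    if not src_lsps:
        return []
    n_dst = int(len(src_lsps) * src_ms // dst_ms)
    last = len(src_lsps) - 1
    out = []
    for j in range(n_dst):
        centre_ms = (j + 0.5) * dst_ms
        pos = centre_ms / src_ms - 0.5      # fractional index on the source grid
        pos = min(max(pos, 0.0), float(last))
        lo = int(pos)
        hi = min(lo + 1, last)
        w = pos - lo
        out.append([(1.0 - w) * a + w * b
                    for a, b in zip(src_lsps[lo], src_lsps[hi])])
    return out
```

A weighted average of two ordered LSP vectors with non-negative weights remains ordered, so this interpolation keeps the ordering property that LSP-based synthesis filters depend on for stability.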

An Experimental Study on Compressive Strength and the Chloride Content of Concrete with Substitution Ratio of Recycled Fine Aggregate and Limestone Powder (순환잔골재 및 석회석 미분말 치환율에 따른 콘크리트 강도와 염화물량에 관한 실험적 연구)

  • Lee, Soo-Hyung; Kong, Tae-Woong; Jang, Jae-Hwan; Lee, Han-Baek
    • Proceedings of the Korea Concrete Institute Conference / 2008.11a / pp.597-600 / 2008
  • This study uses recycled fine aggregate to counter the increase in chloride content caused by the use of poor-quality sea sand, and examines its basic properties in order to promote the use of recycled fine aggregate. It also examines the properties of limestone powder as a filler, with the aim of contributing to early-age strength and avoiding the early-strength decline and discoloration that are shortcomings of blast furnace slag powder. In the experiments, the highest compressive strength was obtained at a substitution ratio of 10% recycled fine aggregate and 20% limestone powder. As the recycled fine aggregate substitution ratio increased, the chloride content decreased: 0.127 kg/m$^3$ (0%), 0.119 kg/m$^3$ (10%), 0.112 kg/m$^3$ (20%), and 0.097 kg/m$^3$ (30%).


MPEG Audio New Standard: USAC Technology (MPEG 오디오 최신 표준: USAC 기술)

  • Lee, Tae-Jin; Kang, Kyeong-Ok; Kim, Whan-Woo
    • Journal of Broadcast Engineering / v.16 no.5 / pp.693-704 / 2011
  • As mobile devices become multi-functional and converge into a single platform, there is a strong need for a codec that can provide consistent quality for both speech and music content. MPEG-D USAC standardization started at the 82nd MPEG meeting with a CfP, and the Study on DIS was approved at the 96th MPEG meeting. MPEG-D USAC is a convergence of AMR-WB+ and HE-AAC V2 technologies. Specifically, USAC uses three core codecs (AAC, ACELP, and TCX) for low-frequency regions, SBR for high-frequency regions, MPEG Surround for stereo information, and window transition technology to smooth transitions between the core coders. USAC can provide consistent sound quality for both speech and music content and can be applied to applications such as multimedia download to mobile devices, digital radio, mobile TV, and audio books.