• Title/Abstract/Keyword: Parts of Speech

Search results: 135 items

자동차 잡음 및 오디오 출력신호가 존재하는 자동차 실내 환경에서의 강인한 음성인식 (Robust Speech Recognition in the Car Interior Environment having Car Noise and Audio Output)

  • 박철호; 배재철; 배건성
    • 대한음성학회지:말소리 / No. 62 / pp. 85-96 / 2007
  • In this paper, we carried out recognition experiments on noisy speech containing various levels of car noise and audio-system output, using a speech interface. The speech interface consists of three parts: pre-processing, an acoustic echo canceller, and post-processing. First, a high-pass filter is employed in the pre-processing part to remove engine noise. Then, an echo canceller, implemented as an FIR-type filter with an NLMS adaptive algorithm, removes the music or speech coming from the audio system in the car. Finally, MMSE-STSA based speech enhancement is applied to the output of the echo canceller to further suppress the residual noise. For the recognition experiments, we generated test signals by adding music to the car-noisy speech of the Aurora 2 database. An HTK-based continuous HMM system was constructed as the recognition system. Experimental results show that the proposed speech interface is very promising for robust speech recognition in a noisy car environment.
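The NLMS-adapted FIR echo canceller described in this abstract can be sketched roughly as follows. This is only an illustration of the general technique, not the paper's implementation; the tap count, step size, and signal names are assumptions.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, num_taps=32, mu=0.5, eps=1e-8):
    """Remove the audio-system echo of `far_end` from the microphone
    signal `mic` using an FIR filter adapted by the NLMS algorithm."""
    w = np.zeros(num_taps)                            # FIR filter weights
    out = np.zeros(len(mic))
    for n in range(num_taps - 1, len(mic)):
        x = far_end[n - num_taps + 1:n + 1][::-1]     # newest sample first
        e = mic[n] - w @ x                            # error = echo-free estimate
        w += (mu / (x @ x + eps)) * e * x             # normalized step update
        out[n] = e
    return out
```

In use, `far_end` would be the signal fed to the car loudspeakers and `mic` the in-car microphone capture; as the filter converges, the output approaches the near-end speech alone.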

단음절 합성단위음을 사용한 시간영역에서의 한국어 다음절어 규칙합성을 위한 음절간 접속구간에서의 에너지 흐름 제어에 관한 연구 (On the Control of Energy Flow between the Connection Parts of Syllables for the Korean Multi-Syllabic Speech Synthesis in the Time Domain Using Mono-syllables as a Synthesis Unit)

  • 강찬희; 김윤석
    • 한국통신학회논문지 / Vol. 24, No. 9B / pp. 1767-1774 / 1999
  • This paper studies the synthesis of multi-syllabic Korean words in the time domain using mono-syllable synthesis units, focusing in particular on controlling the shape of the energy flow in the connection parts where waveforms are concatenated. To this end, prosody-control parameters extracted in the time domain were used, and inter-syllable waveform connection rules were derived so that the energy flow could be controlled in the time domain during synthesis. Experimental results show that the energy-flow discontinuities caused by concatenating stored mono-syllable waveforms were removed, and that the quality and naturalness of the synthesized speech improved.
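A minimal way to suppress an energy discontinuity at a syllable junction is a short linear crossfade in the time domain. The sketch below illustrates only that basic idea, not the connection rules derived in the paper; the overlap length is an assumption.

```python
import numpy as np

def concat_syllables(a, b, overlap=64):
    """Concatenate two syllable waveforms with a linear crossfade over
    `overlap` samples to smooth the energy flow at the junction."""
    fade = np.linspace(0.0, 1.0, overlap)
    # Blend the tail of `a` into the head of `b` instead of butt-splicing.
    joint = a[-overlap:] * (1.0 - fade) + b[:overlap] * fade
    return np.concatenate([a[:-overlap], joint, b[overlap:]])
```

A plain concatenation of two waveforms with different amplitudes produces an audible step at the boundary; the crossfade replaces that step with a gradual energy transition.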

Detection and Synthesis of Transition Parts of The Speech Signal

  • Kim, Moo-Young
    • 한국통신학회논문지 / Vol. 33, No. 3C / pp. 234-239 / 2008
  • For efficient coding and transmission, the speech signal can be classified into three distinctive classes: voiced, unvoiced, and transition. At low bit rates below 4 kbit/s, conventional sinusoidal transform coders synthesize high-quality speech for the purely voiced and unvoiced classes, but not for the transition class. The transition class, which includes plosive sounds and abrupt voiced onsets, lacks periodicity and is therefore often classified and synthesized as unvoiced. In this paper, an efficient algorithm for transition-class detection is proposed, which demonstrates superior detection performance not only for clean speech but also for noisy speech. For a detected transition frame, phase information is transmitted instead of magnitude information for speech synthesis. A listening test showed that the proposed algorithm produces better speech quality than the conventional one.
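The three-way frame classification this abstract refers to can be illustrated with a toy classifier based on energy jumps and zero-crossing rate. This is a generic stand-in for exposition only; the paper's detector, its features, and the thresholds below are not the same.

```python
import numpy as np

def classify_frames(signal, frame_len=160):
    """Toy 3-way frame classifier: 'voiced', 'unvoiced', or 'transition',
    using an energy-onset test and zero-crossing rate (ZCR)."""
    n = len(signal) // frame_len * frame_len
    frames = signal[:n].reshape(-1, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    # Fraction of adjacent sample pairs whose sign changes.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    labels = []
    prev_e = energy[0]
    for e, z in zip(energy, zcr):
        if e > 4.0 * prev_e + 1e-12:      # abrupt energy onset
            labels.append("transition")
        elif z > 0.3:                     # noise-like fine structure
            labels.append("unvoiced")
        else:
            labels.append("voiced")
        prev_e = e
    return labels
```

On a signal that moves from low-level noise into a steady tone, the onset frame is flagged as a transition while the surrounding frames fall into the unvoiced and voiced classes.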

밴드 별 잡음 특징을 이용한 골전도 음성신호의 잡음 제거 알고리즘 (Noise Cancellation Algorithm of Bone Conduction Speech Signal using Feature of Noise in Separated Band)

  • 이지나; 이기현; 나승대; 성기웅; 조진호; 김명남
    • 한국멀티미디어학회논문지 / Vol. 19, No. 2 / pp. 128-137 / 2016
  • In mobile communication, the air-conduction (AC) speech signal has commonly been used, but it is easily degraded in ambient noise environments such as emergencies, military action, and rescue operations. To overcome this weakness, the bone-conduction (BC) speech signal has been used: because it is transmitted through bone vibration, it is less affected by background noise. In this paper, we propose a noise cancellation algorithm for the BC speech signal that uses the noise features of decomposed bands. The proposed algorithm consists of three steps. First, the BC speech signal is divided into 17 bands using perceptual wavelet packet decomposition. Second, a threshold is calculated from the short-time noise features of each separated band and compared with the absolute average of each signal frame, so that speech and noise parts are detected. Last, the detected noise parts are removed and the noise-eliminated bands are re-synthesized. To confirm its efficiency, we compared the proposed algorithm with a conventional algorithm and found that it performs better.
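The per-band detection step, comparing each frame's mean absolute amplitude against a noise-derived threshold, can be sketched as follows. The frame length, the noise-estimation window, and the factor `alpha` are assumptions, and this operates on one band at a time rather than the 17 perceptual wavelet-packet bands of the paper.

```python
import numpy as np

def suppress_band_noise(band, frame_len=80, noise_frames=4, alpha=2.0):
    """Zero out frames of one decomposed band whose mean absolute
    amplitude stays below a threshold learned from the first
    `noise_frames` frames (assumed to be noise-only)."""
    n = len(band) // frame_len * frame_len
    frames = band[:n].reshape(-1, frame_len).copy()
    level = np.mean(np.abs(frames), axis=1)          # short-time absolute average
    threshold = alpha * np.mean(level[:noise_frames])
    frames[level < threshold] = 0.0                  # treat as noise, remove
    return frames.reshape(-1)
```

In the full algorithm each band would be processed this way and the cleaned bands re-synthesized into a time-domain signal.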

혼합여기모델을 이용한 대역 확장된 음성신호의 음질 개선 (Quality Improvement of Bandwidth Extended Speech Using Mixed Excitation Model)

  • 최무열; 김형순
    • 대한음성학회지:말소리 / No. 52 / pp. 133-144 / 2004
  • The quality of narrowband speech can be enhanced by bandwidth extension technology. This paper proposes a mixed excitation model and an energy compensation method based on a Gaussian Mixture Model (GMM). First, we employ a mixed excitation model having both periodic and aperiodic characteristics in the frequency domain: a filter bank extracts periodicity features from the filtered signals, and these are modeled with a GMM to estimate the mixed excitation. Second, we separate the acoustic space into the voiced and unvoiced parts of speech to compensate more accurately for the energy difference between narrowband speech and the reconstructed highband or lowband speech. Objective and subjective evaluations show that the quality of wideband speech reconstructed by the proposed method is superior to that of the conventional bandwidth extension method.
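The periodic/aperiodic mixture itself can be illustrated with a toy excitation generator. Here the voicing weight is a single scalar chosen by hand, whereas the paper estimates band-wise periodicity with a GMM; the pitch period and signal lengths are likewise illustrative.

```python
import numpy as np

def mixed_excitation(pitch_period, voicing, length, rng=None):
    """Mix a periodic impulse train and unit-RMS white noise according
    to a voicing weight in [0, 1] (1 = fully periodic)."""
    if rng is None:
        rng = np.random.default_rng(0)
    periodic = np.zeros(length)
    periodic[::pitch_period] = 1.0                 # impulses at the pitch period
    noise = rng.standard_normal(length)
    noise /= np.sqrt(np.mean(noise ** 2))          # normalize to unit RMS
    return voicing * periodic + (1.0 - voicing) * noise
```

Sweeping `voicing` between 0 and 1 moves the excitation smoothly between noise-like (unvoiced) and pulse-train (voiced) behavior, which is the intuition behind mixed excitation.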

TMS320C2000계열 DSP를 이용한 단일칩 음성인식기 구현 (Implementation of a Single-chip Speech Recognizer Using the TMS320C2000 DSPs)

  • 정익주
    • 음성과학 / Vol. 14, No. 4 / pp. 157-167 / 2007
  • In this paper, we implemented a single-chip speech recognizer using the TMS320C2000 DSPs. For this implementation, we developed a very small speaker-dependent recognition engine based on dynamic time warping, which is especially suited to embedded systems whose resources are severely limited. We carried out several optimizations, including speed optimization by programming time-critical functions in assembly language, code-size optimization, and effective memory allocation. On the TMS320F2801 DSP, which has 12 Kbyte SRAM and 32 Kbyte flash ROM, the recognizer can recognize 10 commands. On the TMS320F2808 DSP, which has 36 Kbyte SRAM and 128 Kbyte flash ROM, it can additionally output the speech sound corresponding to the recognition result. The response sounds, captured when the user trains commands, are encoded with ADPCM and stored in flash ROM. The single-chip recognizer needs few parts besides the DSP itself and an op amp for amplifying the microphone output and anti-aliasing, so it can play a role similar to that of dedicated speech recognition chips.
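The dynamic-time-warping core of such a speaker-dependent command recognizer can be sketched in a few lines. This is generic textbook DTW, not the optimized fixed-point assembly version running on the DSP, and the feature sequences here are placeholders.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences,
    each of shape (frames,) or (frames, dims)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)   # accumulated-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A recognizer built on this simply stores one template per trained command and returns the command whose template has the smallest DTW distance to the input utterance.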

Focal Parts of Utterance in Busan Korean

  • Cho, Yong-Hyung
    • 음성과학 / Vol. 9, No. 4 / pp. 149-163 / 2002
  • Focal parts of an utterance can be determined by new/contrastive information, a focus particle, a contrastive topic marker, or a nominative case marker in Busan Korean. Among these factors, new or contrastive information is the most important element in determining the intonational nucleus of an utterance. However, unlike in Seoul Korean, when a focus particle, a topic marker, or a case marker contributes to the placement of the most prominent peak of an utterance, the peak falls on the noun to which it is attached. Moreover, when the case marker -ga is used emphatically or contrastively, it yields a more prominent pitch on the preceding noun than the topic marker -nun does. This is one of the major problems for Busan Korean speakers in commanding natural and fluent Seoul Korean intonation, even when they use the standard written form of Seoul Korean in their speech.

제스처 및 음성 인식을 이용한 윈도우 시스템 제어에 관한 연구 (Study about Windows System Control Using Gesture and Speech Recognition)

  • 김주홍; 진성일; 이남호; 이용범
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 1998년도 추계종합학술대회 논문집 / pp. 1289-1292 / 1998
  • HCI (human-computer interface) technologies have often been implemented using a mouse, keyboard, and joystick. Because the mouse and keyboard can be used only in limited situations, more natural HCI methods such as speech-based and gesture-based methods have recently attracted wide attention. In this paper, we present a multi-modal input system to control the Windows system for practical use of a multimedia computer. Our multi-modal input system consists of three parts. The first is a virtual-hand mouse, which replaces mouse control with a set of gestures. The second is Windows control using speech recognition. The third is Windows control using gesture recognition. We introduce neural network and HMM methods to recognize speech and gestures. The outputs of the three parts interface with the CPU directly and through Windows.

Non-Negative Matrix Factorization을 이용한 음성 스펙트럼의 부분 특징 추출 (Parts-based Feature Extraction of Speech Spectrum Using Non-Negative Matrix Factorization)

  • 박정원; 김창근; 허강인
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2003년도 신호처리소사이어티 추계학술대회 논문집 / pp. 49-52 / 2003
  • In this paper, we propose a new speech feature parameter using NMF (Non-Negative Matrix Factorization). NMF can represent multi-dimensional data through effective dimensionality reduction by matrix factorization under a non-negativity constraint, and the reduced data present parts-based features of the input. We verify the usefulness of the NMF algorithm for speech feature extraction by applying feature parameters obtained with NMF to Mel-scaled filter bank outputs. According to the recognition experiments, the proposed feature parameters achieve better recognition performance than the commonly used MFCC (mel-frequency cepstral coefficients).
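The factorization step can be sketched with the standard Lee-Seung multiplicative updates for the squared-error objective. This is a generic NMF routine for illustration; the matrix sizes, rank, and iteration count are assumptions, and the paper applies it to Mel-scaled filter bank outputs rather than random data.

```python
import numpy as np

def nmf(V, r, iters=200, seed=0):
    """Factor a non-negative matrix V (e.g. freq x frames) into
    W (basis spectra) and H (activations), V ≈ W @ H, using
    Lee-Seung multiplicative updates for squared error."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 1e-3      # non-negative random init
    H = rng.random((r, m)) + 1e-3
    for _ in range(iters):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ (H @ H.T) + 1e-9)
    return W, H
```

The columns of `W` play the role of parts-based basis spectra; projecting each frame onto them (the columns of `H`) yields the reduced feature parameters the abstract describes.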

Vocabulary Analyzer Based on CEFR-J Wordlist for Self-Reflection (VACSR) Version 2

  • Yukiko Ohashi; Noriaki Katagiri; Takao Oshikiri
    • 아시아태평양코퍼스연구 / Vol. 4, No. 2 / pp. 75-87 / 2023
  • This paper presents a revised version of the vocabulary analyzer for self-reflection (VACSR), called VACSR v.2.0. The initial version of the VACSR automatically analyzes the occurrences and the level of vocabulary items in transcribed texts, indicating their frequency, the unused vocabulary items, and those not belonging to either scale. However, it overlooked words with multiple parts of speech because of their identical headword representations, and it did not provide sufficiently explanatory result tables for different corpora. VACSR v.2.0 overcomes these limitations. First, unlike VACSR v.1, VACSR v.2.0 distinguishes words that function as different parts of speech by syntactic parsing with Stanza, an open-source Python library, which enables the categorization of identical lexical items with multiple parts of speech. Second, VACSR v.2.0 provides clearer result output tables than VACSR v.1. The updated software compares the occurrence of vocabulary items in classroom corpora against each level of the Common European Framework of Reference-Japan (CEFR-J) wordlist. A pilot study using VACSR v.2.0 showed that, after two English classes taught by a preservice English teacher were converted into corpora, the headwords used mostly corresponded to CEFR-J level A1. In practice, VACSR v.2.0 will promote users' reflection on their vocabulary usage and can be applied to teacher training.
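The key improvement, keying the wordlist lookup on (headword, part of speech) pairs instead of headwords alone, can be sketched as follows. The wordlist entries and level assignments below are invented for illustration; the real tool uses Stanza's parser output and the full CEFR-J list.

```python
from collections import Counter

# Toy excerpt of a CEFR-J-style wordlist keyed by (headword, POS);
# the level values here are hypothetical examples.
WORDLIST = {
    ("book", "NOUN"): "A1",
    ("book", "VERB"): "B1",
    ("run", "VERB"): "A1",
}

def level_profile(tagged_tokens):
    """Count token occurrences per CEFR level, distinguishing identical
    headwords with different parts of speech ('off-list' if unknown)."""
    counts = Counter()
    for lemma, pos in tagged_tokens:
        counts[WORDLIST.get((lemma.lower(), pos), "off-list")] += 1
    return dict(counts)
```

Because the key includes the POS tag, "book" as a noun and "book" as a verb land on different levels instead of collapsing into one headword, which is exactly the ambiguity v.1 could not resolve.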