• 제목/요약/키워드: speech technology

검색결과 1,900건 처리시간 0.026초

혼합여기모델을 이용한 대역 확장된 음성신호의 음질 개선 (Quality Improvement of Bandwidth Extended Speech Using Mixed Excitation Model)

  • 최무열;김형순
    • 대한음성학회지:말소리
    • /
    • 제52호
    • /
    • pp.133-144
    • /
    • 2004
  • The quality of narrowband speech can be enhanced by the bandwidth extension technology. This paper proposes a mixed excitation and an energy compensation method based on Gaussian Mixture Model (GMM). First, we employ the mixed excitation model having both periodic and aperiodic characteristics in frequency domain. We use a filter bank to extract the periodicity features from the filtered signals and model them based on GMM to estimate the mixed excitation. Second, we separate the acoustic space into the voiced and unvoiced parts of speech to compensate for the energy difference between narrowband speech and reconstructed highband, or lowband speech, more accurately. Objective and subjective evaluations show that the quality of wideband speech reconstructed by the proposed method is superior to that by the conventional bandwidth extension method.

  • PDF

Machine Learning Techniques for Speech Recognition using the Magnitude

  • Krishnan, C. Gopala;Robinson, Y. Harold;Chilamkurti, Naveen
    • Journal of Multimedia Information System
    • /
    • 제7권1호
    • /
    • pp.33-40
    • /
    • 2020
  • Machine learning consists of supervised and unsupervised learning among which supervised learning is used for the speech recognition objectives. Supervised learning is the Data mining task of inferring a function from labeled training data. Speech recognition is the current trend that has gained focus over the decades. Most automation technologies use speech and speech recognition for various perspectives. This paper demonstrates an overview of major technological standpoint and gratitude of the elementary development of speech recognition and provides impression method has been developed in every stage of speech recognition using supervised learning. The project will use DNN to recognize speeches using magnitudes with large datasets.

분산음성인식 환경에서 서버에서의 스케일러블 고품질 음성복원 (Scalable High-quality Speech Reconstruction in Distributed Speech Recognition Environments)

  • 윤재삼;김홍국;강병옥
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2007년도 하계종합학술대회 논문집
    • /
    • pp.423-424
    • /
    • 2007
  • In this paper, we propose a scalable high-quality speech reconstruction method for distributed speech recognition (DSR). It is difficult to reconstruct speech of high quality with MFCCs at the DSR server. Depending on the bit-rate available by the DSR system, we can send additional information associated with speech coding to the DSR sorrel, where the bit-rate is variable from 4.8 kbit/s to 11.4 kbit/s. The experimental results show that the speech quality reproduced by the proposed method when the bit-rate is 11.4 kbit/s is comparable with that of ITU-T G.729 under both ideal channel and frame error channel conditions while the performance of DSR is maintained to that of wireline speech recognition.

  • PDF

Overlapping of /o/ and /u/ in modern Seoul Korean: focusing on speech rate in read speech

  • Igeta, Takako;Hiroya, Sadao;Arai, Takayuki
    • 말소리와 음성과학
    • /
    • 제9권1호
    • /
    • pp.1-7
    • /
    • 2017
  • Previous studies have reported on the overlapping of $F_1$ and $F_2$ distribution for the vowels /o/ and /u/ produced by young Korean speakers of the Seoul dialect. It has been suggested that the overlapping of /o/ and /u/ occurs due to sound change. However, few studies have examined whether speech rate influences the overlapping of /o/ and /u/. On the other hand, previous studies have reported that the overlapping of /o/ and /u/ in syllable produced by male speakers is smaller than by female speakers. Few reports have investigated on the overlapping of the two vowels in read speech produced by male speakers. In the current study, we examined whether speech rates affect overlapping of /o/ and /u/ in read speech by male and female speakers. Read speech produced by twelve young adult native speakers of Seoul dialect were recorded in three speech rates. For female speakers, discriminant analysis showed that the discriminant rate became lower as the speech rate increases from slow to fast. Thus, this indicates that speech rate is one of the factors affecting the overlapping of /o/ and /u/. For male speakers, on the other hand, the discriminant rate was not correlated with speech rate, but the overlapping was larger than that of female speakers in read speech. Moreover, read speech by male speakers was less clear than by female speakers. This indicates that the overlapping may be related to unclear speech by sociolinguistic reasons for male speakers.

음성 인식 기술 평가 동향 (On the Evaluation of Speech Recognition Systems)

  • 유하진;김동현;육동석
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2005년도 추계 학술대회 발표논문집
    • /
    • pp.201-206
    • /
    • 2005
  • We present a survey on the evaluation methods of speech recognition technology and propose a procedure for evaluating Korean speech recognition systems. Currently there are various kinds of evaluation events conducted by NIST and ELDA every year. In this paper, we introduce these activities, and propose an evaluation procedure for Korean speech recognition systems. In designing the procedure, we consider the characteristics of Korean language, as well as the trends of Korean speech technology industry.

  • PDF

산업용 음성 DB를 위한 XML 기반 메타데이터 (XML Based Meta-data Specification for Industrial Speech Databases)

  • 주영희;홍기형
    • 대한음성학회지:말소리
    • /
    • 제55권
    • /
    • pp.77-91
    • /
    • 2005
  • In this paper, we propose an XML based meta-data specification for industrial speech databases. Building speech databases is very time-consuming and expensive. Recently, by the government supports, huge amount of speech corpus has been collected as speech databases. However, the formats and meta-data for speech databases are different depending on the constructing institutions. In order to advance the reusability and portability of speech databases, a standard representation scheme should be adopted by all speech database construction institutions. ETRI proposed a XML based annotation scheme [51 for speech databases, but the scheme has too simple and flat modeling structure, and may cause duplicated information. In order to overcome such disadvantages in this previous scheme, we first define the speech database more formally and then identify object appearing in speech databases. We then design the data model for speech databases in an object-oriented way. Based on the designed data model, we develop the meta-data specification for industrial speech databases.

  • PDF

Robust Speech Detection Based on Useful Bands for Continuous Digit Speech over Telephone Networks

  • Ji, Mi-Kyongi;Suh, Young-Joo;Kim, Hoi-Rin;Kim, Sang-Hun
    • The Journal of the Acoustical Society of Korea
    • /
    • 제22권3E호
    • /
    • pp.113-123
    • /
    • 2003
  • One of the most important problems in speech recognition is to detect the presence of speech in adverse environments. In other words, the accurate detection of speech boundary is critical to the performance of speech recognition. Furthermore the speech detection problem becomes severer when recognition systems are used over the telephone network, especially wireless network and noisy environment. Therefore this paper describes various speech detection algorithms for continuous digit recognition system used over wire/wireless telephone networks and we propose a algorithm in order to improve the robustness of speech detection using useful band selection under noisy telephone networks. In this paper, we compare some speech detection algorithms with the proposed one, and present experimental results done with various SNRs. The results show that the new algorithm outperforms the other speech detection methods.

An Efficient Model Parameter Compensation Method foe Robust Speech Recognition

  • 정용주
    • 대한음성학회지:말소리
    • /
    • 제45호
    • /
    • pp.107-115
    • /
    • 2003
  • An efficient method that compensates the HMM parameters for the noisy speech recognition is proposed. Instead of assuming some analytical approximations as in the PMC, the proposed method directly re-estimates the HMM parameters by the segmental k-means algorithm. The proposed method has shown improved results compared with the conventional PMC method at reduced computational cost.

  • PDF

대용량 연속 음성 인식 시스템에서의 코퍼스 선별 방법에 의한 언어모델 설계 (A Corpus Selection Based Approach to Language Modeling for Large Vocabulary Continuous Speech Recognition)

  • 오유리;윤재삼;김홍국
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2005년도 추계 학술대회 발표논문집
    • /
    • pp.103-106
    • /
    • 2005
  • In this paper, we propose a language modeling approach to improve the performance of a large vocabulary continuous speech recognition system. The proposed approach is based on the active learning framework that helps to select a text corpus from a plenty amount of text data required for language modeling. The perplexity is used as a measure for the corpus selection in the active learning. From the recognition experiments on the task of continuous Korean speech, the speech recognition system employing the language model by the proposed language modeling approach reduces the word error rate by about 6.6 % with less computational complexity than that using a language model constructed with randomly selected texts.

  • PDF

네트워크 환경에서 서버용 음성 인식을 위한 MFCC 기반 음성 부호화기 설계 (A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments)

  • 이길호;윤재삼;오유리;김홍국
    • 대한음성학회지:말소리
    • /
    • 제54호
    • /
    • pp.27-43
    • /
    • 2005
  • Existing standard speech coders can provide speech communication of high quality while they degrade the performance of speech recognition systems that use the reconstructed speech by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized to speech quality rather than to the performance of speech recognition. For example, mel-frequency cepstral coefficient (MFCC) is generally known to provide better speech recognition performance than linear prediction coefficient (LPC) that is a typical parameter set in speech coding. In this paper, we propose a speech coder using MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. However, the main drawback of using MFCC is to develop the efficient MFCC quantization with a low-bit rate. First, we explore the interframe correlation of MFCCs, which results in the predictive quantization of MFCC. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel error. As a result, we propose a 8.7 kbps MFCC-based CELP coder. It is shown from a PESQ test that the proposed speech coder has a comparable speech quality to 8 kbps G.729 while it is shown that the performance of speech recognition using the proposed speech coder is better than that using G.729.

  • PDF