• Title/Summary/Keyword: Speech rate

Search Result 1,246, Processing Time 0.031 seconds

A Study on the Effective Command Delivery of Commanders Using Speech Recognition Technology (국방 분야에서 전장 소음 환경 하에 음성 인식 기술 연구)

  • Yeong-hoon Kim;Hyun Kwon
    • Convergence Security Journal
    • /
    • v.24 no.2
    • /
    • pp.161-165
    • /
    • 2024
  • Recently, speech recognition models have been advancing, accompanied by the development of various speech processing technologies to obtain high-quality data. In the defense sector, efforts are being made to integrate technologies that effectively remove noise from speech data in noisy battlefield situations and enable efficient speech recognition. This paper proposes a method for effective speech recognition in the midst of diverse noise in a battlefield scenario, allowing commanders to convey orders. The proposed method involves noise removal from noisy speech followed by text conversion using OpenAI's Whisper model. Experimental results show that the proposed method reduces the Character Error Rate (CER) by 6.17% compared to the existing method that does not remove noise. Additionally, potential applications of the proposed method in the defense are discussed.

Noisy Speech Recognition Based on Noise-Adapted HMMs Using Speech Feature Compensation

  • Chung, Yong-Joo
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.15 no.2
    • /
    • pp.37-41
    • /
    • 2014
  • The vector Taylor series (VTS) based method usually employs clean speech Hidden Markov Models (HMMs) when compensating speech feature vectors or adapting the parameters of trained HMMs. It is well-known that noisy speech HMMs trained by the Multi-condition TRaining (MTR) and the Multi-Model-based Speech Recognition framework (MMSR) method perform better than the clean speech HMM in noisy speech recognition. In this paper, we propose a method to use the noise-adapted HMMs in the VTS-based speech feature compensation method. We derived a novel mathematical relation between the train and the test noisy speech feature vector in the log-spectrum domain and the VTS is used to estimate the statistics of the test noisy speech. An iterative EM algorithm is used to estimate train noisy speech from the test noisy speech along with noise parameters. The proposed method was applied to the noise-adapted HMMs trained by the MTR and MMSR and could reduce the relative word error rate significantly in the noisy speech recognition experiments on the Aurora 2 database.

A 3-Level Endpoint Detection Algorithm for Isolated Speech Using Time and Frequency-based Features

  • Eng, Goh Kia;Ahmad, Abdul Manan
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2004.08a
    • /
    • pp.1291-1295
    • /
    • 2004
  • This paper proposed a new approach for endpoint detection of isolated speech, which proves to significantly improve the endpoint detection performance. The proposed algorithm relies on the root mean square energy (rms energy), zero crossing rate and spectral characteristics of the speech signal where the Euclidean distance measure is adopted using cepstral coefficients to accurately detect the endpoint of isolated speech. The algorithm offers better performance than traditional energy-based algorithm. The vocabulary for the experiment includes English digit from one to nine. These experimental results were conducted by 360 utterances from a male speaker. Experimental results show that the accuracy of the algorithm is quite acceptable. Moreover, the computation overload of this algorithm is low since the cepstral coefficients parameters will be used in feature extraction later of speech recognition procedure.

  • PDF

A Study on the Optimal Mahalanobis Distance for Speech Recognition

  • Lee, Chang-Young
    • Speech Sciences
    • /
    • v.13 no.4
    • /
    • pp.177-186
    • /
    • 2006
  • In an effort to enhance the quality of feature vector classification and thereby reduce the recognition error rate of the speaker-independent speech recognition, we employ the Mahalanobis distance in the calculation of the similarity measure between feature vectors. It is assumed that the metric matrix of the Mahalanobis distance be diagonal for the sake of cost reduction in memory and time of calculation. We propose that the diagonal elements be given in terms of the variations of the feature vector components. Geometrically, this prescription tends to redistribute the set of data in the shape of a hypersphere in the feature vector space. The idea is applied to the speech recognition by hidden Markov model with fuzzy vector quantization. The result shows that the recognition is improved by an appropriate choice of the relevant adjustable parameter. The Viterbi score difference of the two winners in the recognition test shows that the general behavior is in accord with that of the recognition error rate.

  • PDF

Speech Recognition Accuracy Prediction Using Speech Quality Measure (음성 특성 지표를 이용한 음성 인식 성능 예측)

  • Ji, Seung-eun;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.3
    • /
    • pp.471-476
    • /
    • 2016
  • This paper presents our study on speech recognition performance prediction. Our initial study shows that a combination of speech quality measures effectively improves correlation with Word Error Rate (WER) compared to each speech measure alone. In this paper we demonstrate a new combination of various types of speech quality measures shows more significantly improves correlation with WER compared to the speech measure combination of our initial study. In our study, SNR, PESQ, acoustic model score, and MFCC distance are used as the speech quality measures. This paper also presents our speech database verification system for speech recognition employing the speech measures. We develop a WER prediction system using Gaussian mixture model and the speech quality measures as a feature vector. The experimental results show the proposed system is highly effective at predicting WER in a low SNR condition of speech babble and car noise environments.

A Study on the Improvements of the Speech Quality by using Distribution Characteristics of LSP parameters in the EVRC(Enhanced Variable Rate Codec) (LSP 파라미터의 분포특성을 이용한 EVRC의 음질개선에 관한 연구)

  • Min, So-Yeon;Na, Deok-Su
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.12
    • /
    • pp.5843-5848
    • /
    • 2011
  • To improve the efficiency of the channel spectrum and to reduce the power consumption of the system in EVRC, the voice signal is compressed and transmitted only when the user speaks to. In addition to this, voice frames are divided into three rates 1, 1/2 and 1/8 and each frame is handled differently. For example, we assumed that the input is silence region if the 1/8 rate is used. In this paper, the sections are firstly separated into the voiced speech signal region, unvoiced speech signal region, and silence region by using distribution characteristics of LSP parameters. Then the paper suggested to encode 1 rate for the voiced speech signal, 1/2 rate for the unvoiced speech signal region, 1/8 rate for the silence region. In other words, traditional way of transmission is used when sending full rate in the EVRC. However, when sending half rate, the voice is firstly distinguished between voiced and unvoiced. If the voice is distinguished as voiced, voice is converted into full rate before the transmission. If it is distinguished as silence, EVRC's basic rate is applied. In the experimental results with SNR, ASDM, transmission bit rate measurement, we have demonstrated that voice quality was improved by using the proposed algorithm.

A Variable Data Rate Speech Coding Technique Based on the Inflection Point Detection of Speech (음성의 변곡점 추출 및 전송에 기반한 가변 데이터율 음성 부호화 기법)

  • Iem, Byeong-Gwan
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.62 no.4
    • /
    • pp.562-565
    • /
    • 2013
  • A new variable rate speech coding technique is proposed. The method is based on the observation that the speech signal approximately looks linear for a very short period of time. The information transmitted is the location and data value of inflection points. If the distance between the inflection points is large, the mid point location and its data value are also delivered. Thus, the encoder transmits both the location and the data value for the inflection samples, but the location only for the non-inflection points. The location information is expressed using one bit for each sample, 0 for non-inflection and 1 for inflection point. At the receiver, using the interpolation, the decoder estimates the untransmitted sample values for non-inflection locations from the received sample values for the inflection samples. With 50 % of computational cost of the existing CVSD delta modulation, the proposed method is expected to achieve the data rate of 36 to 38 kbps and the SNR of 10 to 13 dB.

Real-Time DSP Implementation of Adaptive Multi-Rate with TMS320C542 board (TMS320C542보드를 이용한 Adaptive Multi-Rate 음성부호화기의 실시간 구현)

  • 박세익;전라온;이인성
    • Proceedings of the IEEK Conference
    • /
    • 2000.09a
    • /
    • pp.827-830
    • /
    • 2000
  • 3GPP and ETSI adopted AMR(Adaptive Multi-Rate) as a standard for next generation IMT-2000 service. In this paper, we analyzed algorithm about AMR and optimized ANSI C source on the C complier and assembly language of Texas Instrument . The implemented AMR speech codec requires 28.2MIPS of complexity for encoder and 5.5MIPS for decoder. we performed real-time implementation of AMR speech codec using 82% of TMS320C5402 with 40 MIPS specification. We give proof that the output speech of the implemented speech codec on DSP board is identical with result of C source program simulation. Also the reconstructed speech is verified in the real-time environment consisted of microphone and speaker.

  • PDF

Fast ab/adduction Rate of Articulation Valves in Normal Adults (정상 성인의 조음밸브에 대한 내${\cdot}$외전 비율)

  • Park, Hee-Jun;Han, Ji-Yeon
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.149-151
    • /
    • 2007
  • This study was designed to investigate fast ab/adduction rate of articulation valves in normal adults. The measurement of fast ab/aduction rate has traditionally been used for assessment, diagnosis and therapy in patients who suffered from dysarthria, functional articulation disorders or apraxia of speech. Fast ab/adduction rate shows the documented structural and physiological changes in the central nervous system and the peripheral components of oral and speech production mechanism. Fast ab/adduction rates were obtained from 20 normal subjects by producing the repetition of vocal function (/ihi/), tongue function (/t${\wedge}$/), velopharyngeal function (/m/), and labial function (/p${\wedge}$/). The Aerophone II was used for data recording. The results of finding as follows: average fast ab/adduction rates were vocal function(6.21cps), tongue function(7.42cps), velopharyngeal function(5.23cps), labial function (6.93cps). The results of this study are guidelines of normal diadochokinetic rates. In addition, they can indicate the severity of diseases and evaluation of treatment.

  • PDF

The Prosodic Characteristics of Children with Cochlear Implant with Respect to the Articulation Rate, Pause, and Duration (인공와우이식 아동의 운율 특성 - 조음속도와 쉼, 지속시간을 중심으로 -)

  • Oh, Soonyoung;Seong, Cheoljae
    • Phonetics and Speech Sciences
    • /
    • v.4 no.4
    • /
    • pp.117-127
    • /
    • 2012
  • This research reports the prosodic characteristics (including articulation speech rate, pause characteristics, duration) of children with cochlear implants with reference to those of children with normal hearing. Subjects are 8-to 10-year-old children, balancing each number of gender as 24. Dialogue speech data are comprised of four types of sentence patterns. Results show that 1) there's a statistically meaningful difference on articulation speech rate between the two groups. 2) On pauses, they are not observed in exclamatory and declarative sentences in normal children. While imperative sentences show no statistical difference on the number of pauses between the two groups, interrogative sentences do. 3) Declarative, exclamatory, and interrogative sentences reveal statistical difference between the two groups in terms of the sentence's final two-syllable word duration, showing no difference on imperative sentences. 4) When it comes to the RFP (duration ratio of sentence final syllable to penultimate syllable), we no statistically meaningful difference between the two groups in all types of sentences exists. 5) Lastly, RWS (the ratio of sentence final two syllable word duration to that of whole sentence duration) shows statistical difference between two groups in imperative sentences, but not in all the rest types.