• Title/Summary/Keyword: Word error rate

Search Result 125, Processing Time 0.031 seconds

A New Endpoint Detection Method Based on Chaotic System Features for Digital Isolated Word Recognition System

  • Zang, Xian;Chong, Kil-To
    • Proceedings of the IEEK Conference
    • /
    • 2009.05a
    • /
    • pp.37-39
    • /
    • 2009
  • In the research of speech recognition, locating the beginning and end of a speech utterance in a background of noise is of great importance. Since the background noise presenting to record will introduce disturbance while we just want to get the stationary parameters to represent the corresponding speech section, in particular, a major source of error in automatic recognition system of isolated words is the inaccurate detection of beginning and ending boundaries of test and reference templates, thus we must find potent method to remove the unnecessary regions of a speech signal. The conventional methods for speech endpoint detection are based on two simple time-domain measurements - short-time energy, and short-time zero-crossing rate, which couldn't guarantee the precise results if in the low signal-to-noise ratio environments. This paper proposes a novel approach that finds the Lyapunov exponent of time-domain waveform. This proposed method has no use for obtaining the frequency-domain parameters for endpoint detection process, e.g. Mel-Scale Features, which have been introduced in other paper. Comparing with the conventional methods based on short-time energy and short-time zero-crossing rate, the novel approach based on time-domain Lyapunov Exponents(LEs) is low complexity and suitable for Digital Isolated Word Recognition System.

  • PDF

FPGA Design and Sync-Word Detection of CATV Down-Link Stream Transmission System (CATV 하향 스트림 적용 시스템에서 동기 검출 방안 및 FPGA 설계)

  • Jung, Ji-Won
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.4 no.4
    • /
    • pp.286-294
    • /
    • 2011
  • Cable modems typically are implemented by a forward error correction(FEC) scheme. The ITU-T Recommendation J-38 Annex B specifies using 64- and 256- quadrature amplitude modulation (QAM) and extended RS coding scheme. In implementing the cable modem, there are some problems to fabricate and fitting on FPGA chip. First, many clocks are needed in implementing cable modem because of different code rate and different modulation types. To reduce the number of clocks, we use the two memories, which are different clock speed for reading and writing data. Second, this system lost the bit-synchronization and frame-synchronization in decoder, the system recognize that all data is error. This paper solves the problems by using simple 5-stage registers and unique sync-word. Based on solutions for about problems, the cable modem is fabricated on FPGA chip name as Vertex II pro xc2vp30-5 by Xilinx, and we confirmed the effectiveness of the results.

A study on the didactical application of ChatGPT for mathematical word problem solving (수학 문장제 해결과 관련한 ChatGPT의 교수학적 활용 방안 모색)

  • Kang, Yunji
    • Communications of Mathematical Education
    • /
    • v.38 no.1
    • /
    • pp.49-67
    • /
    • 2024
  • Recent interest in the diverse applications of artificial intelligence (AI) language models has highlighted the need to explore didactical uses in mathematics education. AI language models, capable of natural language processing, show promise in solving mathematical word problems. This study tested the capability of ChatGPT, an AI language model, to solve word problems from elementary school textbooks, and analyzed both the solutions and errors made. The results showed that the AI language model achieved an accuracy rate of 81.08%, with errors in problem comprehension, equation formulation, and calculation. Based on this analysis of solution processes and error types, the study suggests implications for the didactical application of AI language models in education.

Unequal Error Control Properties of Convolutional Codes (길쌈부호의 부등오류제어 특성)

  • Lee, Soo-In;Lee, Sang-Gon;Moon, Sang-Jae
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.2
    • /
    • pp.1-8
    • /
    • 1990
  • The unequal bit-error control of rate r=b/n binary convolutional code is analyzed. The error protection afforded to each digit of the viterbi decoded b-tuple information word can be different from that afforded to other digit. The property of the unequal protection can be applied for improvement of SNR in transmitting sampled data of DPCM system.

  • PDF

Speech Recognition in Car Noise Environments Using Multiple Models Based on a Hybrid Method of Spectral Subtraction and Residual Noise Masking

  • Song, Myung-Gyu;Jung, Hoi-In;Shim, Kab-Jong;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.3E
    • /
    • pp.3-8
    • /
    • 1999
  • In speech recognition for real-world applications, the performance degradation due to the mismatch introduced between training and testing environments should be overcome. In this paper, to reduce this mismatch, we provide a hybrid method of spectral subtraction and residual noise masking. We also employ multiple model approach to obtain improved robustness over various noise environments. In this approach, multiple model sets are made according to several noise masking levels and then a model set appropriate for the estimated noise level is selected automatically in recognition phase. According to speaker independent isolated word recognition experiments in car noise environments, the proposed method using model sets with only two masking levels reduced average word error rate by 60% in comparison with spectral subtraction method.

  • PDF

Spontaneous Speech Language Modeling using N-gram based Similarity (N-gram 기반의 유사도를 이용한 대화체 연속 음성 언어 모델링)

  • Park Young-Hee;Chung Minhwa
    • MALSORI
    • /
    • no.46
    • /
    • pp.117-126
    • /
    • 2003
  • This paper presents our language model adaptation for Korean spontaneous speech recognition. Korean spontaneous speech is observed various characteristics of content and style such as filled pauses, word omission, and contraction as compared with the written text corpus. Our approaches focus on improving the estimation of domain-dependent n-gram models by relevance weighting out-of-domain text data, where style is represented by n-gram based tf/sup */idf similarity. In addition to relevance weighting, we use disfluencies as Predictor to the neighboring words. The best result reduces 9.7% word error rate relatively and shows that n-gram based relevance weighting reflects style difference greatly and disfluencies are good predictor also.

  • PDF

Three-Stage Framework for Unsupervised Acoustic Modeling Using Untranscribed Spoken Content

  • Zgank, Andrej
    • ETRI Journal
    • /
    • v.32 no.5
    • /
    • pp.810-818
    • /
    • 2010
  • This paper presents a new framework for integrating untranscribed spoken content into the acoustic training of an automatic speech recognition system. Untranscribed spoken content plays a very important role for under-resourced languages because the production of manually transcribed speech databases still represents a very expensive and time-consuming task. We proposed two new methods as part of the training framework. The first method focuses on combining initial acoustic models using a data-driven metric. The second method proposes an improved acoustic training procedure based on unsupervised transcriptions, in which word endings were modified by broad phonetic classes. The training framework was applied to baseline acoustic models using untranscribed spoken content from parliamentary debates. We include three types of acoustic models in the evaluation: baseline, reference content, and framework content models. The best overall result of 18.02% word error rate was achieved with the third type. This result demonstrates statistically significant improvement over the baseline and reference acoustic models.

N- gram Adaptation Using Information Retrieval and Dynamic Interpolation Coefficient (정보검색 기법과 동적 보간 계수를 이용한 N-gram 언어모델의 적응)

  • Choi Joon Ki;Oh Yung-Hwan
    • MALSORI
    • /
    • no.56
    • /
    • pp.207-223
    • /
    • 2005
  • The goal of language model adaptation is to improve the background language model with a relatively small adaptation corpus. This study presents a language model adaptation technique where additional text data for the adaptation do not exist. We propose the information retrieval (IR) technique with N-gram language modeling to collect the adaptation corpus from baseline text data. We also propose to use a dynamic language model interpolation coefficient to combine the background language model and the adapted language model. The interpolation coefficient is estimated from the word hypotheses obtained by segmenting the input speech data reserved for held-out validation data. This allows the final adapted model to improve the performance of the background model consistently The proposed approach reduces the word error rate by $13.6\%$ relative to baseline 4-gram for two-hour broadcast news speech recognition.

  • PDF

Error Correction and Praat Script Tools for the Buckeye Corpus of Conversational Speech (벅아이 코퍼스 오류 수정과 코퍼스 활용을 위한 프랏 스크립트 툴)

  • Yoon, Kyu-Chul
    • Phonetics and Speech Sciences
    • /
    • v.4 no.1
    • /
    • pp.29-47
    • /
    • 2012
  • The purpose of this paper is to show how to convert the label files of the Buckeye Corpus of Spontaneous Speech [1] into Praat format and to introduce some of the Praat scripts that will enable linguists to study various aspects of spoken American English present in the corpus. During the conversion process, several types of errors were identified and corrected either manually or automatically by the use of scripts. The Praat script tools that have been developed can help extract from the corpus massive amounts of phonetic measures such as the VOT of plosives, the formants of vowels, word frequency information and speech rates that span several consecutive words. The script tools can extract additional information concerning the phonetic environment of the target words or allophones.

Language Model Adaptation for Conversational Speech Recognition (대화체 연속음성 인식을 위한 언어모델 적응)

  • Park Young-Hee;Chung Minhwa
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.83-86
    • /
    • 2003
  • This paper presents our style-based language model adaptation for Korean conversational speech recognition. Korean conversational speech is observed various characteristics of content and style such as filled pauses, word omission, and contraction as compared with the written text corpora. For style-based language model adaptation, we report two approaches. Our approaches focus on improving the estimation of domain-dependent n-gram models by relevance weighting out-of-domain text data, where style is represented by n-gram based tf*idf similarity. In addition to relevance weighting, we use disfluencies as predictor to the neighboring words. The best result reduces 6.5% word error rate absolutely and shows that n-gram based relevance weighting reflects style difference greatly and disfluencies are good predictor.

  • PDF