Search | Korea Science

Speech Activity Detection using Lip Movement Image Signals (입술 움직임 영상 신호를 이용한 음성 구간 검출)

Kim, Eung-Kyeu
- Journal of the Institute of Convergence Signal Processing / v.11 no.4 / pp.289-297 / 2010
This paper presents a method that prevents external acoustic noise from being misrecognized as speech during the speech activity detection stage of speech recognition. In addition to acoustic energy, the method also checks lip movement image signals. First, successive images are captured with a PC camera, and the system determines whether the lips are moving. Next, the lip movement data is stored in shared memory, where the speech recognition process can access it. Meanwhile, during speech activity detection, the preprocessing phase of speech recognition, the system verifies whether detected acoustic energy comes from a speaker's utterance by checking the data stored in shared memory. Finally, experiments linking the speech recognition processor with the image processor confirmed that a recognition result is output normally when the speaker faces the camera and speaks, and that no recognition result is output when the speaker speaks without facing the camera. In addition, the initial feature values obtained off-line are replaced with values obtained on-line. Similarly, the initial template image captured off-line is replaced with a template image captured on-line, which improves the discrimination of lip movement tracking. An image processing test bed was implemented to confirm the lip movement tracking process visually and to analyze the related parameters in real time. When the speech and image processing systems were linked, the interworking rate was 99.3% across various illumination environments.
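The gating idea in this abstract can be sketched as follows, assuming per-frame acoustic energies in dB and a per-frame lip-movement flag that are already aligned in time. The names `lip_motion_flags` and `gated_speech_activity`, the mean-absolute-difference motion test, and both thresholds are hypothetical illustrations, not the author's implementation; the shared-memory exchange between the image and speech processes is reduced here to passing a list.

```python
from typing import List, Sequence


def lip_motion_flags(
    mouth_patches: Sequence[Sequence[float]], diff_threshold: float = 10.0
) -> List[bool]:
    """Flag lip movement per frame by comparing each mouth-region patch
    (flattened pixel intensities) with the previous frame's patch.

    Hypothetical detector: mean absolute pixel difference above a threshold.
    """
    if not mouth_patches:
        return []
    flags = [False]  # the first frame has no predecessor to compare with
    for prev, cur in zip(mouth_patches, mouth_patches[1:]):
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        flags.append(diff > diff_threshold)
    return flags


def gated_speech_activity(
    frame_energy_db: Sequence[float],
    lip_moving: Sequence[bool],
    energy_threshold_db: float = 40.0,  # hypothetical threshold
) -> List[bool]:
    """A frame counts as speech only if its energy is high AND the image
    process reported lip movement for the same frame; loud frames without
    lip movement are treated as external noise."""
    if len(frame_energy_db) != len(lip_moving):
        raise ValueError("energy and lip-movement sequences must be aligned")
    return [
        energy >= energy_threshold_db and moving
        for energy, moving in zip(frame_energy_db, lip_moving)
    ]


if __name__ == "__main__":
    energies = [20.0, 55.0, 60.0, 58.0, 25.0]
    lips = [False, True, False, True, True]
    print(gated_speech_activity(energies, lips))  # [False, True, False, True, False]
```

Frame 2 is loud but the lips are still, so it is rejected as noise, which is the behaviour the abstract reports when the speaker does not face the camera.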

A Study on the Endpoint Detection Algorithm (끝점 검출 알고리즘에 관한 연구)

양진우
- Proceedings of the Acoustical Society of Korea Conference / 1984.12a / pp.66-69 / 1984
This paper is a study on endpoint detection for Korean speech recognition. In the speech signal processing stage, the analysis parameters considered were zero crossing rate (ZCR), log energy (LE), and energy of the prediction error (Ep), and the fundamental Korean spoken digits /영/ to /구/ (zero to nine) were selected as the speech data for recognition. The main goal of this paper is to develop techniques and a system for speech input to machines. For endpoint detection, log energy (LE) was chosen after analyzing the various parameters, because it is a very effective parameter for classifying speech and nonspeech segments. The analysis yielded an error rate of 1.43%.
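A minimal sketch of log-energy endpoint detection, assuming that the first few frames contain only background noise and that any frame whose log energy exceeds that noise floor by a fixed margin is speech. The frame sizes, the 10 dB margin, and the function names are assumptions; the abstract does not give the paper's actual decision rule.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_log_energy(samples: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Log energy (dB) of each full frame; a tiny epsilon avoids log10(0)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        energies.append(10.0 * math.log10(energy + 1e-10))
    return energies


def detect_endpoints(
    log_energy: Sequence[float], noise_frames: int = 5, margin_db: float = 10.0
) -> Optional[Tuple[int, int]]:
    """Return (first, last) index of frames whose log energy exceeds the
    noise floor by margin_db, or None if no frame does.

    Assumption: the first `noise_frames` frames are background noise only.
    """
    if not log_energy:
        return None
    n = min(noise_frames, len(log_energy))
    threshold = sum(log_energy[:n]) / n + margin_db  # noise floor + margin
    active = [i for i, e in enumerate(log_energy) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]
```

Estimating the floor from leading frames keeps the threshold relative to the recording conditions rather than fixed in absolute dB.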

A Study on Phoneme Recognition using Neural Networks and Fuzzy logic (신경망과 퍼지논리를 이용한 음소인식에 관한 연구)

Han, Jung-Hyun;Choi, Doo-Il
- Proceedings of the KIEE Conference / 1998.07g / pp.2265-2267 / 1998
This paper studies fast speaker adaptation for speech recognition. To analyze speech signals efficiently in both the time domain and the time-frequency domain, it applies SCONN[1] together with speech signal processing suited to fast speaker-adaptive recognition, and it examines recognition on speech data input after a speaker-dependent recognition test in order to investigate how well the system adapts.

Digital enhancement of pronunciation assessment: Automated speech recognition and human raters

Miran Kim
- Phonetics and Speech Sciences / v.15 no.2 / pp.13-20 / 2023
This study explores the potential of automated speech recognition (ASR) for assessing English learners' pronunciation. We employed ASR technology, acknowledged for its impartiality and consistent results, to analyze speech audio files, including synthesized speech (both native-like and Korean-accented English) and recordings from a native English speaker. Through this analysis, we established baseline values for the word error rate (WER). These were then compared with the values obtained from human raters in perception experiments that assessed the speech of 30 first-year college students before and after taking a pronunciation course. Our sub-group analyses revealed positive training effects for both Whisper, an ASR tool, and the human raters, and identified distinct human rater strategies in different assessment aspects (proficiency, intelligibility, accuracy, and comprehensibility) that were not observed in ASR. Despite challenges such as recognizing accented speech traits, our findings suggest that digital tools such as ASR can streamline the pronunciation assessment process. With ongoing advances in ASR technology, its potential not only as an assessment aid but also as a self-directed learning tool for pronunciation feedback merits further exploration.
https://doi.org/10.13064/KSSS.2023.15.2.013
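For reference, WER is the word-level edit distance between a reference transcript and an ASR hypothesis, (S + D + I) / N, where S, D and I count substitutions, deletions and insertions and N is the number of reference words. The function below is a generic sketch of that standard definition; it is not the tooling used in the study, and it performs no text normalization (case, punctuation).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words (Levenshtein distance on words)
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

With the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat", one substitution and one deletion give a WER of 2/6, about 0.33.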

A Study on Design and Implementation of Speech Recognition System Using ART2 Algorithm

Kim, Joeng Hoon;Kim, Dong Han;Jang, Won Il;Lee, Sang Bae
- International Journal of Fuzzy Logic and Intelligent Systems / v.4 no.2 / pp.149-154 / 2004
In this research, we chose speech recognition as the means of controlling an electric wheelchair by speech alone, and we used DTW (Dynamic Time Warping), which is speaker-dependent and has a relatively high recognition rate among speech recognition methods. However, real-time operation requires a small memory footprint and fast processing. We therefore introduced VQ (Vector Quantization), which is widely used as a compression algorithm in speaker-independent recognition, to achieve fast recognition with little memory. However, we found that the recognition rate decreased after applying VQ. To improve it, we applied the ART2 (Adaptive Resonance Theory 2) algorithm as a post-processing step and obtained an improvement of about 5% in recognition rate. Using ART2 requires an error range: the error range is applied when the second DTW distance minus the first DTW distance is 20 or more. With ART2 applied in this way, we obtained both fast processing and a high recognition rate. Moreover, because the wheelchair is a moving object, the system must be implemented as an embedded system. We therefore selected the TMS320C32 chip, which can perform a large number of calculations relatively quickly. Because speech data requires considerable storage, we used 128 KB of RAM and 64 KB of ROM. For speech input, we used a 16-bit stereo audio codec, whose high resolution provides relatively accurate data.
https://doi.org/10.5391/IJFIS.2004.4.2.149
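The DTW matching and the margin condition described in this abstract might look like this in outline. The sketch assumes each utterance and template is a sequence of equal-length feature vectors and uses a Euclidean local distance; `dtw_distance`, `classify_with_margin` and the tuple it returns are hypothetical names. It only reports whether the second-best distance minus the best distance reaches 20; the VQ step and the ART2 post-processing are not reproduced.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def dtw_distance(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Classic DTW with Euclidean local distance and steps
    (i-1, j), (i, j-1), (i-1, j-1)."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # requires Python 3.8+
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def classify_with_margin(
    utterance: Sequence[Vector],
    templates: Dict[str, Sequence[Vector]],
    margin_threshold: float = 20.0,  # the "20 or more" condition from the abstract
) -> Tuple[str, float, bool]:
    """Return (best label, second distance - first distance, condition met)."""
    if len(templates) < 2:
        raise ValueError("need at least two templates to compute a margin")
    scored = sorted(
        (dtw_distance(utterance, tpl), label) for label, tpl in templates.items()
    )
    (first, best_label), (second, _) = scored[0], scored[1]
    margin = second - first
    # How the system then uses this flag (ART2 post-processing) is not modelled.
    return best_label, margin, margin >= margin_threshold
```

For example, the utterance `[[0.0], [10.0], [10.0]]` matches a "yes" template `[[0.0], [0.0], [10.0], [10.0]]` at distance 0 and a "no" template `[[50.0], [50.0], [50.0]]` at distance 130, so the margin is 130 and the condition holds.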

Landmark-Guided Segmental Speech Decoding for Continuous Mandarin Speech Recognition

Chao, Hao;Song, Cheng
- Journal of Information Processing Systems / v.12 no.3 / pp.410-421 / 2016
In this paper, we propose a framework that incorporates landmarks into a segment-based Mandarin speech recognition system. In this method, landmarks provide boundary information and phonetic class information, which is used to direct the decoding process. To validate this method, two kinds of landmarks that can be reliably detected are used to direct the decoding process of a segment model (SM) based Mandarin LVCSR (large vocabulary continuous speech recognition) system. Our experiments show that about 30% of decoding time can be saved without an obvious decrease in recognition accuracy, demonstrating the potential of our method.
https://doi.org/10.3745/JIPS.03.0052
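The abstract does not describe the decoder in detail, so the following is only a generic illustration of why boundary landmarks can cut decoding time: in a segmental dynamic-programming search, allowing segment boundaries only at landmark positions shrinks the number of segment hypotheses that must be scored. The cost function (within-segment squared deviation plus a per-segment penalty), the toy one-dimensional features, and all names are assumptions, not the authors' segment model.

```python
import math
from typing import Iterable, List, Optional, Sequence, Tuple


def segment_cost(x: Sequence[float], s: int, t: int, penalty: float = 1.0) -> float:
    """Hypothetical segment score for frames s..t-1: squared deviation from
    the segment mean plus a fixed per-segment penalty (lower is better)."""
    seg = x[s:t]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg) + penalty


def best_segmentation(
    x: Sequence[float],
    max_len: int,
    landmarks: Optional[Iterable[int]] = None,
) -> Tuple[float, List[int], int]:
    """Segmental DP over boundaries 0..T.

    Without landmarks every index may be a boundary; with landmarks only
    0, T and the landmark positions may be.
    Returns (total cost, boundaries, number of segments scored).
    """
    T = len(x)
    allowed = set(range(T + 1)) if landmarks is None else set(landmarks) | {0, T}
    best = [math.inf] * (T + 1)
    back = [-1] * (T + 1)
    best[0] = 0.0
    evaluations = 0
    for t in range(1, T + 1):
        if t not in allowed:
            continue
        for s in range(max(0, t - max_len), t):
            if s not in allowed or best[s] == math.inf:
                continue
            evaluations += 1  # one segment hypothesis scored
            cost = best[s] + segment_cost(x, s, t)
            if cost < best[t]:
                best[t], back[t] = cost, s
    if best[T] == math.inf:
        raise ValueError("no segmentation satisfies max_len with these landmarks")
    bounds = [T]
    while bounds[-1] != 0:  # follow back-pointers to recover boundaries
        bounds.append(back[bounds[-1]])
    return best[T], bounds[::-1], evaluations
```

On the toy input `[0, 0, 0, 5, 5, 5, 0, 0]` with `max_len=8`, restricting boundaries to landmarks at frames 3 and 6 reduces the scored segments from 36 to 6 while returning the same segmentation `[0, 3, 6, 8]`.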

Metrical Foot in Korean Phonology (한국어 음운론의 음보)

Lee Sang-Jik
- MALSORI / no.25_26 / pp.38-51 / 1993
Korean phonology has not recognised the metrical foot as a phonological unit that accounts for certain phonological processes. This paper, however, suggests that an optional h-deletion process in Korean requires the notion of the metrical foot as an independent phonological domain. Previous analyses rely on the notion of speech rate to explain optional h-deletion: i.e., an intervocalic h is deleted in fast speech but remains in slow speech. This paper claims that speech rate should be reinterpreted in terms of the metrical foot: i.e., a foot-internal h is deleted, but a foot-initial h remains. This analysis provides evidence that the metrical foot constitutes a phonological unit in Korean phonology. The notion of the metrical foot enables a more detailed and accurate analysis of the optional h-deletion process in Korean.

Developing a Korean standard speech DB (II) (한국인 표준 음성 DB 구축(II))

Shin, Jiyoung;Kim, KyungWha
- Phonetics and Speech Sciences / v.9 no.2 / pp.9-22 / 2017
The purpose of this paper is to report the whole process of developing the Korean Standard Speech Database (KSS DB). The project was supported by a research grant from the SPO (Supreme Prosecutors' Office) for three years, from 2014 to 2016. KSS DB is designed to provide speech data for acoustic-phonetic and phonological studies and for speaker recognition systems. So that the samples represent spoken Korean, sociolinguistic factors such as region (9 regional dialects), age (5 age groups over 20) and gender (male and female) were considered. The goal of the project was to record over 3,000 male and female speakers of the nine regional dialects and five age groups, using direct and indirect methods. Speech samples from 3,191 speakers (2,829 collected by the direct method and 362 by the indirect method) were collected and compiled into the database. KSS DB is designed to collect both read and spontaneous speech from each speaker through 5 speech tasks: three (pseudo-)spontaneous tasks (producing prolonged simple vowels, 28 blanked sentences, and spontaneous talk) and two read tasks (reading 55 phonetically and phonologically rich sentences and reading three short passages). For each speech task, KSS DB includes a 16-bit, 44.1 kHz speech waveform file and an orthographic file.
https://doi.org/10.13064/KSSS.2017.9.2.009

A Study on Realization of Speech Recognition System based on VoiceXML for Railroad Reservation Service (철도예약서비스를 위한 VoiceXML 기반의 음성인식 구현에 관한 연구)

Kim, Beom-Seung;Kim, Soon-Hyob
- Journal of the Korean Society for Railway / v.14 no.2 / pp.130-136 / 2011
This paper proposes a method for real-time speech recognition using VoiceXML in a SIP-based telephony environment for a railroad reservation service. In this method, a voice signal arriving over the PSTN or the Internet is handled as a dialog using VoiceXML; the transferred voice signal is processed by the speech recognition system, and the output is returned to the VoiceXML dialog, which relays it to the user. The VASR system consists of a dialog server that processes dialogs, an APP server that processes voice signals, and a speech recognition system that performs recognition. To handle voice signals in the telephony environment, the method records the voice signal with the Record tag of VoiceXML and transfers it to the speech recognition system, where it is played back in real time.
https://doi.org/10.7782/JKSR.2011.14.2.130

On a Reduction of Computation Time of FFT Cepstrum (FFT 켑스트럼의 처리시간 단축에 관한 연구)

Jo, Wang-Rae;Kim, Jong-Kuk;Bae, Myung-Jin
- Speech Sciences / v.10 no.2 / pp.57-64 / 2003
Cepstrum coefficients are the most popular feature for speech recognition and speaker recognition. They are also used in speech synthesis and speech coding, but they have the major drawback of long processing time. In this paper, we propose a new method that reduces the processing time of FFT cepstrum analysis. We use normally ordered inputs for the FFT function and bit-reversed inputs for the IFFT function. This allows the bit-reversal process to be omitted, reducing the processing time of FFT cepstrum analysis.
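The idea works because a radix-2 decimation-in-frequency (DIF) FFT takes input in natural order and leaves its output in bit-reversed order, while a decimation-in-time (DIT) IFFT can take bit-reversed input and produce natural-order output. The log-magnitude step in between is applied element by element, so it does not depend on ordering. The sketch below assumes a power-of-two frame length and pairs these two textbook FFT variants to compute the real cepstrum, IDFT(log|DFT(x)|); it is an illustration under those assumptions, not the authors' code.

```python
import cmath
import math
from typing import List, Sequence


def _check_power_of_two(n: int) -> None:
    if n == 0 or n & (n - 1):
        raise ValueError("frame length must be a power of two")


def fft_dif(x: Sequence[complex]) -> List[complex]:
    """Radix-2 decimation-in-frequency FFT: natural-order input,
    output left in bit-reversed order (no reordering pass)."""
    a = [complex(v) for v in x]
    n = len(a)
    _check_power_of_two(n)
    size = n
    while size >= 2:
        half = size // 2
        w_step = cmath.exp(-2j * math.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u, v = a[start + k], a[start + k + half]
                a[start + k] = u + v
                a[start + k + half] = (u - v) * w
                w *= w_step
        size //= 2
    return a


def ifft_dit(a_in: Sequence[complex]) -> List[complex]:
    """Radix-2 decimation-in-time inverse FFT: bit-reversed input,
    natural-order output (no reordering pass)."""
    a = [complex(v) for v in a_in]
    n = len(a)
    _check_power_of_two(n)
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(2j * math.pi / size)  # positive exponent for the inverse
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u, v = a[start + k], a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
        size *= 2
    return [v / n for v in a]


def real_cepstrum(x: Sequence[float]) -> List[float]:
    """Real cepstrum with both bit-reversal passes omitted."""
    spectrum = fft_dif(x)  # bit-reversed order
    # log|X| is element-wise, so it can be applied in bit-reversed order too
    log_mag = [math.log(abs(X) + 1e-12) for X in spectrum]  # epsilon avoids log(0)
    return [c.real for c in ifft_dit(log_mag)]
```

Skipping the two reordering passes saves O(N) work per frame; the O(N log N) butterfly stages of the transforms themselves are unchanged.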

