• Title/Summary/Keyword: Zero-crossing rate


A Study on the Improvement of DTW with Speech Silence Detection (음성의 묵음구간 검출을 통한 DTW의 성능개선에 관한 연구)

  • Kim, Jong-Kuk;Jo, Wang-Rae;Bae, Myung-Jin
    • Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.117-124
    • /
    • 2003
  • Speaker recognition is the technology that confirms a speaker's identity from the characteristics of speech. It is classified into speaker identification, which discriminates a speaker from a preregistered group, and speaker verification, which verifies the identity that a speaker claims. Because this approach extracts speaker information from speech to confirm individual identity, it has become one of the most efficient technologies as services over the telephone network become popular. Several problems, however, must be solved for real application. First, a safe method is needed to reject impostors, since recognition must be granted only to preregistered customers. Second, the characteristics of speech change over time, which severely degrades the recognition rate and inconveniences users as the number of required utterances increases. Last, characteristics common among speakers cause incorrect recognition results. Silence parts included within the speech also decrease the identification rate. In this paper, we propose improving the identification rate by removing silence parts before running the identification algorithm. Speech regions are detected with the zero crossing rate and signal energy, which locate the start and end points of the speech, and the DTW algorithm is then applied. As a result, the proposed method achieves about a 3% improvement in recognition rate compared with the conventional method.

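The paper itself includes no code; as a rough illustration of the kind of energy/ZCR endpoint detector the abstract describes, here is a minimal NumPy sketch. The frame size, hop, and the 3-sigma noise-calibrated thresholds are assumptions, not the authors' values.

```python
import numpy as np

FRAME_LEN, HOP = 256, 128  # assumed analysis frame; the paper does not specify

def frames_of(x):
    """Slice a 1-D signal into overlapping frames."""
    n = 1 + max(0, (len(x) - FRAME_LEN) // HOP)
    return np.stack([x[i * HOP : i * HOP + FRAME_LEN] for i in range(n)])

def short_time_energy(f):
    return np.sum(f.astype(float) ** 2, axis=1)

def zero_crossing_rate(f):
    """Fraction of adjacent-sample sign changes in each frame."""
    return np.mean(np.signbit(f[:, 1:]) != np.signbit(f[:, :-1]), axis=1)

def trim_silence(x, noise_frames=10):
    """Drop leading/trailing silence before the DTW comparison."""
    f = frames_of(x)
    e, z = short_time_energy(f), zero_crossing_rate(f)
    # calibrate thresholds on the leading frames, assumed to be background noise
    e_thr = e[:noise_frames].mean() + 3 * e[:noise_frames].std()
    z_thr = z[:noise_frames].mean() + 3 * z[:noise_frames].std()
    # energy flags voiced speech; a high ZCR additionally catches weak fricatives
    speech = (e > e_thr) | (z > z_thr)
    if not speech.any():
        return x
    first, last = np.flatnonzero(speech)[[0, -1]]
    return x[first * HOP : last * HOP + FRAME_LEN]
```

DTW would then be run on features of the trimmed utterances only, which is where the reported gain over matching the full, silence-bearing signals comes from.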

Optical Character Recognition for Hindi Language Using a Neural-network Approach

  • Yadav, Divakar;Sanchez-Cuadrado, Sonia;Morato, Jorge
    • Journal of Information Processing Systems
    • /
    • v.9 no.1
    • /
    • pp.117-140
    • /
    • 2013
  • Hindi is the most widely spoken language in India, with more than 300 million speakers. Because texts written in Hindi do not separate characters the way English does, Optical Character Recognition (OCR) systems developed for the Hindi language have a very poor recognition rate. In this paper we propose an OCR for printed Hindi text in Devanagari script using an Artificial Neural Network (ANN), which improves recognition efficiency. One of the major reasons for the poor recognition rate is error in character segmentation. The presence of touching characters in the scanned documents further complicates the segmentation process, creating a major problem when designing an effective character segmentation technique. Preprocessing, character segmentation, feature extraction, and finally classification and recognition are the major steps followed by a general OCR. The preprocessing tasks considered in the paper are conversion of grayscale images to binary images, image rectification, and segmentation of the document's textual contents into paragraphs, lines, words, and then basic symbols. The basic symbols, obtained as the fundamental unit from the segmentation process, are recognized by the neural classifier. In this work, three feature extraction techniques have been used to improve the rate of recognition: histogram of projection based on mean distance, histogram of projection based on pixel value, and vertical zero crossing. These feature extraction techniques are powerful enough to extract features of even distorted characters/symbols. For the neural classifier, a back-propagation neural network with two hidden layers is used. The classifier is trained and tested on printed Hindi texts. A correct recognition rate of approximately 90% is achieved.
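As an illustration of two of the features named above, the sketch below computes a projection histogram and the vertical zero-crossing counts for a binary glyph image; the mean-distance variant and the network itself are omitted, and the array conventions are assumptions.

```python
import numpy as np

def projection_histogram(glyph, axis=0):
    """Ink-pixel counts: axis=0 sums down each column (vertical
    projection), axis=1 sums across each row (horizontal projection)."""
    return glyph.sum(axis=axis)

def vertical_zero_crossings(glyph):
    """Ink/background transitions down each column of a binary
    glyph (1 = ink), a coarse descriptor of stroke structure."""
    transitions = np.abs(np.diff(glyph.astype(int), axis=0))
    return transitions.sum(axis=0)  # one count per column

# a glyph feature vector could concatenate such descriptors, e.g.:
# features = np.concatenate([projection_histogram(g, 0),
#                            projection_histogram(g, 1),
#                            vertical_zero_crossings(g)])
```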

Estimation of Concrete Strength Based on Artificial Intelligence Techniques (인공지능 기법에 의한 콘크리트 강도 추정)

  • 김세동;신동환;이영석;노승용;김성환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.7
    • /
    • pp.101-111
    • /
    • 1999
  • This paper presents a concrete pattern recognition method that identifies the strength of concrete by evidence accumulation over multiple parameters, based on artificial intelligence techniques. First, variance (VAR), zero-crossing rate (ZCR), mean frequency (MEANF), autoregressive model coefficients (ARC), and linear cepstrum coefficients (LCC) are extracted as feature parameters from the ultrasonic signal of the concrete. Pattern recognition is carried out through an evidence accumulation procedure using distances measured against reference parameters. A fuzzy mapping function is designed to transform the distances for the evidence accumulation method. Results (a 92% successful pattern recognition rate) support the feasibility of the suggested approach for concrete pattern recognition.

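A minimal sketch of the feature extraction and the distance-to-evidence step, assuming a NumPy signal and a Gaussian-shaped fuzzy mapping; the abstract names the parameters but not their formulas, so the ARC and LCC features are omitted here.

```python
import numpy as np

def ultrasonic_features(x, fs):
    """VAR, ZCR, and mean frequency of one ultrasonic record."""
    var = np.var(x)
    zcr = np.mean(np.signbit(x[1:]) != np.signbit(x[:-1]))
    psd = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(len(x), d=1 / fs)
    meanf = np.sum(f * psd) / np.sum(psd)  # power-weighted mean frequency
    return np.array([var, zcr, meanf])

def evidence(distance, spread=1.0):
    """Fuzzy mapping from a distance to [0, 1] evidence; the Gaussian
    form is an assumption, the paper only says such a mapping is designed."""
    return np.exp(-(distance / spread) ** 2)
```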

The Comparison of Sensitivity of Numerical Parameters for Quantification of Electromyographic (EMG) Signal (근전도의 정량적 분석시 사용되는 수리적 파라미터의 민감도 비교)

  • Kim, Jung-Yong;Jung, Myung-Chul
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.25 no.3
    • /
    • pp.330-335
    • /
    • 1999
  • The goal of the study is to determine the most sensitive parameter for representing the degree of muscle force and fatigue. Various numerical parameters, such as the first coefficient of an Autoregressive (AR) model, Root Mean Square (RMS), Zero Crossing Rate (ZCR), Mean Power Frequency (MPF), and Median Frequency (MF), were tested in this study. Ten healthy male subjects participated in the experiment. They were asked to extend their trunk by using the right and left erector spinae muscles during a sustained isometric contraction for twenty seconds. The force levels were 15%, 30%, 45%, 60%, and 75% of Maximal Voluntary Contraction (MVC), and the order of trials was randomized. The results showed that RMS was the best parameter for measuring the force level of the muscle, and that the first coefficient of the AR model was a relatively sensitive parameter for fatigue measurement below the 60% MVC condition. At 75% MVC, however, both MPF and the first coefficient of the AR model showed the best performance in quantifying muscle fatigue. Therefore, the sensitivity of measurement can be improved by properly selecting the parameter based upon the level of force during a sustained isometric contraction.

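For concreteness, a sketch of the tested parameters on one analysis window; the AR(1) coefficient uses the lag-1 Yule-Walker estimate, and the window length and sign conventions are assumptions.

```python
import numpy as np

def emg_parameters(x, fs):
    """RMS, ZCR, MPF, MF, and the first AR coefficient of an EMG window."""
    rms = np.sqrt(np.mean(x ** 2))                 # amplitude proxy for force
    zcr = np.mean(np.signbit(x[1:]) != np.signbit(x[:-1]))
    psd = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(len(x), 1 / fs)
    mpf = np.sum(f * psd) / np.sum(psd)            # mean power frequency
    cum = np.cumsum(psd)
    mf = f[np.searchsorted(cum, cum[-1] / 2)]      # median frequency
    ar1 = np.dot(x[:-1], x[1:]) / np.dot(x, x)     # lag-1 Yule-Walker AR estimate
    return {"rms": rms, "zcr": zcr, "mpf": mpf, "mf": mf, "ar1": ar1}
```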

A Study on Isolated Words Speech Recognition in a Running Automobile (주행중인 자동차 환경에서의 고립단어 음성인식 연구)

  • 유봉근
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.06e
    • /
    • pp.381-384
    • /
    • 1998
  • This paper enables hands-free speech input and output at all times in a running automobile, without auxiliary switch operation, to secure both driver safety and convenience. To obtain noise-robust thresholds, the reference energy and zero crossing rate are updated at regular intervals, and end point detection is performed automatically and accurately in real time in two stages using a bandpass filter. DMS (Dynamic Multi-Section) is used for the reference patterns, and the use of two models is proposed to improve speaker discrimination. To remain robust to the noise of a moving vehicle, driving conditions are divided into normal driving (below 80 km/h) and high-speed driving (above 80 km/h), and the appropriate model is selected automatically according to the varying vehicle noise level. 13th-order PLP coefficients are used as the speech feature vector, and One-Stage Dynamic Programming (OSDP) as the recognition algorithm. In experiments with 33 frequently used vehicle convenience-control commands, recognition rates of 89.75% (speaker-independent) and 90.08% (speaker-dependent) were obtained on the Jungbu and Yeongdong expressways (above 80 km/h), and 92.29% (speaker-independent) and 92.42% (speaker-dependent) on the Gyeongbu expressway. In low-speed driving environments (below 80 km/h, on cement and asphalt roads in and around Seoul), recognition rates of 92.89% (speaker-independent) and 94.44% (speaker-dependent) were obtained.

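The abstract describes refreshing the reference energy and ZCR at fixed intervals so the endpoint detector stays robust to changing road noise; below is a minimal sketch with an assumed exponential-averaging update, since the paper does not give the rule.

```python
class AdaptiveThreshold:
    """Tracks the background level of a statistic (energy or ZCR) and
    derives a detection threshold from it; alpha and margin are assumptions."""

    def __init__(self, alpha=0.95, margin=3.0):
        self.alpha, self.margin = alpha, margin
        self.background = None

    def update(self, value):
        """Feed one frame's statistic; returns the current threshold."""
        if self.background is None:
            self.background = value
        else:  # slow exponential average follows the changing road noise
            self.background = self.alpha * self.background + (1 - self.alpha) * value
        return self.margin * self.background
```

One tracker per statistic and per driving mode would mirror the normal/high-speed split the abstract mentions.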

A Study on the Speech Recognition of Korean Phonemes Using Recurrent Neural Network Models (순환 신경망 모델을 이용한 한국어 음소의 음성인식에 대한 연구)

  • 김기석;황희영
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • v.40 no.8
    • /
    • pp.782-791
    • /
    • 1991
  • In the fields of pattern recognition such as speech recognition, several new techniques using Artificial Neural Network models have been proposed and implemented. In particular, the Multilayer Perceptron model has been shown to be effective in static speech pattern recognition. But speech has dynamic, temporal characteristics, and the most important point in implementing speech recognition systems using Artificial Neural Network models for continuous speech is the learning of dynamic characteristics and of the distributed cues and contextual effects that result from temporal characteristics. The Recurrent Multilayer Perceptron model, however, is known to be able to learn sequences of patterns. In this paper, the results of applying the recurrent model, which can learn the temporal characteristics of speech, to phoneme recognition are presented. The test data consist of 144 vowel + consonant + vowel speech chains made up of 4 Korean monophthongs and 9 Korean plosive consonants. The input parameters of the Artificial Neural Network model used are the FFT coefficients, residual error, and zero crossing rates. The baseline model showed a recognition rate of 91% for vowels and 71% for plosive consonants of one male speaker. We obtained better recognition rates in various other experiments compared to the existing multilayer perceptron model, showing the recurrent model to be better suited to speech recognition. The possibility of using recurrent models for speech recognition was further explored by changing the configuration of this baseline model.
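A minimal NumPy sketch of a recurrent (Elman-style) layer of the kind the paper studies, with an assumed softmax readout over phoneme classes; the layer sizes and initialization are illustrative, not the authors' configuration.

```python
import numpy as np

class ElmanRNN:
    """One recurrent hidden layer whose state feeds back each frame,
    letting the network pick up the temporal cues the paper targets."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.Wxh = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.Whh = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        self.Who = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.h = np.zeros(n_hidden)

    def step(self, x):
        """x: one frame's features (e.g. FFT coefficients, residual
        error, ZCR); returns a phoneme posterior for that frame."""
        self.h = np.tanh(self.Wxh @ x + self.Whh @ self.h)
        o = self.Who @ self.h
        e = np.exp(o - o.max())
        return e / e.sum()
```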

A Digital Audio Watermark Using Wavelet Transform and Masking Effect (웨이브릿과 마스킹 효과를 이용한 디지털 오디오 워터마킹)

  • Hwang, Won-Young;Kang, Hwan-Il;Han, Seung-Soo;Kim, Kab-Il;Kang, Hwan-Soo
    • Proceedings of the IEEK Conference
    • /
    • 2003.11b
    • /
    • pp.243-246
    • /
    • 2003
  • In this paper, we propose a new digital audio watermarking technique based on the wavelet transform. The watermark is embedded by eliminating unnecessary information from the audio signal according to the human auditory system (HAS). The algorithm is a blind audio watermarking method that does not require any original audio information in the watermark extraction process. The masking effect used for audio watermarking here is the temporal post-masking effect. We construct a window with the synchronization signal and extract the best frame in the window by using the zero-crossing rate (ZCR) and the energy of the audio signal. The watermark may be extracted by using the correlation of the watermark signal with that portion of the frame. Experimental results show good robustness against MPEG-1 Layer 3 compression and other common signal processing manipulations. All the attacks are made after D/A/D conversion.

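A loose sketch of the frame-selection and correlation-detection steps; the scoring rule combining ZCR and energy, and the correlation threshold, are assumptions, since the abstract only states that both quantities guide the choice.

```python
import numpy as np

def best_frame(x, frame_len=1024):
    """Pick a frame with high energy and low ZCR inside the
    synchronization window (combination rule assumed)."""
    frames = x[: len(x) // frame_len * frame_len].reshape(-1, frame_len)
    e = np.sum(frames ** 2, axis=1)
    z = np.mean(np.signbit(frames[:, 1:]) != np.signbit(frames[:, :-1]), axis=1)
    score = e / e.max() - z / z.max()
    return frames[np.argmax(score)]

def watermark_present(frame, wm, thresh=0.1):
    """Blind detection: normalized correlation against the watermark."""
    seg = frame[: len(wm)]
    c = np.dot(seg, wm) / (np.linalg.norm(seg) * np.linalg.norm(wm))
    return c > thresh
```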

Sound System Analysis for Health Smart Home

  • CASTELLI Eric;ISTRATE Dan;NGUYEN Cong-Phuong
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.237-243
    • /
    • 2004
  • A multichannel smart sound sensor capable of detecting and identifying sound events in noisy conditions is presented in this paper. Sound information extraction is a complex task, and the main difficulty lies in extracting high-level information from a one-dimensional signal. The input of the smart sound sensor is composed of data collected by 5 microphones, and its output data is sent through a network. For real-time operation, the sound analysis is divided into three steps: sound event detection for each sound channel, fusion between simultaneous events, and sound identification. The event detection module finds impulsive signals in the noise and extracts them from the signal flow. Our smart sensor must be able to identify not only impulsive signals but also the presence of speech in a noisy environment. The classification module is launched as a parallel task on the channel chosen by the data fusion process. It identifies the sound event among seven predefined sound classes using a Gaussian Mixture Model (GMM) method. Mel Frequency Cepstral Coefficients are used in combination with newer features such as zero crossing rate, centroid, and roll-off point. This smart sound sensor is part of a medical telemonitoring project aimed at detecting serious accidents.

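The named descriptors are standard; here is a sketch of their per-frame computation (the Hann window and the 95% roll-off fraction are assumptions).

```python
import numpy as np

def frame_descriptors(frame, fs, rolloff=0.95):
    """ZCR, spectral centroid, and roll-off point of one frame."""
    zcr = np.mean(np.signbit(frame[1:]) != np.signbit(frame[:-1]))
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    f = np.fft.rfftfreq(len(frame), 1 / fs)
    centroid = np.sum(f * spec) / np.sum(spec)          # spectral "center of mass"
    cum = np.cumsum(spec)
    roll = f[np.searchsorted(cum, rolloff * cum[-1])]   # 95%-energy frequency
    return zcr, centroid, roll
```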

Speaker-Dependent Emotion Recognition For Audio Document Indexing

  • Hung LE Xuan;QUENOT Georges;CASTELLI Eric
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.92-96
    • /
    • 2004
  • Research on emotion is currently of great interest in speech processing as well as in the human-machine interaction domain. In recent years, more and more research relating to emotion synthesis or emotion recognition has been carried out for different purposes. Each approach uses its own methods and various parameters measured on the speech signal. In this paper, we propose using a short-time parameter, MFCC coefficients (Mel-Frequency Cepstrum Coefficients), and a simple but efficient classification method, Vector Quantization (VQ), for speaker-dependent emotion recognition. Many other features (energy, pitch, zero crossing, phonetic rate, LPC, ...) and their derivatives are also tested and combined with MFCC coefficients in order to find the best combination. Other models, GMM and HMM (discrete and continuous Hidden Markov Models), are studied as well, in the hope that the use of continuous distributions and the temporal behaviour of this set of features will improve the quality of emotion recognition. The maximum accuracy in recognizing five different emotions exceeds 88% using only MFCC coefficients with the VQ model. This is a simple but efficient approach, and the result is even much better than that obtained on the same database by human evaluation, listening and judging without replay or comparison between sentences [8]; the result is also positively comparable with other approaches.

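A minimal sketch of the VQ model: one k-means codebook per emotion, and classification by the lowest average quantization distortion over an utterance's MFCC frames. The codebook size and training layout are assumptions.

```python
import numpy as np

def train_codebook(frames, k=32, iters=20, seed=0):
    """Plain k-means codebook over one emotion's MFCC frames."""
    rng = np.random.default_rng(seed)
    code = frames[rng.choice(len(frames), k, replace=False)].copy()
    for _ in range(iters):
        d = ((frames[:, None, :] - code[None]) ** 2).sum(-1)
        label = d.argmin(1)
        for j in range(k):
            if np.any(label == j):
                code[j] = frames[label == j].mean(0)
    return code

def classify(mfccs, codebooks):
    """codebooks: dict emotion -> codebook; returns the emotion whose
    codebook quantizes the utterance with the least distortion."""
    def distortion(code):
        d = ((mfccs[:, None, :] - code[None]) ** 2).sum(-1)
        return d.min(1).mean()
    return min(codebooks, key=lambda emo: distortion(codebooks[emo]))
```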

Speech Enhancement Based on Voice/Unvoice Classification (유성음/무성음 분리를 이용한 잡음처리)

  • 유창동
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.4
    • /
    • pp.374-379
    • /
    • 2002
  • In this paper, a novel method for reducing noise using voiced/unvoiced classification is proposed. Voicing is an important characteristic of speech, and the proposed method processes the noisy speech differently for the voiced and unvoiced parts. Speech is classified into voiced and unvoiced regions using the zero-crossing rate and energy, and a modified speech/noise dominance decision based on this classification is proposed. The proposed method was tested under white noise and airplane noise conditions; judging by segmental SNR comparisons with the existing method and by listening to the enhanced speech, the performance of the proposed method was superior to that of the existing method.
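A minimal sketch of the voiced/unvoiced split via energy and ZCR (voiced speech tends to be high-energy with low ZCR, unvoiced the reverse); the median-based thresholds are illustrative, not the paper's.

```python
import numpy as np

def classify_frames(frames):
    """frames: 2-D array (n_frames, frame_len); returns 'voiced' or
    'unvoiced' per frame so each part can be denoised differently."""
    e = np.mean(frames ** 2, axis=1)
    z = np.mean(np.signbit(frames[:, 1:]) != np.signbit(frames[:, :-1]), axis=1)
    voiced = (e > np.median(e)) & (z < np.median(z))
    return np.where(voiced, "voiced", "unvoiced")
```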