• Title/Summary/Keyword: Parts of Speech

Speaker Adaptation Using Neural Network in Continuous Speech Recognition (연속 음성에서의 신경회로망을 이용한 화자 적응)

  • 김선일
    • The Journal of the Acoustical Society of Korea / v.19 no.1 / pp.11-15 / 2000
  • Speaker-adaptive continuous speech recognition on the RM speech corpus is described in this paper. Hidden Markov models for the reference speaker are trained on the RM training data, and the RM evaluation data are used for testing. Part of another speaker's RM training data is used for speaker adaptation: after dynamic time warping aligns that speaker's data to the reference data, an error back-propagation neural network is trained to transform the spectrum of the speaker to be recognized into that of the reference speaker. Experimental results on tuning the neural network to obtain the best adaptation are reported. In the best case, recognition after adaptation improves by a factor of 2.1 for word recognition and 4.7 for word accuracy.

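The adaptation pipeline sketched in this abstract (DTW alignment of the new speaker's frames to the reference speaker, then a back-propagation network that maps one speaker's spectra toward the other's) might look roughly like the following. This is an illustrative sketch only, not the paper's implementation: the MFCC-like frame features, network size, and training settings are assumptions.

```python
# Rough sketch: DTW frame alignment followed by a back-propagation spectral
# mapping network, in the spirit of the adaptation described above.
# Feature type, network size, and training settings are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

def dtw_align(src, ref):
    """Align two frame sequences (T x D arrays) and return matched index pairs."""
    ns, nr = len(src), len(ref)
    cost = np.full((ns + 1, nr + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, ns + 1):
        for j in range(1, nr + 1):
            d = np.linalg.norm(src[i - 1] - ref[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack the optimal warping path.
    path, i, j = [], ns, nr
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# src_frames / ref_frames stand in for parallel utterances from the new and
# reference speakers (random placeholders here).
rng = np.random.default_rng(0)
src_frames = rng.normal(size=(120, 13))
ref_frames = rng.normal(size=(100, 13))

pairs = dtw_align(src_frames, ref_frames)
X = np.array([src_frames[i] for i, _ in pairs])
Y = np.array([ref_frames[j] for _, j in pairs])

# Error back-propagation network mapping new-speaker spectra toward the reference.
mapper = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
mapper.fit(X, Y)
adapted = mapper.predict(src_frames)  # transformed frames would feed the reference HMMs
```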

Korean continuous digit speech recognition by multilayer perceptron using KL transformation (KL 변환을 이용한 multilayer perceptron에 의한 한국어 연속 숫자음 인식)

  • 박정선;권장우;권정상;이응혁;홍승홍
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.8 / pp.105-113 / 1996
  • In this paper, a new Korean digit speech recognition technique using a multilayer perceptron (MLP) is proposed. Although the MLP is weak at recognizing dynamic signals, it was adopted for this model because Korean syllables can provide static features, and its simple structure and fast computation suit the suggested system. The MLP's input vectors are transformed using the Karhunen-Loeve transformation (KLT), which compresses the signal successfully without losing its separability, although its physical interpretation changes. Because the suggested technique extracts static features and is not affected by changes in syllable length, it is effective for a Korean numeric recognition system. Using the KLT, computation time and memory are saved without decreasing the classification rate. The proposed feature extraction technique extracts the same number of features from the two corresponding parts of a syllable, the front and the end, building the frames from which features are extracted with windows of a fixed size. It could therefore be applied to continuous speech recognition, which is not easy for a conventional neural network recognition system.

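The Karhunen-Loeve transformation used above to compress the MLP input vectors amounts, in practice, to projecting the features onto the leading eigenvectors of their covariance matrix. The sketch below shows that step only; the feature dimension and the number of retained components are arbitrary illustrative choices, not the paper's settings.

```python
# Minimal sketch of a Karhunen-Loeve transformation (KLT) used to compress
# feature vectors before feeding them to an MLP classifier.
import numpy as np

def klt_fit(features, n_components):
    """Return the mean and the top eigenvectors of the feature covariance."""
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return mean, eigvecs[:, order]

def klt_apply(features, mean, basis):
    """Project features onto the retained KL basis."""
    return (features - mean) @ basis

rng = np.random.default_rng(0)
raw = rng.normal(size=(500, 60))          # stand-in for stacked frame features
mean, basis = klt_fit(raw, n_components=16)
compressed = klt_apply(raw, mean, basis)  # 16-dimensional MLP input vectors
print(compressed.shape)                   # (500, 16)
```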

A Stable Pitch Determination via Dyadic Wavelet Transform (DyWT) (Dyadic Wavelet Transform 방식의 Pitch 주기결정)

  • Kim Namhoon;Yoon Gibum;Ko Hanseok
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.197-200 / 2000
  • This paper presents a time-based pitch determination algorithm (PDA) for reliable estimation of the pitch period (PP) in speech signals. The proposed method uses the Dyadic Wavelet Transform (DyWT) to detect the presence of glottal closure instants (GCI) and uses this information to determine the pitch period; it also uses the periodicity property of the DyWT to detect unsteady GCIs. To evaluate the performance of the proposed method, it is compared with other DyWT-based PDAs. Its effectiveness is tested on real speech signals containing a transition between voiced and unvoiced intervals, where the energy of the voiced signal is unsteady. The results show that the proposed method performs well in estimating both the unsteady GCI positions and the steady parts.

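As a rough picture of the idea above (locate glottal-closure-like events from wavelet coefficients at dyadic scales, then read the pitch period off their spacing), the sketch below uses PyWavelets' continuous wavelet transform at power-of-two scales and simple peak picking. The wavelet choice, scales, and thresholds are assumptions made for the illustration; the paper's actual DyWT/GCI procedure is more elaborate.

```python
# Rough sketch of pitch-period estimation from wavelet maxima at dyadic scales.
import numpy as np
import pywt
from scipy.signal import find_peaks

def estimate_pitch_period(x, fs, scales=(2, 4, 8, 16), wavelet="gaus1"):
    """Estimate the pitch period (seconds) from candidate GCI locations."""
    coefs, _ = pywt.cwt(x, scales=list(scales), wavelet=wavelet)
    # Combine dyadic scales; sharp glottal-closure events show up across scales.
    detail = np.abs(coefs).mean(axis=0)
    peaks, _ = find_peaks(detail, height=0.3 * detail.max(),
                          distance=int(0.002 * fs))   # candidates >= 2 ms apart
    if len(peaks) < 2:
        return None
    intervals = np.diff(peaks) / fs
    return float(np.median(intervals))                # robust to occasional misses

# Synthetic voiced-like signal: a ~150 Hz pulse train through a decaying filter.
fs = 16000
t = np.arange(int(0.05 * fs))
x = np.zeros_like(t, dtype=float)
x[::int(fs / 150)] = 1.0
x = np.convolve(x, np.exp(-t[:80] / 20.0), mode="same")

period = estimate_pitch_period(x, fs)
print(None if period is None else f"estimated F0 ~ {1.0 / period:.1f} Hz")
```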

Discriminative Feature Vector Selection for Emotion Classification Based on Speech (음성신호기반의 감정분석을 위한 특징벡터 선택)

  • Choi, Ha-Na;Byun, Sung-Woo;Lee, Seok-Pil
    • The Transactions of The Korean Institute of Electrical Engineers / v.64 no.9 / pp.1363-1368 / 2015
  • Recently, computers have become smaller thanks to advances in computing technology, and many wearable devices have appeared. Recognition of human emotion by computers has therefore become important, and research on analyzing emotional states is increasing. The human voice carries much information about emotion. This paper proposes a discriminative feature vector selection method for speech-based emotion classification. Feature vectors such as pitch, MFCC, LPC, and LPCC are extracted from voice signals divided into four emotion classes (happy, normal, sad, angry), and the separability of the extracted feature vectors is compared using the Bhattacharyya distance. The more effective feature vectors are then recommended for emotion classification.
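
As an illustration of ranking features by class separability with the Bhattacharyya distance, the sketch below scores each scalar feature by the Gaussian Bhattacharyya distance between two emotion classes and orders the features accordingly. The feature values and labels are synthetic stand-ins; the paper compares real pitch/MFCC/LPC/LPCC features over four emotions.

```python
# Illustrative sketch: rank scalar features by the Bhattacharyya distance between
# two emotion classes, assuming Gaussian class-conditional distributions.
import numpy as np

def bhattacharyya_gauss(a, b):
    """Bhattacharyya distance between two 1-D samples under Gaussian assumptions."""
    m1, v1 = a.mean(), a.var() + 1e-12
    m2, v2 = b.mean(), b.var() + 1e-12
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2))))

rng = np.random.default_rng(0)
# Rows: utterances, columns: candidate features (synthetic placeholders).
happy = rng.normal(loc=[2.0, 0.0, 1.0], scale=1.0, size=(200, 3))
sad   = rng.normal(loc=[0.0, 0.0, 0.8], scale=1.0, size=(200, 3))

scores = [bhattacharyya_gauss(happy[:, j], sad[:, j]) for j in range(happy.shape[1])]
ranking = np.argsort(scores)[::-1]   # most separable feature first
print("features ranked by separability:", ranking, np.round(scores, 3))
```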

The Analysis of Intonational Meaning Based on the English Intonational Phonology (영어 억양음운론에 의한 영어 억양 의미 분석)

  • Kim, Kee-Ho
    • Speech Sciences / v.7 no.3 / pp.109-125 / 2000
  • The purpose of this paper is to analyse the intonational meaning of various sentences based on English Intonational Phonology, and to show the superiority of Intonational Phonology in explaining intonational meaning compared with other existing intonational theories. The American structuralist and British schools, which attempt to describe intonation in terms of 'levels' and 'configurations' respectively, analyse intonational meaning from a holistic perspective in which an utterance cannot be divided into smaller parts. Intonational Phonology, on the other hand, treats English intonation as composed of a series of High and Low tones, so that intonational meaning is interpreted compositionally from sets of H and L. This paper discusses the phonological relations between intonation and its meaning in terms of the pitch accents, phrase accents, and boundary tones that make up an intonational tune.

A Learning Method of French Prosodic Rhythm for Korean Speakers using CSL (CSL를 이용한 한국인의 프랑스어 운율학습 방안)

  • Lee, E.Y.;Lee, M.K.;Lee, J.H.
    • Speech Sciences / v.6 / pp.83-101 / 1999
  • The aim of this study is to provide a learning method of prosodic rhythm that helps Taegu North Kyungsang Korean speakers learn French rhythm more effectively. The rhythmic properties of spoken French and of the Taegu North Kyungsang Korean dialect differ from each other, so we provide a basic rhythmic model of the two languages divided into three parts: syllable, rhythmic unit and accent, and intonation. To do so, we recorded French spoken by Taegu Kyungsang Korean speakers and then analysed and compared the rhythmic properties of Korean and French with a spectrograph. We identified rhythmic mistakes in their French pronunciation and established a learning model to correct them. After training with the CSL Macro learning model, we observed the output. However, even when learners understand the proposed method, an effective procedure based on repeated practice must still be arranged before it can be used in direct verbal communication within a well-developed learning programme. Hence, this study may play an important preparatory role in designing an effective rhythm learning programme.

A Quantitative Study for the Distribution of Korean Phonemes in the two parts: The Ox and Waiting for Godot (한국어 음소분포에 대한 계량언어학적 연구 - "소"와 "고도를 기다리며"를 중심으로 -)

  • Bae, Hee-Sook;Koo, Dong-Ook;Yun, Young-Sun;Oh, Yung-Hwan
    • Speech Sciences / v.7 no.4 / pp.27-40 / 2000
  • The goal of quantitative linguistics is to show the quantitative behavior of linguistic units. Several studies have examined the frequency of Korean phonemes, which is important for understanding the internal workings of these linguistic units. However, frequency information taken from the purely phonological level, without any consideration of the rhythmic group, cannot adequately represent linguistic phenomena. To provide effective information, the phonological transcription must therefore be carried out at the level of the rhythmic group. In this paper, we made such a transcription to analyze Korean phonology. We did not merely investigate phoneme frequencies, but also examined whether the distribution of Korean phonemes follows a binomial distribution within linguistic constraints.

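As a toy illustration of the kind of check mentioned above (does a phoneme's frequency per rhythmic group look binomial?), the sketch below counts how often one phoneme occurs in each fixed-length group and compares the observed counts with a fitted binomial distribution via a chi-square goodness-of-fit test. The "corpus" here is invented; the paper works with real rhythmic-group transcriptions of the two Korean texts.

```python
# Toy sketch: test whether a phoneme's per-group occurrence counts are consistent
# with a binomial distribution. The data below are invented stand-ins.
from collections import Counter
import numpy as np
from scipy.stats import binom, chisquare

rng = np.random.default_rng(0)
group_len = 10                        # phonemes per rhythmic group (illustrative)
p_target = 0.12                       # underlying probability of the target phoneme
groups = rng.random((2000, group_len)) < p_target    # True where the phoneme occurs

counts = Counter(groups.sum(axis=1))                 # occurrences per group
ks = np.arange(group_len + 1)
observed = np.array([counts.get(int(k), 0) for k in ks], dtype=float)

p_hat = groups.mean()                                # ML estimate of the phoneme probability
expected = binom.pmf(ks, group_len, p_hat) * observed.sum()

# Pool sparse tail bins so the chi-square approximation stays sensible.
keep = expected >= 5
obs = np.append(observed[keep], observed[~keep].sum())
exp = np.append(expected[keep], expected[~keep].sum())
stat, pval = chisquare(obs, f_exp=exp, ddof=1)       # one parameter (p_hat) estimated
print(f"chi2 = {stat:.2f}, p = {pval:.3f}")
```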

Channel Compensation technique using silence cepstral mean subtraction (묵음 구간의 평균 켑스트럼 차감법을 이용한 채널 보상 기법)

  • Woo, Seung-Ok;Yun, Young-Sun
    • Proceedings of the KSPS conference / 2005.04a / pp.49-52 / 2005
  • Cepstral mean subtraction (CMS) compensates effectively for channel distortion, but it has shortcomings such as distortion of the feature parameters and the need to wait for the whole utterance. Assuming that the silence parts carry the channel characteristics, we consider channel normalization that subtracts cepstral means obtained only from the silence regions. If this technique compensates for the channel successfully, the proposed method can be used in real-time processing environments or other time-critical applications. In the experimental results, however, the performance of our method does not match that of the CMS technique. From an analysis of the results, we see the potential of the proposed method and will look for techniques that reduce the gap between CMS and our method.

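A minimal sketch of the idea above (estimate the channel cepstrum from silence frames only and subtract it from every frame) might look like the following. The MFCC front end and the energy-quantile silence detector are assumptions made for the illustration, not the paper's configuration.

```python
# Minimal sketch of silence-based cepstral mean subtraction: the cepstral mean is
# estimated only over low-energy (assumed silence) frames and subtracted from all frames.
import numpy as np
import librosa

def silence_cms(y, sr, silence_quantile=0.2):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T     # frames x coefficients
    energy = librosa.feature.rms(y=y)[0]
    n = min(len(mfcc), len(energy))                          # align frame counts
    mfcc, energy = mfcc[:n], energy[:n]
    silence = energy <= np.quantile(energy, silence_quantile)
    if not silence.any():                                    # fall back to plain CMS
        return mfcc - mfcc.mean(axis=0)
    channel_mean = mfcc[silence].mean(axis=0)                # channel estimate from silence
    return mfcc - channel_mean

y, sr = librosa.load(librosa.example("trumpet"), sr=16000)   # stand-in audio
compensated = silence_cms(y, sr)
print(compensated.shape)
```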

Spatial Speaker Localization for a Humanoid Robot Using TDOA-based Feature Matrix (도착시간지연 특성행렬을 이용한 휴머노이드 로봇의 공간 화자 위치측정)

  • Kim, Jin-Sung;Kim, Ui-Hyun;Kim, Do-Ik;You, Bum-Jae
    • The Journal of Korea Robotics Society / v.3 no.3 / pp.237-244 / 2008
  • Nowadays, research on human-robot interaction has been receiving increasing attention, and speech signal processing in particular is the source of much interest. In this paper, we report a speaker localization system with six microphones for MAHRU, a humanoid robot from KIST, and propose a time delay of arrival (TDOA)-based feature matrix together with an algorithm based on the minimum sum of absolute errors (MSAE) for sound source localization. The TDOA-based feature matrix is defined as a simple database matrix calculated from pairs of microphones installed on the humanoid robot. Using this matrix and the MSAE-based algorithm, the proposed method localizes a sound source without having to solve approximate nonlinear equations. To verify the solid performance of our speaker localization system, we present experimental results for speech sources in all directions within a 5 m distance and at heights divided into three parts.

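The lookup idea described above (precompute a matrix of expected pair-wise delays for candidate directions, then pick the direction whose row best matches the measured TDOAs under a minimum-sum-of-absolute-errors criterion) can be sketched as follows. The microphone geometry, azimuth grid, and plain cross-correlation delay estimator are illustrative assumptions, not the MAHRU configuration.

```python
# Sketch of TDOA-feature-matrix localization with an MSAE match.
import numpy as np

C = 343.0      # speed of sound (m/s)
FS = 48000     # sampling rate (Hz); higher rate keeps integer delays distinguishable

# Assumed planar ring of 6 microphones (metres) around the robot's head.
mics = np.array([[0.1 * np.cos(a), 0.1 * np.sin(a)]
                 for a in np.linspace(0, 2 * np.pi, 6, endpoint=False)])
pairs = [(i, j) for i in range(6) for j in range(i + 1, 6)]
azimuths = np.deg2rad(np.arange(0, 360, 5))          # candidate directions

def arrival_delays(theta):
    """Far-field arrival delay (samples) at each mic; closer mics hear earlier."""
    direction = np.array([np.cos(theta), np.sin(theta)])
    return -(mics @ direction) / C * FS

def expected_tdoas(theta):
    d = arrival_delays(theta)
    return np.array([d[i] - d[j] for i, j in pairs])

# Precomputed TDOA-based feature matrix: one row of pair-wise delays per direction.
feature_matrix = np.array([expected_tdoas(t) for t in azimuths])

def measure_tdoa(x_i, x_j):
    """Delay (samples) of x_i relative to x_j from plain cross-correlation."""
    corr = np.correlate(x_i, x_j, mode="full")
    return np.argmax(corr) - (len(x_j) - 1)

def localize(signals):
    measured = np.array([measure_tdoa(signals[i], signals[j]) for i, j in pairs])
    errors = np.abs(feature_matrix - measured).sum(axis=1)   # MSAE over all pairs
    return np.degrees(azimuths[np.argmin(errors)])

# Synthetic test: a noise burst arriving from 60 degrees.
rng = np.random.default_rng(0)
src = rng.normal(size=2048)
signals = [np.roll(src, int(round(d))) for d in arrival_delays(np.deg2rad(60.0))]
print("estimated azimuth:", localize(signals))
```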

Japanese Vowel Sound Classification Using Fuzzy Inference System

  • Phitakwinai, Suwannee;Sawada, Hideyuki;Auephanwiriyakul, Sansanee;Theera-Umpon, Nipon
    • Journal of the Korea Convergence Society / v.5 no.1 / pp.35-41 / 2014
  • Automatic speech recognition is a popular research problem, and many research groups work in this field for different languages, including Japanese. Japanese vowel recognition is an important part of a Japanese speech recognition system. In this research, a vowel classification system based on the Mamdani fuzzy inference system was developed. We tested the system on a blind test data set collected from one male native Japanese speaker and four male non-native Japanese speakers; none of the subjects in the blind test set appeared in the training set. The classification rate on the training data set is 95.0%. In the speaker-independent experiments, the classification rate for the native speaker is around 70.0%, whereas that for the non-native speakers is around 80.5%.
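
As a rough picture of Mamdani-style fuzzy inference applied to vowel classification, the sketch below scores the five Japanese vowels from two formant-like inputs with triangular membership functions, min for rule firing (AND), and max aggregation, then simply picks the class with the highest aggregated firing strength instead of defuzzifying a consequent. The membership ranges and the use of F1/F2 as inputs are assumptions made for the illustration, not the paper's feature set.

```python
# Rough sketch of Mamdani-style fuzzy scoring of the five Japanese vowels from two
# formant-like inputs (F1, F2 in Hz). Ranges and rules are illustrative assumptions.
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))

# Linguistic terms for each input (rough typical formant regions, assumed).
F1_TERMS = {"low": (150, 300, 500), "mid": (350, 550, 750), "high": (600, 800, 1100)}
F2_TERMS = {"back": (500, 900, 1400), "central": (1000, 1400, 1900), "front": (1600, 2300, 3000)}

# Mamdani-style rules: IF F1 is <term> AND F2 is <term> THEN vowel is <label>.
RULES = [
    ("high", "central", "a"),
    ("low",  "front",   "i"),
    ("low",  "central", "u"),
    ("mid",  "front",   "e"),
    ("mid",  "back",    "o"),
]

def classify_vowel(f1, f2):
    scores = {}
    for f1_term, f2_term, vowel in RULES:
        firing = min(tri(f1, *F1_TERMS[f1_term]), tri(f2, *F2_TERMS[f2_term]))  # AND = min
        scores[vowel] = max(scores.get(vowel, 0.0), firing)                     # aggregate = max
    return max(scores, key=scores.get), scores

label, scores = classify_vowel(f1=300, f2=2200)   # formants roughly like /i/
print(label, {k: round(v, 2) for k, v in scores.items()})
```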