• Title/Summary/Keyword: clear speech

115 search results

Noise Robust Speech Recognition Based on Parallel Model Combination Adaptation Using Frequency-Variant (주파수 변이를 이용한 Parallel Model Combination 모델 적응에 기반한 잡음에 강한 음성인식)

  • Choi, Sook-Nam;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.32 no.3
    • /
    • pp.252-261
    • /
    • 2013
  • A typical speech recognition system performs well in a quiet environment, but its performance declines sharply in real environments where noise is present. To implement a speech recognizer that is robust across different acoustic settings, this study proposes a method of Parallel Model Combination adaptation using frequency variants based on environment awareness (FV-PMC), which acquires environmental data for speech recognition, applies it to updating the recognition model, and thereby improves recognition performance. FV-PMC performs recognition with a model generated as follows: i) the average frequency variant among pre-classified noise groups is computed in advance and set as a threshold; ii) when speech with unknown noise is input, the frequency variant among the noise groups is recalculated; iii) speech whose variant exceeds the threshold of a given group is regarded as containing that group's noise; and iv) the model for that noise group is used for recognition. When noises were classified with the proposed FV-PMC, the average classification accuracy was 56%, and the speech recognition experiments showed average recognition rates of 79.05% for Set A, 79.43% for Set B, and 83.37% for Set C. The grand mean recognition rate was 80.62%, 5.69 percentage points higher than the 74.93% of the existing Parallel Model Combination with a clear model, indicating that the proposed method is effective.
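
Steps i)-iv) above amount to a threshold test on per-group frequency variants. The sketch below illustrates that decision rule only; the helper names, threshold values, and the way frequency variants are computed are illustrative assumptions, not the paper's actual FV-PMC implementation.

```python
def classify_noise_group(freq_variants, group_thresholds):
    """Assign input speech to the known noise group whose precomputed
    average frequency variant (the threshold) it exceeds by the widest
    margin; return None when no threshold is exceeded (unknown noise)."""
    best_group, best_margin = None, 0.0
    for group, threshold in group_thresholds.items():
        margin = freq_variants[group] - threshold
        if margin > best_margin:
            best_group, best_margin = group, margin
    return best_group

# Step i: average frequency variant per pre-classified noise group,
# stored as thresholds (values here are made up for illustration)
group_thresholds = {"car": 0.42, "babble": 0.35, "factory": 0.51}

# Steps ii-iv: recompute variants for incoming noisy speech, then pick
# the noise group whose threshold is exceeded and use its noise model
incoming = {"car": 0.47, "babble": 0.30, "factory": 0.50}
result = classify_noise_group(incoming, group_thresholds)  # "car"
```

Only the "car" variant exceeds its threshold here, so the car-noise recognition model would be selected for step iv).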

An Analysis of the English l Sound Produced by Korean Students

  • Yang, Byung-Gon
    • Speech Sciences
    • /
    • v.15 no.1
    • /
    • pp.53-62
    • /
    • 2008
  • The purpose of this study was to examine the English l sound in an English short story read by 16 Korean students, in order to identify the various allophones of the sound using acoustic visual displays and perceptual judgments. The subjects read the story in a quiet office at normal speed. Each word included the lateral sound in onset or coda position, or before a vowel of the following word. The results were as follows. First, there was a durational difference between the two major groups. The majority of the subjects produced the clear l regardless of context, while some students produced the sound as the Korean flap or the English glide [r], and a few cases were missing altogether. The dark l was produced mostly by the English-major subjects in coda position, with a few cases before a vowel in a phrase. Visual displays from the computer analysis were very helpful in distinguishing lateral variants, but perceptual judgment was sometimes necessary for fast and weak productions of the target word. Further studies would be desirable to examine the discrepancies between the acoustic and perceptual decisions.

Design of the Speech Signal Processors for Cochlear Prosthesis (청각 보철용 음성신호 처리기의 설계)

  • Park, Sang-Hui;Choi, Doo-Il;Beack, Seung-Wha
    • Journal of Biomedical Engineering Research
    • /
    • v.12 no.4
    • /
    • pp.285-294
    • /
    • 1991
  • Two types of speech signal processors (SSPs) for cochlear prostheses are designed: one based on a cochlear model, the other on an information (formant, pitch, intensity) extraction method. For these, a cochlear model and an acoustic information extraction method are proposed. The results show that the cochlear-model SSP contains more acoustic cues than the information-extraction SSP; on the other hand, the stimulus signal is clearer and the algorithm simpler in the information-extraction SSP.

An Electropalatographic Study of English l, r and the Korean Liquid Sound ㄹ

  • Ahn, Soo-Woong
    • Speech Sciences
    • /
    • v.8 no.2
    • /
    • pp.93-106
    • /
    • 2001
  • The pronunciation of English l and r has been a consistent problem for learners of English in Korea as well as Japan. This problem arises from the fact that Korean and Japanese each have only one liquid sound, so substituting the Korean liquid for English l and r is a common error; the dark l causes a further difficulty in pronouncing the English l sound. To examine the relationship among English l, r, and the Korean liquid sound, an electropalatographic (EPG) experiment was conducted. The findings were: (1) there was no tongue contact on either the alveolar ridge or the palate during articulation of the dark l; (2) the Korean liquid differed in its tongue contact points from both English l and r. The English clear l consistently touched the alveolar ridge across the forty tokens, whereas the Korean liquid in intervocalic and word-final position touched mainly the alveopalatal area, and the English r touched exclusively the velum area. The Korean intervocalic /l/ was similar to the English flap in the EPG and spectrographic data, and there was evidence that the word-final Korean /l/ is a lateral.

An acoustic study of fricated vowels in Nuosu Yi: an exploratory study

  • Perkins, Jeremy;Lee, Seunghun J.;Li, Xiao;Liu, Hongyong
    • Phonetics and Speech Sciences
    • /
    • v.6 no.4
    • /
    • pp.109-115
    • /
    • 2014
  • Fricated nuclei in Nuosu Yi were found to be more accurately described as fricated vowels than as syllabic fricatives, owing to the presence of clear formant structures typical of front vowels. In this exploratory study, two types of fricated nuclei were examined: retroflex "yr" and non-retroflex "y". The retroflex nucleus "yr" had higher F1 and lower F3 than non-retroflex "y", indicating a lower tongue height. F2, on the other hand, correlated not with nucleus retroflexion but with onset consonant retroflexion: F2 was higher following retroflex onsets in both vowels. This effect persisted through the entire vowel, suggesting a phonological rather than a coarticulatory effect. Interpretation of the F2 results requires accompanying articulatory data, since the usual coupling of F2 and tongue backness does not always hold for retroflex vowels. Examining the articulation of the fricated nuclei in Nuosu Yi is a direction for future research.

A survey on the voice related needs of occupational voice users (직업적 음성사용자의 음성관련 요구 조사)

  • Lee, Eun-Jeong;Kim, Wha-Soo
    • Phonetics and Speech Sciences
    • /
    • v.7 no.2
    • /
    • pp.39-45
    • /
    • 2015
  • This research investigated the voice-related needs of occupational voice users. Data collected from teachers (379), telemarketers (156), and therapists (50) were classified by content using Colaizzi's inductive categorical analysis. The voice-related needs fell into three broad categories: 1) how to use my voice, 2) how to care for it, and 3) how to keep it healthy. The category 'how to use my voice' was further divided into six sub-categories: (1) efficiently, (2) as I intend, (3) without pain (discomfort), (4) expressively, (5) with (methods of) phonation, and (6) with clear articulation. The results showed that the needs of the three groups of occupational voice users reflect both the environments in which they must use their voices and the voice characteristics expected by their specific listeners.

Design and Implementation of Multimodal Middleware for Mobile Environments (모바일 환경을 위한 멀티모달 미들웨어의 설계 및 구현)

  • Park, Seong-Soo;Ahn, Se-Yeol;Kim, Won-Woo;Koo, Myoung-Wan;Park, Sung-Chan
    • MALSORI
    • /
    • no.60
    • /
    • pp.125-144
    • /
    • 2006
  • W3C announced a standard software architecture for multimodal, context-aware middleware that emphasizes modularity and separates structure, content, and presentation. We implemented a distributed multimodal interface system following the W3C architecture, based on SCXML. SCXML uses parallel states to invoke both XHTML and VoiceXML content and to gather composite or sequential multimodal inputs through man-machine interaction. We also employ a Delivery Context Interface (DCI) module and an external service bundle, enabling the middleware to support context-awareness services in real-world environments. The provision of personalized user interfaces for mobile devices is expected to serve devices with a wide variety of capabilities and interaction modalities. Experiments demonstrated that the implemented middleware can maintain multimodal scenarios in a clear, concise, and consistent manner.
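
The key idea in the abstract above is that SCXML parallel states let several modality regions run at once and a composite input is ready only when each region has reported. The toy Python sketch below models just that fusion behavior; the region names and payloads are invented for illustration, and a real system would use an SCXML interpreter invoking VoiceXML/XHTML content.

```python
from dataclasses import dataclass, field

@dataclass
class ParallelState:
    """Toy model of SCXML-style parallel states: a 'voice' region and a
    'gui' region are simultaneously active, and the composite multimodal
    input is complete only when every region has delivered a result."""
    regions: dict = field(default_factory=lambda: {"voice": None, "gui": None})

    def deliver(self, region, payload):
        # A child region (e.g. VoiceXML recognizer, XHTML page) reports in.
        self.regions[region] = payload

    def composite_input(self):
        # Fuse the inputs once all regions are done; otherwise keep waiting.
        if all(v is not None for v in self.regions.values()):
            return dict(self.regions)
        return None

s = ParallelState()
s.deliver("voice", "search restaurants")
partial = s.composite_input()            # None: gui region still pending
s.deliver("gui", {"map_click": (37.5, 127.0)})
result = s.composite_input()             # composite multimodal input
```

Sequential multimodal input, also mentioned in the abstract, would simply be the case where the two `deliver` calls arrive one after the other rather than concurrently.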

Driver Verification System Using Biometrical GMM Supervector Kernel (생체기반 GMM Supervector Kernel을 이용한 운전자검증 기술)

  • Kim, Hyoung-Gook
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.9 no.3
    • /
    • pp.67-72
    • /
    • 2010
  • This paper presents a biometric driver verification system, evaluated in in-car experiments, based on the analysis of speech and face information. We use Mel-scale Frequency Cepstral Coefficients (MFCCs) for speaker verification from the speech signal. For face verification, the face region is detected by the AdaBoost algorithm, and a dimension-reduced feature vector is extracted from that region using principal component analysis. We then apply the extracted speech and face feature vectors to an SVM kernel with Gaussian Mixture Model (GMM) supervectors. The experimental results of the proposed approach show a clear improvement over a simple GMM or SVM approach.
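
A GMM supervector, as used in the abstract above, is typically formed by MAP-adapting the means of a universal background model (UBM) toward one utterance's features and stacking them into a single long vector, which an SVM kernel then compares. The numpy sketch below is a minimal, assumption-laden version of that construction (diagonal covariances, means-only adaptation, random stand-in data instead of real MFCCs), not the paper's implementation.

```python
import numpy as np

def gmm_supervector(frames, ubm_means, ubm_covs, ubm_weights, r=16.0):
    """Stack MAP-adapted Gaussian means into one 'supervector'.
    frames: (T, D) feature vectors (e.g. MFCCs); UBM has M components."""
    M, D = ubm_means.shape
    # Log-likelihood of each frame under each diagonal Gaussian component
    log_p = np.empty((len(frames), M))
    for m in range(M):
        diff = frames - ubm_means[m]
        log_p[:, m] = (np.log(ubm_weights[m])
                       - 0.5 * np.sum(diff**2 / ubm_covs[m], axis=1)
                       - 0.5 * np.sum(np.log(2 * np.pi * ubm_covs[m])))
    # Component posteriors (softmax over components, per frame)
    post = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    # MAP update: shift each mean toward the data it is responsible for
    n = post.sum(axis=0)                 # soft counts per component
    ex = post.T @ frames                 # first-order statistics
    alpha = (n / (n + r))[:, None]       # relevance factor r controls shift
    adapted = alpha * (ex / np.maximum(n, 1e-8)[:, None]) \
              + (1 - alpha) * ubm_means
    return adapted.ravel()               # (M*D,) supervector

# Two utterances' supervectors compared with a linear kernel (SVM input)
rng = np.random.default_rng(0)
means = rng.normal(size=(4, 3)); covs = np.ones((4, 3)); w = np.full(4, 0.25)
sv1 = gmm_supervector(rng.normal(size=(50, 3)), means, covs, w)
sv2 = gmm_supervector(rng.normal(size=(50, 3)), means, covs, w)
score = sv1 @ sv2
```

In a full system the supervectors would feed an SVM trained per driver; the linear dot product here stands in for the supervector kernel.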

F0 Perturbation as a Perceptual Cue to Stop Distinction in Busan and Seoul Dialects of Korean

  • Kang, Kyoung-Ho
    • Phonetics and Speech Sciences
    • /
    • v.5 no.4
    • /
    • pp.137-143
    • /
    • 2013
  • Recent investigations of the acoustic correlates of Korean stop manner contrasts have reported a diachronic transition in Korean stops: young Seoul speakers rely relatively more on the F0 characteristics of the stops than on their VOT characteristics in distinguishing aspirated from lenis stops. This finding has been examined against tonal dialects of Korean, and the results suggested that speakers of tonal dialects do not share the transition; they further suggested that the use of F0 for segmental stop classification interferes with its use for lexical tone classification in tonal speech. The current study investigated these findings in terms of perception. The perceptual behavior of Seoul and Busan speakers of Korean was examined comparatively through measurement of the perceptual cue weights of F0 and VOT. Regression and correlation analyses revealed that Busan speakers are closer to older Seoul speakers than to younger Seoul speakers, in that their cue weights for VOT and F0 were comparable in the aspirated-lenis distinction. This contrasts with the perceptual behavior of younger Seoul speakers, who showed a clear dominance of F0 over VOT for the same distinction. These findings provide perceptual evidence of the dual function of F0 for segmental and lexical distinctions in tonal dialects of Korean.
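
Perceptual cue weights of the kind measured above are commonly estimated by regressing listeners' categorical responses on the standardized acoustic cues and comparing coefficient magnitudes. The sketch below illustrates that general approach on synthetic data where F0 dominates, mimicking the younger-Seoul pattern; it is an illustrative stand-in, not the paper's actual analysis or data.

```python
import numpy as np

def cue_weights(vot, f0, labels, lr=0.1, steps=2000):
    """Fit a logistic regression of aspirated(1)/lenis(0) responses on
    standardized VOT and F0; |coefficient| serves as the cue weight."""
    X = np.column_stack([(vot - vot.mean()) / vot.std(),
                         (f0 - f0.mean()) / f0.std()])
    w = np.zeros(2); b = 0.0
    for _ in range(steps):                     # plain gradient descent
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - labels)) / len(labels)
        b -= lr * np.mean(p - labels)
    return {"VOT": abs(w[0]), "F0": abs(w[1])}

# Synthetic listener responses in which F0 drives the stop decision
rng = np.random.default_rng(1)
f0 = rng.normal(0.0, 1.0, 200)
vot = rng.normal(0.0, 1.0, 200)
labels = (f0 + 0.2 * vot + rng.normal(0.0, 0.5, 200) > 0).astype(float)
weights = cue_weights(vot, f0, labels)         # F0 weight exceeds VOT weight
```

Comparable VOT and F0 weights from such a fit would correspond to the Busan/older-Seoul pattern, while a much larger F0 weight corresponds to the younger-Seoul pattern.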

Voice and Image: A Pilot Study (음성과 인상의 관계규명을 위한 실험적 연구)

  • Moon Seung-Jae
    • MALSORI
    • /
    • no.35_36
    • /
    • pp.37-48
    • /
    • 1998
  • When we hear someone's voice, even without having met the person, we usually form a certain mental image of him or her. This study investigates the relationship between the voice and the image information carried within it. Does the mental picture created by the voice closely reflect the real image, and if not, is it related to the real image at all? To answer the first question, a perception experiment was carried out. Speech samples of a short sentence read by 8 males and 8 females were recorded, and pictures of the subjects were taken. Ajou University students were asked to match each voice with the corresponding picture. Participants correctly matched 1 female voice and 4 male voices with their pictures. Interestingly, even in cases of mismatch, the results show a very strong tendency: although participants falsely matched a certain voice with a certain picture, the majority of them chose the same picture for that voice, and this was the case for all mismatches. It seems that voice does give the listener a certain impression of physical characteristics, even if it is not always correct. By showing that there is a clear relationship between voice and image, this study provides a starting point for further research on voice characteristics: which characteristics of the voice carry the relevant information? Such studies will contribute to the understanding of the affective domain of the human voice and to speech technology.
