• Title/Abstract/Keywords: Speech Interference

67 search results (processing time: 0.026 seconds)

Noise Robust Automatic Speech Recognition Scheme with Histogram of Oriented Gradient Features

  • Park, Taejin;Beack, SeungKwan;Lee, Taejin
    • IEIE Transactions on Smart Processing and Computing
    • /
    • Vol. 3, No. 5
    • /
    • pp.259-266
    • /
    • 2014
  • In this paper, we propose a novel technique for noise-robust automatic speech recognition (ASR). The development of ASR techniques has made it possible to recognize isolated words with a near-perfect word recognition rate. However, in a highly noisy environment, a distinct mismatch between the trained speech and the test data results in a significantly degraded word recognition rate (WRA). Unlike conventional ASR systems employing Mel-frequency cepstral coefficients (MFCCs) and a hidden Markov model (HMM), this study employs histogram of oriented gradient (HOG) features and a support vector machine (SVM) for ASR tasks to overcome this problem. Our proposed ASR system is less vulnerable to external interference noise and achieves a higher WRA than a conventional ASR system equipped with MFCCs and an HMM. The performance of the proposed ASR system was evaluated using a phonetically balanced word (PBW) set mixed with artificially added noise.

Semantic Interference Effect: Contrasting the Lexical Competition with the Concept Competition Hypothesis

  • 구민모;남기춘
    • 대한음성학회:학술대회논문집
    • /
    • Proceedings of the 2007 Joint Conference of 대한음성학회 and 한국음성과학회
    • /
    • pp.74-77
    • /
    • 2007
  • To compare two hypotheses on the origin of the semantic interference effect that have been offered in the psycholinguistic literature, we conducted two experiments using the picture-word interference paradigm. When participants named pictures of objects presented simultaneously with distractor words, they were required to use either native words (Experiment 1) or loanwords (Experiment 2). The pictures were paired with three kinds of distractor words: identical, semantically related, and neutral with respect to the picture. Two observations emerged from the two experiments. First, pictures were named faster in the context of identical distractors than in the context of neutral ones. Second, naming times were slower in the presence of semantically related distractors than in the presence of neutral ones. These findings support the claim that semantic interference is based on a lexical retrieval conflict.

Independent Component Analysis Based on Frequency Domain Approach Model for Speech Source Signal Extraction

  • 최재승
    • 한국전자통신학회논문지
    • /
    • Vol. 15, No. 5
    • /
    • pp.807-812
    • /
    • 2020
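  • This paper proposes a microphone-based blind source separation algorithm that isolates only the target source signal in an environment where several source signals are mixed. The proposed algorithm is a frequency-domain representation model based on independent component analysis (ICA). To verify the effectiveness of frequency-domain ICA for two sources in a real environment, frequency-domain ICA is run on different types of sources, source separation is performed, and the resulting improvement is evaluated. The waveform results show that, compared with the original waveforms, the two-channel source signals are cleanly separated. In addition, results compared using the target signal-to-interference energy ratio show that the proposed algorithm improves on the source separation performance of existing algorithms.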
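
The abstract above evaluates separation quality with a target signal-to-interference energy ratio. As a minimal sketch only, the ratio could be computed in decibels as below. The function name `sir_db` and the assumption that the target and interference components are available as separate sample sequences are illustrative and are not taken from the paper.

```python
import math


def sir_db(target, interference):
    """Return the target-to-interference energy ratio in dB.

    Hypothetical helper: assumes the separated output has already been
    decomposed into a target component and an interference component.
    """
    target_energy = sum(x * x for x in target)
    interference_energy = sum(x * x for x in interference)
    if interference_energy == 0:
        return math.inf  # no residual interference at all
    if target_energy == 0:
        return -math.inf  # no target energy left
    return 10.0 * math.log10(target_energy / interference_energy)


if __name__ == "__main__":
    # Target amplitude is 10x the interference amplitude, so the
    # energy ratio is 100, i.e. about 20 dB.
    print(sir_db([1.0, 1.0, 1.0, 1.0], [0.1, 0.1, 0.1, 0.1]))
```

A higher value means less residual interference in the separated channel, which is the sense in which the abstract reports an improvement.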

Noise Effects on Foreign Language Learning

  • 임은수;김현기;김병삼;김종교
    • 음성과학
    • /
    • Vol. 6
    • /
    • pp.197-217
    • /
    • 1999
  • In a noisy classroom, the acoustic-phonetic features of the teacher's speech and the perceptual features of learners change compared with a quiet environment. Acoustic analyses were carried out on a set of French monosyllables consisting of 17 consonants and three vowels /a, e, i/, produced by one male speaker talking in quiet and in 50, 60, and 70 dB SPL of masking noise presented over headphones. The analyses showed consistent differences in the energy and formant center frequency amplitude of consonants and vowels, the $F_1$ frequency of vowels, and the duration of voiceless stops, suggesting increased vocal effort. The perceptual experiments, in which 18 female undergraduate students learning French served as subjects, were conducted in quiet and in 50 and 60 dB of masking noise. Identification scores for consonants were higher for Lombard speech than for normal speech, suggesting that the speaker's vocal effort helps to overcome the masking effect of noise. As the noise level increased, the perceptual responses to the French consonants tended to become more complex, and the subjective reaction scores for the noise, rated with vocabulary representing an 'unpleasant' sensation, tended to be higher. From the point of view of L2 (second language) acquisition, the influence of L1 (first language) on L2 observed in the perceptual results supports the interference theory.

A Study on the Production of the English Word Boundaries: A Comparative Analysis of Korean Speakers and English Speakers

  • 김지향;김기호
    • 말소리와 음성과학
    • /
    • Vol. 6, No. 1
    • /
    • pp.47-58
    • /
    • 2014
  • The purpose of this paper is to find out how Korean speakers' speech production at English word boundaries differs from that of English speakers and to account for what brings about such differences. Treating two consecutive words as a single cluster, the English speakers generally pronounce them naturally by linking the word-final consonant of the first word with the word-initial vowel of the second word, whereas most of the Korean speakers do not; they read the two consecutive words individually. Consequently, phonological processes such as resyllabification and aspiration are found in the English speakers' word-boundary production, while glottalization and unreleased stops are the more common phonological processes in the Korean speakers' word-boundary production. This may be accounted for by the Korean speakers' L1 interference, which depends on English proficiency.

Native Language Interference in Producing the Korean Rhythmic Structure: Focusing on Japanese

  • 윤영숙
    • 말소리와 음성과학
    • /
    • Vol. 10, No. 4
    • /
    • pp.45-52
    • /
    • 2018
  • This study investigates the effect of Japanese (L1) on the production of the Korean rhythmic structure. Korean and Japanese have typologically different rhythmic structures, as a syllable-timed language and a mora-timed language, respectively. This rhythmic difference comes from the different phonological properties of the two languages. Because of this difference, Japanese speakers learning Korean may produce a rhythm different from that of native Korean speakers. To investigate the influence of the native language's rhythm on the target language, we conducted an acoustic analysis using metrics such as %V, VarcoV, and VarcoS. Four native Korean speakers and ten advanced Japanese learners of Korean participated in a production test. The analyzed material consisted of six Korean sentences that contained various syllable structures. The results showed that the rhythms of the Korean speakers (KS) and the Japanese speakers (JS) differ in %V as well as in VarcoV. For VarcoS, a significant rhythmic difference was observed in VC and CVC syllables whose coda segment is a nasal. This study allowed us to observe the influence of L1 on the production of L2 rhythm.
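
The rhythm metrics named in the abstract above can be illustrated with a short sketch. It uses the commonly cited definitions, %V as the vocalic share of total duration and Varco as the standard deviation of interval durations divided by their mean, times 100. The function names, the use of the population standard deviation, and the sample durations are assumptions for illustration, not details from the paper.

```python
from statistics import mean, pstdev


def percent_v(vowel_durations, consonant_durations):
    """Percentage of total duration taken up by vocalic intervals (%V)."""
    total = sum(vowel_durations) + sum(consonant_durations)
    if total == 0:
        raise ValueError("total duration must be positive")
    return 100.0 * sum(vowel_durations) / total


def varco(durations):
    """Rate-normalized variability: 100 * standard deviation / mean.

    Applied to vocalic intervals this gives VarcoV. Assumption: the
    population standard deviation is used here.
    """
    return 100.0 * pstdev(durations) / mean(durations)


if __name__ == "__main__":
    # Hypothetical interval durations in seconds.
    vowels = [0.08, 0.12, 0.10, 0.15]
    consonants = [0.06, 0.09, 0.07, 0.05]
    print(percent_v(vowels, consonants))
    print(varco(vowels))
```

Because Varco divides by the mean duration, it is less sensitive to speech rate than a raw standard deviation, which is useful when comparing native speakers with learners who tend to speak more slowly.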

Improvement of an Automatic Segmentation for TTS Using Voiced/Unvoiced/Silence Information

  • 김민제;이정철;김종진
    • 대한음성학회지:말소리
    • /
    • No. 58
    • /
    • pp.67-81
    • /
    • 2006
  • For a large corpus of time-aligned data, HMM-based approaches are the most widely used for automatic segmentation, providing a consistent and accurate phone labeling scheme. There are two methods for training an HMM. The flat-start method minimizes human interference but has low accuracy. The bootstrap method has high accuracy, but its drawback is that manual segmentation is required. In this paper, a new algorithm is proposed to minimize manual work and to improve the performance of automatic segmentation. In the first phase, each speech data frame is classified as voiced, unvoiced, or silence. In the second phase, the phoneme sequence is dynamically aligned to the voiced/unvoiced/silence sequence according to acoustic-phonetic rules. Finally, using these segmented speech data as a bootstrap, HMM-based phoneme model parameters are trained. For the performance test, the hand-labeled ETRI speech DB was used. The experimental results showed that our algorithm achieved a 10% improvement in segmentation accuracy within a 20 ms tolerable error range. For unvoiced consonants in particular, it showed a 30% improvement.
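
The abstract above reports segmentation accuracy within a 20 ms tolerable error range. One plausible way to compute such a figure is sketched below. The function name `boundary_accuracy`, the one-to-one pairing of automatic and hand-labeled boundaries, and the sample values are assumptions for illustration and do not describe the paper's evaluation code.

```python
def boundary_accuracy(auto_ms, manual_ms, tolerance_ms=20.0):
    """Fraction of automatic boundaries within a tolerance of the hand labels.

    Assumes both lists hold boundary times in milliseconds for the same
    phoneme sequence, so the i-th entries refer to the same boundary.
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must have the same length")
    if not manual_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(
        1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tolerance_ms
    )
    return hits / len(manual_ms)


if __name__ == "__main__":
    # Hypothetical boundaries: deviations are 5, 15, 30, and 2 ms,
    # so three of the four fall within the 20 ms tolerance.
    auto = [100, 215, 330, 500]
    manual = [105, 200, 300, 498]
    print(boundary_accuracy(auto, manual))
```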

A Residual Echo and Noise Reduction Scheme with Linear Prediction for Hands-Free Telephony

  • 황경록;손경식;김현태
    • 한국음향학회지
    • /
    • Vol. 28, No. 5
    • /
    • pp.454-460
    • /
    • 2009
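  • This paper proposes a residual echo and noise reduction scheme that uses a linear predictor for hands-free telephony. The proposed scheme whitens the residual echo signal in single-talk periods through linear prediction. Speech components still remain in the residual echo signal whitened by linear prediction, so the proposed scheme whitens it further using the powers of the linear prediction error signal and the linear prediction signal. After this whitening process, near-end speech and ambient noise remain in double-talk periods, and white noise remains in single-talk periods. The near-end speech and the whitened signal are then combined and passed through the linear predictor again to remove additional background noise. Computer simulations show that the proposed method performs well in terms of AIC (acoustic interference cancellation).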

Contrastive Analysis of Mongolian and Korean Monophthongs Based on Acoustic Experiment

  • 이중진
    • 말소리와 음성과학
    • /
    • Vol. 2, No. 2
    • /
    • pp.3-16
    • /
    • 2010
  • This study aims to establish the hierarchy of difficulty of the 7 Korean monophthongs for Mongolian learners of Korean according to Prator's theory, which is based on the Contrastive Analysis Hypothesis. In addition, it shows that the difficulties and errors of Mongolian learners of Korean as a second or foreign language proceed directly from this hierarchy of difficulty. The study began by examining the speech of 60 Mongolians producing Mongolian monophthongs; the data were analyzed for the formant frequencies F1 and F2 of each vowel. The 7 Korean monophthongs were then compared with the resulting Mongolian formant values and assigned to 3 levels: 'same', 'similar', or 'different sound'. The findings from assessing the differences among the 8 nearest equivalents of Korean and Mongolian vowels are as follows. First, Korean /a/ and /ʌ/ turned out to be the 'same sound' as their counterparts, Mongolian /a/ and /ɔ/. Second, Korean /i/, /e/, /o/, /u/ turned out to be 'similar sounds' to their respective Mongolian counterparts /i/, /e/, /o/, /u/. Third, Korean /ɨ/, which is nearest to Mongolian /i/ in terms of phonetic features, differs seriously from it and is thus assigned to 'different sound'. Lastly, Mongolian /ʊ/ turned out to be a 'different sound' from its nearest counterpart, Korean /u/. Based on these findings, the hierarchy of difficulty was constructed. First, the 4 Korean monophthongs /a/, /ʌ/, /i/, /e/ would be Level 0 (Transfer); they would be transferred positively from their Mongolian counterparts when Mongolians learn Korean. Second, Korean /o/, /u/ would be Level 5 (Split); they would require the Mongolian learner to make a new distinction and would cause interference in learning Korean, because Mongolian /o/, /u/ each have 2 similar counterpart sounds: Korean /o, u/, /u, o/. Third, Korean /ɨ/, which is not in the Mongolian vowel system, will be Level 4 (Overdifferentiation); the new vowel /ɨ/, which bears little similarity to Mongolian /i/, must be learned entirely anew and will cause much difficulty for Mongolian learners in speaking and writing Korean. Lastly, Mongolian /ʊ/ will be Level 2 (Underdifferentiation); it is absent from Korean and does not cause interference in learning Korean as long as Mongolian learners avoid using it.

Performance analysis of multistage interference cancellation schemes for a DS/CDMA system subject to delay constraint

  • 황선한;강충구
    • 한국통신학회논문지
    • /
    • Vol. 22, No. 12
    • /
    • pp.2653-2663
    • /
    • 1997
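  • Serial and parallel interference cancellation are structurally simple multistage interference cancellation schemes that use the conventional correlation receiver as their basic building block, and they have been proposed as a way to improve the performance of DS/CDMA systems in a multiple-access environment. Through analytical study, this paper confirms that the parallel scheme always outperforms the serial scheme in a Gaussian channel, whereas in a fading channel the serial interference cancellation scheme is more advantageous when performance and complexity are considered together. The paper also analyzes the performance of a groupwise serial interference cancellation scheme that limits the number of cancellation stages in order to meet the allowable processing delay in applications that require real-time transmission, such as voice and video. The results show that groupwise interference cancellation can effectively counter the performance degradation caused by the limited number of cancellation stages and, in particular, that the number of users cancelled simultaneously at each stage has an optimal value that depends on the total number of users in the system.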
