• Title/Abstract/Keywords: Speech pattern

412 search results (processing time: 0.021 s)

Post-Processing of Speech Recognition Using User Utterance Sequential Pattern

  • 송원문;김은주;김명원
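    • Korean Institute of Information Scientists and Engineers: Conference Proceedings / Proceedings of the Korean Institute of Information Scientists and Engineers Korea Computer Congress 2005, Vol.32 No.1 (B) / pp.709-711 / 2005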
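  • Recently, various post-processing techniques have been studied in speech recognition to obtain more reliable results than those produced by recognition based purely on signal processing of the uttered speech. This paper proposes a speech recognition post-processing technique that takes user information into account by applying the user's utterance information to post-processing in a voice command recognition environment for an individual user. First, sequential pattern rules of command utterances are extracted from previously used voice commands; these rules are then compared with the sequential pattern formed from the commands the user has just uttered, to determine the words that the sequential rules predict. Taking these words into account, the probability values of the recognizer's candidate words are adjusted and the final recognized word is re-determined. For this adjustment, we propose a correction method that considers both the confidence of the utterance sequential pattern and the recognizer's result words. Experiments confirm that speech recognition with the proposed post-processing lowers the error rate by more than $15\%$ compared with baseline HMM-based speech recognition, contributing substantially to the recognition rate.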
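
The rescoring idea in this abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the rules are simple bigram rules (previous command, next command), confidence is the conditional frequency in the command log, and the hypothetical `weight` parameter controls how strongly a rule boosts a candidate.

```python
from collections import Counter, defaultdict


def mine_bigram_rules(history, min_confidence=0.3):
    """Return {previous_command: {next_command: confidence}} from a command log."""
    pair_counts = Counter(zip(history, history[1:]))
    prev_counts = Counter(history[:-1])
    rules = defaultdict(dict)
    for (prev, nxt), count in pair_counts.items():
        confidence = count / prev_counts[prev]
        if confidence >= min_confidence:
            rules[prev][nxt] = confidence
    return rules


def rescore(candidates, previous_command, rules, weight=0.5):
    """Boost candidates predicted by a rule, then renormalize so scores sum to 1."""
    predicted = rules.get(previous_command, {})
    boosted = {
        word: prob * (1.0 + weight * predicted.get(word, 0.0))
        for word, prob in candidates.items()
    }
    total = sum(boosted.values())
    return {word: score / total for word, score in boosted.items()}


# Hypothetical command log and recognizer output (illustrative values only).
history = ["open", "play", "stop", "open", "play", "next", "open", "play"]
rules = mine_bigram_rules(history)
candidates = {"pay": 0.40, "play": 0.38, "stay": 0.22}
rescored = rescore(candidates, "open", rules, weight=0.5)
best = max(rescored, key=rescored.get)
print(best)
```

In this toy run the recognizer alone would pick "pay", while the rule "open is followed by play" moves "play" to the top. The multiplicative boost is one of many possible correction schemes; the paper's own formula is not given in the abstract.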

A Study on Recognition of Spoken Numbers Using Spatio-Temporal Pattern Recognizer

  • 박경철;김헌기;이종호
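    • The Korean Institute of Electrical Engineers: Conference Proceedings / Proceedings of the Korean Institute of Electrical Engineers 1993 Summer Conference, A / pp.495-497 / 1993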
  • This paper presents a method for recognizing spoken numbers using a spatio-temporal network. The network is efficient at processing the spectrum sequences of speech patterns as spatio-temporal patterns. The number of windows and channels was determined experimentally. The recognition rate was improved through experiments on various parameters. The test data were collected from 10 numbers spoken by 2 male and female speakers. A recognition rate of 80% was obtained on a test set of 50 words.

Aspects of the word-final stop releasing and its phonetic correlates in reading the English isolated words enumerated

  • 이석재;강수하;박지현;황선민
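    • The Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the Korean Society of Phonetic Sciences and Speech Technology Conference, May 2003 / pp.61-68 / 2003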
  • This experimental research shows that, in reading of the English isolated words that are enumerated, the releasing of the word-final stop is employed for signaling enumeration in company with the well-known intonational pattern for it. Furthermore, this study tries to find the conceivable phonetic correlates of the releasing of the stop in word-final position, focusing on the association of the stop releasing/nonreleasing with i) the POA (Place of Articulation) distinction of the word-final stop, ii) the various qualities of the preceding vowel placed before the final stop, and iii) the voice distinction of the stop in the word-final position.

Recognizing Hand Digit Gestures Using Stochastic Models

  • Sin, Bong-Kee
    • Journal of Korea Multimedia Society / Vol. 11, No. 6 / pp.807-815 / 2008
  • A simple, efficient method of spotting and recognizing hand gestures in video is presented, using a network of hidden Markov models and a dynamic programming search algorithm. The description starts from designing a set of isolated trajectory models which are stochastic and robust enough to characterize highly variable patterns like human motion, handwriting, and speech. Those models are interconnected to form a single big network, termed a spotting network or spotter, that models a continuous stream of gestures as well as non-gestures. Inference over the model is based on dynamic programming. The proposed model is highly efficient and can readily be extended to a variety of recurrent pattern recognition tasks. The test results, obtained without any engineering, show the potential for practical application. At the end of the paper we add some related experimental results obtained using a different model, a dynamic Bayesian network, which is also a type of stochastic model.
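
Dynamic programming inference over a hidden Markov model is commonly done with Viterbi decoding, and the following Python sketch shows that general technique on a toy problem. The two states ("gesture", "filler"), the discrete observations and all probabilities are hypothetical values chosen for illustration; the paper's actual spotting network and features are not described in the abstract.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state path for a discrete-observation HMM."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Toy model (assumed values): a gesture state and a non-gesture filler state.
states = ["gesture", "filler"]
log_start = {"gesture": math.log(0.5), "filler": math.log(0.5)}
log_trans = to_log({"gesture": {"gesture": 0.8, "filler": 0.2},
                    "filler": {"gesture": 0.2, "filler": 0.8}})
log_emit = to_log({"gesture": {"move": 0.9, "still": 0.1},
                   "filler": {"move": 0.2, "still": 0.8}})

path = viterbi(["still", "move", "move", "still"], states,
               log_start, log_trans, log_emit)
print(path)
```

The decoded path labels the middle two frames as a gesture and the outer frames as filler, which is the segmentation a spotter needs: where a pattern starts and ends inside a continuous stream.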

A Single-End-Point DTW Algorithm for Keyword Spotting

  • 최용선;오상훈;이수영
    • Journal of the Institute of Electronics Engineers of Korea SP / Vol. 41, No. 3 / pp.209-219 / 2004
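  • This paper proposes a single-end-point DTW method with low computational cost and a simple structure, so that a keyword spotting system can be implemented in hardware suitable for real-time use. Whereas conventional DTW requires both end points, the proposed algorithm needs only one, which makes it convenient to use, and because a chain of local searches forms the global path, it requires very little computation. To improve its performance, new slope weights and a new distance measure are introduced. In addition, feature vector normalization is applied so that the data have the same standard deviation in each feature dimension and every frame has the same energy. Furthermore, the given training patterns are clustered, and the mean pattern computed within each cluster serves as one of several reference patterns representing the keyword. A keyword is detected when the distance between these reference patterns and the feature vectors of the input speech falls below a preset threshold. Applied to isolated-word recognition and keyword spotting experiments, the proposed algorithm outperformed the other methods tested.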
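
One way to read "a chain of local searches forms the global path" is a greedy alignment anchored at a single known end point, sketched below in Python. This is an interpretation, not the paper's algorithm: the allowed moves (stay, advance one frame, skip one frame), the Euclidean distance, the toy 1-D features and the threshold value are all assumptions, and the paper's slope weights and normalization are omitted.

```python
def greedy_one_end_alignment(reference, stream, start):
    """Align `reference` to `stream` from stream[start]; return (mean cost, end index)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    j = start
    total = dist(reference[0], stream[j])
    for ref_frame in reference[1:]:
        # Local search: stay, advance one frame, or skip one frame in the stream.
        options = [k for k in (j, j + 1, j + 2) if k < len(stream)]
        j = min(options, key=lambda k: dist(ref_frame, stream[k]))
        total += dist(ref_frame, stream[j])
    return total / len(reference), j


# Toy 1-D feature frames (assumed values); the keyword occupies stream[1:6].
reference = [(0.0,), (1.0,), (2.0,), (1.0,)]
stream = [(5.0,), (0.1,), (0.9,), (1.1,), (2.0,), (1.0,), (5.0,)]
threshold = 0.2  # hypothetical detection threshold

cost, end = greedy_one_end_alignment(reference, stream, start=1)
detected = cost < threshold
print(detected, end)
```

Each reference frame triggers at most three distance evaluations, so the cost grows linearly with the reference length instead of with the product of both lengths as in full DTW. The trade-off is that a greedy path can miss the globally optimal alignment.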

Supralaryngeal Articulatory Characteristics of Coronal Consonants /n, t, $t^h$, $t^*$/ in Korean

  • Son, Min-Jung;Kim, Sa-Hyang;Cho, Tae-Hong
    • Phonetics and Speech Sciences / Vol. 3, No. 4 / pp.33-43 / 2011
  • The present study investigates supralaryngeal articulatory characteristics of denti-alveolar (coronal) stops /t, $t^h$, $t^*$/ and /n/ in /aCa/ context in Seoul Korean. An Electromagnetic Articulograph (EMA, Carstens) was used to explore the kinematics of the consonants by examining the kinematic data of the tongue tip (the primary articulator for the coronal consonants), along with supplementary position data of the tongue body, the tongue dorsum and the jaw. The results showed that the constriction duration was the most robust articulatory correlate of the three-way stop contrast with a pattern of /t/$t^h$/$t^*$/. The contrast was further reinforced by the tongue body position (higher for /$t^h$, $t^*$/) and the tongue tip opening displacement (less displaced for /$t^h$, $t^*$/). The articulation of /n/ was quite similar to that of the lenis /t/ in terms of the constriction duration, and it differed from the oral stops in that it was produced with larger tongue tip displacement and lower jaw position, indicating its weak articulatory nature. The results are also discussed in comparison with those of bilabial stops, with the implication that the three-way contrast may be kinematically expressed differently depending on the physiological constraints imposed on the primary articulator (the tongue tip versus the lips). The present study, therefore, provides new articulatory (kinematic) data on denti-alveolar consonants in Korean, and demonstrates that the three-way stops, which have been known to differ primarily in their laryngeal settings, are indeed produced with kinematic distinctions at the supralaryngeal level.

Fast Speech Recognition System using Classification of Energy Labeling

  • 한수영;김홍렬;이기희
    • Journal of the Korea Society of Computer and Information / Vol. 9, No. 4 / pp.77-83 / 2004
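  • In this paper, energy parameters extracted phoneme by phoneme from the input speech are used to perform energy labeling, and the input speech is grouped according to the resulting labels. During dynamic pattern matching, DTW is run only within the group of the label selected by the energy level detected in the test utterance, which shortens processing time so that the system can run fast even on low-cost processors. Because the labeling stage presupposes accurate parameters from the speech segment detection and energy parameter extraction stages, a variable window that follows the pitch period is used to support it. The pitch period is found first, and the window size is then chosen between 200 and 300 frames according to that period, giving an energy estimate free of windowing effects. In experiments, the proposed method reduced the amount of computation by about $25\%$.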
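
The speed-up described here comes from running DTW against only a subset of the templates. The Python sketch below illustrates that pruning idea under simplifying assumptions that are not from the paper: energy is computed over the whole utterance instead of per phoneme, the label boundaries in `edges` are hypothetical, the features are raw 1-D values, and the pitch-synchronous window is omitted.

```python
def dtw_distance(a, b):
    """Classic DTW between two 1-D sequences with absolute-difference cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the accumulated-cost table
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            cur.append(abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]


def energy_label(seq, edges=(0.5, 1.5)):
    """Map mean squared amplitude to a coarse label: 0 (low), 1 (mid), 2 (high)."""
    energy = sum(v * v for v in seq) / len(seq)
    return sum(energy >= e for e in edges)


def recognize(utterance, templates):
    """Run DTW only against templates that share the utterance's energy label."""
    label = energy_label(utterance)
    group = [(w, t) for w, t in templates if energy_label(t) == label]
    group = group or templates  # fall back to a full search if the group is empty
    word, _ = min(group, key=lambda wt: dtw_distance(utterance, wt[1]))
    return word, len(group)


# Toy templates (assumed values); only two of the four share the mid-energy label.
templates = [
    ("yes", [0.1, 0.3, 0.2, 0.1]),
    ("no", [1.0, 1.2, 0.9, 1.1]),
    ("go", [0.9, 1.1, 1.2, 0.8]),
    ("stop", [2.0, 2.2, 1.9, 2.1]),
]
utterance = [1.0, 1.2, 1.2, 0.9, 1.1]
word, compared = recognize(utterance, templates)
print(word, compared)
```

Here only two of the four templates are compared by DTW. The saving in a real system depends on how evenly the vocabulary spreads across labels, and a mislabeled utterance would be searched in the wrong group, which is why the abstract stresses accurate energy extraction.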

The continuous or categorical effects for HH vs. HL and HH vs. LH in lexical pitch accent contrasts of Korean

  • Kim, Jungsun
    • Phonetics and Speech Sciences / Vol. 6, No. 4 / pp.53-65 / 2014
  • The current research examines whether pitch contour shapes in North Kyungsang pitch accent contrasts provide a phonetic dimension for phonological discreteness in a mimicry task. Two resynthesized pitch accent continua were created for HH vs. HL and HH vs. LH. To confirm a phonetic dimension accounting for pitch accent categories in North Kyungsang Korean, the mimicries of speakers of two dialects (i.e., North Kyungsang & South Cholla) were compared. One of the findings showed that, for North Kyungsang speakers, the range of mean f0 peak times was a phonetic dimension undergoing a continuous shift within a stimulus continuum for both HH vs. HL and HH vs. LH. On the other hand, for South Cholla speakers, there were no apparent shifts around categorical boundaries for either HH vs. HL or HH vs. LH. Individual mimicries varied considerably in f0 peak timing. For HH vs. LH, three North Kyungsang speakers showed a discrete pattern reflecting a shift in phonological categories, but for HH vs. HL, there was no such distinction showing a categorical shift, though there were statistically significant differences for two speakers. Interestingly, one of the North Kyungsang speakers showed a continuous phonetic dimension for both HH vs. HL and HH vs. LH. Lastly, the f0 valley timing did not exhibit a discrete or gradient phonetic dimension for speakers of either dialect. On the basis of these results, it is interesting that a tonal target such as the high tone in North Kyungsang pitch accent categories within the autosegmental-metrical (AM) theory may be realized within individual cognitive systems for representing the interaction of perception and production.

The Study on the Characteristics of Korean Stop Consonants

  • 서동일;표화영;강성석;최홍식
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / Vol. 8, No. 2 / pp.217-224 / 1997
  • The present study was performed to investigate the voice onset time (VOT) of Korean stop consonants, extending the research of Pyo and Choi (1996), together with the intensity and air flow rate of Korean stops, as a preliminary study for classical singing training. Nine Korean stops (/p, p', $p^{h}$/, /t, t', $t^{h}$/, /k, k', $k^{h}$/) and a vowel /a/ were used as speech materials. CV and VCV syllable patterns were used for VOT measurement, and the CV pattern was used for intensity and air flow rate measurement. Five males and five females pronounced the speech tasks with comfortable pitch and intensity, and VOT, intensity, and air flow rate were measured. Among the prevocalic stops, bilabials showed the shortest VOT and velars the longest, except for the unaspirated stops, where the velar /k'/ was the shortest and the alveolar /t'/ the longest. In terms of tensity, heavily aspirated stops showed the longest VOT and unaspirated stops the shortest. The intervocalic stops showed results similar to the prevocalic stops, except that among the slightly aspirated stops the alveolar was the longest, and among the bilabials the slightly aspirated /p/ was the shortest, whereas in prevocalic position the unaspirated /p'/ was the shortest. For the prevocalic stops, air flow rate was highest in the heavily aspirated stops, followed by the slightly aspirated stops, and lowest in the unaspirated stops. Overall, bilabials were the highest and velars the lowest, except among the heavily aspirated stops, where the alveolar was the lowest. For intensity, the unaspirated stops and bilabials were the highest and the heavily aspirated stops and velars the lowest, except among the slightly aspirated stops, where the bilabials were the lowest and the alveolars the highest.

Some Notes on Articulatory Correlates of Three-way Bilabial Stop Contrast in /Ca/ Context in Korean: An Electromagnetic Articulography (EMA) Study

  • Son, Min-Jung;Cho, Tae-Hong
    • Phonetics and Speech Sciences / Vol. 2, No. 4 / pp.119-127 / 2010
  • Recently, we have launched a large-scale articulatory study to investigate how the three-way contrastive stops (i.e., lenis, fortis, and aspirated) in Korean are kinematically expressed (i.e., in terms of articulatory movement characteristics) in various contexts, using a magnetometer (Electromagnetic Articulography). In this paper, we report some preliminary results about how the three-way bilabial series /p,$p^h,p^*$/ produced in /Ca/ context in isolation are kinematically characterized not only during the lip closure but also during the following vocalic articulation. Some important notes could be made from the results. First, the degree of lip constriction (as measured by the lip aperture between the upper and lower lips) was smaller for the lenis /p/ and larger for the fortis/aspirated /$p^*,p^h$/, showing a two-way distinction during the closure. Second, the tongue lowering for the following vowel was more extreme after the lenis /p/ than after the fortis/aspirated /$p^*,p^h$/. Regarding this vocalic articulatory difference in the tongue height, we discussed the possibility that the articulatory tension associated with the fortis/aspirated stops is further reflected in the lingual vocalic movement maintaining the tongue position to a certain level for the following vowel /a/, while the lenis consonant does not impose such articulatory constraints, resulting in more tongue lowering. Finally, the temporal relationship between the release of the stop closure and the lowest tongue position of the following vowel remained constant, suggesting that CV coordination is invariantly maintained across the consonant type. This pattern was interpreted as supporting the view that the consonant and vowel gestures are coordinated in much the same way across languages.
