통합 검색 | Korea Science

범용 멀티미디어 프로세서 구조 개발에 관한 연구 (A study on the Development of General-Purpose Multimedia Processor Architecture)

오명훈;박성모
- 대한전자공학회:학술대회논문집
- /
- 대한전자공학회 1998년도 추계종합학술대회 논문집
- /
- pp.1149-1152
- /
- 1998
멀티미디어 데이터를 아날로그 방식보다는 디지털 방식으로 처리하게 되면 여러 면에서 이득을 볼 수 있다. 멀티미디어 데이터를 디지털 방식으로 처리하는 방법 중 범용프로세서에서 멀티미디어 명령어에 의해 처리하게 되면 flexibility를 증가시키며 효율적으로 프로그램할 수 있다. 본 논문에서는 범용 프로세서 안에서 멀티미디어 데이터를 효율적으로 처리할 수 있는 명령어 집합 구조와 이를 수행할 수 있는 프로세서의 구조를 제안하고 이를 HDL(Hardware Description Language)로 동작레벨에서 기술하고 시뮬레이션 하였다. 제안된 멀티미디어 명령어는 특성에 따라 8개의 그룹에 총 55개의 명령어로 구성되며 64비트 데이터 안에서 각각 8비트의 8바이트, 16비트의 4하프워드, 32비트의 2워드의 부워드(subword) 데이터들을 병렬 처리한다. 모델링된 프로세서는 오픈아키텍쳐(Open Architecture)인 SPARC V.9 의 정수연산장치(Integer Unit)에 기반을 두었으며 하바드 구조를 지닌 5단 파이프라인 RISC 형태이다.
PDF

한국어 음성인식을 위한 음성학 기반의 유사음소단위 집합 설계 (A Phonetics Based Design of PLU Sets for Korean Speech Recognition)

홍혜진;김선희;정민화
- 대한음성학회지:말소리
- /
- 제65호
- /
- pp.105-124
- /
- 2008
This paper presents the effects of different phone-like-unit (PLU) sets in order to propose an optimal PLU set for the performance improvement of Korean automatic speech recognition (ASR) systems. The examination of 9 currently used PLU sets indicates that most of them include a selection of allophones without any sufficient phonetic base. In this paper, a total of 34 PLU sets are designed based on Korean phonetic characteristics arid the effects of each PLU set are evaluated through experiments. The results show that the accuracy rate of each phone is influenced by different phonetic constraint(s) which determine(s) the PLU sets, and that an optimal PLU set can be anticipated through the phonetic analysis of the given speech data.
PDF

문맥종속 반음소단의 모델을 이용한 자동 음소분할 및 레이블링 시스템의 구현 (The Implementation of Automatic Segmentation and Labelling System Using Context-dependent Demi-phone)

김태환
- 한국음향학회:학술대회논문집
- /
- 한국음향학회 1998년도 학술발표대회 논문집 제17권 2호
- /
- pp.351.2-356
- /
- 1998
음소 단위로 레이블링된 데이터베이스는 음성연구에 있어 매우 중요하다. 그러나 수작업에 의한 음소분할 및 레이블링 작업은 많은 시간과 노력이 필요하기 때문에 자동 음소분할 및 레이블링 시스템에 대한 많은 연구가 진행되고 있다. 본 논문에서는 monophone과 triphone의 장점을 포함하는 문맥 종속 반음소 단위 모델을 이용한 자동 음소분할 및 레이블링 시스템을 구현하였다. 레이블링 단위로는 68개의 유사음소와 묵음 등 총 69개로 정하였으며, 음소 모델링은 연속 HMM을 사용하였다. 기존의 subword 단위모델과 본 논문에서 제안한 문맥종속 반음소 모델을 이용한 자동 음소분할 및 레이블링 시스템의 성능 비교 음소경계오차가 10ms 이내인 경우 각각 60.17%, 66.32%를 포함하여 6.15%의 향상을 보이고, 40ms 이내인 경우 90.36%, 94.27%를 포함하여 3.92%의 성능향상을 보였다.
PDF

다이폰 기반의 Generic Word Model을 이용한 거절 알고리즘 (A Study on the Rejection Algorithm Using Generic Word Model Based on Diphone Subword Unit)

정익주;정훈
- 음성과학
- /
- 제10권2호
- /
- pp.15-25
- /
- 2003
In this paper, we propose an algorithm on OOV(Out-of-Vocabulary) rejection based on two-stage method. In the first stage, the algorithm rejects OOVs using generic word model, and then in the second stage, for further reduction of false acceptance, it rejects words which have low similarity to the candidate by measuring the distance between HMM models. For the experiment, we choose 20 in-vocabulary words out of PBW445 DB distributed by ETRI. In case that the first stage is processed only, the false acceptance is 3% with 100% correct acceptance, and in case both stages are processed, the false acceptance is reduced to 1% with 100% correct acceptance.
PDF

2$\omega$-FINITE AUTOMATA AND SETS OF OBSTRUCTIONS OF THEIR SLNGUAGES

Duan, Qi;Li, Botang;Djidjeli, K.;Price, W.G.;Twizell, E.H.
- Journal of applied mathematics & informatics
- /
- 제6권3호
- /
- pp.783-798
- /
- 1999
Nondeterministic finite Rabin-Scott's automata without initial and final states (2 $\omega$-FA) are considered. In this paper they are used to define so called sets of obstruction used also in various alge-braic systems and to consider similar problems for the formal languages theory. Thus we define sets of obstructions of languages(or, rather, 2$\omega$-languages) of such automata. We obtain that each 2$\omega$-language defined by 2 $\omega$-FA has the set of obstruction being a regular language. And vice versa for each regular language L(containing no proper subword of its another word) there exists a 2$\omega$ -FA having L as the set of obstructions.

확률 발음사전을 이용한 대어휘 연속음성인식 (Large Vocabulary Continuous Speech Recognition using Stochastic Pronunciatioin Lexicon Modeling)

윤성진
- 한국음향학회:학술대회논문집
- /
- 한국음향학회 1998년도 제15회 음성통신 및 신호처리 워크샵(KSCSP 98 15권1호)
- /
- pp.315-319
- /
- 1998
대어휘 연속음성인식을 위한 확률 발음사전 모델에 대해서 제안하였다. 제안된 확률 발음 사전은 연속음성과 같은 자연스런 발성에서 자주 발생되는 단어의 변이를 확률적인 subword-state로 이루어진 HMM으로 모델화 함으로써 단어의 발음 변이를 효과적으로 표현할 수 있으며, 단위 인식 시스템의 성능을 보다 높일 수 있도록 구성되었다. 확률 발음사전의 생성은 음성 자료와 음소 모델을 이용하여 단어 단위의 분할과 학습을 통해서 자동으로 생성되게 됨 음소와 같은 언어학적인 단위뿐만 아니라 PLU 이나 비언어학적인 인식 모델을 이용한 연속음성인식기에도 적용이 가능하다.연속음성인식실험결과 확률 발음사전을 사용함으로써 표준 발음 표기를 사용하는 인식 시스템에 비해 단어 오류율은 39.8%, 문장 오류율은 24.4%의 큰 폭으로 오류율을 감소시킬 수 있었다.
PDF

Modified Phonetic Decision Tree For Continuous Speech Recognition

Kim, Sung-Ill;Kitazoe, Tetsuro;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea
- /
- 제17권4E호
- /
- pp.11-16
- /
- 1998
For large vocabulary speech recognition using HMMs, context-dependent subword units have been often employed. However, when context-dependent phone models are used, they result in a system which has too may parameters to train. The problem of too many parameters and too little training data is absolutely crucial in the design of a statistical speech recognizer. Furthermore, when building large vocabulary speech recognition systems, unseen triphone problem is unavoidable. In this paper, we propose the modified phonetic decision tree algorithm for the automatic prediction of unseen triphones which has advantages solving these problems through following two experiments in Japanese contexts. The baseline experimental results show that the modified tree based clustering algorithm is effective for clustering and reducing the number of states without any degradation in performance. The task experimental results show that our proposed algorithm also has the advantage of providing a automatic prediction of unseen triphones.
PDF

대역폭 변화에 따른 음성 인식률 비교연구 (A Comparative Study of Recognition Rate According to the Variance of Speech Bandwidth)

손일현;도삼주;구명완
- 한국정보과학회 언어공학연구회:학술대회논문집(한글 및 한국어 정보처리)
- /
- 한국정보과학회언어공학연구회 1992년도 제4회 한글 및 한국어정보처리 학술대회
- /
- pp.193-199
- /
- 1992
이 논문에서는 123개 단어의 한국어 음성에 대하여 음성의 대역폭 변화에 따른 인식률을 비교하였다. 인식률 비교실험을 위해 hidden Markov model과 음소와 유사한 131개의 한국어 subword 유니트를 사용한 화자독립 격리단어 인식 시스팀을 사용하였다. 이 실험은 대역폭이 각각 0 - 4.5kHz 및 0.3 - 3.3kHz인 두가지 종류의 음성 데이타베이스를 사용하였다. 훈련과정에서 corrective training의 반복회수를 2로 하고 state transition duration 정보를 사용하였을 때, 0 - 4.5kHz 와 0.3 - 3.3kHz 대역폭에 대해 각각 98.8 % 및 98.2 % 의 최고 인식률을 얻었다. 이로부터 전화대역폭에서도 음성인식률은 크게 저하되지 않음을 알 수 있다.
PDF

HMM-Based Automatic Speech Recognition using EMG Signal

Lee Ki-Seung
- 대한의용생체공학회:의공학회지
- /
- 제27권3호
- /
- pp.101-109
- /
- 2006
It has been known that there is strong relationship between human voices and the movements of the articulatory facial muscles. In this paper, we utilize this knowledge to implement an automatic speech recognition scheme which uses solely surface electromyogram (EMG) signals. The EMG signals were acquired from three articulatory facial muscles. Preliminary, 10 Korean digits were used as recognition variables. The various feature parameters including filter bank outputs, linear predictive coefficients and cepstrum coefficients were evaluated to find the appropriate parameters for EMG-based speech recognition. The sequence of the EMG signals for each word is modelled by a hidden Markov model (HMM) framework. A continuous word recognition approach was investigated in this work. Hence, the model for each word is obtained by concatenating the subword models and the embedded re-estimation techniques were employed in the training stage. The findings indicate that such a system may have a capacity to recognize speech signals with an accuracy of up to 90%, in case when mel-filter bank output was used as the feature parameters for recognition.
https://doi.org/10.9718/JBER.2006.27.3.101 인용 PDF KSCI

상태공유 HMM을 이용한 서브워드 단위 기반 립리딩 (Subword-based Lip Reading Using State-tied HMM)

김진영;신도성
- 음성과학
- /
- 제8권3호
- /
- pp.123-132
- /
- 2001
In recent years research on HCI technology has been very active and speech recognition is being used as its typical method. Its recognition, however, is deteriorated with the increase of surrounding noise. To solve this problem, studies concerning the multimodal HCI are being briskly made. This paper describes automated lipreading for bimodal speech recognition on the basis of image- and speech information. It employs audio-visual DB containing 1,074 words from 70 voice and tri-viseme as a recognition unit, and state tied HMM as a recognition model. Performance of automated recognition of 22 to 1,000 words are evaluated to achieve word recognition of 60.5% in terms of 22word recognizer.
PDF

검색결과 38건 처리시간 0.038초

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

자세히 찾기

이미지 검색 (β)