Integrated Search | Korea Science

Usability Test Guidelines for Speech-Oriented Multimodal User Interface

홍기형
- Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology), No. 67, pp. 103-120, 2008
Basic components of multimodal interfaces, such as speech recognition, speech synthesis, gesture recognition, and multimodal fusion, each have technological limitations. For example, speech recognition accuracy decreases with large vocabularies and in noisy environments. Despite these limitations, there are many applications in which speech-oriented multimodal user interfaces are very helpful to users. However, to expand the application areas of speech-oriented multimodal interfaces, the interfaces must be developed with a focus on usability. In this paper, we introduce usability and user-centered design methodology in general. There has been much work on evaluating spoken dialogue systems; we summarize PARADISE (PARAdigm for Dialogue System Evaluation) and PROMISE (PROcedure for Multimodal Interactive System Evaluation), generalized evaluation frameworks for voice and multimodal user interfaces. We then present usability components for speech-oriented multimodal user interfaces, together with usability testing guidelines that can be used in a user-centered multimodal interface design process.
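The abstract above names PARADISE without stating how it scores a system. As a recap of the framework as originally published (not of this paper's own guidelines), PARADISE models overall performance as a linear function of task success and dialogue costs, with every measure Z-score normalized:

```latex
\mathrm{Performance} = \alpha \cdot \mathcal{N}(\kappa) - \sum_{i=1}^{n} w_i \cdot \mathcal{N}(c_i),
\qquad
\mathcal{N}(x) = \frac{x - \bar{x}}{\sigma_x}
```

Here \kappa is the kappa coefficient measuring task success, the c_i are cost measures (efficiency and dialogue-quality measures), and the weights \alpha and w_i are estimated by multiple linear regression with user satisfaction as the dependent variable.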

N-gram Based Robust Spoken Document Retrievals for Phoneme Recognition Errors

이수장; 박경미; 오영환
- Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology), No. 67, pp. 149-166, 2008
In spoken document retrieval (SDR), subword indexing terms (typically phonemes) are used to avoid the out-of-vocabulary (OOV) problem. This makes the indexing and retrieval process independent of any vocabulary, and it requires only a small corpus to train the acoustic model. However, the subword approach has a major drawback: it shows higher error rates than a large vocabulary continuous speech recognition (LVCSR) system. In this paper, we propose a probabilistic slot detection and n-gram based string matching method for phone-based spoken document retrieval that overcomes the high error rates of the phone recognizer. Experimental results show a 9.25% relative improvement in mean average precision (mAP) and a 1.7-fold speedup compared with the baseline system.
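A minimal sketch of why n-gram string matching tolerates phone recognition errors, assuming queries and documents are already phone strings produced by a recognizer. This is plain clipped n-gram overlap scoring, not the paper's probabilistic slot detection; `phone_ngrams`, `ngram_match_score`, `rank_documents`, and the sample phone strings are hypothetical.

```python
from collections import Counter


def phone_ngrams(phones, n=2):
    """Count the phone n-grams in a sequence of phone symbols."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))


def ngram_match_score(query, document, n=2):
    """Fraction of the query's n-grams found in the document (counts clipped)."""
    q = phone_ngrams(query, n)
    d = phone_ngrams(document, n)
    total = sum(q.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, d[gram]) for gram, count in q.items())
    return overlap / total


def rank_documents(query, documents, n=2):
    """Return document ids sorted by descending n-gram match score."""
    return sorted(documents,
                  key=lambda doc_id: ngram_match_score(query, documents[doc_id], n),
                  reverse=True)


# Hypothetical phone strings (one character per phone for brevity).
query = list("kamsa")
documents = {
    "exact": list("xxkamsayy"),   # query recognized correctly inside a longer document
    "one_error": list("kamza"),   # one phone substitution (s -> z)
    "unrelated": list("pibo"),
}
print(rank_documents(query, documents))
# → ['exact', 'one_error', 'unrelated']
```

A single substitution still leaves half of the bigrams intact, so the erroneous document scores 0.5 rather than 0. Shorter n tolerates more errors at the cost of more spurious matches; with n=3 the same document would score only 1/3.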

Noise Robust Speaker Identification using Reliable Sub-Band Selection in Multi-Band Approach

김성탁; 지미경; 김희린
- Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings (2007 Joint Conference with the Korean Association of Speech Sciences), pp. 127-130, 2007
The conventional feature recombination technique is very effective under band-limited noise, but under broad-band noise it yields no notable improvement over the full-band system. To address this drawback, we introduce a new technique for computing sub-band likelihoods in feature recombination and propose a new feature recombination method based on it. Furthermore, reliable sub-band selection based on the signal-to-noise ratio (SNR) is used to improve the performance of the proposed feature recombination. Experimental results show that the average error reduction rate across various noise conditions exceeds 27% compared with the conventional full-band speaker identification system.
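A sketch of SNR-based reliable sub-band selection, assuming per-band log-likelihoods for each speaker model have already been computed and that the bands can be combined by summing log-likelihoods (an independence assumption). The 5 dB threshold, the fallback rule, the function names, and the numbers are hypothetical; the abstract does not specify the paper's sub-band likelihood computation.

```python
def select_reliable_bands(band_snr_db, threshold_db=5.0):
    """Indices of sub-bands whose estimated SNR meets the threshold."""
    return [b for b, snr in enumerate(band_snr_db) if snr >= threshold_db]


def identify_speaker(subband_loglik, band_snr_db, threshold_db=5.0):
    """Pick the speaker with the highest log-likelihood summed over reliable bands.

    subband_loglik maps speaker id -> list of per-band log-likelihoods,
    in the same band order as band_snr_db.
    Falls back to all bands (full-band decision) if none pass the threshold.
    """
    bands = select_reliable_bands(band_snr_db, threshold_db)
    if not bands:
        bands = list(range(len(band_snr_db)))
    scores = {spk: sum(ll[b] for b in bands) for spk, ll in subband_loglik.items()}
    best = max(scores, key=scores.get)
    return best, bands


# Hypothetical numbers: band 2 is buried in noise and favors the wrong speaker.
loglik = {"A": [-10.0, -12.0, -50.0], "B": [-11.0, -13.0, -20.0]}
snr_db = [15.0, 10.0, -3.0]
print(identify_speaker(loglik, snr_db))          # → ('A', [0, 1])
print(identify_speaker(loglik, snr_db, -10.0))   # → ('B', [0, 1, 2])
```

Dropping the noisy band changes the decision: speaker A wins on the two reliable bands, while the full-band sum is dominated by the corrupted band.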

Acoustic Variation in Infant Crying

최윤미; 김선준; 조찬웅; 김현기
- Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings (2007 Joint Conference with the Korean Association of Speech Sciences), pp. 146-148, 2007
Studies of cry characteristics in newborn infants have aimed to determine whether cry analysis can succeed in the early detection of infants at risk for developmental difficulties. Crying presupposes the functioning of the respiratory, laryngeal, and supralaryngeal muscles, and the nervous system controls the capacity, stability, and coordination of the movements of these muscles. Hence, the cry provides information about how the nervous system is functioning. Three patients (Down syndrome, Cornelia de Lange syndrome, and patent ductus arteriosus) were assessed with a Computerized Speech Lab (CSL). The measures chosen were fundamental frequency (mean, maximum, and minimum values), melody contour, noise-to-harmonics ratio (NHR), and energy. We compared the patients' data with data from a healthy volunteer. Variations in cry characteristics were documented in a number of medical abnormalities.

Introduction of ETRI Broadcast News Speech Recognition System

박준
- Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings (2006 Spring Conference), pp. 89-93, 2006
This paper presents the ETRI broadcast news speech recognition system. Broadcast news speech recognition raises two major issues: 1) real-time processing and 2) out-of-vocabulary handling. For real-time processing, we devised a dual-decoder architecture. The input speech signal is segmented at long pauses between utterances, and the two decoders process the speech segments alternately. One decoder can start recognizing the current segment without waiting for the other decoder to finish the previous segment, so the processing delay does not accumulate. For out-of-vocabulary handling, we updated both the vocabulary and the language model based on recent news articles on the internet. Updating the language model as well as the vocabulary improves performance by up to 17.2% error reduction rate (ERR).
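A sketch of the alternating dual-decoder scheduling, assuming the input has already been cut into segments at long pauses. Each decoder is modeled as a single-worker thread pool, so segment i goes to decoder i % 2 and can start while the other decoder is still busy. `fake_decode` and `dual_decoder_transcribe` are hypothetical names; `fake_decode` stands in for a real recognizer.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_decode(decoder_id, segment):
    """Hypothetical stand-in for a recognizer: just tags the segment."""
    return f"decoder{decoder_id}:{segment}"


def dual_decoder_transcribe(segments, decode=fake_decode):
    """Send segment i to decoder i % 2 and return results in input order.

    Each decoder is a single-worker pool, so it handles its own segments
    sequentially, while the other decoder can start on the next segment
    without waiting for the previous one to finish.
    """
    decoders = [ThreadPoolExecutor(max_workers=1) for _ in range(2)]
    try:
        futures = [decoders[i % 2].submit(decode, i % 2, seg)
                   for i, seg in enumerate(segments)]
        return [f.result() for f in futures]
    finally:
        for d in decoders:
            d.shutdown()


# Segments would come from cutting the input at long pauses (not shown).
print(dual_decoder_transcribe(["seg0", "seg1", "seg2"]))
# → ['decoder0:seg0', 'decoder1:seg1', 'decoder0:seg2']
```

Threads are used here only to show the scheduling; a real CPU-bound decoder in Python would more likely run in separate processes or release the GIL in native code.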

A Comparative Study of Sound Source Localization Algorithms for Portable Devices

정재연; 육동석
- Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings (2006 Spring Conference), pp. 49-52, 2006
The performance of a sound source localization system degrades severely in reverberant and noisy environments. In addition, the restricted distance between microphones imposed by portable devices further lowers system performance. This paper compares sound source localization algorithms based on time delay of arrival (TDOA) that are robust to reverberation and noise, taking the microphone spacing into account. In addition, a post-filter that outputs the time delay with the maximum count (the most frequently estimated delay) is adopted to increase accuracy.
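A sketch of frame-wise TDOA estimation followed by a maximum-count post-filter, assuming two equal-length microphone signals already split into frames. For simplicity it uses plain time-domain cross-correlation rather than any of the specific algorithms the paper compares (such as GCC-PHAT variants); the lag search is limited by the microphone spacing, and all function names and parameters are hypothetical.

```python
import math
import random
from collections import Counter


def max_lag_samples(mic_spacing_m, sample_rate_hz, speed_of_sound=343.0):
    """Largest physically possible inter-mic delay, rounded up to whole samples."""
    return math.ceil(mic_spacing_m * sample_rate_hz / speed_of_sound)


def frame_delay(x, y, max_lag):
    """Lag (samples) maximizing the cross-correlation; positive means y lags x."""
    n = len(x)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(x[i] * y[i + lag] for i in range(n) if 0 <= i + lag < n)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def max_count_delay(frames_x, frames_y, max_lag):
    """Post-filter: return the delay estimated most often across frames."""
    delays = [frame_delay(fx, fy, max_lag) for fx, fy in zip(frames_x, frames_y)]
    return Counter(delays).most_common(1)[0][0], delays


rng = random.Random(0)


def delayed_frame(delay, length=64):
    """Hypothetical test frame: y is x shifted by `delay` samples."""
    x = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    y = [x[i - delay] if 0 <= i - delay < length else 0.0 for i in range(length)]
    return x, y


# Two frames with the true delay of 2 samples, one corrupted frame (delay -1).
frames = [delayed_frame(2), delayed_frame(2), delayed_frame(-1)]
lag_limit = max_lag_samples(0.05, 16000)   # 5 cm spacing at 16 kHz -> 3 samples
best, per_frame = max_count_delay([f[0] for f in frames], [f[1] for f in frames], lag_limit)
print(best, per_frame)
```

The small microphone spacing of a portable device leaves only a few candidate lags, so a single corrupted frame can easily produce a wrong estimate; voting across frames discards such outliers.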

Landmark-Guided Segmental Speech Decoding for Continuous Mandarin Speech Recognition

Chao, Hao; Song, Cheng
- Journal of Information Processing Systems, Vol. 12, No. 3, pp. 410-421, 2016
In this paper, we propose a framework that incorporates landmarks into a segment-based Mandarin speech recognition system. Landmarks provide boundary information and phonetic class information, which are used to direct the decoding process. To validate this method, two kinds of landmarks that can be reliably detected are used to direct the decoding process of a segment model (SM) based Mandarin LVCSR (large vocabulary continuous speech recognition) system. Our experimental results show that about 30% of the decoding time can be saved without an obvious decrease in recognition accuracy, demonstrating the potential of the method.
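A sketch of how boundary information from landmarks can shrink the search space of a segment-model decoder. It assumes, as a simplification, that landmarks are treated as hard segment boundaries; the paper describes landmarks as "directing" decoding and also uses phonetic class information, neither of which is modeled here. `candidate_segments` and its parameters are hypothetical.

```python
def candidate_segments(num_frames, max_len, landmarks=None):
    """Enumerate (start, end) frame spans a segment-model decoder would score.

    Without landmarks every frame index is a potential boundary; with landmarks,
    boundaries are restricted to the landmark positions plus the utterance ends.
    """
    if landmarks is None:
        allowed = list(range(num_frames + 1))
    else:
        allowed = sorted(set(landmarks) | {0, num_frames})
    return [(s, e) for s in allowed for e in allowed if 0 < e - s <= max_len]


print(len(candidate_segments(10, 4)))                # → 34
print(candidate_segments(10, 4, landmarks=[3, 6]))   # → [(0, 3), (3, 6), (6, 10)]
```

Each candidate span must be scored by the segment models, so fewer spans means less decoding time. A practical system would need to tolerate missed or spurious landmarks rather than rely on them as hard constraints.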
https://doi.org/10.3745/JIPS.03.0052

Histogram Enhancement for Robust Speaker Verification

최재길; 권철홍
- Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology), No. 63, pp. 153-170, 2007
It is well known that the accuracy of speaker verification systems deteriorates drastically when there is an acoustic mismatch between the speech used for training and for testing. This paper presents an MFCC histogram enhancement technique to improve the robustness of a speaker verification system. The technique transforms the features extracted from the speech within an utterance so that their statistics conform to reference distributions; the reference distributions proposed in this paper are the uniform distribution and the beta distribution. The transformation modifies the contrast of the MFCC histogram, improving speaker verification performance both with clean training and clean testing data and with clean training and noisy testing data.
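A minimal sketch of per-utterance histogram equalization toward a uniform reference, assuming MFCC frames are given as lists of floats. It maps each coefficient to its empirical-CDF value within the utterance, which is one standard way to make a feature's statistics match a uniform distribution; it is not claimed to be the paper's exact transform. A beta reference could be obtained by applying the beta inverse CDF (for example `scipy.stats.beta.ppf`) to these uniform values, omitted here to stay stdlib-only. Function names and data are hypothetical.

```python
def equalize_to_uniform(values):
    """Replace each value by its empirical-CDF midpoint (rank + 0.5) / N.

    The output is approximately uniform on (0, 1) regardless of the input
    distribution; ties are broken by original position.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = (rank + 0.5) / n
    return out


def equalize_utterance(frames):
    """Equalize each MFCC dimension independently across the frames of one utterance."""
    columns = [equalize_to_uniform(list(col)) for col in zip(*frames)]
    return [list(frame) for frame in zip(*columns)]


clean = [3.0, -1.0, 10.0, 0.5]
distorted = [2.0 * v + 5.0 for v in clean]   # hypothetical monotonic channel effect
print(equalize_to_uniform(clean))       # → [0.625, 0.125, 0.875, 0.375]
print(equalize_to_uniform(distorted))   # → [0.625, 0.125, 0.875, 0.375]
```

Because only the rank order matters, any monotonic distortion of a coefficient between training and testing conditions is removed; non-monotonic noise effects are only partly compensated.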

The Usage of Phoneme Duration Information for Rejecting Garbage Sentences

구명완; 김호경; 박성준; 김재인
- Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings (May 2003 Conference), pp. 219-222, 2003
In this paper, we study the use of phoneme duration information for rejecting garbage sentences. First, we build a phoneme duration model into a speech recognition system based on decision-tree state tying, assuming that phone duration follows a Gamma distribution. Next, we build a verification module that uses a word-level confidence measure. Finally, we conduct a comparative study of phoneme duration using a speech database collected from the live system, consisting of OOT (out-of-task) and ING (in-grammar) utterances. Using phone duration information improves the OOT recognition rate by 46%, and combining it with the utterance verification module reduces the error rate by a further 8.4%.
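A sketch of scoring a recognition hypothesis with Gamma phone-duration models, assuming per-phone durations (in frames) come from the recognizer's alignment and each phone has fitted Gamma shape and scale parameters. The averaging rule, the rejection threshold, the model parameters, and the function names are hypothetical; the paper's decision-tree tying and its combination with word-level confidence are not shown.

```python
import math


def gamma_logpdf(d, shape, scale):
    """Log-density of a Gamma(shape, scale) distribution at duration d > 0."""
    if d <= 0:
        return float("-inf")
    return ((shape - 1.0) * math.log(d) - d / scale
            - math.lgamma(shape) - shape * math.log(scale))


def duration_score(phone_durations, models):
    """Mean duration log-likelihood of a hypothesis.

    phone_durations: list of (phone, duration_in_frames) from the alignment.
    models: phone -> (shape, scale) Gamma parameters estimated on training data.
    """
    lls = [gamma_logpdf(d, *models[p]) for p, d in phone_durations]
    return sum(lls) / len(lls)


def accept_sentence(phone_durations, models, threshold):
    """Reject (return False) when the durations are too unlikely under the model."""
    return duration_score(phone_durations, models) >= threshold


models = {"a": (4.0, 2.0)}     # hypothetical: mean duration 4 * 2 = 8 frames
print(accept_sentence([("a", 8), ("a", 7)], models, threshold=-6.0))    # → True
print(accept_sentence([("a", 60), ("a", 1)], models, threshold=-6.0))   # → False
```

Out-of-task speech forced through an in-grammar decoder tends to produce stretched or squashed phone alignments, which is why an unusually low duration likelihood is a useful rejection cue.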

Two Ways of the Romanization of Korean: Transliteration of Hanngul and the Transcription of Korean Sounds

유만근
- Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology), Nos. 35-36, pp. 63-76, 1998
The author discusses the necessity of a clear distinction between transliteration and transcription. Romanization in Korea has been entangled for decades because the two have been confused and mixed. For the transliteration of Hanngul, a new system with the utmost simplicity and perfect convertibility is suggested. For the transcription of Korean sounds, another system is suggested that can transcribe even the chroneme as well as all the phonemes; in this respect it surpasses the current Hanngul orthography. Korean sentences containing many pairs of homographic heteronyms are romanized both ways side by side to contrast the two systems.

