• Title/Summary/Keyword: Spoken dialogue system

Search results: 18

Usability Test Guidelines for Speech-Oriented Multimodal User Interface (음성기반 멀티모달 사용자 인터페이스의 사용성 평가 방법론)

  • Hong, Ki-Hyung
    • MALSORI
    • /
    • no.67
    • /
    • pp.103-120
    • /
    • 2008
  • Basic components of a multimodal interface, such as speech recognition, speech synthesis, gesture recognition, and multimodal fusion, have their own technological limitations. For example, the accuracy of speech recognition decreases for large vocabularies and in noisy environments. In spite of these limitations, there are many applications in which speech-oriented multimodal user interfaces are very helpful to users. However, in order to expand the application areas for speech-oriented multimodal interfaces, we have to design the interfaces with a focus on usability. In this paper, we introduce usability and user-centered design methodology in general. There has been much work on evaluating spoken dialogue systems. We summarize PARADISE (PARAdigm for DIalogue System Evaluation) and PROMISE (PROcedure for Multimodal Interactive System Evaluation), which are generalized evaluation frameworks for voice and multimodal user interfaces. Then, we present usability components for speech-oriented multimodal user interfaces and usability testing guidelines that can be used in a user-centered multimodal interface design process.
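The PARADISE framework summarized in the abstract above scores a dialogue system by trading task success against dialogue costs, roughly Performance = α·N(κ) − Σ wᵢ·N(cᵢ), where κ is a task-success measure, the cᵢ are cost measures (e.g., turns, errors), and N is a normalization. A minimal sketch, assuming z-score normalization and illustrative values for α and the weights (in PARADISE proper these are estimated by regression over user-satisfaction ratings):

```python
import statistics

def z_normalize(values):
    # z-score normalization across dialogues; guard against zero variance
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def paradise_performance(kappas, costs, alpha, weights):
    """Performance_d = alpha * N(kappa_d) - sum_i w_i * N(c_i_d)."""
    nk = z_normalize(kappas)
    ncosts = [z_normalize(c) for c in costs]  # one list per cost measure
    return [alpha * nk[d] - sum(w * nc[d] for w, nc in zip(weights, ncosts))
            for d in range(len(kappas))]

# Three hypothetical dialogues: task-success kappa, with turn count and
# error count as cost measures; alpha and weights are illustrative.
scores = paradise_performance(
    kappas=[0.9, 0.6, 0.8],
    costs=[[10, 25, 14], [0, 3, 1]],
    alpha=0.5, weights=[0.3, 0.2])
print([round(s, 3) for s in scores])
```

With these illustrative numbers, the dialogue with the highest task success and lowest costs receives the highest score.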


Emotion Classification of User's Utterance for a Dialogue System (대화 시스템을 위한 사용자 발화 문장의 감정 분류)

  • Kang, Sang-Woo;Park, Hong-Min;Seo, Jung-Yun
    • Korean Journal of Cognitive Science
    • /
    • v.21 no.4
    • /
    • pp.459-480
    • /
    • 2010
  • A dialogue system performs various morphological analyses to recognize a user's intention from the user's utterances. However, a user can also convey intentions through emotional states in addition to morphological expressions. Recognizing a user's emotion therefore helps analyze the user's intention in various ways. This paper presents a new method to automatically recognize a user's emotion for a dialogue system. For general emotions, we define nine categories using a psychological approach. For an optimal feature set, we organize a combination of sentential, a priori, and context features. Then, we employ a support vector machine (SVM), which has been widely used in various learning tasks, to automatically classify a user's emotions. The experimental results show that our method achieves a 62.8% F-measure, 15% higher than the reference system.
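The abstract above classifies utterances into emotion categories with an SVM over combined features. A minimal sketch using scikit-learn's `SVC`, with toy three-dimensional feature vectors and three of the nine categories standing in for the paper's sentential, a priori, and context features:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical feature vectors (e.g., counts of emotion-bearing cues);
# the paper's actual features combine sentential, a priori, and context cues.
X = np.array([[2, 0, 0], [1, 1, 0], [0, 2, 0],
              [0, 0, 2], [2, 1, 0], [0, 1, 2]])
y = ["joy", "joy", "sadness", "anger", "joy", "anger"]

clf = SVC(kernel="linear")   # linear-kernel SVM as a simple default
clf.fit(X, y)
pred = clf.predict([[0, 2, 0]])[0]
print(pred)
```

A real system would extract such vectors from morphologically analyzed utterances and tune the kernel and features on held-out dialogues.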


Example-based Dialog System for English Conversation Tutoring (영어 회화 교육을 위한 예제 기반 대화 시스템)

  • Lee, Sung-Jin;Lee, Cheong-Jae;Lee, Geun-Bae
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.2
    • /
    • pp.129-136
    • /
    • 2010
  • In this paper, we present an example-based dialogue system for English conversation tutoring. It aims to provide intelligent one-to-one English conversation tutoring in place of old-fashioned language education with static multimedia materials. The system can understand students' imperfect expressions, enabling beginners to engage in a dialogue despite their limited linguistic ability, which motivates students to learn a foreign language. The system also provides educational functionality to improve linguistic ability. To achieve these goals, we developed a statistical natural language understanding module for understanding imperfect expressions, and an example-based dialogue manager with high domain scalability and several effective tutoring methods.

Prediction of Domain Action Using a Neural Network (신경망을 이용한 영역 행위 예측)

  • Lee, Hyun-Jung;Seo, Jung-Yun;Kim, Hark-Soo
    • Korean Journal of Cognitive Science
    • /
    • v.18 no.2
    • /
    • pp.179-191
    • /
    • 2007
  • In a goal-oriented dialogue, speakers' intentions can be represented by domain actions that consist of pairs of a speech act and a concept sequence. Predicting the domain action of the user's next utterance is useful for correcting errors that occur during speech recognition, and predicting the domain action of the system's next utterance is useful for generating flexible responses. In this paper, we propose a model that predicts the domain action of the next utterance using a neural network. The proposed model predicts the next domain action by using a dialogue history vector and the current domain action as inputs to the neural network. In experiments, the proposed model showed a precision of 80.02% for speech act prediction and 82.09% for concept sequence prediction.
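The prediction model described above feeds a dialogue history vector plus the current domain action into a neural network and reads off the most probable next action. A minimal sketch of that forward pass in NumPy, with random untrained weights and a hypothetical four-act inventory (the paper predicts speech acts and concept sequences separately):

```python
import numpy as np

rng = np.random.default_rng(0)
SPEECH_ACTS = ["request", "inform", "confirm", "response"]  # hypothetical inventory

def predict_next_speech_act(history_vec, current_action_onehot, W1, b1, W2, b2):
    # Input: dialogue history vector concatenated with the current action.
    x = np.concatenate([history_vec, current_action_onehot])
    h = np.tanh(W1 @ x + b1)            # hidden layer
    logits = W2 @ h + b2
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                # softmax over candidate speech acts
    return SPEECH_ACTS[int(np.argmax(probs))], probs

# Random stand-in weights; a real model would train these on dialogue corpora.
dim_in, dim_h = 8, 6
W1, b1 = rng.normal(size=(dim_h, dim_in)), np.zeros(dim_h)
W2, b2 = rng.normal(size=(len(SPEECH_ACTS), dim_h)), np.zeros(len(SPEECH_ACTS))

history = rng.random(4)            # e.g., decayed counts of past speech acts
current = np.array([1, 0, 0, 0])   # one-hot encoding of the current act
act, probs = predict_next_speech_act(history, current, W1, b1, W2, b2)
print(act, probs.round(3))
```

The dialogue history vector here is a stand-in; the paper's exact encoding of history is not given in the abstract.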


Virtual Reality based Situation Immersive English Dialogue Learning System (가상현실 기반 상황몰입형 영어 대화 학습 시스템)

  • Kim, Jin-Won;Park, Seung-Jin;Min, Ga-Young;Lee, Keon-Myung
    • Journal of Convergence for Information Technology
    • /
    • v.7 no.6
    • /
    • pp.245-251
    • /
    • 2017
  • This paper presents an English conversation training system with which learners train their English conversation skills by conversing, in voice, with native-speaker characters in a virtual reality environment. The proposed system allows learners to talk with multiple native-speaker characters in various scenarios in the virtual reality environment. It recognizes the learners' speech and generates the characters' voices by speech synthesis. Voice interaction with the characters in the virtual reality environment immerses the learners in the conversation situations. A scoring system that evaluates the learners' pronunciation provides positive feedback that keeps them engaged in the learning context.

Spoken Dialogue Management System based on Word Spotting (단어추출을 기반으로 한 음성 대화처리 시스템)

  • Song, Chang-Hwan;Yu, Ha-Jin;Oh, Yung-Hwan
    • Annual Conference on Human and Language Technology
    • /
    • 1994.11a
    • /
    • pp.313-317
    • /
    • 1994
  • In this study, we implemented a spoken dialogue system between a human and a computer. In particular, we studied a semantic analysis method suited to speech recognition based on word spotting, and a method for generating system responses from rules in tabular form. When speech is recognized by word spotting, it is difficult to analyze the user's intention through morphological and syntactic analysis, so a new semantic analysis method is required. We propose a new semantic analysis method that identifies the user's intention using fuzzy relations. In addition, to generate system responses appropriate to the user's intention and to manage the response content efficiently, we built response rules conditioned on the current state and the user's intention. These rules are implemented as tables, which makes them easy to update and extend. The dialogue domain covers reservation, cancellation, and inquiries for train ticketing, as well as tourist information. To cope with errors caused by speech misrecognition, the system's responses include confirmation and correction steps. In experiments with both text input and speech input, users were able to achieve their intended goals with the system's help.
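The fuzzy-relation intent analysis described above can be pictured as spotted keywords voting for intents through keyword-intent membership degrees. A minimal sketch, with illustrative keywords and membership values for the train reservation domain (the paper's actual fuzzy relation is not given in the abstract):

```python
# Hypothetical fuzzy relation: (keyword, intent) -> membership degree in [0, 1].
FUZZY_RELATION = {
    ("예매", "reserve"): 0.9, ("취소", "cancel"): 0.9,
    ("열차", "reserve"): 0.4, ("열차", "inquire"): 0.5,
    ("시간", "inquire"): 0.8,
}

def analyze_intent(spotted_words):
    """Score each intent by the max membership over spotted keywords."""
    scores = {}
    for w in spotted_words:
        for (kw, intent), mu in FUZZY_RELATION.items():
            if kw == w:
                scores[intent] = max(scores.get(intent, 0.0), mu)
    return max(scores, key=scores.get) if scores else None

print(analyze_intent(["열차", "시간"]))  # → inquire
```

Because word spotting discards the rest of the utterance, the dialogue manager pairs this intent hypothesis with the confirmation and correction steps the abstract mentions.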


Generative Interactive Psychotherapy Expert (GIPE) Bot

  • Ayesheh Ahrari Khalaf;Aisha Hassan Abdalla Hashim;Akeem Olowolayemo;Rashidah Funke Olanrewaju
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.4
    • /
    • pp.15-24
    • /
    • 2023
  • One of the objectives and aspirations of scientists and engineers ever since the development of computers has been to interact naturally with machines. Hence, features of artificial intelligence (AI) like natural language processing and natural language generation were developed. Interactive conversational systems are thought to be the fastest-expanding field of AI. Numerous businesses have created various Virtual Personal Assistants (VPAs) using these technologies, including Apple's Siri, Amazon's Alexa, and Google Assistant, among others. Even though many chatbots have been introduced over the years to diagnose or treat psychological disorders, a user-friendly chatbot is still not available. A generative cognitive behavioral therapy system with spoken dialogue support was then developed using a Persona Perception (P2) bot with the Generative Pre-trained Transformer-2 (GPT-2) model. The model was implemented using modern VPA technologies such as voice recognition, Natural Language Understanding (NLU), and text-to-speech. The resulting system is well suited to voice-based use because it can hold therapeutic conversations with users through both text and voice interaction.

Analysis of Korean Spontaneous Speech Characteristics for Spoken Dialogue Recognition (대화체 연속음성 인식을 위한 한국어 대화음성 특성 분석)

  • 박영희;정민화
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.3
    • /
    • pp.330-338
    • /
    • 2002
  • Spontaneous speech is ungrammatical and exhibits serious phonological variations, which make it extremely difficult to recognize compared with read speech. In this paper, for conversational speech recognition, we analyze transcriptions of real conversational speech and classify the characteristics of conversational speech from the speech recognition perspective. Reflecting these features, we build a baseline system for conversational speech recognition. The characteristics fall into long stretches of silence, disfluencies, and phonological variations, each grouped by similar features. To deal with these characteristics, we first update the silence model and add a filled-pause model and a garbage model; second, we add multiple phonetic transcriptions to the lexicon for the most frequent phonological variations. In our experiments, the baseline morpheme error rate (MER) is 31.65%; we obtain MER reductions of 2.08% from the silence and garbage models, 0.73% from the filled-pause model, and 0.73% from the phonological variations. Finally, we obtain a 27.92% MER for conversational speech recognition, which will serve as a baseline for further study.
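The lexicon change described above, adding multiple phonetic transcriptions for frequent phonological variations, can be sketched as a pronunciation dictionary that lists every admissible variant per entry. The entries and IPA-style phone sequences below are illustrative, not from the paper's lexicon:

```python
# Hypothetical pronunciation lexicon: each word maps to all of its
# admissible phone sequences, so reduced spontaneous-speech variants
# can still match during decoding.
LEXICON = {
    "그리고": [["k", "ɯ", "ɾ", "i", "g", "o"],   # canonical form
               ["k", "ɯ", "g", "o"]],            # reduced variant
    "뭐":     [["m", "w", "ʌ"], ["m", "ʌ"]],
}

def matches(word, phones):
    """True if the phone sequence matches any listed variant of the word."""
    return any(phones == variant for variant in LEXICON.get(word, []))

print(matches("그리고", ["k", "ɯ", "g", "o"]))  # → True
```

In an actual recognizer these variants would be compiled into the decoding graph rather than checked by exact list comparison.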