• Title/Abstract/Keyword: Speech Interface

Search results: 251 items (processing time: 0.023 s)

A Phonetic Study of Korean Intervocalic Laryngeal Consonants

  • Oh, Mi-Ra;Johnson, Keith
    • 음성과학 / Volume 1 / pp.83-101 / 1997
  • This paper aims at exploring a putative positional neutralization produced at the phonetics/phonology interface. It was designed to determine whether Korean intervocalic laryngeal consonants are phonetically distant from geminates, plain consonants, or laryngeal consonants in consonant clusters. It was found that the contrast between laryngeal singletons and geminates was neutralized intervocalically, and that both of these were patterned with heterorganic consonant sequences rather than with plain singletons.

  • PDF

Implementation of Real-time Vowel Recognition Mouse Based on Smartphone

  • 장태웅;김현용;김병만;정해
    • 정보과학회 컴퓨팅의 실제 논문지 / Volume 21, No. 8 / pp.531-536 / 2015
  • Speech recognition is one of the most actively studied areas of HCI (Human-Computer Interface), aiming to control digital devices by voice, while the mouse is one of the most widely used peripherals in GUI computing environments. This paper proposes a method for controlling a mouse through real-time vowel recognition on a smartphone. The implementation receives fixed-size speech frames in real time on the smartphone, extracts the core speech signal, computes MFCC (Mel Frequency Cepstral Coefficient) features, quantizes them with a trained codebook, and recognizes the corresponding vowel with an HMM (Hidden Markov Model). Each recognized vowel is then mapped to a mouse command that drives a virtual mouse on the screen. Finally, we demonstrate various mouse operations on a desktop PC screen using the implemented smartphone app.
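The recognition pipeline described in this abstract (MFCC features → codebook quantization → HMM scoring) can be sketched with a toy discrete HMM. All values below (codebook centroids, model parameters, feature frames, the two-vowel inventory) are made-up placeholders for illustration, not the paper's actual models:

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook centroid."""
    # features: (T, D), codebook: (K, D)
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log-likelihood of a discrete symbol sequence."""
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum(); alpha = alpha / scale
    loglik = np.log(scale)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum(); alpha = alpha / scale
        loglik += np.log(scale)
    return loglik

# Toy setup: two vowel models over a 4-symbol codebook.
codebook = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
models = {
    "a": (np.array([1., 0.]),                       # initial distribution
          np.array([[0.9, 0.1], [0.1, 0.9]]),       # state transitions
          np.array([[0.7, 0.1, 0.1, 0.1],           # per-state emissions
                    [0.1, 0.7, 0.1, 0.1]])),
    "i": (np.array([1., 0.]),
          np.array([[0.9, 0.1], [0.1, 0.9]]),
          np.array([[0.1, 0.1, 0.7, 0.1],
                    [0.1, 0.1, 0.1, 0.7]])),
}

feats = np.array([[0.1, 0.9], [0.0, 1.1], [0.2, 0.8]])  # fake "MFCC" frames
obs = quantize(feats, codebook)                          # all map to centroid 2
best = max(models, key=lambda v: forward_log_likelihood(obs, *models[v]))
print(best)  # "i": its first state emits symbol 2 with high probability
```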

Human-Computer Interaction Based Only on Auditory and Visual Information

  • Sha, Hui;Agah, Arvin
    • Transactions on Control, Automation and Systems Engineering / Volume 2, No. 4 / pp.285-297 / 2000
  • One of the research objectives in the area of multimedia human-computer interaction is the application of artificial intelligence and robotics technologies to the development of computer interfaces. This involves utilizing many forms of media, integrating speech input, natural language, graphics, hand-pointing gestures, and other methods for interactive dialogues. Although current human-computer communication methods include computer keyboards, mice, and other traditional devices, the two basic ways by which people communicate with each other are voice and gesture. This paper reports on research focusing on the development of an intelligent multimedia interface system modeled on the manner in which people communicate. The work explores interaction between humans and computers based only on the processing of speech (words uttered by the person) and the processing of images (hand-pointing gestures). The purpose of the interface is to control a pan/tilt camera, pointing it at a location specified by the user through the utterance of words and the pointing of the hand. The system uses another, stationary camera to capture images of the user's hand and a microphone to capture the user's words. Upon processing the images and sounds, the system responds by pointing the camera. Initially, the interface uses hand pointing to locate the general position to which the user is referring; it then uses the user's voice commands to fine-tune the location and, if requested, change the zoom of the camera. The image of the location is captured by the pan/tilt camera and sent to a color TV monitor to be displayed. This type of system has applications in tele-conferencing and other remote operations, where the system must respond to the user's commands much as another person would. The advantage of this approach is the elimination of the traditional input devices the user would otherwise need to control a pan/tilt camera, replacing them with more "natural" means of interaction. A number of experiments were performed to evaluate the interface system with respect to its accuracy, efficiency, reliability, and limitations.
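The coarse-then-fine control loop in this abstract (hand pointing gives an initial camera position, voice commands refine it) can be sketched as follows. The command vocabulary, angles, and step sizes are illustrative assumptions, not values from the paper:

```python
# Two-stage pointing protocol: a coarse (pan, tilt) comes from the
# hand-pointing stage; each voice command then nudges or zooms the camera.
def point_camera(pan, tilt, zoom, command):
    steps = {"left": (-5, 0), "right": (5, 0), "up": (0, 5), "down": (0, -5)}
    if command in steps:
        dp, dt = steps[command]
        pan, tilt = pan + dp, tilt + dt
    elif command == "zoom in":
        zoom *= 2
    return pan, tilt, zoom

state = (30, 10, 1)                      # coarse position from hand pointing
state = point_camera(*state, "left")     # voice fine-tuning
state = point_camera(*state, "zoom in")
print(state)  # (25, 10, 2)
```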

  • PDF

Multi-channel Input-based Non-stationary Noise Canceller for Mobile Devices

  • 정상배;이성독
    • 한국지능시스템학회논문지 / Volume 17, No. 7 / pp.945-951 / 2007
  • Noise reduction is essential for devices that use speech as their interface. In practice, call quality and speech recognition rates are severely degraded by unwanted additive noise entering around the speech input. This paper proposes a noise reduction method based on two microphones. The advantage of using multiple microphones is that directional information becomes available, which is useful for suppressing non-stationary noise such as human voices and music. The proposed algorithm is based on the Wiener filter. Wiener-filter noise reduction requires simultaneous estimation of the frequency responses of the target speech and of the noise to be removed; to obtain this information, spectral classification is performed in the frequency domain. The performance of the proposed algorithm is compared with the well-known Frost algorithm and with a generalized sidelobe canceller (GSC) with an adaptive mode controller. As performance metrics, the widely used perceptual evaluation of speech quality (PESQ) objective measure and the speech recognition rate are employed.
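The abstract does not give the exact spectral-classification rule, but the Wiener-filter core it builds on can be sketched per frequency bin: given estimated speech and noise power spectra, the gain G(f) = S(f) / (S(f) + N(f)) is applied to the noisy spectrum. The PSD values below are toy numbers for illustration:

```python
import numpy as np

def wiener_gain(speech_psd, noise_psd, floor=1e-3):
    """Per-bin Wiener gain G(f) = S(f) / (S(f) + N(f)), with a small gain floor."""
    g = speech_psd / (speech_psd + noise_psd + 1e-12)
    return np.maximum(g, floor)

def denoise_frame(noisy_fft, speech_psd, noise_psd):
    """Apply the Wiener gain to one STFT frame of the noisy signal."""
    return wiener_gain(speech_psd, noise_psd) * noisy_fft

# Toy example: speech dominates the low bins, noise dominates the high bins.
speech_psd = np.array([4.0, 4.0, 0.1, 0.1])
noise_psd  = np.array([0.1, 0.1, 4.0, 4.0])
noisy_fft  = np.ones(4, dtype=complex)
clean = denoise_frame(noisy_fft, speech_psd, noise_psd)
print(np.abs(clean).round(3))  # [0.976 0.976 0.024 0.024]
```

Speech-dominated bins pass through nearly unchanged while noise-dominated bins are strongly attenuated, which is the behavior the spectral classification step exists to enable.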

HUMAN MOTION AND SPEECH ANALYSIS TO CONSTRUCT DECISION MODEL FOR A ROBOT TO END COMMUNICATING WITH A HUMAN

  • Otsuka, Naoki;Murakami, Makoto
    • 한국방송∙미디어공학회:학술대회논문집 / IWAIT 2009 / pp.719-722 / 2009
  • The purpose of this paper is to develop a robot that moves independently, communicates with a human, and explicitly extracts information from the human mind that is rarely expressed verbally. In a spoken dialog system for information collection, it is desirable to continue communicating with the user as long as possible, but not if the user does not wish to communicate. Therefore, the system should be able to terminate the communication before the user starts to object to using it. In this paper, to enable the construction of a decision model for a system to decide when to stop communicating with a human, we acquired speech and motion data from individuals who were asked many questions by another person. We then analyzed their speech and body motion both when they did not mind answering the questions and when they wished the questioning to cease. From the results, we identified differences in speech power, length of pauses, speech rate, and body motion.

  • PDF

Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems

  • Sanghun Jeon;Jieun Lee;Dohyeon Yeo;Yong-Ju Lee;SeungJun Kim
    • ETRI Journal / Volume 46, No. 1 / pp.22-34 / 2024
  • Exposure to varied noisy environments impairs the recognition performance of artificial intelligence-based speech recognition technologies. Degraded-performance services can be utilized as limited systems that assure good performance in certain environments, but impair the general quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model robust to various noise settings, mimicking human dialogue recognition elements. The model converts word embeddings and log-Mel spectrograms into feature vectors for audio recognition. A dense spatial-temporal convolutional neural network model extracts features from log-Mel spectrograms, transformed for visual-based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess the signal-to-noise ratio in nine synthesized noise environments, with the proposed model exhibiting lower average error rates. The error rate for the AVSR model using a three-feature multi-fusion method is 1.711%, compared to the general 3.939% rate. This model is applicable in noise-affected environments owing to its enhanced stability and recognition rate.
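The abstract does not spell out the three-feature multi-fusion method in detail; a common baseline that such architectures generalize is feature-level fusion, i.e., concatenating the per-frame audio, visual, and word-embedding streams before classification. The sketch below uses made-up feature dimensions, not those of the paper:

```python
import numpy as np

def fuse_features(audio_feat, visual_feat, text_feat):
    """Feature-level fusion: concatenate three per-frame feature streams."""
    return np.concatenate([audio_feat, visual_feat, text_feat], axis=-1)

T = 10                          # frames in the utterance
audio  = np.zeros((T, 40))      # e.g., log-Mel-spectrogram-derived features
visual = np.zeros((T, 64))      # e.g., lip-region CNN features
text   = np.zeros((T, 32))      # e.g., word embeddings
fused = fuse_features(audio, visual, text)
print(fused.shape)  # (10, 136)
```

A downstream classifier then sees all three modalities jointly, so visual and text cues can compensate when the audio stream is corrupted by noise.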

Design of Dialogue Management System for Home Network Control

  • 김현정;은지현;장두성;최준기;구명완
    • 대한음성학회:학술대회논문집 / 2006 Fall Conference Proceedings / pp.109-112 / 2006
  • This paper presents a dialogue interface using a dialogue management system as a method for controlling home appliances in Home Network Services. To realize this type of dialogue interface, we first investigated user requirements for Home Network Services by analyzing dialogues entered by users. Based on this analysis, we extracted 15 user intentions and 22 semantic components. Example dialogues were collected in a WOZ (Wizard-of-Oz) environment to implement a reasoning model that generates meaningful responses for the example-based dialogue modeling technique. An overview of the Home Network Control System using the proposed dialogue interface is presented. Lastly, we show that the Dialogue Management System trained on the collected dialogues behaves properly, achieving its task of controlling Home Network appliances through the steps of natural language understanding, response reasoning, and response generation.
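The core of example-based dialogue modeling, as described above, is selecting the stored example whose semantic signature best matches the parsed input. This is a minimal sketch under assumed data: the intentions, semantic components, and responses below are hypothetical stand-ins, not the paper's 15 intentions and 22 components:

```python
# Example-based response selection: pick the stored example whose
# (intention, semantic-component) signature best matches the parsed input.
examples = [
    ({"intent": "turn_on", "device": "light", "room": "living"},
     "Turning on the living-room light."),
    ({"intent": "turn_off", "device": "tv", "room": None},
     "Turning off the TV."),
]

def select_response(parsed):
    def score(sig):
        # Count how many slots of the example signature the input matches.
        return sum(1 for k, v in sig.items() if parsed.get(k) == v)
    sig, response = max(examples, key=lambda e: score(e[0]))
    return response

print(select_response({"intent": "turn_on", "device": "light", "room": "living"}))
```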

  • PDF

Construction of an Integrated Management System for Various Speech Corpora

  • 유경택;정창원;김도관;이용주
    • 한국컴퓨터정보학회논문지 / Volume 11, No. 1 / pp.259-271 / 2006
  • This paper examines the design and implementation considerations for an integrated management system for diverse speech corpora. The goal is to manage, in a unified way, the various kinds of speech databases needed for speech research, including corpora built in different data formats. In addition, the system is designed so that users can efficiently search for speech data matching various query conditions and so that newly built corpora can be added easily. To this end, a global schema was designed to integrate new information without modifying the existing corpora, and on top of it a web-based integrated management system was built that can be accessed without time or space constraints. Finally, the web-based interface produced as part of the service is described, and the effectiveness of using indexed views in the implementation is demonstrated.
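The global-schema idea described above can be illustrated with a database view that unifies corpora stored under different schemas without modifying them. The table layouts and names below are assumed for illustration (the paper's actual schemas are not given here), sketched with Python's built-in sqlite3:

```python
import sqlite3

# Two corpora stored with different column names, unified behind one
# "global schema" view so queries need not know the source layout.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE corpus_a (utt_id TEXT, speaker TEXT, wav_path TEXT);
CREATE TABLE corpus_b (id TEXT, spk TEXT, file TEXT);
INSERT INTO corpus_a VALUES ('a001', 'F01', '/a/a001.wav');
INSERT INTO corpus_b VALUES ('b001', 'M02', '/b/b001.pcm');
CREATE VIEW global_speech AS
  SELECT utt_id AS utt, speaker AS spk, wav_path AS path, 'corpus_a' AS src
    FROM corpus_a
  UNION ALL
  SELECT id, spk, file, 'corpus_b' FROM corpus_b;
""")
rows = con.execute("SELECT utt, spk, src FROM global_speech ORDER BY utt").fetchall()
print(rows)  # [('a001', 'F01', 'corpus_a'), ('b001', 'M02', 'corpus_b')]
```

New corpora can then be added by extending the view rather than rewriting existing data, which mirrors the paper's goal of integration without modification.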

  • PDF

The Interactive Voice Services Based on VoiceXML

  • 김학균;김은향;김재인;구명완
    • 대한음성학회지:말소리 / No. 43 / pp.113-125 / 2002
  • As there is a growing need to search Web information via wired or wireless telephones, the VoiceXML Forum was established to develop and promote the Voice eXtensible Markup Language (VoiceXML). VoiceXML simplifies the creation of personalized interactive voice response services on the Web, and it allows voice and phone access to information on Web sites and call-center databases. It can also utilize Web-based technologies such as CGI (Common Gateway Interface) scripts. In this paper, we present TeleGateway, a voice portal service platform based on VoiceXML that we have developed. It enables the integration of voice services with data services using Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engines. We also demonstrate various services running on the voice portal.
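The structure such a platform interprets can be illustrated with a minimal VoiceXML document: a form prompts the caller, a grammar constrains the ASR result, and the recognized value is handed to a server-side script, CGI-style. The grammar file and URL below are hypothetical placeholders, not artifacts of TeleGateway:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="weather">
    <field name="city">
      <prompt>Which city would you like the weather for?</prompt>
      <grammar src="cities.grxml" type="application/srgs+xml"/>
    </field>
    <block>
      <!-- Hand the recognized city to a server-side script -->
      <submit next="http://example.com/weather.cgi" namelist="city"/>
    </block>
  </form>
</vxml>
```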

  • PDF

Design and Implementation of an IVR Server Using VoiceXML

  • 이창호;장원조;강선미
    • 음성과학 / Volume 9, No. 3 / pp.47-59 / 2002
  • New services that use the human voice and DTMF (Dual Tone Multi-Frequency) techniques are anticipated as a way to obtain valuable information on the Internet more easily, and VoiceXML (Voice eXtensible Markup Language) is a natural choice for making such services possible. This paper describes the design and implementation of an IVR (Interactive Voice Response) server using VoiceXML, which connects the Internet and the IVR server efficiently. The IVR server is composed of two parts: VoiceXML document handling and VoiceXML execution. The scenario part of the IVR server corresponds to VoiceXML documents, and these documents are run by the VoiceXML execution part.

  • PDF