• Title/Summary/Keyword: visual speech recognition (시각 음성인식)

Artificial intelligence wearable platform that supports the life cycle of the visually impaired (시각장애인의 라이프 사이클을 지원하는 인공지능 웨어러블 플랫폼)

  • Park, Siwoong;Kim, Jeung Eun;Kang, Hyun Seo;Park, Hyoung Jun
    • Journal of Platform Technology, v.8 no.4, pp.20-28, 2020
  • This paper proposes a voice, object, and optical character recognition (OCR) platform, consisting of a speech-recognition-based smart wearable device, a smart device, and a web AI server, as an appropriate technology that helps visually impaired people live independently by learning their life cycle in advance. The wearable device was designed and manufactured with a reverse-neckband structure to make it more convenient to wear and to improve the efficiency of object recognition. A high-sensitivity miniature microphone and a speaker attached to the wearable device support a voice recognition interface provided by an app on the linked smart device. The voice, object, and OCR services on the web AI server were built with open-source software and Google APIs, and experiments confirmed that the platform achieved an average accuracy of 90% or more for voice, object, and optical character recognition.

Trends of Hardware Accelerator for the Embedded Speech Recognition (내장형 음성인식기를 위한 전용 하드웨어가속기 기술개발 동향)

  • Kim, J.Y.;Kim, T.J.;Lee, J.H.;Eum, N.W.
    • Electronics and Telecommunications Trends, v.29 no.4, pp.91-100, 2014
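  • Speech recognition is the technology that converts human speech into text so that it can be used as control commands for devices. Demand for speech recognition technology has existed for decades, and the field has steadily yielded commercial products. The algorithms and data-processing schemes that can be commercialized have been formalized in a mathematical model called the HMM (Hidden Markov Model), and the prevailing view is that large-scale, repeated data collection and the construction of elaborate training databases are the core elements of speech recognition technology. For this reason, institutions and companies with the infrastructure to collect and process large speech recognition databases are able to dominate the speech recognition market. However, this way of delivering speech recognition services can reach its limits as speech-based user interfaces extend to Internet of Things (IoT) and wearable devices, and wherever communication or network connections are unavailable. To address this problem, this article reviews the development of hardware accelerators for embedded speech recognizers and the current status of the field in Korea and abroad.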

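The article identifies the HMM as the model around which commercial speech recognizers are standardized. As a generic illustration, not taken from the article, the sketch below evaluates the HMM forward algorithm, which scores an observation sequence against a model through a regular multiply-accumulate recursion over states and frames. The two-state model, its probabilities, and the function name `forward_likelihood` are hypothetical.

```python
from typing import List, Sequence


def forward_likelihood(
    obs: Sequence[int],
    start: List[float],
    trans: List[List[float]],
    emit: List[List[float]],
) -> float:
    """Return P(obs | model) for a discrete HMM using the forward algorithm."""
    n_states = len(start)
    # alpha[j] = P(o_1..o_t, state_t = j); initialise with the first frame
    alpha = [start[j] * emit[j][obs[0]] for j in range(n_states)]
    for o in obs[1:]:
        # each new alpha sums over all predecessor states, then applies the emission
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][o]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical 2-state model over 2 discrete observation symbols (0 and 1).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]

print(round(forward_likelihood([0, 1, 0], start, trans, emit), 6))  # prints 0.10893
```

A recognizer would run one such model per word or phoneme and pick the highest-scoring one; real systems use continuous emission densities and log-domain arithmetic, which this sketch omits.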

Phoneme Recognition based on Two-Layered Stereo Vision Neural Network (2층 구조의 입체 시각형 신경망 기반 음소인식)

  • Kim, Sung-Ill;Kim, Nag-Cheol
    • Journal of Korea Multimedia Society, v.5 no.5, pp.523-529, 2002
  • The present study describes neural networks for stereoscopic vision applied to identifying human speech. In speech recognition based on stereoscopic vision neural networks (SVNN), similarities are first obtained by comparing input speech signals with standard models. These similarities are then fed into a dynamic process in which competitive and cooperative interactions take place among neighboring similarities, and through this process a single winner neuron is finally selected. In a comparative study, the two-layered SVNN achieved recognition accuracy 7.7% higher than a hidden Markov model (HMM) recognizer, showing that SVNN outperformed the existing HMM recognizer.

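The abstract describes two stages: similarities between the input and standard models, then competitive and cooperative dynamics among neighboring similarities until one winner neuron remains. The following is a minimal sketch under stated assumptions, not the authors' SVNN equations (the abstract does not give them): each standard model is assumed to be pre-aligned frame by frame with the input, similarity is `exp(-squared distance)`, cooperation comes from adjacent frames of the same model, competition comes from the other models at the same frame, and the coefficients `coop` and `comp` are arbitrary.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def similarities(frames: Sequence[Vector], models: Sequence[Sequence[Vector]]) -> List[List[float]]:
    """Layer 1 (assumed form): s[m][t] = exp(-||x_t - r_{m,t}||^2)."""
    return [
        [math.exp(-sum((x - r) ** 2 for x, r in zip(frame, ref))) for frame, ref in zip(frames, model)]
        for model in models
    ]


def winner(sim: List[List[float]], coop: float = 0.2, comp: float = 0.5, max_iter: int = 50) -> int:
    """Layer 2 (hypothetical rule): iterate until only one model row stays active."""
    act = [row[:] for row in sim]
    n_models, n_frames = len(act), len(act[0])
    for _ in range(max_iter):
        new = []
        for m in range(n_models):
            row = []
            for t in range(n_frames):
                # cooperation: support from neighbouring frames of the same model
                left = act[m][t - 1] if t > 0 else 0.0
                right = act[m][t + 1] if t + 1 < n_frames else 0.0
                # competition: inhibition from the other models at the same frame
                rivals = sum(act[n][t] for n in range(n_models) if n != m)
                value = act[m][t] + coop * (left + right) - comp * rivals
                row.append(min(1.0, max(0.0, value)))  # keep activations in [0, 1]
            new.append(row)
        act = new
        alive = [m for m in range(n_models) if any(a > 0.0 for a in act[m])]
        if len(alive) == 1:
            return alive[0]
    # no single survivor within max_iter: fall back to the largest total activation
    return max(range(n_models), key=lambda m: sum(act[m]))


frames = [[0.0], [1.0], [0.0]]
models = [
    [[0.0], [1.0], [0.0]],  # exact match
    [[1.0], [0.0], [1.0]],
    [[0.5], [0.5], [0.5]],
]
print(winner(similarities(frames, models)))  # prints 0
```

With these values the exact-match model is the only row still active after three iterations; the update treats every model symmetrically, so reordering `models` only changes the returned index.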

Design of Smart Glasses Platform walking guide for the visually impaired (시각장애인을 위한 보행 안내 스마트 안경 플랫폼 설계)

  • Lee, Jaebeom;Jang, Jongwook;Jang, Sungjin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2021.10a, pp.320-322, 2021
  • As the world's elderly population grows, the proportion of visually impaired people is also increasing, yet their outdoor activities remain heavily restricted by safety problems and a lack of guidance information. To address this, research on smart devices such as smart glasses with optical character recognition (OCR) is being actively conducted. In this paper, we propose a system that recognizes obstacles ahead, announces them by voice, and guides the user to a destination. Using the deep learning object recognition model YOLO, the system recognizes hazards such as stairs and traffic cones as obstacles and delivers this information by voice. Because the system also guides visually impaired users to their destination using a directions API, voice recognition, and a TTS library, it is expected to let them take part in a much wider range of activities.

Visual analysis of attention-based end-to-end speech recognition (어텐션 기반 엔드투엔드 음성인식 시각화 분석)

  • Lim, Seongmin;Goo, Jahyun;Kim, Hoirin
    • Phonetics and Speech Sciences, v.11 no.1, pp.41-49, 2019
  • End-to-end speech recognition models, which consist of a single integrated neural network, have recently been proposed. An end-to-end model does not require multiple training stages, and its structure is easy to understand. However, it is difficult to understand how the model recognizes speech internally. In this paper, we visualized and analyzed an attention-based end-to-end model to elucidate its internal mechanisms. We compared the acoustic model of a BLSTM-HMM hybrid model with the encoder of the end-to-end model and visualized both using t-SNE to examine the differences between neural network layers. As a result, we were able to delineate the differences between the acoustic model and the end-to-end encoder. Additionally, we analyzed the end-to-end decoder from a language-model perspective. Finally, we found that improving the end-to-end decoder is necessary to achieve higher performance.
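Attention maps like those the authors visualize are built from the weights an attention-based decoder assigns to encoder frames at each output step. A minimal sketch, assuming plain dot-product scoring (the paper's model may use a different scoring function, such as location-aware attention) and made-up two-dimensional encoder and decoder states, computes those weights and prints them as a coarse text heat map. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries: Sequence[Sequence[float]], keys: Sequence[Sequence[float]]) -> List[List[float]]:
    """Row i holds the weights decoder step i places on each encoder frame (dot-product scores)."""
    return [softmax([sum(q * k for q, k in zip(query, key)) for key in keys]) for query in queries]


def render(weights: List[List[float]], shades: str = " .:*#") -> str:
    """Draw the alignment matrix as text: one row per decoder step, darker = larger weight."""
    lines = []
    for row in weights:
        lines.append("".join(shades[min(int(w * len(shades)), len(shades) - 1)] for w in row))
    return "\n".join(lines)


# Hypothetical encoder states (4 frames) and decoder states (2 output steps).
encoder = [[4.0, 0.0], [3.0, 1.0], [0.0, 4.0], [1.0, 3.0]]
decoder = [[1.0, 0.0], [0.0, 1.0]]
weights = attention_weights(decoder, encoder)
print(render(weights))
```

Each row is a probability distribution over encoder frames, so a well-trained model tends to show a roughly diagonal band when many such rows are stacked.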

The design of VoiceXML Interpreter based on the Web (웹 기반의 VoiceXML 문서 인터프리터의 설계)

  • 이선남;김경아;이기호
    • Proceedings of the Korean Information Science Society Conference, 2001.10a, pp.355-357, 2001
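  • VoiceXML is being presented as a new information-service paradigm that uses speech recognition, speech synthesis, and speech processing technologies to move beyond the existing vision-dependent web toward one that uses both speech and vision. When a voice information service is provided with VoiceXML, scenarios written as markup are served through an interpreter. This easily overcomes a weakness of existing voice information service systems, which must be reprogrammed whenever a scenario changes, and it lets service developers write scenarios independently of technical issues such as speech recognition and speech synthesis. This paper designs an interpreter for VoiceXML documents; VoiceXML is an XML-based dialog markup language that supports the grammar representation, system architecture, and dialog model proposed by the W3C Voice Browser Working Group.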

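The benefit described above comes from separating the dialog scenario (markup) from the engine that executes it. A toy interpreter loop, far simpler than the design in the paper and than the Form Interpretation Algorithm defined in the VoiceXML specification, can illustrate this separation: it walks the `<block>` and `<field>` items of each `<form>` in document order, marks prompts as text to synthesize, and replaces speech recognition with a dictionary of canned answers. The sample document and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical scenario; changing the dialog means editing this markup, not the interpreter.
DOCUMENT = """<?xml version="1.0"?>
<vxml version="1.0">
  <form id="weather">
    <block>Welcome to the weather service.</block>
    <field name="city">
      <prompt>Which city?</prompt>
    </field>
    <block>Thank you.</block>
  </form>
</vxml>"""


def local(tag: str) -> str:
    """Strip an XML namespace such as {http://www.w3.org/2001/vxml} if present."""
    return tag.rsplit("}", 1)[-1]


def interpret(source: str, answers: Dict[str, str]) -> List[str]:
    """Walk each <form> in order; 'TTS' lines stand in for synthesis, 'ASR' lines for recognition."""
    transcript = []
    root = ET.fromstring(source)
    for form in root:
        if local(form.tag) != "form":
            continue
        for item in form:
            kind = local(item.tag)
            if kind == "block":
                transcript.append("TTS: " + (item.text or "").strip())
            elif kind == "field":
                name = item.get("name")
                for child in item:
                    if local(child.tag) == "prompt":
                        transcript.append("TTS: " + (child.text or "").strip())
                # stand-in for speech recognition: look the answer up by field name
                transcript.append(f"ASR: {name} = {answers.get(name, '')}")
    return transcript


for line in interpret(DOCUMENT, {"city": "Seoul"}):
    print(line)
```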

Wearable system for sound visualization and disaster alarm for the Hearing-Impaired (청각장애인을 위한 사운드-시각화 및 재난 경보 웨어러블 시스템)

  • Lee, Se-Hoon;Kong, Jin-yong;Yeom, Dae-hoon;Kang, Eun-ho;Baek, Yong-Tae
    • Proceedings of the Korean Society of Computer Information Conference, 2017.07a, pp.257-258, 2017
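  • To address the problem that hearing-impaired people cannot perceive sound without relying on vision, this paper implements a wearable system that visualizes sound. The system's speech recognition sensor recognizes speech and sends it as a message that can be read on a wearable display, and weather and disaster alert messages can also be checked on the wearable in real time to help prevent safety accidents, easing the difficulties faced by hearing-impaired people.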

Development of Automatic Creating Web-Site Tool for the Blind (시각장애인용 웹사이트 자동생성 툴 개발)

  • Baek, Hyeun-Ki;Ha, Tai-Hyun
    • Journal of Digital Contents Society, v.8 no.4, pp.467-474, 2007
  • This paper describes the design and implementation of an automatic website creation tool that lets blind users build their own homepages, using speech recognition and speech synthesis, as easily as non-disabled users can. With the tool, blind users can create voice mail, schedules, address lists, and bookmarks, and its information management system also facilitates communication with non-disabled people. The tool accepts basic commands through speech recognition and provides text-to-speech for voice output. Ultimately, the tool is intended to remove the social isolation of blind people and allow them to enjoy the information age as non-disabled people do.

Design and Implementation of Korean Voice Web Browser (한국어 음성 웹브라우저 설계 및 구현)

  • Jang, Young-Gun;Jo, Kyoung-Hwan
    • Journal of KIISE: Computing Practices and Letters, v.7 no.5, pp.458-466, 2001
  • This paper addresses the design and implementation of a Korean voice web browser that uses speech technologies to control the browser, select content in web documents, and convert that content to speech after HTML analysis. Its main feature is a universal design that considers both sighted and visually impaired users and allows a multimodal interface. As a voice interface for visually impaired users, it supports a tree structure that makes the structure of a web document easy to grasp through voice guidance alone, regardless of frame usage; it can handle every element described by a tag in a web document and distinguishes elements by assigning each a predefined voice property according to the element's properties. This method eliminates extra guidance speech for element properties without requiring audio style sheets or additional programming effort.

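The tree-structured navigation and per-element voice properties can be sketched with Python's standard `html.parser`. This is an illustration under assumptions, not the browser described in the paper: the set of void tags, the tag-to-voice mapping in `VOICE_PROPERTIES` (pitch multiplier, speaking rate), and all class and function names are hypothetical, and a real voice browser would pass these values to a TTS engine rather than return them.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Hypothetical voice properties per element type: (pitch multiplier, speaking rate).
VOICE_PROPERTIES: Dict[str, Tuple[float, float]] = {
    "h1": (1.3, 0.9),
    "a": (1.1, 1.0),
    "p": (1.0, 1.0),
    "li": (0.9, 1.1),
}
VOID_TAGS = {"br", "img", "hr", "meta", "input", "link"}


class Node:
    def __init__(self, tag: str, parent: Optional["Node"] = None) -> None:
        self.tag = tag
        self.parent = parent
        self.children: List["Node"] = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Build a simple element tree so a listener can navigate the page by structure."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never contain children
            self.current = node

    def handle_endtag(self, tag):
        # walk up to the matching open element; ignore stray end tags
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        words = data.strip()
        if words:
            self.current.text = f"{self.current.text} {words}".strip()


def utterances(node: Node, depth: int = 0) -> List[Tuple[int, str, Tuple[float, float]]]:
    """Depth-first list of (depth, text, voice property) for every element that has text."""
    out = []
    if node.text:
        out.append((depth, node.text, VOICE_PROPERTIES.get(node.tag, (1.0, 1.0))))
    for child in node.children:
        out.extend(utterances(child, depth + 1))
    return out


builder = TreeBuilder()
builder.feed("<h1>News</h1><ul><li><a href='/a'>Story one</a></li><li>Story two</li></ul>")
builder.close()
for depth, text, (pitch, rate) in utterances(builder.root):
    print("  " * depth + f"{text} (pitch x{pitch}, rate x{rate})")
```

Because the element type is carried by the voice itself (here, the link is read at a higher pitch), no spoken "link" announcement is needed, which is the effect the abstract describes.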

A Study on the Implementation of Realtime Phonetic Recognition and LIP-synchronization (실시간 음성인식 및 립싱크 구현에 관한 연구)

  • Lee, H.H.;Choi, D.I.;Cho, W.Y.
    • Proceedings of the KIEE Conference, 2000.11d, pp.812-814, 2000
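  • This paper studies a method for providing lip-synchronization (lip-sync) animation through real-time speech recognition: given speech input is recognized, and the mouth shape of the animation is changed to match it so that the speech is conveyed visually. To express, in real time, lip-sync that more closely resembles actual human articulation together with a lively character face, the model takes input from a microphone or similar device, recognizes speech in real time with a neural network, and morphs a 2D animation according to the recognition result.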

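The lip-sync pipeline (recognize speech, then morph a 2D mouth according to the result) can be sketched as keyframe interpolation. In this minimal sketch the neural-network recognizer is replaced by a given phoneme list, and the four-point mouth keyframes, the phoneme set, and linear interpolation between keyframes are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Shape = List[Point]

# Hypothetical mouth keyframes: (left corner, top lip, right corner, bottom lip).
MOUTH_SHAPES: Dict[str, Shape] = {
    "rest": [(-1.0, 0.0), (0.0, 0.1), (1.0, 0.0), (0.0, -0.1)],
    "a": [(-0.8, 0.0), (0.0, 0.5), (0.8, 0.0), (0.0, -0.9)],
    "i": [(-1.2, 0.0), (0.0, 0.2), (1.2, 0.0), (0.0, -0.3)],
    "u": [(-0.5, 0.0), (0.0, 0.3), (0.5, 0.0), (0.0, -0.3)],
}


def morph(src: Shape, dst: Shape, t: float) -> Shape:
    """Linearly interpolate each control point; t=0 gives src, t=1 gives dst."""
    return [(x0 + (x1 - x0) * t, y0 + (y1 - y0) * t) for (x0, y0), (x1, y1) in zip(src, dst)]


def animate(phonemes: List[str], frames_per_transition: int = 4) -> List[Shape]:
    """Turn a recognized phoneme sequence into in-between mouth shapes, starting and ending at rest."""
    # phonemes without a keyframe fall back to the rest shape
    keys = ["rest"] + [p if p in MOUTH_SHAPES else "rest" for p in phonemes] + ["rest"]
    frames: List[Shape] = []
    for current, nxt in zip(keys, keys[1:]):
        for step in range(frames_per_transition):
            frames.append(morph(MOUTH_SHAPES[current], MOUTH_SHAPES[nxt], step / frames_per_transition))
    frames.append(MOUTH_SHAPES[keys[-1]])
    return frames


print(len(animate(["a", "i", "u"])))  # prints 17
```

In a real-time setting the recognizer would emit phonemes incrementally, so the interpolation would run frame by frame as results arrive rather than over a complete list.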