• Title/Summary/Keyword: visual speech recognition (시각 음성인식)

Search Result 129, Processing Time 0.027 seconds

Comparison of Deep Learning Algorithm in Bus Boarding Assistance System for the Visually Impaired using Deep Learning and Traffic Information Open API (딥러닝과 교통정보 Open API를 이용한 시각장애인 버스 탑승 보조 시스템에서 딥러닝 알고리즘 성능 비교)

  • Kim, Tae hong;Yeo, Gil Su;Jeong, Se Jun;Yu, Yun Seop
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.388-390 / 2021
  • This paper introduces a system that helps visually impaired people board a bus, built on an embedded board with a keypad, dot matrix, lidar sensor, and NFC reader, together with a public data portal Open API and a deep learning algorithm (YOLOv5). The user enters the desired bus number through the NFC reader and keypad, and the system then announces by voice output the bus's location and expected arrival time, obtained from the Open API real-time data. In addition, displaying the bus number on the dot matrix helps the bus driver wait for the visually impaired passenger; at the same time, the deep learning algorithm (YOLOv5) recognizes the number of a stopping bus in real time, and a distance sensor such as the lidar sensor measures the distance to the bus.


Audio-Visual Scene Aware Dialogue System Utilizing Action From Vision and Language Features (이미지-텍스트 자질을 이용한 행동 포착 비디오 기반 대화시스템)

  • Jungwoo Lim;Yoonna Jang;Junyoung Son;Seungyoon Lee;Kinam Park;Heuiseok Lim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.253-257 / 2023
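  • Dialogue systems have recently been applied to a range of real-world human-machine interfaces, such as smartphone assistants, car navigation, voice-controlled speakers, and human-centered robots. However, most dialogue systems operate on text and cannot process multimodal input. Solving this problem requires a dialogue system that integrates multimodal scene understanding, such as video. Existing video-based dialogue systems focus mainly on fusing various features such as visual, image, and audio features, or on aligning images and text well through pre-training, and as a result they miss important action cues and sound cues. This paper improves a video-based dialogue system by using pre-trained image-text alignment embeddings together with action cues and sound cues. The proposed model encodes text, image, and audio embeddings, and on that basis extracts relevant frames and action cues to generate utterances. In experiments on the AVSD dataset, the proposed model outperformed existing models, and representative image-text features were compared and analyzed in a video-based dialogue system.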


Comparison of Text Beginning Frame Detection Methods in News Video Sequences (뉴스 비디오 시퀀스에서 텍스트 시작 프레임 검출 방법의 비교)

  • Lee, Sanghee;Ahn, Jungil;Jo, Kanghyun
    • Journal of Broadcast Engineering / v.21 no.3 / pp.307-318 / 2016
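  • Overlay text in video frames provides information that supplements the audio and visual content. In news video in particular, this text gives a concise and direct description of the video content. It is therefore the most reliable clue for building a news video indexing system. To build an indexing system for television news programs, it is important to detect and recognize text. This paper proposes identifying the overlay text beginning frame, which helps in detecting and recognizing overlay text in news video. Because not every frame in a video sequence contains overlay text, extracting overlay text from every frame is unnecessary and wastes time. Focusing only on frames that contain overlay text can therefore improve the accuracy of overlay text detection. Comparative experiments on text beginning frame identification methods are carried out on news videos, and an appropriate processing method is proposed.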

A general-purpose model capable of image captioning in Korean and English and a method to generate text suitable for the purpose (한국어 및 영어 이미지 캡션이 가능한 범용적 모델 및 목적에 맞는 텍스트를 생성해주는 기법)

  • Cho, Su Hyun;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.8 / pp.1111-1120 / 2022
  • Image captioning is the task of viewing an image and describing it in language. It is an important problem that can be solved by understanding and bringing together two areas, image processing and natural language processing. In addition, by automatically recognizing images and describing them in text, images can be converted into text and then into speech to help visually impaired people understand their surroundings, and the technique applies to important areas such as image search, art therapy, sports commentary, and real-time traffic information commentary. So far, image captioning research has focused solely on recognizing images and turning them into text. For practical use, however, the varied environments of the real world must be considered, and image descriptions must suit the intended purpose. In this work, we present a general-purpose Korean and English image captioning model and a text generation technique suited to the purpose of the captioning.

Data Modeling for Cyber Security of IoT in Artificial Intelligence Technology (인공지능기술의 IoT 통합보안관제를 위한 데이터모델링)

  • Oh, Young-Taek;Jo, In-June
    • The Journal of the Korea Contents Association / v.21 no.12 / pp.57-65 / 2021
  • A hyper-connected intelligent information society is emerging that creates new value by converging IoT, AI, and big data, the new technologies of the fourth industrial revolution, across all industrial fields. Everything is connected to the network, data volumes are exploding, and artificial intelligence can learn on its own and even perform intellectual judgment. In particular, the Internet of Things provides a new communication environment that can connect to anything, anytime, anywhere, enabling hyper-connectivity in which everything is connected. Artificial intelligence technology is implemented so that computers can carry out human perception, learning, reasoning, and natural language processing. Artificial intelligence is advancing technologies such as machine learning, deep learning, natural language processing, voice recognition, and visual recognition, and includes software, machine learning, and cloud technologies specialized for applications such as safety, medicine, defense, finance, and welfare. Through this, it is used in fields throughout industry to provide convenience and new value. At the same time, a response is needed, because intelligent and sophisticated cyber threats are increasing and new technologies bring potential adverse effects, including concerns over their technical safety. In this paper, we propose a new data modeling method that enables integrated IoT security control using artificial intelligence technology as a way to address these adverse effects.

The Conference Management System Architecture for Ontological Knowledge (지식의 온톨로지화를 위한 관리 시스템 아키텍처)

  • Hong, Hyun-Woo;Koh, Gwang-san;Kim, Chang-Soo;Jeong, Jae-Gil;Jung, Hoe-kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.2 / pp.1115-1118 / 2005
  • With the development of internet technology, on-line conference systems have been produced. On-line conference systems are now being developed to use pattern recognition and voice recognition. Compared with off-line conferences, on-line conferences have the advantage of being free from distance limitations. However, on-line meetings have unavoidable weak points. As in off-line conferences, the organization and consistency of the content are weak while the conference is in progress, so the conference members cannot follow the flow of the conference. Therefore, in this paper, we introduce the ontology concept and design a new architecture that uses an ontology mining technique to make the conference content and conference knowledge ontological. Then, to verify the new architecture, we design and implement a new knowledge-based conference management system.


Analysis of the Experiences and Perceptions of Teachers Participating in the Development of Content-Based Online Science Class Videos, and the Characteristics of the Developed Class Content (콘텐츠 활용형 온라인 과학 수업 동영상 개발에 참여한 교사들의 경험과 인식, 개발된 수업 콘텐츠의 특징 분석)

  • Shin, Jung Yun;Park, Sang Hee
    • Journal of The Korean Association For Science Education / v.40 no.6 / pp.595-609 / 2020
  • The purpose of this study is to analyze the experiences of teachers who participated in developing online science class videos in the context of COVID-19, their perceptions of online science classes, and the characteristics of the online science class content they developed. A survey and interviews were conducted with ten elementary school teachers who had made online science class videos themselves. The characteristics of the online science classes were also investigated by analyzing the videos the participants produced. The participants reported a lack of production time, difficulty in filming and editing, concerns over misconceptions, the problem of clearing copyrights for existing materials, and the burden of external disclosure. Although the participants were teachers with experience producing online science class video content, none of them actively cited merits of online science classes. On the other hand, the participants cited as shortcomings of online science classes that students had fewer opportunities for inquiry and that communication or interaction was lacking. These shortcomings were thought to have a great influence on the quality of online science classes, especially by making inquiry classes difficult. Some teachers took the negative view that online science classes could not completely replace face-to-face classes. However, if multiple teachers are presented with supplementary teaching activities that complement the content-based online teaching method, the method of combining online science classes and face-to-face classes is not. The analysis of the online science class content showed that the introduction and arrangement steps were similar to those of face-to-face science classes, but the inquiry step and the conceptual explanation step differed greatly from face-to-face science classes.

Classification standard of Communication Tool (플랫폼 분류 기준 고찰 : 감각의 입·출력)

  • Kim, Hyo-Yeun
    • Proceedings of the Korea Contents Association Conference / 2018.05a / pp.189-190 / 2018
  • Digital content requires concepts and structures that give us insight into the languages between computers and humans and into how human experience is manifested in the flow of characters, images, and voice. Communicology, Vilém Flusser's original study, allows us to reconsider and reconstruct the boundary of human awareness. This paper sets out to understand digital content, which consists of numerical codes, by reviewing communicology. Communicology helps to break up pre-existing categories and to think about new standards. With the help of information technology, content planning can be actualized by classifying and reconstructing content as the input/output of senses. The standards of classification are 'boundary' and 'direction,' communication elements that cannot be broken down any further. There is no need to communicate if there is no boundary. The operation of communication consists of 'direction.' Taking humankind as the standard, the boundary that takes in stimulation from outside can be seen as the senses. Direction can be expressed as input/output. Output assumes that technical pictures receive information. Coordinates for various pre-existing platforms and content, and for undiscovered platforms, can be set with a consistent standard. This allows us to escape from the standard of flat content activated by sight and rationality under the ideology of characters, to seek a three-dimensional standard that can be vitalized by various senses and irrationality, and to reconstruct the input/output of senses to show the possibility of planning a new platform.


Attitude Confidence and User Resistance for Purchasing Wearable Devices on Virtual Reality: Based on Virtual Reality Headgears (가상현실 웨어러블 기기의 구매 촉진을 위한 태도 자신감과 사용자 저항 태도: 가상현실 헤드기어를 중심으로)

  • Sohn, Bong-Jin;Park, Da-Sul;Choi, Jaewon
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.165-183 / 2016
  • Over the past decade, technological devices have diffused rapidly and the number and variety of devices have risen, driving the growth of virtual reality technology. The technology market has rapidly shifted from smartphones to wearable devices based on virtual reality. Virtual reality can make users feel they are in a real situation through sensing interaction, voice, motion capture, and so on. Facebook.com, Google, Samsung, LG, Sony, and others have looked into developing virtual reality platforms. Prices of virtual reality devices have also fallen to 30% relative to their launch period. Market infrastructure for virtual reality has thus developed rapidly to create a marketplace. However, most consumers perceive virtual reality as not easy to purchase or use, which has kept them from forming a positive attitude toward the devices and from purchasing them in the early market. Few previous studies of virtual reality have focused on why virtual reality devices have remained at an early stage of adoption and diffusion in the market. Most previous studies have considered why innovative products are hard to adopt from the viewpoints of the Typology of Innovation Resistance, MIR (Management of Innovation Resistant), and UTAUT and UTAUT2. However, product-based antecedents are also important for increasing users' intention to purchase and use products in the technology market. In this study, we focus on user acceptance and resistance in order to promote the purchase and use of virtual reality wearable devices, based on headgear products like Galaxy Gear. In particular, we added the variable attitude confidence as a dimension for user resistance. The research questions of this study are as follows. First, how do attitude confidence and innovativeness resistance affect user intention to use? Second, what factors related to content and brand contexts can affect user intention to use?
This research collected data from participants in South Korea, aged from their 20s to their 50s, who had experience using virtual reality headgear. To collect the data, the study ran a pilot test, and the face validity and content validity of the questionnaire were evaluated through face-to-face interviews with three specialists. In cleansing the data, we dropped some outliers and data of irrelevant papers. In total, 156 responses were used to test the suggested hypotheses. Demographics and the relationships among variables were analyzed using structural equation modeling with PLS. Among the respondents who have experience using social commerce sites, 86 (55.1%) were male and 70 (44.9%) were female. Most respondents were in their 20s (74.4%) or 30s (16.7%). 126 respondents (80.8%) had used virtual reality devices. The results of our model estimation are as follows. With the exception of Hypotheses 1 and 7, which deal with the relationships from brand awareness to attitude confidence and from quality of content to perceived enjoyment, all of our hypotheses were supported. In line with our hypotheses, perceived ease of use (H2) and user innovativeness (H3) positively influenced attitude confidence. This finding indicates that the greater the ease of use and innovativeness of the devices, the greater users' attitude confidence. Perceived price (H4), enjoyment (H5), and quantity of contents (H6) significantly affected user resistance: perceived price positively affected user innovativeness resistance, whereas perceived enjoyment and quantity of contents negatively affected it. In addition, aesthetic exterior (H6) was positively associated with perceived price (p<0.01), and projection quality (H8) increased perceived enjoyment (p<0.05).
Finally, attitude confidence (H10) increased user intention to use virtual reality devices, whereas user resistance (H11) negatively affected it. The findings show that attitude confidence and user innovativeness resistance influence customers' intention to use virtual reality devices in different ways. Attitude confidence has two distinct characteristics: perceived ease of use and user innovativeness. This study identified the different roles of the antecedents of perceived price (aesthetic exterior) and perceived enjoyment (quality of contents and projection quality). The findings indicate that brand awareness and quality of contents for virtual reality have not yet formed within the virtual reality market. Therefore, firms should develop brand awareness for their products in the virtual reality market to increase market share.