• Title/Summary/Keyword: Voice and Image Recognition

Multi-view learning review: understanding methods and their application (멀티 뷰 기법 리뷰: 이해와 응용)

  • Bae, Kang Il;Lee, Yung Seop;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.32 no.1 / pp.41-68 / 2019
  • Multi-view learning considers data from various viewpoints and attempts to integrate the different kinds of information they contain. Multi-view learning has recently been studied and has shown better performance than models learned from a single view. With the introduction of deep learning techniques into multi-view learning, it has produced good results in various fields such as image, text, voice, and video. In this study, we introduce how multi-view learning methods solve various problems in human behavior recognition, medicine, information retrieval, and facial expression recognition. In addition, we review the data integration principles of multi-view learning by classifying traditional multi-view learning methods into data integration, classifier integration, and representation integration. Finally, we examine how CNN, RNN, RBM, Autoencoder, and GAN, which are among the most commonly used deep learning methods, are applied in multi-view learning algorithms. We categorize CNN- and RNN-based methods as supervised learning, and RBM-, Autoencoder-, and GAN-based methods as unsupervised learning.
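
The difference between the data-integration and classifier-integration categories can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a hypothetical nearest-centroid classifier and made-up two-view data. `early_fusion_predict` concatenates the views before training a single classifier (data integration), while `late_fusion_predict` trains one classifier per view and averages their scores (classifier integration). Representation integration, which learns a shared latent space, is not shown.

```python
from math import dist
from statistics import mean


def centroids(samples, labels):
    """Nearest-centroid 'training': the mean feature vector of each class."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in by_class.items()}


def scores(model, x):
    """Class scores: negative Euclidean distance to each centroid (higher is better)."""
    return {y: -dist(x, c) for y, c in model.items()}


def early_fusion_predict(view_a, view_b, labels, query_a, query_b):
    """Data integration: concatenate both views, then train one classifier."""
    joint = [a + b for a, b in zip(view_a, view_b)]
    s = scores(centroids(joint, labels), query_a + query_b)
    return max(s, key=s.get)


def late_fusion_predict(view_a, view_b, labels, query_a, query_b):
    """Classifier integration: one classifier per view, scores averaged."""
    sa = scores(centroids(view_a, labels), query_a)
    sb = scores(centroids(view_b, labels), query_b)
    combined = {y: (sa[y] + sb[y]) / 2 for y in sa}
    return max(combined, key=combined.get)


# Hypothetical toy data: view A might hold image features, view B audio features.
view_a = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
view_b = [[0.1], [0.0], [1.0], [0.8]]
labels = ["walk", "walk", "run", "run"]
print(early_fusion_predict(view_a, view_b, labels, [0.1, 0.1], [0.2]))  # walk
print(late_fusion_predict(view_a, view_b, labels, [0.95, 1.0], [0.9]))  # run
```

Late fusion also tolerates a missing or unreliable view more gracefully, since each per-view classifier can be dropped or down-weighted independently.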

An Advanced User-friendly Wireless Smart System for Vehicle Safety Monitoring and Accident Prevention (차량 안전 모니터링 및 사고 예방을 위한 친사용자 환경의 첨단 무선 스마트 시스템)

  • Oh, Se-Bin;Chung, Yeon-Ho;Kim, Jong-Jin
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.9 / pp.1898-1905 / 2012
  • This paper presents an On-board Smart Device (OSD) for moving vehicles, based on a smooth integration of Android-based devices and a Micro-control Unit (MCU). The MCU acquires and transmits various vehicle-borne data. The OSD has three functions: Record, Report, and Alarm. Based on these RRA functions, the OSD is a safety- and convenience-oriented smart device that provides alert services such as accident reporting and rescue, as well as alarms on vehicle status. In addition, a voice-activated interface is developed for user convenience. Vehicle data can also be uploaded to a remote server for further access and manipulation. Therefore, unlike conventional black boxes, the developed OSD serves as a user-friendly smart device for vehicle safety: it stores images captured while driving together with collected vehicle data, and it reports accidents so that rescue operations can follow. The developed OSD can thus be considered an essential safety device equipped with comprehensive wireless data service, image transfer, and a voice-activated interface.

Research on Generative AI for Korean Multi-Modal Montage App (한국형 멀티모달 몽타주 앱을 위한 생성형 AI 연구)

  • Lim, Jeounghyun;Cha, Kyung-Ae;Koh, Jaepil;Hong, Won-Kee
    • Journal of Service Research and Studies / v.14 no.1 / pp.13-26 / 2024
  • Multi-modal generation is the process of generating results from a variety of information, such as text, images, and audio. With the rapid development of AI technology, a growing number of multi-modal systems synthesize different types of data to produce results. In this paper, we present an AI system that uses speech and text recognition to describe a person and generate a montage image. While existing montage generation technology is based on the appearance of Westerners, the montage generation system developed in this paper learns a model based on Korean facial features. It can therefore create more accurate and effective Korean montage images from multi-modal voice and text input specific to Korean. Because the developed app can produce a draft montage, it can dramatically reduce the manual labor of existing montage production personnel. For this purpose, we used persona-based virtual-person montage data provided by the AI-Hub of the National Information Society Agency. AI-Hub is an AI integration platform that aims to provide a one-stop service by building the artificial intelligence training data needed to develop AI technology and services. The image generation system was implemented using VQGAN, a deep learning model used to generate high-resolution images, and KoDALLE, a Korean-based image generation model. The trained AI model generates face montage images that closely resemble the descriptions given by voice and text. To verify the practicality of the developed app, 10 testers used it, and more than 70% responded that they were satisfied. The montage generator can be used in various fields, such as criminal investigation, to describe and visualize facial features.
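
VQGAN represents an image as a grid of discrete tokens by mapping each encoder output vector to its nearest entry in a learned codebook; text-to-image models of the DALL·E family then generate such token sequences, which is presumably how VQGAN and KoDALLE are combined here (the abstract does not say). The sketch below shows only that nearest-codebook lookup, using hypothetical function names and a toy 2-D codebook; the encoder, decoder, and training losses are omitted.

```python
def quantize(latents, codebook):
    """Replace each latent vector by the index of its nearest codebook entry
    (squared Euclidean distance), as in the vector-quantization step of VQGAN."""
    indices = []
    for z in latents:
        dists = [sum((zi - ei) ** 2 for zi, ei in zip(z, e)) for e in codebook]
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices


def dequantize(indices, codebook):
    """Map token indices back to codebook vectors (what the decoder receives)."""
    return [codebook[i] for i in indices]


# Toy codebook and encoder outputs (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
latents = [[0.9, 0.1], [0.1, 0.2], [0.2, 0.8]]
tokens = quantize(latents, codebook)
print(tokens)  # [1, 0, 2]
```

In a real VQGAN the codebook has hundreds or thousands of learned entries, and each token in the grid corresponds to a spatial patch of the image.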

Gesture Control Gaming for Motoric Post-Stroke Rehabilitation

  • Andi Bese Firdausiah Mansur
    • International Journal of Computer Science & Network Security / v.23 no.10 / pp.37-43 / 2023
  • Hospital conditions, scheduling, and patient restrictions have become obstacles to optimal therapy sessions. Crowded hospitals can lead to tight schedules and shorter therapy periods. This places post-stroke patients in a dilemma, since they need regular treatment to recover nervous-system function. In this work, we propose a simple, in-house serious game system that can be used for physical therapy. A Kinect camera captures a depth image stream from which the user's skeleton is tracked, and the user then controls the game with hand gestures. Voice recognition is included to make the game easier to play. Users must complete the given challenges to obtain a more significant outcome from the therapy system. Subjects use their upper limbs and hands to capture 3D objects appearing at different speeds and positions. As the challenge becomes harder, the speed increases and the positions become random. Each captured object raises the score, and the scores are then evaluated for correlation with therapy progress. Users are delighted with the system and eager to use it for daily exercise. The experimental studies compare score and difficulty, which represent the characteristics of the user and the game. Users tend to adapt quickly to the easy and medium levels, while the high level requires better focus and proper hand-eye synchronization to capture the 3D objects. Statistical analysis of the usability test at a significance level of α = 0.05 shows that the proposed game is accessible even without specialized training. It can be used not only for therapy but also for fitness, as a form of body exercise. The experimental results are very satisfying: most users enjoy the game and become familiar with it quickly. The evaluation study demonstrates user satisfaction and perception during testing. Future work on the proposed serious game might involve haptic devices to stimulate physical sensation.
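
The abstract does not give the game mechanics, so the following is only a sketch under stated assumptions: the skeleton tracker is assumed to report the hand joint as (x, y, z) coordinates in metres in camera space, each target is a sphere moving towards the player, and a target counts as captured when the hand joint lies inside its radius. The difficulty table, positions, radius, and all function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Target:
    x: float
    y: float
    z: float
    speed: float          # metres per second towards the player (hypothetical)
    radius: float = 0.1   # capture radius in metres (hypothetical)


# Hypothetical difficulty table: harder levels move faster and spawn at random positions.
LEVELS = {
    "easy": {"speed": 0.2, "random_position": False},
    "medium": {"speed": 0.4, "random_position": False},
    "hard": {"speed": 0.8, "random_position": True},
}


def spawn(level, rng):
    """Create a target 2 m from the camera, at a fixed or random position."""
    cfg = LEVELS[level]
    if cfg["random_position"]:
        x, y = rng.uniform(-0.5, 0.5), rng.uniform(0.8, 1.6)
    else:
        x, y = 0.0, 1.2
    return Target(x, y, 2.0, cfg["speed"])


def step(target, dt):
    """Advance the target towards the player by one frame of length dt seconds."""
    target.z -= target.speed * dt


def captured(hand, target):
    """True if the tracked hand joint (x, y, z) lies within the capture radius."""
    dx, dy, dz = hand[0] - target.x, hand[1] - target.y, hand[2] - target.z
    return dx * dx + dy * dy + dz * dz <= target.radius ** 2


def update_score(score, hand, target):
    """Each captured object raises the score by one (hypothetical scoring rule)."""
    return score + 1 if captured(hand, target) else score


rng = random.Random(0)
t = spawn("easy", rng)
for _ in range(20):       # 20 frames of 0.05 s = 1 s of play
    step(t, 0.05)
score = update_score(0, (0.0, 1.2, 1.8), t)
```

Logging per-level scores from such a loop is one way to obtain the score-versus-difficulty data the study compares.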

Digital Mirror System with Machine Learning and Microservices (머신 러닝과 Microservice 기반 디지털 미러 시스템)

  • Song, Myeong Ho;Kim, Soo Dong
    • KIPS Transactions on Software and Data Engineering / v.9 no.9 / pp.267-280 / 2020
  • A mirror is a reflective surface, typically glass coated with a metal amalgam, that reflects an image clearly. Mirrors are available everywhere, at any time, and have become an essential tool for observing our faces and appearance. With the advent of modern software technology, we are motivated to enhance the reflective capability of mirrors with the convenience and intelligence of real-time processing, microservices, and machine learning. In this paper, we present the development of a Digital Mirror System that provides real-time reflection like a mirror while adding convenience and intelligence features, including personal information retrieval, public information retrieval, apparent age detection, and emotion detection. Moreover, it provides a multimodal user interface supporting touch, voice, and gesture input. We present our design and discuss how it can be implemented with current technology to deliver real-time mirror reflection while providing useful information and machine learning intelligence.

A Study on Protection of Iris and fingerprint Data Based on Digital Watermarking in Mid-Frequency Band (중간 주파수 영역에서의 디지털 워터마킹 기법에 의한 홍채 및 지문 데이터 보호 연구)

  • Jeong, Dae-Sik;Park, Kang-Ryoung
    • Journal of Korea Multimedia Society / v.8 no.9 / pp.1227-1238 / 2005
  • Recently, with advances in network and Internet technologies, digital content such as images, voice, and video has increasingly been pirated and distributed illegally. To protect the copyright of digital content, digital watermarking, which inserts the provider's information into the content, has been widely used. In this paper, we propose a method of applying digital watermarking to biometric information such as fingerprints and irises in order to prevent problems caused by theft and misuse. To this end, we propose a method of inserting a watermark in the frequency domain and compare recognition performance before and after watermark insertion. We also test the robustness of the proposed method against blurring attacks, which are commonly applied to biometric data. Experimental results show that the proposed method can efficiently protect iris and fingerprint data.
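
The abstract says only that the watermark is inserted in a mid-frequency band of the frequency domain. The sketch below assumes an 8x8 orthonormal DCT and a widely used textbook scheme that hides one bit per block in the relative order of two mid-frequency coefficients; the transform, the coefficient positions `P1`/`P2`, the `margin` value, and all function names are assumptions, not the authors' method.

```python
import math

N = 8                    # assumed block size
P1, P2 = (2, 3), (3, 2)  # assumed mid-frequency coefficient pair


def _alpha(k):
    """Orthonormal DCT scaling factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(n, k):
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (rows u, columns v)."""
    return [[_alpha(u) * _alpha(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2 (orthonormal 2-D DCT-III)."""
    return [[sum(_alpha(u) * _alpha(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def embed_bit(block, bit, margin=10.0):
    """Hide one bit in the order of two mid-frequency coefficients:
    bit 1 -> C[P1] exceeds C[P2] by at least `margin`; bit 0 -> the reverse."""
    c = dct2(block)
    a, b = c[P1[0]][P1[1]], c[P2[0]][P2[1]]
    sign = 1 if bit else -1
    if sign * (a - b) < margin:      # order not strong enough yet: adjust both
        mid = (a + b) / 2
        c[P1[0]][P1[1]] = mid + sign * margin / 2
        c[P2[0]][P2[1]] = mid - sign * margin / 2
    return idct2(c)


def extract_bit(block):
    """Blind extraction: read the bit back from the coefficient order."""
    c = dct2(block)
    return 1 if c[P1[0]][P1[1]] > c[P2[0]][P2[1]] else 0


# Hypothetical 8x8 block standing in for a patch of an iris or fingerprint image.
block = [[(13 * x + 7 * y) % 256 for y in range(N)] for x in range(N)]
marked = embed_bit(block, 1)
```

Mid frequencies are a common compromise: low frequencies carry most of the visible image energy, while high frequencies are easily destroyed by blurring. A real system would also round and clip the marked pixels to 0-255, which perturbs the coefficients slightly; the `margin` provides headroom for that and for attacks, at the cost of more visible distortion.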

Design and Implementation of Vehicle Control Network Using WiFi Network System (WiFi 네트워크 시스템을 활용한 차량 관제용 네트워크의 설계 및 구현)

  • Yu, Hwan-Shin
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.3 / pp.632-637 / 2019
  • Research on autonomous vehicles has recently become very active, with the trend being to assist safe driving and improve driver convenience. Autonomous vehicles require a combination of artificial intelligence, image recognition, and Internet communication between objects. Because mobile telecommunication networks have processing limitations, such a network can instead be implemented and scaled easily using readily expandable Wi-Fi. We propose a wireless design method for constructing such a vehicle control network, including an access point (AP) arrangement and a software configuration method that minimize data transmission/reception loss for mobile terminals. With the proposed network design, the communication performance of a moving vehicle can be dramatically increased. We also verify, through experiments on the movement of various terminal devices, the packet structures for GPS, video, voice, and data communication that can be used in vehicles. This wireless design technology can be extended to various general-purpose wireless networks such as 2.4GHz, 5GHz, and 10GHz Wi-Fi. It can also link wireless intelligent road networks with autonomous driving.

Augmented Reality Logo System Based on Android platform (안드로이드 기반 로고를 이용한 증강현실 시스템)

  • Jung, Eun-Young;Jeong, Un-Kuk;Lim, Sun-Jin;Moon, Chang-Bae;Kim, Byeong-Man
    • The KIPS Transactions:PartB / v.18B no.4 / pp.181-192 / 2011
  • Owing to smartphones and the mobile Internet, a mobile phone is no longer just a voice communication tool. It has become a total entertainment device on which we can play games and use services through a variety of applications on the Web. As smartphones become more popular and their usage grows, the advertising industry's interest in mobile advertising is increasing, but such advertising is inevitably limited by screen size. In this paper, we propose an augmented reality logo system based on the Android platform to maximize the effect of logo advertisements. After developing the software and installing it on a real smartphone, we analyze its performance in various ways. The results show that the system can be applied in the real world, but it cannot yet provide real-time service because of low hardware performance.

The Audience Behavior-based Emotion Prediction Model for Personalized Service (고객 맞춤형 서비스를 위한 관객 행동 기반 감정예측모형)

  • Ryoo, Eun Chung;Ahn, Hyunchul;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.73-85 / 2013
  • In today's information society, knowledge services that use information to create value are becoming more important day by day. In addition, advances in IT make it easy to collect and use information, and many companies across a variety of industries actively use customer information for marketing. Since the start of the 21st century, companies have actively used culture and the arts to manage their corporate image and for marketing closely linked to their commercial interests. However, it is difficult for companies to attract or retain consumers' interest through technology alone, so cultural activities have become a common means of differentiation among firms. Many firms use customer experience as a new marketing strategy to respond effectively to a competitive market. Accordingly, the need for personalized services that provide new experiences based on personal profile information containing individual characteristics is emerging rapidly. Personalized services that use a customer's individual profile information, such as language, symbols, behavior, and emotions, are therefore very important today. Through them, we can judge the interaction between people and content and maximize customer experience and satisfaction. Various related works provide customer-centered services; in particular, emotion recognition research has recently emerged. Existing studies have performed emotion recognition mostly using bio-signals, and most focus on voice and face, which show large emotional changes. However, predicting people's emotions is difficult because of limitations in equipment and service environments. In this paper, we therefore develop an emotion prediction model based on a vision-based interface to overcome these limitations. Emotion recognition based on people's gestures and posture has been studied by several researchers.
This paper develops a model that recognizes people's emotional states from body gesture and posture using the difference image method, and identifies the optimal validation model for predicting four emotions. The proposed model aims to automatically determine and predict four human emotions (Sadness, Surprise, Joy, and Disgust). To build the model, an event booth was installed in the KOCCA lobby, and suitable stimulus videos were shown to visitors to collect their body gestures and postures as their emotions changed. We then extracted body movements using the difference image method and refined the data to build the proposed neural network model. The emotion prediction model used three time-frame sets (20, 30, and 40 frames), and the model with the best performance among them was adopted. Before building the three models, the entire set of 97 data samples was divided into training, test, and validation sets. The emotion prediction model was constructed as an artificial neural network trained with the back-propagation algorithm, with the learning rate set to 10% (0.1) and the momentum rate to 10% (0.1). The sigmoid function was used as the transfer function, and we designed a three-layer perceptron with one hidden layer and four output nodes. To find the best stopping point, training was stopped at 50,000, after the error on the test set had reached its minimum. Finally, we computed each model's accuracy and found the best model for predicting each emotion. The 20-frame model achieved prediction accuracies of 100% for sadness and 96% for joy, and the 30-frame model achieved 88% for surprise and 98% for disgust. The findings of our research are expected to provide an effective algorithm for personalized services in various industries such as advertising, exhibitions, and performances.
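
The network described above (one sigmoid hidden layer, four sigmoid output nodes, back-propagation with learning rate 0.1 and momentum 0.1) can be sketched in plain Python. Everything the abstract leaves open is an assumption here: the class and function names, the hidden-layer size, the weight initialization, the squared-error loss, and the toy one-hot inputs, which merely stand in for the difference-image motion features used in the paper.

```python
import math
import random

# Output order follows the abstract: Sadness, Surprise, Joy, Disgust.
EMOTIONS = ["sadness", "surprise", "joy", "disgust"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerPerceptron:
    """Input layer -> one sigmoid hidden layer -> four sigmoid output nodes,
    trained by per-sample back-propagation with momentum."""

    def __init__(self, n_in, n_hidden, n_out=4, lr=0.1, momentum=0.1, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w1 = init(n_hidden, n_in + 1)   # last column holds the bias weight
        self.w2 = init(n_out, n_hidden + 1)
        self.v1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]   # previous updates
        self.v2 = [[0.0] * (n_hidden + 1) for _ in range(n_out)]
        self.lr, self.momentum = lr, momentum

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in self.w1] + [1.0]
        y = [sigmoid(sum(w * hj for w, hj in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, target):
        """One back-propagation update; returns the squared error before the update."""
        xb, hb, y = self.forward(x)
        d_out = [(t - yk) * yk * (1 - yk) for t, yk in zip(target, y)]
        d_hid = [hb[j] * (1 - hb[j]) * sum(dk * self.w2[k][j] for k, dk in enumerate(d_out))
                 for j in range(len(hb) - 1)]  # the bias unit has no delta
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(hb):
                self.v2[k][j] = self.lr * dk * hj + self.momentum * self.v2[k][j]
                self.w2[k][j] += self.v2[k][j]
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(xb):
                self.v1[j][i] = self.lr * dj * xi + self.momentum * self.v1[j][i]
                self.w1[j][i] += self.v1[j][i]
        return 0.5 * sum((t - yk) ** 2 for t, yk in zip(target, y))

    def predict(self, x):
        y = self.forward(x)[2]
        return EMOTIONS[max(range(len(y)), key=y.__getitem__)]


def train(model, data, epochs):
    """data: list of (features, one-hot target); returns the summed error per epoch."""
    return [sum(model.train_step(x, t) for x, t in data) for _ in range(epochs)]


if __name__ == "__main__":
    # Hypothetical toy features, one sample per emotion.
    data = [([1.0 if i == k else 0.0 for i in range(4)],
             [1.0 if i == k else 0.0 for i in range(4)]) for k in range(4)]
    net = ThreeLayerPerceptron(n_in=4, n_hidden=6)
    history = train(net, data, epochs=1000)
    print(history[0], history[-1])
    print([net.predict(x) for x, _ in data])
```

The momentum term adds a fraction of the previous weight update to the current one; with a momentum rate of 0.1, as reported, its smoothing effect is mild.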

Speech Recognition Using Linear Discriminant Analysis and Common Vector Extraction (선형 판별분석과 공통벡터 추출방법을 이용한 음성인식)

  • 남명우;노승용
    • The Journal of the Acoustical Society of Korea / v.20 no.4 / pp.35-41 / 2001
  • This paper describes linear discriminant analysis and common vector extraction for speech recognition. A voice signal contains the psychological and physiological properties of the speaker as well as dialect differences, acoustic environment effects, and phase differences. For these reasons, the same word spoken by different speakers can sound very different, which makes it very difficult to extract the common properties of a speech class (word or phoneme). Linear algebra methods such as the KLT (Karhunen-Loeve Transformation) are generally used to extract common properties from speech signals, but this paper uses the common vector extraction method suggested by M. Bilginer et al. That method extracts an optimized common vector from the training speech signals and achieves 100% recognition accuracy on the training data used for common vector extraction. Despite these characteristics, the method has drawbacks: a large number of speech signals cannot be used for training, and the discriminant information among common vectors is not defined. This paper proposes an improved method that reduces the error rate by maximizing the discriminant information among common vectors, and adds a new method for normalizing the size of the common vectors. The results show improved algorithm performance, with recognition accuracy 2% higher than the conventional method.
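
The common vector idea, in the form it is usually presented for the case where each class has fewer training vectors than feature dimensions, can be sketched as follows. The differences between a class's training vectors span a "difference subspace"; projecting any training vector onto the orthogonal complement of that subspace yields the same vector, the class's common vector. A test vector is projected the same way for each class and assigned to the class whose common vector is nearest. This illustrates only the baseline method, with hypothetical function names and toy data; the paper's LDA-based enhancement and common-vector normalization are not shown.

```python
def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors, eps=1e-10):
    """Gram-Schmidt orthonormalization; drops (near-)linearly dependent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def remove_difference_component(x, basis):
    """Project x onto the orthogonal complement of span(basis)."""
    for q in basis:
        c = dot(x, q)
        x = [xi - c * qi for xi, qi in zip(x, q)]
    return x


def train_common_vectors(classes):
    """classes: {label: [feature vectors]}, fewer vectors than dimensions per class.
    Returns {label: (difference-subspace basis, common vector)}."""
    model = {}
    for label, vecs in classes.items():
        diffs = [sub(v, vecs[0]) for v in vecs[1:]]
        basis = orthonormal_basis(diffs)
        model[label] = (basis, remove_difference_component(vecs[0], basis))
    return model


def classify(model, x):
    """Pick the class whose common vector is closest to x's projected remainder."""
    def distance(label):
        basis, common = model[label]
        r = sub(remove_difference_component(x, basis), common)
        return dot(r, r)
    return min(model, key=distance)


# Toy 4-dimensional "speech feature" vectors for two word classes (hypothetical).
classes = {"A": [[1, 0, 0, 0], [1, 1, 0, 0]], "B": [[0, 0, 1, 0], [0, 0, 1, 1]]}
model = train_common_vectors(classes)
print(classify(model, [0.9, 5.0, 0.1, 0.0]))  # A
```

Because every training vector of a class projects exactly onto its common vector, training samples are always recognized correctly, which matches the 100% training accuracy mentioned above. The requirement of fewer vectors than dimensions is also why the method cannot simply absorb a large number of training signals.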
