Voice and Image Recognition


The Relationship Between Voice and the Image Triggered by the Voice: American Speakers and American Listeners (목소리를 듣고 감지하는 인상에 대한 연구: 미국인화자와 미국인청자)

  • Moon, Seung-Jae
    • Phonetics and Speech Sciences, v.1 no.2, pp.111-118, 2009
  • The present study aims to investigate the relationship between voices and the physical images triggered by those voices. It is the final part of a four-part series, and the results reported here are limited to American speakers and American listeners. Combined with the results of previous studies (Moon, 2000; Moon, 2002; Tak, 2005), the results suggest that (1) there is a very strong, much higher than chance-level relationship between voices and the pictures chosen for those voices by the perception-experiment subjects; (2) the more physical characteristics are given, the better the chance of correctly matching voices with pictures; and (3) culture (in the present case, the language environment) seems to play a role in conjuring up mental images from voices.

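The "much higher than chance-level" relationship reported above is the kind of claim a simple binomial test makes concrete. The sketch below shows that computation; the number of trials, the number of candidate pictures per trial, and the success count are invented for illustration and are not data from the study.

```python
from scipy.stats import binomtest

# Hypothetical setup: each listener matches a voice to one of 4 candidate
# pictures, so chance-level accuracy is 1/4.
n_trials = 200   # placeholder: total voice-picture matching trials
n_correct = 92   # placeholder: trials where the matching picture was chosen
chance = 1 / 4

# One-sided test: is the observed matching rate above chance level?
result = binomtest(n_correct, n_trials, chance, alternative="greater")
print(f"observed rate = {n_correct / n_trials:.2f}, p = {result.pvalue:.2e}")
```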

Subword-based Lip Reading Using State-tied HMM (상태공유 HMM을 이용한 서브워드 단위 기반 립리딩)

  • Kim, Jin-Young;Shin, Do-Sung
    • Speech Sciences, v.8 no.3, pp.123-132, 2001
  • In recent years, research on HCI technology has been very active, with speech recognition as its typical method. Recognition performance, however, deteriorates as surrounding noise increases. To solve this problem, research on multimodal HCI is being actively pursued. This paper describes automated lipreading for bimodal speech recognition based on image and speech information. It employs an audio-visual DB containing 1,074 words from 70 voices, tri-visemes as the recognition unit, and state-tied HMMs as the recognition model. Recognition performance is evaluated on vocabularies of 22 to 1,000 words, with the 22-word recognizer achieving a word recognition rate of 60.5%.

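A full state-tied tri-viseme recognizer is beyond a short snippet, but the underlying pattern of HMM-based lipreading, one model per visual unit scored against a lip-feature sequence, can be sketched with the hmmlearn library. The viseme inventory, feature dimension, and training data below are illustrative placeholders, and the models are independent rather than state-tied as in the paper.

```python
import numpy as np
from hmmlearn import hmm

units = ["a", "i", "u"]  # placeholder viseme inventory
models = {}
rng = np.random.default_rng(0)

for u in units:
    # One 3-state Gaussian HMM per visual unit.
    m = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
    X = rng.normal(size=(120, 12))  # placeholder 12-dim lip-shape features
    m.fit(X, lengths=[40, 40, 40])  # three training sequences of 40 frames
    models[u] = m

# Recognition: pick the unit whose model gives the highest log-likelihood.
test_seq = rng.normal(size=(35, 12))
best = max(units, key=lambda u: models[u].score(test_seq))
print("recognized unit:", best)
```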

Multi-resolution DenseNet based acoustic models for reverberant speech recognition (잔향 환경 음성인식을 위한 다중 해상도 DenseNet 기반 음향 모델)

  • Park, Sunchan;Jeong, Yongwon;Kim, Hyung Soon
    • Phonetics and Speech Sciences, v.10 no.1, pp.33-38, 2018
  • Although deep neural network-based acoustic models have greatly improved the performance of automatic speech recognition (ASR), reverberation still degrades the performance of distant speech recognition in indoor environments. In this paper, we adopt DenseNet, which has shown great results in image classification tasks, to improve the performance of reverberant speech recognition. DenseNet enables a deep convolutional neural network (CNN) to be trained effectively by concatenating the feature maps of each convolutional layer. In addition, we extend the concept of the multi-resolution CNN to a multi-resolution DenseNet for robust speech recognition in reverberant environments. We evaluate reverberant speech recognition performance on the single-channel ASR task of the REverberant Voice Enhancement and Recognition Benchmark (REVERB) challenge 2014. According to the experimental results, the DenseNet-based acoustic models outperform conventional CNN-based ones, and the multi-resolution DenseNet provides an additional performance improvement.
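
The core DenseNet idea named in the abstract, concatenating the feature maps of all preceding layers instead of summing them, fits in a few lines of PyTorch. The block below is a generic dense block with arbitrary layer sizes, not the paper's multi-resolution acoustic model.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer receives the concatenation of all earlier feature maps."""

    def __init__(self, in_channels: int, growth_rate: int, n_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)  # concatenate, do not add
        return x

# A (time, frequency) spectrogram patch treated as a 1-channel image.
block = DenseBlock(in_channels=1, growth_rate=12, n_layers=4)
out = block(torch.randn(8, 1, 40, 11))  # batch of 8 feature patches
print(out.shape)                        # torch.Size([8, 49, 40, 11])
```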

Recognition of the Korean Alphabet Using Neural Oscillator Phase Model Synchronization

  • Kwon, Yong-Bum;Lee, Jun-Tak
    • Proceedings of the Korean Institute of Intelligent Systems Conference, 2003.09a, pp.315-317, 2003
  • Neural oscillators are applied in oscillatory systems (analysis of image information, voice recognition, etc.). If the established error back-propagation algorithm (EBPA) is applied to an oscillatory system, it is difficult to capture complicated input patterns: training requires more data, and convergence is slow. In this paper, we study the neural oscillator in synchronized states with appropriate phase relations between neurons, and recognize the Korean alphabet using neural oscillator phase-model synchronization.

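Phase-model synchronization of coupled neural oscillators is conventionally described by Kuramoto-style dynamics. The numpy sketch below shows two weakly detuned phase oscillators locking under sufficient coupling; it is a generic illustration of the mechanism, not the recognition network of the paper.

```python
import numpy as np

# Kuramoto-style dynamics: d(theta_i)/dt = omega_i + K * sum_j sin(theta_j - theta_i)
def simulate(omega, K, steps=2000, dt=0.01):
    theta = np.random.default_rng(1).uniform(0, 2 * np.pi, size=len(omega))
    for _ in range(steps):
        diff = theta[None, :] - theta[:, None]  # diff[i, j] = theta_j - theta_i
        theta += dt * (omega + K * np.sin(diff).sum(axis=1))
    return theta

omega = np.array([1.00, 1.05])  # slightly detuned natural frequencies
theta = simulate(omega, K=0.5)

# With sufficient coupling the oscillators phase-lock: the difference is small.
print("phase difference:", np.angle(np.exp(1j * (theta[0] - theta[1]))))
```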

Sign Language Image Recognition System Using Artificial Neural Network

  • Kim, Hyung-Hoon;Cho, Jeong-Ran
    • Journal of the Korea Society of Computer and Information, v.24 no.2, pp.193-200, 2019
  • Hearing-impaired people live in a voice-oriented culture, but because communicating with hearing people through sign language is difficult, many experience discomfort in daily and social life and suffer various disadvantages contrary to their wishes. In this paper, we therefore study a sign language translation system for communication between hearing people and hearing-impaired sign language users, and implement a prototype system. Previous sign language translation systems fall into two types: those using video images and those using shape-input devices. Existing systems, however, do not recognize the varied sign language expressions of different users and require special devices. In this paper, we use an artificial neural network, a machine learning method, to recognize the varied sign language expressions of sign language users. By using ordinary smartphones and common video equipment for sign language image recognition, we aim to improve the usability of the sign language translation system.
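
At its simplest, the recognition step described above is an image classifier trained on examples of each sign. The sketch below uses scikit-learn with random placeholder data; the input resolution, sign classes, and network size are assumptions for illustration, not the paper's design.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder data: 64x64 grayscale sign images flattened into vectors,
# labeled with three hypothetical sign classes.
rng = np.random.default_rng(0)
X_train = rng.random((300, 64 * 64))
y_train = rng.integers(0, 3, size=300)  # e.g., 0="hello", 1="thanks", 2="sorry"

clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=50)
clf.fit(X_train, y_train)

frame = rng.random((1, 64 * 64))  # one preprocessed video frame
print("predicted sign class:", clf.predict(frame)[0])
```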

Traffic Signal Recognition System Based on Color and Time for Visually Impaired

  • P. Kamakshi
    • International Journal of Computer Science & Network Security, v.23 no.4, pp.48-54, 2023
  • Visually impaired people find it very difficult to cross roads and must be vigilant with every step they take. To address this problem, convolutional neural networks (CNNs) are well suited to analyzing the data and automating the model without human intervention. In this work, a traffic signal recognition system for the visually impaired is designed using a CNN. To provide a safe walking environment, a voice message is generated according to the light state and the timer state at each instant. The developed model consists of two phases. In the first phase, the CNN model is trained to classify images captured from traffic signals; the Common Objects in Context (COCO) labelled dataset, which includes classes such as traffic lights, bicycles, and cars, is used with an object detection model to detect the traffic light. The CNN model then detects the color of the traffic light and the timer displayed on the traffic image. In the second phase, a text message is generated from the detected light color and timer value and sent to a text-to-speech conversion model to provide voice guidance for the blind person. The developed model recognizes both the traffic light color and the countdown timer displayed on the signal for safe crossing; the countdown timer, which is very useful, was not considered in existing models. The proposed model gives accurate results in different scenarios when compared with other models.
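
The second phase described above, turning a detected light color and timer value into spoken guidance, can be sketched with a simple HSV hue check and an off-the-shelf text-to-speech library. The hue thresholds and message wording are assumptions, and pyttsx3 stands in for whatever text-to-speech engine the paper used.

```python
import cv2
import numpy as np
import pyttsx3

def light_color(bgr_patch: np.ndarray) -> str:
    """Classify a cropped traffic-light patch as red or green by hue."""
    hsv = cv2.cvtColor(bgr_patch, cv2.COLOR_BGR2HSV)
    hue, sat = hsv[..., 0], hsv[..., 1]
    red = np.sum(((hue < 10) | (hue > 170)) & (sat > 100))
    green = np.sum((hue > 40) & (hue < 90) & (sat > 100))
    return "red" if red > green else "green"

patch = np.zeros((20, 20, 3), dtype=np.uint8)
patch[..., 2] = 255                    # a pure-red patch for demonstration
color, timer = light_color(patch), 12  # timer value would come from the CNN/OCR

engine = pyttsx3.init()
engine.say(f"The light is {color}. {timer} seconds remaining.")
engine.runAndWait()                    # speak the guidance message
```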

Hand Biometric Information Recognition System of Mobile Phone Image for Mobile Security (모바일 보안을 위한 모바일 폰 영상의 손 생체 정보 인식 시스템)

  • Hong, Kyungho;Jung, Eunhwa
    • Journal of Digital Convergence, v.12 no.4, pp.319-326, 2014
  • With the growth of mobile security, users who have experienced authentication failure after forgetting passwords, user names, or the answer to a knowledge-based question increasingly prefer biometric information such as hand geometry, fingerprints, and voice for personal identification and authentication. Biometric verification for mobile security therefore provides assurance to both the customer and the seller on the internet. Our study focuses on a hand biometric information recognition system for personal identification and authentication based on hand shape, palm features, and the lengths and widths of the fingers, taken from mobile phone photographs (e.g., iPhone 4 and Galaxy S2). The system consists of six processing steps: image acquisition, preprocessing, noise removal, standard hand feature extraction, individual feature pattern extraction, and hand biometric information recognition for personal identification and authentication from the input images. The validity of the proposed system is demonstrated by a successful recognition rate of 93.5% on 250 hand shape and palm images collected from 50 subjects.
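
The first steps of the six-step pipeline above, acquisition, preprocessing, noise removal, and basic feature extraction, can be sketched with OpenCV. The file name, threshold method, and the particular geometric features below are illustrative choices, not the paper's exact procedure.

```python
import cv2
import numpy as np

# Image acquisition: load a hand photograph (path is a placeholder).
img = cv2.imread("hand.jpg", cv2.IMREAD_GRAYSCALE)

# Preprocessing and noise removal: blur, then Otsu-threshold to a binary mask.
blur = cv2.GaussianBlur(img, (5, 5), 0)
_, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Feature extraction: take the largest contour as the hand outline.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
hand = max(contours, key=cv2.contourArea)

# Simple geometric features: bounding-box width/height and contour area.
x, y, w, h = cv2.boundingRect(hand)
features = np.array([w, h, cv2.contourArea(hand)], dtype=float)
print("hand feature vector:", features)
```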

An Intelligent Emotion Recognition Model Using Facial and Bodily Expressions

  • Jae Kyeong Kim;Won Kuk Park;Il Young Choi
    • Asia Pacific Journal of Information Systems, v.27 no.1, pp.38-53, 2017
  • As sensor technologies and image processing technologies make it easy to collect information on users' behavior, many researchers have examined automatic emotion recognition based on facial expressions, body expressions, and tone of voice, among others. Specifically, many studies of the multimodal case using facial and body expressions have relied on normal cameras, and thus on a limited amount of information, because normal cameras generally produce only two-dimensional images. In the present research, we propose an artificial neural network-based model that uses a high-definition webcam and a Kinect to recognize users' emotions from facial and bodily expressions while they watch a movie trailer. We validate the proposed model in a naturally occurring field environment rather than an artificially controlled laboratory environment. The results of this research will be helpful for the wide use of emotion recognition models in advertisements, exhibitions, and interactive shows.
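
A common way to combine facial and bodily cues in one network, as the model above does, is feature-level fusion: concatenate the per-modality feature vectors and classify the result. The sketch below uses random placeholder features and four hypothetical emotion classes; it is not the authors' architecture or data.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 400
face = rng.random((n, 30))  # placeholder facial-expression features (webcam)
body = rng.random((n, 20))  # placeholder body-posture features (Kinect joints)

X = np.hstack([face, body])     # feature-level fusion of the two modalities
y = rng.integers(0, 4, size=n)  # four hypothetical emotion classes

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=100).fit(X, y)
sample = np.hstack([rng.random((1, 30)), rng.random((1, 20))])
print("predicted emotion class:", clf.predict(sample)[0])
```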

A study on the implementation of a user identification system using bioinformatics (생물학적 특징을 이용한 사용자 인증시스템 구현)

  • 문용선;정택준
    • Journal of the Korea Institute of Information and Communication Engineering, v.6 no.2, pp.346-355, 2002
  • This study offers multimodal recognition using face, lips, and voice, instead of existing monomodal biometrics, to improve recognition accuracy. Each biometric feature vector is obtained as follows. For the face, features are computed by principal component analysis with wavelet multiresolution. For the lips, a filter first locates the lip edges; then, using a thinned image and the least squares method, the coefficients of a lip-contour equation are estimated. For the voice, features are obtained as mel-frequency cepstral coefficients (MFCC). A backpropagation neural network is trained and tested with the inputs described above. Based on the experimental results, we discuss the advantages and efficiency of the approach.
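
Two of the feature extractors named above are standard enough to sketch directly: MFCCs for the voice branch and PCA (eigenface-style) for the face branch. The snippet below uses librosa and scikit-learn; the file name, image size, and component counts are placeholders.

```python
import numpy as np
import librosa
from sklearn.decomposition import PCA

# Voice branch: mel-frequency cepstral coefficients, averaged over time.
y, sr = librosa.load("speaker.wav", sr=16000)  # placeholder file name
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
voice_vec = mfcc.mean(axis=1)                  # 13-dim voice feature

# Face branch: PCA on flattened face images (placeholder random images).
faces = np.random.default_rng(0).random((100, 32 * 32))
pca = PCA(n_components=16).fit(faces)
face_vec = pca.transform(faces[:1])[0]         # 16-dim face feature

# The fused input to a backpropagation network concatenates the branches.
x = np.concatenate([face_vec, voice_vec])
print("fused feature vector length:", len(x))  # 16 + 13 = 29
```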

Smart Drone Police System: Development of Autonomous Patrol and Real-time Activation System Based on Big Data and AI

  • Heo Jun
    • International Journal of Internet, Broadcasting and Communication, v.16 no.4, pp.168-173, 2024
  • This paper proposes a solution for innovating crime prevention and real-time response through the development of the Smart Drone Police System. The system integrates big data, artificial intelligence (AI), the Internet of Things (IoT), and autonomous drone driving technologies [2][5]. It stores and analyzes crime statistics from the Statistics Office and the Public Prosecutor's Office, as well as real-time data collected by drones, including location, video, and audio, in a cloud-based database [6][7]. By predicting high-risk areas and peak times for crimes, drones autonomously patrol these identified zones using a self-driving algorithm [5][8]. Equipped with video and voice recognition technologies, the drones detect dangerous situations in real-time and recognize threats using deep learning-based analysis, sending immediate alerts to the police control center [3][9]. When necessary, drones form an ad-hoc network to coordinate efforts in tracking suspects and blocking escape routes, providing crucial support for police dispatch and arrest operations [2][11]. To ensure sustained operation, solar and wireless charging technologies were introduced, enabling prolonged patrols that reduce operational costs while maintaining continuous surveillance and crime prevention [8][10]. Research confirms that the Smart Drone Police System is significantly more cost-effective than CCTV or patrol car-based systems, showing a 40% improvement in real-time response speed and a 25% increase in crime prevention effectiveness over traditional CCTV setups [1][2][14]. This system addresses police staffing shortages and contributes to building safer urban environments by enhancing response times and crime prevention capabilities [4].
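
The patrol-planning step described above, predicting high-risk zones and peak times from crime statistics, can be illustrated with a toy risk table; the zones, hours, and incident counts below are invented for the sketch and are not the paper's data.

```python
import pandas as pd

# Toy crime log: one row per incident, tagged with zone and hour of day.
log = pd.DataFrame({
    "zone": ["A", "A", "B", "B", "B", "C", "A", "B"],
    "hour": [22, 23, 1, 2, 23, 14, 22, 1],
})

# Risk score: incident count per (zone, hour) cell, highest first.
risk = log.groupby(["zone", "hour"]).size().sort_values(ascending=False)
print(risk.head(3))  # the top cells would be scheduled for drone patrols
```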