• Title/Summary/Keyword: Visual Recognition


Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems

  • Sanghun Jeon;Jieun Lee;Dohyeon Yeo;Yong-Ju Lee;SeungJun Kim
    • ETRI Journal
    • /
    • v.46 no.1
    • /
    • pp.22-34
    • /
    • 2024
  • Exposure to varied noisy environments impairs the recognition performance of artificial-intelligence-based speech recognition technologies. Services with degraded performance can be deployed as limited systems that assure good performance only in certain environments, but this impairs the overall quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model that is robust to various noise settings by mimicking the elements of human dialogue recognition. For audio recognition, the model converts word embeddings and log-Mel spectrograms into feature vectors. A dense spatial-temporal convolutional neural network model extracts features from log-Mel spectrograms transformed for visual-based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess the signal-to-noise ratio in nine synthesized noise environments, where the proposed model exhibits lower average error rates. The error rate of the AVSR model using the three-feature multi-fusion method is 1.711%, compared with the general rate of 3.939%. Owing to its enhanced stability and recognition rate, this model is applicable in noise-affected environments.
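The abstract above does not give the front-end details; as a rough illustration of the log-Mel spectrogram features it mentions, here is a minimal numpy sketch (the frame length, hop size, and filter count below are assumed values for illustration, not parameters from the paper):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Frame the signal and apply a Hann window
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frames.append(signal[start:start + n_fft] * np.hanning(n_fft))
    power = np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2

    # Build a triangular mel filterbank spanning 0 .. sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i, k] = (right - k) / max(right - center, 1)

    # Apply the filterbank and take the log (small floor avoids log(0))
    return np.log(power @ fb.T + 1e-10)  # shape: (num_frames, n_mels)

# Example: one second of a 440 Hz tone at 16 kHz
t = np.arange(16000) / 16000.0
feats = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
```

Each row of `feats` is one frame's 40-dimensional log-Mel vector, the kind of feature vector the abstract describes feeding into the audio branch.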

Artificial Vision System using Human Visual Information Processing (시각정보처리과정을 이용한 인공시각시스템)

  • Seo, Chang-Jin
    • Journal of Digital Convergence
    • /
    • v.12 no.11
    • /
    • pp.349-355
    • /
    • 2014
  • In this paper, we propose an artificial vision system based on human visual information processing and wavelets. Such a system can serve visually impaired people as well as machine recognition systems. We model the compression of information from the human retina to the ganglion cells, and then reconstruct the primary visual information by modeling the recovery process from the ganglion cells to the primary visual cortex. The primary visual information is constructed by wavelet transformation using high-frequency and low-frequency responses. In the experiments, we used the AT&T face database, and the proposed method improved the accuracy of face recognition considerably, as verified through the experiments.
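The abstract does not specify which wavelet is used; as a minimal sketch of splitting an image into low- and high-frequency responses, here is a one-level 2D Haar decomposition in numpy (the Haar basis and the stripe test image are assumptions for illustration):

```python
import numpy as np

def haar_2d(img):
    """One level of a 2D Haar wavelet decomposition.

    Returns the low-frequency approximation plus three
    high-frequency detail bands computed from 2x2 blocks."""
    a = img[0::2, 0::2].astype(float)  # top-left of each 2x2 block
    b = img[0::2, 1::2].astype(float)  # top-right
    c = img[1::2, 0::2].astype(float)  # bottom-left
    d = img[1::2, 1::2].astype(float)  # bottom-right
    ll = (a + b + c + d) / 4.0  # low-frequency approximation
    dv = (a - b + c - d) / 4.0  # left-right (column) differences
    dh = (a + b - c - d) / 4.0  # top-bottom (row) differences
    dd = (a - b - c + d) / 4.0  # diagonal differences
    return ll, dv, dh, dd

# Vertical stripes: energy should appear in the column-difference band
img = np.tile([[10.0, 0.0], [10.0, 0.0]], (2, 2))
ll, dv, dh, dd = haar_2d(img)
```

The low band `ll` plays the role of the compressed retinal signal, and the detail bands carry the high-frequency responses used in the reconstruction step.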

A Study on Image Recognition based on the Characteristics of Retinal Cells (망막 세포 특성에 의한 영상인식에 관한 연구)

  • Cho, Jae-Hyun;Kim, Do-Hyeon;Kim, Kwang-Baek
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.11
    • /
    • pp.2143-2149
    • /
    • 2007
  • The visual cortex stimulator, one type of artificial retina prosthesis for the blind, stimulates brain cells directly without processing information along the pathway from the retina to the visual cortex. In this paper, we propose an image construction and recognition model similar to human visual processing, which recognizes feature data with orientation information, that is, the characteristics of the visual cortex. A backpropagation algorithm based on delta-bar-delta is used for recognition after image features are extracted by a Kirsch edge detector. Various numerical patterns are used to analyze the performance of the proposed method. In the experiments, the proposed recognition model, which extracts image characteristics using the orientation information passed from the retinal cells to the visual cortex, makes little difference in recognition rate but is insensitive to a variety of learning rates, similar to the human visual system.
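The Kirsch detector mentioned above convolves the image with eight rotated compass masks and keeps the maximum response, which yields both an edge magnitude and an orientation index. A minimal sketch (the test image and lack of normalization are illustrative choices, not from the paper):

```python
import numpy as np

def kirsch(img):
    """Edge magnitude and orientation via the eight Kirsch compass masks."""
    base = np.array([[5, 5, 5], [-3, 0, -3], [-3, -3, -3]], dtype=float)
    # Positions around the 3x3 border, clockwise from the top-left corner
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = [base[r, c] for r, c in ring]
    # Eight masks: rotate the compass template in 45-degree steps
    masks = []
    for k in range(8):
        m = np.zeros((3, 3))
        for i, (r, c) in enumerate(ring):
            m[r, c] = vals[(i - k) % 8]
        masks.append(m)
    h, w = img.shape
    mag = np.zeros((h - 2, w - 2))
    ori = np.zeros((h - 2, w - 2), dtype=int)
    for y in range(h - 2):
        for x in range(w - 2):
            patch = img[y:y + 3, x:x + 3]
            resp = [float((m * patch).sum()) for m in masks]
            mag[y, x] = max(resp)          # strongest compass response
            ori[y, x] = int(np.argmax(resp))  # which direction won
    return mag, ori

# A vertical step edge: flat regions give zero, the edge gives a peak
img = np.zeros((5, 6))
img[:, 3:] = 1.0
mag, ori = kirsch(img)
```

The orientation index per pixel is exactly the kind of orientation feature the abstract describes passing to the backpropagation network.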

Effects of Wearing between Respirators and Glasses Simultaneously on Physical and Visual Discomforts and Quantitative Fit Factors (안면부 여과식 방진마스크와 안경 동시 착용 시 불편감과 밀착계수 비교)

  • Eoh, Won Souk;Choi, Youngbo;Shin, Chang Sub
    • Journal of the Korean Society of Safety
    • /
    • v.33 no.2
    • /
    • pp.52-60
    • /
    • 2018
  • This study compares differences in the fit factor by the preferred order of wearing when participants wore particulate filtering facepiece respirators (PFFR) and glasses simultaneously, together with a survey of physical and visual complaints. The participants' recognition of respirator fit was also investigated, along with the before- and after-education effect on the fit factor. The questionnaire covered general characteristics, physical and visual complaints, and recognition of fit; complaints were measured after the quantitative fit test (QNFT) with multiple-choice items. When participants wore PFFR and glasses together, the physical complaints were nose pressure, slipping, nose and ear pressure, ear pressure, and loose rims, with nose pressure the most frequent; the visual complaints were fogging (demisting), blurry vision, dizziness, restricted visual field, and dirty lenses, with fogging the most frequent. Significant differences were found for physical complaints such as nose pressure (10.3%), slipping (23.0%), nose and ear pressure (14.3%), and loose rims (16.2%), and for visual complaints such as restricted visual field (13.8%) and dirty lenses (32.4%). Recognition levels were high for respirator fitness, leak sites, initial points and objects, and faulty factors. The quantitative fit factor was measured by a device, and the before- and after-education results were compared. To measure fit factors, we selected six actions (normal breathing, deep breathing, bending over, turning the head side to side, moving the head up and down, and normal breathing) from the eight actions of the OSHA QNFT (quantitative fit testing) protocol. The fit factor increased after education on the proper wearing of respirators (p = 0.000), and after education the change in the fit factor between normal breathing and the end of the six actions was smaller. Descriptive statistics, paired t-tests, and Wilcoxon analysis were performed on the questionnaire and fit-test results at a significance level of 0.05. Therefore, quantitative research such as training programs and glasses-fitting factors for wearing PFFR and glasses simultaneously should be investigated further.
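In the OSHA quantitative fit testing protocol referenced above, the overall fit factor across the test exercises is the harmonic mean of the per-exercise fit factors (each of which is the ratio of ambient to in-mask particle concentration). A small sketch; the per-exercise values below are hypothetical, not data from this study:

```python
def overall_fit_factor(exercise_ffs):
    """Overall fit factor across test exercises: the harmonic mean of
    the per-exercise fit factors, per the OSHA QNFT protocol."""
    n = len(exercise_ffs)
    return n / sum(1.0 / ff for ff in exercise_ffs)

# Hypothetical per-exercise fit factors for the six actions used here
ffs = [200.0, 150.0, 100.0, 120.0, 180.0, 160.0]
off = overall_fit_factor(ffs)
```

The harmonic mean is dominated by the worst exercise, which is why a single poorly sealing action (for example, after putting glasses on over the respirator straps) can pull the overall fit factor down sharply.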

A Study on Image Recognition by Orientation Information (방향 정보 처리에 의한 영상 인식에 관한 연구)

  • Cho, Jae-hyun;Kim, Jin-hwan;Lee, Jong-hee
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2009.10a
    • /
    • pp.308-309
    • /
    • 2009
  • Human visual information processing exhibits many characteristics as image information is transmitted from the retina to the visual cortex. Among these, we analyze sensitivity to orientation in an image and compare recognition rates by the response weights of the vertical, horizontal, and diagonal orientations. Statistical analysis shows that a particular simple cell responds best to a bar with a vertical orientation. We then apply these characteristics to a model of the human visual system.
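The simple-cell behavior described above, strongest response to a bar matching the cell's preferred orientation, can be sketched with oriented templates (the 3x3 masks and the response_weight scheme below are assumptions for illustration; the paper's actual weights are not given in the abstract):

```python
import numpy as np

# Simple oriented 3x3 templates standing in for orientation-tuned cells
masks = {
    "vertical": np.array([[-1, 2, -1],
                          [-1, 2, -1],
                          [-1, 2, -1]], dtype=float),
    "horizontal": np.array([[-1, -1, -1],
                            [2, 2, 2],
                            [-1, -1, -1]], dtype=float),
    "diagonal": np.array([[2, -1, -1],
                          [-1, 2, -1],
                          [-1, -1, 2]], dtype=float),
}

# Stimulus: a vertical bar through the center column
bar = np.zeros((3, 3))
bar[:, 1] = 1.0

# Each "cell" responds with the correlation of its mask and the stimulus
responses = {name: float((m * bar).sum()) for name, m in masks.items()}
best = max(responses, key=responses.get)
```

As expected, the vertically tuned template wins on a vertical bar, mirroring the statistical result reported in the abstract.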


Robust Video-Based Barcode Recognition via Online Sequential Filtering

  • Kim, Minyoung
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.14 no.1
    • /
    • pp.8-16
    • /
    • 2014
  • We consider the visual barcode recognition problem in a noisy video data setup. Unlike most existing single-frame recognizers that require considerable user effort to acquire clean, motionless, and blur-free barcode signals, we eliminate such extra human effort by proposing a robust video-based barcode recognition algorithm. We deal with a sequence of noisy, blurred barcode image frames by posing it as an online filtering problem. In the proposed dynamic recognition model, at each frame we infer the blur level of the frame as well as the digit class label. In contrast to a frame-by-frame approach with a heuristic majority-voting scheme, the class labels and frame-wise noise levels are propagated along the frame sequence in our model, and hence we exploit all cues from noisy frames that are potentially useful for predicting the barcode label in a probabilistically reasonable sense. We also suggest a visual barcode tracking approach that efficiently localizes barcode areas in video frames. The effectiveness of the proposed approaches is demonstrated empirically on both synthetic and real data setups.
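The core idea of propagating label evidence along the sequence, rather than hard-voting per frame, can be sketched as a recursive Bayes update. This is a simplification (the paper also infers per-frame blur; here blurry frames are simply assumed to emit near-uniform likelihoods):

```python
def filter_labels(frame_likelihoods, n_classes):
    """Recursive Bayes update of the digit-class posterior over frames.

    The true label is constant across the sequence, so the posterior is
    the running normalized product of per-frame likelihoods: a blurred
    frame contributes weak, near-uniform evidence instead of an
    equal-weight hard vote."""
    post = [1.0 / n_classes] * n_classes  # uniform prior
    for lik in frame_likelihoods:
        post = [p * l for p, l in zip(post, lik)]  # multiply in evidence
        z = sum(post)
        post = [p / z for p in post]               # renormalize
    return post

# Three frames over 4 classes: two weakly favor class 3, one is so
# blurred its likelihood is uniform (and thus changes nothing)
frames = [
    [0.1, 0.1, 0.1, 0.7],
    [0.25, 0.25, 0.25, 0.25],
    [0.2, 0.2, 0.1, 0.5],
]
post = filter_labels(frames, n_classes=4)
```

A majority vote over per-frame argmaxes would weight the blurred frame as heavily as the sharp ones; the filtered posterior lets the informative frames dominate.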

Sensor Fusion System for Improving the Recognition Performance of 3D Object (3차원 물체의 인식 성능 향상을 위한 감각 융합 시스템)

  • Kim, Ji-Kyoung;Oh, Yeong-Jae;Chong, Kab-Sung;Wee, Jae-Woo;Lee, Chong-Ho
    • Proceedings of the KIEE Conference
    • /
    • 2004.11c
    • /
    • pp.107-109
    • /
    • 2004
  • In this paper, the authors propose a sensor fusion system that can recognize multiple 3D objects from 2D projection images and tactile information. The proposed system focuses on improving the recognition performance for 3D objects. Unlike conventional object recognition systems that use an image sensor alone, the proposed method uses tactual sensors in addition to the visual sensor. A neural network is used to fuse this information. Tactual signals are obtained from the reaction forces measured by the pressure sensors at the fingertips when unknown objects are grasped by a four-fingered robot hand. The experiment evaluates the recognition rate and the number of learning iterations for various objects. The merits of the proposed system are not only its high learning performance but also its reliability, as tactual information allows various objects to be recognized even when the visual information is defective. The experimental results show that the proposed system can improve the recognition rate and reduce learning time. These results verify the effectiveness of the proposed sensor fusion system as a recognition scheme for 3D objects.


Neural Network Approach to Sensor Fusion System for Improving the Recognition Performance of 3D Objects (3차원 물체의 인식 성능 향상을 위한 감각 융합 신경망 시스템)

  • Dong Sung Soo;Lee Chong Ho;Kim Ji Kyoung
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.54 no.3
    • /
    • pp.156-165
    • /
    • 2005
  • Human beings recognize the physical world by integrating a great variety of sensory inputs, the information acquired by their own actions, and their knowledge of the world, using a hierarchically parallel-distributed mechanism. In this paper, the authors propose a sensor fusion system that can recognize multiple 3D objects from 2D projection images and tactile information. The proposed system focuses on improving the recognition performance for 3D objects. Unlike conventional object recognition systems that use an image sensor alone, the proposed method uses tactual sensors in addition to the visual sensor. A neural network is used to fuse the two sensory signals. Tactual signals are obtained from the reaction forces of the pressure sensors at the fingertips when unknown objects are grasped by a four-fingered robot hand. The experiment evaluates the recognition rate and the number of learning iterations for various objects. The merits of the proposed system are not only its high learning performance but also its reliability, as tactual information allows various objects to be recognized even when the visual sensory signals have defects. The experimental results show that the proposed system can improve the recognition rate and reduce learning time. These results verify the effectiveness of the proposed sensor fusion system as a recognition scheme for 3D objects.
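The feature-level fusion described in these two entries, concatenating visual and tactile feature vectors before a neural network classifier, can be sketched as follows. The layer sizes, random weights, and 16-dim visual / 4-fingertip tactile dimensions are all hypothetical choices for illustration, not the papers' architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_classify(visual, tactile, w1, w2):
    """Feature-level fusion: concatenate the two sensor vectors and pass
    them through a small two-layer network ending in a softmax."""
    x = np.concatenate([visual, tactile])  # fused input vector
    h = np.tanh(w1 @ x)                    # shared hidden representation
    logits = w2 @ h
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return e / e.sum()                     # class probabilities

# Hypothetical sizes: 16-dim visual code, 4 fingertip pressure readings,
# 3 object classes; weights here are random stand-ins for trained ones
w1 = rng.normal(size=(8, 20)) * 0.1
w2 = rng.normal(size=(3, 8)) * 0.1
p = fuse_and_classify(rng.normal(size=16), rng.normal(size=4), w1, w2)
```

Because the tactile entries survive even when the visual part of the input is zeroed or corrupted, the fused representation degrades gracefully, which is the robustness property both abstracts emphasize.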

A Salient Based Bag of Visual Word Model (SBBoVW): Improvements toward Difficult Object Recognition and Object Location in Image Retrieval

  • Mansourian, Leila;Abdullah, Muhamad Taufik;Abdullah, Lilli Nurliyana;Azman, Azreen;Mustaffa, Mas Rina
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.2
    • /
    • pp.769-786
    • /
    • 2016
  • Object recognition and object location have always drawn much interest, and various computational models have recently been designed. One of the big issues in this domain is the lack of an appropriate model for extracting the important parts of a picture and estimating the object location in the same environments, which has caused low accuracy. To solve this problem, a new Salient Based Bag of Visual Word (SBBoVW) model for object recognition and object location estimation is presented. The contributions of the present study are two-fold. The first is a new approach, the Salient Based Bag of Visual Word (SBBoVW) model, to recognize difficult objects for which previous methods achieved low accuracy. This method integrates SIFT features of the original and salient parts of pictures and fuses them together to generate better codebooks using the bag-of-visual-words method. The second contribution is a new algorithm for automatically finding the object location based on the saliency map. Performance evaluation on several data sets proves that the new approach outperforms other state-of-the-art methods.
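The bag-of-visual-words step underlying this model quantizes local descriptors against a learned codebook and represents the image as a histogram of visual words. A minimal sketch (the 2-D descriptors and tiny codebook are toy stand-ins for SIFT vectors and a k-means codebook):

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Quantize local descriptors (e.g. SIFT vectors) against a codebook
    and return the normalized visual-word histogram."""
    # Squared Euclidean distance from every descriptor to every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)  # nearest-codeword assignment
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()   # normalize to a distribution

# Toy codebook of 3 "visual words" and 4 local descriptors
codebook = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
desc = np.array([[0.1, 0.2], [9.5, 10.1], [9.9, 9.8], [0.2, 9.7]])
h = bovw_histogram(desc, codebook)
```

In the SBBoVW model, histograms built from the full image and from the salient region would be fused; here only the basic quantization step is shown.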

A Study on Lip Detection based on Eye Localization for Visual Speech Recognition in Mobile Environment (모바일 환경에서의 시각 음성인식을 위한 눈 정위 기반 입술 탐지에 대한 연구)

  • Gyu, Song-Min;Pham, Thanh Trung;Kim, Jin-Young;Taek, Hwang-Sung
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.19 no.4
    • /
    • pp.478-484
    • /
    • 2009
  • Automatic speech recognition (ASR) is an attractive technique in today's trend toward a convenient life. Although many approaches have been proposed for ASR, performance is still poor in noisy environments. State-of-the-art speech recognition therefore uses not only audio information but also visual information. In this paper, we present a novel lip detection method for visual speech recognition in a mobile environment. To apply visual information to speech recognition, exact lip regions must be extracted. Because eye detection is easier than lip detection, we first detect the positions of the left and right eyes and then roughly locate the lip region. We then apply the K-means clustering technique to divide that region into groups, and the two lip corners and the lip center are detected by choosing the biggest of the clustered groups. Finally, we show the effectiveness of the proposed method through experiments based on the Samsung AVSR database.
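The clustering step described above, splitting the rough mouth region into pixel groups and keeping the largest, rests on plain Lloyd's K-means. A self-contained sketch on toy 2-D points (the point set and k=2 are illustrative; in the paper the inputs would be pixels of the rough lip region):

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on 2-D points.

    Returns the final centers and the point groups; the largest group
    would play the role of the lip cluster in the method above."""
    centers = list(points[:k])  # naive init: first k points
    for _ in range(iters):
        # Assignment step: each point joins its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            groups[d.index(min(d))].append(p)
        # Update step: move each center to its group's mean
        centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

# Two obvious clusters: one of 3 points, one of 4
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, groups = kmeans(pts, 2)
largest = max(groups, key=len)
```

Picking `largest` mirrors the paper's heuristic of choosing the biggest clustered group as the lip region before locating the corners and center.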