• Title/Summary/Keyword: Multi-Modal Recognition

Implementation of Moving Object Recognition based on Deep Learning (딥러닝을 통한 움직이는 객체 검출 알고리즘 구현)

  • Lee, YuKyong;Lee, Yong-Hwan
    • Journal of the Semiconductor & Display Technology
    • /
    • v.17 no.2
    • /
    • pp.67-70
    • /
    • 2018
  • Object detection and tracking is an exciting and interesting research area in the field of computer vision, and its technologies have been widely used in various application systems such as surveillance, military, and augmented reality. This paper proposes and implements a novel and more robust object recognition and tracking system that localizes and tracks multiple objects in input images, estimating the target state using the likelihoods obtained from multiple CNNs. Experimental results show that the proposed algorithm effectively handles multi-modal target appearances and other exceptions.
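The abstract's core idea, combining per-model likelihoods to estimate a target state, can be sketched as follows. The toy likelihood functions and the discrete candidate grid are illustrative stand-ins, not the paper's actual CNNs or state space:

```python
import math

def fuse_likelihoods(candidate_states, likelihood_fns):
    """Pick the candidate state maximising the summed log-likelihood
    across all appearance models (log-product for numerical stability)."""
    best_state, best_score = None, float("-inf")
    for state in candidate_states:
        score = sum(math.log(max(fn(state), 1e-12)) for fn in likelihood_fns)
        if score > best_score:
            best_state, best_score = state, score
    return best_state

# Toy stand-ins for two CNN likelihood heads preferring states near 3 and 4.
models = [lambda s: 1.0 / (1.0 + (s - 3) ** 2),
          lambda s: 1.0 / (1.0 + (s - 4) ** 2)]
print(fuse_likelihoods(range(10), models))  # -> 3
```

Working in log-space avoids underflow when many models each report small likelihoods.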

Large-scale Language-image Model-based Bag-of-Objects Extraction for Visual Place Recognition (영상 기반 위치 인식을 위한 대규모 언어-이미지 모델 기반의 Bag-of-Objects 표현)

  • Seung Won Jung;Byungjae Park
    • Journal of Sensor Science and Technology
    • /
    • v.33 no.2
    • /
    • pp.78-85
    • /
    • 2024
  • We proposed a method for visual place recognition that represents images using objects as visual words. Visual words represent the various objects present in urban environments. To detect various objects within the images, we implemented and used a zero-shot detector based on a large-scale image language model. This zero-shot detector enables the detection of various objects in urban environments without additional training. In the process of creating histograms using the proposed method, frequency-based weighting was applied to consider the importance of each object. Through experiments with open datasets, the potential of the proposed method was demonstrated by comparing it with another method, even in situations involving environmental or viewpoint changes.
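The histogram-building step with frequency-based weighting can be sketched as below; an IDF-style weight is assumed as the frequency weighting, and the detected object labels stand in for the output of the paper's zero-shot language-image detector:

```python
import math
from collections import Counter

def bag_of_objects(detections, vocab, idf):
    """Build a weighted, L1-normalised histogram over the object vocabulary."""
    counts = Counter(detections)
    hist = [counts[w] * idf.get(w, 0.0) for w in vocab]
    total = sum(hist) or 1.0
    return [h / total for h in hist]

vocab = ["car", "tree", "sign", "bench"]
# Inverse-document-frequency weights from a toy database of 4 images:
# common objects (e.g. "car") count for less than rare ones (e.g. "bench").
db = [["car", "tree"], ["car", "sign"], ["car"], ["tree", "bench"]]
idf = {w: math.log(len(db) / sum(w in img for img in db)) for w in vocab}

h = bag_of_objects(["car", "car", "bench"], vocab, idf)
print([round(x, 3) for x in h])  # -> [0.293, 0.0, 0.0, 0.707]
```

The rare "bench" detection dominates the descriptor even though "car" was seen twice, which is the intended effect of importance weighting.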

A study on the implementation of identification system using facial multi-modal (얼굴의 다중특징을 이용한 인증 시스템 구현)

  • 정택준;문용선
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.6 no.5
    • /
    • pp.777-782
    • /
    • 2002
  • This study offers multi-modal recognition, instead of the existing mono-modal biometrics, by using multiple facial features to improve recognition accuracy and user convenience. Each biometric feature vector is obtained as follows. For the face, features are computed by principal component analysis with wavelet multi-resolution. For the lips, a filter is first used to extract the lip edges; then, using a thinned image and the least-squares method, the factors of a fitted equation are derived. A further feature is obtained from the distance ratios of facial parameters. A back-propagation neural network was trained and tested with the inputs described above. Based on the experimental results, we discuss the advantages and efficiency of the approach.
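The lip-feature step relies on a least-squares fit to thinned edge points. A minimal sketch of that fit, assuming a straight-line model and made-up edge points (the paper does not specify its exact equation form):

```python
def lstsq_line(points):
    """Closed-form least-squares fit of y = m*x + b to (x, y) points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

# Hypothetical thinned lip-edge points (x, y).
pts = [(0, 1.0), (1, 3.1), (2, 4.9), (3, 7.0)]
m, b = lstsq_line(pts)
print(round(m, 2), round(b, 2))  # -> 1.98 1.03
```

The fitted coefficients (m, b) are the kind of "equation factors" that can then serve as feature-vector entries.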

Improvement of Reliability based Information Integration in Audio-visual Person Identification (시청각 화자식별에서 신뢰성 기반 정보 통합 방법의 성능 향상)

  • Tariquzzaman, Md.;Kim, Jin-Young;Hong, Joon-Hee
    • MALSORI
    • /
    • no.62
    • /
    • pp.149-161
    • /
    • 2007
  • In this paper we propose a modified reliability function for improving bimodal speaker identification (BSI) performance. The conventional reliability function, used by N. Fox [1], is extended by introducing an optimization factor. We evaluated the proposed method in the BSI domain: a GMM-based BSI system was implemented and tested on the VidTIMIT database. Speaker-identification experiments verified the usefulness of the proposed method, showing improved performance, i.e., a 39% reduction in error rate.
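The reliability-weighted fusion idea can be sketched as follows. The score-dispersion reliability measure and the `alpha` parameter standing in for the paper's optimization factor are illustrative assumptions, not the authors' exact formulation:

```python
import math

def reliability(scores, alpha=1.0):
    """Map the gap between the two best scores to a [0, 1] weight;
    alpha controls how sharply the gap translates into confidence."""
    ranked = sorted(scores, reverse=True)
    gap = ranked[0] - ranked[1]
    return 1.0 - math.exp(-alpha * gap)

def fuse(audio_scores, visual_scores, alpha=1.0):
    """Weight each modality's per-speaker scores by its reliability."""
    wa = reliability(audio_scores, alpha)
    wv = reliability(visual_scores, alpha)
    total = (wa + wv) or 1.0
    return [(wa * a + wv * v) / total
            for a, v in zip(audio_scores, visual_scores)]

audio = [0.9, 0.2, 0.1]    # confident audio scores for 3 speakers
visual = [0.4, 0.5, 0.45]  # ambiguous visual scores
fused = fuse(audio, visual)
print(fused.index(max(fused)))  # -> 0: the confident modality dominates
```

The point of the design is that a modality whose scores barely separate the candidates contributes little to the fused decision.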

Implementation of Embedded System for Multi-modal Biometric Recognition using KSOM (KSOM을 이용한 다중생체 인식시스템에 관한 연구)

  • Kim, Jae-Wan;Lee, Sang-Bae
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2006.11a
    • /
    • pp.91-94
    • /
    • 2006
  • This paper aims to increase reliability in biometric recognition by building on the individual strengths of each single-modality system. A multi-modal biometric system was implemented using fingerprints, which are simple to capture and yield a high recognition rate, together with each individual's voice. The speaker-recognition module performs speaker identification on a DSP; the fingerprint module then extracts fingerprint minutiae and performs recognition using a KSOM (Kohonen Self-Organizing Map) neural-network algorithm. An ATmega16L microcontroller handles overall control of the recognition modules, and the authentication result is displayed on a PC through an MFC application.
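The KSOM matching used for the fingerprint module can be sketched in miniature: feature vectors are mapped to their best matching unit on a grid of weight vectors, which training pulls toward the inputs. Random toy vectors stand in for real minutiae features:

```python
import random

random.seed(0)
# A 1-D map of 4 units, each with a 3-dimensional weight vector.
som = [[random.random() for _ in range(3)] for _ in range(4)]

def best_matching_unit(som, x):
    """Index of the unit whose weight vector is closest to input x."""
    return min(range(len(som)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(som[i], x)))

def train_step(som, x, lr=0.5, radius=1):
    """Pull the BMU and its map neighbours toward the input."""
    bmu = best_matching_unit(som, x)
    for i, unit in enumerate(som):
        if abs(i - bmu) <= radius:
            for j in range(len(unit)):
                unit[j] += lr * (x[j] - unit[j])
    return bmu

x = [0.9, 0.1, 0.5]
bmu = train_step(som, x)
# After the update, the same unit still wins for this input.
print(best_matching_unit(som, x) == bmu)  # -> True
```

In a recognition setting, each enrolled identity ends up associated with a region of the map, and a query is classified by its best matching unit.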

Multi-view learning review: understanding methods and their application (멀티 뷰 기법 리뷰: 이해와 응용)

  • Bae, Kang Il;Lee, Yung Seop;Lim, Changwon
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.1
    • /
    • pp.41-68
    • /
    • 2019
  • Multi-view learning considers data from various viewpoints as well as attempts to integrate various information from data. Multi-view learning has been studied recently and has shown superior performance to a model learned from only a single view. With the introduction of deep learning techniques to the multi-view learning approach, it has shown good results in various fields such as image, text, voice, and video. In this study, we introduce how multi-view learning methods solve various problems faced in human behavior recognition, medical areas, information retrieval and facial expression recognition. In addition, we review data integration principles of multi-view learning methods by classifying traditional multi-view learning methods into data integration, classifier integration, and representation integration. Finally, we examine how CNN, RNN, RBM, Autoencoder, and GAN, which are commonly used among various deep learning methods, are applied to multi-view learning algorithms. We categorize CNN and RNN-based learning methods as supervised learning, and RBM, Autoencoder, and GAN-based learning methods as unsupervised learning.
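One of the integration principles the review classifies, classifier integration, can be sketched simply: each view gets its own classifier and the per-view predictions are combined, here by majority vote. The toy threshold classifiers and the behavior-recognition labels are illustrative stand-ins for the models discussed in the review:

```python
from collections import Counter

def majority_vote(per_view_predictions):
    """Combine one predicted label per view into a single decision."""
    return Counter(per_view_predictions).most_common(1)[0][0]

# Hypothetical per-view classifiers for a behavior-recognition task.
views = {
    "image": lambda x: "walk" if x["speed"] < 2.0 else "run",
    "audio": lambda x: "walk" if x["steps_hz"] < 2.5 else "run",
    "video": lambda x: "run" if x["speed"] > 1.5 else "walk",
}
sample = {"speed": 1.8, "steps_hz": 3.0}
preds = [clf(sample) for clf in views.values()]
print(majority_vote(preds))  # -> "run" (2 of 3 views agree)
```

Data integration would instead concatenate the views' features before a single classifier, and representation integration would learn a shared latent representation.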

ICLAL: In-Context Learning-Based Audio-Language Multi-Modal Deep Learning Models (ICLAL: 인 컨텍스트 러닝 기반 오디오-언어 멀티 모달 딥러닝 모델)

  • Jun Yeong Park;Jinyoung Yeo;Go-Eun Lee;Chang Hwan Choi;Sang-Il Choi
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.11a
    • /
    • pp.514-517
    • /
    • 2023
  • This study addresses a multi-modal deep learning model that applies in-context learning to audio-language tasks. The goal is to develop a multi-modal model that learns a shared representation of audio and text during training and can then perform a variety of audio-text tasks. The model consists of an audio encoder connected to a language encoder: the language model is an autoregressive large language model with 6.7B or 30B parameters, and the audio encoder is an audio feature-extraction model pre-trained with self-supervised learning. Because the language model is relatively large, training uses a frozen approach in which the language model's parameters are fixed and only the audio encoder's parameters are updated. The training tasks are automatic speech recognition and abstractive summarization, and after training the model was tested on question answering. The results suggest that additional training is needed to generate exact answer sentences, but the model pre-trained on speech recognition produced grammatically correct sentences using keywords similar to the answers.
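The frozen training scheme described above can be sketched in plain Python: frozen parameters are simply excluded from the update step, so only the audio encoder moves. The parameter names and integer values are illustrative, not the authors' code:

```python
# Toy parameter store: the language model is frozen, the audio encoder is not.
params = {
    "audio_encoder.w": {"value": 4, "frozen": False},
    "language_model.w": {"value": 10, "frozen": True},
}

def sgd_step(params, grads, lr=1):
    """Apply a gradient step only to parameters that are not frozen."""
    for name, p in params.items():
        if not p["frozen"]:
            p["value"] -= lr * grads[name]

grads = {"audio_encoder.w": 2, "language_model.w": 7}
sgd_step(params, grads)
print(params["audio_encoder.w"]["value"])   # -> 2 (updated)
print(params["language_model.w"]["value"])  # -> 10 (frozen, unchanged)
```

In a deep-learning framework the same effect is typically achieved by disabling gradient tracking on the frozen module's parameters.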

A Study on Multi-modal Near-IR Face and Iris Recognition on Mobile Phones (휴대폰 환경에서의 근적외선 얼굴 및 홍채 다중 인식 연구)

  • Park, Kang-Ryoung;Han, Song-Yi;Kang, Byung-Jun;Park, So-Young
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.45 no.2
    • /
    • pp.1-9
    • /
    • 2008
  • As the security requirements of mobile phones have increased, there has been extensive research on authentication using a single biometric feature (e.g., an iris, a fingerprint, or a face image). Due to the limitations of uni-modal biometrics, we propose a method that combines face and iris images in order to improve accuracy in mobile environments. This paper presents four advantages and contributions over previous research. First, in order to capture both face and iris images quickly and simultaneously, we use a built-in conventional mega-pixel camera in the mobile phone, revised to capture NIR (Near-InfraRed) face and iris images. Second, in order to increase the authentication accuracy of face and iris, we propose a score-level fusion method based on an SVM (Support Vector Machine). Third, to reduce the classification complexity of the SVM and the intra-class variation of the face and iris data, we normalize the input face and iris data, respectively. For the face, an NIR illuminator and an NIR-passing filter on the camera reduce the illumination variance caused by environmental visible lighting, and the saturated facial region caused by the NIR illuminator is normalized by a low-complexity logarithmic algorithm suited to the mobile phone. For the iris, transformation into polar coordinates and iris-code shifting are used to obtain robust identification accuracy irrespective of the image-capturing conditions. Fourth, to increase the processing speed on the mobile phone, we use integer-based face and iris authentication algorithms. Experiments were conducted with face and iris images captured by the mega-pixel camera of a mobile phone, and showed that the authentication accuracy using the SVM was better than that of the uni-modal approaches (face or iris alone) and of the SUM, MAX, MIN, and weighted-SUM rules.
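Score-level fusion with an SVM can be sketched as below: each authentication attempt yields a (face score, iris score) pair, and a trained SVM's decision function accepts or rejects it. A fixed linear decision function with made-up weights stands in for the paper's trained SVM:

```python
def svm_decision(face_score, iris_score, w=(1.2, 1.5), b=-1.8):
    """Linear SVM decision value: w.x + b > 0 means 'genuine'.
    The weights w and bias b are illustrative, not trained values."""
    return w[0] * face_score + w[1] * iris_score + b

def fuse(face_score, iris_score):
    return "genuine" if svm_decision(face_score, iris_score) > 0 else "impostor"

print(fuse(0.9, 0.8))  # -> genuine: both modalities are confident
print(fuse(0.3, 0.4))  # -> impostor: weak scores on both modalities
```

Unlike fixed SUM/MAX/MIN rules, the SVM learns the decision boundary in score space from data, which is why it can outperform them.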

Development of a Electronic Commerce System of Multi-Modal Information (다중모달을 이용한 전자상거래시스템 개발)

  • 장찬용;류갑상
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2001.10a
    • /
    • pp.729-732
    • /
    • 2001
  • An individual authentication system that takes advantage of multi-modal information, such as speech recognition, face recognition, and electronic signatures, is a very efficient way to build a security system that protects important information from the many dangers present across communication networks. This paper describes an implemented system for purchasing hardware-related products over the Internet, based on public-key and electronic signatures. It shows that a secure commercial transaction system is possible by applying individual authentication.
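The public-key signature step the system relies on can be illustrated with textbook RSA on tiny primes and a simple hash; this is a toy sketch for exposition only, not production cryptography, and the order message is made up:

```python
import hashlib

# Toy RSA key pair (never use primes this small in practice).
p, q = 61, 53
n = p * q                          # public modulus, 3233
e = 17                             # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent via modular inverse

def h(msg: bytes) -> int:
    """Hash the message and reduce it into the RSA modulus range."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(msg: bytes) -> int:
    return pow(h(msg), d, n)         # signer uses the private key

def verify(msg: bytes, sig: int) -> bool:
    return pow(sig, e, n) == h(msg)  # anyone checks with the public key

sig = sign(b"order #42: 2 widgets")
print(verify(b"order #42: 2 widgets", sig))   # -> True
print(verify(b"order #42: 99 widgets", sig))  # tampered order; fails unless the toy hash collides
```

A real deployment would use a vetted library and standard padding schemes rather than raw modular exponentiation.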

Damage detection for a beam under transient excitation via three different algorithms

  • Zhao, Ying;Noori, Mohammad;Altabey, Wael A.
    • Structural Engineering and Mechanics
    • /
    • v.64 no.6
    • /
    • pp.803-817
    • /
    • 2017
  • Structural health monitoring has increasingly been a focus within the civil engineering research community over the last few decades. With the increasing application of sensor networks in large structures and infrastructure systems, the effective use and development of robust algorithms to analyze large volumes of data and to extract the desired features has become a challenging problem. In this paper, we present some precautions and key points of the wavelet signal-processing approach, establish a relatively reliable framework, and analyze three problems that require attention when applying wavelet-based damage detection. The case studies show how to use optimal scales for extracting mode shapes and modal curvatures in a reinforced concrete beam, and how to effectively identify damage using maximum curves of wavelet coefficient differences. Moreover, how to perform recognition based on wavelet multi-resolution analysis, wavelet packet energy, and fuzzy sets is a meaningful topic addressed in this work. This relatively systematic work, which encompasses algorithms, structures, and evaluation, paves the way toward a framework for effective structural health monitoring, orientation, decision, and action.
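The "maximum curves of wavelet coefficient differences" idea can be sketched in miniature: a local change in a mode shape produces a spike in the difference between the intact and damaged wavelet detail coefficients. A single-level Haar transform and a synthetic mode shape stand in for the multi-scale analysis and measured data in the paper:

```python
import math

def haar_detail(signal):
    """Level-1 Haar detail coefficients (scaled pairwise differences)."""
    return [(signal[i] - signal[i + 1]) / math.sqrt(2)
            for i in range(0, len(signal) - 1, 2)]

# Smooth synthetic "mode shape" of an intact beam, and a copy with a small
# local perturbation at sample 8 standing in for damage.
intact = [math.sin(math.pi * i / 15) for i in range(16)]
damaged = list(intact)
damaged[8] += 0.05

# The damage location shows up as the peak of the coefficient differences.
diff = [abs(a - b) for a, b in zip(haar_detail(intact), haar_detail(damaged))]
print(diff.index(max(diff)))  # -> 4, the coefficient pair covering samples 8-9
```

Differencing against the intact baseline is what localizes the damage: the raw coefficients of either signal alone are dominated by the global curvature of the mode shape.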