• Title/Summary/Keyword: multi-modal fusion


Facial Features and Motion Recovery using multi-modal information and Paraperspective Camera Model (다양한 형식의 얼굴정보와 준원근 카메라 모델해석을 이용한 얼굴 특징점 및 움직임 복원)

  • Kim, Sang-Hoon
    • The KIPS Transactions:PartB / v.9B no.5 / pp.563-570 / 2002
  • Robust extraction of 3D facial features and global motion information from a 2D image sequence for MPEG-4 SNHC face model encoding is described. Facial regions are detected in the image sequence using a multi-modal fusion technique that combines range, color, and motion information. Twenty-three facial features among the MPEG-4 FDP (Face Definition Parameters) are extracted automatically inside the facial region using color transforms (GSCD, BWCD) and morphological processing. The extracted facial features are used to recover the 3D shape and global motion of the object using a paraperspective camera model and the SVD (Singular Value Decomposition) factorization method. A 3D synthetic object is designed and tested to show the performance of the proposed algorithm. The recovered 3D motion information is transformed into the global motion parameters of the MPEG-4 FAP (Face Animation Parameters) to synchronize a generic face model with a real face.
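As a rough illustration of the recovery step this abstract describes: tracked 2D features are stacked into a measurement matrix and factorized with SVD. The Python sketch below shows the generic Tomasi-Kanade-style rank-3 factorization only; the paper's paraperspective variant adds per-frame normalization and a metric-upgrade step that are omitted here, and the function name is illustrative.

```python
import numpy as np

def factorize_motion_shape(W):
    """Factor a 2F x P measurement matrix (F frames, P tracked
    features) into motion (2F x 3) and shape (3 x P) estimates.
    Sketch of the SVD factorization step only; the paper's
    paraperspective model adds normalization and metric upgrade."""
    # Subtract each row's mean: removes the translation component
    W_centered = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
    # A rigid scene constrains W_centered to (at most) rank 3
    M = U[:, :3] * np.sqrt(s[:3])         # camera motion estimate
    S = np.sqrt(s[:3])[:, None] * Vt[:3]  # 3D shape estimate
    return M, S
```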

Multi-modal Authentication Using Score Fusion of ECG and Fingerprints

  • Kwon, Young-Bin;Kim, Jason
    • Journal of Information and Communication Convergence Engineering / v.18 no.2 / pp.132-146 / 2020
  • Biometric technologies have become widely available in many different fields. However, biometric technologies that use static physical features such as fingerprints, facial features, irises, and veins must contend with forgery and alteration through fraudulent physical characteristics such as fake fingerprints. Thus, a trend toward next-generation biometric technologies using the behavioral biometrics of a living person, such as bio-signals and walking characteristics, has emerged. Accordingly, in this study, we developed a bio-signal authentication algorithm using electrocardiogram (ECG) signals, which are among the most uniquely identifiable bio-signals available. When using ECG signals alone, the personal identification and authentication accuracy of our system is approximately 90% during a state of rest. When using fingerprints alone, the equal error rate (EER) is 0.243%; when fusing the scores of the ECG signal and fingerprints, the EER decreases to 0.113% on average. In addition, to detect presentation attacks on a mobile phone, a method for rejecting a transaction when a fake fingerprint is presented was successfully implemented.
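The score-fusion idea here can be sketched in a few lines. The weighted sum below and the convention that higher scores mean "more genuine" are illustrative assumptions, not the paper's reported configuration; the EER helper mirrors the metric the abstract quotes.

```python
import numpy as np

def fuse_scores(ecg_score, fp_score, w=0.5):
    """Weighted-sum score-level fusion; the weight w is an
    illustrative assumption, not the paper's tuned value."""
    return w * ecg_score + (1.0 - w) * fp_score

def equal_error_rate(genuine, impostor):
    """EER: operating point where the false accept rate (FAR)
    equals the false reject rate (FRR)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0
```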

Multi-Modal Recognition System Using the Fuzzy Fusion (퍼지 융합을 이용한 다중생체인식 시스템 구현)

  • Yang, Dong-Hwa;Kim, Hyung-Min;Go, Hyoun-Joo;Chun, Myung-Geun
    • Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.355-358 / 2004
  • This paper proposes a real-time multi-modal biometric recognition system that uses both the face and fingerprints. For face recognition, the wavelet transform is used to reduce the image size, and LDA (Linear Discriminant Analysis), widely used in face recognition, extracts the feature values. For fingerprint recognition, the core point of the fingerprint is located and a Gabor transform is applied around it; the per-sector variances serve as feature values, and the three fingerprints with the highest correlation are registered as reference data to improve recognition performance. Finally, so that both kinds of biometric information are used, the face and fingerprint recognition results are fused with fuzzy logic; the resulting multi-modal system overcomes the weaknesses of single-biometric recognition and shows excellent performance.
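The abstract does not give the fuzzy rule base, so the sketch below is only a minimal stand-in for the fusion stage: raw face and fingerprint matching scores are mapped to fuzzy memberships and combined with a compensatory operator. All thresholds and the choice of operator are assumptions.

```python
import numpy as np

def membership(score, lo, hi):
    """Linear ramp mapping a raw matching score to [0, 1]
    (an illustrative membership function)."""
    return np.clip((score - lo) / (hi - lo), 0.0, 1.0)

def fuzzy_fuse(face_score, fp_score):
    mu_f = membership(face_score, lo=0.2, hi=0.9)  # assumed score ranges
    mu_p = membership(fp_score, lo=0.1, hi=0.8)
    t_norm = mu_f * mu_p                     # "both modalities agree"
    t_conorm = mu_f + mu_p - mu_f * mu_p     # "either modality matches"
    return 0.5 * (t_norm + t_conorm)         # compensatory combination
```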


Improved Semantic Segmentation in Multi-modal Network Using Encoder-Decoder Feature Fusion (인코더-디코더 사이의 특징 융합을 통한 멀티 모달 네트워크의 의미론적 분할 성능 향상)

  • Sohn, Chan-Young;Ho, Yo-Sung
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2018.11a / pp.81-83 / 2018
  • The Fully Convolutional Network (FCN) outperformed previous methods, but because it uses only RGB information, its performance falls somewhat short in scenes that require fine-grained prediction. To address this, FuseNet was proposed, which uses an encoder-decoder structure to exploit RGB and depth as multi-modal input. However, while FuseNet fuses the RGB and depth branches, it does not fuse feature maps between the encoder and the decoder. In this paper, we improve performance by applying a skip layer that, during the upsampling stage of the FCN decoder, fuses the previous layer's result with the 2x-upsampled result, thereby making better use of FuseNet's modalities. In our experiments on the NYUDv2 and SUNRGBD datasets, overall accuracy was 77% and 65%, mean IoU was 47.4% and 26.9%, and mean accuracy was 67.7% and 41%, respectively.
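A minimal PyTorch sketch of the skip-layer fusion described above, assuming FCN-style fusion by element-wise addition of score maps; the class name and channel handling are illustrative, not the authors' code.

```python
import torch.nn as nn
import torch.nn.functional as F

class SkipFusionStep(nn.Module):
    """One decoder step: 2x-upsample the coarse class scores and
    add scores projected from the matching encoder feature map."""
    def __init__(self, skip_channels, num_classes):
        super().__init__()
        # 1x1 conv turns an encoder feature map into class scores
        self.score_skip = nn.Conv2d(skip_channels, num_classes, 1)

    def forward(self, coarse_scores, skip_feature):
        # coarse_scores: (N, num_classes, H, W); skip_feature from encoder
        up = F.interpolate(coarse_scores, scale_factor=2,
                           mode="bilinear", align_corners=False)
        return up + self.score_skip(skip_feature)  # skip-layer fusion
```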


Dual Foot-PDR System Considering Lateral Position Error Characteristics

  • Lee, Jae Hong;Cho, Seong Yun;Park, Chan Gook
    • Journal of Positioning, Navigation, and Timing / v.11 no.1 / pp.35-44 / 2022
  • In this paper, a dual-foot (DF) PDR system is proposed that fuses integration approach (IA)-based PDR systems applied independently on both shoes. The horizontal positions of the two shoes estimated by each PDR system are fused with a particle filter. The proposed method bounds the position error as walking time increases, without any additional sensor. The particle distribution is non-Gaussian so as to express the lateral error caused by systematic drift. Assuming that the shoe position is the pedestrian position, the multi-modal position distribution can be fused into one using a Gaussian sum. The fused pedestrian position is then used as a measurement for each particle filter so that the position error is corrected. Experimental results show that the pedestrian position can be estimated effectively using only the inertial sensors attached to both shoes.
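The Gaussian-sum step can be sketched as moment matching: the two shoes' position estimates form a two-component mixture that is collapsed into a single Gaussian, which the system then feeds back to each particle filter as a measurement. The weights and shapes below are illustrative.

```python
import numpy as np

def gaussian_sum_fuse(means, covs, weights):
    """Collapse a Gaussian mixture (e.g., left/right shoe position
    estimates) into one Gaussian by moment matching."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    mean = sum(wi * m for wi, m in zip(w, means))
    cov = sum(wi * (c + np.outer(m - mean, m - mean))
              for wi, m, c in zip(w, means, covs))
    return mean, cov

# Example: fuse two 2D position estimates with equal weight
m, P = gaussian_sum_fuse(
    means=[np.array([1.0, 0.2]), np.array([1.1, -0.1])],
    covs=[np.eye(2) * 0.04, np.eye(2) * 0.09],
    weights=[0.5, 0.5])
```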

A Study on Multi-modal Near-IR Face and Iris Recognition on Mobile Phones (휴대폰 환경에서의 근적외선 얼굴 및 홍채 다중 인식 연구)

  • Park, Kang-Ryoung;Han, Song-Yi;Kang, Byung-Jun;Park, So-Young
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.2 / pp.1-9 / 2008
  • As the security requirements of mobile phones have increased, there has been extensive research on authentication using a single biometric feature (e.g., an iris, a fingerprint, or a face image). Due to the limitations of uni-modal biometrics, we propose a method that combines face and iris images to improve accuracy in mobile environments. This paper presents four advantages and contributions over previous research. First, to capture face and iris images quickly and simultaneously, we use the built-in conventional megapixel camera of a mobile phone, modified to capture NIR (Near-InfraRed) face and iris images. Second, to increase the authentication accuracy of face and iris, we propose a score-level fusion method based on an SVM (Support Vector Machine). Third, to reduce the classification complexity of the SVM and the intra-class variation of the face and iris data, we normalize the input face and iris data, respectively. For the face, an NIR illuminator and an NIR-passing filter on the camera reduce the illumination variance caused by environmental visible lighting, and the consequent saturated region in the face caused by the NIR illuminator is normalized by a low-complexity logarithmic algorithm suited to mobile phones. For the iris, transformation into polar coordinates and iris-code shifting are used to obtain robust identification accuracy irrespective of the capture conditions. Fourth, to increase processing speed on the mobile phone, we use integer-based face and iris authentication algorithms. Experiments were performed on face and iris images captured with the megapixel camera of a mobile phone. The results showed that the authentication accuracy using the SVM was better than that of the uni-modal (face or iris), SUM, MAX, MIN, and weighted-SUM rules.
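The SVM score-level fusion can be illustrated with a toy two-feature classifier over (face score, iris score) pairs; the synthetic data below merely stands in for the paper's mobile-phone captures, and all numbers are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Rows are [face_score, iris_score]; 1 = genuine, 0 = impostor.
# Synthetic toy scores, not the paper's data.
rng = np.random.default_rng(0)
genuine = rng.normal([0.80, 0.75], 0.08, size=(200, 2))
impostor = rng.normal([0.35, 0.30], 0.10, size=(200, 2))
X = np.vstack([genuine, impostor])
y = np.array([1] * 200 + [0] * 200)

fusion_svm = SVC(kernel="rbf").fit(X, y)  # score-level fusion classifier
print(fusion_svm.predict([[0.7, 0.8], [0.4, 0.2]]))  # expect [1 0]
```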

Speech Recognition by Integrating Audio, Visual and Contextual Features Based on Neural Networks (신경망 기반 음성, 영상 및 문맥 통합 음성인식)

  • 김명원;한문성;이순신;류정우
    • Journal of the Institute of Electronics Engineers of Korea CI / v.41 no.3 / pp.67-77 / 2004
  • Recent research has focused on the fusion of audio and visual features for reliable speech recognition in noisy environments. In this paper, we propose a neural-network-based model of robust speech recognition that integrates audio, visual, and contextual information. The Bimodal Neural Network (BMNN) is a multi-layer perceptron with 4 layers, each of which performs a certain level of abstraction of its input features. In the BMNN, the third layer combines the audio and visual features of speech to compensate for the loss of audio information caused by noise. To further improve the accuracy of speech recognition in noisy environments, we also propose a post-processing step based on contextual information, namely the sequential patterns of words spoken by a user. Our experimental results show that our model outperforms all single-modality models. In particular, when we use the contextual information, we obtain over 90% recognition accuracy even in noisy environments, a significant improvement over the state of the art in speech recognition. Our research demonstrates that diverse sources of information need to be integrated to improve the accuracy of speech recognition, particularly in noisy environments.
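A hedged sketch of the BMNN topology as the abstract describes it: separate abstraction layers for the audio and visual features, fusion at the third layer, then per-word scores. Layer widths and feature dimensions are assumptions, and the contextual post-processing stage is not shown.

```python
import torch
import torch.nn as nn

class BimodalNet(nn.Module):
    """4-layer perceptron in the spirit of the BMNN: audio and
    visual features are abstracted separately, then fused at the
    third layer. All sizes here are illustrative."""
    def __init__(self, audio_dim=26, visual_dim=20, hidden=64, n_words=10):
        super().__init__()
        self.audio = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.visual = nn.Sequential(nn.Linear(visual_dim, hidden), nn.ReLU())
        self.fuse = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())
        self.out = nn.Linear(hidden, n_words)  # per-word scores

    def forward(self, a, v):
        h = torch.cat([self.audio(a), self.visual(v)], dim=-1)
        return self.out(self.fuse(h))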

A Survey of Objective Measurement of Fatigue Caused by Visual Stimuli (시각자극에 의한 피로도의 객관적 측정을 위한 연구 조사)

  • Kim, Young-Joo;Lee, Eui-Chul;Whang, Min-Cheol;Park, Kang-Ryoung
    • Journal of the Ergonomics Society of Korea / v.30 no.1 / pp.195-202 / 2011
  • Objective: The aim of this study is to review previous research on objectively measuring fatigue caused by visual stimuli. We also analyze the feasibility of alternative visual-fatigue measurement methods based on facial expression recognition and gesture recognition. Background: In most previous research, visual fatigue is measured by subjective methods based on surveys or interviews. However, subjective evaluation can be affected by variation in individual feelings or by other kinds of stimuli. To solve these problems, visual-fatigue measurement methods based on signal and image processing have been widely researched. Method: To analyze the signal- and image-processing-based methods, we categorize previous work into three groups: bio-signal-, brainwave-, and eye-image-based methods. We also analyze the possibility of adopting facial expression or gesture recognition to measure visual fatigue. Results: Bio-signal- and brainwave-based methods are problematic because they can be degraded not only by visual stimuli but also by external stimuli received through other sense organs. In eye-image-based methods, using only a single feature such as blink frequency or pupil size is also problematic because a single feature is easily confounded by other kinds of emotion. Conclusion: A multi-modal measurement method is required that fuses several features extracted from bio-signals and images. Alternative methods using facial expression or gesture recognition can also be considered. Application: Objective visual-fatigue measurement can be applied to the quantitative and comparative evaluation of the visual fatigue induced by next-generation display devices from a human-factors perspective.

Multi-modal Meteorological Data Fusion based on Self-supervised Learning for Graph (Self-supervised Graph Learning을 통한 멀티모달 기상관측 융합)

  • Hyeon-Ju Jeon;Jeon-Ho Kang;In-Hyuk Kwon
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.589-591 / 2023
  • Current numerical weather prediction systems assimilate heterogeneous observation data obtained from various sensors, such as aircraft and satellites, to estimate the atmospheric state, but the computational complexity of processing observations with different observed variables or physical quantities is very high. To improve the computational efficiency of the existing system so that observations can be evaluated or preprocessed efficiently, this study proposes a methodology that estimates the true atmospheric state from multi-modal meteorological observations via self-supervised learning that accounts for the characteristics of each observation. To fuse the heterogeneously collected multi-modal observation data, we (i) construct a heterogeneous network of meteorological observations to represent the topological information of individual observations, (ii) represent the characteristics of individual observations through pretext-task-based self-supervised learning, and (iii) estimate an atmospheric state close to the truth with a prediction model based on a graph neural network. By addressing the limitations of existing techniques that run on large-scale numerical simulation systems, the proposed model can serve as an observation preprocessing technique for tasks such as anomalous-observation detection, observation bias correction, and observation-impact assessment.
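The GNN prediction stage (step iii) can be sketched as a single mean-aggregation message-passing layer over the observation graph; the heterogeneous per-modality encoders and pretext tasks of steps (i)-(ii) are omitted, and all names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class MeanAggGNNLayer(nn.Module):
    """One message-passing layer: each observation node mixes its
    own features with the mean of its neighbors' features.
    A stand-in for the abstract's heterogeneous GNN."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) 0/1 adjacency
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neighbor_mean = (adj @ x) / deg
        return torch.relu(self.lin(torch.cat([x, neighbor_mean], dim=-1)))
```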

Ontology-Based Dynamic Context Management and Spatio-Temporal Reasoning for Intelligent Service Robots (지능형 서비스 로봇을 위한 온톨로지 기반의 동적 상황 관리 및 시-공간 추론)

  • Kim, Jonghoon;Lee, Seokjun;Kim, Dongha;Kim, Incheol
    • Journal of KIISE / v.43 no.12 / pp.1365-1375 / 2016
  • One of the most important capabilities of autonomous service robots working in living environments is to recognize and understand the correct context in a dynamically changing environment. To generate high-level context knowledge for decision-making from multiple sensory data streams, many technical problems must be solved, including multi-modal sensory data fusion, uncertainty handling, symbolic knowledge grounding, time dependency, dynamics, and time-constrained spatio-temporal reasoning. Considering these problems, this paper proposes an effective dynamic context management and spatio-temporal reasoning method for intelligent service robots. To guarantee efficient context management and reasoning, our algorithm generates low-level context knowledge reactively for every sensory or perception input, while postponing high-level context knowledge generation until it is demanded by the decision-making module. When high-level context knowledge is demanded, it is derived through backward spatio-temporal reasoning. In experiments with a Turtlebot using a Kinect visual sensor, the dynamic context management and spatio-temporal reasoning system based on the proposed method showed high performance.
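The reactive/lazy split the abstract describes can be sketched as a knowledge base that asserts low-level facts cheaply on every input and derives high-level context only when queried (backward chaining); the data structures and rule format below are illustrative, not the authors' ontology.

```python
from collections import defaultdict

class ContextManager:
    """Low-level facts are asserted reactively per sensor input;
    high-level context is derived only on demand."""
    def __init__(self):
        self.facts = defaultdict(list)  # predicate -> [(args, time)]
        self.rules = {}                 # predicate -> derivation fn

    def assert_fact(self, predicate, args, t):
        # Cheap reactive step, run for every perception input
        self.facts[predicate].append((args, t))

    def query(self, predicate):
        # Lazy step: derive high-level context only when demanded
        if predicate in self.facts:
            return self.facts[predicate]
        rule = self.rules.get(predicate)
        return rule(self) if rule else []
```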