• Title/Summary/Keyword: Multi-Modal Recognition

An Emotion Recognition and Expression Method using Facial Image and Speech Signal (음성 신호와 얼굴 표정을 이용한 감정인식 몇 표현 기법)

  • Ju, Jong-Tae;Mun, Byeong-Hyeon;Seo, Sang-Uk;Jang, In-Hun;Sim, Gwi-Bo
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.04a / pp.333-336 / 2007
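  • This paper recognizes four emotions (joy, sadness, anger, surprise) from speech signals and facial images, the two sources most widely used in emotion recognition, and fuses the separately obtained recognition results using a multi-modal technique. For facial images, feature vectors are extracted with Principal Component Analysis (PCA); for speech, acoustic features that exclude linguistic characteristics are used. The extracted features are each fed to a neural network that classifies patterns by emotion, and the recognized result is passed to an emotion expression system, which expresses the emotion.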
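
The abstract above does not say how the two recognition results are fused. As a minimal sketch of one common option, decision-level fusion, the Python below averages the class probabilities of the two classifiers. The function name `fuse_decisions`, the `face_weight` parameter, and the sample scores are assumptions for illustration, not the authors' method.

```python
import math

EMOTIONS = ["joy", "sadness", "anger", "surprise"]

def softmax(scores):
    """Convert raw network outputs into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_decisions(face_scores, speech_scores, face_weight=0.5):
    """Weighted average of per-modality probabilities (hypothetical rule)."""
    face_p = softmax(face_scores)
    speech_p = softmax(speech_scores)
    fused = [face_weight * f + (1.0 - face_weight) * s
             for f, s in zip(face_p, speech_p)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused

# Made-up scores: the face network slightly prefers "joy",
# the speech network strongly prefers "surprise".
label, probs = fuse_decisions([2.0, 0.1, 0.3, 1.9], [0.2, 0.1, 0.4, 2.5])
print(label)
```

With equal weights, the confident speech result outweighs the near-tie in the face result, which is the kind of disagreement a fusion step is meant to resolve.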

Audio-Visual Integration based Multi-modal Speech Recognition System (오디오-비디오 정보 융합을 통한 멀티 모달 음성 인식 시스템)

  • Lee, Sahng-Woon;Lee, Yeon-Chul;Hong, Hun-Sop;Yun, Bo-Hyun;Han, Mun-Sung
    • Proceedings of the Korea Information Processing Society Conference / 2002.11a / pp.707-710 / 2002
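  • This paper proposes a multi-modal speech recognition system based on the fusion of audio and video information. By fusing speech features with visual features, the system recognizes human speech efficiently in noisy environments. The speech features are Mel Frequency Cepstrum Coefficients (MFCC), and the visual features are feature vectors obtained through principal component analysis. To improve recognition from the visual information itself, the face region is located using a skin color model and face shape information, and the lip region is then detected with a robust lip region extraction method. Audio-visual fusion is performed as early fusion using a modified time-delay neural network. Experiments show that fusing audio and visual information improves performance by roughly 5% to 20% over using audio information alone.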
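
Early fusion generally means combining the feature vectors before classification. The sketch below illustrates only that input step, under the assumption that audio and visual features arrive at different frame rates; the function name `early_fusion` and the sample values are hypothetical, and the modified time-delay neural network from the abstract is not reproduced.

```python
def early_fusion(audio_frames, visual_frames):
    """Concatenate audio and visual features frame by frame.

    Video usually has fewer frames than audio, so each audio frame is
    paired with the visual frame at the same relative position.
    """
    if not audio_frames or not visual_frames:
        raise ValueError("both modalities need at least one frame")
    n_audio, n_visual = len(audio_frames), len(visual_frames)
    fused = []
    for i, audio in enumerate(audio_frames):
        j = min(i * n_visual // n_audio, n_visual - 1)
        fused.append(list(audio) + list(visual_frames[j]))
    return fused

# Four audio frames (2 values each) and two visual frames (1 value each).
audio = [[1, 2], [3, 4], [5, 6], [7, 8]]
visual = [[10], [20]]
for row in early_fusion(audio, visual):
    print(row)
```

Each fused row would then be fed to a single classifier, in contrast to late fusion, where each modality is classified separately first.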

Multi-Modal Recognition System Using the Fuzzy Fusion (퍼지 융합을 이용한 다중생체인식 시스템 구현)

  • Yang, Dong-Hwa;Kim, Hyung-Min;Go, Hyoun-Joo;Chun, Myung-Geun
    • Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.355-358 / 2004
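  • This paper proposes an implementation of a real-time multi-modal biometric recognition system that uses a person's face and fingerprint. For face recognition, the Wavelet Transform is used to reduce the image size, and LDA (Linear Discriminant Analysis), which is widely used in face recognition, is used to extract feature values. For fingerprint recognition, the core point of the fingerprint is located and a Gabor transform is applied; the per-sector variance is used as the feature value, and three highly correlated fingerprints, which can improve recognition performance, are registered as reference data. In the final step, the face recognition and fingerprint recognition results are combined by fuzzy fusion so that both biometric sources can be used. The resulting multi-modal biometric system overcomes the drawbacks of relying on a single biometric and shows superior performance.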

LH-FAS v2: Head Pose Estimation-Based Lightweight Face Anti-Spoofing (LH-FAS v2: 머리 자세 추정 기반 경량 얼굴 위조 방지 기술)

  • Hyeon-Beom Heo;Hye-Ri Yang;Sung-Uk Jung;Kyung-Jae Lee
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.1 / pp.309-316 / 2024
  • Facial recognition technology is widely used in many fields but is vulnerable to fraud such as photo spoofing. Extensive research has addressed this problem, but most approaches require specialized equipment such as multi-modal cameras or must run in high-performance environments. In this paper, we introduce LH-FAS v2 (Lightweight Head-pose-based Face Anti-Spoofing v2), a system designed to counter facial recognition spoofing on a commercial webcam without any specialized equipment. LH-FAS v2 uses FSA-Net for head pose estimation and ArcFace for facial recognition to assess changes in head pose and verify facial identity. We developed the VD4PS dataset, which incorporates photo spoofing scenarios, to evaluate the model's performance. The experimental results show that the model balances accuracy and speed, indicating that head pose estimation-based face anti-spoofing can be used effectively against photo spoofing.

The Impact of the Science Writing Heuristic Approach on Students' Use of Multiple Representations in Science Writing and Students' Recognition about Multiple Representations (탐구적 과학 글쓰기 활동이 학생들의 글쓰기에서 나타난 다중 표상에 미치는 영향 및 다중 표상에 대한 학생들의 인식)

  • Nam, Jeonghee;Park, Jiyeon;Lee, Dongwon
    • Journal of the Korean Chemical Society / v.56 no.6 / pp.759-767 / 2012
  • The purpose of this study was to examine the impact of the Science Writing Heuristic (SWH) approach on multiple representations in students' writing and to survey the experimental group students' recognition of the use of multiple representations. The participants were 158 7th-grade students: 94 were assigned to the experimental group and 64 to the comparative group. The experimental group had a significantly higher mean score than the comparative group in using multiple representations in summary writing. Interview analysis indicated that all interviewed students, regardless of whether they had solid multi-modal competency, recognized that using multiple representations with appropriate explanations enables science information to be communicated persuasively.

Training Performance Analysis of Semantic Segmentation Deep Learning Model by Progressive Combining Multi-modal Spatial Information Datasets (다중 공간정보 데이터의 점진적 조합에 의한 의미적 분류 딥러닝 모델 학습 성능 분석)

  • Lee, Dae-Geon;Shin, Young-Ha;Lee, Dong-Cheon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.40 no.2 / pp.91-108 / 2022
  • In most cases, optical images have been used as training data for DL (Deep Learning) models for object detection, recognition, identification, classification, semantic segmentation, and instance segmentation. However, the properties of 3D objects in the real world cannot be fully explored with 2D images. One of the major sources of 3D geospatial information is the DSM (Digital Surface Model), so characteristic information derived from a DSM should be effective for analyzing 3D terrain features. In particular, man-made objects such as buildings, which have geometrically unique shapes, can be described by geometric elements obtained from 3D geospatial data. The background and motivation of this paper are drawn from the concept of the intrinsic image, which is involved in high-level visual information processing. This paper aims to extract buildings after classifying terrain features by training a DL model with DSM-derived information including slope, aspect, and SRI (Shaded Relief Image). The experiments were carried out with the CNN-based SegNet model using the DSM and label dataset provided by ISPRS (International Society for Photogrammetry and Remote Sensing). In particular, the experiments focus on combining multi-source information to improve the training performance and synergistic effect of the DL model. The results demonstrate that buildings were effectively classified and extracted by the proposed approach.
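
Slope and aspect are standard terrain derivatives of an elevation grid. The sketch below shows one way to compute them for a single interior cell with central differences. The grid orientation (row 0 at the northern edge), the aspect convention (compass bearing of steepest descent), and the function name `slope_aspect` are assumptions, since the abstract does not state which formulation the paper uses.

```python
import math

def slope_aspect(dsm, row, col, cell_size=1.0):
    """Return (slope_deg, aspect_deg) for an interior cell of a DSM grid.

    Assumes row 0 is the northern edge and columns increase eastward.
    Aspect is the compass bearing of steepest descent (0 = north,
    90 = east); it is None on flat cells, where no direction exists.
    """
    dz_east = (dsm[row][col + 1] - dsm[row][col - 1]) / (2.0 * cell_size)
    dz_north = (dsm[row - 1][col] - dsm[row + 1][col]) / (2.0 * cell_size)
    slope = math.degrees(math.atan(math.hypot(dz_east, dz_north)))
    if dz_east == 0 and dz_north == 0:
        return slope, None
    aspect = math.degrees(math.atan2(-dz_east, -dz_north)) % 360.0
    return slope, aspect

# A surface that rises one unit per cell toward the east faces west.
east_rising = [[0.0, 1.0, 2.0]] * 3
print(slope_aspect(east_rising, 1, 1))
```

Applying the function to every interior cell yields slope and aspect layers that could be stacked as extra input channels alongside the DSM.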

Multi-modal Image Processing for Improving Recognition Accuracy of Text Data in Images (이미지 내의 텍스트 데이터 인식 정확도 향상을 위한 멀티 모달 이미지 처리 프로세스)

  • Park, Jungeun;Joo, Gyeongdon;Kim, Chulyun
    • Database Research / v.34 no.3 / pp.148-158 / 2018
  • Optical character recognition (OCR) is a technique for extracting and recognizing text from images. It is an important preprocessing step in data analysis because most actual text information is embedded in images. Many OCR engines have high recognition accuracy for images in which text is clearly separable from the background, such as black lettering on a white background. However, they have low accuracy for images in which text is not easily separable from a complex background. To improve accuracy on such complex images, the input image must be transformed to make the text more noticeable. In this paper, we propose a method that segments an input image into text lines so that OCR engines can recognize each line more efficiently, and that determines the final output by comparing the recognition rates of a CLAHE module and a Two-step module, which distinguish text from background regions using image processing techniques. Through thorough experiments comparing with the well-known OCR engines Tesseract and ABBYY, we show that our proposed method has the best recognition accuracy on images with complex backgrounds.

A study on the Pattern Recognition of the EMG signals using Neural Network and Probabilistic modal for the two dimensional Motions described by External Coordinate (신경회로망과 확률모델을 이용한 2차원운동의 외부좌표에 대한 EMG신호의 패턴인식에 관한 연구)

  • Jang, Young-Gun;Kwon, Jang-Woo;Hong, Seung-Hong
    • Proceedings of the KOSOMBE Conference / v.1991 no.05 / pp.65-70 / 1991
  • This paper proposes a hybrid model that uses a probabilistic model and an MLP (multi-layer perceptron) model for pattern recognition of EMG (electromyogram) signals. The MLP model has two problems: its learning method does not guarantee a global minimum of the error, and how closely it approximates the Bayesian probabilities varies with the amount and quality of the training data, the number of hidden layers and hidden nodes, and so on. The latter problem produces quite different results, especially for new test data that exclude the design samples. The error probability of the probabilistic model is closely related to the estimation error of the parameters used in the model and to the fidelity of its assumptions. Generally, a Bayesian classifier cannot be introduced into the probabilistic model of EMG signals because the prior probabilities are unknown, so the model is estimated by MLE (maximum likelihood estimate). In this paper, we propose a method that obtains the MAP (maximum a posteriori probability) in the probabilistic model by using the MLP to estimate the prior probability distribution that minimizes the error probability. This method minimizes the error probability of the probabilistic model as long as the realization of the MLP is optimal, and it selectively approximates, for each class, the minimum error probability of the two models. Placing the reference coordinate of the EMG signal outside the body makes the method easier to adapt to applications that are difficult to define and separate using internal body coordinates. Simulation results show the benefit of the proposed model compared with using the MLP and the probabilistic model separately.
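
The difference between a maximum likelihood decision and a MAP decision comes down to whether class priors are included. The sketch below illustrates that difference with one-dimensional Gaussian class models. The class names, model parameters, and priors are made up for illustration; in the paper the priors are estimated by an MLP, which is not reproduced here.

```python
import math

def gaussian_pdf(x, mean, std):
    """Likelihood of x under a 1-D Gaussian class model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def classify(x, class_models, priors=None):
    """Return the class maximizing prior * likelihood.

    With priors=None every class gets the same prior, which reduces
    the rule to a maximum likelihood decision.
    """
    if priors is None:
        priors = {name: 1.0 / len(class_models) for name in class_models}
    scores = {name: priors[name] * gaussian_pdf(x, mean, std)
              for name, (mean, std) in class_models.items()}
    return max(scores, key=scores.get)

# Hypothetical class models: (mean, std) of one EMG feature per motion.
models = {"flexion": (0.0, 1.0), "extension": (2.0, 1.0)}
ml_label = classify(0.9, models)
map_label = classify(0.9, models, priors={"flexion": 0.2, "extension": 0.8})
print(ml_label, map_label)
```

For a sample near the boundary between the two classes, the likelihoods alone favor one class while a strong prior tips the decision to the other, which is why the quality of the estimated priors matters.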

Improved Transformer Model for Multimodal Fashion Recommendation Conversation System (멀티모달 패션 추천 대화 시스템을 위한 개선된 트랜스포머 모델)

  • Park, Yeong Joon;Jo, Byeong Cheol;Lee, Kyoung Uk;Kim, Kyung Sun
    • The Journal of the Korea Contents Association / v.22 no.1 / pp.138-147 / 2022
  • Recently, chatbots have been applied in various fields with good results, and e-commerce platforms are making many attempts to use chatbots in shopping mall product recommendation services. This paper addresses a conversation system that recommends the fashion a user wants based on the conversation between the user and the system and on fashion image information, building on the transformer model, which currently performs well in various AI fields such as natural language processing, voice recognition, and image recognition. We propose an improved multimodal transformer model that increases recommendation accuracy by using dialogue (text) and fashion (image) information together for data preprocessing and data representation. We also propose a method that improves accuracy by analyzing and improving the data. The proposed system achieves a recommendation accuracy of 0.6563 WKT (Weighted Kendall's tau), a significant improvement of 0.3191 WKT over the existing system's 0.3372 WKT.

Digital Mirror System with Machine Learning and Microservices (머신 러닝과 Microservice 기반 디지털 미러 시스템)

  • Song, Myeong Ho;Kim, Soo Dong
    • KIPS Transactions on Software and Data Engineering / v.9 no.9 / pp.267-280 / 2020
  • A mirror is a physical reflective surface, typically glass coated with a metal amalgam, whose purpose is to reflect an image clearly. Mirrors are available everywhere at any time and have become an essential tool for observing our faces and appearance. With the advent of modern software technology, we are motivated to enhance the reflection capability of mirrors with the convenience and intelligence of real-time processing, microservices, and machine learning. In this paper, we present the development of a Digital Mirror System that provides the real-time reflection functionality of a mirror along with additional convenience and intelligence, including personal information retrieval, public information retrieval, appearance age detection, and emotion detection. Moreover, it provides a multi-modal user interface with touch-based, voice-based, and gesture-based input. We present our design and discuss how it can be implemented with current technology to deliver real-time mirror reflection while providing useful information and machine learning intelligence.