• Title/Summary/Keyword: 모달실험 (modal experiment)

Scene Graph Generation with Graph Neural Network and Multimodal Context (그래프 신경망과 멀티 모달 맥락 정보를 이용한 장면 그래프 생성)

  • Jung, Ga-Young;Kim, In-cheol
    • Proceedings of the Korea Information Processing Society Conference / 2020.05a / pp.555-558 / 2020
  • This paper proposes a new deep neural network model that effectively detects the various objects in an input image and the relationships between them, and represents them as a single scene graph. To detect objects and relationships effectively, the proposed model exploits diverse multimodal context information, including linguistic context features as well as visual context features based on convolutional neural networks. In addition, the model embeds this context information with a graph neural network so that the mutual dependency between two related objects is sufficiently reflected in the graph node features. The effectiveness and performance of the proposed model are demonstrated through comparative experiments on the Visual Genome benchmark dataset.
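
The graph-neural-network context embedding described in this abstract can be illustrated with a minimal message-passing sketch, where each node's feature absorbs information from its related objects (the feature vectors, relation edges, and the simple mean-mix update below are illustrative stand-ins, not the paper's actual model):

```python
# Minimal one-step message passing over a scene graph (illustrative only).
# Each node starts from concatenated visual + language context features;
# one update mixes in the mean of its neighbors' features.

def message_passing_step(node_feats, edges):
    """node_feats: {node: [float]}, edges: [(src, dst)] treated as undirected."""
    neighbors = {n: [] for n in node_feats}
    for a, b in edges:
        neighbors[a].append(node_feats[b])
        neighbors[b].append(node_feats[a])
    updated = {}
    for n, feat in node_feats.items():
        msgs = neighbors[n]
        if msgs:
            mean = [sum(vals) / len(msgs) for vals in zip(*msgs)]
            updated[n] = [0.5 * f + 0.5 * m for f, m in zip(feat, mean)]
        else:
            updated[n] = list(feat)  # isolated nodes keep their features
    return updated

# Toy 2-dim features standing in for visual + language context.
feats = {"man": [1.0, 0.0], "horse": [0.0, 1.0], "hat": [1.0, 1.0]}
edges = [("man", "horse"), ("man", "hat")]  # e.g. "riding", "wearing"
print(message_passing_step(feats, edges))
```

A real model would stack several such rounds with learned update weights before predicting relation labels per edge.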

Multi-modal approach for FASCODE-EVAL (FASCODE-EVAL을 위한 복합모달 접근방법)

  • Chung, Euisok;Kim, Hyun Woo;Park, Minho;Song, Hwa Jeon
    • Annual Conference on Human and Language Technology / 2021.10a / pp.514-517 / 2021
  • FASCODE-EVAL consists of clothing-recommendation dialogue contexts between a customer and the system, together with lists of recommended outfit sets that satisfy the requirements of each context. Each recommendation list contains three candidate outfit sets, sorted in order of relevance to the context, and a recommendation system is evaluated by how well it recovers this ordering. The dialogue context is text, while each clothing item consists of textual attribute information and a clothing image. This paper shows how to solve the FASCODE-EVAL task using a Transformer-based pretrained language model, integrating both the text and the image information into that model. The experimental results on FASCODE-EVAL outperform previously published results.
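
The evaluation protocol described here, recovering the relevance ordering of three candidate outfit sets, can be sketched as follows (the scores below are placeholders for the Transformer-based model's output, and the exact-match metric is one simple way to compare orderings):

```python
# Rank candidate outfit sets by a context-relevance score and compare
# the predicted ordering with the gold ordering.

def rank_candidates(scores):
    """Return candidate indices sorted by descending relevance score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])

def exact_match(predicted, gold):
    """1 if the full predicted ordering equals the gold ordering, else 0."""
    return int(predicted == gold)

model_scores = [0.31, 0.87, 0.55]  # one score per outfit-set candidate
gold_order = [1, 2, 0]             # most- to least-relevant candidate
pred_order = rank_candidates(model_scores)
print(pred_order, exact_match(pred_order, gold_order))
```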

Multi-Modal Cross Attention for 3D Point Cloud Semantic Segmentation (3차원 포인트 클라우드의 의미적 분할을 위한 멀티-모달 교차 주의집중)

  • HyeLim Bae;Incheol Kim
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.660-662 / 2023
  • Semantic segmentation of a 3D point cloud partitions the point cloud into the individual objects that make up a scene; it demands the visual intelligence essential for understanding the 3D structure of an environment and interacting with it. This paper proposes MFNet, a new 3D point-cloud semantic segmentation model that exploits 2D visual features extracted from multi-view images in addition to the 3D geometric features extracted from the point cloud. To fuse the heterogeneous 2D visual and 3D geometric features effectively, the proposed model uses a new mid-level fusion strategy and multi-modal cross attention. The superiority of MFNet is demonstrated through various experiments on the ScanNetV2 benchmark dataset.
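
The multi-modal cross attention named in this abstract can be illustrated with a tiny single-head sketch in which point-cloud features (queries) attend to image features (keys and values); the dimensions, values, and the absence of learned projection matrices are all simplifications:

```python
import math

# Toy single-head cross attention: 3D point features attend to 2D image
# features. A real model would apply learned query/key/value projections.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention of each query over all keys/values."""
    d = len(keys[0])
    out = []
    for q in queries:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(logits)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

pts_3d = [[1.0, 0.0]]                # one point-cloud feature (query)
img_2d = [[1.0, 0.0], [0.0, 1.0]]    # two image features (keys = values)
print(cross_attention(pts_3d, img_2d, img_2d))
```

The query ends up weighted toward the image feature it is most similar to, which is the mechanism the fusion strategy relies on.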

Verification of Damage Detection Using In-Service Time Domain Response (사용중 시간영역응답을 이용한 손상탐지이론의 검증)

  • Choi, Sang-Hyun;Kim, Dae-Hyork;Park, Nam-Hoi
    • Journal of the Korean Society of Hazard Mitigation / v.9 no.5 / pp.9-13 / 2009
  • Modal parameters, including resonant frequencies and mode shapes, are heavily utilized in most damage identification theories for structural health monitoring. However, extracting modal parameters from dynamic responses requires postprocessing, which inevitably introduces errors in curve-fitting resonances as well as in transforming the domain of the responses. In this paper, the applicability of a damage identification method based on free vibration responses to in-service responses is experimentally verified. The experiment is performed by applying periodic and nonperiodic moving loads to a simply supported beam and measuring the displacement responses. The moving load is simulated using steel balls and a downhill device. The damage identification results show that the in-service response may be applicable to identifying damage in the beam.

A Model to Automatically Generate Non-verbal Expression Information for Korean Utterance Sentence (한국어 발화 문장에 대한 비언어 표현 정보를 자동으로 생성하는 모델)

  • Jaeyoon Kim;Jinyea Jang;San Kim;Minyoung Jung;Hyunwook Kang;Saim Shin
    • Annual Conference on Human and Language Technology / 2023.10a / pp.91-94 / 2023
  • To develop artificial-intelligence agents capable of natural interaction, non-verbal expressions must be considered in addition to verbal ones. This paper presents a study on generating motion, a non-verbal expression, from Korean utterance sentences. A dataset was built from YouTube videos, and experiments were conducted with a model implemented using T2M-GPT, an existing text-to-motion model, and the language encoder of VL-KE-T5, which was jointly trained on heterogeneous modality data. In the experiments, the motion representations generated for Korean utterance text achieved an FID score of 0.11, demonstrating the feasibility of generating non-verbal expression information from Korean utterances.

Improvement of Face Verification Performance Using Multiple Instances and Matching Algorithms (다중획득 및 매칭을 통한 얼굴 검증 성능 향상)

  • 김도형;윤호섭;이재연
    • Proceedings of the Korea Multimedia Society Conference / 2003.05b / pp.450-453 / 2003
  • This paper discusses how acquiring and matching multiple instances of a single biometric trait, one of the multimodal biometric scenarios, contributes to system performance. The simple multiple-acquisition and matching-combination methods proposed in this paper were applied to a single-biometric verification system based on the face, and were evaluated on a realistic evaluation model and database. The experimental results show performance roughly 25% better than that of a single-acquisition, single-matching system, indicating that multiple acquisition and matching is one of the factors that must be considered when building a face verification system.
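
The multiple-acquisition idea in this abstract amounts to fusing several match scores for the same claimed identity before thresholding; a generic sketch (the fusion rules and the threshold are common illustrations, not the paper's exact combination methods):

```python
# Fuse match scores from multiple acquisitions of the same face, then
# make a single accept/reject decision on the fused score.

def fuse_scores(scores, rule="mean"):
    """Combine per-acquisition match scores with a simple fusion rule."""
    if rule == "mean":
        return sum(scores) / len(scores)
    if rule == "max":
        return max(scores)
    if rule == "min":
        return min(scores)
    raise ValueError(f"unknown rule: {rule}")

def verify(scores, threshold=0.5, rule="mean"):
    """Accept the claimed identity if the fused score clears the threshold."""
    return fuse_scores(scores, rule) >= threshold

# Three acquisitions of one face; a single poor capture no longer
# decides the outcome on its own.
print(verify([0.42, 0.71, 0.66]))
```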

Dynamic Analysis of Design Data for Structural Lap Joint (LAP 구조물 결합부의 설계치 확보를 위한 동역학적 해석)

  • 윤성호
    • Journal of KSNVE / v.8 no.1 / pp.57-74 / 1998
  • This paper is concerned with a combined experimental and analytical investigation aimed at identifying the modeling errors that account for the lack of correlation between experimental measurements and analytical predictions of the modal parameters of lap joint panels. A nonlinear vibration test methodology, initiated from theoretical analysis, is suggested for measuring the dynamic stiffnesses of a lap joint using rivet fasteners. Based on the experimental evidence of discrepancies between measured and predicted frequencies, improved finite element models of the joint are developed using PATRAN and ABAQUS, in which the beam element size is evaluated from the joint stiffnesses readily determined in the test. The beam element diameter, as a principal design parameter, is tuned to match experimental results within the evaluated bound value. Frequencies predicted by the proposed numerical model are compared with frequencies measured in the test, and improved predictions based on this new model are observed relative to conventional modeling practices.
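
The tuning step described here, adjusting a beam element diameter until the predicted frequency matches the measured one, can be sketched as a one-parameter update loop (the stiffness-to-frequency relation below is a toy monotone stand-in, not the actual PATRAN/ABAQUS model):

```python
# Toy model-updating loop: bisect on a beam-element diameter until the
# predicted joint frequency matches a measured target frequency.

def predicted_frequency(diameter_mm):
    """Illustrative monotone stiffness -> frequency relation (toy values)."""
    stiffness = 2.0e4 * diameter_mm ** 2   # stand-in joint stiffness
    return stiffness ** 0.5 / (2 * 3.141592653589793)

def tune_diameter(measured_hz, lo=1.0, hi=10.0, tol=1e-6):
    """Bisection on the diameter; valid because the relation is monotone."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predicted_frequency(mid) < measured_hz:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

d = tune_diameter(measured_hz=90.0)
print(round(d, 3), round(predicted_frequency(d), 3))
```

In the paper the update is driven by a full finite element solve; the structure of the loop (predict, compare, adjust the design parameter within its bound) is the same.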

Multimodal Emotional State Estimation Model for Implementation of Intelligent Exhibition Services (지능형 전시 서비스 구현을 위한 멀티모달 감정 상태 추정 모형)

  • Lee, Kichun;Choi, So Yun;Kim, Jae Kyeong;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.1-14 / 2014
  • Both researchers and practitioners are showing increased interest in interactive exhibition services. Interactive exhibition services are designed to respond directly to visitor responses in real time, so as to fully engage visitors' interest and enhance their satisfaction. In order to deploy an effective interactive exhibition service, it is essential to adopt intelligent technologies that enable accurate estimation of a visitor's emotional state from responses to the exhibited stimulus. Studies undertaken so far have attempted to estimate the human emotional state, most of them by gauging either facial expressions or audio responses. However, recent research suggests that a multimodal approach that uses multiple responses simultaneously may lead to better estimation. Given this context, we propose a new multimodal emotional state estimation model that uses various responses, including facial expressions, gestures, and movements, measured by the Microsoft Kinect Sensor. In order to handle a large amount of sensory data effectively, we propose stratified sampling-based multiple regression analysis (MRA) as our estimation method. To validate the usefulness of the proposed model, we collected 602,599 responses and emotional state data with 274 variables from 15 people. When we applied our model to the data set, we found that it estimated the levels of valence and arousal within a 10~15% error range. Since the proposed model is simple and stable, we expect it to be applied not only in intelligent exhibition services but also in other areas such as e-learning and personalized advertising.
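
The estimation pipeline in this abstract, stratified sampling to thin a large sensory stream followed by a regression fit, can be sketched as follows (a one-predictor least-squares fit stands in for the full multiple regression, and the data, strata, and variable names are synthetic):

```python
import random

# Stratified sampling over a large response stream, then a least-squares
# regression on the thinned sample.

def stratified_sample(rows, key, per_stratum, seed=0):
    """Draw up to `per_stratum` rows from each stratum defined by `key`."""
    rng = random.Random(seed)
    strata = {}
    for row in rows:
        strata.setdefault(key(row), []).append(row)
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample

def fit_line(xs, ys):
    """Least-squares slope/intercept for y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# rows: (gesture_intensity, valence_label, stratum = coarse intensity bin)
rows = [(x / 10, 0.8 * (x / 10) + 0.1, x // 5) for x in range(10)]
sample = stratified_sample(rows, key=lambda r: r[2], per_stratum=3)
a, b = fit_line([r[0] for r in sample], [r[1] for r in sample])
print(round(a, 3), round(b, 3))
```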

Dynamic Performance Estimation of the Incrementally PSC Girder Railway Bridge by Modal Tests and Moving Load Analysis (다단계 긴장 PSC 거더 철도교량의 동특성 실험 및 주행열차하중 해석에 의한 동적성능 평가)

  • Kim, Sung Il;Kim, Nam Sik;Lee, Hee Up
    • KSCE Journal of Civil and Environmental Engineering Research / v.26 no.4A / pp.707-717 / 2006
  • As an alternative to conventional prestressed concrete (PSC) girders, various types of PSC girders are either under development or have already been applied in bridge structures. The incrementally prestressed concrete girder is one of these newly developed girders. According to the design concept, these new types of PSC girders have the advantage of requiring less self-weight while being capable of longer spans. However, the dynamic interaction between bridge superstructures and passing trains is one of the critical issues for railway bridges designed with more flexibility. Therefore, it is very important to evaluate the modal parameters of newly designed bridges before performing dynamic analyses. In the present paper, a 25-meter-long full-scale PSC girder was fabricated as a test specimen, and modal testing was carried out to evaluate modal parameters, including natural frequencies and modal damping ratios, at every prestressing stage. During the modal testing, a digitally controlled vibration exciter as well as an impact hammer was applied in order to obtain precise frequency response functions, and the modal parameters were evaluated as they varied with construction stage. The effects of prestressing force on changes in the modal parameters are analyzed at every incremental prestressing stage. With reliable properties obtained from the modal experiments, the dynamic performance of PSC girder railway bridges can be estimated through various parametric studies on dynamic behavior under a passing train. Dynamic displacements, impact factor, slab acceleration, girder end rotation, and other important dynamic performance parameters are checked at various train speeds.
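
One standard way to extract a modal damping ratio from the frequency response functions mentioned here is the half-power (−3 dB) bandwidth method; below is a sketch on a synthetic single-mode FRF (the method choice and all values are illustrative, not taken from the paper):

```python
# Half-power bandwidth estimate of the modal damping ratio from sampled
# FRF magnitudes around a resonance: zeta ~ (f2 - f1) / (2 * fn), where
# f1 and f2 are the frequencies at peak / sqrt(2).

def half_power_damping(freqs, mags):
    peak = max(mags)
    fn = freqs[mags.index(peak)]          # resonant frequency estimate
    level = peak / 2 ** 0.5               # half-power (-3 dB) level
    above = [f for f, m in zip(freqs, mags) if m >= level]
    return (above[-1] - above[0]) / (2.0 * fn)

# Synthetic FRF magnitude of a 1-DOF system, fn = 8 Hz, zeta = 0.02.
fn, zeta = 8.0, 0.02
freqs = [fn * (0.9 + i * 0.001) for i in range(201)]
mags = [1.0 / (((1 - (f / fn) ** 2) ** 2
                + (2 * zeta * f / fn) ** 2) ** 0.5) for f in freqs]
z = half_power_damping(freqs, mags)
print(round(z, 4))
```

The recovered value approximates the damping ratio used to synthesize the FRF, up to frequency-grid resolution; on measured FRFs, curve-fitting methods are usually preferred for closely spaced modes.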

Design of a Deep Neural Network Model for Image Caption Generation (이미지 캡션 생성을 위한 심층 신경망 모델의 설계)

  • Kim, Dongha;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering / v.6 no.4 / pp.203-210 / 2017
  • In this paper, we propose an effective neural network model for image caption generation and model transfer. The model is a kind of multimodal recurrent neural network. It consists of several distinct layers: a convolutional neural network layer for extracting visual information from images, an embedding layer for converting each word into a low-dimensional feature, a recurrent neural network layer for learning caption sentence structure, and a multimodal layer for combining visual and language information. The recurrent neural network layer is built from LSTM units, which are well known to be effective for learning and transferring sequence patterns. Moreover, the model has a unique structure in which the output of the convolutional neural network layer is linked not only to the input of the initial state of the recurrent layer but also to the input of the multimodal layer, so that the visual information extracted from the image can be used at each recurrent step when generating the corresponding textual caption. Through various comparative experiments on open data sets such as Flickr8k, Flickr30k, and MSCOCO, we demonstrate that the proposed multimodal recurrent neural network model achieves high performance in terms of caption accuracy and model transfer.
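
The unique structure described here, with the CNN output feeding both the initial recurrent state and the multimodal layer at every step, can be sketched with toy stand-ins for the learned layers (the update rules, sizes, and values are illustrative, not the paper's trained model):

```python
# Toy forward pass of a multimodal caption architecture: the image
# feature initializes the recurrent state AND is re-injected at every
# step through a multimodal combination layer.

def step(state, word_feat):
    """Stand-in recurrent update (a real model would use an LSTM cell)."""
    return [0.5 * s + 0.5 * w for s, w in zip(state, word_feat)]

def multimodal_combine(state, image_feat):
    """Multimodal layer: merge the recurrent state with the image feature."""
    return [s + i for s, i in zip(state, image_feat)]

def caption_forward(image_feat, word_feats):
    state = list(image_feat)             # CNN output feeds the initial state
    outputs = []
    for w in word_feats:
        state = step(state, w)
        outputs.append(multimodal_combine(state, image_feat))  # and each step
    return outputs

img = [1.0, 0.0]                         # toy CNN image feature
words = [[0.0, 1.0], [1.0, 1.0]]         # toy embedded caption words
print(caption_forward(img, words))
```

The per-step re-injection is what distinguishes this design from models that show the image to the RNN only once at the start.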