• Title/Summary/Keyword: Multi-modal Features


Multi-modal Detection of Anchor Shot in News Video (다중모드 특징을 사용한 뉴스 동영상의 앵커 장면 검출 기법)

  • Yoo, Sung-Yul;Kang, Dong-Wook;Kim, Ki-Doo;Jung, Kyeong-Hoon
    • Journal of Broadcast Engineering / v.12 no.4 / pp.311-320 / 2007
  • In this paper, an efficient algorithm for detecting anchor shots in news video is presented. We observed the audio-visual characteristics of news video and proposed several low-level features that are appropriate for detecting anchor shots. The proposed algorithm is composed of three stages: pause detection, audio cluster classification, and motion-activity matching. We used audio features as well as a motion feature in order to improve indexing accuracy, and the simulation results show that the performance of the proposed algorithm is quite satisfactory.
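
The three-stage structure described above (pause detection, audio cluster classification, and motion-activity matching) can be illustrated with a minimal sketch. The thresholds, feature shapes, and anchor-profile heuristic below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def detect_anchor_shots(audio_energy, audio_mfcc, motion_activity,
                        pause_thresh=0.02, audio_thresh=4.0, motion_thresh=0.15):
    """Toy three-stage anchor-shot detector: pause detection, audio-cluster
    check, and motion-activity matching. All thresholds are assumptions."""
    # Stage 1: a pause (low audio energy) often marks a shot boundary.
    candidates = [t for t in range(len(audio_energy)) if audio_energy[t] < pause_thresh]

    # Stage 2: a reference audio cluster (the anchor's voice) would normally be
    # learned by clustering; the mean MFCC vector is used here as a stand-in.
    anchor_profile = audio_mfcc.mean(axis=0)

    anchor_shots = []
    for t in candidates:
        if np.linalg.norm(audio_mfcc[t] - anchor_profile) < audio_thresh:
            # Stage 3: anchor shots typically show low motion activity.
            if motion_activity[t] < motion_thresh:
                anchor_shots.append(t)
    return anchor_shots

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = 200
    print(detect_anchor_shots(rng.random(frames) * 0.1,
                              rng.normal(size=(frames, 13)),
                              rng.random(frames) * 0.3))
```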

Facial Features and Motion Recovery using multi-modal information and Paraperspective Camera Model (다양한 형식의 얼굴정보와 준원근 카메라 모델해석을 이용한 얼굴 특징점 및 움직임 복원)

  • Kim, Sang-Hoon
    • The KIPS Transactions:PartB / v.9B no.5 / pp.563-570 / 2002
  • Robust extraction of 3D facial features and global motion information from a 2D image sequence for MPEG-4 SNHC face model encoding is described. Facial regions are detected from the image sequence using a multi-modal fusion technique that combines range, color, and motion information. Twenty-three facial features among the MPEG-4 FDP (Face Definition Parameters) are extracted automatically inside the facial region using color transforms (GSCD, BWCD) and morphological processing. The extracted facial features are used to recover the 3D shape and global motion of the object using a paraperspective camera model and the SVD (Singular Value Decomposition) factorization method. A 3D synthetic object was designed and tested to show the performance of the proposed algorithm. The recovered 3D motion information is transformed into global motion parameters of the MPEG-4 FAP (Face Animation Parameters) to synchronize a generic face model with a real face.
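
As a rough illustration of the factorization step mentioned above, the sketch below applies a rank-3 SVD to a measurement matrix of tracked 2D feature points, in the spirit of Tomasi-Kanade-style structure-from-motion. It is a generic factorization sketch under simplifying assumptions, not the paper's exact paraperspective formulation (the metric-upgrade step is omitted).

```python
import numpy as np

def factorize_measurements(W):
    """Rank-3 factorization of a 2F x P measurement matrix W into motion
    (2F x 3) and shape (3 x P). The metric upgrade constraints are omitted."""
    # Center each row: subtract the per-frame mean of the tracked points.
    W_centered = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
    # Keep the three dominant singular values (rank-3 approximation).
    motion = U[:, :3] * np.sqrt(s[:3])
    shape = np.sqrt(s[:3])[:, None] * Vt[:3, :]
    return motion, shape

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    frames, points = 5, 23                      # e.g., 23 FDP feature points
    true_shape = rng.normal(size=(3, points))
    true_motion = rng.normal(size=(2 * frames, 3))
    W = true_motion @ true_shape
    M, S = factorize_measurements(W)
    print(np.allclose(M @ S, W - W.mean(axis=1, keepdims=True)))  # True
```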

Ubiquitous Context-aware Modeling and Multi-Modal Interaction Design Framework (유비쿼터스 환경의 상황인지 모델과 이를 활용한 멀티모달 인터랙션 디자인 프레임웍 개발에 관한 연구)

  • Kim, Hyun-Jeong;Lee, Hyun-Jin
    • Archives of design research / v.18 no.2 s.60 / pp.273-282 / 2005
  • In this study, we proposed the Context Cube as a conceptual model of user context, together with a multi-modal interaction design framework for developing ubiquitous services, which help infer the meaning of context and offer services that meet user needs by understanding user context and analyzing the correlation between context awareness and multi-modality. We also developed a case study to verify the validity of the Context Cube and of the proposed interaction design framework for deriving personalized ubiquitous services. Context awareness can be understood in terms of information properties consisting of the user's basic activity, the user's location and devices (environment), time, and the user's daily schedule, which enables us to construct a three-dimensional conceptual model, the Context Cube. We also developed a ubiquitous interaction design process that encompasses multi-modal interaction design by studying the features of user interaction represented on the Context Cube.
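
A minimal data-structure sketch of the three-dimensional Context Cube idea (activity/location/devices on one axis, time on another, and the user's daily schedule on the third) is shown below. The field names and the modality rule are assumptions made for illustration, not the paper's model.

```python
from dataclasses import dataclass

@dataclass
class ContextCell:
    """One cell of a Context-Cube-like model: what the user is doing, where
    (and with which devices), when, and against which schedule entry.
    Field names are illustrative, not taken from the paper."""
    activity: str        # basic activity, e.g. "cooking"
    location: str        # user location and surrounding devices (environment)
    time: str            # time of day
    schedule_item: str   # matching entry in the user's daily schedule

def suggest_modality(cell: ContextCell) -> str:
    # Toy rule: hands-busy activities favor voice interaction, otherwise touch.
    hands_busy = {"cooking", "driving", "cleaning"}
    return "voice" if cell.activity in hands_busy else "touch"

if __name__ == "__main__":
    cell = ContextCell("cooking", "kitchen (smart speaker, display)", "18:30", "dinner")
    print(suggest_modality(cell))  # -> "voice"
```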


Gait Type Classification Using Multi-modal Ensemble Deep Learning Network

  • Park, Hee-Chan;Choi, Young-Chan;Choi, Sang-Il
    • Journal of the Korea Society of Computer and Information / v.27 no.11 / pp.29-38 / 2022
  • This paper proposes a system for classifying gait types with an ensemble deep learning network, using gait data measured by a smart insole equipped with multiple sensors. The gait type classification system consists of a part that normalizes the data measured by the insole, a part that extracts gait features using a deep learning network, and a part that classifies the gait type from the extracted features. Two kinds of gait feature maps were extracted by independently training CNN-based and LSTM-based networks with different characteristics, and the final ensemble result was obtained by combining their classification outputs. Multi-sensor data from adults in their 20s and 30s, covering seven gait types (walking, running, fast walking, going up stairs, going down stairs, going up hills, and going down hills), was classified with the proposed ensemble network, and the classification rate was confirmed to be higher than 90%.
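
A compact sketch of the described ensemble idea, a CNN branch and an LSTM branch over the insole sequences whose class scores are combined, is given below in PyTorch. The layer sizes, the number of sensor channels, the window length, and the averaging-based fusion are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 7              # seven gait types
CHANNELS, SEQ_LEN = 8, 100   # assumed sensor count and window length

class CNNBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(CHANNELS, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.fc = nn.Linear(32, NUM_CLASSES)

    def forward(self, x):                  # x: (batch, CHANNELS, SEQ_LEN)
        return self.fc(self.conv(x).squeeze(-1))

class LSTMBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(CHANNELS, 32, batch_first=True)
        self.fc = nn.Linear(32, NUM_CLASSES)

    def forward(self, x):                  # x: (batch, SEQ_LEN, CHANNELS)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])         # last time step

def ensemble_probs(x_cnn, x_lstm, cnn, lstm):
    # Simple score-level fusion: average the softmax outputs of both branches.
    p = torch.softmax(cnn(x_cnn), dim=1) + torch.softmax(lstm(x_lstm), dim=1)
    return p / 2

if __name__ == "__main__":
    x = torch.randn(4, CHANNELS, SEQ_LEN)
    probs = ensemble_probs(x, x.transpose(1, 2), CNNBranch(), LSTMBranch())
    print(probs.argmax(dim=1))
```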

Multi-Modal Controller Usability for Smart TV Control

  • Yu, Jeongil;Kim, Seongmin;Choe, Jaeho;Jung, Eui S.
    • Journal of the Ergonomics Society of Korea / v.32 no.6 / pp.517-528 / 2013
  • Objective: The objective of this study was to suggest a multi-modal controller type for Smart TV control. Background: Recently, many issues regarding the Smart TV have arisen due to the rising complexity of Smart TV features. One specific issue is what type of controller should be used to perform given tasks; this study examines the ongoing trend of the controller. Method: The selected participants had experience with the Smart TV and were 20 to 30 years of age. A pre-survey determined the first independent variable of five tasks (Live TV, Record, Share, Web, App Store). The second independent variable was the type of controller (conventional, mouse, and voice-based remote controllers). The dependent variables were preference, task completion time, and error rate. The study consisted of a series of three experiments: the first utilized a uni-modal controller for the tasks, the second a dual-modal controller, and the third a triple-modal controller. Results: The first experiment revealed that the uni-modal controllers (conventional and voice) showed the best results for the Live TV task. The second experiment revealed that the dual-modal controllers (conventional-voice and conventional-mouse combinations) showed the best results for the Share, Web, and App Store tasks. The third experiment revealed that the triple-modal controller was not more effective than the dual-modal controller at any level. Conclusion: For simple tasks on a Smart TV, our results showed that a uni-modal controller was more effective than a dual-modal controller, whereas complex tasks were better suited to the dual-modal controller. User preference for a controller differs according to the Smart TV function: the uni-modal controller was preferred for simple functions, while dual-modal controllers were preferred when the task was complex. Additionally, in accordance with task characteristics, the voice controller was strongly preferred for channel and volume adjustment, and the conventional controller was strongly preferred for menu selection. Where the user had to input text, the voice controller had the highest preference, while the mouse-type and voice controllers had the highest preference for performing a search or selecting items on the menu. Application: The results of this study may be utilized in the design of a controller which can effectively carry out the various tasks of the Smart TV.

A multi-resolution analysis based finite element model updating method for damage identification

  • Zhang, Xin;Gao, Danying;Liu, Yang;Du, Xiuli
    • Smart Structures and Systems / v.16 no.1 / pp.47-65 / 2015
  • A novel finite element (FE) model updating method based on multi-resolution analysis (MRA) is proposed. The true stiffness of the FE model is considered as the superposition of two pieces of stiffness information of different resolutions: the pre-defined stiffness information and the updating stiffness information. While the resolution of the former is decided solely by the meshing density of the FE model, the resolution of the latter is decided by the limited information obtained from the experiment and is considerably lower. Second-generation wavelets are adopted to describe the updating stiffness information in the framework of MRA. Because this updating stiffness is realized at a low level of resolution, fewer updating parameters are needed, and the efficiency of the optimization process is thus enhanced. The proposed method is suitable for the identification of multiple irregular cracks and performs well in capturing the global features of the structural damage. After the global features are identified, a refinement process proposed in the paper can be carried out to improve the performance of the MRA of the updating information. The effectiveness of the method is verified by numerical simulations of a box girder and by an experiment on a three-span continuous pre-stressed concrete bridge. It is shown that the proposed method corresponds well to the global features of the structural damage and is stable against the perturbation of modal parameters and small variations of the damage.
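
The core idea, a fine-mesh pre-defined stiffness plus an updating term expressed at a much coarser resolution, can be sketched as a low-resolution correction applied over blocks of element stiffnesses. The piecewise-constant (Haar-like) basis below is only a stand-in for the second-generation wavelets used in the paper, and all numbers are illustrative.

```python
import numpy as np

def updated_element_stiffness(k_elements, coarse_coeffs):
    """Superpose a coarse-resolution correction onto fine-mesh element
    stiffnesses: each coarse coefficient scales one block of elements,
    playing the role of a low-resolution updating parameter."""
    n_elem, n_coarse = len(k_elements), len(coarse_coeffs)
    block = n_elem // n_coarse
    correction = np.repeat(coarse_coeffs, block)   # piecewise-constant basis
    return k_elements * (1.0 + correction)

if __name__ == "__main__":
    k_fine = np.full(16, 2.0e5)                # pre-defined stiffness of 16 elements
    theta = np.array([0.0, -0.3, 0.0, 0.0])    # 4 coarse updating parameters
    print(updated_element_stiffness(k_fine, theta))   # block 2 softened ("damage")
```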

Multi-modal Emotion Recognition using Semi-supervised Learning and Multiple Neural Networks in the Wild (준 지도학습과 여러 개의 딥 뉴럴 네트워크를 사용한 멀티 모달 기반 감정 인식 알고리즘)

  • Kim, Dae Ha;Song, Byung Cheol
    • Journal of Broadcast Engineering / v.23 no.3 / pp.351-360 / 2018
  • Human emotion recognition is a research topic that is receiving continuous attention in the computer vision and artificial intelligence domains. This paper proposes a method for classifying human emotions through multiple neural networks based on multi-modal signals consisting of image, landmark, and audio in a wild environment. The proposed method has the following features. First, the learning performance of the image-based network is greatly improved by employing both multi-task learning and semi-supervised learning using the spatio-temporal characteristics of videos. Second, a model for converting one-dimensional (1D) facial landmark information into two-dimensional (2D) images is newly proposed, and a CNN-LSTM network based on this model is proposed for better emotion recognition. Third, based on the observation that audio signals are often very effective for specific emotions, we propose an audio deep learning mechanism robust to those specific emotions. Finally, so-called emotion adaptive fusion is applied to enable synergy among the multiple networks. The proposed network improves emotion classification performance by appropriately integrating existing supervised learning and semi-supervised learning networks. On the fifth attempt on the given test set of the EmotiW2017 challenge, the proposed method achieved a classification accuracy of 57.12%.
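
One concrete piece of such a pipeline, converting 1D landmark coordinates into a 2D image that a CNN-LSTM can consume, can be sketched as a simple rasterization. The image size and the plotting scheme below are assumptions, not the authors' exact mapping.

```python
import numpy as np

def landmarks_to_image(landmarks, size=64):
    """Rasterize (x, y) facial landmarks, given in [0, 1], onto a size x size
    single-channel image. A toy stand-in for a 1D-landmark-to-2D-image step."""
    img = np.zeros((size, size), dtype=np.float32)
    for x, y in landmarks:
        col = min(int(x * (size - 1)), size - 1)
        row = min(int(y * (size - 1)), size - 1)
        img[row, col] = 1.0
    return img

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    pts = rng.random((68, 2))               # 68 landmarks is a common convention
    frame_img = landmarks_to_image(pts)
    print(frame_img.shape, frame_img.sum())  # (64, 64), number of lit pixels (<= 68)
```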

Incorporating BERT-based NLP and Transformer for An Ensemble Model and its Application to Personal Credit Prediction

  • Sophot Ky;Ju-Hong Lee;Kwangtek Na
    • Smart Media Journal / v.13 no.4 / pp.9-15 / 2024
  • Tree-based algorithms have been the dominant methods for building prediction models for tabular data, including personal credit data. However, they are compatible only with categorical and numerical data and do not capture information about the relationships between features. In this work, we propose an ensemble model based on the Transformer architecture that includes text features and harnesses the self-attention mechanism to address the feature-relationship limitation. We describe a text formatter module that converts the original tabular data into sentence data, which is fed into FinBERT along with other text features. Furthermore, we employ an FT-Transformer trained on the original tabular data. We evaluate this multi-modal approach against two popular tree-based algorithms, Random Forest and Extreme Gradient Boosting (XGBoost), as well as TabTransformer. Our proposed method shows superior default recall, F1 score, and AUC results across two public data sets. Our results are significant for financial institutions seeking to reduce the risk of financial loss from defaulters.
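
The text formatter step, serializing a tabular credit record into a sentence that a BERT-style model such as FinBERT can tokenize, can be sketched as below. The column names and the sentence template are assumptions made for illustration, not the paper's formatter.

```python
def row_to_sentence(row: dict) -> str:
    """Serialize one tabular credit record into a plain-text sentence.
    Column names and the template are illustrative assumptions."""
    parts = [f"{key.replace('_', ' ')} is {value}" for key, value in row.items()]
    return "The applicant's " + ", ".join(parts) + "."

if __name__ == "__main__":
    record = {"age": 34, "annual_income": 52000, "loan_purpose": "car",
              "past_defaults": 0}
    print(row_to_sentence(record))
    # The sentence would then be tokenized and fed to a BERT-style encoder,
    # while the raw numeric/categorical columns go to an FT-Transformer branch.
```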

Research on Generative AI for Korean Multi-Modal Montage App (한국형 멀티모달 몽타주 앱을 위한 생성형 AI 연구)

  • Lim, Jeounghyun;Cha, Kyung-Ae;Koh, Jaepil;Hong, Won-Kee
    • Journal of Service Research and Studies / v.14 no.1 / pp.13-26 / 2024
  • Multi-modal generation is the process of generating results based on various kinds of information, such as text, images, and audio. With the rapid development of AI technology, a growing number of multi-modal systems synthesize different types of data to produce results. In this paper, we present an AI system that uses speech and text recognition to describe a person and generate a montage image. While existing montage generation technology is based on the appearance of Westerners, the montage generation system developed in this paper learns a model based on Korean facial features. It can therefore create more accurate and effective Korean montage images from multi-modal voice and text specific to Korean. Since the developed app can be used to produce a draft montage, it can dramatically reduce the manual labor of existing montage production personnel. For this purpose, we utilized persona-based virtual person montage data provided by the AI-Hub of the National Information Society Agency. AI-Hub is an AI integration platform aimed at providing a one-stop service by building the artificial intelligence training data necessary for developing AI technologies and services. The image generation system was implemented using VQGAN, a deep learning model used to generate high-resolution images, and KoDALLE, a Korean-based image generation model. We confirmed that the trained AI model creates a montage image of a face that is very similar to what was described using voice and text. To verify the practicality of the developed montage generation app, 10 testers used it, and more than 70% responded that they were satisfied. The montage generator can be used in various fields, such as criminal investigation, to describe and visualize facial features.
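
At a high level, the described app chains speech recognition, prompt assembly, and text-to-image generation. The sketch below shows that flow with placeholder callables; `transcribe` and `generate_montage` are hypothetical stand-ins, not the actual KoDALLE/VQGAN APIs.

```python
from typing import Callable

def montage_pipeline(audio_path: str,
                     extra_text: str,
                     transcribe: Callable[[str], str],
                     generate_montage: Callable[[str], bytes]) -> bytes:
    """Toy multi-modal flow: speech -> text, merge with a typed description,
    then text -> montage image. Both callables are hypothetical stand-ins
    for a speech recognizer and a KoDALLE/VQGAN-style generator."""
    spoken_description = transcribe(audio_path)
    prompt = f"{spoken_description}. {extra_text}".strip()
    return generate_montage(prompt)

if __name__ == "__main__":
    # Dummy components so the sketch runs without any model weights.
    fake_asr = lambda path: "round face, thick eyebrows, short black hair"
    fake_generator = lambda prompt: f"<image for: {prompt}>".encode("utf-8")
    print(montage_pipeline("witness.wav", "wearing glasses", fake_asr, fake_generator))
```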

Development of Driver's Emotion and Attention Recognition System using Multi-modal Sensor Fusion Algorithm (다중 센서 융합 알고리즘을 이용한 운전자의 감정 및 주의력 인식 기술 개발)

  • Han, Cheol-Hun;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.6 / pp.754-761 / 2008
  • As the automobile industry and its technologies develop, drivers tend to be more concerned with service matters than with mechanical matters. For this reason, interest in recognizing human cognition and emotion to create a safe and convenient driving environment is increasing. Such recognition belongs to emotion engineering, a field that has been studied since the late 1980s to provide people with human-friendly services. Emotion engineering technology analyzes people's emotions through their faces, voices, and gestures, so if we apply this technology to automobiles, we can supply drivers with services suited to each driver's situation and help them drive safely. Furthermore, by recognizing the driver's gestures, we can prevent accidents caused by careless driving or dozing off at the wheel. The purpose of this paper is to develop a system that can recognize the state of the driver's emotion and attention for safe driving. First, we detect signals of the driver's emotion, sleepiness, and attention using bio-motion signals and build several types of databases. By analyzing these databases, we find distinctive features of the driver's emotion, sleepiness, and attention, and fuse the results through a multi-modal method, making it possible to develop the system.
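
A minimal sketch of the multi-modal fusion idea, combining separate emotion, sleepiness, and attention estimates into one driver-state decision, is given below. The score names, weights, and decision thresholds are assumptions for illustration, not the paper's method.

```python
def fuse_driver_state(emotion_arousal: float,
                      sleepiness: float,
                      attention: float,
                      weights=(0.3, 0.4, 0.3)) -> str:
    """Combine three per-modality scores in [0, 1] into a single alert level.
    Weights and thresholds are illustrative assumptions."""
    # Higher risk when arousal is extreme, sleepiness is high, attention is low.
    risk = (weights[0] * abs(emotion_arousal - 0.5) * 2
            + weights[1] * sleepiness
            + weights[2] * (1.0 - attention))
    if risk > 0.7:
        return "warn driver"
    if risk > 0.4:
        return "monitor closely"
    return "normal"

if __name__ == "__main__":
    print(fuse_driver_state(emotion_arousal=0.9, sleepiness=0.8, attention=0.2))
    # -> "warn driver"
```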