• Title/Summary/Keyword: 멀티 모달 (multimodal)


A Study on the Design of Digital Twin System and Required Function for Underground Lifelines (지하공동구 디지털 트윈 체계 및 요구기능 설계에 관한 연구)

  • Jeong, Min-Woo;Lee, Hee-Seok;Shin, Dong-Bin
    • The Journal of the Korea Contents Association
    • /
    • v.21 no.7
    • /
    • pp.248-258
    • /
    • 2021
  • 24-hour monitoring is required to maintain the city's lifeline functions housed in underground facilities for public utilities, and technology is needed to compensate for the shortage of human resources. General facility-management methods cannot reflect the specific conditions of underground space management. This study proposes requirements for a digital twin system for underground facilities for public utilities. Space is divided into physical and virtual: the physical space defines the types and layout of the sensors underpinning the multimodal image sensor system, and the virtual space defines the system architecture. System functions are also suggested for each task. The digital twin is expected to be effective in preventing disasters and maintaining the lifeline functions of the city.

A Design of AI Cloud Platform for Safety Management on High-risk Environment (고위험 현장의 안전관리를 위한 AI 클라우드 플랫폼 설계)

  • Ki-Bong, Kim
    • Journal of Advanced Technology Convergence
    • /
    • v.1 no.2
    • /
    • pp.01-09
    • /
    • 2022
  • Safety issues in companies and public institutions can no longer be postponed: a major safety accident causes not only direct financial loss but also serious indirect loss of social trust in the organization, and the damage from a fatal accident is even greater. As companies and public institutions expand their investment in industrial safety education and prevention, systems are being developed for industrial sites with high-risk situations that combine open AI learning-model creation technology enabling safety-management services unaffected by user behavior, AI collaboration between edge terminals, cloud-edge terminal linkage, multimodal risk-situation determination, and AI model-learning support. In particular, with the development and spread of artificial intelligence, research applying it to safety problems is becoming active. This paper therefore presents the design of an open cloud platform that can support AI model learning for safety management at high-risk sites.
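The paper does not publish its fusion logic, but the "multi-modal risk situation determination" component it names could, under loose assumptions, be sketched as a weighted late fusion of per-modality risk scores produced by edge devices. All names, weights, and the threshold below are illustrative inventions, not from the paper.

```python
# Hypothetical late-fusion sketch for multimodal risk determination.
# Each edge device reports a risk score in [0, 1] for its modality;
# the cloud side fuses them with assumed per-modality weights.

def fuse_risk(scores: dict, weights: dict) -> float:
    """Weighted average of per-modality risk scores (all in [0, 1])."""
    total_w = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total_w

def classify(risk: float, threshold: float = 0.7) -> str:
    """Map a fused risk score to a site-level alarm state."""
    return "alert" if risk >= threshold else "normal"

# Invented example: video sees a worker near machinery, gas level high.
weights = {"video": 0.5, "audio": 0.2, "gas_sensor": 0.3}
scores = {"video": 0.9, "audio": 0.4, "gas_sensor": 0.8}
state = classify(fuse_risk(scores, weights))
```

A real system would learn the weights and thresholds rather than fix them; the sketch only shows where the modality scores meet.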

Korean Emotional Speech and Facial Expression Database for Emotional Audio-Visual Speech Generation (대화 영상 생성을 위한 한국어 감정음성 및 얼굴 표정 데이터베이스)

  • Baek, Ji-Young;Kim, Sera;Lee, Seok-Pil
    • Journal of Internet Computing and Services
    • /
    • v.23 no.2
    • /
    • pp.71-77
    • /
    • 2022
  • In this paper, a database is collected for extending a speech-synthesis model to one that synthesizes speech according to emotion and generates facial expressions. The database is divided into male and female data and consists of emotional speech and facial expressions. Two professional actors of different genders speak sentences in Korean, divided into four emotions: happiness, sadness, anger, and neutrality. Each actor performs about 3,300 sentences per emotion. The 26,468 sentences collected by filming do not overlap and contain expressions matching the corresponding emotion. Since building a high-quality database is important for the performance of future research, the database is assessed on emotional category, intensity, and genuineness. To measure accuracy by data modality, the database is divided into audio-video, audio-only, and video-only data.
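The per-modality accuracy comparison described above implies partitioning the clips by modality and tallying them by emotion. A minimal bookkeeping sketch, where the record fields are assumptions for illustration rather than the database's real schema:

```python
# Sketch: organizing collected clips by modality and emotion for the
# modality-wise accuracy study. Records are invented examples.
from collections import Counter

records = [
    {"id": 1, "emotion": "happiness",  "modality": "audio-video"},
    {"id": 2, "emotion": "sadness",    "modality": "audio"},
    {"id": 3, "emotion": "anger",      "modality": "video"},
    {"id": 4, "emotion": "neutrality", "modality": "audio-video"},
]

def split_by_modality(recs):
    """Group records into the three modality subsets used in the study."""
    out = {"audio-video": [], "audio": [], "video": []}
    for r in recs:
        out[r["modality"]].append(r)
    return out

def emotion_counts(recs):
    """Count records per emotion category."""
    return Counter(r["emotion"] for r in recs)
```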

Deep Learning-Based Companion Animal Abnormal Behavior Detection Service Using Image and Sensor Data

  • Lee, JI-Hoon;Shin, Min-Chan;Park, Jun-Hee;Moon, Nam-Mee
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.10
    • /
    • pp.1-9
    • /
    • 2022
  • In this paper, we propose a deep learning-based companion animal abnormal behavior detection service that uses video and sensor data. With the recent increase in households with companion animals, the pet-tech industry built on artificial intelligence is growing within the existing food- and medical-oriented companion animal market. In this study, companion animal behavior was classified and abnormal behavior detected with deep learning models using multiple data sources for AI-driven health management. Video and sensor data of companion animals are collected with CCTV and a custom pet wearable device and used as model inputs. For behavior classification, the video data are processed by combining a YOLO (You Only Look Once) model, which detects the companion animal, with DeepLabCut, which extracts joint coordinates. The sensor data are processed with a GAT (Graph Attention Network), which can identify the correlations and characteristics of each sensor.
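One plausible glue step between the YOLO and DeepLabCut stages described above is normalizing the extracted joint coordinates by the detected bounding box, so pose features are invariant to where and how large the animal appears in the frame. This is a hedged sketch of that preprocessing idea, not the paper's actual code; the coordinate conventions are assumptions.

```python
# Sketch: normalize DeepLabCut-style joint coordinates by the YOLO
# bounding box so downstream behavior features are scale-invariant.

def normalize_joints(joints, bbox):
    """joints: list of (x, y) pixel coords; bbox: (x_min, y_min, x_max, y_max).
    Returns coords relative to the box, each component in [0, 1] when
    the joint lies inside the box."""
    x0, y0, x1, y1 = bbox
    w, h = x1 - x0, y1 - y0
    return [((x - x0) / w, (y - y0) / h) for x, y in joints]

# Invented example: two joints inside a 100x100 detection box.
pose = normalize_joints([(50, 50), (25, 75)], (0, 0, 100, 100))
```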

Real-time Background Music System for Immersive Dialogue in Metaverse based on Dialogue Emotion (메타버스 대화의 몰입감 증진을 위한 대화 감정 기반 실시간 배경음악 시스템 구현)

  • Kirak Kim;Sangah Lee;Nahyeon Kim;Moonryul Jung
    • Journal of the Korea Computer Graphics Society
    • /
    • v.29 no.4
    • /
    • pp.1-6
    • /
    • 2023
  • Background music is often used to enhance immersion in metaverse environments, but it is mostly pre-matched and repeated, which can be distracting because it does not align with rapidly changing, user-interactive content. We therefore implemented a system that provides a more immersive metaverse conversation experience by 1) training a regression neural network that extracts emotion from an utterance using KEMDy20, a Korean multimodal emotion dataset, 2) selecting music matching the extracted emotion from the DEAM dataset, where music is tagged with arousal-valence levels, and 3) combining these with a virtual space where users converse with avatars in real time.
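Step 2 above amounts to a nearest-neighbor lookup in arousal-valence space: given the emotion the regression network predicts for an utterance, pick the track whose tags are closest. A minimal sketch under that assumption; the track list is a made-up stand-in for DEAM, and the distance metric is an assumption.

```python
# Sketch: pick the track whose (valence, arousal) tags are nearest
# (Euclidean) to the emotion predicted from an utterance.
import math

tracks = [  # invented stand-ins for DEAM entries
    {"title": "calm_piano", "valence": 0.2,  "arousal": -0.5},
    {"title": "upbeat_pop", "valence": 0.8,  "arousal": 0.7},
    {"title": "dark_drone", "valence": -0.7, "arousal": 0.1},
]

def select_track(valence, arousal, tracks=tracks):
    """Return the track minimizing distance in arousal-valence space."""
    return min(
        tracks,
        key=lambda t: math.dist((valence, arousal),
                                (t["valence"], t["arousal"])),
    )
```

In the real system the selection would run continuously as utterances arrive, with smoothing to avoid abrupt track changes.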

Text Mining Analysis of Customer Reviews on Public Service Robots: With a focus on the Guide Robot Cases (텍스트 마이닝을 활용한 공공기관 서비스 로봇에 대한 사용자 리뷰 분석 : 안내로봇 사례를 중심으로)

  • Hyorim Shin;Junho Choi;Changhoon Oh
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.1
    • /
    • pp.787-797
    • /
    • 2023
  • The use of service robots, particularly guide robots, is becoming increasingly prevalent in public institutions, yet there has been limited research into the interactions between users and guide robots. To explore the customer experience with guide robots, we selected 'QI', which has served customers the longest, and collected all reviews posted since the service was launched in public institutions. Using text mining techniques, we identified the main keywords and user experience factors and examined the factors that hinder user experience. The guide robot's functionality, appearance, interaction methods, and role as a cultural commentator and helper were the key factors influencing user experience. After identifying the hindrance factors, we suggested solutions such as improved interaction design, multimodal interface service design, and content development. This study contributes to the understanding of user experience with guide robots and provides practical suggestions for improvement.
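The first text-mining step named above, identifying main keywords from the reviews, can be sketched as stopword-filtered frequency counting. The reviews and stopword list below are invented English examples; the study worked on Korean text with its own tokenization.

```python
# Sketch: extract the most frequent content keywords from review text
# after removing stopwords. Data and stopwords are illustrative.
from collections import Counter
import re

STOPWORDS = {"the", "a", "is", "was", "it", "and", "to", "but", "very"}

def top_keywords(reviews, k=3):
    """Return the k most frequent non-stopword tokens across reviews."""
    words = []
    for review in reviews:
        words += [w for w in re.findall(r"[a-z']+", review.lower())
                  if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]

reviews = [
    "The robot voice was friendly and the screen was clear",
    "Friendly robot, but the screen froze",
]
```

Real pipelines would add morphological analysis (for Korean), TF-IDF weighting, and topic modeling on top of this counting core.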

Design of Big Semantic System for Factory Energy Management in IoE environments (IoE 환경에서 공장에너지 관리를 위한 빅시맨틱 시스템 설계)

  • Kwon, Soon-Hyun;Lee, Joa-Hyoung;Kim, Seon-Hyeog;Lee, Sang-Keum;Shin, Young-Mee;Doh, Yoon-Mee;Heo, Tae-Wook
    • Annual Conference of KIPS
    • /
    • 2022.05a
    • /
    • pp.37-39
    • /
    • 2022
  • In conventional IoE environments, collected data are linked to domain knowledge built for a specific service. However, in IoE environments where the collected data are heterogeneous and a static knowledge base must change dynamically with the situation, existing knowledge-base systems cannot provide adequate services. This paper therefore presents a big semantic system that semantically processes the large-scale, real-time data arising in IoE environments, links them to a common domain knowledge base, and organically augments knowledge by combining conventional knowledge-base reasoning with machine learning-based knowledge embedding. The proposed system collects multimodal (structured and unstructured) IoE data, semi-automatically converts them to semantic form, and stores them in the domain knowledge base; it augments the knowledge base through semantic reasoning and serves the full augmented knowledge base to users through structured and semi-structured queries. It also augments the existing knowledge base by learning and prediction with machine learning-based knowledge embedding. The system will be implemented as the underlying technology of a factory energy management system, an energy-saving system that collects in-factory energy information and performs real-time control based on process and facility status and operation information.
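The paper does not specify which knowledge-embedding model it uses, but the "learning and prediction" idea can be illustrated with a TransE-style score, where a fact (head, relation, tail) is plausible when head + relation lands near tail in embedding space. The tiny 2-D embeddings below are invented purely for illustration.

```python
# Hedged sketch of knowledge embedding for KB augmentation:
# TransE-style scoring, lower score = more plausible fact.
import math

emb = {  # invented 2-D embeddings for a factory-energy toy KB
    "sensor_3":   (0.1, 0.2),
    "line_A":     (0.4, 0.5),
    "boiler_1":   (0.9, 0.9),
    "located_in": (0.3, 0.3),
}

def transe_score(h, r, t):
    """||h + r - t||: small when the triple (h, r, t) is plausible."""
    hx, hy = emb[h]
    rx, ry = emb[r]
    return math.dist((hx + rx, hy + ry), emb[t])
```

Ranking candidate tails by this score is one way new triples could be proposed to augment the knowledge base before reasoning validates them.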

Deep Multimodal MRI Fusion Model for Brain Tumor Grading (뇌 종양 등급 분류를 위한 심층 멀티모달 MRI 통합 모델)

  • Na, In-ye;Park, Hyunjin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.416-418
    • /
    • 2022
  • Glioma is a type of brain tumor that occurs in glial cells and is classified into high-grade glioma, which has a poor prognosis, and low-grade glioma. Magnetic resonance imaging (MRI), a non-invasive method, is widely used in glioma diagnosis research, and studies combine multiple modalities to obtain complementary information and overcome the incomplete information of any single modality. In this study, we developed a 3D CNN-based model that applies input-level fusion to MRI of four modalities (T1, T1Gd, T2, T2-FLAIR). The trained model achieved 0.8926 accuracy, 0.9688 sensitivity, 0.6400 specificity, and 0.9467 AUC on the validation data, confirming that glioma grade is effectively classified by learning the relationships among the modalities.
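"Input-level fusion" as described above means the four modality volumes are stacked along a channel axis before the 3D CNN sees them, giving one input of shape (4, D, H, W). A dependency-free shape sketch of that stacking (the real pipeline would use tensors, registration, and intensity normalization):

```python
# Sketch of input-level fusion: stack the four MRI modalities into the
# channel dimension of a single 3D CNN input. Pure-list illustration.

MODALITIES = ["T1", "T1Gd", "T2", "T2-FLAIR"]

def fuse_input_level(volumes):
    """volumes: dict modality -> 3-D nested list (D x H x W).
    Returns a channel-stacked input of shape (4, D, H, W)."""
    return [volumes[m] for m in MODALITIES]

def shape(x):
    """Infer the nested-list shape, assuming rectangular nesting."""
    s = []
    while isinstance(x, list):
        s.append(len(x))
        x = x[0]
    return tuple(s)

# Invented toy volumes of shape (2, 2, 2) per modality.
vol = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
fused = fuse_input_level({m: vol for m in MODALITIES})
```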


Game Platform and System that Synchronize Actual Humanoid Robot with Virtual 3D Character Robot (가상의 3D와 실제 로봇이 동기화하는 시스템 및 플랫폼)

  • Park, Chang-Hyun;Lee, Chang-Jo
    • Journal of Korea Entertainment Industry Association
    • /
    • v.8 no.2
    • /
    • pp.283-297
    • /
    • 2014
  • The future of human life is expected to change innovatively across social, economic, political, and personal areas through multidisciplinary technologies. In robotics, and especially in next-generation games with robots, multidisciplinary contribution and interaction are expected to further accelerate convergence between technologies. The purpose of this study is to move beyond the technical limits, and the time and space constraints, of existing human-robot interface technology toward a more reliable and easy-to-use interface built on a fusion of modalities that existing human-robot interfaces cannot offer. We develop a robot game system built around a real-time synchronization engine that links a biped humanoid robot with the behavior and position values of 3D content (virtual robots) on a mobile device screen, a wireless protocol for exchanging their mutual information, and a "Direct Teaching & Play" teaching program informed by a study of effective teaching.

Multicontents Integrated Image Animation within Synthesis for High Quality Multimodal Video (고화질 멀티 모달 영상 합성을 통한 다중 콘텐츠 통합 애니메이션 방법)

  • Jae Seung Roh;Jinbeom Kang
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.4
    • /
    • pp.257-269
    • /
    • 2023
  • There is currently a burgeoning demand for image synthesis from photos and videos using deep learning models. Existing video synthesis models extract only motion information from the provided video to generate animation effects on photos, and they struggle to achieve accurate lip synchronization with the audio and to maintain the image quality of the synthesized output. To tackle these issues, this paper introduces a novel framework based on an image animation approach. Given a photo, a video, and audio, the framework produces an output that retains the unique characteristics of the individuals in the photo while synchronizing their movements with the provided video and their lips with the audio. A super-resolution model is then employed to enhance the quality and resolution of the synthesized output.
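The framework above is a three-stage pipeline: motion transfer from the driving video, lip synchronization from the audio, then super-resolution. The stage functions below are stubs standing in for the actual models, purely to show the composition; names and the data shapes are assumptions.

```python
# Hedged pipeline sketch: stubs mark where the real models would run.

def motion_transfer(photo, video):
    """Stub for the image-animation model: one output frame per
    driving-video frame, carrying the photo identity."""
    return {"frames": [f"{photo}+motion{i}" for i in range(video["n_frames"])]}

def lip_sync(animation, audio):
    """Stub for the audio-driven lip-sync model."""
    animation["lip_synced"] = audio is not None
    return animation

def super_resolve(animation, scale=4):
    """Stub for the super-resolution model; scale is an assumption."""
    animation["scale"] = scale
    return animation

def synthesize(photo, video, audio):
    """Compose the three stages into the full framework."""
    return super_resolve(lip_sync(motion_transfer(photo, video), audio))
```

The composition order matters in this sketch: super-resolution runs last so it enhances the already lip-synced frames rather than being undone by later edits.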