• Title/Summary/Keyword: multimodal artificial intelligence


A Study on User Experience Factors of Display-Type Artificial Intelligence Speakers through Semantic Network Analysis : Focusing on Online Review Analysis of the Amazon Echo (의미연결망 분석을 통한 디스플레이형 인공지능 스피커의 사용자 경험 요인 연구 : 아마존 에코의 온라인 리뷰 분석을 중심으로)

  • Lee, Jeongmyeong;Kim, Hyesun;Choi, Junho
    • The Journal of the Convergence on Culture Technology / v.5 no.3 / pp.9-23 / 2019
  • The artificial intelligence speaker market is entering a new phase in which devices are equipped with displays. This study analyzed how the user experience of artificial intelligence speakers differs by usage context depending on the presence or absence of a display. Semantic network analysis was applied to online review texts of the Amazon Echo Show and Echo Plus to examine which UX issues each device's reviews raised and how their structures differed. Ego networks were constructed around the physical and social contexts of the user experience to draw out the major issues. The analysis shows that an expectation gap arises depending on whether a display is present, which can lead to negative experiences. It also confirms that the multimodal interface is used more in the kitchen than in the bedroom and can help activate communication among family members. Based on these findings, we propose a user experience strategy to consider for display-type speakers to be launched in Korea in the future.
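
A minimal sketch of the kind of co-occurrence-based semantic network and ego network analysis this abstract describes, using networkx; the sample reviews and the "kitchen" seed term are illustrative placeholders, not the paper's data or parameters.

```python
# Minimal sketch: co-occurrence semantic network + ego network from review text.
# The reviews and the "kitchen" seed term are illustrative placeholders.
from itertools import combinations
import networkx as nx

reviews = [
    "the echo show screen helps in the kitchen with recipes",
    "echo plus sound is great in the bedroom",
    "family members talk to the echo show in the kitchen",
]

G = nx.Graph()
for sentence in reviews:
    words = sentence.split()
    for w1, w2 in combinations(set(words), 2):
        if G.has_edge(w1, w2):
            G[w1][w2]["weight"] += 1
        else:
            G.add_edge(w1, w2, weight=1)

# Rank terms by degree centrality to surface major UX issues.
central = sorted(nx.degree_centrality(G).items(), key=lambda x: -x[1])[:5]
print("central terms:", central)

# Ego network around a usage-context term (placeholder seed).
ego = nx.ego_graph(G, "kitchen")
print("ego network around 'kitchen':", list(ego.nodes))
```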

Literature Review of AI Hallucination Research Since the Advent of ChatGPT: Focusing on Papers from arXiv (챗GPT 등장 이후 인공지능 환각 연구의 문헌 검토: 아카이브(arXiv)의 논문을 중심으로)

  • Park, Dae-Min;Lee, Han-Jong
    • Informatization Policy / v.31 no.2 / pp.3-38 / 2024
  • Hallucination is a significant barrier to the practical use of large language models and multimodal models. In this study, we collected 654 computer science papers containing "hallucination" in the abstract, posted on arXiv between December 2022 and January 2024 following the advent of ChatGPT, and conducted frequency analysis, knowledge network analysis, and a literature review to explore the latest trends in hallucination research. The results showed that research was most active in the fields of "Computation and Language," "Artificial Intelligence," "Computer Vision and Pattern Recognition," and "Machine Learning." We then analyzed research trends in these four fields, focusing on the main authors and dividing the work into data, hallucination detection, and hallucination mitigation. The main trends included hallucination mitigation through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), inference enhancement via chain of thought (CoT), and growing interest in hallucination mitigation within multimodal AI. This study provides insights into the latest developments in hallucination research through a technology-oriented literature review and is expected to help subsequent research in both engineering and the humanities and social sciences by clarifying current trends.
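
A minimal sketch of how arXiv abstracts mentioning hallucination could be collected and frequency-analyzed, assuming the public arXiv Atom API; the query string, category filter, result limit, and stopword list are illustrative assumptions, not the authors' actual collection pipeline.

```python
# Minimal sketch: fetch arXiv abstracts mentioning "hallucination" and count terms.
# Uses the public arXiv Atom API; query and limits are illustrative assumptions.
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter

url = ("http://export.arxiv.org/api/query?"
       "search_query=abs:hallucination+AND+cat:cs.CL&start=0&max_results=50")
with urllib.request.urlopen(url) as resp:
    feed = ET.fromstring(resp.read())

ns = {"atom": "http://www.w3.org/2005/Atom"}
abstracts = [e.findtext("atom:summary", "", ns)
             for e in feed.findall("atom:entry", ns)]

# Crude frequency analysis over abstract tokens (no lemmatization).
stopwords = {"the", "a", "of", "and", "to", "in", "we", "is", "for", "that"}
tokens = [w.lower().strip(".,") for a in abstracts for w in a.split()]
freq = Counter(w for w in tokens if w not in stopwords and len(w) > 3)
print(freq.most_common(15))
```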

Trends in Disaster Environment Multimodal Sensing Platforms (재난환경 멀티모달 센싱 플랫폼 기술 동향)

  • S.M. Park;P.J. Park;K.H. Park;B.T. Koo
    • Electronics and Telecommunications Trends / v.39 no.5 / pp.31-39 / 2024
  • Responding quickly and accurately at a disaster site requires technological solutions that overcome limited visibility, secure environmental information, and locate victims. Research on artificial-intelligence-based semiconductors is being actively conducted to address these challenges, and new technologies that combine various sensor signals are needed to deliver accurate and timely information at disaster sites. We examine existing multimodal sensing technologies for disaster environments, review the status of disaster risk detection and monitoring technologies, and present current problems and future directions for development.

Efficient Emotion Classification Method Based on Multimodal Approach Using Limited Speech and Text Data (적은 양의 음성 및 텍스트 데이터를 활용한 멀티 모달 기반의 효율적인 감정 분류 기법)

  • Mirr Shin;Youhyun Shin
    • The Transactions of the Korea Information Processing Society / v.13 no.4 / pp.174-180 / 2024
  • In this paper, we explore an emotion classification method based on multimodal learning that uses the wav2vec 2.0 and KcELECTRA models. Multimodal learning that leverages both speech and text data is known to improve emotion classification performance significantly over methods that rely on speech data alone. To select the text-processing model, we conducted a comparative analysis of BERT and its derivative models, which are known for their strong performance in natural language processing, and identified the one that extracts features from text most effectively. The results confirm that KcELECTRA performs best on the emotion classification task. Furthermore, experiments on datasets made available by AI-Hub show that including text data achieves superior performance with less data than using speech data alone, with the KcELECTRA-based model reaching the highest accuracy of 96.57%. This indicates that multimodal learning can offer meaningful performance improvements on complex natural language processing tasks such as emotion classification.
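
A minimal sketch of late fusion of wav2vec 2.0 speech features with KcELECTRA text features for emotion classification, using Hugging Face transformers; the model IDs, mean/CLS pooling, simple concatenation, and the seven-class head are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: late fusion of wav2vec 2.0 speech features and KcELECTRA text
# features for emotion classification. Model IDs, pooling, and the classifier
# head are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
from transformers import (AutoModel, AutoTokenizer,
                          Wav2Vec2FeatureExtractor, Wav2Vec2Model)

speech_encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
speech_fe = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
text_encoder = AutoModel.from_pretrained("beomi/KcELECTRA-base")
tokenizer = AutoTokenizer.from_pretrained("beomi/KcELECTRA-base")

classifier = nn.Linear(speech_encoder.config.hidden_size
                       + text_encoder.config.hidden_size, 7)  # e.g., 7 emotions

waveform = torch.randn(16000)            # 1 s of dummy 16 kHz audio
text = "오늘 정말 기분이 좋아요"          # dummy utterance transcript

with torch.no_grad():
    audio_in = speech_fe(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
    speech_vec = speech_encoder(**audio_in).last_hidden_state.mean(dim=1)
    text_in = tokenizer(text, return_tensors="pt")
    text_vec = text_encoder(**text_in).last_hidden_state[:, 0]  # CLS-style token

logits = classifier(torch.cat([speech_vec, text_vec], dim=-1))
print(logits.shape)  # (1, 7) emotion scores
```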

Review of Lung Cancer Survival Analysis with Multimodal Data (다중 모드 데이터를 사용한 폐암 생존분석 검토)

  • Choi, Chul-woong;Kim, Hyeon-Ji;Shim, Eun-Seok;Im, A-yeon;Lee, Yun-Jun;Jeong, Seon-Ju;Kim, Kyung-baek
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.784-787 / 2020
  • When predicting the survival rate of lung cancer patients, the final stage diagnosed according to the TNM staging system of the American Joint Committee on Cancer (AJCC) is widely used. The final stage is one item of a patient's clinical data that characterizes the state of the cancer by considering tumor location, size, and degree of metastasis. While the final stage effectively summarizes a patient's overall condition, a more detailed survival analysis requires analyzing imaging data such as PET/CT together with the clinical data. This paper reviews, from a data science perspective, survival analysis techniques that jointly use several kinds of data for lung cancer patients, including clinical data, CT images, and PET images. The experiments confirm that survival analysis with multimodal data requires the development of nonlinear models and more advanced feature embedding techniques.
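
A minimal sketch of the kind of survival analysis the review discusses, assuming clinical variables and image-derived embedding features have already been flattened into one table; the column names, toy values, and the lifelines Cox model are illustrative choices, not the specific methods compared in the paper.

```python
# Minimal sketch: Cox proportional hazards survival model on a table that mixes
# clinical variables with image-derived embedding features. Column names and
# values are illustrative placeholders, not the paper's dataset.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "survival_months": [12, 34, 7, 25, 40, 5],
    "event_observed":  [1, 0, 1, 1, 0, 1],                  # 1 = death, 0 = censored
    "ajcc_stage":      [3, 1, 4, 2, 1, 4],                  # clinical data
    "ct_embed_0":      [0.4, -0.1, 0.9, 0.2, -0.3, 1.1],    # CT image feature
    "pet_embed_0":     [1.2, 0.3, 1.8, 0.7, 0.1, 2.0],      # PET image feature
})

cph = CoxPHFitter()
cph.fit(df, duration_col="survival_months", event_col="event_observed")
cph.print_summary()  # hazard ratios for clinical and image-derived features
```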

Artificial Intelligence Plant Doctor: Plant Disease Diagnosis Using GPT4-vision

  • Yoeguang Hue;Jea Hyeoung Kim;Gang Lee;Byungheon Choi;Hyun Sim;Jongbum Jeon;Mun-Il Ahn;Yong Kyu Han;Ki-Tae Kim
    • Research in Plant Disease / v.30 no.1 / pp.99-102 / 2024
  • Integrated pest management is essential for controlling plant diseases that reduce crop yields. In the event of an outbreak, rapid diagnosis is crucial for identifying the cause and minimizing damage. Diagnosis methods range from indirect visual observation, which can be subjective and inaccurate, to machine learning and deep learning predictions that may suffer from biased data; direct molecular methods, while accurate, are complex and time-consuming. The development of large multimodal models such as GPT-4, which combine image recognition with natural language processing, enables more accurate diagnostic information. This study introduces a GPT-4-based system for diagnosing plant diseases that draws on a detailed knowledge base of 1,420 host plants, 2,462 pathogens, and 37,467 pesticide instances from the official plant disease and pesticide registries of Korea. The AI plant doctor offers interactive advice on diagnosis, control methods, and pesticide use for diseases in Korea and is accessible at https://pdoc.scnu.ac.kr/.
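
A minimal sketch of querying a GPT-4-class vision model with a leaf photo plus knowledge-base context, using the OpenAI Python SDK; the model name, prompt, file path, and knowledge snippet are illustrative assumptions and this is not the paper's actual pdoc.scnu.ac.kr backend.

```python
# Minimal sketch: asking a GPT-4-class vision model to diagnose a plant disease
# from a leaf photo plus knowledge-base context. The model name, prompt, and
# knowledge snippet are illustrative assumptions, not the paper's actual system.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("leaf_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

kb_context = "Host: rice. Registered pathogens: Magnaporthe oryzae (rice blast), ..."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Diagnose the disease in this leaf photo and suggest "
                     "registered control options. Context: " + kb_context},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```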

Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems

  • Sanghun Jeon;Jieun Lee;Dohyeon Yeo;Yong-Ju Lee;SeungJun Kim
    • ETRI Journal / v.46 no.1 / pp.22-34 / 2024
  • Exposure to varied noisy environments impairs the recognition performance of artificial-intelligence-based speech recognition technologies. Services whose performance degrades under noise can only be deployed as limited systems that guarantee good performance in certain environments, which lowers the overall quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model that is robust to various noise settings and mimics the elements humans use to recognize dialogue. The model converts word embeddings and log-Mel spectrograms into feature vectors for audio recognition, and a dense spatial-temporal convolutional neural network extracts features from the transformed log-Mel spectrograms for visual-based recognition, improving both aural and visual recognition capabilities. We evaluate performance across signal-to-noise ratios in nine synthesized noise environments, where the proposed model exhibits lower average error rates: the AVSR model with the three-feature multi-fusion method achieves an error rate of 1.711%, compared with 3.939% for the general model. The model is therefore applicable in noise-affected environments owing to its enhanced stability and recognition rate.
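
A minimal sketch of fusing three feature streams (word embedding, log-Mel audio, visual) for audiovisual speech recognition; the encoder choices, tensor sizes, and simple concatenation fusion are illustrative assumptions, not the paper's multi-fusion architecture.

```python
# Minimal sketch: fusing three feature streams (word embedding, log-Mel audio,
# visual) for audiovisual speech recognition. The encoders and the simple
# concatenation fusion are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class ThreeFeatureFusionAVSR(nn.Module):
    def __init__(self, vocab_size=1000, n_mels=80, d_model=256, n_classes=500):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, d_model)          # text stream
        self.audio_enc = nn.GRU(n_mels, d_model, batch_first=True)   # log-Mel stream
        self.visual_enc = nn.Sequential(                             # lip-ROI stream
            nn.Conv3d(1, 32, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, d_model))
        self.classifier = nn.Linear(3 * d_model, n_classes)

    def forward(self, tokens, log_mel, lip_frames):
        text_vec = self.word_embed(tokens).mean(dim=1)
        audio_vec = self.audio_enc(log_mel)[0].mean(dim=1)
        visual_vec = self.visual_enc(lip_frames)
        fused = torch.cat([text_vec, audio_vec, visual_vec], dim=-1)
        return self.classifier(fused)

model = ThreeFeatureFusionAVSR()
logits = model(torch.randint(0, 1000, (2, 6)),     # token ids
               torch.randn(2, 120, 80),            # log-Mel frames
               torch.randn(2, 1, 30, 64, 64))      # grayscale lip video
print(logits.shape)  # (2, 500)
```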

Analysis of AI Model Hub

  • Yo-Seob Lee
    • International Journal of Advanced Culture Technology / v.11 no.4 / pp.442-448 / 2023
  • Artificial intelligence (AI) technology has grown explosively in recent years and is used in a wide variety of application fields, so the number of AI models is increasing rapidly. AI models are adapted and developed for diverse data types, tasks, and environments, and their variety and volume continue to grow, which makes sharing models and collaborating within the AI community increasingly important. Collaboration is essential if AI models are to be shared, improved publicly, and used in a variety of applications. With the advancement of AI, model hubs have therefore become more important, improving the sharing, reuse, and collaborative development of AI models and increasing the utilization of AI technology. In this paper, we collect data on model hubs and analyze the characteristics of the hubs and the AI models they provide. The results of this research can help in developing various multimodal AI models, applying AI models in diverse fields, and building services that fuse multiple AI models.
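
A minimal sketch of programmatically surveying one model hub (here the Hugging Face Hub) and summarizing what kinds of models it hosts; the search term and the fields inspected are illustrative assumptions, not the paper's actual analysis pipeline.

```python
# Minimal sketch: querying one model hub (here the Hugging Face Hub) and
# summarizing what kinds of models it hosts. The search term and fields used
# are illustrative assumptions, not the paper's actual analysis pipeline.
from collections import Counter
from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(search="multimodal", sort="downloads",
                         direction=-1, limit=50)

# Count task types (pipeline tags) among the most-downloaded matches.
tags = Counter(m.pipeline_tag for m in models if m.pipeline_tag)
for tag, count in tags.most_common(10):
    print(f"{tag}: {count}")
```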

Deep Learning-Based Companion Animal Abnormal Behavior Detection Service Using Image and Sensor Data

  • Lee, JI-Hoon;Shin, Min-Chan;Park, Jun-Hee;Moon, Nam-Mee
    • Journal of the Korea Society of Computer and Information / v.27 no.10 / pp.1-9 / 2022
  • In this paper, we propose a deep-learning-based companion animal abnormal behavior detection service that uses video and sensor data. With the recent increase in households with companion animals, an AI-driven pet tech industry is growing within the existing food- and medical-oriented companion animal market. In this study, companion animal behavior was classified and abnormal behavior was detected with deep learning models that use several kinds of data for AI-based health management. Video and sensor data were collected with CCTV and a custom-built pet wearable device and used as model inputs. For the video data, the YOLO (You Only Look Once) model was combined with DeepLabCut to detect the companion animal and extract joint coordinates for behavior classification. For the sensor data, a GAT (Graph Attention Network), which can capture the correlations and characteristics of the individual sensors, was used.
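
A minimal sketch of a GAT over a graph of wearable sensor channels, using PyTorch Geometric; the fully connected graph construction, window length, channel count, and behavior classes are illustrative assumptions, not the paper's model.

```python
# Minimal sketch: a GAT over a fully connected sensor graph, where each node is
# one wearable sensor channel and its features are a window of readings. Graph
# construction and sizes are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv

NUM_SENSORS, WINDOW, NUM_BEHAVIORS = 6, 32, 5    # e.g., accel/gyro channels

class SensorGAT(nn.Module):
    def __init__(self):
        super().__init__()
        self.gat1 = GATConv(WINDOW, 64, heads=4, concat=True)
        self.gat2 = GATConv(64 * 4, 64, heads=1)
        self.head = nn.Linear(64, NUM_BEHAVIORS)

    def forward(self, x, edge_index):
        h = torch.relu(self.gat1(x, edge_index))
        h = torch.relu(self.gat2(h, edge_index))
        return self.head(h.mean(dim=0))           # pool sensor nodes -> behavior logits

# Fully connected graph over the sensor channels (no self-loops).
src, dst = zip(*[(i, j) for i in range(NUM_SENSORS)
                 for j in range(NUM_SENSORS) if i != j])
edge_index = torch.tensor([src, dst], dtype=torch.long)

x = torch.randn(NUM_SENSORS, WINDOW)              # one window of sensor readings
print(SensorGAT()(x, edge_index).shape)           # (NUM_BEHAVIORS,)
```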