• Title/Summary/Keyword: AI image recognition (AI 영상인식)


Large-scale Language-image Model-based Bag-of-Objects Extraction for Visual Place Recognition (영상 기반 위치 인식을 위한 대규모 언어-이미지 모델 기반의 Bag-of-Objects 표현)

  • Seung Won Jung;Byungjae Park
    • Journal of Sensor Science and Technology
    • /
    • v.33 no.2
    • /
    • pp.78-85
    • /
    • 2024
  • We propose a method for visual place recognition that represents images using objects as visual words, where the visual words correspond to the various objects found in urban environments. To detect these objects, we implemented a zero-shot detector based on a large-scale language-image model, which can detect diverse urban objects without additional training. When building histograms, frequency-based weighting is applied to reflect the importance of each object. Experiments on open datasets demonstrate the potential of the proposed method in comparison with an existing method, even under environmental and viewpoint changes.
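
A minimal sketch of the histogram-building and matching stage described above, with the zero-shot detector abstracted behind a plain list of labels; the vocabulary, the log weighting, and the cosine matching are illustrative assumptions, not the paper's exact design:

```python
import numpy as np
from collections import Counter

# Hypothetical vocabulary of urban object classes serving as "visual words".
VOCAB = ["car", "tree", "traffic light", "bench", "streetlamp", "sign", "building"]

def bag_of_objects(labels):
    """Frequency-weighted histogram over the object vocabulary.

    `labels` is the list of class names a zero-shot detector returned for
    one image; the log damping below is one plausible reading of the
    paper's frequency-based weighting.
    """
    counts = Counter(labels)
    hist = np.log1p([counts.get(w, 0) for w in VOCAB])
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def place_similarity(query_labels, ref_labels):
    """Cosine similarity between two Bag-of-Objects histograms."""
    return float(np.dot(bag_of_objects(query_labels), bag_of_objects(ref_labels)))

# Toy check: overlapping object sets score higher than disjoint ones.
print(place_similarity(["car", "tree", "sign"], ["car", "sign", "bench"]))
print(place_similarity(["car", "tree"], ["bench", "building"]))
```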

Ship Detection Using Background Estimation of Video and AIS Information (영상의 배경추정기법과 AIS정보를 이용한 선박검출)

  • Kim, Hyun-Tae;Park, Jang-Sik;Yu, Yun-Sik
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.12
    • /
    • pp.2636-2641
    • /
    • 2010
  • To support collision avoidance between ships and sea search-and-rescue operations, the automatic identification system (AIS), which exchanges messages between ships and vessel traffic service (VTS) control centers, has been widely adopted, and port control systems manage vessel traffic in cooperation with AIS. For more efficient vessel traffic services, a ship recognition and display system that works alongside AIS is required. In this paper, we propose a ship detection system that cooperates with AIS, applying background estimation based on image processing to sea and harbor images captured by cameras. We evaluate the system on real-time camera input of sea and harbor scenes. Computer simulations and real-world tests show that the proposed system is effective for ship monitoring.
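
The background-estimation step can be illustrated with OpenCV's MOG2 background subtractor; this is a generic stand-in for the paper's estimator, the input file name is hypothetical, and the AIS fusion step (matching detections to reported vessel positions) is omitted:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("harbor.mp4")  # hypothetical harbor clip
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
kernel = np.ones((3, 3), np.uint8)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    # Drop MOG2 shadow pixels (value 127) and clean up the foreground mask.
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:              # ignore waves and noise blobs
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("ship detection", frame)
    if cv2.waitKey(30) & 0xFF == 27:              # Esc quits
        break

cap.release()
cv2.destroyAllWindows()
```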

Analysis Method for the Influence of Input Images on Machine Learning Image Recognition Results (기계학습의 영상인식결과에 대한 입력영상의 영향도 분석 기법)

  • Kim, Do-Wan;Kim, Woo-seong;Lee, Eun-hun;Kim, Hyeoncheol
    • Proceedings of The KACE
    • /
    • 2017.08a
    • /
    • pp.209-211
    • /
    • 2017
  • Machine learning is a branch of artificial intelligence (AI). Unlike other AI algorithms that solve a given task based on fixed rules, machine learning learns an optimal solution from collected data and uses it to predict or interpret future values. Moreover, because it builds on big data, made possible by expanding Internet connectivity and advances in computing power, it shows far better performance than earlier AI algorithms. However, when a machine learning algorithm learns from data, the learned result is too complex for humans to interpret, making it practically impossible to understand the model's internal structure; as a result, the weaknesses and limitations of a trained model remain unknown. To better understand these black-box characteristics, this study introduces a method for identifying which inputs strongly influence, and which inputs weakly influence, a machine learning algorithm's prediction for a given input, and proposes improvements over the shortcomings of existing work.

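One common way to probe input influence on a black-box image classifier is occlusion sensitivity: mask a region, re-run the model, and record how the prediction changes. The sketch below assumes a generic `predict` interface and is an illustration of that family of methods, not the authors' implementation:

```python
import numpy as np

def occlusion_sensitivity(image, predict, target_class, patch=8, stride=8):
    """Score how strongly each region of `image` influences a classifier.

    `predict(batch)` is any black-box model returning class probabilities
    with shape [N, num_classes]; this interface is an assumption for the
    sketch, not the authors' code.
    """
    h, w = image.shape[:2]
    base = predict(image[None])[0][target_class]       # unoccluded confidence
    ys = range(0, h - patch + 1, stride)
    xs = range(0, w - patch + 1, stride)
    heat = np.zeros((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = 0.5   # neutral gray patch
            # Confidence drop = influence of the masked region.
            heat[i, j] = base - predict(occluded[None])[0][target_class]
    return heat
```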

Implementation of Facility Movement Recognition Accuracy Analysis and Utilization Service using Drone Image (드론 영상 활용 시설물 이동 인식 정확도 분석 및 활용 서비스 구현)

  • Kim, Gwang-Seok;Oh, Ah-Ra;Choi, Yun-Soo
    • Journal of the Korean Institute of Gas
    • /
    • v.25 no.5
    • /
    • pp.88-96
    • /
    • 2021
  • Advanced Internet of Things (IoT) technology is being used in various ways for safety in the energy industry. At the center of these safety measures, drones take on roles on behalf of humans, reaching places that are hard for inspectors to access because of large-scale facilities and space restrictions. In this study, the recognition of movement of hazardous facilities was tested using drone images: movement recognition accuracy was 100%, average data analysis accuracy was 95.8699%, and average completeness was 100%. Based on these results, a future-oriented facility risk analysis system combining ICT technology was implemented and presented. Further experiments under more diverse conditions and a full ICT convergence analysis system remain as future work.
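
The abstract does not publish the movement-recognition algorithm; as one plausible building block, displacement between two drone images of the same facility could be estimated with keypoint matching, as in this illustrative sketch:

```python
import cv2
import numpy as np

def facility_displacement(img_before, img_after):
    """Median pixel displacement between two drone images of one facility.

    A plausible building block only: this sketch uses ORB keypoint
    matching with a RANSAC homography to measure apparent movement
    between inspection flights, not the paper's method.
    """
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(img_before, None)
    kp2, des2 = orb.detectAndCompute(img_after, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    matches = sorted(matches, key=lambda m: m.distance)[:100]
    src = np.float32([kp1[m.queryIdx].pt for m in matches])
    dst = np.float32([kp2[m.trainIdx].pt for m in matches])
    _, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    keep = inlier_mask.ravel() == 1
    shifts = np.linalg.norm(dst[keep] - src[keep], axis=1)
    return float(np.median(shifts))  # pixels; scale by ground-sample distance
```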

Video Compression Standard Prediction using Attention-based Bidirectional LSTM (어텐션 알고리듬 기반 양방향성 LSTM을 이용한 동영상의 압축 표준 예측)

  • Kim, Sangmin;Park, Bumjun;Jeong, Jechang
    • Journal of Broadcast Engineering
    • /
    • v.24 no.5
    • /
    • pp.870-878
    • /
    • 2019
  • In this paper, we propose an attention-based BLSTM for predicting the compression standard of a video. In NLP, RNN-based models have been widely studied for predicting the next word in a sentence and for classifying and translating sentences by their semantics, and they have been commercialized in chatbots, AI speakers, and translation applications. The LSTM was designed to solve the vanishing gradient problem of RNNs and is widely used in NLP. The proposed algorithm makes compression standard prediction possible by applying a BLSTM together with an attention mechanism, which focuses on the most important word in a sentence, to a video bitstream rather than to a natural language sentence.
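
A compact PyTorch sketch of the model family the abstract describes: a byte-level embedding over the bitstream, a bidirectional LSTM, and a learned attention pooling before the codec classifier. Layer sizes, class count, and the byte tokenization are assumptions, not the paper's exact setup:

```python
import torch
import torch.nn as nn

class AttnBLSTM(nn.Module):
    """Attention-pooled bidirectional LSTM over a byte sequence."""

    def __init__(self, num_codecs=4, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(256, emb)           # one token per byte value
        self.blstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)          # scalar score per timestep
        self.head = nn.Linear(2 * hidden, num_codecs)

    def forward(self, byte_seq):                      # byte_seq: [B, T] int64
        h, _ = self.blstm(self.embed(byte_seq))       # [B, T, 2*hidden]
        w = torch.softmax(self.attn(h).squeeze(-1), dim=1)  # attention weights
        ctx = (w.unsqueeze(-1) * h).sum(dim=1)        # weighted context vector
        return self.head(ctx)                         # codec logits

logits = AttnBLSTM()(torch.randint(0, 256, (2, 512)))
print(logits.shape)  # torch.Size([2, 4])
```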

Generating Extreme Close-up Shot Dataset Based On ROI Detection For Classifying Shots Using Artificial Neural Network (인공신경망을 이용한 샷 사이즈 분류를 위한 ROI 탐지 기반의 익스트림 클로즈업 샷 데이터 셋 생성)

  • Kang, Dongwann;Lim, Yang-mi
    • Journal of Broadcast Engineering
    • /
    • v.24 no.6
    • /
    • pp.983-991
    • /
    • 2019
  • This study aims to analyze how movies tell their stories according to shot size. To achieve this, a dataset must be classified by shot size: extreme close-up, close-up, medium, full, and long shots. However, because typical video storytelling consists mainly of close-up, medium, full, and long shots, constructing an adequate dataset of extreme close-up shots is not easy. To solve this, we propose an image cropping method based on region of interest (ROI) detection, using face detection and saliency detection to estimate the ROI. By cropping the ROI of close-up images, we generate extreme close-up images. The dataset enriched by the proposed method is used to build a model that classifies shots by size. This can help analyze the emotional changes of characters in video stories and predict how the composition of a story changes over time. If AI is used more actively in entertainment in the future, it is expected to influence the automatic adjustment and creation of characters, dialogue, and image editing.
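
The face-detection branch of the ROI-based cropping can be sketched with OpenCV's Haar cascade; the `zoom` factor and fallback behavior below are illustrative, and the saliency branch is omitted:

```python
import cv2

def extreme_closeup(image, zoom=0.5):
    """Crop an extreme close-up from a close-up frame via face detection.

    Only the face branch is sketched; the full method also falls back to
    saliency detection, and the `zoom` factor here is illustrative.
    """
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                       # would use the saliency ROI instead
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # largest face
    cx, cy = x + w // 2, y + h // 2
    nw, nh = int(w * zoom), int(h * zoom)
    # Tighten the crop around the face center to exaggerate the close-up.
    return image[max(cy - nh, 0):cy + nh, max(cx - nw, 0):cx + nw]
```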

Digital Library Interface Research Based on EEG, Eye-Tracking, and Artificial Intelligence Technologies: Focusing on the Utilization of Implicit Relevance Feedback (뇌파, 시선추적 및 인공지능 기술에 기반한 디지털 도서관 인터페이스 연구: 암묵적 적합성 피드백 활용을 중심으로)

  • Hyun-Hee Kim;Yong-Ho Kim
    • Journal of the Korean Society for Information Management
    • /
    • v.41 no.1
    • /
    • pp.261-282
    • /
    • 2024
  • This study proposed and evaluated electroencephalography (EEG)-based and eye-tracking-based methods for determining relevance from users' implicit relevance feedback while they navigate content in a digital library. EEG/eye-tracking experiments were conducted on 32 participants using video, image, and text data. To assess the usefulness of the proposed methods, deep-learning-based artificial intelligence (AI) techniques were used as a competitive benchmark. The evaluation showed that EEG component-based methods (av_P600 and f_P3b components) achieved high classification accuracy in selecting relevant videos and images (faces/emotions), whereas AI-based methods, specifically object recognition and natural language processing, achieved high classification accuracy in selecting images (objects) and texts (newspaper articles). Finally, we propose guidelines for implementing a digital library interface based on EEG, eye-tracking, and AI technologies, present a system model based on implicit relevance feedback, and suggest methods suited to each media type, spanning EEG-based, eye-tracking-based, and AI-based approaches, to enhance classification accuracy.
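
As a rough illustration of the EEG-component idea, a relevance classifier could use the mean amplitude in a P3b-like latency window as its feature; the window, sampling rate, and toy data below are assumptions, not the study's pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def p3b_feature(epoch, fs=250, window=(0.3, 0.6)):
    """Mean amplitude per channel in a P3b-like latency window.

    `epoch` is [channels, samples] time-locked to stimulus onset; this is
    a simplified stand-in for the study's av_P600/f_P3b component features.
    """
    a, b = int(window[0] * fs), int(window[1] * fs)
    return epoch[:, a:b].mean(axis=1)

# Toy usage with random data standing in for recorded epochs.
rng = np.random.default_rng(0)
X = np.stack([p3b_feature(rng.normal(size=(8, 250))) for _ in range(40)])
y = rng.integers(0, 2, size=40)          # implicit relevance labels
clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))
```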

An Effectiveness Verification for Evaluating the Amount of WTCI Tongue Coating Using Deep Learning (딥러닝을 이용한 WTCI 설태량 평가를 위한 유효성 검증)

  • Lee, Woo-Beom
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.20 no.4
    • /
    • pp.226-231
    • /
    • 2019
  • The WTCI is an important criterion for evaluating the amount of a patient's tongue coating in tongue diagnosis. However, previous WTCI evaluation methods mostly measure the ratio of the extracted tongue coating region to the tongue body region quantitatively, which leads to non-objective measurements that depend on the exposure conditions of the tongue image and on the recognition performance for the coating. In this paper, a deep-learning-based WTCI is therefore proposed for classifying the amount of tongue coating, applying AI deep learning with big data to WTCI evaluation. To verify the effectiveness of the deep learning approach, we classify the amount of tongue coating into three classes (no coating, some coating, intense coating) using a CNN model. In tests on tongue coating sample images built for training and validation of the CNN model, the proposed method achieved 96.7% accuracy in classifying the amount of tongue coating.
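
A minimal PyTorch sketch of a three-class coating classifier of the kind evaluated here; the abstract does not specify the architecture, so the layers below are illustrative:

```python
import torch
import torch.nn as nn

class TongueCoatingCNN(nn.Module):
    """Small CNN for no/some/intense coating classification."""

    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                  # x: [B, 3, H, W] tongue images
        return self.classifier(self.features(x).flatten(1))

print(TongueCoatingCNN()(torch.randn(2, 3, 128, 128)).shape)  # [2, 3]
```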

Implementation of Hair Style Recommendation System Based on Big data and Deepfakes (빅데이터와 딥페이크 기반의 헤어스타일 추천 시스템 구현)

  • Tae-Kook Kim
    • Journal of Internet of Things and Convergence
    • /
    • v.9 no.3
    • /
    • pp.13-19
    • /
    • 2023
  • In this paper, we investigated the implementation of a hairstyle recommendation system based on big data and deepfake technology. The proposed system recognizes the user's facial shape from a photo (image). Facial shapes are classified as oval, round, or square, and hairstyles that suit each facial shape are synthesized using deepfake technology and provided as videos. Hairstyles are recommended from big data, applying the latest trends and styles that suit the facial shape. With an image segmentation map and the Motion Supervised Co-Part Segmentation algorithm, elements belonging to the same category (such as hair and face) can be synthesized between images. The synthesized image with the hairstyle and a pre-defined video are then fed to the Motion Representations for Articulated Animation algorithm to generate a video animation. The proposed system is expected to be used in various areas of the beauty industry, including virtual fitting. In future research, we plan to develop a smart mirror that recommends hairstyles and incorporates Internet of Things (IoT) functionality.
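
The face-shape step could, for illustration, be reduced to simple landmark ratios; the thresholds below are toy values and not the system's classifier:

```python
import numpy as np

def classify_face_shape(landmarks):
    """Toy oval/round/square decision from facial landmark ratios.

    `landmarks` is an [N, 2] array from any landmark detector; the
    thresholds are purely illustrative.
    """
    xs, ys = landmarks[:, 0], landmarks[:, 1]
    width = xs.max() - xs.min()                       # cheek-to-cheek span
    height = ys.max() - ys.min()                      # hairline-to-chin span
    lower = landmarks[ys > ys.min() + 0.75 * height]  # lower-face points
    jaw_width = lower[:, 0].max() - lower[:, 0].min()
    if height / width > 1.3:
        return "oval"                                 # noticeably elongated
    return "square" if jaw_width / width > 0.85 else "round"
```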

Method of Automatically Generating Metadata through Audio Analysis of Video Content (영상 콘텐츠의 오디오 분석을 통한 메타데이터 자동 생성 방법)

  • Sung-Jung Young;Hyo-Gyeong Park;Yeon-Hwi You;Il-Young Moon
    • Journal of Advanced Navigation Technology
    • /
    • v.25 no.6
    • /
    • pp.557-561
    • /
    • 2021
  • Metadata has become essential for recommending video content to users, yet it is still generated manually by content providers. This paper studies how to generate metadata automatically, replacing the existing manual input method. Extending our earlier work on extracting emotion tags, we investigate automatically generating genre and country-of-production metadata from movie audio. Genre is extracted from the audio spectrogram using a ResNet34 transfer-learning model, and the speaker's language in the movie is detected through speech recognition. These results confirm the feasibility of automatically generating metadata with artificial intelligence.
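
A sketch of the spectrogram-plus-transfer-learning pipeline using librosa and torchvision's ResNet34; the genre count, clip length, and mel settings are assumptions, and the fine-tuning loop and speech-recognition branch for production country are omitted:

```python
import librosa
import torch
from torchvision import models

def genre_logits(audio_path, num_genres=10):
    """Mel spectrogram in, ResNet34 genre logits out (untrained head)."""
    y, sr = librosa.load(audio_path, sr=22050, duration=30.0)
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128))
    # Repeat the single-channel spectrogram to match ResNet's 3-channel input.
    x = torch.tensor(mel, dtype=torch.float32)[None].repeat(3, 1, 1)[None]
    net = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
    net.fc = torch.nn.Linear(net.fc.in_features, num_genres)  # new head to fine-tune
    return net(x)
```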