• Title/Abstract/Keyword: Deep Visual Features

Search results: 64

Multimodal Context Embedding for Scene Graph Generation

  • Jung, Gayoung; Kim, Incheol
    • Journal of Information Processing Systems / Vol. 16, No. 6 / pp. 1250-1260 / 2020
  • This study proposes a novel deep neural network model that accurately detects objects and their relationships in an image and represents them as a scene graph. The proposed model exploits several multimodal features, including linguistic features and visual context features, to detect objects and relationships accurately. In addition, the context features are embedded using a graph neural network so that the dependencies between two related objects are captured in the context feature vector. The effectiveness of the proposed model is demonstrated through comparative experiments on the Visual Genome benchmark dataset.
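
To make the idea concrete, below is a minimal sketch of graph-based context embedding over detected objects, assuming a simple message-passing scheme with a GRU update. The module name, layer sizes, and aggregation rule are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch: graph-based context embedding for object pairs
# (hypothetical module; layer sizes and message-passing scheme are assumptions).
import torch
import torch.nn as nn

class ContextEmbedding(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        # messages are computed from concatenated subject/object features
        self.message = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.update = nn.GRUCell(dim, dim)

    def forward(self, node_feats, edges):
        # node_feats: (N, dim) visual+linguistic features per detected object
        # edges: (E, 2) index pairs (subject, object) of candidate relations
        subj, obj = node_feats[edges[:, 0]], node_feats[edges[:, 1]]
        msg = self.message(torch.cat([subj, obj], dim=-1))       # (E, dim)
        # aggregate incoming messages per target node by summation
        agg = torch.zeros_like(node_feats).index_add_(0, edges[:, 1], msg)
        return self.update(agg, node_feats)  # context-refined node states

nodes = torch.randn(5, 512)                     # 5 detected objects
edges = torch.tensor([[0, 1], [1, 2], [3, 4]])  # candidate relations
ctx = ContextEmbedding()(nodes, edges)          # (5, 512)
```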

Video Captioning with Visual and Semantic Features

  • Lee, Sujin; Kim, Incheol
    • Journal of Information Processing Systems / Vol. 14, No. 6 / pp. 1318-1330 / 2018
  • Video captioning refers to the process of extracting features from a video and generating captions from the extracted features. This paper introduces a deep neural network model for effective video captioning and its learning method. In this study, semantic features that effectively express the video are used in addition to visual features. The visual features of the video are extracted with convolutional neural networks such as C3D and ResNet, while the semantic features are extracted with a semantic feature extraction network proposed in this paper. Furthermore, an attention-based caption generation network is proposed for the effective generation of video captions from the extracted features. The performance and effectiveness of the proposed model are verified through various experiments on two large-scale video benchmarks, the Microsoft Video Description (MSVD) and the Microsoft Research Video-To-Text (MSR-VTT) datasets.
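
As an illustration of attention-based caption generation over video features, the sketch below soft-attends over per-frame features at each decoding step. The class name, dimensions, and single-layer decoder are assumptions; the paper's actual model also fuses semantic features.

```python
# Minimal sketch: one attention-weighted decoding step over frame features
# (hypothetical; the real model combines C3D/ResNet visual and semantic features).
import torch
import torch.nn as nn

class AttnCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, hid=512, vocab=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab, hid)
        self.attn = nn.Linear(hid + feat_dim, 1)
        self.rnn = nn.LSTMCell(hid + feat_dim, hid)
        self.out = nn.Linear(hid, vocab)

    def step(self, word, h, c, feats):
        # feats: (T, feat_dim) per-frame features; soft-attend using state h
        scores = self.attn(torch.cat([h.expand(feats.size(0), -1), feats], -1))
        ctx = (torch.softmax(scores, dim=0) * feats).sum(0, keepdim=True)
        h, c = self.rnn(torch.cat([self.embed(word), ctx], -1), (h, c))
        return self.out(h), h, c   # vocabulary logits for the next word

model = AttnCaptioner()
feats = torch.randn(30, 2048)      # features of 30 sampled frames
h = c = torch.zeros(1, 512)
logits, h, c = model.step(torch.tensor([1]), h, c, feats)  # one decode step
```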

Similar Image Retrieval Technique based on Semantics through Automatic Labeling Extraction of Personalized Images

  • Jung-Hee, Seo
    • Journal of Information and Communication Convergence Engineering / Vol. 22, No. 1 / pp. 56-63 / 2024
  • Despite the rapid strides in content-based image retrieval, a notable disparity persists between the visual features of images and the semantic features discerned by humans. Hence, image retrieval that associates the semantic similarities recognized by humans with visual similarities is a difficult task for most image-retrieval systems. Our study endeavors to bridge this gap by refining image semantics, aligning them more closely with human perception. Deep learning techniques are used to classify images semantically and to retrieve images that are semantically similar to personalized images. Moreover, we introduce keyword-based image retrieval that enables automatic labeling of images in mobile environments. The proposed approach can improve performance on a mobile device with limited resources and bandwidth by performing retrieval based on the visual features and keywords of the image on the device.
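
A minimal sketch of the classify-then-retrieve flow described above, assuming a pretrained ImageNet classifier stands in for the paper's semantic classifier and predicted class indices serve as keywords. All function names and the label-overlap ranking are hypothetical.

```python
# Minimal sketch: automatic labeling + keyword-overlap retrieval
# (hypothetical flow; the paper's classifier and label set will differ).
import torch
from torchvision import models, transforms
from PIL import Image

model = models.mobilenet_v3_small(weights="IMAGENET1K_V1").eval()
prep = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def auto_label(path, topk=3):
    """Return top-k class indices to use as semantic keywords."""
    x = prep(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = model(x).softmax(-1)
    return set(probs.topk(topk).indices[0].tolist())

def retrieve(query_path, gallery_paths):
    """Rank gallery images by overlap of predicted semantic labels."""
    q = auto_label(query_path)
    return sorted(gallery_paths, key=lambda p: -len(q & auto_label(p)))
```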

부뇌량팽대 동정맥 기형의 수술에서 시야의 보존 - 증례보고 - (Surgery of Parasplenial Arteriovenous Malformation with Preservation of Vision - A Case Report -)

  • 주진양; 안정용
    • Journal of Korean Neurosurgical Society / Vol. 29, No. 6 / pp. 815-821 / 2000
  • Parasplenial arteriovenous malformations (AVMs) are rare vascular malformations with distinct clinical and anatomical features. They are situated at the confluence of the hippocampus, the isthmus of the cingulate gyrus, and the gyrus occipitotemporalis medialis. These lesions lie anterior to the calcarine sulcus, and their apex extends toward the medial surface of the trigonum. Posterolaterally, they are in close proximity to the visual cortex and the optic radiation. The objectives in the surgery of parasplenial AVMs are complete resection of the lesion and preservation of vision. These objectives must be achieved with a comprehensive understanding of the following anatomical features: 1) the deep central location of the lesions within eloquent brain tissue; 2) the lack of cortical representation of the AVMs, which requires retraction of the visual cortex; 3) the deep arterial supply; 4) the deep venous drainage; and 5) the juxtaposition to the choroid plexus, with which the arterial supply and venous drainage are shared. A 16-year-old female student presented with intraventricular hemorrhage from a right parasplenial-subtrigonal AVM. The lesion, fed by the posterior cerebral artery and draining into the vein of Galen, was successfully treated via the interhemispheric parieto-occipital approach. To avoid a visual field defect, a small incision was made on the precuneus anterior to the calcarine sulcus. In this report, the authors describe a surgical approach with special consideration for preservation of the visual field.

A Deep Learning-Based Rate Control for HEVC Intra Coding

  • Marzuki, Ismail; Sim, Donggyu
    • Proceedings of the Korean Institute of Broadcast and Media Engineers Conference / 2019 Fall Conference / pp. 180-181 / 2019
  • This paper proposes a rate control algorithm for intra-coded frames in an HEVC encoder using a deep learning approach. The proposed algorithm is designed for CTU-level bit allocation within an intra frame by considering visual features spatially and temporally. The features are generated with the deep convolutional layers of the Visual Geometry Group network (VGG-16) and then used to allocate bits to each CTU within an intra frame. In our experiments, the proposed algorithm achieves a -2.04% BD-rate gain on the luma component with minimal loss in bit accuracy compared with the HM-16.20 rate control model.
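
A heavily simplified sketch of how per-CTU VGG-16 feature responses could drive bit allocation. The complexity measure (mean feature magnitude) and the proportional weighting are illustrative assumptions, not the paper's trained allocation model.

```python
# Minimal sketch: feature-driven CTU-level bit allocation
# (hypothetical mapping; the paper's network and weighting differ).
import torch
from torchvision import models

vgg = models.vgg16(weights="IMAGENET1K_V1").features.eval()

def ctu_weights(frame):
    """frame: (3, H, W) tensor; return a bit-allocation weight per 64x64 CTU."""
    weights = []
    _, H, W = frame.shape
    for y in range(0, H - 63, 64):
        for x in range(0, W - 63, 64):
            ctu = frame[:, y:y+64, x:x+64].unsqueeze(0)
            with torch.no_grad():
                act = vgg(ctu).abs().mean()   # feature magnitude as complexity
            weights.append(act)
    w = torch.stack(weights)
    return w / w.sum()                        # normalized share of frame budget

frame = torch.rand(3, 128, 256)               # toy frame of 2x4 CTUs
bits = 100_000 * ctu_weights(frame)           # bits per CTU for a 100 kb frame
```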

비주얼 서보잉을 위한 딥러닝 기반 물체 인식 및 자세 추정 (Object Recognition and Pose Estimation Based on Deep Learning for Visual Servoing)

  • 조재민; 강상승; 김계경
    • The Journal of Korea Robotics Society / Vol. 14, No. 1 / pp. 1-7 / 2019
  • Recently, smart factories have attracted much attention as a result of the 4th Industrial Revolution. Existing factory automation technologies are generally designed for simple repetition without vision sensors, and even small object assemblies still depend on manual work. To replace such systems with new technologies such as bin picking and visual servoing, precision and real-time operation are essential. We therefore focus on these core elements, using a deep learning algorithm to detect and classify the target object in real time and analyzing the object's features. Although there are many strong deep learning algorithms such as Mask R-CNN and Fast R-CNN, we chose the YOLO CNN because it works in real time and combines the two tasks mentioned above. Then, from the line and inner features extracted from the target object, we obtain the final outline and estimate the object's pose.
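
The paper estimates pose from line and inner features; as a hedged stand-in, the sketch below recovers an object pose from four detected outline corners with OpenCV's PnP solver. The camera intrinsics, corner coordinates, and object size are made-up values.

```python
# Minimal sketch: detect-then-estimate-pose for visual servoing
# (hypothetical pipeline; stands in for the paper's line-feature method).
import numpy as np
import cv2

def estimate_pose(img_corners, obj_size, K):
    """Recover object pose from 4 detected outline corners via PnP.
    img_corners: (4, 2) pixel coords; obj_size: (w, h) in meters; K: intrinsics."""
    w, h = obj_size
    obj_pts = np.array([[0, 0, 0], [w, 0, 0], [w, h, 0], [0, h, 0]],
                       dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_corners.astype(np.float64),
                                  K, None)
    return rvec, tvec  # object rotation (Rodrigues vector) and translation

K = np.array([[600, 0, 320], [0, 600, 240], [0, 0, 1]], dtype=np.float64)
corners = np.array([[300, 200], [400, 205], [395, 305], [295, 300]])
rvec, tvec = estimate_pose(corners, (0.05, 0.05), K)
```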

딥러닝 기반 객체 인식 기술 동향 (Trends on Object Detection Techniques Based on Deep Learning)

  • 이진수; 이상광; 김대욱; 홍승진; 양성일
    • Electronics and Telecommunications Trends / Vol. 33, No. 4 / pp. 23-32 / 2018
  • Object detection, which involves detecting the objects in a visual scene along with their locations, is a challenging problem in visual understanding research. It has recently been applied in various fields such as autonomous driving, image surveillance, and face recognition. Traditional object detection methods rely on handcrafted features designed to cope with various visual environments; however, these methods involve a trade-off between accuracy and computational efficiency. Deep learning is a revolutionary paradigm in machine learning, and deep-learning-based methods, particularly convolutional neural networks (CNNs), have outperformed conventional methods in object detection and have therefore been studied intensively in recent years. In this article, we provide a brief descriptive summary of several recent deep learning methods for object detection and of deep learning architectures. We also compare the performance of these methods and present a research guide for the object detection field.

로봇시스템에서 작은 마커 인식을 하기 위한 사물 감지 어텐션 모델 (Small Marker Detection with Attention Model in Robotic Applications)

  • 김민재; 문형필
    • The Journal of Korea Robotics Society / Vol. 17, No. 4 / pp. 425-430 / 2022
  • As robots are considered one of the mainstream drivers of digital transformation, machine vision for robots has become a major area of study, providing the ability to inspect what the robot sees and to make decisions based on it. However, finding a small object in an image is difficult, mainly because most visual recognition networks are convolutional neural networks that consider only local features. We therefore build a model that considers global as well as local features. In this paper, we propose a method for detecting a small marker on an object using deep learning, together with an algorithm that captures global features by combining the Transformer's self-attention technique with a convolutional neural network. We present a self-attention model with a new definition of Query, Key, and Value so that the model can learn global features, and a simplified formulation that removes the position vector and the classification token, which make the model heavy and slow. Finally, we show that our model achieves a higher mAP than the state-of-the-art model YOLOR.
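
A minimal sketch of the kind of simplification described above: single-head self-attention computed directly on a CNN feature map, with no positional encoding and no classification token. The Q/K/V construction and dimensions are illustrative assumptions, not the authors' exact definitions.

```python
# Minimal sketch: simplified global self-attention over CNN features
# (hypothetical shapes; omits positional encodings and the class token).
import torch
import torch.nn as nn

class GlobalAttention(nn.Module):
    def __init__(self, ch=256):
        super().__init__()
        self.qkv = nn.Conv2d(ch, 3 * ch, kernel_size=1)  # Q, K, V from the map
        self.scale = ch ** -0.5

    def forward(self, fmap):
        # fmap: (B, C, H, W) convolutional feature map
        B, C, H, W = fmap.shape
        q, k, v = self.qkv(fmap).flatten(2).chunk(3, dim=1)  # each (B, C, HW)
        attn = torch.softmax(q.transpose(1, 2) @ k * self.scale, dim=-1)
        out = (v @ attn.transpose(1, 2)).view(B, C, H, W)
        return fmap + out  # residual: local CNN features + global context

x = torch.randn(1, 256, 20, 20)
y = GlobalAttention()(x)   # same shape, now globally contextualized
```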

Video Saliency Detection Using Bi-directional LSTM

  • Chi, Yang; Li, Jinjiang
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 6 / pp. 2444-2463 / 2020
  • Saliency detection in video allows computing resources to be allocated more rationally, reducing the amount of computation while improving accuracy. Deep learning can extract edge features from images, providing technical support for video saliency. This paper proposes a new detection method that combines a Convolutional Neural Network (CNN) and a Deep Bidirectional LSTM network (DB-LSTM) to learn spatio-temporal features, exploring object motion information to generate a continuous sequence of saliency maps for the video. We also analyzed the sample database and found that human attention and saliency transitions are time-dependent, so we further considered cross-frame saliency detection. Finally, experiments show that our method is superior to other state-of-the-art methods.
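
A minimal sketch of the CNN + bidirectional LSTM pipeline, assuming a toy per-frame encoder and a scalar saliency score per frame; the paper's DB-LSTM and decoder are more elaborate.

```python
# Minimal sketch: per-frame CNN features fused by a bidirectional LSTM
# (hypothetical model; layer sizes are assumptions).
import torch
import torch.nn as nn

class VideoSaliency(nn.Module):
    def __init__(self, feat_dim=512, hid=256):
        super().__init__()
        self.cnn = nn.Sequential(                 # per-frame spatial encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 16, feat_dim))
        self.lstm = nn.LSTM(feat_dim, hid, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hid, 1)         # per-frame saliency score

    def forward(self, clip):
        # clip: (B, T, 3, H, W) -> per-frame features -> temporal context
        B, T = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(B, T, -1)
        ctx, _ = self.lstm(feats)                 # fuses past and future frames
        return self.head(ctx).squeeze(-1)         # (B, T) saliency per frame

scores = VideoSaliency()(torch.randn(2, 8, 3, 64, 64))  # (2, 8)
```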

시각 정보를 활용한 딥러닝 기반 추천 시스템 (A Deep Learning Based Recommender System Using Visual Information)

  • 문현실; 임진혁; 김도연; 조윤호
    • Knowledge Management Research / Vol. 21, No. 3 / pp. 27-44 / 2020
  • Recommender systems, which aim to solve the problem of user information overload, infer individual preferences and filter items accordingly. Collaborative filtering, the most successful family of recommendation techniques, has seen continuous performance improvements and has been applied in many domains. Building on this success, this study proposes VizNCS, which incorporates visual information that can influence consumers' purchase decisions into a recommender system. To this end, a convolutional neural network is first used to extract features from visual information, which is unstructured data. Next, to feed the image features derived from the CNN into the recommender system, we extend NCF, an existing deep-learning-based recommendation technique that readily accommodates additional information. In performance comparison experiments, the proposed VizNCS outperformed plain NCF, and category-level experiments identified categories that are affected by visual information and those that are not. In conclusion, because VizNCS uses visual information directly for personalized recommendation, it reflects the purchase-decision behavior of consumers who are influenced by visual information and thus improves recommender system performance. It also broadens the source-data domain of recommender systems to image data, which has so far been little used, suggesting ways to exploit diverse source data.
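
A minimal sketch of extending an NCF-style model with CNN image features in the spirit of the approach above, assuming precomputed image features; the class name and layer sizes are hypothetical, not the authors' exact design.

```python
# Minimal sketch: NCF-style recommender augmented with visual features
# (hypothetical architecture; image features are precomputed by a CNN).
import torch
import torch.nn as nn

class VizNCF(nn.Module):
    def __init__(self, n_users, n_items, emb=32, img_dim=2048):
        super().__init__()
        self.user = nn.Embedding(n_users, emb)
        self.item = nn.Embedding(n_items, emb)
        self.img_proj = nn.Linear(img_dim, emb)   # CNN feature -> latent space
        self.mlp = nn.Sequential(
            nn.Linear(3 * emb, 64), nn.ReLU(),    # user + item + visual
            nn.Linear(64, 1))

    def forward(self, u, i, img_feat):
        # img_feat: precomputed CNN features of the item image, (B, img_dim)
        x = torch.cat([self.user(u), self.item(i),
                       self.img_proj(img_feat)], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)  # preference score

model = VizNCF(n_users=1000, n_items=500)
score = model(torch.tensor([3]), torch.tensor([42]), torch.randn(1, 2048))
```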