• Title/Summary/Keyword: visual/object search

Structurally Enhanced Correlation Tracking

  • Parate, Mayur Rajaram;Bhurchandi, Kishor M.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.10
    • /
    • pp.4929-4947
    • /
    • 2017
  • In visual object tracking, Correlation Filter-based Tracking (CFT) systems have recently emerged as among the most accurate and efficient methods. A CFT circularly shifts a larger search window to find the most likely position of the target. The need for a larger search window that covers both background and object makes the algorithm sensitive to the background and to target occlusions. Further, the use of fixed-size windows for training makes these trackers incapable of handling scale variations during tracking. To address these problems, we propose a two-layer target representation in which both the global and local appearances of the target are considered. Multiple local patches in the local layer provide robustness to background changes and target occlusion. The target representation is enhanced by employing additional reversed RGB channels to prevent the loss of black objects in the background during tracking. The final target position is obtained by an adaptive weighted average of the confidence maps from the global and local layers. Furthermore, target scale variation during tracking is handled by a statistical model, which is governed by adaptive constraints to ensure reliability and accuracy in scale estimation. The proposed structural enhancement is tested on the VTBv1.0 benchmark for its accuracy and robustness.
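
The fusion step described in this abstract, a weighted average of confidence maps followed by a peak search, can be sketched as below. This is a minimal illustration, not the authors' method: the function name `fuse_confidence_maps` is hypothetical, the maps are assumed to be already aligned on the same grid, and weighting each map by its own peak value is an assumption standing in for the paper's adaptive weights, which the abstract does not specify.

```python
def fuse_confidence_maps(global_map, local_maps):
    """Fuse same-sized 2D confidence maps and return (fused_map, peak_position).

    Assumption: each map is weighted by its own maximum response, so a map
    with a confident peak contributes more than a flat, uncertain one.
    """
    maps = [global_map] + list(local_maps)
    weights = [max(max(row) for row in m) for m in maps]
    total = sum(weights)
    if total <= 0:
        raise ValueError("all confidence maps are empty or non-positive")

    rows, cols = len(global_map), len(global_map[0])
    fused = [
        [sum(w * m[r][c] for w, m in zip(weights, maps)) / total
         for c in range(cols)]
        for r in range(rows)
    ]
    # The estimated target position is the location of the fused maximum.
    peak = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: fused[rc[0]][rc[1]],
    )
    return fused, peak


global_layer = [[0.1, 0.2], [0.3, 0.9]]
local_layers = [[[0.0, 0.5], [0.1, 0.4]]]
fused_map, position = fuse_confidence_maps(global_layer, local_layers)
print(position)
```

In this toy input the global map dominates because its peak is higher, so the fused maximum stays at the global layer's peak even though the local patch votes for a different cell.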

Moving Objects Modeling for Supporting Content and Similarity Searches (내용 및 유사도 검색을 위한 움직임 객체 모델링)

  • 복경수;김미희;신재룡;유재수;조기형
    • Journal of Korea Multimedia Society
    • /
    • v.7 no.5
    • /
    • pp.617-632
    • /
    • 2004
  • Video data includes moving objects whose spatial positions change over time. In this paper, we propose a new modeling method for moving objects contained in video data. To retrieve moving objects effectively, the proposed modeling method represents the spatial position and the size of a moving object. It also represents the visual features and the trajectory by considering the direction, distance, and speed of moving objects over time. It therefore allows various types of retrieval, such as visual feature-based similarity retrieval, distance-based similarity retrieval, trajectory-based similarity retrieval, and weighted retrieval that mixes these types.

Multi-level Cross-attention Siamese Network For Visual Object Tracking

  • Zhang, Jianwei;Wang, Jingchao;Zhang, Huanlong;Miao, Mengen;Cai, Zengyu;Chen, Fuguo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.12
    • /
    • pp.3976-3990
    • /
    • 2022
  • Currently, cross-attention is widely used in Siamese trackers to replace traditional correlation operations for feature fusion between the template and the search region. Cross-attention can establish a similarity relationship between the target and the search region better than correlation can, which supports robust visual object tracking. However, existing trackers using cross-attention focus only on the rich semantic information of high-level features while ignoring the appearance information contained in low-level features, which makes them vulnerable to interference from similar objects. In this paper, we propose a Multi-level Cross-attention Siamese network (MCSiam) to aggregate semantic information and appearance information at the same time. Specifically, a multi-level cross-attention module is designed to fuse the multi-layer features extracted from the backbone. It integrates different levels of the template and search region features, so that rich appearance information and semantic information can be used for the tracking task simultaneously. In addition, before cross-attention, a target-aware module is introduced to enhance the target feature and alleviate interference, which makes the multi-level cross-attention module more efficient at fusing the information of the target and the search region. We test MCSiam on four tracking benchmarks, and the results show that the proposed tracker achieves performance comparable to state-of-the-art trackers.
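
As a rough illustration of the cross-attention operation this abstract refers to, the sketch below applies plain scaled dot-product attention for a single feature level, with search-region features as queries and template features as keys and values. It is not MCSiam itself: the function name `cross_attention` is hypothetical, learned projection matrices and multi-head splitting are omitted, and the target-aware module is not modeled.

```python
import math


def cross_attention(search_feats, template_feats):
    """Scaled dot-product cross-attention for one feature level.

    search_feats:   list of query vectors (one per search-region location)
    template_feats: list of key/value vectors (one per template location)
    Assumption: no learned Q/K/V projections, so keys and values are identical.
    """
    dim = len(template_feats[0])
    fused = []
    for query in search_feats:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in template_feats
        ]
        # Numerically stable softmax over template locations.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        attn = [e / norm for e in exps]
        fused.append(
            [sum(a * value[j] for a, value in zip(attn, template_feats))
             for j in range(dim)]
        )
    return fused


template = [[1.0, 0.0], [0.0, 1.0]]
search = [[10.0, 0.0], [0.0, 0.0]]
for row in cross_attention(search, template):
    print([round(v, 3) for v in row])
```

A multi-level variant in the spirit of the abstract would run this once per backbone level and combine the outputs; how MCSiam combines them is not stated here, so that step is left out.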

Visual Tracking Algorithm Using the Active Bar Models (능동 보모델을 이용한 영상추적 알고리즘)

  • 이진우;이재웅;박광일
    • Transactions of the Korean Society of Mechanical Engineers
    • /
    • v.19 no.5
    • /
    • pp.1220-1228
    • /
    • 1995
  • In this paper, we consider the problem of tracking an object in a real image. We explore a new technique based on the active contour model, commonly called the snake model, and propose active bar models to represent the target. Using this model, we simplified the target selection problem, reduced the search space of the energy surface, and obtained better performance than the snake model. The approach improves numerical stability, reduces the tendency of points to bunch up, and improves computational efficiency. By representing the object with an active bar, we can easily obtain the zeroth, first, and second moments, which facilitates target tracking. Finally, we present good results for the visual tracking problem.
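
To show why the zeroth, first, and second moments are useful for tracking, the sketch below computes them for a region given as a binary mask. This is an assumption made for illustration: the paper derives the moments from its active bar representation, whereas `region_moments` (a hypothetical name) simply sums over mask pixels.

```python
def region_moments(mask):
    """Return area, centroid, and second central moments of a 2D mask.

    mask[y][x] is 1 (or a weight) inside the object and 0 outside.
    """
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            m00 += value          # zeroth moment: area (total mass)
            m10 += x * value      # first moments: used for the centroid
            m01 += y * value
    if m00 == 0:
        raise ValueError("mask is empty")
    cx, cy = m10 / m00, m01 / m00

    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            # Second central moments: spread and orientation of the region.
            mu20 += (x - cx) ** 2 * value
            mu02 += (y - cy) ** 2 * value
            mu11 += (x - cx) * (y - cy) * value
    return {"area": m00, "centroid": (cx, cy),
            "mu20": mu20, "mu02": mu02, "mu11": mu11}


square = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(region_moments(square))
```

The zeroth moment gives the target size, the first moments give its position, and the second central moments describe its extent and orientation, which together are enough to update a simple target state from frame to frame.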

Bottleneck-based Siam-CNN Algorithm for Object Tracking (객체 추적을 위한 보틀넥 기반 Siam-CNN 알고리즘)

  • Lim, Su-Chang;Kim, Jong-Chan
    • Journal of Korea Multimedia Society
    • /
    • v.25 no.1
    • /
    • pp.72-81
    • /
    • 2022
  • Visual object tracking is known as one of the most fundamental problems in the field of computer vision. Object tracking localizes the region of the target object in a video with a bounding box. In this paper, a custom CNN is created to extract object features that carry strong and varied information. The network is constructed as a Siamese network for use as a feature extractor. The input images are passed through a convolution block composed of bottleneck layers, which emphasizes the features. The feature maps of the target object and the search area, extracted by the Siamese network, are fed into a local proposal network, which estimates the object area from the feature maps. The performance of the tracking algorithm was evaluated using the OTB2013 dataset, with the Success Plot and Precision Plot as evaluation metrics. In the experiment, the tracker achieved 0.611 on the Success Plot and 0.831 on the Precision Plot.
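
The two scores quoted in this abstract can be made concrete with a small sketch. It assumes the commonly used OTB conventions, which the abstract does not spell out: precision is the fraction of frames whose center location error is at most 20 pixels, and the success score is the area under the curve of success rate versus overlap threshold, sampled here at 21 thresholds from 0 to 1. Boxes are `(x, y, width, height)` tuples, and all function names are hypothetical.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(box_a, box_b):
    """Euclidean distance between box centers, in pixels."""
    return math.hypot(
        (box_a[0] + box_a[2] / 2) - (box_b[0] + box_b[2] / 2),
        (box_a[1] + box_a[3] / 2) - (box_b[1] + box_b[3] / 2),
    )


def precision(preds, truths, pixel_threshold=20.0):
    """Fraction of frames with center error within the pixel threshold."""
    hits = sum(center_error(p, t) <= pixel_threshold
               for p, t in zip(preds, truths))
    return hits / len(truths)


def success_auc(preds, truths, steps=21):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    overlaps = [iou(p, t) for p, t in zip(preds, truths)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > th for o in overlaps) / len(overlaps)
             for th in thresholds]
    return sum(rates) / len(rates)


truths = [(0, 0, 10, 10), (0, 0, 10, 10)]
preds = [(0, 0, 10, 10), (100, 100, 10, 10)]
print(precision(preds, truths), round(success_auc(preds, truths), 3))
```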

Task Planning Algorithm with Graph-based State Representation (그래프 기반 상태 표현을 활용한 작업 계획 알고리즘 개발)

  • Seongwan Byeon;Yoonseon Oh
    • The Journal of Korea Robotics Society
    • /
    • v.19 no.2
    • /
    • pp.196-202
    • /
    • 2024
  • The ability to understand a given environment and plan a sequence of actions leading to a goal state is crucial for personal service robots. With recent advancements in deep learning, numerous studies have proposed methods for state representation in planning. However, previous works lack explicit information about relationships between objects when the state observation is converted to a single visual embedding containing all state information. In this paper, we introduce a graph-based state representation that incorporates both object and relationship features. To leverage these advantages in addressing the task planning problem, we propose a Graph Neural Network (GNN)-based subgoal prediction model. This model can extract rich information about objects and their interconnected relationships from a given state graph. Moreover, a search-based algorithm is integrated with the pre-trained subgoal prediction model and a state transition module to explore diverse states and find a proper sequence of subgoals. The proposed method is trained with a synthetic task dataset collected in a simulation environment, and it demonstrates a higher success rate with fewer additional searches than baseline methods.

Novel Intent based Dimension Reduction and Visual Features Semi-Supervised Learning for Automatic Visual Media Retrieval

  • Kunisetti, Subramanyam;Ravichandran, Suban
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.6
    • /
    • pp.230-240
    • /
    • 2022
  • Sharing online videos over the internet is an emerging and important concept in applications such as surveillance and mobile video search across web-related applications. A personalized web video retrieval system is therefore needed to explore relevant videos and to help people searching for videos related to specific big data content. To this end, dimensionality-reduced attributes/features are computed from videos to capture discriminative aspects of a scene based on shape, histogram, texture, object annotation, co-ordination, color, and contour data. Dimensionality reduction mainly depends on feature extraction and feature selection in multi-labeled data retrieval from multimedia data. Many researchers have implemented different techniques to reduce dimensionality based on the visual features of video data, but each technique has both advantages and disadvantages when reducing dimensionality with advanced features in video retrieval. In this research, we present a Novel Intent based Dimension Reduction Semi-Supervised Learning Approach (NIDRSLA) that examines dimensionality reduction for exact and fast video retrieval based on different visual features. For dimensionality reduction, NIDRSLA learns the projection matrix by increasing the dependence between the enlarged data and the projected space features. The proposed approach also addresses the aforementioned issue (i.e., segmentation of video with frame selection using low-level and high-level features) with efficient object annotation for video representation. Experiments performed on a synthetic data set demonstrate the efficiency of the proposed approach compared with traditional state-of-the-art video retrieval methodologies.

Real-Time Comprehensive Assistance for Visually Impaired Navigation

  • Amal Al-Shahrani;Amjad Alghamdi;Areej Alqurashi;Raghad Alzahrani;Nuha Imam
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.5
    • /
    • pp.1-10
    • /
    • 2024
  • Individuals with visual impairments face numerous challenges in their daily lives, with navigating streets and public spaces being particularly daunting. The inability to identify safe crossing locations and assess the feasibility of crossing significantly restricts their mobility and independence. Globally, an estimated 285 million people suffer from visual impairment, with 39 million categorized as blind and 246 million as visually impaired, according to the World Health Organization. In Saudi Arabia alone, there are approximately 159 thousand blind individuals, as per unofficial statistics. The profound impact of visual impairments on daily activities underscores the urgent need for solutions to improve mobility and enhance safety. This study aims to address this pressing issue by leveraging computer vision and deep learning techniques to enhance object detection capabilities. Two models were trained to detect objects: one focused on street crossing obstacles, and the other aimed to search for objects. The first model was trained on a dataset comprising 5283 images of road obstacles and traffic signals, annotated to create a labeled dataset. Subsequently, it was trained using the YOLOv8 and YOLOv5 models, with YOLOv5 achieving a satisfactory accuracy of 84%. The second model was trained on the COCO dataset using YOLOv5, yielding an impressive accuracy of 94%. By improving object detection capabilities through advanced technology, this research seeks to empower individuals with visual impairments, enhancing their mobility, independence, and overall quality of life.

Implementation of Object Feature Extraction within Image for Object Tracking (객체 추적을 위한 영상 내의 객체 특징점 추출 알고리즘 구현)

  • Lee, Yong-Hwan;Kim, Youngseop
    • Journal of the Semiconductor & Display Technology
    • /
    • v.17 no.3
    • /
    • pp.113-116
    • /
    • 2018
  • This paper proposes a mobile image search system that uses smartphone sensor information, runs in a variety of environments, and is implemented on the Android platform. The implemented system uses a new image descriptor that combines a visual feature (CEDD) with the EXIF attributes of the target JPEG image, together with an image matching scheme optimized for the mobile platform. Experimental results show that the proposed method significantly improves search results, reaching around 80% precision on a large image database. Considering performance in terms of processing time and precision, we think the proposed method can be used in other application fields.

Scalable Re-detection for Correlation Filter in Visual Tracking

  • Park, Kayoung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.7
    • /
    • pp.57-64
    • /
    • 2020
  • In this paper, we propose a scalable re-detection method for correlation filters in visual tracking. In the real world, targets frequently disappear and reappear during tracking, so failure detection and re-detection methods are needed. One important point for re-detection is that the search area must be large enough to find the missing target. For robust visual tracking, we adopt the kernelized correlation filter as a baseline. Correlation filters have been extensively studied for visual object tracking in recent years. However, conventional correlation filters detect the target in an area of the same size as the trained filter, which is only 2 to 3 times larger than the target. When the target has disappeared for a long time, a wide area must be searched to re-detect it. The proposed algorithm can search for the target in a scalable area: the search area is expanded by 2% in every frame after the target is lost. Four datasets are used for experiments, and both qualitative and quantitative results are shown in this paper. Our algorithm succeeds in re-detecting the target on challenging datasets where the conventional correlation filter fails.
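
The 2% per-frame expansion mentioned in this abstract can be worked through with a short sketch. Several details are assumptions, since the abstract does not state them: the growth is taken to compound frame over frame, it is applied to the window's width and height rather than to its area, and the window is clamped to the frame size. The function name `expanded_search_window` is hypothetical.

```python
def expanded_search_window(base_w, base_h, frames_lost,
                           frame_w, frame_h, growth=0.02):
    """Return the (width, height) of the search window after target loss.

    Assumption: each lost frame multiplies both sides by (1 + growth),
    and the window never exceeds the video frame.
    """
    scale = (1.0 + growth) ** frames_lost
    return min(base_w * scale, frame_w), min(base_h * scale, frame_h)


# Window size at selected numbers of frames since the target was lost.
for lost in (0, 1, 35, 200):
    width, height = expanded_search_window(100, 50, lost, 640, 480)
    print(lost, round(width, 1), round(height, 1))
```

Under these assumptions the window roughly doubles after 35 lost frames, because 1.02 raised to the 35th power is close to 2, and it eventually saturates at the full frame.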