• Title/Summary/Keyword: Monocular depth estimation

Search Result 22, Processing Time 0.024 seconds

A Region Depth Estimation Algorithm using Motion Vector from Monocular Video Sequence (단안영상에서 움직임 벡터를 이용한 영역의 깊이추정)

  • 손정만;박영민;윤영우
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.5 no.2
    • /
    • pp.96-105
    • /
    • 2004
  • The recovering 3D image from 2D requires the depth information for each picture element. The manual creation of those 3D models is time consuming and expensive. The goal in this paper is to estimate the relative depth information of every region from single view image with camera translation. The paper is based on the fact that the motion of every point within image which taken from camera translation depends on the depth. Motion vector using full-search motion estimation is compensated for camera rotation and zooming. We have developed a framework that estimates the average frame depth by analyzing motion vector and then calculates relative depth of region to average frame depth. Simulation results show that the depth of region belongs to a near or far object is consistent accord with relative depth that man recognizes.

  • PDF

Non-Homogeneous Haze Synthesis for Hazy Image Depth Estimation Using Deep Learning (불균일 안개 영상 합성을 이용한 딥러닝 기반 안개 영상 깊이 추정)

  • Choi, Yeongcheol;Paik, Jeehyun;Ju, Gwangjin;Lee, Donggun;Hwang, Gyeongha;Lee, Seungyong
    • Journal of the Korea Computer Graphics Society
    • /
    • v.28 no.3
    • /
    • pp.45-54
    • /
    • 2022
  • Image depth estimation is a technology that is the basis of various image analysis. As analysis methods using deep learning models emerge, studies using deep learning in image depth estimation are being actively conducted. Currently, most deep learning-based depth estimation models are being trained with clean and ideal images. However, due to the lack of data on adverse conditions such as haze or fog, the depth estimation may not work well in such an environment. It is hard to sufficiently secure an image in these environments, and in particular, obtaining non-homogeneous haze data is a very difficult problem. In order to solve this problem, in this study, we propose a method of synthesizing non-homogeneous haze images and a learning method for a monocular depth estimation deep learning model using this method. Considering that haze mainly occurs outdoors, datasets mainly containing outdoor images are constructed. Experiment results show that the model with the proposed method is good at estimating depth in both synthesized and real haze data.

Deep Learning-based Depth Map Estimation: A Review

  • Abdullah, Jan;Safran, Khan;Suyoung, Seo
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.1
    • /
    • pp.1-21
    • /
    • 2023
  • In this technically advanced era, we are surrounded by smartphones, computers, and cameras, which help us to store visual information in 2D image planes. However, such images lack 3D spatial information about the scene, which is very useful for scientists, surveyors, engineers, and even robots. To tackle such problems, depth maps are generated for respective image planes. Depth maps or depth images are single image metric which carries the information in three-dimensional axes, i.e., xyz coordinates, where z is the object's distance from camera axes. For many applications, including augmented reality, object tracking, segmentation, scene reconstruction, distance measurement, autonomous navigation, and autonomous driving, depth estimation is a fundamental task. Much of the work has been done to calculate depth maps. We reviewed the status of depth map estimation using different techniques from several papers, study areas, and models applied over the last 20 years. We surveyed different depth-mapping techniques based on traditional ways and newly developed deep-learning methods. The primary purpose of this study is to present a detailed review of the state-of-the-art traditional depth mapping techniques and recent deep learning methodologies. This study encompasses the critical points of each method from different perspectives, like datasets, procedures performed, types of algorithms, loss functions, and well-known evaluation metrics. Similarly, this paper also discusses the subdomains in each method, like supervised, unsupervised, and semi-supervised methods. We also elaborate on the challenges of different methods. At the conclusion of this study, we discussed new ideas for future research and studies in depth map research.

Visual Object Tracking Fusing CNN and Color Histogram based Tracker and Depth Estimation for Automatic Immersive Audio Mixing

  • Park, Sung-Jun;Islam, Md. Mahbubul;Baek, Joong-Hwan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.3
    • /
    • pp.1121-1141
    • /
    • 2020
  • We propose a robust visual object tracking algorithm fusing a convolutional neural network tracker trained offline from a large number of video repositories and a color histogram based tracker to track objects for mixing immersive audio. Our algorithm addresses the problem of occlusion and large movements of the CNN based GOTURN generic object tracker. The key idea is the offline training of a binary classifier with the color histogram similarity values estimated via both trackers used in this method to opt appropriate tracker for target tracking and update both trackers with the predicted bounding box position of the target to continue tracking. Furthermore, a histogram similarity constraint is applied before updating the trackers to maximize the tracking accuracy. Finally, we compute the depth(z) of the target object by one of the prominent unsupervised monocular depth estimation algorithms to ensure the necessary 3D position of the tracked object to mix the immersive audio into that object. Our proposed algorithm demonstrates about 2% improved accuracy over the outperforming GOTURN algorithm in the existing VOT2014 tracking benchmark. Additionally, our tracker also works well to track multiple objects utilizing the concept of single object tracker but no demonstrations on any MOT benchmark.

Depth estimation and View Synthesis using Haze Information (실안개를 이용한 단일 영상으로부터의 깊이정보 획득 및 뷰 생성 알고리듬)

  • Soh, Yong-Seok;Hyun, Dae-Young;Lee, Sang-Uk
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2010.07a
    • /
    • pp.241-243
    • /
    • 2010
  • Previous approaches to the 2D to 3D conversion problem require heavy computation or considerable amount of user input. In this paper, we propose a rather simple method in estimating the depth map from a single image using a monocular depth cue: haze. Using the haze imaging model, we obtain the distance information and estimate a reliable depth map from a single scenery image. Using the depth map, we also suggest an algorithm that converts the single image to 3D stereoscopic images. We determine a disparity value for each pixel from the original 'left' image and generate a corresponding 'right' image. Results show that the algorithm gives well refined depth maps despite the simplicity of the approach.

  • PDF

Vision-based Target Tracking for UAV and Relative Depth Estimation using Optical Flow (무인 항공기의 영상기반 목표물 추적과 광류를 이용한 상대깊이 추정)

  • Jo, Seon-Yeong;Kim, Jong-Hun;Kim, Jung-Ho;Lee, Dae-Woo;Cho, Kyeum-Rae
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.37 no.3
    • /
    • pp.267-274
    • /
    • 2009
  • Recently, UAVs (Unmanned Aerial Vehicles) are expected much as the Unmanned Systems for various missions. These missions are often based on the Vision System. Especially, missions such as surveillance and pursuit have a process which is carried on through the transmitted vision data from the UAV. In case of small UAVs, monocular vision is often used to consider weights and expenses. Research of missions performance using the monocular vision is continued but, actually, ground and target model have difference in distance from the UAV. So, 3D distance measurement is still incorrect. In this study, Mean-Shift Algorithm, Optical Flow and Subspace Method are posed to estimate the relative depth. Mean-Shift Algorithm is used for target tracking and determining Region of Interest (ROI). Optical Flow includes image motion information using pixel intensity. After that, Subspace Method computes the translation and rotation of image and estimates the relative depth. Finally, we present the results of this study using images obtained from the UAV experiments.

Deep Learning Based On-Device Augmented Reality System using Multiple Images (다중영상을 이용한 딥러닝 기반 온디바이스 증강현실 시스템)

  • Jeong, Taehyeon;Park, In Kyu
    • Journal of Broadcast Engineering
    • /
    • v.27 no.3
    • /
    • pp.341-350
    • /
    • 2022
  • In this paper, we propose a deep learning based on-device augmented reality (AR) system in which multiple input images are used to implement the correct occlusion in a real environment. The proposed system is composed of three technical steps; camera pose estimation, depth estimation, and object augmentation. Each step employs various mobile frameworks to optimize the processing on the on-device environment. Firstly, in the camera pose estimation stage, the massive computation involved in feature extraction is parallelized using OpenCL which is the GPU parallelization framework. Next, in depth estimation, monocular and multiple image-based depth image inference is accelerated using the mobile deep learning framework, i.e. TensorFlow Lite. Finally, object augmentation and occlusion handling are performed on the OpenGL ES mobile graphics framework. The proposed augmented reality system is implemented as an application in the Android environment. We evaluate the performance of the proposed system in terms of augmentation accuracy and the processing time in the mobile as well as PC environments.

Moving Object Extraction and Relative Depth Estimation of Backgrould regions in Video Sequences (동영상에서 물체의 추출과 배경영역의 상대적인 깊이 추정)

  • Park Young-Min;Chang Chu-Seok
    • The KIPS Transactions:PartB
    • /
    • v.12B no.3 s.99
    • /
    • pp.247-256
    • /
    • 2005
  • One of the classic research problems in computer vision is that of stereo, i.e., the reconstruction of three dimensional shape from two or more images. This paper deals with the problem of extracting depth information of non-rigid dynamic 3D scenes from general 2D video sequences taken by monocular camera, such as movies, documentaries, and dramas. Depth of the blocks are extracted from the resultant block motions throughout following two steps: (i) calculation of global parameters concerned with camera translations and focal length using the locations of blocks and their motions, (ii) calculation of each block depth relative to average image depth using the global parameters and the location of the block and its motion, Both singular and non-singular cases are experimented with various video sequences. The resultant relative depths and ego-motion object shapes are virtually identical to human vision.

Stereo Vision-based Visual Odometry Using Robust Visual Feature in Dynamic Environment (동적 환경에서 강인한 영상특징을 이용한 스테레오 비전 기반의 비주얼 오도메트리)

  • Jung, Sang-Jun;Song, Jae-Bok;Kang, Sin-Cheon
    • The Journal of Korea Robotics Society
    • /
    • v.3 no.4
    • /
    • pp.263-269
    • /
    • 2008
  • Visual odometry is a popular approach to estimating robot motion using a monocular or stereo camera. This paper proposes a novel visual odometry scheme using a stereo camera for robust estimation of a 6 DOF motion in the dynamic environment. The false results of feature matching and the uncertainty of depth information provided by the camera can generate the outliers which deteriorate the estimation. The outliers are removed by analyzing the magnitude histogram of the motion vector of the corresponding features and the RANSAC algorithm. The features extracted from a dynamic object such as a human also makes the motion estimation inaccurate. To eliminate the effect of a dynamic object, several candidates of dynamic objects are generated by clustering the 3D position of features and each candidate is checked based on the standard deviation of features on whether it is a real dynamic object or not. The accuracy and practicality of the proposed scheme are verified by several experiments and comparisons with both IMU and wheel-based odometry. It is shown that the proposed scheme works well when wheel slip occurs or dynamic objects exist.

  • PDF

3D Environment Perception using Stereo Infrared Light Sources and a Camera (스테레오 적외선 조명 및 단일카메라를 이용한 3차원 환경인지)

  • Lee, Soo-Yong;Song, Jae-Bok
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.15 no.5
    • /
    • pp.519-524
    • /
    • 2009
  • This paper describes a new sensor system for 3D environment perception using stereo structured infrared light sources and a camera. Environment and obstacle sensing is the key issue for mobile robot localization and navigation. Laser scanners and infrared scanners cover $180^{\circ}$ and are accurate but too expensive. Those sensors use rotating light beams so that the range measurements are constrained on a plane. 3D measurements are much more useful in many ways for obstacle detection, map building and localization. Stereo vision is very common way of getting the depth information of 3D environment. However, it requires that the correspondence should be clearly identified and it also heavily depends on the light condition of the environment. Instead of using stereo camera, monocular camera and two projected infrared light sources are used in order to reduce the effects of the ambient light while getting 3D depth map. Modeling of the projected light pattern enabled precise estimation of the range. Two successive captures of the image with left and right infrared light projection provide several benefits, which include wider area of depth measurement, higher spatial resolution and the visibility perception.