• Title/Summary/Keyword: 3D video


A Cross-Layer Unequal Error Protection Scheme for Prioritized H.264 Video using RCPC Codes and Hierarchical QAM

  • Chung, Wei-Ho;Kumar, Sunil;Paluri, Seethal;Nagaraj, Santosh;Annamalai, Annamalai Jr.;Matyjas, John D.
    • Journal of Information Processing Systems / v.9 no.1 / pp.53-68 / 2013
  • We investigate rate-compatible punctured convolutional (RCPC) codes concatenated with hierarchical QAM to design a cross-layer unequal error protection scheme for H.264 coded sequences. We first divide the H.264 encoded video slices into three priority classes based on their relative importance. We then examine the system constraints and propose an optimization formulation that computes the optimal system parameters for the given source significance information. An upper bound on the significance-weighted bit error rate is derived as a function of the system parameters, including the code rate and the geometry of the constellation. A design example for H.264 video communication is given, and simulations show a 3.5-4 dB PSNR improvement over existing RCPC-based techniques on AWGN wireless channels.
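
The significance-weighted bit error rate combined here can be sketched as a weighted average of per-class BERs. The weights and BER values below are illustrative assumptions, not figures from the paper; stronger protection (a lower RCPC code rate or the favored hierarchical-QAM layer) would yield a lower per-class BER.

```python
# Hypothetical sketch: significance-weighted BER over three H.264 priority classes.
# All numeric values are illustrative assumptions.

def weighted_ber(class_bers, significance_weights):
    """Combine per-class bit error rates into one significance-weighted BER."""
    total_w = sum(significance_weights)
    return sum(w * b for w, b in zip(significance_weights, class_bers)) / total_w

bers = [1e-5, 1e-4, 1e-3]    # assumed per-class BERs after RCPC + hierarchical QAM
weights = [0.6, 0.3, 0.1]    # assumed source significance information

wber = weighted_ber(bers, weights)
```

The optimization described in the abstract would choose per-class code rates and constellation geometry to minimize such a weighted metric under a total-rate constraint.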

3D Augmented Reality Streaming System Based on a Lamina Display

  • Baek, Hogil;Park, Jinwoo;Kim, Youngrok;Park, Sungwoong;Choi, Hee-Jin;Min, Sung-Wook
    • Current Optics and Photonics / v.5 no.1 / pp.32-39 / 2021
  • We propose a three-dimensional (3D) streaming system based on a lamina display that can convey field information in real-time by creating floating 3D images that can satisfy the accommodation cue. The proposed system is mainly composed of three parts, namely: a 3D vision camera unit to obtain and provide RGB and depth data in real-time, a 3D image engine unit to realize the 3D volume with a fast response time by using the RGB and depth data, and an optical floating unit to bring the implemented 3D image out of the system and consequently increase the sense of presence. Furthermore, we devise the streaming method required for implementing augmented reality (AR) images by using a multilayered image, and the proposed method for implementing AR 3D video in real-time non-face-to-face communication has been experimentally verified.

Impact of playout buffer dynamics on the QoE of wireless adaptive HTTP progressive video

  • Xie, Guannan;Chen, Huifang;Yu, Fange;Xie, Lei
    • ETRI Journal / v.43 no.3 / pp.447-458 / 2021
  • The quality of experience (QoE) of video streaming is degraded by playback interruptions, which can be mitigated by the playout buffers of end users. To analyze the impact of playout buffer dynamics on the QoE of wireless adaptive hypertext transfer protocol (HTTP) progressive video, we model the playout buffer as a G/D/1 queue with an arbitrary packet arrival rate and a deterministic service time. Because all video packets within a block must be available in the playout buffer before that block is decoded, playback interruption can occur even when the playout buffer is non-empty. We analyze the queue-length evolution of the playout buffer using a diffusion approximation. Closed-form expressions for user-perceived video quality are derived in terms of the buffering delay, playback duration, and interruption probability for an infinite buffer size, and in terms of the packet loss and re-buffering probabilities for a finite buffer size. Simulation results verify our theoretical analysis and reveal that the impact of playout buffer dynamics on QoE is content dependent, which can contribute to the design of QoE-driven wireless adaptive HTTP progressive video management.
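
The block-level stall behavior described above can be illustrated with a toy simulation: packets arrive randomly, and one full block must be buffered before each playback slot can proceed, so stalls occur even with a non-empty buffer. This is a stand-in for the paper's G/D/1 diffusion analysis, and all parameter values are assumptions.

```python
import random

# Toy playout-buffer simulation (G/D/1-like): random packet arrivals,
# one block of `packets_per_block` packets consumed per playback slot.
# A stall occurs whenever a complete block is not yet buffered, even if
# the buffer holds some packets. Parameters are illustrative only.

def simulate(arrival_rate, packets_per_block, slots, seed=0):
    rng = random.Random(seed)
    buffered = 0
    stalls = 0
    for _ in range(slots):
        # Arrivals approximated by binomial thinning with the given mean rate.
        buffered += sum(rng.random() < arrival_rate / 10 for _ in range(10))
        if buffered >= packets_per_block:
            buffered -= packets_per_block   # decode and play one block
        else:
            stalls += 1                     # interruption: block incomplete
    return stalls / slots

stall_prob_slow = simulate(arrival_rate=3, packets_per_block=5, slots=10_000)
stall_prob_fast = simulate(arrival_rate=8, packets_per_block=5, slots=10_000)
```

As expected, a mean arrival rate below the block consumption rate produces frequent interruptions, while a higher rate makes them rare.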

Pattern Similarity Retrieval of Data Sequences for Video Retrieval System (비디오 검색 시스템을 위한 데이터 시퀀스 패턴 유사성 검색)

  • Lee Seok-Lyong
    • The KIPS Transactions:PartD / v.13D no.3 s.106 / pp.347-356 / 2006
  • A video stream can be represented as a sequence of data points in a multidimensional space. In this paper, we introduce a trend vector that approximates the values of data points in a sequence and represents the moving trend of points in the sequence, and we present a pattern similarity matching method for data sequences using the trend vector. A sequence is partitioned into multiple segments, each of which is represented by a trend vector. Query processing is based on the comparison of these vectors instead of scanning the data elements of entire sequences. Using the trend vector, our method filters out irrelevant sequences from a database and finds sequences similar to a query. We have performed extensive experiments on synthetic sequences as well as video streams. Experimental results show that, compared with an existing method, the precision of our method is up to 2.1 times higher and the processing time is reduced by up to 45%.
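
The segment-and-summarize idea above can be sketched for a 1-D sequence: each fixed-length segment is reduced to a (mean, trend) pair, and sequences are compared vector-by-vector instead of element-by-element. The segment length and the exact trend definition (last value minus first) are assumptions for illustration; the paper's trend vectors are defined over multidimensional points.

```python
# Illustrative trend-vector sketch: summarize each fixed-length segment of a
# sequence by its mean value and its moving trend (last - first), then compare
# two sequences by the distance between corresponding trend vectors.

def trend_vectors(seq, seg_len):
    vecs = []
    for i in range(0, len(seq) - seg_len + 1, seg_len):
        seg = seq[i:i + seg_len]
        vecs.append((sum(seg) / seg_len, seg[-1] - seg[0]))  # (mean, trend)
    return vecs

def distance(v1, v2):
    # Sum of Euclidean distances between corresponding trend vectors.
    return sum(((a - c) ** 2 + (b - d) ** 2) ** 0.5
               for (a, b), (c, d) in zip(v1, v2))

query = [1, 2, 3, 4, 5, 6, 7, 8]
close = [1, 2, 3, 4, 5, 6, 7, 9]   # nearly identical pattern
far   = [8, 7, 6, 5, 4, 3, 2, 1]   # opposite trend

q, c, f = (trend_vectors(s, 4) for s in (query, close, far))
```

A database scan would first compare these compact vectors and only examine raw elements of the sequences that survive the filter.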

Improvement of point cloud data using 2D super resolution network (2D super resolution network를 이용한 Point Cloud 데이터 개선)

  • Park, Seong-Hwan;Kim, Kyu-Heon
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2021.06a / pp.16-18 / 2021
  • Media technologies have evolved toward giving users a greater sense of immersion. Following this trend, media that use 3D spatial data, such as augmented reality and virtual reality, which convey a sense of depth beyond conventional 2D images, have been attracting attention. A point cloud is a data format composed of numerous points with 3D coordinates, representing 3D media through the coordinate and color information of each point. Unlike a 2D image with a fixed resolution, the data volume of a point cloud varies with the number of points; to compress point clouds with existing video codecs, the international standards body MPEG (Moving Picture Experts Group) established Video-based Point Cloud Compression (V-PCC). V-PCC decomposes 3D point cloud data into 2D patches using orthogonal plane vectors, arranges these patches into 2D images, and compresses them with an existing 2D video codec. This paper proposes a method to improve the quality of 3D point clouds by applying a super resolution network to the 2D patch images described above.
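
The pipeline stage targeted by the paper can be sketched as follows: V-PCC packs 3D points into 2D patch images, and those images are upscaled before reconstruction. A real system would insert a trained super-resolution network here; the nearest-neighbor upscale below is only a placeholder showing where the network slots in, with toy patch data.

```python
# Placeholder for the super-resolution step on a V-PCC 2D patch image:
# a nearest-neighbor upscale stands in for the trained SR network.

def upscale_patch(patch, factor):
    """Upscale a 2-D patch image (list of rows) by an integer factor."""
    out = []
    for row in patch:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

patch = [[10, 20],
         [30, 40]]        # toy 2x2 texture patch
sr_patch = upscale_patch(patch, 2)   # 4x4 result
```

In the proposed method, the upscaled patch images would then feed the standard V-PCC reconstruction to recover a denser point cloud.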

Spatial-temporal Ensemble Method for Action Recognition (행동 인식을 위한 시공간 앙상블 기법)

  • Seo, Minseok;Lee, Sangwoo;Choi, Dong-Geol
    • The Journal of Korea Robotics Society / v.15 no.4 / pp.385-391 / 2020
  • As deep learning technology has developed and spread to various fields, applications for recognizing human behavior are gradually shifting from single-image inputs to video inputs with a time axis. However, unlike 2D CNNs on single images, 3D CNNs on video incur a very large increase in computation and parameters due to the added time axis, so improving accuracy in action recognition is more difficult than in single-image tasks. To address this problem, we investigate and analyze techniques that improve the performance of 3D CNN-based recognition without additional training time or parameters. We propose a time-axis ensemble, which exploits the temporal dimension that exists only in video, together with an ensemble over the input frames. A combination of these techniques achieves an accuracy improvement of up to 7.1% over the existing performance. We also reveal the trade-off between computational cost and accuracy.
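
A time-axis ensemble of the kind described above can be sketched by scoring several temporal crops of a clip and averaging the class scores. The crop scheme and the mock scorer below are illustrative assumptions standing in for the trained 3D-CNN.

```python
# Minimal time-axis ensemble sketch: run a (mocked) clip scorer on several
# temporal crops and average the per-class scores before taking the argmax.

def temporal_crops(frames, clip_len, stride):
    return [frames[i:i + clip_len]
            for i in range(0, len(frames) - clip_len + 1, stride)]

def ensemble_predict(frames, scorer, clip_len=4, stride=2):
    crops = temporal_crops(frames, clip_len, stride)
    n_classes = len(scorer(crops[0]))
    avg = [0.0] * n_classes
    for crop in crops:
        for k, score in enumerate(scorer(crop)):
            avg[k] += score / len(crops)
    return avg.index(max(avg))          # argmax of averaged scores

# Mock scorer: score for "class 1" grows with the mean frame value.
mock_scorer = lambda clip: [1.0, sum(clip) / len(clip)]
pred = ensemble_predict(list(range(10)), mock_scorer)
```

Because only inference passes are averaged, this adds no training time or parameters, matching the constraint stated in the abstract, at the cost of extra computation per prediction.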

Geocoding of the Free Stereo Mosaic Image Generated from Video Sequences (비디오 프레임 영상으로부터 제작된 자유 입체 모자이크 영상의 실좌표 등록)

  • Noh, Myoung-Jong;Cho, Woo-Sug;Park, Jun-Ku;Kim, Jung-Sub;Koh, Jin-Woo
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.29 no.3 / pp.249-255 / 2011
  • A free-stereo mosaic image can be generated without GPS/INS or ground control data by using relative orientation parameters in a 3D model coordinate system whose origin lies in one reference frame image. A 3D coordinate calculated from conjugate points on the free-stereo mosaic images is therefore expressed in the 3D model coordinate system. To determine 3D coordinates in the absolute coordinate system from conjugate points on the free-stereo mosaic images, a method is required for transforming model coordinates into absolute coordinates. Generally, a 3D similarity transformation is used to convert between 3D coordinate systems. However, the error of the model coordinates in the free-stereo mosaic images grows non-linearly with distance from the origin, so they are difficult to transform into absolute coordinates with a linear transformation, and a non-linear transformation is needed instead. A method is also needed for resampling the free-stereo mosaic image into a geo-stereo mosaic image so that digital maps and the stereo mosaic images can be overlaid in absolute coordinates. In this paper, we propose a 3D non-linear transformation that converts the 3D model coordinates of the free-stereo mosaic image into 3D absolute coordinates, and a 2D non-linear transformation, based on the 3D one, that converts the free-stereo mosaic image into a geo-stereo mosaic image.
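
The non-linear model-to-absolute mapping argued for above can be illustrated with a second-order polynomial transformation fitted by least squares from control points. The polynomial basis and the synthetic points are assumptions for illustration; the paper's actual transformation model may differ.

```python
import numpy as np

# Sketch: fit a second-order polynomial mapping from 3D model coordinates
# to 3D absolute coordinates using least squares over control points.
# The basis and the synthetic distortion are illustrative assumptions.

def poly_basis(xyz):
    x, y, z = xyz.T
    return np.column_stack([np.ones_like(x), x, y, z,
                            x * x, y * y, z * z,
                            x * y, y * z, x * z])

rng = np.random.default_rng(0)
model_pts = rng.uniform(-1, 1, size=(50, 3))
# Synthetic "ground truth": a mildly non-linear distortion of model space.
abs_pts = model_pts * 2.0 + 0.1 * model_pts ** 2 + np.array([10.0, 20.0, 30.0])

A = poly_basis(model_pts)
coeffs, *_ = np.linalg.lstsq(A, abs_pts, rcond=None)   # 10x3 coefficient matrix
pred = poly_basis(model_pts) @ coeffs
rmse = float(np.sqrt(np.mean((pred - abs_pts) ** 2)))
```

A linear similarity transformation could not absorb the quadratic distortion term here, which is the motivation the abstract gives for a non-linear model.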

An Atlas Generation Method with Tiny Blocks Removal for Efficient 3DoF+ Video Coding (효율적인 3DoF+ 비디오 부호화를 위한 작은 블록 제거를 통한 아틀라스 생성 기법)

  • Lim, Sung-Gyun;Kim, Hyun-Ho;Kim, Jae-Gon
    • Journal of Broadcast Engineering / v.25 no.5 / pp.665-671 / 2020
  • MPEG-I is actively working on standardizing the coding of immersive video, which provides up to six degrees of freedom (6DoF) of viewpoint. 3DoF+ video, which adds motion parallax to the omnidirectional view of 360 video, renders a view at any desired viewpoint from multiple view videos acquired in a limited 3D space covering upper-body motion at a fixed position. In developing the 3DoF+ video coding standard, the MPEG-I visual group is building a test model called TMIV (Test Model for Immersive Video). TMIV removes the redundancy among the set of input view videos and generates several atlases by packing patches containing the remaining texture and depth regions into frames as compactly as possible before coding. This paper presents an atlas generation method that removes small blocks from the atlas for more efficient 3DoF+ video coding. Compared to TMIV, the proposed method achieves BD-rate bit savings of 0.7% and 1.4% on natural and graphic sequences, respectively.
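
The tiny-block removal step described above can be sketched as a filter over patch candidates before atlas packing. The patch representation and the 16-pixel area threshold below are illustrative assumptions, not values from the paper or from TMIV.

```python
# Sketch of tiny-block removal before atlas packing: drop patches whose
# area falls below a threshold so the atlas packs fewer, larger regions.
# Patch data and the threshold are illustrative assumptions.

def remove_tiny_blocks(patches, min_area=16):
    return [p for p in patches if p["w"] * p["h"] >= min_area]

patches = [
    {"id": 0, "w": 64, "h": 32},
    {"id": 1, "w": 4,  "h": 2},   # tiny: pruned
    {"id": 2, "w": 8,  "h": 8},
    {"id": 3, "w": 3,  "h": 5},   # tiny: pruned
]
kept = remove_tiny_blocks(patches)
atlas_area = sum(p["w"] * p["h"] for p in kept)
```

Discarding such blocks trades a small amount of texture/depth coverage for a more compact, more codec-friendly atlas, which is the source of the reported BD-rate savings.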

Effect of Input Data Video Interval and Input Data Image Similarity on Learning Accuracy in 3D-CNN

  • Kim, Heeil;Chung, Yeongjee
    • International Journal of Internet, Broadcasting and Communication / v.13 no.2 / pp.208-217 / 2021
  • 3D-CNN is one of the deep learning techniques for learning from time-series data. However, three-dimensional learning generates many parameters, demanding high-performance hardware or significantly slowing training. We use a 3D-CNN to learn hand gestures, find the parameters that yield the highest accuracy, and then analyze how the accuracy varies with changes to the input data, without any structural changes to the 3D-CNN. First, we choose the interval of the input data, adjusting the ratio of the stop interval to the gesture interval. Second, we measure and normalize the similarity of images through inter-class 2D cross-correlation analysis to obtain the corresponding inter-frame mean value. The experiments demonstrate that changes in the input data affect learning accuracy without structural changes to the 3D-CNN. In this paper, we propose these two methods for changing the input data, and the experimental results show that the input data can affect the accuracy of the model.
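
The image-similarity measure used in the second method can be illustrated with a normalized cross-correlation between two flattened frames. The toy frame contents are assumptions; the paper applies the analysis between classes over whole 2-D frames.

```python
# Normalized cross-correlation between two flattened frames: +1 for
# identical patterns (up to scale/offset), -1 for opposite patterns.
# Frame contents are toy data for illustration.

def ncc(a, b):
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) *
           sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

frame1 = [1, 2, 3, 4, 5, 6]
frame2 = [2, 4, 6, 8, 10, 12]   # same pattern, scaled: perfectly correlated
frame3 = [6, 5, 4, 3, 2, 1]     # reversed pattern: anti-correlated

sim_same = ncc(frame1, frame2)
sim_opp  = ncc(frame1, frame3)
```

Averaging such similarity scores across frame pairs gives the kind of inter-frame mean value the abstract uses to characterize the input data.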

Multi-view Rate Control based on HEVC for 3D Video Services

  • Lim, Woong;Lee, Sooyoun
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.8 / pp.245-249 / 2013
  • In this paper, we propose two rate control algorithms for the multi-view extension of HEVC, based on the two rate control algorithms adopted in HEVC, and analyze their multi-view rate control performance. The proposed rate controls are designed on the HEVC-based multi-view video coding (MV-HEVC) platform, with consideration of high-level syntax, inter-view prediction, etc., and apply the algorithms based on the URQ (Unified Rate-Quantization) and R-lambda models adopted in HEVC not only to the base view but also to the extended views. They also perform view-wise target bit allocation to preserve compatibility with the base view. By allocating target bitrates per view, the proposed URQ-based multi-view rate control achieved about 1.83% average bitrate accuracy with 1.73 dB average PSNR degradation, and the R-lambda-based rate control achieved about 2.97% average bitrate accuracy with 0.31 dB average PSNR degradation.
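
The view-wise target bit allocation mentioned above can be sketched as a weighted split of the total bitrate budget, with the base view favored so that it remains decodable on its own. The 2:1:1 weighting and the 3000 kbps budget are illustrative assumptions, not values from the paper.

```python
# Sketch of view-wise target bit allocation for MV-HEVC rate control:
# split a total bitrate budget across views by weight, favoring the base
# view. Weights and budget are illustrative assumptions.

def allocate_bits(total_kbps, weights):
    s = sum(weights)
    return [total_kbps * w / s for w in weights]

# Base view plus two extended views, base view weighted 2x.
targets = allocate_bits(total_kbps=3000, weights=[2, 1, 1])
```

Each per-view target would then drive the URQ or R-lambda model to pick quantization parameters for that view's frames.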