• Title/Summary/Keyword: Object-based encoding


An MPEG-4 Compliant Interactive Multimedia Streaming Platform Using Overlay Networks

  • Kim, Hyun-Cheol;Patrikakis, Charalampos Z.;Minogiannis, Nikos;Karamolegkos, Pantelis N.;Lambiris, Alex;Kim, Kyu-Heon
    • ETRI Journal / v.28 no.4 / pp.411-424 / 2006
  • This paper presents a multimedia streaming platform for efficiently transmitting MPEG-4 content over IP networks. The platform includes an MPEG-4 compliant streaming server and client, supporting object-based representation of multimedia scenes, interactivity, and advanced encoding profiles defined by the ISO standard. For scalability, we employ an application-layer multicast scheme for media transmission using overlay networks. The overlay network, governed by the central entity of the network distribution manager, is dynamically deployed according to a set of pre-defined criteria, and supports both broadcast delivery and video-on-demand content. The multimedia streaming platform is standards-compliant and uses widespread multimedia protocols such as MPEG-4, the real-time transport protocol, the real-time transport control protocol, and the real-time streaming protocol. The overlay network was designed to be transparent to both the streaming server and the client. As a result, many commercial implementations that use industry-standard protocols can be plugged into the architecture relatively painlessly and enjoy the benefits of the platform.


A Study about Learning Graph Representation on Farmhouse Apple Quality Images with Graph Transformer (그래프 트랜스포머 기반 농가 사과 품질 이미지의 그래프 표현 학습 연구)

  • Ji Hun Bae;Ju Hwan Lee;Gwang Hyun Yu;Gyeong Ju Kwon;Jin Young Kim
    • Smart Media Journal / v.12 no.1 / pp.9-16 / 2023
  • Recently, convolutional neural network (CNN) based systems have been developed to overcome the limitations of human resources in farmhouse apple quality classification. However, since convolutional neural networks accept only images of a fixed size, preprocessing such as resampling may be required, and oversampling causes loss of information from the original image, such as quality degradation and blurring. In this paper, to minimize these problems, we generate an image-patch-based graph from the original image and propose a random-walk-based positional encoding method for the graph transformer model. The method learns position embedding information for patches, which carry no inherent positional information, based on the random walk algorithm, and finds the optimal graph structure by aggregating useful node information through the self-attention mechanism of the graph transformer. As a result, it is robust and performs well even on new graph structures with random node ordering and on arbitrary graph structures determined by the location of an object in the image. In experiments on five apple quality datasets, the learning accuracy was higher than that of other GNN models by 1.3% to 4.7%, and the number of parameters was 3.59M, about 15% of the 23.52M of the ResNet18 model. The model therefore achieves fast inference owing to the reduced computation, demonstrating its effectiveness.
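The random-walk positional encoding mentioned in this abstract can be illustrated with a common formulation, in which each patch node is described by its probability of returning to itself after 1..k steps of a random walk on the patch graph. This is a generic sketch of that idea, not the paper's actual code:

```python
import numpy as np

def random_walk_pe(adj, k_steps=4):
    """Random-walk positional encoding: for each node, the probability of
    returning to itself after 1..k_steps steps of a random walk."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # Row-normalised transition matrix M = D^-1 A (guard isolated nodes).
    m = adj / np.where(deg == 0, 1, deg)[:, None]
    pe, mk = [], np.eye(len(adj))
    for _ in range(k_steps):
        mk = mk @ m
        pe.append(np.diag(mk))       # return probabilities after k steps
    return np.stack(pe, axis=1)      # shape: (num_nodes, k_steps)

# Toy graph of 4 image patches connected in a path: 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
pe = random_walk_pe(A, k_steps=3)
```

Because the encoding depends only on each node's local walk structure, it is invariant to the node ordering, which matches the abstract's claim of robustness to random node order.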

MPEG-H 3D Audio Decoder Structure and Complexity Analysis (MPEG-H 3D 오디오 표준 복호화기 구조 및 연산량 분석)

  • Moon, Hyeongi;Park, Young-cheol;Lee, Yong Ju;Whang, Young-soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.42 no.2 / pp.432-443 / 2017
  • The primary goal of the MPEG-H 3D Audio standard is to provide immersive audio for high-resolution broadcasting services such as UHDTV. The standard incorporates a wide range of technologies, such as encoding/decoding of multi-channel/object/scene-based signals, rendering for 3D audio in various playback environments, and post-processing. The reference software decoder of the standard combines several modules and can operate in various modes, but because each module is an independent executable run sequentially, real-time decoding is impossible. In this paper, we build DLL libraries of the standard's core decoder, format converter, object renderer, and binaural renderer and integrate them to enable frame-based decoding. In addition, by measuring the computational complexity of each mode of the MPEG-H 3D Audio decoder, the paper provides a reference for selecting an appropriate decoding mode for various hardware platforms. The measurements show that the low-complexity profiles included in the Korean broadcasting standard require 2.8 to 12.4 times the computation of the QMF synthesis operation when rendering to channel signals, and 4.1 to 15.3 times when rendering to binaural signals.

Exploring the contextual factors of episodic memory: dissociating distinct social, behavioral, and intentional episodic encoding from spatio-temporal contexts based on medial temporal lobe-cortical networks (일화기억을 구성하는 맥락 요소에 대한 탐구: 시공간적 맥락과 구분되는 사회적, 행동적, 의도적 맥락의 내측두엽-대뇌피질 네트워크 특징을 중심으로)

  • Park, Jonghyun;Nah, Yoonjin;Yu, Sumin;Lee, Seung-Koo;Han, Sanghoon
    • Korean Journal of Cognitive Science / v.33 no.2 / pp.109-133 / 2022
  • Episodic memory consists of a core event and its associated contexts. Although the role of the hippocampus and its neighboring regions in contextual representation during encoding has become increasingly evident, it remains unclear how these regions handle context-specific information other than spatio-temporal contexts. Using high-resolution functional MRI, we explored the involvement of the medial temporal lobe (MTL) and cortical regions during the encoding of various types of contextual information (i.e., the journalism principle 5W1H): "Who did it?," "Why did it happen?," "What happened?," "When did it happen?," "Where did it happen?," and "How did it happen?" Participants answered six different contextual questions while viewing simple experimental events consisting of two faces and one object on the screen. The MTL was divided into sub-regions by hierarchical clustering of resting-state data. General linear model analyses revealed stronger activation of MTL sub-regions, the prefrontal cortex (PFC), and the inferior parietal lobule (IPL) during social (Who), behavioral (How), and intentional (Why) contextual processing than during spatio-temporal (Where/When) contextual processing. To further investigate the functional networks involved in this dissociation, a multivariate pattern analysis was conducted with features selected from the task-based connectivity links between the hippocampal subfields and the PFC/IPL. Social, behavioral, and intentional contextual processing were each successfully classified from spatio-temporal contextual processing. Thus, specific contexts in episodic memory, namely social, behavioral, and intentional contexts, involve functional connectivity patterns distinct from those for spatio-temporal contextual memory.

Cancellation of MRI Artifact due to Rotational Motion (회전운동에 기인한 MRI 아티팩트의 제거)

  • 김응규
    • Journal of KIISE:Software and Applications / v.31 no.4 / pp.411-419 / 2004
  • When the imaging object rotates in the image plane during an MRI scan, the rotation causes phase errors and non-uniform sampling of the MRI signal. A model of the problem, including the phase error and non-uniform sampling, shows that MRI signals corrupted by rotation about an arbitrary center and about the origin of the image plane differ in their phases. Accordingly, the following methods are presented to improve the quality of MR images containing this artifact. First, assuming the 2-D rotation angle is known but the rotation center is unknown, an artifact-correction algorithm based on phase correction is presented. Second, for 2-D rotational motion with both an unknown center and an unknown angle, an algorithm to correct the MRI artifact is presented. In this case, the energy of an ideal MR image is minimal outside the boundary of the imaging object, and the measured energy increases when the object rotates; using this property, an evaluation function is defined to estimate the unknown rotation angle at each phase-encoding step. Finally, the effectiveness of the presented techniques is shown using a phantom image with simulated motion and a real image with 2-D translational shift and rotation.
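The energy-based angle estimation described in this abstract can be sketched as a grid search: de-rotate the corrupted image by each candidate angle and keep the angle that minimizes the energy outside the object's support. The nearest-neighbour rotation and all names here are illustrative, not the paper's implementation:

```python
import numpy as np

def rotate_nn(img, angle_deg):
    """Nearest-neighbour rotation about the image centre (pure NumPy)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    th = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find its source coordinate.
    sy = cy + (ys - cy) * np.cos(th) - (xs - cx) * np.sin(th)
    sx = cx + (ys - cy) * np.sin(th) + (xs - cx) * np.cos(th)
    sy, sx = np.rint(sy).astype(int), np.rint(sx).astype(int)
    ok = (sy >= 0) & (sy < h) & (sx >= 0) & (sx < w)
    out = np.zeros_like(img)
    out[ok] = img[sy[ok], sx[ok]]
    return out

def estimate_angle(corrupted, support, candidates):
    """Pick the de-rotation angle that minimises energy outside the support."""
    def outside_energy(a):
        return float((rotate_nn(corrupted, -a) ** 2 * (~support)).sum())
    return min(candidates, key=outside_energy)

# Phantom: a bright square, rotated by an 'unknown' 20 degrees.
phantom = np.zeros((64, 64)); phantom[20:44, 20:44] = 1.0
support = np.zeros((64, 64), dtype=bool); support[18:46, 18:46] = True
corrupted = rotate_nn(phantom, 20)
best = estimate_angle(corrupted, support, [0, 20, 40])
```

At the wrong candidates the rotated square's corners stick out of the support mask and contribute energy; at the correct angle the de-rotated square falls back inside it.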

An Algorithm for the Multi-view Image Improvement with the Restricted Number of Images in Texture Extraction (텍스쳐 추출시 제한된 수의 참여 영상을 이용한 Multi-view 영상 개선 알고리듬)

  • 김도현;양영일
    • Journal of Korea Multimedia Society / v.3 no.1 / pp.34-40 / 2000
  • In this paper, we propose an efficient multi-view image coding algorithm that finds the optimal texture from a restricted number of multi-view images. The X-Y plane of the normalized object space is divided into triangular patches. The depth of each node is determined by applying a block-based disparity compensation method, and the texture of each patch is then extracted by applying an affine-transformation-based disparity compensation method to the multi-view images. Compared with traditional methods, which use all of the multi-view images in texture extraction, the proposed method reduces the number of images needed to determine the texture. Experimental results show that images encoded by the proposed algorithm achieve an SNR approximately 0.2 dB higher than images encoded by the traditional method on the multi-view test sets dragon, santa, city, and kid, and image data recovered after encoding with the proposed method also show better visual results than with the traditional method.


Semantic Segmentation of Drone Images Based on Combined Segmentation Network Using Multiple Open Datasets (개방형 다중 데이터셋을 활용한 Combined Segmentation Network 기반 드론 영상의 의미론적 분할)

  • Ahram Song
    • Korean Journal of Remote Sensing / v.39 no.5_3 / pp.967-978 / 2023
  • This study proposed and validated a combined segmentation network (CSN) designed to train effectively on multiple drone image datasets and enhance the accuracy of semantic segmentation. CSN shares the entire encoding domain to accommodate the diversity of three drone datasets, while the decoding domains are trained independently. During training, the segmentation accuracy of CSN was lower than that of U-Net and the pyramid scene parsing network (PSPNet) on single datasets because it considers the loss values of all datasets simultaneously. However, when applied to domestic autonomous drone images, CSN demonstrated the ability to classify pixels into appropriate classes without additional training, outperforming PSPNet. This research suggests that CSN can serve as a valuable tool for training effectively on diverse drone image datasets and improving object recognition accuracy in new regions.
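The CSN training idea in this abstract — one shared encoding domain, one decoding domain per dataset, with the losses of all datasets summed into a single training objective — can be sketched minimally. The scalar stand-ins below are purely illustrative; the actual CSN uses convolutional networks:

```python
def csn_total_loss(encoder, decoders, batches, loss_fn):
    """One shared encoder, one decoder per dataset; the training loss is
    the sum of the per-dataset decoder losses (hence lower single-dataset
    accuracy during training, as the abstract notes)."""
    total = 0.0
    for name, (x, y) in batches.items():
        z = encoder(x)                              # shared encoding domain
        total += loss_fn(decoders[name](z), y)      # independent decoding domain
    return total

# Toy stand-ins: dataset names and functions are illustrative.
encoder = lambda x: 2 * x
decoders = {"urban": lambda z: z + 1, "rural": lambda z: z - 1}
batches = {"urban": (1, 3), "rural": (2, 3)}
total = csn_total_loss(encoder, decoders, batches, lambda p, y: abs(p - y))
```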

Fast Game Encoder Based on Scene Descriptor for Gaming-on-Demand Service (주문형 게임 서비스를 위한 장면 기술자 기반 고속 게임 부호화기)

  • Jeon, Chan-Woong;Jo, Hyun-Ho;Sim, Dong-Gyu
    • Journal of Korea Multimedia Society / v.14 no.7 / pp.849-857 / 2011
  • Gaming on demand (GOD) lets people play games by encoding and transmitting the game screen at a server and decoding the video at a client. In this paper, we propose a fast game video encoder for serving multiple users with low-powered devices over a network. In the proposed system, the computational complexity of the game encoder is reduced by using scene descriptors, which consist of object motion vectors, global motion, and scene-change information. With this additional information from the game engine, the proposed encoder does not need to perform complex processes such as motion estimation and rate-distortion optimization; both are skipped based on the scene descriptors. We found that the proposed method improved encoding speed by 192% in terms of FPS compared with the x264 software, and with partial assembly code we further improved coding speed by 86% in terms of FPS. The proposed fast encoder can encode at over 60 FPS for real-time GOD applications.
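The descriptor-driven mode decision described in this abstract can be sketched as follows. The field names and mode labels are illustrative, not the paper's actual data structures: the point is that per-block coding decisions come directly from engine-supplied hints instead of a motion search or rate-distortion loop:

```python
from dataclasses import dataclass

@dataclass
class SceneDescriptor:
    """Per-frame hints exported by the game engine (names are illustrative)."""
    scene_change: bool
    global_motion: tuple    # camera pan (dx, dy)
    object_motion: dict     # block index -> (dx, dy) for moving objects

def choose_block_modes(num_blocks, desc):
    """Pick a coding decision per block from the descriptor alone,
    skipping the usual motion-estimation / rate-distortion loop."""
    if desc.scene_change:
        return ["INTRA"] * num_blocks   # new scene: no useful reference frame
    modes = []
    for b in range(num_blocks):
        mv = desc.object_motion.get(b, desc.global_motion)
        modes.append("SKIP" if mv == (0, 0) else f"INTER{mv}")
    return modes

# Static camera, one moving object covering block 2.
modes = choose_block_modes(4, SceneDescriptor(False, (0, 0), {2: (3, -1)}))
```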

A Frequency Characteristic Analysis of Fringe Images (Fringe 영상의 주파수 특성 분석)

  • Seo Young-Ho;Choi Hyun-Jun;Kim Dong-Wook
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.11C / pp.1053-1059 / 2005
  • A computer-generated hologram (CGH) designs and produces, with computer software rather than the optical sensing of light interference, the digital information for generating a 3-D image, and it can synthesize a virtual object that does not physically exist. Since a digital hologram contains a large amount of data, as seen in the digitization process, the data representing it must be reduced for storage, transmission, and processing. As efforts to handle holograms as digital information have increased, various methods to compress the digital hologram, called a fringe pattern, have been explored; a suitable approach is hologram encoding. In this paper, we analyze the properties of CGHs with frequency-transform tools, treating a generated CGH as a 2-D image and using the DWT, which is known to be a better frequency-transform tool than the DCT. The compression and reconstruction results from the wavelet-based codecs show better reconstruction properties at up to 2 times higher compression rates than the previous work of Yoshikawa [2] and Thomas [3].
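As an illustration of the wavelet side of this comparison, a single-level 2-D Haar DWT with perfect reconstruction can be written in a few lines; this is a generic sketch, not the paper's codec:

```python
import numpy as np

def haar2d(x):
    """One level of a 2-D Haar DWT, returning the LL, LH, HL, HH sub-bands."""
    a = (x[0::2] + x[1::2]) / 2.0          # row averages
    d = (x[0::2] - x[1::2]) / 2.0          # row details
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d (exact reconstruction)."""
    h, w = ll.shape
    a = np.empty((h, 2 * w)); d = np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    x = np.empty((2 * h, 2 * w))
    x[0::2], x[1::2] = a + d, a - d
    return x

# Stand-in for a fringe pattern; a codec would quantize/threshold the
# sub-bands between these two calls.
rng = np.random.default_rng(0)
fringe = rng.standard_normal((8, 8))
ll, lh, hl, hh = haar2d(fringe)
recon = ihaar2d(ll, lh, hl, hh)
```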

A Study on the Deep Neural Network based Recognition Model for Space Debris Vision Tracking System (심층신경망 기반 우주파편 영상 추적시스템 인식모델에 대한 연구)

  • Lim, Seongmin;Kim, Jin-Hyung;Choi, Won-Sub;Kim, Hae-Dong
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.45 no.9 / pp.794-806 / 2017
  • As a space development country, it is essential to safely protect national space assets and the space environment from continuously increasing space debris, and Active Debris Removal (ADR) is the most active way to address this problem. In this paper, we study an artificial neural network (ANN) as a stable recognition model for a vision-based space debris tracking system. We obtained simulated images of the space environment from KARICAT, the ground-based space-debris-removal satellite testbed developed by the Korea Aerospace Research Institute, and created vectors encoding the structure- and color-based features of each object after image segmentation by depth discontinuity. The feature vector consists of the 3D surface area, the principal vectors of the point cloud, and 2D shape and color information. We designed an artificial neural network model based on the separated feature vectors. To improve performance, the model is divided according to the categories of the input feature vectors, and an ensemble technique is applied across the models. As a result, we confirmed the performance improvement of the recognition model through the ensemble technique.
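The ensemble over per-category models can be sketched as simple soft voting, averaging each model's class probabilities; the lambdas below are illustrative stand-ins for the trained ANNs:

```python
import numpy as np

def ensemble_predict(models, features):
    """Average the class-probability outputs of several models and pick the
    arg-max class (a simple soft-voting ensemble)."""
    probs = np.mean([m(features) for m in models], axis=0)
    return int(np.argmax(probs))

# Three toy 'models' (e.g. one per feature category: surface area, point
# cloud, shape/color), each returning probabilities over 2 classes.
m_surface = lambda f: np.array([0.6, 0.4])
m_cloud   = lambda f: np.array([0.3, 0.7])
m_shape   = lambda f: np.array([0.2, 0.8])
pred = ensemble_predict([m_surface, m_cloud, m_shape], None)
```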