• Title/Summary/Keyword: 데이터 레벨 병렬성

Search Result 13, Processing Time 0.023 seconds

Data Level Parallelism for H.264/AVC Decoder on a Multi-Core Processor and Performance Analysis (멀티코어 프로세서에서의 H.264/AVC 디코더를 위한 데이터 레벨 병렬화 성능 예측 및 분석)

  • Cho, Han-Wook;Jo, Song-Hyun;Song, Yong-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.8
    • /
    • pp.102-116
    • /
    • 2009
  • There have been lots of researches for H.264/AVC performance enhancement on a multi-core processor. The enhancement has been performed through parallelization methods. Parallelization methods can be classified into a task-level parallelization method and a data level parallelization method. A task-level parallelization method for H.264/AVC decoder is implemented by dividing H.264/AVC decoder algorithms into pipeline stages. However, it is not suitable for complex and large bitstreams due to poor load-balancing. Considering load-balancing and performance scalability, we propose a horizontal data level parallelization method for H.264/AVC decoder in such a way that threads are assigned to macroblock lines. We develop a mathematical performance expectation model for the proposed parallelization methods. For evaluation of the mathematical performance expectation, we measured the performance with JM 13.2 reference software on ARM11 MPCore Evaluation Board. The cycle-accurate measurement with SoCDesigner Co-verification Environment showed that expected performance and performance scalability of the proposed parallelization method was accurate in relatively high level

Implementation of SIMD-based Many-Core Processor for Efficient Image Data Processing (효율적인 영상데이터 처리를 위한 SIMD기반 매니코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.1
    • /
    • pp.1-9
    • /
    • 2011
  • Recently, as mobile multimedia devices are used more and more, the needs for high-performance and low-energy multimedia processors are increasing. Application-specific integrated circuits (ASIC) can meet the needed high performance for mobile multimedia, but they provide limited, if any, generality needed for various application requirements. DSP based systems can used for various types of applications due to their generality, but they require higher cost and energy consumption as well as less performance than ASICs. To solve this problem, this paper proposes a single instruction multiple data (SIMD) based many-core processor which supports high-performance and low-power image data processing while keeping generality. The proposed SIMD based many-core processor composed of 16 processing elements (PEs) exploits large data parallelism inherent in image data processing. Experimental results indicate that the proposed SIMD-based many-core processor higher performance (22 times better), energy efficiency (7 times better), and area efficiency (3 times better) than conversional commercial high-performance processors.

Parallel Method for HEVC Deblocking Filter based on Coding Unit Depth Information (코딩 유닛 깊이 정보를 이용한 HEVC 디블록킹 필터의 병렬화 기법)

  • Jo, Hyun-Ho;Ryu, Eun-Kyung;Nam, Jung-Hak;Sim, Dong-Gyu;Kim, Doo-Hyun;Song, Joon-Ho
    • Journal of Broadcast Engineering
    • /
    • v.17 no.5
    • /
    • pp.742-755
    • /
    • 2012
  • In this paper, we propose a parallel deblocking algorithm to resolve workload imbalance when the deblocking filter of high efficiency video coding (HEVC) decoder is parallelized. In HEVC, the deblocking filter which is one of the in-loop filters conducts two-step filtering on vertical edges first and horizontal edges later. The deblocking filtering can be conducted with high-speed through data-level parallelism because there is no dependency between adjacent edges for deblocking filtering processes. However, workloads would be imbalanced among regions even though the same amount of data for each region is allocated, which causes performance loss of decoder parallelization. In this paper, we solve the problem for workload imbalance by predicting the complexity of deblocking filtering with coding unit (CU) depth information at a coding tree block (CTB) and by allocating the same amount of workload to each core. Experimental results show that the proposed method achieves average time saving (ATS) by 64.3%, compared to single core-based deblocking filtering and also achieves ATS by 6.7% on average and 13.5% on maximum, compared to the conventional uniform data-level parallelism.

Performance Analysis of HEVC Decoder Parallelization based on Slice and Tile for Ultra-High Definition Video (초고해상도 비디오를 위한 분할 영상 기반 HEVC 복호화기 병렬화)

  • Son, SoHee;Baek, A-Ram;Choi, Haechul
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2016.06a
    • /
    • pp.359-360
    • /
    • 2016
  • 본 논문에서는 초고화질의 비디오 실시간 복호화를 위해 HEVC(High Efficiency Video Coding)에서 지원하는 병렬화 기술인 Slice와 Tile 기술을 이용하여 초고해상도 영상에 대한 복호화기 병렬화 성능을 비교한다. Slice와 Tile은 분할 데이터간 의존성이 존재하지 않으므로 분할된 데이터를 다중 스레드에 할당하여 데이터-레벨 병렬화를 수행하였다. 실험 결과에서는 병렬화된 복호화기 성능이 기존 순차 복호화기에 비해 최대 2.08배 고속화 되었고, 분할 데이터 수가 증가하여도 화질 손실이 거의 없는 결과를 보인다.

  • PDF

Design and Implementation of an Information Analyzer for Object-Oriented Program (객체지향 프로그램 정보 분석기 설계 및 구현)

  • 김운용;최영근
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1999.10a
    • /
    • pp.490-492
    • /
    • 1999
  • 본 논문에서는 객체지향 프로그램에 대한 프로그램 정보를 분석하여 이들간의 관계를 표현할 수 있는 방법을 제시한다. 현재까지 프로그램을 분석하고 표현하기 위한 그래프 표현으로 호출 그래프, 제어흐름 그래프 및 종속 그래프 등이 있으며 이를 이용하여 테스팅, 슬라이싱, 디버깅, 프로그램 이해, 병렬처리, 역공학과 같은 다양한 분야에 적용되고 있다. 본 논문에서는 객체지향 언어의 프로그램의 시각적 이해를 돕고, 분석에 필요한 정보를 표현하는 그래프들간의 관계성을 고려한 효율적인 분석기를 표현한다. 이를 위해 클래스, 상속관계, 호출관계, 제어흐름 및 데이터 종속관계를 고려하여 객체 지향언어 분석에 필요한 그래프 요소를 멤버함수레벨, 클래스 레벨, 모듈 클래스 레벨 단위로 추출하고 이들간의 정보를 저장소로 통합 구성한다. 이를 통해 기존의 특정목적을 위해 표현하는 그래프 표현 방식은 그래프간의 관계성과 분석정보의 독립성 그리고 재사용성의 특징을 가지는 통합 분석기로 구성될 수 있다. 이러한 특징은 프로그램의 이해와 정보의 관리효과를 증가시킬 수 있으며, 많은 소프트웨어 엔지니어링 도구와 기술들에 필요한 통합화된 정보를 제공하고 이용될 수 있을 것이다.

  • PDF

Parallel Accessed Mirroring based on Stripping (스트라이핑 기반의 병렬 접근 미러링 기법)

  • Kang, Dong-Jae;Kim, Chang-Soo;Shin, Bum-Joo;Kim, Hag-Young
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2002.11a
    • /
    • pp.539-542
    • /
    • 2002
  • 멀티미디어와 인터넷의 대중화가 야기한 급격한 데이터의 증가는 테라(Tera)바이트 이상의 대용량 저장공간과 대용량 정보의 효율적인 공유를 지원하는 스토리지 시스템을 요구하고 있으며 이를 위하여 SAN 기반의 스토리지 클러스터링 시스템들이 많이 사용되고 있다. 이러한 환경에서 하드웨어 또는 소프트웨어 RAID(Redundant Array of Independent Disks)는 대용량 정보의 고성능의 입출력과 신뢰성을 위해서 필수적이 되었다. 범용적인 RAID로는 RAE-0, RAID-1, RAID-5가 주로 사용되고 있으며 각각의 레벨은 장단점을 갖는다. 본 논문에서는 RAID-0와 RAID-1이 갖는 문제점들의 보완을 위하여 변형된 RAID 레벨인 RAID-SM을 제안한다. RAID-SM은 기존의 RAID-1이 가지는 데이터의 가용성을 유지하면서 추가적인 비용 없이 RAID-0의 우수한 입출력 성능을 얻기 위한 RAID-1의 변형된 방식이다. RAID-SM의 구현을 위하여 디스크상의 데이터의 배치 및 데이터 맵핑 탕식을 정의하고 RAID-SM에서의 I/O방법을 기술한다. 제안하는 RAID-SM은 멀티미디어나 GIS 데이터와 같은 읽기 연산 집약적인 시스템을 대상으로 하는 안정적인 레이드 방식이며 RAID-SM의 장점 및 성능은 본 논문에서의 실험을 통한 결과로서 제시한다.

  • PDF

Implementation of Multi-Core Processor for Beamforming Algorithm of Mobile Ultrasound Image Signals (모바일 초음파 영상신호의 빔포밍 알고리즘을 위한 멀티코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Jong-Myon
    • The KIPS Transactions:PartA
    • /
    • v.18A no.2
    • /
    • pp.45-52
    • /
    • 2011
  • In the past, a patient went to the room where an ultrasound image diagnosis device was set, and then he or she was examined by a doctor. However, currently a doctor can go and examine the patient with a handheld ultrasound device who stays in a room. However, it was implemented with only fundamental functions, and can not meet the high performance required by the focusing algorithm of ultrasound beam which determines the quality of ultrasound image. In addition, low energy consumption was satisfied for the mobile ultrasound device. To satisfy these requirements, this paper proposes a high-performance and low-power single instruction, multiple data (SIMD) based multi-core processor that supports a representative beamforming algorithm out of several focusing methods of mobile ultrasound image signals. The proposed SIMD multi-core processor, which consists of 16 processing elements (PEs), satisfies the high-performance required by the beamforming algorithm by exploiting considerable data-level parallelism inherent in the echo image data of ultrasound. Experimental results showed that the proposed multi-core processor outperforms a commercial high-performance processor, TI DSP C6416, in terms of execution time (15.8 times better), energy efficiency (6.9 times better), and area efficiency (10 times better).

A Distributed High Dimensional Indexing Structure for Content-based Retrieval of Large Scale Data (대용량 데이터의 내용 기반 검색을 위한 분산 고차원 색인 구조)

  • Cho, Hyun-Hwa;Lee, Mi-Young;Kim, Young-Chang;Chang, Jae-Woo;Lee, Kyu-Chul
    • Journal of KIISE:Databases
    • /
    • v.37 no.5
    • /
    • pp.228-237
    • /
    • 2010
  • Although conventional index structures provide various nearest-neighbor search algorithms for high-dimensional data, there are additional requirements to increase search performances as well as to support index scalability for large scale data. To support these requirements, we propose a distributed high-dimensional indexing structure based on cluster systems, called a Distributed Vector Approximation-tree (DVA-tree), which is a two-level structure consisting of a hybrid spill-tree and VA-files. We also describe the algorithms used for constructing the DVA-tree over multiple machines and performing distributed k-nearest neighbors (NN) searches. To evaluate the performance of the DVA-tree, we conduct an experimental study using both real and synthetic datasets. The results show that our proposed method contributes to significant performance advantages over existing index structures on difference kinds of datasets.

Implementation of an Optimal Many-core Processor for Beamforming Algorithm of Mobile Ultrasound Image Signals (모바일 초음파 영상신호의 빔포밍 기법을 위한 최적의 매니코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.8
    • /
    • pp.119-128
    • /
    • 2011
  • This paper introduces design space exploration of many-core processors that meet high performance and low power required by the beamforming algorithm of image signals of mobile ultrasound. For the design space exploration of the many-core processor, we mapped different number of ultrasound image data to each processing element of many-core, and then determined an optimal many-core processor architecture in terms of execution time, energy efficiency and area efficiency. Experimental results indicate that PE=4096 and 1024 provide the highest energy efficiency and area efficiency, respectively. In addition, PE=4096 achieves 46x and 10x better than TI DSP C6416, which is widely used for ultrasound image devices, in terms of energy efficiency and area efficiency, respectively.

Analysis on the GPU Performance according to Hierarchical Memory Organization (계층적 메모리 구성에 따른 GPU 성능 분석)

  • Choi, Hongjun;Kim, Jongmyon;Kim, Cheolhong
    • The Journal of the Korea Contents Association
    • /
    • v.14 no.3
    • /
    • pp.22-32
    • /
    • 2014
  • Recently, GPGPU has been widely used for general-purpose processing as well as graphics processing by providing optimized hardware for parallel processing. Memory system has big effects on the performance of parallel processing units such as GPU. In the GPU, hierarchical memory architecture is implemented for high memory bandwidth. Moreover, both memory address coalescing and memory request merging techniques are widely used. This paper analyzes the GPU performance according to various memory organizations. According to our simulation results, GPU performance improves by 15.5%, 21.5%, 25.5%, 30.9% as adding 8KB L1, 16KB L1, 32KB L1, 64KB L1 cache, respectively, compared to case without L1 cache. However, experimental results show that some benchmarks decrease performance since memory transaction increases due to data dependency. Moreover, average memory access latency is increased as the depth of hierarchical cache level increases when cache miss occurs significantly.