• Title/Summary/Keyword: single-instruction multiple-data

Search Result 77, Processing Time 0.032 seconds

A Low Power Design of H.264 Codec Based on Hardware and Software Co-design

  • Park, Seong-Mo;Lee, Suk-Ho;Shin, Kyoung-Seon;Lee, Jae-Jin;Chung, Moo-Kyoung;Lee, Jun-Young;Eum, Nak-Woong
    • Information and Communications Magazine
    • /
    • v.25 no.12
    • /
    • pp.10-18
    • /
    • 2008
  • In this paper, we present a low-power design of H.264 codec based on dedicated hardware and software solution on EMP(ETRI Multi-core platform). The dedicated hardware scheme has reducing computation using motion estimation skip and reducing memory access for motion estimation. The design reduces data transfer load to 66% compared to conventional method. The gate count of H.264 encoder and the performance is about 455k and 43Mhz@30fps with D1(720x480) for H.264 encoder. The software solution is with ASIP(Application Specific Instruction Processor) that it is SIMD(Single Instruction Multiple Data), Dual Issue VLIW(Very Long Instruction Word) core, specified register file for SIMD, internal memory and data memory access for memory controller, 6 step pipeline, and 32 bits bus width. Performance and gate count is 400MHz@30fps with CIF(Common Intermediated format) and about 100k per core for H.264 decoder.

Rapid Data Allocation Technique for Multiple Memory Bank Architectures (다중 메모리 뱅크 구조를 위한 고속의 자료 할당 기법)

  • 조정훈;백윤홍;최준식
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.10a
    • /
    • pp.196-198
    • /
    • 2003
  • Virtually every digital signal processors(DSPs) support on-chip multi- memory banks that allow the processor to access multiple words of data from memory in a single instruction cycle. Also, all existing fixed-point DSPs have irregular architecture of heterogeneous register which contains multiple register files that are distributed and dedicated to different sets of instructions. Although there have been several studies conducted to efficiently assign data to multi-memory banks, most of them assumed processors with relatively simple, homogeneous general-purpose resisters. Therefore, several vendor-provided compilers fer DSPs were unable to efficiently assign data to multiple data memory banks. thereby often failing to generate highly optimized code fer their machines. This paper presents an algorithm that helps the compiler to efficiently assign data to multi- memory banks. Our algorithm differs from previous work in that it assigns variables to memory banks in separate, decoupled code generation phases, instead of a single, tightly-coupled phase. The experimental results have revealed that our decoupled algorithm greatly simplifies our code generation process; thus our compiler runs extremely fast, yet generates target code that is comparable In quality to the code generated by a coupled approach

  • PDF

A Study on Prevention of Collision and Data Loss of the RFID System Using a Full-Length Instruction Code Method (무선인식 시스템의 완전 명령 코드 기법을 이용한 데이터 충돌 및 손실 방지에 관한 연구)

  • 강민수;신석균;이재호;박면규;이기서
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.7A
    • /
    • pp.756-765
    • /
    • 2004
  • Using single carrier frequency RFID system in one-to-multiple wireless communications, might be generated data loss because of data collisions. Conventional Anti-collision method prevent data loss from data collisions which are binary tree method and ALOHA. However, those two preventive measures also have week points which are strongly dependent on the time and space when passing through the recognition area. This paper suggests the full-length instruction code method which fits in to half-duplex method, prevents data collision effectively by calculating the non-transmitting time of multiple tags considering approaching time to the recognition area. After full-length instruction code method test using 13.56MHz bandwidth RFID system shows that full-length instruction code method could make better result than any other methods. Moreover, the record shows O(n) result after analyzing O-notation of conventional time-domain procedure.

SIMD Instruction-based Fast HEVC RExt Decoder (SIMD 명령어 기반 HEVC RExt 복호화기 고속화)

  • Mok, Jung-Soo;Ahn, Yong-Jo;Ryu, Hochan;Sim, Donggyu
    • Journal of Broadcast Engineering
    • /
    • v.20 no.2
    • /
    • pp.224-237
    • /
    • 2015
  • In this paper, we introduce the fast decoding method with the SIMD (Single Instruction Multiple Data) instructions for HEVC RExt (High Efficiency Video Coding Range Extensions). Several tools of HEVC RExt such as intra prediction, interpolation, inverse-quantization, inverse-transform, and clipping modules can be classified as the proper modules for applying the SIMD instructions. In consideration of bit-depth increasement of RExt, intra prediction, interpolation, inverse-quantization, inverse-transform, and clipping modules are accelerated by SSE (Streaming SIMD Extension) instructions. In addition, we propose effective implementations for interpolation filter, inverse-quantization, and clipping modules by utilizing a set of AVX2 (Advanced Vector eXtension 2) instructions that can use 256 bits register. The evaluation of the proposed methods were performed on the private HEVC RExt decoder developed based on HM 16.0. The experimental results show that the developed RExt decoder reduces 12% average decoding time, compared with the conventional sequential method.

Implementation of SIMD-based Many-Core Processor for Efficient Image Data Processing (효율적인 영상데이터 처리를 위한 SIMD기반 매니코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.1
    • /
    • pp.1-9
    • /
    • 2011
  • Recently, as mobile multimedia devices are used more and more, the needs for high-performance and low-energy multimedia processors are increasing. Application-specific integrated circuits (ASIC) can meet the needed high performance for mobile multimedia, but they provide limited, if any, generality needed for various application requirements. DSP based systems can used for various types of applications due to their generality, but they require higher cost and energy consumption as well as less performance than ASICs. To solve this problem, this paper proposes a single instruction multiple data (SIMD) based many-core processor which supports high-performance and low-power image data processing while keeping generality. The proposed SIMD based many-core processor composed of 16 processing elements (PEs) exploits large data parallelism inherent in image data processing. Experimental results indicate that the proposed SIMD-based many-core processor higher performance (22 times better), energy efficiency (7 times better), and area efficiency (3 times better) than conversional commercial high-performance processors.

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor (임베디드 병렬 프로세서 상에서 MMX타입 명령어의 성능평가 및 검증)

  • Jung, Yong-Bum;Kim, Yong-Min;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.10
    • /
    • pp.11-21
    • /
    • 2011
  • This paper introduces an SIMD(Single Instruction Multiple Data) based parallel processor that efficiently processes massive data inherent in multimedia. In addition, this paper implements MMX(MultiMedia eXtension)-type instructions on the data parallel processor and evaluates and analyzes the performance of the MMX-type instructions. The reference data parallel processor consists of 16 processors each of which has a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024 pixel image indicate that MMX-type instructions achieves a 50% performance improvement over the baseline instructions on the same data parallel architecture. In addition, MMX-type instructions achieves 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia specific instructions including MMX-type have potentials for widely used many-core GPU(Graphics Processing Unit) and any types of parallel processors.

PC-Based Realtime Implementation of H.263 CODEC Using SIMD Method (SIMD기법에 의한 H.263 코덱의 PC기반 실시간 구현)

  • 하교동;남수영;김남철
    • Proceedings of the IEEK Conference
    • /
    • 2001.09a
    • /
    • pp.947-950
    • /
    • 2001
  • This paper implements H.263 codec using SIMD(single instruction multiple data) method in real time based on PC. This system uses INS algorithm previously proposed by the authors as motion estimation module. SIMD method is used in DCT, IDCT, quantization, motion estimation, and display module. The developed algorithms are implemented using TMN5. Using the above algorithm, H.263 Codec can communicate more than 15 frames/sec in CIF resolution on a Pentium-IV 1.7GHz computer.

  • PDF

Parallel Speedup of NTGST on SIMD type Multiprocessor (SIMD 구조의 다중 프로세서를 이용한 NTGST의 병렬고속화)

  • 김복만;서경석;김종화;최흥문
    • Proceedings of the IEEK Conference
    • /
    • 2001.06d
    • /
    • pp.127-130
    • /
    • 2001
  • 본 논문에서는 SIMD (Single Instruction stream and Multiple Data stream)형 병렬 구조의 다중 프로세서를 이용하여 NTGST (noise-tolerant generalized symmetry transform)를 병렬 고속화하였다. 먼저 NTGST의 화소 및 영상 영역간의 계산 독립성을 이용하여 영상을 분할하여 P개의 프로세서에 할당하고, 이들 각각을 N개의 데이터를 한번에 처리하는 SIMD 구조로 병렬화하여 NP에 비례하는 속도 향상을 얻었다. 실험에서 MMX 기술의 펜티엄 Ⅲ 프로세서를 2개 사용하여 제안한 알고리즘이 기존의 NTGST 보다 8배 가까이 고속으로 처리됨을 확인하였다.

  • PDF

High Performance Coprocessor Architecture for Real-Time Dense Disparity Map (실시간 Dense Disparity Map 추출을 위한 고성능 가속기 구조 설계)

  • Kim, Cheong-Ghil;Srini, Vason P.;Kim, Shin-Dug
    • The KIPS Transactions:PartA
    • /
    • v.14A no.5
    • /
    • pp.301-308
    • /
    • 2007
  • This paper proposes high performance coprocessor architecture for real time dense disparity computation based on a phase-based binocular stereo matching technique called local weighted phase-correlation(LWPC). The algorithm combines the robustness of wavelet based phase difference methods and the basic control strategy of phase correlation methods, which consists of 4 stages. For parallel and efficient hardware implementation, the proposed architecture employs SIMD(Single Instruction Multiple Data Stream) architecture for each functional stage and all stages work on pipelined mode. Such that the newly devised pipelined linear array processor is optimized for the case of row-column image processing eliminating the need for transposed memory while preserving generality and high throughput. The proposed architecture is implemented with Xilinx HDL tool and the required hardware resources are calculated in terms of look up tables, flip flops, slices, and the amount of memory. The result shows the possibility that the proposed architecture can be integrated into one chip while maintaining the processing speed at video rate.

A Fast SAD Algorithm for Area-based Stereo Matching Methods (영역기반 스테레오 영상 정합을 위한 고속 SAD 알고리즘)

  • Lee, Woo-Young;Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications
    • /
    • v.7 no.2
    • /
    • pp.8-12
    • /
    • 2012
  • Area-based stereo matchng algorithms are widely used for image analysis for stereo vision. SAD (Sum of Absolute Difference) algorithm is one of well known area-based stereo matchng algorithms with the characteristics of data intensive computing application. Therefore, it requires very high computation capabilities and its processing speed becomes very slow with software realization. This paper proposes a fast SAD algorithm utilizing SSE (Streaming SIMD Extensions) instructions based on SIMD (Single Instruction Multiple Data) parallism. CPU supporing SSE instructions has 16 XMM registers with 128 bits. For the performance evaluation of the proposed scheme, we compare the processing speed between SAD with/without SSE instructions. The proposed scheme achieves four times performance improvement over the general SAD, which shows the possibility of the software realization of real time SAD algorithm.