• Title/Summary/Keyword: Embedded CPU


Hardware Design for JBIG2 Encoder on Embedded System (임베디드용 JBIG2 부호화기의 하드웨어 설계)

  • Seo, Seok-Yong;Ko, Hyung-Hwa
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.2C / pp.182-192 / 2010
  • This paper proposes a hardware IP design for a JBIG2 encoder. To support next-generation fax following the standardization of JBIG2, the major modules of the JBIG2 encoder are designed and implemented: a symbol extraction module, a Huffman coder, an MMR coder, and an MQ coder. Impulse C CoDeveloper and Xilinx ISE/EDK are used to synthesize the VHDL code. To minimize memory usage, the input image is processed successively in units of 128 lines rather than as a whole. The synthesized IPs are downloaded to the Virtex-4 FX60 FPGA on an ML410 development board. The four synthesized IPs use 36.7% of the FPGA's total slices. The generated IPs were verified with the Active-HDL tool and operated normally. Compared with software running on the MicroBlaze CPU of the ML410 board, the synthesized IPs are faster: 17 times for the symbol extraction IP, 10 times for the Huffman coder IP, 6 times for the MMR coder IP, and 2.2 times for the MQ coder IP. The synthesized hardware IPs and the software modules worked together to compress the CCITT standard document successfully.
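The strip-wise processing described in the abstract (128 lines at a time instead of the whole image) can be sketched as below. This is only an illustration under assumptions: the function names and the per-strip work (counting black pixels) are hypothetical placeholders, not the paper's symbol extraction or coding logic.

```python
from typing import Iterator, List

STRIP_LINES = 128  # strip height stated in the abstract

Image = List[List[int]]  # bi-level image: rows of 0/1 pixels (assumed layout)


def iter_strips(image: Image, strip_lines: int = STRIP_LINES) -> Iterator[Image]:
    """Yield consecutive horizontal strips of at most `strip_lines` rows."""
    for top in range(0, len(image), strip_lines):
        yield image[top:top + strip_lines]


def count_black_pixels_by_strip(image: Image) -> List[int]:
    """Hypothetical per-strip work standing in for the real encoder modules."""
    return [sum(sum(row) for row in strip) for strip in iter_strips(image)]
```

Only one strip needs to be resident at a time, which is the memory saving the abstract refers to.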

A Study on GPU Computing of Bi-conjugate Gradient Method for Finite Element Analysis of the Incompressible Navier-Stokes Equations (유한요소 비압축성 유동장 해석을 위한 이중공액구배법의 GPU 기반 연산에 대한 연구)

  • Yoon, Jong Seon;Jeon, Byoung Jin;Jung, Hye Dong;Choi, Hyoung Gwon
    • Transactions of the Korean Society of Mechanical Engineers B / v.40 no.9 / pp.597-604 / 2016
  • A parallel bi-conjugate gradient algorithm was developed in CUDA for parallel computation of the incompressible Navier-Stokes equations. The governing equations were discretized using a splitting P2P1 finite element method. An asymmetric stenotic flow problem was solved to validate the proposed algorithm, and the parallel performance of the GPU was then examined by measuring elapsed times. The GPU performance for sparse matrix-vector multiplication was also investigated using a matrix from a fluid-structure interaction problem. A kernel was written to compute the inner products of all rows of the sparse matrix with a vector simultaneously. The kernel was further optimized using both parallel reduction and memory coalescing. The effect of the warp on the parallel performance of the CUDA kernel was also examined. In double precision, the GPU computation was more than 7 times faster than a single CPU.
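The row-wise sparse matrix-vector product with a reduction per row can be illustrated sequentially as follows. This is a plain Python sketch, not the paper's CUDA kernel: the CSR storage format and all function names are assumptions, and the tree-shaped summation only mimics the order in which a parallel reduction combines partial products.

```python
from typing import List, Sequence


def tree_reduce_sum(values: Sequence[float]) -> float:
    """Pairwise (tree) summation, the combining pattern of a parallel reduction."""
    vals = list(values)
    if not vals:
        return 0.0
    while len(vals) > 1:
        half = (len(vals) + 1) // 2
        # In one step, element i is combined with element i + half (if present).
        vals = [
            vals[i] + vals[i + half] if i + half < len(vals) else vals[i]
            for i in range(half)
        ]
    return vals[0]


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             data: Sequence[float], x: Sequence[float]) -> List[float]:
    """y = A @ x for a CSR matrix; each row is independent, so rows could run in parallel."""
    y = []
    for r in range(len(row_ptr) - 1):
        products = [data[k] * x[col_idx[k]] for k in range(row_ptr[r], row_ptr[r + 1])]
        y.append(tree_reduce_sum(products))
    return y
```

On a GPU, each row's products would typically be assigned to a group of threads such as a warp, with the reduction done cooperatively inside that group.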

A Study of Purity-based Page Allocation Scheme for Flash Memory File Systems (플래시 메모리 파일 시스템을 위한 순수도 기반 페이지 할당 기법에 대한 연구)

  • Baek, Seung-Jae;Choi, Jong-Moo
    • The KIPS Transactions:PartA / v.13A no.5 s.102 / pp.387-398 / 2006
  • In this paper, we propose a new page allocation scheme for flash memory file systems. The proposed scheme allocates pages by exploiting the concept of purity, which is defined as the fraction of blocks where valid pages and invalid pages coexist. The purity determines the cost of block cleaning, that is, the portion of pages to be copied and blocks to be erased for block cleaning. To enhance the purity, the scheme classifies data into hot-modified and cold-modified data and allocates them to different blocks. The hot/cold classification is based on both static properties, such as data attributes, and dynamic properties, such as modification frequency. We have implemented the proposed scheme in YAFFS and evaluated its performance on an embedded board equipped with a 400 MHz XScale CPU, 64 MB of SDRAM, and 64 MB of NAND flash memory. Performance measurements show that the proposed scheme reduces block cleaning time by up to 15.4 seconds, and by 7.8 seconds on average, compared with standard YAFFS. The improvement also grows as the utilization of flash memory increases.
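The link between mixed blocks and cleaning cost can be made concrete with a small sketch. Everything here is an assumption for illustration: the string encoding of page states, the function names, and the simplification that every block holding at least one invalid page is cleaned. The first function computes the quantity exactly as the abstract words it (fraction of blocks where valid and invalid pages coexist).

```python
from typing import List

# Assumed page-state encoding: 'V' = valid, 'I' = invalid, 'F' = free.
Block = str


def mixed_block_fraction(blocks: List[Block]) -> float:
    """Fraction of blocks in which valid and invalid pages coexist."""
    if not blocks:
        return 0.0
    mixed = sum(1 for b in blocks if 'V' in b and 'I' in b)
    return mixed / len(blocks)


def pages_to_copy(blocks: List[Block]) -> int:
    """Valid pages that must be copied out before erasing blocks that hold invalid pages."""
    return sum(b.count('V') for b in blocks if 'I' in b)
```

Blocks that hold only invalid pages can be erased without copying anything, which is why separating hot-modified and cold-modified data into different blocks lowers the cleaning cost.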

Design and Implementation of Hand-Held Inspection Device for High Performance Mobile TFT LCD/OLED Module (고성능 모바일 TFT LCD/OLED 모듈을 위한 헨드헬드 검사장비 설계 및 구현)

  • Moon, Seung-Jin;Kim, Hong-Kyu
    • The Journal of Korean Institute of Communications and Information Sciences / v.34 no.6B / pp.630-640 / 2009
  • This paper proposes a hand-held inspection device for high-performance mobile TFT LCD/OLED modules. Existing module inspection equipment can detect flicker by outputting video data to the module, but this is not possible on low-end systems. The proposed system supports a range of supplementary inspection functions on a portable device and handles high-performance modules of various sizes without changes to the FPGA design or the hardware. The system consists of the hand-held inspection device, embedded test software, and PC-based software, and it is designed to output, save, and verify all module test results from the hand-held device over a universal serial bus (USB) interface. For efficient verification of the proposed system, nine representative test items were defined, and operation was confirmed on high-performance TFT LCD/OLED modules, covering scan time setup, gamma generation, register changes, supported interfaces, and modules of multiple sizes.

Design and Implementation of Accelerator Architecture for Binary Weight Network on FPGA with Limited Resources (한정된 자원을 갖는 FPGA에서의 이진가중치 신경망 가속처리 구조 설계 및 구현)

  • Kim, Jong-Hyun;Yun, SangKyun
    • Journal of IKEEE / v.24 no.1 / pp.225-231 / 2020
  • In this paper, we propose a method to accelerate a binary weight network (BWN) on an FPGA with limited resources for embedded systems. Because the number of available logic elements is limited, a single computing unit capable of handling Conv-layers and FC-layers of various sizes must be designed and reused. If the input feature map cannot be processed in parallel at one time, the output must be calculated by reading the inputs several times. Since the number of available BRAM modules is limited, the number of data bits in the BWN accelerator must be minimized. The image classification processing time of the BWN accelerator is superior to that of an embedded CPU, faster than a desktop PC, and 50% slower than a GPU system. Since the BWN accelerator uses a slow 50 MHz clock, it is advantageous in terms of performance versus power.
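In a binary weight network the weights are restricted to +1 and -1, so the multiply-accumulate of an ordinary layer reduces to additions and subtractions, and each weight needs only one bit of storage. The sketch below illustrates this for a fully connected layer; the bit encoding (1 for +1, 0 for -1) and the function names are assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def bwn_dot(inputs: Sequence[int], weight_bits: Sequence[int]) -> int:
    """Dot product with binary weights: bit 1 means +1, bit 0 means -1 (assumed encoding)."""
    acc = 0
    for x, bit in zip(inputs, weight_bits):
        # No multiplier needed: the weight only selects add or subtract.
        acc += x if bit else -x
    return acc


def bwn_fc_layer(inputs: Sequence[int], weight_rows: Sequence[Sequence[int]]) -> List[int]:
    """Fully connected layer that reuses the same compute unit for every output neuron."""
    return [bwn_dot(inputs, row) for row in weight_rows]
```

For example, `bwn_fc_layer([3, -1, 2], [[1, 0, 1], [0, 0, 1]])` evaluates two output neurons with the same accumulate routine, mirroring the idea of one reusable computing unit.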

Synthesizing multi-loop control systems with period adjustment and Kernel compilation (주기 조정과 커널 자동 생성을 통한 다중 루프 시스템의 구현)

  • Hong, Seong-Soo;Choi, Chong-Ho;Park, Hong-Seong
    • Journal of Institute of Control, Robotics and Systems / v.3 no.2 / pp.187-196 / 1997
  • This paper presents a semi-automatic methodology to synthesize executable digital controller software in a multi-loop control system. A digital controller is described by a task graph and end-to-end timing requirements. A task graph denotes the software structure of the controller, and the end-to-end requirements establish timing relationships between external inputs and outputs. Our approach translates the end-to-end requirements into a set of task attributes such as task periods and deadlines using nonlinear optimization techniques. Such attributes are essential for control engineers to implement control programs and schedule them in a control system with limited resources. In current engineering practice, human programmers manually derive those attributes in an ad hoc manner: they often resort to radical over-sampling to safely guarantee the given timing requirements, and thus render the resultant system poorly utilized. After task-specific attributes are derived, the tasks are scheduled on a single CPU and the compiled kernel is synthesized. We illustrate this process with a non-trivial servo motor control system.
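The cost of over-sampling can be seen from the standard CPU utilization of a periodic task set, U = sum of (execution time / period) over all tasks. The sketch below compares two period assignments for the same tasks. The task values and names are hypothetical, and the paper's nonlinear optimization that derives the periods is not reproduced here.

```python
from typing import List, Tuple

Task = Tuple[float, float]  # (worst-case execution time, period), same time unit


def cpu_utilization(tasks: List[Task]) -> float:
    """Total CPU utilization of a periodic task set on a single CPU."""
    return sum(wcet / period for wcet, period in tasks)


# Hypothetical example: the same two tasks with relaxed and with over-sampled periods.
relaxed_periods = [(1.0, 4.0), (1.0, 8.0)]
oversampled_periods = [(1.0, 2.0), (1.0, 4.0)]
```

Here `cpu_utilization(relaxed_periods)` is 0.375 while `cpu_utilization(oversampled_periods)` is 0.75: halving every period doubles the CPU load without changing what the controller computes.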


Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL

  • Hong, Jung-Hyun;Park, Joo-Yul;Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.6 / pp.2648-2668 / 2016
  • Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications; therefore, Open Computing Language (OpenCL) has been proposed to fully take advantage of the computational power in heterogeneous systems. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes on an embedded heterogeneous platform using an OpenCL framework. The LDPC code is one of the most popular and strongest error correcting codes for mobile communication systems. Each step of LDPC decoding has different parallelization characteristics. In the proposed LDPC decoder, steps suitable for task-level parallelization are executed on the multi-core central processing unit (CPU), and steps suitable for data-level parallelization are processed by the graphics processing unit (GPU). To improve the performance of OpenCL kernels for LDPC decoding operations, explicit thread scheduling, vectorization, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance and high power efficiency by using heterogeneous multi-core processors on a unified computing framework.
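As an illustration of a data-parallel decoding step, the sketch below applies the min-sum check-node update, a common rule in LDPC decoders. Whether the paper uses min-sum is not stated in the abstract, so this is a swapped-in example; the function names are hypothetical and each check node is assumed to have at least two incoming messages.

```python
from typing import List, Sequence


def check_node_update(msgs: Sequence[float]) -> List[float]:
    """Min-sum rule: each outgoing message uses the signs and minimum magnitude of the others."""
    out = []
    for i in range(len(msgs)):
        others = list(msgs[:i]) + list(msgs[i + 1:])
        sign = 1.0
        for m in others:
            if m < 0:
                sign = -sign
        out.append(sign * min(abs(m) for m in others))
    return out


def update_all_check_nodes(all_msgs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Every check node is independent of the others, which suits data-level parallelism."""
    return [check_node_update(msgs) for msgs in all_msgs]
```

In an OpenCL setting, the loop in `update_all_check_nodes` corresponds to work that could be spread across work-items, one per check node.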

Implementation of Embedded Digital Set-top box/PVR (내장형 디지털 방송 수신기 및 PVR 개발)

  • Song, Jae-Jong;Lee, Seok-Pil;Jang, Se-Jin;Park, Seong-Ju
    • Proceedings of the KIEE Conference / 2004.11c / pp.284-287 / 2004
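  • The goal of this paper is to develop an embedded digital broadcast receiver and personal video recorder (PVR) system platform for integrated digital TVs that can receive digital broadcasts, record, store, and play back broadcast content, and receive the data broadcasts that are expected to begin soon. We designed a system architecture that supports a set-top box function capable of receiving digital and data broadcasts and a PVR function capable of storing and playing back received broadcasts. As high-quality digital broadcast services begin in earnest, demand for products combining a digital broadcast receiver with PVR functionality is expected to grow, and such high-performance combined systems will be essential. To perform these functions, the RM5231 from PMC-Sierra, based on the MIPS architecture, was adopted as the system control CPU; the Teralogic TL811 System Controller was adopted to configure the various devices that make up the system; and the Teralogic TL851 Graphics & Display Processor was adopted for MPEG-2 demultiplexing and decoding. To test the developed system, we received the test broadcasts currently transmitted by broadcasters and tested the PVR function.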


A Digital Convergence Platform Implemented with Embedded System Technologies (임베디드 기술에 기반한 디지털 컨버젼스 플랫폼 구현에 관한 연구)

  • On, Hwa-Yong;Kin, Dong-Hwan;Lee, Eun-Seo;Chang, Tae-Gyu
    • Proceedings of the KIEE Conference / 2005.07d / pp.2912-2914 / 2005
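  • In this paper, we use embedded technology to implement a digital convergence platform that supports broadcasting, multimedia, communication, and home appliance control and enables two-way trading of digital items. The terminal, SDMP (Software Defined Media Platform), has a hardware architecture for processing digital broadcasts and high-quality video and audio, and on this basis it is implemented using flexible platform software technology to support various codecs and services. The digital convergence hardware uses a RISC CPU and a DSP to support broadcasting, communication, and multimedia services on an embedded OS. The hardware is also designed with a high-performance DSP as the MPU so that it can handle a multi-channel input/output audio coder and HD-quality video input and output. The digital convergence software implements MPEG-21-based flexible platform software as an upper-level application on embedded Linux that can accommodate various service models. It can stream multimedia over a network and, going beyond a simple player, it is active, multi-function software that supports content creation and allows content to be exchanged between different networks or between terminals.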


DESIGN AND IMPLEMENTATION OF 3D TERRAIN RENDERING SYSTEM ON MOBILE ENVIRONMENT USING HIGH RESOLUTION SATELLITE IMAGERY

  • Kim, Seung-Yub;Lee, Ki-Won
    • Proceedings of the KSRS Conference / v.1 / pp.417-420 / 2006
  • Mobile applications that handle information content on mobile or handheld devices such as mobile communicators, PDAs, or WAP devices are among the most important industrial needs today. The motivation of this study is the design and implementation of a mobile application that uses high-resolution satellite imagery, a large image data set. Although the major advantages of mobile devices are portability and mobility, limited system resources such as small memory, slow CPUs, low power, and small screens are the main obstacles for developers who must handle large volumes of geo-based 3D models. Previous work has concentrated on GIS-based location-awareness services on mobile devices; the mobile 3D terrain model targeted in this study, built from DEM (Digital Elevation Model) source data and high-resolution satellite imagery, has not yet been considered in other mobile systems. The main 3D graphics processing and pixel pipeline functions in this prototype are implemented with the OpenGL ES (OpenGL for Embedded Systems) standard API (Application Programming Interface) released by the Khronos Group. During development, experiments were carried out to investigate the optimal operating environment and performance: TIN-based vertex generation from regular elevation data, image tiling, image-vertex texturing, and processing of Unicode and ASCII text.
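Generating triangle vertices from regularly spaced elevation data can be sketched as follows. This is a generic grid triangulation under assumptions (row-major DEM, uniform cell size, two triangles per grid cell, hypothetical function name); the paper's actual TIN generation procedure is not described in the abstract.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]
Triangle = Tuple[int, int, int]


def grid_to_mesh(dem: Sequence[Sequence[float]],
                 cell_size: float = 1.0) -> Tuple[List[Vertex], List[Triangle]]:
    """Build (x, y, z) vertices and index triangles from a regular elevation grid."""
    rows, cols = len(dem), len(dem[0])
    vertices = [(c * cell_size, r * cell_size, dem[r][c])
                for r in range(rows) for c in range(cols)]
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i = r * cols + c  # index of the cell's top-left vertex
            triangles.append((i, i + 1, i + cols))
            triangles.append((i + 1, i + cols + 1, i + cols))
    return vertices, triangles
```

An index-based mesh like this shares vertices between neighboring triangles, which helps on devices with small memory, and the same grid coordinates can be scaled to texture coordinates when draping the satellite image over the terrain.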
