• Title/Summary/Keyword: Hardware Resources

Optimization of H.264 Encoder based on Hardware Implementation in Embedded System (임베디드시스템 환경에서 하드웨어 기반 H.264 Encoder 최적화)

  • Cho, Jung-Hyun;Lee, Myung-Soo;Jeong, Han-Soo;Kim, Chang-Suk;Cho, Dae-Jea
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.8 / pp.3076-3082 / 2010
  • Techniques and products that use various video compression codecs are emerging in both military and civilian fields. In a conventional high-end PC environment, video codec processing is not a problem, but in embedded system environments with limited system resources, the system load caused by compressing high-resolution images at high density raises issues of performance and utilization. This paper proposes DirectShow Filter interfaces, a hardware-based approach, to address the problems that existing software algorithms have with image compression performance and peripheral interfaces.

Improving Hardware Resource Utilization for Software Load Balancer using Multiprocess in Virtual Machine (멀티 프로세스를 사용한 가상 머신에서의 소프트웨어 로드밸런서의 효율적인 물리 자원 활용 연구)

  • Kim, Minsu;Kim, Seung Hun;Lee, Sang-Min;Ro, Won Woo
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.9 / pp.103-108 / 2014
  • In virtualized server systems, a scheduler in the hypervisor is responsible for assigning physical resources to virtual machines. However, a traditional scheduler struggles to provide optimized resource allocation that accounts for the volume of I/O requests. This drawback particularly hinders the performance of a software load balancer, which runs on virtual machines to distribute I/O requests from clients. In this paper, we propose a new architecture that uses multiple processes to improve the performance of the software load balancer. Our architecture aims to improve hardware resource utilization and the overall performance of server systems that use virtualization technology. Experimental results show the effectiveness of the proposed architecture in various cases.

An implementation of block cipher algorithm HIGHT for mobile applications (모바일용 블록암호 알고리듬 HIGHT의 하드웨어 구현)

  • Park, Hae-Won;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2011.05a / pp.125-128 / 2011
  • This paper describes an efficient hardware implementation of the HIGHT block cipher algorithm, which was approved as a cryptographic algorithm standard by KATS (Korean Agency for Technology and Standards) and ISO/IEC. The HIGHT algorithm, which is suitable for ubiquitous computing devices such as a sensor in a USN or an RFID tag, encrypts a 64-bit data block with a 128-bit cipher key to produce a 64-bit ciphertext, and vice versa. For an area-efficient and low-power implementation, we optimize the round transform block and key scheduler to share hardware resources between encryption and decryption. The HIGHT64 core, synthesized using a $0.35\,\mu\mathrm{m}$ CMOS cell library, consists of 3,226 gates, and the estimated throughput is 150 Mbps with an 80 MHz clock at 2.5 V.
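
Illustrative note (not part of the abstract): the reported figures can be cross-checked with simple arithmetic. The sketch below assumes that throughput equals block size times clock frequency divided by cycles per block. The function name is hypothetical, and the interpretation of the result is an inference, not a claim made by the authors.

```python
# Back-of-the-envelope check of the figures reported in the abstract.
BLOCK_BITS = 64          # HIGHT block size (from the abstract)
CLOCK_HZ = 80e6          # 80 MHz clock (from the abstract)
THROUGHPUT_BPS = 150e6   # 150 Mbps estimated throughput (from the abstract)


def cycles_per_block(block_bits, clock_hz, throughput_bps):
    """Cycles spent per block, assuming throughput = block_bits * clock / cycles."""
    return block_bits * clock_hz / throughput_bps


cycles = cycles_per_block(BLOCK_BITS, CLOCK_HZ, THROUGHPUT_BPS)
print(f"about {cycles:.1f} clock cycles per 64-bit block")
```

HIGHT has 32 rounds, so a result of roughly 34 cycles per block would be consistent with processing about one round per clock cycle plus a small overhead. The abstract does not state the exact cycle count.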

Single-Ended High-Efficiency Step-up Converter Using the Isolated Switched-Capacitor Cell

  • Kim, Do-Hyun;Jang, Jong-Ho;Park, Joung-Hu;Kim, Jung-Won
    • Journal of Power Electronics / v.13 no.5 / pp.766-778 / 2013
  • With the depletion of natural resources, renewable energy sources such as photovoltaic (PV) energy have been highlighted as a global energy solution. The PV power control unit in PV power-generation technology requires a high step-up DC-DC converter. The conventional step-up DC-DC converter has low efficiency and a limited step-up ratio. To overcome these problems, a novel high step-up DC-DC converter using an isolated switched-capacitor cell is proposed. The step-up converter uses the proposed transformer and employs the switched-capacitor cell to enable integration with the boost inductor. The outputs of the boost converter and the isolated switched-capacitor cell are connected in series to obtain high step-up with low turn-on ratio. A hardware prototype with 30 V to 40 V input voltage and 340 V output voltage is implemented to verify the performance of the proposed converter. As an extended version, another novel high step-up isolated switched-capacitor single-ended DC-DC converter integrated with a tapped-inductor (TI) boost converter is proposed. The TI boost converter and isolated switched-capacitor outputs are connected in series to achieve high step-up. All magnetic components are integrated in a single magnetic core to lower costs. A hardware prototype with 20 V to 40 V input voltage, 340 V output voltage, and 100 W output power is implemented to verify the performance of the proposed converter.

Design of a Low Power Turbo Decoder by Reducing Decoding Iterations (반복 복호수 감소에 의한 저전력 터보 복호기의 설계)

  • Back, Seo-Young;Kim, Sik;Back, Seo-Young
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.1C / pp.1-8 / 2004
  • This paper proposes a novel algorithm for a low-power turbo decoder based on reducing the number of decoding iterations, targeting power-critical mobile communication devices. Previous studies that attempt to reduce the number of decoding iterations, such as the CRC-aided and LLR methods, either show degraded BER performance in return for reduced complexity or require additional hardware resources to control the number of iterations and meet the BER performance, respectively. The proposed algorithm reduces power consumption without degrading BER performance, and does so with minimal hardware overhead. It achieves this by comparing consecutive hard-decision results using a simple buffer and counter. Simulation results show that the number of decoding iterations can be reduced to about 60% without degrading BER performance in the proposed decoder, and that power consumption is reduced in proportion to the number of decoding iterations.
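
Illustrative sketch (not the authors' implementation): the abstract describes stopping early by comparing consecutive hard-decision results with a buffer and a counter. A minimal software model of that idea is shown below. The function name, the `required_matches` threshold, and the `max_iters` cap are assumptions introduced for illustration; the paper's actual stopping condition may differ.

```python
def iterations_until_stable(hard_decisions_per_iter, required_matches=1, max_iters=8):
    """Return the iteration at which decoding would stop.

    hard_decisions_per_iter yields the hard-decision bits produced after
    each decoding iteration (here supplied as a precomputed trace).
    """
    previous = None   # the "buffer": hard decisions from the previous iteration
    matches = 0       # the "counter": consecutive iterations with identical decisions
    count = 0
    for count, bits in enumerate(hard_decisions_per_iter, start=1):
        if previous is not None and bits == previous:
            matches += 1
        else:
            matches = 0
        if matches >= required_matches or count >= max_iters:
            return count
        previous = bits
    return count


# Hypothetical trace: the decisions stop changing after the third iteration.
trace = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
print(iterations_until_stable(trace))  # stops at iteration 4
```

With `required_matches=1`, the model stops as soon as two consecutive iterations agree, so the fifth iteration in the trace is never run.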

A SPECK Crypto-Core Supporting Eight Block/Key Sizes (8가지 블록/키 크기를 지원하는 SPECK 암호 코어)

  • Yang, Hyeon-Jun;Shin, Kyung-Wook
    • Journal of IKEEE / v.24 no.2 / pp.468-474 / 2020
  • This paper describes the hardware implementation of SPECK, a lightweight block cipher algorithm developed for securing applications with limited resources, such as IoT and wireless sensor networks. The SPECK crypto-core supports 8 block/key sizes, and the internal data-path was designed to be 16 bits wide to keep the gate count small. The final round key used for decryption is pre-generated during key initialization and stored together with the initial key, enabling encryption and decryption of consecutive blocks. The core was also designed to process round operations and key scheduling independently to increase throughput. The hardware operation of the SPECK crypto-core was validated through FPGA verification; it was implemented with 1,503 slices on a Virtex-5 FPGA device, and the maximum operating frequency was estimated to be 98 MHz. When synthesized with a 180 nm process, the maximum operating frequency was estimated to be 163 MHz, and the estimated throughput ranged from 154 to 238 Mbps depending on the block/key size.
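
Illustrative sketch (not the paper's hardware design): the SPECK round function is a short add-rotate-XOR sequence, which is why it maps well to small data-paths. The code below models one round and its inverse for a configurable word size. The round keys in the demo are arbitrary values chosen for illustration, since the key schedule is omitted, and the function names are hypothetical.

```python
def rol(x, r, n):
    """Rotate the n-bit word x left by r bits."""
    mask = (1 << n) - 1
    return ((x << r) | (x >> (n - r))) & mask


def ror(x, r, n):
    """Rotate the n-bit word x right by r bits."""
    mask = (1 << n) - 1
    return ((x >> r) | (x << (n - r))) & mask


def speck_round(x, y, k, n=16, alpha=7, beta=2):
    """One SPECK encryption round on the word pair (x, y) with round key k.

    SPECK uses alpha=7, beta=2 for 16-bit words and alpha=8, beta=3 otherwise.
    """
    mask = (1 << n) - 1
    x = ((ror(x, alpha, n) + y) & mask) ^ k
    y = rol(y, beta, n) ^ x
    return x, y


def speck_round_inv(x, y, k, n=16, alpha=7, beta=2):
    """Inverse of speck_round, as used for decryption."""
    mask = (1 << n) - 1
    y = ror(y ^ x, beta, n)
    x = rol(((x ^ k) - y) & mask, alpha, n)
    return x, y


# Demo: arbitrary round keys (NOT produced by the real key schedule).
round_keys = [0x0100, 0x0908, 0x1110, 0x1918]
plain = (0x6574, 0x694C)

x, y = plain
for k in round_keys:
    x, y = speck_round(x, y, k)
cipher = (x, y)

for k in reversed(round_keys):
    x, y = speck_round_inv(x, y, k)
recovered = (x, y)
```

Decryption consumes the round keys in reverse order, which is consistent with the abstract's remark that the final round key is pre-generated and stored for decryption.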

A Scenario based Framework for System Setup and Scheduling in Reconfigurable Manufacturing Systems (재구성형 유연가공라인을 위한 시나리오 기반 시스템 셋업 및 스케줄링 체계)

  • Lee, Dong-Ho;Kim, Ji-Su;Kim, Hyung-Won;Doh, Hyoung-Ho;Yu, Jae-Min;Nam, Sung-Ho
    • Journal of the Korean Society for Precision Engineering / v.28 no.3 / pp.339-348 / 2011
  • The reconfigurable manufacturing system (RMS), alternatively called changeable manufacturing, is a new manufacturing paradigm designed for rapid change in hardware and software components in order to quickly adjust production capacity and functionality in response to sudden changes in the market or in regulatory requirements. Although hardware components have progressed considerably during the last decade, little work has been done on the operational issues of RMS. As one of the first studies on these operational issues, we suggest a framework for the system setup and scheduling problems to cope with the reconfigurability of RMS. System setup, which includes batching, part grouping, and loading, is concerned with the pre-arrangement of parts and tools before the system begins processing, and scheduling is the problem of allocating manufacturing resources over time to perform the operations specified by system setup. The framework consists of 8 scenarios classified by three major factors: order arrival process, part selection process, and tool magazine capacity. Each scenario is explained with its subproblems and their interrelationships.

FPGA Implementation of SURF-based Feature extraction and Descriptor generation (SURF 기반 특징점 추출 및 서술자 생성의 FPGA 구현)

  • Na, Eun-Soo;Jeong, Yong-Jin
    • Journal of Korea Multimedia Society / v.16 no.4 / pp.483-492 / 2013
  • SURF is an algorithm that extracts feature points and generates their descriptors from input images, and it is used in many applications such as object recognition, tracking, and panorama construction. Although SURF is known to be robust to changes in scale, rotation, and viewpoint, it is hard to run in real time because of its complex and repetitive computations. In our experiment on a 3.3 GHz Pentium, it takes 240 ms to extract feature points and create descriptors for a VGA image containing about 1,000 feature points, which means that a software implementation cannot meet the real-time requirement, especially in embedded systems. In this paper, we present a hardware architecture that computes the SURF algorithm very fast while consuming minimal hardware resources. The two key concepts of our architecture are parallelism (for repetitive computations) and efficient line memory usage (obtained by analyzing memory access patterns). FPGA synthesis on a Xilinx Virtex5LX330 shows that the design occupies 101,348 LUTs and 1,367 KB of on-chip memory, and delivers 30 frames per second at a 100 MHz clock.
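
Illustrative sketch (not the paper's architecture): SURF builds its filter responses from box sums over an integral image, which is one source of the repetitive computation the abstract mentions. The code below shows that general technique in software. The function names are hypothetical, and the abstract does not describe how the authors organize this step in hardware.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[r][c] = sum of img[:r][:c]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            # Each new row needs only the previous row of the table.
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1 (4 lookups)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3))  # 5 + 6 + 8 + 9 = 28
```

Any box sum costs four table lookups regardless of the box size, and the table itself can be built row by row from the previous row only.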

High Performance Elliptic Curve Cryptographic Processor for $GF(2^m)$ ($GF(2^m)$의 고속 타원곡선 암호 프로세서)

  • Kim, Chang-Hoon;Kim, Tae-Ho;Hong, Chun-Pyo
    • Journal of KIISE:Computer Systems and Theory / v.34 no.3 / pp.113-123 / 2007
  • This paper presents a high-performance elliptic curve cryptographic processor over $GF(2^m)$. The proposed design adopts the Lopez-Dahab Montgomery algorithm for elliptic curve point multiplication and uses a Gaussian normal basis for $GF(2^m)$ field arithmetic operations. We select $m=163$, which is the smallest of the five $GF(2^m)$ field sizes recommended by NIST and has a Gaussian normal basis of type 4. The proposed elliptic curve cryptographic processor consists of a host interface, data memory, instruction memory, and control. We implement the proposed design on a Xilinx XCV2000E FPGA device. The FPGA implementation results show that our design is 2.6 times faster and requires significantly fewer hardware resources than the best previously proposed hardware implementation.
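
Illustrative sketch (not the paper's processor): the Montgomery approach to point multiplication is a ladder that performs one addition and one doubling per scalar bit. The code below shows only that control structure over an abstract group supplied through callbacks. It does not implement the Lopez-Dahab projective formulas or Gaussian normal basis arithmetic, and all names are hypothetical.

```python
def montgomery_ladder(k, point, add, double, identity):
    """Compute k * point with one add and one double per bit of k.

    Invariant: r1 - r0 == point after every step.
    """
    r0, r1 = identity, point
    for bit in bin(k)[2:]:          # most significant bit first
        if bit == "1":
            r0, r1 = add(r0, r1), double(r1)
        else:
            r0, r1 = double(r0), add(r0, r1)
    return r0


# Stand-in group for demonstration: integers modulo 1009 under addition.
MOD = 1009
result = montgomery_ladder(
    163,
    5,
    add=lambda a, b: (a + b) % MOD,
    double=lambda a: (2 * a) % MOD,
    identity=0,
)
print(result)  # 163 * 5 = 815
```

Both branches perform the same pair of operations, so the work per bit does not depend on the bit value, which suits a regular hardware schedule.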

Efficient Architecture of an n-bit Radix-4 Modular Multiplier in Systolic Array Structure (시스톨릭 어레이 구조를 갖는 효율적인 n-비트 Radix-4 모듈러 곱셈기 구조)

  • Park, Tae-geun;Cho, Kwang-won
    • The KIPS Transactions:PartA / v.10A no.4 / pp.279-284 / 2003
  • In this paper, we propose an efficient architecture for radix-4 modular multiplication in a systolic array structure based on Montgomery's algorithm. We propose a radix-4 modular multiplication algorithm that reduces the number of iterations, so that an n-bit modular multiplication takes (3/2)n+2 clock cycles to complete. Since we can interleave two consecutive modular multiplications for 100% hardware utilization and can start the next multiplication at the earliest possible moment, one modular multiplication takes only about n/2 clock cycles on average. The proposed architecture is regular and scalable thanks to the systolic array structure, so it is well suited to VLSI implementation. Compared to conventional approaches, the proposed architecture completes a modular multiplication in less time while requiring fewer hardware resources.
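
Illustrative sketch (not the paper's systolic design): a radix-4 Montgomery multiplication consumes two multiplier bits per iteration, halving the iteration count of a bit-serial version. The code below is a plain software model of that general technique. The function name and operand values are assumptions, it requires Python 3.8 or later for modular inverses via `pow`, and it says nothing about the clock-cycle counts reported in the abstract.

```python
def montgomery_mul_radix4(a, b, n_mod):
    """Return a * b * 4^(-d) mod n_mod, where d = ceil(bit_length(n_mod) / 2).

    Requires an odd modulus and 0 <= a, b < n_mod.
    """
    if n_mod % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus")
    digits = (n_mod.bit_length() + 1) // 2      # number of radix-4 digits
    n_prime = (-pow(n_mod, -1, 4)) % 4          # -n_mod^(-1) mod 4
    s = 0
    for i in range(digits):
        a_i = (a >> (2 * i)) & 3                # next radix-4 digit of a
        t = s + a_i * b
        q = (t * n_prime) & 3                   # makes t + q * n_mod divisible by 4
        s = (t + q * n_mod) >> 2                # exact division by 4
    if s >= n_mod:                              # s < 2 * n_mod, so one subtraction suffices
        s -= n_mod
    return s


N = 2**61 - 1                                   # odd modulus chosen for the demo
A = 123456789012345678
B = 987654321098765432
D = (N.bit_length() + 1) // 2

product = montgomery_mul_radix4(A, B, N)
expected = (A * B * pow(4, -D, N)) % N
print(product == expected)
```

The result carries an extra factor of 4^(-d), so in practice operands are converted into and out of Montgomery form around a chain of such multiplications.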