• Title/Summary/Keyword: 레지스터

Search Result 506, Processing Time 0.025 seconds

A Design of Programmable Fragment Shader with Reduction of Memory Transfer Time (메모리 전송 효율을 개선한 programmable Fragment 쉐이더 설계)

  • Park, Tae-Ryoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.12
    • /
    • pp.2675-2680
    • /
    • 2010
  • Computation steps for 3D graphic processing consist of two stages - fixed operation stage and programming required stage. Using this characteristic of 3D pipeline, a hybrid structure between graphics hardware designed by fixed structure and programmable hardware based on instructions, can handle graphic processing more efficiently. In this paper, fragment Shader is designed under this hybrid structure. It also supports OpenGL ES 2.0. Interior interface is optimized to reduce the delay of entire pipeline, which may be occurred by data I/O between the fixed hardware and the Shader. Interior register group of the Shader is designed by an interleaved structure to improve the register space and processing speed.

Design Technique of Register-based Asynchronous FIFO (레지스터 기반 비동기 FIFO 구조 설계 기법)

  • Lee, Yong-Hwan
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • v.9 no.1
    • /
    • pp.1038-1041
    • /
    • 2005
  • In today's SoC design, most of IPs which use the different clock frequency from that of the bus require asynchronous FIFOs. However, in many cases, asynchronous FIFO is designed improperly and the cost of the wrong design is high. In this paper, a register-based asynchronous FIFO is designed to transfer data in asynchronous clock domains by using a valid bits scheme that eliminates the problem of the metastability and synchronization altogether. This FIFO architecture is described in HDL and synthesized to the gate level to compare with other FIFO scheme.

  • PDF

Design of a Low Power MictoController Core for Intellectual Property applications (IP활용에 적합한 저전력 MCU CORE 설계)

  • Lee, Kwang-Youb;Lee, Dong-Yup
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.2
    • /
    • pp.470-476
    • /
    • 2000
  • This paper describes an IP design of a low-power microcontroller using an architecture level design methodology instead of a transistor level. To reduce switching capacitance, the register-toregister data transfer is adopted to frequently used register transfer micro-operations. Also, distributed buffers are proposed to reduce a input data rising edge time. To reduce power consumption without any loss of performance, pipeline processing should be used. In this paper, a 4-stage pipelined datapath being able to process CISC instructions is designed. Designed microcontroller lessens power consumption by 20%. To measure a power consumption, the SYNOPSYS EPIC powermill is used.

  • PDF

Memory-Based Prefilter Architecture for a CDMA Receiver of Satellite-DMB (위성 DMB의 CDMA 수신기를 위한 메모리 기반 Prefilter 구조)

  • Kang, Hyeong-Ju
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2009.10a
    • /
    • pp.425-427
    • /
    • 2009
  • CDMA has been used widely in communication standards like IS-95, WCDMA, and Korea-Japan Satellite-DMB. Since CDMA has a multiple access interference (MAI) problem, a CDMA receiver requires an interference cancellation scheme like prefilter, a kind of adaptive filter. This paper proposed a memory-based prefilter architecture to reduce the area of a prefilter. An adaptive filter is usually implemented with registers for area reduction, but memory-based architecture leads to a less area for a prefilter due to its functional characteristics. Experimental results show that memory-based architecture reduces the area by around 10% in common prefilters.

  • PDF

Low Complexity Architecture for Fast-Serial Multiplier in $GF(2^m)$ ($GF(2^m)$ 상의 저복잡도 고속-직렬 곱셈기 구조)

  • Cho, Yong-Suk
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.17 no.4
    • /
    • pp.97-102
    • /
    • 2007
  • In this paper, a new architecture for fast-serial $GF(2^m)$ multiplier with low hardware complexity is proposed. The fast-serial multiplier operates standard basis of $GF(2^m)$ and is faster than bit serial ones but with lower area complexity than bit parallel ones. The most significant feature of the fast-serial architecture is that a trade-off between hardware complexity and delay time can be achieved. But The traditional fast-serial architecture needs extra (t-1)m registers for achieving the t times speed. In this paper a new fast-serial multiplier without increasing the number of registers is presented.

Design of an Operator Architecture for Finite Fields in Constrained Environments (제약적인 환경에 적합한 유한체 연산기 구조 설계)

  • Jung, Seok-Won
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.18 no.3
    • /
    • pp.45-50
    • /
    • 2008
  • The choice of an irreducible polynomial and the representation of elements have influence on the efficiency of operators for finite fields. This paper suggests two serial multiplier for the extention field GF$(p^n)$ where p is odd prime. A serial multiplier using an irreducible binomial consists of (2n+5) resisters, 2 MUXs, 2 multipliers of GF(p), and 1 adder of GF(p). It obtains the mulitplication result after $n^2+n$ clock cycles. A serial multiplier using an AOP consists of (2n+5) resisters, 1 MUX, 1 multiplier of CF(p), and 1 adder of GF(p). It obtains the mulitplication result after $n^2$+3n+2 clock cycles.

A firmware base address search technique based on MIPS architecture using $gp register address value and page granularity

  • Seok-Joo, Mun;Young-Ho, Sohn
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.2
    • /
    • pp.1-7
    • /
    • 2023
  • In this paper, we propose a base address candidate selection method using the $gp register and page granularity as a way to build a static analysis environment for firmware based on MIPS architecture. As a way to shorten the base address search time, which is a disadvantage of the base address candidate selection method through inductive reasoning in existing studies, this study proposes a method to perform page-level search based on the $gp register in the existing base address candidate selection method as a reference point for search. Then, based on the proposed method, a base address search tool is implemented and a static analysis environment is constructed to prove the validity of the target tool. The results show that the proposed method is faster than the existing candidate selection method through inductive reasoning.

Optimized parallel implementation of Lightweight blockcipher PIPO on 32-bit RISC-V (32-bit RISC-V상에서의 경량 블록암호 PIPO 최적 병렬 구현)

  • Eum, Si-Woo;Jang, Kyung-Bae;Song, Gyeong-Ju;Lee, Min-Woo;Seo, Hwa-Jeong
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.11a
    • /
    • pp.201-204
    • /
    • 2021
  • PIPO 경량 블록암호는 ICISC'20에서 발표된 암호이다. 본 논문에서는 PIPO의 단일 평문 최적화 구현과 4평문 병렬 구현을 제안한다. 단일 평문 최적화 구현은 Rlayer의 최적화와 키스케쥴을 포함하지 않은 구현을 진행하였다. 결과적으로 키스케쥴을 포함하는 기존 연구 대비 70%의 성능 향상을 확인하였다. 4평문의 경우 32-bit 레지스터를 최대한 활용하여, 레지스터 내부 정렬과 Rlayer의 최적화 구현을 진행하였다. 또한 Addroundkey 구현에서 메모리 최적화 구현과 속도 최적화 구현을 나누어 구현하였다. 메모리 사용을 줄인 메모리 최적화 구현은 단일 평문 구현 대비 80%의 성능 향상을 확인하였고, 암호화 속도를 빠르게 구현한 속도 최적화 구현은 단일 평문 구현 대비 157%의 성능 향상을 확인하였다.

2-D DCT/IDCT Processor Design Reducing Adders in DA Architecture (DA구조 이용 가산기 수를 감소한 2-D DCT/IDCT 프로세서 설계)

  • Jeong Dong-Yun;Seo Hae-Jun;Bae Hyeon-Deok;Cho Tae-Won
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.3 s.345
    • /
    • pp.48-58
    • /
    • 2006
  • This paper presents 8x8 two dimensional DCT/IDCT processor of adder-based distributed arithmetic architecture without applying ROM units in conventional memories. To reduce hardware cost in the coefficient matrix of DCT and IDCT, an odd part of the coefficient matrix was shared. The proposed architecture uses only 29 adders to compute coefficient operation in the 2-D DCT/IDCT processor, while 1-D DCT processor consists of 18 adders to compute coefficient operation. This architecture reduced 48.6% more than the number of adders in 8x8 1-D DCT NEDA architecture. Also, this paper proposed a form of new transpose network which is different from the conventional transpose memory block. The proposed transpose network block uses 64 registers with reduction of 18% more than the number of transistors in conventional memory architecture. Also, to improve throughput, eight input data receive eight pixels in every clock cycle and accordingly eight pixels are produced at the outputs.

A SoC Design Synthesis System for High Performance Vehicles (고성능 차량용 SoC 설계 합성 시스템)

  • Chang, Jeong-Uk;Lin, Chi-Ho
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.3
    • /
    • pp.181-187
    • /
    • 2020
  • In this paper, we proposed a register allocation algorithm and resource allocation algorithm in the high level synthesis process for the SoC design synthesis system of high performance vehicles We have analyzed to the operator characteristics and structure of datapath in the most important high-level synthesis. We also introduced the concept of virtual operator for the scheduling of multi-cycle operations. Thus, we demonstrated the complexity to implement a multi-cycle operation of the operator, regardless of the type of operation that can be applied for commonly use in the resources allocation algorithm. The algorithm assigns the functional operators so that the number of connecting signal lines which are repeatedly used between the operators would be minimum. This algorithm provides regional graphs with priority depending on connected structure when the registers are allocated. The registers with connecting structure are allocated to the maximum cluster which is generated by the minimum cluster partition algorithm. Also, it minimize the connecting structure by removing the duplicate inputs for the multiplexor in connecting structure and arranging the inputs for the multiplexor which is connected to the operators. In order to evaluate the scheduling performance of the described algorithm, we demonstrate the utility of the proposed algorithm by executing scheduling on the fifth digital wave filter, a standard bench mark model.