• Title/Summary/Keyword: 레지스터 (register)


An Efficient Array Algorithm for VLSI Implementation of Vector-radix 2-D Fast Discrete Cosine Transform (Vector-radix 2차원 고속 DCT의 VLSI 구현을 위한 효율적인 어레이 알고리듬)

  • 신경욱;전흥우;강용섬
    • The Journal of Korean Institute of Communications and Information Sciences, v.18 no.12, pp.1970-1982, 1993
  • This paper describes an efficient array algorithm for parallel computation of the vector-radix two-dimensional (2-D) fast discrete cosine transform (VR-FCT), and its VLSI implementation. By mapping the 2-D VR-FCT onto a 2-D array of processing elements (PEs), the butterfly structure of the VR-FCT can be implemented efficiently with high concurrency and local communication geometry. The proposed array algorithm features architectural modularity, regularity, and locality, so it is well suited to VLSI realization. Also, no transposition memory is required, unlike the conventional row-column decomposition approach, where it is unavoidable. It has a time complexity of $O(N+N_{NZD}{\cdot}\log_2 N)$ for an $(N{\times}N)$ 2-D DCT, where $N_{NZD}$ is the number of non-zero digits in the canonic signed-digit (CSD) code. By adopting CSD arithmetic in the circuit design, the number of additions is reduced by about 30% compared with 2's complement arithmetic. A computational accuracy analysis for finite-wordlength processing is presented. From simulation results, it is estimated that an $(8{\times}8)$ 2-D DCT (with $N_{NZD}=4$) can be computed in about $0.88{\mu}s$ at a 50 MHz clock frequency, resulting in a throughput of about 72 megapixels per second.
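The canonic signed-digit (CSD) recoding mentioned in this abstract can be illustrated in software. The following is a minimal Python sketch, not the paper's circuit: `to_csd` and `nonzero_digits` are hypothetical helper names, the recoding shown is the standard non-adjacent form for non-negative integers, and the sample coefficient is arbitrary.

```python
def to_csd(n):
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are non-zero.
    """
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)  # n % 4 == 1 gives +1, n % 4 == 3 gives -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def nonzero_digits(n):
    """Count the non-zero CSD digits (one add or subtract per digit)."""
    return sum(1 for d in to_csd(n) if d)


coeff = 0b01110111  # 119, an arbitrary example constant
print(to_csd(coeff))                               # digits, LSB first
print(bin(coeff).count("1"), nonzero_digits(coeff))  # binary ones vs CSD non-zeros
```

For this constant, plain binary has six non-zero bits while the CSD form (128 - 8 - 1) has three, which is why a shift-and-add multiplier built on CSD coefficients needs fewer adders.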


A Design of PRESENT Crypto-Processor Supporting ECB/CBC/OFB/CTR Modes of Operation and Key Lengths of 80/128-bit (ECB/CBC/OFB/CTR 운영모드와 80/128-비트 키 길이를 지원하는 PRESENT 암호 프로세서 설계)

  • Kim, Ki-Bbeum;Cho, Wook-Lae;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering, v.20 no.6, pp.1163-1170, 2016
  • A hardware implementation of the ultra-lightweight block cipher PRESENT, which is specified in ISO/IEC 29192-2 as a standard for lightweight cryptography, is described. The PRESENT crypto-processor supports two key lengths of 80 and 128 bits, as well as four modes of operation: ECB, CBC, OFB, and CTR. It has an on-the-fly key scheduler with a master key register, so it can process consecutive blocks of plaintext/ciphertext without reloading the master key. To achieve a lightweight implementation, the key scheduler was optimized to share circuits between the 80-bit and 128-bit key lengths. The round block was designed with a 64-bit data path, so that one round transformation for encryption/decryption is processed in one clock cycle. The PRESENT crypto-processor was verified on a Virtex5 FPGA device. Synthesized with a $0.18{\mu}m$ CMOS cell library, the crypto-processor has 8,100 gate equivalents (GE), and the estimated throughput is about 908 Mbps at a maximum operating clock frequency of 454 MHz.
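The round structure and on-the-fly key schedule described in this abstract can be sketched in software. The Python below is a minimal bit-level model of PRESENT encryption with an 80-bit key, following the publicly specified algorithm; it is not the authors' hardware design, it covers only single-block (ECB-style) encryption, and the function names are hypothetical.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
MASK80 = (1 << 80) - 1


def sbox_layer(state):
    """Apply the 4-bit S-box to each of the 16 nibbles of the 64-bit state."""
    out = 0
    for i in range(16):
        out |= SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
    return out


def p_layer(state):
    """Move bit i to position 16*i mod 63 (bit 63 stays in place)."""
    out = 0
    for i in range(64):
        pos = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> i) & 1) << pos
    return out


def round_keys_80(key):
    """Derive the 32 round keys from an 80-bit master key, one per step."""
    keys = []
    for counter in range(1, 33):
        keys.append(key >> 16)                            # leftmost 64 bits
        key = ((key << 61) | (key >> 19)) & MASK80        # rotate left by 61
        key = (SBOX[key >> 76] << 76) | (key & ((1 << 76) - 1))
        key ^= counter << 15                              # round counter into bits 19..15
    return keys


def present80_encrypt(block, key):
    """Encrypt one 64-bit block: 31 full rounds plus a final key addition."""
    rk = round_keys_80(key)
    state = block
    for i in range(31):
        state = p_layer(sbox_layer(state ^ rk[i]))
    return state ^ rk[31]


print(f"{present80_encrypt(0, 0):016x}")  # 5579c1387b228445 (published test vector)
```

If one assumes 32 clock cycles per 64-bit block (31 rounds plus the final key addition; this cycle count is an assumption, not a figure from the abstract), then 64 bits × 454 MHz / 32 = 908 Mbps, which is consistent with the reported throughput estimate.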

Fast-Transient Digital LDO Regulator With Binary-Weighted Current Control (이진 가중치 전류 제어 기법을 이용한 고속 응답 디지털 LDO 레귤레이터)

  • Woo, Ki-Chan;Sim, Jae-Hyeon;Kim, Tae-Woo;Hwang, Seon-Kwang;Yang, Byung-Do
    • Journal of the Korea Institute of Information and Communication Engineering, v.20 no.6, pp.1154-1162, 2016
  • This paper proposes a fast-transient digital LDO (low-dropout) regulator with a binary-weighted current control technique. A conventional digital LDO takes a long time to stabilize the output voltage because it adjusts the amount of current one step at a time, which causes a ringing problem. The binary-weighted current control technique stabilizes the output voltage rapidly by removing the ringing problem. A freeze (FRZ) mode is added to stop the operation of the digital LDO once the output voltage has reliably reached the target voltage. The proposed fast-response digital LDO is used together with a slow-response DC-DC converter in systems whose output voltage changes rapidly. The circuit area of the proposed digital controller was reduced by 56% compared with a conventional bidirectional shift register, and the ripple voltage was reduced by 87%. A chip was implemented in a $0.18{\mu}m$ CMOS process. The settling time is $3.1{\mu}s$ and the voltage ripple is 6.2 mV when a $1{\mu}F$ output capacitor is used.
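The contrast between step-by-step and binary-weighted control can be sketched numerically. The Python below is only one possible reading of the technique: it assumes an ideal comparator, a static load, and a successive-approximation search over an `n_bits` control code. The function names and the 8-bit code width are hypothetical, and the abstract does not confirm that the controller works this way.

```python
def shift_register_cycles(target, start=0):
    """Clock cycles needed when the control code moves one step per cycle."""
    return abs(target - start)


def binary_weighted_search(target, n_bits):
    """Find the control code by trying binary-weighted steps, MSB first.

    Returns the final code and the code after each clock cycle.
    """
    code = 0
    trace = []
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if trial <= target:      # idealized comparator: keep the step if not too large
            code = trial
        trace.append(code)
    return code, trace


target = 200                     # arbitrary target code for illustration
code, trace = binary_weighted_search(target, 8)
print(shift_register_cycles(target), len(trace), code)  # 200 8 200
print(trace)
```

Under these assumptions the binary-weighted search always settles in `n_bits` cycles, whereas the one-step-per-cycle controller needs a number of cycles proportional to the size of the load change.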

The Realization of RFID Tag Data Communication System Using CC1020 (CC1020을 이용한 RFID Tag 데이터 통신 시스템 구현)

  • Jo, Heung-Kuk
    • Journal of the Korea Institute of Information and Communication Engineering, v.15 no.4, pp.833-838, 2011
  • RFID systems are used in the manufacturing industry to collect, categorize, and process product data. Installing an RFID system in a large factory requires an extensive wired data network for RS232 communication. If the RFID system is relocated or extended within the factory, the already installed wired network must be reinstalled, which requires a large investment of time and money. With a wireless data network, however, both initial installation and reinstallation are very simple. In this paper, we implemented a wireless communication system and an RFID system. We used the CC1020 chip for the wireless communication system and the EM4095 chip for the RFID system. The CC1020 enables highly reliable data communication; by setting a simple status register, it can switch between transmit and receive states and select the desired frequency of either 400 MHz or 900 MHz. The communication range is 50 m if an external antenna is used. The EM4095 is a chip for RFID reader systems with a carrier frequency of 125 kHz, and it can implement a reader with only a small number of external components. The EM4100, a read-only type, was used in the RFID system. An ATmega128 controls the wireless communication system and the RFID system. We confirmed that the system communicates without error up to 50 m from the sender. The paper presents the circuit diagram and operating program for the CC1020 and the RFID system. The experimental system is shown in pictures, the data movement pattern of the CC1020 is shown in a diagram, and the performance of each transmission method is presented.

Simulation of YUV-Aware Instructions for High-Performance, Low-Power Embedded Video Processors (고성능, 저전력 임베디드 비디오 프로세서를 위한 YUV 인식 명령어의 시뮬레이션)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computing Practices and Letters, v.13 no.5, pp.252-259, 2007
  • With the rapid development of multimedia applications and wireless communication networks, consumer demand for video-over-wireless capability on mobile computing systems is growing rapidly. In this regard, this paper introduces YUV-aware instructions that enhance the performance and efficiency of color image and video processing. Traditional multimedia extensions (e.g., MMX, SSE, VIS, and AltiVec) depend solely on generic subword parallelism, whereas the proposed YUV-aware instructions support parallel operations on two packed 16-bit YUV values (6-bit Y, 5-bit U and V) in a 32-bit datapath architecture, providing greater concurrency and efficiency for color image and video processing. Moreover, the reduced data format size lowers system cost. Experimental results on a representative dynamically scheduled embedded superscalar processor show that YUV-aware instructions achieve an average speedup of 3.9x over the baseline superscalar performance. In contrast, MMX (a representative Intel multimedia extension) achieves a speedup of only 2.1x over the same baseline superscalar processor. In addition, YUV-aware instructions outperform MMX instructions in energy reduction (75.8% reduction with YUV-aware instructions, but only 54.8% with MMX instructions, relative to the baseline).
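The packed data format in this abstract (two 16-bit YUV values, each with 6-bit Y and 5-bit U and V, in one 32-bit word) can be sketched as follows. This is a minimal Python illustration of the layout only. The field order (Y in the high bits, then U, then V), the placement of the two pixels within the word, and all function names are assumptions, not details taken from the paper.

```python
def pack_yuv655(y, u, v):
    """Pack one pixel into 16 bits: Y in bits 15..10, U in 9..5, V in 4..0 (assumed order)."""
    if not (0 <= y < 64 and 0 <= u < 32 and 0 <= v < 32):
        raise ValueError("component out of range for the 6-5-5 format")
    return (y << 10) | (u << 5) | v


def unpack_yuv655(pixel):
    """Split a 16-bit packed pixel back into its (Y, U, V) components."""
    return (pixel >> 10) & 0x3F, (pixel >> 5) & 0x1F, pixel & 0x1F


def pack_pair(p0, p1):
    """Place two packed pixels in one 32-bit word (p0 in the low half, assumed)."""
    return (pack_yuv655(*p1) << 16) | pack_yuv655(*p0)


def unpack_pair(word):
    """Recover both (Y, U, V) tuples from a 32-bit word."""
    return unpack_yuv655(word & 0xFFFF), unpack_yuv655(word >> 16)


word = pack_pair((63, 0, 31), (1, 2, 3))
print(f"{word:#010x}", unpack_pair(word))
```

A single 32-bit register then carries two complete pixels, so one instruction can operate on six colour components at once.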

Development of Software Correlator for KJJVC (한일공동VLBI상관기를 위한 소프트웨어 상관기의 개발)

  • Yeom, J.H.;Oh, S.J.;Roh, D.G.;Kang, Y.W.;Park, S.Y.;Lee, C.H.;Chung, H.S.
    • Journal of Astronomy and Space Sciences, v.26 no.4, pp.567-588, 2009
  • The Korea-Japan Joint VLBI Correlator (KJJVC) is being developed jointly by KASI (Korea Astronomy and Space Science Institute), Korea, and NAOJ (National Astronomical Observatory of Japan), Japan. KJJVC is scheduled to enter normal operation in early 2010. In this study, we developed a software correlator based on the hardware specification of the VCS (VLBI Correlation Subsystem), the core component of KJJVC. The main specifications of the software correlator are 8 Gbps, 8192 output channels, and a 262,144-point FFT (Fast Fourier Transform) function, the same as the VCS. The functional algorithm and arithmetic register of the VCS specification are also adopted in this software correlator. To verify the performance of the developed software correlator, correlation experiments were carried out using spectral line and continuum sources observed by VERA (VLBI Exploration of Radio Astrometry), NAOJ. The experimental results were compared with the output of the Mitaka FX correlator in terms of spectrum shape, phase rate, fringe detection, and so on. Through these experiments, we confirmed that the correlation results of the software correlator are the same as those of the Mitaka FX correlator and verified its effectiveness. In the future, we expect that the developed software correlator can serve as a software correlator for KVN (Korean VLBI Network) alongside KJJVC, once correlation post-processing is introduced and the user interface is modified, for example into a GUI (Graphical User Interface).

A 16 bit FPGA Microprocessor for Embedded Applications (실장제어 16 비트 FPGA 마이크로프로세서)

  • 차영호;조경연;최혁환
    • Journal of the Korea Institute of Information and Communication Engineering, v.5 no.7, pp.1332-1339, 2001
  • SoC (System on Chip) technology is widely used in the field of embedded systems because it provides high flexibility for a specific application domain. An important aspect of developing any new embedded system is verification, which usually requires lengthy software and hardware co-design. To reduce development cost and design effort, the instruction set of the microprocessor must be suitable for a high-level language compiler, and an FPGA prototype system can be derived and tested for design verification. In this paper, we propose a 16-bit FPGA microprocessor, tentatively named EISC16, based on the EISC (Extendable Instruction Set Computer) architecture for embedded applications. The proposed EISC16 has a 16-bit fixed-length instruction set with short offsets and small immediate operands. The offset and immediate operand can be extended to 16 bits using an extension register and an extension flag. We developed a cross C/C++ compiler and development software for the EISC16 by porting GNU on an IBM-PC and a SUN workstation, and compared the object code size produced by compiling a C/C++ standard library, concluding that EISC16 exhibits a higher code density than existing 16-bit microprocessors. The proposed EISC16 requires approximately 6,000 gates when designed and synthesized with RTL-level VHDL for Xilinx's Virtex XCV300 FPGA. We also designed a test board that consists of the EISC16, ROM, RAM, an LED/LCD panel, a periodic timer, an input key pad, and an RS-232C controller. It works normally at a 7 MHz clock.


Implementation of Pixel Subword Parallel Processing Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 픽셀 서브워드 병렬처리 명령어 구현)

  • Jung, Yong-Bum;Kim, Jong-Myon
    • The KIPS Transactions:PartA, v.18A no.3, pp.99-108, 2011
  • Processor technology is currently advancing through parallel processing techniques rather than only through increasing the clock frequency of a single processor, owing to high technology cost and power consumption. In this paper, a SIMD (Single Instruction Multiple Data) based parallel processor is introduced that efficiently processes the massive data inherent in multimedia. In addition, this paper proposes pixel subword parallel processing instructions for the SIMD parallel processor architecture that operate efficiently on image and video pixels. The proposed pixel subword parallel processing instructions store and process four 8-bit pixels in four partitioned 12-bit registers in a 48-bit datapath architecture. This solves the overflow problem inherent in existing multimedia extensions and reduces the use of many packing/unpacking instructions. Experimental results using the same SIMD-based parallel processor architecture indicate that the proposed pixel subword parallel processing instructions achieve a speedup of $2.3{\times}$ over the baseline SIMD array performance. In contrast, MMX-type instructions (a representative Intel multimedia extension) achieve a speedup of only $1.4{\times}$ over the same baseline SIMD array performance. In addition, the proposed instructions achieve $2.5{\times}$ better energy efficiency than the baseline program, while MMX-type instructions achieve only $1.8{\times}$ better energy efficiency than the baseline program.
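The overflow argument in this abstract can be made concrete with a small model. The Python below is a minimal sketch, not the paper's instruction set: it packs four 8-bit pixels into fields of a chosen width and adds two packed words with one ordinary integer addition. The helper names `pack` and `unpack` and the sample pixel values are hypothetical.

```python
def pack(pixels, field_bits):
    """Pack pixel values into consecutive fields of `field_bits` bits, first pixel lowest."""
    word = 0
    for i, p in enumerate(pixels):
        word |= p << (field_bits * i)
    return word


def unpack(word, field_bits, count=4):
    """Extract `count` fields of `field_bits` bits from a packed word."""
    mask = (1 << field_bits) - 1
    return [(word >> (field_bits * i)) & mask for i in range(count)]


a = [200, 255, 10, 128]
b = [100, 255, 20, 128]

# 12-bit fields (48-bit word): each sum is at most 510, so no carry leaves its field.
print(unpack(pack(a, 12) + pack(b, 12), 12))  # [300, 510, 30, 256]

# 8-bit fields (32-bit word): carries spill into neighbouring pixels.
print(unpack(pack(a, 8) + pack(b, 8), 8))     # [44, 255, 31, 0]
```

With 12-bit fields the four sums come out exact, while with 8-bit fields the carries corrupt adjacent pixels, which is the situation that otherwise forces unpacking to a wider type before arithmetic.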

A DLL-Based Multi-Clock Generator Having Fast-Relocking and Duty-Cycle Correction Scheme for Low Power and High Speed VLSIs (저전력 고속 VLSI를 위한 Fast-Relocking과 Duty-Cycle Correction 구조를 가지는 DLL 기반의 다중 클락 발생기)

  • Hwang Tae-Jin;Yeon Gyu-Sung;Jun Chi-Hoon;Wee Jae-Kyung
    • Journal of the Institute of Electronics Engineers of Korea SD, v.42 no.2 s.332, pp.23-30, 2005
  • This paper describes a DLL (delay-locked loop) based multi-clock generator for low-power, high-speed VLSI chips that offers low active standby power as well as fast relocking after the DLL is reactivated. It performs frequency multiplication using a frequency multiplier scheme and produces output clocks with a 50:50 duty ratio regardless of the duty ratio of the system clock. A digital control scheme using a DAC enables fast relocking after the clock system exits standby mode, which is achieved by storing the analog locking information as digital codes in a register block. For clock multiplication, a feed-forward duty correction scheme using multiphase clocks and phase mixing corrects the duty error of the system clock without requiring additional time. The proposed DLL-based multi-clock generator can provide a clock synchronized to an external clock for I/O data communication, as well as multiple low-speed and high-speed clocks for various IPs. It was designed in an area of $1796{\mu}m\times654{\mu}m$ using a $0.35{\mu}m$ CMOS process, and has a lock range of 75 MHz to 550 MHz and a maximum multiplied frequency of 800 MHz with a static skew below 20 ps at a 2.3 V supply voltage.

Efficient Indirect Branch Predictor Based on Data Dependence (효율적인 데이터 종속 기반의 간접 분기 예측기)

  • Paik Kyoung-Ho;Kim Eun-Sung
    • Journal of the Institute of Electronics Engineers of Korea CI, v.43 no.4 s.310, pp.1-14, 2006
  • The indirect branch instruction is a major obstacle to exploiting the ILP of modern high-performance processors. The target address of an indirect branch is polymorphic and varies dynamically, so it is very difficult to predict accurately. The performance of a speculative processor is therefore reduced significantly by the many cycles of execution delay incurred on a misprediction. We proposed a novel and highly accurate indirect branch prediction scheme called data-dependence-based prediction. The predictor achieves a prediction accuracy of 98.92% using 1K entries and 99.95% using 8K entries. However, all proposed indirect predictors, including ours, have a large hardware overhead for storing expected target addresses as well as tags for alleviating aliasing. Hence, we propose a scheme that minimizes the hardware overhead without sacrificing prediction accuracy. Our experimental results show that the tag overhead is reduced by about 60% with no performance loss, and by about 80% with a performance loss of only 0.1%. The overhead of storing target addresses is reduced by about 35% with no performance loss, and by about 45% with a performance loss of only 1.11%.