Search | Korea Science

Design Space Exploration for NoC-Style Bus Networks

Kim, Jin-Sung;Lee, Jaesung
- ETRI Journal
- /
- v.38 no.6
- /
- pp.1240-1249
- /
- 2016
With the number of IP cores in a multicore system-on-chip increasing to up to tens or hundreds, the role of on-chip interconnection networks is vital. We propose a networks-on-chip-style bus network as a compromise and redefine the exploration problem to find the best IP tiling patterns and communication path combinations. Before solving the problem, we estimate the time complexity and validate the infeasibility of the solution. To reduce the time complexity, we propose two fast exploration algorithms and develop a program to implement these algorithms. The program is executed for several experiments, and the exploration time is reduced to approximately 1/22 and 7/1,200 at the first and second steps of the exploration process, respectively. However, as a trade-off for the time saving, the time cost (TC) of the searched architecture is increased to up to 4.7% and 11.2%, respectively, at each step compared with that of the architecture obtained through full-case exploration. The reduction ratio can be decreased to 1/4,000 by simultaneously applying both the algorithms even though the resulting TC is increased to up to 13.1% when compared with that obtained through full-case exploration.
https://doi.org/10.4218/etrij.16.0116.0037 인용 PDF KSCI KPUBS

A Decoder Design for High-Speed RS code (RS 코드를 이용한 복호기 설계)

박화세;김은원
- Journal of the Korean Institute of Telematics and Electronics T
- /
- v.35T no.1
- /
- pp.59-66
- /
- 1998
In this paper, the high-speed decoder for RS(Reed-Solomon) code, one of the most popular error correcting code, is implemented using VHDL. This RS decoder is designed in transform domain instead of most time domain. Because of the simplicity in structure, transform decoder can be easily realized VLSI chip. Additionally the pipeline architecture, which is similar to a systolic array is applied for all design. Therefore, This transform RS decoder is suitable for high-rate data transfer. After synthesis with FPGA technology, the decoding rate is more 43 Mbytes/s and the area is 1853 LCs(Logic Cells). To compare with other product with pipeline architecture, this result is admirable. Error correcting ability and pipeline performance is certified by computer simulation.
PDF

A High Speed Motion Estimation Architedture for Three Step Search Algorithm (삼단검색 알고리즘을 위한 움직임 추정기 구조)

Kim, Sang-Jung;Kim, Yong-Gil;Im, Gang-Bin;Kim, Yong-Deuk;Jeong, Gi-Hyeon
- The Transactions of the Korea Information Processing Society
- /
- v.4 no.2
- /
- pp.616-627
- /
- 1997
We porpose a new VLSI architecture for the three step search algorithm for the mition estimation of moving images.In the proposed architecture the regular data input is possible and the data are passed through all computational processes ,minimiuzing the input bandwidth.The performandce is analyzed in detail,and compared with other architectures.The performance is approaching to the ideal computation speed,with less hardware than for the existing architectures.
PDF

A Built-In Self-Test Architecture using Self-Scan Chains (자체 스캔 체인을 이용한 Built-In Self-Test 구조에 관한 연구)

Han, Jin-Uk;Min, Hyeong-Bok
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.39 no.3
- /
- pp.85-97
- /
- 2002
STUMPS has been widely used for built-in self-test of scan design with multiple scan chains. In the STUMPS architecture, there is very high correlation between the bit sequences in the adjacent scan chains. This correlation causes circuits lower the fault coverage. In order to solve this problem, an extra combinational circuit block(phase shifter) is placed between the LFSR and the inputs of STUMPS architecture despite the hardware overhead increase. This paper introduces an efficient test pattern generation technique and built-in self-test architecture for sequential circuits with multiple scan chains. The proposed test pattern generator is not used the input of LFSR and phase shifter, hence hardware overhead can be reduced and sufficiently high fault coverage is obtained. Only several XOR gates in each scan chain are required to modify the circuit for the scan BIST, so that the design is very simple.
PDF KSCI

A New Arithmetic Unit Over GF(2$^{m}$ ) for Low-Area Elliptic Curve Cryptographic Processor (저 면적 타원곡선 암호프로세서를 위한 GF(2$^{m}$ )상의 새로운 산술 연산기)

김창훈;권순학;홍춘표
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.28 no.7A
- /
- pp.547-556
- /
- 2003
This paper proposes a novel arithmetic unit over GF(2$^{m}$ ) for low-area elliptic curve cryptographic processor. The proposed arithmetic unit, which is linear feed back shift register (LFSR) architecture, is designed by using hardware sharing between the binary GCD algorithm and the most significant bit (MSB)-first multiplication scheme, and it can perform both division and multiplication in GF(2$^{m}$ ). In other word, the proposed architecture produce division results at a rate of one per 2m-1 clock cycles in division mode and multiplication results at a rate of one per m clock cycles in multiplication mode. Analysis shows that the computational delay time of the proposed architecture, for division, is less than previously proposed dividers with reduced transistor counts. In addition, since the proposed arithmetic unit does not restrict the choice of irreducible polynomials and has regularity and modularity, it provides a high flexibility and scalability with respect to the field size m. Therefore, the proposed novel architecture can be used for both division and multiplication circuit of elliptic curve cryptographic processor. Specially, it is well suited to low-area applications such as smart cards and hand held devices.
PDF KSCI

High-Speed Reed-Solomon Decoder Using New Degree Computationless Modified Euclid´s Algorithm (새로운 DCME 알고리즘을 사용한 고속 Reed-Solomon 복호기)

백재현;선우명훈
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.40 no.6
- /
- pp.459-468
- /
- 2003
This paper proposes a novel low-cost and high-speed Reed-Solomon (RS) decoder based on a new degree computationless modified Euclid´s (DCME) algorithm. This architecture has quite low hardware complexity compared with conventional modified Euclid´s (ME) architectures, since it can remove completely the degree computation and comparison circuits. The architecture employing a systolic away requires only the latency of 2t clock cycles to solve the key equation without initial latency. In addition, the DCME architecture using 3t＋2 basic cells has regularity and scalability since it uses only one processing element. The RS decoder has been synthesized using the 0.25${\mu}{\textrm}{m}$. Faraday CMOS standard cell library and operates at 200MHz and its data rate suppots up to 1.6Gbps. For tile (255, 239, 8) RS code, the gate counts of the DCME architecture and the whole RS decoder excluding FIFO memory are only 21,760 and 42,213, respectively. The proposed RS decoder can reduce the total fate count at least 23% and the total latency at least 10% compared with conventional ME architectures.
PDF KSCI

Design of Architecture of Programmable Stack-based Video Processor with VHDL (VHDL을 이용한 프로그램 가능한 스택 기반 영상 프로세서 구조 설계)

박주현;김영민
- Journal of the Korean Institute of Telematics and Electronics C
- /
- v.36C no.4
- /
- pp.31-43
- /
- 1999
The main goal of this paper is to design a high performance SVP(Stack based Video Processor) for network applications. The SVP is a comprehensive scheme; 'better' in the sense that it is an optimal selection of previously proposed enhancements of a stack machine and a video processor. This can process effectively object-based video data using a S-RISC(Stack-based Reduced Instruction Set Computer) with a semi -general-purpose architecture having a stack buffer for OOP(Object-Oriented Programming) with many small procedures at running programs. And it includes a vector processor that can improve the MPEG coding speed. The vector processor in the SVP can execute advanced mode motion compensation, motion prediction by half pixel and SA-DCT(Shape Adaptive-Discrete Cosine Transform) of MPEG-4. Absolutors and halfers in the vector processor make this architecture extensive to a encoder. We also designed a VLSI stack-oriented video processor using the proposed architecture of stack-oriented video decoding. It was designed with O.5$\mu\textrm{m}$ 3LM standard-cell technology, and has 110K logic gates and 12 Kbits SRAM internal buffer. The operating frequency is 50MHz. This executes algorithms of video decoding for QCIF 15fps(frame per second), maximum rate of VLBV(Very Low Bitrate Video) in MPEG-4.
PDF

A New Architecture of High-Performance Digital Hologram Generator based on Independent Calculation of a Holographic Pixel (독립적 홀로그램 화소 연산 방식의 고성능 디지털 홀로그램 생성기의 하드웨어 구조)

Lee, Yoon-Huyk;Seo, Young-Ho;Choi, Hyun-Jun;Kim, Dong-Wook
- Journal of Broadcast Engineering
- /
- v.16 no.3
- /
- pp.403-415
- /
- 2011
In this paper, we proposed a hardware architecture to generate digital holograms at high speed. It used the modified computer-generated hologram (CGH) algorithm and adapted the pipeline-based hardware to be able to remove memory bottleneck problem. It uses not the method which generates a hologram by accumulating intermittent holograms but the one which independently generates a pixel of a final hologram and uses the appropriate CGH algorithm for the selected method. Based on the CGH algorithm we proposed the architecture of the digital hologram generator which consists of input interface part, calculating part, and normalizing part. The hardware can decrease memory usage because it repeatedly use object light sources which is stored in the internal buffer. It is also operationally parallelized by vertically adding unit cells. It can generate 86 frames of HD digital hologram per 1 second for 1K light sources.
https://doi.org/10.5909/JEB.2011.16.3.403 인용 PDF KSCI

A Study on the Design of Switch for High Speed Internet Communication Network (고속 인터넷 통신망을 위한 스위치 설계에 관한 연구)

조삼호
- Journal of Internet Computing and Services
- /
- v.3 no.3
- /
- pp.87-93
- /
- 2002
A complex network and a parallel computer are made up of interconnected switching units. The role of a switching unit is to set up a connection between an input port and an output port, according to the routing information. We proposed our switching network with a remodeled architecture is a newly modified Banyan network with eight input and output ports. We have analysed the maximum throughput of the revised switch. Our analyses have shown that under the uniform random traffic load, the FIFO discipline is limited to 70%, The switching system consists of an input control unit, a switch unit and an output control unit. Therefore the result of the analyses shows that the results of the networking simulation with the new switch are feasible and if we adopt the new architecture of the revised model of the Banyan switch, the hardware complexity can be reduced. The FIFO discipline has increased by about 11% when we compare the switching system with the input buffer system. We have designed and verified the switching system in VHDL using Max＋plusII. We also designed our test environment including micro computers, the base station, and the proposed architecture. We proposed a new architecture of the Banyan switch for BISDN networks and parallel computers.
PDF

A Novel Arithmetic Unit Over GF(2$^{m}$) for Reconfigurable Hardware Implementation of the Elliptic Curve Cryptographic Processor (타원곡선 암호프로세서의 재구성형 하드웨어 구현을 위한 GF(2$^{m}$)상의 새로운 연산기)

김창훈;권순학;홍춘표;유기영
- Journal of KIISE:Computer Systems and Theory
- /
- v.31 no.8
- /
- pp.453-464
- /
- 2004
In order to solve the well-known drawback of reduced flexibility that is associate with ASIC implementations, this paper proposes a novel arithmetic unit over GF(2$^{m}$ ) for field programmable gate arrays (FPGAs) implementations of elliptic curve cryptographic processor. The proposed arithmetic unit is based on the binary extended GCD algorithm and the MSB-first multiplication scheme, and designed as systolic architecture to remove global signals broadcasting. The proposed architecture can perform both division and multiplication in GF(2$^{m}$ ). In other word, when input data come in continuously, it produces division results at a rate of one per m clock cycles after an initial delay of 5m-2 in division mode and multiplication results at a rate of one per m clock cycles after an initial delay of 3m in multiplication mode respectively. Analysis shows that while previously proposed dividers have area complexity of Ο(m$^2$) or Ο(mㆍ(log$_2$$^{m}$ )), the Proposed architecture has area complexity of Ο(m), In addition, the proposed architecture has significantly less computational delay time compared with the divider which has area complexity of Ο(mㆍ(log$_2$$^{m}$ )). FPGA implementation results of the proposed arithmetic unit, in which Altera's EP2A70F1508C-7 was used as the target device, show that it ran at maximum 121MHz and utilized 52% of the chip area in GF(2$^{571}$ ). Therefore, when elliptic curve cryptographic processor is implemented on FPGAs, the proposed arithmetic unit is well suited for both division and multiplication circuit.
PDF KSCI

Search Result 277, Processing Time 0.022 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)