Search | Korea Science

SCATOMi : Scheduling Driven Circuit Partitioning Algorithm for Multiple FPGAs using Time-multiplexed, Off-chip, Multicasting Interconnection Architecture

Kwon, Young-Su; Kyung, Chong-Min
- Proceedings of the IEEK Conference / 2003.07b / pp.823-826 / 2003
An FPGA-based logic emulator with large gate capacity generally comprises a large number of FPGAs connected in a mesh or crossbar topology. However, the gate utilization of the FPGAs and the speed of emulation are limited by the number of signal pins among FPGAs and by the interconnection architecture of the logic emulator. Time-multiplexing of the interconnection wires is required for a multi-FPGA system incorporating several state-of-the-art FPGAs. This paper proposes a circuit partitioning algorithm called SCATOMi (SCheduling driven Algorithm for TOMi) for multi-FPGA systems incorporating four to eight FPGAs, where the FPGAs are interconnected through TOMi (Time-multiplexed, Off-chip, Multicasting interconnection). SCATOMi improves the performance of the TOMi architecture by limiting the number of inter-FPGA signal transfers on the critical path and by considering the scheduling of inter-FPGA signal transfers. The partitioning result of SCATOMi is 5.5 times faster than that of traditional partitioning algorithms. An architecture comparison shows that the pin count is reduced to 15.2%-81.3% and the critical path delay to 46.1%-67.6% of those of traditional architectures.

VLSI design of a shared multibuffer ATM Switch for throughput enhancement in multicast environments (멀티캐스트 환경에서 향상된 처리율을 갖는 공유 다중 버퍼 ATM스위치의 VLSI 설계)

Lee, Jong-Ick;Lee, Moon-Key
- Proceedings of the IEEK Conference / 2001.06a / pp.383-386 / 2001
This paper presents a novel multicast architecture for a shared multibuffer ATM switch, tailored for throughput enhancement in multicast environments. The address queues for multicast cells are separated from those for unicast cells so that multicast cells are arbitrated independently of unicast cells. Three read cycles are carried out during each cell slot, and multicast cells can be read from the shared buffer memory (SBM) in the third read cycle provided that the shared memory is not being accessed to read a unicast cell. In this architecture, at most two cells are queued at each fabric output port per time slot, and an output mask chooses only one cell. Extensive simulations show that the proposed architecture achieves higher throughput than other multicast schemes in shared multibuffer switch architectures.

A New Systolic Array for LSD-first Multiplication in $GF(2^m)$ ($GF(2^m)$상의 LSD 우선 곱셈을 위한 새로운 시스톨릭 어레이)

Kim, Chang-Hoon;Nam, In-Gil
- The Journal of Korean Institute of Communications and Information Sciences / v.33 no.4C / pp.342-349 / 2008
This paper presents a new digit-serial systolic multiplier over $GF(2^m)$ for cryptographic applications. When input data arrive continuously, the proposed array produces one multiplication result every ${\lceil}m/D{\rceil}$ clock cycles, where $D$ is the selected digit size. Because the inner structure of the proposed array is tree-type, the critical path grows logarithmically with $D$. The computation delay of the proposed architecture is therefore significantly less than that of previously proposed digit-serial systolic multipliers, whose critical path grows in proportion to $D$. Furthermore, since the new architecture is regular, modular, and has unidirectional data flow, it is well suited to VLSI implementation.
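The least-significant-digit-first (LSD-first) scheme named in the title can be illustrated in software. The sketch below is not the paper's systolic array; it is a plain sequential model, with hypothetical function names, that assumes field elements are stored as integers whose bits are polynomial coefficients. It shows why a digit size of D leads to ceil(m/D) iterations.

```python
from math import ceil


def gf2_mod(value, modulus, m):
    """Reduce a GF(2) polynomial modulo an irreducible polynomial of degree m."""
    while value.bit_length() > m:
        shift = value.bit_length() - (m + 1)
        value ^= modulus << shift  # cancel the current leading term
    return value


def clmul(a, b):
    """Carry-less (GF(2) polynomial) product of two integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def lsd_first_multiply(a, b, modulus, m, digit_size):
    """Multiply a and b in GF(2^m), consuming b one digit at a time, LSD first.

    Returns (product, number_of_digit_steps).
    """
    acc = 0
    mask = (1 << digit_size) - 1
    steps = ceil(m / digit_size)
    for _ in range(steps):
        digit = b & mask                       # next least significant digit of b
        acc = gf2_mod(acc ^ clmul(a, digit), modulus, m)
        a = gf2_mod(a << digit_size, modulus, m)  # a <- a * x^D mod G
        b >>= digit_size
    return acc, steps
```

Calling `lsd_first_multiply(0x57, 0x83, 0x11B, 8, 4)` multiplies two elements of GF(2^8) under the AES reduction polynomial in two digit steps; a hardware array would process each step in one clock cycle, which is where the ceil(m/D) figure comes from.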

An efficient VLSI Implementation of the 2-D DCT with the Algorithm Decomposition (알고리즘 분해를 이용한 2-D DCT)

Jeong, Jae-Gil
- The Journal of Natural Sciences / v.7 / pp.27-35 / 1995
This paper introduces a VLSI (Very Large Scale Integrated Circuit) implementation of the 2-D Discrete Cosine Transform (DCT) with an application to image and video coding. The implementation, which is based on a state space model, uses both algorithm and data partitioning to achieve high efficiency. It reduces the number of data transfers between the processing elements (PEs) and restricts all data transfers to be local. The system accepts its input as a progressively scanned data stream, which reduces the hardware required for the input data control module. With proper ordering of computations, the matrix transposition between the two matrix-by-matrix multiplications, which is required in many 2-D DCT systems based on a row-column decomposition, can also be removed. The new implementation scheme makes it feasible to build a single 2-D DCT VLSI chip that can easily be expanded to a larger 2-D DCT by cascading chips.

An Efficient Clock Cycle Reducing Architecture in Full-Search Block Matching Motion Estimation VLSI (전탐색 블럭정합 움직임추정 VLSI 에서 클럭사이클수를 줄이는 효율적 구조)

윤종성;장순화
- Proceedings of the IEEK Conference / 2000.09a / pp.259-262 / 2000
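This paper proposes a VLSI architecture for full-search block-matching motion estimation that reduces the number of clock cycles. PEs (processing elements) that perform two operations per clock, one on the rising edge and one on the falling edge, are wired alternately so that the array operates on the falling clock edge as well as the rising edge. The method applies directly to existing architectures. It doubles the supplied data width, increases the hardware complexity of each PE by a factor of 1.5, and doubles the complexity of the sum-of-absolute-differences operation, so the overall hardware becomes more complex, but it is far more efficient than reducing the clock cycle count by doubling the number of PEs. The proposed architecture was applied to the design of an MPEG-2 motion estimator that uses a hierarchical motion estimation algorithm, and its function and hardware complexity were verified.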
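For readers unfamiliar with the workload that such PE arrays accelerate, the following is a minimal software sketch of full-search block matching with a sum-of-absolute-differences (SAD) cost. It is a plain reference model with hypothetical function names, not the clock-edge architecture described in the abstract, and it assumes frames are lists of rows of integer pixel values.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of the current
    frame at (bx, by) and the reference block displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, search_range):
    """Try every displacement in [-search_range, search_range]^2 and
    return (best_cost, dx, dy). Candidates outside the frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (
                0 <= by + dy and by + dy + n <= height
                and 0 <= bx + dx and bx + dx + n <= width
            )
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Each candidate displacement costs n*n absolute-difference operations, which is why hardware designs focus on how many of these a PE can complete per clock cycle.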

Specification of a software architecture and protocols for automated VLSI manufacturing system operation (자동화된 VLSI 생산 시스템 운용을 위한 소프트웨어 구조 및 프로토콜 설계)

Park, Jong-Hun;Kim, Jong-Won;Kwon, Wook-Hyun
- Journal of Institute of Control, Robotics and Systems / v.3 no.1 / pp.94-100 / 1997
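This study presents a new software architecture and protocols needed to build a lot coordinator and generic cell controllers in an automated VLSI manufacturing system environment. The operational control activities of a semiconductor manufacturing system are modeled as a client/server architecture in which the lot coordinator and the generic cell controllers communicate cooperatively; the lot coordinator acts as a client that requests jobs from generic cell controllers, each of which can perform one or more operations. Unlike previous research on operating software for semiconductor manufacturing systems, which dealt only with conceptual architectures and strategies, this study presents a detailed-level design for controlling material handling equipment as well as production equipment. A distinguishing feature of the work is dynamic reconfigurability in response to changes in equipment configuration, lot types, and scheduling rules. In addition, the proposed design is easy to implement using commercially available interprocess communication facilities.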

VLSI Design of a New Dynamic GSMP V3 Architecture (새로운 Dynamic GSMP V3 구조의 VLSI 설계)

Kim, Yeong-Cheol;Lee, Tae-Won;Kim, Gwang-Ok;Lee, Myeong-Ok
- The KIPS Transactions:PartC / v.8C no.3 / pp.287-298 / 2001
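This paper proposes and designs a hardware architecture for the VLSI implementation of Dynamic GSMP V3 (General Switching Management Protocol Version 3) with dynamic buffer management, aimed at transporting IP services efficiently over ATM-based MPLS networks. It also compares, in terms of cell loss rate, the GSMP currently under standardization with a GSMP that incorporates dynamic buffer management. To improve connection control performance on the ATM switch, the Slave block of Dynamic GSMP V3, which performs connection setup and control in the switch, was designed in a Samsung SoG 0.5 $\mu\textrm{m}$ process. To evaluate the existing and proposed schemes, simulations were run using cells generated from random variables and a minimum-buffer algorithm, and the results showed an improved cell loss rate.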

An efficient VLSI Architecture of 9/7 DWT filter using shift-adder for JPEG2000 (Shift-adder 를 이용한 JPEG2000 용 9/7 DWT 필터의 효율적인 VLSI 구조)

Son, Chang-Hoon;Kim, Young-Min
- Proceedings of the Korea Information Processing Society Conference / 2007.11a / pp.748-749 / 2007
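This paper proposes a VLSI architecture that performs the discrete wavelet transform (DWT) of the JPEG2000 standard with low power, high speed, and a small gate area. The architecture uses a line-based, convolution approach. The DWT filter is one-dimensional and processes the horizontal and vertical directions of the image in turn, using Daubechies 9/7 filter coefficients in 16-bit fixed-point format. Replacing the multipliers, which occupy a very large area in conventional DWT VLSI designs, with shift-adders reduced the gate area to 38.5% of that of the conventional approach. In addition, the maximum delay and total power consumption improved to 78% and 29.6% of the conventional values, respectively.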
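The idea of replacing a multiplier with shifts and adds can be sketched as follows. This is a generic illustration, not the paper's circuit: the function names, the choice of 14 fractional bits, and the example coefficient value are all assumptions made for the sketch. Each set bit of the fixed-point constant contributes one shifted copy of the input sample.

```python
def to_fixed(value, frac_bits=14):
    """Quantize a real coefficient to a signed fixed-point integer
    (frac_bits is an assumed format, not the paper's)."""
    return round(value * (1 << frac_bits))


def shift_add_multiply(sample, coeff_fixed):
    """Compute sample * coeff_fixed using only shifts and additions."""
    negative = coeff_fixed < 0
    magnitude = -coeff_fixed if negative else coeff_fixed
    acc = 0
    shift = 0
    while magnitude:
        if magnitude & 1:
            acc += sample << shift  # one adder input per set bit
        magnitude >>= 1
        shift += 1
    return -acc if negative else acc


def adder_inputs(coeff_fixed):
    """Number of shifted terms needed, i.e. the set bits of |coeff_fixed|."""
    return bin(abs(coeff_fixed)).count("1")


# Illustrative coefficient only; scale the result back by frac_bits after use.
EXAMPLE_COEFF = to_fixed(0.602949)
```

Because the constant is fixed at design time, the shifts are just wiring in hardware, so the cost is governed by the number of set bits reported by `adder_inputs`.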
https://doi.org/10.3745/PKIPS.y2007m11a.748

Design and Analysis of Motion Estimation Architecture Applicable to Low-power Energy Management Algorithm (저전력 에너지 관리 알고리즘 적용을 위한 하드웨어 움직임 추정기 구조 설계 및 특성 분석)

Kim Eung-Sup;Lee Chanho
- Proceedings of the IEEK Conference / 2004.06b / pp.561-564 / 2004
Motion estimation, which requires a huge amount of computation, consumes a large share of the power in a video encoder. Although a number of fast-search algorithms have been proposed to reduce power consumption, the less computation they perform, the worse their performance. In this paper, we propose an architecture to which a low-energy management scheme can be applied together with several fast-search algorithms. In addition, we show that ECVH, a software scheduling scheme that dynamically changes the search algorithm, the operating frequency, and the supply voltage using the remaining slack time within a given power budget, can be applied to the architecture, and that the power consumption can be reduced.

VLSI Implementation of Auto-Correlation Architecture for Synchronization of MIMO-OFDM WLAN Systems

Cho, Jong-Min;Kim, Jin-Sang;Cho, Won-Kyung
- JSTS:Journal of Semiconductor Technology and Science / v.10 no.3 / pp.185-192 / 2010
This paper presents a hardware-efficient auto-correlation scheme for the synchronization of MIMO-OFDM based wireless local area network (WLAN) systems, such as IEEE 802.11n. Carrier frequency offset (CFO) estimation for frequency synchronization requires high-complexity auto-correlation operations over many training symbols. To reduce the hardware complexity of MIMO-OFDM synchronization, we propose an efficient correlation scheme based on a time-multiplexing technique and the use of reduced samples while preserving performance. Compared to a conventional architecture, the proposed architecture requires only 27% of the logic gates and 22% of the power consumption, with an acceptable loss in BER performance.
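The auto-correlation operation behind CFO estimation can be sketched in a few lines. This is the textbook estimator for a training sequence that repeats with a known period, not the paper's time-multiplexed hardware; the function name and the `step` parameter, which stands in loosely for correlating over a reduced set of samples, are assumptions of the sketch.

```python
import cmath
import math


def estimate_cfo(samples, period, step=1):
    """Estimate carrier frequency offset in cycles per sample.

    samples: complex baseband samples of a training sequence that repeats
             every `period` samples.
    step:    use every `step`-th lag product (assumed way of reducing the
             number of samples that enter the correlation).
    """
    span = len(samples) - period
    if span <= 0:
        raise ValueError("need more than one period of samples")
    # Lag-`period` auto-correlation: a frequency offset rotates each
    # repeated sample by the same angle, so the products add coherently.
    corr = sum(
        samples[n + period] * samples[n].conjugate()
        for n in range(0, span, step)
    )
    return cmath.phase(corr) / (2 * math.pi * period)
```

The estimate is unambiguous only while the offset times the period stays below half a cycle. Using a larger `step` cuts the number of complex multiplications at the price of less noise averaging.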
https://doi.org/10.5573/JSTS.2010.10.3.185

Search Result 277, Processing Time 0.021 seconds
