• Title/Summary/Keyword: hardware design space exploration

Search Result 10, Processing Time 0.032 seconds

System-level Design Space Exploration and Resource Mapping Strategies for a Reconfigurable Hybrid System

  • Ahn, Seong-Yong;Lee, Jeong-A
    • Proceedings of the IEEK Conference
    • /
    • 2002.07b
    • /
    • pp.924-927
    • /
    • 2002
  • In this paper we proposed the design space exploration environment of re-configurable hybrid systems and evaluate the performance by changing design parameters. With this, we analyzed the effect of various scheduling methods which determine how we allocate hardware/software resources to application program. A simple static (fixed) mapping strategy produces almost the same performance compared with a sophisticated dynamic mapping strategy especially when a CPU is already busy with its pre-assigned own tasks.

  • PDF

A Hardware Design Space Exploration toward Low-Area and High-Performance Architecture for the 128-bit Block Cipher Algorithm SEED (128-비트 블록 암호화 알고리즘 SEED의 저면적 고성능 하드웨어 구조를 위한 하드웨어 설계 공간 탐색)

  • Yi, Kang
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.13 no.4
    • /
    • pp.231-239
    • /
    • 2007
  • This paper presents the trade-off relationship between area and performance in the hardware design space exploration for the Korean national standard 128-bit block cipher algorithm SEED. In this paper, we compare the following four hardware design types of SEED algorithm : (1) Design 1 that is 16 round fully pipelining approach, (2) Design 2 that is a one round looping approach, (3) Design 3 that is a G function sharing and looping approach, and (4) Design 4 that is one round with internal 3 stage pipelining approach. The Design 1, Design 2, and Design 3 are the existing design approaches while the Design 4 is the newly proposed design in this paper. Our new design employs the pipeline between three G-functions and adders consisting of a F function, which results in the less area requirement than Design 2 and achieves the higher performance than Design 2 and Design 3 due to pipelining and module sharing techniques. We design and implement all the comparing four approaches with real hardware targeting FPGA for the purpose of exact performance and area analysis. The experimental results show that Design 4 has the highest performance except Design 1 which pursues very aggressive parallelism at the expanse of area. Our proposed design (Design 4) shows the best throughput/area ratio among all the alternatives by 2.8 times. Therefore, our new design for SEED is the most efficient design comparing with the existing designs.

Hardware/Software Partitioning Methodology for Reconfigurable System (재구성형 시스템을 위한 하드웨어/소프트웨어 분할 기법)

  • Kim, Jun-Yong;Ahn, Seong-Yong;Lee, Jeong-A.
    • The KIPS Transactions:PartA
    • /
    • v.11A no.5
    • /
    • pp.303-312
    • /
    • 2004
  • In this paper, we propose a methodology solving the problem of the hardware-software partitioning in reconfigurable systems using a Y-chart design space exploration and implement a simulator according to the methodology. The methodology generates a mapping set between tasks and hardware elements using the hardware element model and the application model. We evaluate the throughput by simulating cases in each mapping set. With the throughput evaluation result, we can select the mapping case with the highest throughput. We also propose an heuristic improving the simulation time by reducing the mapping set on the basis of the relationship between workload and parallelism. Simulation results show that we can reduce the size of mapping set which poses difficulties on hardware-software partitioning by up to 80%.

Design for Robustness and Cost Effectiveness: The Case of an Optical Profilometer

  • Baldi, Antonio;Pedone, Paola;Romano, Daniele
    • International Journal of Quality Innovation
    • /
    • v.7 no.1
    • /
    • pp.98-111
    • /
    • 2006
  • The paper presents the design of an optical profilometer, a device used for the reconstruction of the micro-geometry of mechanical parts in applications where high precision is needed. The design is based on Robust Design, a major' methodology for quality improvement of engineering systems. Several design solutions, namely different hardware and software setups of the device, are compared in order to select a configuration realising a desired trade-off between performance and cost. The peculiarity of the design strategy is the use of a computer model of the measurement process where the physical part of the process is simulated. This allows for an extensive exploration of the design space, thus opening the way to product innovation.

A Design and Implementation of a Timing Analysis Simulator for a Design Space Exploration on a Hybrid Embedded System (Hybrid 내장형 시스템의 설계공간탐색을 위한 시간분석 시뮬레이터의 설계 및 구현)

  • Ahn, Seong-Yong;Shim, Jea-Hong;Lee, Jeong-A
    • The KIPS Transactions:PartA
    • /
    • v.9A no.4
    • /
    • pp.459-466
    • /
    • 2002
  • Modern embedded system employs a hybrid architecture which contains a general micro processor and reconfigurable devices such as FPGAS to retain flexibility and to meet timing constraints. It is a hard and important problem for embedded system designers to explore and find a right system configuration, which is known as design space exploration (DSE). With DES, it is possible to predict a final system configuration during the design phase before physical implementation. In this paper, we implement a timing analysis simulator for a DSE on a hybrid embedded system. The simulator, integrating exiting timing analysis tools for hardware and software, is designed by extending Y-chart approach, which allows quantitative performance analysis by varying design parameters. This timing analysis simulator is expected to reduce design time and costs and be used as a core module of a DSE for a hybrid embedded system.

Implementation of GA Processor with Multiple Operators, Based on Subpopulation Architecture (분할구조 기반의 다기능 연산 유전자 알고리즘 프로세서의 구현)

  • Cho Min-Sok;Chung Duck-Jin
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.52 no.5
    • /
    • pp.295-304
    • /
    • 2003
  • In this paper, we proposed a hardware-oriented Genetic Algorithm Processor(GAP) based on subpopulation architecture for high-performance convergence and reducing computation time. The proposed architecture was applied to enhancing population diversity for correspondence to premature convergence. In addition, the crossover operator selection and linear ranking subpop selection were newly employed for efficient exploration. As stochastic search space selection through linear ranking and suitable genetic operator selection with respect to the convergence state of each subpopulation was used, the elapsed time of searching optimal solution was shortened. In the experiments, the computation speed was increased by over $10\%$ compared to survival-based GA and Modified-tournament GA. Especially, increased by over $20\%$ in the multi-modal function. The proposed Subpop GA processor was implemented on FPGA device APEX EP20K600EBC652-3 of AGENT 2000 design kit.

CHARMS: A Mapping Heuristic to Explore an Optimal Partitioning in HW/SW Co-Design (CHARMS: 하드웨어-소프트웨어 통합설계의 최적 분할 탐색을 위한 매핑 휴리스틱)

  • Adeluyi, Olufemi;Lee, Jeong-A
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.9
    • /
    • pp.1-8
    • /
    • 2010
  • The key challenge in HW/SW co-design is how to choose the appropriate HW/SW partitioning from the vast array of possible options in the mapping set. In this paper we present a unique and efficient approach for addressing this problem known as Customized Heuristic Algorithm for Reducing Mapping Sets(CHARMS). CHARMS uses sensitivity to individual task computational complexity as well the computed weighted values of system performance influencing metrics to streamline the mapping sets and extract the most optimal cases. Using H.263 encoder, we show that CHARMS sieves out 95.17% of the sub-optimal mapping sets, leaving the designer with 4.83% of the best cases to select from for run-time implementation.

Serialized Multitasking Code Generation from Dataflow Specification (데이타 플로우 명세로부터 직렬화된 멀티태스킹 코드 생성)

  • Kwon, Seong-Nam;Ha, Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.9_10
    • /
    • pp.429-440
    • /
    • 2008
  • As embedded system becomes more complex, software development becomes more important in the entire design process. Most embedded applications consist of multi -tasks, that are executed in parallel. So, dataflow model that expresses concurrency naturally is preferred than sequential programming language to develop multitask software. For the execution of multitasking codes, operating system is essential to schedule multi-tasks and to deal with the communication between tasks. But, it is needed to execute multitasking code without as when the target hardware platform cannot execute as or target platforms are candidates of design space exploration, because it is very costly to port as for all candidate platforms of DSE. For this reason, we propose the serialized multitasking code generation technique from dataflow specification. In the proposed technique, a task is specified with dataflow model, and generated as a C code. Code generation consists of two steps: First, a block in a task is generated as a separate function. Second, generated functions are scheduled by a multitasking scheduler that is also generated automatically. To make it easy to write customized scheduler manually, the data structure and information of each task are defined. With the preliminary experiment of DivX player, it is confirmed that the generated code from the proposed framework is efficiently and correctly executed on the target system.

Evaluation Toolkit for K-FPGA Fabric Architectures (K-FPGA 패브릭 구조의 평가 툴킷)

  • Kim, Kyo-Sun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.49 no.4
    • /
    • pp.15-25
    • /
    • 2012
  • The research on the FPGA CAD tools in academia has been lacking practicality due to the underlying FPGA fabric architecture which is too simple and inefficient to be applied for commercial FPGAs. Recently, the database of placement positions and routing graphs on commercial FPGA architectures has been built, and provided for enabling the academic development of placement and routing tools. To extend the limit of academic CAD tools even further, we have developed the evaluation toolkit for the K-FPGA architecture which is under development. By providing interface for exchanging data with a commercial FPGA toolkit at every step of mapping, packing, placement and routing in the tool chain, the toolkit enables individual tools to be developed without waiting for the results of the preceding step, and with no dependency on the quality of the results, and compared in detail with commercial tools at any step. Also, the fabric primitive library is developed by extracting the prototype from a reporting file of a commercial FPGA, restructuring it, and modeling the behavior of basic gates. This library can be used as the benchmarking target, and a reference design for new FPGA architectures. Since the architecture is described in a standard HDL which is familiar with hardware designers, and read in the tools rather than hard coded, the tools are "data-driven", and tolerable with the architectural changes due to the design space exploration. The experiments confirm that the developed library is correct, and the functional correctness of applications implemented on the FPGA fabric can be validated by simulation. The placement and routing tools are under development. The completion of the toolkit will enable the development of practical FPGA architectures which, in return, will synergically animate the research on optimization CAD tools.

A Lower Bound Estimation on the Number of Micro-Registers in Time-Multiplexed FPGA Synthesis (시분할 FPGA 합성에서 마이크로 레지스터 개수에 대한 하한 추정 기법)

  • 엄성용
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.9
    • /
    • pp.512-522
    • /
    • 2003
  • For a time-multiplexed FPGA, a circuit is partitioned into several subcircuits, so that they temporally share the same physical FPGA device by hardware reconfiguration. In these architectures, all the hardware reconfiguration information called contexts are generated and downloaded into the chip, and then the pre-scheduled context switches occur properly and timely. Typically, the size of the chip required to implement the circuit depends on both the maximum number of the LUT blocks required to implement the function of each subcircuit and the maximum number of micro-registers to store results over context switches in the same time. Therefore, many partitioning or synthesis methods try to minimize these two factors. In this paper, we present a new estimation technique to find the lower bound on the number of micro-registers which can be obtained by any synthesis methods, respectively, without performing any actual synthesis and/or design space exploration. The lower bound estimation is very important in sense that it greatly helps to evaluate the results of the previous work and even the future work. If the estimated lower bound exactly matches the actual number in the actual design result, we can say that the result is guaranteed to be optimal. In contrast, if they do not match, the following two cases are expected: we might estimate a better (more exact) lower bound or we find a new synthesis result better than those of the previous work. Our experimental results show that there are some differences between the numbers of micro-registers and our estimated lower bounds. One reason for these differences seems that our estimation tries to estimate the result with the minimum micro-registers among all the possible candidates, regardless of usage of other resources such as LUTs, while the previous work takes into account both LUTs and micro-registers. In addition, it implies that our method may have some limitation on exact estimation due to the complexity of the problem itself in sense that it is much more complicated than LUT estimation and thus needs more improvement, and/or there may exist some other synthesis results better than those of the previous work.