• Title/Summary/Keyword: memory efficiency

Efficient Hardware Implementation of Real-time Rectification using Adaptively Compressed LUT

  • Kim, Jong-hak;Kim, Jae-gon;Oh, Jung-kyun;Kang, Seong-muk;Cho, Jun-Dong
    • JSTS: Journal of Semiconductor Technology and Science / v.16 no.1 / pp.44-57 / 2016
  • Rectification is used as a preprocessing step to reduce the computational complexity of disparity estimation. However, rectification itself requires complex computation. To minimize this complexity, rectification using a lookup table (R-LUT) has been introduced. Because the R-LUT consumes a large amount of memory, rectification with a compressed LUT (R-CLUT) has also been introduced. However, the more the memory consumption is reduced, the greater the decoding overhead becomes, so an acceptable trade-off between LUT size and decoding overhead is needed. In this paper, we present such a trade-off by adaptively combining simple coding methods: differential coding, modified run-length coding (MRLE), and Huffman coding. Differential coding transforms the coordinate data into a differential form to further improve coding efficiency, Huffman coding is used for better stability, and MRLE for better performance. Our experimental results verified that the coding scheme yields high performance while maintaining robustness. Our method achieved an average inverse compression ratio about 1 % to 16 % lower than that of existing methods. Moreover, we maintained low latency with tolerable hardware overhead for real-time implementation.
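
The combination of differential and run-length coding described in this abstract can be illustrated with a minimal sketch. It uses plain run-length coding rather than the paper's MRLE, omits the Huffman stage, and the sample row of coordinates is made up for illustration; none of the function names come from the paper.

```python
from itertools import groupby

def differential_encode(values):
    """Keep the first value, then store each value as the change from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def differential_decode(deltas):
    """Rebuild the original values by accumulating the stored changes."""
    out = []
    for i, d in enumerate(deltas):
        out.append(d if i == 0 else out[-1] + d)
    return out

def run_length_encode(values):
    """Collapse runs of equal values into (value, count) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]

def run_length_decode(pairs):
    """Expand (value, count) pairs back into the flat sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out

# Hypothetical x-coordinates from one row of a rectification LUT.
row = [100, 101, 102, 103, 104, 104, 105, 106, 107, 108]
deltas = differential_encode(row)      # mostly small, repeated steps
encoded = run_length_encode(deltas)    # long runs shrink to single pairs
decoded = differential_decode(run_length_decode(encoded))
```

Neighbouring LUT coordinates change slowly, so the differences form long runs of identical small values, which is what makes the run-length stage effective. Decoding reverses the two stages in the opposite order.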

The Design and Fabrication of SRAM Modules Surface Mounted on Multilayer Boards (다층 기판 위에 표면실장된 SRAM 모듈 설계 제작)

  • Kim, Chang-Yeon;Jee, Yong
    • Journal of the Korean Institute of Telematics and Electronics A / v.32A no.3 / pp.89-99 / 1995
  • In this paper, we examined how the MCM-L technique affects the design and fabrication of multichip memory modules, with the aim of increasing the packing density of memory capacity and maximizing electrical characteristics. For that purpose, we examined effective methods of reducing the module layout area and the wiring length by varying the chip allocation and the number of wiring layers. We fabricated a 256K$\times$8-bit SRAM module with eight 32K$\times$8-bit SRAM chips. The routing experiment showed that the module layout area and wiring length could be optimized by placing the chips in a row, arranging the module I/O pads parallel to the chip I/O pads, and making the number of terminal sides of the module I/Os equal to that of the chip I/Os. The routing was optimized with three wiring layers for single-sided chip mounting or five wiring layers for double-sided chip mounting. The fabricated modules showed a wiring density of 18.9 cm/cm$^{2}$ and a substrate occupancy efficiency of 65 %, and functional testing confirmed that the modules worked correctly.

Scalable Application Mapping for SIMD Reconfigurable Architecture

  • Kim, Yongjoo;Lee, Jongeun;Lee, Jinyong;Paek, Yunheung
    • JSTS: Journal of Semiconductor Technology and Science / v.15 no.6 / pp.634-646 / 2015
  • Coarse-Grained Reconfigurable Architecture (CGRA) is a very promising platform that provides fast turnaround time as well as very high energy efficiency for multimedia applications. One of the problems with CGRAs, however, is application mapping, which currently does not scale well with geometrically increasing numbers of cores. To mitigate the scalability problem, this paper discusses how to use the SIMD (Single Instruction Multiple Data) paradigm for CGRAs. While the idea of SIMD is not new, SIMD can complicate the mapping problem by adding a further dimension, iteration mapping, to the already complex problem of operation and data mapping. These sub-problems are all interdependent and can thus significantly affect performance through memory bank conflicts. In this paper, based on a new architecture called SIMD reconfigurable architecture, which allows SIMD execution at multiple levels of granularity, we present how to minimize bank conflicts considering all three related sub-problems, for various RA organizations. We also present data tiling and evaluate a conflict-free scheduling algorithm as a way to eliminate bank conflicts for a certain class of mapping problems.
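
To make the notion of a memory bank conflict concrete, here is a small sketch that counts the stall cycles caused by one cycle's worth of SIMD accesses. The word-interleaved bank mapping, the one-access-per-bank-per-cycle rule, and the lane and bank counts are assumptions chosen for illustration, not details of the architecture in the paper.

```python
from collections import Counter

def bank_of(address, num_banks):
    """Word-interleaved mapping: consecutive addresses go to consecutive banks."""
    return address % num_banks

def conflict_stalls(addresses, num_banks):
    """Extra cycles needed when several same-cycle accesses hit one bank.

    Assumes each bank serves one access per cycle, so the accesses finish
    when the busiest bank is done; the stall count is that load minus one.
    """
    if not addresses:
        return 0
    load = Counter(bank_of(a, num_banks) for a in addresses)
    return max(load.values()) - 1

# Four SIMD lanes and four banks (hypothetical sizes).
unit_stride = [0, 1, 2, 3]    # lanes read neighbouring words
stride_four = [0, 4, 8, 12]   # lanes read words four apart: all land in bank 0
skewed = [0, 5, 10, 15]       # same pattern after padding each row by one word
```

Under these assumptions `unit_stride` and `skewed` spread over all four banks, while `stride_four` serializes on a single bank. Changing the data layout so that simultaneous accesses fall in different banks is the general idea behind tiling-style remedies.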

An Efficient Variable Rearrangement Technique for STT-RAM Based Hybrid Caches

  • Youn, Jonghee M.;Cho, Doosan
    • IEMEK Journal of Embedded Systems and Applications / v.11 no.2 / pp.67-78 / 2016
  • The emerging Spin-Transfer Torque RAM (STT-RAM) is a promising component for improving efficiency because of its high storage density and low leakage power. However, state-of-the-art STT-RAM is not ready to replace SRAM technology because of the cost of its write operations, which require longer latency and more power than the same operations in SRAM. Therefore, a hybrid cache combining SRAM and STT-RAM has been proposed to obtain the benefits of STT-RAM while using SRAM to minimize its negative effects. To use the hybrid cache efficiently, it is important to place write-intensive data in the SRAM part of the cache. Thus, we propose a technique that optimizes the placement of data in main memory. It yields a suitable balance between the advantages and disadvantages of SRAM and STT-RAM in the hybrid cache. With the proposed technique, write-intensive data are loaded to SRAM and read-intensive data are loaded to STT-RAM. In addition, our technique optimizes temporal locality to minimize conflict misses. As a result, it improves the performance and energy consumption of the hybrid cache architecture within a certain range.
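
The placement goal in this abstract (write-intensive data to SRAM, read-intensive data to STT-RAM) can be sketched with a simple greedy heuristic. This is not the paper's technique: the profile format, the writes-per-byte ranking, and the capacity figure are all hypothetical, and the real method works through data placement in main memory rather than direct assignment.

```python
def place_variables(profile, sram_capacity):
    """Assign the most write-heavy variables to SRAM until it is full.

    profile maps a variable name to (size_in_bytes, reads, writes).
    Returns (sram_vars, sttram_vars) as lists of names.
    """
    # Rank by writes per byte so small, frequently written data goes first.
    ranked = sorted(profile, key=lambda v: profile[v][2] / profile[v][0], reverse=True)
    sram, sttram, used = [], [], 0
    for name in ranked:
        size, reads, writes = profile[name]
        if writes > reads and used + size <= sram_capacity:
            sram.append(name)      # write-intensive and still fits in SRAM
            used += size
        else:
            sttram.append(name)    # read-intensive, or SRAM is already full
    return sram, sttram

# Hypothetical access profile: name -> (bytes, reads, writes).
profile = {
    "counter": (4, 10, 900),
    "buffer": (64, 200, 800),
    "table": (128, 5000, 20),
    "log": (64, 50, 700),
}
sram, sttram = place_variables(profile, sram_capacity=96)
```

With these numbers `counter` and `buffer` fill the SRAM budget, so `log` falls back to STT-RAM even though it is write-intensive, which shows why the SRAM capacity limits how much of the write cost can be hidden.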

DEVELOPMENT OF AN IMPROVED THREE-DIMENSIONAL STATIC AND DYNAMIC STRUCTURAL ANALYSIS BASED ON FETI-LOCAL METHOD WITH PENALTY TERM

  • KIM, SEIL;JOO, HYUNSHIG;CHO, HAESEONG;SHIN, SANGJOON
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.21 no.3 / pp.125-142 / 2017
  • In this paper, a three-dimensional structural analysis is developed by applying the FETI-local method. In the FETI-local method, a penalty term is added as a preconditioner. The OPT-DKT shell element is used in the present structural analysis, and the Newmark-$\beta$ method is employed for the dynamic analysis. A three-dimensional FETI-local static structural analysis is conducted, and the contours and displacements of the results are compared for different numbers of sub-domains. The computational time and memory usage are compared with respect to the number of CPUs used. A three-dimensional dynamic structural analysis is then conducted with the FETI-local method. The present results show appropriate scalability in terms of computational time and memory usage. The computational efficiency is expected to improve further by combining the advantages of the original FETI method, i.e., FETI-mixed using the mixed local-global Lagrange multiplier.
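
For reference, the Newmark-$\beta$ time integration named in this abstract is usually written as below. The notation is standard textbook notation assumed here, not taken from the paper: $M$, $C$, $K$ are the mass, damping and stiffness matrices, $u$ is the displacement vector, $f$ the load vector, $\Delta t$ the time step, and $\beta$, $\gamma$ the method parameters.

```latex
\begin{aligned}
M\ddot{u}_{n+1} + C\dot{u}_{n+1} + K u_{n+1} &= f_{n+1},\\
u_{n+1} &= u_n + \Delta t\,\dot{u}_n
  + \Delta t^{2}\left[\left(\tfrac{1}{2}-\beta\right)\ddot{u}_n + \beta\,\ddot{u}_{n+1}\right],\\
\dot{u}_{n+1} &= \dot{u}_n + \Delta t\left[(1-\gamma)\,\ddot{u}_n + \gamma\,\ddot{u}_{n+1}\right].
\end{aligned}
```

With $\beta = 1/4$ and $\gamma = 1/2$ (constant average acceleration) the scheme is unconditionally stable for linear problems; the abstract does not state which values were used.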

Auto Regulated Data Provisioning Scheme with Adaptive Buffer Resilience Control on Federated Clouds

  • Kim, Byungsang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.11 / pp.5271-5289 / 2016
  • On large-scale data analysis platforms deployed on cloud infrastructures over the Internet, the instability of the data transfer time and the dynamics of the processing rate call for a more sophisticated data distribution scheme, one that maximizes parallel efficiency by balancing the load among the participating computing elements and eliminating their idle time. In particular, under real-time constraints and a limited data buffer (in-memory storage), a more controllable mechanism is needed to prevent both overflow and underflow of the finite buffer. In this paper, we propose an auto-regulated data provisioning model based on a receiver-driven data pull model. On this model, we provide a synchronized data replenishment mechanism that implicitly avoids data buffer overflow and explicitly regulates data buffer underflow by adequately adjusting the buffer resilience. To estimate the optimal size of the buffer resilience, we use an adaptive buffer resilience control scheme that minimizes both the data buffer space and the idle time of the processing elements based on directly measured sample path analysis. The simulation results show that the proposed scheme provides an acceptable approximation of the numerical results. It is also efficient enough to apply in a dynamic environment where the stochastic characteristics of the data transfer time or the data processing rate cannot be postulated, or even where both fluctuate.

An Analysis Method of Large Structure Using Matrix Blocking (블록화기법을 이용한 대형구조물의 해석방법)

  • Jung, Sung-Jin;Lee, Min-Sup
    • Journal of the Korea Institute for Structural Maintenance and Inspection / v.18 no.2 / pp.30-37 / 2014
  • In this study, we examined how to perform a structural analysis that requires large-capacity flash memory when the flash memory storage of a personal computer does not have enough room for the analysis. As one solution to this problem, we propose the blocking method of the stiffness matrix, in which the stiffness matrix is divided into a few blocks and each block is used sequentially in the matrix decomposition, and we derive an algorithm for the method that can be used in a computer program. Finally, a structural analysis program (sNs) based on this study was developed, and the correctness and efficiency of the algorithm were verified through several examples that are fundamental in structural analysis.
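
The idea of decomposing a matrix one block at a time can be sketched as follows. This is a row-blocked Cholesky factorization written for illustration, not the algorithm derived in the paper; the `load_rows` callback, the block size, and the sample matrix are hypothetical stand-ins for reading one block of the stiffness matrix from slower storage.

```python
import math

def blocked_cholesky(load_rows, n, block_size):
    """Factor a symmetric positive definite matrix A = L L^T one row block at a time.

    load_rows(start, stop) returns rows start..stop-1 of A; only the lower
    triangle of each row is read.
    """
    # Finished rows of L. They stay in memory in this sketch; an out-of-core
    # program would write them to storage and read them back when needed.
    L = []
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        block = load_rows(start, stop)   # only this block of A is held here
        for offset, a_row in enumerate(block):
            i = start + offset
            row = [0.0] * n
            for j in range(i):
                s = sum(row[k] * L[j][k] for k in range(j))
                row[j] = (a_row[j] - s) / L[j][j]
            row[i] = math.sqrt(a_row[i] - sum(row[k] ** 2 for k in range(i)))
            L.append(row)
    return L

# Small symmetric positive definite example matrix (made up for illustration).
A = [
    [4.0, 2.0, 2.0],
    [2.0, 5.0, 3.0],
    [2.0, 3.0, 6.0],
]
L = blocked_cholesky(lambda start, stop: A[start:stop], n=3, block_size=2)
```

Each row of the factor depends only on the rows above it, which is what allows the matrix to be consumed sequentially in blocks instead of being held in memory all at once.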

A SMA-based morphing flap: conceptual and advanced design

  • Ameduri, Salvatore;Concilio, Antonio;Pecora, Rosario
    • Smart Structures and Systems / v.16 no.3 / pp.555-577 / 2015
  • In the work at hand, the development of a morphing flap actuated through shape memory alloy (SMA) load-bearing elements is described. Starting from aerodynamic specifications that prescribe the morphed shape enhancing the aerodynamic efficiency of the flap, a suitable actuation architecture able to affect the curvature was identified. Each rib of the flap was split into three elastic elements, namely "cells", connected to each other in series and providing the bending stiffness of the structure. The edges of each cell are linked to SMA elements, whose contraction induces a rotation of the cell itself and an increase in the local curvature of the flap airfoil. The cells are made of two metallic plates crossing each other to form a characteristic "X" configuration; good flexibility and an acceptable stress concentration level were obtained by not connecting the plates in the crossing zone. After identifying the main design parameters of the structure (i.e. plate relative angle, thickness and depth; SMA length, cross section and connections to the cell), an optimization was performed with the aim of enhancing the achievable rotation of the cell and its ability to absorb the external aerodynamic loads while containing the stress level and the weight. The conceptual scheme of the architecture was then reinterpreted in view of a practical realization of the prototype. Implementation issues (SMA-cell connection and cell relative rotation to compensate for the impressed inflection while assuring the SMA pre-load) were considered. Through a detailed FE model, the morphing performance of the prototype was investigated in the presence of the most severe load conditions.

Use of Super Elements and Substructures for Three Dimensional Analysis of the Box System with Openings (개구부가 있는 벽식구조물의 3차원해석을 위한 슈퍼요소와 부분구조의 이용)

  • 이동근;김현수;남궁계홍
    • Proceedings of the Computational Structural Engineering Institute Conference / 2001.10a / pp.3-10 / 2001
  • The box system, which is composed only of reinforced concrete walls and slabs, has been adopted in many high-rise apartment buildings recently constructed in Korea. The framed structure with a shear wall core, which can effectively resist horizontal forces, is also frequently adopted as the structural system for high-rise buildings. In these structures, a shear wall may have one or more openings for functional reasons. Accurate analysis of a shear wall with openings requires subdivided finite elements, but subdividing the entire building structure into a finer mesh would take a tremendous amount of computational time and memory. This study proposes an efficient analysis method that can be used regardless of the number, size and location of openings. The method uses super elements, substructures, the matrix condensation technique and the fictitious beam technique. Three-dimensional analyses of the box system and the framed structure with a shear wall core having various types of openings were performed to verify the efficiency of the proposed method. The analyses of the example structures confirmed that the proposed method has outstanding accuracy with drastically reduced computational time and memory.
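
The matrix condensation step mentioned in this abstract can be illustrated with a minimal static condensation sketch that removes one interior degree of freedom from a stiffness matrix. The two-spring example and the function name are made up for illustration, and the sketch assumes no external load acts on the eliminated degree of freedom.

```python
def condense_out(K, dof):
    """Eliminate one unloaded degree of freedom from stiffness matrix K.

    Returns the reduced matrix over the remaining DOFs:
    K'[a][b] = K[a][b] - K[a][dof] * K[dof][b] / K[dof][dof].
    """
    keep = [i for i in range(len(K)) if i != dof]
    pivot = K[dof][dof]
    return [[K[a][b] - K[a][dof] * K[dof][b] / pivot for b in keep] for a in keep]

# Two springs in series with stiffness k1 and k2; node 1 is the interior node.
k1, k2 = 2.0, 2.0
K = [
    [k1, -k1, 0.0],
    [-k1, k1 + k2, -k2],
    [0.0, -k2, k2],
]
K_reduced = condense_out(K, dof=1)
```

The reduced matrix is that of a single spring of stiffness k1·k2/(k1+k2) between the two end nodes, so the interior node no longer appears in the system while the boundary behaviour is unchanged. Applying the same step to every interior degree of freedom of a substructure is one way to obtain a super element that exposes only its boundary nodes.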

An Efficient Complex Event Detection Algorithm based on NFA_HTS for Massive RFID Event Stream

  • Wang, Jianhua;Liu, Jun;Lan, Yubin;Cheng, Lianglun
    • Journal of Electrical Engineering and Technology / v.13 no.2 / pp.989-997 / 2018
  • Massive event streams pose great challenges in volume, velocity, variety, value and veracity. Extracting valuable information from them often suffers from long detection time, high memory consumption and low detection efficiency. To solve these problems, this paper proposes an efficient complex event detection method based on NFA_HTS (Nondeterministic Finite Automaton_Hash Table Structure). The contribution of this paper is that we successfully use NFA_HTS to detect complex events in a massive RFID event stream. Specifically, after using the NFA to capture the related RFID primitive events, our scheme uses the HTS to store and process the large set of matched results. By taking advantage of the fast classification and storage of the hash table structure, the scheme avoids many search, storage and computation operations and thereby addresses the problems of current methods. The simulation results show that the proposed NFA_HTS scheme outperforms some general processing methods in reducing detection time, lowering memory consumption and improving event throughput.
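
The pairing of an automaton with a hash table can be sketched in a much simplified form: a sequence pattern is tracked per RFID tag, and a dictionary keyed by tag ID holds each tag's partial match. This is not the NFA_HTS scheme itself. It keeps only one active run per tag, skips non-matching events, and the event tuple layout and event type names are hypothetical.

```python
def detect_sequences(events, pattern):
    """Report tags whose reads visit the pattern's event types in order.

    events: iterable of (timestamp, tag_id, event_type)
    pattern: non-empty list of event types, e.g. ["dock", "shelf", "checkout"]
    A dict keyed by tag_id holds each tag's partial match, so advancing a
    match is one hash lookup instead of a scan over all pending matches.
    """
    partial = {}   # tag_id -> timestamps matched so far
    matches = []
    for ts, tag, etype in events:
        seen = partial.get(tag, [])
        if etype == pattern[len(seen)]:          # event advances this tag's run
            seen = seen + [ts]
            if len(seen) == len(pattern):
                matches.append((tag, seen))      # full pattern matched
                partial.pop(tag, None)
            else:
                partial[tag] = seen
    return matches

# Hypothetical RFID reads: (timestamp, tag, reader location).
events = [
    (1, "t1", "dock"), (2, "t2", "dock"), (3, "t1", "shelf"),
    (4, "t2", "checkout"), (5, "t1", "checkout"), (6, "t2", "shelf"),
]
matches = detect_sequences(events, ["dock", "shelf", "checkout"])
```

Here tag `t1` completes the sequence while `t2` reaches the checkout before the shelf and so never matches. Keying the intermediate results by tag keeps the per-event cost independent of how many other tags have runs in progress.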