• Title/Summary/Keyword: memory bank

High-Speed Pipelined Memory Architecture for Gigabit ATM Packet Switching (Gigabit ATM Packet 교환을 위한 파이프라인 방식의 고속 메모리 구조)

  • Gab Joong Jeong;Mon Key Lee
    • Journal of the Korean Institute of Telematics and Electronics C / v.35C no.11 / pp.39-47 / 1998
  • This paper describes a high-speed pipelined memory architecture for a shared-buffer ATM switch. The architecture provides high speed and scalability, and it removes the memory cycle time restriction of a shared-buffer ATM switch. Its scalability gives a shared-buffer ATM switch versatile performance. The memory consists of a 2-D array of small memory banks, and enlarging the array increases the total memory capacity. The maximum cycle time of the designed pipelined memory is 4 ns at $V_{dd}$ = 5 V and 25$^{\circ}$C. It is embedded in the prototype chip of a scalable shared-buffer ATM switch with a 4 x 4 configuration of 4160-bit SRAM memory banks, integrated in 0.6 $\mu$m 2-metal 1-poly CMOS technology.

Automatic Detection of Memory Subsystem Parameters for Embedded Systems (임베디드 시스템을 위한 메모리 서브시스템 파라미터의 자동 검출)

  • Ha, Tae-Jun;Seo, Sang-Min;Chun, Po-Sung;Lee, Jae-Jin
    • Journal of KIISE:Computing Practices and Letters / v.15 no.5 / pp.350-354 / 2009
  • To optimize the performance of software programs, it is important to know certain hardware parameters such as the CPU speed, the cache size, the number of TLB entries, and the parameters of the memory subsystem. There are several ways to obtain the values of these hardware parameters. First, the values can be taken from the hardware manual. Second, the parameters can be obtained by calling functions provided by the operating system. Finally, hardware detection programs can find the desired values. Such programs are usually executed on PC or server systems and report the CPU speed, the cache size, the number of TLB entries, and so on. However, they do not sufficiently detect the parameters of one of the most performance-critical parts of the computer, namely the memory bank layout in the memory subsystem. In this paper, we present an algorithm to detect the memory bank parameters. We run an implementation of our algorithm on various embedded systems and compare the detected values with the real hardware parameters. The results show that the presented algorithm detects the cache size, the number of TLB entries, and the memory bank layout with high accuracy.

Design and Performance Analysis of High Performance Processor-Memory Integrated Architectures (고성능 프로세서-메모리 혼합 구조의 설계 및 성능 분석)

  • Kim, Young-Sik;Kim, Shin-Dug;Han, Tack-Don
    • The Transactions of the Korea Information Processing Society / v.5 no.10 / pp.2686-2703 / 1998
  • The widening performance gap between processor and memory has led to the emergence of a promising architecture, processor-memory (P-M) integration. In this paper, various design issues for P-M integration are studied. First, an analytical model of the DRAM access time is constructed, considering both the bank conflict ratio and the DRAM page hit ratio. The proposed model is then used to find the points of performance improvement and the performance bottlenecks in designing on-chip DRAM architectures. This paper proposes a new architecture, called the delayed precharge bank architecture, which improves memory system performance by increasing the DRAM page hit ratio. This paper also adapts an efficient bank interleaving mechanism to the proposed architecture. Execution-driven simulation verifies that this architecture is better than both the hierarchical multi-bank architecture and the conventional bank architecture. Eight SPEC95 benchmarks are used for simulation while varying the parameters of the cache architecture, the number of DRAM banks, and the delayed time quantum.
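
The abstract above does not reproduce the analytical model itself. As a rough illustration of how a DRAM page hit ratio and a bank conflict ratio can combine into an average access time, here is a minimal Python sketch. The function name `avg_dram_access_time`, the additive form of the model, and every timing value are assumptions made for illustration and are not taken from the paper.

```python
def avg_dram_access_time(page_hit_ratio, bank_conflict_ratio,
                         t_cas=2, t_rcd=2, t_rp=2, t_conflict=4):
    """Expected DRAM access time in cycles under a simple two-factor model.

    All timing defaults are illustrative placeholders (assumptions),
    not values from the paper.
    """
    if not (0.0 <= page_hit_ratio <= 1.0 and 0.0 <= bank_conflict_ratio <= 1.0):
        raise ValueError("ratios must lie in [0, 1]")
    # Page hit: the row is already open, so only the column access is paid.
    hit_time = t_cas
    # Page miss: precharge the old row, activate the new one, then access.
    miss_time = t_rp + t_rcd + t_cas
    base = page_hit_ratio * hit_time + (1.0 - page_hit_ratio) * miss_time
    # Assumed penalty: a conflicting access also waits for the busy bank.
    return base + bank_conflict_ratio * t_conflict


if __name__ == "__main__":
    for hit in (0.2, 0.5, 0.8):
        print(hit, avg_dram_access_time(hit, bank_conflict_ratio=0.1))
```

Under this toy model, raising the page hit ratio lowers the average access time, which is the effect the delayed precharge idea in the abstract aims for.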

A Block Allocation Policy to Enhance Wear-leveling in a Flash File System (플래시 파일시스템에서 wear-leveling 개선을 위한 블록 할당 정책)

  • Jang, Si-Woong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2007.10a / pp.574-577 / 2007
  • A disk can overwrite data in place, but flash memory cannot, so updated data are written to a new area. If data are updated frequently, garbage collection, which erases blocks, must be performed to reclaim space. Because the number of erase operations is limited by the characteristics of flash memory, every block should be written and erased evenly. However, when data with access locality are processed by a cost-benefit algorithm that separates hot and cold blocks, processing performance is high but wear-leveling is uneven. In this paper, we propose the CB-MB (Cost Benefit between Multi Bank) algorithm, in which hot data are allocated to one bank and cold data to another, and the roles of the hot bank and the cold bank are exchanged every period. CB-MB performed similarly to the other methods under a uniform workload and much better than the others under a workload with access locality.

A method for improving wear-leveling of flash file systems in workload of access locality (접근 지역성을 가지는 작업부하에서 플래시 파일시스템의 wear-leveling 향상 기법)

  • Jang, Si-Woong
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.1 / pp.108-114 / 2008
  • Since flash memory cannot be overwritten in place, updated data are written to a new area. If data are updated frequently, garbage collection, which erases blocks, must be performed to reclaim space. Because the number of erase operations is limited by the characteristics of flash memory, every block should be written and erased evenly. However, when data with access locality are processed by a cost-benefit algorithm that separates hot and cold blocks, processing performance is high but wear-leveling is uneven. In this paper, we propose the CB-MB (Cost Benefit between Multi Bank) algorithm, in which hot data are allocated to one bank and cold data to another, and the roles of the hot bank and the cold bank are exchanged every period. CB-MB performs 30% better than the cost-benefit algorithm with hot and cold block separation, and the standard deviation of its wear-leveling is about a third of that algorithm's.
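
The bank role exchange described in the abstract above can be sketched in a few lines of Python. This is a toy model only: the class name `TwoBankAllocator`, the choice to measure a "period" in number of writes, and the per-bank write counters are assumptions for illustration, and the cost-benefit victim selection of the actual algorithm is not modeled.

```python
class TwoBankAllocator:
    """Toy model: hot writes go to one bank, cold writes to the other,
    and the two roles are swapped every `period` writes (an assumption)."""

    def __init__(self, period):
        if period <= 0:
            raise ValueError("period must be positive")
        self.period = period
        self.hot_bank = 0            # bank currently receiving hot data
        self.writes = 0              # total writes seen so far
        self.write_counts = [0, 0]   # writes absorbed by each bank

    def allocate(self, is_hot):
        """Return the bank index (0 or 1) that receives this write."""
        bank = self.hot_bank if is_hot else 1 - self.hot_bank
        self.write_counts[bank] += 1
        self.writes += 1
        # Exchange the hot and cold roles at the end of each period.
        if self.writes % self.period == 0:
            self.hot_bank = 1 - self.hot_bank
        return bank


if __name__ == "__main__":
    alloc = TwoBankAllocator(period=4)
    workload = [True, True, True, False] * 2   # 75% hot writes
    for is_hot in workload:
        alloc.allocate(is_hot)
    print(alloc.write_counts)
```

Without the role exchange, the hot bank in this workload would absorb three times as many writes as the cold bank; with it, the write counts even out over two periods.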

Run-time Memory Optimization Algorithm for the DDMB Architecture (DDMB 구조에서의 런타임 메모리 최적화 알고리즘)

  • Cho, Jeong-Hun;Paek, Yun-Heung;Kwon, Soo-Hyun
    • The KIPS Transactions:PartA / v.13A no.5 s.102 / pp.413-420 / 2006
  • Most vendors of digital signal processors (DSPs) support a Harvard architecture, which has two or more memory buses, one for program and one or more for data, and allows the processor to access multiple words of data from memory in a single instruction cycle. In our previous work, we addressed how to efficiently assign data to multiple memory banks. This paper reports on our recent attempt to optimize run-time memory. The run-time environment for dual data memory banks (DDMBs) requires two run-time stacks to control the activation records located in the two memory banks corresponding to calling procedures. However, the activation records of a procedure in the two memory banks can differ in size. As a consequence, the dual run-time stacks can become unbalanced whenever a procedure is called. This imbalance can cause the usage of one memory bank to exceed the on-chip memory area even though the other memory bank still has free space. In this paper, we attempt to balance the dual run-time stacks so that on-chip memory is used more efficiently. The experimental results show that although our algorithm is relatively simple, it still uses run-time memory efficiently, enabling our compiler to run extremely fast while minimizing the usage of run-time memory in the target code.

A novel hardware design for SIFT generation with reduced memory requirement

  • Kim, Eung Sup;Lee, Hyuk-Jae
    • JSTS:Journal of Semiconductor Technology and Science / v.13 no.2 / pp.157-169 / 2013
  • Scale Invariant Feature Transform (SIFT) generates image features widely used to match objects in different images. Previous work on hardware-based SIFT implementation requires excessive internal memory and hardware logic [1]. In this paper, a new hardware organization is proposed to implement SIFT with less memory and hardware cost than the previous work. To this end, a parallel Gaussian filter bank is adopted to eliminate the buffers that store intermediate results, because parallel operation makes all intermediate results available at the same time. Furthermore, the processing order is changed from raster-scan order to block-by-block order so that the line buffer storing the source image is also reduced in size. These techniques trade a reduction in memory size for a slight increase in execution time and external memory bandwidth. As a result, the memory size is reduced by 94.4%. The proposed hardware for SIFT implementation includes the descriptor generation block, which is omitted in the previous work [1]. The addition of hardwired descriptor generation improves the computation speed by about 30 times compared with the previous work.

Scalable Application Mapping for SIMD Reconfigurable Architecture

  • Kim, Yongjoo;Lee, Jongeun;Lee, Jinyong;Paek, Yunheung
    • JSTS:Journal of Semiconductor Technology and Science / v.15 no.6 / pp.634-646 / 2015
  • Coarse-Grained Reconfigurable Architecture (CGRA) is a very promising platform that provides fast turn-around-time as well as very high energy efficiency for multimedia applications. One of the problems with CGRAs, however, is application mapping, which currently does not scale well with geometrically increasing numbers of cores. To mitigate the scalability problem, this paper discusses how to use the SIMD (Single Instruction Multiple Data) paradigm for CGRAs. While the idea of SIMD is not new, SIMD can complicate the mapping problem by adding an additional dimension of iteration mapping to the already complex problem of operation and data mapping, which are all interdependent, and can thus significantly affect performance through memory bank conflicts. In this paper, based on a new architecture called SIMD reconfigurable architecture, which allows SIMD execution at multiple levels of granularity, we present how to minimize bank conflicts considering all three related sub-problems, for various RA organizations. We also present data tiling and evaluate a conflict-free scheduling algorithm as a way to eliminate bank conflicts for a certain class of mapping problem.
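
To make the notion of a memory bank conflict in the abstract above concrete, here is a minimal Python sketch that counts the stall cycles caused when several SIMD lanes access memory in the same cycle. The function name `bank_conflict_stalls`, the word-interleaved mapping (bank = address mod number of banks), and the one-access-per-bank-per-cycle rule are assumptions for illustration, not details of the paper's architecture.

```python
from collections import Counter


def bank_conflict_stalls(addresses, num_banks):
    """Extra cycles needed when `addresses` are issued in the same cycle.

    Assumes word-interleaved banks (bank = address % num_banks) and that
    each bank serves one access per cycle; both are illustrative assumptions.
    """
    if num_banks <= 0:
        raise ValueError("num_banks must be positive")
    per_bank = Counter(addr % num_banks for addr in addresses)
    # The most heavily loaded bank serializes its accesses.
    return max(per_bank.values(), default=1) - 1


if __name__ == "__main__":
    print(bank_conflict_stalls([0, 1, 2, 3], num_banks=4))   # unit stride
    print(bank_conflict_stalls([0, 4, 8, 12], num_banks=4))  # stride of 4
```

Under these assumptions, a unit-stride access pattern spreads across all banks, while a stride equal to the number of banks sends every lane to the same bank. Data tiling, as mentioned in the abstract, is one way to change which addresses the lanes touch together.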

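A Study on the Relationship Between the Water Quality of Riverbank Filtrate in Daesan-myeon, Changwon and the Water Quality of the Nakdong River (창원시 대산면 강변여과수의 수질과 낙동강 수질의 관련성 연구)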

  • 장성;함세영;김형수;차용훈;정재열
    • Proceedings of the Korean Society of Soil and Groundwater Environment Conference / 2004.04a / pp.451-454 / 2004
  • The study aims to assess the quality of bank filtrate in relation to streamflow and the physico-chemical properties of the stream. Turbidity, pH, temperature, and dissolved oxygen (DO) of the Nakdong River and of riverbank filtrate were statistically analyzed. The physico-chemical properties of riverbank filtrate were measured daily from seven irregularly varying pumping wells. Autocorrelation analyses were conducted on the quality of stream water and of bank-filtered water. Temperature, pH, and DO of streamflow show strong linearity and a long memory effect, indicating the effect of seasonal air temperature and the rainy season. Temperature of riverbank filtrate shows weak linearity and weak memory, indicating a trend different from that of the stream temperature. Turbidity of streamflow shows strong linearity and a long memory effect, while turbidity of riverbank filtrate shows weak linearity and weak memory. Cross-correlation analysis shows a weak relationship between the turbidity, pH, temperature, and DO of riverbank filtrate and those of streamflow. Turbidity of streamflow was largely affected by the streamflow rate, showing a trend similar to the autocorrelation function of the streamflow rate. The turbidity of riverbank filtrate has a lag time of 25 hours. This indicates that turbidity of streamflow in a dry season has a very small effect on the turbidity of riverbank filtrate, and that high turbidity of the stream in a rainy season has a fairly small effect on the turbidity of riverbank filtrate.
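
The autocorrelation and cross-correlation analyses named in the abstract above can be illustrated with a short, dependency-free Python sketch. The function names, the choice of estimators, and the synthetic test series are assumptions for illustration; the abstract does not state which estimators the authors used.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    if var == 0:
        raise ValueError("series is constant")
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var


def cross_correlation(x, y, lag):
    """Sample correlation between x[t] and y[t + lag]."""
    n = len(x)
    if len(y) != n or not 0 <= lag < n:
        raise ValueError("series must have equal length and 0 <= lag < n")
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    if sx == 0 or sy == 0:
        raise ValueError("series is constant")
    cov = sum((x[t] - mx) * (y[t + lag] - my) for t in range(n - lag))
    return cov / (sx * sy)


def best_lag(x, y, max_lag):
    """Lag in [0, max_lag] at which y follows x most strongly."""
    return max(range(max_lag + 1), key=lambda k: cross_correlation(x, y, k))


if __name__ == "__main__":
    stream = [0, 0, 1, 0, 0, 0, 0, 0]    # synthetic pulse in the stream
    filtrate = [0, 0, 0, 0, 1, 0, 0, 0]  # same pulse, two samples later
    print(best_lag(stream, filtrate, max_lag=4))
```

A slowly decaying autocorrelation corresponds to the "long memory" wording in the abstract, and the lag that maximizes the cross-correlation is one common way to estimate a delay such as the reported lag time.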

A Study on Efficient Use of Dual Data Memory Banks in Flight Control Computers

  • Cho, Doosan
    • International Journal of Internet, Broadcasting and Communication / v.9 no.1 / pp.29-34 / 2017
  • Over the past several decades, embedded system and flight control computer technologies have evolved to meet the diverse needs of the mobile device market. Current embedded systems are at the heart of technologies that take advantage of small, specialized hardware while still providing high-efficiency performance at low cost. One of these key technologies is multiple memory banks. For example, a dual memory bank can provide twice the memory bandwidth in the same memory space, which lowers the cost of providing the same bandwidth. However, there are still few software technologies that support the efficient use of multiple memory banks. In this study, we present a technique that efficiently exploits multiple memory banks through software support. Specifically, our technique uses an interference graph in an optimizing compiler to optimally allocate data to different memory banks. As a result, execution time can be improved by up to 7% with the proposed technique.
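
As an illustration of allocating data to two banks from an interference graph, here is a minimal Python sketch. It uses a simple greedy heuristic, not the optimal allocation the abstract refers to. The function name `assign_banks`, the edge-weight representation, and the example variables are all assumptions for illustration.

```python
def assign_banks(variables, interference):
    """Greedy two-bank assignment from a weighted interference graph.

    `interference` maps a pair (a, b) to a weight, assumed here to count
    how often a and b could be fetched in the same cycle. Variables joined
    by heavy edges should land in different banks. This is a heuristic
    sketch, not the paper's allocation method.
    """
    weight = {}
    for (a, b), w in interference.items():
        weight.setdefault(a, {})[b] = w
        weight.setdefault(b, {})[a] = w

    # Place the variables with the most total interference first.
    order = sorted(variables, key=lambda v: -sum(weight.get(v, {}).values()))

    bank = {}
    for v in order:
        # Cost of a bank = weight of edges to neighbours already placed there.
        cost = [0, 0]
        for u, w in weight.get(v, {}).items():
            if u in bank:
                cost[bank[u]] += w
        bank[v] = 0 if cost[0] <= cost[1] else 1
    return bank


if __name__ == "__main__":
    edges = {("a", "b"): 10, ("b", "c"): 1}
    print(assign_banks(["a", "b", "c"], edges))
```

In this example, `a` and `b` interfere most heavily and end up in different banks, so a dual-bank processor could load both in one cycle.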