• Title/Summary/Keyword: memory access time


Gated Recurrent Unit based Prefetching for Graph Processing (그래프 프로세싱을 위한 GRU 기반 프리페칭)

  • Shivani Jadhav;Farman Ullah;Jeong Eun Nah;Su-Kyung Yoon
    • Journal of the Semiconductor & Display Technology / v.22 no.2 / pp.6-10 / 2023
  • Data that is likely to be accessed soon can be predicted and loaded into the cache in advance to prevent cache misses, reducing the processor's request and wait times; the processor can then keep working while memory latency is hidden. A prefetcher exploits the temporal and spatial locality of memory accesses to predict which memory address will be accessed next. We propose a prefetcher based on the Gated Recurrent Unit (GRU) model, which is well suited to time-series data. The currently accessed address is represented in binary and used as training data, and the GRU model is trained on the difference (delta) between consecutive memory accesses. The trained GRU then predicts the memory address to be accessed next. Compared with a multi-layer perceptron, the proposed prefetcher shows better results. (A minimal illustrative sketch of the delta-based prediction follows this entry.)

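The delta-based GRU predictor described in this abstract can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' implementation: the 16-bit two's-complement delta encoding, the window of eight past deltas, and the network sizes are all assumptions not taken from the paper.

```python
import torch
import torch.nn as nn

BITS, WINDOW, HIDDEN = 16, 8, 64          # assumed encoding width, history length, hidden size

def delta_to_bits(delta: int) -> torch.Tensor:
    """Encode a signed address delta as a BITS-wide two's-complement bit vector."""
    u = delta & ((1 << BITS) - 1)
    return torch.tensor([(u >> i) & 1 for i in range(BITS)], dtype=torch.float32)

def bits_to_delta(bits: torch.Tensor) -> int:
    """Decode a thresholded bit vector back to a signed delta."""
    u = sum(int(b > 0.5) << i for i, b in enumerate(bits))
    return u - (1 << BITS) if u >= (1 << (BITS - 1)) else u

class DeltaGRU(nn.Module):
    """GRU that maps a window of past deltas to the bits of the next delta."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(input_size=BITS, hidden_size=HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, BITS)      # one logit per output bit

    def forward(self, x):                        # x: (batch, window, BITS)
        out, _ = self.gru(x)
        return self.head(out[:, -1])             # logits for the next delta's bits

def predict_next_address(model: DeltaGRU, addresses: list) -> int:
    """Predict the next address from a trace of past addresses."""
    deltas = [b - a for a, b in zip(addresses, addresses[1:])][-WINDOW:]
    x = torch.stack([delta_to_bits(d) for d in deltas]).unsqueeze(0)
    with torch.no_grad():
        bits = torch.sigmoid(model(x))[0]
    return addresses[-1] + bits_to_delta(bits)

# Training would minimise nn.BCEWithLogitsLoss between the predicted and the
# actual next-delta bit vectors over a recorded memory-access trace.
model = DeltaGRU()
trace = [4096, 4160, 4224, 4288, 4352]           # toy trace with a stride of 64
print(hex(predict_next_address(model, trace)))   # untrained model, so the guess is arbitrary
```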

A Low Power Phase-Change Random Access Memory Using A Selective Data Write Scheme (선택적 데이터 쓰기 기법을 이용한 저전력 상변환 메모리)

  • Yang, Byung-Do
    • Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.1 / pp.45-50 / 2007
  • This paper proposes a low-power selective data write (SDW) scheme for a phase-change random access memory (PRAM). A PRAM consumes large write power because large write currents must flow for a long time. The SDW scheme first reads the stored data during a write operation and then writes the input data only when the input and stored data differ, which reduces write power consumption to about half. A 1 Kbit (128 × 8 bit) PRAM test chip was implemented in a 0.8 μm CMOS technology with a 0.8 μm GST cell. (A small behavioural sketch of the read-compare-write idea follows this entry.)
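The selective data write idea can be illustrated with a short behavioural model. This is only a sketch under assumptions: the abstract does not say at what granularity the comparison is made, so bit-level selective writing and an 8-bit word are assumed here.

```python
WORD_BITS = 8                         # assumed word width; the test chip is 128 x 8 bits

class SelectiveWritePRAM:
    """Behavioural model: rewrite only the bits that differ from the stored word."""
    def __init__(self, words: int):
        self.cells = [0] * words      # stored words
        self.bit_writes = 0           # programmed bits, a rough proxy for write energy

    def write(self, addr: int, data: int) -> None:
        stored = self.cells[addr]                        # read-before-write
        diff = (stored ^ data) & ((1 << WORD_BITS) - 1)  # bits that actually change
        self.bit_writes += bin(diff).count("1")          # only those bits are programmed
        self.cells[addr] = data

mem = SelectiveWritePRAM(words=128)
mem.write(0, 0b1010_1010)             # 4 bits change from the erased state
mem.write(0, 0b1010_1111)             # only 2 bits differ from the stored word
print(mem.bit_writes)                 # 6 programmed bits instead of 16 blind writes
```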

Improving the Reliability by Straight Channel of As2Se3-based Resistive Random Access Memory (As2Se3 기반 Resistive Random Access Memory의 채널 직선화를 통한 신뢰성 향상)

  • Nam, Ki-Hyun;Kim, Chung-Hyeok
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers / v.29 no.6 / pp.327-331 / 2016
  • Resistive random access memory (ReRAM) based on a metallic conduction channel mechanism relies on the electrochemical control of metal in a solid-electrolyte thin film. Amorphous chalcogenide materials exhibit both solid-electrolyte behaviour and optical reactivity, and this optical reactivity has been used to improve the memory switching characteristics of amorphous As2Se3-based ReRAM. This study focuses on forming holographic lattice patterns in the amorphous As2Se3 thin film to create straight conductive channels. The optical parameters of the amorphous As2Se3 thin film, its refractive index and extinction coefficient, were measured with an n&k thin-film analyzer, and a He-Cd laser (wavelength 325 nm) was selected based on these parameters. Straight conduction channels were formed by holographic lithography using the He-Cd laser; the Ag+ ions that photo-diffuse periodically under this process act as the straight channel patterns. The fabricated ReRAM operated at a lower voltage and showed better reliability.

The Early Write Back Scheme For Write-Back Cache (라이트 백 캐쉬를 위한 빠른 라이트 백 기법)

  • Chung, Young-Jin;Lee, Kil-Whan;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.11 / pp.101-109 / 2009
  • Generally, the depth cache and pixel cache of a 3D graphics processor use a write-back scheme for efficient use of memory bandwidth, and write-after-read operations to the same address or write-only operations occur frequently in these caches. When a cache miss is detected, an external memory access for the write-back of the evicted block and another access for handling the miss are issued together. Under frequent cache misses, because memory bandwidth is limited, the external memory access time grows due to the memory bottleneck, which lowers the overall performance of the processor or IP and raises peak power consumption. To solve these problems, this paper proposes an early write back cache architecture that controls the point at which the valid data block is copied back to external memory. The proposed architecture improves cache performance with the same hit ratio and the same cache capacity, and it relieves the memory bottleneck by preventing bursts of intensive memory accesses. We evaluated the architecture on the 3D graphics z cache and pixel cache in an SoC environment containing an ARM11 core, a 3D graphics accelerator, and various IPs. Simulation results showed up to a 75% performance increase across the simulation vectors used. (A behavioural sketch of the early write back idea follows this entry.)
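The early write back concept can be sketched behaviourally as follows. The trigger used here, flushing dirty lines while the memory bus is idle, is an assumption for illustration; the abstract does not describe the exact control policy. The total number of write-backs is unchanged, but they are moved off the miss-handling path and away from bandwidth peaks, which is where the abstract locates the bottleneck.

```python
class EarlyWriteBackCache:
    """Direct-mapped write-back cache with an early write back hook."""
    def __init__(self, num_lines: int, line_size: int):
        self.num_lines, self.line_size = num_lines, line_size
        self.tags = [None] * num_lines
        self.dirty = [False] * num_lines
        self.mem_writes = 0           # write-backs issued to external memory
        self.mem_reads = 0            # line fills from external memory

    def _index_tag(self, addr: int):
        line = addr // self.line_size
        return line % self.num_lines, line // self.num_lines

    def access(self, addr: int, is_write: bool) -> None:
        idx, tag = self._index_tag(addr)
        if self.tags[idx] != tag:     # miss
            if self.dirty[idx]:       # write-back on the critical miss path
                self.mem_writes += 1
                self.dirty[idx] = False
            self.mem_reads += 1       # line fill
            self.tags[idx] = tag
        if is_write:
            self.dirty[idx] = True

    def early_write_back(self) -> None:
        """Call while the memory bus is idle: flush dirty lines ahead of time so a
        later miss to the same line needs only the fill access, not fill + write-back."""
        for idx, is_dirty in enumerate(self.dirty):
            if is_dirty:
                self.mem_writes += 1
                self.dirty[idx] = False
```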

Proposal of 3D Graphic Processor Using Multi-Access Memory System (Multi-Access Memory System을 이용한 3D 그래픽 프로세서 제안)

  • Lee, S-Ra-El;Kim, Jae-Hee;Ko, Kyung-Sik;Park, Jong-Won
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.4 / pp.119-128 / 2019
  • Because of the nature of 3D graphics processing, a large number of mathematical calculations are required, and parallel processing research using the GPU (Graphics Processing Unit) has been carried out for high-speed processing. In this paper, we propose a 3D graphics processor based on MAMS (Multi-Access Memory System), a parallel processing structure that does not use cache memory, to address the GPU problems of bandwidth growth caused by cache misses and of non-constant 3D shader processing time. The proposed MAMS-based 3D graphics processor implements the vertex shader, pixel shader, tiling, and rasterizing structures based on DirectX command analysis; an FPGA board (Xilinx Virtex-6 @ 100 MHz) for MAMS was built, and the design was written in Verilog. Comparing the processing time of the developed FPGA (100 MHz) with that of an NVIDIA GeForce GTX 660 (980 MHz), the processing time on the GTX 660 was not constant, whereas the processing time using MAMS was constant.

T-Tree Index Structures Utilizing Prefetch Methods (프리패치 기법을 적용한 T.트리 인덱스 구조)

  • Lee, Ig-Hoon;Shim, Jun-Ho
    • The Journal of Society for e-Business Studies / v.14 no.4 / pp.119-131 / 2009
  • Over the last decade, e-commerce environments supporting real-time transaction processing have grown considerably. In telecommunication and financial environments, main-memory database systems have been researched and built to support real-time transaction processing. Research on indexing for fast transaction support focuses on reducing cache misses or on hiding memory access latency when cache misses do occur. In this paper, we propose a prefetch method for tree index structures that reduces memory access latency. We present a prefetch-efficient pCST-tree and show the superiority of the proposed tree through experiments.


Extended Pairing Heap Algorithms Considering Cache Effect (캐쉬 효과를 고려한 확장된 Pairing Heap 알고리즘)

  • 정균락;김경훈
    • Journal of KIISE: Computer Systems and Theory / v.30 no.5_6 / pp.250-257 / 2003
  • As memory access time becomes slower relative to processor speed, most systems use cache memory to bridge the gap, and cache behaviour has an increasingly large impact on the performance of algorithms. Blocking is a well-known technique for exploiting the cache and has shown good results for matrix multiplication and for search trees such as the d-heap. However, if blocking is applied to data structures that require rotations during insertion or deletion, execution time increases because data must be moved between blocks. In this paper, we propose extended pairing heap algorithms that use block nodes and show experimentally that our structure is superior. The block-node representation also uses less memory, since the number of pointers decreases.

Efficient Image Data Processing using a Real Time Concurrent Single Memory Input/Output Access (실시간 단일 메모리 동시 입출력을 이용한 효율적인 영상 데이터 처리)

  • Lee, Gunjoong;Han, Geumhee;Ryoo, Kwangki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2012.10a / pp.103-106 / 2012
  • A memory access pattern in which data are read in a different order than they were written is a simple but important procedure in many image compression standards, such as JPEG, MPEG-1/2/4, H.264, and HEVC. For real-time processing, double buffering is widely used: two block-sized buffers are accessed concurrently, alternating between reading and writing. In some cases, such as the transpose memory of a 2D DCT with a simple and regular access order, single buffering, which requires only one block-sized buffer, can be used instead. This paper shows that even for complex access orders there is a regularity: the updating orders repeat within a finite number of turns. Based on this, it suggests an effective implementation that uses a single block-sized buffer to process concurrent read/write operations with different access orders. (A small sketch of the single-buffer idea follows this entry.)

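The single-buffer idea can be illustrated with the 2D-DCT transpose memory that the abstract cites as the simple case: on every block period the outgoing element is read at an address given by the current pass's permutation, and the incoming element is written into the slot just freed, so one block-sized buffer serves both streams. The addressing pattern repeats after a finite number of passes (two, for a transpose). The 4x4 block size is an assumption for illustration; the paper's general scheme for more complex orders is not reproduced here.

```python
N = 4                                         # assumed block dimension (N x N transpose buffer)

def transpose_index(i: int) -> int:
    """Column-major address of linear (row-major) index i in an N x N block."""
    return (i % N) * N + i // N

def apply_times(perm, times: int, i: int) -> int:
    """Apply permutation `perm` to index i the given number of times."""
    for _ in range(times):
        i = perm(i)
    return i

def stream_blocks(blocks):
    """Stream N*N-element blocks through ONE buffer; yield each block transposed.
    On pass p, position i is read and then immediately rewritten at the address
    given by the p-th power of the permutation, so read-out of block p-1 and
    write-in of block p share the same buffer."""
    buf = [None] * (N * N)
    for p, block in enumerate(blocks):
        out = []
        for i in range(N * N):
            addr = apply_times(transpose_index, p, i)
            out.append(buf[addr])             # read the outgoing element ...
            buf[addr] = block[i]              # ... then reuse the freed slot
        if p > 0:
            yield out                         # the previous block, now transposed

blocks = [list(range(k * 16, k * 16 + 16)) for k in range(3)]
for transposed in stream_blocks(blocks):
    print(transposed)
```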

270 MHz Full HD H.264/AVC High Profile Encoder with Shared Multibank Memory-Based Fast Motion Estimation

  • Lee, Suk-Ho;Park, Seong-Mo;Park, Jong-Won
    • ETRI Journal / v.31 no.6 / pp.784-794 / 2009
  • We present a full HD (1080p) H.264/AVC High Profile hardware encoder based on fast motion estimation (ME). Most processing cycles are occupied by ME, which fetches samples through external memory accesses and thus degrades the performance of the encoder. A novel approach to fast ME using shared multibank memory can solve these problems. The proposed pixel subsampling ME algorithm is suitable for fast motion vector searches on high-resolution images. The proposed algorithm achieves an 87.5% reduction in computational complexity compared with the full search algorithm in the JM reference software while sustaining video quality without any conspicuous PSNR loss. The usage of shared multibank memory between the coarse ME and fine ME blocks is 93.6%, which saves external memory access cycles and speeds up ME. The algorithm can run at a 270 MHz clock for 30 frame/s real-time full HD encoding. The total gate count is 872k, and the internal SRAM size is 41.8 kB. (A simplified sketch of subsampled SAD follows this entry.)
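Pixel-subsampled motion estimation can be sketched as follows. The paper's exact subsampling pattern, search strategy, and coarse/fine split are not given in the abstract, so this is only an illustration of the general idea, not the authors' algorithm: computing the SAD on a 4:1 subsampled grid cuts the per-candidate cost by 75%.

```python
import numpy as np

def subsampled_sad(cur_blk: np.ndarray, ref_blk: np.ndarray) -> int:
    """SAD over every other pixel in both dimensions (4:1 subsampling)."""
    return int(np.abs(cur_blk[::2, ::2].astype(int) -
                      ref_blk[::2, ::2].astype(int)).sum())

def full_search(cur: np.ndarray, ref: np.ndarray, bx: int, by: int,
                bsize: int = 16, srange: int = 16):
    """Exhaustive +/- srange search for one block, scored with the subsampled SAD."""
    h, w = ref.shape
    cur_blk = cur[by:by + bsize, bx:bx + bsize]
    best = (0, 0, float("inf"))
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= h - bsize and 0 <= x <= w - bsize:
                sad = subsampled_sad(cur_blk, ref[y:y + bsize, x:x + bsize])
                if sad < best[2]:
                    best = (dx, dy, sad)
    return best                               # (mv_x, mv_y, sad)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
cur = np.roll(ref, shift=(2, 3), axis=(0, 1))  # current frame shifted down 2, right 3
print(full_search(cur, ref, bx=24, by=24))     # recovers (mv_x, mv_y) = (-3, -2), SAD 0
```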

A Performance Evaluation System of the SOTDMA Algorithm for AIS (AIS용 SOTDMA 알고리즘 성능평가 시스템)

  • 김승범;임용곤;이상정
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2001.10a / pp.765-768 / 2001
  • A universal shipborne automatic identification system (AIS) uses self-organised time division multiple access (SOTDMA). A performance evaluation system for the SOTDMA algorithm is needed to implement AIS efficiently, and this paper shows how to design one. Whereas real ships access the VHF maritime mobile band, in this evaluation system several ship objects access a shared memory: each real ship is designed as an object, and the wireless communication channel is designed as the shared memory. The system shows how stations are assigned transmission slots by the SOTDMA algorithm. (A small sketch of this shared-memory slot model follows this entry.)

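The shared-memory channel model described in this abstract can be sketched as follows: the frame of time slots is a shared structure, each ship is an object, and a station reserves a free slot inside a selection interval, as SOTDMA prescribes. The selection-interval width and the random tie-breaking are assumptions for illustration, not the paper's parameters; the 2,250-slot frame length is the standard AIS frame.

```python
import random

FRAME_SLOTS = 2250                    # one-minute AIS frame of 2,250 time slots

class SharedChannel:
    """Shared memory standing in for the VHF channel: one entry per slot."""
    def __init__(self):
        self.slots = [None] * FRAME_SLOTS

class Ship:
    """A ship object that reserves transmission slots on the shared channel."""
    def __init__(self, mmsi: int):
        self.mmsi = mmsi

    def reserve_slot(self, channel: SharedChannel, nominal_slot: int,
                     si_width: int = 150) -> int:
        """Pick a free slot inside the selection interval around the nominal slot."""
        lo, hi = nominal_slot - si_width // 2, nominal_slot + si_width // 2
        free = [s % FRAME_SLOTS for s in range(lo, hi + 1)
                if channel.slots[s % FRAME_SLOTS] is None]
        slot = random.choice(free)
        channel.slots[slot] = self.mmsi   # the reservation is visible to every station
        return slot

channel = SharedChannel()
ships = [Ship(mmsi=440000000 + k) for k in range(5)]
for k, ship in enumerate(ships):
    print(ship.mmsi, "->", ship.reserve_slot(channel, nominal_slot=100 + 10 * k))
```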