• Title/Summary/Keyword: Memory access

Search Result 1,131, Processing Time 0.032 seconds

Cost-effective multistage interconnection network for UNMA model system (NUMA(non-uniform memory access) 모델 시스템을 위한 cost-effective한 다단계 상호연결망)

  • 최창훈;김성천
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.34C no.5
    • /
    • pp.19-32
    • /
    • 1997
  • So far, the multiple path MINs to provide redundant paths in the traditional UPP MINs have been realized by adding additional hardware such as extra stages, duplicated data links, or multiple copies of sthe MIN. And the traditional MINs do not exploit locality: communication with all processor-memory paris takes the same amount of time. Also so far there has been little progress for exploiting locality of reference in MINs. In this paper, we present a new topology MIN, hybrid MIN that is constructed with 2N-3 SEs which is far fewer SEs than that of traditional MINs. Although the hybrid MIN is constructed with 2N-3 SEs, the hybrid MIN satisfies full access capability (FAC) and has redundant paths(but providing single path for 2 memory modules of each processor). Moreover the has redundant paths (but providing single path for 2 memory modules of each processor). Moreover the Hybrid MIN provides shortcut path between pairs which have frequent dat acommunication (locality of reference). Its performance under varing degrees of localized communication is analyzed.

  • PDF

A Low Power Design of H.264 Codec Based on Hardware and Software Co-design

  • Park, Seong-Mo;Lee, Suk-Ho;Shin, Kyoung-Seon;Lee, Jae-Jin;Chung, Moo-Kyoung;Lee, Jun-Young;Eum, Nak-Woong
    • Information and Communications Magazine
    • /
    • v.25 no.12
    • /
    • pp.10-18
    • /
    • 2008
  • In this paper, we present a low-power design of H.264 codec based on dedicated hardware and software solution on EMP(ETRI Multi-core platform). The dedicated hardware scheme has reducing computation using motion estimation skip and reducing memory access for motion estimation. The design reduces data transfer load to 66% compared to conventional method. The gate count of H.264 encoder and the performance is about 455k and 43Mhz@30fps with D1(720x480) for H.264 encoder. The software solution is with ASIP(Application Specific Instruction Processor) that it is SIMD(Single Instruction Multiple Data), Dual Issue VLIW(Very Long Instruction Word) core, specified register file for SIMD, internal memory and data memory access for memory controller, 6 step pipeline, and 32 bits bus width. Performance and gate count is 400MHz@30fps with CIF(Common Intermediated format) and about 100k per core for H.264 decoder.

Electrical characteristic for Phase-change Random Access Memory according to the $Ge_{1}Se_{1}Te_{2}$ thin film of cell structure (상변화 메모리 응용을 위한 $Ge_{1}Se_{1}Te_{2}$ 박막의 셀 구조에 따른 전기적 특성)

  • Na, Min-Seok;Lim, Dong-Kyu;Kim, Jae-Hoon;Choi, Hyuk;Chung, Hong-Bay
    • Proceedings of the KIEE Conference
    • /
    • 2007.07a
    • /
    • pp.1335-1336
    • /
    • 2007
  • Among the emerging non-volatile memory technologies, phase change memories are the most attractive in terms of both performance and scalability perspectives. Phase-change random access memory(PRAM), compare with flash memory technologies, has advantages of high density, low cost, low consumption energy and fast response speed. However, PRAM device has disadvantages of set operation speed and reset operation power consumption. In this paper, we investigated scalability of $Ge_{1}Se_{1}Te_{2}$ chalcogenide material to improve its properties. As a result, reduction of phase change region have improved electrical properties of PRAM device.

  • PDF

Multiaccess Memory System supporting Local Buffer Memory System to Processing Elements (처리기에 지역 버퍼 메모리 시스템을 지원하는 다중접근기억장치)

  • Lee, Hyung
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.1
    • /
    • pp.30-37
    • /
    • 2012
  • A memory system with the linear skewing scheme has been regarded as one of suitable memory systems for a single instruction, multiple data (SIMD) architecture. The memory system supports simultaneous access n data to m memory modules within various access types with a constant interval in an arbitrary position in two dimensional data array of $M{\times}N$. Although $m{\times}cells$ memory cells are physically required to support logical two dimensional $M{\times}N$ array of data by means of the memory system, at least (m-n)${\times}cells$ memory cells remain in disuse, where cells is (M-1)/q+(N-1)/$p{\times}{\lceil}M/q{\rceil}+1$. On keeping functionalities the memory system supports, $(n{\times}t){\times}N/p$ out of a number of unused memory cells, where t>0, being used as local buffer memories for n processing elements is proposed in this paper.

File Access Pattern Collection Scheme based on Repetitiveness (반복성을 고려한 파일 액세스 패턴 수집 기법)

  • Hwnag-Bo, Jun-Hyoung;Seok, Seong-U;Seo, Dae-Hwa
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.28 no.12
    • /
    • pp.674-684
    • /
    • 2001
  • This paper presents the SIC(Size-Interval-Count) prefetching scheme that can record the file access patterns of applications within a relatively small space of memory based on the repetitiveness of the file access patterns. Several knowledge-based prefetching methods were recently introduced, which includes high correctness in predicting future accesses of applications. They records the access patterns of applications and uses recorded access pattern information to predict which blocks will be requested next. Yet, these methods require to much memory space. Accordingly, the proposed method then uses the recorded file access patterns, referred to as "SIC access pattern information", to correctly predict the future accesses of the applications. The proposed prefetching method improved the response time by about 40% compared to the general file system and showed remarkable memory efficiency compared to the previously knowledge-based prefetching methods.

  • PDF

Analyzing the Overhead of the Memory Mapped File I/O for In-Memory File Systems (메모리 파일시스템에서 메모리 매핑을 이용한 파일 입출력의 오버헤드 분석)

  • Choi, Jungsik;Han, Hwansoo
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.10
    • /
    • pp.497-503
    • /
    • 2016
  • Emerging next-generation storage technologies such as non-volatile memory will help eliminate almost all of the storage latency that has plagued previous storage devices. In conventional storage systems, the latency of slow storage devices dominates access latency; hence, software efficiency is not critical. With low-latency storage, software costs can quickly dominate memory latency. Hence, researchers have proposed the memory mapped file I/O to avoid the software overhead. Mapping a file into the user memory space enables users to access the file directly. Therefore, it is possible to avoid the complicated I/O stack. This minimizes the number of user/kernel mode switchings. In addition, there is no data copy between kernel and user areas. Despite of the benefits in the memory mapped file I/O, its overhead still needs to be addressed, as the existing mechanism for the memory mapped file I/O is designed for slow block devices. In this paper, we identify the overheads of the memory mapped file I/O via experiments.

Fully Room Temperature fabricated $TaO_x$ Thin Film for Non-volatile Memory

  • Choi, Sun-Young;Kim, Sang-Sig;Lee, Jeon-Kook
    • Proceedings of the Materials Research Society of Korea Conference
    • /
    • 2011.05a
    • /
    • pp.28.2-28.2
    • /
    • 2011
  • Resistance random access memory (ReRAM) is a promising candidate for next-generation nonvolatile memory because of its advantageous qualities such as simple structure, superior scalability, fast switching speed, low-power operation, and nondestructive readout. We investigated the resistive switching behavior of tantalum oxide that has been widely used in dynamic random access memories (DRAM) in the present semiconductor industry. As a result, it possesses full compatibility with the entrenched complementary metal-oxide-semiconductor processes. According to previous studies, TiN is a good oxygen reservoir. The TiN top electrode possesses the specific properties to control and modulate oxygen ion reproductively, which results in excellent resistive switching characteristics. This study presents fully room temperature fabricated the TiN/$TaO_x$/Pt devices and their electrical properties for nonvolatile memory application. In addition, we investigated the TiN electrode dependence of the electrical properties in $TaO_x$ memory devices. The devices exhibited a low operation voltage of 0.6 V as well as good endurance up to $10^5$ cycles. Moreover, the benefits of high devise yield multilevel storage possibility make them promising in the next generation nonvolatile memory applications.

  • PDF

A Case Study of a Navigator Optimization Process

  • Cho, Doosan
    • International journal of advanced smart convergence
    • /
    • v.6 no.1
    • /
    • pp.26-31
    • /
    • 2017
  • When mobile navigator device accesses data randomly, the cache memory performance is rapidly deteriorated due to low memory access locality. For instance, GPS (General Positioning System) of navigator program for automobiles or drones, that are currently in common use, uses data from 32 satellites and computes current position of a receiver. This computation of positioning is the major part of GPS which accounts more than 50% computation in the program. In this computation task, the satellite signals are received in real time and stored in buffer memories. At this task, since necessary data cannot be sequentially stored, the data is read and used at random. This data accessing patterns are generated randomly, thus, memory system performance is worse by low data locality. As a result, it is difficult to process data in real time due to low data localization. Improving the low memory access locality inherited on the algorithms of conventional communication applications requires a certain optimization technique to solve this problem. In this study, we try to do optimizations with data and memory to improve the locality problem. In experiment, we show that our case study can improve processing speed of core computation and improve our overall system performance by 14%.

270 MHz Full HD H.264/AVC High Profile Encoder with Shared Multibank Memory-Based Fast Motion Estimation

  • Lee, Suk-Ho;Park, Seong-Mo;Park, Jong-Won
    • ETRI Journal
    • /
    • v.31 no.6
    • /
    • pp.784-794
    • /
    • 2009
  • We present a full HD (1080p) H.264/AVC High Profile hardware encoder based on fast motion estimation (ME). Most processing cycles are occupied with ME and use external memory access to fetch samples, which degrades the performance of the encoder. A novel approach to fast ME which uses shared multibank memory can solve these problems. The proposed pixel subsampling ME algorithm is suitable for fast motion vector searches for high-quality resolution images. The proposed algorithm achieves an 87.5% reduction of computational complexity compared with the full search algorithm in the JM reference software, while sustaining the video quality without any conspicuous PSNR loss. The usage amount of shared multibank memory between the coarse ME and fine ME blocks is 93.6%, which saves external memory access cycles and speeds up ME. It is feasible to perform the algorithm at a 270 MHz clock speed for 30 frame/s real-time full HD encoding. Its total gate count is 872k, and internal SRAM size is 41.8 kB.

A Remote Cache Coherence Protocol for Single Shared Memory in Multiprocessor System (단일 공유 메모리를 가지는 다중 프로세서 시스템의 원격 캐시 일관성 유지 프로토콜)

  • Kim, Seong-Woon;Kim, Bo-Gwan
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.42 no.6
    • /
    • pp.19-28
    • /
    • 2005
  • The multiprocessor architecture is a good method to improve the computer system performance. The CC-NUMA provides a single shared space with the physically distributed memories is used widely in the multiprocessor computer system. A CC-NUMA has the full-mapped directory for the shared memory md uses a remote cache memory for tile fast memory access. In this paper, we propose a processing node architecture for a CC-NUMA system and a cache coherency protocol on the physically distributed but logically shared system. We show an implementation result of the system which is adopted the cache coherency protocol.