• Title/Summary/Keyword: Cache architectures

Search Result 32, Processing Time 1.076 seconds

Localized Composition of LU-SGS Code on Latest Microprocessors (최신 마이크로프로세서상에서 LU-SGS 코드의 국소화 작성)

  • Choi Jeong-Yeol
    • 한국전산유체공학회:학술대회논문집
    • /
    • 2001.05a
    • /
    • pp.45-50
    • /
    • 2001
  • An approach of composing a performance optimized computational code is suggested for latest microprocessors. The approach named as localization is a concept of minimizing the access to system's main memory and maximizing the utilization of second level cache that is common to all the latest computer system. The localized compositions of LU-SGS scheme for fluid dynamics were made in three different levels and tested on three different microprocessor architectures most widely used in these days. The test results of localization concept showed a remarkable performance, that is the showing gain up to 4.5 times faster solution than the baseline algorithm $450\%$ for producing an exactly the same solution.

  • PDF

A Development of Fusion Processor Architecture for Efficient Main Memory Access in CPU-GPU Environment (CPU-GPU환경에서 효율적인 메인메모리 접근을 위한 융합 프로세서 구조 개발)

  • Park, Hyun-Moon;Kwon, Jin-San;Hwang, Tae-Ho;Kim, Dong-Sun
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.11 no.2
    • /
    • pp.151-158
    • /
    • 2016
  • The HSA resolves an old problem with existing CPU and GPU architectures by allowing both units to directly access each other's memory pools via unified virtual memory. In a physically realized system, however, frequent data exchanges between CPU and GPU for a virtual memory block result bottlenecks and coherence request overheads. In this paper, we propose Fusion Processor Architecture for efficient access of main memory from both CPU and GPU. It consists of Job Manager, Re-mapper, and Pre-fetcher to control, organize, and distribute work loads and working areas for GPU cores. These components help on reducing memory exchanges between the two processors and improving overall efficiency by eliminating faulty page table requests. To verify proposed algorithm architectures, we develop an emulator based on QEMU, and compare several architectures such as CUDA(Compute Unified Device Architecture), OpenMP, OpenCL. As a result, Proposed fusion processor architectures show 198% faster than others by removing unnecessary memory copies and cache-miss overheads.

Analysis on Memory Characteristics of Graphics Processing Units for Designing Memory System of General-Purpose Computing on Graphics Processing Units (범용 그래픽 처리 장치의 메모리 설계를 위한 그래픽 처리 장치의 메모리 특성 분석)

  • Choi, Hongjun;Kim, Cheolhong
    • Smart Media Journal
    • /
    • v.3 no.1
    • /
    • pp.33-38
    • /
    • 2014
  • Even though the performance of microprocessor is improved continuously, the performance improvement of computing system becomes hard to increase, in order to some drawbacks including increased power consumption. To solve the problem, general-purpose computing on graphics processing units(GPGPUs), which execute general-purpose applications by using specialized parallel-processing device representing graphics processing units(GPUs), have been focused. However, the characteristics of applications related with graphics is substantially different from the characteristics of general-purpose applications. Therefore, GPUs cannot exploit the outstanding computational resources sufficiently due to various constraints, when they execute general-purpose applications. When designing GPUs for GPGPU, memory system is important to effectively exploit the GPUs since typically general-purpose applications requires more memory accesses than graphics applications. Especially, external memory access requiring long latency impose a big overhead on the performance of GPUs. Therefore, the GPU performance must be improved if hierarchical memory architecture which can reduce the number of external memory access is applied. For this reason, we will investigate the analysis of GPU performance according to hierarchical cache architectures in executing various benchmarks.

A Comparative Study on Off-Path Content Access Schemes in NDN (NDN에서 Off-Path 콘텐츠 접근기법들에 대한 성능 비교 연구)

  • Lee, Junseok;Kim, Dohyung
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.10 no.12
    • /
    • pp.319-328
    • /
    • 2021
  • With popularization of services for massive content, the fundamental limitations of TCP/IP networking were discussed and a new paradigm called Information-centric networking (ICN) was presented. In ICN, content is addressed by the content identifier (content name) instead of the location identifier such as IP address, and network nodes can use the cache to store content in transit to directly service subsequent user requests. As the user request can be serviced from nearby network caches rather than from far-located content servers, advantages such as reduced service latency, efficient usage of network bandwidth, and service scalability have been introduced. However, these advantages are determined by how actively content stored in the cache can be utilized. In this paper, we 1) introduce content access schemes in Named-data networking, one of the representative ICN architectures; 2) in particular, review the schemes that allow access to cached content away from routing paths; 3) conduct comparative study on the performance of the schemes using the ndnSIM simulator.

Design and Performance Analysis of High Performance Processor-Memory Integrated Architectures (고성능 프로세서-메모리 혼합 구조의 설계 및 성능 분석)

  • Kim, Young-Sik;Kim, Shin-Dug;Han, Tack-Don
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.10
    • /
    • pp.2686-2703
    • /
    • 1998
  • The widening pClformnnce gap between processor and memory causes an emergence of the promising architecture, processor-memory (PM) integration In this paper, various design issues for P-M integration are studied, First, an analytical model of the DRAM access time is constructed considering both the bank conflict ratio and the DRAM page hit ratio. Then the points of both the performance improvement and the perfonnance bottle neck are found by the proposed model as designing on-chip DRAM architectures. This paper proposes the new architecture, called the delayed precharge bank architecture, to improve the perfonnance of memory system as increasing the DRAM page hit ratio. This paper also adapts an efficient bank interleaving mechanism to the proposed architecture. This architecture is verified !II he better than the hierarchical multi-bank architecture as well as the conventional bank architecture by executiun driven simulation. Eight SPEC95 benchmarks are used for simulation as changing parameters for the cache architecture, the number of DRAM banks, and the delayed time quantum.

  • PDF

Extended Buffer Management with Flash Memory SSDs (플래시메모리 SSD를 이용한 확장형 버퍼 관리)

  • Sim, Do-Yoon;Park, Jang-Woo;Kim, Sung-Tan;Lee, Sang-Won;Moon, Bong-Ki
    • Journal of KIISE:Databases
    • /
    • v.37 no.6
    • /
    • pp.308-314
    • /
    • 2010
  • As the price of flash memory continues to drop and the technology of flash SSD controller innovates, high performance flash SSDs with affordable prices flourish in the storage market. Nevertheless, it is hard to expect that flash SSDs will replace harddisks completely as database storage. Instead, the approach to use flash SSD as a cache for harddisks would be more practical, and, in fact, several hybrid storage architectures for flash memory and harddisk have been suggested in the literature. In this paper, we propose a new approach to use flash SSD as an extended buffer for main buffer in database systems, which stores the pages replaced out from main buffer and returns the pages which are re-referenced in the upper buffer layer, improving the system performance drastically. In contrast to the existing approaches to use flash SSD as a cache in the lower storage layer, our approach, which uses flash SSD as an extended buffer in the upper host, can provide fast random read speed for the warm pages which are being replaced out from the limited main buffer. In fact, for all the pages which are missing from the main buffer in a real TPC-C trace, the hit ratio in the extended buffer could be more than 60%, and this supports our conjecture that our simple extended buffer approach could be very effective as a cache. In terms of performance/price, our extended buffer architecture outperforms two other alternative approaches with the same cost, 1) large main buffer and 2) more harddisks.

WLRU: Remote Cache Management Policy for Distributed Shared Memory Architectures (WLRU: 분산 공유 메모리 구조에 적합한 원격 캐시 관리 정책)

  • Suh Hyo-Joong;Lee Byong-Ho
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2005.07a
    • /
    • pp.61-63
    • /
    • 2005
  • 분산 메모리에 기반한 다중 프로세서 시스템은 기존의 중앙 집중형 메모리 구조의 단점인 메모리 접근의 병목현상을 극복하고 프로세서와 메모리의 부가에 따라 메모리 대역폭을 확장시킬 수 있는 구조로써 최근의 다중 프로세서 시스템 구조의 주류로 대두되고 있다. 다중 프로세서 시스템의 성능은 메모리 접근 지연에 의하여 제한 받고 있는데 이러한 이유는 프로세서의 동작 주파수 속도에 비하여 메모리의 접근 지연이 수십 배 이상이 되기 때문이다. 특히 분산 메모리 다중 프로세서 시스템에 있어서 메모리 접근은 지역 메모리 접근과 원격 메모리 접근의 두 가지 유형으로 나눌 수 있는데 이 중 원격 메모리 접근 지연은 시스템의 상호 접속망 구조에 따라 지역 메모리 접근 지연에 비하여 수 배 내지 수십 배에 이르고 있다. 본 논문에서는 분산 메모리 다중 프로세서 시스템에서 상호 접속 망의 구조에 따라 원격 메모리 접근 간에도 시간 지연의 차이가 있음에 착안하여 원격 메모리 접근 시간 지연에 따른 최적화 된 원격 캐시 관리 정책을 제시하며 각 상호 접속 망의 구조에 따라 이러한 캐시 관리 정책에 의한 성능 향상의 정도를 측정한다.

  • PDF

Low-Power 2-level Cache Architectures for Embedded System (내장형 시스템을 위한 저전력 2-레벨 캐쉬 메모리의 설계)

  • Jong-Min Lee;Soon-Tae Kim;Kyung-Ah Kim;Su-Ho Park;Yong-Ho Kim
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2008.11a
    • /
    • pp.806-809
    • /
    • 2008
  • 온칩(on-chip) 캐쉬는 외부 메모리로의 접근을 감소시키는 중요한 역할을 한다. 본 연구에서는 내장형 시스템에 맞추어 설계된 2-레벨 캐쉬 메모리 구조를 제안하고자 한다. 레벨1(L1) 캐쉬의 구성으로 작은 크기, 직접사상(direct-mapped) 그리고 바로쓰기(write-through)를 채용한다. 대조적으로 레벨2(L2) 캐쉬는 일반적인 캐쉬 크기와 집합연관(Set-associativity) 그리고 나중쓰기(write-back) 정책을 채용한다. 결과적으로 L1캐쉬는 한 사이클 이내에 접근될 수 있고 L2캐쉬는 전체 캐쉬의 미스율(global miss rate)을 낮추는데 효과적이다. 두 캐쉬 계층간 바로쓰기(write-thorough) 정책에서 오는 빈번한 L2 캐쉬 접근으로 인한 에너지 소비를 줄이기 위해 본 연구에서는 One-way 접근 기법을 제안하였다. 본 연구에서 제안한 2-레벨 캐쉬 메모리 구조는 평균적으로 26%의 성능향상과 43%의 에너지 소비 그리고 77%의 에너지-지연 곱에서 이득을 보여주었다.

Cache Performance Analysis of Multiprocessor Systems for OLTP Applications based on a Memory-Resident DBMS (메모리 상주 DBMS 기반의 OLTP 응용을 위한 다중프로세서 시스템 캐쉬 성능 분석)

  • Chung, Yong-Wha;Hahn, Woo-Jong;Yoon, Suk-Han;Park, Jin-Won;Lee, Kang-Woo;Kim, Yang-Woo
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.6 no.4
    • /
    • pp.383-392
    • /
    • 2000
  • Currently, multiprocessors are evaluated almost exclusively with scientific applications. Commercial applications are rarely explored because it is difficult to obtain the source codes of commercial DBMS. Even when the source code is available, such as for POSTGRES, understanding the source code enough to perform detailed meaningful performance evaluations is a daunting task for computer architects.To evaluate multiprocessors with commercial applications, we have developed our own DBMS, called EZDB. EZDB is a parallelized DBMS, loosely inspired from POSTGRES, and running on top of a software architecture simulator. It is capable of executing parallel programs written in SQL. Contrary to POSTGRES, EZDB is not intended as a prototype for a production-quality DBMS. Its purpose is to easily run and evaluate the performance of commercial applications on multiprocessor architectures. To illustrate the usefulness of EZDB, we showed the cache performance data collected for the TPC-B benchmark on a shared-memory multiprocessor. The simulation results showed that the data structures exhibited unique sharing characteristics and that their locality properties and working sets were very different from those in scientific applications.

  • PDF

A Name-based Service Discovering Mechanism for Efficient Service Delivery in IoT (IoT에서 효율적인 서비스 제공을 위한 이름 기반 서비스 탐색 메커니즘)

  • Cho, Kuk-Hyun;Kim, Jung-Jae;Ryu, Minwoo;Cha, Si-Ho
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.6
    • /
    • pp.46-54
    • /
    • 2018
  • The Internet of Things (IoT) is an environment in which various devices provide services to users through communications. Because of the nature of the IoT, data are stored and distributed in heterogeneous information systems. In this situation, IoT end applications should be able to access data without having information on where the data are or what the type of storage is. This mechanism is called Service Discovery (SD). However, some problems arise, since the current SD architectures search for data in physical devices. First, turnaround time increases from searching for services based on physical location. Second, there is a need for a data structure to manage devices and services separately. These increase the administrator's service configuration complexity. As a result, the device-oriented SD structure is not suitable to the IoT. Therefore, we propose an SD structure called Name-based Service-centric Service Discovery (NSSD). NSSD provides name-based centralized SD and uses the IoT edge gateway as a cache server to speed up service discovery. Simulation results show that NSSD provides about twice the improvement in average turnaround time, compared to existing domain name system and distributed hash table SD architectures.