• Title/Summary/Keyword: 2-레벨 데이터 캐쉬

Search Result 8, Processing Time 0.02 seconds

Energy-Performance Efficient 2-Level Data Cache Architecture for Embedded System (내장형 시스템을 위한 에너지-성능 측면에서 효율적인 2-레벨 데이터 캐쉬 구조의 설계)

  • Lee, Jong-Min;Kim, Soon-Tae
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.37 no.5
    • /
    • pp.292-303
    • /
    • 2010
  • On-chip cache memories play an important role in both performance and energy consumption points of view in resource-constrained embedded systems by filtering many off-chip memory accesses. We propose a 2-level data cache architecture with a low energy-delay product tailored for the embedded systems. The L1 data cache is small and direct-mapped, and employs a write-through policy. In contrast, the L2 data cache is set-associative and adopts a write-back policy. Consequently, the L1 data cache is accessed in one cycle and is able to provide high cache bandwidth while the L2 data cache is effective in reducing global miss rate. To reduce the penalty of high miss rate caused by the small L1 cache and power consumption of address generation, we propose an ECP(Early Cache hit Predictor) scheme. The ECP predicts if the L1 cache has the requested data using both fast address generation and L1 cache hit prediction. To reduce high energy cost of accessing the L2 data cache due to heavy write-through traffic from the write buffer laid between the two cache levels, we propose a one-way write scheme. From our simulation-based experiments using a cycle-accurate simulator and embedded benchmarks, the proposed 2-level data cache architecture shows average 3.6% and 50% improvements in overall system performance and the data cache energy consumption.

Core-aware Cache Replacement Policy for Reconfigurable Last Level Cache (재구성 가능한 라스트 레벨 캐쉬 구조를 위한 코어 인지 캐쉬 교체 기법)

  • Son, Dong-Oh;Choi, Hong-Jun;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.11
    • /
    • pp.1-12
    • /
    • 2013
  • In multi-core processors, Last Level Cache(LLC) can reduce the speed gap between the memory and the core. For this reason, LLC has big impact on the performance of processors. LLC is composed of shared cache and private cache. In computer architecture community, most researchers have mainly focused on the management techniques for shared cache, while management techniques for private cache have not been widely researched. In conventional private LLC, memory is statically assigned to each core, resulting in serious performance degradation when the workloads are not fairly distributed. To overcome this problem, this paper proposes the replacement policy for managing private cache of LLC efficiently. As proposed core-aware cache replacement policy can reconfigure LLC dynamically, hit rate of LLC is increases drastically. Moreover, proposed policy uses 2-bit saturating counters to improve the performance. According to our simulation results, the proposed method can improve hit rates by 9.23% and reduce the access time by 12.85% compared to the conventional method.

Performance Analysis of Cache and Internal Memory of a High Performance DSP for an Optimal Implementation of Motion Picture Encoder (고성능 DSP에서 동영상 인코더의 최적화 구현을 위한 캐쉬 및 내부 메모리 성능 분석)

  • Lim, Se-Hun;Chung, Sun-Tae
    • The Journal of the Korea Contents Association
    • /
    • v.8 no.5
    • /
    • pp.72-81
    • /
    • 2008
  • High Performance DSP usually supports cache and internal memory. For an optimal implementation of a multimedia stream application on such a high performance DSP, one needs to utilize the cache and internal memory efficiently. In this paper, we investigate performance analysis of cache, and internal memory configuration and placement necessary to achieve an optimal implementation of multimedia stream applications like motion picture encoder on high performance DSP, TMS320C6000 series, and propose strategies to improve performance for cache and internal memory placement. From the results of analysis and experiments, it is verified that 2-way L2 cache configuration with the remaining memory configured as internal memory shows relatively good performance. Also, it is shown that L1P cache hit rate is enhanced when frequently called routines and routines having caller-callee relationships with them are continuously placed in the internal memory and that L1D cache hit rate is enhanced by the simple change of the data size. The results in the paper are expected to contribute to the optimal implementation of multimedia stream applications on high performance DSPs.

Real-time Implementation of HVXC codec conforming to MPEG-4 audio using TMS320C6701 DSP (TMS320C6701 DSP를 이용한 MPEG-4 오디오 HVXC 코덱의 실시간 구현)

  • Kang, Kyeong-Ok;Hong, Jin-Woo;Kim, Jin-Woong;Na, Hoon;Jeong, Dae-Gwon
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 1999.11b
    • /
    • pp.261-266
    • /
    • 1999
  • 본 논문에서는 인터넷 폰이나 디지털 이동통신에서와 같이 낮은 비트율이 요구되는 응용분야에서 사용될 수 있는 HVXC 부호화 및 복호화 알고리즘을 TMS320C6701 160MHz DSP를 사용하여 실시간 동작을 구현한 내용을 기술한다. 사용한 최적화 방법으로는 기본적으로 연산 시간이 많이 소요되는 함수 루틴에 대한 C 언어레벨의 최적화 및 어셈블리어 레벨의 최적화를 수행하였고, TMS320C6701 DSP 내부 프로그램 메모리를 프로그램 캐쉬로 사용하였다. 또한, 계산량이 많은 부분과 테이블 참조가 필요한 연산을DSP의 내부 데이터 메모리 영역에서 수행하여 소요시간을 단축하였으며, 음성신호 및 비트스트림의 입출력에는 background DMA(direct memory access) 방식을 이용하였다. 이와 같은 최적화결과 2kbps 및 4kbps의 비트율에서 압축 및 복원을 실시간으로 수행할 수 있다.

  • PDF

Probability-based Pre-fetching Method for Multi-level Abstracted Data in Web GIS (웹 지리정보시스템에서 다단계 추상화 데이터의 확률기반 프리페칭 기법)

  • 황병연;박연원;김유성
    • Spatial Information Research
    • /
    • v.11 no.3
    • /
    • pp.261-274
    • /
    • 2003
  • The effective probability-based tile pre-fetching algorithm and the collaborative cache replacement algorithm are able to reduce the response time for user's requests by transferring tiles which will be used in advance and determining tiles which should be removed from the restrictive cache space of a client based on the future access probabilities in Web GISs(Geographical Information Systems). The Web GISs have multi-level abstracted data for the quick response time when zoom-in and zoom-out queries are requested. But, the previous pre-fetching algorithm is applied on only two-dimensional pre-fetching space, and doesn't consider expanded pre-fetching space for multi-level abstracted data in Web GISs. In this thesis, a probability-based pre-fetching algorithm for multi-level abstracted in Web GISs was proposed. This algorithm expanded the previous two-dimensional pre-fetching space into three-dimensional one for pre-fetching tiles of the upper levels or lower levels. Moreover, we evaluated the effect of the proposed pre-fetching algorithm by using a simulation method. Through the experimental results, the response time for user requests was improved 1.8%∼21.6% on the average. Consequently, in Web GISs with multi-level abstracted data, the proposed pre-fetching algorithm and the collaborative cache replacement algorithm can reduce the response time for user requests substantially.

  • PDF

Hybrid Scheme of Data Cache Design for Reducing Energy Consumption in High Performance Embedded Processor (고성능 내장형 프로세서의 에너지 소비 감소를 위한 데이타 캐쉬 통합 설계 방법)

  • Shim, Sung-Hoon;Kim, Cheol-Hong;Jhang, Seong-Tae;Jhon, Chu-Shik
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.3
    • /
    • pp.166-177
    • /
    • 2006
  • The cache size tends to grow in the embedded processor as technology scales to smaller transistors and lower supply voltages. However, larger cache size demands more energy. Accordingly, the ratio of the cache energy consumption to the total processor energy is growing. Many cache energy schemes have been proposed for reducing the cache energy consumption. However, these previous schemes are concerned with one side for reducing the cache energy consumption, dynamic cache energy only, or static cache energy only. In this paper, we propose a hybrid scheme for reducing dynamic and static cache energy, simultaneously. for this hybrid scheme, we adopt two existing techniques to reduce static cache energy consumption, drowsy cache technique, and to reduce dynamic cache energy consumption, way-prediction technique. Additionally, we propose a early wake-up technique based on program counter to reduce penalty caused by applying drowsy cache technique. We focus on level 1 data cache. The hybrid scheme can reduce static and dynamic cache energy consumption simultaneously, furthermore our early wake-up scheme can reduce extra program execution cycles caused by applying the hybrid scheme.

A Study on an Error Correction Code Circuit for a Level-2 Cache of an Embedded Processor (임베디드 프로세서의 L2 캐쉬를 위한 오류 정정 회로에 관한 연구)

  • Kim, Pan-Ki;Jun, Ho-Yoon;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.1
    • /
    • pp.15-23
    • /
    • 2009
  • Microprocessors, which need correct arithmetic operations, have been the subject of in-depth research in relation to soft errors. Of the existing microprocessor devices, the memory cell is the most vulnerable to soft errors. Moreover, when soft errors emerge in a memory cell, the processes and operations are greatly affected because the memory cell contains important information and instructions about the entire process or operation. Users do not realize that if soft errors go undetected, arithmetic operations and processes will have unexpected outcomes. In the field of architectural design, the tool that is commonly used to detect and correct soft errors is the error check and correction code. The Itanium, IBM PowerPC G5 microprocessors contain Hamming and Rasio codes in their level-2 cache. This research, however, focuses on huge server devices and does not consider power consumption. As the operating and threshold voltage is currently shrinking with the emergence of high-density and low-power embedded microprocessors, there is an urgent need to develop ECC (error check correction) circuits. In this study, the in-output data of the level-2 cache were analyzed using SimpleScalar-ARM, and a 32-bit H-matrix for the level-2 cache of an embedded microprocessor is proposed. From the point of view of power consumption, the proposed H-matrix can be implemented using a schematic editor of Cadence. Therefore, it is comparable to the modified Hamming code, which uses H-spice. The MiBench program and TSMC 0.18 um were used in this study for verification purposes.

Real-time Implementation of MPEG-4 HVXC Encoder and Decoder on Floating Point DSP (부동 소수점 DSP를 이용한 MPEG-4 HVXC 인코더 및 디코더의 실시간 구현)

  • Kang, Kyeong-ok;Na, Hoon;Hong, Jin-Woo;Jeong, Dae-Gwon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.4
    • /
    • pp.37-44
    • /
    • 2000
  • In this paper, we described the real-time implementation effort of MPEG-4 audio HVXC (Harmonic Vector eXcitation Coding) algorithm for very low bitrates, which has target applications from mobile communications to Internet telephony, on current high performance floating point TMS320C6701 DSP. We adopted a hardware structure for real-time operation. In order for software optimization, we used C- and assembly-language level optimizations for time-critical functional codes. Utilizing the internal program memory of the DSP as the program cache, the internal data memory overlap technique and DMA functionality, we could get a goal of realtime operation of HVXC codec both at 2 kbit/s and at 4 kbit/s. For an encoder at 2 kbit/s, the optimization ratio to original code is about 96 %. Finally, we got the subjective quality of MOS 2.45 at 2 kbit/s from an informal quality test.

  • PDF