Search | Korea Science

Processor Design Technique for Low-Temperature Filter Cache (필터 캐쉬의 저온도 유지를 위한 프로세서 설계 기법)

Choi, Hong-Jun;Yang, Na-Ra;Lee, Jeong-A;Kim, Jong-Myon;Kim, Cheol-Hong
- Journal of the Korea Society of Computer and Information / v.15 no.1 / pp.1-12 / 2010
Processor performance has improved dramatically in recent years. However, as process technology scales down, processor energy consumption increases significantly even as performance continues to improve. Moreover, peak temperature in the processor rises sharply because of increased power density, causing serious thermal problems. For this reason, performance, energy consumption, and thermal behavior should be considered together when designing modern processors. This paper proposes three modified filter cache schemes to alleviate the thermal problem in the filter cache, one of the most energy-efficient design techniques for hierarchical memory systems: Bypass Filter Cache (BFC), Duplicated Filter Cache (DFC), and Partitioned Filter Cache (PFC). The BFC scheme accesses the L1 cache directly when the filter cache temperature exceeds a threshold, which lowers the filter cache temperature. The DFC scheme lowers the temperature by adding a second filter cache alongside the existing one. The PFC scheme builds the filter cache from two half-size filter caches, lowering the temperature by reducing the access frequency. According to our simulations using Wattch and HotSpot, the proposed partitioned filter cache shows the lowest peak filter cache temperature, leading to higher processor reliability.
https://doi.org/10.9708/jksci.2010.15.1.001
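
The bypass idea behind BFC can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class name, the FIFO replacement policy, and the linear heating and cooling constants are hypothetical illustration values, not the authors' design or their Wattch/HotSpot setup.

```python
from dataclasses import dataclass, field


@dataclass
class BypassFilterCache:
    """Toy model of a filter cache that is bypassed while it is too hot.

    All constants (threshold, heating and cooling steps) are made-up
    illustration values, not figures from the paper.
    """
    capacity: int = 4              # number of blocks the filter cache holds
    threshold: float = 80.0        # bypass once temperature exceeds this
    heat_per_access: float = 1.0   # assumed temperature rise per filter access
    cool_per_bypass: float = 0.5   # assumed temperature drop per bypassed access
    ambient: float = 60.0          # temperature never drops below this
    temperature: float = 60.0
    blocks: list = field(default_factory=list)  # FIFO replacement order
    filter_accesses: int = 0
    l1_accesses: int = 0

    def access(self, block: int) -> str:
        # Too hot: go straight to L1 and let the filter cache cool down.
        if self.temperature > self.threshold:
            self.l1_accesses += 1
            self.temperature = max(self.ambient,
                                   self.temperature - self.cool_per_bypass)
            return "bypass"
        # Normal path: probe the filter cache first.
        self.filter_accesses += 1
        self.temperature += self.heat_per_access
        if block in self.blocks:
            return "filter-hit"
        # Miss: fetch from L1 and install the block (FIFO eviction).
        self.l1_accesses += 1
        if len(self.blocks) >= self.capacity:
            self.blocks.pop(0)
        self.blocks.append(block)
        return "filter-miss"
```

With a low threshold such as `BypassFilterCache(threshold=62.0)`, repeated accesses to one block hit in the filter cache until the modeled temperature passes the threshold, after which accesses are routed to L1 while the filter cache cools.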

Low-power Filter Cache Design Technique for Multicore Processors (멀티 코어 프로세서를 위한 저전력 필터 캐쉬 설계 기법)

Park, Young-Jin;Kim, Jong-Myon;Kim, Cheol-Hong
- Journal of the Korea Society of Computer and Information / v.14 no.12 / pp.9-16 / 2009
Energy consumption as well as performance should be considered when designing modern multicore processors. In this paper, we propose a new design technique that uses a modified filter cache to reduce the energy consumed in the instruction cache of multicore processors. The filter cache has been recognized as one of the most energy-efficient design techniques for single-core processors. The energy consumed in the instruction cache accounts for a significant portion of total processor energy consumption, so energy-aware instruction cache design techniques are essential to reducing energy consumption in a multicore processor. The proposed technique saves instruction cache energy by reducing the number of accesses to the level-1 instruction cache. We evaluate the proposed design using a simulation infrastructure based on SimpleScalar and CACTI. Simulation results show that the proposed architecture reduces instruction cache energy consumption for multicore processors by up to 3.4% compared to the conventional filter cache architecture. Moreover, the proposed architecture performs better than the conventional filter cache architecture.
https://doi.org/10.9708/jksci.2009.14.12.009

Design of a High-Speed RFID Filtering Engine and Cache Based Improvement (고속 RFID 필터링 엔진의 설계와 캐쉬 기반 성능 향상)

Park Hyun-Sung;Kim Jong-Deok
- The Journal of Korean Institute of Communications and Information Sciences / v.31 no.5A / pp.517-525 / 2006
In this paper, we present a high-speed RFID data filtering engine designed to filter massive volumes of data against a massive number of filters. We found that high-speed RFID data filtering closely resembles the high-speed packet classification used in high-speed routers and firewall systems. Accordingly, our filtering engine is based on two existing packet classification algorithms, Bit Parallelism and Aggregated Bit Vector (ABV). We also found strong temporal relations and redundancy in RFID data filtering operations. To exploit this characteristic and improve the efficiency of the filtering engine, we incorporated two kinds of caches, tag caches and filter caches. We examined the performance of the proposed engine by implementing and testing a prototype system. Compared to the basic sequential filter comparison approach, our engine performs much better, and its advantage grows as the number of filters increases.
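
The bit-parallel matching idea borrowed from packet classification can be sketched as follows. This is a simplified illustration, not the paper's engine: the field names, the exact-match-or-wildcard filter format, and the class and function names are assumptions, the ABV aggregation step is omitted, and only a tag cache (not a filter cache) is modeled.

```python
from collections import defaultdict


class BitParallelFilterEngine:
    """Matches a tag against many filters by AND-ing per-field bit vectors.

    Filters here are exact-match on hypothetical fields, with None as a
    wildcard; real RFID filters and ABV aggregation are more involved.
    """

    def __init__(self, filters, fields):
        self.fields = fields
        self.all_mask = (1 << len(filters)) - 1
        # vectors[field][value] has bit i set when filter i requires that value.
        self.vectors = {f: defaultdict(int) for f in fields}
        # wildcard[field] has bit i set when filter i accepts any value.
        self.wildcard = {f: 0 for f in fields}
        for i, flt in enumerate(filters):
            for f in fields:
                value = flt.get(f)
                if value is None:
                    self.wildcard[f] |= 1 << i
                else:
                    self.vectors[f][value] |= 1 << i
        self.tag_cache = {}  # tag field tuple -> bitmap of matching filters

    def match(self, tag):
        """Return a bitmap whose bit i is set when filter i matches the tag."""
        key = tuple(tag[f] for f in self.fields)
        if key in self.tag_cache:  # repeated tag: reuse the earlier result
            return self.tag_cache[key]
        result = self.all_mask
        for f in self.fields:
            result &= self.vectors[f].get(tag[f], 0) | self.wildcard[f]
        self.tag_cache[key] = result
        return result


def matching_indices(bitmap):
    """List the filter indices whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]
```

Each lookup costs one dictionary probe and one AND per field regardless of how many filters exist, which is the property that lets this family of techniques scale better than sequential filter comparison.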

A design of low power structures of texture caches for mobile 3D graphics accelerator (모바일 3D 그래픽 가속기를 위한 저전력 텍스쳐 캐쉬 구조 설계)

Kim, Young-Sik;Lee, Jae-Young
- Journal of Korea Game Society / v.6 no.4 / pp.63-70 / 2006
This paper studies various low-power texture cache structures for mobile 3D graphics accelerators that reduce the memory latency of texture data. It also designs a texture cache whose power-mode transition thresholds vary according to the texture filtering algorithm. In trace-driven simulation, we compare the performance of these structures using the Quake game engine as the benchmark. The proposed algorithm, which varies the power-mode transition thresholds according to the selected texture filtering method, is verified by the simulation.

A Low Power 3D Graphics Accelerator Considering Both Active and Standby Modes for Mobile Devices (모바일기기의 동작모드와 대기모드를 모두 고려한 저전력 3차원 그래픽 가속기)

Kim, Young-Sik
- Journal of KIISE:Computer Systems and Theory / v.34 no.2 / pp.57-64 / 2007
This paper proposes a low-power texture cache for mobile 3D graphics accelerators. For such accelerators, it is very important to reduce both the leakage power in standby mode and the memory access latency of texture mapping in active mode, which requires a large memory bandwidth. The proposed structure reduces leakage power by varying the power-mode transition thresholds according to the texture filtering algorithm selected by the application program, which also yields a run-time gain for texture mapping. In trace-driven cache simulation, the proposed structure shows up to a 7% performance gain over the previous MSA cache under a new performance metric that considers both normalized leakage power and run-time impact.

Semantic schema data processing using cache mechanism (캐쉬메카니즘을 이용한 시맨틱 스키마 데이터 처리)

Kim, Byung-Gon;Oh, Sung-Kyun
- Journal of the Korea Society of Computer and Information / v.16 no.3 / pp.89-97 / 2011
In semantic web information systems, such as ontology-based systems that access distributed information over a network, efficient query processing requires an advanced caching mechanism to reduce query response time. P2P network systems have become an important infrastructure in the web environment. In a P2P network system, reducing the demand for data transformation at the source peer when a query is initiated is an important aspect of efficient query processing. Caching queries and query results is particularly advantageous when a query is modified by adding or removing a query term, because many of the answers may already be cached and can be delivered to the user right away. For the web environment, a semantic caching method has been proposed that manages the cache as a collection of semantic regions. In this paper, we propose a semantic caching technique for a clustered environment of peers. In particular, we improve query processing efficiency by using a schema data filtering technique and a schema-similarity cache replacement method.
https://doi.org/10.9708/jksci.2011.16.3.089

An Area Efficient Low Power Data Cache for Multimedia Embedded Systems (멀티미디어 내장형 시스템을 위한 저전력 데이터 캐쉬 설계)

Kim Cheong-Ghil;Kim Shin-Dug
- The KIPS Transactions:PartA / v.13A no.2 s.99 / pp.101-110 / 2006
One of the most effective ways to improve cache performance is to exploit both the temporal and the spatial locality exhibited by a program's execution characteristics. This paper proposes a small data cache that achieves low power and high performance on multimedia applications. The basic architecture is a split cache consisting of a direct-mapped cache with a small block size and a fully associative buffer with a large block size. To overcome the disadvantage of the small cache space, two mechanisms are added that take the operational behavior of multimedia applications into account: adaptive multi-block prefetching, which initiates fetches of various sizes, and efficient block filtering, which removes rarely reused data. Simulations on MediaBench show that the proposed 5KB cache provides equivalent performance and reduces energy consumption by up to 40% compared with a 16KB 4-way set-associative cache.
https://doi.org/10.3745/KIPSTA.2006.13A.2.101

Cache-Answerability of XML Queries in Regular Path Expressions on the Web (웹에서 정규경로 표현식을 포함한 XML 질의의 캐쉬를 이용한 처리)

박정기;강현철
- Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.58-60 / 2004
With the spread of the web, web page retrieval performance, namely fast response time and scalability, has become a decisive criterion for evaluating web sites. Because web applications generally serve an unspecified large population of users, scalability is also a key performance measure. Web caching is a representative enabling technology for securing such web site performance. This paper concerns techniques for, and an implementation of, transforming and processing a given XML query using the XML cache of an application server for XML database-backed web applications. Its scope covers the three most important features of XPath path expressions: the filter operator, which specifies conditions; the path operator (/), which denotes a parent-child relationship; and the path operator (//), which denotes an ancestor-descendant relationship. [2] presented an algorithm that uses the cache to transform and process XML queries given as path expressions without the ancestor-descendant path operator (//); this paper extends the algorithm of [2] to support the // operator. The ancestor-descendant path operator (//) can express regular path expressions, which are useful for querying XML data, a form of semistructured data whose schema is uncertain. The presented algorithm obtains path information from the DTD and uses it to transform a given query into queries against the cache and the underlying XML source. A system built on this algorithm using a relational DBMS was used for performance experiments on the real web. The results confirm that cache-based processing on the web is efficient even for XML queries containing regular path expressions.

Hierarchical Ring Extension of NUMA Systems using Snooping Protocol (스누핑 프로토콜을 사용하는 NUMA 시스템의 계층적 링 구조로의 확장)

Seong, Hyeon-Jung;Kim, Hyeong-Ho;Jang, Seong-Tae;Jeon, Ju-Sik
- Journal of KIISE:Computer Systems and Theory / v.26 no.11 / pp.1305-1317 / 1999
Because a NUMA architecture must access remote memory, the interconnection network largely determines system performance. The bus, long the popular interconnect for NUMA systems, limits the construction of large-scale systems because of its restricted physical scalability and bandwidth. A ring interconnect built from high-speed point-to-point links overcomes these limits, but its transfer delay grows as the number of clusters increases. In this paper, we propose a hierarchical extension of the snoop-based ring architecture to counter this growing delay, and we design an efficient cache coherence protocol suited to the architecture. The bridge that connects a local ring to the global ring maintains the cache coherence protocol and performs snoop filtering, which reduces local ring and cluster bus utilization and thereby improves system performance. Using a probability-driven simulator, we analyze the effect of the hierarchical architecture on system performance and on the utilization of the point-to-point links.

A Design of Fractional Motion Estimation Engine with 4×4 Block Unit of Interpolator & SAD Tree for 8K UHD H.264/AVC Encoder (8K UHD(7680×4320) H.264/AVC 부호화기를 위한 4×4블럭단위 보간 필터 및 SAD트리 기반 부화소 움직임 추정 엔진 설계)

Lee, Kyung-Ho;Kong, Jin-Hyeung
- Journal of the Institute of Electronics and Information Engineers / v.50 no.6 / pp.145-155 / 2013
In this paper, we propose a $4 \times 4$ block parallel interpolation architecture for high-performance H.264/AVC fractional motion estimation in real-time 8K UHD ($7680 \times 4320$) video processing. To improve throughput, we design a $4 \times 4$ block parallel interpolator. To supply the $10 \times 10$ reference data needed for interpolation, we design a 2D cache buffer consisting of $10 \times 10$ memory arrays. We minimize redundant storage of reference pixels by applying the Search Area Stripe Reuse (SASR) scheme, and we implement a high-speed plane interpolator with a 3-stage pipeline (horizontal/vertical 1/2 interpolation, diagonal 1/2 interpolation, and 1/4 interpolation). The proposed architecture was simulated with a 0.13 μm standard cell library; the gate count is 436.5K gates. The proposed fractional motion estimation engine can support 8K UHD at 30 frames per second when running at 187 MHz.
https://doi.org/10.5573/ieek.2013.50.6.145
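
As a rough sanity check on the figures quoted in the abstract, the average cycle budget per $4 \times 4$ block can be worked out from the resolution, frame rate, and clock frequency. This is back-of-the-envelope arithmetic only; it assumes one pass over each non-overlapping $4 \times 4$ block per frame and says nothing about the engine's actual schedule or search pattern.

```python
WIDTH, HEIGHT = 7680, 4320   # 8K UHD resolution from the abstract
FPS = 30                     # target frame rate from the abstract
CLOCK_HZ = 187_000_000       # reported operating frequency

# Assumption: the frame is tiled into non-overlapping 4x4 blocks.
blocks_per_frame = (WIDTH // 4) * (HEIGHT // 4)
blocks_per_second = blocks_per_frame * FPS

# Average number of clock cycles available per 4x4 block.
cycles_per_block = CLOCK_HZ / blocks_per_second

print(blocks_per_frame)            # 2073600
print(blocks_per_second)           # 62208000
print(round(cycles_per_block, 3))  # 3.006
```

Under that assumption, the design has on average about three clock cycles per $4 \times 4$ block at 187 MHz.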
