Search | Korea Science

Analysis on the GPU Performance according to Hierarchical Memory Organization (계층적 메모리 구성에 따른 GPU 성능 분석)

Choi, Hongjun;Kim, Jongmyon;Kim, Cheolhong
- The Journal of the Korea Contents Association
- /
- v.14 no.3
- /
- pp.22-32
- /
- 2014
Recently, GPGPU has been widely used for general-purpose processing as well as graphics processing by providing optimized hardware for parallel processing. Memory system has big effects on the performance of parallel processing units such as GPU. In the GPU, hierarchical memory architecture is implemented for high memory bandwidth. Moreover, both memory address coalescing and memory request merging techniques are widely used. This paper analyzes the GPU performance according to various memory organizations. According to our simulation results, GPU performance improves by 15.5%, 21.5%, 25.5%, 30.9% as adding 8KB L1, 16KB L1, 32KB L1, 64KB L1 cache, respectively, compared to case without L1 cache. However, experimental results show that some benchmarks decrease performance since memory transaction increases due to data dependency. Moreover, average memory access latency is increased as the depth of hierarchical cache level increases when cache miss occurs significantly.
https://doi.org/10.5392/JKCA.2014.14.03.022 인용 PDF KSCI

An Address Translation Technique Large NAND Flash Memory using Page Level Mapping (페이지 단위 매핑 기반 대용량 NAND플래시를 위한 주소변환기법)

Seo, Hyun-Min;Kwon, Oh-Hoon;Park, Jun-Seok;Koh, Kern
- Journal of KIISE:Computing Practices and Letters
- /
- v.16 no.3
- /
- pp.371-375
- /
- 2010
SSD is a storage medium based on NAND Flash memory. Because of its short latency, low power consumption, and resistance to shock, it's not only used in PC but also in server computers. Most SSDs use FTL to overcome the erase-before-overwrite characteristic of NAND flash. There are several types of FTL, but page mapped FTL shows better performance than others. But its usefulness is limited because of its large memory footprint for the mapping table. For example, 64MB memory space is required only for the mapping table for a 64GB MLC SSD. In this paper, we propose a novel caching scheme for the mapping table. By using the mapping-table-meta-data we construct a fully associative cache, and translate the address within O(1) time. The simulation results show more than 80 hit ratio with 32KB cache and 90% with 512KB cache. The overall memory footprint was only 1.9% of 64MB. The time overhead of cache miss was measured lower than 2% for most workload.
PDF KSCI

Reliability Improvement of the Tag Bits of the Cache Memory against the Soft Errors (소프트 에러에 대한 캐쉬 메모리의 태그 비트 신뢰성 향상 기법)

Kim, Young-Ung
- The Journal of the Institute of Internet, Broadcasting and Communication
- /
- v.14 no.1
- /
- pp.15-21
- /
- 2014
Due to the development of manufacturing technology scaling, more transistors can be placed on a cache memories of a processor. However, processors become more vulnerable to the soft errors because of highly integrated transistors, the reliability of cache memory must consider seriously at the design level. Various researches are proposed to overcome the vulnerability of soft error, but researches of tag bit are proposed very rarely. In this paper, we revaluate the reliability improvement technique for tag bit, and analyse the protection rate of write-back operation, which is a typical case of not satisfying temporal locality. We also propose the methodology to improve the protection rate of write-back operation. The experiments of the proposed scheme shows up to 76.8% protection rate without performance degradations.
https://doi.org/10.7236/JIIBC.2014.14.1.15 인용 PDF KSCI

MLC-LFU : The Multi-Level Buffer Cache Management Policy for Flash Memory (MLC-LFU : 플래시 메모리를 위한 멀티레벨 버퍼 캐시 관리 정책)

Ok, Dong-Seok;Lee, Tae-Hoon;Chung, Ki-Dong
- Journal of KIISE:Computing Practices and Letters
- /
- v.15 no.1
- /
- pp.14-20
- /
- 2009
Recently, NAND flash memory is used not only for portable devices, but also for personal computers and server computers. Buffer cache replacement policies for the hard disks such as LRU and LFU are not good for NAND flash memories because they do not consider about the characteristics of NAND flash memory. CFLRU and its variants, CFLRU/C, CFLRU/E and DL-CFLRU/E(CFLRUs) are the buffer cache replacement policies considered about the characteristics of NAND flash memories, but their performances are not better than those of LRD. In this paper, we propose a new buffer cache replacement policy for NAND flash memory. Which is based on LFU and is taking into account the characteristics of NAND flash memory. And we estimate the performance of hit ratio and flush operation numbers. The proposed policy shows better hit ratio and the number of flush operation than any other policies.
PDF KSCI

PMS : Prefetching Strategy for Multi-level Storage System (PMS : 다단계 저장장치를 고려한 효율적인 선반입 정책)

Lee, Kyu-Hyung;Lee, Hyo-Jeong;Noh, Sam-Hyuk
- Journal of KIISE:Computer Systems and Theory
- /
- v.36 no.1
- /
- pp.26-32
- /
- 2009
The multi-level storage architecture has been widely adopted in servers and data centers. However, while prefetching has been shown as a crucial technique to exploit sequentiality in accesses common for such systems and hide the increasing relative cost of disk I/O, existing multi-level storage studies have focused mostly on cache replacement strategies. In this paper, we show that prefetching algorithms designed for single-level systems may have their limitations magnified when applied to multi-level systems. Overly conservative prefetching will not be able to effectively use the lower-level cache space, while overly aggressive prefetching will be compounded across levels and generate large amounts of wasted prefetch. We design and implement a hierarchy-aware lower-level prefetching strategy called PMS(Prefetching strategy for Multi-level Storage system) that applicable to any upper level prefetching algorithms. PMS does not require any application hints, a priori knowledge from the application or modification to the va interface. Instead, it monitors the upper-level access patterns as well as the lower-level cache status, and dynamically adjusts the aggressiveness of the lower-level prefetching activities. We evaluated the PMS through extensive simulation studies using a verified multi-level storage simulator, an accurate disk simulator, and access traces with different access patterns. Our results indicate that PMS dynamically controls aggressiveness of lower-level prefetching in reaction to multiple system and workload parameters, improving the overall system performance in all 32 test cases. Working with four well-known existing prefetching algorithms adopted in real systems, PMS obtains an improvement of up to 35% for the average request response time, with an average improvement of 16.56% over all cases.
PDF KSCI

Performance evaluation and analysis of TILE-Gx36 many-core processor with PARSEC benchmark (PARSEC을 이용한 TILE-Gx36 다중코어 프로세서의 성능 평가 및 분석)

Lee, Boseon;Kim, Han-Yee;Yu, Heonchang;Suh, Taeweon
- The Journal of Korean Association of Computer Education
- /
- v.17 no.1
- /
- pp.107-115
- /
- 2014
This paper evaluates and analyzes the performance of TILE-Gx36(Gx36), a many-core processor. The PARSEC parallel benchmark suite was used to measure the performance, and Core i7 (i7) and Atom are used for the performance comparison. When experimented with the maximum number of threads that can be executed concurrently on each machine, Gx36 showed a 2.73${\times}$ inferior performance to Core i7 and a 1.93${\times}$ superior performance to Atom. Gx36 has the largest Last Level Cache(LLC) among the compared processors. Nevertheless, it reported the biggest number of LLC misses, which, we strongly believe, is the major culprit for lower performance than expected. Our study suggests that the DDC employed in Gx36 is not a favorable cache structure for the general-purpose high-performance computing. The actual measurement with off-the-shelf machine provides non-biased data for polishing the future many-core architecture.
PDF

A Study on an Error Correction Code Circuit for a Level-2 Cache of an Embedded Processor (임베디드 프로세서의 L2 캐쉬를 위한 오류 정정 회로에 관한 연구)

Kim, Pan-Ki;Jun, Ho-Yoon;Lee, Yong-Surk
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.46 no.1
- /
- pp.15-23
- /
- 2009
Microprocessors, which need correct arithmetic operations, have been the subject of in-depth research in relation to soft errors. Of the existing microprocessor devices, the memory cell is the most vulnerable to soft errors. Moreover, when soft errors emerge in a memory cell, the processes and operations are greatly affected because the memory cell contains important information and instructions about the entire process or operation. Users do not realize that if soft errors go undetected, arithmetic operations and processes will have unexpected outcomes. In the field of architectural design, the tool that is commonly used to detect and correct soft errors is the error check and correction code. The Itanium, IBM PowerPC G5 microprocessors contain Hamming and Rasio codes in their level-2 cache. This research, however, focuses on huge server devices and does not consider power consumption. As the operating and threshold voltage is currently shrinking with the emergence of high-density and low-power embedded microprocessors, there is an urgent need to develop ECC (error check correction) circuits. In this study, the in-output data of the level-2 cache were analyzed using SimpleScalar-ARM, and a 32-bit H-matrix for the level-2 cache of an embedded microprocessor is proposed. From the point of view of power consumption, the proposed H-matrix can be implemented using a schematic editor of Cadence. Therefore, it is comparable to the modified Hamming code, which uses H-spice. The MiBench program and TSMC 0.18 um were used in this study for verification purposes.
PDF KSCI

Buffer Invalidation Schemes for High Performance Transaction Processing in Shared Database Environment (공유 데이터베이스 환경에서 고성능 트랜잭션 처리를 위한 버퍼 무효화 기법)

김신희;배정미;강병욱
- The Journal of Information Systems
- /
- v.6 no.1
- /
- pp.159-180
- /
- 1997
Database sharing system(DBSS) refers to a system for high performance transaction processing. In DBSS, the processing nodes are locally coupled via a high speed network and share a common database at the disk level. Each node has a local memory, a separate copy of operating system, and a DBMS. To reduce the number of disk accesses, the node caches database pages in its local memory buffer. However, since multiple nodes may be simultaneously cached a page, cache consistency must be ensured so that every node can always access the latest version of pages. In this paper, we propose efficient buffer invalidation schemes in DBSS, where the database is logically partitioned using primary copy authority to reduce locking overhead. The proposed schemes can improve performance by reducing the disk access overhead and the message overhead due to maintaining cache consistency. Furthermore, they can show good performance when database workloads are varied dynamically.
PDF

An On-chip Multiprocessor Miroprocessor with Shared MMU and Cache

Lee, Yong-Hwan;Jeong, Woo-Kyeong;An, Sang-Jun;Lee, Yong-Surk
- Journal of Electrical Engineering and information Science
- /
- v.2 no.4
- /
- pp.1-7
- /
- 1997
A multiprocessor microprocessor named SMPC(scaleable multiprocessor chip) that contains tow IU (integer unit) is presented in this paper. It can execute multiple instructions from several tasks exploiting task-level parallelism that is free from instruction dependencies, and provide high performance and throughput on both single program and multiprogramming environments. the IU is a 32-bit scalar processor expecially designed to boost up the performance of string manipulations which are frequently used in RDBMS(relational data base management system) applications. A memory management unit and a data cache shared by two IUs improve the performance and reduce the chip area required. ETH SMPC is implemented in VLSI circuit by custom design and automated design tools.
PDF

Analysis on the Temperature of 3D Multi-core Processors according to Vertical Placement of Core and L2 Cache (코어와 L2 캐쉬의 수직적 배치 관계에 따른 3차원 멀티코어 프로세서의 온도 분석)

Son, Dong-Oh;Ahn, Jin-Woo;Park, Jae-Hyung;Kim, Jong-Myon;Kim, Cheol-Hong
- Journal of the Korea Society of Computer and Information
- /
- v.16 no.6
- /
- pp.1-10
- /
- 2011
In designing multi-core processors, interconnection delay is one of the major constraints in performance improvement. To solve this problem, the 3-dimensional integration technology has been adopted in designing multi-core processors. The 3D multi-core architecture can reduce the physical wire length by stacking cores vertically, leading to reduced interconnection delay and reduced power consumption. However, the power density of 3D multi-core architecture is increased significantly compared to the traditional 2D multi-core architecture, resulting in the increased temperature of the processor. In this paper, the floorplan methods which change the forms of vertical placement of the core and the level-2 cache are analyzed to solve the thermal problems in 3D multi-core processors. According to the experimental results, it is an effective way to reduce the temperature in the processor that the core and the level-2 cache are stacked adjacently. Compared to the floorplan where cores are stacked adjacently to each other, the floorplan where the core is stacked adjacently to the level-2 cache can reduce the temperature by 22% in the case of 4-layers, and by 13% in the case of 2-layers.
https://doi.org/10.9708/jksci.2011.16.6.001 인용 PDF KSCI

Search Result 60, Processing Time 0.025 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)