Search | Korea Science

Analysis on the GPU Performance according to Hierarchical Memory Organization (계층적 메모리 구성에 따른 GPU 성능 분석)

Choi, Hongjun;Kim, Jongmyon;Kim, Cheolhong
- The Journal of the Korea Contents Association / v.14 no.3 / pp.22-32 / 2014
Recently, GPGPUs have been widely used for general-purpose as well as graphics processing because they provide hardware optimized for parallel processing. The memory system strongly affects the performance of parallel processing units such as GPUs. GPUs implement a hierarchical memory architecture to achieve high memory bandwidth, and memory address coalescing and memory request merging techniques are also widely used. This paper analyzes GPU performance under various memory organizations. According to our simulation results, adding an 8KB, 16KB, 32KB, or 64KB L1 cache improves GPU performance by 15.5%, 21.5%, 25.5%, and 30.9%, respectively, compared with a configuration without an L1 cache. However, the experimental results also show that performance decreases for some benchmarks because data dependencies increase the number of memory transactions. Moreover, when cache misses are frequent, the average memory access latency increases as the cache hierarchy becomes deeper.
https://doi.org/10.5392/JKCA.2014.14.03.022
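
The abstract's last observation, that average latency grows with cache depth when misses are frequent, is consistent with the standard average memory access time (AMAT) recurrence. The following is a minimal sketch of that recurrence; the function name `amat` and all latencies and miss rates are made-up illustrative values, not figures from the paper.

```python
def amat(levels, memory_latency):
    """Average memory access time for a cache hierarchy.

    levels: list of (hit_latency_cycles, miss_rate), ordered from L1 outward.
    memory_latency: cycles to reach main memory after the last-level miss.
    """
    # Work from the outermost level inward:
    # AMAT_i = hit_latency_i + miss_rate_i * AMAT_(i+1)
    penalty = memory_latency
    for hit_latency, miss_rate in reversed(levels):
        penalty = hit_latency + miss_rate * penalty
    return penalty


# Hypothetical numbers, chosen only to illustrate the trend.
MEMORY = 400
one_level_good = amat([(20, 0.1)], MEMORY)               # L1 mostly hits
two_level_good = amat([(20, 0.1), (100, 0.5)], MEMORY)   # extra level helps
one_level_bad = amat([(20, 0.95)], MEMORY)               # L1 mostly misses
two_level_bad = amat([(20, 0.95), (100, 0.95)], MEMORY)  # extra level hurts
```

With these assumed numbers, the second level lowers the average latency when L1 mostly hits, and raises it when both levels mostly miss, because each miss then pays the extra lookup without benefiting from it.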

Memory Hierarchy Optimization in Embedded Systems using On-Chip SRAM (On-Chip SRAM을 이용한 임베디드 시스템 메모리 계층 최적화)

Kim, Jung-Won;Kim, Seung-Kyun;Lee, Jae-Jin;Jung, Chang-Hee;Woo, Duk-Kyun
- Journal of KIISE:Computer Systems and Theory / v.36 no.2 / pp.102-110 / 2009
The memory wall is the growing disparity in speed between the CPU and memory outside the CPU chip. An economical solution is a memory hierarchy organized into several levels, such as processor registers, cache, main memory, and disk storage. We introduce, for the first time, a memory hierarchy optimization technique for Linux-based embedded systems that uses on-chip SRAM. The technique uses the virtual memory system to allocate on-chip SRAM to code and data selected by the programmer. Experiments with nine applications show runtime improvements of up to 35% (14% on average) and energy consumption reductions of up to 40% (15% on average).

A Study on English-Korean Messenger MT System based on Structured Translation Memory (구조화된 번역 메모리 기반 영한 메신저 자동 번역 시스템에 관한 연구)

Choi, Sung-Kwon;Kim, Young-Gil
- Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.361-364 / 2011
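This paper has two main goals. One is to introduce the messenger machine translation system developed in 2010; the other is to introduce the Structured Translation Memory, which is designed to translate colloquial messenger sentences with higher quality. The Structured Translation Memory is a concept that breaks down the boundary between conventional string-based translation memories and machine translation systems, and it consists of hierarchical translation memories that represent structure. It is organized in stages: a string translation memory, a translation memory composed of base-form words, a translation memory with chunked proper nouns, a translation memory with chunked dates and numbers, a translation memory with chunked base noun phrases, and a sentence-pattern translation memory. The translation rate of the 2010 English-Korean messenger machine translation system, before the Structured Translation Memory was applied, was 81.67%, whereas the simulated translation rate of the 2011 system, to which the Structured Translation Memory is to be applied, was evaluated at 85.25%. Applying the Structured Translation Memory is therefore expected to improve the translation rate by 3.58 percentage points.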
https://doi.org/10.3745/PKIPS.y2011m04a.361
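
The staged organization described in the abstract above can be pictured as a cascade of lookups, from the most literal memory to the most abstract. The sketch below is only an illustration under assumptions: the stage names, the `lookup` helper, the `<NUM>` placeholder, and the toy entries are all hypothetical, and only two of the six stages are modeled.

```python
import re


def normalize_exact(sentence):
    # Stage 1: the raw string is the key.
    return sentence


def normalize_numbers(sentence):
    # Stand-in for date/number chunking: replace digit runs with a placeholder.
    return re.sub(r"\d+", "<NUM>", sentence)


def lookup(sentence, stages):
    """Try each translation memory in order.

    Returns (stage_name, target) for the first stage that matches, else None.
    """
    for name, normalize, memory in stages:
        key = normalize(sentence)
        if key in memory:
            return name, memory[key]
    return None


# Toy memories (hypothetical entries, not taken from the paper).
stages = [
    ("string", normalize_exact, {"see you at 5": "5시에 봐"}),
    ("number-chunked", normalize_numbers, {"see you at <NUM>": "<NUM>시에 봐"}),
]

exact_hit = lookup("see you at 5", stages)
chunk_hit = lookup("see you at 7", stages)
no_hit = lookup("hello", stages)
```

In this sketch, a sentence that misses the string memory can still match a later, more abstract stage; a real system would then restore the chunked value into the target and fall back to full machine translation when no stage matches.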

A Study of the Merging Layers of the Storage System for Flash-Based DBMS (플래시 메모리용 DBMS를 위한 스토리지 시스템의 계층 통합에 대한 연구)

Sim, Hyo-Gi;Yoon, Kyoung-Hon;Park, Sung-Min;Jung, Ho-Young;Cha, Jae-Hyuk;Kang, Soo-Yong
- Journal of Digital Contents Society / v.8 no.4 / pp.593-600 / 2007
Small computer systems such as mobile devices adopt NAND flash memory as their storage medium. However, DBMSs running on such systems are optimized for hard disks. When small computer systems use a DBMS, they usually rely on an additional system layer, such as an FTL, that makes flash memory appear as an ordinary hard disk, so the DBMS cannot control the flash memory directly. In this paper, we propose a unified storage system in which the DBMS controls flash memory directly. We implemented the system in a real environment and demonstrated that it outperforms legacy systems.

Remote Cache Replacement Policy using Processor Locality in Multi-Processor System (다중 프로세서 시스템에서 프로세서 지역성을 이용한 원격 캐쉬 교체 정책)

Han Sang Yoon;Kwak Jong Wook;Jhang Seong Tae;Jhon Chu Shik
- Journal of KIISE:Computer Systems and Theory / v.32 no.11_12 / pp.541-556 / 2005
Memory access latency has been a primary cause of performance degradation in both single-processor and multi-processor systems. In distributed shared-memory systems in particular, remote memory accesses incur far more overhead than local ones. To address this problem, a multi-level cache architecture that includes a remote cache has been proposed for multi-processor systems. In this paper, we propose a new cache replacement policy that improves the performance of multi-processor systems with a remote cache. If the multi-level cache maintains the multi-level inclusion (MLI) property and uses the LRU (Least Recently Used) replacement policy, the LRU information of the higher-level cache (a processor cache) can differ from that of the lower-level cache (a remote cache). In this situation, replacing a remote cache line can force the replacement of a processor cache line that the processor is still using, which is a main cause of performance degradation in the whole system. To alleviate this disadvantage of the LRU replacement policy, the new policy analyzes the remote memory access pattern of the processors in each node and uses this information to reduce the number of invalidations of useful cache lines in the higher-level cache. Compared with the general LRU replacement policy, the new remote cache replacement policy improves performance by up to 3.5% and by 2.5% on average on the SPLASH-2 benchmarks.
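
The problem this abstract describes, where plain LRU in the remote cache evicts a line that is still hot in the processor cache, can be reproduced with a small simulation. The sketch below models only the baseline behavior under multi-level inclusion, not the paper's proposed policy; the class `LRUCache`, the function `access`, the cache sizes, and the access trace are all assumptions made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # least recently used entry first

    def touch(self, addr):
        """Return True on a hit and mark the line most recently used."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        return False

    def insert(self, addr):
        """Insert a line; return the evicted address, or None."""
        victim = None
        if len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)
        self.lines[addr] = True
        return victim

    def invalidate(self, addr):
        """Drop a line if present; return True if it was present."""
        return self.lines.pop(addr, None) is not None


def access(addr, proc, remote, stats):
    if proc.touch(addr):
        return  # processor-cache hit: the remote cache's LRU order is not updated
    if not remote.touch(addr):
        victim = remote.insert(addr)
        # Inclusion: a line leaving the remote cache must leave the processor cache too.
        if victim is not None and proc.invalidate(victim):
            stats["forced_invalidations"] += 1
    proc.insert(addr)


proc, remote = LRUCache(2), LRUCache(3)
stats = {"forced_invalidations": 0}
for addr in ["A", "A", "B", "A", "C", "A", "D"]:
    access(addr, proc, remote, stats)
```

In this trace, line A is the most frequently used line in the processor cache, yet it is the oldest line from the remote cache's point of view, so bringing in D evicts A from the remote cache and forces its invalidation in the processor cache.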

Electromyogram Pattern Recognition by Hierarchical Temporal Memory Learning Algorithm (시공간적 계층 메모리 학습 알고리즘을 이용한 근전도 패턴인식)

Sung, Moo-Joung;Chu, Jun-Uk;Lee, Seung-Ha;Lee, Yun-Jung
- Journal of the Korean Institute of Intelligent Systems / v.19 no.1 / pp.54-61 / 2009
This paper presents a new electromyogram (EMG) pattern recognition method based on the Hierarchical Temporal Memory (HTM) algorithm, which was originally devised for image pattern recognition. In the modified HTM algorithm, a simplified two-level structure with a spatial pooler, a temporal pooler, and a supervised mapper is proposed for efficient learning and classification of EMG signals. To enhance recognition performance, the category information is used not only in the supervised mapper but also in the temporal pooler. The experimental results show that ten kinds of hand motion are successfully recognized.
https://doi.org/10.5391/JKIIS.2009.19.1.054

An Efficient Data Distribution Method on a Distributed Shared Memory Machine (분산공유 메모리 시스템 상에서의 효율적인 자료분산 방법)

Min, Ok-Gee
- The Transactions of the Korea Information Processing Society / v.3 no.6 / pp.1433-1442 / 1996
Data distribution in the SPMD (Single Program Multiple Data) pattern is one of the main features of HPF (High Performance Fortran). This paper describes design issues for such data distribution and an efficient execution model for it on the TICOM IV computer, named SPAX (Scalable Parallel Architecture computer based on X-bar network). SPAX has a hierarchical clustering structure that uses distributed shared memory (DSM). With such a memory structure, applying either SMDD (Shared Memory Data Distribution) or DMDD (Distributed Memory Data Distribution) uniformly cannot fully utilize the system. We therefore propose another data distribution model, DSMDD (Distributed Shared Memory Data Distribution), based on a hierarchical masters-slaves scheme. In this model, a remote master and slaves are designated in each node; a shared address scheme is used within a node and a message passing scheme between nodes. In our simulation, assuming a node size at which system performance degradation is minimized, DSMDD is more effective than SMDD and DMDD. In particular, the larger the number of logical processors and the weaker the data dependency among distributed data, the better the performance.

Large-Memory Data Processing on a Remote Memory System using Commodity Hardware (대용량 메모리 데이타 처리를 위한 범용 하드웨어 기반의 원격 메모리 시스템)

Jung, Hyung-Soo;Han, Hyuck;Yeom, Heon-Y.
- Journal of KIISE:Computer Systems and Theory / v.34 no.9 / pp.445-458 / 2007
This article presents a novel infrastructure for large-memory database processing using commodity hardware with operating system support. We exploit inexpensive PCs and a high-speed network capable of Remote Direct Memory Access (RDMA) operations to build a new memory hierarchy level between fast volatile memory and slow disk storage. The new level guarantees a reasonable response time, and its storage size enables us to run large-memory database systems with little performance degradation. The proposed architecture has two main components: (1) a remote memory system inside the Linux kernel that manages other computers' memory pages efficiently and (2) a remote memory pager responsible for remote read/write operations on remote memory pages. We argue that the proposed architecture is practical enough to support the rigorous demands of commercial in-memory database systems by demonstrating the performance of publicly available main-memory databases (e.g., MySQL) on our prototype system. The experiments with the TPC-C benchmark yield interesting results.

An Efficient Implementation of B-Tree Using Lazy Update on Flash Memory (플래시 메모리 상에서 지연 갱신을 이용한 B-트리의 효율적인 구현)

Kim, Bo-Kyeong;Yoo, Min-Hee;Lee, Dong-Ho
- Proceedings of the Korean Information Science Society Conference / 2011.06a / pp.69-72 / 2011
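Flash-memory-based storage systems are attracting attention as a replacement for hard disks because of their fast access speed, small size and light weight, and low power consumption. Unlike hard disks, flash memory requires read, write, and erase operations, and the units and execution times of these operations are asymmetric. In addition, because in-place updates are impossible, an update must be preceded by an erase, the slowest operation. Existing host systems perform only read and write operations, so using flash memory directly requires a separate software intermediate layer, the flash translation layer. However, when a disk-based B-tree is used as an index on top of the flash translation layer, performance degrades because B-trees inherently incur frequent in-place updates. A new index structure that takes the characteristics of flash memory into account is therefore needed. The flash-specific indexes μ-tree and LSB-tree have been proposed, but the μ-tree suffers from inefficient page management and the LSB-tree from the additional cost of managing temporary nodes. To solve the problems of the μ-tree and the LSB-tree, this paper proposes a B-tree that uses lazy updates. The proposed index loads the nodes being modified into memory, delays node updates when data is inserted, and handles sequential insertions without node splits, thereby improving search and write performance. The paper shows the advantage of the proposed index structure over the related μ-tree and LSB-tree through analytical formulas.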

The Divisible Electronic Cash System using Secret Sharing Scheme (비밀 분산 기법을 이용한 분할 가능한 전자화폐 시스템)

장석철;이임영
- Proceedings of the Korea Institutes of Information Security and Cryptology Conference / 2001.11a / pp.189-192 / 2001
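Electronic commerce has recently become more active owing to advances in information and communication technology and the explosive growth in the number of Internet users. Research and development on electronic cash systems, the most important systems in electronic commerce, is also actively under way. In particular, among the requirements of electronic cash systems, most research on divisibility has relied on hierarchical structure tables. This approach, however, has the drawbacks of requiring a large amount of memory and a large amount of computation. This paper therefore proposes a new electronic cash system that solves these problems by using a secret sharing scheme, an alternative approach that provides divisibility without a hierarchical structure table.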
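
The abstract above does not say which secret sharing scheme the proposed system uses or how it maps shares to divisible cash. As background only, the sketch below shows Shamir's (t, n) threshold scheme, one well-known secret sharing construction; the names `split` and `recover`, the field modulus `PRIME`, and the example values are assumptions, and nothing here represents the paper's protocol.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime; field size chosen only for illustration


def split(secret, threshold, shares):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    # Random polynomial of degree threshold - 1 with f(0) = secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, shares + 1)]


def recover(points):
    """Recover f(0) by Lagrange interpolation over the field of size PRIME."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


parts = split(123456789, threshold=3, shares=5)
recovered = recover(parts[:3])
```

Any three of the five points reconstruct the secret, while fewer than three reveal nothing about it; the secret must be smaller than `PRIME` for the reconstruction to be exact.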

