Integrated Search | Korea Science

Algorithmic GPGPU Memory Optimization

Jang, Byunghyun;Choi, Minsu;Kim, Kyung Ki
- JSTS: Journal of Semiconductor Technology and Science / Vol. 14, No. 4 / pp. 391-406 / 2014
The performance of General-Purpose computation on Graphics Processing Units (GPGPU) depends heavily on memory access behavior. This sensitivity is due to a combination of the underlying Massively Parallel Processing (MPP) execution model present on GPUs and the lack of architectural support for handling irregular memory access patterns. Application performance can be significantly improved by applying memory-access-pattern-aware optimizations that exploit knowledge of the characteristics of each access pattern. In this paper, we present an algorithmic methodology to semi-automatically find the best mapping of the memory accesses present in serial loop nests to the underlying data-parallel architectures, based on a comprehensive static memory access pattern analysis. To that end, we present a simple yet powerful mathematical model that captures all memory access pattern information present in serial data-parallel loop nests. We then show how this model is used in practice to select the most appropriate memory space for data and to search for an appropriate thread mapping and work group size from a large design space. To evaluate the effectiveness of our methodology, we report execution speedups using selected benchmark kernels that cover a wide range of memory access patterns commonly found in GPGPU workloads. Our experimental results are reported using the industry-standard heterogeneous programming language, OpenCL, targeting the NVIDIA GT200 architecture.
https://doi.org/10.5573/JSTS.2014.14.4.391

Technology Trends in CXL Memory and Utilization Software

안후영;김선영;박유미;한우종
- Electronics and Telecommunications Trends / Vol. 39, No. 1 / pp. 62-73 / 2024
Artificial intelligence relies on data-driven analysis, and the data processing performance strongly depends on factors such as memory capacity, bandwidth, and latency. Fast and large-capacity memory can be achieved by composing numerous high-performance memory units connected via high-performance interconnects, such as Compute Express Link (CXL). CXL is designed to enable efficient communication between central processing units, memory, accelerators, storage, and other computing resources. By adopting CXL, a composable computing architecture can be implemented, enabling flexible server resource configuration using a pool of computing resources. Thus, manufacturers are actively developing hardware and software solutions to support CXL. We present a survey of the latest software for CXL memory utilization and the most recent CXL memory emulation software. The former supports efficient use of CXL memory, and the latter offers a development environment that allows developers to optimize their software for the hardware architecture before commercial release of CXL memory devices. Furthermore, we review key technologies for improving the performance of both the CXL memory pool and CXL-based composable computing architecture along with various use cases.
https://doi.org/10.22648/ETRI.2024.J.390106

Scratchpad Memory Architectures and Allocation Algorithms for Hard Real-Time Multicore Processors

Liu, Yu;Zhang, Wei
- Journal of Computing Science and Engineering / Vol. 9, No. 2 / pp. 51-72 / 2015
Time predictability is crucial in hard real-time and safety-critical systems. Cache memories, while useful for improving average-case memory performance, are not time predictable, especially when they are shared in multicore processors. To achieve time predictability while minimizing the impact on performance, this paper explores several time-predictable scratch-pad memory (SPM) based architectures for multicore processors. To support these architectures, we propose three L2 SPM strategies: partitioning based on dynamic allocation of memory objects, partitioning based on static allocation, and priority based on static allocation. These strategies retain time predictability while attempting to maximize performance and energy efficiency. The SPM-based multicore architectural design and the related allocation methods thus form a comprehensive solution for hard real-time multicore computing. Our experimental results indicate the strengths and weaknesses of each proposed architecture and allocation method, which offer interesting on-chip memory design options for enabling multicore platforms in hard real-time systems.
https://doi.org/10.5626/JCSE.2015.9.2.51

Recovery Methods in Main Memory DBMS

Kim, Jeong-Joon;Kang, Jeong-Jin;Lee, Ki-Young
- International Journal of Advanced Smart Convergence / Vol. 1, No. 2 / pp. 26-29 / 2012
Recently, interest in main memory DBMSs has been rising as a way to efficiently support the real-time requirements of Real Time Location System (RTLS) services. Because all data in a main memory DBMS can be lost when a system failure occurs, the recovery method is critical to the stability of the database. In particular, the disk I/O incurred by logging and checkpointing becomes a bottleneck that degrades overall system performance. Research on recovery methods that reduce disk I/O in main memory DBMSs is therefore urgently needed. In this paper, we analyze existing logging and checkpointing techniques, as well as the recovery techniques of existing main memory DBMSs, as a basis for research on recovery techniques for main memory DBMSs.
https://doi.org/10.7236/JASC2012.1.2.5

A Study on Improvement of Low-power Memory Architecture in IoT/edge Computing

조두산
- Journal of the Korean Society of Industry Convergence / Vol. 24, No. 1 / pp. 69-77 / 2021
Low-cost design methodologies are widely used for IoT devices. In such networked devices, memory is composed of flash memory, SRAM, DRAM, and so on, and because these devices process large amounts of data, memory design is an important factor in system performance. Each device therefore selects design factors such as function, performance, and cost according to market demand. The memory architecture options available for low-cost IoT devices are very limited, consisting of configurations of SRAM, flash memory, and DRAM. To process as much data as possible in the same space, an architecture that supports parallel processing units is usually provided. Such a parallel architecture delivers high performance at low cost, but it requires precise software techniques for mapping instructions and data onto the parallel architecture. This paper proposes an instruction/data mapping method that supports optimized parallel processing performance. The proposed method optimizes system performance by actively exploiting hardware and software parallelism.
https://doi.org/10.21289/KSIC.2021.24.1.69

Multi-version Locking Scheme for Flash Memory Devices

변시우
- Proceedings of the KIEE Conference / Proceedings of the KIEE 2005 Symposium, Information and Control Section / pp. 191-193 / 2005
Flash memory is one of the best storage media for portable computers. However, traditional data management schemes need to be improved because flash operations are relatively slow compared to RAM. To this end, we devise a new scheme called Flash Two Phase Locking (F2PL) for efficient data processing. F2PL improves transaction performance by allowing multi-version reads and by efficiently handling slow flash write/erase operations in the lock management process.

Efficient FTL Mapping Management for Multiple Sector Size-based Storage Systems with NAND Flash Memory

임승호;최민
- Journal of KIISE: Computing Practices and Letters / Vol. 16, No. 12 / pp. 1199-1203 / 2010
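In computer systems, data moves between the host and the storage device in units of sectors, and the sector size can vary from system to system. Because of the structural relationship between page size and sector size in NAND flash memory, the sector size significantly affects how NAND flash memory is managed. In this paper, we propose an FTL mapping management scheme that efficiently supports multiple sector sizes in NAND flash memory-based storage devices, and we analyze its management method and performance. The proposed scheme makes it possible to efficiently manage NAND flash memory storage devices that support multiple sector sizes.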

T-Tree Index Structures Utilizing Prefetch Methods

이익훈;심준호
- The Journal of Society for e-Business Studies / Vol. 14, No. 4 / pp. 119-131 / 2009
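Environments that require real-time transaction processing have recently become more common in electronic commerce. Research on and deployment of main-memory databases have increased in order to support fast real-time transaction processing in mobile communication and financial market environments. As for indexing techniques that support fast transactions, recent studies exploit the structure and features of modern microprocessors to reduce the number of cache misses or to reduce data access latency when a cache miss occurs. This paper proposes a tree-index prefetch technique that reduces data access latency on cache misses by using the prefetch mechanism supported by recent microprocessors. We also propose the pCST-tree index structure, which is well suited to prefetching, and demonstrate its superiority through experiments.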

Development of an Embedded Bluetooth Audio Streaming Solution on SoC Platform

김태현
- The KIPS Transactions: Part A / Vol. 13A, No. 7 / pp. 589-598 / 2006
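This paper describes the development and optimization of an embedded Bluetooth solution on a System-on-Chip (SoC) platform with an embedded DSP, for real-time audio streaming over a Bluetooth wireless link. The developed solution includes an embedded Bluetooth protocol stack and profiles implemented on a virtual operating system for portability, together with optimization techniques that exploit the characteristics of the target multimedia SoC. The main optimization techniques are minimizing memory accesses by using the scratchpad memory within the SoC, implementing the codec with DSP operations and parallel memory access instructions, and dynamically adjusting audio quality according to the wireless communication environment. Experimental results show that the embedded solution with the proposed optimizations can support high-quality audio streaming without separate external memory.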
https://doi.org/10.3745/KIPSTA.2006.13A.7.589

An Effective Face Authentication Method for Resource-Constrained Devices

이경희;변혜란
- Journal of KIISE: Software and Applications / Vol. 31, No. 9 / pp. 1233-1245 / 2004
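Although using biometrics to authenticate users is superior in security and convenience, typical authentication algorithms that use biometric information may not be executable on resource-constrained devices such as smart cards. For biometric processing to run on such devices, lightweight authentication algorithms that require little memory and processing power therefore need to be developed. Among biological features, face-based authentication is the most convenient biometric technology because it is more familiar to people and face images can be acquired non-intrusively. In this paper, we propose a new face authentication algorithm as part of research on biometric technology. The algorithm has two novel properties. First, it reduces memory requirements by using Support Vector Machines (SVM) whose input vectors are the feature set extracted by Genetic Algorithms (GA). Second, by adjusting a system parameter that controls the size of the feature set as needed, it can further reduce the memory required for authentication, at the cost of a slightly lower recognition rate. These properties are quite effective in enabling face authentication algorithms to run on devices with limited memory. Experiments on face databases with diverse variations show that the proposed algorithm, which uses highly discriminative features selected by GA as the SVM input vectors, outperforms an algorithm without GA-based feature selection in both accuracy and memory requirements. We also show that the number of selected features can be adjusted by changing the system parameter.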

