Search | Korea Science

Analysis of Performance and Energy Efficiency of Core Mapping for Rasterization Algorithm using CUDA (CUDA를 이용한 Rasterization 알고리즘의 코어 매핑에 따른 성능 및 에너지 효율 분석)

Park, Min-Ho;Kim, Jong-Myon
- Proceedings of the Korea Information Processing Society Conference / 2013.05a / pp.140-143 / 2013
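In this paper, we analyze how core mapping affects the performance and energy efficiency of a vector-based rasterization algorithm with high data parallelism, implemented in CUDA. We experimented with two approaches: changing the block dimensions while holding the block size fixed, and changing the block size itself. In simulation, configurations with the same block size showed the same performance and energy efficiency within the margin of error. Although each architecture has a theoretical block and thread structure that can use all of its resources, we confirmed that unless memory accesses are optimized, the performance gain is limited by Amdahl's law.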
https://doi.org/10.3745/PKIPS.y2013m05a.140
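
The two ideas in the abstract above, block shapes that share one block size and the Amdahl's law ceiling, can be illustrated with a minimal sketch. The function names, the 256-thread block size, and the 0.9 parallel fraction are assumptions chosen for illustration; none of them come from the paper.

```python
def block_shapes(threads_per_block):
    """Return every 2D (x, y) block shape with exactly threads_per_block threads."""
    return [(x, threads_per_block // x)
            for x in range(1, threads_per_block + 1)
            if threads_per_block % x == 0]


def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup when only parallel_fraction of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for shape in block_shapes(256):
        print("block shape", shape)
    for workers in (8, 64, 512):
        print(workers, "workers ->", round(amdahl_speedup(0.9, workers), 2))
```

Under these assumptions, adding workers never pushes the speedup past `1 / (1 - parallel_fraction)`, which is one way to read the abstract's point that unoptimized memory access caps the gain regardless of the block and thread layout.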

Direct Mapping of the Executable Code in Single-tier Memory Operating System using SCM (SCM을 이용한 단일계층 메모리 운영체제에서의 실행 코드 직접 매핑 기법)

Park, Jong Woo;Jung, Seung Wan;Yoon, Jun young;Seo, Dae-Wha
- Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.81-82 / 2013
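Operating-system techniques that use byte-addressable, non-volatile SCM (Storage Class Memory) both as the working space of processes and as file storage are being actively studied. Building on how files are stored in such a system, this paper proposes a technique that, at process creation, maps the read-only executable code of an executable file directly into the process address space.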
https://doi.org/10.3745/PKIPS.y2013m11a.81

Memory Reduction Method of Radix-2² MDF IFFT for OFDM Communication Systems (OFDM 통신시스템을 위한 radix-2² MDF IFFT의 메모리 감소 기법)

Cho, Kyung-Ju
- The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.1 / pp.42-47 / 2020
In OFDM-based very-high-speed communication systems, an FFT/IFFT processor must offer low area and low power consumption as well as high throughput and low processing latency. Radix-$2^k$ MDF (multipath delay feedback) architectures, which combine pipelining and parallel processing, are therefore suitable. In an MDF architecture, the feedback memory, whose size grows in proportion to the input signal word length, accounts for a large share of area and power consumption. This paper presents a method for reducing the feedback memory size of a radix-$2^2$ MDF IFFT processor for OFDM applications. The proposed method focuses on the feedback memory in the first two stages of the MDF architecture, since these stages occupy about 75% of the total feedback memory. In OFDM transmission, IFFT input signals consist of modulated data and pilot/null signals. To reduce the IFFT input word length, an integer mapping is proposed that generates mapped data composed of two signed integers corresponding to the modulated data and the pilot/null signals. Simulation shows that the proposed method reduces feedback memory by up to 39% compared with the conventional approach.
https://doi.org/10.17661/jkiiect.2020.13.1.42 KSCI
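
The "about 75%" figure in the abstract above can be checked with a short calculation. This is a sketch under a stated assumption: the feedback delay length halves at every stage (N/2, N/4, ..., 1 words), as in a standard delay-feedback pipeline, and every word has the same width. The paper's actual MDF memory layout and per-stage word lengths may differ, and the function names are hypothetical.

```python
def feedback_delays(n_points):
    """Feedback delay length (in words) per stage, assuming it halves each stage."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two >= 2")
    delays = []
    length = n_points // 2
    while length >= 1:
        delays.append(length)
        length //= 2
    return delays


def first_two_stage_share(n_points):
    """Fraction of all feedback words held by the first two stages."""
    delays = feedback_delays(n_points)
    return sum(delays[:2]) / sum(delays)


if __name__ == "__main__":
    for n in (64, 1024, 4096):
        print(n, "points ->", round(first_two_stage_share(n), 4))
```

Under this assumption the first two stages hold 3N/4 of the N-1 feedback words, so shrinking the word length there affects roughly three quarters of the feedback memory.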

Bandwidth-Effective Rendering Scheme for 3D Texture-based Volume Visualization on GPU (3차원 텍스쳐 기반 볼륨 가시화를 위한 GPU 대역폭 효과적인 렌더링 기법)

Lee Won-Jong;Han Tack-Don
- Proceedings of the Korean Information Science Society Conference / 2005.07a / pp.673-675 / 2005
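This paper proposes a GPU-bandwidth-effective rendering scheme for 3D texture-based volume visualization. In a preprocessing step, an octree hierarchically partitions the original volume data into uniformly sized subvolumes so that only the regions that actually contain data are detected efficiently. At render time, the octree is traversed in visibility order, the subvolume at each leaf node is processed by the texture mapping unit, and the results are composited in the blending unit. Processing small subvolumes ($16^3$ or $32^3$) raises texture and pixel cache utilization and enables empty-space skipping, which greatly reduces GPU memory bandwidth and accelerates rendering. Experimental results and a performance analysis are provided for cache efficiency, memory traffic, and rendering time. The results show that, compared with traditional rendering, the proposed scheme reduces bandwidth by a factor of 11 on average and renders 3 times faster, making it an effective method for GPU volume rendering.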
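
The empty-space skipping described in the abstract above can be sketched as a preprocessing pass that keeps only the subvolumes containing data. This is a simplified illustration, not the paper's method: it scans a flat grid of bricks rather than building an octree, it treats any nonzero voxel as data, and the function name and nested-list volume layout are assumptions.

```python
def nonempty_bricks(volume, brick):
    """Return (z, y, x) origins of brick-sized subvolumes with any nonzero voxel.

    volume is a nested list indexed as volume[z][y][x].
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    origins = []
    for z0 in range(0, depth, brick):
        for y0 in range(0, height, brick):
            for x0 in range(0, width, brick):
                if any(volume[z][y][x]
                       for z in range(z0, min(z0 + brick, depth))
                       for y in range(y0, min(y0 + brick, height))
                       for x in range(x0, min(x0 + brick, width))):
                    origins.append((z0, y0, x0))
    return origins


if __name__ == "__main__":
    # Hypothetical 4x4x4 volume with a single occupied voxel.
    vol = [[[0] * 4 for _ in range(4)] for _ in range(4)]
    vol[3][0][1] = 5
    print(nonempty_bricks(vol, 2))
```

Only the returned bricks would need to be uploaded as textures and rendered; the rest are skipped, which is where the bandwidth saving would come from in this simplified model.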

Hardware Implementation of Rasterizer with SIMD Architecture Applicable to Mobile 3D Graphics System (모바일 3차원 그래픽스 시스템에 적용 가능한 SIMD 구조를 갖는 래스터라이저의 하드웨어 구현)

Ha, Chang-Soo;Sung, Kwang-Ju;Choi, Byeong-Yoon
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.05a / pp.313-315 / 2010
In this paper, we describe the development of a hardware rasterizer applicable to mobile 3D graphics systems, designed with a SIMD architecture and verified on an FPGA. The tile-based scan conversion unit is designed as a SIMD architecture that processes four tiles simultaneously, and each tile traverses pixels hierarchically in three levels so that the number of visits is minimized. Experimental results show that $8 \times 8$ is the most efficient tile size, with the last traversal step performed on $2 \times 2$ subtiles. The rasterizer supports flat shading and Gouraud shading, and the texture mapper supports affine mapping and perspective-corrected mapping. The texture mapper also supports point sampling and bilinear interpolation sampling modes, two wrapping modes, and various blending modes. The rasterizer operates at 120 MHz on a Xilinx Virtex-4 LX100 device. To ease verification, the texture memory and frame buffer are implemented as block ROM and block RAM.

The Efficient Merge Operation in Log Buffer-Based Flash Translation Layer for Enhanced Random Writing (임의쓰기 성능향상을 위한 로그블록 기반 FTL의 효율적인 합병연산)

Lee, Jun-Hyuk;Roh, Hong-Chan;Park, Sang-Hyun
- The KIPS Transactions:PartD / v.19D no.2 / pp.161-186 / 2012
Recently, flash memory has steadily increased in storage capacity while its price has fallen, making high-capacity SSDs (Solid State Drives) popular. Flash memory, however, has a number of drawbacks. To compensate for them, a dedicated layer, the FTL (Flash Translation Layer), is needed. To handle the hardware's restrictions efficiently, the FTL translates the logical sector numbers of the file system into the physical sector numbers of the flash memory. Among these restrictions, erase-before-write is a major cause of poor performance, and although there are many studies based on log blocks, some problems remain in operating high-capacity flash memory. When FAST, a log block-based FTL, faces random writes with wide locality, merge operations occur while sectors in the data block remain unused. In other words, ineffective block thrashing occurs and flash memory performance degrades. If overwrites are directed to the log block, the log block acts like a cache, which helps improve flash memory performance. To improve random writes, this study shows that, with a separate mapping table called the offset mapping table, the log block can operate not only as a cache but across the entire flash memory, so that merge and erase operations are reduced. The new FTL is called XAST (extensively-Associative Sector Translation). XAST manages the offset mapping table efficiently by exploiting spatial and temporal locality.
https://doi.org/10.3745/KIPSTD.2012.19D.2.161 KSCI

VLSI design of efficient VLC/VLD utilizing the characteristics of MPEG DCT coefficients (MPEG DCT 계수의 특징을 이용한 효율적인 VLC/VLD의 VLSI 설계)

Kong, Jong-Pil;Kim, Young-Min
- Journal of the Korean Institute of Telematics and Electronics B / v.33B no.1 / pp.79-86 / 1996
In this paper, we propose an architecture for a VLC (Variable Length Coder) and VLD (Variable Length Decoder) that is simple to implement and memory-efficient. We implemented encoding and decoding circuits that need only a 7-bit address memory space for the 114 MPEG-1 DCT coefficients, and that use a minimal number of flip-flops and logic gates by integrating into the architecture a shift register for serial-to-parallel or parallel-to-serial conversion of the data in the code-mapping ROM. Simulation using 0.8 $\mu$m CMOS standard cells showed an operating speed of 50 Mbps for both encoding and decoding.

An Improved Index Structure for the Flash Memory Based F2FS File System

Kim, Yong-Seok
- Journal of the Korea Society of Computer and Information / v.27 no.12 / pp.1-8 / 2022
F2FS, an efficient file system for SSDs (Solid State Drives), is included in the Linux kernel. F2FS applies various methods that improve performance by reflecting the characteristics of flash memory. One of them is an improved index structure that holds the addresses of the data blocks of each file. This paper presents a method for improving performance further by modifying the index structure of F2FS. F2FS identifies all index blocks by logical numbers, and an address mapping table is used to find the physical addresses of index blocks on flash memory. This paper shows that performance improves when logical numbers are applied to the last-level index blocks only. The number of mapping table lookups per data block access is reduced from 1 to 4 down to 1 to 2.
https://doi.org/10.9708/jksci.2022.27.12.001 KSCI

Conversion of Large RDF Data using Hash-based ID Mapping Tables with MapReduce Jobs (맵리듀스 잡을 사용한 해시 ID 매핑 테이블 기반 대량 RDF 데이터 변환 방법)

Kim, InA;Lee, Kyu-Chul
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.236-239 / 2021
With the growth of AI technology, the scale of Knowledge Graphs continues to expand. Knowledge Graphs are mainly expressed in RDF, as sets of connected triples. Many RDF stores compress RDF triples by transforming them into condensed IDs. However, transforming a large number of RDF triples incurs high processing time and memory overhead, because a large ID mapping table must be searched. In this paper, we propose a method of converting RDF triples using hash-based ID mapping tables with MapReduce, a software framework for parallel, distributed processing. The proposed method not only transforms RDF triples into integer-based IDs but also improves conversion speed and memory overhead. In our experiment on LUBM, the proposed method reduced the dataset size by a factor of about 3.8, and the conversion took about 106 seconds.
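
The core idea of the abstract above, replacing RDF terms with integer IDs through a hash-based mapping table, can be sketched as follows. This is a single-process illustration only: it uses a Python `dict` as the hash table and does not reproduce the paper's MapReduce jobs or its table layout. The function names and the `ex:` example terms are hypothetical.

```python
def build_id_table(triples):
    """Assign each distinct RDF term a dense integer ID, in first-seen order."""
    term_to_id = {}
    for triple in triples:
        for term in triple:
            if term not in term_to_id:
                term_to_id[term] = len(term_to_id)
    return term_to_id


def encode_triples(triples, term_to_id):
    """Rewrite (subject, predicate, object) string triples as integer triples."""
    return [tuple(term_to_id[term] for term in triple) for triple in triples]


if __name__ == "__main__":
    # Hypothetical example data, for illustration only.
    triples = [
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:bob", "ex:knows", "ex:alice"),
    ]
    table = build_id_table(triples)
    print(table)
    print(encode_triples(triples, table))
```

Each lookup in the table is an average constant-time hash probe, which is the property that makes a hash-based mapping attractive when the table grows large; distributing the table across MapReduce jobs, as the paper proposes, is outside the scope of this sketch.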

Improving Flash Translation Layer for Hybrid Flash-Disk Storage through Sequential Pattern Mining based 2-Level Prefetching Technique (하이브리드 플래시-디스크 저장장치용 Flash Translation Layer의 성능 개선을 위한 순차패턴 마이닝 기반 2단계 프리패칭 기법)

Chang, Jae-Young;Yoon, Un-Keum;Kim, Han-Joon
- The Journal of Society for e-Business Studies / v.15 no.4 / pp.101-121 / 2010
This paper presents an intelligent prefetching technique that significantly improves the performance of hybrid flash-disk storage, a combination of flash memory and hard disk. Since the flash memory embedded in a hybrid device is much faster than the hard disk for I/O operations, it can be used as a cache to improve system performance. The basic prefetching strategy is to use sequential pattern mining, which extracts the access patterns of objects from historical access sequences. We use two techniques to enhance hybrid storage performance with prefetching. One is a modification of the FAST algorithm for flash memory mapping. The other extends the unit of prefetching to the block level as well as the file level, to use flash memory space effectively. To evaluate the proposed technique, we perform experiments on synthetic data and real UCC data and demonstrate its usability.
KSCI
