Search | Korea Science

Speculative Parallelism Characterization Profiling in General Purpose Computing Applications

Wang, Yaobin;An, Hong;Liu, Zhiqin;Li, Li;Yu, Liang;Zhen, Yilu
- Journal of Computing Science and Engineering / v.9 no.1 / pp.20-28 / 2015
General-purpose computing applications have not yet been thoroughly explored under procedure-level speculation, especially with lightweight profiling. This paper proposes a lightweight profiling mechanism to characterize speculative parallelism in several classic general-purpose computing applications from the SPEC CPU2000 benchmark suite. By comparing the key performance factors in loop-level and procedure-level speculation, it reports new findings on the behavior of loop-level and procedure-level parallelism in these applications. The experimental results are as follows. In loop-level speculation, the best-performing application, gzip, achieves only a 2.4X speedup, while in procedure-level speculation the best-performing application, mcf, achieves almost 3.5X. This shows that our lightweight profiling method is also effective. Between loop-level and procedure-level thread-level speculation (TLS), the latter performs better in several cases, contrary to conventional perception. This is especially evident in applications whose 'hot' procedure bodies are made up of 'hot' loops.
https://doi.org/10.5626/JCSE.2015.9.1.20

Analysis on Memory Characteristics of Graphics Processing Units for Designing Memory System of General-Purpose Computing on Graphics Processing Units (범용 그래픽 처리 장치의 메모리 설계를 위한 그래픽 처리 장치의 메모리 특성 분석)

Choi, Hongjun;Kim, Cheolhong
- Smart Media Journal / v.3 no.1 / pp.33-38 / 2014
Even though microprocessor performance has improved continuously, improving overall computing-system performance has become difficult because of drawbacks such as increased power consumption. To solve this problem, general-purpose computing on graphics processing units (GPGPU), which executes general-purpose applications on specialized parallel-processing devices, namely graphics processing units (GPUs), has attracted attention. However, the characteristics of graphics applications differ substantially from those of general-purpose applications. Therefore, GPUs cannot fully exploit their outstanding computational resources when executing general-purpose applications, owing to various constraints. When designing GPUs for GPGPU, the memory system is important for exploiting the GPU effectively, since general-purpose applications typically require more memory accesses than graphics applications. In particular, external memory accesses, which have long latency, impose a large overhead on GPU performance. GPU performance should therefore improve if a hierarchical memory architecture that reduces the number of external memory accesses is applied. For this reason, we analyze GPU performance under different hierarchical cache architectures when executing various benchmarks.

Introduction to general purpose GPU computing (GPU를 이용한 범용 계산의 소개)

Yu, Donghyeon;Lim, Johan
- Journal of the Korean Data and Information Science Society / v.24 no.5 / pp.1043-1061 / 2013
Recent advances in computer technology have produced massive data, and its analysis has become important. High-performance computing is one of the most essential parts of massive data analysis. In this paper, we review general-purpose computing on the graphics processing unit and its application to parallel computing, which has been of great interest in the statistics community.
https://doi.org/10.7465/jkdi.2013.24.5.1043

Real time simulation using multiple DSPs for fossil power plants (병렬처리를 이용한 화력발전소의 실시간 시뮬레이션)

박희준;김병국
- Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 1997.10a / pp.480-483 / 1997
A fossil power plant can be modeled by many algebraic and differential equations. When a large, complicated fossil power plant is simulated on a computer such as a workstation or PC, it takes a long time to compute all of the equations. Therefore, new processing systems with high computing speed are needed to develop real-time simulators. The vital requirements of a real-time simulator are accuracy, computing speed, and meeting deadlines. In this paper, we present an enhanced strategy that provides high computing power through parallel processing on DSP processors connected by communication links. We designed general-purpose DSP modules and a VME interface module. Because the DSP module is designed for general-purpose use, the parallel system can be expanded easily by connecting new DSP modules. Additionally, we propose methods for downloading programs and initial data to each DSP module via the VME bus and DPRAM, and processing sequences for computing and updating values between the DSP modules and the CPU30 board while the simulator is running.

Analysis of Programming Techniques for Creating Optimized CUDA Software (최적화된 CUDA 소프트웨어 제작을 위한 프로그래밍 기법 분석)

Kim, Sung-Soo;Kim, Dong-Heon;Woo, Sang-Kyu;Ihm, In-Sung
- Journal of KIISE:Computing Practices and Letters / v.16 no.7 / pp.775-787 / 2010
Unlike general-purpose CPUs, GPUs have been specialized as many-core streaming processors, and they are replacing CPUs in an increasing range of computations thanks to their outstanding parallel computing capacity. In response to this trend, NVIDIA has recently released a new parallel computing architecture called CUDA (Compute Unified Device Architecture), which offers a flexible GPU programming environment for GPGPU (General Purpose GPU) computing. In general, programmers who use the CUDA API must clearly understand many aspects of the GPU's computing architecture to produce efficient parallel software. In this article, we explain several optimization techniques for CUDA programming that we have verified through extensive experimentation and trial and error, and review how those techniques affect code execution performance. In particular, we use a specific problem as an example to analyze several elements that affect performance, such as effective access to the hierarchical memory system, processor occupancy, and latency hiding. In conclusion, we present several directions that may be used effectively in CUDA-based parallel programming.

Service Infrastructure of Wearable Computing (웨어러블 컴퓨팅을 위한 서비스 인프라 구조)

Han, Dong-Won;Park, Jun-Seok;Cho, Il-Yeon
- Journal of the Ergonomics Society of Korea / v.24 no.1 / pp.43-46 / 2005
Future information technologies and the service paradigm will move from the PC, the general-purpose desktop computing environment, to the next-generation PC, which provides information anywhere, at any time, and on any device. Next-generation PCs such as wearable computers are specialized for human-centric functionality and always-on connected services. In this study, a service infrastructure for wearable computing with a WBAN (Wearable Body Area Network) is proposed for the ubiquitous computing environment.
https://doi.org/10.5143/JESK.2005.24.1.043

Design and Implementation of Dual-Mode Cordless Phone and walkie-Talky System: A Software Radio Approach (소프트웨어 라디오 방식의 무선전화기 및 워키토키 이중 모드 시스템의 구현)

Sung, Min-Young
- Journal of the Korea Academia-Industrial cooperation Society / v.9 no.3 / pp.674-680 / 2008
An SDR (Software Defined Radio) system based on a general-purpose computing platform offers an easier software development process, a high degree of software compatibility, and the cost-effectiveness of general-purpose processors. This paper discusses the design and implementation of a dual-mode SDR system that supports both cordless phone and walkie-talkie operation on a Linux-based general-purpose computing platform. For this purpose, we designed modulation and demodulation software on the open-source GNU Radio middleware. We also designed customized RF front-end hardware that performs frequency conversion between RF and IF. The proposed SDR system successfully demonstrated both cordless phone and walkie-talkie communication on an Intel processor-based general-purpose computing platform. However, experience with the prototype SDR system shows that further research is required on run-time software reconfiguration and efficient integration with conventional TCP/IP protocol stacks.
https://doi.org/10.5762/KAIS.2008.9.3.674

Implementing Efficient Camera ISP Filters on GPGPUs Using OpenCL (GPGPU 기반의 효율적인 카메라 ISP 구현)

Park, Jongtae;Facchini, Beron;Hong, Jingun;Burgstaller, Bernd
- Proceedings of the Korea Information Processing Society Conference / 2010.11a / pp.1784-1787 / 2010
General Purpose Graphic Processing Unit (GPGPU) computing is a technique that uses the high-performance many-core processors of high-end graphics cards for general-purpose computations such as 3D graphics, video/image processing, computer vision, scientific computing, HPC and many more. GPGPUs offer a vast amount of raw computing power, but programming them is extremely challenging because of hardware idiosyncrasies. The open computing language (OpenCL) has been proposed as a vendor-independent GPGPU programming interface. OpenCL is very close to the hardware and thus does little to increase GPGPU programmability. In this paper, we present how a set of digital camera image signal processing (ISP) filters can be realized efficiently on GPGPUs using OpenCL. Although we found ISP filters to be memory-bound computations, our GPGPU implementations achieve speedups of up to a factor of 64.8 over their sequential counterparts. On GPGPUs, our proposed optimizations achieved speedups between 145% and 275% over their baseline GPGPU implementations. Our experiments were conducted on a Geforce GTX 275; because of OpenCL, we expect our optimizations to be applicable to other architectures as well.
https://doi.org/10.3745/PKIPS.y2010m11a.1784

Implementation of IQ/IDCT in H.264/AVC Decoder Using GP-GPU (GP-GPU를 이용한 H.264／AVC 디코더의 IQ/IDCT구현)

Jeong, Jun-Mo;Lee, Kwang-Yeob
- Journal of IKEEE / v.14 no.2 / pp.76-81 / 2010
The need for dedicated hardware continues to decrease as mobile CPU performance increases. However, there is a limit to a mobile CPU's performance. GP-GPU (General-Purpose computing on Graphics Processing Units) can improve performance without adding dedicated hardware. This paper presents an implementation of the Inverse Quantization, Inverse DCT, and Color Space Conversion modules of an H.264/AVC decoder using GP-GPU for mobile environments. The proposed architecture improves performance by approximately 40% when all of its features are used.

Research of accelerating method of video quality measurement program using GPGPU (GPGPU를 이용한 영상 품질 측정 프로그램의 가속화 연구)

Lee, Seonguk;Byeon, Gibeom;Kim, Kisu;Hong, Jiman
- Smart Media Journal / v.5 no.4 / pp.69-74 / 2016
Recently, parallel computing using GPGPU (General-Purpose computing on Graphics Processing Units) has been expanding with the development of the graphics processing unit. It achieves faster processing than traditional computing environments across many fields, including science, medicine, engineering, and analysis. However, there are many constraints on using GPU technology to implement a parallel program. In this paper, we port a CPU-based program (Video Quality Measurement Program) to use GPU technology. The GPU-based port runs at about 1.83 times the execution speed of the CPU-based program. We study the acceleration of the GPU-based program. We also discuss the technical constraints and problems that occur when converting a CPU-based program into a GPU-based one.
