• Title/Summary/Keyword: GPUs


Evaluation of Performance and Maintenance Cost for Roadside's Particulate Matter Reduction Devices Using Smart Green Infrastructure Technology (스마트 그린인프라 기술을 활용한 도로변 미세먼지 저감장치의 성능 및 유지·관리 비용 평가)

  • Song, Kyu-Sung;Seok, Young-Sun;Yim, Hyo-Sook;Chon, Jin-Hyung
    • Journal of the Korean Society of Environmental Restoration Technology
    • /
    • v.25 no.4
    • /
    • pp.15-31
    • /
    • 2022
  • The Green Purification Unit System (GPUS) is a green infrastructure facility that can be installed along roadsides to reduce particulate matter from road traffic. This study introduces two types of GPUS (type 1 and type 2) and assesses the performance and maintenance cost of each. The performance analysis used data collected in November 2021, after GPUS type 1 and type 2 were installed at the study site in Suwon, where changes in the particulate matter concentration near the GPUS were measured. The maintenance cost of each type was assessed by calculating the initial installation cost and the management and repair costs after installation. The performance analysis showed that GPUS type 1, which combines plants with electric dust collectors, had superior particulate matter reduction performance. In particular, type 1 achieved a greater reduction during periods of high particulate matter concentration (50 ㎍/m³ or higher) owing to the operation of its electric dust collectors. GPUS type 2, designed as a plant wall without an electric dust collector, showed lower reduction performance than type 1 but still performed substantially better than the existing roadside green strip. Meanwhile, the initial installation cost of GPUS type 1 was three times that of type 2, and its management and repair costs were evaluated to be slightly higher as well. Finally, this study discusses the applicability of the two GPUS types based on a joint analysis of their particulate matter reduction performance and maintenance costs. Since GPUS type 2 is cheaper than type 1, it can be more economical; however, in areas suffering from high particulate matter concentrations, type 1 would be more effective. Therefore, the choice between GPUS types should depend on the particulate matter conditions of the area where the GPUS is to be installed.

Implementation of parallel blocked LU decomposition program for utilizing cache memory on GP-GPUs (GP-GPU의 캐시메모리를 활용하기 위한 병렬 블록 LU 분해 프로그램의 구현)

  • Kim, Youngtae;Kim, Doo-Han;Yu, Myoung-Han
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.41-47
    • /
    • 2013
  • GP-GPUs are general-purpose GPUs used for numerical computation based on the multiple threads originally intended for graphics processing. Unlike typical cache memory, GP-GPUs provide cache memory in the form of shared memory that user programs can access directly. In this research, we implemented a parallel blocked LU decomposition program to utilize the cache memory of GP-GPUs. The parallel blocked LU decomposition program, written in Nvidia CUDA C, ran 7~8 times faster than a non-blocked LU decomposition program in the same GP-GPU computation environment.
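The blocked structure the abstract refers to can be sketched in NumPy on the CPU. This is an illustrative right-looking blocked LU without pivoting, not the authors' CUDA C code; in their version each tile update would run inside a thread block whose working set fits the shared-memory cache.

```python
import numpy as np

def blocked_lu(A, bs=2):
    """In-place blocked LU decomposition (no pivoting).

    Each bs x bs diagonal block is factored, then the row/column
    panels are updated, then the trailing submatrix gets a
    matrix-multiply update -- the GEMM-heavy step that dominates
    the work and parallelizes well on GPUs.
    """
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(0, n, bs):
        e = min(k + bs, n)
        # Factor the diagonal block with unblocked LU.
        for j in range(k, e):
            for i in range(j + 1, e):
                A[i, j] /= A[j, j]
                A[i, j+1:e] -= A[i, j] * A[j, j+1:e]
        # Row panel: solve L11 * U12 = A12 for U12.
        L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
        A[k:e, e:] = np.linalg.solve(L11, A[k:e, e:])
        # Column panel: solve L21 * U11 = A21 for L21.
        U11 = np.triu(A[k:e, k:e])
        A[e:, k:e] = A[e:, k:e] @ np.linalg.inv(U11)
        # Trailing-submatrix update.
        A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return A

# Check: reassemble L*U and compare with the original matrix.
rng = np.random.default_rng(0)
M = rng.random((6, 6)) + 6 * np.eye(6)  # diagonally dominant, safe without pivoting
LU = blocked_lu(M, bs=2)
L = np.tril(LU, -1) + np.eye(6)
U = np.triu(LU)
print(np.allclose(L @ U, M))  # True
```

The block size plays the same tuning role as the CUDA thread-block tile: large enough to amortize memory traffic, small enough to fit fast on-chip storage.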

WRF Physics Models Using GP-GPUs with CUDA Fortran (WRF 물리 과정의 GP-GPU 계산을 위한 CUDA Fortran 프로그램 구현)

  • Kim, Youngtae;Lee, Yong Hee;Chung, Kwan-Young
    • Atmosphere
    • /
    • v.23 no.2
    • /
    • pp.231-235
    • /
    • 2013
  • We parallelized the major WRF physics routines for Nvidia GP-GPUs with CUDA Fortran. GP-GPUs were originally designed for graphics processing, but they deliver high performance at low power when computing numerical models. In the CUDA environment, the data domain is partitioned into thread blocks, and the threads in each thread block compute in parallel. We parallelized the WRF program to use thread blocks efficiently, validated the GP-GPU program against the original CPU program, and found that the WRF model using GP-GPUs achieves an efficient speedup.
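The thread-block mapping described above can be sketched serially on the CPU. The tile loop stands in for CUDA thread blocks and the per-point arithmetic for individual threads; the pointwise update is a made-up stand-in for a physics routine, not anything from WRF.

```python
import numpy as np

def update_tiled(field, tile=(4, 4)):
    """Apply a pointwise update tile by tile.

    Each tile plays the role of one CUDA thread block; the grid
    points inside a tile would each be handled by one thread.
    """
    out = np.empty_like(field)
    ny, nx = field.shape
    ty, tx = tile
    for by in range(0, ny, ty):          # loop over "thread blocks"
        for bx in range(0, nx, tx):
            block = field[by:by+ty, bx:bx+tx]
            out[by:by+ty, bx:bx+tx] = 0.5 * block + 1.0  # per-point work
    return out

grid = np.arange(64, dtype=float).reshape(8, 8)
tiled = update_tiled(grid)
print(np.allclose(tiled, 0.5 * grid + 1.0))  # True
```

Because column-physics routines are independent per grid column, this kind of block decomposition keeps every thread busy, which is what makes the WRF physics a good GPU target.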

Latency Hiding based Warp Scheduling Policy for High Performance GPUs

  • Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.4
    • /
    • pp.1-9
    • /
    • 2019
  • The LRR (Loose Round Robin) warp scheduling policy for GPU architectures yields high warp-level parallelism and balanced loads across multiple warps. However, the traditional LRR policy makes multiple warps execute long-latency operations at the same time; when no more warps can be issued during a long-latency operation, GPU throughput may degrade significantly. In this paper, we propose a new warp scheduling policy that exploits latency hiding, leading to better-utilized memory resources in high-performance GPUs. The proposed warp scheduler prioritizes memory instructions based on the GTO (Greedy Then Oldest) policy in order to reduce memory stalls. When no warp can issue a memory instruction, the scheduler selects a warp with a computation instruction in round-robin manner. Furthermore, the proposed technique achieves high performance by using additional information about recently committed warps. According to our experimental results, the proposed technique improves GPU performance by 12.7% and 5.6% on average over LRR and GTO, respectively.
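The hybrid selection logic can be sketched as a toy software model. This is illustrative only: warp age is approximated by warp id, and the paper's extra heuristic based on recently committed warps is omitted.

```python
from collections import deque

def select_warp(warps, rr_queue):
    """Pick the next warp to issue under the hybrid policy.

    warps: dict warp_id -> kind of its next instruction
           ('mem' for memory, 'alu' for computation).
    Memory instructions are issued greedily from the oldest warp
    (GTO-style) so long-latency misses start as early as possible;
    when no warp has a memory instruction ready, compute warps are
    issued round-robin to keep loads balanced.
    """
    # GTO pass: lowest id approximates the oldest warp.
    mem_ready = sorted(w for w, kind in warps.items() if kind == 'mem')
    if mem_ready:
        return mem_ready[0]
    # Fallback pass: round-robin over compute warps.
    for _ in range(len(rr_queue)):
        w = rr_queue.popleft()
        rr_queue.append(w)               # rotate the queue
        if warps.get(w) == 'alu':
            return w
    return None                          # nothing ready: stall

warps = {0: 'alu', 1: 'mem', 2: 'alu', 3: 'mem'}
rr = deque([0, 1, 2, 3])
print(select_warp(warps, rr))            # 1  (oldest memory warp wins)
```

Issuing memory operations early hides their latency behind the remaining compute warps, which is the mechanism the paper exploits.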

Parallel Implementation of the Recursive Least Square for Hyperspectral Image Compression on GPUs

  • Li, Changguo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.7
    • /
    • pp.3543-3557
    • /
    • 2017
  • Compression is a very important technique for remotely sensed hyperspectral images. Lossless compression based on the recursive least square (RLS) algorithm, which eliminates a hyperspectral image's redundancy using both spatial and spectral correlations, is an extremely powerful tool for this purpose, but its relatively high computational complexity limits its application in time-critical scenarios. In order to improve the computational efficiency of the algorithm, we optimize its serial version and develop a new parallel implementation on graphics processing units (GPUs). Namely, an optimized recursive least square based on an optimal number of prediction bands is introduced first. We then use this approach as a case study to illustrate the advantages and potential challenges of applying GPU parallel optimization principles to the considered problem. The proposed parallel method properly exploits the low-level architecture of GPUs and was implemented using the compute unified device architecture (CUDA). The GPU parallel implementation is compared with the serial implementation on a CPU. Experimental results indicate remarkable acceleration factors and real-time performance, while producing exactly the same bit rate as the serial version of the compressor.
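The core predictor can be sketched as a generic RLS filter. This is a textbook RLS sketch, not the paper's band-optimized predictor: in the compressor, each pixel would be predicted from a small number of previous spectral bands and only the residual entropy-coded.

```python
import numpy as np

def rls_predict(X, d, lam=0.99, delta=100.0):
    """Recursive least squares predictor.

    X: (n_samples, n_taps) regressors (e.g., co-located values in
    previous bands), d: (n_samples,) target values.
    Returns the per-sample prediction residuals and final weights.
    """
    n, p = X.shape
    w = np.zeros(p)                 # adaptive filter weights
    P = delta * np.eye(p)           # inverse correlation estimate
    residuals = np.empty(n)
    for t in range(n):
        x = X[t]
        e = d[t] - w @ x            # a-priori prediction error
        Px = P @ x
        k = Px / (lam + x @ Px)     # Kalman-style gain vector
        w = w + k * e               # weight update
        P = (P - np.outer(k, Px)) / lam
        residuals[t] = e
    return residuals, w

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
d = 2.0 * X[:, 0] - 1.5 * X[:, 1]              # noiseless linear target
res, w = rls_predict(X, d)
print(np.allclose(w, [2.0, -1.5], atol=1e-3))  # True: weights converge
```

Small residuals compress far better than raw samples, which is why a well-adapted RLS predictor is worth its per-sample update cost; the inner products above are also exactly the operations that vectorize well in CUDA.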

Practical methods for GPU-based whole-core Monte Carlo depletion calculation

  • Kyung Min Kim;Namjae Choi;Han Gyu Lee;Han Gyu Joo
    • Nuclear Engineering and Technology
    • /
    • v.55 no.7
    • /
    • pp.2516-2533
    • /
    • 2023
  • Several practical methods for accelerating the depletion calculation in the GPU-based Monte Carlo (MC) code PRAGMA are presented, including the multilevel spectral collapse method and a vectorized Chebyshev rational approximation method (CRAM). Since generating the microscopic reaction rates of each nuclide needed to construct the depletion matrix of the Bateman equation requires either enormous memory access or tremendous physical memory, both of which are quite burdensome on GPUs, a new method called multilevel spectral collapse is proposed which combines two types of spectra to generate microscopic reaction rates: an ultrafine spectrum for an entire fuel pin and coarser spectra for each depletion region. Errors in the reaction rates introduced by this method are mitigated by a hybrid use of direct online reaction rate tallies for several important fissile nuclides. The linear system appearing in the solution process adopting the CRAM is solved by the Gauss-Seidel method, which can be easily vectorized on GPUs. With the accelerated depletion methods, only about 10% of the MC calculation time is spent on depletion, so an accurate full-core cycle depletion calculation for a commercial power reactor (BEAVRS) can be done in 16 h with 24 consumer-grade GPUs.
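The Gauss-Seidel iteration mentioned above can be sketched generically. This is a plain dense-matrix sketch for a small made-up system, not the sparse, shifted systems that arise in the CRAM evaluation of the depletion matrix exponential.

```python
import numpy as np

def gauss_seidel(A, b, iters=100):
    """Plain Gauss-Seidel iteration for A x = b.

    Each sweep updates x[i] using the newest values of x[0..i-1],
    so information propagates through the vector within one sweep.
    Convergence is guaranteed for diagonally dominant systems.
    """
    n = len(b)
    x = np.zeros(n)
    for _ in range(iters):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - s) / A[i, i]
    return x

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])   # diagonally dominant
b = np.array([2.0, 4.0, 10.0])
x = gauss_seidel(A, b)
print(np.allclose(A @ x, b))  # True
```

On a GPU the sequential dependence inside one sweep is the awkward part; the vectorization the paper exploits comes from solving many independent depletion systems (one per depletion region) in parallel, so each thread can run its own sweep.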

Implementation of a GPU Cluster System using Inexpensive Graphics Devices (저가의 그래픽스 장치를 이용한 GPU 클러스터 시스템 구현)

  • Lee, Jong-Min;Lee, Jung-Hwa;Kim, Seong-Woo
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.11
    • /
    • pp.1458-1466
    • /
    • 2011
  • Recently, research on GPGPU has been carried out actively as GPU performance has increased rapidly. In this paper, we propose a system architecture, modeled on existing supercomputer architectures, for a cost-effective system using the GPUs of low-cost graphics devices, and we implement a GPU cluster system with eight GPUs. We also build a software development environment suited to the GPU cluster system and use it for performance evaluation by implementing the n-body problem. The results show that using multiple GPUs is efficient when the problem size is large, owing to the communication cost. In addition, we could calculate up to eight million celestial bodies by computing block by block, which mitigates the problem-size constraint imposed by the limited memory of GPUs.
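The block-by-block idea can be sketched with a direct-sum n-body force calculation in NumPy. This is a CPU sketch with made-up data, not the cluster code: streaming the source bodies one block at a time bounds the working set, mirroring how body counts larger than one GPU's memory were handled.

```python
import numpy as np

def accelerations_blocked(pos, mass, block=64, eps=1e-3):
    """Gravitational accelerations by direct summation, with the
    source bodies processed one block at a time (Plummer softening
    eps avoids the self-interaction singularity)."""
    acc = np.zeros_like(pos)
    n = pos.shape[0]
    for s in range(0, n, block):
        p = pos[s:s+block]                     # one block of sources
        m = mass[s:s+block]
        d = p[None, :, :] - pos[:, None, :]    # pairwise displacements
        r2 = (d ** 2).sum(-1) + eps ** 2       # softened squared distances
        acc += ((m[None, :] / r2 ** 1.5)[:, :, None] * d).sum(axis=1)
    return acc

rng = np.random.default_rng(2)
pos = rng.standard_normal((200, 3))
mass = rng.random(200)
a_blocked = accelerations_blocked(pos, mass, block=32)
a_full = accelerations_blocked(pos, mass, block=200)  # one big block
print(np.allclose(a_blocked, a_full))  # True: blocking changes memory use, not the result
```

Because every block contributes independently to the accumulator, the same loop distributes naturally across the GPUs of a cluster, with only the positions of the current block communicated.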

Analysis on Memory Characteristics of Graphics Processing Units for Designing Memory System of General-Purpose Computing on Graphics Processing Units (범용 그래픽 처리 장치의 메모리 설계를 위한 그래픽 처리 장치의 메모리 특성 분석)

  • Choi, Hongjun;Kim, Cheolhong
    • Smart Media Journal
    • /
    • v.3 no.1
    • /
    • pp.33-38
    • /
    • 2014
  • Even though microprocessor performance continues to improve, the performance of computing systems has become hard to increase further owing to drawbacks such as increased power consumption. To address this problem, general-purpose computing on graphics processing units (GPGPU), which executes general-purpose applications on the specialized parallel-processing devices known as graphics processing units (GPUs), has attracted attention. However, the characteristics of graphics applications differ substantially from those of general-purpose applications, so GPUs cannot fully exploit their outstanding computational resources when executing general-purpose applications. When designing GPUs for GPGPU, the memory system is important for exploiting the GPU effectively, since general-purpose applications typically require more memory accesses than graphics applications. In particular, external memory accesses, which incur long latency, impose a large overhead on GPU performance. GPU performance should therefore improve if a hierarchical memory architecture that reduces the number of external memory accesses is applied. For this reason, we analyze GPU performance under various hierarchical cache architectures while executing a range of benchmarks.
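The benefit the abstract argues for, fewer off-chip accesses under an on-chip cache, can be shown with a toy model. This is a minimal fully-associative LRU sketch with a made-up address trace, far simpler than a real GPU memory hierarchy.

```python
from collections import OrderedDict

def external_accesses(trace, cache_lines=4):
    """Count off-chip (external) memory accesses for an address
    trace filtered through a small fully-associative LRU cache."""
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # hit: refresh recency order
        else:
            misses += 1                    # miss: fetch from external memory
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least-recently-used line
    return misses

trace = [0, 1, 2, 0, 1, 3, 0, 1, 2, 3] * 3   # reuse-heavy access pattern
print(external_accesses(trace, cache_lines=4))  # 4: only cold misses
print(external_accesses(trace, cache_lines=0))  # 30: every access goes off-chip
```

A working set that fits in the cache turns almost all long-latency external accesses into on-chip hits, which is exactly the effect the paper measures across cache configurations.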

Parallel Processing Algorithm of JPEG2000 Using GPU (GPU를 이용한 JPEG2000 병렬 알고리즘)

  • Lee, Dong-Ha;Cho, Shi-Won;Lee, Dong-Wook
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.57 no.6
    • /
    • pp.1075-1080
    • /
    • 2008
  • Most modern computers and game consoles are equipped with powerful graphics processing units (GPUs) to accelerate graphics operations. However, since the graphics engines in these GPUs are specially designed for graphics operations, their computing power has been hard to exploit for more general non-graphics operations. In this paper, we studied the GPU graphics engine in order to accelerate image processing. Specifically, we implemented a JPEG2000 decoding/encoding framework that involves both OpenMP and the GPU. Initial experimental results show that significant speed-up can be achieved by utilizing the GPU's power.
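A central, highly parallelizable stage of JPEG2000 is the reversible Le Gall 5/3 lifting wavelet. The sketch below is a 1-D version using periodic extension for brevity (the standard specifies symmetric extension); rows and columns of each tile transform independently, which is what makes this stage a natural GPU target.

```python
import numpy as np

def dwt53_forward(x):
    """One level of the integer 5/3 lifting transform (1-D sketch)."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    # Predict step: detail = odd sample minus average of neighbors.
    odd -= (even + np.roll(even, -1)) >> 1
    # Update step: approximation = even sample plus rounded correction.
    even += (odd + np.roll(odd, 1) + 2) >> 2
    return even, odd            # low-pass, high-pass subbands

def dwt53_inverse(even, odd):
    """Exactly undo the lifting steps in reverse order."""
    even = even - ((odd + np.roll(odd, 1) + 2) >> 2)
    odd = odd + ((even + np.roll(even, -1)) >> 1)
    out = np.empty(even.size + odd.size, dtype=np.int64)
    out[0::2], out[1::2] = even, odd
    return out

sig = np.array([10, 12, 14, 13, 9, 7, 8, 11])
lo, hi = dwt53_forward(sig)
print(np.array_equal(dwt53_inverse(lo, hi), sig))  # True: perfectly reversible
```

Because lifting uses only integer shifts and adds, the inverse reproduces the input bit-exactly, which is what enables JPEG2000's lossless mode.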

High-Performance Korean Morphological Analyzer Using the MapReduce Framework on the GPU

  • Cho, Shi-Won;Lee, Dong-Wook
    • Journal of Electrical Engineering and Technology
    • /
    • v.6 no.4
    • /
    • pp.573-579
    • /
    • 2011
  • To meet the scalability and performance requirements of data analyses, which often involve voluminous data, efficient parallel or concurrent algorithms and frameworks are essential. We present a high-performance Korean morphological analyzer that employs the MapReduce framework on the graphics processing unit (GPU). MapReduce is a programming framework introduced by Google to aid the development of web search applications on large numbers of central processing units (CPUs). GPUs are designed as special-purpose co-processors, and their programming interfaces are typically formulated for graphics applications. Compared to CPUs, GPUs have greater computation power and memory bandwidth; however, GPUs are more difficult to program because of the design of their architectures. The performance of the Korean morphological analyzer using the MapReduce framework on the GPU is evaluated in comparison with a CPU-based model. The proposed Korean morphological analyzer shows promising scalable performance on distributed computing with the GPU.
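The MapReduce pattern itself can be shown with a minimal single-threaded skeleton. This stands in for the GPU-side framework of the paper, where the map and reduce tasks would be spread over thousands of threads; the token-counting job is a made-up example, not the paper's morphological analysis.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Minimal MapReduce skeleton: map each record to (key, value)
    pairs, shuffle by key, then reduce each key's value list."""
    groups = defaultdict(list)
    for rec in records:                      # map phase
        for key, val in mapper(rec):
            groups[key].append(val)          # shuffle: group by key
    return {k: reducer(k, vs) for k, vs in groups.items()}  # reduce phase

# Toy token-counting job over whitespace-split words.
docs = ["the quick fox", "the lazy dog", "the fox"]
counts = map_reduce(
    docs,
    mapper=lambda doc: [(tok, 1) for tok in doc.split()],
    reducer=lambda key, vals: sum(vals),
)
print(counts["the"], counts["fox"])  # 3 2
```

Because every map call and every per-key reduce is independent, both phases parallelize directly, on CPU clusters as in Google's original design or across GPU threads as in this paper.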