• Title/Summary/Keyword: Pipeline computing

Search Result 57, Processing Time 0.034 seconds

KAWS: Coordinate Kernel-Aware Warp Scheduling and Warp Sharing Mechanism for Advanced GPUs

  • Vo, Viet Tan;Kim, Cheol Hong
    • Journal of Information Processing Systems
    • /
    • v.17 no.6
    • /
    • pp.1157-1169
    • /
    • 2021
  • Modern graphics processor unit (GPU) architectures offer significant hardware resource enhancements for parallel computing. However, without software optimization, GPUs continuously exhibit hardware resource underutilization. In this paper, we indicate the need to alter different warp scheduler schemes during different kernel execution periods to improve resource utilization. Existing warp schedulers cannot be aware of the kernel progress to provide an effective scheduling policy. In addition, we identified the potential for improving resource utilization for multiple-warp-scheduler GPUs by sharing stalling warps with selected warp schedulers. To address the efficiency issue of the present GPU, we coordinated the kernel-aware warp scheduler and warp sharing mechanism (KAWS). The proposed warp scheduler acknowledges the execution progress of the running kernel to adapt to a more effective scheduling policy when the kernel progress attains a point of resource underutilization. Meanwhile, the warp-sharing mechanism distributes stalling warps to different warp schedulers wherein the execution pipeline unit is ready. Our design achieves performance that is on an average higher than that of the traditional warp scheduler by 7.97% and employs marginal additional hardware overhead.

Sort-Based Distributed Parallel Data Cube Computation Algorithm using MapReduce (맵리듀스를 이용한 정렬 기반의 데이터 큐브 분산 병렬 계산 알고리즘)

  • Lee, Suan;Kim, Jinho
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.49 no.9
    • /
    • pp.196-204
    • /
    • 2012
  • Recently, many applications perform OLAP(On-Line Analytical Processing) over a very large volume of data. Multidimensional data cube is regarded as a core tool in OLAP analysis. This paper focuses on the method how to efficiently compute data cubes in parallel by using a popular parallel processing tool, MapReduce. We investigate efficient ways to implement PipeSort algorithm, a well-known data cube computation method, on the MapReduce framework. The PipeSort executes several (descendant) cuboids at the same time as a pipeline by scanning one (ancestor) cuboid once, which have the same sorting order. This paper proposed four ways implementing the pipeline of the PipeSort on the MapReduce framework which runs across 20 servers. Our experiments show that PipeMap-NoReduce algorithm outperforms the rest algorithms for high-dimensional data. On the contrary, Post-Pipe stands out above the others for low-dimensional data.

Genome Analysis Pipeline I/O Workload Analysis (유전체 분석 파이프라인의 I/O 워크로드 분석)

  • Lim, Kyeongyeol;Kim, Dongoh;Kim, Hongyeon;Park, Geehan;Choi, Minseok;Won, Youjip
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.2 no.2
    • /
    • pp.123-130
    • /
    • 2013
  • As size of genomic data is increasing rapidly, the needs for high-performance computing system to process and store genomic data is also increasing. In this paper, we captured I/O trace of a system which analyzed 500 million sequence reads data in Genome analysis pipeline for 86 hours. The workload created 630 file with size of 1031.7 Gbyte and deleted 535 file with size of 91.4 GByte. What is interesting in this workload is that 80% of all accesses are from only two files among 654 files in the system. Size of read and write request in the workload was larger than 512 KByte and 1 Mbyte, respectively. Majority of read write operations show random and sequential patterns, respectively. Throughput and bandwidth observed in each processing phase was different from each other.

A Framework of Intelligent Middleware for DNA Sequence Analysis in Cloud Computing Environment (DNA 서열 분석을 위한 클라우드 컴퓨팅 기반 지능형 미들웨어 설계)

  • Oh, Junseok;Lee, Yoonjae;Lee, Bong Gyou
    • Journal of Internet Computing and Services
    • /
    • v.15 no.1
    • /
    • pp.29-43
    • /
    • 2014
  • The development of NGS technologies, such as scientific workflows, has reduced the time required for decoding DNA sequences. Although the automated technologies change the genome sequence analysis environment, limited computing resources still pose problems for the analysis. Most scientific workflow systems are pre-built platforms and are highly complex because a lot of the functions are implemented into one system platform. It is also difficult to apply components of pre-built systems to a new system in the cloud environment. Cloud computing technologies can be applied to the systems to reduce analysis time and enable simultaneous analysis of massive DNA sequence data. Web service techniques are also introduced for improving the interoperability between DNA sequence analysis systems. The workflow-based middleware, which supports Web services, DBMS, and cloud computing, is proposed in this paper for expecting to reduceanalysis time and aiding lightweight virtual instances. It uses DBMS for managing the pipeline status and supporting the creation of lightweight virtual instances in the cloud environment. Also, the RESTful Web services with simple URI and XML contents are applied for improving the interoperability. The performance test of the system needs to be conducted by comparing results other developed DNA analysis services at the stabilization stage.

Realistic and Fast Depth-of-Field Rendering in Direct Volume Rendering (직접 볼륨 렌더링에서 사실적인 고속 피사계 심도 렌더링)

  • Kang, Jiseon;Lee, Jeongjin;Shin, Yeong-Gil;Kim, Bohyoung
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.15 no.5
    • /
    • pp.75-83
    • /
    • 2019
  • Direct volume rendering is a widely used method for visualizing three-dimensional volume data such as medical images. This paper proposes a method for applying depth-of-field effects to volume ray-casting to enable more realistic depth-of-filed rendering in direct volume rendering. The proposed method exploits a camera model based on the human perceptual model and can obtain realistic images with a limited number of rays using jittered lens sampling. It also enables interactive exploration of volume data by on-the-fly calculating depth-of-field in the GPU pipeline without preprocessing. In the experiment with various data including medical images, we demonstrated that depth-of-field images with better depth perception were generated 2.6 to 4 times faster than the conventional method.

Learning Module Design for Neural Network Processor(ERNIE) (신경회로망칩(ERNIE)을 위한 학습모듈 설계)

  • Jung, Je-Kyo;Kim, Yung-Joo;Dong, Sung-Soo;Lee, Chong-Ho
    • Proceedings of the KIEE Conference
    • /
    • 2003.11b
    • /
    • pp.171-174
    • /
    • 2003
  • In this paper, a Learning module for a reconfigurable neural network processor(ERNIE) was proposed for an On-chip learning. The existing reconfigurable neural network processor(ERNIE) has a much better performance than the software program but it doesn't support On-chip learning function. A learning module which is based on Back Propagation algorithm was designed for a help of this weak point. A pipeline structure let the learning module be able to update the weights rapidly and continuously. It was tested with five types of alphabet font to evaluate learning module. It compared with C programed neural network model on PC in calculation speed and correctness of recognition. As a result of this experiment, it can be found that the neural network processor(ERNIE) with learning module decrease the neural network training time efficiently at the same recognition rate compared with software computing based neural network model. This On-chip learning module showed that the reconfigurable neural network processor(ERNIE) could be a evolvable neural network processor which can fine the optimal configuration of network by itself.

  • PDF

Parallel Computation of FDTD algorithm using CUDA (CUDA를 이용한 FDTD 알고리즘의 병렬처리)

  • Lee, Ho-Young;Park, Jong-Hyun;Kim, Jun-Seong
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.47 no.4
    • /
    • pp.82-87
    • /
    • 2010
  • Modern GPUs(Graphic Processing Units) provide computing capability higher than that of the general CPUs(Central Processor Units). With supports of programmability of graphics pipeline GP-GPU(General Purpose computation on GPU) has gained much attention expanding its application area. This paper compares sequential and massively parallel implementations of FDTD(Finite Difference Time Domain) algorithm using CUDA(Compute Unified Device Architecture). Experimental results show upto 45X speedup over conventional CPU execution.

A Low-Voltage Low-Power Opamp-Less 8-bit 1-MS/s Pipelined ADC in 90-nm CMOS Technology

  • Abbasizadeh, Hamed;Rikan, Behnam Samadpoor;Lee, Dong-Soo;Hayder, Abbas Syed;Lee, Kang-Yoon
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.3 no.6
    • /
    • pp.416-424
    • /
    • 2014
  • This paper presents an 8-bit pipelined analog-to-digital converter. The supply voltage applied for comparators and other sub-blocks of the ADC were 0.7V and 0.5V, respectively. This low power ADC utilizes the capacitive charge pump technique combined with a source-follower and calibration to resolve the need for the opamp. The differential charge pump technique does not require any common mode feedback circuit. The entire structure of the ADC is based on fully dynamic circuits that enable the design of a very low power ADC. The ADC was designed to operate at 1MS/s in 90nm CMOS process, where simulated results using ADS2011 show the peak SNDR and SFDR of the ADC to be 47.8 dB (7.64 ENOB) and 59 dB respectively. The ADC consumes less than 1mW for all active dynamic and digital circuitries.

DMGL: An OpenGL ES Based Mobile 3D Rendering Libraries (DMGL: OpenGL ES 기반 모바일 3D 렌더링 라이브러리)

  • Hwang, Gyu-Hyun;Park, Sang-Hun
    • Journal of Korea Multimedia Society
    • /
    • v.11 no.8
    • /
    • pp.1160-1168
    • /
    • 2008
  • Recent technological innovations of mobile hardware which make it possible to implement real-time 3D rendering effects under mobile environment have provided a potential to develop realistic mobile application programs. This paper presents platform independent, OpenGL ES based, real-time mobile rendering libraries, called DMGL for supporting high quality 3D rendering on handhold devices. The libraries allows the programmers who develops mobile graphics softwares to generate varying advanced real-time 3D graphics effects without great effort. Moreover, GPGPU-based libraries give a set of functions to solve complex equations for simulating natural phenomena such as smoke and fire, and to render the results in real-time.

  • PDF

Extracting Graphics Information for Better Video Compression

  • Hong, Kang Woon;Ryu, Won;Choi, Jun Kyun;Lim, Choong-Gyoo
    • ETRI Journal
    • /
    • v.37 no.4
    • /
    • pp.743-751
    • /
    • 2015
  • Cloud gaming services are heavily dependent on the efficiency of real-time video streaming technology owing to the limited bandwidths of wire or wireless networks through which consecutive frame images are delivered to gamers. Video compression algorithms typically take advantage of similarities among video frame images or in a single video frame image. This paper presents a method for computing and extracting both graphics information and an object's boundary from consecutive frame images of a game application. The method will allow video compression algorithms to determine the positions and sizes of similar image blocks, which in turn, will help achieve better video compression ratios. The proposed method can be easily implemented using function call interception, a programmable graphics pipeline, and off-screen rendering. It is implemented using the most widely used Direct3D API and applied to a well-known sample application to verify its feasibility and analyze its performance. The proposed method computes various kinds of graphics information with minimal overhead.