• Title/Summary/Keyword: GPU

Olefin/Paraffin Separation through Facilitated Transport Membranes in Solid State

  • Hong, Seong-Uk;Won, Jong-Ok;Hong, Jae-Min;Park, Hyun-Chae;Kang, Yong-Soo
    • Proceedings of the Membrane Society of Korea Conference
    • /
    • 1999.07a
    • /
    • pp.15-18
    • /
    • 1999
  • A simple mathematical model for facilitated mass transport through a fixed-site carrier membrane was derived by assuming an instantaneous, microscopic concentration (activity) fluctuation. The model shows that the facilitation factor depends on the extent of the concentration fluctuation, the ratio of the diffusion time scale to the chemical reaction time scale, and the ratio of the carrier concentration to the solute solubility in the matrix. The model was tested against experimental data on oxygen transport in membranes containing metalloporphyrin carriers, and the agreement was exceptional (within 10% error). The same concept was applied to separating olefin from olefin/paraffin mixtures. With a proprietary carrier developed in this work, the selectivity of propylene over propane exceeded 120 and the propylene permeance exceeded 40 GPU (gas permeation units).

An Improved Hybrid Approach to Parallel Connected Component Labeling using CUDA

  • Soh, Young-Sung;Ashraf, Hadi;Kim, In-Taek
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.16 no.1
    • /
    • pp.1-8
    • /
    • 2015
  • In many image processing tasks, connected component labeling (CCL) is performed to extract regions of interest. CCL was usually done sequentially when image resolution was relatively low and the number of input channels was small. As image resolution rises to HD or Full HD and the number of input channels increases, sequential CCL becomes too time-consuming for real-time applications. To cope with this, parallel CCL frameworks were introduced in which multiple cores are used simultaneously. Several parallel CCL methods have been proposed in the literature, among them the NSZ label equivalence (NSZ-LE) method [1], the modified 8-directional label selection (M8DLS) method [2], and the HYBRID1 method [3]. Soh [3] showed that HYBRID1 outperforms NSZ-LE and M8DLS and argued that HYBRID1 is by far the best. In this paper we propose an improved hybrid parallel CCL algorithm, termed HYBRID2, that hybridizes M8DLS with label backtracking (LB), and show that it runs around 20% faster than HYBRID1 for various kinds of images.
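
For readers unfamiliar with CCL, the sequential baseline that the abstract contrasts with parallel methods can be sketched as a classic two-pass, label-equivalence algorithm. This is only an illustrative sketch and not the HYBRID2 method: the function name `label_components`, the list-of-lists image format, and the use of 8-connectivity are assumptions made for the example.

```python
def label_components(image):
    """Two-pass 8-connected labeling of a binary image (list of lists of 0/1).

    Hypothetical helper for illustration; returns (label image, component count).
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[i] is the union-find parent of label i; 0 is background

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Pass 1: assign provisional labels and record label equivalences.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            neighbors = []
            # Only the already-visited neighbors: upper-left, up, upper-right, left.
            for dr, dc in ((-1, -1), (-1, 0), (-1, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc]:
                    neighbors.append(labels[nr][nc])
            if not neighbors:
                parent.append(len(parent))  # new label is its own root
                labels[r][c] = len(parent) - 1
            else:
                smallest = min(neighbors)
                labels[r][c] = smallest
                for n in neighbors:
                    union(smallest, n)

    # Pass 2: replace each provisional label by its root, renumbered 1..k.
    remap = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = remap.setdefault(root, len(remap) + 1)
    return labels, len(remap)
```

The raster-order dependency in the first pass is what makes this version hard to run in real time on large images, which is the motivation the abstract gives for parallel variants.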

Implementation of MPI-based WiMAX Base Station for SDR System (SDR 시스템을 위한 MPI 기반 WiMAX 기지국의 구현)

  • Ahn, Chi Young;Kim, Hyo Han;Choi, Seung Won
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.9 no.4
    • /
    • pp.59-67
    • /
    • 2013
  • Compared with conventional hardware-oriented base stations, a Software Defined Radio (SDR)-based base station offers advantages, especially in flexibility and expandability. It enables the multimode capability required in the 4th-generation (4G) environment, which aims at a converged network of various communication standards. However, because a single base station processes all the data required by multiple waveforms, the SDR base station faces a data processing speed problem. In this paper, we propose a new SDR base station concept that adopts cluster-based parallel processing. We implemented a WiMAX system following the SDR concept, using the Message Passing Interface (MPI) to speed up operations. To maximize the efficiency of parallel processing in signal processing, we analyze how the algorithm in each module relates to the data to be processed. With the implemented system, we show a drastic improvement in operation time due to parallel processing with the proposed MPI approach. We also demonstrate the feasibility of an SDR system for 4G and beyond.

Parallel String Matching and Optimization Using OpenCL on FPGA (FPGA 상에서 OpenCL을 이용한 병렬 문자열 매칭 구현과 최적화 방향)

  • Yoon, Jin Myung;Choi, Kang-Il;Kim, Hyun Jin
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.66 no.1
    • /
    • pp.100-106
    • /
    • 2017
  • In this paper, we propose a parallel optimization method for the Aho-Corasick (AC) algorithm and the Parallel Failureless Aho-Corasick (PFAC) algorithm using Open Computing Language (OpenCL) on a Field Programmable Gate Array (FPGA). Low throughput in the string matching engine degrades the performance of network processing. Recently, many researchers have studied string matching engines that use parallel computing, and FPGA vendors offer parallel computing platforms based on OpenCL. We apply the AC and PFAC algorithms on a DE1-SoC board with a Cyclone V FPGA, with optimizations that take the FPGA architecture into account. Experiments are performed with global id, local id, local memory, and loop unrolling optimizations using the PFAC algorithm. The performance improvement using loop unrolling is 129 times greater than that of the AC algorithm without loop unrolling. The performance improvements using loop unrolling are 1.1, 0.2, and 1.5 times greater than those using the global id, local id, and local memory optimizations, respectively.
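
The idea behind PFAC can be sketched as follows: the pattern trie carries no failure links, and every start position in the text is matched independently, so each position can be handed to its own work-item. The sketch below is a sequential Python stand-in under that assumption, not the paper's OpenCL kernel; `build_trie` and `pfac_match` are hypothetical names chosen for the example.

```python
def build_trie(patterns):
    """Build a plain trie (no failure links).

    Each node is a pair: (transition dict, list of pattern ids ending here).
    """
    trie = [({}, [])]
    for pid, pat in enumerate(patterns):
        node = 0
        for ch in pat:
            nxt = trie[node][0].get(ch)
            if nxt is None:
                trie.append(({}, []))
                nxt = len(trie) - 1
                trie[node][0][ch] = nxt
            node = nxt
        trie[node][1].append(pid)
    return trie


def pfac_match(text, patterns):
    """Return (start index, pattern) pairs for every occurrence in text."""
    trie = build_trie(patterns)
    matches = []
    # Each iteration of this loop is independent of the others; in a parallel
    # implementation it would correspond to one work-item.
    for start in range(len(text)):
        node = 0
        for pos in range(start, len(text)):
            node = trie[node][0].get(text[pos])
            if node is None:
                break  # no valid transition: this work-item simply stops
            for pid in trie[node][1]:
                matches.append((start, patterns[pid]))
    return matches
```

For example, `pfac_match("ushers", ["he", "she", "his", "hers"])` reports "she" starting at index 1 and "he" and "hers" starting at index 2.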

Voronoi Diagram Computation for a Molecule Using Graphics Hardware (그래픽 하드웨어를 이용한 분자용 보로노이 다이어그램 계산)

  • Lee, Jung-Eun;Baek, Nak-Hoon;Kim, Ku-Jin
    • The KIPS Transactions:PartA
    • /
    • v.19A no.4
    • /
    • pp.169-174
    • /
    • 2012
  • We present an algorithm that computes a three-dimensional Voronoi diagram for a protein molecule. The molecule is represented as a set of spheres with van der Waals radii. The Voronoi diagram is constructed in 3D space by finding the voxels containing it. To make the computation feasible, we represent the molecule as a bounding volume hierarchy (BVH), and our system is accelerated by modern graphics hardware with CUDA programming support. Compared with single-core CPU implementations, experimental results show a 323-fold speedup in computation time when the space is partitioned into $2^{24}$ voxels.

Numerical Improvement of Advection Term for Realistic Smoke Simulation (사실적인 연기 시뮬레이션을 위한 이류항 계산의 수치적 개선)

  • Chang, Mun-Hee;Park, Su-Wan;Kim, Eun-Ju;Ryu, Kwan-Woo
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.10a
    • /
    • pp.143-147
    • /
    • 2006
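  • The Navier-Stokes equations are used to realistically simulate the motion of smoke and turbulence found in natural phenomena. Implementations based on these equations are difficult to run in real time because of the enormous amount of computation and the complexity of the calculations. For this reason, the complex equations are approximated to allow real-time processing. When the semi-Lagrangian method is used for approximation in the advection step of a fluid simulation, the smoke simulation suffers numerical dissipation: density decreases markedly over time and small-scale vorticity diminishes rapidly. To solve this problem, this paper proposes a new numerical method for computing the advection term. When evaluating the advection term, we search among the cells around the current cell for the one whose velocity will carry it to the current cell's position at the next step, and use that cell's velocity as the advection velocity vector. This reduces the numerical dissipation of density and vorticity, which improves realism while still permitting real-time processing. In addition, a GPU implementation improves the efficiency of vector operations and speeds up the simulation.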
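
The numerical dissipation attributed to the semi-Lagrangian method can be illustrated with a minimal one-dimensional sketch. This shows only the standard backtrace-and-interpolate step, not the method proposed in the paper; the function name `advect`, the periodic boundary, and the linear interpolation are assumptions made for the example.

```python
import math


def advect(q, u, dt, dx):
    """One standard semi-Lagrangian step for a 1D periodic field.

    q: list of cell values (e.g. smoke density), u: list of cell velocities.
    """
    n = len(q)
    out = [0.0] * n
    for i in range(n):
        # Trace backward from grid point i to its departure position,
        # measured in units of grid cells.
        x = i - u[i] * dt / dx
        j = math.floor(x)
        f = x - j
        # Linear interpolation between the two surrounding grid values.
        # This averaging is the source of the smoothing (dissipation).
        out[i] = (1.0 - f) * q[j % n] + f * q[(j + 1) % n]
    return out
```

With a single spike `q[4] = 1.0` on a 16-cell grid, uniform velocity 0.5, and `dt = dx = 1`, one step spreads the spike into two cells of value 0.5: the total is preserved but the peak is halved, which is the kind of density loss the abstract describes.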

A Study on Improved Comments Generation Using Transformer (트랜스포머를 이용한 향상된 댓글 생성에 관한 연구)

  • Seong, So-yun;Choi, Jae-yong;Kim, Kyoung-chul
    • Journal of Korea Game Society
    • /
    • v.19 no.5
    • /
    • pp.103-114
    • /
    • 2019
  • Since 2017 we have been studying a deep-learning program that can communicate with other users in online communities. However, processing a Korean data set was problematic because of the characteristics of the Korean language, and the low GPU utilization of RNN models was also a problem. In this study, as natural language processing models have improved, we aim to obtain better results using these improved models. To achieve this, we use a Transformer model, which includes a self-attention mechanism. We also use MeCab, a Korean morphological analyzer, to address the problem of processing Korean words.
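
As background on the self-attention mechanism mentioned above, scaled dot-product attention can be sketched in a few lines. This is a simplified illustration, not the model used in the study: it assumes queries, keys, and values are all the raw input vectors (no learned projections, a single head), and `self_attention` is a hypothetical name.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    x: list of token vectors (lists of floats of equal length d).
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Output is the attention-weighted average of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out
```

Unlike a recurrent step, each output row depends only on the inputs and not on the previous output, so all rows can be computed at once; this is one commonly cited reason Transformers keep GPUs busier than RNNs.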

Green Computing Design and Implementation Using Job Management Scheduling (작업관리를 이용한 그린 컴퓨팅 설계 및 구축)

  • Lee, Young-Joo;Sung, Jin-Woo;Jang, Ji-Hoon;Park, Chan-Yeol
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.04a
    • /
    • pp.1171-1173
    • /
    • 2012
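  • We now live in an era of green revolution aimed at protecting and saving our one and only Earth. Accordingly, computing environments are also shifting toward green computing. Green computing seeks to reduce the energy consumed by computing work; by cutting the power used by computers, it lowers energy costs and builds a low-carbon environment. As part of green ICT (Information & Communication Technology), green computing is an environmentally protective concept of computing that minimizes carbon emissions, for example by redesigning processors such as CPUs and GPUs and by using alternative energy, in order to reduce not only the energy that runs the computer itself but also the power consumed in cooling and driving the computer and operating its peripherals. According to statistics by Christian Belady in Electronics Cooling Magazine (February 2007), in 2001 the sum of infrastructure and power costs equaled the price of the server, and in 2004 infrastructure cost alone equaled the server cost. By 2008, energy cost alone had become equal to the server cost. Green IT and green computing are no longer optional niceties but are becoming essential for survival. In this paper, to realize green computing on the KISTI supercomputer, we first designed and built a server system for applying it, then developed and tested the individual programs.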

A Design and Implementation of Software Defined Radio for Rapid Prototyping of GNSS Receiver

  • Park, Kwi Woo;Yang, Jin-Mo;Park, Chansik
    • Journal of Positioning, Navigation, and Timing
    • /
    • v.7 no.4
    • /
    • pp.189-203
    • /
    • 2018
  • In this paper, a Software Defined Radio (SDR) architecture was designed and implemented for rapid prototyping of GNSS receivers. The proposed SDR can receive various GNSS and direct sequence spread spectrum (DSSS) signals without software modification, by means of expanded input parameters that describe the desired signal. The input parameters include code information, center frequency, message format, and so on. To receive various signals through parameter control, a correlator, a data bit extractor, and a receiver channel were designed around the expanded input parameters. In navigation signal processing, pseudorange was measured based on Coordinated Universal Time (UTC), and the appropriate navigation message decoder was selected according to the message format input parameter, so that the receiver position can be calculated even when the SDR is set up for various GNSS combinations. To validate the proposed SDR, the software was implemented in C++ and CUDA C on a GPU and a USRP. Experiments confirmed that changing the input parameters allows GPS, GLONASS, and BDS satellite signals to be received. The position precision of the implemented SDR was measured to be below 5 m (Circular Error Probability; CEP) for all scenarios, which indicates that the implemented SDR operated normally. The implemented SDR is expected to be useful in a variety of fields because it allows various GNSS signals to be prototyped simply by changing input parameters.

Deep Learning Model on Gravitational Waves of Merger and Ringdown in Coalescence of Binary Black Holes

  • Lee, Joongoo;Cho, Gihyuk;Kim, Kyungmin;Oh, Sang Hoon;Oh, John J.;Son, Edwin J.
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.44 no.1
    • /
    • pp.46.2-46.2
    • /
    • 2019
  • We propose a deep learning model that, as an approximant of gravitational waveforms, can generate the waveform of coalescing binary black holes in the merger and ringdown phases in less than one second on a graphics processing unit (GPU). To date, numerical relativity has been accepted as the most adequate tool for accurately predicting the merger phase of the waveform, but it is known to require a huge amount of computation. We show that our method can generate waveforms with ~98% match to those of the state-of-the-art waveform approximant, the effective-one-body model calibrated to numerical relativity simulations, and that generating ~1500 waveforms takes O(1) seconds. The validity of our model is also tested through the recovery of the signal-to-noise ratio and of the waveform parameters by injecting the generated waveforms into public open noise data produced by LIGO. Our model is readily extendable to incorporate additional physics, such as higher harmonic modes of the ringdown phase and eccentric encounters, since it only requires a sufficient amount of training data from numerical relativity simulations.
