Search | Korea Science

Performance Comparison of Join Operations Parallelization by using GPGPU (GPGPU 기반 조인 연산 병렬화 성능 비교)

Lee, Jong-Sub;Lee, Sang-Back;Lee, Kyu-Chul
- Database Research / v.34 no.3 / pp.28-44 / 2018
In a database system, the join is the most expensive of the relational operations. CPU-based join operations generally run on a single core or are parallelized across at most 16 cores, which does not improve performance significantly. In contrast, GPGPU (General-Purpose computing on Graphics Processing Units) enables parallel processing across thousands of processing units, greatly reducing the time required to perform join operations. The GPGPU parallelization uses NVIDIA's CUDA SDK. In this paper, we implement parallelized join operations on GPGPU and compare their performance. The join operations used are Nested Loop Join (NLJ), Sort Merge Join (SMJ), and Hash Join (HJ), and the GPGPU devices are the TITAN Xp, GTX 1080 Ti, and GTX 1080. We measure and compare the performance of CPU-based and GPGPU-based join operations, and also compare our results with those of a previous study on GPGPU-based joins. The experimental results show that the GPGPU-based joins are 6 to 328 times faster than the CPU-based ones.
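
For readers unfamiliar with the join algorithms named in the abstract, the following is a minimal sequential sketch of two of them in Python. It is not the paper's CUDA implementation: the function names, the `(key, value)` tuple layout, and the sample data are assumptions made for illustration. The nested loop join compares every pair of tuples independently, which is the property that makes it easy to spread across many GPU threads.

```python
from collections import defaultdict

def nested_loop_join(r, s):
    """Compare every tuple of r with every tuple of s: O(|r| * |s|)."""
    return [(k1, a, b) for (k1, a) in r for (k2, b) in s if k1 == k2]

def hash_join(r, s):
    """Build a hash table on r, then probe it with s: O(|r| + |s|) expected."""
    table = defaultdict(list)
    for key, a in r:          # build phase
        table[key].append(a)
    return [(key, a, b) for key, b in s for a in table.get(key, [])]  # probe phase

# Hypothetical sample relations of (key, value) tuples.
r = [(1, "a"), (2, "b"), (2, "c")]
s = [(2, "x"), (3, "y")]
print(sorted(nested_loop_join(r, s)))
print(sorted(hash_join(r, s)))
```

Both functions return the same set of joined tuples; only the amount of work differs.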

Geopotential Field in Nonlinear Balance with the Sectoral Mode of Rossby-Haurwitz Wave on the Inclined Rotation Axis (섹터모드의 로스비하우어비츠 파동과 균형을 이루는 고도장)

Cheong, Hyeong-Bin;Park, Ja-Rin
- Journal of the Korean earth science society / v.28 no.7 / pp.936-946 / 2007
An analytical geopotential field in balance with the sectoral mode (the first symmetric mode with respect to the equator) of the Rossby-Haurwitz wave on the inclined rotation axis was derived in the presence of a superrotation background flow. The balanced field was obtained by inverting the divergence equation with the time derivative set to zero. The inversion consists of two steps: evaluating the nonlinear forcing terms and finding analytical solutions of the resulting Poisson's equation. In the second step, the forcing terms in the form of Legendre functions were readily inverted because the Legendre function is an eigenfunction of the spherical Laplacian operator, while the other terms were solved either by introducing a trial function or by integrating the Legendre equation. The balanced field was found to be expressed with six zonal wavenumber components and was shown to be asymmetric about the equator. In connection with this asymmetry, the advantage of the balanced field as a validation method for numerical models was addressed. In the special cases where the strength of the background flow is half of, or exactly equal to, the rotation rate of the Earth, one of the zonal wavenumber components was found to vanish. The analytical balanced field was compared with the geopotential field obtained using a spherical harmonics spectral model. The normalized difference was found to lie on the order of machine rounding error, indicating the reliability of the analytical results. The stability of the sectoral mode of the Rossby-Haurwitz wave and the associated balanced field was discussed in comparison with the first antisymmetric mode.
https://doi.org/10.5467/JKESS.2007.28.7.936
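
The eigenfunction property mentioned in the abstract can be written out as a short worked equation. This is the standard identity for the Laplacian on a sphere, not the paper's full derivation; the symbols $a$ (sphere radius), $\mu$ (sine of latitude), $\lambda$ (longitude), and the forcing amplitude $F_n^m$ are notation assumed here for illustration.

```latex
% Legendre functions are eigenfunctions of the spherical Laplacian:
\nabla^2 \left[ P_n^m(\mu)\, e^{im\lambda} \right]
  = -\frac{n(n+1)}{a^2}\, P_n^m(\mu)\, e^{im\lambda}

% Hence a Poisson equation with a single Legendre-type forcing term
% inverts algebraically (for n >= 1):
\nabla^2 \Phi = F_n^m\, P_n^m(\mu)\, e^{im\lambda}
\quad\Longrightarrow\quad
\Phi = -\frac{a^2}{n(n+1)}\, F_n^m\, P_n^m(\mu)\, e^{im\lambda}
```

Forcing terms that are not of this form do not invert this way, which is consistent with the abstract's remark that they required a trial function or direct integration.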

A Study on Task Allocation of Parallel Spatial Joins using Fixed Grids (고정 그리드를 이용한 병렬 공간 조인의 태스크 할당에 관한 연구)

Kim, Jin-Deok;Seo, Yeong-Deok;Hong, Bong-Hui
- The KIPS Transactions:PartD / v.8D no.4 / pp.347-360 / 2001
The most expensive operation in spatial databases is the spatial join, which computes a combined table whose tuples each consist of two tuples, one from each input table, that satisfy a spatial predicate. Although the execution time of sequential spatial join processing has improved considerably, the response time still does not meet the requirements of interactive users. Parallel processing is usually an appropriate way to improve spatial join performance. However, as the number of processors increases, the efficiency of each processor decreases rapidly because of the disk bottleneck and message-passing overhead. This paper proposes a task allocation method that alleviates the disk bottleneck caused by simultaneous access to the shared disk and minimizes message passing among processors. To evaluate the proposed method in terms of the number of disk accesses and messages passed, we conduct experiments on two kinds of parallel spatial join algorithms. Experiments on a MIMD parallel machine with shared disks show that the proposed semi-dynamic task allocation method outperforms the static and dynamic task allocation methods.
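
To make the idea of a fixed-grid spatial join concrete, here is a small single-process sketch in Python. It is not the paper's semi-dynamic method: it uses plain static round-robin allocation of grid cells to workers, and the function names, the `(x1, y1, x2, y2)` rectangle layout, and the `workers` parameter are all assumptions made for illustration.

```python
from collections import defaultdict

def cells(rect, size):
    """Grid cells (column, row) that a rectangle (x1, y1, x2, y2) touches."""
    x1, y1, x2, y2 = rect
    return [(cx, cy)
            for cx in range(int(x1 // size), int(x2 // size) + 1)
            for cy in range(int(y1 // size), int(y2 // size) + 1)]

def intersects(a, b):
    """Spatial predicate: do two axis-aligned rectangles overlap?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def grid_join(rs, ss, size, workers):
    grid_r, grid_s = defaultdict(list), defaultdict(list)
    for i, r in enumerate(rs):
        for c in cells(r, size):
            grid_r[c].append(i)
    for j, s in enumerate(ss):
        for c in cells(s, size):
            grid_s[c].append(j)
    # One task per cell that holds objects from both inputs.
    tasks = sorted(set(grid_r) & set(grid_s))
    # Static round-robin allocation (a stand-in, not the proposed method).
    assignment = {w: tasks[w::workers] for w in range(workers)}
    result = set()  # a set removes pairs found in more than one cell
    for cell_list in assignment.values():
        for c in cell_list:
            for i in grid_r[c]:
                for j in grid_s[c]:
                    if intersects(rs[i], ss[j]):
                        result.add((i, j))
    return result, assignment

rs = [(0, 0, 2, 2), (5, 5, 6, 6)]
ss = [(1, 1, 3, 3), (8, 8, 9, 9)]
pairs, plan = grid_join(rs, ss, size=2, workers=2)
print(pairs, plan)
```

In a real parallel setting each worker would process its own cell list concurrently; how cells are handed out is exactly the allocation question the abstract addresses.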

Detection of Complex Event Patterns over Interval-based Events (기간기반 복합 이벤트 패턴 검출)

Kang, Man-Mo;Park, Sang-Mu;Kim, Sank-Rak;Kim, Kang-Hyun;Lee, Dong-Hyeong
- The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.4 / pp.201-209 / 2012
Point-based complex event processing treats each event as instantaneous, with a single timestamp. However, the duration of an event plays an important role in fields such as finance, multimedia, medicine, and meteorology, and point-based events are insufficient for expressing the complex temporal relationships that arise there. In real-world applications, events have durations: two or more events can overlap in time, and one event can contain another. Such relations cannot be described by simple succession, as they can for point-based events. This paper designs and implements a method for detecting complex event patterns over interval-based events. Interval-based events can express overlap and containment relations between events. The interval-based event operators represent each event by its start point and end point, express sequences of interval-based events, and detect complex event patterns. To make pattern detection more efficient, this paper proposes an algorithm that uses an active instance stack. When constructing event sequences, it applies the window push-down technique to reduce the number of intermediate results, which improves both running time and memory utilization.
https://doi.org/10.7236/JIWIT.2012.12.4.201
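
The temporal relations the abstract describes (succession, overlap, containment) can be sketched with a few predicates over start and end points. This Python sketch is an illustration only: the `IntervalEvent` class, the predicate names, and `find_overlaps` are hypothetical, and the brute-force pair scan stands in for, and does not implement, the paper's active instance stack or window push-down.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntervalEvent:
    name: str
    start: int
    end: int

def before(a, b):
    """a ends strictly before b starts (the only relation point events share)."""
    return a.end < b.start

def overlaps(a, b):
    """a starts first, b starts while a is active, and b outlasts a."""
    return a.start < b.start < a.end < b.end

def contains(a, b):
    """b lies strictly inside a."""
    return a.start < b.start and b.end < a.end

def find_overlaps(events, window):
    """All (a, b) name pairs where a overlaps b within a time window."""
    ordered = sorted(events, key=lambda e: e.start)
    return [(a.name, b.name)
            for i, a in enumerate(ordered)
            for b in ordered[i + 1:]
            if overlaps(a, b) and b.end - a.start <= window]

events = [IntervalEvent("A", 0, 5), IntervalEvent("B", 3, 8), IntervalEvent("C", 1, 2)]
print(find_overlaps(events, window=10))
```

Checking the window inside the pair scan, before a pair is added to the output, is a crude analogue of filtering early to keep intermediate results small.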

A Novel Redundant Binary Montgomery Multiplier and Hardware Architecture (새로운 잉여 이진 Montgomery 곱셈기와 하드웨어 구조)

Lim Dae-Sung;Chang Nam-Su;Ji Sung-Yeon;Kim Sung-Kyoung;Lee Sang-Jin;Koo Bon-Seok
- Journal of the Korea Institute of Information Security & Cryptology / v.16 no.4 / pp.33-41 / 2006
The RSA cryptosystem is widely used in systems such as IC cards, mobile systems, WPKI, electronic cash, SET, and SSL. RSA is performed through modular exponentiation, for which the Montgomery multiplier is well known to be efficient. The critical path delay of the Montgomery multiplier depends on a three-operand addition, so carry propagation strongly affects the multiplier's efficiency. Recently, the Carry Save Adder (CSA), which has no carry propagation, has been used to address this. McIvor et al. proposed two Montgomery multiplication schemes for exponentiation, built from three and two CSA stages respectively; the latter is more efficient than the former in terms of time complexity. In this paper, to operate faster than the latter, we use the binary signed-digit (SD) number system, which has no carry propagation. We propose a new redundant binary adder (RBA) that adds two binary SD numbers and apply it to the Montgomery multiplier. Instead of the binary SD addition rule used in existing RBAs, we propose a new addition rule. We construct and simulate the proposed adder using gates from the SAMSUNG STD130 $0.18{\mu}m$ 1.8V CMOS Standard Cell Library. The result is at least 12.46% faster in terms of time complexity than McIvor's two methods and existing RBAs.
https://doi.org/10.13089/JKIISC.2006.16.4.33
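
As background for the abstract, the following Python sketch shows textbook bit-serial (radix-2) Montgomery multiplication. It uses ordinary integer addition, not the carry-save or redundant binary adders the paper is about; the function name `mont_mul` and the example values are assumptions for illustration. Each loop iteration performs the three-operand addition (running sum, multiplicand, modulus) that the abstract identifies as the critical path.

```python
def mont_mul(a, b, n, k):
    """Return a * b * 2**(-k) mod n for odd n < 2**k and 0 <= a, b < n."""
    s = 0
    for i in range(k):
        a_i = (a >> i) & 1   # scan the multiplier one bit at a time
        s += a_i * b         # conditionally add the multiplicand
        if s & 1:            # make the sum even by adding the odd modulus
            s += n
        s >>= 1              # exact division by 2
    # The loop keeps s below 2n, so one conditional subtraction suffices.
    return s - n if s >= n else s

# Hypothetical small example: n = 97 fits in k = 7 bits (R = 2**7 = 128).
n, k = 97, 7
a, b = 23, 45
r_inv = pow(2 ** k, -1, n)   # modular inverse of R (Python 3.8+)
print(mont_mul(a, b, n, k), (a * b * r_inv) % n)
```

The two printed values agree, confirming that the loop computes the product scaled by the inverse of R without ever dividing by n.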

Automated Satellite Image Co-Registration using Pre-Qualified Area Matching and Studentized Outlier Detection (사전검수영역기반정합법과 't-분포 과대오차검출법'을 이용한 위성영상의 '자동 영상좌표 상호등록')

Kim, Jong Hong;Heo, Joon;Sohn, Hong Gyoo
- KSCE Journal of Civil and Environmental Engineering Research / v.26 no.4D / pp.687-693 / 2006
Image co-registration is the process of overlaying two images of the same scene, one of which serves as the reference image while the other is geometrically transformed to match it. To improve the efficiency and effectiveness of co-registration, the authors propose a pre-qualified area matching algorithm that combines feature extraction with the Canny operator and area matching with the cross-correlation coefficient. To refine the matching points, outlier detection based on studentized residuals iteratively removes outliers at the level of three standard deviations. Through the pre-qualification and refinement steps, computation time was significantly reduced and registration accuracy was enhanced. A prototype of the proposed algorithm was implemented, and a performance test on 3 Landsat images of Korea showed that: (1) the average RMSE of the approach was 0.435 pixel; (2) the average number of matching points was over 25,573; (3) the average processing time was 4.2 min per image on a regular workstation equipped with a 3 GHz Intel Pentium 4 CPU and 1 GB of RAM. The proposed approach achieved robustness, full automation, and time efficiency.
https://doi.org/10.12652/Ksce.2006.26.4D.687
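
The cross-correlation coefficient used for area matching can be sketched in a few lines of Python. This shows only the generic normalized cross-correlation between two equally sized patches; the function name, the nested-list patch format, and the zero result for flat patches are assumptions, and neither the Canny pre-qualification nor the studentized outlier step is shown.

```python
import math

def cross_correlation(patch_a, patch_b):
    """Normalized cross-correlation of two equally sized 2-D patches, in [-1, 1]."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    if len(a) != len(b):
        raise ValueError("patches must have the same number of pixels")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A flat patch has no texture to match; report zero correlation.
    return num / den if den else 0.0

reference = [[1, 2], [3, 4]]
print(cross_correlation(reference, [[10, 20], [30, 40]]))
```

Because the coefficient subtracts the mean and divides by the spread, it is unaffected by linear brightness and contrast differences between the two images.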

Design of a Bit-Level Super-Systolic Array (비트 수준 슈퍼 시스톨릭 어레이의 설계)

Lee Jae-Jin;Song Gi-Yong
- Journal of the Institute of Electronics Engineers of Korea SD / v.42 no.12 / pp.45-52 / 2005
A systolic array, formed by interconnecting a set of identical data-processing cells in a uniform manner, combines an algorithm with the circuit that implements it and is conceptually close to an arithmetic pipeline. High-performance computation on a large array of cells has been an important feature of systolic arrays. To achieve an even higher degree of concurrency, it is desirable to make the cells of a systolic array systolic arrays themselves. A systolic array whose cells consist of other systolic arrays is called a super-systolic array. This paper proposes a scalable bit-level super-systolic array that can be adopted in VLSI design, with the regular interconnection and functional primitives typical of a systolic architecture. The architecture focuses on highly regular computational structures that avoid the large number of global interconnections required in general VLSI implementations. A bit-level super-systolic FIR filter is selected as an example. The derived filter was modeled and simulated at the RT level using VHDL, then synthesized with Synopsys Design Compiler based on the Hynix $0.35{\mu}m$ cell library. Compared with a conventional word-level systolic array, the newly proposed bit-level super-systolic arrays are efficient in terms of area and throughput.
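
To illustrate the cell-and-register structure behind a pipelined FIR filter, here is a cycle-by-cycle Python simulation of a word-level, transposed-form FIR pipeline. It is only a rough analogue of the paper's design: the input sample is broadcast to every cell (so the structure is semi-systolic, not fully systolic), it works on whole words instead of bits, and the function name and sample data are assumptions.

```python
def fir_pipeline(weights, samples):
    """Simulate identical multiply-add cells connected by partial-sum registers.

    On each clock cycle, cell i computes weights[i] * x plus the partial sum
    that cell i + 1 stored on the previous cycle; cell 0 emits the output.
    """
    k = len(weights)
    regs = [0] * k            # one partial-sum register per cell
    outputs = []
    for x in samples:         # one input sample per clock cycle
        nxt = [0] * k
        for i in range(k):
            upstream = regs[i + 1] if i + 1 < k else 0
            nxt[i] = weights[i] * x + upstream
        regs = nxt            # all registers update together on the clock edge
        outputs.append(regs[0])
    return outputs

# An impulse input reproduces the weights as the filter's impulse response.
print(fir_pipeline([1, 2, 3], [1, 0, 0, 0]))
```

Every cell does the same multiply-add and talks only to its neighbor's register, which is the regularity that the abstract says suits VLSI implementation.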

Design and Implementation of Query Processor for Moving Objects (이동객체를 위한 질의처리 컴포넌트의 설계 및 구현)

Kim, Kyoung-Sook;Kwon, O-Je;Byun, Hee-Young;Jo, Dae-Soo;Kim, Tae-Wan;Li, Ki-Joune
- Journal of Korea Spatial Information System Society / v.6 no.1 s.11 / pp.31-50 / 2004
With the growth of wireless communication networks and GPS-equipped mobile devices, Location-Based Service (LBS) is becoming an integral part of mobile applications. LBS deals with location-aware features, such as persons holding mobile phones or vehicles equipped with GPS, and provides users with the location information of those features. It is therefore necessary to develop moving object database systems that store, manage, and query moving objects, which change their locations continuously over time. In this paper, we design and implement a query processing component that treats moving objects as a key data type. For this component, we define a new SQL-like query language (called MOQL) and, accordingly, design and implement modules that analyze and execute queries. The component supports various operators that process range queries, infer topological relations, compute trajectories, and find k-nearest neighbors. It can be used as a subsystem of other application systems that deal with moving objects, and it also supports an ADO.NET interface for interacting with end users.
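
The kinds of operators listed in the abstract (position along a trajectory, range query, k-nearest neighbors) can be sketched in Python as follows. This does not reproduce MOQL syntax or the paper's component: the `MovingObject` class, the linear interpolation between timestamped samples, and the function names are all assumptions made for illustration.

```python
import math
from bisect import bisect_right

class MovingObject:
    def __init__(self, oid, samples):
        self.oid = oid
        self.samples = sorted(samples)   # (t, x, y) with distinct timestamps

    def position_at(self, t):
        """Linearly interpolate the position at time t (clamped at the ends)."""
        times = [s[0] for s in self.samples]
        if t <= times[0]:
            return self.samples[0][1:]
        if t >= times[-1]:
            return self.samples[-1][1:]
        i = bisect_right(times, t)
        (t0, x0, y0), (t1, x1, y1) = self.samples[i - 1], self.samples[i]
        f = (t - t0) / (t1 - t0)
        return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))

def range_query(objects, t, box):
    """IDs of objects inside box = (xmin, ymin, xmax, ymax) at time t."""
    xmin, ymin, xmax, ymax = box
    hits = []
    for obj in objects:
        x, y = obj.position_at(t)
        if xmin <= x <= xmax and ymin <= y <= ymax:
            hits.append(obj.oid)
    return hits

def k_nearest(objects, t, point, k):
    """IDs of the k objects closest to point at time t."""
    ranked = sorted(objects, key=lambda o: math.dist(o.position_at(t), point))
    return [o.oid for o in ranked[:k]]

a = MovingObject("a", [(0, 0, 0), (10, 10, 0)])
b = MovingObject("b", [(0, 0, 5), (10, 0, 5)])
print(range_query([a, b], 5, (4, -1, 6, 1)), k_nearest([a, b], 5, (0, 5), 1))
```

A production system would use a spatiotemporal index instead of scanning every object, but the query semantics are the same.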

A Hybrid Genetic Algorithm for the Identical Parallel Machine Total Tardiness Problem (동종 병렬기계에서 납기지연 최소화를 위한 혼합형 유전 알고리즘의 개발)

Choe, Hong-Jin;Lee, Jong-Yeong;Park, Mun-Won
- Proceedings of the Korean Operations and Management Science Society Conference / 2004.05a / pp.624-627 / 2004
This study addresses the scheduling problem of minimizing total tardiness on identical parallel machines. The problem is known to be NP-hard (Lenstra et al., 1977), and for realistic instances with many jobs and machines it is practically impossible to find an optimal solution within a reasonable time. We therefore propose a hybrid genetic algorithm to solve it. In the hybrid genetic algorithm, a genetic algorithm first improves the solutions over successive generations, starting from a randomly generated population. When the genetic algorithm can no longer improve the solution for a certain period, a local-search algorithm attempts to improve the individuals in the population. That is, the local search takes each individual in the population as an initial solution and finds as many local optima as there are individuals. A new population is formed from these local optima, and the genetic algorithm resumes. The two phases alternate until the termination condition is met. In the proposed genetic algorithm, individuals are represented with the random key method proposed by Bean (1994), and the three crossover operators proposed by Park (2000) are adopted. For the local search, neighboring solutions are generated by pair-wise interchange. Preliminary experiments were used to tune the parameter values of the proposed hybrid genetic algorithm, and comparative experiments against existing algorithms were carried out to evaluate its performance.
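
The following Python sketch shows how a random-key individual might be decoded into a parallel-machine schedule and scored by total tardiness. The encoding assumed here (integer part of each key selects the machine, fractional part orders the jobs on it) is one common reading of the random key method and is not confirmed by the abstract; the function names and the sample instance are hypothetical, and no crossover or local search is included.

```python
def decode(keys):
    """Map each job's random key to (machine, position on that machine).

    Assumed encoding: int(key) is the machine index, and jobs on the same
    machine are sequenced by the fractional part of their keys.
    """
    by_machine = {}
    for job, key in enumerate(keys):
        machine = int(key)
        by_machine.setdefault(machine, []).append((key - machine, job))
    return {m: [job for _, job in sorted(entries)]
            for m, entries in by_machine.items()}

def total_tardiness(keys, proc_times, due_dates):
    """Sum of max(0, completion time - due date) over all jobs."""
    total = 0
    for jobs in decode(keys).values():
        clock = 0
        for job in jobs:
            clock += proc_times[job]
            total += max(0, clock - due_dates[job])
    return total

# Hypothetical instance: 4 jobs on 2 machines.
keys = [0.7, 0.2, 1.5, 1.1]
proc_times = [3, 2, 4, 1]
due_dates = [4, 2, 3, 2]
print(decode(keys), total_tardiness(keys, proc_times, due_dates))
```

A useful property of this kind of representation is that any vector of keys decodes to a feasible schedule, so crossover never has to repair its offspring.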

Real-Time Implementation of the G.729.1 Using ARM926EJ-S Processor Core (ARM926EJ-S 프로세서 코어를 이용한 G.729.1의 실시간 구현)

So, Woon-Seob;Kim, Dae-Young
- The Journal of Korean Institute of Communications and Information Sciences / v.33 no.8C / pp.575-582 / 2008
In this paper, we describe the process and results of a real-time implementation of the G.729.1 wideband speech codec, which is standardized in SG15 of ITU-T. To run the codec in real time on the ARM926EJ-S(R) processor core, we rewrote parts of the codec's C program, including basic operations and arithmetic functions, in assembly language. G.729.1 is the standard ITU-T wideband speech codec with variable bit rates of 8 to 32 kbps; its input is a PCM signal quantized to 16 bits per sample at a sampling rate of 8 kHz or 16 kHz. The codec is interoperable with G.729 and G.729A, and it is a bandwidth-extended wideband (50 to 7,000 Hz) version of the existing narrowband (300 to 3,400 Hz) codec that enhances voice quality. The implemented codec has a complexity of 31.2 MCPS for the encoder and 22.8 MCPS for the decoder, and its execution time on the target is 11.5 ms in total: 6.75 ms for the encoder and 4.76 ms for the decoder. The codec was tested for bit-exactness against the full set of test vectors provided by ITU-T and passed all of them. It also operated well in real time on an Internet phone.
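
The "basic operations" the abstract mentions are, in fixed-point speech codecs, typically small saturating arithmetic primitives. The Python sketch below illustrates two such primitives on 16-bit values; the function names are hypothetical and are not taken from the codec's source, and the sketch says nothing about how the authors' assembly versions were written.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def saturate16(value):
    """Clamp an integer to the signed 16-bit range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, value))

def sat_add16(a, b):
    """16-bit addition that saturates on overflow."""
    return saturate16(a + b)

def q15_mult(a, b):
    """Multiply two Q15 fractions (16-bit values scaled by 2**15)."""
    # The only overflow case is (-1.0) * (-1.0), which saturates to just below 1.0.
    return saturate16((a * b) >> 15)

print(sat_add16(30000, 10000))   # clamps at the top of the 16-bit range
print(q15_mult(16384, 16384))    # 0.5 * 0.5 in Q15
```

Because a codec calls primitives like these many times per frame, they are natural candidates for hand-written assembly on a core that offers saturating instructions.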

