• Title/Summary/Keyword: 병렬 알고리즘

Search Result 1,324, Processing Time 0.029 seconds

Efficient Multiple Joins using the Synchronization of Page Execution Time in Limited Processors Environments (한정된 프로세서 환경에서 체이지 실행시간 동기화를 이용한 효율적인 다중 결합)

  • Lee, Kyu-Ock;Weon, Young-Sun;Hong, Man-Pyo
    • Journal of KIISE:Databases
    • /
    • v.28 no.4
    • /
    • pp.732-741
    • /
    • 2001
  • In the relational database systems the join operation is one of the most time-consuming query operations. Many parallel join algorithms have been developed 개 reduce the execution time Multiple hash join algorithm using allocation tree is one of the most efficient ones. However, it may have some delay on the processing each node of allocation tree, which is occurred in tuple-probing phase by the difference between one page reading time of outer relation and the processing time of already read one. This delay problem was solved by using the concept of synchronization of page execution time with we had proposed In this paper the effects of the performance improvements in each node of the allocation tree are extended to the whole allocation tree and the performance evaluation about that is processed. In addition we propose an efficient algorithm for multiple hash joins in limited number of processor environments according to the relationship between the number of input relations in the allocation tree and the number of processors allocated to the tree. Finally. we analyze the performance by building the analytical cost model and verify the validity of it by various performance comparison with previous method.

  • PDF

Implementation of High-radix Modular Exponentiator for RSA using CRT (CRT를 이용한 하이래딕스 RSA 모듈로 멱승 처리기의 구현)

  • 이석용;김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.10 no.4
    • /
    • pp.81-93
    • /
    • 2000
  • In a methodological approach to improve the processing performance of modulo exponentiation which is the primary arithmetic in RSA crypto algorithm, we present a new RSA hardware architecture based on high-radix modulo multiplication and CRT(Chinese Remainder Theorem). By implementing the modulo multiplier using radix-16 arithmetic, we reduced the number of PE(Processing Element)s by quarter comparing to the binary arithmetic scheme. This leads to having the number of clock cycles and the delay of pipelining flip-flops be reduced by quarter respectively. Because the receiver knows p and q, factors of N, it is possible to apply the CRT to the decryption process. To use CRT, we made two s/2-bit multipliers operating in parallel at decryption, which accomplished 4 times faster performance than when not using the CRT. In encryption phase, the two s/2-bit multipliers can be connected to make a s-bit linear multiplier for the s-bit arithmetic operation. We limited the encryption exponent size up to 17-bit to maintain high speed, We implemented a linear array modulo multiplier by projecting horizontally the DG of Montgomery algorithm. The H/W proposed here performs encryption with 15Mbps bit-rate and decryption with 1.22Mbps, when estimated with reference to Samsung 0.5um CMOS Standard Cell Library, which is the fastest among the publications at present.

Real-time Watermarking Algorithm using Multiresolution Statistics for DWT Image Compressor (DWT기반 영상 압축기의 다해상도의 통계적 특성을 이용한 실시간 워터마킹 알고리즘)

  • 최순영;서영호;유지상;김대경;김동욱
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.13 no.6
    • /
    • pp.33-43
    • /
    • 2003
  • In this paper, we proposed a real-time watermarking algorithm to be combined and to work with a DWT(Discrete Wavelet Transform)-based image compressor. To reduce the amount of computation in selecting the watermarking positions, the proposed algorithm uses a pre-established look-up table for critical values, which was established statistically by computing the correlation according to the energy values of the corresponding wavelet coefficients. That is, watermark is embedded into the coefficients whose values are greater than the critical value in the look-up table which is searched on the basis of the energy values of the corresponding level-1 subband coefficients. Therefore, the proposed algorithm can operate in a real-time because the watermarking process operates in parallel with the compression procession without affecting the operation of the image compression. Also it improved the property of losing the watermark and the efficiency of image compression by watermark inserting, which results from the quantization and Huffman-Coding during the image compression. Visual recognizable patterns such as binary image were used as a watermark The experimental results showed that the proposed algorithm satisfied the properties of robustness and imperceptibility that are the major conditions of watermarking.

Real-time Color Recognition Based on Graphic Hardware Acceleration (그래픽 하드웨어 가속을 이용한 실시간 색상 인식)

  • Kim, Ku-Jin;Yoon, Ji-Young;Choi, Yoo-Joo
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.1
    • /
    • pp.1-12
    • /
    • 2008
  • In this paper, we present a real-time algorithm for recognizing the vehicle color from the indoor and outdoor vehicle images based on GPU (Graphics Processing Unit) acceleration. In the preprocessing step, we construct feature victors from the sample vehicle images with different colors. Then, we combine the feature vectors for each color and store them as a reference texture that would be used in the GPU. Given an input vehicle image, the CPU constructs its feature Hector, and then the GPU compares it with the sample feature vectors in the reference texture. The similarities between the input feature vector and the sample feature vectors for each color are measured, and then the result is transferred to the CPU to recognize the vehicle color. The output colors are categorized into seven colors that include three achromatic colors: black, silver, and white and four chromatic colors: red, yellow, blue, and green. We construct feature vectors by using the histograms which consist of hue-saturation pairs and hue-intensity pairs. The weight factor is given to the saturation values. Our algorithm shows 94.67% of successful color recognition rate, by using a large number of sample images captured in various environments, by generating feature vectors that distinguish different colors, and by utilizing an appropriate likelihood function. We also accelerate the speed of color recognition by utilizing the parallel computation functionality in the GPU. In the experiments, we constructed a reference texture from 7,168 sample images, where 1,024 images were used for each color. The average time for generating a feature vector is 0.509ms for the $150{\times}113$ resolution image. After the feature vector is constructed, the execution time for GPU-based color recognition is 2.316ms in average, and this is 5.47 times faster than the case when the algorithm is executed in the CPU. Our experiments were limited to the vehicle images only, but our algorithm can be extended to the input images of the general objects.

Development of RTEMS SMP Platform Based on XtratuM Virtualization Environment for Satellite Flight Software (위성비행소프트웨어를 위한 XtratuM 가상화 기반의 RTEMS SMP 플랫폼)

  • Kim, Sun-wook;Choi, Jong-Wook;Jeong, Jae-Yeop;Yoo, Bum-Soo
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.48 no.6
    • /
    • pp.467-478
    • /
    • 2020
  • Hypervisor virtualize hardware resources to utilize them more effectively. At the same time, hypervisor's characteristics of time and space partitioning improves reliability of flight software by reducing a complexity of the flight software. Korea Aerospace Research Institute chooses one of hypervisors for space, XtratuM, and examine its applicability to the flight software. XtratuM has strong points in performance improvement with high reliability. However, it does not support SMP. Therefore, it has limitation in using it with high performance applications including satellite altitude orbit control systems. This paper proposes RTEMS XM-SMP to support SMP with RTEMS, one of real time operating systems for space. Several components are added as hypercalls, and initialization processes are modified to use several processors with inter processors communication routines. In addition, all components related to processors are updated including context switch and interrupts. The effectiveness of the developed RTEMS XM-SMP is demonstrated with a GR740 board by executing SMP benchmark functions. Performance improvements are reviewed to check the effectiveness of SMP operations.

Design of a Bit-Level Super-Systolic Array (비트 수준 슈퍼 시스톨릭 어레이의 설계)

  • Lee Jae-Jin;Song Gi-Yong
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.12
    • /
    • pp.45-52
    • /
    • 2005
  • A systolic array formed by interconnecting a set of identical data-processing cells in a uniform manner is a combination of an algorithm and a circuit that implements it, and is closely related conceptually to arithmetic pipeline. High-performance computation on a large array of cells has been an important feature of systolic array. To achieve even higher degree of concurrency, it is desirable to make cells of systolic array themselves systolic array as well. The structure of systolic array with its cells consisting of another systolic array is to be called super-systolic array. This paper proposes a scalable bit-level super-systolic amy which can be adopted in the VLSI design including regular interconnection and functional primitives that are typical for a systolic architecture. This architecture is focused on highly regular computational structures that avoids the need for a large number of global interconnection required in general VLSI implementation. A bit-level super-systolic FIR filter is selected as an example of bit-level super-systolic array. The derived bit-level super-systolic FIR filter has been modeled and simulated in RT level using VHDL, then synthesized using Synopsys Design Compiler based on Hynix $0.35{\mu}m$ cell library. Compared conventional word-level systolic array, the newly proposed bit-level super-systolic arrays are efficient when it comes to area and throughput.

LASPI: Hardware friendly LArge-scale stereo matching using Support Point Interpolation (LASPI: 지원점 보간법을 이용한 H/W 구현에 용이한 스테레오 매칭 방법)

  • Park, Sanghyun;Ghimire, Deepak;Kim, Jung-guk;Han, Youngki
    • Journal of KIISE
    • /
    • v.44 no.9
    • /
    • pp.932-945
    • /
    • 2017
  • In this paper, a new hardware and software architecture for a stereo vision processing system including rectification, disparity estimation, and visualization was developed. The developed method, named LArge scale stereo matching method using Support Point Interpolation (LASPI), shows excellence in real-time processing for obtaining dense disparity maps from high quality image regions that contain high density support points. In the real-time processing of high definition (HD) images, LASPI does not degrade the quality level of disparity maps compared to existing stereo-matching methods such as Efficient LArge-scale Stereo matching (ELAS). LASPI has been designed to meet a high frame-rate, accurate distance resolution performance, and a low resource usage even in a limited resource environment. These characteristics enable LASPI to be deployed to safety-critical applications such as an obstacle recognition system and distance detection system for autonomous vehicles. A Field Programmable Gate Array (FPGA) for the LASPI algorithm has been implemented in order to support parallel processing and 4-stage pipelining. From various experiments, it was verified that the developed FPGA system (Xilinx Virtex-7 FPGA, 148.5MHz Clock) is capable of processing 30 HD ($1280{\times}720pixels$) frames per second in real-time while it generates disparity maps that are applicable to real vehicles.

A Benchmark of Hardware Acceleration Technology for Real-time Simulation in Smart Farm (CUDA vs OpenCL) (스마트 시설환경 실시간 시뮬레이션을 위한 하드웨어 가속 기술 분석)

  • Min, Jae-Ki;Lee, DongHoon
    • Proceedings of the Korean Society for Agricultural Machinery Conference
    • /
    • 2017.04a
    • /
    • pp.160-160
    • /
    • 2017
  • 자동화 기술을 통한 한국형 스마트팜의 발전이 비약적으로 이루어지고 있는 가운데 무인화를 위한 지능적인 스마트 시설환경 관찰 및 분석에 대한 요구가 점점 증가 하고 있다. 스마트 시설환경에서 취득 가능한 시계열 데이터는 온도, 습도, 조도, CO2, 토양 수분, 환기량 등 다양하다. 시스템의 경계가 명확함에도 해당 속성의 특성상 타임도메인과 공간도메인 상에서 정확한 추정 또는 예측이 난해하다. 시설 환경에 접목이 증가하고 있는 지능형 관리 기술 구현을 위해선 시계열 공간 데이터에 대한 신속하고 정확한 정량화 기술이 필수적이라 할 수 있다. 이러한 기술적인 요구사항을 해결하고자 시도되는 다양한 방법 중에서 공간 분해능 향상을 위한 다지점 계측 메트릭스를 실험적으로 구성하였다. $50m{\times}100m$의 단면적인 연동 딸기 온실을 대상으로 $3{\times}3{\times}3$의 3차원 환경 인자 계측 매트릭스를 설치하였다. 1 Hz의 주기로 4가지 환경인자(온도, 습도, 조도, CO2)를 계측하였으며, 계측 하는 시점과 동시에 병렬적으로 공간통계법을 이용하여 미지의 지점에 대한 환경 인자들을 실시간으로 추정하였다. 선행적으로 50 cm 공간 분해능에 대응하기 위하여 Kriging interpolation법을 횡단면에 대하여 분석한 후 다시 종단면에 대하여 분석하였다. 3 Ghz에 해당하는 연산 능력을 보유한 컴퓨터에서 1초 동안 획득한 데이터에 대한 분석을 마치는데 소요되는 시간이 15초 내외로 나타났다. 이는 해당 알고리즘의 매우 높은 시간 복잡도(Order of $O=O^3$)에 기인하는 것으로 다양한 시설 환경의 관리 방법론에 적절히 대응하기에 한계가 있다 할 수 있다. 실시간으로 시간 복잡도가 높은 연산을 수행하기 위한 기술적인 과제를 해결하고자, 근래에 관심이 증가하고 있는 NVIDIA 사에서 제공하는 CUDA 엔진과 Apple사의 제안을 시작으로 하여 공개 소프트웨어 개발 컨소시엄인 크로노스 그룹에서 제공하는 OpenCL 엔진을 비교 분석하였다. CUDA 엔진은 GPU(Graphics Processing Unit)에서 정보 분석 프로그램의 연산 집약적인 부분만을 담당하여 신속한 결과를 산출할 수 있는 라이브러리이며 해당 하드웨어를 구비하였을 때 사용이 가능하다. 반면, OpenCL은 CUDA 엔진이 특정 하드웨어에서 구동이 되는 한계를 극복하고자 하드웨어에 비의존적인 라이브러리를 제공하는 것이 다르며 클러스터링 기술과 연계를 통해 낮은 하드웨어 성능으로 인한 단점을 극복하고자 하였다. 본 연구에서는 CUDA 8.0(https://developer.nvidia.com/cuda-downloads)버전과 Pascal Titan X(NVIDIA, CA, USA)를 사용한 방법과 OpenCL 1.2(https://www.khronos.org/opencl/)버전과 Samsung Exynos5422 칩을 장착한 ODROID-XU4(Hardkernel, AnYang, Korea)를 사용한 방법을 비교 분석하였다. 50 cm의 공간 분해능에 대응하기 위한 4차원 행렬($100{\times}200{\times}5{\times}4$)에 대하여 정수 지수화를 위한 Quantization을 거쳐 CUDA 엔진과 OpenCL 엔진을 적용한 비교한 결과, CUDA 엔진은 1초 내외, OpenCL 엔진의 경우 5초 내외의 연산 속도를 보였다. CUDA 엔진의 경우 비용측면에서 약 10배, 전력 소모 측면에서 20배 이상 소요되었다. 따라서 우선적으로 OpenCL 엔진 기반 하드웨어 가속 기술 최적화 연구를 통해 스마트 시설환경 실시간 시뮬레이션 기술 도입을 위한 기술적 과제를 풀어갈 것이다.

  • PDF

Assessment of System Reliability and Capacity-Rating of Concrete Box-Girder Highway Brdiges (R.C 박스거교의 체계신뢰성 해석 및 안전도 평가)

  • 조효남;신재철
    • Magazine of the Korea Concrete Institute
    • /
    • v.7 no.3
    • /
    • pp.187-198
    • /
    • 1995
  • This paper develops practical and reallstic reliabllity models and methods for the evaluation of system rehability and system rellabllity based ratlng of R.C box glrder bridge superstructures. The precise prediction of reberved carrying capacity of bridge as d system is extremely difficult especially when the brldges are highly redundant and slgnlficantly deter 1or;itcd or dainagetl. Thls papel proposes a nt2w approach for the evaluation of reseived system c,drrying capaaty of br~dges in terms ot equ~vdleiit system strength, which may b~ ddcflned as a brtdge system strength correipcmdlng tu the system rehability of the bridge. This cm be ticrAvcd from an Inverse process bami or1 the con~ept of FOSM(F1rst Order Second Moment) form of system reliabihty index. The sf rength llmt state models for K C box girder br~dges suggested In the paper dre based on the basi~ bending and shear strength And thc system reliatxllty pro,~lerri of box gritier super structure 1s formuldted as parallel serles models obtalncd f ~ o m thc FMA(Fdilure blode Rp proath) based on major failure mc>clmusrns or c~itlcal fdure ,>tatcs of each nuder .WOSM(Ad-vanced First Order Second Moment) and IST(1mportance Sampling Technique) simulation algorithm are used for the reliability analysis of the proposed models.

New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Modified Booth Algorithm (Radix-2 MBA 기반 병렬 MAC의 VLSI 구조)

  • Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.4
    • /
    • pp.94-104
    • /
    • 2008
  • In this paper, we propose a new architecture of multiplier-and-accumulator (MAC) for high speed multiplication and accumulation arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator which has the largest delay in MAC was removed and its function was included into CSA, the overall performance becomes to be elevated. The proposed CSA tree uses 1's complement-based radix-2 modified booth algorithm (MBA) and has the modified array for the sign extension in order to increase the bit density of operands. The CSA propagates the carries by the least significant bits of the partial products and generates the least significant bits in advance for decreasing the number of the input bits of the final adder. Also, the proposed MAC accumulates the intermediate results in the type of sum and carry bits not the output of the final adder for improving the performance by optimizing the efficiency of pipeline scheme. The proposed architecture was synthesized with $250{\mu}m,\;180{\mu}m,\;130{\mu}m$ and 90nm standard CMOS library after designing it. We analyzed the results such as hardware resource, delay, and pipeline which are based on the theoretical and experimental estimation. We used Sakurai's alpha power low for the delay modeling. The proposed MAC has the superior properties to the standard design in many ways and its performance is twice as much than the previous research in the similar clock frequency.