• Title/Summary/Keyword: parallel library

Search Result 188, Processing Time 0.03 seconds

An Application-Level Fault Tolerant Linear System Solver Using an MPMD Type Asynchronous Iteration (MPMD 방식의 비동기 연산을 이용한 응용 수준의 무정지 선형 시스템의 해법)

  • Park, Pil-Seong
    • The KIPS Transactions:PartA
    • /
    • v.12A no.5 s.95
    • /
    • pp.421-426
    • /
    • 2005
  • In a large scale parallel computation, some processor or communication link failure results in a waste of huge amount of CPU hours. However, MPI in its current specification gives the user no possibility to handle such a problem. In this paper, we propose an application-level fault tolerant linear system solver by using an MPMD-type asynchronous iteration, purely on the basis of the MPI standard without using any non-standard fault-tolerant MPI library.

Plug and Play Style Performance Visualizer for Parallel Programs (병렬 프로그램을 위한 PnP 스타일의 성능 가시화기)

  • 문상수;김정선;문영식
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1999.10c
    • /
    • pp.756-758
    • /
    • 1999
  • 본 논문에서는 최적의 성능을 갖는 병렬 프로그램을 개발하는데 필수 도구인 성능가시화기를 이식성, 확장성 그리고 효율성을 고려해 설계 및 구현한 PnP 스타일의 성능 가시화기에 대하여 기술한다. 본 가시화기는 기존 가시화기의 문제점인 수정 및 변용에의 어려움을 해결하기 위하여 독립된 계층구조인 인스트루멘테이션층, 인터페이스층, 가시화층으로 구성함으로써 확장성 및 이식성을 갖도록 하였다. 인스트루멘테이션층은 사건(event)을 포획하기 위해 개발된 라이브러리인 ECL(Event Capture Library)로 구성되며, 인터페이스층은 인스트루멘테이션층과 가시화층간에 확장성 있는 문제중심 인터페이스를 제공하기 위해 개발된 사건 기술 언어 및 Java 문제중심 엑세스 라이브러리로 구성되었다. 그리고 PnP 스타일의 성능 가시화기를 설계함으로써 뷰와 필터의 추가 및 수정이 용이하도록 가시화층을 구현하였다. 이렇게 구현된 성능가시화기는 독립된 도구로 사용될 수 있을 뿐 아니라 병렬 프로그래밍, 디버깅, 그리고 성능 분석이 통합된 프로그램 개발환경 구축의 핵심도구로서 활용될 수 있을 것이다.

  • PDF

Cyclic Peptides as Therapeutic Agents and Biochemical Tools

  • Joo, Sang-Hoon
    • Biomolecules & Therapeutics
    • /
    • v.20 no.1
    • /
    • pp.19-26
    • /
    • 2012
  • There are many cyclic peptides with diverse biological activities, such as antibacterial activity, immunosuppressive activity, and anti-tumor activity, and so on. Encouraged by natural cyclic peptides with biological activity, efforts have been made to develop cyclic peptides with both genetic and synthetic methods. The genetic methods include phage display, intein-based cyclic peptides, and mRNA display. The synthetic methods involve individual synthesis, parallel synthesis, as well as split-and-pool synthesis. Recent development of cyclic peptide library based on split-and-pool synthesis allows on-bead screening, in-solution screening, and microarray screening of cyclic peptides for biological activity. Cyclic peptides will be useful as receptor agonist/antagonist, RNA binding molecule, enzyme inhibitor and so on, and more cyclic peptides will emerge as therapeutic agents and biochemical tools.

An implementation and performance measurement of Matlab matrix operation library for parallel computing on dual CPU PC (이중 CPU PC에서 병렬 계산을 위한 Matlab 행렬 연산 라이브러리의 구현 및 성능 측정)

  • 김철민;이정훈
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.10c
    • /
    • pp.871-873
    • /
    • 2001
  • 본 논문에서는 전기 단층 촬영 기법과 같이 많은 양의 데이터에 대해 산술 계산을 수행하는 응용의 수행속도를 개선하기 위하여 이중 CPU PC 상에서 Matlab의 기본연산, 즉 행렬 곱하기, 역행렬 계산, 의사 역행렬 계산 등을 병렬로 수행하는 라이브러리 프로그램을 구현하고 그 성능을 측정한다. 구현된 라이브러리는 행렬의 곱하기, 역행렬 계산, 의사 역행렬 계산 등 기본적인 행렬 연산에 대해 각 CPU에서 수행될 쓰레드를 생성하고 이 쓰레드에 분할 행렬을 인자로 넘겨줌으로써 병렬 계산을 실행하도록 하고 부분 결과를 합성하여 최종적인 결과를 산출하게 된다. 구현된 코드를 수행시켜 속도를 측정한 결과 행렬의 곱하기는 최대 69%, 역행렬은 34.8 %, 의사 역행렬은 52 % 까지 수행시간을 단축시켰다. 이에 의해 전기 단층 촬영 프로그램은 한번의 전류 주입에 대해 영상 복원에 소요되는 시간을 48 %로 감소시켰다.

  • PDF

Architecture design and FPGA implementation of a system control unit for a multiprocessor chip (다중 프로세서 칩을 위한 시스템 제어 장치의 구조설계 및 FPGA 구현)

  • 박성모;정갑천
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.34C no.12
    • /
    • pp.9-19
    • /
    • 1997
  • This paper describes the design and FPGA implementation of a system control unit within a multiprocessor chip which can be used as a node processor ina massively parallel processing (MPP) caches, memory management units, a bus unit and a system control unit. Major functions of the system control unit are locking/unlocking of the shared variables of protected access, synchronization of instruction execution among four integer untis, control of interrupts, generation control of processor's status, etc. The system control unit was modeled in very high level using verilog HDL. Then, it was simulated and verified in an environment where trap handler and external interrupt controller were added. Functional blocks of the system control unit were changed into RTL(register transfer level) model and synthesized using xilinx FPGA cell library in synopsys tool. The synthesized system control unit was implemented by Xilinx FPGA chip (XC4025EPG299) after timing verification.

  • PDF

Design of an efficient multiplierless FIR filter chip with variable length taps (곱셈기가 없는 효율적인 가변탭 FIR 필터 칩 설계)

  • 윤성현;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.34C no.6
    • /
    • pp.22-27
    • /
    • 1997
  • This paper propose a novel VLSI architecture for a multiplierless FIR filter chip providing variable-length taps. To change the number of taps, we propose two special features called a data-reuse structure and a recurrent-coefficient scheme. These features consist of several MUXs and registers and reduce the number of gates over 20% compared with existing chips using an address generation unit and a modulo unit. Since multipliers occupy large VLSI area, a multiplierless filter chip meeting real-time requirement can save large area. We propose a modified bit-serial multiplication algorithm to compute two partial products in parallel, and thus, the proposed filter is twice faster and has smaller hardware than previous multiplierless filters. We developed VHDL models and performed logic synthesis using the 0.8.mu.m SOG (sea-of-gate) cell library. The chip has only 9,507 gates, was fabricated, and is running at 77MHz.

  • PDF

Performance Analysis of DNN inference using OpenCV Built in CPU and GPU Functions (OpenCV 내장 CPU 및 GPU 함수를 이용한 DNN 추론 시간 복잡도 분석)

  • Park, Chun-Su
    • Journal of the Semiconductor & Display Technology
    • /
    • v.21 no.1
    • /
    • pp.75-78
    • /
    • 2022
  • Deep Neural Networks (DNN) has become an essential data processing architecture for the implementation of multiple computer vision tasks. Recently, DNN-based algorithms achieve much higher recognition accuracy than traditional algorithms based on shallow learning. However, training and inference DNNs require huge computational capabilities than daily usage purposes of computers. Moreover, with increased size and depth of DNNs, CPUs may be unsatisfactory since they use serial processing by default. GPUs are the solution that come up with greater speed compared to CPUs because of their Parallel Processing/Computation nature. In this paper, we analyze the inference time complexity of DNNs using well-known computer vision library, OpenCV. We measure and analyze inference time complexity for three cases, CPU, GPU-Float32, and GPU-Float16.

Large-scale 3D fast Fourier transform computation on a GPU

  • Jaehong Lee;Duksu Kim
    • ETRI Journal
    • /
    • v.45 no.6
    • /
    • pp.1035-1045
    • /
    • 2023
  • We propose a novel graphics processing unit (GPU) algorithm that can handle a large-scale 3D fast Fourier transform (i.e., 3D-FFT) problem whose data size is larger than the GPU's memory. A 1D FFT-based 3D-FFT computational approach is used to solve the limited device memory issue. Moreover, to reduce the communication overhead between the CPU and GPU, we propose a 3D data-transposition method that converts the target 1D vector into a contiguous memory layout and improves data transfer efficiency. The transposed data are communicated between the host and device memories efficiently through the pinned buffer and multiple streams. We apply our method to various large-scale benchmarks and compare its performance with the state-of-the-art multicore CPU FFT library (i.e., fastest Fourier transform in the West [FFTW]) and a prior GPU-based 3D-FFT algorithm. Our method achieves a higher performance (up to 2.89 times) than FFTW; it yields more performance gaps as the data size increases. The performance of the prior GPU algorithm decreases considerably in massive-scale problems, whereas our method's performance is stable.

Study on MPI-based parallel sequence similarity search in the LINUX cluster (클러스터 환경에서의 MPI 기반 병렬 서열 유사성 검색에 관한 연구)

  • Hong, Chang-Bum;Cha, Jeoung-Ho;Lee, Sung-Hoon;Shin, Seung-Woo;Park, Keun-Joon;Park, Keun-Young
    • Journal of the Korea Society of Computer and Information
    • /
    • v.11 no.6 s.44
    • /
    • pp.69-78
    • /
    • 2006
  • In the field of the bioinformatics, it plays an important role in predicting functional information or structure information to search similar sequence in biological DB. Biolrgical sequences have been increased dramatically since Human Genome Project. At this point, because the searching speed for the similar sequence is highly regarded as the important factor for predicting function or structure, the SMP(Sysmmetric Multi-Processors) computer or cluster is being used in order to improve the performance of searching time. As the method to improve the searching time of BLAST(Basic Local Alighment Search Tool) being used for the similarity sequence search, We suggest the nBLAST algorithm performing on the cluster environment in this paper. As the nBLAST uses the MPI(Message Passing Interface), the parallel library without modifying the existing BLAST source code, to distribute the query to each node and make it performed in parallel, it is possible to easily make BLAST parallel without complicated procedures such as the configuration. In addition, with the experiment performing the nBLAST in the 28 nodes of LINUX cluster, the enhanced performance according to the increase in the number of the nodes has been confirmed.

  • PDF