• Title/Abstract/Keywords: Parallel processor

Search results: 482 items (processing time: 0.031 s)

An Optimal Parallel Algorithm for Generating Computation Tree Form on Linear Array with Slotted Optical Buses (LASOB)

  • 김영학
    • Journal of KIISE: Computer Systems and Theory
    • /
    • Vol. 27, No. 5
    • /
    • pp.475-484
    • /
    • 2000
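  • Recently, many processor-array architectures that use optical buses instead of electronic buses have been proposed in the literature to increase bus bandwidth and reduce hardware complexity. In this paper, we first propose a constant-time algorithm for the parentheses matching problem on a linear array with slotted optical buses (LASOB). Using this algorithm, we then propose a cost-optimal parallel algorithm that, given an algebraic expression of length n, generates the computation tree form in constant time on a LASOB with n processors. No cost-optimal, constant-time parallel algorithm for this problem has previously been known on any parallel computer model.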
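
For orientation, the parentheses matching problem named in the abstract can be sketched sequentially with a stack. This is only a plain O(n) illustration of the problem itself, not the constant-time LASOB algorithm of the paper; the function name `match_parentheses` and the sample expression are hypothetical.

```python
def match_parentheses(expr):
    """Return a dict mapping each parenthesis index to its partner's index."""
    stack = []    # indices of '(' still waiting for a partner
    partner = {}
    for i, ch in enumerate(expr):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {i}")
            j = stack.pop()
            partner[i] = j
            partner[j] = i
    if stack:
        raise ValueError(f"unmatched '(' at index {stack[-1]}")
    return partner


# Hypothetical sample input: an algebraic expression of length 9.
pairs = match_parentheses("(a+(b*c))")
```

Once every parenthesis knows its partner, the nesting structure of the expression is fixed, which is the information a tree-form construction needs.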


VLSI Design of Cryptographic Processor for SEED and Triple DES Encryption Algorithm

  • 정진욱;최병윤
    • Institute of Electronics Engineers of Korea: Conference Proceedings
    • /
    • Proceedings of the Institute of Electronics Engineers of Korea 2000 Summer Conference (2)
    • /
    • pp.169-172
    • /
    • 2000
  • This paper describes the design of a cryptographic processor that can execute the SEED, DES, and triple DES encryption algorithms. To achieve a flexible architecture and an area-efficient structure, the processor has a 1-unrolled loop structure with hardware sharing and supports four standard modes: ECB, CBC, CFB, and OFB. To reduce the overhead of key computation, a precomputation technique is used. In addition, to eliminate the increase in processing time caused by data input and output, a background I/O technique is used, in which data input and output operations execute in parallel with the encryption operation of the cryptographic processor. The cryptographic processor is designed using 2.5 V 0.25 $\mu\textrm{m}$ CMOS technology and consists of about 34.8K gates. Its peak performance is about 250 Mbps in ECB SEED mode at 100 MHz and 125 Mbps in triple DES mode at 100 MHz.


Trends in AI Processor Technology

  • 이미영;정재훈;이주현;한진호;권영수
    • Electronics and Telecommunications Trends
    • /
    • Vol. 35, No. 3
    • /
    • pp.66-75
    • /
    • 2020
  • As rising expectations for practical artificial intelligence (AI) services make AI algorithms more complicated, an efficient processor for running them is required. To meet this requirement, processors optimized for parallel processing, such as graphics processing units (GPUs), have been widely employed. However, the GPU has a generalized structure for various applications, so it is not optimized for AI algorithms. Therefore, research on AI processors optimized for AI algorithm processing has been actively conducted. This paper briefly introduces AI processors, especially those for inference acceleration, developed by the Electronics and Telecommunications Research Institute, South Korea, and by other global vendors for mobile and server platforms.

VLSI Design of Cryptographic Processor for Triple DES and DES Encryption Algorithm

  • 정진욱;최병윤
    • Korea Multimedia Society: Conference Proceedings
    • /
    • Proceedings of the Korea Multimedia Society 2000 Spring Conference
    • /
    • pp.117-120
    • /
    • 2000
  • This paper describes the VLSI design of a cryptographic processor that can execute the triple DES and DES encryption algorithms. To achieve a flexible architecture and an area-efficient structure, the processor has a 1-unrolled loop structure without pipelining and supports four standard modes: ECB, CBC, CFB, and OFB. To reduce the overhead of key computation, a key precomputation technique is used. In addition, to eliminate the increase in processing time caused by data input and output, a background I/O technique is used, in which data input and output operations execute in parallel with the encryption operation of the cryptographic processor. The cryptographic processor is implemented using Altera EPF10K40RC208-4 devices and has a peak performance of about 75 Mbps in ECB DES mode at 20 MHz and 25 Mbps in triple DES mode at 20 MHz.


A Low-Complexity 128-Point Mixed-Radix FFT Processor for MB-OFDM UWB Systems

  • Cho, Sang-In;Kang, Kyu-Min
    • ETRI Journal
    • /
    • Vol. 32, No. 1
    • /
    • pp.1-10
    • /
    • 2010
  • In this paper, we present a fast Fourier transform (FFT) processor with four parallel data paths for multiband orthogonal frequency-division multiplexing ultra-wideband systems. The proposed 128-point FFT processor employs both a modified radix-$2^4$ algorithm and a radix-$2^3$ algorithm to significantly reduce the numbers of complex constant multipliers and complex Booth multipliers. It also employs substructure-sharing multiplication units instead of constant multipliers to efficiently conduct multiplication operations with only addition and shift operations. The proposed FFT processor is implemented and tested using 0.18 ${\mu}m$ CMOS technology with a supply voltage of 1.8 V. The hardware-efficient 128-point FFT processor with four data streams can support a data processing rate of up to 1 Gsample/s while consuming 112 mW. The implementation results show that the proposed 128-point mixed-radix FFT architecture significantly reduces the hardware cost and power consumption in comparison to existing 128-point FFT architectures.
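
The mixed-radix idea behind a 128-point transform can be sketched in software as a Cooley-Tukey split of N = 128 into 16 × 8. This is only an illustration of the index decomposition under that assumption; it is not the modified radix-$2^4$ hardware algorithm of the paper, and the function names `dft` and `mixed_radix_fft` are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(n^2) discrete Fourier transform, used as the small sub-transform."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def mixed_radix_fft(x, n1, n2):
    """Cooley-Tukey split of a length n1*n2 DFT into n2 DFTs of length n1,
    a twiddle multiplication, and n1 DFTs of length n2."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Stage 1: length-n1 DFTs over the decimated sequences x[r], x[r+n2], ...
    inner = [dft(x[r::n2]) for r in range(n2)]  # inner[r][k1]
    out = [0j] * n
    for k1 in range(n1):
        # Twiddle factors W_n^(r*k1) connect the two stages.
        col = [inner[r][k1] * cmath.exp(-2j * cmath.pi * r * k1 / n)
               for r in range(n2)]
        # Stage 2: one length-n2 DFT per k1; output index is k1 + n1*k2.
        outer = dft(col)
        for k2 in range(n2):
            out[k1 + n1 * k2] = outer[k2]
    return out


# Hypothetical test signal of 128 complex samples.
signal = [complex(i % 7, -(i % 5)) for i in range(128)]
spectrum = mixed_radix_fft(signal, 16, 8)
```

The split replaces one length-128 transform with small transforms of lengths 16 and 8, which is what makes the mixed radix-$2^4$ and radix-$2^3$ pairing natural for 128 points.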

Parallel Fuzzy Inference Method for Large Volumes of Satellite Images

  • Lee, Sang-Gu
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol. 1, No. 1
    • /
    • pp.119-124
    • /
    • 2001
  • In pattern recognition on large volumes of remote sensing satellite images, the inference time increases greatly. In the case of remote sensing data [5] with 4 wavebands, 778 training patterns are learned. Each land cover pattern is classified using 159,900 patterns, including the trained patterns. For the fuzzy classification, 778 fuzzy rules are generated, each with 4 fuzzy variables in the condition part. Therefore, a high-performance parallel fuzzy inference system is needed. In this paper, we propose a novel parallel fuzzy inference system on the T3E parallel computer, in which fuzzy rules are distributed and executed simultaneously. The ONE_TO_ALL algorithm is used to broadcast the fuzzy input to all nodes. The results of the MIN/MAX operations are transferred to the output processor by the ALL_TO_ONE algorithm. By processing the fuzzy rules in parallel, the parallel fuzzy inference algorithm extracts match parallelism and achieves a good speed factor. This system can be used in a large expert system that has many inference variables in the condition and consequent parts.
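
The rule distribution described in the abstract can be sketched as MIN/MAX inference over partitioned rules. This is a minimal single-process simulation under stated assumptions: the nodes are plain Python lists rather than T3E processors, the rules use 2 condition variables instead of 4 for brevity, and all names (`fire`, `infer_on_node`, `parallel_infer`) and data values are hypothetical.

```python
def fire(rule, memberships):
    """Firing strength: MIN over the membership degrees in the condition part."""
    return min(memberships[var][label] for var, label in rule["if"])


def infer_on_node(rules, memberships):
    """Local MAX per output class over the rules owned by one node."""
    local = {}
    for rule in rules:
        strength = fire(rule, memberships)
        cls = rule["then"]
        local[cls] = max(local.get(cls, 0.0), strength)
    return local


def parallel_infer(rules, memberships, num_nodes):
    # Distribute the rules round-robin; every node sees the same fuzzy input,
    # which plays the role of the one-to-all broadcast.
    parts = [rules[i::num_nodes] for i in range(num_nodes)]
    partials = [infer_on_node(part, memberships) for part in parts]
    # All-to-one step: the output node combines the partial results with MAX.
    result = {}
    for partial in partials:
        for cls, strength in partial.items():
            result[cls] = max(result.get(cls, 0.0), strength)
    return result


# Hypothetical fuzzified input and rule base.
memberships = {
    "band1": {"low": 0.2, "high": 0.8},
    "band2": {"low": 0.6, "high": 0.4},
}
rules = [
    {"if": [("band1", "high"), ("band2", "low")], "then": "water"},
    {"if": [("band1", "low"), ("band2", "low")], "then": "water"},
    {"if": [("band1", "high"), ("band2", "high")], "then": "forest"},
    {"if": [("band1", "low"), ("band2", "high")], "then": "forest"},
]
classes = parallel_infer(rules, memberships, num_nodes=3)
```

Because MAX is associative and commutative, the partitioned result matches a sequential pass over all rules regardless of how the rules are assigned to nodes.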


Parallel VHDL Simulation on IBM SP2 and SGI Origin 2000

  • 정영식
    • Journal of the Korea Society for Simulation
    • /
    • Vol. 7, No. 1
    • /
    • pp.69-83
    • /
    • 1998
  • In this paper, we present the results of running parallel VHDL simulation on typical MPP (Massively Parallel Processor) systems such as the IBM SP2 and SGI Origin 2000. The parallel simulation uses a synchronous protocol, and the parallel program is implemented using MPI (Message Passing Interface), based on the message-passing model, so that it can run on any parallel programming environment that supports MPI, a standard communication library. GVT (Global Virtual Time) computation for the parallel simulation is based on global broadcasting with MPI_Bcast(), a standard MPI function, and on piggybacking. Our benchmark shows that as the size of the VHDL description grows, the parallel simulation performs better than the sequential simulation. In addition, we compare the IBM SP2 and SGI Origin 2000 indirectly by applying the same application to both.
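
The role of GVT in a synchronous protocol can be sketched as a global minimum over timestamps. The sketch below is a single-process simulation under stated assumptions: it does not call MPI, it assumes that newly generated events never carry a timestamp below the current GVT, and the names `compute_gvt` and `safe_events` and all sample values are hypothetical.

```python
def compute_gvt(next_event_times, in_transit_timestamps):
    """GVT: the minimum over every process's next event time and the
    timestamps of messages that have been sent but not yet received."""
    candidates = list(next_event_times) + list(in_transit_timestamps)
    if not candidates:
        raise ValueError("no processes or messages to inspect")
    return min(candidates)


def safe_events(pending, gvt):
    """Events stamped exactly at GVT can be processed in this step,
    assuming no later event can be scheduled before GVT."""
    return [event for event in pending if event[0] == gvt]


# Hypothetical state: three processes and two messages still in flight.
next_event_times = [40, 25, 30]
in_transit_timestamps = [22, 35]
gvt = compute_gvt(next_event_times, in_transit_timestamps)

# Hypothetical pending events as (timestamp, signal name) pairs.
pending = [(22, "sig_a"), (25, "clk"), (22, "sig_b")]
ready = safe_events(pending, gvt)
```

In a message-passing implementation, each process would contribute its local minimum and the agreed value would then be distributed to all processes, which is where a broadcast such as MPI_Bcast() fits.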


DEVELOPMENT OF PARALLEL COMPUTATION METHOD FOR THE p VERSION IN THE FINITE ELEMENT METHOD

  • Kim, Chang-Geun;Cha, Ho-Jung
    • Journal of applied mathematics & informatics
    • /
    • Vol. 6, No. 2
    • /
    • pp.649-659
    • /
    • 1999
  • This paper presents a parallel implementation of stiffness matrix calculation based on the processor farm model on a network of workstations running the PVM programming environment. As the computational characteristics of the stiffness matrix exhibit good potential for effective parallel computation, the performance improvement is shown to be almost linear with the number of workstations involved in the computation.
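
The processor farm model can be sketched as a master that hands out independent element computations and assembles the results. The sketch below makes several assumptions: a thread pool stands in for PVM workstations, the elements are simple 2-node bar elements rather than p-version elements, and the names `element_stiffness` and `assemble` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def element_stiffness(task):
    """Worker: stiffness of a 2-node bar element, k = (EA/L) * [[1, -1], [-1, 1]]."""
    index, ea, length = task
    k = ea / length
    return index, [[k, -k], [-k, k]]


def assemble(num_elements, ea=1.0, length=1.0, workers=4):
    """Master: farm out one task per element, then assemble the global matrix."""
    tasks = [(e, ea, length) for e in range(num_elements)]
    size = num_elements + 1
    K = [[0.0] * size for _ in range(size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task is independent, so workers need no communication
        # with each other, only with the master.
        for e, ke in pool.map(element_stiffness, tasks):
            for a in range(2):
                for b in range(2):
                    K[e + a][e + b] += ke[a][b]
    return K


# Hypothetical mesh of 3 unit-length elements with EA = 1.
K = assemble(3)
```

Because element matrices do not depend on one another, the work divides evenly among workers, which is consistent with the near-linear improvement the abstract reports. In CPython, a process pool rather than a thread pool would be needed to see real speedup on CPU-bound work.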

A Study on Sorting in A Computer Using The Binary Multi-level Multi-access Protocol

  • Jung Chang-Duk
    • Korea Intelligent Information System Society: Conference Proceedings
    • /
    • Korea Intelligent Information System Society 2006 Spring Conference
    • /
    • pp.303-310
    • /
    • 2006
  • Sorting algorithms have been developed to take advantage of distributed computers. However, the speedup of parallel sorting algorithms decreases rapidly as the number of processors increases, because of parallel processing overhead such as context switching time and inter-processor communication cost. In this paper, we propose a parallel sorting method that provides linear speedup over an optimal serial algorithm for a system with a large number of processors. This algorithm may even provide superlinear speedup for a practical system. The algorithm takes advantage of the properties of an interconnection network and its protocol.


AN ASYNCHRONOUS PARALLEL SOLVER FOR SOME MATRIX PROBLEMS

  • Park, Pil-Seong
    • Journal of applied mathematics & informatics
    • /
    • Vol. 7, No. 3
    • /
    • pp.1045-1058
    • /
    • 2000
  • In conventional synchronous parallel computing, workload balance is a crucial factor in reducing the idle time of processors that finish their jobs earlier than others. However, it is difficult to achieve on heterogeneous workstation clusters, where the available computing power of each processor is unpredictable. As a way to overcome this problem, asynchronous methods have emerged and are being increasingly used and studied, but there is none for eigenvalue problems yet. In this paper, we suggest a new asynchronous method to solve some singular matrix problems, which can also be used for finding a certain eigenvector of some matrices.