• Title/Summary/Keyword: parallel processing

Search Result 2,101, Processing Time 0.037 seconds

A Parallel Processing of Finding Neighbor Agents in Flocking Behaviors Using GPU (GPU를 이용한 무리 짓기에서 이웃 에이전트 찾기의 병렬 처리)

  • Lee, Jae-Moon
    • Journal of Korea Game Society
    • /
    • v.10 no.5
    • /
    • pp.95-102
    • /
    • 2010
  • This paper proposes a parallel algorithm of the flocking behaviors using GPU. To do this, we used CUDA as the parallel processing architecture of GPU and then analyzed its characteristics and constraints. Based on them, the paper improved the performance by parallelizing to find the neighbors for an agent which requires the largest cost in the flocking behaviors. We implemented the proposed algorithm on GTX 285 GPU and compared experimentally its performance with the original spatial partitioning method. The results of the comparison showed that the proposed algorithm outperformed the original method up to 9 times with respect to the execution time.

A study on the genetic algorithms for the scheduling of parallel computation (병렬계산의 스케쥴링에 있어서 유전자알고리즘에 관한 연구)

  • 성기석;박지혁
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 1997.10a
    • /
    • pp.166-169
    • /
    • 1997
  • For parallel processing, the compiler partitions a loaded program into a set of tasks and makes a schedule for the tasks that will minimize parallel processing time for the loaded program. Building an optimal schedule for a given set of partitioned tasks of a program has known to be NP-complete. In this paper we introduce a GA(Genetic Algorithm)-based scheduling method in which a chromosome consists of two parts of a string which decide the number and order of tasks on each processor. An additional computation is used for feasibility constraint in the chromosome. By granularity theory, a partitioned program is categorized into coarse-grain or fine-grain types. There exist good heuristic algorithms for coarse-grain type partitioning. We suggested another GA adaptive to the coarse-grain type partitioning. The infeasibility of chromosome is overcome by the encoding and operators. The number of processors are decided while the GA find the minimum parallel processing time.

  • PDF

A New Decomposition Method for Parallel Processing Multi-Level Optimization

  • Park, Dong-Hoon;Park, Hyung-Wook;Kim, Min-Soo
    • Journal of Mechanical Science and Technology
    • /
    • v.16 no.5
    • /
    • pp.609-618
    • /
    • 2002
  • In practical designs, most of the multidisciplinary problems have a large-size and complicate design system. Since multidisciplinary problems have hundreds of analyses and thousands of variables, the grouping of analyses and the order of the analyses in the group affect the speed of the total design cycle. Therefore, it is very important to reorder and regroup the original design processes in order to minimize the total computational cost by decomposing large multidisciplinary problems into several multidisciplinary analysis subsystems (MDASS) and by processing them in parallel. In this study, a new decomposition method is proposed for parallel processing of multidisciplinary design optimization, such as collaborative optimization (CO) and individual discipline feasible (IDF) method. Numerical results for two example problems are presented to show the feasibility of the proposed method.

A Study on the Design of Format Converter for Pixel-Parallel Image Processing (픽셀-병렬 영상처리에 있어서 포맷 컨버터 설계에 관한 연구)

  • 김현기;김현호;하기종;최영규;류기환;이천희
    • Proceedings of the IEEK Conference
    • /
    • 2001.06b
    • /
    • pp.269-272
    • /
    • 2001
  • In this paper we proposed the format converter design and implementation for real time image processing. This design method is based on realized the large processor-per-pixel array by integrated circuit technology in which this two types of integrated structure is can be classify associative parallel processor and parallel process with DRAM cell. Layout pitch of one-bit-wide logic is identical memory cell pitch to array high density PEs in integrate structure. This format converter design has control path implementation efficiently, and can be utilized the high technology without complicated controller hardware. Sequence of array instruction are generated by host computer before process start, and instructions are saved on unit controller. Host computer is executed the pixel-parallel operation starting at saved instructions after processing start

  • PDF

Implementation of High-Speed Reed-Solomon Decoder Using the Modified Euclid's Algorithm (개선된 수정 유클리드 알고리듬을 이용한 고속의 Reed-Solomon 복호기의 설계)

  • 김동선;최종찬;정덕진
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.48 no.7
    • /
    • pp.909-915
    • /
    • 1999
  • In this paper, we propose an efficient VLSI architecture of Reed-Solomon(RS) decoder. To improve the speed. we develope an architecture featuring parallel and pipelined processing. To implement the parallel and pipelined processing architecture, we analyze the RS decoding algorithm and the honor's algorithm for parallel processing and we also modified the Euclid's algorithm to apply the efficient parallel structure in RS decoder. To show the proposed architecture, the performance of the proposed RS decoder is compared to Shao's and we obtain the 10 % efficiency in area and three times faster in speed when it's compared to Shao's time domain decoder. In addition, we implemented the proposed RS decoder with Altera FPGA Flex10K-50.

  • PDF

Initial Timing Acquisition for Binary Phase-Shift Keying Direct Sequence Ultra-wideband Transmission

  • Kang, Kyu-Min;Choi, Sang-Sung
    • ETRI Journal
    • /
    • v.30 no.4
    • /
    • pp.495-505
    • /
    • 2008
  • This paper presents a parallel processing searcher structure for the initial synchronization of a direct sequence ultra-wideband (DS-UWB) system, which is suitable for the digital implementation of baseband functionalities with a 1.32 Gsample/s chip rate analog-to-digital converter. An initial timing acquisition algorithm and a data demodulation method are also studied. The proposed searcher effectively acquires initial symbol and frame timing during the preamble transmission period. A hardware efficient receiver structure using 24 parallel digital correlators for binary phase-shift keying DS-UWB transmission is presented. The proposed correlator structure operating at 55 MHz is shared for correlation operations in a searcher, a channel estimator, and the demodulator of a RAKE receiver. We also present a pseudo-random noise sequence generated with a primitive polynomial, $1+x^2+x^5$, for packet detection, automatic gain control, and initial timing acquisition. Simulation results show that the performance of the proposed parallel processing searcher employing the presented pseudo-random noise sequence outperforms that employing a preamble sequence in the IEEE 802.15.3a DS-UWB proposal.

  • PDF

David II: A new architecture for parallel rendering processors with effective memory system (David II: 효과적인 메모리 시스템을 가지는 병렬 렌더링 프로세서)

  • Lee, Kil-Whan;Park, Woo-Chan;Kim, Il-San;Han, Tack-Don
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2004.05a
    • /
    • pp.1655-1658
    • /
    • 2004
  • Current rendering processors are organized mainly to process a triangle as fast as possible and recently parallel 3D rendering processors, which can process multiple triangles in parallel with multiple rasterizers, begin to appear. For high performance in processing triangles, it is desirable for each rasterizer have its own local pixel cache. However, the consistency problem may occur in accessing the data at the same address simultaneously by more than one rasterizer. In this paper, we propose a parallel rendering processor architecture, called DAVID II, resolving such consistency problem effectively. Moreover, the proposed architecture reduces the latency due to a pixel cache miss significantly. The experimental results show that DAVID II achieves almost linear speedup at best case even in sixteen rasterizers.

  • PDF

PASS: A Parallel Speech Understanding System

  • Chung, Sang-Hwa
    • Journal of Electrical Engineering and information Science
    • /
    • v.1 no.1
    • /
    • pp.1-9
    • /
    • 1996
  • A key issue in spoken language processing has become the integration of speech understanding and natural language processing(NLP). This paper presents a parallel computational model for the integration of speech and NLP. The model adopts a hierarchically-structured knowledge base and memory-based parsing techniques. Processing is carried out by passing multiple markers in parallel through the knowledge base. Speech-specific problems such as insertion, deletion, and substitution have been analyzed and their parallel solutions are provided. The complete system has been implemented on the Semantic Network Array Processor(SNAP) and is operational. Results show an 80% sentence recognition rate for the Air Traffic Control domain. Moreover, a 15-fold speed-up can be obtained over an identical sequential implementation with an increasing speed advantage as the size of the knowledge base grows.

  • PDF

Designing Modulo $({2^n}-1)$ Parallel Multipliers and its Technological Application Using Op Amp Circuits (Op Amp 회로를 이용한, 모듈로 $({2^n}-1)$ 병렬 승산기의 설계 및 그 기술의 응용)

  • Lee, Hun-Giu;Kim, Chul
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.38 no.6
    • /
    • pp.436-445
    • /
    • 2001
  • In this paper, we introduce modulo ( 2$^n$-1) parallel-processing residue multipliers, using Op Amp circuits, and their technological application to designing binary multipliers. The limit of multiplying speed in computational processing is a serious harrier in the advances of VLSI technology. To solve this problem, we implement a class of modulo ( 2$^n$-1) parallel multipliers having superior time complexity to O( log$_2$( log$_2$( log$_2$$^n$))) by applying Op Amp circuits, while investigating their technological application to binary multipliers. Since they have excellent time & area complexity compared with previous parallel multipliers, and are applicable to designing binary multipliers of the same efficiency, such parallel multipliers possess high academic value. Indexing Terms Modular Multipliers. Binary Multipliers. Parallel Processing, Operational Amplifiers, Mersenne Numbers.

  • PDF

Parallel Implementations of Digital Focus Indices Based on Minimax Search Using Multi-Core Processors

  • HyungTae, Kim;Duk-Yeon, Lee;Dongwoon, Choi;Jaehyeon, Kang;Dong-Wook, Lee
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.2
    • /
    • pp.542-558
    • /
    • 2023
  • A digital focus index (DFI) is a value used to determine image focus in scientific apparatus and smart devices. Automatic focus (AF) is an iterative and time-consuming procedure; however, its processing time can be reduced using a general processing unit (GPU) and a multi-core processor (MCP). In this study, parallel architectures of a minimax search algorithm (MSA) are applied to two DFIs: range algorithm (RA) and image contrast (CT). The DFIs are based on a histogram; however, the parallel computation of the histogram is conventionally inefficient because of the bank conflict in shared memory. The parallel architectures of RA and CT are constructed using parallel reduction for MSA, which is performed through parallel relative rating of the image pixel pairs and halved the rating in every step. The array size is then decreased to one, and the minimax is determined at the final reduction. Kernels for the architectures are constructed using open source software to make it relatively platform independent. The kernels are tested in a hexa-core PC and an embedded device using Lenna images of various sizes based on the resolutions of industrial cameras. The performance of the kernels for the DFIs was investigated in terms of processing speed and computational acceleration; the maximum acceleration was 32.6× in the best case and the MCP exhibited a higher performance.