• Title/Abstract/Keywords: parallel architecture

891 search results (processing time: 0.023 seconds)

Performance Study of Satellite Image Processing on Graphics Processors Unit Using CUDA

  • Jeong, In-Kyu;Hong, Min-Gee;Hahn, Kwang-Soo;Choi, Joonsoo;Kim, Choen
    • 대한원격탐사학회지 / Vol. 28, No. 6 / pp.683-691 / 2012
  • High-resolution satellite images are now widely used for a variety of mapping applications, including photogrammetry, GIS data acquisition, and visualization. As the spectral and spatial data size of satellite images increases, more processing power is needed to process them. Parallel systems offer a solution, and parallel processing techniques have been developed to improve image-processing performance as computational power has grown. However, conventional CPU-based parallel computing often cannot meet the computational speed that such images demand. The GPU is a good candidate for meeting this demand. GPUs have recently been applied to highly complex processing that involves many loop operations, such as mathematical transforms and ray tracing. In this study, we propose a technique for parallel processing of high-resolution satellite images on a GPU. We implemented a spectral radiometric processing algorithm for Landsat-7 ETM+ imagery using CUDA, a parallel computing architecture developed by NVIDIA for GPUs, and compared the performance of the algorithm on the GPU and the CPU.
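The abstract does not say which radiometric algorithm was implemented. As a rough illustration of why this kind of processing parallelizes well, the sketch below applies the commonly used linear rescaling from digital numbers (DN) to at-sensor radiance. Every pixel is computed independently, which is the property a one-thread-per-pixel GPU kernel would exploit. The function names, the sample band, and the calibration values (`lmin`, `lmax`, `qcal_min`, `qcal_max`) are assumptions made for illustration, not values from the paper.

```python
def dn_to_radiance(dn, lmin, lmax, qcal_min=1, qcal_max=255):
    """Linearly rescale one digital number (DN) to spectral radiance."""
    gain = (lmax - lmin) / (qcal_max - qcal_min)
    return gain * (dn - qcal_min) + lmin


def calibrate_band(band, lmin, lmax):
    """Convert every pixel of a 2-D band (a list of rows).

    Each pixel depends only on its own DN, so the loop body could be
    executed by independent threads.
    """
    return [[dn_to_radiance(dn, lmin, lmax) for dn in row] for row in band]


# Hypothetical 2x3 band and calibration range, for illustration only.
band = [[1, 128, 255], [64, 1, 200]]
radiance = calibrate_band(band, lmin=-5.0, lmax=250.0)
```

With these assumed constants, the smallest DN maps to `lmin` and the largest to `lmax`.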

Parallel String Matching and Optimization Using OpenCL on FPGA

  • 윤진명;최강일;김현진
    • 전기학회논문지 / Vol. 66, No. 1 / pp.100-106 / 2017
  • In this paper, we propose a parallel optimization method for the Aho-Corasick (AC) algorithm and the Parallel Failureless Aho-Corasick (PFAC) algorithm using Open Computing Language (OpenCL) on a Field Programmable Gate Array (FPGA). The low throughput of a string matching engine degrades the performance of network processing. Recently, many researchers have studied string matching engines based on parallel computing, and FPGA vendors offer parallel computing platforms based on OpenCL. We apply the AC and PFAC algorithms on a DE1-SoC board with a Cyclone V FPGA, with optimizations that take the FPGA architecture into account. Experiments with the PFAC algorithm consider global id, local id, local memory, and loop unrolling optimizations. The performance improvement with loop unrolling is 129 times greater than that of the AC algorithm that does not adopt loop unrolling. The performance improvements with loop unrolling are 1.1, 0.2, and 1.5 times greater than those with the global id, local id, and local memory optimizations, respectively.
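To make the PFAC idea concrete, here is a minimal sequential sketch, not the authors' OpenCL kernel. PFAC drops the failure links of Aho-Corasick and instead starts one independent matcher at every input position; each matcher stops as soon as it has no valid transition. The function names and the sample patterns are illustrative assumptions, and the outer loop in `pfac_search` stands in for what would be one work-item per start position on a parallel device.

```python
def build_trie(patterns):
    """Build a goto-only trie (no failure links)."""
    trie = [{}]      # trie[state] maps a character to the next state
    accepts = {}     # accepting state -> matched pattern
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in trie[state]:
                trie.append({})
                trie[state][ch] = len(trie) - 1
            state = trie[state][ch]
        accepts[state] = pattern
    return trie, accepts


def match_from(text, start, trie, accepts):
    """Work of one logical thread: report patterns that begin at `start`."""
    found = []
    state = 0
    for pos in range(start, len(text)):
        state = trie[state].get(text[pos])
        if state is None:        # no transition: this matcher terminates
            break
        if state in accepts:
            found.append((start, accepts[state]))
    return found


def pfac_search(text, patterns):
    """Run one matcher per start position (sequentially in this sketch)."""
    trie, accepts = build_trie(patterns)
    results = []
    for start in range(len(text)):
        results.extend(match_from(text, start, trie, accepts))
    return results


matches = pfac_search("ushers", ["he", "she", "his", "hers"])
```

Because the matchers share only the read-only trie, they need no synchronization, which is what makes the failureless variant attractive for wide parallel hardware.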

An Observation on Phenomenon of Image Construction Using Architecture Space in Contemporary Fashion: Focusing on Armani Group, Prada, and Comme des Garçons

  • 박신미
    • 복식 / Vol. 62, No. 7 / pp.150-169 / 2012
  • Contemporary fashion broadens its image in relation to architecture and incorporates architecture into its creative field. The aim of this research is to investigate the characteristics of 'architecture for fashion' in a social context and to verify the collaborative characteristics of post-1990s fashion and architecture. This paper describes architecture as a means of expressing contemporary style and identifies the social consequences that result. While high fashion in the early to mid-twentieth century followed a trend, also evident in architecture, of directly applying architectural elements to creative works, high fashion from 1990 onward extended its creative field by using architecture to symbolically represent its image and style. In line with the possibility of fashion shows being considered a performance art, the potential of collaboration between architecture and fashion as an installation with audience participation is discussed. Architecture provides significant grounds for fashion to be recognized as a parallel, independent sphere of art. Contemporary fashion, either by itself or through collaboration with architecture, includes space in its zone of creation. The collaborative characteristics of post-1990s fashion and architecture are verified through case studies of three fashion houses: 'Armani Group', 'PRADA', and 'Comme des Garçons'.

A Study on the Design of FFT Architecture for Ultra-Wide Band OFDM Communication System

  • 박계완;윤상훈;정정화
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2004 Summer Conference Proceedings (1) / pp.309-312 / 2004
  • This paper proposes an FFT architecture for a UWB OFDM communication system. The 128-point FFT/IFFT of the UWB OFDM communication system requires a higher data rate than conventional communication systems, so the proposed architecture combines pipelining and parallelism. For a highly efficient architecture, the optimal clipping power and the number of input quantization bits are found by simulation. The hardware complexity of the proposed architecture is presented in terms of adders, registers, and complex multipliers.
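For reference, the computation that such hardware performs can be sketched in software. The abstract does not state the radix or the pipeline structure, so the example below assumes a plain recursive radix-2 decimation-in-time FFT and checks it on a 128-point input; it says nothing about the paper's actual datapath. The test signal is a hypothetical single complex tone.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: the two outputs share one twiddle multiplication.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dft(x):
    """Direct O(n^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


N = 128
# Hypothetical test signal: a single complex tone in frequency bin 5.
signal = [cmath.exp(2j * cmath.pi * 5 * t / N) for t in range(N)]
spectrum = fft(signal)
```

The butterflies within one stage are independent of each other, which is the property that pipelined and parallel FFT hardware typically exploits.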

A Parallel Algorithm for Large-Scale Linear Programs with a Special Structure

  • Oh, Seyoung
    • 충청수학회지 / Vol. 6, No. 1 / pp.139-155 / 1993
  • A new sequential algorithm and computational results for large-scale linear programs with a special structure were presented in a previous paper [9]. In this paper, a parallel version of the algorithm is developed for the NCUBE2, a hypercube multiprocessor architecture. Computational results using 128 processors are presented for randomly generated large-scale sparse or dense problems with up to 256 variables and up to 5 million constraints.

Built-in self test for testing neighborhood pattern sensitive faults in content addressable memories

  • 강용석;이종철;강성호
    • 전자공학회논문지C / Vol. 35C, No. 8 / pp.1-9 / 1998
  • A new parallel test algorithm and a built-in self test (BIST) architecture are developed to test various types of functional faults efficiently in content addressable memories (CAMs). In test mode, the read operation is replaced by one parallel content-addressable search operation, and the write operation is performed in parallel with small peripheral circuit modifications. The results show that efficient and practical testing with very low complexity and area overhead can be achieved.

EFFICIENT PARALLEL GAUSSIAN NORMAL BASES MULTIPLIERS OVER FINITE FIELDS

  • Kim, Young-Tae
    • 호남수학학술지 / Vol. 29, No. 3 / pp.415-425 / 2007
  • The normal basis has the advantage that, in hardware implementations over finite fields, squaring an element is simply a right cyclic shift of its coordinates. In particular, the optimal normal basis is the most efficient for hardware implementation over finite fields. In this paper, we propose an efficient parallel architecture that transforms the Gaussian normal basis multiplication in GF($2^m$) into the type-I optimal normal basis multiplication in GF($2^{mk}$), based on the palindromic representation of polynomials.

Design of an efficient routing algorithm on the WK-recursive network

  • Chung, Il-Yong
    • 스마트미디어저널 / Vol. 11, No. 9 / pp.39-46 / 2022
  • The WK-recursive network proposed by Vecchia and Sanges[1] is widely used in the design and implementation of local area networks and parallel processing architectures. It provides a high degree of regularity and scalability, which suit the design and realization of distributed systems involving a large number of computing elements. In this paper, we investigate message routing on the WK-recursive network, which is key to the performance of this network. We present an efficient shortest path algorithm on the WK-recursive network that is simpler than that of Chen and Duh[2] in terms of design complexity.
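The abstract does not describe the proposed routing algorithm, so the sketch below is only a brute-force baseline against which such an algorithm could be checked. It assumes the usual definition of the network WK(d, t): nodes are t-digit strings over 0..d-1, each node is linked to the nodes that differ only in the last digit, and each node whose digits are not all equal has one additional "flipping" link. The function names and the tuple encoding are illustrative choices.

```python
from collections import deque


def neighbors(node, d):
    """Neighbors of `node` = (a_{t-1}, ..., a_1, a_0) in WK(d, t)."""
    t = len(node)
    result = []
    # Links inside the innermost complete subgraph: change the last digit.
    for b in range(d):
        if b != node[-1]:
            result.append(node[:-1] + (b,))
    # Flipping link: find the trailing run a_{j-1} = ... = a_0.
    last = node[-1]
    run = 1
    while run < t and node[-1 - run] == last:
        run += 1
    if run < t:  # nodes with all digits equal have no flipping link
        a_j = node[-1 - run]
        result.append(node[:t - run - 1] + (last,) + (a_j,) * run)
    return result


def shortest_path(src, dst, d):
    """Breadth-first search; returns the list of nodes from src to dst."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in neighbors(node, d):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


path = shortest_path((0, 0), (1, 1), d=3)
```

Breadth-first search visits up to d^t nodes, so it is useful only for validating a dedicated routing algorithm on small networks.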

K-Nearest Neighbor Associative Memory with Reconfigurable Word-Parallel Architecture

  • An, Fengwei;Mihara, Keisuke;Yamasaki, Shogo;Chen, Lei;Mattausch, Hans Jurgen
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • Vol. 16, No. 4
    • /
    • pp.405-414
    • /
    • 2016
  • IC implementations provide high performance for the computationally expensive task of pattern matching but offer relatively low flexibility for satisfying different applications. In this paper, we report an associative memory architecture for k nearest neighbor (KNN) search, which is one of the most basic algorithms in pattern matching. The designed architecture features reconfigurable vector-component parallelism, enabled by programmable switching circuits between vector components, and a dedicated majority vote circuit. In addition, the main time-consuming part of KNN is handled by a clock-mapping concept based on weighted frequency dividers, which drastically reduces the in-principle exponential increase of the worst-case search-clock number with the bit width of vector components to only a linear increase. A test chip in 180 nm CMOS technology, with 32 rows and 8 parallel 8-bit vector components in each row, consumes a peak total of 61.4 mW and only 11.9 mW for nearest squared Euclidean distance search (at 45.58 MHz and 1.8 V).
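As a software reference for what the chip computes, the following sketch performs KNN classification with squared Euclidean distance followed by a majority vote. It is a plain sequential model and does not reproduce the clock-mapping search or the circuit-level parallelism described above. The reference vectors, labels, and function names are hypothetical; the vectors use 8 components of 8-bit values only to mirror the test chip's word format.

```python
from collections import Counter


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(query, references, labels, k):
    """Label `query` by majority vote among its k nearest references."""
    order = sorted(
        range(len(references)),
        key=lambda i: squared_euclidean(query, references[i]),
    )
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical reference set: 8 components per vector, values in 0..255.
references = [[10] * 8, [12] * 8, [200] * 8, [210] * 8, [14] * 8]
labels = ["A", "A", "B", "B", "A"]

near_a = knn_classify([11] * 8, references, labels, k=3)
near_b = knn_classify([205] * 8, references, labels, k=3)
```

In this model, the distance to every stored row can be evaluated independently, which corresponds to the word-parallel search that an associative memory performs.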

7.7 Gbps Encoder Design for IEEE 802.11ac QC-LDPC Codes

  • Jung, Yong-Min;Chung, Chul-Ho;Jung, Yun-Ho;Kim, Jae-Seok
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • Vol. 14, No. 4
    • /
    • pp.419-426
    • /
    • 2014
  • This paper proposes a high-throughput encoding process and encoder architecture for quasi-cyclic low-density parity-check codes in the IEEE 802.11ac standard. To achieve high throughput with low complexity, the encoding process and encoder architecture are based on partially parallel processing. Forward and backward accumulations are performed in one clock cycle to increase the encoding throughput. A low-complexity cyclic shifter is also proposed to minimize the hardware overhead of combinational logic in the encoder architecture. The proposed encoder is rate-compatible, supporting the various code rates and codeword block lengths of IEEE 802.11ac systems. The proposed encoder is implemented in 130-nm CMOS technology. For the (1944, 1620) irregular code, a throughput of 7.7 Gbps is achieved at a clock frequency of 100 MHz. The gate count of the proposed encoder core is about 96 K.