• Title/Summary/Keyword: Parallel computer architecture

7.7 Gbps Encoder Design for IEEE 802.11ac QC-LDPC Codes

  • Jung, Yong-Min;Chung, Chul-Ho;Jung, Yun-Ho;Kim, Jae-Seok
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.14 no.4
    • /
    • pp.419-426
    • /
    • 2014
  • This paper proposes a high-throughput encoding process and encoder architecture for quasi-cyclic low-density parity-check (QC-LDPC) codes in the IEEE 802.11ac standard. To achieve high throughput with low complexity, an encoding process and encoder architecture based on partially parallel processing are proposed. Forward and backward accumulations are performed in one clock cycle to increase the encoding throughput. A low-complexity cyclic shifter is also proposed to minimize the hardware overhead of combinational logic in the encoder architecture. The proposed encoder is rate-compatible, supporting the various code rates and codeword block lengths of IEEE 802.11ac systems. The proposed encoder is implemented in 130-nm CMOS technology. For the (1944, 1620) irregular code, a throughput of 7.7 Gbps is achieved at a clock frequency of 100 MHz. The gate count of the proposed encoder core is about 96 K.
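
The role of the cyclic shifter in the abstract above can be illustrated with a minimal sketch. It is not the paper's shifter design. It only shows the general property that makes QC-LDPC encoding hardware-friendly: multiplying a Z-bit block by a circulant permutation sub-matrix over GF(2) is the same as rotating the block. The sub-block size Z = 81 (1944 / 24) and the shift direction are assumptions chosen for illustration, and all function names are hypothetical.

```python
def circulant(Z, s):
    """Z x Z identity matrix whose 1 in row i is moved to column (i + s) mod Z."""
    return [[int(j == (i + s) % Z) for j in range(Z)] for i in range(Z)]


def mat_vec_gf2(M, v):
    """Matrix-vector product over GF(2): AND for multiply, XOR (sum mod 2) for add."""
    return [sum(a & b for a, b in zip(row, v)) & 1 for row in M]


def cyclic_shift(v, s):
    """Rotate v so that out[i] = v[(i + s) mod Z]; no multiplications needed."""
    s %= len(v)
    return v[s:] + v[:s]


Z = 81  # assumed sub-block size for the 1944-bit codeword (1944 / 24)
block = [(i * i + i // 3) % 2 for i in range(Z)]  # arbitrary test pattern

# The rotation gives the same result as the full circulant multiplication.
assert mat_vec_gf2(circulant(Z, 5), block) == cyclic_shift(block, 5)
```

Under this convention an encoder never has to store or multiply by the Z x Z sub-matrices; it only needs the shift value of each one, which is why the shifter dominates the combinational logic.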

Efficient Implementation of a Pseudorandom Sequence Generator for High-Speed Data Communications

  • Hwang, Soo-Yun;Park, Gi-Yoon;Kim, Dae-Ho;Jhang, Kyoung-Son
    • ETRI Journal
    • /
    • v.32 no.2
    • /
    • pp.222-229
    • /
    • 2010
  • A conventional pseudorandom sequence generator creates only 1 bit of data per clock cycle. Therefore, it may cause a delay in data communications. In this paper, we propose an efficient implementation method for a pseudorandom sequence generator with parallel outputs. By virtue of the simple matrix multiplications, we derive a well-organized recursive formula and realize a pseudorandom sequence generator with multiple outputs. Experimental results show that, although the total area of the proposed scheme is 3% to 13% larger than that of the existing scheme, our parallel architecture improves the throughput by 2, 4, and 6 times compared with the existing scheme based on a single output. In addition, we apply our approach to a $2{\times}2$ multiple input/multiple output (MIMO) detector targeting the 3rd Generation Partnership Project Long Term Evolution (3GPP LTE) system. Therefore, the throughput of the MIMO detector is significantly enhanced by parallel processing of data communications.
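
The matrix-based idea in the abstract above can be sketched as follows. This is a generic look-ahead construction for a linear feedback shift register (LFSR) over GF(2), not the recursive formula derived in the paper. The 5-bit register, the recurrence x(m+5) = x(m+2) + x(m), and all function names are assumptions chosen for illustration. One serial clock maps the state s to A·s and emits one bit. A k-output generator instead emits the first bits of s, A·s, ..., A^(k-1)·s in one clock and jumps directly to A^k·s.

```python
def companion_matrix(n, taps):
    """Transition matrix A: state bits shift down by one, last bit = XOR of taps."""
    A = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1
    for t in taps:
        A[n - 1][t] = 1
    return A


def mat_vec(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, v)) & 1 for row in M]


def mat_mul(X, Y):
    """Matrix-matrix product over GF(2)."""
    cols = list(zip(*Y))
    return [[sum(a & b for a, b in zip(row, col)) & 1 for col in cols] for row in X]


def serial_bits(A, state, count):
    """Conventional generator: one output bit per clock."""
    out = []
    for _ in range(count):
        out.append(state[0])
        state = mat_vec(A, state)
    return out


def build_parallel(A, k):
    """Return (O, A^k), where row j of O is row 0 of A^j and selects output bit j."""
    n = len(A)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = identity
    rows = []
    for _ in range(k):
        rows.append(power[0])
        power = mat_mul(A, power)
    return rows, power


def parallel_bits(A, state, count, k):
    """Parallel generator: k output bits per clock (count must be a multiple of k)."""
    O, A_k = build_parallel(A, k)
    out = []
    for _ in range(count // k):
        out.extend(mat_vec(O, state))
        state = mat_vec(A_k, state)
    return out


A = companion_matrix(5, [0, 2])  # hypothetical LFSR: x(m+5) = x(m+2) + x(m)
seed = [1, 0, 0, 0, 0]
# The 4-output generator reproduces the serial sequence in a quarter of the clocks.
assert parallel_bits(A, seed, 24, 4) == serial_bits(A, seed, 24)
```

Both O and A^k are fixed once k is chosen, so in hardware they reduce to constant XOR networks. That is consistent with the modest area increase reported in the abstract, although the paper's actual circuit may differ.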

Parallel Genetic Algorithm based on a Multiprocessor System FIN and Its Application to a Classifier Machine

  • 한명묵
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.8 no.5
    • /
    • pp.61-71
    • /
    • 1998
  • A genetic algorithm (GA) is a method of approaching optimization problems by modeling and simulating biological evolution. Because a GA is very time-consuming, it is better run on a parallel computer architecture. Our proposed system has a VLSI-oriented interconnection network, which is constructed from the viewpoint of fractal geometry, so that self-similarity is considered in its configuration. The approach to the parallel genetic algorithm (PGA) on our proposed system is explained, and then we construct a classifier system such that a set of samples is classified into several classes based on the features of each sample. In the process of designing the classifier system, we applied the PGA to the Traveling Salesman Problem and classified the sample set in Euclidean space into several categories using a distance measure.

Implementation of a 'Rasterization based on Vector Algorithm' suited for a Multi-thread Shader architecture (Multi-Thread 쉐이더 구조에 적합한 Vector 기반의 Rasterization 알고리즘의 구현)

  • Lee, Ju-Suk;Kim, Woo-Young;Lee, Bo-Haeng;Lee, Kwang-Yeob
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.10
    • /
    • pp.46-52
    • /
    • 2009
  • A Multi-Core/Multi-Thread architecture is adopted for the Shader processor to enhance processing performance. The Shader processor is designed to use its processing core IP for multiple purposes, such as Vertex-Shading, Rasterization, and Pixel-Shading. In this paper, we propose a 'Rasterization based on Vector Algorithm' that enables parallel pixel processing with the Multi-Core and Multi-Thread architecture of the Shader Core. The proposed algorithm requires only 2% of the operation count of the Scan-Line Algorithm and processes pixels independently.

A Multi-Scale Parallel Convolutional Neural Network Based Intelligent Human Identification Using Face Information

  • Li, Chen;Liang, Mengti;Song, Wei;Xiao, Ke
    • Journal of Information Processing Systems
    • /
    • v.14 no.6
    • /
    • pp.1494-1507
    • /
    • 2018
  • Intelligent human identification using face information has been a research hotspot, with applications ranging from the Internet of Things (IoT), intelligent self-service banks, and intelligent surveillance to public safety and intelligent access control. 2D face images can usually be captured from a long distance in an unconstrained environment. To fully exploit this advantage and make human recognition suitable for wider intelligent applications with higher security and convenience, several key difficulties must be addressed: gray-scale change caused by illumination variance; occlusion caused by glasses, hair, or scarves; and self-occlusion and deformation caused by pose or expression variation. Many solutions have been proposed to overcome these. However, most of them improve recognition performance under only one influencing factor, which still cannot meet the needs of real face recognition scenarios. In this paper, we propose a multi-scale parallel convolutional neural network architecture to extract deep, robust facial features with high discriminative ability. Extensive experiments are conducted on the CMU-PIE, extended FERET, and AR databases. The experimental results show that the proposed algorithm exhibits excellent discriminative ability compared with other existing algorithms.

A Software VIA based PC Cluster System on SCI Network (SCI 네트워크 상의 소프트웨어 VIA기반 PC글러스터 시스템)

  • Shin, Jeong-Hee;Chung, Sang-Hwa;Park, Se-Jin
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.4
    • /
    • pp.192-200
    • /
    • 2002
  • The performance of a PC cluster system is limited by the use of traditional communication protocols such as TCP/IP, because these protocols incur significant software overhead. To overcome this problem, systems based on user-level interfaces for message passing without kernel intervention have been developed. The Virtual Interface Architecture (VIA) is one of the representative user-level interfaces that provide low latency and high bandwidth. In this paper, a VIA system is implemented on a PC cluster based on a Scalable Coherent Interface (SCI) network. The system provides both message-passing and shared-memory programming environments and shows a maximum bandwidth of 84 MB/s and a latency of $8{\mu}s$. The system also shows better performance than other comparable computer systems in running parallel benchmark programs.

Design of a Single-chip Multiprocessor with On-chip Learning for Large Scale Neural Network Simulation (대규모 신경망 시뮬레이션을 위한 칩상 학습가능한 단일칩 다중 프로세서의 구현)

  • 김종문;송윤선;김명원
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.2
    • /
    • pp.149-158
    • /
    • 1996
  • In this paper, we describe the design and implementation of a digital neural chip and a parallel neural machine for simulating large-scale neural networks. The chip is a single-chip multiprocessor that has four digital neural processors (DNP-II) of the same architecture. Each DNP-II has its own program memory and data memory, and the chip operates as an MIMD (multiple-instruction, multiple-data) parallel processor. The DNP-II has an instruction set tailored to neural computation, which can be used to effectively simulate various neural network models, including on-chip learning. The DNP-II facilitates four-way data-driven communication, supporting the extensibility of parallel systems. The parallel neural machine consists of a host computer, processor boards, a buffer board, and an interface board. Each processor board consists of an 8*8 array of DNP-II (equivalently 2*2 neural chips). Each processor board can be built in various configurations, including linear array, 2-D mesh, and 2-D torus. This flexibility supports efficient mapping of neural network models onto the parallel structure. The neural system achieves a maximum performance of 40 GCPS (giga connections per second) with 16 processor boards.

Efficient Process Network Implementation of Ray-Tracing Application on Heterogeneous Multi-Core Systems

  • Jung, Hyeonseok;Yang, Hoeseok
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.5 no.4
    • /
    • pp.289-293
    • /
    • 2016
  • As more mobile devices are equipped with multi-core CPUs and are required to execute many compute-intensive multimedia applications, it is important to optimize these systems with the underlying parallel hardware architecture in mind. In this paper, we implement and optimize a ray-tracing application tailored to a given mobile computing platform with multiple heterogeneous processing elements. A lightweight ray-tracing application is specified and implemented in the Kahn process network (KPN) model of computation, which is known to be suitable for describing real-time applications. We take an open-source C/C++ implementation of ray tracing and adapt it to a KPN description in the Distributed Application Layer framework. Then, several possible configurations are evaluated on the target mobile computing platform (Exynos 5422), in which eight heterogeneous ARM cores are integrated. We derive the optimal degree of parallelism and a suitable distribution of the replicated tasks tailored to the target architecture.

Heterogeneous Parallel Architecture for Face Detection Enhancement

  • Albssami, Aishah;Sharaf, Sanaa
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.2
    • /
    • pp.193-198
    • /
    • 2022
  • Face detection is one of the most important aspects of image processing. It is considered a time-consuming problem in real-time applications such as surveillance systems, face recognition systems, attendance systems, and many others. At present, commodity hardware is becoming increasingly heterogeneous in terms of architectures, with co-processors such as GPUs and MICs. Using those co-processors along with existing traditional CPUs gives an algorithm a better chance to exploit both architectures and achieve faster implementations. This paper presents a hybrid implementation of face detection based on the local binary pattern (LBP) algorithm that is deployed on both a traditional CPU and a MIC co-processor to speed up the LBP algorithm. The experimental results show that the proposed implementation achieves a 3X speedup compared with using either architecture individually.

Recognition of the 3-D motion of a human arm with HIGIPS

  • Yao, Feng-Hui;Tamaki, Akikazu;Kato, Kiyoshi
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집)
    • /
    • 1991.10b
    • /
    • pp.1724-1729
    • /
    • 1991
  • This paper gives an overview of the HIGIPS design concepts and the prototype HIGIPS configuration, and discusses its application to recognition of the 3-D motion of a human arm. HIGIPS, which combines a pipeline architecture with a multiprocessor architecture, is a high-speed, high-performance, and low-cost N * M multi-microprocessor parallel machine, where N is the number of pipeline stages and M is the number of processors in each stage. The algorithm to recognize the motion of a human arm with a single TV camera was developed on a personal computer (NEC PC9801 series). As a constraint condition, some simple ring marks are used. A ring mark is attached to each joint of the arm so that the joint's centroid position can be obtained as the arm moves. These centroid positions in three-dimensional space are linked across successive pictures of the moving arm to recover its overall motion. This algorithm takes about 2 seconds to process one image frame on the general-purpose personal computer. This paper mainly discusses how to partition this algorithm and execute it on HIGIPS, and shows the resulting speedup. From this application, it is clear that HIGIPS is an efficient machine for image processing and recognition.
