• Title/Summary/Keyword: Eight-parallel

Design of an Image Processing ASIC Architecture using Parallel Approach with Zero or Little (Korean title: Design of a Parallel-Processing ASIC Architecture for Image Processing with Reduced Communication Overhead)

  • 안병덕;정지원;선우명훈
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.19 no.10
    • /
    • pp.2043-2052
    • /
    • 1994
  • This paper proposes a new parallel ASIC architecture for real-time image processing that reduces inter-processing element (inter-PE) communication overhead, called the Sliding Memory Plane (SliM) Image Processor. The SliM Image Processor consists of $3\times3$ processing elements (PEs) connected in a mesh topology. Because this topology scales easily, a set of SliM Image Processors can form a mesh-connected SIMD parallel architecture called the SliM Array Processor. Sliding means that all pixels are slid into all neighboring PEs without interrupting the PEs and without a coprocessor or a DMA controller. Since inter-PE communication and computation occur simultaneously, the inter-PE communication overhead, a significant disadvantage of existing machines, is greatly reduced. Two I/O planes provide buffering and reduce the data I/O overhead. In addition, a by-passing path provides eight-way connectivity with only four links. With these features, SliM shows a significant performance improvement. This paper presents the architectures of a PE and of the SliM Image Processor, and describes the design of an instruction set.
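
The sliding idea can be pictured with a small software model. This is a sketch under stated assumptions, not the SliM hardware or its instruction set: each PE holds one pixel, a slide moves the whole plane one hop over a north, south, east, or west link, and a diagonal neighbor is reached by composing a vertical and a horizontal hop, standing in for the by-passing path. The names `slide` and `conv3x3_by_sliding` are hypothetical, and the model performs the slides sequentially, whereas SliM overlaps them with computation.

```python
from typing import List

Plane = List[List[int]]  # one pixel per PE in a 2-D mesh


def slide(plane: Plane, di: int, dj: int) -> Plane:
    """One sliding step: PE (r, c) receives the pixel held by PE (r + di, c + dj).

    PEs on the border with no such neighbor receive 0 (zero padding).
    """
    rows, cols = len(plane), len(plane[0])
    return [
        [
            plane[r + di][c + dj] if 0 <= r + di < rows and 0 <= c + dj < cols else 0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def conv3x3_by_sliding(image: Plane, kernel: Plane) -> Plane:
    """3x3 neighborhood operation where every PE only reads its own local pixel.

    Vertical hops model the north/south links and horizontal hops the east/west
    links; a diagonal neighbor is one vertical hop followed by one horizontal hop
    (assumed stand-in for the by-passing path in a four-link mesh).
    """
    rows, cols = len(image), len(image[0])
    acc = [[0] * cols for _ in range(rows)]
    for di in (-1, 0, 1):
        vertical = slide(image, di, 0)
        for dj in (-1, 0, 1):
            moved = slide(vertical, 0, dj)
            weight = kernel[di + 1][dj + 1]
            for r in range(rows):
                for c in range(cols):
                    acc[r][c] += weight * moved[r][c]
    return acc
```

For example, with `image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]` and an all-ones kernel, the center PE accumulates the sum of all nine pixels, while a corner PE sees only its four in-range neighbors.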

Acceleration of Intrusion Detection for Multi-core Video Surveillance Systems (Korean title: Acceleration of Intrusion Detection Processing for Video Surveillance Systems Based on Multi-Core Processors)

  • Lee, Gil-Beom;Jung, Sang-Jin;Kim, Tae-Hwan;Lee, Myeong-Jin
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.12
    • /
    • pp.141-149
    • /
    • 2013
  • This paper presents a high-speed intrusion detection process for multi-core video surveillance systems. Based on an analysis of the conventional process, the intrusion detection was redesigned as a parallel process that is accelerated by using the multiple processing cores of contemporary computing systems. The proposed process performs intrusion detection in a per-frame parallel manner while respecting the data dependency between frames. It was validated by implementing a multi-threaded intrusion detection program. On a system with eight processing cores, the frame rate of the proposed program is up to 353.76% higher than that of the conventional one.
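
One way to parallelize per frame while respecting inter-frame dependency is to hand each worker an immutable (previous, current) frame pair, so no worker waits on another, and to collect results in frame order. This is a minimal sketch assuming a simple frame-differencing detector; the thresholds and function names are hypothetical and not taken from the paper. In CPython, threads illustrate the structure but do not speed up pure-Python arithmetic because of the GIL; real gains need native code or processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale frame as rows of pixel intensities

DIFF_THRESHOLD = 30  # hypothetical per-pixel change threshold
MIN_CHANGED = 3      # hypothetical changed-pixel count that counts as an intrusion


def detect(prev: Frame, cur: Frame) -> bool:
    """Frame differencing: flag an intrusion if enough pixels changed since prev."""
    changed = sum(
        1
        for prow, crow in zip(prev, cur)
        for p, c in zip(prow, crow)
        if abs(c - p) > DIFF_THRESHOLD
    )
    return changed >= MIN_CHANGED


def detect_parallel(frames: List[Frame], workers: int = 8) -> List[bool]:
    """Process pairs (frame i-1, frame i) concurrently; map() keeps frame order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(detect, frames[:-1], frames[1:]))
```

Because each task reads only frames that already exist, the result for frame i never depends on another worker having finished, which is what allows a fixed pool of eight workers to run ahead.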

FAST Design for Large-Scale Satellite Image Processing (Korean title: Design of the FAST System for Large-Volume Satellite Image Processing)

  • Lee, Youngrim;Park, Wanyong;Park, Hyunchun;Shin, Daesik
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.25 no.4
    • /
    • pp.372-380
    • /
    • 2022
  • This study proposes a distributed parallel processing system, called the Fast Analysis System for remote sensing daTa (FAST), for large-scale satellite image processing and analysis. FAST defines jobs as vertices and sequences, and distributes and processes them simultaneously. It manages data with the Hadoop Distributed File System, controls entire jobs with Apache Spark, and runs tasks in parallel on multiple slave nodes using Docker containers. FAST enables high-performance processing of progressively accumulating large volumes of satellite images. Because each unit task runs in Docker, existing source code can be reused to design and implement unit tasks. The system is also robust against software and hardware faults. To demonstrate its capability, we generated ortho-images from the original satellite images, a pre-processing step required for all image analyses. With FAST configured with eight slave nodes, processing one satellite image took less than 30 seconds. These results demonstrate the suitability and practical applicability of the FAST design.
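
A job described as vertices and sequences is a dependency graph, and a common way to extract concurrency from it is to group tasks into stages with Kahn's topological sort: every task in a stage depends only on earlier stages, so all of a stage's tasks can be dispatched to slave nodes at once. This is a sketch under that assumption; the task names are hypothetical and nothing here calls Spark, HDFS, or Docker APIs.

```python
from typing import Dict, List, Set


def schedule_stages(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into stages; deps maps each task to the tasks it must wait for."""
    tasks = set(deps) | {d for ds in deps.values() for d in ds}
    indegree = {t: 0 for t in tasks}
    children: Dict[str, List[str]] = {t: [] for t in tasks}
    for task, ds in deps.items():
        for d in ds:
            indegree[task] += 1
            children[d].append(task)

    ready = sorted(t for t in tasks if indegree[t] == 0)
    stages: List[List[str]] = []
    scheduled = 0
    while ready:
        stages.append(ready)  # everything in `ready` can run on separate nodes
        scheduled += len(ready)
        next_ready = []
        for t in ready:
            for child in children[t]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    if scheduled != len(tasks):
        raise ValueError("job graph contains a cycle")
    return stages
```

For a hypothetical job that loads two scenes, orthorectifies each, and then mosaics them, the result is three stages: both loads, both orthorectifications, then the mosaic.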

Comparison of Parallel and Fan-Beam Monochromatic X-Ray CT Using Synchrotron Radiation

  • Toyofuku, Fukai;Tokumori, Kenji;Kanda, Shigenobu;Ohki, Masafumi;Higashida, Yoshiharu;Hyodo, Kazuyuki;Ando, Masami;Uyama, Chikao
    • Proceedings of the Korean Society of Medical Physics Conference
    • /
    • 2002.09a
    • /
    • pp.407-410
    • /
    • 2002
  • Monochromatic x-ray CT has several advantages over conventional CT, which uses bremsstrahlung (white) x-rays from an x-ray tube. There are several methods of producing monochromatic x-rays. The most popular is crystal diffraction monochromatization, which is commonly used because the energy spread is very narrow and the energy can be changed continuously. An alternative method uses fluorescent x-rays, which offer advantages such as a large beam size and fast energy switching. We have developed parallel-beam and fan-beam monochromatic x-ray CT systems and compared characteristics such as the accuracy of CT numbers between them. The fan-beam monochromatic x-rays were generated by irradiating target materials with white x-rays from the bending-magnet beamline NE5 of the 6.5 GeV Accumulation Ring at Tsukuba. The parallel-beam monochromatic x-rays were generated with a silicon double-crystal monochromator at the bending-magnet beamline BL-20BM at SPring-8. A room-temperature cadmium telluride (CdTe) 256-channel array detector with a 512 mm sensitive width was operated in photon-counting mode. A cylindrical phantom containing eight concentrations of gadolinium was used for the fan-beam system, and a phantom containing acetone, ethanol, acrylic, and water was used for the parallel-beam system. The linear attenuation coefficients obtained from the CT numbers of the monochromatic x-ray CT images were compared with theoretical values and agreed within 3%. Quantitative measurement is therefore possible with the fan-beam monochromatic x-ray CT system as well as with the parallel-beam system.
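
Comparing CT numbers with theoretical attenuation coefficients requires converting one into the other. A minimal sketch, assuming the standard Hounsfield definition CT = 1000 (mu - mu_water) / mu_water; the paper may calibrate differently, and the numeric values below are hypothetical, not measurements from the study.

```python
def mu_from_ct(ct_number: float, mu_water: float) -> float:
    """Invert the CT-number definition CT = 1000 * (mu - mu_water) / mu_water."""
    return mu_water * (1.0 + ct_number / 1000.0)


def relative_deviation(measured: float, theoretical: float) -> float:
    """Fractional difference between measured and theoretical attenuation coefficients."""
    return abs(measured - theoretical) / theoretical


# Hypothetical example values in 1/cm; not data from the paper.
MU_WATER = 0.20
measured_mu = mu_from_ct(100.0, MU_WATER)  # about 0.22
deviation = relative_deviation(measured_mu, 0.215)
within_3_percent = deviation <= 0.03
```

With a monochromatic beam, mu_water is a single well-defined value at the chosen energy, which is part of why the conversion is more direct than with a polychromatic tube spectrum.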

High-throughput Low-complexity Mixed-radix FFT Processor using a Dual-path Shared Complex Constant Multiplier

  • Nguyen, Tram Thi Bao;Lee, Hanho
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.17 no.1
    • /
    • pp.101-109
    • /
    • 2017
  • This paper presents a high-throughput low-complexity 512-point eight-parallel mixed-radix multipath delay feedback (MDF) fast Fourier transform (FFT) processor architecture for orthogonal frequency division multiplexing (OFDM) applications. To decrease the number of twiddle factor (TF) multiplications, a mixed-radix $2^4/2^3$ FFT algorithm is adopted. Moreover, a dual-path shared canonical signed digit (CSD) complex constant multiplier using a multi-layer scheme is proposed for reducing the hardware complexity of the TF multiplication. The proposed FFT processor is implemented using TSMC 90-nm CMOS technology. The synthesis results demonstrate that the proposed FFT processor can lead to a 16% reduction in hardware complexity and higher throughput compared to conventional architectures.
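
A canonical signed digit (CSD) multiplier replaces a constant multiplication with shifts and additions or subtractions, one per nonzero digit, and CSD (the non-adjacent form) has the fewest nonzero digits of any signed-digit form. The sketch below shows plain single-constant CSD recoding, not the paper's dual-path shared multi-layer multiplier; the function names and the example constant are hypothetical.

```python
from typing import List


def csd_digits(n: int) -> List[int]:
    """CSD (non-adjacent form) digits of n in {-1, 0, 1}, least significant first."""
    digits = []
    while n != 0:
        if n % 2:              # odd: choose +1 or -1 so the remainder is divisible by 4
            d = 2 - (n % 4)    # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits


def csd_multiply(x: int, constant: int) -> int:
    """Multiply x by a fixed constant using only shifts and add/subtract operations."""
    acc = 0
    for shift, d in enumerate(csd_digits(constant)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc
```

For instance, 59 is `0b111011` with five ones in binary, but its CSD form is 64 - 4 - 1, so a hardwired multiplier needs three terms instead of five. Savings of this kind per twiddle factor are what reduce the cost of the TF multiplications.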

A Study on the Design of Modified Banyan Switch for High Speed Communication network (Korean title: A Study on the Design of an Improved Banyan Switch for High-Speed Communication Networks)

  • 조삼호;권승탁;김용석
    • Proceedings of the IEEK Conference
    • /
    • 1999.06a
    • /
    • pp.122-125
    • /
    • 1999
  • In this paper, we propose a new Banyan switch architecture for high-speed networking and high-speed parallel computers. The proposed switching network is a modified Banyan network with eight input ports and eight output ports. We analyzed the maximum throughput of the revised switch; under uniform random traffic, throughput with the FIFO discipline is limited to 70%. The analytical results are consistent with network simulations of the new switch, and adopting the revised Banyan architecture can reduce hardware complexity. Compared with an input-buffered switching system, throughput under the FIFO discipline increased by about 11%. We designed and verified the new switching system in VHDL.
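
The FIFO throughput limit under uniform random traffic comes from head-of-line (HOL) blocking: a packet stuck behind a losing packet cannot advance even if its own output is idle. The sketch below is the classic saturated input-queued model (every input always has a HOL packet, destinations uniform, random choice among contenders), not the authors' modified Banyan switch; the function name and parameters are hypothetical. For this textbook model an 8 x 8 switch saturates at roughly 0.62, tending toward 2 - sqrt(2), about 0.586, for large switches, so its output need not match the figure reported for the modified design.

```python
import random
from typing import Dict, List, Optional


def hol_saturation_throughput(n_ports: int = 8, slots: int = 20000,
                              seed: Optional[int] = 1) -> float:
    """Monte Carlo estimate of saturated throughput for an N x N FIFO input-queued switch."""
    rng = random.Random(seed)
    # Saturation: each input always has a head-of-line packet with a random destination.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(slots):
        contenders: Dict[int, List[int]] = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)           # one packet per output per slot
            hol[winner] = rng.randrange(n_ports)  # the next queued packet moves to the head
            delivered += 1                        # losers keep their blocked HOL packet
    return delivered / (slots * n_ports)
```

Raising `slots` tightens the estimate; the value for `n_ports=2` should sit near 0.75, matching the known small-switch result for this model.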

Efficient Process Network Implementation of Ray-Tracing Application on Heterogeneous Multi-Core Systems

  • Jung, Hyeonseok;Yang, Hoeseok
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.5 no.4
    • /
    • pp.289-293
    • /
    • 2016
  • As more mobile devices are equipped with multi-core CPUs and are required to run many compute-intensive multimedia applications, it is important to optimize these systems for the underlying parallel hardware architecture. In this paper, we implement and optimize a ray-tracing application tailored to a mobile computing platform with multiple heterogeneous processing elements. A lightweight ray-tracing application is specified and implemented in the Kahn process network (KPN) model of computation, which is known to be suitable for describing real-time applications. We take an open-source C/C++ implementation of ray tracing and adapt it to a KPN description in the Distributed Application Layer framework. Several possible configurations are then evaluated on the target mobile computing platform (Exynos 5422), which integrates eight heterogeneous ARM cores. We derive the optimal degree of parallelism and a suitable distribution of the replicated tasks for the target architecture.
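
In a Kahn process network, processes communicate only through unbounded FIFO channels with blocking reads, which makes the output deterministic regardless of scheduling. Replicating a task stays deterministic if the producer distributes work round-robin and the consumer reads the replicas back in the same order. This is a sketch of that structure only: `shade` is a hypothetical stand-in for tracing a ray, the code does not use the Distributed Application Layer API, and Python threads show the topology rather than real speedup.

```python
import queue
import threading
from typing import List


def shade(pixel: int) -> int:
    """Hypothetical stand-in for tracing one primary ray; returns a fake color value."""
    return (pixel * 2654435761) % 256


def run_kpn(num_pixels: int, replicas: int) -> List[int]:
    """Source -> `replicas` tracer processes -> sink, connected by FIFO channels."""
    to_workers = [queue.Queue() for _ in range(replicas)]
    from_workers = [queue.Queue() for _ in range(replicas)]
    stop = object()  # end-of-stream token

    def source() -> None:
        for p in range(num_pixels):
            to_workers[p % replicas].put(p)  # fixed round-robin distribution
        for ch in to_workers:
            ch.put(stop)

    def tracer(i: int) -> None:
        while (p := to_workers[i].get()) is not stop:  # blocking read, as in a KPN
            from_workers[i].put(shade(p))

    threads = [threading.Thread(target=source)]
    threads += [threading.Thread(target=tracer, args=(i,)) for i in range(replicas)]
    for t in threads:
        t.start()
    # Sink: read the channels in the same round-robin order, so pixel order is preserved.
    image = [from_workers[p % replicas].get() for p in range(num_pixels)]
    for t in threads:
        t.join()
    return image
```

Changing `replicas` changes only how work is spread across processes, not the image, which is the property that lets the degree of parallelism be tuned per platform.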

Control Gain Tuning of a Simultaneous Multi-Axis PID Control System by Taguchi Method (Korean title: Control Gain Tuning of a Simultaneous Multi-Axis PID Control System Using the Taguchi Method)

  • Lee, Ki-Ha;Kim, Jong-Won
    • Journal of the Korean Society for Precision Engineering
    • /
    • v.16 no.6
    • /
    • pp.25-35
    • /
    • 1999
  • This paper presents a control gain tuning scheme for multi-axis PID control systems based on the Taguchi method. A parallel-mechanism machine tool was chosen as the experimental setup. This machine has eight servodrives, each with four control gains, so a total of 32 control gains must be tuned. Through a series of designed experiments, an optimal and robust set of PID control gains is obtained. After the experimental gain tuning, the performance index, defined as the sum of the position error and the velocity error, is reduced to 61.4% regardless of feedrate variation.
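
The core of a Taguchi design is an orthogonal array plus a signal-to-noise (S/N) ratio. A minimal sketch for one servodrive, assuming its four gains are the four factors at three candidate levels each, using the standard L9(3^4) array and the smaller-the-better S/N ratio, with error measurements at several feedrates treated as the noise condition. The factor mapping and function names are hypothetical; tuning all 32 gains would need a larger design than this.

```python
import math
from typing import List, Sequence, Tuple

# Standard L9(3^4) orthogonal array: 9 runs, 4 factors, 3 levels each.
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]


def sn_smaller_is_better(errors: Sequence[float]) -> float:
    """Taguchi S/N ratio for a smaller-the-better response: -10 log10(mean(y^2))."""
    return -10.0 * math.log10(sum(y * y for y in errors) / len(errors))


def best_levels(responses: List[Sequence[float]]) -> Tuple[int, ...]:
    """Main-effect analysis: per factor, pick the level with the highest mean S/N ratio.

    responses[k] holds the error index of run k measured under each noise condition
    (for example, several feedrates).
    """
    sn = [sn_smaller_is_better(r) for r in responses]
    best = []
    for factor in range(4):
        mean_sn = {}
        for level in (1, 2, 3):
            vals = [s for run, s in zip(L9, sn) if run[factor] == level]
            mean_sn[level] = sum(vals) / len(vals)
        best.append(max(mean_sn, key=mean_sn.get))
    return tuple(best)
```

Maximizing S/N rather than simply minimizing the mean error favors gain settings whose error stays low across all feedrates, which corresponds to the robustness the abstract describes.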

Design of Modified Banyan Switch for High Speed Communication Network

  • Kwon, Seung-Tag;Cho, Sam-Ho
    • Proceedings of the IEEK Conference
    • /
    • 2000.07a
    • /
    • pp.537-540
    • /
    • 2000
  • In this paper, we propose and design a new architecture of the modified Banyan switch for high-speed networking and high-speed parallel computers. The proposed switching network is a modified Banyan network with eight input ports and eight output ports. The switching scheme accounts for two packets that arrive on different inputs but are destined for the same output. We analyzed the maximum throughput of the revised switch, and the analytical results agree well with simulation; adopting this revised Banyan architecture can reduce hardware complexity. Compared with an input-buffered switching system, throughput under the FIFO discipline increased by about 11%. We designed and verified the switching system in VHDL.
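
Banyan-class networks are self-routing: at each 2 x 2 switching stage a packet is steered by one bit of its destination address. The sketch below uses the omega (shuffle-exchange) network, a member of the banyan class, as an unmodified 8 x 8 baseline; it is an illustrative assumption, not the authors' modified topology, and the function names are hypothetical. It also shows internal blocking: packets from inputs 0 and 4 headed to outputs 0 and 1 claim the same first-stage link even though their outputs differ.

```python
from typing import Dict, List, Set, Tuple


def omega_route(src: int, dst: int, n_bits: int = 3) -> List[int]:
    """Link positions a packet occupies after each stage of a 2**n_bits omega network."""
    size = 1 << n_bits
    pos, path = src, []
    for stage in range(n_bits):
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & (size - 1)  # perfect shuffle
        bit = (dst >> (n_bits - 1 - stage)) & 1                  # routing tag bit, MSB first
        pos = (pos & ~1) | bit                                   # 2x2 switch: 0 = upper, 1 = lower
        path.append(pos)
    return path


def internal_conflicts(requests: Dict[int, int], n_bits: int = 3) -> List[Tuple[int, int]]:
    """(stage, link) pairs that a packet finds already claimed by an earlier packet."""
    claimed: Set[Tuple[int, int]] = set()
    clashes: List[Tuple[int, int]] = []
    for src, dst in requests.items():
        for stage, link in enumerate(omega_route(src, dst, n_bits)):
            if (stage, link) in claimed:
                clashes.append((stage, link))
            claimed.add((stage, link))
    return clashes
```

After the last stage every packet sits at its destination port, and permutations such as the identity pass without conflict; the two-packet case above is the kind of contention that buffering or a modified topology must resolve.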

Efficient Implementation of CG and CR Methods for Linear Systems on a Single Processing Node of the HITACHI SR8000

  • Nishimura, S.;Takahashi, D.;Shigehara, T.;Mizoguchi, H.;Mishima, T.
    • Proceedings of the IEEK Conference
    • /
    • 2000.07a
    • /
    • pp.298-301
    • /
    • 2000
  • We discuss iterative methods for linear systems on a single processing node of the HITACHI SR8000. Each processing node of the SR8000 is a shared-memory parallel computer composed of eight RISC processors with a pseudo-vector facility. We implement highly optimized code for basic linear operations, including the matrix-vector product, and apply it to the conjugate gradient (CG) and conjugate residual (CR) methods for linear systems. Our tuned code for both methods achieves nearly 50% of the theoretical peak performance, which is the best attainable in the sense that it matches the asymptotic performance of the inner product.
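
The CG method spends its time in exactly the kernels named above: one matrix-vector product and a few inner products and vector updates per iteration. The sketch below is the textbook unpreconditioned CG for a symmetric positive-definite matrix, written in plain Python to expose those kernels; it has nothing SR8000-specific, and the helper names are hypothetical. The CR method uses the same kernels, with inner products involving A r and A p.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def dot(x: Vector, y: Vector) -> float:
    """Inner product: the kernel whose speed bounds the whole solver."""
    return sum(a * b for a, b in zip(x, y))


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [dot(row, x) for row in a]


def conjugate_gradient(a: Matrix, b: Vector, tol: float = 1e-10,
                       max_iter: Optional[int] = None) -> Vector:
    """Solve a x = b for symmetric positive-definite a, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - a x for x = 0
    p = list(r)  # first search direction
    rr = dot(r, r)
    for _ in range(max_iter if max_iter is not None else 10 * n):
        if rr ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x
```

For the 2 x 2 system with matrix [[4, 1], [1, 3]] and right-hand side [1, 2], CG reaches x = [1/11, 7/11] in two iterations, up to rounding.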
