• Title/Summary/Keyword: Parallel Data Communication

Search Result 342, Processing Time 0.027 seconds

Design of Low-power Serial-to-Parallel and Parallel-to-Serial Converter using Current-cut method (전류 컷 기법을 적용한 저전력형 직병렬/병직렬 변환기 설계)

  • Park, Yong-Woon;Hwang, Sung-Ho;Cha, Jae-Sang;Yang, Chung-Mo;Kim, Sung-Kweon
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.10A
    • /
    • pp.776-783
    • /
    • 2009
  • Current-cut circuit is an effective method to obtain low power consumption in wireless communication systems as high speed OFDM. For the operation of current-mode FFT LSI with analog signal processing essentially requires current-mode serial-to-parallel/parallel-to-serial converter with multi input and output structure. However, the Hold-mode operation of current-mode serial-to-parallel/parallel-to-serial converter has unnecessary power consumption. We propose a novel current-mode serial-to-parallel/parallel-to-serial converter with current-cut circuit and full chip simulation results agree with experimental data of low power consumption. The proposed current-mode serial-to-parallel/parallel-to-serial converter promise the wide application of the current-mode analog signal processing in the field of low power wireless communication LSI.

DEVS 형식론을 이용한 다중프로세서 운영체제의 모델링 및 성능평가

  • 홍준성
    • Proceedings of the Korea Society for Simulation Conference
    • /
    • 1994.10a
    • /
    • pp.32-32
    • /
    • 1994
  • In this example, a message passing based multicomputer system with general interdonnedtion network is considered. After multicomputer systems are developed with morm-hole routing network, topologies of interconecting network are not major considertion for process management and resource sharing. Tehre is an independeent operating system kernel oneach node. It communicates with other kernels using message passingmechanism. Based on this architecture, the problem is how mech does performance degradation will occur in the case of processor sharing on multicomputer systems. Processor sharing between application programs is veryimprotant decision on system performance. In almost cases, application programs running on massively parallel computer systems are not so much user-interactive. Thus, the main performance index is system throughput. Each application program has various communication patterns. and the sharing of processors causes serious performance degradation in hte worst case such that one processor is shared by two processes and another processes are waiting the messages from those processes. As a result, considering this problem is improtant since it gives the reason whether the system allows processor sharingor not. Input data has many parameters in this simulation . It contains the number of threads per task , communication patterns between threads, data generation and also defects in random inupt data. Many parallel aplication programs has its specific communication patterns, and there are computation and communication phases. Therefore, this phase informatin cannot be obtained random input data. If we get trace data from some real applications. we can simulate the problem more realistic . On the other hand, simualtion results will be waseteful unless sufficient trace data with varisous communication patterns is gathered. In this project , random input data are used for simulation . Only controllable data are the number of threads of each task and mapping strategy. First, each task runs independently. After that , each task shres one and more processors with other tasks. As more processors are shared , there will be performance degradation . Form this degradation rate , we can know the overhead of processor sharing . Process scheduling policy can affects the results of simulation . For process scheduling, priority queue and FIFO queue are implemented to support round-robin scheduling and priority scheduling.

  • PDF

A Study on Distributed System Construction and Numerical Calculation Using Raspberry Pi

  • Ko, Young-ho;Heo, Gyu-Seong;Lee, Sang-Hyun
    • International journal of advanced smart convergence
    • /
    • v.8 no.4
    • /
    • pp.194-199
    • /
    • 2019
  • As the performance of the system increases, more parallelized data is being processed than single processing of data. Today's cpu structure has been developed to leverage multicore, and hence data processing methods are being developed to enable parallel processing. In recent years desktop cpu has increased multicore, data is growing exponentially, and there is also a growing need for data processing as artificial intelligence develops. This neural network of artificial intelligence consists of a matrix, making it advantageous for parallel processing. This paper aims to speed up the processing of the system by using raspberrypi to implement the cluster building and parallel processing system against the backdrop of the foregoing discussion. Raspberrypi is a credit card-sized single computer made by the raspberrypi Foundation in England, developed for education in schools and developing countries. It is cheap and easy to get the information you need because many people use it. Distributed processing systems should be supported by programs that connected multiple computers in parallel and operate on a built-in system. RaspberryPi is connected to switchhub, each connected raspberrypi communicates using the internal network, and internally implements parallel processing using the Message Passing Interface (MPI). Parallel processing programs can be programmed in python and can also use C or Fortran. The system was tested for parallel processing as a result of multiplying the two-dimensional arrangement of 10000 size by 0.1. Tests have shown a reduction in computational time and that parallelism can be reduced to the maximum number of cores in the system. The systems in this paper are manufactured on a Linux-based single computer and are thought to require testing on systems in different environments.

PARALLEL IMPROVEMENT IN STRUCTURED CHIMERA GRID ASSEMBLY FOR PC CLUSTER (PC 클러스터를 위한 정렬 중첩 격자의 병렬처리)

  • Kim, Eu-Gene;Kwon, Jang-Hyuk
    • 한국전산유체공학회:학술대회논문집
    • /
    • 2005.10a
    • /
    • pp.157-162
    • /
    • 2005
  • Parallel implementation and performance assessment of the grid assembly in a structured chimera grid approach is studied. The grid assembly process, involving hole cutting and searching donor, is parallelized on the PC cluster. A message passing programming model based on the MPI library is implemented using the single program multiple data(SPMD) paradigm. The coarse-grained communication is optimized with the minimized memory allocation because that the parallel grid assembly can access the decomposed geometry data in other processors by only message passing in the distributed memory system such as a PC cluster. The grid assembly workload is based on the static load balancing tied to flow solver. A goal of this work is a development of parallelized grid assembly that is suited for handling multiple moving body problems with large grid size.

  • PDF

A Parallel Test Structure for eDRAM-based Tightly Coupled Memory in SoCs (시스템 온 칩 내 eDRAM을 사용한 Tightly Coupled Memory의 병렬 테스트 구조)

  • Kook, In-Sung;Lee, Jae-Min
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.4 no.3
    • /
    • pp.209-216
    • /
    • 2011
  • Recently the design of SoCs(System-on-Chips) in which TCM is embedded for high speed operation increases rapidly. In this paper, a parallel test structure for eDRAM-based TCM embedded in SoCs is proposed. In the presented technique, the MUT (Memory Under Test) is changed to parallel structure and it increases testability of MUT with boundary scan chains. The eDRAM is designed in structure for parallel test so that it can be tested for each modules. Dynamic test can be performed based on input-output data. The proposed techniques are verified their performance by circuits simulation.

A Container Orchestration System for Process Workloads

  • Jong-Sub Lee;Seok-Jae Moon
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.15 no.4
    • /
    • pp.270-278
    • /
    • 2023
  • We propose a container orchestration system for process workloads that combines the potential of big data and machine learning technologies to integrate enterprise process-centric workloads. This proposed system analyzes big data generated from industrial automation to identify hidden patterns and build a machine learning prediction model. For each machine learning case, training data is loaded into a data store and preprocessed for model training. In the next step, you can use the training data to select and apply an appropriate model. Then evaluate the model using the following test data: This step is called model construction and can be performed in a deployment framework. Additionally, a visual hierarchy is constructed to display prediction results and facilitate big data analysis. In order to implement parallel computing of PCA in the proposed system, several virtual systems were implemented to build the cluster required for the big data cluster. The implementation for evaluation and analysis built the necessary clusters by creating multiple virtual machines in a big data cluster to implement parallel computation of PCA. The proposed system is modeled as layers of individual components that can be connected together. The advantage of a system is that components can be added, replaced, or reused without affecting the rest of the system.

A High-Speed 2-Parallel Radix-$2^4$ FFT Processor for MB-OFDM UWB Systems (MB-OFDM UWB 통신 시스템을 위한 고속 2-Parallel Radix-$2^4$ FFT 프로세서의 설계)

  • Lee, Jee-Sung;Lee, Han-Ho
    • Proceedings of the IEEK Conference
    • /
    • 2006.06a
    • /
    • pp.533-534
    • /
    • 2006
  • This paper presents the architecture design of a high-speed, low-complexity 128-point radix-$2^4$ FFT processor for ultra-wideband (UWB) systems. The proposed high-speed, low-complexity FFT architecture can provide a higher throughput rate and low hardware complexity by using 2-parallel data-path scheme and single-path delay-feedback (SDF) structure. This paper presents the key ideas applied to the design of high-speed, low-complexity FFT processor, especially that for achieving high throughput rate and reducing hardware complexity. The proposed FFT processor has been designed and implemented with the 0.18-m CMOS technology in a supply voltage of 1.8 V. The throughput rate of proposed FFT processor is up to 1 Gsample/s while it requires much smaller hardware complexity.

  • PDF

Design of a Dingle-chip Multiprocessor with On-chip Learning for Large Scale Neural Network Simulation (대규모 신경망 시뮬레이션을 위한 칩상 학습가능한 단일칩 다중 프로세서의 구현)

  • 김종문;송윤선;김명원
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.2
    • /
    • pp.149-158
    • /
    • 1996
  • In this paper we describe designing and implementing a digital neural chip and a parallel neural machine for simulating large scale neural netsorks. The chip is a single-chip multiprocessor which has four digiral neural processors (DNP-II) of the same architecture. Each DNP-II has program memory and data memory, and the chip operates in MIMD (multi-instruction, multi-data) parallel processor. The DNP-II has the instruction set tailored to neural computation. Which can be sed to effectively simulate various neural network models including on-chip learning. The DNP-II facilitates four-way data-driven communication supporting the extensibility of parallel systems. The parallel neural machine consists of a host computer, processor boards, a buffer board and an interface board. Each processor board consists of 8*8 array of DNP-II(equivalently 2*2 neural chips). Each processor board acn be built including linear array, 2-D mesh and 2-D torus. This flexibility supports efficiency of mapping from neural network models into parallel strucgure. The neural system accomplishes the performance of maximum 40 GCPS(giga connection per second) with 16 processor boards.

  • PDF

Novel Parallel Approach for SIFT Algorithm Implementation

  • Le, Tran Su;Lee, Jong-Soo
    • Journal of information and communication convergence engineering
    • /
    • v.11 no.4
    • /
    • pp.298-306
    • /
    • 2013
  • The scale invariant feature transform (SIFT) is an effective algorithm used in object recognition, panorama stitching, and image matching. However, due to its complexity, real-time processing is difficult to achieve with current software approaches. The increasing availability of parallel computers makes parallelizing these tasks an attractive approach. This paper proposes a novel parallel approach for SIFT algorithm implementation using a block filtering technique in a Gaussian convolution process on the SIMD Pixel Processor. This implementation fully exposes the available parallelism of the SIFT algorithm process and exploits the processing and input/output capabilities of the processor, which results in a system that can perform real-time image and video compression. We apply this implementation to images and measure the effectiveness of such an approach. Experimental simulation results indicate that the proposed method is capable of real-time applications, and the result of our parallel approach is outstanding in terms of the processing performance.

Performance Improvement of Prediction-Based Parallel Gate-Level Timing Simulation Using Prediction Accuracy Enhancement Strategy (예측정확도 향상 전략을 통한 예측기반 병렬 게이트수준 타이밍 시뮬레이션의 성능 개선)

  • Yang, Seiyang
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.5 no.12
    • /
    • pp.439-446
    • /
    • 2016
  • In this paper, an efficient prediction accuracy enhancement strategy is proposed for improving the performance of the prediction-based parallel event-driven gate-level timing simulation. The proposed new strategy adopts the static double prediction and the dynamic prediction for input and output values of local simulations. The double prediction utilizes another static prediction data for the secondary prediction once the first prediction fails, and the dynamic prediction tries to use the on-going simulation result accumulated dynamically during the actual parallel simulation execution as prediction data. Therefore, the communication overhead and synchronization overhead, which are the main bottleneck of parallel simulation, are maximally reduced. Throughout the proposed two prediction enhancement techniques, we have observed about 5x simulation performance improvement over the commercial parallel multi-core simulation for six test designs.