• Title/Summary/Keyword: Parallel Process


Design of Contention Free Parallel MAP Decode Module (메모리 경합이 없는 병렬 MAP 복호 모듈 설계)

  • Chung, Jae-Hun;Rim, Chong-Suck
    • Journal of the Institute of Electronics Engineers of Korea SD / v.48 no.1 / pp.39-49 / 2011
  • Turbo codes require long decoding times because of iterative decoding. For high-speed communication the decoding time must be shortened, which is possible with parallel processing. Parallel processing, however, can cause memory contention, which degrades decoder performance. The QPP interleaver was proposed in 2006 to avoid such contention. In this paper, we propose the MDF method, which suits the QPP interleaver and offers a relatively short decoding time with reduced logic, and we present the design of a MAP decode module that uses it. The decoder is targeted at a Xilinx FPGA and achieves a maximum throughput of 80 Mbps.
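As an aside on the QPP interleaver mentioned in the abstract above, the sketch below computes the permutation pi(x) = (f1*x + f2*x^2) mod K and checks its contention-free property for parallel decoding windows. The coefficients are the LTE values for K = 40, chosen purely for illustration; the paper's MDF method itself is not reproduced here.

```python
# Illustrative QPP interleaver pi(x) = (f1*x + f2*x^2) mod K and a check of
# its contention-free property.  K, f1, f2 are the LTE values for K = 40,
# chosen only for illustration.
K, f1, f2 = 40, 3, 10
pi = [(f1 * x + f2 * x * x) % K for x in range(K)]

def contention_free(perm, W):
    """True if, for every intra-window offset, the parallel windows of size W
    read from distinct memory banks (bank index = address // W)."""
    M = len(perm) // W                              # number of windows/banks
    for offset in range(W):
        banks = {perm[offset + v * W] // W for v in range(M)}
        if len(banks) != M:                         # two windows hit one bank
            return False
    return True

# QPP interleavers are contention-free for every window size dividing K.
for W in (2, 4, 5, 8, 10, 20):
    print(W, contention_free(pi, W))                # expected: all True
```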

An Optimized Approach of Fault Distribution for Debugging in Parallel

  • Srivasatav, Maneesha;Singh, Yogesh;Chauhan, Durg Singh
    • Journal of Information Processing Systems / v.6 no.4 / pp.537-552 / 2010
  • Software debugging is the most time-consuming and costly activity in the software development process. Many techniques have been proposed to isolate different faults in a program, thereby creating separate sets of failing program statements. Debugging in parallel distributes a single faulty program segment into many fault-focused program slices that can be debugged simultaneously by multiple debuggers. In this paper we propose a new technique, Faulty Slice Distribution (FSD), that makes parallel debugging more efficient by measuring the time and labor associated with each slice. Using this measure, we distribute the faulty slices evenly among debuggers: we propose an algorithm that estimates an optimized grouping of faulty slices, using as a parameter the priority assigned to each slice as computed from its complexity. This supports the efficient merging of two or more slices for distribution among debuggers so that debugging can be performed in parallel. To validate the effectiveness of the proposed technique, we explain the process with an example.
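The abstract describes distributing complexity-weighted faulty slices evenly among debuggers. The sketch below is a generic greedy (LPT-style) balancing routine under that reading, not the paper's FSD algorithm; the slice names and complexity scores are hypothetical.

```python
import heapq

def distribute_slices(slice_complexity, num_debuggers):
    """Greedy (LPT-style) distribution: assign each faulty slice, in decreasing
    order of complexity, to the currently least-loaded debugger.  Returns a
    dict mapping debugger id -> list of assigned slice ids."""
    heap = [(0, d) for d in range(num_debuggers)]   # (current load, debugger)
    heapq.heapify(heap)
    assignment = {d: [] for d in range(num_debuggers)}
    for slice_id, cost in sorted(slice_complexity.items(),
                                 key=lambda kv: kv[1], reverse=True):
        load, d = heapq.heappop(heap)               # least-loaded debugger
        assignment[d].append(slice_id)
        heapq.heappush(heap, (load + cost, d))
    return assignment

# Hypothetical complexity scores for six fault-focused slices, three debuggers.
print(distribute_slices({"s1": 9, "s2": 7, "s3": 6, "s4": 4, "s5": 3, "s6": 1}, 3))
```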

The Design of Parallel Processing S/W Using CUDA for Realtime 3D Laser Ladar Imaging System (실시간 3차원 레이저 레이더 영상 생성을 위한 CUDA 기반 병렬처리 소프트웨어 설계)

  • Cho, Yong Il;Ha, Choong Lim;Yang, Ji Hyeon;Kim, Jae Hyup
    • Journal of the Korea Society of Computer and Information / v.18 no.1 / pp.1-10 / 2013
  • In this paper, we propose a CUDA (Compute Unified Device Architecture) based software design method with a CPU (Central Processing Unit) and GPU (Graphic Processing Unit) parallel structure to realize real-time processing in a 3D laser radar (LADAR) imaging system. LADAR is a complex system that generates 3-dimensional images from laser ranging information and requires massive processing resources in each phase. Designing and implementing a parallel structure is therefore crucial to realizing real-time processing within limited system resources. By analyzing the processing algorithm of each phase and allocating the separable workload to the CUDA GPU, we meet the required real-time processing speed and confirm a 46% increase in processing speed.
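The per-pixel range-to-point conversion is the kind of separable, embarrassingly parallel workload the abstract assigns to the CUDA GPU. The sketch below uses NumPy vectorization on the CPU as a stand-in for a per-pixel CUDA kernel; the scan geometry and array names are assumptions, not the paper's pipeline.

```python
import numpy as np

def ranges_to_points(rng, az, el):
    """Convert per-pixel range measurements (m) plus azimuth/elevation angles
    (rad) into Cartesian points.  The arithmetic is independent per pixel,
    which is what makes it a natural CUDA kernel (one thread per pixel); the
    NumPy version below is simply the vectorized CPU stand-in."""
    x = rng * np.cos(el) * np.cos(az)
    y = rng * np.cos(el) * np.sin(az)
    z = rng * np.sin(el)
    return np.stack([x, y, z], axis=-1)

# Hypothetical 256 x 256 scan geometry.
h, w = 256, 256
rng = np.random.uniform(10.0, 200.0, size=(h, w))
az = np.tile(np.linspace(-0.3, 0.3, w), (h, 1))
el = np.tile(np.linspace(-0.2, 0.2, h)[:, None], (1, w))
points = ranges_to_points(rng, az, el).reshape(-1, 3)
print(points.shape)                                 # (65536, 3)
```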

Customer Order Scheduling Problem on Parallel Machines with Identical Order Size

  • Yang, Jae-Hwan
    • Management Science and Financial Engineering / v.13 no.2 / pp.47-77 / 2007
  • This paper considers a scheduling problem in which a customer orders multiple products (jobs) from a production facility. The objective is to minimize the sum of the order (batch) completion times. While a machine can process only one job at a time, multiple machines can simultaneously process jobs in a batch. Although each job has a unique processing time, we consider the case where batch processing times are identical. This simplification allows us to develop heuristics with improved performance bounds. The problem was motivated by a real-world problem encountered by foreign electronics manufacturers. We first establish the complexity of the problem. For the two-parallel-machine case, we introduce two simple but intuitive heuristics and find their worst-case relative error bounds. One bound is tight, and the other goes to 1 as the number of orders goes to infinity; however, neither heuristic is superior for all instances. We extend one of the heuristics to an arbitrary number of parallel machines. For a fixed number of parallel machines, we find a worst-case bound that goes to 1 as the number of orders goes to infinity, and a tighter bound is then found for the three-parallel-machine case. Finally, the heuristics are evaluated empirically.
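As a minimal illustration of the objective in the abstract above, the sketch below evaluates the sum of order (batch) completion times when orders are processed one after another on m identical machines with a longest-job-first assignment inside each order. It is an evaluation routine under simplifying assumptions (no interleaving between orders), not one of the paper's heuristics or bounds, and the instance data are hypothetical.

```python
def sum_order_completion_times(orders, m):
    """Objective from the abstract: each order (batch) is a set of jobs that
    may run on the m machines simultaneously, and the order completes when its
    last job finishes.  Here orders are processed one after another (no
    interleaving) with an LPT assignment inside each order -- a simplification
    for illustration, not one of the paper's heuristics."""
    ready = 0.0                                     # time the machines are free
    total = 0.0
    for jobs in orders:
        machine = [ready] * m
        for p in sorted(jobs, reverse=True):        # longest job first
            i = machine.index(min(machine))         # earliest-free machine
            machine[i] += p
        completion = max(machine)                   # order completion time
        total += completion
        ready = completion
    return total

# Hypothetical instance: three orders, two parallel machines.
print(sum_order_completion_times([[4, 3, 2], [5, 1], [2, 2, 2, 2]], m=2))  # 29.0
```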

Parallel BCH Encoding/decoding Method and VLSI Design for Nonvolatile Memory (비휘발성 메모리를 위한 병렬 BCH 인코딩/디코딩 방법 및 VLSI 설계)

  • Lee, Sang-Hyuk;Baek, Kwang-Hyun
    • Journal of the Institute of Electronics Engineers of Korea SD / v.47 no.5 / pp.41-47 / 2010
  • This paper proposes a parallel BCH scheme, one of the error correction coding methods used for the NAND flash memory in SSDs (solid state disks). By allowing the error correction capability to be altered, the proposed design improves the reliability of data blocks whose error rates rise as they are used more frequently. The parallel-processing bit width of the decoder is twice that of the encoder, which shortens the decoding time and roughly halves it compared with a conventional ECC implementation.
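The abstract's point that a wider decoding data path cuts processing time rests on the fact that several bits can be absorbed per clock without changing the result of the underlying polynomial division. The sketch below demonstrates this equivalence for a small, hypothetical generator polynomial; it is not the paper's BCH code or VLSI architecture.

```python
import random

def gf2_remainder(msg_bits, gen_bits, w=1):
    """Remainder of the message polynomial modulo the generator over GF(2),
    absorbing w message bits per step.  w = 1 is the classic bit-serial LFSR;
    w > 1 models a parallel data path that collapses w shift-and-XOR updates
    into one combinational step per clock cycle."""
    g = int("".join(map(str, gen_bits)), 2)
    deg = len(gen_bits) - 1
    assert len(msg_bits) % w == 0
    rem = 0
    for i in range(0, len(msg_bits), w):
        chunk = int("".join(map(str, msg_bits[i:i + w])), 2)
        val = (rem << w) | chunk                     # shift in w bits at once
        for pos in range(deg + w - 1, deg - 1, -1):  # reduce back below deg bits
            if (val >> pos) & 1:
                val ^= g << (pos - deg)
        rem = val
    return rem

# Hypothetical generator x^4 + x + 1 and a random 16-bit message: the 4-bit
# parallel data path produces exactly the same remainder as the serial one.
gen = [1, 0, 0, 1, 1]
msg = [random.randint(0, 1) for _ in range(16)]
print(gf2_remainder(msg, gen, w=1) == gf2_remainder(msg, gen, w=4))   # True
```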

A Study on the Effect of Nanofluids Flow Direction in Double Pipe (이중관 내부 나노유체의 유동방향 영향에 관한 연구)

  • Choi, Hoon-Ki;Lim, Yun-Seung
    • Journal of the Korean Society of Manufacturing Process Engineers / v.20 no.6 / pp.82-91 / 2021
  • We numerically compared the heat transfer characteristics of parallel flow and counterflow in a concentric double tube carrying Al2O3/water nanofluids. The high- and low-temperature fluids flow through the inner circular tube and the annular tube, respectively. The heat transfer characteristics for the two flow directions were compared while varying the volume flow rate and the volume concentration of the nanoparticles. The results showed that the heat transfer rate and the overall heat transfer coefficient improve over those of the base fluid as the nanoparticle volume concentration and the flow rate increase. When the inflow rate was small, the heat transfer performance of the counterflow was about 22% better than that of the parallel flow; as the inflow rate increased, the two arrangements showed similar heat transfer rates. In addition, the effectiveness of the counterflow was 10% to 22% higher than that of the parallel flow. However, we verified that the increase in the friction factor for the counterflow is not large compared with the increase in the heat transfer rate.
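For reference when reading the parallel-flow versus counterflow comparison above, the standard textbook effectiveness-NTU relations for the two arrangements are given below (with C_r = C_min/C_max and NTU = UA/C_min); these are general relations, not the paper's CFD model.

```latex
% Textbook epsilon-NTU relations for the two flow arrangements
% (C_r = C_min / C_max, NTU = U A / C_min); not the paper's CFD model.
\varepsilon_{\mathrm{parallel}} =
    \frac{1 - \exp\!\left[-\mathrm{NTU}\,(1 + C_r)\right]}{1 + C_r},
\qquad
\varepsilon_{\mathrm{counter}} =
    \frac{1 - \exp\!\left[-\mathrm{NTU}\,(1 - C_r)\right]}
         {1 - C_r\,\exp\!\left[-\mathrm{NTU}\,(1 - C_r)\right]}
\quad (C_r < 1).
```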

A Study on the Implementation of GPSS Program on a Parallel Computer (GPSS 프로그램의 병렬화에 관한 연구)

  • 윤정미
    • Journal of the Korea Society for Simulation / v.8 no.2 / pp.57-72 / 1999
  • With the rapidly increasing complexity of decision-making and system development in fields such as industry and management, modeling techniques based on simulation have attracted growing attention. In particular, the advent of parallel computer systems has not only opened a new horizon for parallel simulation but has also greatly contributed to speeding up simulation runs. Implementing a parallel simulation, however, is not an easy job for those accustomed to existing computer systems, and it necessarily confronts the problem of synchronization conflicts during execution. How to give a wider community of users access to parallel simulation while resolving synchronization conflicts has therefore become an important issue in simulation research. To address these problems, this paper is primarily concerned with implementing GPSS, a widely used language for discrete event simulation, on a parallel computer using C-LINDA. To that end, the paper suggests a model and an algorithm and experiments with them on a case study.
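The "synchronization conflict" the abstract refers to is the core difficulty of parallel discrete-event simulation: a logical process may only consume an event once no earlier-timestamped event can still arrive. The toy sketch below illustrates that conservative safe-event rule; it is generic and does not reproduce GPSS semantics or the paper's C-LINDA implementation.

```python
from collections import deque

class LogicalProcess:
    """Toy conservative logical process: one FIFO channel per neighbour, and
    an event may only be processed once its timestamp does not exceed the
    smallest head timestamp over all input channels (otherwise an earlier
    event could still arrive -- the synchronization conflict)."""

    def __init__(self, name, neighbours):
        self.name = name
        self.channels = {n: deque() for n in neighbours}   # incoming FIFOs

    def receive(self, sender, timestamp, payload):
        self.channels[sender].append((timestamp, payload))

    def safe_time(self):
        """Latest timestamp up to which this LP may advance, or None if some
        channel is empty and the LP has to block (or wait for a null message)."""
        if any(not q for q in self.channels.values()):
            return None
        return min(q[0][0] for q in self.channels.values())

# Example: LP "B" fed by LPs "A" and "C".
b = LogicalProcess("B", ["A", "C"])
b.receive("A", 5.0, "xact arrives")
print(b.safe_time())        # None: the channel from C is empty, B must block
b.receive("C", 8.0, "xact arrives")
print(b.safe_time())        # 5.0: the event from A at t = 5 is now safe
```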

A study on the advanced RFID system using the parallel cyclic redundancy check (병렬 순환 잉여 검사를 이용한 발전된 무선인식 시스템에 관한 연구)

  • Kang Tai-Kyu;Yoon Sang-Mun;Shin Seok-kyun;Kang Min-Soo;Lee Key-Sea
    • Proceedings of the KSR Conference / 2004.10a / pp.1235-1240 / 2004
  • This paper presents a parallel cyclic redundancy check (CRC) technique that performs the CRC computation in parallel, outperforming the conventional technique that processes data bits serially. The implemented parallel CRC circuit was successfully applied to an inductively coupled passive RFID system operating at 13.56 MHz so that logical faults can be detected more quickly, and the system was verified experimentally. Compared with previous work, the proposed RFID system using the parallel CRC technique reduces latency and increases the data processing rate. It therefore seems reasonable to conclude that realizing parallel CRC in the RFID system offers a means of maintaining data integrity in high-speed RFID systems.
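The serial-versus-parallel trade-off described above can be made concrete with a bit-serial CRC and a table-driven version that absorbs eight bits per step while producing the identical checksum. The polynomial and initial value below (CRC-16, 0x1021, init 0xFFFF) are chosen for illustration and are not taken from the paper.

```python
# Bit-serial vs. table-driven (byte-parallel) CRC-16.  The polynomial 0x1021
# and initial value 0xFFFF are chosen for illustration, not taken from the
# paper; the point is that absorbing 8 bits per step yields the same checksum.
POLY, INIT = 0x1021, 0xFFFF

def crc16_serial(data: bytes) -> int:
    crc = INIT
    for byte in data:
        for i in range(8):                           # one message bit per step
            fb = ((crc >> 15) & 1) ^ ((byte >> (7 - i)) & 1)
            crc = ((crc << 1) & 0xFFFF) ^ (POLY if fb else 0)
    return crc

# Table: the combined effect of 8 serial steps for every possible input byte.
TABLE = []
for b in range(256):
    r = b << 8
    for _ in range(8):
        r = ((r << 1) ^ POLY) & 0xFFFF if r & 0x8000 else (r << 1) & 0xFFFF
    TABLE.append(r)

def crc16_parallel(data: bytes) -> int:
    crc = INIT
    for byte in data:                                # eight bits per step
        crc = ((crc << 8) & 0xFFFF) ^ TABLE[((crc >> 8) ^ byte) & 0xFF]
    return crc

msg = b"RFID test frame"
print(hex(crc16_serial(msg)), hex(crc16_parallel(msg)))     # identical values
```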

The Implementation of Fast Object Recognition Using Parallel Processing on CPU and GPU (CPU와 GPU의 병렬 처리를 이용한 고속 물체 인식 알고리즘 구현)

  • Kim, Jun-Chul;Jung, Young-Han;Park, Eun-Soo;Cui, Xue-Nan;Kim, Hak-Il;Huh, Uk-Youl
    • Journal of Institute of Control, Robotics and Systems / v.15 no.5 / pp.488-495 / 2009
  • This paper presents a fast feature extraction method for autonomous mobile robots that uses parallel processing based on OpenMP, SSE (Streaming SIMD Extensions), and CUDA programming. In the CPU version, the algorithms and code are first optimized and then parallelized: the steps that extract key points and obtain their dominant orientations are parallelized, the parallel algorithms are debugged to maintain the same level of performance, and after extraction a parallel descriptor is constructed with SSE instructions. The GPU version is likewise implemented with parallel processing, using CUDA on top of SIFT. The GPU-parallel descriptor achieves a speed-up of up to five times over the CPU-parallel descriptor but shows lower performance than the CPU version. The CPU version itself is four and a half times faster than the original SIFT while maintaining robust performance.
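The parallelization unit the abstract describes is the independent per-keypoint work (orientation and descriptor computation). The sketch below uses Python multiprocessing as a stand-in for OpenMP-style data parallelism over keypoints; the descriptor itself is a toy placeholder, not SIFT.

```python
import math
import multiprocessing as mp

def descriptor(keypoint):
    """Toy stand-in for a per-keypoint descriptor (the real work would be the
    SIFT orientation-histogram step).  Each keypoint is independent, so the
    loop over keypoints is the unit of data parallelism -- an OpenMP
    `parallel for` on the CPU, or one CUDA thread per keypoint on the GPU."""
    x, y, scale = keypoint
    return [math.cos(0.1 * k * x) * math.sin(0.1 * k * y) * scale
            for k in range(8)]                       # toy 8-element descriptor

if __name__ == "__main__":
    keypoints = [(i * 3.0, i * 5.0, 1.0 + 0.1 * i) for i in range(1000)]
    with mp.Pool() as pool:                          # parallel map over keypoints
        descriptors = pool.map(descriptor, keypoints)
    print(len(descriptors), len(descriptors[0]))     # 1000 8
```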

Optimized Implementation of PIPO Lightweight Block Cipher on 32-bit RISC-V Processor (32-bit RISC-V상에서의 PIPO 경량 블록암호 최적화 구현)

  • Eum, Si Woo;Jang, Kyung Bae;Song, Gyeong Ju;Lee, Min Woo;Seo, Hwa Jeong
    • KIPS Transactions on Computer and Communication Systems / v.11 no.6 / pp.167-174 / 2022
  • The PIPO lightweight block cipher was announced at ICISC'20. In this paper, a single-block optimized implementation and a parallel optimized implementation of the PIPO ECB, CBC, and CTR modes of operation are presented for a 32-bit RISC-V processor. The single-block implementation proposes an efficient way to compute the 8-bit-oriented Rlayer function on a 32-bit register. In the parallel implementation, the registers are internally aligned for parallel processing, and a method is described by which four different blocks perform the Rlayer operation in a single register. In addition, since the parallel technique is difficult to apply to the encryption process of the CBC mode of operation, it is applied to the decryption process instead. In the parallel implementation of the CTR mode, an extended initialization vector is used so that the internal register alignment can be omitted. This paper shows that the parallel implementation technique is applicable to several block cipher modes of operation. Compared with the performance of an existing implementation that includes the key schedule in the ECB mode of operation, the performance improves by a factor of 1.7 for the single-block implementation and 1.89 for the parallel implementation.
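The register-packing idea in the abstract (four blocks sharing one 32-bit register for the Rlayer) can be illustrated with packed 8-bit lanes and mask-based lane rotation, as sketched below. The rotation amount is arbitrary and the code only illustrates the packing trick, not the paper's PIPO implementation or the actual Rlayer rotation constants.

```python
# Packing the corresponding 8-bit rows of four independent blocks into one
# 32-bit word, then rotating every lane with masks and shifts so that all four
# blocks advance in a single register operation.  The rotation amount (3) is
# arbitrary; PIPO's Rlayer uses its own per-row rotation amounts.
MASK32 = 0xFFFFFFFF

def pack4(rows):
    """Pack four 8-bit row values (one per block) into a 32-bit word."""
    b0, b1, b2, b3 = rows
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)

def unpack4(word):
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def rotl8_lanes(word, r):
    """Rotate every 8-bit lane of the packed word left by r bits (0 < r < 8)."""
    low_mask = sum(((1 << (8 - r)) - 1) << (8 * i) for i in range(4))
    high_mask = MASK32 ^ low_mask                    # the r high bits per lane
    return (((word & low_mask) << r) | ((word & high_mask) >> (8 - r))) & MASK32

# Check against rotating each block's row byte on its own.
rows = [0x1B, 0xC5, 0x7E, 0xA0]                      # hypothetical row values
packed = rotl8_lanes(pack4(rows), 3)
print(unpack4(packed) == [((v << 3) | (v >> 5)) & 0xFF for v in rows])   # True
```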