• Title/Summary/Keyword: Parallel Computing and Communications

A Parallel Implementation of JPEG2000 4K Ultra High Definition Image using OpenCL (OpenCL을 이용한 JPEG2000 4K 초고화질 영상처리의 병렬고속화 구현)

  • Park, Daeseung;Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications / v.10 no.1 / pp.1-5 / 2015
  • With fast-growing multimedia technology and users' strong preference for large screens, the newest video coding standard, HEVC (High Efficiency Video Coding), has been introduced for high-quality video compression. As a result, ultra high definition (UHD) image services, which offer four times the resolution of conventional HD video, are becoming popular, and JPEG 2000 has also announced support for 4K and 8K UHD. Reading and writing UHD images therefore requires fast processing technology. This paper presents a study of fast parallel processing for UHD images. For this purpose, JPEG 2000 is first reviewed, and a GPU-based parallel implementation is proposed for the color conversion stage of preprocessing. The parallelized algorithm is implemented with OpenCL (Open Computing Language). Simulation results show that the proposed method processes 4K UHD images five times faster than a thread-based implementation.
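
The abstract does not say which JPEG 2000 color transform is parallelized. As a minimal sketch, assuming the reversible color transform (RCT) of JPEG 2000 Part 1, the per-pixel work below is written as a "kernel" indexed by a global ID, the way an OpenCL work-item would use get_global_id(0). The names rct_kernel, launch, and inverse_rct are hypothetical, and the serial launch loop only stands in for an NDRange dispatch.

```python
from typing import Callable, List, Tuple


def rct_kernel(gid: int, r: List[int], g: List[int], b: List[int],
               y: List[int], u: List[int], v: List[int]) -> None:
    """One hypothetical 'work-item': converts only the pixel at index gid."""
    y[gid] = (r[gid] + 2 * g[gid] + b[gid]) // 4  # Y = floor((R + 2G + B) / 4)
    u[gid] = b[gid] - g[gid]                       # U = B - G
    v[gid] = r[gid] - g[gid]                       # V = R - G


def launch(kernel: Callable[..., None], global_size: int, *buffers: List[int]) -> None:
    """Serial stand-in for an NDRange launch; no index depends on another."""
    for gid in range(global_size):
        kernel(gid, *buffers)


def inverse_rct(y: int, u: int, v: int) -> Tuple[int, int, int]:
    """Exact integer inverse of the RCT, returning (R, G, B)."""
    g = y - (u + v) // 4
    return v + g, g, u + g


# Three example pixels: red, mid-gray, and a blue-green mix.
r, g, b = [255, 100, 0], [0, 100, 128], [0, 100, 255]
n = len(r)
y, u, v = [0] * n, [0] * n, [0] * n
launch(rct_kernel, n, r, g, b, y, u, v)
print(y, u, v)  # [63, 100, 127] [0, 0, 127] [255, 0, -128]
```

Because no pixel reads another pixel's result, the same body could move into an OpenCL C kernel with one work-item per pixel. Floor division matters here because U and V can be negative.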

Design of a scalable general-purpose parallel associative processor using content-addressable memory (Content-Addressable Memory를 이용한 확장 가능한 범용 병렬 Associative Processor 설계)

  • Park, Tae-Geun
    • Journal of the Institute of Electronics Engineers of Korea SD / v.43 no.2 s.344 / pp.51-59 / 2006
  • The Von Neumann architecture suffers from a bottleneck at the interface between the central processing unit and memory, known as the 'Von Neumann bottleneck'. In this paper, we propose a scalable general-purpose associative processor (AP) based on content-addressable memory (CAM) that addresses this problem and is suited to search-oriented applications. We propose an efficient instruction set and a scalable structure that can be extended for larger applications. We define twelve instructions and provide several reduced instructions that speed up execution by performing two instructions in a single instruction cycle. The proposed AP operates in a bit-serial, word-parallel fashion and can be regarded as a 32-bit general-purpose parallel processor with a massively parallel SIMD structure. To verify the proposed architecture, we design and simulate maximum/minimum search, greater-than/less-than search, and parallel addition. These algorithms execute in constant time O(k), regardless of the number of input data items.
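
The constant-time claim can be illustrated with the classic bit-serial maximum search used in associative processors. The sketch below is a software simulation under assumed conventions (unsigned k-bit words, one tag bit per word, a global "any responder" signal); the function name is hypothetical and this is not the paper's instruction set. Each loop iteration stands for one word-parallel step, so the step count depends on k, not on the number of words.

```python
from typing import List, Tuple


def cam_max_search(words: List[int], k: int) -> Tuple[List[int], int]:
    """Bit-serial, word-parallel maximum search over k-bit words.

    Returns the indices of all words holding the maximum and the number of
    bit-serial steps taken, which is always k.
    """
    tags = [True] * len(words)  # one tag (candidate) bit per CAM word
    steps = 0
    for bit in range(k - 1, -1, -1):  # examine one bit column per step, MSB first
        steps += 1
        # In hardware, every word tests this bit in the same cycle.
        hits = [t and ((w >> bit) & 1) == 1 for t, w in zip(tags, words)]
        if any(hits):  # some candidate has a 1 here, so 0-candidates cannot be the max
            tags = hits
    return [i for i, t in enumerate(tags) if t], steps


print(cam_max_search([5, 12, 7, 12, 3], k=4))  # ([1, 3], 4)
```

A minimum search is symmetric: whenever at least one candidate has a 0 in the current column, keep only those candidates.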

A Hierarchical Server Structure for Parallel Location Information Search of Mobile Hosts (이동 호스트의 병렬적 위치 정보 탐색을 위한 서버의 계층 구조)

  • Jeong, Gwang-Sik;Yu, Heon-Chang;Hwang, Jong-Seon
    • Journal of KIISE:Computer Systems and Theory / v.28 no.1_2 / pp.80-89 / 2001
  • Developments in mobile computing systems have given rise to new and previously unforeseen problems, such as information management for mobile hosts, disconnection of mobile hosts, and the low bandwidth of wireless communications. In particular, the strategy for managing mobile host location information increases overhead in mobile computing systems. Because a mobile host moves, its address changes with its location; this is handled by mapping the physical address onto a virtual address. Previously proposed strategies for mapping between physical and virtual addresses did not address growth in the number of mobile hosts or the distribution of location information, so they could not support scalability in mobile computing systems. To distribute location information, we therefore propose an advanced n-depth LiST (Location information Search Tree) and a parallel location search and update strategy based on it. The advanced n-depth LiST is logically a hierarchical structure that clusters the location information servers in a ring structure and reduces location search and update costs through parallel search and update. Experiments show that, owing to the structural distribution of location information, the advanced n-depth LiST performs well even when two communicating mobile hosts (MHs) are far apart. Moreover, the reduction in location search cost came with no increase in location update cost.

A Performance Evaluation of Parallel Color Conversion based on the Thread Number on Multi-core Systems (멀티코어 시스템에서 쓰레드 수에 따른 병렬 색변환 성능 검증)

  • Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications / v.9 no.4 / pp.73-76 / 2014
  • As multi-core processors have grown in popularity, they have been adopted even in embedded systems. In this environment, many multimedia applications can be parallelized on multi-core platforms, since they usually require heavy computation and extensive memory access. This paper proposes an efficient thread-level parallel implementation of color space conversion on multi-core CPUs. Thread-level parallelism has become a very useful parallel processing paradigm, especially on shared-memory computing systems. In this work, it is exploited by allocating different input pixels to each thread so that loop iterations execute concurrently. For performance evaluation, this paper measures the improvement for color conversion on multi-core processors by comparing the processing speed of the serial implementation with that of the parallel ones. The results show that the thread-level parallel implementations achieve broadly similar performance improvement ratios across different multi-core processors.
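
Allocating different input pixels to each thread can be sketched as splitting one pixel loop into contiguous, disjoint chunks. The conversion matrix (full-range, JFIF-style RGB to YCbCr) and the helper names are assumptions, since the abstract does not specify them. In CPython the global interpreter lock keeps pure-Python threads from running this arithmetic simultaneously, so the sketch shows the work decomposition rather than the speedup; the same partitioning applies to native threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def clamp8(x: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, round(x)))


def rgb_to_ycbcr(p: Pixel) -> Pixel:
    """Full-range RGB -> YCbCr with JFIF-style coefficients (an assumption)."""
    r, g, b = p
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)


def convert_range(src: List[Pixel], dst: List[Pixel], start: int, end: int) -> None:
    """Loop body run by one thread over its own contiguous pixel range."""
    for i in range(start, end):
        dst[i] = rgb_to_ycbcr(src[i])


def parallel_convert(src: List[Pixel], num_threads: int) -> List[Pixel]:
    """Split the pixel loop into num_threads chunks; threads write disjoint indices."""
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")
    n = len(src)
    dst: List[Pixel] = [(0, 0, 0)] * n
    if n == 0:
        return dst
    chunk = -(-n // num_threads)  # ceiling division
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(convert_range, src, dst, s, min(s + chunk, n))
                   for s in range(0, n, chunk)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return dst


print(parallel_convert([(255, 255, 255), (0, 0, 0)], 2))  # [(255, 128, 128), (0, 128, 128)]
```

Because each thread writes only its own index range, no locking is needed; comparing the result against a serial loop is a simple correctness check.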

An efficient parallel solution algorithm on the linear second-order partial differential equations with large sparse matrix being based on the block cyclic reduction technique (Block Cyclic Reduction 기법에 의한 대형 Sparse Matrix 선형 2계편미분방정식의 효율적인 병렬 해 알고리즘)

  • 이병홍;김정선
    • The Journal of Korean Institute of Communications and Information Sciences / v.15 no.7 / pp.553-564 / 1990
  • The coefficient matrix of a general linear second-order partial differential equation is partitioned into (n-1)x(n-1) submatrices and transformed into a block tridiagonal system. The cyclic odd-even reduction technique is then applied to this system with large-grain data granularity, yielding a block cyclic reduction algorithm that solves for the unknown vectors of the system. However, this block cyclic reduction technique is not well suited to parallel processing systems, because its degree of parallelism changes at every computing stage. A new algorithm for solving linear second-order partial differential equations is therefore presented, based on a block cyclic reduction technique modified to keep its parallelism constant and to greatly reduce its execution time. The two algorithms are compared and analyzed.
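
The odd-even reduction step, and the changing parallelism the abstract criticizes, can be seen in a scalar sketch. Here scalars stand in for the paper's (n-1)x(n-1) blocks (each division would become a block solve), the system size is assumed to be 2**k - 1, and the paper's modified constant-parallelism algorithm is not reproduced. The optional sizes list records how many equations remain at each stage.

```python
from typing import List, Optional


def cyclic_reduction(a: List[float], b: List[float], c: List[float], d: List[float],
                     sizes: Optional[List[int]] = None) -> List[float]:
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] by odd-even reduction.

    Requires n = 2**k - 1 equations with a[0] == 0 and c[n-1] == 0.
    """
    n = len(b)
    if n < 1 or (n + 1) & n:
        raise ValueError("n must be 2**k - 1")
    if sizes is not None:
        sizes.append(n)
    if n == 1:
        return [d[0] / b[0]]
    ra, rb, rc, rd = [], [], [], []
    # Each odd-indexed equation is reduced independently (a parallel step).
    for i in range(1, n, 2):
        alpha = -a[i] / b[i - 1]  # eliminates x[i-1] using equation i-1
        gamma = -c[i] / b[i + 1]  # eliminates x[i+1] using equation i+1
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + gamma * a[i + 1])
        rc.append(gamma * c[i + 1])
        rd.append(d[i] + alpha * d[i - 1] + gamma * d[i + 1])
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd, sizes)  # half-size system in odd unknowns
    # Back-substitution: each even-indexed unknown is also independent.
    for i in range(0, n, 2):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
    return x


n = 7
a = [0.0] + [1.0] * (n - 1)
b = [4.0] * n
c = [1.0] * (n - 1) + [0.0]
x_true = [float(i + 1) for i in range(n)]
d = [(a[i] * x_true[i - 1] if i > 0 else 0.0) + b[i] * x_true[i]
     + (c[i] * x_true[i + 1] if i < n - 1 else 0.0) for i in range(n)]
sizes: List[int] = []
x = cyclic_reduction(a, b, c, d, sizes)
print(sizes)  # [7, 3, 1]
```

The system shrinks from 7 to 3 to 1 equations, so a machine with a fixed number of processors sits increasingly idle as the reduction proceeds; that shrinking parallelism is what the modified algorithm is said to avoid.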

A Task Scheduling Scheme for Bus-Based Symmetric Multiprocessor Systems (버스 기반의 대칭형 다중프로세서 시스템을 위한 태스크 스케줄링 기법)

  • Kang, Oh-Han;Kim, Si-Gwan
    • The KIPS Transactions:PartA / v.9A no.4 / pp.511-518 / 2002
  • Symmetric multiprocessors (SMPs) have emerged as an important and cost-effective platform for high-performance parallel computing. Scheduling parallel tasks and communications on an SMP is important, because the choice of scheduling discipline can significantly affect system performance. In this paper, we present a task-duplication-based scheduling scheme for bus-based SMPs. The proposed scheme pre-allocates network communication resources to avoid potential communication conflicts. Its performance was evaluated by comparing schedule lengths under varying numbers of processors and communication costs.
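
The idea behind task duplication can be illustrated on a toy fork graph. In the sketch below, the task costs, the communication cost, and the two-processor placements are all hypothetical, and bus contention is ignored, whereas the paper's scheme additionally pre-allocates bus resources. It only shows why running a predecessor twice can shorten the schedule length.

```python
from typing import Dict, List, Tuple

# Hypothetical fork-shaped task graph: A's result is needed by both B and C.
COST = {"A": 2, "B": 3, "C": 3}          # execution time of each task
PREDS = {"A": [], "B": ["A"], "C": ["A"]}
TOPO = ["A", "B", "C"]                   # a topological order of the graph
COMM = 4                                 # time to send a result to another processor


def schedule_length(placement: Dict[int, List[str]]) -> int:
    """Makespan when each processor runs its assigned tasks in TOPO order.

    A task may appear on several processors (duplication). For each predecessor,
    a task uses whichever copy arrives first: a local copy needs no transfer,
    a remote copy adds COMM. Bus contention is not modeled.
    """
    free_at = {p: 0 for p in placement}      # when each processor becomes idle
    finish: Dict[Tuple[int, str], int] = {}  # (processor, task) -> finish time
    for t in TOPO:
        for p, tasks in placement.items():
            if t not in tasks:
                continue
            ready = 0
            for q in PREDS[t]:
                arrivals = [f + (0 if pp == p else COMM)
                            for (pp, qq), f in finish.items() if qq == q]
                if not arrivals:
                    raise ValueError(f"predecessor {q} of {t} is never scheduled")
                ready = max(ready, min(arrivals))
            start = max(free_at[p], ready)
            free_at[p] = start + COST[t]
            finish[(p, t)] = free_at[p]
    return max(free_at.values())


print(schedule_length({0: ["A", "B"], 1: ["C"]}))       # 9: C waits for A's message
print(schedule_length({0: ["A", "B", "C"], 1: []}))     # 8: everything serialized
print(schedule_length({0: ["A", "B"], 1: ["A", "C"]}))  # 5: A duplicated on P1
```

Duplicating A costs two extra time units of computation on P1 but removes a four-unit message from the critical path, which is the trade-off duplication-based schedulers weigh.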

Load Balancing for Parallel Finite Element Analysis in Computing GRID Environment (컴퓨팅 그리드 시스템에서의 병렬 유한요소 해석을 위한 로드 밸런싱)

  • Lee, Chang-Seong;Im, Sang-Yeong;Kim, Seung-Jo;Jo, Geum-Won
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.31 no.10 / pp.1-9 / 2003
  • In GRID environments, an efficient load balancing algorithm is needed because the systems making up a GRID do not perform uniformly. In this work, a new two-step mesh-partitioning scheme based on graph partitioning is introduced to account for differences in system performance. In the two-step scheme, system performance weights are calculated to reflect heterogeneous system performance, and WEVM (Weighted Edge and Vertex Method) is adopted to minimize the increase in communication. Numerical experiments were carried out in a multi-cluster environment and a WAN (Wide Area Network) environment to investigate the effectiveness of the two-step mesh-partitioning scheme.
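
Turning performance weights into per-node workload targets can be sketched as proportional allocation. The weights below are hypothetical (for example, relative speeds a benchmark might report), the largest-remainder rounding rule is an assumption, and the graph partitioning and WEVM edge/vertex weighting are not shown; a graph partitioner could take these counts as target part sizes.

```python
from typing import List


def weighted_partition_sizes(num_elements: int, weights: List[float]) -> List[int]:
    """Target element count per node, proportional to its performance weight.

    Uses largest-remainder rounding so the counts always sum to num_elements.
    """
    if num_elements < 0 or not weights or any(w <= 0 for w in weights):
        raise ValueError("need num_elements >= 0 and positive weights")
    total = sum(weights)
    exact = [num_elements * w / total for w in weights]
    sizes = [int(e) for e in exact]  # floor, since every share is non-negative
    leftover = num_elements - sum(sizes)
    # Give the remaining elements to the nodes with the largest fractional parts.
    by_remainder = sorted(range(len(weights)), key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


# Hypothetical cluster: node 1 measured twice as fast as nodes 0 and 2.
print(weighted_partition_sizes(1000, [1.0, 2.0, 1.0]))  # [250, 500, 250]
```

Rounding with largest remainders keeps every element assigned exactly once while staying as close as possible to the ideal proportional shares.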

A New Multicarrier Multicode DS-CDMA Scheme for Time and Frequency Selective Fading Channels

  • Cao Yewen;Tjhung Tjeng Thiang;Ko Chi Chung
    • Journal of Communications and Networks / v.7 no.1 / pp.13-20 / 2005
  • In this paper, a new multicarrier direct-sequence code division multiple access (MC-DS-CDMA) system is proposed. The new signal construction is based on convolutional encoding of the transmitted data, serial-to-parallel (S/P) conversion of the encoded data, the Walsh-Hadamard transform (WHT), a second S/P conversion of the WHT outputs, spread spectrum (SS) modulation with a common pseudo-noise (PN) sequence, and finally multicarrier transmission. The system's bit error rate (BER) performance in frequency-selective fading channels in the presence of additive white Gaussian noise (AWGN) and a jamming tone is analyzed and simulated. The numerical results are compared with those of the orthogonal MC-DS-CDMA system of Sourour and Nakagawa [7]. The two systems are shown to have almost the same BER performance, but the proposed scheme has better anti-jamming ability.
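
Two stages of the transmitter chain, the first S/P conversion and the WHT, can be sketched compactly. The block width of 4, the BPSK mapping, and the omission of convolutional coding, the second S/P conversion, PN spreading, and the carriers are simplifying assumptions; this is not the authors' simulator, and the helper names are hypothetical.

```python
from typing import List


def serial_to_parallel(bits: List[int], width: int) -> List[List[int]]:
    """S/P conversion: group a serial bit stream into blocks of `width` parallel bits."""
    if width < 1 or len(bits) % width:
        raise ValueError("stream length must be a multiple of width")
    return [bits[i:i + width] for i in range(0, len(bits), width)]


def bpsk(bits: List[int]) -> List[int]:
    """Map bit 0 -> +1 and bit 1 -> -1 (assumed antipodal mapping)."""
    return [1 - 2 * bit for bit in bits]


def fwht(x: List[int]) -> List[int]:
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                y[j], y[j + h] = y[j] + y[j + h], y[j] - y[j + h]  # butterfly
        h *= 2
    return y


blocks = serial_to_parallel([0, 1, 1, 0, 1, 1, 1, 1], width=4)
wht_out = [fwht(bpsk(blk)) for blk in blocks]
print(wht_out)  # [[0, 0, 0, 4], [-4, 0, 0, 0]]
```

Applying fwht twice returns n times the input, so a receiver can undo the transform with the same butterfly network followed by division by n.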

Design Optimization of the Arithmetic Logic Unit Circuit for the Processor to Determine the Number of Errors in the Reed Solomon Decoder (리드솔로몬 복호기에서 오류갯수를 계산하는 처리기의 산술논리연산장치 회로 최적화설계)

  • An, Hyeong-Keon
    • The Journal of Korean Institute of Communications and Information Sciences / v.36 no.11C / pp.649-654 / 2011
  • In this paper, we present a new method for finding the number of errors in a Reed-Solomon decoder. The new design is much faster and uses a much simpler logic circuit than the previous design method. This optimization was made possible by a greatly simplified squaring circuit and by parallel processing. The microcontroller of this Reed-Solomon decoder can be used for data protection in almost all digital communication and consumer electronics devices.
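
The abstract credits a simplified squaring circuit without describing it. One standard property that permits such a simplification: in GF(2^m), squaring is linear over GF(2), because (a + b)^2 = a^2 + b^2 in characteristic 2, so a squarer can be a fixed XOR network with no general multiplier. The sketch below shows this in GF(2^8) with field polynomial 0x11D, which is an assumption (the paper's field and circuit are not given); SQUARE_BASIS plays the role of the wiring table.

```python
from typing import List

POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a common RS field polynomial (assumed here)


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements: shift-and-XOR with reduction by POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:  # degree-8 term appeared: reduce modulo POLY
            a ^= POLY
    return result


# Squares of the basis elements x^0 .. x^7; these fix the XOR network.
SQUARE_BASIS: List[int] = [gf_mul(1 << i, 1 << i) for i in range(8)]


def gf_square(a: int) -> int:
    """Square via linearity: (sum a_i x^i)^2 = sum a_i x^(2i) in characteristic 2."""
    out = 0
    for i in range(8):
        if (a >> i) & 1:
            out ^= SQUARE_BASIS[i]
    return out


print(SQUARE_BASIS[:5])  # [1, 4, 16, 64, 29]
```

In hardware, each output bit of gf_square is the XOR of a fixed subset of input bits, which is why a squarer is much smaller and faster than a general multiplier.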

Accelerating Molecular Dynamics Simulation Using Graphics Processing Unit

  • Myung, Hun-Joo;Sakamaki, Ryuji;Oh, Kwang-Jin;Narumi, Tetsu;Yasuoka, Kenji;Lee, Sik
    • Bulletin of the Korean Chemical Society / v.31 no.12 / pp.3639-3643 / 2010
  • We have developed a CUDA-enabled version of a general-purpose molecular dynamics simulation code for GPUs. Implementation details, including the parallelization scheme and performance optimization, are described. We focused on the non-bonded force calculation because it is the most time-consuming part of a molecular dynamics simulation. Timing results for the CUDA-enabled and CPU versions were obtained and compared for a biomolecular system containing 23,558 atoms. The CUDA-enabled versions were found to be faster than the CPU version, suggesting that GPUs can be useful hardware for molecular dynamics simulation.
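
The abstract does not give the force field or the parallelization details. As a sketch, assuming a plain Lennard-Jones non-bonded term with a cutoff and no periodic boundaries, neighbor lists, or electrostatics, the all-pairs calculation below is organized so that each atom's force is one independent task, the usual one-thread-per-atom mapping on a GPU. The parameters eps, sigma, and cutoff are in reduced units, and the function names are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float, float]


def lj_force_on_atom(i: int, pos: List[Vec], eps: float, sigma: float, cutoff: float) -> Vec:
    """Lennard-Jones force on atom i from every other atom within `cutoff`.

    This is the work one GPU thread would do in an all-pairs, one-thread-per-atom
    scheme: F_ij = 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2 * (r_i - r_j).
    """
    xi, yi, zi = pos[i]
    fx = fy = fz = 0.0
    rc2 = cutoff * cutoff
    for j, (xj, yj, zj) in enumerate(pos):
        if j == i:
            continue
        dx, dy, dz = xi - xj, yi - yj, zi - zj
        r2 = dx * dx + dy * dy + dz * dz
        if r2 >= rc2:
            continue
        sr6 = (sigma * sigma / r2) ** 3
        coeff = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r2
        fx += coeff * dx
        fy += coeff * dy
        fz += coeff * dz
    return fx, fy, fz


def all_forces(pos: List[Vec], eps: float = 1.0, sigma: float = 1.0,
               cutoff: float = 2.5) -> List[Vec]:
    """Each iteration is independent, so on a GPU it maps to one thread per atom."""
    return [lj_force_on_atom(i, pos, eps, sigma, cutoff) for i in range(len(pos))]


print(all_forces([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # [(-24.0, 0.0, 0.0), (24.0, 0.0, 0.0)]
```

Computing each pair twice, once from each atom's side, doubles the arithmetic but avoids two threads writing to the same force entry, a common trade-off in simple GPU force kernels.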