• Title/Summary/Keyword: Multi thread

Multi-Threaded Parallel H.264/AVC Decoder for Multi-Core Systems (멀티코어 시스템을 위한 멀티스레드 H.264/AVC 병렬 디코더)

  • Kim, Won-Jin;Cho, Keol;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD / v.47 no.11 / pp.43-53 / 2010
  • The wide deployment of high-resolution video services has led to active research on high-speed video processing. In particular, the prevalence of multi-core systems has accelerated research on high-resolution video processing based on the parallelization of multimedia software. In this paper, we propose a novel parallel H.264/AVC decoding scheme for a multi-core platform. Parallel H.264/AVC decoding is challenging not only because parallelization may incur significant synchronization overhead but also because the software may have complicated dependencies. To overcome these issues, we propose a novel approach called Multi-Threaded Parallelization (MTP). In MTP, a separate thread is allocated to each stage of the pipeline to reduce synchronization overhead. In addition, an efficient memory reuse technique is used to reduce the memory requirement. To verify the effectiveness of the proposed approach, we parallelized the FFmpeg H.264/AVC decoder with the proposed technique using OpenMP and carried out experiments on an Intel quad-core platform. The proposed design outperforms the unparallelized FFmpeg H.264/AVC decoder by 53%. It also reduces memory usage by 65% and 81% for high-definition (HD) and full high-definition (FHD) video, respectively, compared with a popular existing method called 2Dwave.
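
The one-thread-per-stage idea described in this abstract can be sketched as follows. This is only a structural illustration under stated assumptions, not the authors' implementation: the paper used OpenMP inside the FFmpeg C decoder, whereas this sketch uses Python threads and queues, and the three stage functions are hypothetical stand-ins for real decoder stages.

```python
import queue
import threading

SENTINEL = object()  # unique end-of-stream marker


def stage(work, inbox, outbox):
    """Run one pipeline stage: apply `work` to each item until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))


def run_pipeline(items, stage_funcs):
    """Push `items` through `stage_funcs`, with one dedicated thread per stage."""
    queues = [queue.Queue() for _ in range(len(stage_funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stage_funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)

    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for decoder stages (not real H.264 operations).
def entropy_decode(x):
    return x + 1


def inverse_transform(x):
    return x * 2


def deblock(x):
    return x - 3


decoded = run_pipeline(range(5), [entropy_decode, inverse_transform, deblock])
print(decoded)  # [-1, 1, 3, 5, 7]
```

Because each stage is a single thread reading from a FIFO queue, item order is preserved, and the only synchronization points are the queue hand-offs between adjacent stages.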

Visualization of Basal Body Temperature and Its Frequency Spectrum Analysis Using an Android Platform Smartphone (스마트폰을 활용한 여성의 기초체온 가시화 및 주파수 스펙트럼 분석)

  • Park, Sang-Eun;Kim, Jeong-Hwan;Seo, Eun-Ah;Choi, Heejung;Kim, Kyeong-Seop
    • The Transactions of The Korean Institute of Electrical Engineers / v.63 no.7 / pp.934-939 / 2014
  • Daily recording of basal body temperature is the most useful method of determining the time of ovulation by detecting the rise in temperature. To support this aim, a graphical user interface (GUI) system is designed and implemented to visualize daily basal body temperature variations on an Android smartphone using multi-threaded Java modules. To estimate the ovulation cycle, a new method is proposed that analyzes low-frequency features of the frequency spectrum, namely the DC level and the second-largest peak, and interprets these prominent features in terms of the average basal body temperature variation and the menstrual cycle.
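
A minimal sketch of the spectrum analysis described in this abstract, assuming that the DC bin is the largest spectral component and that the second-largest peak is therefore the strongest non-DC bin. The temperature series is synthetic (a 28-day sinusoid around 36.5 °C), and the function names are hypothetical; the paper's actual processing chain is not specified in the abstract.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return |X[k]| for k = 0 .. N//2 using a direct O(N^2) DFT."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def mean_and_cycle(samples):
    """Estimate the average level (from the DC bin) and the dominant period in samples."""
    n = len(samples)
    mags = dft_magnitudes(samples)
    mean_level = mags[0] / n  # DC bin divided by N equals the average value
    k_peak = max(range(1, len(mags)), key=lambda k: mags[k])  # strongest non-DC bin
    cycle_length = n / k_peak  # period in samples (days, for one reading per day)
    return mean_level, cycle_length


# Synthetic data: one reading per day for 112 days, 28-day cycle, 0.3 degree swing.
temps = [36.5 + 0.3 * math.sin(2 * math.pi * day / 28) for day in range(112)]
mean_level, cycle_days = mean_and_cycle(temps)
print(round(mean_level, 3), cycle_days)
```

With real recordings the cycle rarely fits the window exactly, so the peak bin gives only a coarse period estimate whose resolution improves with longer recordings.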

Parallel LDPC Decoder for CMMB on CPU and GPU Using OpenCL (OpenCL을 활용한 CPU와 GPU 에서의 CMMB LDPC 복호기 병렬화)

  • Park, Joo-Yul;Hong, Jung-Hyun;Chung, Ki-Seok
    • IEMEK Journal of Embedded Systems and Applications / v.11 no.6 / pp.325-334 / 2016
  • Recently, Open Computing Language (OpenCL) has been proposed to provide a framework that supports heterogeneous computing platforms. By using an OpenCL framework, digital communication systems can support various protocols in a unified computing environment to achieve both high portability and high performance. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes for China Multimedia Mobile Broadcasting (CMMB) on a heterogeneous platform. Each step of LDPC decoding has different parallelization characteristics. In this paper, steps suitable for task-level parallelization are executed on the CPU, and steps suitable for data-level parallelization are processed by the GPU. To improve the performance of the proposed OpenCL kernels for LDPC decoding operations, explicit thread scheduling, loop-unrolling, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance by using heterogeneous multi-core processors on a unified computing framework.

Estimation of Heart Rate Variability with an Android Smart Phone Platform (안드로이드 기반 스마트폰 연동 심박변이도 추정)

  • Kim, Jeong-Hwan;Shin, Seung-Won;Kim, Hyun-Tae;Yoon, Tae-Ho;Kim, Kyeong-Seop;Lee, Jeong-Whan;Eom, Gwang-Moon
    • The Transactions of The Korean Institute of Electrical Engineers / v.61 no.6 / pp.865-871 / 2012
  • In this study, the ambulatory electrocardiogram (ECG) signal and heartbeat rhythms are visualized in terms of R-R intervals and heart rate variability (HRV) on an Android platform. To this end, a graphical user interface (GUI) is implemented with multi-threaded Java modules including ECG, heartbeat, tachogram, and visualization units. ECG signals are acquired on an Android device by receiving data from an ambulatory ECG sensor system. Finite impulse response (FIR) filters are implemented to eliminate the baseline wander noise contained in the ambulatory signals and the DC offset in the R-R interval data. By simulating the normal and stressed emotional states of a subject, we find that HRV can be successfully estimated and visualized on an Android smartphone platform.
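
To make the R-R interval and HRV terminology concrete, here is a small sketch that derives R-R intervals from R-peak timestamps and computes two widely used time-domain HRV statistics, SDNN and RMSSD. The abstract does not say which HRV measures the authors computed, so the choice of statistics is an assumption, and the timestamps are made up for illustration.

```python
import math


def rr_intervals(r_peak_times_ms):
    """R-R intervals (ms) between consecutive R-peak timestamps (ms)."""
    return [b - a for a, b in zip(r_peak_times_ms, r_peak_times_ms[1:])]


def sdnn(rr_ms):
    """Sample standard deviation of the R-R intervals."""
    mean = sum(rr_ms) / len(rr_ms)
    return math.sqrt(sum((x - mean) ** 2 for x in rr_ms) / (len(rr_ms) - 1))


def rmssd(rr_ms):
    """Root mean square of successive differences between R-R intervals."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Made-up R-peak timestamps in milliseconds.
peaks_ms = [0, 800, 1700, 2500, 3400]
rr = rr_intervals(peaks_ms)            # [800, 900, 800, 900]
mean_hr_bpm = 60000 / (sum(rr) / len(rr))
print(rr, round(sdnn(rr), 2), rmssd(rr), round(mean_hr_bpm, 1))
```

Plotting `rr` against beat index gives the tachogram mentioned in the abstract; baseline-wander and DC-offset filtering would be applied before this step.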

Multicore-Aware Code Co-Positioning to Reduce WCET on Dual-Core Processors with Shared Instruction Caches

  • Ding, Yiqiang;Zhang, Wei
    • Journal of Computing Science and Engineering / v.6 no.1 / pp.12-25 / 2012
  • For real-time systems, it is important to obtain an accurate worst-case execution time (WCET). Furthermore, improving the WCET of applications that run on multicore processors is both significant and challenging, as the WCET can be largely affected by possible inter-core interference in shared resources such as the shared L2 cache. To solve this problem, we propose an innovative approach that adopts a code positioning method to reduce the inter-core L2 cache interference between the different real-time threads that adaptively run on a multi-core processor by using different strategies. The worst-case-oriented strategy is designed to decrease the worst-case WCET among these threads as much as possible. The other two strategies aim at reducing the WCET of each thread by an almost equal percentage or amount. Our experiments indicate that the proposed multicore-aware code positioning approaches not only improve the worst-case performance of the real-time threads but also make good tradeoffs between efficiency and fairness for threads that run on multicore platforms.

HD-Tree: High performance Lock-Free Nearest Neighbor Search KD-Tree (HD-Tree: 고성능 Lock-Free NNS KD-Tree)

  • Lee, Sang-gi;Jung, NaiHoon
    • Journal of Korea Game Society / v.20 no.5 / pp.53-64 / 2020
  • Supporting nearest neighbor search (NNS) in the KD-Tree algorithm is essential for multidimensional data applications. In this paper, we propose HD-Tree, a high-performance Lock-Free KD-Tree that supports NNS in situations where reads and writes occur concurrently. HD-Tree reduces the number of synchronization nodes used in NNS and requires fewer atomic operations during Lock-Free method execution. Compared with existing algorithms, on a multi-core system with 8 cores and 16 threads, HD-Tree improves performance by up to 95% on NNS and 15% on modification in oversubscription situations.
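
For readers unfamiliar with the underlying data structure, the following is a plain single-threaded KD-tree with nearest neighbor search. It is not HD-Tree and contains no lock-free synchronization; it only illustrates the search that the paper makes safe under concurrent reads and writes. The dictionary-based node layout and the function names are assumptions made for this sketch.

```python
def build(points, depth=0):
    """Build a KD-tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build(points[:mid], depth + 1),
        "right": build(points[mid + 1:], depth + 1),
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, target, best=None):
    """Return the stored point closest to `target`."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], target) < sq_dist(best, target):
        best = node["point"]
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff * diff < sq_dist(best, target):
        best = nearest(far, target, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
print(nearest(tree, (9, 2)))  # (8, 1)
```

In a concurrent setting, an insertion or deletion can change nodes while a search like this one is traversing them, which is the problem that lock-free designs such as the one in the paper address.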

Design of a SIMT architecture GP-GPU Using Tile based on Graphic Pipeline Structure (타일 기반 그래픽 파이프라인 구조를 사용한 SIMT 구조 GP-GPU 설계)

  • Kim, Do-Hyun;Kim, Chi-Yong
    • Journal of IKEEE / v.20 no.1 / pp.75-81 / 2016
  • This paper proposes a tile-based graphics pipeline design to improve graphics application performance on a SIMT-based GP-GPU. The proposed tile-based graphics pipeline avoids unnecessary graphics processing operations and processes the rasterization step in parallel. Massively parallel data processing through the SIMT architecture improves computational performance, thereby improving 3D graphics pipeline performance. The more vertex data a 3D model contains, the higher the performance. The proposed structure was confirmed to improve processing performance by about 1.18 times to as much as 3 times compared with 'RAMP' and previous studies.

A parametric study of bolt-nut joints by the method of finite element contact analysis (유한 요소 접촉 해석법에 의한 나사 체결부 설계 개선에 관한 연구)

  • 이병채;김영곤
    • Transactions of the Korean Society of Mechanical Engineers / v.13 no.3 / pp.353-361 / 1989
  • A parametric study of load distribution in bolt-nut joints is performed by the method of finite element contact analysis. The contacting surface is assumed to be unbonded and frictionless. Multi-body contact analysis is performed in the elastic region under the assumption of an axisymmetric stress state. In the standard setting, the load acting on the first thread from the fastened plate is much greater than that on the other threads. However, the load distribution is shown to improve when the center of the contact force acting on the nut surface is moved outwards. Such a modification is possible by enlarging the gap between the bolt shank and the fastened plate or by inserting suitable washers. Shape modification of the standard nut by making a groove and a step on the nut surface is also suggested, which results in an almost uniform load distribution and a considerable decrease in the maximum stress of the joint.

GPU-Based Acceleration of Quantum-Inspired Evolutionary Algorithm (GPU를 이용한 Quantum-Inspired Evolutionary Algorithm 가속)

  • Ryoo, Ji-Hyun;Park, Han-Min;Choi, Ki-Young
    • Journal of the Institute of Electronics Engineers of Korea SD / v.49 no.8 / pp.1-9 / 2012
  • The Quantum-Inspired Evolutionary Algorithm (QEA) contains sufficient data-level parallelism to be naturally accelerated on GPUs. For an efficient reduction of execution time, however, careful task mapping is needed to properly reflect the characteristics of the CPU and the GPU. Furthermore, when deciding which part of the application should run on the GPU, we need to consider the data transfer between the CPU and GPU memory spaces as well as the data-level parallelism. In addition, the use of zero-copy host memory, a proper choice of the execution configuration, and a thread organization that considers memory coalescing are important to further reduce the execution time. With all these techniques, we could run QEA 3.69 times faster on average than the multi-threaded CPU version for the 0-1 knapsack problem with 30,000 items.

A Message Priority-based TCP Transmission Algorithm for Drone Systems (드론 시스템을 위한 메시지 우선순위 기반 TCP 통신 알고리즘)

  • Choi, Joon-Hyuck;Kim, Bo-Ram;Lee, Dong-Ik
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.3 / pp.509-516 / 2018
  • TCP is a well-known communication protocol that is widely used for reliable message transmission. The urgent mechanism of TCP plays a key role in transmitting high-priority messages. When a high-priority message arises at the transmitting node, the urgent mechanism informs the receiving node of its presence prior to its transmission so that the receiving node can prepare to handle the message in advance. This implies that the existing urgent mechanism of TCP does not guarantee an immediate or faster delivery of the high-priority message itself. Therefore, priority-based transmission is required on TCP not only to ensure reliable transmission of normal messages but also to offer a differentiated service according to message priority. This paper presents a priority-based transmission algorithm over TCP using a priority queue in a multi-threaded environment. The effectiveness of the proposed algorithm is explored using an experimental setup in which various messages with different priority levels are transmitted.
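
The combination of a priority queue and a dedicated sender thread mentioned in this abstract might look roughly like the sketch below. It is an illustration under assumptions, not the paper's algorithm: the `PrioritySender` class is hypothetical, lower numbers are assumed to mean higher priority, and the `send` callable stands in for something like `socket.sendall` on a connected TCP socket.

```python
import itertools
import queue
import threading


class PrioritySender:
    """Hypothetical helper: producers enqueue messages, one thread sends them by priority."""

    def __init__(self, send):
        self._send = send                    # e.g. sock.sendall for a TCP socket
        self._queue = queue.PriorityQueue()
        self._seq = itertools.count()        # tie-breaker keeps FIFO order per priority
        self._seq_lock = threading.Lock()
        self._thread = threading.Thread(target=self._run)

    def _next_seq(self):
        with self._seq_lock:
            return next(self._seq)

    def submit(self, priority, payload):
        """Queue `payload`; a lower `priority` number is sent first (assumption)."""
        self._queue.put((priority, self._next_seq(), payload))

    def start(self):
        self._thread.start()

    def close(self):
        """Send everything already queued, then stop the sender thread."""
        self._queue.put((float("inf"), self._next_seq(), None))
        self._thread.join()

    def _run(self):
        while True:
            _, _, payload = self._queue.get()
            if payload is None:              # shutdown marker
                return
            self._send(payload)


# Messages are queued before the sender starts so the order is deterministic.
sent = []
sender = PrioritySender(sent.append)
sender.submit(5, b"telemetry")
sender.submit(0, b"emergency stop")
sender.submit(5, b"status")
sender.start()
sender.close()
print(sent)  # [b'emergency stop', b'telemetry', b'status']
```

Reordering happens only among messages still waiting in the queue; once bytes are handed to TCP they are delivered in order, so keeping the socket's send backlog short matters for how quickly an urgent message can overtake others.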