• Title/Summary/Keyword: Multi-Core Processor

Search Result 131, Processing Time 0.025 seconds

A Performance Evaluation on Classic Mutual Exclusion Algorithms for Exploring Feasibility of Practical Application (실제 적용 타당성 탐색을 위한 고전적 상호배제 알고리즘 성능 평가)

  • Lee, Hyung-Bong;Kwon, Ki-Hyeon
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.6 no.12
    • /
    • pp.469-478
    • /
    • 2017
  • The mutual exclusion is originally based on the theory of race condition prevention in symmetric multi-processor operating systems. But recently, due to the generalization of multi-core processors, its application range has been rapidly shifted to parallel processing application domain. POSIX thread, WIN32 thread, and Java thread, which are typical parallel processing application development environments, provide a unique mutual exclusion mechanism for each of them. Applications that are very sensitive to performance in these environments may want to reduce the burden of mutual exclusion, even at some cost, such as inconvenience of coding. In this study, we implement Dekker's and Peterson's algorithm in the form of busy-wait and processor-yield in various platforms, and compare the performance of them with the built-in mutual exclusion mechanisms to evaluate the usability of the classic algorithms. The analysis result shows that Dekker's algorithm of processor-yield type is superior to the built-in mechanisms in POSIX and WIN32 thread environments at least 2 times and up to 70 times, and confirms that the practicality of the algorithm is sufficient.

Optimal Design Space Exploration of Multi-core Architecture for Real-time Lane Detection Algorithm (실시간 차선인식 알고리즘을 위한 최적의 멀티코어 아키텍처 디자인 공간 탐색)

  • Jeong, Inkyu;Kim, Jongmyon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.7 no.3
    • /
    • pp.339-349
    • /
    • 2017
  • This paper proposes a four-stage algorithm for detecting lanes on a driving car. In the first stage, it extracts region of interests in an image. In the second stage, it employs a median filter to remove noise. In the third stage, a binary algorithm is used to classify two classes of backgrond and foreground of an input image. Finally, an image erosion algorithm is utilized to obtain clear lanes by removing noises and edges remained after the binary process. However, the proposed lane detection algorithm requires high computational time. To address this issue, this paper presents a parallel implementation of a real-time line detection algorithm on a multi-core architecture. In addition, we implement and simulate 8 different processing element (PE) architectures to select an optimal PE architecture for the target application. Experimental results indicate that 40×40 PE architecture show the best performance, energy efficiency and area efficiency.

Development of Vehicle LDW Application Service using AUTOSAR Platform on Multi-Core MCU (멀티코어 상의 AUTOSAR 플랫폼을 활용한 차량용 LDW 응용 서비스 개발)

  • Park, Mi-Ryong;Kim, Dongwon
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.14 no.4
    • /
    • pp.113-120
    • /
    • 2014
  • In this paper, we examine Asymmetric Multi-Processing Environment to provide LDW service. Asymmetric Multi-Processing Environment consists of high-speed MCU to support rapid image processing and low-speed MCU for controlling with other ECU at the control domain. Also we designed rapid image process application and LDW application Software Component(SW-C) according to the development process rule of AUTOSAR. To communicate between two MCUs, timer based polling based IPC was designed. Also to communicate with other ECUs(Electronic Control Units), we designed CAN messages to provide alarm information and receiving CAN message to catch the Turn signal. We confirm the possibility of the various ADAS development using an Asymmetric Multi-Processing Environment and AUTOSAR platform. We also expect providing ISO 26262 functional safety.

Analysis on the Temperature of Multi-core Processors according to Placement of Functional Units and L2 Cache (코어 내부 구성요소와 L2 캐쉬의 배치 관계에 따른 멀티코어 프로세서의 온도 분석)

  • Son, Dong-Oh;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.4
    • /
    • pp.1-8
    • /
    • 2014
  • As cores in multi-core processors are integrated in a single chip, power density increased considerably, resulting in high temperature. For this reason, many research groups have focused on the techniques to solve thermal problems. In general, the approaches using mechanical cooling system or DTM(Dynamic Thermal Management) have been used to reduce the temperature in the microprocessors. However, existing approaches cannot solve thermal problems due to high cost and performance degradation. However, floorplan scheme does not require extra cooling cost and performance degradation. In this paper, we propose the diverse floorplan schemes in order to alleviate the thermal problem caused by the hottest unit in multi-core processors. Simulation results show that the peak temperature can be reduced efficiently when the hottest unit is located near to L2 cache. Compared to baseline floorplan, the peak temperature of core-central and core-edge are decreased by $8.04^{\circ}C$, $8.05^{\circ}C$ on average, respectively.

Data Level Parallelism for H.264/AVC Decoder on a Multi-Core Processor and Performance Analysis (멀티코어 프로세서에서의 H.264/AVC 디코더를 위한 데이터 레벨 병렬화 성능 예측 및 분석)

  • Cho, Han-Wook;Jo, Song-Hyun;Song, Yong-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.8
    • /
    • pp.102-116
    • /
    • 2009
  • There have been lots of researches for H.264/AVC performance enhancement on a multi-core processor. The enhancement has been performed through parallelization methods. Parallelization methods can be classified into a task-level parallelization method and a data level parallelization method. A task-level parallelization method for H.264/AVC decoder is implemented by dividing H.264/AVC decoder algorithms into pipeline stages. However, it is not suitable for complex and large bitstreams due to poor load-balancing. Considering load-balancing and performance scalability, we propose a horizontal data level parallelization method for H.264/AVC decoder in such a way that threads are assigned to macroblock lines. We develop a mathematical performance expectation model for the proposed parallelization methods. For evaluation of the mathematical performance expectation, we measured the performance with JM 13.2 reference software on ARM11 MPCore Evaluation Board. The cycle-accurate measurement with SoCDesigner Co-verification Environment showed that expected performance and performance scalability of the proposed parallelization method was accurate in relatively high level

Analysis on the Thermal Efficiency of Branch Prediction Techniques in 3D Multicore Processors (3차원 구조 멀티코어 프로세서의 분기 예측 기법에 관한 온도 효율성 분석)

  • Ahn, Jin-Woo;Choi, Hong-Jun;Kim, Jong-Myon;Kim, Cheol-Hong
    • The KIPS Transactions:PartA
    • /
    • v.19A no.2
    • /
    • pp.77-84
    • /
    • 2012
  • Speculative execution for improving instruction-level parallelism is widely used in high-performance processors. In the speculative execution technique, the most important factor is the accuracy of branch predictor. Unfortunately, complex branch predictors for improving the accuracy can cause serious thermal problems in 3D multicore processors. Thermal problems have negative impact on the processor performance. This paper analyzes two methods to solve the thermal problems in the branch predictor of 3D multi-core processors. First method is dynamic thermal management which turns off the execution of the branch predictor when the temperature of the branch predictor exceeds the threshold. Second method is thermal-aware branch predictor placement policy by considering each layer's temperature in 3D multi-core processors. According to our evaluation, the branch predictor placement policy shows that average temperature is $87.69^{\circ}C$, and average maximum temperature gradient is $11.17^{\circ}C$. And, dynamic thermal management shows that average temperature is $89.64^{\circ}C$ and average maximum temperature gradient is $17.62^{\circ}C$. Proposed branch predictor placement policy has superior thermal efficiency than the dynamic thermal management. In the perspective of performance, the proposed branch predictor placement policy degrades the performance by 3.61%, while the dynamic thermal management degrades the performance by 27.66%.

Design of Parallel Processing of Lane Detection System Based on Multi-core Processor (멀티코어를 이용한 차선 검출 병렬화 시스템 설계)

  • Lee, Hyo-Chan;Moon, Dai-Tchul;Park, In-hag;Heo, Kang
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.9
    • /
    • pp.1778-1784
    • /
    • 2016
  • we improved the performance by parallelizing lane detection algorithms. Lane detection, as a intellectual assisting system, helps drivers make an alarm sound or revise the handle in response of lane departure. Four kinds of algorithms are implemented in order as following, Gaussian filtering algorithm so as to remove the interferences, gray conversion algorithm to simplify images, sobel edge detection algorithm to find out the regions of lanes, and hough transform algorithm to detect straight lines. Among parallelized methods, the data level parallelism algorithm is easy to design, yet still problem with the bottleneck. The high-speed data level parallelism is suggested to reduce this bottleneck, which resulted in noticeable performance improvement. In the result of applying actual road video of black-box on our parallel algorithm, the measurement, in the case of single-core, is approximately 30 Frames/sec. Furthermore, in the case of octa-core parallelism, the data level performance is approximately 100 Frames/sec and the highest performance comes close to 150 Frames/sec.

Implementation of IMS Core SIP Gateway based on Embedded (임베디드 기반의 IMS 코아 SIP 게이트웨이 구현)

  • Yoo, Seung-Sun;Kim, Sam-Taek
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.14 no.5
    • /
    • pp.209-214
    • /
    • 2014
  • IMS(IP Multi-Media Subsystem) is in the limelight as the Integrated wire and wireless Systems because of a sudden increase of smart mobile devices and growth of multimedia additional services such as IPTV. The structure of IMS is designed as a session control layer to provide various multimedia summative service using SIP based on IP communication network in order to carry out set-up, change and release by NGN of course, the existing voice services. But now It is broadly substituting in the IPTV, wire phone company and it is substituted in internet platform base on the soft-switch in currently. Especially, in currently, 4G LTE in a mobile communication company is rapidly growing in market. Therefore, in this study, we had designed and developed to the main prosser that can admit to 1000 user over and SIP gateway which can link the IMS Core that can link SIP Device which adopt the standard protocol on the SIP and to provide variable multimedia services.

Multi-communication layered HPL model and its application to GPU clusters

  • Kim, Young Woo;Oh, Myeong-Hoon;Park, Chan Yeol
    • ETRI Journal
    • /
    • v.43 no.3
    • /
    • pp.524-537
    • /
    • 2021
  • High-performance Linpack (HPL) is among the most popular benchmarks for evaluating the capabilities of computing systems and has been used as a standard to compare the performance of computing systems since the early 1980s. In the initial system-design stage, it is critical to estimate the capabilities of a system quickly and accurately. However, the original HPL mathematical model based on a single core and single communication layer yields varying accuracy for modern processors and accelerators comprising large numbers of cores. To reduce the performance-estimation gap between the HPL model and an actual system, we propose a mathematical model for multi-communication layered HPL. The effectiveness of the proposed model is evaluated by applying it to a GPU cluster and well-known systems. The results reveal performance differences of 1.1% on a single GPU. The GPU cluster and well-known large system show 5.5% and 4.1% differences on average, respectively. Compared to the original HPL model, the proposed multi-communication layered HPL model provides performance estimates within a few seconds and a smaller error range from the processor/accelerator level to the large system level.

Parallel Implementations of Digital Focus Indices Based on Minimax Search Using Multi-Core Processors

  • HyungTae, Kim;Duk-Yeon, Lee;Dongwoon, Choi;Jaehyeon, Kang;Dong-Wook, Lee
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.2
    • /
    • pp.542-558
    • /
    • 2023
  • A digital focus index (DFI) is a value used to determine image focus in scientific apparatus and smart devices. Automatic focus (AF) is an iterative and time-consuming procedure; however, its processing time can be reduced using a general processing unit (GPU) and a multi-core processor (MCP). In this study, parallel architectures of a minimax search algorithm (MSA) are applied to two DFIs: range algorithm (RA) and image contrast (CT). The DFIs are based on a histogram; however, the parallel computation of the histogram is conventionally inefficient because of the bank conflict in shared memory. The parallel architectures of RA and CT are constructed using parallel reduction for MSA, which is performed through parallel relative rating of the image pixel pairs and halved the rating in every step. The array size is then decreased to one, and the minimax is determined at the final reduction. Kernels for the architectures are constructed using open source software to make it relatively platform independent. The kernels are tested in a hexa-core PC and an embedded device using Lenna images of various sizes based on the resolutions of industrial cameras. The performance of the kernels for the DFIs was investigated in terms of processing speed and computational acceleration; the maximum acceleration was 32.6× in the best case and the MCP exhibited a higher performance.