• Title/Summary/Keyword: Computing time benchmark


An Interference Matrix Based Approach to Bounding Worst-Case Inter-Thread Cache Interferences and WCET for Multi-Core Processors

  • Yan, Jun;Zhang, Wei
    • Journal of Computing Science and Engineering, v.5 no.2, pp.131-140, 2011
  • Different cores typically share the last-level cache in a multi-core processor, so threads running on different cores may interfere with each other. Therefore, a multi-core worst-case execution time (WCET) analyzer must be able to safely and accurately estimate the worst-case inter-thread cache interference. This is not supported by current WCET analysis techniques, which mainly focus on single-thread analysis. This paper presents a novel approach to analyzing the worst-case cache interference and bounding the WCET of threads running on multi-core processors with shared L2 instruction caches. We propose to use an interference matrix to model inter-thread interference, on the basis of which we can calculate the worst-case inter-thread cache interference. Our experiments indicate that the proposed approach can bound the worst case to within less than 1% overestimation in the best case (the fib-call benchmark), and within 16.4% overestimation on average, for threads running on a dual-core processor with a shared L2 cache. Our approach dramatically improves the accuracy of the WCET estimation, by 20.0% on average, compared to prior work.
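
The abstract does not spell out how the interference matrix is built or used, so the following Python sketch is only one plausible reading, with all names and the per-set bounding rule assumed: entry M[t][s] counts thread t's L2 accesses that map to cache set s, and the extra misses a victim thread can suffer are bounded set by set.

```python
# Illustrative sketch only: the paper's exact interference-matrix
# construction is not given in the abstract.

def build_interference_matrix(access_traces, num_sets):
    """access_traces: per-thread lists of L2 set indices (assumed input)."""
    matrix = [[0] * num_sets for _ in access_traces]
    for t, trace in enumerate(access_traces):
        for s in trace:
            matrix[t][s] += 1
    return matrix

def worst_case_extra_misses(matrix, victim):
    """Pessimistic per-set bound: every interfering access to a set may
    evict a victim line, capped by the victim's own demand in that set."""
    total = 0
    for s in range(len(matrix[victim])):
        own = matrix[victim][s]
        others = sum(row[s] for t, row in enumerate(matrix) if t != victim)
        total += min(own, others)
    return total

# Example: two threads sharing a 4-set L2 cache
m = build_interference_matrix([[0, 1, 1, 3], [1, 1, 2]], num_sets=4)
print(worst_case_extra_misses(m, victim=0))  # -> 2
```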

A Comparative Performance Analysis of Spark-Based Distributed Deep-Learning Frameworks

  • Jang, Jaehee;Park, Jaehong;Kim, Hanjoo;Yoon, Sungroh
    • KIISE Transactions on Computing Practices, v.23 no.5, pp.299-303, 2017
  • By piling up hidden layers in artificial neural networks, deep learning delivers outstanding performance on high-level abstraction problems such as object/speech recognition and natural language processing. However, deep-learning users often struggle with the tremendous amounts of time and resources required to train deep neural networks. To alleviate this computational challenge, many approaches have been proposed in a diversity of areas. In this work, two existing Apache Spark-based acceleration frameworks for deep learning, SparkNet and DeepSpark, are compared and analyzed in terms of training accuracy and time demands. In the authors' experiments with the CIFAR-10 and CIFAR-100 benchmark datasets, SparkNet showed more stable convergence behavior than DeepSpark, but in terms of training accuracy, DeepSpark delivered approximately 15% higher classification accuracy. For some of the cases, DeepSpark also outperformed the sequential implementation running on a single machine in terms of both accuracy and running time.

Efficient Task Distribution Method for Load Balancing on Clusters of Heterogeneous Workstations

  • 지병준;이광모
    • Journal of Internet Computing and Services, v.2 no.3, pp.81-92, 2001
  • A cluster of heterogeneous workstations provides a cost-effective and usable environment for executing applications in parallel. Load balancing is a necessary feature of such clusters for minimizing turnaround time: because each workstation may have different users, groups, task requests, and processing power, the capability of each processing unit is relative to the others in the clustering environment. Previous work either takes a static approach, assigning a predetermined weight to the processing capability of each workstation, or a dynamic approach, executing a benchmark program to obtain each workstation's relative processing capability. The execution of the benchmark program, which has nothing to do with the application being executed, consumes computation time and delays the overall turnaround time. In this paper, we present an efficient task distribution method and an implementation of a load-balancing system for clusters of heterogeneous workstations. The turnaround time of the presented methods is compared with that of the method without load balancing as well as the method that balances load with a performance-evaluation program. The experimental results show that our methods outperform all the other methods compared.
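
Since the abstract argues against running a separate benchmark program, one plausible reading (an assumption, not the paper's published algorithm) is to estimate each workstation's relative capability from the completion times of the application's own earlier tasks and split the remaining work proportionally. A minimal Python sketch:

```python
# Hedged sketch, not the paper's algorithm: derive per-node capability
# from the application's own task history (no separate benchmark run).

def relative_capabilities(completed):
    """completed: {node: (tasks_done, seconds_spent)} (assumed bookkeeping)."""
    rates = {n: done / secs for n, (done, secs) in completed.items()}
    total = sum(rates.values())
    return {n: r / total for n, r in rates.items()}

def distribute(tasks, capabilities):
    """Assign a contiguous share of `tasks` to each node by capability."""
    shares, start = {}, 0
    nodes = list(capabilities)
    for i, n in enumerate(nodes):
        count = (round(len(tasks) * capabilities[n]) if i < len(nodes) - 1
                 else len(tasks) - start)  # last node takes the remainder
        shares[n] = tasks[start:start + count]
        start += count
    return shares

caps = relative_capabilities({"ws1": (8, 4.0), "ws2": (3, 6.0)})
print(distribute(list(range(10)), caps))  # ws1 gets 8 tasks, ws2 gets 2
```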


Implementation of Particle Swarm Optimization Method Using CUDA

  • Kim, Jo-Hwan;Kim, Eun-Su;Kim, Jong-Wook
    • The Transactions of The Korean Institute of Electrical Engineers, v.58 no.5, pp.1019-1024, 2009
  • In this paper, particle swarm optimization (PSO) is newly implemented with CUDA (Compute Unified Device Architecture) and applied to function optimization with several benchmark functions. CUDA is a parallel computing platform that solves complex computing problems using the parallel processing capacity of the GPU (Graphics Processing Unit) rather than the CPU, and it helps developers write GPU software conveniently. Compared with the optimization result of PSO executed on a general CPU, CUDA saves about 38% of the PSO running time on average, which implies that CUDA is a promising framework for real-time optimization and control.
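
As a rough illustration of what the GPU parallelizes, here is the standard PSO update loop in plain Python; in a CUDA implementation the per-particle inner loop is what would run as one GPU thread per particle. The parameter values and the sphere benchmark are generic choices, not the paper's settings.

```python
# Standard PSO in Python for illustration; the per-particle update is the
# data-parallel part a CUDA kernel would execute.
import random

def pso(f, dim, n=30, iters=200, w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    xs = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    vs = [[0.0] * dim for _ in range(n)]
    pbest = [x[:] for x in xs]
    gbest = min(pbest, key=f)[:]
    for _ in range(iters):
        for i in range(n):              # <- one GPU thread per particle
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            if f(xs[i]) < f(pbest[i]):  # update personal best
                pbest[i] = xs[i][:]
        gbest = min(pbest, key=f)[:]    # update global best
    return gbest

sphere = lambda x: sum(v * v for v in x)  # common benchmark function
print(pso(sphere, dim=2))
```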

Enhanced Stereo Matching Algorithm based on 3-Dimensional Convolutional Neural Network

  • Wang, Jian;Noh, Jackyou
    • IEMEK Journal of Embedded Systems and Applications, v.16 no.5, pp.179-186, 2021
  • For stereo matching based on deep learning, the design of the network structure is crucial to the calculation of the matching cost, and the time-consuming nature of convolutional neural networks in image processing also needs to be addressed urgently. In this paper, a stereo matching method using a sparse loss volume along the disparity dimension is proposed. A sparse 3D loss volume is constructed by translating the right-view feature map with a wide step length, which reduces the video memory and computing resources required by the 3D convolution module several-fold. To improve the accuracy of the algorithm, the matching loss is nonlinearly up-sampled along the disparity dimension using a multi-category output, and the training model is combined with two kinds of loss functions. Compared with the benchmark algorithm, the proposed algorithm not only improves the accuracy but also shortens the running time by about 30%.
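
A minimal numpy sketch of the sparse loss-volume idea: sampling the disparity axis with a wide stride when translating the right feature map shrinks the 3D volume, and hence the 3D-convolution cost, by the stride factor. The matching cost (absolute feature difference) and the stride value are assumptions; the paper's exact construction may differ.

```python
import numpy as np

def sparse_cost_volume(left, right, max_disp, stride):
    """left, right: (C, H, W) feature maps; returns a (D', C, H, W) volume
    with D' = max_disp / stride disparity samples instead of max_disp."""
    c, h, w = left.shape
    disps = range(0, max_disp, stride)           # sparse disparity samples
    volume = np.zeros((len(disps), c, h, w), dtype=left.dtype)
    for i, d in enumerate(disps):
        shifted = np.zeros_like(right)
        if d > 0:
            shifted[:, :, d:] = right[:, :, :-d]  # translate right features
        else:
            shifted = right
        volume[i] = np.abs(left - shifted)        # assumed matching cost
    return volume

left = np.random.rand(8, 16, 32).astype(np.float32)
right = np.random.rand(8, 16, 32).astype(np.float32)
print(sparse_cost_volume(left, right, max_disp=16, stride=4).shape)
# (4, 8, 16, 32): a quarter of the dense volume along disparity
```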

APPLICATION OF BACKWARD DIFFERENTIATION FORMULA TO SPATIAL REACTOR KINETICS CALCULATION WITH ADAPTIVE TIME STEP CONTROL

  • Shim, Cheon-Bo;Jung, Yeon-Sang;Yoon, Joo-Il;Joo, Han-Gyu
    • Nuclear Engineering and Technology, v.43 no.6, pp.531-546, 2011
  • The backward differentiation formula (BDF) method is applied to a three-dimensional reactor kinetics calculation for efficient yet accurate transient analysis with adaptive time step control. The coarse mesh finite difference (CMFD) formulation is used for an efficient implementation of the BDF method that does not require excessive memory to store old information from previous time steps. An iterative scheme to update the nodal coupling coefficients through higher-order local nodal solutions is established so that only the node-average fluxes of the previous five time points need to be stored. An adaptive time step control method is derived using two solutions of different order, the fifth- and fourth-order BDF solutions, which provide an estimate of the solution error at the current time point. The performance of the BDF- and CMFD-based spatial kinetics calculation and the adaptive time step control scheme is examined with the NEACRP control rod ejection and rod withdrawal benchmark problems. The accuracy is first assessed by comparing the BDF-based results with those of the Crank-Nicolson method with an exponential transform. The effectiveness of the adaptive time step control is then assessed in terms of the possible reduction in computing time while producing sufficiently accurate solutions that meet the desired fidelity.
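
For reference, the fixed-step form of BDF5 and a common error-based step controller look as follows. The paper derives a variable-step variant, so the coefficients and the controller below are standard textbook forms under a uniform step size h, not the authors' exact equations.

```latex
% Fixed-step BDF5 for \dot{\phi} = f(t, \phi), uniform step h:
\phi_{n+5} - \tfrac{300}{137}\phi_{n+4} + \tfrac{300}{137}\phi_{n+3}
 - \tfrac{200}{137}\phi_{n+2} + \tfrac{75}{137}\phi_{n+1}
 - \tfrac{12}{137}\phi_{n} = \tfrac{60}{137}\, h\, f(t_{n+5}, \phi_{n+5})

% A common controller using the BDF5/BDF4 solution pair as a local error
% estimate (an assumption; the abstract only states that two orders are
% used), with 1/6 = 1/(p+1) for order p = 5:
\varepsilon = \bigl\lVert \phi^{(5)}_{n+5} - \phi^{(4)}_{n+5} \bigr\rVert,
\qquad
h_{\mathrm{new}} = h \left( \frac{\mathrm{tol}}{\varepsilon} \right)^{1/6}
```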

Performance Evaluation of Interconnection Network in Microservers

  • Oh, Myeong-Hoon
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.21 no.6, pp.91-97, 2021
  • A microserver is a type of computing server in which two or more CPU nodes are implemented on each separate computing board and a plurality of computing boards are integrated on a main board. In building a cluster system, the microserver has advantages over the existing method of mounting legacy servers in multiple racks in several respects, such as energy efficiency, occupied area, and ease of management. In addition, since the microserver uses a fast interconnection network between CPU nodes, improved data-transfer performance is expected. The proposed microserver can mount a total of 16 computing boards with 4 CPU nodes each on the main board, and it uses Serial RapidIO (SRIO) as the interconnection network. To analyze the proposed microserver in terms of the interconnection network, which is its core performance issue, we compare and quantify its performance against commercial microservers. In the tests, the proposed microserver showed up to about 7 times higher bandwidth when transmitting data over the interconnection network. In addition, with the CloudSuite benchmark programs used in actual cloud computing, a maximum 60% reduction in execution time was obtained compared to commercial microservers with similar CPU performance specifications.

A Quantitative Approach to Minimize Energy Consumption in Cloud Data Centres using VM Consolidation Algorithm

  • M. Hema;S. KanagaSubaRaja
    • KSII Transactions on Internet and Information Systems (TIIS), v.17 no.2, pp.312-334, 2023
  • In large-scale computing, cloud computing plays an important role by sharing globally distributed resources. The evolution of the cloud has driven the development of data centers and numerous servers across the globe, but cloud data centers incur huge operational costs, consume large amounts of electricity, and emit tons of carbon dioxide. Cloud providers can leverage their resources and decrease energy consumption through various methods, such as dynamic consolidation of Virtual Machines (VMs), keeping idle nodes in sleep mode, and using live migration. However, performance may suffer under overly aggressive consolidation of VMs, so an energy-performance trade-off that reduces power consumption without compromising quality of service is desirable. This research article details a number of novel algorithms that dynamically consolidate VMs in cloud data centers. The primary objective of the study is to make the best use of computing resources and reduce energy consumption while respecting Service Level Agreement (SLA) constraints relevant to CPU load, RAM capacity, and bandwidth. The Proposed VM Consolidation Algorithm (PVMCA) is composed of four algorithms: an overloaded-host detection algorithm, a VM selection algorithm, a VM placement algorithm, and an underloaded-host detection algorithm. PVMCA is dynamic because it uses dynamic thresholds instead of static threshold values, which makes it suitable for the real, unpredictable workloads common in cloud data centers. The algorithms are also adaptive because they automatically adjust their behavior based on historical data of host resource utilization for applications with diverse workload patterns. Finally, the algorithms are online because they execute at run time and act in response to each request. The proposed algorithms' efficiency was validated through extensive simulations. The analysis shows that the proposed algorithms reduced energy consumption considerably while ensuring proper SLA compliance: energy consumption was reduced by 22%, and SLA adherence improved by up to 80% compared to other benchmark algorithms.
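
A structural Python sketch of the four-phase loop named in the abstract. The dynamic threshold (mean plus a multiple of the standard deviation of each host's utilization history), the smallest-VM-first selection, and the least-loaded placement are illustrative choices, not the paper's policies.

```python
import statistics

def dynamic_threshold(history, k=1.0):
    """Illustrative dynamic threshold from a host's utilization history."""
    if len(history) < 2:
        return 0.8                          # assumed bootstrap value
    return statistics.mean(history) + k * statistics.pstdev(history)

def consolidate(hosts):
    """hosts: {name: {"util": [history...], "vms": {vm: cpu_demand}}}"""
    migrations = []
    for name, h in hosts.items():           # 1. overloaded-host detection
        if h["util"][-1] > dynamic_threshold(h["util"]):
            # 2. VM selection: evict the smallest VM first (illustrative)
            vm = min(h["vms"], key=h["vms"].get)
            demand = h["vms"].pop(vm)
            # 3. VM placement: move it to the least-loaded other host
            target = min((n for n in hosts if n != name),
                         key=lambda n: hosts[n]["util"][-1])
            hosts[target]["vms"][vm] = demand
            migrations.append((vm, name, target))
    # 4. underloaded-host detection: empty, idle hosts can sleep
    idle = [n for n, h in hosts.items()
            if h["util"][-1] < 0.1 and not h["vms"]]
    return migrations, idle
```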

A Hybrid Mechanism of Particle Swarm Optimization and Differential Evolution Algorithms based on Spark

  • Fan, Debin;Lee, Jaewan
    • KSII Transactions on Internet and Information Systems (TIIS), v.13 no.12, pp.5972-5989, 2019
  • With the onset of the big data age, data is growing exponentially, and the issue of how to optimize large-scale data processing is especially significant. Large-scale global optimization (LSGO) is a research topic of great interest in academia and industry. Spark is a popular cloud computing framework that can process large-scale data on a cluster, and it effectively supports iterative computation through resilient distributed datasets (RDDs). In this paper, we propose a hybrid mechanism of particle swarm optimization (PSO) and differential evolution (DE) algorithms based on Spark (SparkPSODE). SparkPSODE is a parallel algorithm that employs the RDD and island models. The island model divides the global population into several subpopulations, which reduce computational time by mapping onto the RDD's partitions. To preserve population diversity and avoid premature convergence, the evolutionary strategy of DE is integrated into SparkPSODE. Finally, SparkPSODE is evaluated on a set of LSGO benchmark problems, and the experimental results show that, in comparison with several other algorithms, it obtains better optimization performance.
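
A serial Python sketch of the island model described in the abstract: the population is split into islands (on Spark, one RDD partition per island) and a DE step is mixed in to keep diversity; the PSO velocity update is omitted for brevity (see the PSO sketch earlier in this listing). The DE/rand/1 rule, the ring migration, and all parameters are generic assumptions, not the paper's settings.

```python
import random

def de_mutate(island, f, F=0.5):
    """DE/rand/1-style step: perturb with scaled difference vectors and
    keep the trial only if it improves the objective."""
    out = []
    for x in island:
        a, b, c = random.sample(island, 3)
        trial = [ai + F * (bi - ci) for ai, bi, ci in zip(a, b, c)]
        out.append(trial if f(trial) < f(x) else x)
    return out

def spark_psode_sketch(f, dim, islands=4, size=10, rounds=20):
    pops = [[[random.uniform(-5, 5) for _ in range(dim)]
             for _ in range(size)] for _ in range(islands)]
    for _ in range(rounds):
        pops = [de_mutate(p, f) for p in pops]  # map over "partitions"
        # ring migration between islands (interval and topology assumed)
        best = [min(p, key=f) for p in pops]
        for i in range(islands):
            pops[i][0] = best[(i - 1) % islands][:]
    return min((min(p, key=f) for p in pops), key=f)

sphere = lambda x: sum(v * v for v in x)
print(spark_psode_sketch(sphere, dim=3))
```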

Container-based Cluster Management System for User-driven Distributed Computing

  • Park, Ju-Won;Hahm, Jaegyoon
    • KIISE Transactions on Computing Practices, v.21 no.9, pp.587-595, 2015
  • Several fields of science have traditionally demanded large-scale workflow support, which requires thousands of central processing unit (CPU) cores. To support such large-scale scientific workflows, large-capacity cluster systems such as supercomputers are widely used. However, as users require a diversity of software packages and configurations, system administrators have trouble building service environments in real time. In this paper, we present a container-based cluster management platform and introduce an implementation case that minimizes performance loss and dynamically provides the distributed computing environment that users want. This paper offers the following contributions. First, container-based virtualization technology is integrated with a resource and job management system to expand its applicability to large-scale scientific workflows. Second, an implementation case in which Docker and HTCondor are integrated is introduced. Lastly, Docker versus native performance comparison results are presented, obtained with two widely known benchmark tools and a Monte Carlo simulation implemented in various programming languages.
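
HTCondor can run jobs inside Docker containers through its "docker universe", which is one way the Docker/HTCondor integration described above can be realized. A minimal, hedged example follows, with the submit description kept as a Python string so all sketches in this listing share one language; the image, command, and resource requests are placeholders, not the paper's configuration.

```python
import subprocess
import textwrap

# Hypothetical submit description; requires an HTCondor pool whose
# execute nodes have Docker enabled.
submit_description = textwrap.dedent("""\
    universe            = docker
    docker_image        = debian:stable
    executable          = /bin/echo
    transfer_executable = false
    arguments           = hello from inside a container
    output              = job.out
    error               = job.err
    log                 = job.log
    request_cpus        = 1
    request_memory      = 512M
    queue 1
""")

with open("docker_job.sub", "w") as fh:
    fh.write(submit_description)

# Hand the job to the batch system; HTCondor schedules the container
# onto a Docker-capable execute node.
subprocess.run(["condor_submit", "docker_job.sub"], check=True)
```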