• Title/Summary/Keyword: Computations Execution

A Multithreaded Architecture for the Efficient Execution of Vector Computations (벡타 연산을 효율적으로 수행하기 위한 다중 스레드 구조)

  • Yun, Seong-Dae;Jeong, Gi-Dong
    • The Transactions of the Korea Information Processing Society / v.2 no.6 / pp.974-984 / 1995
  • This paper presents the design of MULVEC (MULtithreaded architecture for VEctor Computations), a high-performance building block for massively parallel processing systems. MULVEC is a synthesis of the dataflow model and existing superscalar RISC microprocessors. Using status fields, MULVEC reduces the number of synchronizations when vector computations are repeated within the same thread segment, and it also reduces context switching, network traffic, and similar overheads. Benchmark programs were simulated on a SPARCstation 20 (a superscalar RISC microprocessor), and the performance of MULVEC (program execution time and processor utilization) and of *T (program execution time) was analyzed for different numbers of nodes. Programs ran about 1 to 2 times faster on MULVEC than on *T, depending on the number of nodes and the number of loop repetitions.

Probabilistic Soft Error Detection Based on Anomaly Speculation

  • Yoo, Joon-Hyuk
    • Journal of Information Processing Systems / v.7 no.3 / pp.435-446 / 2011
  • Microprocessors are becoming increasingly vulnerable to soft errors due to current trends in semiconductor technology scaling. Traditional redundant multi-threading architectures provide perfect fault tolerance by re-executing all computations. However, such full re-execution significantly increases the verification workload on processor resources, resulting in severe performance degradation. This paper presents a proactive verification management approach that reduces the verification workload, and thereby improves performance, with minimal effect on overall reliability. An anomaly-speculation-based filter checker is proposed to assign verification priority before the re-execution process starts. The technique exploits a value similarity property, defined by the frequent occurrence of partially identical values. Based on the biased distribution of the similarity distance measure, this paper further investigates how similar values can be exploited for soft error tolerance with anomaly speculation. Extensive measurements show that the majority of instructions produce values that differ from the previous result value in only a few bits. Experimental results show that the proposed scheme makes the processor 180% faster than a traditional fully fault-tolerant processor, with minimal impact on the overall soft error rate.
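
The value similarity property described above can be illustrated with a short sketch. It measures how many bits differ between two successive result values and flags a result as anomalous when that distance is large. The function names, the 32-bit word width, and the threshold of 4 bits are assumptions made for illustration; the abstract does not specify the checker's actual design.

```python
def bit_distance(prev_value, new_value, width=32):
    """Count the bits that differ between two results, viewed as width-bit words."""
    mask = (1 << width) - 1
    return bin((prev_value ^ new_value) & mask).count("1")


def is_anomalous(prev_value, new_value, threshold=4, width=32):
    """Flag a result whose distance from the previous result exceeds the threshold.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    return bit_distance(prev_value, new_value, width) > threshold


# A small increment changes few bits; a wildly different value changes many.
print(bit_distance(100, 101), is_anomalous(100, 101))
print(bit_distance(0, 0xFFFF), is_anomalous(0, 0xFFFF))
```

Under this assumption, results that stay close to their predecessor would be given low verification priority, and only flagged results would be re-executed first.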

Frameworks and Environments for Mobile Agents

  • Kim Haeng Kon;Chung Youn-Ky
    • The Journal of Information Systems / v.14 no.3 / pp.48-52 / 2005
  • Mobile agent-based distributed systems are gaining significant popularity as a potential vehicle for executing software components in heterogeneous environments despite the mobility of users and computations. However, because these systems generally force mobile agents to use only the common functionality provided in every execution environment, the agents may not be able to access environment-specific resources. In this paper, we propose a new framework that uses Aspect-Oriented Programming to accommodate a variety of static resources as well as dynamic ones whose amount changes continually at runtime, even within the same execution environment. Unlike previous work, this framework divides the roles of software developers into three groups, relieving application programmers of the complex and error-prone parts of implementing dynamic adaptation and allowing each developer to concentrate on his or her own part. The framework also enables policy decision makers to apply various adaptation policies to dynamically changing environments, adjusting mobile agents to changes in their resources.

Snapshot-Based Offloading for Web Applications with HTML5 Canvas (HTML5 캔버스를 활용하는 웹 어플리케이션의 스냅샷 기반 연산 오프로딩)

  • Jeong, InChang;Jeong, Hyuk-Jin;Moon, Soo-Mook
    • Journal of KIISE / v.44 no.9 / pp.871-877 / 2017
  • A vast amount of research has been carried out for executing compute-intensive applications on resource-constrained mobile devices. Computation offloading is a method in which heavy computations are dynamically migrated from a mobile device to a server, exploiting the powerful hardware of the server to perform complex computations. An important issue for offloading is the complexity of reconciling the execution state of applications between the server and the client. To address this issue, snapshot-based offloading has recently been proposed, which utilizes the snapshot of a web app as the portable description of the execution state. However, for web applications using the HTML5 canvas, snapshot-based offloading does not function correctly, because the snapshot cannot capture the state of the canvas. In this paper, we propose a code generation technique to save the canvas state as part of a snapshot, so that the snapshot-based offloading can be applied to web applications using the canvas.

Efficient Construction of Euclidean Steiner Minimum Tree Using Combination of Delaunay Triangulation and Minimum Spanning Tree (들로네 삼각망과 최소신장트리를 결합한 효율적인 유클리드 스타이너 최소트리 생성)

  • Kim, Inbum
    • Journal of the Korea Society of Computer and Information / v.19 no.1 / pp.57-64 / 2014
  • Because building a Steiner minimum tree is an NP-complete problem, heuristics for it require immense execution time and computation when the number of inputs is large. In this paper, we propose an efficient mechanism for constructing a Euclidean Steiner minimum tree for numerous inputs by combining Delaunay triangulation with Prim's minimum spanning tree algorithm. Trees built by the proposed mechanism are compared with Prim's minimum spanning tree and with a Steiner minimum tree based on the minimum spanning tree. In the experiments with 30,000 input nodes, the Steiner minimum tree built by the proposed mechanism has about 2.1% less tree length and takes about 138.2% more execution time than the minimum spanning tree, and has about 0.013% less tree length and takes about 18.9% less execution time than the minimum-spanning-tree-based Steiner minimum tree. The proposed mechanism can therefore work moderately well in the many applications where execution time is not critical but reducing tree length is a key factor.
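
For reference, the minimum spanning tree baseline mentioned above can be sketched with Prim's algorithm. This is a minimal illustration, not the paper's mechanism: it runs Prim on the complete Euclidean graph in O(n²) time and omits the Delaunay triangulation step and the insertion of Steiner points. The function name `prim_mst_length` is hypothetical.

```python
import math


def prim_mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree of 2-D points."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection from each node to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Relax the connection cost of every node still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


# Corners of a unit square: the spanning tree uses three unit-length edges.
print(prim_mst_length([(0, 0), (1, 0), (1, 1), (0, 1)]))
```

A Steiner tree can be shorter than this baseline because it may add extra junction points. For the three corners of an equilateral triangle with unit sides, the spanning tree has length 2, while connecting all corners through the center gives √3 ≈ 1.732.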

Parallelization of Recursive Functions for Recursive Data Structures (재귀적 자료구조에 대한 재귀 함수의 병렬화)

  • An, Jun-Seon;Han, Tae-Suk
    • Journal of KIISE:Software and Applications / v.26 no.12 / pp.1542-1552 / 1999
  • Data parallelism is obtained by applying the same operation to each element of a data collection simultaneously. In functional languages, iterative computations on data collections are expressed as recursive functions on recursive data structures. We propose a parallelization method that transforms such recursive functions into data-parallel programs. To represent the parallel execution structure of the generated programs, we employ polytypic data-parallel primitives defined for general recursive data types, which enables data-parallel execution over general recursive data structures such as trees and lists. To parallelize a recursive function, we classify the parallelism of each of its computations based on the dependencies introduced by the recursive calls, and generate a parallel program that uses the appropriate data-parallel primitive for each computation.

Efficient Construction of Large Scale Grade of Services Steiner Tree Using Space Locality and Polynomial-Time Approximation Scheme (공간 지역성과 PTAS를 활용한 대형 GOSST의 효과적 구성)

  • Kim, In-Bum
    • Journal of the Korea Society of Computer and Information / v.16 no.11 / pp.153-161 / 2011
  • Because building a GOSST (Grade of Services Steiner Tree) is an NP-complete problem, heuristics for it require immense execution time and computation for large-scale inputs. In this paper, we propose an efficient mechanism for GOSST construction using a space locality PTAS. For 40,000 input nodes with a maximum weight of 100, the proposed space locality PTAS GOSST with 16 unit areas reduces connection cost by about 4.00% and execution time by about 89.26% compared with the weighted minimum spanning tree method. Compared with the approximate GOSST method (SGOSST) without PTAS, the proposed method increases connection cost by 0.03% but cuts execution time by 96.39%. The proposed space locality PTAS GOSST mechanism can therefore work moderately well in the many applications where a great number of weighted inputs must be connected in a short time at an approximately minimum connection cost.

GPU Implementation Techniques of Genetic Algorithm and Comparative Studies (유전 알고리즘의 GPU 구현 기법 및 비교 연구)

  • Hyeon, Byeong-Yong;Seo, Ki-Sung
    • Journal of Institute of Control, Robotics and Systems / v.17 no.4 / pp.328-335 / 2011
  • A GPU (Graphics Processing Unit) has a SIMD (Single Instruction Multiple Data) architecture and provides fast parallel processing. A GA (Genetic Algorithm), which requires a large amount of computation, is implemented on a GPU using CUDA (Compute Unified Device Architecture). Three execution models are presented, corresponding to different combinations of processing modules on the GPU. The GPU models are compared experimentally with a CPU implementation on a couple of benchmark problems, varying the population size and the problem complexity.

Optimizing Speed For Adaptive Local Thresholding Algorithm Using Dynamic Programing

  • Due Duong Anh;Hong Du Tran Le;Duan Tran Duc
    • Proceedings of the IEEK Conference / summer / pp.438-441 / 2004
  • Image binarization using a global threshold value [3] is fast, but it usually produces undesirable binary images when the source images are of poor quality. In such cases, adaptive local thresholding algorithms [1][2][3] are used to obtain better results. The algorithm proposed by A.E. Savekis, which chooses the local threshold using foreground and background clustering [1], is one of the best of these. However, it runs slowly because it recomputes the threshold value for each central pixel of an M×M local window. In this paper, we present a dynamic programming approach to calculating the local threshold value that eliminates many redundant computations and improves the execution speed significantly. Experiments show that our proposed improvement runs more than ten times faster than the original algorithm.
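
The general idea of avoiding repeated work across overlapping M×M windows can be sketched with a summed-area table, which is built once and then answers any window sum in constant time. This sketch swaps in a simpler technique than the paper's: it thresholds each pixel against the local window mean, not against the result of foreground and background clustering. All function names are hypothetical.

```python
def summed_area_table(image):
    """Build table where table[y][x] is the sum of image rows < y and columns < x."""
    h, w = len(image), len(image[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def window_sum(table, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 using four lookups."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def local_mean_binarize(image, m):
    """Set a pixel to 1 if it exceeds the mean of its M x M window (clipped at borders)."""
    h, w = len(image), len(image[0])
    table = summed_area_table(image)
    half = m // 2
    result = []
    for y in range(h):
        row = []
        for x in range(w):
            top, bottom = max(0, y - half), min(h, y + half + 1)
            left, right = max(0, x - half), min(w, x + half + 1)
            area = (bottom - top) * (right - left)
            mean = window_sum(table, top, left, bottom, right) / area
            row.append(1 if image[y][x] > mean else 0)
        result.append(row)
    return result


print(local_mean_binarize([[1, 2], [3, 4]], 3))
```

Without the table, each pixel would re-sum its whole window, costing on the order of M² operations per pixel; with it, the per-pixel cost does not depend on M.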

Implemenation of an ASIP for acceleration SAD operation (SAD 연산의 가속을 위한 멀티미디어 코프로세서 구현)

  • Jo, Jung-Hyun;Jeong, Ha-Young
    • Proceedings of the IEEK Conference / 2006.06a / pp.809-810 / 2006
  • The H.264 algorithm is commonly used for video compression applications. It requires a large number of data computations, such as the sum of absolute differences (SAD) operation. We analyzed H.264 reference encoding workloads and found that SAD operations account for 8.78% of the H.264 encoding program. The SAD operation sums 16 difference values in an H.264 $4{\times}4$ sub-block. To accelerate SAD operations, we implemented an application-specific instruction-set processor (ASIP) that can execute SAD and data transfer instructions. The proposed coprocessor has an absolute value generator and a carry-save adder (CSA) unit that sum 8 difference values per clock cycle, so a SAD operation completes in 2 clock cycles. Experimental results show that total execution time is improved by 34%.
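
As a point of reference, the arithmetic of the SAD operation on a 4×4 sub-block can be sketched in software as follows. The function name `sad_4x4` and the sample pixel values are illustrative assumptions and are not taken from the paper. The sketch shows only the computation (16 absolute differences summed), not the coprocessor's CSA datapath.

```python
def sad_4x4(current, reference):
    """Sum of absolute differences between two 4x4 blocks (16 pixel pairs)."""
    assert len(current) == 4 and len(reference) == 4
    total = 0
    for row_cur, row_ref in zip(current, reference):
        assert len(row_cur) == 4 and len(row_ref) == 4
        for a, b in zip(row_cur, row_ref):
            total += abs(a - b)
    return total


# Hypothetical pixel values, for illustration only.
cur = [[10, 12, 11, 9], [8, 7, 10, 12], [13, 14, 9, 8], [10, 10, 10, 10]]
ref = [[9, 12, 13, 9], [8, 9, 10, 11], [13, 12, 9, 9], [10, 11, 10, 8]]
print(sad_4x4(cur, ref))
```

In a motion-estimation loop, a lower SAD means the reference block matches the current block more closely, which is why the encoder evaluates this sum so often.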
