• Title/Summary/Keyword: Granularity

Search Result 195, Processing Time 0.028 seconds

The Least-Dirty-First CLOCK Replacement Policy for Phase-Change Memory based Swap Devices (PCM 기반 스왑 장치를 위한 클럭 기반 최소 쓰기 우선 교체 정책)

  • Yoo, Seunghoon;Lee, Eunji;Bahn, Hyokyung
    • Journal of KIISE
    • /
    • v.42 no.9
    • /
    • pp.1071-1077
    • /
    • 2015
  • In this paper, we adopt PCM (phase-change memory) as a virtual memory swap device and present a new page replacement policy that considers the characteristics of PCM. Specifically, we aim to reduce the write traffic to PCM by considering the dirtiness of pages when making a replacement decision. The proposed policy tracks the dirtiness of a page at the granularity of a sub-page and replaces the least dirty page among the pages not recently used. Experimental results show that the proposed policy reduces the amount of data written to PCM by 22.9% on average and up to 73.7% compared to CLOCK. It also extends the lifespan of PCM by 49.0% and reduces the energy consumption of PCM by 3.0% on average.

A GPU scheduling framework for applications based on dataflow specification (데이터 플로우 기반 응용들을 위한 GPU 스케줄링 프레임워크)

  • Lee, Yongbin;Kim, Sungchan
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.10
    • /
    • pp.1189-1197
    • /
    • 2014
  • Recently, general purpose graphic processing units(GPUs) are being widely used in mobile embedded systems such as smart phone and tablet PCs. Because of architectural limitations of mobile GPGPUs, only a single program is allowed to occupy a GPU at a time in a non-preemptive way. As a result, it is difficult to meet performance requirements of applications such as frame rate or response time if applications running on a GPU are not scheduled properly. To tackle this difficulty, we propose to specify applications using synchronous data flow model of computation such that applications are formed with edges and nodes. Then nodes of applications are scheduled onto a GPU unlike conventional scheduling an application as a whole. This approach allows applications to share a GPU at a finer granularity, node (or task)-level, providing several benefits such as eliminating need for manually partitioning applications and better GPU utilization. Furthermore, any scheduling policy can be applied in response to the characteristics of applications.

Shared Data Decomposition Model for Improving Concurrency in Distributed Object-oriented Software Development Environments (분산 객체 지향 소프트웨어 개발 환경에서 동시성 향상을 위한 공유 데이타 분할 모델)

  • Kim, Tae-Hoon;Shin, Yeong-Gil
    • Journal of KIISE:Software and Applications
    • /
    • v.27 no.8
    • /
    • pp.795-803
    • /
    • 2000
  • This paper presents a shared data decomposition model for improving concurrency in multi-user, distributed software developments. In our model, the target software system is decomposed into the independent components based on project roles to be distributed over clients. The distributed components are decomposed into view objects and core objects to replicate only view objects in a distributed collaboration session. The core objects are kept in only one client and the locking is used to prevent inconsistencies. The grain size of a lock is a role instead of a class which is commonly used as the locking granularity in the existing systems. The experimental result shows that our model reduces response time by 12${\sim}$18% and gives good scalability.

  • PDF

Performane Modeling of Flash Memory Storage Systems Using Simulink (시뮬링크를 이용한 플래시메모리 저장장치 성능 모델링)

  • Min, Hang Jun;Park, Jeong Su;Lee, Joo Il;Min, Sang Lyul;Kim, Kanghee
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.6 no.5
    • /
    • pp.263-272
    • /
    • 2011
  • The complexity of flash memory based storage systems is high due to diverse host interfaces and other design choices such as mapping granularity, flash memory controller execution models and so on. Thus, it is possible that the actual performance after implementation is not consistent with the target performance. This paper demonstrates that the performance prediction of flash memory based storage systems is possible through performance modeling that takes into account various design parameters. In the performance modeling, the FTL, which is the core element of flash memory based storage systems, is modeled as a set of (copy-on-write) logs and their interactions. Also, the flash memory controller is modeled based on the classification proposed in the design of the Ozone flash controller. In this study, the performance model has been implemented using Simulink and experimental results are presented and analyzed.

A Configurable Software-based Approach for Detecting CFEs Caused by Transient Faults

  • Liu, Wei;Ci, LinLin;Liu, LiPing
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.5
    • /
    • pp.1829-1846
    • /
    • 2021
  • Transient faults occur in computation units of a processor, which can cause control flow errors (CFEs) and compromise system reliability. The software-based methods perform illegal control flow detection by inserting redundant instructions and monitoring signature. However, the existing methods not only have drawbacks in terms of performance overhead, but also lack of configurability. We propose a configurable approach CCFCA for detecting CFEs. The configurability of CCFCA is implemented by analyzing the criticality of each region and tuning the detecting granularity. For critical regions, program blocks are divided according to space-time overhead and reliability constraints, so that protection intensity can be configured flexibly. For other regions, signature detection algorithms are only used in the first basic block and last basic block. This helps to improve the fault-tolerant efficiency of the CCFCA. At the same time, CCFCA also has the function of solving confusion and instruction self-detection. Our experimental results show that CCFCA incurs only 10.61% performance overhead on average for several C benchmark program and the average undetected error rate is only 9.29%. CCFCA has high error coverage and low overhead compared with similar algorithms. This helps to meet different cost requirements and reliability requirements.

A Multidimensional View of SNS Usage: Conceptualization and Validation

  • Edgardo R. Bravo;Christian Fernando Libaque-Saenz
    • Asia pacific journal of information systems
    • /
    • v.32 no.3
    • /
    • pp.601-629
    • /
    • 2022
  • Social networking sites (SNSs) have become an essential part of people's lives. It is thus crucial to understand how individuals use these platforms. Previous literature has divided usage into numerous activities and then grouped them into dimensions to avoid excessive granularity. However, these categories have not been derived from a uniform theoretical background; consequently, these dimensions are dispersed, overlapping, and disconnected from each other. This study argues that "SNS usage" is a complex phenomenon consisting of multiple activities that can be grouped into dimensions under the umbrella of communication theories and these dimensions are related to each other in a particular multi-dimensional architecture. "SNS usage" is conceptualized as a third-order construct formed by "producing," "consuming," and "communicating." "Producing," in turn, is proposed as a second-order construct manifested by "commenting," "general information sharing," and "self-disclosure." The proposed model was assessed with data collected from 414 USA adult users and PLS-SEM technique. The results show empirical support for the theorized model. SNS providers now have this architecture that clarifies the role of each dimension of use, which will allow them to design effective strategies to encourage the use of these networks.

Object-Size and Call-Site Tracing based Shared Memory Allocator for False Sharing Reduction in DSM Systems (분산 공유 메모리 시스템에서 거짓 공유를 줄이는 객체-크기 및 호출지-추적 기반 공유 메모리 할당 기법)

  • Lee, Jong-Woo;Park, Young-Ho;Yoon, Yong-Ik
    • Journal of Digital Contents Society
    • /
    • v.9 no.1
    • /
    • pp.77-86
    • /
    • 2008
  • False sharing is a result of co-location of unrelated data in the same unit of memory coherency, and is one source of unnecessary overhead being of no help to keep the memory coherency in multiprocessor systems. Moreover, the damage caused by false sharing becomes large in proportion to the granularity of memory coherency. To reduce false sharing in page-based DSM systems, it is necessary to allocate unrelated data objects that have different access patterns into the separate shared pages. In this paper we propose sized and call-site tracing-based shared memory allocator, shortly SCSTallocator. SCSTallocator places each data object requested from the different call-sites into the separate shared pages, and at the same time places each data object that has different size into different shared pages. Consequently data objects that have the different call-site and different object size prohibited from being allocated to the same shared page. Our observations show that our SCSTallocator outperforms the existing dynamic shared memory allocators. By combining the two existing allocation technique, we can reduce a considerable amount of false sharing misses.

  • PDF

Secure and Fine-grained Electricity Consumption Aggregation Scheme for Smart Grid

  • Shen, Gang;Su, Yixin;Zhang, Danhong;Zhang, Huajun;Xiong, Binyu;Zhang, Mingwu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.4
    • /
    • pp.1553-1571
    • /
    • 2018
  • Currently, many of schemes for smart grid data aggregation are based on a one-level gateway (GW) topology. Since the data aggregation granularity in this topology is too single, the control center (CC) is unable to obtain more fine-grained data aggregation results for better monitoring smart grid. To improve this issue, Shen et al. propose an efficient privacy-preserving cube-data aggregation scheme in which the system model consists of two-level GW. However, a risk exists in their scheme that attacker could forge the signature by using leaked signing keys. In this paper, we propose a secure and fine-grained electricity consumption aggregation scheme for smart grid, which employs the homomorphic encryption to implement privacy-preserving aggregation of users' electricity consumption in the two-level GW smart grid. In our scheme, CC can achieve a flexible electricity regulation by obtaining data aggregation results of various granularities. In addition, our scheme uses the forward-secure signature with backward-secure detection (FSBD) technique to ensure the forward-backward secrecy of the signing keys. Security analysis and experimental results demonstrate that the proposed scheme can achieve forward-backward security of user's electricity consumption signature. Compared with related schemes, our scheme is more secure and efficient.

A Ray-Tracing Algorithm Based On Processor Farm Model (프로세서 farm 모델을 이용한 광추적 알고리듬)

  • Lee, Hyo Jong
    • Journal of the Korea Computer Graphics Society
    • /
    • v.2 no.1
    • /
    • pp.24-30
    • /
    • 1996
  • The ray tracing method, which is one of many photorealistic rendering techniques, requires heavy computational processing to synthesize images. Parallel processing can be used to reduce the computational processing time. A parallel algorithm for the ray tracing has been implemented and executed for various images on transputer systems. In order to develop a scalable parallel algorithm, a processor farming technique has been exploited. Since each image is divided and distributed to each farming processor, the scalability of the parallel system and load balancing are achieved naturally in the proposed algorithm. Efficiency of the parallel algorithm is obtained up to 95% for nine processors. However, the best size of a distributed task is much higher in simple images due to less computational requirement for every pixel. Efficiency degradation is observed for large granularity tasks because of load unbalancing caused by the large task. Overall, transputer systems behave as good scalable parallel processing system with respect to the cost-performance ratio.

  • PDF

DEX2C: Translation of Dalvik Bytecodes into C Code and its Interface in a Dalvik VM

  • Kim, Minseong;Han, Youngsun;Cho, Myeongjin;Park, Chanhyun;Kim, Seon Wook
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.4 no.3
    • /
    • pp.169-172
    • /
    • 2015
  • Dalvik is a virtual machine (VM) that is designed to run Java-based Android applications. A trace-based just-in-time (JIT) compilation technique is currently employed to improve performance of the Dalvik VM. However, due to runtime compilation overhead, the trace-based JIT compiler provides only a few simple optimizations. Moreover, because each trace contains only a few instructions, the trace-based JIT compiler inherently exploits fewer optimization and parallelization opportunities than a method-based JIT compiler that compiles method-by-method. So we propose a new method-based JIT compiler, named DEX2C, in order to improve performance by finding more opportunities for both optimization and parallelization in Android applications. We employ C code as an intermediate product in order to find more optimization opportunities by using the GNU C Compiler (GCC), and we will detect parallelism by using the Intel C/C++ parallel compiler and the AESOP compiler in our future work. In this paper, we introduce our DEX2C compiler, which dynamically translates Dalvik bytecodes (DEX) into C code with method granularity. We also describe a new method-based JIT interface in the Dalvik VM for the DEX2C compiler. Our experiment results show that our compiler and its interface achieve significant performance improvement by up to 15.2 times and 3.7 times on average, in Element Benchmark, and up to 2.8 times for FFT in Smartbench.