• Title/Summary/Keyword: 멀티코어

Search Result 413, Processing Time 0.024 seconds

User Experience Assisted Energy-Efficient Software Design for Mobile Devices on the big.LITTLE Core Architecture (사용자 경험을 기반으로 big.LITTLE 멀티코어 구조의 스마트 모바일 단말의 에너지 소비를 최적화 하는 소프트웨어 구조 설계)

  • Lim, Sung-Hwa
    • Journal of the Semiconductor & Display Technology
    • /
    • v.19 no.1
    • /
    • pp.23-28
    • /
    • 2020
  • In Smart mobile devices embedding big.LITTLE architectures, the conventional multi-core assignment scheme for user applications may incur wasteful energy consumption and long response time. In this paper, we propose a user experience assisted energy-efficient multicore assignment scheme. Our simulation results show that the proposed scheme achieves at 40% less energy consumption and at 20% less response time comparing to the legacy scheme.

Performance Enhancement of Parallel Prime Sieving Computation with Hybrid Programming and Pipeline Scheduling (하이브리드 프로그래밍과 파이프라인 작업을 통한 병렬 소수 연산 성능 향상)

  • Ryu, Seung-yo;Kim, Dongseung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2015.04a
    • /
    • pp.114-117
    • /
    • 2015
  • 이 논문에서는 소수 추출 방법인 Sieve of Eratosthenes 알고리즘을 병렬화하되 실행시간과 에너지 소모 면에서 개선된 효과를 얻고자 한다. 멀티코어 프로세서의 공유 메모리를 효율적으로 활용하도록 하이브리드 병렬 프로그래밍 모델을 적용하고, 부하 균등화를 정교하게 조절하도록 파이프라인 작업 방식을 도입하였다. 실험결과 이전 방식보다 연산속도가 향상되었고, 에너지 사용량도 감소함을 확인하였다.

Implementation of Pipeline-Based Parallel Processing Library (파이프라인 기반의 병렬처리 라이브러리 구현)

  • Ha, Seungu
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.11a
    • /
    • pp.453-456
    • /
    • 2021
  • 본 논문에서는 fork-join과 work stealing을 이용하여 동적 병렬처리를 수행하는 라이브러리를 구현하였다. 이 라이브러리는 병렬처리를 직관적으로 할 수 있는 함수형 프로그래밍 스타일의 파이프라인 API를 제공한다. 이를 이용한 성능 테스트에서 멀티코어를 제대로 활용하는 결과를 얻을 수 있었다. 마지막으로 blocking 작업 실행 시 병렬성 유지를 위해 추가로 개선할 수 있는 방법을 제시한다.

Design of Low-Area HEVC Core Transform Architecture (저면적 HEVC 코어 변환기 아키텍쳐 설계)

  • Han, Seung-Mok;Nam, Woo-Jin;Lee, Seongsoo
    • Journal of IKEEE
    • /
    • v.17 no.2
    • /
    • pp.119-128
    • /
    • 2013
  • This paper proposes and implements an core transform architecture, which is one of the major processes in HEVC video compression standard. The proposed core transform architecture is implemented with only adders and shifters instead of area-consuming multipliers. Shifters in the proposed core transform architecture are implemented in wires and multiplexers, which significantly reduces chip area. Also, it can process from $4{\times}4$ to $16{\times}16$ blocks with common hardware by reusing processing elements. Designed core transform architecture in 0.13um technology can process a $16{\times}16$ block with 2-D transform in 130 cycles, and its gate count is 101,015 gates.

Data Level Parallelism for H.264/AVC Decoder on a Multi-Core Processor and Performance Analysis (멀티코어 프로세서에서의 H.264/AVC 디코더를 위한 데이터 레벨 병렬화 성능 예측 및 분석)

  • Cho, Han-Wook;Jo, Song-Hyun;Song, Yong-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.8
    • /
    • pp.102-116
    • /
    • 2009
  • There have been lots of researches for H.264/AVC performance enhancement on a multi-core processor. The enhancement has been performed through parallelization methods. Parallelization methods can be classified into a task-level parallelization method and a data level parallelization method. A task-level parallelization method for H.264/AVC decoder is implemented by dividing H.264/AVC decoder algorithms into pipeline stages. However, it is not suitable for complex and large bitstreams due to poor load-balancing. Considering load-balancing and performance scalability, we propose a horizontal data level parallelization method for H.264/AVC decoder in such a way that threads are assigned to macroblock lines. We develop a mathematical performance expectation model for the proposed parallelization methods. For evaluation of the mathematical performance expectation, we measured the performance with JM 13.2 reference software on ARM11 MPCore Evaluation Board. The cycle-accurate measurement with SoCDesigner Co-verification Environment showed that expected performance and performance scalability of the proposed parallelization method was accurate in relatively high level

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor (임베디드 병렬 프로세서 상에서 MMX타입 명령어의 성능평가 및 검증)

  • Jung, Yong-Bum;Kim, Yong-Min;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.10
    • /
    • pp.11-21
    • /
    • 2011
  • This paper introduces an SIMD(Single Instruction Multiple Data) based parallel processor that efficiently processes massive data inherent in multimedia. In addition, this paper implements MMX(MultiMedia eXtension)-type instructions on the data parallel processor and evaluates and analyzes the performance of the MMX-type instructions. The reference data parallel processor consists of 16 processors each of which has a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024 pixel image indicate that MMX-type instructions achieves a 50% performance improvement over the baseline instructions on the same data parallel architecture. In addition, MMX-type instructions achieves 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia specific instructions including MMX-type have potentials for widely used many-core GPU(Graphics Processing Unit) and any types of parallel processors.

Multi-Dimensional Record Scan with SIMD Vector Instructions (SIMD 벡터 명령어를 이용한 다차원 레코드 스캔)

  • Cho, Sung-Ryong;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.6
    • /
    • pp.732-736
    • /
    • 2010
  • Processing a large amount of data becomes more important than ever. Particularly, the information queries which require multi-dimensional record scan can be efficiently implemented with SIMD instruction sets. In this article, we present a SIMD record scan technique which employs row-based scanning. Our technique is different from existing SIMD techniques for predicate processes and aggregate operations. Those techniques apply SIMD instructions to the attributes in the same column of the database, exploiting the column-based record organization of the in-memory database systems. Whereas, our SIMD technique is useful for multi-dimensional record scanning. As the sizes of registers and the memory become larger, our row-based SIMD scan can have bigger impact on the performance. Moreover, since our technique is orthogonal to the parallelization techniques for multi-core processors, it can be applied to both uni-processors and multi-core processors without too many changes in the software architectures.

A Computer Architecture Education Framework in IT Convergence Services Era (IT융합 서비스 환경을 위한 컴퓨터 아키텍쳐 교육 프레임워크)

  • Choi, Chang Yeol;Choi, Hwang Kyu
    • Journal of Information Technology and Architecture
    • /
    • v.10 no.1
    • /
    • pp.23-31
    • /
    • 2013
  • A rapid growth of IT convergence into different application areas draws a lot of interest in high performance platform and embedded system. Industry needs well educated computer professionals with the practical understanding on the emerging technologies and core issues of contemporary popular services. In this paper, we present an education framework for computer system architecture based on rigorous analyses of the characteristics of IT convergence services and information technology trends. The proposed framework puts emphasis on real-world and hands-on subjects related to multicore architecture, embedded system and parallel processing. We believe effective use in the development and management of computer system architecture courses encouraging both industries and students.

Design and Implementation of NVM-based Concurrent Journaling Scheme (저널링 파일 시스템을 위한 비휘발성 메모리 기반 병행적 저널링 기법의 설계 및 구현)

  • Pak, Suehee;Lee, Eunyoung;Han, Hyuck
    • The Journal of the Korea Contents Association
    • /
    • v.21 no.7
    • /
    • pp.157-163
    • /
    • 2021
  • A single write operation in a file system can modify multiple data, but these changes in the file system are not atomically written to disk. Thus, for the consistency of the file system, conventional journaling guarantees crash consistency instead of sacrificing the system performance. It is known that using non-volatile memory as a journal space can alleviate performance degradation due to low latency and byte-level accessibility of non-volatile memory. However, none of the journaling techniques considering non-volatile memory provide scalability. In this paper, journal space on non-volatile memory is divided into multiple regions for scalable journaling, thus dispersing concentrated operations in one region. Second, the journal area-specific operator structure is used to accelerate data write operations to storage devices. We apply the proposed technique to JFS to evaluate it on multi-core servers equipped with high-performance storage devices. The evaluation results show that the proposed technique performs better than the existing technique of the NVM-based journaling file system.

Deterministic Parallelism for Symbolic Execution Programs based on a Name-Freshness Monad Library

  • Ahn, Ki Yung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.2
    • /
    • pp.1-9
    • /
    • 2021
  • In this paper, we extend a generic library framework based on the state monad to exploit deterministic parallelism in a purely functional language Haskell and provide benchmarks for the extended features on a multicore machine. Although purely functional programs are known to be well-suited to exploit parallelism, unintended squential data dependencies could prohibit effective parallelism. Symbolic execution programs usually implement fresh name generation in order to prevent confusion between variables in different scope with the same name. Such implementations are often based on squential state management, working against parallelism. We provide reusable primitives to help developing parallel symbolic execution programs with unbound-genercis, a generic name-binding library for Haskell, avoiding sequential dependencies in fresh name generation. Our parallel extension does not modify the internal implementation of the unbound-generics library, having zero possibility of degrading existing serial implementations of symbolic execution based on unbound-genecrics. Therefore, our extension can be applied only to the parts of source code that need parallel speedup.