• Title/Summary/Keyword: Many-core architecture

Research Challenges in Many-core SoC Designs

  • Jeong, Ui-Yeong;Yu, Seung-Ju
    • Information and Communications Magazine / v.25 no.12 / pp.3-9 / 2008
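  • This article examines the research challenges of many-core SoCs, which have recently been proposed as a next-generation system-on-chip (SoC) architecture not only in academia but also by semiconductor design companies such as Intel and nVidia, and which are already being designed into actual products. The challenges are examined from three perspectives: architecture, software, and application. For each area, we consider the main problems and survey the major research directions currently being pursued to solve them.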

Accelerating Group Fusion for Ligand-Based Virtual Screening on Multi-core and Many-core Platforms

  • Mohd-Hilmi, Mohd-Norhadri;Al-Laila, Marwah Haitham;Hassain Malim, Nurul Hashimah Ahamed
    • Journal of Information Processing Systems / v.12 no.4 / pp.724-740 / 2016
  • The performance issues of screening large compound databases with multiple query compounds in virtual screening highlight a common concern in chemoinformatics applications. This study investigates these problems by choosing group fusion as a pilot model and presents efficient parallel solutions on parallel platforms, specifically the multi-core architecture of the CPU and the many-core architecture of the graphics processing unit (GPU). A study of sequential group fusion and a proposed design of parallel CUDA group fusion are presented in this paper. The design involves solving two important stages of group fusion, namely similarity search and fusion (MAX rule), while addressing embarrassingly parallel and parallel reduction models. Sequential, optimized sequential, and parallel OpenMP versions of group fusion were implemented and evaluated. The outcome of the analysis of these three design approaches influenced the design of the parallel CUDA version in order to optimize it and achieve high computation intensity. The proposed parallel CUDA version performed better than the sequential and parallel OpenMP versions in terms of both execution time and speedup. The parallel CUDA version was 5-10x faster than the sequential and parallel OpenMP versions, as both the similarity search and fusion (MAX) stages had been CUDA-optimized.
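
The two stages named in this abstract, similarity search followed by MAX-rule fusion, can be illustrated with a minimal sequential sketch. It assumes fingerprints are stored as sets of "on" bit indices and uses the Tanimoto coefficient as the similarity measure; the abstract does not name the measure, so that choice and the function names `tanimoto`, `group_fusion_max`, and `rank` are assumptions for illustration only, not the authors' implementation.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def group_fusion_max(queries: List[Set[int]], database: List[Set[int]]) -> List[float]:
    """Score every database compound by its best similarity to any query (MAX rule)."""
    scores = []
    for compound in database:
        # Stage 1: similarity search against every query compound.
        similarities = [tanimoto(q, compound) for q in queries]
        # Stage 2: fusion; the MAX rule is a reduction over the per-query scores.
        scores.append(max(similarities))
    return scores


def rank(scores: List[float]) -> List[int]:
    """Database indices ordered from highest to lowest fused score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Each database compound is scored independently of the others, which is what makes the similarity stage embarrassingly parallel, while the `max` over queries is the kind of step a parallel reduction would replace.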

Accelerating 2D DCT in Multi-core and Many-core Environments (멀티코어와 매니코어 환경에서의 2 차원 DCT 가속)

  • Hong, Jin-Gun;Jung, Sung-Wook;Kim, Cheong-Ghil;Burgstaller, Bernd
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.250-253 / 2011
  • Chip manufacturers have turned their attention from accelerating uniprocessors to integrating multiple cores on a chip. Moreover, desktop graphics hardware is now starting to support general-purpose computation. Desktop users can now use multi-core CPUs and GPUs as high-performance computing resources. However, exploiting parallel computing resources is still challenging because of the lack of higher-level abstractions for parallel programming. The 2-dimensional discrete cosine transform (2D-DCT) is the most computationally intensive part of JPEG encoding. Many fast 2D-DCT algorithms have already been studied. We implemented several algorithms and estimated their runtime in multi-core CPU and GPU environments. Experiments show that data parallelism can be fully exploited on CPU and GPU architectures. We expect a parallelized DCT to bring performance benefits to applications such as JPEG and MPEG.
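
As an illustration of where the data parallelism in a 2D-DCT comes from, the sketch below uses the standard separable row-column formulation: an orthonormal 1D DCT-II is applied to every row and then to every column. This is a textbook baseline and not necessarily one of the fast algorithms the authors implemented; the names `dct_1d` and `dct_2d` are hypothetical.

```python
import math
from typing import List


def dct_1d(x: List[float]) -> List[float]:
    """Orthonormal 1D DCT-II computed directly from its definition (O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        total = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(block: List[List[float]]) -> List[List[float]]:
    """Separable 2D DCT: transform the rows first, then the columns."""
    # Row pass: every row is independent of the others.
    rows = [dct_1d(row) for row in block]
    # Column pass: transpose, transform each column, transpose back.
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

Within each pass the 1D transforms share no data, so rows (and then columns) can be handed to separate CPU threads or GPU threads, and separate 8x8 blocks of an image are independent of one another as well.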

Performance analyses of naval ships based on engineering level of simulation at the initial design stage

  • Jeong, Dong-Hoon;Roh, Myung-Il;Ham, Seung-Ho;Lee, Chan-Young
    • International Journal of Naval Architecture and Ocean Engineering / v.9 no.4 / pp.446-459 / 2017
  • Naval ships are assigned many and varied missions. Their performance is critical for mission success and depends on the specifications of their components. This is why performance analyses of naval ships are required at the initial design stage. Since the design and construction of naval ships take a very long time and incur a huge cost, Modeling and Simulation (M&S) is an effective method for performance analyses. Thus, in this study, a simulation core is proposed to analyze the performance of naval ships considering their specifications. This simulation core can perform engineering-level simulations that take into account mathematical models for naval ships, such as maneuvering equations and passive sonar equations. The simulation models of the simulation core also follow the Discrete EVent system Specification (DEVS) and Discrete Time System Specification (DTSS) formalisms, so that simulations can progress over discrete events and discrete times. In addition, applying the DEVS and DTSS formalisms makes the structure of the simulation models flexible and reusable. To verify its applicability, the simulation core was applied to simulations for the performance analyses of a submarine in an Anti-SUrface Warfare (ASUW) mission. These simulations were composed of two scenarios. The first scenario, submarine diving, carried out a maneuvering performance analysis by analyzing the pitch angle variation and depth variation of the submarine over time. The second scenario, submarine detection, carried out a detection performance analysis by analyzing how well the sonar of the submarine resolves adjacent targets. The results of these simulations confirm that the simulation core of this study can be applied to the performance analyses of naval ships considering their specifications.

Performance evaluation and analysis of TILE-Gx36 many-core processor with PARSEC benchmark (PARSEC을 이용한 TILE-Gx36 다중코어 프로세서의 성능 평가 및 분석)

  • Lee, Boseon;Kim, Han-Yee;Yu, Heonchang;Suh, Taeweon
    • The Journal of Korean Association of Computer Education / v.17 no.1 / pp.107-115 / 2014
  • This paper evaluates and analyzes the performance of the TILE-Gx36 (Gx36), a many-core processor. The PARSEC parallel benchmark suite was used to measure performance, and the Core i7 (i7) and Atom were used for comparison. When run with the maximum number of threads that can execute concurrently on each machine, Gx36 showed 2.73× lower performance than Core i7 and 1.93× higher performance than Atom. Gx36 has the largest Last Level Cache (LLC) among the compared processors. Nevertheless, it reported the largest number of LLC misses, which we strongly believe is the major cause of its lower-than-expected performance. Our study suggests that the DDC employed in Gx36 is not a favorable cache structure for general-purpose high-performance computing. Actual measurements on off-the-shelf machines provide unbiased data for refining future many-core architectures.

Efficient Process Network Implementation of Ray-Tracing Application on Heterogeneous Multi-Core Systems

  • Jung, Hyeonseok;Yang, Hoeseok
    • IEIE Transactions on Smart Processing and Computing / v.5 no.4 / pp.289-293 / 2016
  • As more mobile devices are equipped with multi-core CPUs and are required to execute many compute-intensive multimedia applications, it is important to optimize these systems with the underlying parallel hardware architecture in mind. In this paper, we implement and optimize a ray-tracing application tailored to a given mobile computing platform with multiple heterogeneous processing elements. A lightweight ray-tracing application is specified and implemented in the Kahn process network (KPN) model of computation, which is known to be suitable for describing real-time applications. We take an open-source C/C++ implementation of ray tracing and adapt it to a KPN description in the Distributed Application Layer framework. Several possible configurations are then evaluated on the target mobile computing platform (Exynos 5422), in which eight heterogeneous ARM cores are integrated. We derive the optimal degree of parallelism and a suitable distribution of the replicated tasks tailored to the target architecture.
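
For readers unfamiliar with the Kahn process network model mentioned in this abstract, the following is a minimal sketch of its core idea: sequential processes that communicate only through FIFO channels with blocking reads. It uses Python threads and `queue.Queue` purely for illustration; it is not the Distributed Application Layer framework, and the process names, the `DONE` end-of-stream token, and `run_network` are all hypothetical.

```python
import queue
import threading
from typing import List

DONE = object()  # end-of-stream token; a convention of this sketch, not part of KPN


def source(out_ch: queue.Queue, n: int) -> None:
    """Emit the tokens 0..n-1, then signal the end of the stream."""
    for i in range(n):
        out_ch.put(i)  # unbounded FIFO, so writes never block
    out_ch.put(DONE)


def square(in_ch: queue.Queue, out_ch: queue.Queue) -> None:
    """Read a token, square it, and forward the result."""
    while True:
        token = in_ch.get()  # blocking read, the only synchronization in a KPN
        if token is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(token * token)


def sink(in_ch: queue.Queue, results: List[int]) -> None:
    """Collect tokens in arrival order until the stream ends."""
    while True:
        token = in_ch.get()
        if token is DONE:
            return
        results.append(token)


def run_network(n: int) -> List[int]:
    """Wire source -> square -> sink and run each process in its own thread."""
    a: queue.Queue = queue.Queue()
    b: queue.Queue = queue.Queue()
    results: List[int] = []
    processes = [
        threading.Thread(target=source, args=(a, n)),
        threading.Thread(target=square, args=(a, b)),
        threading.Thread(target=sink, args=(b, results)),
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return results
```

Because each channel has one writer and one reader and processes never poll, the output sequence does not depend on how the threads are scheduled, which is the property that lets a KPN description be remapped onto different core configurations.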

Research of NGN based Converged Service Architecture (NGN기반 융복합 서비스 제공 구조 연구)

  • Lee, Jin-Geun;Woo, Sang-Woo
    • Proceedings of the IEEK Conference / 2008.06a / pp.325-326 / 2008
  • The telecom world is steadily converging with the IP world, and many traditional telecom users now require the benefits of converged services. The aim of this paper is to study the functional architecture of NGN-based converged services. The paper also shows how a converged service could be implemented on NGN with an IMS core architecture.

Implementation of an Optimal Many-core Processor for Beamforming Algorithm of Mobile Ultrasound Image Signals (모바일 초음파 영상신호의 빔포밍 기법을 위한 최적의 매니코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.16 no.8 / pp.119-128 / 2011
  • This paper introduces a design space exploration of many-core processors that meet the high-performance and low-power requirements of the beamforming algorithm for mobile ultrasound image signals. For the design space exploration, we mapped different amounts of ultrasound image data to each processing element (PE) of the many-core processor and then determined an optimal many-core processor architecture in terms of execution time, energy efficiency, and area efficiency. Experimental results indicate that PE=4096 and PE=1024 provide the highest energy efficiency and area efficiency, respectively. In addition, PE=4096 achieves 46x better energy efficiency and 10x better area efficiency than the TI DSP C6416, which is widely used in ultrasound imaging devices.

A Study on the Evaluation of Daylight Performance in High-Rise Residential Complex (초고층 주상복합 아파트의 실내 주광성능 평가에 관한 연구)

  • Kim, Kyung-Ah;Kim, Chang-Sung;Kim, Kang-Soo
    • Journal of the Korean Solar Energy Society / v.26 no.3 / pp.127-133 / 2006
  • Recently, various building types such as the center-core shape and the Y-shape have been studied as the demand for high-rise residential complexes has increased. However, the center-core type can cause many problems because housing units may face north or west. Therefore, this study evaluated daylight conditions for four plan types in high-rise residential complexes.

Development of a Multi-Layered Workflow Management System for Product Development Processes (제품 개발 프로세스 관리를 위한 다층 통합 워크플로우 시스템 개발)

  • 강석호;김영호;김동수;배준수;배혜림
    • Korean Management Science Review / v.16 no.1 / pp.187-201 / 1999
  • In this paper, we propose a multi-layered architecture for workflow management systems based on CORBA (Common Object Request Broker Architecture). The system aims to support product development processes in distributed environments. Many companies have started to adopt workflow management systems to manage and support their business processes. However, there are many problems in applying those systems directly to product development environments, mainly because of the dynamic nature of product development processes. Dynamic processes must be supported alongside static and procedural ones in an integrated and consistent manner. To meet these requirements, first, a basic workflow management system has been developed as the core component of the integrated architecture; it performs the basic functions of a workflow management system. Second, a dynamic workflow management system based on a bidding mechanism has been developed to manage processes that cannot be easily defined or are likely to be modified. Finally, an SGML workflow management system, which is the third layer in the architecture, has been developed to manage document-processing workflows by integrating SGML document contents and process information into a structured SGML document.
