• Title/Summary/Keyword: Many-core

Performance evaluation and analysis of TILE-Gx36 many-core processor with PARSEC benchmark (PARSEC을 이용한 TILE-Gx36 다중코어 프로세서의 성능 평가 및 분석)

  • Lee, Boseon;Kim, Han-Yee;Yu, Heonchang;Suh, Taeweon
    • The Journal of Korean Association of Computer Education / v.17 no.1 / pp.107-115 / 2014
  • This paper evaluates and analyzes the performance of the TILE-Gx36 (Gx36), a many-core processor. Performance was measured with the PARSEC parallel benchmark suite, and the Core i7 (i7) and Atom were used for comparison. When each machine ran the maximum number of threads it can execute concurrently, Gx36 was 2.73$\times$ slower than Core i7 and 1.93$\times$ faster than Atom. Gx36 has the largest last-level cache (LLC) among the compared processors. Nevertheless, it recorded the largest number of LLC misses, which we strongly believe is the main cause of its lower-than-expected performance. Our study suggests that the Dynamic Distributed Cache (DDC) employed in Gx36 is not a favorable cache structure for general-purpose high-performance computing. Measurements on an off-the-shelf machine provide unbiased data for refining future many-core architectures.

Implementation of an Optimal Many-core Processor for Beamforming Algorithm of Mobile Ultrasound Image Signals (모바일 초음파 영상신호의 빔포밍 기법을 위한 최적의 매니코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.16 no.8 / pp.119-128 / 2011
  • This paper presents a design space exploration of many-core processors that meet the high-performance and low-power requirements of the beamforming algorithm for mobile ultrasound image signals. For the exploration, we mapped different amounts of ultrasound image data to each processing element (PE) of the many-core processor and then determined the optimal architecture in terms of execution time, energy efficiency, and area efficiency. Experimental results indicate that the PE=4096 and PE=1024 configurations provide the highest energy efficiency and area efficiency, respectively. In addition, PE=4096 achieves 46x better energy efficiency and 10x better area efficiency than the TI DSP C6416, which is widely used in ultrasound imaging devices.

An Efficient No-Core Cut Pocketing CAM System for Wire-EDM

  • EL-Midany, Tawfik T.;Kohail, Ahmed M.;Tawfik, Hamdy
    • International Journal of CAD/CAM / v.6 no.1 / pp.167-172 / 2006
  • Wire-EDM has recently become a necessity for many engineering applications, particularly in die making. The no-core cut process is helpful for operations in which a falling slug can jam the machine or the wire. This paper introduces a CAM system, called NCPP, that overcomes the limitations of existing CAM systems in no-core cut machining. NCPP provides pocketing for no-core cuts and optimal selection of the starting-hole position (the wire threading point) to minimize toolpath length. It was written for data exchange between CAD, CAM, and CNC machines. This data model will become part of the ISO (Data model for Computerized Numerical Controllers) international standard. The NCPP system has been implemented in Visual C++, and many examples are used to illustrate it. The results show that NCPP reduces machining time significantly; the amount saved depends on the shape and complexity of the workpiece being cut.

Electromotive Force Characteristics of Current Transformer According to the Magnetic Properties of Ferromagnetic Core

  • Kim, Young Sun
    • Transactions on Electrical and Electronic Materials / v.16 no.1 / pp.37-41 / 2015
  • The most common current transformer (CT) structure consists of a length of wire wrapped many times around a silicon steel ring that is passed over the circuit being measured. The primary circuit of the CT therefore consists of a single turn of the conductor, while the secondary circuit has many tens or hundreds of turns. The primary winding may be a permanent part of the CT, with a heavy copper bar carrying the current through the magnetic core. However, when a large current flows in the wire, its magnitude is difficult to measure because the core saturates and exhibits nonlinear magnetic characteristics. We therefore propose a newly designed CT with an air gap in the core to decrease the generated magnetic flux. Adding an air gap to the magnetic path increases the total magnetic reluctance for the same magnetomotive force (MMF). Using a ferrite core instead of steel also lowers the generated magnetic flux. Compared with a steel core, these features can prevent magnetic saturation of the CT core. The technique can help in designing CTs with special shapes and sizes.
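
The effect of the air gap described in this abstract can be illustrated with the standard series magnetic-circuit model. This is a textbook sketch, not the paper's own derivation. The symbols are assumptions introduced here: $l_c$ is the mean path length in the core, $l_g$ the gap length, $A$ the cross-sectional area (fringing at the gap is neglected), $\mu_r$ the relative permeability of the core, and $\mathcal{F}$ the MMF.

$$
\mathcal{R}_{\text{total}} = \frac{l_c}{\mu_0 \mu_r A} + \frac{l_g}{\mu_0 A},
\qquad
\Phi = \frac{\mathcal{F}}{\mathcal{R}_{\text{total}}}
$$

Because $\mu_r \gg 1$ for a ferromagnetic core, even a gap much shorter than the core path can dominate $\mathcal{R}_{\text{total}}$, so the flux $\Phi$ produced by a given MMF drops and the core is driven less far toward saturation.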

Design Space Exploration of Embedded Many-Core Processors for Real-Time Fire Feature Extraction (실시간 화재 특징 추출을 위한 임베디드 매니코어 프로세서의 디자인 공간 탐색)

  • Suh, Jun-Sang;Kang, Myeongsu;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.18 no.10 / pp.1-12 / 2013
  • This paper explores the design space of many-core processors for a fire feature extraction algorithm. It evaluates the impact of varying the number of cores and the memory size of the many-core processor and identifies an optimal configuration in terms of performance, energy efficiency, and area efficiency. The experiments used 90 samples with dimensions of $256\times256$ (60 containing fire and 30 containing no fire). Experimental results using six different many-core architectures (PEs=16, 64, 256, 1,024, 4,096, and 16,384) and the fire feature extraction algorithm indicate that the highest area efficiency and energy efficiency are achieved at PEs=1,024 and 4,096, respectively, for all fire and non-fire movies. In addition, all six many-core processors satisfy the algorithm's real-time requirement of 30 frames per second (30 fps).
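
To give a feel for the scale of the six configurations named in this abstract, the following minimal sketch computes how many pixels of one $256\times256$ frame each PE would handle and the time budget per frame at 30 fps. It assumes the frame is split evenly across PEs, which the abstract does not state; the function names are hypothetical and not taken from the paper.

```python
# Figures taken from the abstract.
FRAME_WIDTH = 256
FRAME_HEIGHT = 256
TARGET_FPS = 30
PE_COUNTS = [16, 64, 256, 1024, 4096, 16384]


def pixels_per_pe(num_pes, width=FRAME_WIDTH, height=FRAME_HEIGHT):
    """Pixels handled by one PE, ASSUMING an even split of the frame."""
    total_pixels = width * height
    if total_pixels % num_pes != 0:
        raise ValueError("frame does not divide evenly across the PEs")
    return total_pixels // num_pes


def frame_budget_ms(fps=TARGET_FPS):
    """Time available to process one frame at the given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(f"per-frame budget: {frame_budget_ms():.2f} ms")
    for pes in PE_COUNTS:
        print(f"PEs={pes:>6}: {pixels_per_pe(pes):>5} pixels per PE")
```

Under this assumption, the per-PE workload spans three orders of magnitude across the six configurations, which is the trade-off between parallelism and per-PE memory that the design space exploration examines.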

Architecture Exploration of Optimal Many-Core Processors for a Vector-based Rasterization Algorithm (래스터화 알고리즘을 위한 최적의 매니코어 프로세서 구조 탐색)

  • Son, Dong-Koo;Kim, Cheol-Hong;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications / v.9 no.1 / pp.17-24 / 2014
  • In this paper, we implement a vector-based rasterization algorithm for 3D graphics on a SIMD (single instruction, multiple data) many-core processor architecture and evaluate its performance. In addition, we evaluate the impact of the data-per-processing-element (DPE) ratio, defined as the amount of data directly mapped to each processing element (PE) within the many-core processor, on performance, energy efficiency, and area efficiency. For the experiment, we use seven different PE configurations obtained by varying the DPE ratio (or, equivalently, the number of PEs), all implemented in the same 130 nm CMOS technology with a 500 MHz clock frequency. Experimental results indicate that the optimal PE configuration is achieved when the DPE ratio is in the range from 16,384 down to 256 (that is, when the number of PEs is in the range from 16 to 1,024), which meets the performance and efficiency requirements of mobile devices.
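
The inverse relationship between the DPE ratio and the number of PEs can be checked from the two endpoints quoted in this abstract. The sketch below assumes a fixed total workload that is divided evenly among the PEs. The total of 262,144 data elements is inferred here from the endpoints (16 PEs at a DPE ratio of 16,384) and is not stated in the abstract; the function name is hypothetical.

```python
# Total workload inferred from one endpoint in the abstract
# (16 PEs at a DPE ratio of 16,384). This is an ASSUMPTION.
TOTAL_ELEMENTS = 16 * 16384


def dpe_ratio(total_elements, num_pes):
    """Data elements mapped to each PE when the workload is split evenly."""
    if num_pes <= 0:
        raise ValueError("num_pes must be positive")
    if total_elements % num_pes != 0:
        raise ValueError("workload does not divide evenly across the PEs")
    return total_elements // num_pes


if __name__ == "__main__":
    # The two endpoints of the optimal range reported in the abstract.
    for pes in (16, 1024):
        print(f"PEs={pes:>5}: DPE ratio = {dpe_ratio(TOTAL_ELEMENTS, pes)}")
```

Both endpoints give the same product of PE count and DPE ratio, which is consistent with the abstract treating the DPE ratio and the number of PEs as two views of the same design parameter.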

An Efficient Block Cipher Implementation on Many-Core Graphics Processing Units

  • Lee, Sang-Pil;Kim, Deok-Ho;Yi, Jae-Young;Ro, Won-Woo
    • Journal of Information Processing Systems / v.8 no.1 / pp.159-174 / 2012
  • This paper presents a study on a high-performance design for a block cipher algorithm implemented on modern many-core graphics processing units (GPUs). The recent emergence of VLSI technology makes it feasible to fabricate multiple processing cores on a single chip and enables general-purpose computation on a GPU (GPGPU). The GPU strategy offers significant performance improvements for general-purpose computation and can support a broad variety of applications, including cryptography. We propose an efficient implementation of the encryption and decryption operations of a block cipher algorithm, SEED, on off-the-shelf NVIDIA many-core graphics processors. In a thorough experiment, we achieved performance capable of supporting network speeds of up to 9.5 Gbps on an NVIDIA GTX285 system (which has 240 processing cores). Our implementation provides up to 4.75 times higher encoding and decoding throughput than the Intel 8-core system.

Implementation of SIMD-based Many-Core Processor for Efficient Image Data Processing (효율적인 영상데이터 처리를 위한 SIMD기반 매니코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.16 no.1 / pp.1-9 / 2011
  • As mobile multimedia devices become more widely used, the need for high-performance, low-energy multimedia processors is increasing. Application-specific integrated circuits (ASICs) can deliver the high performance that mobile multimedia requires, but they offer little, if any, of the generality needed to meet varied application requirements. DSP-based systems can be used for many types of applications thanks to their generality, but they cost more, consume more energy, and perform worse than ASICs. To solve this problem, this paper proposes a single instruction, multiple data (SIMD) based many-core processor that supports high-performance, low-power image data processing while retaining generality. The proposed SIMD-based many-core processor, composed of 16 processing elements (PEs), exploits the large data parallelism inherent in image data processing. Experimental results indicate that the proposed processor achieves higher performance (22 times better), energy efficiency (7 times better), and area efficiency (3 times better) than conventional commercial high-performance processors.

Mechanical Machining of Prism pattern (프리즘 패턴의 기계적 절삭 가공)

  • Yoo Y. E.;Hong S. M.;Je T. J.;Choi D. S.
    • Proceedings of the Korean Society for Technology of Plasticity Conference / 2005.09a / pp.110-113 / 2005
  • Recently, patterns of various shapes at the micron or nano scale have been adopted in many applications because of their good mechanical or optical properties. The light guide panel (LGP) of an LCD is one important application of micro patterns, and the micro prism is one of the typical pattern shapes. Many applications have patterns on their surfaces, and the pattern size keeps decreasing to the order of a micron or even below. On the other hand, the area to be patterned keeps growing. Together, these two trends require tooling micro patterns on large surfaces, which still poses many technical problems, mainly because of the pattern size and the tooling area. In this study, we fabricate prism-shaped patterns using a diamond cutting tool on metal cores and on plastic cores such as PMMA. Several cutting conditions, including cutting force, cutting depth, and cutting speed, are investigated for the different core materials.

The Researches on Evaluating Model for the Core Competence of Harbor Enterprises

  • Jing, Lu;Wei, Zhang
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2004.08a / pp.96-101 / 2004
  • The core competence of enterprises has recently become the focus of much research. This essay summarizes the situation of Chinese harbor enterprises and establishes an evaluation model based on factor analysis.
