• Title/Abstract/Keyword: many-core processor

An Implementation of ECC(Elliptic Curve Cryptographic)Processor with Bus-splitting method for Embedded SoC(System on a Chip) (임베디드 SoC를 위한 Bus-splitting 기법 적용 ECC 보안 프로세서의 구현)

  • Choi, Seon-Jun;Chang, Woo-Youg;Kim, Young-Chul
    • Proceedings of the IEEK Conference / 2005.11a / pp.651-654 / 2005
  • In this paper, we design an ECC (Elliptic Curve Cryptographic) processor with a bus-splitting method for embedded SoCs. The ECC SIP is modeled at RTL in VHDL and implemented for reuse through logic synthesis, simulation, and FPGA verification. For communication between the ARM9 core and the SIP, we designed an SIP bus functional model according to the AMBA AHB specification. The ECC processor for a platform-based SoC is implemented with a design kit composed of devices such as an ARM9 RISC core, memory, UART, interrupt controller, and FPGA. We also developed software on the ARM9 core for SIP and peripheral control, memory address mapping, and related tasks.

Optimal Many-core Processor Architecture for Different Ultrasonic Image Resolutions (초음파 영상신호의 크기 변화에 따른 최적의 매니코어 프로세서 구조)

  • Kang, Seong-Mo;Kim, Jong-Myon
    • Journal of the Institute of Convergence Signal Processing / v.13 no.1 / pp.50-55 / 2012
  • This paper proposes an optimal many-core processor architecture that meets the low-power and high-performance requirements of hand-held ultrasonic devices across different ultrasonic image resolutions. To identify the optimal architecture, seven PE configurations are simulated and compared in terms of execution performance and energy consumption when processing ultrasonic images. Experimental results indicate that the highest energy efficiencies are achieved at PEs=1,024, 64, and 256 for ultrasonic images at $256{\times}256$, $320{\times}240$, and $800{\times}480$ resolutions, respectively. In addition, the maximum area efficiencies are obtained at PEs=256 (for the $256{\times}256$ and $800{\times}480$ image resolutions) and PEs=64 (for the $320{\times}240$ image resolution).

Performance evaluation and analysis of TILE-Gx36 many-core processor with PARSEC benchmark (PARSEC을 이용한 TILE-Gx36 다중코어 프로세서의 성능 평가 및 분석)

  • Lee, Boseon;Kim, Han-Yee;Yu, Heonchang;Suh, Taeweon
    • The Journal of Korean Association of Computer Education / v.17 no.1 / pp.107-115 / 2014
  • This paper evaluates and analyzes the performance of the TILE-Gx36 (Gx36), a many-core processor. The PARSEC parallel benchmark suite was used to measure performance, with Core i7 (i7) and Atom as comparison points. When run with the maximum number of threads that each machine can execute concurrently, Gx36 was $2.73{\times}$ slower than Core i7 and $1.93{\times}$ faster than Atom. Gx36 has the largest last-level cache (LLC) among the compared processors. Nevertheless, it incurred the largest number of LLC misses, which we strongly believe is the main cause of its lower-than-expected performance. Our study suggests that the DDC employed in Gx36 is not a favorable cache structure for general-purpose high-performance computing. Measurements on off-the-shelf machines provide unbiased data for refining future many-core architectures.

Implementation of an Optimal SIMD-based Many-core Processor for Sound Synthesis of Guitar (기타 음 합성을 위한 최적의 SIMD기반 매니코어 프로세서 구현)

  • Choi, Ji-Won;Kang, Myeong-Su;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.17 no.1 / pp.1-10 / 2012
  • Raising processor operating frequency is no longer the central issue; multiprocessor techniques that integrate many processors are receiving increasing attention. High-performance processors that integrate 64 or 128 cores, rather than 2, 4, or 8, are currently being developed for large-scale data processing. This paper proposes an optimal many-core processor for synthesizing guitar sounds. Unlike previous research, in which one processing element (PE) was assigned to each guitar string, this paper evaluates the impact of mapping different numbers of PEs to one guitar string on performance, area efficiency, and energy efficiency, using architectural and workload simulations. Experimental results show that the maximum area and energy efficiencies were achieved at PEs=24 and 96, respectively, when synthesizing guitar sounds at a 44.1 kHz sampling rate with 16-bit quantization. The spectra of the synthesized sounds were very similar to those of the original guitar sounds. In addition, the proposed many-core processor was 1,235 and 22 times better than the TI TMS320C6416 in area and energy efficiency, respectively.

High Performance Message Scattering Algorithm in Multicore Processor (멀티코어 프로세서에서의 효율적인 메시지 스캐터링 지원 기법)

  • Park, Jongsu
    • Journal of Platform Technology / v.10 no.2 / pp.3-9 / 2022
  • In this paper, to maximize scatter communication performance in multi-core and many-core processors, a technique that takes the communication status of each processing node into account is applied to a multi-core processor composed of 32 processing nodes. Because the existing scatter algorithm cannot recognize the communication status of the processing nodes, it generally transmits in an initially fixed order. In that case, scatter communication starts only after all communication currently in progress at every processing node inside the processor has finished. The proposed technique improved scatter communication performance; BFM simulation confirmed an improvement of up to 78.93% over the existing algorithm.
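
The contrast drawn in this abstract can be illustrated with a toy timing model. The sketch below is not the paper's algorithm or its BFM simulation. It assumes a single root that sends one message at a time, a uniform per-message `send_time`, and a known `busy_until` time for each destination node; all of these names and assumptions are hypothetical and used only for illustration.

```python
from typing import List


def fixed_order_finish(busy_until: List[float], send_time: float) -> float:
    """Baseline (assumed model): wait until every node is idle,
    then send to all nodes in the preset order."""
    if not busy_until:
        return 0.0
    return max(busy_until) + len(busy_until) * send_time


def status_aware_finish(busy_until: List[float], send_time: float) -> float:
    """Status-aware variant (assumed model): serve nodes in the order
    in which they become idle, one message at a time."""
    clock = 0.0
    for ready in sorted(busy_until):
        # The root may have to wait for the node to finish its current transfer.
        clock = max(clock, ready) + send_time
    return clock


if __name__ == "__main__":
    busy = [5.0, 0.0, 2.0, 0.0]  # hypothetical busy-until times for four nodes
    print(fixed_order_finish(busy, 1.0))   # 9.0
    print(status_aware_finish(busy, 1.0))  # 6.0
```

In this toy case the status-aware order finishes earlier because idle nodes are served while busy nodes complete their ongoing transfers. The numbers are illustrative only and say nothing about the 78.93% figure reported in the abstract.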

An Optimization Tool for Determining Processor Affinity of Networking Processes (통신 프로세스의 프로세서 친화도 결정을 위한 최적화 도구)

  • Cho, Joong-Yeon;Jin, Hyun-Wook
    • KIPS Transactions on Software and Data Engineering / v.2 no.2 / pp.131-136 / 2013
  • Multi-core processors can improve the parallelism of application processes and thus enhance system throughput. Researchers have recently shown that processor affinity is an important factor in network I/O performance because of the architectural characteristics of multi-core processors; consequently, many are working on schemes to decide an optimal processor affinity. Existing schemes that decide the processor affinity dynamically can adapt transparently to system changes, such as application modifications and hardware upgrades, but they have limited access to application behavior characteristics and to run-time information that can be collected heuristically. As a result, they can provide only sub-optimal processor affinity. In this paper, we define meaningful system variables for determining the optimal processor affinity and present a tool that gathers this information. We show that the implemented tool overcomes the limitations of existing schemes and improves network bandwidth.
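
For readers unfamiliar with the term, processor affinity is the set of cores on which a process is allowed to run. The sketch below shows how that set can be read and changed from user space on Linux with Python's standard `os.sched_getaffinity` and `os.sched_setaffinity`. It is not the tool described in the abstract; the helper names `current_affinity` and `pin_current_process` are hypothetical, and on platforms without these calls both helpers return `None`.

```python
import os
from typing import Optional, Set


def current_affinity() -> Optional[Set[int]]:
    """Return the CPUs the calling process may run on, or None if unsupported."""
    if not hasattr(os, "sched_getaffinity"):
        return None
    return set(os.sched_getaffinity(0))  # pid 0 means the calling process


def pin_current_process(cpus: Set[int]) -> Optional[Set[int]]:
    """Restrict the calling process to `cpus` and return the resulting CPU set."""
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cpus)
    return set(os.sched_getaffinity(0))


if __name__ == "__main__":
    allowed = current_affinity()
    if allowed is not None:
        one_cpu = {min(allowed)}
        print("pinned to:", pin_current_process(one_cpu))
        pin_current_process(allowed)  # restore the original affinity
```

Which core is the right choice depends on factors such as the cache layout and where network interrupts are handled, which is the decision the abstract's tool is meant to inform.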

Design and Implementation of a Linux-based Message Processor to Minimize the Response-time Delay of Non-real-time Messages in Multi-core Environments (멀티코어 환경에서 비실시간 메시지의 응답시간 지연을 최소화하는 리눅스 기반 메시지 처리기의 설계 및 구현)

  • Wang, Sangho;Park, Younghun;Park, Sungyong;Kim, Seungchun;Kim, Cheolhoe;Kim, Sangjun;Jin, Cheol
    • Journal of KIISE / v.44 no.2 / pp.115-123 / 2017
  • A message processor is server software that receives from clients both non-real-time messages and real-time messages that must be processed within a deadline. With recent advances in microprocessor technology and Linux, message processors are often implemented on Linux-based multi-core servers, where using the cores efficiently is important for maximizing system performance. Numerous studies have addressed real-time schedulers that use multi-core environments efficiently. Typically, though, they were conducted theoretically or via simulation, which makes it difficult to apply them to real systems. Moreover, many Linux-based real-time schedulers work only with a specific Linux version or require modifications to the Linux source code. This paper presents the design of a Linux-based message processor for multi-core environments that maps threads to cores at user level. The message processor is implemented through a modification of the traditional RM algorithm that consolidates the real-time messages onto certain cores using a first-fit-based bin-packing algorithm; this minimizes the response-time delay of the non-real-time messages while guaranteeing the violation rate of the real-time messages. For comparison, the message processor was also implemented using two multi-core scheduling algorithms, GSN-EDF and P-FP, which are provided by the LITMUS framework. The benchmarking results show that the response-time delay of non-real-time messages in the proposed system improved by up to 17% to 18%.
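
The consolidation idea in this abstract can be sketched as first-fit bin packing with a rate-monotonic (RM) admission test. The abstract does not specify the paper's modified RM algorithm, so the sketch below substitutes the classic Liu-Layland utilization bound, $n(2^{1/n}-1)$, as the per-core test; that bound is sufficient but not necessary for RM schedulability. The task tuples, the function names, and the example values are hypothetical.

```python
from typing import List, Optional, Tuple

Task = Tuple[float, float]  # (worst-case execution time, period), same time unit


def rm_bound(n: int) -> float:
    """Liu-Layland utilization bound for n periodic tasks under RM scheduling."""
    return n * (2 ** (1.0 / n) - 1)


def first_fit_rm(tasks: List[Task], num_cores: int) -> Optional[List[List[Task]]]:
    """Place each task on the first core that still passes the RM bound.

    Returns the per-core assignment, or None if some task fits on no core.
    """
    cores: List[List[Task]] = [[] for _ in range(num_cores)]
    for task in tasks:
        for assigned in cores:
            candidate = assigned + [task]
            utilization = sum(c / t for c, t in candidate)
            if utilization <= rm_bound(len(candidate)):
                assigned.append(task)
                break
        else:
            return None  # no core can admit this task under the bound
    return cores


if __name__ == "__main__":
    real_time_tasks = [(1, 4), (1, 5), (2, 10), (3, 6)]  # hypothetical workload
    print(first_fit_rm(real_time_tasks, num_cores=4))
    # [[(1, 4), (1, 5), (2, 10)], [(3, 6)], [], []]
```

In this example the real-time tasks are packed onto the first two cores, leaving the remaining cores free of real-time load, which is the effect the abstract credits with reducing the response-time delay of non-real-time messages.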

Dynamic Scheduling of Network Processes for Multi-Core Systems (멀티 코어 시스템에서 통신 프로세스의 동적 스케줄링)

  • Jang, Hye-Churn;Jin, Hyun-Wook;Kim, Hag-Young
    • Journal of KIISE:Computing Practices and Letters / v.15 no.12 / pp.968-972 / 2009
  • Multi-core processors are widely used in high-end systems. With significant advances in processor architecture, the network bandwidth required on high-end systems is increasing drastically. It is therefore highly desirable to manage multiple cores efficiently to achieve high network bandwidth with minimum resource requirements. Modern operating systems, however, still leave significant room for design and optimization to improve network performance on multi-core systems. In this paper, we suggest a novel networking process scheduling scheme that decides the best processor affinity of networking processes based on the processor cache layout, communication intensiveness, and processor loads. The experimental results show that the scheduling scheme, implemented in the Linux kernel, can improve network bandwidth and the effectiveness of processor utilization by 20% and 59%, respectively.

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor (임베디드 병렬 프로세서 상에서 MMX타입 명령어의 성능평가 및 검증)

  • Jung, Yong-Bum;Kim, Yong-Min;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.16 no.10 / pp.11-21 / 2011
  • This paper introduces a SIMD (Single Instruction Multiple Data) based parallel processor that efficiently processes the massive data inherent in multimedia. In addition, it implements MMX (MultiMedia eXtension)-type instructions on the data parallel processor and evaluates and analyzes their performance. The reference data parallel processor consists of 16 processors, each of which has a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024 pixel image indicate that MMX-type instructions achieve a 50% performance improvement over the baseline instructions on the same data parallel architecture. In addition, MMX-type instructions achieve 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia-specific instructions, including MMX-type instructions, have potential for widely used many-core GPUs (Graphics Processing Units) and any type of parallel processor.

Design Space Exploration of Embedded Many-Core Processors for Real-Time Fire Feature Extraction (실시간 화재 특징 추출을 위한 임베디드 매니코어 프로세서의 디자인 공간 탐색)

  • Suh, Jun-Sang;Kang, Myeongsu;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.18 no.10 / pp.1-12 / 2013
  • This paper explores the design space of many-core processors for a fire feature extraction algorithm. It evaluates the impact of varying the number of cores and the memory size of the many-core processor and identifies an optimal configuration in terms of performance, energy efficiency, and area efficiency. In this study, we used 90 samples with dimensions of $256{\times}256$ (60 samples containing fire and 30 samples containing no fire) for the experiments. Experimental results using six different many-core architectures (PEs=16, 64, 256, 1,024, 4,096, and 16,384) and the fire feature extraction algorithm indicate that the highest area efficiency and energy efficiency are achieved at PEs=1,024 and 4,096, respectively, for all fire and non-fire movies. In addition, all six many-core processors satisfy the algorithm's real-time requirement of 30 frames per second (30 fps).