• Title/Summary/Keyword: Multi-processors

A Study on Buffer and Shared Memory Optimization for Multi-Processor System (다중 프로세서 시스템에서의 버퍼 및 공유 메모리 최적화 연구)

  • Kim, Jong-Su; Mun, Jong-Uk; Im, Gang-Bin; Jeong, Gi-Hyeon; Choe, Gyeong-Hui
    • The KIPS Transactions:PartA / v.9A no.2 / pp.147-162 / 2002
  • A multi-processor system with fast I/O devices improves processing performance and relieves the bottleneck caused by I/O concentration. In such a system, the performance impact of the shared memory used to exchange data between processors varies with its configuration and utilization. This paper proposes a prediction model for buffer and shared memory optimization under a mailbox-based interrupt recognition method. Ethernet (IEEE 802.3) packets are used as the system input, and the amount of memory used is measured for different network bandwidths and degrees of burstiness. Empirical studies show that the required amount of buffer and shared memory varies with the packet concentration rate as well as the I/O bandwidth, and that the two memory sizes are correlated.

Performance Comparison between Hardware & Software Cache Partitioning Techniques (하드웨어 캐시 파티셔닝과 소프트웨어 캐시 파티셔닝의 성능 비교)

  • Park, JiWoong; Yeom, HeonYoung; Eom, Hyeonsang
    • Journal of KIISE / v.42 no.2 / pp.177-182 / 2015
  • The era of multi-core processors began once clock speeds reached their limit. These days, multi-core technology is used not only in desktops, servers, and tablet PCs, but also in smartphones. In this architecture, processes always interfere with one another because they share system resources. To address this problem, cache partitioning is used, which can be roughly divided into two types: software and hardware cache partitioning. For dynamic cache partitioning, hardware cache partitioning is superior to software cache partitioning because it needs no page copy. In this paper, we compare the effectiveness of hardware and software cache partitioning on the AMD Opteron 6282 SE, which is the only commodity processor providing hardware cache partitioning, to see whether this technique can be effectively deployed in dynamic environments.
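Software cache partitioning is commonly implemented with page coloring, and the following minimal sketch illustrates that idea only. It assumes a physically indexed cache with plain modulo set indexing; the cache geometry and the function names are hypothetical and are not taken from the paper or from the Opteron 6282 SE.

```python
def num_page_colors(cache_bytes, ways, page_bytes=4096):
    """Number of distinct page colors for a physically indexed cache.

    One color covers the cache sets that a single page can map to, so the
    count is cache size divided by (associativity * page size).
    """
    return cache_bytes // (ways * page_bytes)


def page_color(phys_addr, n_colors, page_bytes=4096):
    """Color of the physical page that contains phys_addr."""
    return (phys_addr // page_bytes) % n_colors


def frames_for_partition(frame_numbers, allowed_colors, n_colors):
    """Keep only the page frames whose color belongs to the partition."""
    return [f for f in frame_numbers if f % n_colors in allowed_colors]


# Hypothetical geometry: 2 MiB, 16-way cache with 4 KiB pages.
colors = num_page_colors(2 * 1024 * 1024, ways=16)
print(colors, page_color(0x12345000, colors))
```

Under this scheme, changing a process's partition at run time means moving its data to frames of a different color, which is the page copy cost the abstract refers to.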

A Study on the Design of FFT Processor for UWB Ultrafast Wireless Communication Systems (UWB 초고속 무선통신 시스템을 위한 FFT 프로세서 설계에 관한 연구)

  • Lee, Sang-Il; Chun, Young-Il
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.12 / pp.2140-2145 / 2008
  • We design and synthesize a 128-point FFT processor for multi-band OFDM, which can be applied to a UWB transceiver. The structure of the 128-point FFT processor is based on a radix-2 FFT algorithm and an R2SDF pipeline architecture. The algorithm is modeled in VHDL and simulated using ModelSim. The design is then synthesized on a Xilinx Virtex-II FPGA, where an operating frequency of 18.7 MHz is obtained. The proposed 128-point FFT processor is expected to be usable within a complete FFT block as one of several FFTs processed in parallel. To raise the maximum operating frequency, we design an FFT module consisting of four 128-point FFT processors that run in parallel. As a result, we meet the performance requirement of computing the FFT module within the multi-band OFDM symbol timing in a 90 nm ASIC process.
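As a software illustration of the radix-2 algorithm named in the abstract above, here is a minimal recursive decimation-in-time sketch. It does not model the R2SDF pipeline or the VHDL design; the function names are hypothetical, and the naive DFT is included only as a reference for checking the result.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # FFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size results with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft_naive(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


# 128-point example with an arbitrary test signal.
signal = [complex(j % 7, -(j % 5)) for j in range(128)]
spectrum = fft_radix2(signal)
```

A 128-point transform has seven radix-2 stages, since $128 = 2^7$.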

Development of Thermal Performance Prediction for Large Planar Military Antenna with Multi-Cooling Channels (다중 냉각유로가 적용된 수랭식 군사용 대면적 안테나의 열성능 예측 기술)

  • YeRyun Lee; SungWook Jang; PilGyeong Choi; NohJin Kwak; JunJung Park
    • Journal of the Korea Institute of Military Science and Technology / v.27 no.1 / pp.43-50 / 2024
  • A large planar military antenna contains a range of electrical components, including TRAs (Transmit-Receive Assemblies) and signal processors, which perform intensive computation. These processes generate a significant amount of heat, which can lead to unforeseen consequences for the equipment. To mitigate these adverse effects, it is imperative to implement a cooling system that effectively reduces heat-related issues. Given the antenna's intricate structure and the multitude of components it houses, a two-step estimation process is necessary. The first step is a comprehensive model calculation to determine the overall flow characteristics, and the second step is a thermal analysis of an individual TRA set. In this study, we represented an antenna set using simplified 3D models of its components, taking their material and thermal properties into account. The sequential analysis made it possible to calculate the branched flow rates, providing insight into each individual TRA. This approach also allowed us to design a cooling system for the TRA set and to assess its thermal stability in high-temperature environments. To ensure optimal TRA performance, breaking the analysis into stages based on the structure of the cooling system can help operators predict numerical results more effectively.

Multi-Core Processor for Real-Time Sound Synthesis of Gayageum (가야금의 실시간 음 합성을 위한 멀티코어 프로세서 구현)

  • Choi, Ji-Won; Cho, Sang-Jin; Kim, Cheol-Hong; Kim, Jong-Myon; Chong, Ui-Pil
    • The KIPS Transactions:PartA / v.18A no.1 / pp.1-10 / 2011
  • Physical modeling has been widely used for sound synthesis because it produces high-quality sound that is close to the real sound of musical instruments. However, physical modeling requires many parameters to synthesize a large number of sounds simultaneously for an instrument, which prevents real-time processing. To solve this problem, this paper proposes a single instruction, multiple data (SIMD) based multi-core processor that supports real-time sound synthesis of the gayageum, a representative Korean traditional musical instrument. The proposed SIMD-based multi-core processor consists of 12 processing elements (PEs) that control the 12 strings of the gayageum, with each PE modeling its corresponding string. The processor can generate the synthesized sounds of all 12 strings simultaneously after receiving the excitation signal and parameters of each string as input. Experimental results using a sampling rate of 44.1 kHz and 16-bit quantization show that the sound synthesized by the proposed multi-core processor is very similar to the original sound. In addition, the proposed multi-core processor outperforms commercial processors (TI's TMS320C6416, ARM926EJ-S, ARM1020E) in terms of execution time ($5.6\times$ to $11.4\times$ better) and energy efficiency (about $553\times$ to $1{,}424\times$ better).
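To give a feel for per-string synthesis, the sketch below uses the Karplus-Strong plucked-string loop, a simple stand-in for string physical modeling. It is not the model used in the paper, and the string frequencies, decay factor, and function names are all hypothetical. Each string is computed independently from its own excitation and parameters, which is the property that lets one processing element handle one string.

```python
import random


def pluck(freq_hz, sample_rate=44100, n_samples=44100, decay=0.996, seed=0):
    """Karplus-Strong plucked string: a noise burst in a damped feedback delay line."""
    rng = random.Random(seed)
    period = max(2, int(sample_rate / freq_hz))  # delay-line length in samples
    buf = [rng.uniform(-1.0, 1.0) for _ in range(period)]  # excitation signal
    out = []
    for n in range(n_samples):
        i = n % period
        out.append(buf[i])
        # Replace the sample with the damped average of itself and its neighbour.
        buf[i] = decay * 0.5 * (buf[i] + buf[(i + 1) % period])
    return out


def synthesize_strings(freqs, **kwargs):
    """Synthesize one voice per string; each voice depends only on its own inputs."""
    return [pluck(f, seed=k, **kwargs) for k, f in enumerate(freqs)]


# Twelve hypothetical string frequencies (not an actual gayageum tuning).
voices = synthesize_strings([110.0 * (k + 1) for k in range(12)], n_samples=8000)
```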

Accelerating Symmetric and Asymmetric Cryptographic Algorithms with Register File Extension for Multi-words or Long-word Operation (다수 혹은 긴 워드 연산을 위한 레지스터 파일 확장을 통한 대칭 및 비대칭 암호화 알고리즘의 가속화)

  • Lee Sang-Hoon; Choi Lynn
    • Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.2 s.308 / pp.1-11 / 2006
  • In this paper, we propose a new register file architecture called the Register File Extension for Multi-words or Long-word Operation (RFEMLO) to accelerate both symmetric and asymmetric cryptographic algorithms. Based on the observation that most cryptographic algorithms make heavy use of multi-word or long-word operations, RFEMLO allows multiple contiguous registers to be specified as a single operand. Thus, a single instruction can specify a SIMD-style multi-word operation or a long-word operation. RFEMLO can be applied to general-purpose processors by adding instructions for multi-word or long-word operands and functional units for those instructions. To evaluate the performance of RFEMLO, we use Simplescalar/ARM 3.0 (with gcc 2.95.2) and run detailed simulations on various symmetric and asymmetric cryptographic algorithms. With RFEMLO, the total instruction count is reduced by up to 62% for symmetric and up to 70% for asymmetric cryptographic algorithms. Performance results also show that, when RFEMLO is applied to a processor with an in-order pipeline, a speedup of 1.4 to 2.6 can be obtained for symmetric cryptographic algorithms and a speedup of 2.5 to 3.3 for asymmetric cryptographic algorithms. We also found that RFEMLO improves the performance of these algorithms at much lower cost than the issue-width increase available in superscalar implementations. Moreover, RFEMLO can also be applied to a superscalar processor, leading to additional performance gains of 83% and 138% for symmetric and asymmetric cryptographic algorithms, respectively.

AN ASSESSMENT OF PARALLEL PRECONDITIONERS FOR THE INTERIOR SPARSE GENERALIZED EIGENVALUE PROBLEMS BY CG-TYPE METHODS ON AN IBM REGATTA MACHINE

  • Ma, Sang-Back; Jang, Ho-Jong
    • Journal of applied mathematics & informatics / v.25 no.1_2 / pp.435-443 / 2007
  • Computing the interior spectrum of large sparse generalized eigenvalue problems $Ax = \lambda Bx$, where $A$ and $B$ are large, sparse, and SPD (Symmetric Positive Definite), is often required in areas such as structural mechanics and quantum chemistry, to name a few. Recently, CG-type methods have been found useful and highly amenable to parallel computation for very large problems. Also, as in the case of linear systems, a proper choice of preconditioner is known to accelerate the rate of convergence. After the smallest eigenpair is found, we use the orthogonal deflation technique to find the next $m-1$ eigenvalues, which is also suitable for parallelization. This offers advantages over Jacobi-Davidson methods with partial shifts, which require re-computation of the preconditioner matrix with new shifts. As preconditioners we consider incomplete LU, ILU(0), in two variants, successive over-relaxation (SOR), and point symmetric SOR (SSOR). We set $m$ to 5. We conducted our experiments on matrices from finite difference discretizations of partial differential equations. The generated matrices have dimensions of up to 4 million, and the total number of processors is 32. The MPI (Message Passing Interface) library was used for interprocessor communication. Our results show that, in general, the Multi-Color ILU(0) gives the best performance.
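CG-type eigensolvers are usually described as minimizing the generalized Rayleigh quotient. The following is a sketch under that assumption; the symbols $\rho$, $g$, and $x_i$ are notation introduced here and are not taken from the paper.

$$\rho(x) = \frac{x^{T} A x}{x^{T} B x}, \qquad g(x) = \nabla \rho(x) = \frac{2}{x^{T} B x}\bigl(Ax - \rho(x)\,Bx\bigr)$$

For symmetric $A$ and SPD $B$, the minimum of $\rho$ is the smallest eigenvalue of $Ax = \lambda Bx$, and $g(x) = 0$ exactly when $x$ is an eigenvector with eigenvalue $\rho(x)$. With deflation, each further eigenpair is sought by minimizing $\rho$ over vectors that are $B$-orthogonal to the eigenvectors already found:

$$x^{T} B x_i = 0, \qquad i = 1, \dots, k.$$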

A Performance Study of Embedded Multicore Processor Architectures (임베디드 멀티코어 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.1 / pp.163-169 / 2013
  • The importance of embedded systems has recently been growing rapidly. To satisfy the real-time constraints of such systems, a high-performance embedded processor is required. Therefore, as in general-purpose computer systems, embedded processors should also be designed with a multicore architecture. Using the MiBench benchmarks as input, extensive trace-driven simulations were performed and analyzed for 2-core to 16-core embedded processor architectures with different types of cores, from simple RISC to in-order and out-of-order superscalar processors. As a result, the achievable performance is as high as 23 times that of the single-core embedded RISC processor.

L-shaped Submesh Allocation Scheme for Mesh-Connected Multicomputers (메쉬 멀티컴퓨터에서 L-모양 서브메쉬 할당기법)

  • 서경희; 김성천
    • Journal of KIISE:Computer Systems and Theory / v.30 no.1 / pp.1-11 / 2003
  • Fragmentation is the main performance bottleneck of large, multi-user multicomputer systems. This paper presents an L-Shaped Submesh Allocation (LSSA) strategy, which lifts the restriction that allocated processors must form a rectangle in order to address the problem of fragmentation. LSSA can manipulate the shape of the required submesh to fit it into the fragmented mesh system. Thus, LSSA accommodates incoming jobs faster than other strategies and reduces job response time. Extensive simulations show that LSSA outperforms other strategies in terms of external fragmentation, job response time, and system utilization.

Analysis on the Performance and Temperature of 3D Multi-core Processors according to TLB Architecture (TLB 구조에 따른 3차원 멀티코어 프로세서의 성능, 온도 분석)

  • Son, Dong-Oh; Choi, Hong-Jun; Kim, Cheol-Hong
    • Proceedings of the Korean Information Science Society Conference / 2011.06b / pp.5-8 / 2011
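  • The 3D multi-core processor is a new processor design technology that can solve the interconnect latency and power problems of conventional multi-core processors. However, hotspots caused by increased power density are emerging as a new problem for 3D multi-core processors. Dynamic thermal management techniques are used to address this problem, but applying them degrades system performance. In this paper, we therefore apply dynamic thermal management to high-temperature units in order to resolve the hotspot problem in 3D multi-core processors. As the experimental target, we use the TLB unit, which has a large impact on system performance and runs hot because it is accessed so frequently. In particular, to reduce the performance degradation of the system, we apply a micro-TLB structure, which shows lower performance than the conventional system. A lower-performance structure generally shows a lower temperature distribution and is less affected by dynamic thermal management, so it may suffer less performance degradation than a structure that applies dynamic thermal management alone. The experimental results show that applying dynamic thermal management causes a 23.4% performance degradation relative to the conventional system, while applying the micro-TLB structure causes a 27.1% performance degradation.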