• Title/Summary/Keyword: CPU Throughput

Throughput Improvement and Power-Interruption Consideration of Fly-By-Wire Flight Control Computer (비행제어 컴퓨터의 Throughput 향상 및 Power-Interruption 대처 설계)

  • Lee, Cheol;Seo, Joon-Ho;Ham, Heung-Bin;Cho, In-Je;Woon, Hyung-Sik
    • Journal of the Korean Society for Aeronautical & Space Sciences, v.35 no.10, pp.940-947, 2007
  • For the performance upgrade of a supersonic jet fighter, the processor and FLCC (Flight Control Computer) architecture were upgraded from a baseline FLCC. Prior to the hardware implementation phase, an accurate estimate of CPU throughput is necessary. For this purpose, an experimental method for estimating the new FLCC's throughput was introduced in this study. While the baseline FLCC was operating, the CPU address bus was captured with a logic analyzer and then decoded to obtain the exact number of accesses to each memory region and the number of program instruction branches. Based on these data, a throughput test of the new FLCC configuration was performed on a CPU demo board. From the test results, the CPU-memory architecture was redesigned before the FLCC hardware implementation phase. To check for flight-stability degradation caused by the power-interruption problem introduced by the CPU-memory architecture change, a piloted HILS (Hardware-In-the-Loop Simulator) test was conducted.
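
As a rough illustration of the decoding step described above, the sketch below classifies a captured address trace by memory region and counts non-sequential instruction fetches as branches. The memory map, trace values, and 32-bit/4-byte-fetch assumption are all hypothetical, not taken from the paper.

```c
/* Hypothetical sketch: decoding a logic-analyzer address trace to count
 * accesses per memory region and non-sequential (branch) fetches.
 * The memory map and trace contents are illustrative assumptions. */
#include <stdio.h>
#include <stdint.h>

#define TRACE_LEN 8
typedef struct { uint32_t base, limit; const char *name; long hits; } region_t;

int main(void) {
    region_t map[] = {
        { 0x00000000, 0x0007FFFF, "SRAM",  0 },   /* assumed ranges */
        { 0x08000000, 0x080FFFFF, "Flash", 0 },
        { 0x40000000, 0x4000FFFF, "I/O",   0 },
    };
    /* A real trace would be loaded from the analyzer capture here. */
    uint32_t trace[TRACE_LEN] = {
        0x08000100, 0x08000104, 0x08000200,   /* jump -> counted as branch */
        0x08000204, 0x00000010, 0x08000208,
        0x0800020C, 0x40000004
    };
    long branches = 0;
    for (int i = 0; i < TRACE_LEN; i++) {
        for (size_t r = 0; r < sizeof map / sizeof map[0]; r++)
            if (trace[i] >= map[r].base && trace[i] <= map[r].limit)
                map[r].hits++;
        /* An instruction fetch that does not follow the previous address
         * sequentially (assuming 4-byte fetches) is treated as a branch. */
        if (i > 0 && trace[i] != trace[i - 1] + 4 &&
            trace[i] >= 0x08000000 && trace[i] <= 0x080FFFFF)
            branches++;
    }
    for (size_t r = 0; r < sizeof map / sizeof map[0]; r++)
        printf("%-6s accesses: %ld\n", map[r].name, map[r].hits);
    printf("branches: %ld\n", branches);
    return 0;
}
```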

Performance Evaluation of TCP/IP on ATM LAN Testbed (ATM LAN 시험망에서 TCP/IP 프로토콜의 성능분석)

  • Jang, Woo-Hyun;Lee, Se-Yul;Hwang, Sun-Myung;Lee, Bong-Hwan
    • The Transactions of the Korea Information Processing Society, v.6 no.12, pp.3634-3641, 1999
  • LAN Emulation and IPOA are the two most widely accepted network protocols for providing conventional LAN-based data services in an ATM LAN environment. In this paper, the performance of IPOA and LAN Emulation on an ATM LAN testbed is compared, and the results are also compared with the performance of Ethernet. For the comparison, metrics such as application throughput, latency, and CPU usage are used. In addition, a network program was developed that uses the socket-based TCP/IP application programming interface to send large data files from client to server via an ATM LAN switch. IPOA provides lower latency and higher throughput than LAN Emulation, while LAN Emulation incurs higher CPU usage.
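
The measurement program itself is not published in the abstract; the following is a minimal client-side sketch of the same idea, streaming a large buffer over a TCP socket and reporting application throughput. The server address, port, and transfer sizes are assumptions.

```c
/* Minimal sketch of a TCP throughput test client: stream a large
 * buffer to a server and report MB/s. Host, port, sizes are assumed. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    const size_t chunk = 64 * 1024, total = 256UL * 1024 * 1024;
    char *buf = calloc(1, chunk);
    struct sockaddr_in srv = { .sin_family = AF_INET,
                               .sin_port = htons(5001) };  /* assumed port */
    inet_pton(AF_INET, "192.168.0.10", &srv.sin_addr);     /* assumed host */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (connect(fd, (struct sockaddr *)&srv, sizeof srv) < 0) {
        perror("connect");
        return 1;
    }
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t sent = 0; sent < total; ) {
        ssize_t n = send(fd, buf, chunk, 0);
        if (n <= 0) { perror("send"); break; }
        sent += (size_t)n;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("application throughput: %.1f MB/s\n", total / 1048576.0 / secs);
    close(fd);
    free(buf);
    return 0;
}
```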

HW Matrix Multiplier Implementation & Performance Measurement for Low Earth Orbit Satellite (저궤도 위성을 위한 HW 행렬 곱셈기의 구현과 성능 측정)

  • Lee, Yunki;Kim, Jihoon
    • Journal of Satellite, Information and Communications, v.10 no.2, pp.115-120, 2015
  • Until now, AOCS SW has used the FPU, one of the CPU's resources, for satellite attitude control, and most of the SW throughput has been consumed calculating matrix multiplications. Since the SW throughput margin is shrinking seriously with the shorter control periods and greater computational burden of upcoming satellite programs, a dedicated HW matrix multiplier is absolutely required. This paper presents the results of the HW implementation and performance measurements, and discusses several techniques for performance improvement as well as further work.
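
For context, the computation being offloaded looks like the naive software multiply below; the 6x6 size is an illustrative assumption, since the paper does not give its matrix dimensions.

```c
/* Illustrative software matrix multiply of the kind the AOCS flight
 * software would run on the FPU; the 6x6 size is an assumption. */
#include <stdio.h>
#define N 6

static void matmul(const double a[N][N], const double b[N][N], double c[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double acc = 0.0;
            for (int k = 0; k < N; k++)
                acc += a[i][k] * b[k][j];   /* N multiply-adds per element */
            c[i][j] = acc;
        }
}

int main(void) {
    double a[N][N] = {{0}}, b[N][N] = {{0}}, c[N][N];
    for (int i = 0; i < N; i++) a[i][i] = b[i][i] = 1.0;   /* identity test */
    matmul(a, b, c);
    /* 2*N^3 floating-point ops per call; repeated every control period,
     * this dominates the CPU budget and motivates a HW multiplier. */
    printf("c[0][0]=%.1f, FLOPs per call = %d\n", c[0][0], 2 * N * N * N);
    return 0;
}
```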

Implementation of Memory Copy Reduction Scheme for Networked Multimedia Service in Linux (리눅스 커널에서 네트워크 멀티미디어 서비스를 위한 메모리 복사 감소 기법 구현)

  • Kim, Jeong-Won
    • The Journal of Korean Institute of Communications and Information Sciences, v.28 no.2B, pp.129-137, 2003
  • Multimedia streams such as MPEG retrieve data continuously because of their uninterrupted playback. While these streams need efficient kernel support, the buffer-cache mechanism of the Linux kernel, like that of other Unix operating systems, was designed for small files that are requested aperiodically and are not time-critical. In the case of continuous media, however, the CPU must copy enormous amounts of data from kernel address space to user address space, which leads to large CPU overhead. This overhead both degrades system throughput and makes QoS guarantees impossible. In this paper, we have designed and implemented two memory-copy reduction schemes in the Linux kernel: direct I/O and one-copy. Direct I/O skips the buffer-cache layer of the Linux kernel, resulting in a dramatic reduction of CPU memory-copy overhead, while one-copy provides a fast disk-to-network data path without copying to user address space. The experimental results show a considerable reduction in CPU overhead and improvements in throughput.
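
The paper's schemes are kernel modifications whose code is not shown in the abstract; the sketch below uses the closest standard Linux user-level analogues instead: O_DIRECT for buffer-cache bypass, and sendfile() for a disk-to-network path that avoids the copy through user space. Paths and sizes are assumptions.

```c
/* Minimal user-level analogues of the two schemes; the paper's own
 * implementations live inside the kernel, so this only illustrates
 * the data paths, not the paper's code. */
#define _GNU_SOURCE             /* exposes O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Analogue of direct I/O: reads bypass the kernel buffer cache.
 * O_DIRECT requires block-aligned buffers, hence posix_memalign. */
static ssize_t read_direct(const char *path) {
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return -1; }
    void *buf = NULL;
    if (posix_memalign(&buf, 4096, 65536) != 0) { close(fd); return -1; }
    ssize_t n = read(fd, buf, 65536);   /* DMA straight into our buffer */
    free(buf);
    close(fd);
    return n;
}

/* Analogue of the one-copy path: sendfile() streams file data to a
 * socket inside the kernel, never copying through user address space. */
static int stream_to_socket(int sock_fd, const char *path) {
    int fd = open(path, O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0) return -1;
    off_t off = 0;
    while (off < st.st_size)
        if (sendfile(sock_fd, fd, &off, (size_t)(st.st_size - off)) < 0) {
            perror("sendfile");
            break;
        }
    close(fd);
    return off == st.st_size ? 0 : -1;
}

int main(void) {
    ssize_t n = read_direct("/tmp/media.dat");   /* hypothetical file */
    printf("direct read: %zd bytes\n", n);
    (void)stream_to_socket;                      /* needs a connected socket */
    return 0;
}
```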

Energy Efficient and Low-Cost Server Architecture for Hadoop Storage Appliance

  • Choi, Do Young;Oh, Jung Hwan;Kim, Ji Kwang;Lee, Seung Eun
    • KSII Transactions on Internet and Information Systems (TIIS), v.14 no.12, pp.4648-4663, 2020
  • This paper proposes a Lempel-Ziv 4 (LZ4) compression accelerator optimized for scale-out servers in data centers. To reduce the CPU load caused by compression, we propose an accelerator solution implemented on a Field Programmable Gate Array (FPGA) as heterogeneous computing. The LZ4 compression hardware accelerator is a fully pipelined architecture and applies 16 dictionaries to enhance parallelism for a high-throughput compressor. Our hardware accelerator is based on a 20-stage pipeline and dictionary architecture, highly customized to the LZ4 compression algorithm and to parallel hardware implementation. The proposed dictionary architecture achieves high throughput by comparing input sequences against multiple dictionaries simultaneously rather than against a single dictionary. The experimental results demonstrate high throughput from the intensively optimized FPGA design. Additionally, we compare our implementation with CPU implementations of LZ4 to provide insights for FPGA-based data centers. The proposed accelerator achieves a compression throughput of 639 MB/s with fine-grained parallelism, suitable for deployment in scale-out servers. This approach enables a low-power Intel Atom processor to realize Hadoop storage alongside the compression accelerator.
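
A CPU-side baseline of the kind such an accelerator is measured against can be written with the official liblz4 API; the snippet below (link with -llz4) compresses and round-trips a small buffer. The input data is illustrative.

```c
/* CPU-side LZ4 baseline using the official liblz4 API (-llz4). */
#include <lz4.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    const char *src = "an example block of repetitive data data data data";
    int src_len = (int)strlen(src);
    int max_dst = LZ4_compressBound(src_len);    /* worst-case output size */
    char *dst = malloc((size_t)max_dst);

    int written = LZ4_compress_default(src, dst, src_len, max_dst);
    if (written <= 0) { fprintf(stderr, "compression failed\n"); return 1; }
    printf("compressed %d bytes to %d bytes\n", src_len, written);

    /* Round-trip check with the matching decompressor. */
    char *back = malloc((size_t)src_len);
    int restored = LZ4_decompress_safe(dst, back, written, src_len);
    printf("round trip %s\n",
           (restored == src_len && memcmp(src, back, (size_t)src_len) == 0)
               ? "ok" : "FAILED");
    free(dst);
    free(back);
    return 0;
}
```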

High Throughput Parallel KMP Algorithm Considering CPU-GPU Memory Hierarchy (CPU-GPU 메모리 계층을 고려한 고처리율 병렬 KMP 알고리즘)

  • Park, Soeun;Kim, Daehee;Lee, Myungho;Park, Neungsoo
    • The Transactions of The Korean Institute of Electrical Engineers, v.67 no.5, pp.656-662, 2018
  • Pattern matching algorithms are widely used in many application fields such as bio-informatics and intrusion detection. Among the many string-matching algorithms, the KMP (Knuth-Morris-Pratt) algorithm is commonly used because of its fast execution on large texts. However, the processing speed of the KMP algorithm is still limited when the text size increases significantly. In this paper, we propose a high-throughput parallel KMP algorithm that considers the CPU-GPU memory hierarchy, based on OpenCL for GPGPU (General-Purpose computing on Graphics Processing Units). We focus on optimizing the allocation of work-items and work-groups, copying the pattern data and the failure table into local memory, and overlapping data transfer with the string-matching operations. The experimental results show that the optimized parallel KMP algorithm runs about 3.6 times faster than the non-optimized parallel KMP algorithm.
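
The sequential kernel that such a parallel version distributes across work-groups is the standard KMP pair of steps, failure-table construction and the matching scan, sketched below.

```c
/* Sequential KMP: failure-table construction plus the matching scan. */
#include <stdio.h>
#include <string.h>

static void build_failure(const char *pat, int m, int *fail) {
    fail[0] = 0;
    for (int i = 1, k = 0; i < m; i++) {
        while (k > 0 && pat[i] != pat[k]) k = fail[k - 1];
        if (pat[i] == pat[k]) k++;
        fail[i] = k;            /* longest proper border of pat[0..i] */
    }
}

static void kmp_search(const char *text, int n, const char *pat, int m,
                       const int *fail) {
    for (int i = 0, k = 0; i < n; i++) {
        while (k > 0 && text[i] != pat[k]) k = fail[k - 1];
        if (text[i] == pat[k]) k++;
        if (k == m) {                        /* full match ending at i */
            printf("match at %d\n", i - m + 1);
            k = fail[k - 1];                 /* continue for overlaps */
        }
    }
}

int main(void) {
    const char *text = "abababcabababc", *pat = "ababc";
    int fail[16];
    build_failure(pat, (int)strlen(pat), fail);
    kmp_search(text, (int)strlen(text), pat, (int)strlen(pat), fail);
    return 0;
}
```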

THE PERFORMANCE OF A MEMORY RESTRICTED COMPUTER WITH A STATE-DEPENDENT JOB ADMISSION POLICY

  • Lim, Jong-Seul
    • Journal of Applied Mathematics & Informatics, v.2 no.2, pp.21-46, 1995
  • Congestion and memory occupancy in a computer system may be reduced further if new jobs are admitted only when the number of jobs queued at the CPU is below a run-queue cutoff (RQ). In this paper, we prove that the response time of a job is invariant with respect to RQ if jobs do not communicate with each other. We also demonstrate this invariance property numerically using matrix-geometric methods, and present an approximate method for the delay due to context switching under time slicing. The approximation suggests that time slicing with constant overhead yields a throughput similar to that of an FCFS system without overhead.
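
The admission rule can be stated compactly; the formulation below is an illustrative reading under standard stationary-queue assumptions (the π_n notation is ours, and the paper's exact model may differ).

```latex
% Illustrative form of the RQ admission policy: a job is admitted only
% while the CPU run-queue length N is below the cutoff RQ, so the
% effective arrival rate at the CPU is thinned to
\[
  \lambda_{\mathrm{eff}} \;=\; \lambda \,\Pr(N < RQ)
                         \;=\; \lambda \sum_{n=0}^{RQ-1} \pi_n ,
\]
% where \pi_n is the stationary probability of n jobs in the run queue
% (assumed here; the paper evaluates such quantities numerically with
% matrix-geometric methods).
```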

A Function-characteristic Aware Thread-mapping Strategy for an SEDA-based Message Processor in Multi-core Environments (멀티코어 환경에서 SEDA 기반 메시지 처리기의 수행함수 특성을 고려한 쓰레드 매핑 기법)

  • Kang, Heeeun;Park, Sungyong;Lee, Younjeong;Jee, Seungbae
    • Journal of KIISE, v.44 no.1, pp.13-20, 2017
  • A message processor is server software that receives messages in various formats from clients, creates the corresponding threads to process them, and delivers the results to their destinations. Considering that each function of an SEDA-based message processor has its own characteristics, such as being CPU-bound or IO-bound, this paper proposes a thread-mapping strategy called "FC-TM" (function-characteristic aware thread mapping) that schedules threads to cores based on function characteristics in multi-core environments. This paper assumes that message-processor functions are static in the sense that they are pre-defined when the message processor is built; therefore, we profile each function in advance and map each thread to a core using this information in order to maximize throughput. The benchmarking results show that throughput increased by up to 72% compared with previous studies when the ratio of IO-bound functions to CPU-bound functions exceeds a certain percentage.
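
A minimal sketch of the mapping mechanism itself is shown below: a stage's thread is pinned at creation to a core chosen from the stage's profiled class. The core layout and the pick_core() policy are illustrative assumptions, not the paper's FC-TM policy.

```c
/* Sketch of class-based thread-to-core mapping with pthread affinity. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

enum func_class { CPU_BOUND, IO_BOUND };

/* Hypothetical policy: CPU-bound stages get dedicated cores 0-3,
 * IO-bound stages share cores 4-7. */
static int pick_core(enum func_class cls, int stage_idx) {
    return cls == CPU_BOUND ? stage_idx % 4 : 4 + stage_idx % 4;
}

static void *stage_worker(void *arg) {
    (void)arg;
    /* ...the SEDA stage's event-handling loop would run here... */
    return NULL;
}

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(pick_core(CPU_BOUND, 0), &set);    /* stage 0 -> core 0 */

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setaffinity_np(&attr, sizeof set, &set);  /* pin at creation */

    pthread_t tid;
    if (pthread_create(&tid, &attr, stage_worker, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}
```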

Analysis of the CPU/GPU Temperature and Energy Efficiency depending on Executed Applications (응용프로그램 실행에 따른 CPU/GPU의 온도 및 컴퓨터 시스템의 에너지 효율성 분석)

  • Choi, Hong-Jun;Kang, Seung-Gu;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information, v.17 no.5, pp.9-19, 2012
  • As the clock frequency increases, CPU performance improves continuously; however, power and thermal problems in the CPU become more serious at higher clock frequencies. For this reason, utilizing the GPU to reduce the workload of the CPU has become one of the most popular approaches in recent high-performance computer systems. The GPU is a specialized processor originally designed for graphics processing. Recently, technologies such as CUDA, which make GPU resources easier to utilize, have become popular, improving the performance of computer systems by using the CPU and GPU simultaneously across various kinds of applications. In this work, we analyze the temperature and energy efficiency of a computer system in which the CPU and GPU are utilized simultaneously, to identify possible problems in upcoming high-performance computer systems. According to our experimental results, the temperatures of both the CPU and the GPU increase when an application is executed on the GPU; when an application is executed on the CPU, the CPU temperature increases whereas the GPU temperature remains unchanged. The computer system shows better energy efficiency when utilizing the GPU, because the throughput of the GPU is much higher than that of the CPU. However, the system temperature tends to rise more readily when applications run on the GPU, because the GPU consumes more power than the CPU.
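
The paper does not describe its instrumentation; one common way to sample CPU temperature on Linux during such experiments is the sysfs thermal interface, sketched below (the thermal-zone path varies by machine and is an assumption).

```c
/* Illustrative CPU-temperature sampling via Linux sysfs; the zone path
 * is machine-dependent and assumed here, not taken from the paper. */
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/sys/class/thermal/thermal_zone0/temp", "r");
    if (!f) { perror("thermal zone"); return 1; }
    long millideg;
    if (fscanf(f, "%ld", &millideg) == 1)
        printf("CPU temperature: %.1f C\n", millideg / 1000.0);
    fclose(f);
    /* GPU temperature is typically read through vendor tooling instead,
     * e.g. `nvidia-smi --query-gpu=temperature.gpu --format=csv`. */
    return 0;
}
```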

The Performance Study of a Virtualized Multicore Web System

  • Lu, Chien-Te;Yeh, C.S. Eugene;Wang, Yung-Chung;Yang, Chu-Sing
    • KSII Transactions on Internet and Information Systems (TIIS), v.10 no.11, pp.5419-5436, 2016
  • Enhancing the performance of computing systems has been an important topic since the invention of computers. The leading-edge technologies of multicore and virtualization dramatically influence the development of current IT systems. We study the performance attributes of response time (RT), throughput, efficiency, and scalability for a virtualized Web system running on a multicore server. We build virtual machines (VMs) for a Web application and use distributed stress tests to measure RTs and throughputs under varied combinations of virtual cores (VCs) and VM instances. Their gains, efficiencies, and scalabilities are also computed and compared. Our experimental and analytic results indicate: 1) a system performs and scales much better with multiple single-VC VMs than with a single multiple-VC VM; 2) the system capacity gain is proportional to the number of VM instances run, but not to the number of VCs allocated in a VM; 3) a system with more VMs or VCs has higher physical CPU utilization but lower vCPU utilization; 4) the maximum throughput gain is less than the VM or VC gain; 5) per-core computing efficiency does not correlate with the quality of the VCs or VMs employed. These outcomes can provide valuable guidelines for selecting the instance types offered by public Cloud providers and for load-balancing planning in Web systems.
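
The abstract reports gains and per-core efficiencies without giving formulas; the standard definitions below are consistent with findings 4) and 5), where X(n) denotes the measured throughput with n VMs or VCs (our notation, not necessarily the paper's).

```latex
% Standard throughput-gain and per-core-efficiency definitions consistent
% with the metrics the study reports (assumed; the abstract gives none):
\[
  G(n) \;=\; \frac{X(n)}{X(1)}, \qquad
  E(n) \;=\; \frac{G(n)}{n},
\]
% where X(n) is the measured throughput with n VMs (or VCs). Finding 4)
% corresponds to G(n) < n, and finding 5) concerns how E(n) behaves as
% the number of VCs or VMs grows.
```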