• Title/Summary/Keyword: CPU clock speed

Search Result 7, Processing Time 0.047 seconds

Performance Scalability of SPEC CPU2000 Benchmark over CPU Clock Speed (CPU 주파수 속도에 대한 SPEC CPU2000 성능 변화)

  • Yi, Jong-Su;Kim, Jun-Seong
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.42 no.5
    • /
    • pp.1-8
    • /
    • 2005
  • SPEC CPU2000 is an widely used benchmark program, both in industry and in academy, for measuring compute-intensive performance of computer systems with various architectures. However, there has been little effort to investigate its characteristics with respect to hardware components. This paper presents the performance scalability of SPEC CPU2000 benchmark over CPU clock speed. For an Intel x86-based system running at various clock speed, we measure the performance of SPEC CPU2000 benchmark, and analyze the characteristic of SPEC CPU2000 in a system aspect. In the experiment, we found that the overall performance of SPEC CPU2000 increases monotonically and linearly as the CPU clock speed increases and that the scale efficiencies of SPEC CPU2000 component benchmarks are quite evenly distributed.

Performance scalability of SPEC CPU2000 benchmark over CPU clock speed (CPU 주파수 속도의 증가에 따른 SPEC CPU2000 벤치마크의 성능 변화)

  • 이정수;김준성
    • Proceedings of the IEEK Conference
    • /
    • 2003.07d
    • /
    • pp.1351-1354
    • /
    • 2003
  • 본 논문에서는 시스템 성능 향상에 상대적으로 많은 영향을 미치는 CPU 주파수 속도의 변화에 따른 SPEC CPU2000 벤치마크 프로그램의 성능의 변화를 고찰한다. x86 기반의 단일 프로세서 시스템에서 서로 다른 주파수의 CPU를 사용하여 SPEC CPU2000 벤치마크 프로그램의 성능을 측정함으로써 SPEC CPU2000벤치마크 프로그램의 특성을 시스템 적 측면에서 해석하였다. 실험을 통하여 SPEC CPU2000 벤치마크 프로그램의 성능은 CPU 주파수 속도의 변화를 유연하게 반영할 수 있도록 그 의존도가 고르게 분포되어 있음을 알 수 있다.

  • PDF

Low-Complexity Deeply Embedded CPU and SoC Implementation (낮은 복잡도의 Deeply Embedded 중앙처리장치 및 시스템온칩 구현)

  • Park, Chester Sungchung;Park, Sungkyung
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.17 no.3
    • /
    • pp.699-707
    • /
    • 2016
  • This paper proposes a low-complexity central processing unit (CPU) that is suitable for deeply embedded systems, including Internet of things (IoT) applications. The core features a 16-bit instruction set architecture (ISA) that leads to high code density, as well as a multicycle architecture with a counter-based control unit and adder sharing that lead to a small hardware area. A co-processor, instruction cache, AMBA bus, internal SRAM, external memory, on-chip debugger (OCD), and peripheral I/Os are placed around the core to make a system-on-a-chip (SoC) platform. This platform is based on a modified Harvard architecture to facilitate memory access by reducing the number of access clock cycles. The SoC platform and CPU were simulated and verified at the C and the assembly levels, and FPGA prototyping with integrated logic analysis was carried out. The CPU was synthesized at the ASIC front-end gate netlist level using a $0.18{\mu}m$ digital CMOS technology with 1.8V supply, resulting in a gate count of merely 7700 at a 50MHz clock speed. The SoC platform was embedded in an FPGA on a miniature board and applied to deeply embedded IoT applications.

Microcode based Controller for Compact CNN Accelerators Aimed at Mobile Devices (모바일 디바이스를 위한 소형 CNN 가속기의 마이크로코드 기반 컨트롤러)

  • Na, Yong-Seok;Son, Hyun-Wook;Kim, Hyung-Won
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.3
    • /
    • pp.355-366
    • /
    • 2022
  • This paper proposes a microcode-based neural network accelerator controller for artificial intelligence accelerators that can be reconstructed using a programmable architecture and provide the advantages of low-power and ultra-small chip size. In order for the target accelerator to support various neural network models, the neural network model can be converted into microcode through microcode compiler and mounted on accelerator to control the operators of the accelerator such as datapath and memory access. While the proposed controller and accelerator can run various CNN models, in this paper, we tested them using the YOLOv2-Tiny CNN model. Using a system clock of 200 MHz, the Controller and accelerator achieved an inference time of 137.9 ms/image for VOC 2012 dataset to detect object, 99.5ms/image for mask detection dataset to detect wearing mask. When implementing an accelerator equipped with the proposed controller as a silicon chip, the gate count is 618,388, which corresponds to 65.5% reduction in chip area compared with an accelerator employing a CPU-based controller (RISC-V).

Optimization of the computing environment to improve the speed of the modeling (WRF and CMAQ) calculation of the National Air Quality Forecast System (국가 대기질 예보 시스템의 모델링(기상 및 대기질) 계산속도 향상을 위한 전산환경 최적화 방안)

  • Myoung, Jisu;Kim, Taehee;Lee, Yonghee;Suh, Insuk;Jang, Limsuk
    • Journal of Environmental Science International
    • /
    • v.27 no.8
    • /
    • pp.723-735
    • /
    • 2018
  • In this study, to investigate an optimal configuration method for the modeling system, we performed an optimization experiment by controlling the types of compilers and libraries, and the number of CPU cores because it was important to provide reliable model data very quickly for the national air quality forecast. We were made up the optimization experiment of twelve according to compilers (PGI and Intel), MPIs (mvapich-2.0, mvapich-2.2, and mpich-3.2) and NetCDF (NetCDF-3.6.3 and NetCDF-4.1.3) and performed wall clock time measurement for the WRF and CMAQ models based on the built computing resources. In the result of the experiment according to the compiler and library type, the performance of the WRF (30 min 30 s) and CMAQ (47 min 22 s) was best when the combination of Intel complier, mavapich-2.0, and NetCDF-3.6.3 was applied. Additionally, in a result of optimization by the number of CPU cores, the WRF model was best performed with 140 cores (five calculation servers), and the CMAQ model with 120 cores (five calculation servers). While the WRF model demonstrated obvious differences depending on the number of CPU cores rather than the types of compilers and libraries, CMAQ model demonstrated the biggest differences on the combination of compilers and libraries.

Performance Comparison between Haskell Eval Monad and Cloud Haskell (Haskell Eval 모나드와 Cloud Haskell 간의 성능 비교)

  • Kim, Yeoneo;An, Hyungjun;Byun, Sugwoo;Woo, Gyun
    • Journal of KIISE
    • /
    • v.44 no.8
    • /
    • pp.791-802
    • /
    • 2017
  • Competition in the modern CPU market has shifted from speeding up the clock speed of a single core to increasing the number of cores. As such, there is a growing interest in parallel programming to maximize the use of resources of many core processors. In this paper, we propose parallel programming models in Haskell to find an advisable parallel programming model for many-core environments. Specifically, we used Eval monad and Cloud Haskell to develop two versions of parallel programs: plagiarism detection and K-means. Then, we evaluated the performance of the developed programs in 32-core and 120-core environments. The results of our experiment show that the Eval monad is highly efficient in an environment with a small number of cores. On the other hand, the Cloud Haskell runtime shows 37% improvement over Eval monad and the scalability shows a 134% improvement over Eval monad as the number of cores increases.

Analysis of Distributed Computational Loads in Large-scale AC/DC Power System using Real-Time EMT Simulation (대규모 AC/DC 전력 시스템 실시간 EMP 시뮬레이션의 부하 분산 연구)

  • In Kwon, Park;Yi, Zhong Hu;Yi, Zhang;Hyun Keun, Ku;Yong Han, Kwon
    • KEPCO Journal on Electric Power and Energy
    • /
    • v.8 no.2
    • /
    • pp.159-179
    • /
    • 2022
  • Often a network becomes complex, and multiple entities would get in charge of managing part of the whole network. An example is a utility grid. While the entire grid would go under a single utility company's responsibility, the network is often split into multiple subsections. Subsequently, each subsection would be given as the responsibility area to the corresponding sub-organization in the utility company. The issue of how to make subsystems of adequate size and minimum number of interconnections between subsystems becomes more critical, especially in real-time simulations. Because the computation capability limit of a single computation unit, regardless of whether it is a high-speed conventional CPU core or an FPGA computational engine, it comes with a maximum limit that can be completed within a given amount of execution time. The issue becomes worsened in real time simulation, in which the computation needs to be in precise synchronization with the real-world clock. When the subject of the computation allows for a longer execution time, i.e., a larger time step size, a larger portion of the network can be put on a computation unit. This translates into a larger margin of the difference between the worst and the best. In other words, even though the worst (or the largest) computational burden is orders of magnitude larger than the best (or the smallest) computational burden, all the necessary computation can still be completed within the given amount of time. However, the requirement of real-time makes the margin much smaller. In other words, the difference between the worst and the best should be as small as possible in order to ensure the even distribution of the computational load. Besides, data exchange/communication is essential in parallel computation, affecting the overall performance. However, the exchange of data takes time. Therefore, the corresponding consideration needs to be with the computational load distribution among multiple calculation units. If it turns out in a satisfactory way, such distribution will raise the possibility of completing the necessary computation in a given amount of time, which might come down in the level of microsecond order. This paper presents an effective way to split a given electrical network, according to multiple criteria, for the purpose of distributing the entire computational load into a set of even (or close to even) sized computational loads. Based on the proposed system splitting method, heavy computation burdens of large-scale electrical networks can be distributed to multiple calculation units, such as an RTDS real time simulator, achieving either more efficient usage of the calculation units, a reduction of the necessary size of the simulation time step, or both.