• Title/Summary/Keyword: multi-core scalable

Multi-core Scalable Fair I/O Scheduling for Multi-queue SSDs (멀티큐 SSD를 위해 멀티코어 확장성을 제공하는 공정한 입출력 스케줄링)

  • Cho, Minjung;Kang, Hyeongseok;Kim, Kanghee
    • Journal of KIISE / v.44 no.5 / pp.469-475 / 2017
  • Emerging NVMe-based multi-queue SSDs provide high bandwidth through parallel I/O: each core performs I/O through its dedicated queue in parallel with the other cores. To give each application that performs I/O a bandwidth share, a fair-share scheduler that provides a bandwidth share to each core is required. In this study, we propose a multi-core scalable fair-queuing algorithm for multi-queue SSDs. The algorithm adopts randomization to minimize inter-core synchronization overhead and provides a weight-proportional bandwidth share to each core. Our experiments indicate that the proposed algorithm partitions bandwidth accurately and outperforms the existing FlashFQ scheduler, regardless of the number of cores, on a Linux kernel with block-mq.
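
    As a rough illustration of what a "weight-proportional bandwidth share" means in the abstract above, the sketch below splits a total bandwidth among cores in proportion to their weights. It shows only the target allocation, not the paper's randomized fair-queuing algorithm, which the abstract does not describe. The function name, core labels, and bandwidth figure are hypothetical.

```python
def proportional_shares(weights, total_bandwidth):
    """Split total_bandwidth among cores in proportion to their weights.

    Illustrative only: `weights` maps a (hypothetical) core label to a
    positive weight; this computes the ideal share, not a scheduler.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return {
        core: total_bandwidth * weight / total_weight
        for core, weight in weights.items()
    }


# Hypothetical example: two cores with weights 1 and 3 sharing 400 MB/s.
shares = proportional_shares({"core0": 1, "core1": 3}, 400.0)
```

    With these assumed inputs, core1 is entitled to three times the bandwidth of core0. A fair scheduler's accuracy can be judged by how closely the measured per-core bandwidth tracks such ideal shares.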

Multi-core Scalable Real-time Flash Storage Simulation (멀티 코어 확장성을 제공하는 실시간 플래시 저장장치 시뮬레이션)

  • Lee, Hyeon-gyu;Min, Sang Lyul;Kim, Kanghee
    • Journal of KIISE / v.44 no.6 / pp.566-572 / 2017
  • As NAND flash storage has become widely used, its simulation methodologies have been studied from various aspects such as performance, reliability, and endurance. As a result, NAND flash storage simulation has advanced in both functional modeling and timing modeling. Beyond these advances, however, there is a need to drastically reduce the long simulation time required to evaluate the aging effect on flash storage. This paper proposes a multi-core scalable real-time flash storage simulation method that can control the simulation speed according to the user's preference. With this method, the simulation can be sped up in proportion to the number of CPU cores provided while guaranteeing the correctness of the simulation result. Using our simulator, implemented as a Linux kernel module, we demonstrate the multi-core scalability and correctness of the proposed method.

Scalable Graphics Algorithms (스케일러블 그래픽스 알고리즘)

  • Yoon, Sung-Eui
    • HCI Society of Korea: Conference Proceedings (한국HCI학회:학술대회논문집) / 2008.02c / pp.224-224 / 2008
  • Recent advances in model acquisition, computer-aided design, and simulation technologies have resulted in massive databases of complex geometric data occupying multiple gigabytes and even terabytes. In various graphics and geometric applications, the major performance bottleneck is typically access to these massive geometric data sets, owing to their high complexity. However, data access speed has grown at a consistently lower rate than computational processing speed, and recent multi-core architectures aggravate this gap. Current architectural improvements are therefore not expected to solve the problem of handling ever-growing massive geometric data, especially on commodity hardware. In this tutorial, I will focus on two orthogonal approaches, multi-resolution and cache-coherent layout techniques, for designing scalable graphics and geometric algorithms. First, I will discuss multi-resolution techniques that reduce the amount of data necessary for performing geometric methods within an error bound. Second, I will explain cache-coherent layouts that improve the cache utilization of runtime geometric applications. I have applied these two techniques to rendering, collision detection, and iso-surface extraction and have thereby achieved significant performance improvements. During the talk, I will give live demonstrations on a laptop of view-dependent rendering and collision detection between massive models consisting of tens of millions of triangles.

Inductorless 8.9 mW 25 Gb/s 1:4 DEMUX and 4 mW 13 Gb/s 4:1 MUX in 90 nm CMOS

  • Sekiguchi, Takayuki;Amakawa, Shuhei;Ishihara, Noboru;Masu, Kazuya
    • JSTS:Journal of Semiconductor Technology and Science / v.10 no.3 / pp.176-184 / 2010
  • A low-power inductorless 1:4 DEMUX and a 4:1 MUX in 90 nm CMOS are presented. The DEMUX operates at 25 Gb/s with a power supply voltage of 1.05 V, and its power consumption is 8.9 mW. The area of the DEMUX core is $29 \times 40\ \mu\mathrm{m}^2$. The 4:1 MUX operates at 13 Gb/s with a power supply voltage of 1.2 V, and its power consumption is 4 mW. The area of the MUX core is $30 \times 18\ \mu\mathrm{m}^2$. The MUX/DEMUX consists mainly of differential pseudo-NMOS logic. In these MUX/DEMUX circuits, the logic swing is nearly rail-to-rail and $V_{dd}$ is low. The component circuit is more scalable than a CML circuit, which is commonly used in high-performance MUX/DEMUX designs. These MUX/DEMUX circuits are compatible with conventional CMOS logic circuits and can be connected directly to CMOS logic gates without logic level conversion. Furthermore, because of their low power, small footprint, and reasonable operation speed, the circuits are useful for core-to-core interconnection in a system LSI or chip-to-chip communication within a multi-chip module.

Development of Thermal Image System Based Multi-Core Image Processor (멀티코어 이미지 프로세서 기반 열화상 이미지 시스템 개발)

  • Cha, Jeong Woo;Han, Joon Hwan;Park, Chan;Kim, Young Jin
    • KIPS Transactions on Computer and Communication Systems / v.9 no.2 / pp.25-30 / 2020
  • Thermal imaging systems have been widely used in the defense industry because they detect infrared radiation from objects without requiring light. As demand grows in the security and automotive markets, their use is expanding into the civilian sector. Because previous systems are FPGA-based, they are difficult to adapt to diverse requirements, so a new system that can accommodate such requirements is needed. This paper proposes a thermal image processing system that uses a general-purpose image processor. The system accommodates diverse requirements and is scalable in its support for image input/output interfaces and device drivers. Because the platform is highly accessible, the proposed system reduces development cost and time compared with previous FPGA-based systems. It is also expected to satisfy customer requirements regarding development cost, development period, and product release date.

Fine-scalable SPIHT Hardware Design for Frame Memory Compression in Video Codec

  • Kim, Sunwoong;Jang, Ji Hun;Lee, Hyuk-Jae;Rhee, Chae Eun
    • JSTS:Journal of Semiconductor Technology and Science / v.17 no.3 / pp.446-457 / 2017
  • To reduce the size of frame memory or the bus bandwidth, frame memory compression (FMC) recompresses the reconstructed or reference frames of video codecs. This paper proposes a novel FMC design based on the discrete wavelet transform (DWT) and set partitioning in hierarchical trees (SPIHT), which supports fine-scalable throughput and is area-efficient. In the proposed design, multiple cores with small block sizes are used in parallel instead of a single core with a large block size. In addition, an appropriate pipelining schedule is proposed. Compared to the previous design, the proposed design achieves a processing speed closer to the target system speed and is therefore more efficient in hardware utilization. In addition, a scheme called the merged refinement pass (MRP), in which two passes of SPIHT are merged into one, is proposed. As the number of shifters decreases and the bit-width of the remaining shifters is reduced, the size of the SPIHT hardware decreases significantly. The proposed FMC encoder and decoder designs achieve throughputs of 4,448 and 4,000 Mpixels/s, respectively, with gate counts of 76.5K and 107.8K. When the proposed design is applied to High Efficiency Video Coding (HEVC), it achieves 1.96% lower average BDBR and 0.05 dB higher average BDPSNR than the previous FMC design.

Computer Generated Hologram for Beam Control of LCOS based Wavelength Selective Switch (LCOS기반의 파장선택스위치 빔제어용 컴퓨터 생성 홀로그램)

  • Lee, Yong-Min;Han, Chang Ho
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.6 / pp.744-749 / 2016
  • This paper presents the design of a computer-generated hologram for beam control of an LCOS-based wavelength selective switch, a core technology for next-generation ROADM. By using a computer-generated hologram instead of general grating patterns to control the LCOS device, we contribute to building a more efficient wavelength selective switch. Using the phase modulation properties of LCOS devices, we designed the hologram for five-port output and a 40-channel wavelength selective switch. We applied a multi-level phase modulation technique with the Gerchberg-Saxton algorithm to produce the hologram, which scales easily to other types of wavelength selective switch. With an experimental setup, we verified the usability of the hologram designed for five-port output. We also suggest a hologram design technique for beam control of a 40-channel wavelength selective switch.

The Study on the Design and Optimization of Storage for the Recording of High Speed Astronomical Data (초고속 관측 데이터 수신 및 저장을 위한 기록 시스템 설계 및 성능 최적화 연구)

  • Song, Min-Gyu;Kang, Yong-Woo;Kim, Hyo-Ryoung
    • The Journal of the Korea institute of electronic communication sciences / v.12 no.1 / pp.75-84 / 2017
  • Storage that supports high-speed recording and stable access over a network is becoming increasingly important. VLBI (Very Long Baseline Interferometry), a field of basic science that produces massive volumes of astronomical data, now demands higher data-writing performance, which is directly related to astronomical observation with high resolution and sensitivity. However, most existing storage systems follow a cloud model designed for the high throughput of general IT, finance, and administrative services, and are therefore not the best choice for recording large data streams. In this study, we design a storage system optimized for high I/O performance and concurrency. To this end, we implement a packet read and write module using the libpcap and pf_ring APIs in a multi-core CPU environment, and build scalable storage based on software RAID (Redundant Array of Inexpensive Disks) to process incoming data from an external network efficiently.

Design of Border Surveillance and Control System Based on Wireless Sensor Network (WSN 기반 국경 감시 및 제어 시스템 설계)

  • Hwang, Bo Ram;An, Sun Shin
    • KIPS Transactions on Computer and Communication Systems / v.4 no.1 / pp.11-14 / 2015
  • Low-power wireless sensor networks (WSNs) are one of the core technologies of the ubiquitous society. In this paper, we present a border surveillance and control system for a WSN environment. The system consists of static and mobile sensor nodes, static and mobile gateways, a server, and a mobile application. The mobile application has a user mode and a manager mode. In user mode, users can monitor the border through a mobile phone and obtain information about the border network environment without time or space constraints. In manager mode, to allow flexible operation of the nodes, the manager can update the software remotely and adjust the position of the mobile nodes. We also implement a multi-hop routing protocol suited to scalable low-power sensor nodes and confirm that the system operates well in a WSN environment.

TOUSE: A Fair User Selection Mechanism Based on Dynamic Time Warping for MU-MIMO Networks

  • Tang, Zhaoshu;Qin, Zhenquan;Zhu, Ming;Fang, Jian;Wang, Lei;Ma, Honglian
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.9 / pp.4398-4417 / 2017
  • Multi-user Multiple-Input and Multiple-Output (MU-MIMO) has the potential to substantially enhance the capacity of wireless networks by transmitting to multiple users simultaneously. User selection is an unavoidable problem that limits the gain of MU-MIMO to a great extent. Most state-of-the-art work focuses on improving network throughput by using Channel State Information (CSI); however, the overhead of CSI feedback becomes unacceptable when the number of users is large. Some work balances the tradeoff between complexity and achievable throughput well but does not consider fairness. Current work also largely ignores the rational use of time resources, which may bring improvements in network throughput to a standstill. In this paper, we propose TOUSE, a scalable and fair user selection scheme for MU-MIMO. Its core design is a dynamic-time-warping-based user selection mechanism for downlink MU-MIMO, which can make full use of concurrent transmission time. TOUSE also presents a novel data-rate estimation method that requires no CSI feedback and supports user selection. Simulation results show that TOUSE significantly outperforms traditional contention-based user selection schemes in both throughput and fairness in an indoor setting.
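
    For readers unfamiliar with dynamic time warping (DTW), which the TOUSE abstract above names as the basis of its user selection mechanism, the following is a minimal sketch of the textbook DTW distance between two numeric sequences. It is not TOUSE's selection procedure: the abstract does not say which sequences are compared or how the distance is used, so the function name and sample inputs are purely illustrative.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Uses |x - y| as the local cost and allows match, insertion, and
    deletion steps. Illustrative only; not the TOUSE mechanism itself.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Hypothetical inputs: the second sequence repeats one sample of the first,
# so warping aligns them with zero cost.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

    Unlike a point-by-point comparison, DTW tolerates sequences of different lengths or with locally stretched segments, which is the property that makes it useful for comparing time series.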