• Title/Summary/Keyword: Buffer(Memory)


Reconfiguration of Apache Storm for InfiniBand Communications (InfiniBand RDMA 통신을 위한 Apache Storm의 재구성)

  • Yang, Seokwoo;Son, Siwoon;Moon, Yang-Sae
    • KIPS Transactions on Software and Data Engineering / v.7 no.8 / pp.297-306 / 2018
  • In this paper, we address how to apply Apache Storm, a distributed stream processing framework, to InfiniBand, a high-performance communication device. The easiest way to run Storm on InfiniBand is to use IPoIB (IP over InfiniBand). However, this method places a heavy CPU load on each node because of frequent context switches and buffer copies. To solve this problem, we propose a new communication method for Storm that uses InfiniBand's Remote Direct Memory Access (RDMA) function. First, we design and implement RJ-Netty (RDMA/JXIO Netty), a new framework that replaces the legacy Netty framework in order to exploit RDMA. Second, we reimplement the related classes so that Storm can use both the existing Netty and the new RJ-Netty. Third, we extend the JXIO server to support multi-threading, maximizing the performance of RJ-Netty. Experimental results show that the proposed RJ-Netty significantly reduces CPU load while improving message throughput compared with both IPoIB and Ethernet. This paper is the first attempt to run Apache Storm on InfiniBand, and we believe it is an excellent research result that improves the performance of Storm by using InfiniBand RDMA.

Instructions and Data Prefetch Mechanism using Displacement History Buffer (변위 히스토리 버퍼를 이용한 명령어 및 데이터 프리페치 기법)

  • Jeong, Yong Su;Kim, JinHyuk;Cho, Tae Hwan;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.10 / pp.82-94 / 2015
  • In this paper, we propose a hardware prefetch mechanism that generates a spatial region using a displacement field, together with an efficient cache replacement policy that gives priority to the trigger block of that spatial region. Because each history record is based on its trigger block, the mechanism can take the program's access sequence into account, and because the history is stored as displacement values, it can quickly compute the instruction or data addresses to prefetch by adding the stored displacement to the trigger address. We also propose a replacement policy that, after giving priority to trigger blocks, evicts a randomly chosen low-priority block when the cache is full. We evaluated the hardware prefetcher using the gem5 simulator and the PARSEC benchmarks. Compared with an existing hardware prefetcher that generates the spatial region using a bit vector, the L1 data cache miss rate fell by about 44.5% on average and the L1 instruction cache miss rate by 26.1% on average. In addition, IPC (instructions per cycle) improved by about 23.7% on average.
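The displacement idea in this abstract can be illustrated with a minimal software sketch. It is not the paper's hardware design: the class name `DisplacementHistoryBuffer`, the 64-byte block size, and the dictionary-based history are all assumptions made for illustration. The sketch only shows how prefetch addresses follow from a trigger block plus stored signed displacements.

```python
BLOCK_SIZE = 64  # bytes per cache block (assumed for this sketch)


class DisplacementHistoryBuffer:
    """Toy history buffer: trigger block -> block displacements observed after it."""

    def __init__(self):
        # trigger block number -> ordered list of signed block displacements
        self.history = {}

    def record(self, trigger_addr, accessed_addr):
        """Store the access as a displacement relative to the trigger block."""
        trigger_block = trigger_addr // BLOCK_SIZE
        disp = accessed_addr // BLOCK_SIZE - trigger_block
        if disp == 0:
            return  # the trigger block itself needs no history entry
        disps = self.history.setdefault(trigger_block, [])
        if disp not in disps:
            disps.append(disp)

    def prefetch_addresses(self, trigger_addr):
        """Rebuild prefetch addresses by adding each displacement to the trigger block."""
        trigger_block = trigger_addr // BLOCK_SIZE
        return [(trigger_block + d) * BLOCK_SIZE
                for d in self.history.get(trigger_block, [])]
```

Because the displacements are kept in the order they were first seen, the rebuilt address list preserves the recorded access order, which is one plausible reading of how a displacement history can reflect program sequence.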

A Dynamic Transaction Routing Algorithm with Primary Copy Authority (주사본 권한을 이용한 동적 트랜잭션 분배 알고리즘)

  • Kim, Ki-Hyung;Cho, Hang-Rae;Nam, Young-Hwan
    • The KIPS Transactions:PartD / v.10D no.7 / pp.1067-1076 / 2003
  • A database sharing system (DSS) is a system for high-performance transaction processing. In a DSS, the processing nodes are locally coupled via a high-speed network and share a common database at the disk level. Each node has its own local memory and a separate copy of the operating system. To reduce the number of disk accesses, each node caches database pages in its local memory buffer. In this paper, we propose a dynamic transaction routing algorithm that balances the load across the nodes of a DSS. The proposed algorithm is novel in that it supports node-specific locality of reference by using the primary copy authority assigned to each node; hence, it achieves better cache hit ratios and thus fewer disk I/Os. Furthermore, the proposed algorithm prevents any single node from being overloaded by considering the current workload of each node. To evaluate the proposed algorithm, we develop a simulation model of the DSS and analyze the simulation results. The results show that the proposed algorithm outperforms existing algorithms in transaction processing rate. In particular, it performs better when the number of concurrently executing transactions is high and the data page access patterns of the transactions are unevenly distributed.
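The abstract does not spell out the routing rule, so the following is only a hedged sketch of the general idea: prefer the node that holds primary copy authority for most of a transaction's pages, and fall back to the least-loaded node when the preferred one is overloaded. The function name `route_transaction`, the `overload_threshold` parameter, and the tie-breaking rule are hypothetical and not taken from the paper.

```python
def route_transaction(pages, primary_owner, load, overload_threshold):
    """Pick a node for a transaction (illustrative rule, not the paper's algorithm).

    pages: pages the transaction is expected to access
    primary_owner: page -> node holding primary copy authority for that page
    load: node -> current number of active transactions
    overload_threshold: load at which a node is considered overloaded (assumed)
    """
    # Count how many of the transaction's pages each node owns.
    votes = {}
    for page in pages:
        owner = primary_owner[page]
        votes[owner] = votes.get(owner, 0) + 1

    # Affinity first: the node owning the most pages; ties go to the lighter node.
    preferred = max(votes, key=lambda node: (votes[node], -load[node]))
    if load[preferred] < overload_threshold:
        return preferred

    # Load balancing fallback: the least-loaded node overall.
    return min(load, key=load.get)
```

Routing by ownership keeps a node's buffer filled with the pages it is responsible for, which is the locality effect the abstract credits for better cache hit ratios; the fallback reflects the workload check it mentions.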

Hardware Design of High Performance In-loop Filter in HEVC Encoder for Ultra HD Video Processing in Real Time (UHD 영상의 실시간 처리를 위한 고성능 HEVC In-loop Filter 부호화기 하드웨어 설계)

  • Im, Jun-seong;Dennis, Gookyi;Ryoo, Kwang-ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.401-404 / 2015
  • This paper proposes a high-performance in-loop filter for an HEVC (High Efficiency Video Coding) encoder that processes Ultra HD video in real time. HEVC uses an in-loop filter consisting of a deblocking filter and SAO (Sample Adaptive Offset) to address the quantization error that causes image degradation. In the proposed in-loop filter encoder hardware architecture, the deblocking filter and SAO have a 2-level hybrid pipeline structure based on the $32{\times}32$ CTU to reduce execution time. The deblocking filter uses a 6-stage pipeline and minimizes memory accesses and simplifies the reference memory structure through the proposed efficient filtering order. The SAO is implemented as a 2-stage pipeline for pixel classification and application of SAO parameters, and it uses two three-layered parallel buffers to simplify pixel processing and reduce operation cycles. The proposed in-loop filter encoder architecture is designed in Verilog HDL and implemented with 205K logic gates in a TSMC 0.13 µm process. At 110 MHz, the proposed in-loop filter encoder can support 4K Ultra HD video encoding at 30 fps in real time.
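As a rough sanity check on the real-time claim, the required throughput can be expressed in luma pixels per clock cycle. The sketch below assumes that 4K Ultra HD means 3840×2160 and ignores chroma samples; neither assumption is stated in the abstract, and `pixels_per_cycle` is a hypothetical helper.

```python
def pixels_per_cycle(width, height, fps, clock_hz):
    """Average number of pixels that must be processed in each clock cycle."""
    return width * height * fps / clock_hz


# Assumed 4K UHD luma resolution at the frame rate and clock given in the abstract.
required = pixels_per_cycle(3840, 2160, 30, 110e6)
print(f"{required:.2f} luma pixels per cycle")
```

Under these assumptions the filter has to sustain a little over two luma pixels per cycle on average, which helps explain why the design relies on pipelining and parallel buffers.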


Performance Evaluation of Channel Estimation for WCDMA Forward Link with Space-Time Block Coding Transmit Diversity (시공간 블록 부호 송신 다이버시티를 적용한 WCDMA 하향 링크에서 채널 추정기의 성능 평가)

  • 강형욱;이영용;김용석;최형진
    • The Journal of Korean Institute of Communications and Information Sciences / v.28 no.6A / pp.341-350 / 2003
  • In this paper, we evaluate the performance of a moving average (MA) channel estimation filter when space-time block coding transmit diversity (STBC-TD) is applied to the wideband direct sequence code division multiple access (WCDMA) forward link. We also present an infinite impulse response (IIR) filter scheme that reduces the required memory buffer and the channel estimation delay. The paper compares the MA and IIR filter schemes in various Rayleigh fading channel environments in terms of bit error rate (BER) and frame error rate (FER). Extensive computer simulations show that transmission with STBC-TD provides a significant performance gain over transmission without transmit diversity, particularly at pedestrian speeds. When STBC-TD is employed with the MA-based channel estimator, it provides considerable performance gains against Rayleigh fading and reduces the optimum number of filter taps. Consequently, the channel estimation delay and the complexity of the receiver are reduced. In addition, the IIR-based channel estimator has the advantages of a small memory requirement and no delay compared with the MA scheme. However, the IIR filter coefficients are very sensitive to changes in mobile speed, which seriously affects performance. For that reason, it is important to set up the optimum IIR filter coefficients.
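The memory difference between the two estimators can be seen in a small sketch. It is a generic illustration rather than the paper's receiver: the class names, the first-order IIR form, and the coefficient `alpha` are assumptions, and the MA filter here averages only past samples, whereas a window centered on the current symbol would also need future samples and therefore add delay.

```python
from collections import deque


class MovingAverageEstimator:
    """MA filter: must buffer the last `taps` pilot samples."""

    def __init__(self, taps):
        self.window = deque(maxlen=taps)  # memory grows with the tap count

    def update(self, pilot_sample):
        self.window.append(pilot_sample)
        return sum(self.window) / len(self.window)


class IIREstimator:
    """First-order IIR filter: keeps a single state value regardless of smoothing."""

    def __init__(self, alpha):
        self.alpha = alpha  # weight of the newest sample (assumed form)
        self.state = None

    def update(self, pilot_sample):
        if self.state is None:
            self.state = pilot_sample
        else:
            self.state = self.alpha * pilot_sample + (1 - self.alpha) * self.state
        return self.state
```

Both classes accept complex samples. The single coefficient `alpha` fixes how quickly the IIR estimate tracks the channel, which is consistent with the abstract's remark that the coefficients are sensitive to mobile speed.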

An Efficient Page-Level Mapping Algorithm for Handling Write Requests in the Flash Translation Layer by Exploiting Temporal Locality (플래시 변환 계층에서 시간적 지역성을 이용하여 쓰기 요청을 처리하는 효율적인 페이지 레벨 매핑 알고리듬)

  • Li, Hai-Long;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.10 / pp.1167-1175 / 2016
  • This paper proposes an efficient page-level mapping algorithm that reduces the erase count in the flash translation layer (FTL) of flash memory systems. By maintaining a weight for each write request in the request buffer, the proposed algorithm estimates the degree of temporal locality of each incoming write request. A request is treated as hot only when its degree of temporal locality is well above an experimentally determined reference point. Whereas the previous LRU algorithm treats every new write request as having high temporal locality, the proposed algorithm lets only write requests estimated to have high temporal locality access hot blocks, so that hot data is concentrated there. Pages are updated more frequently in hot blocks than in warm blocks. A hot block consisting mostly of invalid pages is always selected as the victim block during garbage collection, which delays erase operations and reduces the erase count. Experimental results show that the erase count is reduced by 9.3% for real I/O workloads compared with the previous LRU algorithm.
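The abstract does not define how weights are computed, so the sketch below makes a simple assumption: a request's weight is the number of times its logical page has been written while it stays in a bounded request buffer, and a page counts as hot once that weight reaches a threshold. `RequestBuffer`, `hot_threshold`, and `pick_victim` are hypothetical names used only for illustration.

```python
from collections import OrderedDict


class RequestBuffer:
    """Bounded buffer tracking a per-page write weight (assumed weighting rule)."""

    def __init__(self, capacity, hot_threshold):
        self.capacity = capacity
        self.hot_threshold = hot_threshold  # stands in for the reference point
        self.weights = OrderedDict()        # logical page number -> weight

    def classify(self, lpn):
        """Record a write to `lpn` and label it 'hot' or 'warm'."""
        weight = self.weights.pop(lpn, 0) + 1
        self.weights[lpn] = weight           # reinsert as most recently written
        if len(self.weights) > self.capacity:
            self.weights.popitem(last=False)  # drop the least recently written page
        return "hot" if weight >= self.hot_threshold else "warm"


def pick_victim(invalid_pages_per_block):
    """Garbage collection victim: the block holding the most invalid pages."""
    return max(invalid_pages_per_block, key=invalid_pages_per_block.get)
```

Unlike a plain LRU policy, a first-time write here is labeled warm; only repeated writes within the buffer's lifetime promote a page to hot.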

A Comparative Study of Machine Learning Algorithms Using LID-DS DataSet (LID-DS 데이터 세트를 사용한 기계학습 알고리즘 비교 연구)

  • Park, DaeKyeong;Ryu, KyungJoon;Shin, DongIl;Shin, DongKyoo;Park, JeongChan;Kim, JinGoog
    • KIPS Transactions on Software and Data Engineering / v.10 no.3 / pp.91-98 / 2021
  • As information and communication technology develops rapidly, the security of IT infrastructure is becoming more important, while cyber attacks of various forms are becoming more advanced and sophisticated, as in advanced persistent threats (APTs). Early defense against, or prediction of, increasingly sophisticated cyber attacks is extremely important, and in many cases analyzing network-based intrusion detection system (NIDS) data alone cannot prevent rapidly changing attacks. Therefore, data generated by host-based intrusion detection systems (HIDS) is now being analyzed to protect against such attacks. In this paper, we conduct a comparative study of machine learning algorithms using LID-DS (Leipzig Intrusion Detection Data Set), a host-based intrusion detection data set that includes thread information, metadata, and buffer data missing from previously used data sets. The algorithms used were Decision Tree, Naive Bayes, MLP (Multi-Layer Perceptron), Logistic Regression, LSTM (Long Short-Term Memory), and RNN (Recurrent Neural Network). Accuracy, precision, recall, F1-score, and error rate were measured for evaluation. As a result, the LSTM algorithm had the highest accuracy.
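For reference, the evaluation metrics named in this abstract can be computed from binary labels as in the sketch below. It assumes the standard definitions and a binary attack/normal labeling; the function name `classification_metrics` and the choice of `1` as the positive (attack) class are illustrative, and the paper's own evaluation code is not shown in the abstract.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1-score, and error rate for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "error_rate": 1 - accuracy,
    }
```

On imbalanced intrusion data, accuracy alone can look high even when few attacks are caught, which is why precision, recall, and F1-score are usually reported alongside it.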

Extended BSD Socket API Supporting Kernel-level RTP (커널 레벨 RTP를 지원하는 확장 BSD 소켓 API)

  • Choi Mun-Seon;Kim Kyung-San;Kim Sung-Jo
    • Journal of KIISE:Computer Systems and Theory / v.33 no.6 / pp.326-336 / 2006
  • Due to the evolution of wired and wireless communication technologies and the Internet, multimedia services such as Internet broadcasting and VOD have recently become prevalent. RTP was designed by the IETF for transmitting real-time multimedia data over the Internet. While a variety of applications have used different RTP implementations provided as libraries, embeddedRTP is an RTP-based kernel-level protocol that resolves the performance issues of such library implementations. This paper proposes the ExtendedERTP protocol, which is based on the existing embeddedRTP. The new protocol resolves issues such as packet processing overhead and buffer requirements, and combines its APIs with the BSD socket APIs that are widely used in network applications. This paper demonstrates that this integration makes it possible to transmit real-time multimedia data through the familiar BSD socket interface with nominal extra overhead. This paper also proposes one scheme that improves packet processing time by 15 to 20% and another that reduces the memory required for packet processing to about 3.5% of that required by embeddedRTP.

A Segment Space Recycling Scheme for Optimizing Write Performance of LFS (LFS의 쓰기 성능 최적화를 위한 세그먼트 공간 재활용 기법)

  • Oh, Yong-Seok;Kim, Eun-Sam;Choi, Jong-Moo;Lee, Dong-Hee;Noh, Sam-H.
    • Journal of KIISE:Computing Practices and Letters / v.15 no.12 / pp.963-967 / 2009
  • The Log-structured File System (LFS) collects all modified data in a memory buffer and writes it sequentially to a segment on disk. It can therefore exploit the maximum bandwidth of storage devices on which sequential writes are much faster than random writes. However, because disk space is finite, LFS has to perform cleaning to produce free segments. This cleaning operation is the main reason LFS performance deteriorates when file system utilization is high. To avoid costly cleaning and the resulting performance loss, we propose the segment space recycling (SSR) scheme, which writes modified data directly to the invalid areas of segments, and we describe a method for classifying data and segments that takes locality of reference into account to optimize the SSR scheme. We implement U-LFS, which employs our segment space recycling scheme in LFS, and experimental results show that the SSR scheme increases the performance of WOLF by up to 1.9 times on HDD and 1.6 times on SSD when file system utilization is high.
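The contrast between a sequential segment write and segment space recycling can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: segments are Python lists in which `None` marks a free or invalid slot, `flush_buffer` is a hypothetical helper, and the paper's classification of data and segments by locality is not modeled.

```python
def flush_buffer(buffer, segments):
    """Place buffered blocks into segments and return the (segment, slot) pairs used.

    buffer: list of modified data blocks collected in memory
    segments: list of lists; None marks a free or invalid slot (toy model)
    """
    pending = list(buffer)
    placed = []
    # Emptiest segments first: a fully free segment gets a plain sequential
    # LFS write; otherwise invalid slots of used segments are recycled (SSR).
    order = sorted(range(len(segments)),
                   key=lambda s: sum(b is not None for b in segments[s]))
    for s in order:
        for i in range(len(segments[s])):
            if not pending:
                return placed
            if segments[s][i] is None:
                segments[s][i] = pending.pop(0)
                placed.append((s, i))
    if pending:
        raise RuntimeError("no free or invalid slots left; cleaning would be needed")
    return placed
```

When a free segment exists the blocks land in consecutive slots; when none exists they fill scattered invalid slots instead of waiting for a cleaner to produce a free segment.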

Implementation of High Speed Image Data Transfer using XDMA

  • Gwon, Hyeok-Jin;Choi, Doo-Hyun
    • Journal of the Korea Society of Computer and Information / v.25 no.7 / pp.1-8 / 2020
  • In this paper, we present an implementation of high-speed image data transfer using XDMA for a video signal generation and acquisition device developed as military test equipment. The proposed technique gains efficiency by replacing data copies through the system buffer in the kernel area with transmission and reception through the DMA engine in the FPGA. The device was developed on the PXIe platform with life cycle in mind, and performance was maximized using a low-cost FPGA chosen with mass production in mind. The video I/O board implemented in this paper was first tested with the existing memory copy method while varying the AXI interface clock frequency and link speed. The board was then built using the DMA engine of the FPGA, and the transfer rate was confirmed to increase from 5-8 Hz to 140 Hz. The proposed method will contribute to strengthening defense capability by reducing the cost of device development on the PXIe platform and raising the technology level.