• Title/Summary/Keyword: Processing-in-Memory

Design and Evaluation of a NIC-Driven Host-Independent Network System (네트워크 인터페이스 카드에 기반한 호스트 독립적인 네트워크 시스템의 설계 및 성능평가)

  • Yim Keun Soo;Cha Hojung;Koh Kern
    • Journal of KIISE:Computer Systems and Theory / v.31 no.11 / pp.626-634 / 2004
  • In a client-server model, network server systems bear both heavy communication and heavy computational loads. Although communication channels keep getting faster, existing protocol stack architectures still have three main performance bottlenecks: protocol stack processing, system call overhead, and network interrupt overhead. To address these obstacles, this paper presents a host-independent network system that makes efficient use of the network interface card (NIC). First, by offloading the network-related processing to the NIC, the host can devote its full processing power to other useful work. Second, it eliminates system call overhead, such as context switching and memory copy operations, because the host communicates with the NIC through user-level libraries. Third, it also reduces the number of network interrupts, because the host handles an interrupt per segment instead of per packet. The experimental results show that the proposed network system reduces the host CPU overhead for communication by 68-71%. They also show that the proposed system improves communication speed by 11-83% under heavy computational and communication load.
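The third point in the abstract above, raising one interrupt per segment rather than per packet, comes down to simple arithmetic. The Python sketch below counts host interrupts for one transfer under both policies. The 1 MiB transfer, the 1460-byte packet payload, and the 64 KiB segment size are assumed values for illustration, not parameters reported in the paper.

```python
def interrupt_count(transfer_bytes: int, unit_bytes: int) -> int:
    """Interrupts raised if the NIC signals the host once per unit (packet or segment)."""
    if unit_bytes <= 0:
        raise ValueError("unit_bytes must be positive")
    return -(-transfer_bytes // unit_bytes)  # ceiling division on integers


# Assumed sizes, for illustration only.
TRANSFER = 1 << 20       # 1 MiB of application data
PACKET_PAYLOAD = 1460    # typical TCP payload per Ethernet frame
SEGMENT = 64 * 1024      # hypothetical NIC-level segment size

per_packet = interrupt_count(TRANSFER, PACKET_PAYLOAD)
per_segment = interrupt_count(TRANSFER, SEGMENT)
print(f"per packet: {per_packet}, per segment: {per_segment}")
```

With these assumed sizes the count falls from 719 interrupts to 16; the real saving depends on the segment size the NIC actually uses.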

Design and Implementation of NMEA Multiplexer in the Optimized Queue (최적화된 큐에서의 NMEA 멀티플렉서의 설계 및 구현)

  • Kim Chang-Soo;Jung Sung-Hun;Yim Jae-Hong
    • Journal of Navigation and Port Research / v.29 no.1 s.97 / pp.91-96 / 2005
  • The National Marine Electronics Association (NMEA) is a nonprofit organization composed of manufacturers, distributors, wholesalers, and educational institutions. To process the NMEA signals produced by equipment, we use the equipment's basic port; when there are not enough ports, we use a multi-port device. However, to solve the problems of multi-port applications and of dedicated equipment generating many signals, we need to develop and simulate a module that can multiplex and deliver NMEA-related signals. There is currently no case or product using an NMEA multiplexer, so expensive foreign equipment is imported or NMEA signal transmission is implemented in software over a multi-port device. Both approaches are costly and require a separate processing part for every application program. Moreover, the equipment generating NMEA signals comes from different manufacturers and runs on different platforms, which causes duplicated effort and wasted resources. To address this, I propose an NMEA multiplexer implemented as a reliable, high-performance single hardware module that operates independently, improves the module's memory efficiency through an optimized queue design, and maintains reliable real-time communication among equipment such as the main input sensors: gyrocompass, echo sounder, and GPS.
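The abstract does not describe the queue design itself. As a rough illustration of what an NMEA multiplexer has to do, the Python sketch below validates the standard NMEA 0183 checksum (XOR of the characters between `$` and `*`), buffers sentences per input port in a bounded FIFO, and merges ports round-robin. The class and function names, the drop-oldest overflow policy, and the round-robin merge are assumptions, not the authors' optimized queue.

```python
from collections import deque
from typing import Optional


def nmea_checksum(body: str) -> str:
    """XOR of every character between '$' and '*', as two uppercase hex digits."""
    value = 0
    for ch in body:
        value ^= ord(ch)
    return f"{value:02X}"


def make_sentence(body: str) -> str:
    return f"${body}*{nmea_checksum(body)}"


def is_valid(sentence: str) -> bool:
    sentence = sentence.strip()  # tolerate a trailing CR/LF
    if not sentence.startswith("$") or "*" not in sentence:
        return False
    body, _, given = sentence[1:].rpartition("*")
    return given.upper() == nmea_checksum(body)


class SentenceQueue:
    """Bounded FIFO for one input port; when full, the oldest sentence is overwritten."""

    def __init__(self, capacity: int):
        self._buf = deque(maxlen=capacity)

    def push(self, sentence: str) -> bool:
        if not is_valid(sentence):
            return False  # drop corrupted sentences at the port
        self._buf.append(sentence.strip())
        return True

    def pop(self) -> Optional[str]:
        return self._buf.popleft() if self._buf else None

    def __len__(self) -> int:
        return len(self._buf)


def multiplex(queues):
    """Merge ports round-robin, taking at most one sentence per port per pass."""
    merged = []
    while any(len(q) for q in queues):
        for q in queues:
            sentence = q.pop()
            if sentence is not None:
                merged.append(sentence)
    return merged
```

A fixed-capacity buffer keeps memory use bounded per port, which is the kind of property an embedded multiplexer needs; whether to drop the oldest or the newest sentence on overflow is a design choice left open here.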

Randomness based Static Wear-Leveling for Enhancing Reliability in Large-scale Flash-based Storage (대용량 플래시 저장장치에서 신뢰성 향상을 위한 무작위 기반 정적 마모 평준화 기법)

  • Choi, Kilmo;Kim, Sewoog;Choi, Jongmoo
    • KIISE Transactions on Computing Practices / v.21 no.2 / pp.126-131 / 2015
  • As flash-based storage systems have been actively employed in large-scale servers and data centers, reliability has become an indispensable element. One promising technique for enhancing reliability is static wear-leveling, which distributes erase operations evenly among blocks so that the lifespan of storage systems can be prolonged. However, increasing capacity makes the processing overhead of this technique non-trivial, mainly because of the search for the blocks whose erase count is the minimum (or maximum) among all blocks. To reduce this overhead, we introduce a new randomized block selection method for static wear-leveling. Specifically, instead of an exhaustive search, it chooses n blocks at random and selects the most or least erased block among the chosen set. Our experimental results revealed that wear-leveling effects are obtained even when n is 2, and that for n beyond 4 the effect is close to that of traditional static wear-leveling. For a quantitative evaluation of the processing overhead, the scheme was implemented on an FPGA board, and the overhead was reduced by more than a factor of three. This implies that the proposed scheme performs as effectively as traditional static wear-leveling while reducing overhead.
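The randomized selection step described above can be sketched in a few lines of Python. This is a minimal sketch assuming erase counts are kept in a list indexed by block number: `pick_block` samples n distinct blocks instead of scanning all of them, and `mean_selected_count` is a hypothetical helper that shows how close the chosen block gets to the true minimum as n grows. It does not model data migration or the FPGA implementation.

```python
import random


def pick_block(erase_counts, n, rng, want="min"):
    """Randomized selection: sample n distinct blocks, return the least (or most) erased one."""
    candidates = rng.sample(range(len(erase_counts)), n)
    key = erase_counts.__getitem__
    return min(candidates, key=key) if want == "min" else max(candidates, key=key)


def exhaustive_pick(erase_counts, want="min"):
    """Traditional static wear-leveling search over every block."""
    blocks = range(len(erase_counts))
    key = erase_counts.__getitem__
    return min(blocks, key=key) if want == "min" else max(blocks, key=key)


def mean_selected_count(erase_counts, n, trials, seed=0):
    """Average erase count of the block chosen by pick_block(..., want='min')."""
    rng = random.Random(seed)
    total = sum(erase_counts[pick_block(erase_counts, n, rng)] for _ in range(trials))
    return total / trials


rng = random.Random(1)
blocks = rng.sample(range(10_000), 1_000)  # 1,000 blocks with distinct erase counts
for n in (1, 2, 4, 8):
    print(n, round(mean_selected_count(blocks, n, trials=2_000), 1))
print("exhaustive minimum:", blocks[exhaustive_pick(blocks)])
```

Each randomized call touches only n entries, while the exhaustive search touches every block; that difference is the source of the overhead reduction the abstract reports.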

Design of Software and Hardware Modules for a TCP/IP Offload Engine with Separated Transmission and Reception Paths (송수신 분리형 TCP/IP Offload Engine을 위한 소프트웨어 및 하드웨어 모듈의 설계)

  • Jang Hank-Kok;Chung Sang-Hwa;Choi Young-In
    • Journal of KIISE:Computer Systems and Theory / v.33 no.9 / pp.691-698 / 2006
  • TCP/IP Offload Engine (TOE) is a technology that processes TCP/IP on a network adapter instead of the host CPU, relieving the host CPU of protocol processing overhead. There have been several approaches to implementing a TOE: software TOE based on an embedded processor, hardware TOE based on an ASIC, and hybrid TOE, which combines software and hardware functions. In this paper, we designed software and hardware modules for a hybrid TOE on an FPGA with two processor cores. The software modules are based on embedded Linux. The hardware modules handle data transmission (TX) and reception (RX). One core controls the TX path of the embedded Linux and the other controls the RX path. This TX/RX path separation reduces task switching overhead between processes and overcomes the poor performance of a single embedded processor. The hardware modules create headers for outgoing packets, process the headers of incoming packets, and fetch data from or store data to host memory by DMA, which improves data transmission and reception performance. We verified the performance of the TOE with separated transmission and reception paths through experiments with a TOE network adapter equipped with the FPGA and its processor cores.
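Creating headers for outgoing packets includes filling in checksum fields, a typical candidate for hardware offload. The abstract does not say exactly which header fields the authors' hardware computes, so treat this Python sketch as background only: it implements the standard Internet checksum from RFC 1071 (the one's-complement of the one's-complement sum of 16-bit words), which IPv4, TCP, and UDP headers use.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"   # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]       # network-order 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)    # fold carries back into the low 16 bits
    return ~total & 0xFFFF


# Sample bytes from RFC 1071's numerical example.
sample = bytes.fromhex("0001f203f4f5f6f7")
print(hex(internet_checksum(sample)))
```

A receiver can verify a packet by running the same function over the data plus the transmitted checksum: a correct packet yields 0. The per-word add-and-fold loop maps naturally onto a streaming hardware adder, which is one reason checksumming is commonly offloaded.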

Reliability improvement of an ion-measuring system using FET sensor array (FET 센서 어레이를 이용한 이온 측정 시스템의 신뢰도 개선)

  • Choi, Jung-Tae;Lee, Seung-Hyup;Kim, Young-Jin;Lee, Young-Chul;Cho, Byung-Woog;Sohn, Byung-Ki
    • Journal of Sensor Science and Technology / v.8 no.4 / pp.341-346 / 1999
  • In general, FET-type electrolyte sensors have many advantages over glass electrodes. However, the drift, memory effect, and poor reproducibility of FET-type electrolyte sensors reduce the reliability of the measurement system. To improve reliability, an ion-measuring system using a FET-type electrolyte sensor array with 8 sensors has been developed. The developed system uses electronic switches to connect a signal detection circuit to the 8-sensor array and can measure the concentrations of four different ions ($H^+$, $Na^+$, $K^+$, $Ca^{2+}$). A signal processing algorithm based on insertion sort was adopted to enhance reliability. We measured three different ions ($H^+$, $Na^+$, $K^+$) to evaluate the performance of the developed system. The results show that the designed signal processing algorithm reduces the error range compared with a simple arithmetic mean, and that the developed system is more reliable than the previous single-channel sensor system.
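The abstract says the signal processing algorithm uses insertion sort but does not give the rule applied after sorting. A minimal sketch, assuming the algorithm sorts the eight sensor readings and averages the middle values (a trimmed mean), shows why this can narrow the error range relative to a plain arithmetic mean when one sensor drifts. The readings, their units, and the `trim` parameter are hypothetical.

```python
def insertion_sort(values):
    """Return a sorted copy; efficient for short arrays such as 8 sensor readings."""
    a = list(values)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:   # shift larger readings one slot right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a


def trimmed_estimate(readings, trim=2):
    """Sort the readings and average what remains after dropping `trim` values from each end."""
    if len(readings) <= 2 * trim:
        raise ValueError("not enough readings to trim")
    s = insertion_sort(readings)
    middle = s[trim:len(s) - trim]
    return sum(middle) / len(middle)


# Hypothetical outputs (mV) of an 8-sensor array; the reading 150 comes from a drifting sensor.
readings = [101, 99, 100, 102, 98, 100, 150, 97]
print("arithmetic mean:", sum(readings) / len(readings))
print("trimmed estimate:", trimmed_estimate(readings))
```

Here a single drifting sensor pulls the arithmetic mean up to 105.875, while the trimmed estimate stays at 100.0. Insertion sort suits this setting because it is simple to implement on a small controller and fast on eight elements.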

A Novel Cooperative Warp and Thread Block Scheduling Technique for Improving the GPGPU Resource Utilization (GPGPU 자원 활용 개선을 위한 블록 지연시간 기반 워프 스케줄링 기법)

  • Thuan, Do Cong;Choi, Yong;Kim, Jong Myon;Kim, Cheol Hong
    • KIPS Transactions on Computer and Communication Systems / v.6 no.5 / pp.219-230 / 2017
  • General-Purpose Graphics Processing Units (GPGPUs) employ a massively parallel architecture and apply multithreading technology to exploit parallelism. With programming models such as CUDA and OpenCL, GPGPUs are becoming the best platform for exploiting the plentiful thread-level parallelism exposed by parallel applications. Unfortunately, modern GPGPUs cannot efficiently utilize their available hardware resources for many general-purpose applications. One of the primary reasons is that existing warp/thread block schedulers are inefficient at hiding long-latency instructions, which results in lost opportunities to improve performance. This paper studies the effects of the hardware thread scheduling policy on GPGPU performance. We propose a novel warp scheduling policy that alleviates the drawbacks of the traditional round-robin policy. The proposed warp scheduler first classifies the warps of a thread block into two groups, warps with long latency and warps with short latency, and then schedules the long-latency warps before the short-latency warps. Furthermore, to support the proposed warp scheduler, we also propose a supplemental technique that dynamically reduces the number of streaming multiprocessors to which thread blocks are assigned when contention in the memory and interconnection network is high. In experiments on a GPGPU platform with 15 streaming multiprocessors, the proposed warp scheduling policy provides an average IPC improvement of 7.5% over the baseline round-robin warp scheduling policy. This paper also shows that GPGPU performance improves by approximately 8.9% on average when the two proposed techniques are combined.
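The two-group policy described above can be sketched as an issue-order function. In this Python sketch, the `Warp` fields and the rule that decides whether a warp counts as long-latency are assumptions; the paper's classification criteria and its supplemental streaming-multiprocessor throttling technique are not modeled. The intuition is that issuing long-latency warps first starts their memory requests sooner, so more of that latency overlaps with the short-latency warps' work.

```python
from dataclasses import dataclass


@dataclass
class Warp:
    wid: int
    long_latency: bool   # assumption: e.g. the next instruction is a global-memory load
    ready: bool = True   # operands available, not stalled


def _rotate(warps, start):
    n = len(warps)
    return [warps[(start + k) % n] for k in range(n)]


def round_robin_order(warps, start):
    """Baseline: ready warps in round-robin order beginning at `start`."""
    return [w.wid for w in _rotate(warps, start) if w.ready]


def long_first_order(warps, start):
    """Two groups: ready long-latency warps first, then short-latency, round-robin within each."""
    rotated = [w for w in _rotate(warps, start) if w.ready]
    long_group = [w.wid for w in rotated if w.long_latency]
    short_group = [w.wid for w in rotated if not w.long_latency]
    return long_group + short_group


warps = [Warp(0, False), Warp(1, True), Warp(2, False), Warp(3, True, ready=False)]
print(round_robin_order(warps, 0))
print(long_first_order(warps, 0))
```

Keeping round-robin order inside each group preserves the fairness of the baseline policy while changing only which group is served first.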

Novel Radix-$2^6$ DF IFFT Processor with Low Computational Complexity (연산복잡도가 적은 radix-$2^6$ FFT 프로세서)

  • Cho, Kyung-Ju
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.1 / pp.35-41 / 2020
  • Fast Fourier transform (FFT) processors have been widely used in various applications such as communications, image processing, and biomedical signal processing. In particular, high-performance, low-power FFT processing is indispensable in OFDM-based communication systems. This paper presents a novel radix-$2^6$ FFT algorithm with low computational complexity and high hardware efficiency. Using a 7-dimensional index mapping, the twiddle factor is decomposed and the radix-$2^6$ FFT algorithm is derived. The proposed algorithm has a simple twiddle factor sequence and a small number of complex multiplications, which can reduce the memory size needed to store the twiddle factors. When the twiddle factor has only a few distinct coefficients, complex constant multipliers can be used efficiently instead of general complex multipliers. Complex constant multipliers can be designed more efficiently using canonic signed digit (CSD) and common subexpression elimination (CSE) algorithms. An efficient complex constant multiplier design method for the twiddle factor multiplications in the proposed radix-$2^6$ algorithm is proposed, applying the CSD and CSE algorithms. To evaluate the performance of the previous and proposed methods, a 256-point single-path delay feedback (SDF) FFT is designed and synthesized into an FPGA. The proposed algorithm uses about 10% less hardware than the previous algorithm.
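The CSD recoding mentioned above rewrites a constant with digits in {-1, 0, +1} so that no two adjacent digits are non-zero, which minimizes the number of non-zero digits and therefore the adders in a shift-and-add constant multiplier. The Python sketch below shows CSD recoding and multiplication only. Common subexpression elimination (CSE) and the paper's actual twiddle coefficients are not covered, and the example coefficient, cos(π/8) scaled by 256, is an assumed quantization.

```python
import math


def to_csd(value: int):
    """Canonic signed digit form, LSB first, digits in {-1, 0, +1}, no two adjacent non-zeros."""
    digits = []
    n = value
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            n -= d            # clears the low two bits, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        n >>= 1               # exact, since n is now even
    return digits


def csd_value(digits) -> int:
    return sum(d << i for i, d in enumerate(digits))


def csd_multiply(x: int, digits) -> int:
    """Constant multiplication using only shifts, additions, and subtractions."""
    acc = 0
    for i, d in enumerate(digits):
        if d == 1:
            acc += x << i
        elif d == -1:
            acc -= x << i
    return acc


# Example coefficient (assumption): cos(pi/8) quantized with 8 fractional bits.
coef = round(math.cos(math.pi / 8) * 256)
digits = to_csd(coef)
print(coef, bin(coef), digits)
print("non-zero digits: binary", bin(coef).count("1"), "vs CSD", sum(d != 0 for d in digits))
```

For 237 the binary form has six non-zero bits while the CSD form (256 - 16 - 4 + 1) has four, so the shift-and-add multiplier needs two fewer add/subtract operations.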

A Dynamic Prefetch Filtering Schemes to Enhance Usefulness Of Cache Memory (캐시 메모리의 유용성을 높이는 동적 선인출 필터링 기법)

  • Chon Young-Suk;Lee Byung-Kwon;Lee Chun-Hee;Kim Suk-Il;Jeon Joong-Nam
    • The KIPS Transactions:PartA / v.13A no.2 s.99 / pp.123-136 / 2006
  • The prefetching technique is an effective way to reduce the latency caused by memory accesses. However, excessively aggressive prefetching not only causes cache pollution, which cancels out the benefits of prefetching, but also increases bus traffic, degrading overall performance. In this paper, a prefetch filtering scheme is proposed that dynamically decides whether to issue a prefetch by consulting a filtering table, in order to reduce the cache pollution caused by unnecessary prefetches. First, a prefetch hashing table 1bitSC filtering scheme (PHT1bSC) is presented to analyze the lock problem of the conventional scheme; like the conventional scheme, it uses N:1 mapping, but each entry holds a 1-bit value with two states. A complete block address table filtering scheme (CBAT) is introduced as a reference for the comparative study. A prefetch block address lookup table scheme (PBALT), the main idea of this paper, is then proposed and shows the most accurate filtering performance. PBALT uses a table of the same length as PHT1bSC, and each entry has the same fields as CBAT: the address of a recently prefetched data block that was never referenced is mapped 1:1 to an entry of the filter table. Simulations were run with commonly used prefetch schemes on general benchmarks and multimedia programs while varying cache parameters. Compared with no filtering, the PBALT scheme improved performance by up to 22%, and thanks to its higher filtering accuracy it reduced the cache miss ratio by 7.9% compared with the conventional PHT2bSC. The MADT of the proposed PBALT scheme decreased by 6.1% compared with conventional schemes, reducing the total execution time.
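The abstract leaves the exact table organization of PBALT partly unclear. The Python sketch below illustrates only the general idea as described: the full block address of a prefetched block that was evicted without being referenced is stored 1:1 in a bounded filter table, and later prefetch requests for that address are suppressed. The LRU replacement of table entries and the `on_demand_miss` rule that removes an entry once the block turns out to be needed are assumptions, not details from the paper.

```python
from collections import OrderedDict


class PrefetchFilter:
    """Hypothetical PBALT-style filter keyed by full block address (1:1, no hashing)."""

    def __init__(self, entries: int):
        self.entries = entries
        self.table = OrderedDict()   # block address -> None, least recently used first

    def should_prefetch(self, block_addr: int) -> bool:
        """Suppress a prefetch whose block was previously prefetched but never used."""
        if block_addr in self.table:
            self.table.move_to_end(block_addr)
            return False
        return True

    def on_evict(self, block_addr: int, was_prefetched: bool, was_referenced: bool) -> None:
        """Record a useless prefetch when its cache line is evicted."""
        if was_prefetched and not was_referenced:
            self.table[block_addr] = None
            self.table.move_to_end(block_addr)
            if len(self.table) > self.entries:
                self.table.popitem(last=False)   # evict the LRU filter entry (assumed policy)

    def on_demand_miss(self, block_addr: int) -> None:
        """Assumed rule: a demand miss on a filtered block shows it is useful again."""
        self.table.pop(block_addr, None)
```

Storing full addresses avoids the aliasing that an N:1 hashed table suffers, at the cost of wider entries; that trade-off is the intuition behind comparing exact-address and hashed filters.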

EFFICIENT IMPLEMENTATION OF GRAYSCALE MORPHOLOGICAL OPERATORS (형태학 필터의 효과적 구현 방안에 관한 연구)

  • 고성제;이경훈
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.10 / pp.1861-1871 / 1994
  • This paper presents efficient real-time software implementation methods for the grayscale morphological composite function processing (FP) system. The proposed method is based on a matrix representation of the composite FP system using a basis matrix composed of structuring elements. We propose a procedure to derive the basis matrix for composite FP systems with any grayscale structuring element (GSE). It is shown that composite FP operations, including morphological opening and closing, are accomplished more efficiently by a local matrix operation with the basis matrix than by cascaded operations, eliminating delays and requiring less memory. In the second part of this paper, a VLSI implementation architecture for grayscale morphological operators is presented. The proposed architecture employs a bit-serial approach, which allows grayscale morphological operations on a p-bit grayscale signal to be decomposed into bit-level binary operation units. It is shown that this realization has a simple, modular structure and is thus suitable for VLSI implementation.
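The bit-serial idea above can be illustrated in software. The Python sketch below computes a window maximum by examining one bit plane at a time from the most significant bit, keeping only the candidates that could still be the maximum; the minimum follows by complementing the p-bit values. Flat structuring elements on a 1-D signal, windows clipped at the edges, and this particular elimination scheme are assumptions; the sketch does not reproduce the paper's basis-matrix method or its VLSI architecture.

```python
def bit_serial_max(values, p):
    """Maximum of p-bit unsigned values, decided one bit plane at a time (MSB first)."""
    alive = [True] * len(values)
    result = 0
    for bit in range(p - 1, -1, -1):
        has_one = [alive[i] and ((v >> bit) & 1) == 1 for i, v in enumerate(values)]
        if any(has_one):
            result |= 1 << bit
            alive = has_one   # candidates with a 0 in this plane can no longer be the max
    return result


def bit_serial_min(values, p):
    """Minimum via complement: min(v) = (2^p - 1) - max((2^p - 1) - v)."""
    mask = (1 << p) - 1
    return mask - bit_serial_max([mask - v for v in values], p)


def _windows(signal, width):
    r = width // 2
    n = len(signal)
    return [signal[max(0, i - r):min(n, i + r + 1)] for i in range(n)]


def dilate(signal, width, p):
    return [bit_serial_max(w, p) for w in _windows(signal, width)]


def erode(signal, width, p):
    return [bit_serial_min(w, p) for w in _windows(signal, width)]


def opening(signal, width, p):
    return dilate(erode(signal, width, p), width, p)


def closing(signal, width, p):
    return erode(dilate(signal, width, p), width, p)


print(opening([10, 10, 200, 10, 10], 3, 8))     # narrow bright spike removed
print(closing([200, 200, 5, 200, 200], 3, 8))   # narrow dark pit filled
```

Each bit-plane step needs only binary decisions (is any surviving candidate 1 in this plane?), which is what makes the approach attractive for a regular, modular hardware layout.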

Design and Implementation of User Authentication Protocol for Wireless Devices based on Java Card (자바카드 기반 무선단말기용 사용자 인증 프로토콜의 설계 및 구현)

  • Lee, Ju-Hwa;Seol, Kyoung-Su;Jung, Min-Soo
    • The KIPS Transactions:PartC / v.10C no.5 / pp.585-594 / 2003
  • Java Card is a promising smart card platform based on Java technology. Java Card defines the packages and classes needed for embedded devices with small memory, such as smart cards. Java Card is compatible with EMV, an industry specification standard, and with ISO 7816, an international standard. However, Java Card does not provide a user authentication protocol. In this paper, we design and implement a user authentication protocol for wireless devices based on Java Card, using the 3GPP standard specification (SMS), the Java Card specification (APDU), cryptography, and so on. Our Java Card user authentication techniques could be applied to areas such as M-Commerce, wireless security, e-payment systems, the mobile Internet, Global Position Service, and ubiquitous computing.
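The abstract does not specify the message flow, so the Python sketch below is only a hypothetical challenge-response exchange carried in ISO/IEC 7816-4 short command APDUs (CLA, INS, P1, P2, Lc, data, Le). The proprietary class byte 0x80, the instruction code 0x10, the shared-key HMAC-SHA256 response, and `SimulatedCard` (a stand-in for a Java Card applet) are all assumptions; SMS transport and key provisioning are left out.

```python
import hashlib
import hmac
import secrets

CLA_PROPRIETARY = 0x80   # assumed class byte for an application-specific command
INS_AUTHENTICATE = 0x10  # hypothetical instruction code
SW_OK = b"\x90\x00"                  # ISO 7816-4: normal processing
SW_WRONG_LENGTH = b"\x67\x00"
SW_CLA_NOT_SUPPORTED = b"\x6E\x00"
SW_INS_NOT_SUPPORTED = b"\x6D\x00"


def command_apdu(cla, ins, p1, p2, data=b"", le=None):
    """Short command APDU: CLA INS P1 P2 [Lc data] [Le]."""
    if len(data) > 255:
        raise ValueError("short APDUs carry at most 255 data bytes")
    apdu = bytes([cla, ins, p1, p2])
    if data:
        apdu += bytes([len(data)]) + bytes(data)
    if le is not None:
        if not 1 <= le <= 256:
            raise ValueError("Le must be in 1..256")
        apdu += bytes([le % 256])  # 256 is encoded as 0x00
    return apdu


class SimulatedCard:
    """Stands in for a card applet that shares a secret key with the server."""

    def __init__(self, key: bytes):
        self._key = key

    def process(self, apdu: bytes) -> bytes:
        if len(apdu) < 6:
            return SW_WRONG_LENGTH
        if apdu[0] != CLA_PROPRIETARY:
            return SW_CLA_NOT_SUPPORTED
        if apdu[1] != INS_AUTHENTICATE:
            return SW_INS_NOT_SUPPORTED
        lc = apdu[4]
        challenge = apdu[5:5 + lc]
        response = hmac.new(self._key, challenge, hashlib.sha256).digest()
        return response + SW_OK


def authenticate(card: SimulatedCard, key: bytes) -> bool:
    """Server side: send a fresh random challenge and check the card's HMAC."""
    challenge = secrets.token_bytes(16)
    apdu = command_apdu(CLA_PROPRIETARY, INS_AUTHENTICATE, 0x00, 0x00, challenge, le=32)
    reply = card.process(apdu)
    body, status = reply[:-2], reply[-2:]
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return status == SW_OK and hmac.compare_digest(body, expected)
```

A fresh random challenge per attempt prevents replay of an old response, and `hmac.compare_digest` avoids leaking timing information during the comparison.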