• Title/Summary/Keyword: Multi-Access Memory System

Search Result 53, Processing Time 0.028 seconds

Proposal of 3D Graphic Processor Using Multi-Access Memory System (Multi-Access Memory System을 이용한 3D 그래픽 프로세서 제안)

  • Lee, S-Ra-El;Kim, Jae-Hee;Ko, Kyung-Sik;Park, Jong-Won
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.4
    • /
    • pp.119-128
    • /
    • 2019
  • Due to the nature of the 3D graphics processor system, many mathematical calculations are required and parallel processing research using GPU (Graphics Processing Unit) is being performed for high-speed processing. In this paper, we propose a 3D graphics processor using MAMS, a parallel processor that does not use cache memory, to solve the GPU problem of increasing bandwidth caused by cache memory miss and the problem that 3D shader processing speed is not constant. The 3D graphics processor using MAMS proposed in this paper designed Vertex shader, Pixel shader, Tiling and Rasterizing structure using DirectX command analysis, the FPGA(Xilinx Virtex6@100MHz) board for MAMS was constructed and designed using Verilog. We compared the processing time of the developed FPGA (100Mhz) and nVidia GeForce GTX 660 (980Mhz), the processing time using GTX 660 was not constant and suing MAMS was constant.

Feature Extraction System for High-Speed Fingerprint Recognition using the Multi-Access Memory System (다중 접근 메모리 시스템을 이용한 고속 지문인식 특징추출 시스템)

  • Park, Jong Seon;Kim, Jea Hee;Ko, Kyung-Sik;Park, Jong Won
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.8
    • /
    • pp.914-926
    • /
    • 2013
  • Among the recent security systems, security system with fingerprint recognition gets many people's interests through the strengths such as exclusiveness, convenience, etc, in comparison with other security systems. The most important matters for fingerprint recognition system are reliability of matching between the fingerprint in database and user's fingerprint and rapid process of image processing algorithms used for fingerprint recognition. The existing fingerprint recognition system reduces the processing time by removing some processes in the feature extraction algorithms but has weakness of a reliability. This paper realizes the fingerprint recognition algorithm using MAMS(Multi-Access Memory System) for both the rapid processing time and the reliability in feature extraction and matching accuracy. Reliability of this process is verified by the correlation between serial processor's results and MAMS-PP64's results. The performance of the method using MAMS-PP64 is 1.56 times faster than compared serial processor.

A Mobile Flash File System - MJFFS (모바일 플래시 파일 시스템 - MJFFS)

  • 김영관;박현주
    • Journal of Information Technology Applications and Management
    • /
    • v.11 no.2
    • /
    • pp.29-43
    • /
    • 2004
  • As the development of an information technique, gradually, mobile device is going to be miniaturized and operates at high speed. By such the requirements, the devices using a flash memory as a storage media are increasing. The flash memory consumes low power, is a small size, and has a fast access time like the main memory. But the flash memory must erase for recording and the erase cycle is limited. JFFS is a representative filesystem which reflects the characteristics of the flash memory. JFFS to be consisted of LSF structure, writes new data to the flash memory in sequential, which is not related to a file size. Mounting a filesystem or an error recovery is achieved through the sequential approach. Therefore, the mounting delay time is happened according to the file system size. This paper proposes a MJFFS to use a multi-checkpoint information to manage a mass flash file system efficiently. A MJFFS, which improves JFFS, divides a flash memory into the block for suitable to the block device, and stores file information of a checkpoint structure at fixed interval. Therefore mounting and error recovery processing reduce efficiently a number of filesystem access by collecting a smaller checkpoint information than capacity of actual files. A MJFFS will be suitable to a mobile device owing to accomplish fast mounting and error recovery using advantage of log foundation filesystem and overcoming defect of JFFS.

  • PDF

An Analysis of Multi-processor System Performance Depending on the Input/Output Types (입출력 형태에 따른 다중처리기 시스템의 성능 분석)

  • Moon, Wonsik
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.12 no.4
    • /
    • pp.71-79
    • /
    • 2016
  • This study proposes a performance model of a shared bus multi-processor system and analyzes the effect of input/output types on system performance and overload of shared resources. This system performance model reflects the memory reference time in relation to the effect of input/output types on shared resources and the input/output processing time in relation to the input/output processor, disk buffer, and device standby places. In addition, it demonstrates the contribution of input/output types to system performance for comprehensive analysis of system performance. As the concept of workload in the probability theory and the presented model are utilized, the result of operating and analyzing the model in various conditions of processor capability, cache miss ratio, page fault ratio, disk buffer hit ratio (input/output processor and controller), memory access time, and input/output block size. A simulation is conducted to verify the analysis result.

Multi-mode Embedded Compression Algorithm and Architecture for Code-block Memory Size and Bandwidth Reduction in JPEG2000 System (JPEG2000 시스템의 코드블록 메모리 크기 및 대역폭 감소를 위한 Multi-mode Embedded Compression 알고리즘 및 구조)

  • Son, Chang-Hoon;Park, Seong-Mo;Kim, Young-Min
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.8
    • /
    • pp.41-52
    • /
    • 2009
  • In Motion JPEG2000 encoding, huge bandwidth requirement of data memory access is the bottleneck in required system performance. For the alleviation of this bandwidth requirement, a new embedded compression(EC) algorithm with a little bit of image quality drop is devised. For both random accessibility and low latency, very simple and efficient entropy coding algorithm is proposed. We achieved significant memory bandwidth reductions (about 53${\sim}$81%) and reduced code-block memory to about half size through proposed multi-mode algorithms, without requiring any modification in JPEG2000 standard algorithm.

Performance Analysis of Implementation on Image Processing Algorithm for Multi-Access Memory System Including 16 Processing Elements (16개의 처리기를 가진 다중접근기억장치를 위한 영상처리 알고리즘의 구현에 대한 성능평가)

  • Lee, You-Jin;Kim, Jea-Hee;Park, Jong-Won
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.49 no.3
    • /
    • pp.8-14
    • /
    • 2012
  • Improving the speed of image processing is in great demand according to spread of high quality visual media or massive image applications such as 3D TV or movies, AR(Augmented reality). SIMD computer attached to a host computer can accelerate various image processing and massive data operations. MAMS is a multi-access memory system which is, along with multiple processing elements(PEs), adequate for establishing a high performance pipelined SIMD machine. MAMS supports simultaneous access to pq data elements within a horizontal, a vertical, or a block subarray with a constant interval in an arbitrary position in an $M{\times}N$ array of data elements, where the number of memory modules(MMs), m, is a prime number greater than pq. MAMS-PP4 is the first realization of the MAMS architecture, which consists of four PEs in a single chip and five MMs. This paper presents implementation of image processing algorithms and performance analysis for MAMS-PP16 which consists of 16 PEs with 17 MMs in an extension or the prior work, MAMS-PP4. The newly designed MAMS-PP16 has a 64 bit instruction format and application specific instruction set. The author develops a simulator of the MAMS-PP16 system, which implemented algorithms can be executed on. Performance analysis has done with this simulator executing implemented algorithms of processing images. The result of performance analysis verifies consistent response of MAMS-PP16 through the pyramid operation in image processing algorithms comparing with a Pentium-based serial processor. Executing the pyramid operation in MAMS-PP16 results in consistent response of processing time while randomly response time in a serial processor.

Performance Analysis of NVMe SSDs and Design of Direct Access Engine on Virtualized Environment (가상화 환경에서 NVMe SSD 성능 분석 및 직접 접근 엔진 개발)

  • Kim, Sewoog;Choi, Jongmoo
    • KIISE Transactions on Computing Practices
    • /
    • v.24 no.3
    • /
    • pp.129-137
    • /
    • 2018
  • NVMe(Non-Volatile Memory Express) SSD(Solid State Drive) is a high-performance storage that makes use of flash memory as a storage cell, PCIe as an interface and NVMe as a protocol on the interface. It supports multiple I/O queues which makes it feasible to process parallel-I/Os on multi-core environments and to provide higher bandwidth than SATA SSDs. Hence, NVMe SSD is considered as a next generation-storage for data-center and cloud computing system. However, in the virtualization system, the performance of NVMe SSD is not fully utilized due to the bottleneck of the software I/O stack. Especially, when it uses I/O stack of the hypervisor or the host operating system like Xen and KVM, I/O performance degrades seriously due to doubled-I/O stack between host and virtual machine. In this paper, we propose a new I/O engine, called Direct-AIO (Direct-Asynchronous I/O) engine, that can access NVMe SSD directly for I/O performance improvements on QEMU emulator. We develop our proposed I/O engine and analyze I/O performance differences between the existed I/O engine and Direct-AIO engine.

Adaptive Writeback-aware Cache Management Policy for Lifetime Extension of Non-volatile Memory

  • Hwang, Sang-Ho;Choi, Ju Hee;Kwak, Jong Wook
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.17 no.4
    • /
    • pp.514-523
    • /
    • 2017
  • In this paper, we propose Adaptive Writeback-aware Cache management (AWC) to prolong the lifetime of non-volatile main memory systems by reducing the number of writebacks. The last-level cache in AWC is partitioned into Least Recently Used (LRU) segment and LRU using Dirty block Precedence (DP-LRU) segment. The DP-LRU segment evicts clean blocks first for giving reuse opportunity to dirty blocks. AWC can also determine the efficient size of DP-LRU segment for reducing the number of writebacks according to memory access patterns of programs. In the performance evaluation, we showed that AWC reduced the number of writebacks up to 29% and 46%, and saved the energy of a main memory system up to 23% and 49% in a single-core and multi-core, respectively. AWC also reduced the runtime by 1.5% and 3.2% on average compared to previous cache managements for non-volatile main memory systems, in a single-core and a multi-core, respectively.

Memory Latency Hiding Techniques (메모리 지연을 감추는 기법들)

  • Ki, An-Do
    • Electronics and Telecommunications Trends
    • /
    • v.13 no.3 s.51
    • /
    • pp.61-70
    • /
    • 1998
  • The obvious way to make a computer system more powerful is to make the processor as fast as possible. Furthermore, adopting a large number of such fast processors would be the next step. This multiprocessor system could be useful only if it distributes workload uniformly and if its processors are fully utilized. To achieve a higher processor utilization, memory access latency must be reduced as much as possible and even more the remaining latency must be hidden. The actual latency can be reduced by using fast logic and the effective latency can be reduced by using cache. This article discusses what the memory latency problem is, how serious it is by presenting analytical and simulation results, and existing techniques for coping with it; such as write-buffer, relaxed consistency model, multi-threading, data locality optimization, data forwarding, and data prefetching.

FPGA Design and SoC Implementation of Constant-Amplitude Multicode Bi-Orthogonal Modulation (정진폭 다중 부호 이진 직교 변복조기의 FPGA 설계 및 SoC 구현)

  • Hong, Dae-Ki;Kim, Yong-Seong;Kim, Sun-Hee;Cho, Jin-Woong;Kang, Sung-Jin
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.11C
    • /
    • pp.1102-1110
    • /
    • 2007
  • In this paper, we design the FPGA (Field-Programmable Gate Array) of the CAMB (Constant-Amplitude Multi-code Biorthogonal) modulation, and implement the SoC (System on Chip). The ASIC (Application Specific Integrated Circuit) chip is be implemented through targeting and board test. This 12Mbps modem SoC includes the ARM (Advanced RISC Machine)7TDMI, 64Kbyte SRAM(Static Random Access Memory) and ADC (Analog to Digital Converter)/DAC (Digital to Analog Converter) for flexible applications. Additionally, the modem SoC can support the variable communication interfaces such as the 16-bits PCMCIA (Personal Computer Memory Card International Association), USB (Universal Serial Bus) 1.1, and 16C550 Compatible UART (Universal Asynchronous Receiver/Transmitter).