Search | Korea Science

Low Power TLB System by Using Continuous Accessing Distinction Algorithm (연속적 접근 판별 알고리즘을 이용한 저전력 TLB 구조)

Lee, Jung-Hoon
- The KIPS Transactions:PartA
- /
- v.14A no.1 s.105
- /
- pp.47-54
- /
- 2007
In this paper we present a translation lookaside buffer (TLB) system with low power consumption for imbedded processors. The proposed TLB is constructed as multiple banks, each with an associated block buffer and a corresponding comparator. Either the block buffer or the main bank is selectively accessed on the basis of two bits in the block buffer (tag buffer). Dynamic power savings are achieved by reducing the number of entries accessed in parallel, as a result of using the tag buffer as a filtering mechanism. The performance overhead of the proposed TLB is negligible compared with other hierarchical TLB structures. For example, the two-cycle overhead of the proposed TLB is only about 1%, as compared with 5% overhead for a filter (micro)-TLB and 14% overhead for a same structure without continuos accessing distinction algorithm. We show that the average hit ratios of the block buffers and the main banks of the proposed TLB are 95% and 5% respectively. Dynamic power is reduced by about 95% with respect to with a fully associative TLB, 90% with respect to a filter-TLB, and 40% relative to a same structure without continuos accessing distinction algorithm.
https://doi.org/10.3745/KIPSTA.2007.14-A.1.047 인용 PDF KSCI

Buffer Cache Management of Smartphones Exploiting Write-Only-Once Characteristics (1회성 쓰기 참조 특성을 고려하는 스마트폰 버퍼캐쉬 관리 기법)

Kim, Dohee;Bahn, Hyokyung
- The Journal of the Institute of Internet, Broadcasting and Communication
- /
- v.15 no.6
- /
- pp.129-134
- /
- 2015
This paper analyzes file access characteristics of smartphone apps and finds that a large portion of file writes are performed only once. Based on this observation, we present a new buffer cache management scheme that considers this characteristics. Buffer cache improves storage performance by maintaining hot file data in memory thereby servicing subsequent requests without storage accesses. However, it should flush modified data to storage in order to resist system crashes. The proposed scheme evicts cache data that has been written only once upon flushes, thus improving cache space utilization. Simulation experiments show that the proposed scheme improves cache hit ratio by 5-33% and power consumption by 27-92%.
https://doi.org/10.7236/JIIBC.2015.15.6.129 인용 PDF KSCI

A Study on the Improvement of Frame Memory Interface of MPEG-2 Video Encoder (MPEG-2 비디오 부호화기의 프레임 메모리 인터페이스 개선에 관한 연구)

이인섭;임순자;김환용
- Journal of the Korea Computer Industry Society
- /
- v.2 no.2
- /
- pp.211-218
- /
- 2001
In this paper, we propose the structure of utilizing the memory map, which is using not conventional DRAM but SDRAM, for the hardware implementation of frame memory interface module to the video encoder. As reducing the size of memory map and interface buffer within the same bus, the hardware complexity is improved and the hardware size is minimized as simplifying the interface logic. The conventional system is wasted access time, because of accessing randomly stored data in order to store and output the memories in macro-block unit. therefore the method, which is proposed in this paper, can be effectively reducing the access time of memory, because of the data is stored and processed by line unit.
PDF

Mutual exclusion of shared memory access in the simulation software of the midclass commuter (중형항공기 시뮬레이션 소프트웨어의 작업간 공유메모리 사용의 상호배제)

이인석;이해창;이상혁
- 제어로봇시스템학회:학술대회논문집
- /
- 1996.10b
- /
- pp.207-209
- /
- 1996
The software of the midclass commuter flight simulation is running on multiprocessor/multitasking environments The software is consist of tasks which are periodically alive at a given interval. Each task communicates via shared memory. The data shared by tasks is divided by several block. Only one task, called producer, can produce data for a data block but several tasks, called consumers, can read data from the data block. Double buffer and conditional flag are used to implement a mutual exclusion which prevents the producer and consumers from accessing the same data block simultaneously.
PDF

Memory Latency Hiding Techniques (메모리 지연을 감추는 기법들)

Ki, An-Do
- Electronics and Telecommunications Trends
- /
- v.13 no.3 s.51
- /
- pp.61-70
- /
- 1998
The obvious way to make a computer system more powerful is to make the processor as fast as possible. Furthermore, adopting a large number of such fast processors would be the next step. This multiprocessor system could be useful only if it distributes workload uniformly and if its processors are fully utilized. To achieve a higher processor utilization, memory access latency must be reduced as much as possible and even more the remaining latency must be hidden. The actual latency can be reduced by using fast logic and the effective latency can be reduced by using cache. This article discusses what the memory latency problem is, how serious it is by presenting analytical and simulation results, and existing techniques for coping with it; such as write-buffer, relaxed consistency model, multi-threading, data locality optimization, data forwarding, and data prefetching.
https://doi.org/10.22648/ETRI.1998.J.130305 인용 PDF

A study on memory structure of real time video magnifyng chip (실시간 영상확대 칩의 메모리 구조에 관한 연구)

여경현;박인규
- Proceedings of the IEEK Conference
- /
- 1999.11a
- /
- pp.1109-1112
- /
- 1999
본 논문에서는 영상확대 chip의 video 입력부에 부분화면을 저장할 frame memory의 구조를 개선하고자 하였다. 영상확대 video scaler인 gm833×2는 입력단 측에 frame buffer memory가 필요하게 되지만, 이를 외부에 장착하려면 일반적으로 대용량의 FIFO 메모리를 사용하게 된다. 이것은 dualport SRAM으로 구성이 되며, 메모리 제어를 고가의 FIFO칩에 의존하는 결과를 가져온다. 또한 기존의 scaler chip은 단순히 확대처리만을 하며, 입력 전, 후에 data의 변경 또는 이미지처리가 불가능한 구조가 된다. 본 논문에서는 외부에 필요한 메모리를 내장한 새로운 기능의 chip을 설계하는 데에 있어 필수적인 메모리제어 로직을 제안하고자 한다. 여기서는 더 나은 기능의 향상된 메모리 제어회로를 제시하고 이를 One-chip에 집적할 수 있도록 하였다 이를 사용한 Video Scaler Processor chip은 SDRAM을 별도의 제어회로 없이 외부에 장착할 수 있도록 하여 scaler의 기능을 향상시키면서 전체 시스템의 구조를 간단히 할 수 있을 것으로 기대된다. 본 논문에서는 먼저 메모리 제어회로를 포함한 Video Scaler Processor chip의 메모리제어 하드웨어의 구조를 제시하고, 메모리 access model과 제어로직을 소개하고자 한다.
PDF

Large-scale 3D fast Fourier transform computation on a GPU

Jaehong Lee;Duksu Kim
- ETRI Journal
- /
- v.45 no.6
- /
- pp.1035-1045
- /
- 2023
We propose a novel graphics processing unit (GPU) algorithm that can handle a large-scale 3D fast Fourier transform (i.e., 3D-FFT) problem whose data size is larger than the GPU's memory. A 1D FFT-based 3D-FFT computational approach is used to solve the limited device memory issue. Moreover, to reduce the communication overhead between the CPU and GPU, we propose a 3D data-transposition method that converts the target 1D vector into a contiguous memory layout and improves data transfer efficiency. The transposed data are communicated between the host and device memories efficiently through the pinned buffer and multiple streams. We apply our method to various large-scale benchmarks and compare its performance with the state-of-the-art multicore CPU FFT library (i.e., fastest Fourier transform in the West [FFTW]) and a prior GPU-based 3D-FFT algorithm. Our method achieves a higher performance (up to 2.89 times) than FFTW; it yields more performance gaps as the data size increases. The performance of the prior GPU algorithm decreases considerably in massive-scale problems, whereas our method's performance is stable.
https://doi.org/10.4218/etrij.2022-0297 인용 PDF

The structure of ATM Switch with the Shared Buffer Memory and The Construction of Switching Network for Large Capacity ATM (대용량 ATM을 위한 공유 버퍼 메모리 스위치 구조 및 교환 망의 구성 방안)

양충렬;김진태
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.21 no.1
- /
- pp.80-90
- /
- 1996
The efficienty of ATM is based on the statical multiplexing of fixed-length packets, which are called cells. The most important technical point for realizing ATM switching network is an arrangement of the buffers and switches. Current most ATM switching networks are being achieved by using the switching modules based on the unit switch of $8{\times}8$ 150Mb/s or $16{\times}16$ 150Mb/s, the unit switch of $32{\times}32$150Mb/s for a large scale system is under study in many countries. In this paper, we proposed a new $32{\times}32$(4.9Gb/s throughput) ATM switch using Shared buffer memory switch which provides superior traffic characteristics in the cell loss, delay and throughput performance and easy LSI(Large Scale Integrated circuit). We analytically estimated and simulated by computer the buffer size into it. We also proposed the configuration of the large capacity ATM switching network($M{\times}M$.M>1,000) consisting of multistage to improve the link speed by non-blocking.
PDF

An Evaluation of Multimedia Data Downstream with PDA in an Infrastructure Network

Hong, Youn-Sik;Hur, Hye-Sun
- Journal of Information Processing Systems
- /
- v.2 no.2
- /
- pp.76-81
- /
- 2006
A PDA is used mainly for downloading data from a stationary server such as a desktop PC in an infrastructure network based on wireless LAN. Thus, the overall performance depends heavily on the performance of such downloading with PDA. Unfortunately, for a PDA the time taken to receive data from a PC is longer than the time taken to send it by 53%. Thus, we measured and analyzed all possible factors that could cause the receiving time of a PDA to be delayed with a test bed system. There are crucial factors: the TCP window size, file access time of a PDA, and the inter-packet delay that affects the receiving time of a PDA. The window size of a PDA during the downstream is reduced dramatically to 686 bytes from 32,581 bytes. In addition, because flash memory is embedded into a PDA, writing data into the flash memory takes twice as long as reading the data from it. To alleviate these, we propose three distinct remedies: First, in order to keep the window size at a sender constant, both the size of a socket send buffer for a desktop PC and the size of a socket receive buffer for a PDA should be increased. Second, to shorten its internal file access time, the size of an application buffer implemented in an application should be doubled. Finally, the inter-packet delay of a PDA and a desktop PC at the application layer should be adjusted asymmetrically to lower the traffic bottleneck between these heterogeneous terminals.
https://doi.org/10.3745/JIPS.2006.2.2.076 인용 PDF KSCI

Design and Implementation of Transactional Write Buffer Cache with Storage Class Memory (트랜잭션 단위 쓰기를 보장하는 스토리지 클래스 메모리 쓰기 버퍼캐시의 설계 및 구현)

Kim, Young-Jin;Doh, In-Hwan;Kim, Eun-Sam;Choi, Jong-Moo;Lee, Dong-Hee;Noh, Sam-H.
- Journal of KIISE:Computing Practices and Letters
- /
- v.16 no.2
- /
- pp.247-251
- /
- 2010
Using SCM in storage systems introduce new potentials for improving I/O performance and reliability. In this paper, we study the use of SCM as a buffer cache that guarantees transactional unit writes. Our proposed method can improve storage system reliability and performance at the same time and can recover the storage system immediately upon a system crash. The Proposed method is based on the LINUX JBD(Journaling Block Device), thus reliability is equivalent to JBD. In our experiments, the file system that adopts our method shows better I/O performance even while guaranteeing high reliability and shows fast file system recovery time (about 0.2 seconds).
PDF KSCI

Search Result 369, Processing Time 0.033 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)