• Title/Abstract/Keywords: Memory performance

Automated optimization for memory-efficient high-performance deep neural network accelerators

  • Kim, HyunMi; Lyuh, Chun-Gi; Kwon, Youngsu
    • ETRI Journal / Vol. 42, No. 4 / pp. 505-517 / 2020
  • The increasing size and complexity of deep neural networks (DNNs) necessitate the development of efficient high-performance accelerators. An efficient memory structure and operating scheme, along with dataflow control, provide an intuitive solution for high-performance accelerators. Furthermore, the processing of various neural networks (NNs) requires a flexible memory architecture, a programmable control scheme, and automated optimizations. We first propose an efficient, flexible architecture that operates at a high frequency despite its large memory and PE-array sizes. We then improve the efficiency and usability of the architecture by automating the optimization algorithm. The experimental results show that the architecture increases data reuse, and a diagonal write path improves performance by 1.44× on average across a wide range of NNs. The automated optimizations further enhance performance by 3.8× to 14.79× while improving usability. Therefore, automating the optimization, as well as designing an efficient architecture, is critical to realizing high-performance DNN accelerators.
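
The automated optimization described above is, at heart, a search over memory and dataflow configurations. As a loose, hypothetical illustration of what such a search can look like (not the authors' algorithm), the C sketch below exhaustively scores candidate tile shapes for one convolution layer and keeps the shape that maximizes a crude data-reuse metric under an assumed on-chip buffer capacity; every constant and the reuse model are invented for the example.

```c
#include <stdio.h>

/* Hypothetical layer and buffer parameters (illustrative only). */
#define BUFFER_BYTES (512 * 1024)   /* on-chip buffer capacity   */
#define IFM_W 112                   /* input feature-map width   */
#define IFM_H 112                   /* input feature-map height  */
#define CH_IN 64                    /* input channels            */
#define CH_OUT 128                  /* output channels           */

/* Crude reuse model: a score that rewards tiles which amortize weight
 * fetches over many activations. A real cost model would reflect the
 * accelerator's actual dataflow and write paths. */
static double reuse_score(int tw, int th, int tc)
{
    double tile_bytes   = (double)tw * th * tc;      /* input tile      */
    double weight_bytes = (double)tc * CH_OUT * 9;   /* 3x3 weights     */
    if (tile_bytes + weight_bytes > BUFFER_BYTES)
        return -1.0;                                 /* infeasible tile */
    return tile_bytes / (tile_bytes + weight_bytes) * tw * th;
}

int main(void)
{
    int best_tw = 0, best_th = 0, best_tc = 0;
    double best = -1.0;

    /* Exhaustive sweep over candidate tile shapes. */
    for (int tw = 8; tw <= IFM_W; tw += 8)
        for (int th = 8; th <= IFM_H; th += 8)
            for (int tc = 8; tc <= CH_IN; tc += 8) {
                double s = reuse_score(tw, th, tc);
                if (s > best) {
                    best = s;
                    best_tw = tw; best_th = th; best_tc = tc;
                }
            }

    printf("best tile: %dx%dx%d (score %.1f)\n",
           best_tw, best_th, best_tc, best);
    return 0;
}
```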

Performance Analysis of Adaptive Partition Cache Replacement using Various Monitoring Ratios for Non-volatile Memory Systems

  • Hwang, Sang-Ho; Kwak, Jong Wook
    • Journal of the Korea Society of Computer and Information / Vol. 23, No. 4 / pp. 1-8 / 2018
  • In this paper, we propose an adaptive partition cache replacement policy and evaluate its performance under various monitoring ratios to help extend the lifetime of non-volatile main memory systems without performance degradation. The proposal combines the conventional LRU (Least Recently Used) replacement policy with an Early Eviction Zone (E2Z), which considers a dirty bit as well as the LRU bits when selecting a candidate block. In particular, this paper evaluates non-volatile memory performance under various monitoring ratios and determines the optimal monitoring ratio and E2Z partition size for reducing the number of writebacks, using cache hit counter logic and a hit predictor. In the experimental evaluation, the 1:128 combination provided the best writeback and runtime results in terms of the performance-complexity trade-off, and our proposal yielded up to a 42% reduction in writebacks compared with other schemes.
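
The core of the proposal above is a victim-selection step that combines LRU ordering with an Early Eviction Zone (E2Z) that also inspects the dirty bit, so that clean blocks are evicted preferentially and writebacks to the non-volatile memory are avoided. A minimal C sketch of that selection idea follows; the set layout, the E2Z size, and the surrounding cache model are assumptions, and the paper's monitoring-ratio and partition-sizing logic is omitted.

```c
#include <stdbool.h>

#define WAYS 8        /* associativity of one cache set (assumed)            */
#define E2Z_SIZE 2    /* ways near the LRU end forming the E2Z (assumed)     */

struct cache_line {
    bool valid;
    bool dirty;       /* set when the line has been written                  */
    unsigned lru;     /* 0 = most recently used, WAYS-1 = least recently used */
};

/* Pick a victim way: prefer a clean line inside the Early Eviction Zone
 * (the E2Z_SIZE least recently used ways); otherwise fall back to plain LRU.
 * Evicting a clean line avoids a writeback to the non-volatile main memory. */
static int select_victim(const struct cache_line set[WAYS])
{
    int lru_way = 0;

    for (int w = 0; w < WAYS; w++) {
        if (set[w].lru > set[lru_way].lru)
            lru_way = w;                       /* track the global LRU way    */
        if (set[w].valid &&
            set[w].lru >= WAYS - E2Z_SIZE &&   /* inside the E2Z              */
            !set[w].dirty)
            return w;                          /* clean E2Z line: no writeback */
    }
    return lru_way;                            /* no clean E2Z line: use LRU  */
}
```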

Implementation of the Shared Memory in the Dual Core System (Dual Core 시스템에서 Shared Memory 기능 구현)

  • Jang, Seung-Ju
    • The Journal of the Korea Contents Association / Vol. 8, No. 9 / pp. 27-33 / 2008
  • This paper designs a shared memory facility for a dual-core system so that it operates as standard System V IPC on the Linux OS. Shared memory is a technique that allows many processes to access the same memory area. We handle shared memory, which originates from SVR, at the kernel level, and we design the shared memory facility of the Linux operating system for the dual-core system. The proposed design aims to improve on the performance of an existing uniprocessor system by exploiting both cores, enhancing performance on each CPU for every process that uses shared memory.
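
For context, the System V shared memory facility discussed above is exposed to user space through shmget/shmat/shmdt/shmctl. The minimal example below shows a process creating a segment, writing to it, and releasing it; the key and segment size are arbitrary, and the dual-core kernel-side changes the paper describes are not shown.

```c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    key_t key = 0x1234;                 /* arbitrary IPC key             */
    size_t size = 4096;

    /* Create (or find) a System V shared memory segment. */
    int shmid = shmget(key, size, IPC_CREAT | 0666);
    if (shmid == -1) { perror("shmget"); return EXIT_FAILURE; }

    /* Attach the segment into this process's address space. */
    char *mem = shmat(shmid, NULL, 0);
    if (mem == (void *)-1) { perror("shmat"); return EXIT_FAILURE; }

    /* Any cooperating process that attaches the same key sees this data. */
    strcpy(mem, "hello from shared memory");
    printf("wrote: %s\n", mem);

    shmdt(mem);                          /* detach                        */
    shmctl(shmid, IPC_RMID, NULL);       /* mark the segment for removal  */
    return EXIT_SUCCESS;
}
```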

The Effects of the Older Adults' Depression on Metamemory and Memory Performance (노인의 우울이 메타기억과 기억수행에 미치는 영향)

  • Min, Hye Sook; Suh, Moon Ja
    • Korean Journal of Adult Nursing / Vol. 12, No. 1 / pp. 17-29 / 2000
  • The purpose of this study is to examine the effects of depression on older adults' metamemory and memory performance. The subjects consisted of 103 older adults over the age of 60 living in Kangwon Province. Some data were collected by interview, using questionnaires for metamemory (the MIA questionnaire by Hultsch et al., 1988) and depression (the GDS by Yesavage and Sheikh, 1986). Other data were collected by testing memory performance: an immediate word recall task, a delayed word recall task, a word recognition task (Elderly Verbal Learning Test by Kyung Mi Choi, 1998), and a face recognition task (a tool developed for this study). The results were as follows: 1) The average metamemory score of depressed older persons was 3.2 on a 5-point scale, significantly lower than the 3.6 of nondepressed older persons. Among the sub-concepts of metamemory, depressed persons scored higher on task (4.1) but lower on change (2.3), locus (2.6), and strategy (2.9) than nondepressed persons. 2) Depressed older persons' memory performances were all significantly lower than those of nondepressed persons, especially on the face recognition task (t=7.26, p<.0082) and the word recognition task (t=6.58, p<.01). 3) In both depressed and nondepressed persons, metamemory correlated closely with all memory tasks; in particular, depressed older persons' correlations were higher across the board, especially for the memory self-efficacy component of metamemory (r=.36-.49), compared with nondepressed persons. 4) A canonical analysis of the relations between metamemory and memory performance showed that, for depressed older persons, strategy, locus, capability, and task correlated highly with the word recognition task and the delayed word recall task, whereas for nondepressed persons, achievement, strategy, change, and locus correlated highly with the face recognition task and the immediate word recall task. In sum, depression negatively affects older persons' metamemory and memory performance. These findings should therefore be taken into account when caring for depressed older persons with reduced memory ability, and nursing interventions need to be developed to prevent memory loss and improve memory performance in depressed older persons.

New Embedded Memory System for IoT (사물인터넷을 위한 새로운 임베디드 메모리 시스템)

  • Lee, Jung-Hoon
    • IEMEK Journal of Embedded Systems and Applications / Vol. 10, No. 3 / pp. 151-156 / 2015
  • Recently, embedded flash memory has been widely used for the Internet of Things (IoT) because of its nonvolatility, economic feasibility, stability, low power usage, and fast speed. Power consumption, in particular, is the most significant design factor an embedded memory system must consider. The objective of this research is to design a high-performance, low-power NAND flash memory architecture with a dual buffer as a replacement for NOR flash. Simulation shows that the proposed NAND flash system achieves better performance than a conventional NOR flash memory. Furthermore, the average memory access time of the proposed system is better than that of other buffer systems with three times more buffer space, and the use of a small buffer results in a significant reduction in power consumption.
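
The abstract does not detail how the dual buffer is organized, so the sketch below only illustrates one common arrangement that such a design might take: a small fully associative buffer for random accesses plus a single sequential buffer in front of the NAND array. All sizes, the replacement policy, and the nand_read_page driver stub are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE   2048   /* NAND page size in bytes (assumed)   */
#define RAND_LINES  8      /* entries in the random-access buffer */

/* One possible dual-buffer arrangement (illustrative, not the paper's design):
 * a small fully associative buffer for random accesses and a single
 * sequential buffer holding the most recently fetched whole page. */
struct dual_buffer {
    uint32_t rand_tag[RAND_LINES];
    bool     rand_valid[RAND_LINES];
    uint8_t  rand_data[RAND_LINES][PAGE_SIZE];
    uint32_t seq_tag;
    bool     seq_valid;
    uint8_t  seq_data[PAGE_SIZE];
};

/* Stub for the flash driver's page read; a real system would issue a
 * NAND page-read command here. */
static void nand_read_page(uint32_t page, uint8_t *dst)
{
    for (int i = 0; i < PAGE_SIZE; i++)
        dst[i] = (uint8_t)(page + i);        /* dummy pattern */
}

const uint8_t *buffer_read(struct dual_buffer *b, uint32_t page, bool sequential)
{
    /* Hit in the sequential buffer? */
    if (b->seq_valid && b->seq_tag == page)
        return b->seq_data;

    /* Hit in the random-access buffer? */
    for (int i = 0; i < RAND_LINES; i++)
        if (b->rand_valid[i] && b->rand_tag[i] == page)
            return b->rand_data[i];

    /* Miss: fetch from NAND into the buffer matching the access pattern. */
    if (sequential) {
        nand_read_page(page, b->seq_data);
        b->seq_tag = page; b->seq_valid = true;
        return b->seq_data;
    }
    int victim = page % RAND_LINES;          /* trivial replacement policy */
    nand_read_page(page, b->rand_data[victim]);
    b->rand_tag[victim] = page; b->rand_valid[victim] = true;
    return b->rand_data[victim];
}
```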

WARP: Memory Subsystem Effective for Wrapping Bursts of a Cache

  • Jang, Wooyoung
    • ETRI Journal / Vol. 39, No. 3 / pp. 428-436 / 2017
  • State-of-the-art processors require increasingly complicated memory services for high performance and low power consumption. In particular, they request the transfers within a burst in a wrap-around order to minimize the miss penalty of a cache. However, synchronous dynamic random access memories (SDRAMs) do not always generate transfers in the wrap-around order required by the processors. Thus, a memory subsystem rearranges the SDRAM transfers into the wrap-around order, but the rearrangement process may increase memory latency and waste the bandwidth of on-chip interconnects. In this paper, we present a memory subsystem that is effective for the wrapping bursts of a cache. The proposed memory subsystem makes SDRAMs generate transfers in an intermediate order in which the transfers can be rearranged into the wrap-around order with minimal penalties. The transfers are then delivered with priorities that depend on the program's spatial locality. Experimental results showed that the proposed memory subsystem minimizes the memory performance loss resulting from wrapping bursts and thus improves program execution time.
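
A wrapping burst returns the critical (requested) word first and then wraps around within the aligned block, which is the ordering the processors above expect from the memory subsystem. The short C program below, independent of the paper's design, simply prints that wrap-around address sequence for an assumed 8-beat, 8-byte burst.

```c
#include <stdio.h>

/* Print the beat addresses of a wrapping burst: the burst starts at the
 * requested (critical) beat and wraps around within the aligned block.
 * The burst length in bytes must be a power of two. */
static void wrap_burst(unsigned start_addr, unsigned beat_bytes, unsigned beats)
{
    unsigned block = beat_bytes * beats;             /* aligned block size  */
    unsigned base  = start_addr & ~(block - 1);      /* block-aligned base  */
    unsigned off   = start_addr - base;

    for (unsigned i = 0; i < beats; i++) {
        unsigned addr = base + (off + i * beat_bytes) % block;
        printf("beat %u: 0x%08x\n", i, addr);
    }
}

int main(void)
{
    /* 8-beat burst of 8-byte transfers, critical word at 0x1030:
     * 0x1030, 0x1038, 0x1000, 0x1008, ... wraps within the 64-byte block. */
    wrap_burst(0x1030, 8, 8);
    return 0;
}
```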

An Empirical Evaluation Analysis of the Performance of In-memory Bigdata Processing Platform (메모리 기반 빅데이터 처리 프레임워크의 성능개선 연구)

  • Lee, Jae hwan; Choi, Jun; Koo, Dong hun
    • Journal of Korea Society of Industrial Information Systems / Vol. 21, No. 3 / pp. 13-19 / 2016
  • Spark, an in-memory big-data processing framework, is popular for real-time processing workloads. Spark can keep all intermediate data in cluster memory so that it minimizes I/O accesses. However, when the resident memory of a workload is larger than the physical memory of the cluster, overall performance can drop dramatically. In this paper, we experimentally analyze the bottleneck factors of the memory-hungry PageRank application, configure the Spark cluster with the Tachyon file system to relieve those bottlenecks through better use of memory, and thereby improve performance by about 18%.

Performance Analysis of Cache and Internal Memory of a High Performance DSP for an Optimal Implementation of Motion Picture Encoder (고성능 DSP에서 동영상 인코더의 최적화 구현을 위한 캐쉬 및 내부 메모리 성능 분석)

  • Lim, Se-Hun; Chung, Sun-Tae
    • The Journal of the Korea Contents Association / Vol. 8, No. 5 / pp. 72-81 / 2008
  • A high-performance DSP usually provides both a cache and internal memory. For an optimal implementation of a multimedia stream application on such a DSP, one needs to utilize the cache and internal memory efficiently. In this paper, we analyze the cache performance and the internal memory configuration and placement needed for an optimal implementation of multimedia stream applications, such as a motion picture encoder, on a high-performance DSP of the TMS320C6000 series, and we propose strategies to improve cache and internal memory placement. The analysis and experiments verify that a 2-way L2 cache configuration with the remaining memory configured as internal memory shows relatively good performance. They also show that the L1P cache hit rate improves when frequently called routines, and the routines having caller-callee relationships with them, are placed contiguously in the internal memory, and that the L1D cache hit rate improves with a simple change of the data size. The results in this paper are expected to contribute to the optimal implementation of multimedia stream applications on high-performance DSPs.
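
One concrete way to apply the placement strategy above on the TMS320C6000 is the TI compiler's CODE_SECTION and DATA_SECTION pragmas combined with linker-level placement into internal RAM. The fragment below is only a sketch: the routine, the section names, and the memory-range name are assumptions, not the encoder code or configuration used in the paper.

```c
/* Place a hot routine and its working buffer in named sections so that the
 * linker command file can map them into on-chip (internal) RAM.
 * TI C6000 compiler pragmas; section and memory names here are examples. */
#pragma CODE_SECTION(sad_16x16, ".text_fast")
#pragma DATA_SECTION(ref_block, ".data_onchip")

unsigned char ref_block[16 * 16];

/* Sum of absolute differences for a 16x16 block, a typical encoder hot spot. */
int sad_16x16(const unsigned char *cur, const unsigned char *ref, int stride)
{
    int x, y, d;
    int sad = 0;
    for (y = 0; y < 16; y++)
        for (x = 0; x < 16; x++) {
            d = cur[y * stride + x] - ref[y * stride + x];
            sad += d < 0 ? -d : d;
        }
    return sad;
}

/* Corresponding linker command file fragment (illustrative):
 *
 *   SECTIONS {
 *       .text_fast    > IRAM
 *       .data_onchip  > IRAM
 *   }
 */
```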

Duplication-Aware Garbage Collection for Flash Memory-Based Virtual Memory Systems (플래시 메모리 기반의 가상 메모리 시스템을 위한 중복성을 고려한 GC 기법)

  • Ji, Seung-Gu; Shin, Dong-Kun
    • Journal of KIISE: Computer Systems and Theory / Vol. 37, No. 3 / pp. 161-171 / 2010
  • As embedded systems adopt monolithic kernels, NAND flash memory is used as the swap space of virtual memory systems. While flash memory has the advantages of low power consumption, shock resistance, and non-volatility, it requires garbage collection due to its erase-before-write characteristic. The efficiency of the garbage collection scheme largely affects the performance of flash memory. This paper proposes a novel garbage collection technique that exploits data redundancy between main memory and flash memory in flash memory-based virtual memory systems. The proposed scheme also takes data locality into consideration to minimize the garbage collection overhead. Experimental results demonstrate that the proposed garbage collection scheme improves performance by 37% on average compared to previous schemes.
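
The duplication-aware idea above rests on the observation that a flash page swapped out from main memory may still have an identical copy in DRAM, so during garbage collection such pages need not be copied to a new block. The C sketch below illustrates that idea with a hypothetical block layout and a greedy, lowest-copy-cost victim choice; it is not the authors' exact algorithm and omits the locality handling they describe.

```c
#include <stdbool.h>

#define PAGES_PER_BLOCK 64
#define NUM_BLOCKS      1024

struct flash_block {
    bool valid[PAGES_PER_BLOCK];       /* page holds live data           */
    bool duplicated[PAGES_PER_BLOCK];  /* identical copy exists in DRAM  */
};

static struct flash_block blocks[NUM_BLOCKS];

/* Cost of cleaning a block = number of pages that must really be copied.
 * Valid pages whose contents are duplicated in main memory can simply be
 * dropped and swapped out again later, so they add nothing to the cost. */
static int copy_cost(const struct flash_block *b)
{
    int cost = 0;
    for (int p = 0; p < PAGES_PER_BLOCK; p++)
        if (b->valid[p] && !b->duplicated[p])
            cost++;
    return cost;
}

/* Greedy victim selection: erase the block with the lowest copy cost. */
int select_gc_victim(void)
{
    int victim = 0;
    for (int blk = 1; blk < NUM_BLOCKS; blk++)
        if (copy_cost(&blocks[blk]) < copy_cost(&blocks[victim]))
            victim = blk;
    return victim;
}
```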

Mapping Cache for High-Performance Memory Mapped File I/O in Memory File Systems (메모리 파일 시스템 기반 고성능 메모리 맵 파일 입출력을 위한 매핑 캐시)

  • Kim, Jiwon; Choi, Jungsik; Han, Hwansoo
    • Journal of KIISE / Vol. 43, No. 5 / pp. 524-530 / 2016
  • The desire to access data faster and the growth of next-generation memories such as non-volatile memories contribute to the development of research on memory file systems. Memory-mapped file I/O, which has less overhead than read-write I/O, is recommended for high-performance memory file systems. Memory-mapped file I/O, however, incurs a page-table overhead, which becomes one of the major overheads to be resolved in overall file I/O performance. We find that the same overhead occurs unnecessarily, because the page table of a file is removed whenever the file is closed and must be rebuilt when the file is opened again. To remove this duplicated overhead, we propose the mapping cache, a technique that does not delete the page table of a file but saves it for reuse when the mapping of the file is released. We demonstrate that the mapping cache improves the performance of traditional file I/O by 2.8× and web server performance by 12%.
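
The page-table overhead discussed above arises from the standard mmap-based file I/O pattern, whose repeated map/unmap cycle the mapping cache targets. For reference, a minimal POSIX example of that pattern is shown below; it uses only standard calls and is not tied to the authors' memory file system.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the whole file; reads then go through loads, not read() syscalls.
     * Building the mapping's page-table entries is the overhead that a
     * mapping cache would let the file system reuse across open/close. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)   /* touch every byte */
        sum += p[i];
    printf("checksum: %ld\n", sum);

    munmap(p, st.st_size);   /* mapping (and its page-table entries) torn down */
    close(fd);
    return 0;
}
```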