• Title/Summary/Keyword: memory bottleneck

Search results: 90

An Empirical Evaluation Analysis of the Performance of In-memory Bigdata Processing Platform (Korean title: A Study on Performance Improvement of a Memory-Based Big Data Processing Framework)

  • Lee, Jae hwan;Choi, Jun;Koo, Dong hun
    • Journal of Korea Society of Industrial Information Systems / v.21 no.3 / pp.13-19 / 2016
  • Spark, an in-memory big-data processing framework, is widely used for real-time processing workloads. Spark can keep all intermediate data in cluster memory and thereby minimize I/O access. However, when the resident memory of a workload is larger than the physical memory of the cluster, overall performance can drop dramatically. In this paper, we experimentally analyze the bottleneck factors of the PageRank application, which requires a large amount of memory, and combine Spark with the Tachyon file system to use memory more effectively and remove the bottleneck, improving performance by about 18%.
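The PageRank workload above is iterative: every pass builds a complete new rank table from the previous one, which is the kind of intermediate data Spark keeps in cluster memory. A minimal sketch in plain Python (not Spark or Tachyon, and not the authors' code), assuming a small adjacency list held in a dictionary and the conventional damping factor of 0.85:

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 20) -> dict:
    """Power-iteration PageRank over an in-memory adjacency list.

    links maps each page to the pages it links to. Pages with no outgoing
    links (dangling pages) spread their rank evenly over all pages.
    """
    nodes = set(links) | {dst for dsts in links.values() for dst in dsts}
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Each iteration materializes a full new rank table from the old one;
        # in a Spark job this corresponds to a new intermediate dataset.
        contribs = {node: 0.0 for node in nodes}
        dangling = 0.0
        for node in nodes:
            out = links.get(node, [])
            if out:
                share = ranks[node] / len(out)
                for dst in out:
                    contribs[dst] += share
            else:
                dangling += ranks[node]
        ranks = {
            node: (1 - damping) / n + damping * (contribs[node] + dangling / n)
            for node in nodes
        }
    return ranks


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))
```

Memory use grows with the number of pages times the number of rank tables kept alive at once, which is why a graph that does not fit in cluster memory hurts this workload so badly.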

A Study on the Block Lookup and Replacement in Global Memory

  • 이영섭;김은경;정병수
    • Proceedings of the IEEK Conference / 2000.11c / pp.51-54 / 2000
  • With the emergence of high-speed networks, interest in accessing remote data has grown. This interest motivates cooperative caching, in which clients share their caches so that a remote cache can be used like a local one. Conventional algorithms such as GMS (Global Memory Service) suffer from a bottleneck and degraded performance because many messages must be exchanged with a server or manager. Hint-based algorithms, on the other hand, eliminate the GMS server bottleneck because each client holds hint information for all blocks. However, hint-based algorithms have their own problem: the hints become inaccurate when they are too old. In this paper, we propose a policy that addresses both the bottleneck and the inaccuracy by using a file identifier to search the lookup table and by having clients periodically exchange information about their oldest blocks.
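The abstract does not give the data structures, so the following is a hypothetical sketch of the two ideas it names: a hint table indexed by file identifier, so a lookup needs no server or manager, and a periodic exchange of each client's oldest-block information. How the exchanged information is used here (to discard hints that are probably stale) is an assumption, and the names `Client` and `exchange_oldest` are illustrative.

```python
from collections import OrderedDict


class Client:
    """Hypothetical cooperative-caching client (a sketch, not the paper's code)."""

    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity
        # Local LRU cache: (file_id, block_no) -> time of last use.
        self.cache = OrderedDict()
        # Hint table indexed by file identifier:
        # file_id -> {block_no: (holder_name, time_hint_was_recorded)}
        self.hints = {}

    def touch(self, key, now):
        """Cache or reuse a block, evicting the least recently used one if full."""
        self.cache[key] = now
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def oldest_time(self):
        """Last-use time of this client's oldest cached block, or None if empty."""
        return next(iter(self.cache.values())) if self.cache else None

    def record_hint(self, key, holder, now):
        file_id, block_no = key
        self.hints.setdefault(file_id, {})[block_no] = (holder, now)

    def lookup_hint(self, key):
        """Two dictionary probes (file id, then block number); no manager involved."""
        file_id, block_no = key
        entry = self.hints.get(file_id, {}).get(block_no)
        return entry[0] if entry else None

    def prune_hints(self, oldest_by_peer):
        """Heuristic: drop hints recorded before the holder's oldest cached block
        was last used, since those blocks have probably been evicted. This can
        occasionally discard a hint that is still valid."""
        for blocks in self.hints.values():
            for block_no, (holder, recorded) in list(blocks.items()):
                oldest = oldest_by_peer.get(holder)
                if oldest is not None and recorded < oldest:
                    del blocks[block_no]


def exchange_oldest(clients):
    """Periodic step: each client learns every peer's oldest-block time."""
    oldest = {c.name: c.oldest_time() for c in clients}
    for c in clients:
        c.prune_hints({name: t for name, t in oldest.items() if name != c.name})
```

Because the exchange carries one value per client rather than one message per block, its cost stays small as the number of cached blocks grows.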

Non-Intrusive Speech Intelligibility Estimation Using Autoencoder Features with Background Noise Information

  • Jeong, Yue Ri;Choi, Seung Ho
    • International Journal of Internet, Broadcasting and Communication / v.12 no.3 / pp.220-225 / 2020
  • This paper investigates non-intrusive speech intelligibility estimation in noisy environments when the bottleneck feature of an autoencoder is used as the input to a neural network. The bottleneck-feature-based method suffers severe performance degradation when the noise environment changes. To overcome this problem, we propose a novel non-intrusive speech intelligibility estimation method that feeds noise-environment information, together with the bottleneck feature, into a long short-term memory (LSTM) neural network whose output is a short-time objective intelligibility (STOI) score; STOI is a standard tool for intrusively measuring speech intelligibility against a reference speech signal. In experiments across various noise environments, the proposed method improved performance in matched noise environments, and in particular it improved performance significantly over conventional methods in mismatched environments. We therefore conclude that the proposed method can be used successfully to estimate speech intelligibility non-intrusively in various noise environments.

The Early Write Back Scheme For Write-Back Cache (Korean title: A Fast Write-Back Technique for Write-Back Caches)

  • Chung, Young-Jin;Lee, Kil-Whan;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.11 / pp.101-109 / 2009
  • Depth caches and pixel caches in 3D graphics are generally designed with a write-back policy to use memory bandwidth efficiently. In addition, 3D graphics caches frequently see write-after-read operations to the same address, or write-only operations. When a cache miss is detected, one external-memory access for the write-back and another for servicing the miss are issued at the same time. Under frequent misses, because memory bandwidth is limited, external-memory access time increases due to the memory bottleneck. As a result, the overall performance of the processor or IP decreases, and peak power consumption also increases. In this paper, we propose a novel early write-back cache architecture to solve these problems. The proposed architecture controls when the external memory is accessed to copy back a valid data block, and it improves cache performance with the same hit ratio and the same cache capacity. By preventing bursts of memory accesses, the proposed architecture relieves the memory bottleneck. We evaluated it on the 3D graphics z cache and pixel cache in an SoC environment with an embedded ARM11, a 3D graphics accelerator, and various IPs. Simulation results showed a performance increase of up to 75% across various simulation vectors.
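A toy model can make the memory-bottleneck argument concrete. The sketch below is hypothetical and is not the proposed hardware: it models a direct-mapped, write-allocate, write-back cache and counts misses that need both a write-back and a fill at the same moment. It assumes early write-back is triggered by an explicit `idle()` call when the memory bus is free; the abstract does not say which trigger the authors use.

```python
class WriteBackCache:
    """Direct-mapped, write-allocate, write-back cache model (hypothetical)."""

    def __init__(self, num_lines: int, early_write_back: bool):
        self.num_lines = num_lines
        self.early = early_write_back
        self.tags = [None] * num_lines
        self.dirty = [False] * num_lines
        self.hits = 0
        self.memory_accesses = 0  # all external-memory transfers
        self.double_misses = 0    # misses needing a write-back AND a fill at once

    def access(self, addr: int, is_write: bool) -> None:
        index, tag = addr % self.num_lines, addr // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            self.dirty[index] = self.dirty[index] or is_write
            return
        if self.dirty[index]:
            self.memory_accesses += 1  # copy the dirty victim back
            self.double_misses += 1
        self.memory_accesses += 1      # fetch the missing block
        self.tags[index] = tag
        self.dirty[index] = is_write

    def idle(self) -> None:
        """Called when the memory bus is free. With early write-back enabled,
        copy one dirty block back now so a later miss only needs a fill."""
        if not self.early:
            return
        for i, is_dirty in enumerate(self.dirty):
            if is_dirty:
                self.memory_accesses += 1
                self.dirty[i] = False
                return


def run(trace, early):
    cache = WriteBackCache(num_lines=4, early_write_back=early)
    for op in trace:
        if op == "idle":
            cache.idle()
        else:
            cache.access(*op)  # op is (address, is_write)
    return cache


trace = [(0, True), "idle", (4, True), "idle", (8, True)]
for early in (False, True):
    c = run(trace, early)
    print(early, c.hits, c.memory_accesses, c.double_misses)
```

On this trace both variants have the same hits and the same five memory transfers, but only the baseline has misses that issue two transfers back to back. In general, early write-back can add traffic if a cleaned block is written again before it is evicted.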

Design of a scalable general-purpose parallel associative processor using content-addressable memory

  • Park, Tae-Geun
    • Journal of the Institute of Electronics Engineers of Korea SD / v.43 no.2 s.344 / pp.51-59 / 2006
  • The von Neumann architecture suffers from the interface between the central processing unit and memory, a limitation known as the 'von Neumann bottleneck'. In this paper, we propose a scalable general-purpose associative processor (AP) based on content-addressable memory (CAM) that addresses this problem and is suited to search-oriented applications. We propose an efficient instruction set and a structure that scales to larger applications. We define twelve instructions and provide several reduced instructions that, for speed, execute two instructions in a single instruction cycle. The proposed AP operates in a bit-serial, word-parallel fashion and can be regarded as a 32-bit general-purpose parallel processor with a massively parallel SIMD structure. To verify the proposed architecture, we designed and simulated maximum/minimum search, greater-than/less-than search, and parallel addition. These algorithms execute in constant time O(k), regardless of the number of input data.
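The constant-time claim follows from the bit-serial, word-parallel style: the search walks the k bit columns once, and each step tests every stored word simultaneously. Below is a sequential Python simulation of that classic maximum/minimum search, assuming unsigned k-bit words; the list comprehension stands in for the parallel CAM compare, and the function name `associative_search` is illustrative rather than part of the proposed instruction set.

```python
def associative_search(words, k=32, find_max=True):
    """Bit-serial, word-parallel maximum (or minimum) search.

    Each pass of the loop models ONE associative step in which every word's
    bit in the current column is compared at once, so the number of steps is
    k (the word width) no matter how many words there are.
    Returns (indices of the winning words, number of steps taken).
    """
    target = 1 if find_max else 0
    candidates = [True] * len(words)  # one tag bit per CAM word
    steps = 0
    for bit in reversed(range(k)):    # most significant bit first
        steps += 1
        match = [c and ((w >> bit) & 1) == target
                 for c, w in zip(candidates, words)]
        if any(match):                # some candidate has the preferred bit
            candidates = match        # keep only those candidates
    return [i for i, c in enumerate(candidates) if c], steps


print(associative_search([5, 9, 3, 9], k=4))
```

Ties are preserved: every word holding the extreme value stays tagged, which is what a CAM's multiple-response resolver would then have to handle.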

Recovery Methods in Main Memory DBMS

  • Kim, Jeong-Joon;Kang, Jeong-Jin;Lee, Ki-Young
    • International journal of advanced smart convergence / v.1 no.2 / pp.26-29 / 2012
  • Interest in main memory DBMSs has recently been rising as a way to efficiently support the real-time requirements of RTLS (Real Time Location System) services. Because all data in a main memory DBMS can be lost when a system failure occurs, the recovery method is critical to the stability of the database. In particular, the disk I/O performed for logging and checkpointing becomes a bottleneck that degrades overall system performance, so research on recovery methods that reduce disk I/O in main memory DBMSs is urgently needed. As groundwork for such research, this paper analyzes existing logging techniques, checkpointing techniques, and the recovery techniques of existing main memory DBMSs.
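To show where the disk I/O of logging and checkpointing arises, here is a minimal redo-log-plus-checkpoint sketch for a main-memory key-value store. It is a generic textbook scheme, not any specific DBMS analyzed in the paper; the Python lists stand in for on-disk files, and the class name `MainMemoryDB` is hypothetical.

```python
import json


class MainMemoryDB:
    """Toy main-memory key-value store with redo logging and checkpoints.

    The log and checkpoint attributes stand in for files on disk; every write
    to them is the kind of disk I/O the abstract identifies as the bottleneck.
    """

    def __init__(self):
        self.data = {}              # volatile: lost on a crash
        self.log = []               # stands in for the on-disk redo log
        self.checkpoint_image = {}  # stands in for the on-disk checkpoint

    def put(self, key, value):
        # Write-ahead rule: the log record is "on disk" before the update is applied.
        self.log.append(json.dumps({"key": key, "value": value}))
        self.data[key] = value

    def checkpoint(self):
        # Simplification: a quiescent (non-fuzzy) checkpoint, so older log
        # records are no longer needed and the log can be truncated.
        self.checkpoint_image = dict(self.data)
        self.log.clear()

    def crash(self):
        self.data = {}

    def recover(self):
        # Restore the last checkpoint, then redo every update logged after it.
        self.data = dict(self.checkpoint_image)
        for record in self.log:
            entry = json.loads(record)
            self.data[entry["key"]] = entry["value"]
```

Frequent checkpoints shorten recovery but write the whole database more often; infrequent ones save I/O but lengthen the redo pass. Techniques surveyed in this area mostly trade between these two costs.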

An Optimized File System for SSD

  • Park, Je-Ho
    • Journal of the Semiconductor & Display Technology / v.9 no.2 / pp.67-72 / 2010
  • The use of flash memory in mobile and ubiquitous devices has recently been increasing because of its non-volatility, fast response time, shock resistance, and low power consumption. Following this trend, SSDs (Solid State Disks), which use multiple flash chips, have begun to be widely used in place of hard-drive-based storage for their advantageous features. However, because of its physical properties, a flash-memory-based storage subsystem must resolve the write performance bottleneck in terms of both speed and lifetime. To deliver acceptable performance, solutions have been studied that reclaim invalid regions so as to decrease the number of erasures and distribute erasures as uniformly as possible over the whole memory space. In this paper, we study flash memory recycling algorithms with multiple management units and show that the proposed algorithm provides practical performance. The proposed method partitions the memory space using threshold values and reconfigures the management units when necessary. The performance of the proposed policies is evaluated through a number of simulation-based experiments.
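The two goals named above, fewer erasures and evenly distributed erasures, can pull against each other when choosing which block to reclaim. A hypothetical victim-selection sketch follows: it is greedy on invalid pages but skips blocks that are already much more worn than the least-worn block. The `wear_threshold` parameter and the data layout are assumptions and do not reproduce the paper's multi-unit algorithm.

```python
from dataclasses import dataclass


@dataclass
class FlashBlock:
    invalid_pages: int  # pages holding obsolete data, reclaimable by an erase
    erase_count: int    # wear accumulated so far


def choose_victim(blocks, wear_threshold):
    """Greedy cleaning with a wear guard (hypothetical policy).

    Among blocks that have reclaimable pages and whose erase count is within
    wear_threshold of the least-worn block, pick the one with the most invalid
    pages. Returns the block index, or None if nothing qualifies.
    """
    min_wear = min(b.erase_count for b in blocks)
    eligible = [i for i, b in enumerate(blocks)
                if b.invalid_pages > 0 and b.erase_count - min_wear <= wear_threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda i: blocks[i].invalid_pages)


def erase(block):
    """Reclaim a block (valid pages are assumed to be copied elsewhere first):
    its invalid pages become free and its wear increases by one."""
    block.invalid_pages = 0
    block.erase_count += 1
```

A small threshold spreads wear more evenly at the cost of sometimes reclaiming fewer pages per erase; a very large threshold degenerates into plain greedy cleaning.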

Divided Disk Cache and SSD FTL for Improving Performance in Storage

  • Park, Jung Kyu;Lee, Jun-yong;Noh, Sam H.
    • JSTS:Journal of Semiconductor Technology and Science / v.17 no.1 / pp.15-22 / 2017
  • Although many efficient techniques exist for minimizing the speed gap between the processor and memory, the gap remains a bottleneck in many commercial implementations. Because secondary storage technologies are much slower than main memory, matching memory speed to the processor is challenging. Hard disk drives usually include semiconductor caches to improve their performance: a hit in the disk cache eliminates the mechanical seek time and rotational latency. To improve performance further, a divided disk cache, subdivided into metadata and data partitions, was proposed previously. We propose a new algorithm that applies this approach to flash-memory-based solid-state drives (SSDs) through the FTL (flash translation layer). First, this paper evaluates the performance of such a disk cache through DiskSim simulations; we then perform an experiment to evaluate the performance of the proposed algorithm.
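The divided disk cache can be sketched as two independent LRU partitions, so that a stream of data reads cannot evict hot metadata. This hypothetical model covers only the previously proposed cache split, not the paper's FTL-based algorithm; sector numbers, partition sizes, and the class name `DividedDiskCache` are illustrative.

```python
from collections import OrderedDict


class DividedDiskCache:
    """Disk cache split into separate LRU partitions for metadata and data."""

    def __init__(self, metadata_slots: int, data_slots: int):
        self.parts = {"meta": OrderedDict(), "data": OrderedDict()}
        self.capacity = {"meta": metadata_slots, "data": data_slots}
        self.hits = 0
        self.misses = 0

    def read(self, sector: int, is_metadata: bool) -> bool:
        """Return True on a cache hit, False on a miss."""
        kind = "meta" if is_metadata else "data"
        part = self.parts[kind]
        if sector in part:
            part.move_to_end(sector)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1  # on an HDD this would cost a seek and rotational delay
        part[sector] = True
        if len(part) > self.capacity[kind]:
            part.popitem(last=False)  # evict only within the same partition
        return False


cache = DividedDiskCache(metadata_slots=1, data_slots=2)
cache.read(100, is_metadata=True)
for sector in (1, 2, 3):
    cache.read(sector, is_metadata=False)
print(cache.read(100, is_metadata=True))
```

With a single shared three-slot LRU cache, the three data reads would have pushed sector 100 out; the split keeps it resident, at the cost of fixing how much space each kind of block may use.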

File System for Performance Improvement in Multiple Flash Memory Chips (Korean title: A File System for Improving the Performance of File Systems Based on Multiple Flash Memories)

  • Park, Je-Ho
    • Journal of the Semiconductor & Display Technology / v.7 no.3 / pp.17-21 / 2008
  • Flash memory is being rapidly adopted in mobile and ubiquitous devices because of its low price and high performance. In addition, some notebook computers now ship with an SSD (Solid State Disk) instead of hard-drive-based storage. Given this trend, applications will sooner or later need to increase storage capacity by using multiple flash memory chips. Because of its physical properties, a flash-memory-based storage subsystem must resolve the write performance bottleneck in terms of both speed and lifetime. For flash memory storage to deliver acceptable performance, the reclaiming of invalid regions must be controlled so as to decrease the number of erasures and distribute erasures as uniformly as possible over the whole memory space. In this paper, we study the performance of flash memory recycling algorithms and show that the proposed algorithm performs acceptably for flash memory storage with multiple chips. The proposed cleaning method uses threshold values to partition the memory space into candidate regions to be reclaimed as free space, and it manages the storage system in a multi-layered fashion. The impact of the proposed policies is evaluated through a number of experiments.
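One way to read the threshold-based, multi-layered cleaning described above is sketched below: per chip, only blocks whose invalid-page ratio meets the threshold become candidates, and chips are then ordered by how much they can reclaim. The ordering rules, the ratio representation, and the function name `plan_cleaning` are assumptions, not the authors' policy.

```python
def plan_cleaning(chips: list, threshold: float) -> list:
    """Return (chip, block) pairs to reclaim, in cleaning order.

    chips[c][b] is the fraction of invalid pages in block b of chip c.
    Layer 1: per chip, keep only blocks whose invalid ratio meets the
    threshold (the candidate region).
    Layer 2: clean chips with more reclaimable space first, and within a
    chip, the dirtiest blocks first.
    """
    candidates = {}
    for chip, ratios in enumerate(chips):
        picked = [(ratio, blk) for blk, ratio in enumerate(ratios) if ratio >= threshold]
        if picked:
            candidates[chip] = sorted(picked, reverse=True)
    order = sorted(candidates,
                   key=lambda c: sum(ratio for ratio, _ in candidates[c]),
                   reverse=True)
    return [(chip, blk) for chip in order for _, blk in candidates[chip]]


print(plan_cleaning([[0.9, 0.2, 0.6], [0.1, 0.95]], threshold=0.5))
```

Raising the threshold shrinks the candidate regions, so each erase reclaims more space but cleaning may be deferred until free space runs low.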

Memory Delay Comparison between 2D GPU and 3D GPU (Korean title: Analysis of the Memory Access Efficiency of 3D-Structured GPUs Compared with 2D-Structured GPUs)

  • Jeon, Hyung-Gyu;Ahn, Jin-Woo;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information / v.17 no.7 / pp.1-11 / 2012
  • As process technology scales down, the number of cores integrated into a processor increases dramatically, yielding significant performance improvements. In particular, the GPU (Graphics Processing Unit), which contains many cores, can provide high computational performance by maximizing parallelism. In the GPU architecture, main-memory access latency is one of the major factors limiting performance improvement. In this work, we quantitatively analyze the performance improvement of a 3D GPU architecture over a 2D GPU architecture and investigate the potential problems of the 3D architecture. In general, memory instructions account for 30% of all instructions, and global/local memory instructions constitute 60% of memory instructions. The 3D GPU is therefore expected to perform significantly better than the 2D GPU by reducing the latency of memory instructions. However, our experimental results show that the 3D architecture improves GPU performance by only 2% over the 2D architecture because of the memory bottleneck: the performance loss caused by the memory bottleneck is 245% greater in the 3D GPU architecture than in the 2D architecture. This paper provides guidelines for suitable memory design by analyzing the efficiency of the memory architecture in a 3D GPU.
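Combining the two percentages quoted above gives the share of all instructions that are global/local memory instructions. Here $N_{\text{all}}$, $N_{\text{mem}}$, and $N_{\text{global/local}}$ are introduced only for this calculation and denote the counts of all, memory, and global/local memory instructions. This is an instruction-count share only; it does not indicate what fraction of execution time those instructions take.

```latex
\frac{N_{\text{global/local}}}{N_{\text{all}}}
  = \frac{N_{\text{mem}}}{N_{\text{all}}} \cdot \frac{N_{\text{global/local}}}{N_{\text{mem}}}
  = 0.30 \times 0.60 = 0.18
```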