• Title/Summary/Keyword: memory bottleneck

Maximum Tolerable Error Bound in Distributed Simulated Annealing

  • Hong, Chul-Eui;McMillin, Bruce M.;Ahn, Hee-Il
    • ETRI Journal
    • /
    • v.15 no.3
    • /
    • pp.1-26
    • /
    • 1994
  • Simulated annealing is an attractive, but expensive, heuristic method for approximating the solution to combinatorial optimization problems. Attempts to parallelize simulated annealing, particularly on distributed memory multicomputers, are hampered by the algorithm's requirement of a globally consistent system state. In a multicomputer, maintaining the global state S involves explicit message traffic and is a critical performance bottleneck. To mitigate this bottleneck, it becomes necessary to amortize the overhead of these state updates over as many parallel state changes as possible. With this technique, errors in the actual cost C(S) of a particular state S are introduced into the annealing process. This paper places analytically derived bounds on this error in order to assure convergence to the correct optimal result. The resulting parallel simulated annealing algorithm dynamically changes the frequency of global updates as a function of the annealing control parameter, i.e., temperature. Implementation results on an Intel iPSC/2 are reported.
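
As a rough illustration of the amortization idea, the Python sketch below batches more local moves per global synchronization at high temperature and synchronizes more often as the temperature drops. The linear schedule in `update_interval` and the serial single-worker loop are illustrative assumptions; the paper derives the actual tolerable error bound analytically.

```python
import math
import random

def update_interval(temperature, t_high, max_interval=64):
    """Illustrative schedule (not the paper's derived bound): batch many local
    moves per global update at high temperature, where cost errors are
    tolerable, and synchronize more often as the temperature drops."""
    return max(1, int(max_interval * min(temperature / t_high, 1.0)))

def anneal(initial_state, neighbor, cost, t_high=100.0, t_low=0.1, alpha=0.95):
    """Serial stand-in for one worker of the distributed annealer; cost() on
    intermediate states plays the role of the possibly stale local estimate."""
    state, t = initial_state, t_high
    while t > t_low:
        for _ in range(update_interval(t, t_high)):
            candidate = neighbor(state)
            delta = cost(candidate) - cost(state)
            if delta < 0 or random.random() < math.exp(-delta / t):
                state = candidate
        # In the distributed version, a global state update (message exchange)
        # would occur here, bounding the accumulated error in C(S).
        t *= alpha
    return state
```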

On-time Production and Delivery Improvements through the Demand-Lot Pegging Framework for a Semiconductor Business (반도체 산업에서 생산용량을 고려한 오더-로트 페깅기반의 납기약속 방법의 정합성 향상에 대한 연구)

  • Seo, Jeong-Cheol;Bang, June-Young
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.37 no.4
    • /
    • pp.126-133
    • /
    • 2014
  • This paper addresses order-lot pegging issues in the supply chain of a semiconductor business. In such a business (memory or system LSI), order-lot pegging is critical to achieving the goal of ATP (Available to Promise) and on-time production and delivery. However, existing pegging systems and studies do not consider the capacity limits of bottleneck steps. This paper presents an order-lot pegging algorithm that assigns lots to orders while considering the quality constraints of each lot and the capacity of bottleneck steps across the entire FAB. As a result, a quick and accurate response can be provided to customer order enquiries, the pegged lot list for each promised order can be shown transparently, and short or late orders can be detected before the order is confirmed.
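
A toy greedy version of such a pegging pass is sketched below. The `Lot`/`Order` fields, the weekly bottleneck-capacity model, and the due-date ordering are illustrative assumptions rather than the paper's algorithm.

```python
from dataclasses import dataclass

@dataclass
class Lot:
    lot_id: str
    grade: str                # quality attribute of the lot
    bottleneck_load: float    # capacity consumed at the bottleneck step

@dataclass
class Order:
    order_id: str
    due_week: int
    qty: int
    allowed_grades: set

def peg_orders(orders, lots, weekly_capacity):
    """Greedy sketch: serve orders in due-date order, skip lots whose grade
    violates the order's quality constraint, and stop pegging once the
    bottleneck capacity of the order's week is exhausted."""
    remaining = dict(weekly_capacity)
    free_lots = list(lots)
    pegging, short_orders = {}, []
    for order in sorted(orders, key=lambda o: o.due_week):
        pegged = []
        for lot in list(free_lots):
            if len(pegged) == order.qty:
                break
            if lot.grade not in order.allowed_grades:
                continue
            if remaining.get(order.due_week, 0) < lot.bottleneck_load:
                continue
            remaining[order.due_week] -= lot.bottleneck_load
            free_lots.remove(lot)
            pegged.append(lot.lot_id)
        pegging[order.order_id] = pegged
        if len(pegged) < order.qty:
            short_orders.append(order.order_id)   # flagged before the promise is fixed
    return pegging, short_orders
```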

Traffic Speed Prediction Based on Graph Neural Networks for Intelligent Transportation System (지능형 교통 시스템을 위한 Graph Neural Networks 기반 교통 속도 예측)

  • Kim, Sunghoon;Park, Jonghyuk;Choi, Yerim
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.20 no.1
    • /
    • pp.70-85
    • /
    • 2021
  • Deep learning methodology, which has been actively studied in recent years, has improved the performance of artificial intelligence, and systems utilizing deep learning have accordingly been proposed in various industries. In traffic systems, spatio-temporal graph modeling using GNNs was found to be effective in predicting traffic speed, but it has the disadvantage that the model is trained inefficiently due to a memory bottleneck. Therefore, in this study, the road network is clustered with a graph clustering algorithm to reduce the memory bottleneck while achieving superior performance. To verify the proposed method, the similarity of road speed distributions was measured using Jensen-Shannon divergence based on an analysis of Incheon UTIC data, and the road network was then clustered by spectral clustering based on the measured similarity. The experiments show that when the road network was divided into seven sub-networks, the memory bottleneck was alleviated while recording the best performance compared to the baselines, with an MAE of 5.52 km/h.
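
The clustering step can be approximated as in the sketch below. SciPy's `jensenshannon` and scikit-learn's `SpectralClustering` are assumed here, and the exponential similarity transform is an illustrative choice, not necessarily the one used in the study.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.cluster import SpectralClustering

def cluster_road_network(speed_histograms, n_clusters=7):
    """speed_histograms: (n_roads, n_bins) array of normalized speed distributions.
    Builds a similarity matrix from Jensen-Shannon divergence and partitions
    the roads with spectral clustering, as described in the abstract."""
    n = speed_histograms.shape[0]
    similarity = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # jensenshannon returns the square root of the JS divergence
            d = jensenshannon(speed_histograms[i], speed_histograms[j]) ** 2
            similarity[i, j] = np.exp(-d)        # assumed similarity transform
    labels = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=0
    ).fit_predict(similarity)
    return labels                                 # one GNN can then be trained per cluster
```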

Hardware based set-associative IP address lookup scheme (하드웨어 기반 집합연관 IP 주소 검색 방식)

  • Yun Sang-Kyun
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.8B
    • /
    • pp.541-548
    • /
    • 2005
  • The IP lookup and forwarding process becomes the bottleneck of packet transmission as IP traffic increases. Previous hardware-based IP address lookup schemes using an index-based table are not memory-efficient due to the sparse distribution of routing prefixes. In this paper, we propose a memory-efficient hardware-based IP lookup scheme, called the set-associative IP address lookup scheme, which provides the same IP lookup speed with a much smaller memory requirement. In the proposed scheme, an NHA entry stores the prefix and next hop together. The IP lookup procedure compares a destination IP address with the eight entries in the corresponding set simultaneously and finds the longest matched prefix. The memory requirement of the proposed scheme is about 42% of that of Lin's scheme. Thus, the set-associative IP address lookup scheme is a memory-efficient, hardware-based IP address lookup scheme.
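
A toy software model of the lookup idea is sketched below. The index width, the table geometry, and the handling of short prefixes are assumptions; only the eight-way set with longest-prefix selection follows the abstract.

```python
INDEX_BITS = 12                 # illustrative: top bits of the address select a set
WAYS = 8                        # eight (prefix, next hop) entries per set

class SetAssociativeTable:
    """Toy model: each set holds up to eight (prefix, prefix_len, next_hop)
    entries; a lookup reads one set and keeps the longest matching prefix.
    Prefixes shorter than the index width are ignored here for simplicity."""
    def __init__(self):
        self.sets = [[] for _ in range(1 << INDEX_BITS)]

    def insert(self, prefix, prefix_len, next_hop):
        idx = prefix >> (32 - INDEX_BITS)
        if len(self.sets[idx]) < WAYS:
            self.sets[idx].append((prefix, prefix_len, next_hop))

    def lookup(self, dst_ip):
        idx = dst_ip >> (32 - INDEX_BITS)
        best_len, best_hop = -1, None
        for prefix, plen, hop in self.sets[idx]:   # compared in parallel in hardware
            mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
            if (dst_ip & mask) == prefix and plen > best_len:
                best_len, best_hop = plen, hop
        return best_hop
```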

Performance Evaluation for a Multiprocessor Computer System Using a Commercial Workload (상용 작업부하를 이용한 다중프로세서 컴퓨터 시스템 성능 평가)

  • 박진원
    • Journal of the Korea Society for Simulation
    • /
    • v.8 no.1
    • /
    • pp.35-49
    • /
    • 1999
  • CC-NUMA based distributed shared memory is an emerging architecture for multiprocessor computer systems because of its scalability and ease of programming. In this paper, we analyzed the performance of a ring-based CC-NUMA multiprocessor computer system using a commercial workload targeted at popular OLTP applications. Based on traces collected from real machines, the characteristics of the commercial workload could be obtained. The simulation results showed that the bottleneck on the ring could be effectively removed by using a dual-ring structure. We believe our simulation methodology and results will help design better multiprocessor computer systems for commercial application domains.
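
The dual-ring effect can be illustrated with a simple routing rule, assuming each transfer takes whichever ring direction gives the shorter hop count; this rule is an assumption for illustration, not a description of the simulated machine.

```python
def ring_hops(src, dst, n_nodes):
    """Hops in each direction on a ring of n_nodes processors."""
    cw = (dst - src) % n_nodes          # clockwise
    ccw = (src - dst) % n_nodes         # counter-clockwise
    return cw, ccw

def dual_ring_hops(src, dst, n_nodes):
    """On a dual ring, take the shorter direction; worst case drops from
    n_nodes - 1 hops (single ring) to about n_nodes / 2, and traffic is
    split across the two rings."""
    return min(ring_hops(src, dst, n_nodes))
```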

Cache Sensitive T-tree Index Structure (캐시를 고려한 T-트리 인덱스 구조)

  • Lee Ig-hoon;Kim Hyun Chul;Hur Jae Yung;Lee Sang-goo;Shim JunHo;Chang Juho
    • Journal of KIISE:Databases
    • /
    • v.32 no.1
    • /
    • pp.12-23
    • /
    • 2005
  • In the past decade, advances in the speed of commodity CPUs have far out-paced advances in memory latency. Main-memory access is therefore increasingly a performance bottleneck for many computer applications, including database systems. To reduce memory access latency, cache memory is incorporated in the memory subsystem, but cache memories can reduce the memory latency only when the requested data is found in the cache; this mainly depends on the memory access pattern of the application. Previous research has shown that B+-trees perform much faster than T-trees because B+-trees are more cache conscious than T-trees, and has also proposed 'Cache Sensitive B+-trees' (CSB+-trees) that are more cache conscious than B+-trees. The goal of this paper is to make T-trees as cache conscious as CSB+-trees. We propose a new index structure called 'Cache Sensitive T-trees' (CST-trees). We implemented CST-trees and compared their performance with that of other index structures.
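
The cache-consciousness argument can be made concrete with a toy node layout: size the key array so a node fits in one cache line, and search within the node before following a child pointer. The 64-byte line, 4-byte keys, and the `TTreeNode` class below are illustrative assumptions, not the CST-tree layout itself, which additionally groups nodes so child pointers can be dropped, as CSB+-trees do for B+-trees.

```python
import bisect

CACHE_LINE_BYTES = 64        # assumed cache-line size
KEY_BYTES = 4
KEYS_PER_NODE = (CACHE_LINE_BYTES - 2 * 8) // KEY_BYTES   # keys left after 2 child pointers

class TTreeNode:
    """Toy T-tree node sized so its key array fits in one cache line; the
    (omitted) insert logic would keep len(keys) <= KEYS_PER_NODE."""
    __slots__ = ("keys", "left", "right")
    def __init__(self):
        self.keys = []               # kept sorted
        self.left = None
        self.right = None

    def search(self, key):
        if not self.keys:
            return False
        if self.keys[0] <= key <= self.keys[-1]:
            # bounding range matches: binary search inside one cache line
            i = bisect.bisect_left(self.keys, key)
            return i < len(self.keys) and self.keys[i] == key
        child = self.left if key < self.keys[0] else self.right
        return child.search(key) if child else False
```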

Performance Evaluation of a RAM based Storage System NGS

  • Kang, Yun-Hee;Kung, Jae-Ha;Cheong, Seung-Kook
    • International Journal of Contents
    • /
    • v.5 no.4
    • /
    • pp.75-80
    • /
    • 2009
  • Recently, high-speed memory arrays based on RAM, a type of solid-state drive (SSD), have been introduced to handle the input/output (I/O) bottleneck, but there are only a few performance studies on RAM-based SSD storage under diverse workloads. In this paper, we focus on the file system for the RAM-based memory array of the NGS (Next Generation Storage) system, which runs on the Linux operating system. We perform benchmark tests on practical file systems including Ext3, ReiserFS, and XFS. The results show that XFS significantly outperforms the other file systems in tests that represent the storage and data requests typically made by enterprise applications. The experiment is used to design a dedicated file system for the NGS system, and the results presented here can help enterprises improve their performance significantly.
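
For orientation only, a minimal sequential-write measurement of the kind such comparisons rely on might look like the sketch below; it is not the benchmark suite used in the paper, and the block and file sizes are arbitrary.

```python
import os
import time

def write_throughput(path, total_mb=256, block_kb=64):
    """Toy sequential-write test: measures MB/s on a given mount point
    (e.g. one formatted with Ext3, ReiserFS, or XFS) and removes the file."""
    block = b"\0" * (block_kb * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(total_mb * 1024 // block_kb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())        # force the data to the storage device
    elapsed = time.time() - start
    os.remove(path)
    return total_mb / elapsed
```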

A Concurrent Testing of DRAMs Utilizing On-Chip Networks (온칩네트워크를 활용한 DRAM 동시 테스트 기법)

  • Lee, Changjin;Nam, Jonghyun;Ahn, Jin-Ho
    • Journal of the Semiconductor & Display Technology
    • /
    • v.19 no.2
    • /
    • pp.82-87
    • /
    • 2020
  • In this paper, we introduce a novel idea to improve the bandwidth usage efficiency of on-chip networks used as a TAM (Test Access Mechanism) to test multiple DRAMs. To avoid the local bottleneck of test packets caused by the ATE, we generate test patterns using microcode-based instructions within the ATE and adopt a test bus to transmit test responses from the DRAM DFT (Design for Testability) block, called the Test Generator (TG), to the ATE. The proposed test platform will contribute to improving the test economics of the memory IC industry.
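
A rough software model of the flow is sketched below: compact microcoded instructions are sent once over the on-chip network, each on-die Test Generator expands them locally, and pass/fail responses return over a shared test bus. The instruction names, fields, and the `TestGenerator` stub are hypothetical placeholders, not the paper's microcode format.

```python
# Illustrative microcode: each tuple is one small packet, far cheaper to move
# than the fully expanded test pattern it stands for.
MICROCODE = [
    ("WRITE_ALL", 0xAA),      # write a background pattern to every cell
    ("READ_ALL",  0xAA),      # read back and compare
    ("WRITE_ALL", 0x55),
    ("READ_ALL",  0x55),
]

class TestGenerator:
    """Stub standing in for the on-die TG (Test Generator) DFT block."""
    def __init__(self, dram_id):
        self.dram_id, self.fail = dram_id, False
    def execute(self, op, pattern):
        pass                      # a real TG would drive the DRAM array here
    def read_response(self):
        return "FAIL" if self.fail else "PASS"

def run_concurrent_test(test_generators):
    """Broadcast each instruction to every TG, then collect the pass/fail
    responses that would travel back over the test bus."""
    for op, pattern in MICROCODE:
        for tg in test_generators:
            tg.execute(op, pattern)           # pattern expansion happens on-chip
    return {tg.dram_id: tg.read_response() for tg in test_generators}
```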

Resolving Memory Bottlenecks in Hardware Accelerators with Data Prefetch

  • Hyein Lee;Jinoo Joung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.6
    • /
    • pp.1-12
    • /
    • 2024
  • Deep learning that is faster and more accurate requires large amounts of storage space and computation, so many studies use hardware accelerators for quick and accurate calculation. However, a performance bottleneck arises from data movement between the hardware accelerator and the CPU. In this paper, we propose a data prefetch strategy that can efficiently reduce such operational bottlenecks. The core idea of the data prefetch strategy is to predict the data needed for the next task and upload it to local memory while the hardware accelerator (Matrix Multiplication Unit, MMU) performs the current task. This strategy can be enhanced by using a dual buffer to perform read and write operations simultaneously, which reduces the latency and execution time of data transfer. Through simulations, we demonstrate a 24% improvement in the performance of hardware accelerators by maximizing parallel processing with dual buffers and by reducing memory bottlenecks with data prefetch.
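
A minimal double-buffering sketch of this prefetch pipeline is shown below; `load` and `compute` are placeholders for the DMA transfer and the MMU task, and the single background thread stands in for the hardware overlap between the two buffers.

```python
from concurrent.futures import ThreadPoolExecutor

def run_with_prefetch(tiles, load, compute):
    """While compute() works on the tile in one buffer, the next tile is
    loaded into the other buffer on a background thread, so transfer
    latency is hidden behind the accelerator's work."""
    if not tiles:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=1) as dma:
        next_tile = dma.submit(load, tiles[0])              # prime the first buffer
        for i in range(len(tiles)):
            current = next_tile.result()                    # prefetched buffer is ready
            if i + 1 < len(tiles):
                next_tile = dma.submit(load, tiles[i + 1])  # prefetch into the other buffer
            results.append(compute(current))                # MMU works while DMA loads
    return results
```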

IP Lookup Table Design Using LC-Trie with Memory Constraint (메모리 제약을 가진 LC-Trie를 이용한 IP 참조 테이블 디자인)

  • Lee, Chae-Y.;Park, Jae-G.
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.27 no.4
    • /
    • pp.406-412
    • /
    • 2001
  • IP address lookup determines the next-hop destination of an incoming packet in the router. Address lookup is a major bottleneck in high-performance routers due to increased routing table sizes, increased traffic, higher-speed links, and the migration to 128-bit IPv6 addresses. IP lookup time depends on the data structure of the lookup table and the search scheme. In this paper, we propose a new approach to building a lookup table that satisfies a memory constraint. The design of the lookup table is formulated as an optimization problem whose objective is to minimize the average depth from the root node for lookup. We assume that the frequencies with which prefixes are accessed are known and that the data structure is a level-compressed trie (LC-trie) with branching factor k at the root and binary branching at all other nodes. Thus, the problem is to determine the branching factor k at the root node such that the average depth is minimized. A heuristic procedure is proposed to solve the problem. Experimental results show that the lookup table based on the proposed heuristic has better average and worst-case lookup depths.
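
A toy version of such a heuristic is sketched below. The depth model (one level at the root plus one level per remaining prefix bit) and the node-count memory limit are simplifying assumptions for illustration, not the paper's formulation.

```python
def choose_root_branching(prefixes, freqs, memory_limit_nodes):
    """Try each root branching factor k = 2**b, estimate the frequency-weighted
    average lookup depth when the root consumes b bits and all other nodes are
    binary, and keep the best k whose root array fits the memory limit."""
    best_k, best_depth = 2, float("inf")
    total = float(sum(freqs))
    for b in range(1, 21):
        k = 2 ** b
        if k > memory_limit_nodes:
            break                                   # root array would exceed the constraint
        avg = sum(f * (1 + max(0, len(p) - b))      # assumed depth model
                  for p, f in zip(prefixes, freqs)) / total
        if avg < best_depth:
            best_k, best_depth = k, avg
    return best_k, best_depth

# prefixes given as bit strings with per-prefix access frequencies, e.g.
# choose_root_branching(["1101000", "10", "111"], [5, 3, 1], 4096)
```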
