Search | Korea Science

Reducing False Sharing based on Memory Reference Patterns in Distributed Shared Memory Systems (분산 공유 메모리 시스템에서 메모리 참조 패턴에 근거한 거짓 공유 감속 기법)

Jo, Seong-Je
- The Transactions of the Korea Information Processing Society / v.7 no.4 / pp.1082-1091 / 2000
In Distributed Shared Memory systems, false sharing occurs when two different data items that are not shared, but are accessed by two different processors, are allocated to a single block. It is an important factor in degrading system performance. The paper first analyzes shared memory allocation and reference patterns in parallel applications that allocate memory for shared data objects using a dynamic memory allocator. The shared objects are allocated sequentially and generally show different reference patterns. If objects of the same size are requested successively as many times as there are processors, each object is referenced by only one particular processor. If objects of the same size are requested successively many more times than there are processors, two or more successive objects are referenced by only particular processors. On the basis of these analyses, we propose a memory allocation scheme that allocates objects requested by different processors to different pages, and we evaluate existing memory allocation techniques for reducing false sharing faults. Our allocation scheme eliminates a considerable number of false sharing faults for some applications at the cost of a little additional memory space.
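The idea of placing objects requested by different processors on different pages can be illustrated with a minimal sketch. This is not the paper's allocator: the class name `PerProcessorPageAllocator`, the 4096-byte page size, and the simple bump-pointer policy are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


class PerProcessorPageAllocator:
    """Toy bump allocator (illustrative only): objects requested by
    different processors are never placed on the same page."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.next_free_page = 0  # index of the next unused page
        self.cursor = {}         # proc_id -> (page index, offset within page)

    def allocate(self, proc_id, size):
        """Return a simulated address for `size` bytes requested by `proc_id`."""
        if size <= 0 or size > self.page_size:
            raise ValueError("size must be between 1 and page_size")
        page, offset = self.cursor.get(proc_id, (None, 0))
        # Open a fresh page if this processor has none yet,
        # or if the object does not fit in the current one.
        if page is None or offset + size > self.page_size:
            page, offset = self.next_free_page, 0
            self.next_free_page += 1
        self.cursor[proc_id] = (page, offset + size)
        return page * self.page_size + offset

    def page_of(self, address):
        """Page index that contains a simulated address."""
        return address // self.page_size
```

Because each page holds objects from only one requesting processor, a write by one processor cannot invalidate a page that another processor uses for unrelated data. The cost is the unused tail of each page, which corresponds to the additional memory space the abstract mentions.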

A Parallel Speech Recognition Model on Distributed Memory Multiprocessors (분산 메모리 다중프로세서 환경에서의 병렬 음성인식 모델)

정상화;김형순;박민욱;황병한
- The Journal of the Acoustical Society of Korea / v.18 no.5 / pp.44-51 / 1999
This paper presents a massively parallel computational model for the efficient integration of speech and natural language understanding. The phoneme model is based on a continuous Hidden Markov Model with context-dependent phonemes, and the language model is based on a knowledge base approach. To construct the knowledge base, we adopt a hierarchically structured semantic network and a memory-based parsing technique that employs parallel marker-passing as an inference mechanism. Our parallel speech recognition algorithm is implemented on a multi-Transputer system using distributed-memory MIMD multiprocessors. Experimental results show that the parallel speech recognition system achieves higher recognition accuracy than a word network-based speech recognition system. The recognition accuracy is further improved by applying code-phoneme statistics. In addition, speedup experiments demonstrate the possibility of constructing a real-time parallel speech recognition system.

Efficient Data Management for Finite Element Analysis with Pre-Post Processing of Large Structures (전-후 처리 과정을 포함한 거대 구조물의 유한요소 해석을 위한 효율적 데이터 구조)

박시형;박진우;윤태호;김승조
- Proceedings of the Computational Structural Engineering Institute Conference / 2004.04a / pp.389-395 / 2004
We consider the interface between the parallel distributed-memory multifrontal solver and the finite element method. We describe in detail the requirements and the data structure of the parallel FEM interface, which includes the element data and the node array. The full procedure for solving a large-scale structural problem is assumed to include pre- and post-processors, whose algorithms are not considered in this paper. The main advantage of the parallel FEM interface appears when a distributed memory system with a large number of processors is used to solve a very large-scale problem. The memory efficiency and the effect on performance are examined by analyzing some examples on the Pegasus cluster system.

Bandwidth-aware Memory Placement on Hybrid Memories targeting High Performance Computing Systems

Lee, Jongmin
- Journal of the Korea Society of Computer and Information / v.24 no.8 / pp.1-8 / 2019
Modern computers provide tremendous computing capability and a large memory system. Hybrid memories consist of next generation memory devices and are adopted in high performance systems. However, the increased complexity of the microprocessor makes it difficult to operate the system effectively. In this paper, we propose a simple data migration method called Bandwidth-aware Data Migration (BDM) to efficiently use memory systems for high performance processors with hybrid memory. BDM monitors the status of applications running on the system using hardware performance monitoring tools and migrates the appropriate pages of selected applications to High Bandwidth Memory (HBM). BDM selects applications whose bandwidth usages are high and also evenly distributed among the threads. Experimental results show that BDM improves execution time by an average of 20% over baseline execution.
https://doi.org/10.9708/jksci.2019.24.08.001
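The selection rule in the abstract above (applications whose bandwidth usage is high and evenly distributed among threads) could be sketched as follows. The function name `select_for_hbm`, both thresholds, and the use of the coefficient of variation as the evenness measure are assumptions for illustration only; the abstract does not state how BDM quantifies either criterion.

```python
from statistics import mean, pstdev


def select_for_hbm(per_thread_bw, min_total_gbps=10.0, max_cv=0.25):
    """Pick applications whose pages are candidates for migration to HBM.

    per_thread_bw: dict mapping an application name to a list of
    non-negative per-thread bandwidth measurements (GB/s).
    min_total_gbps and max_cv are hypothetical thresholds.
    """
    selected = []
    for app, samples in per_thread_bw.items():
        # Criterion 1: total bandwidth usage must be high.
        if not samples or sum(samples) < min_total_gbps:
            continue
        # Criterion 2: usage must be spread evenly across threads.
        # The coefficient of variation (stddev / mean) is one possible measure.
        cv = pstdev(samples) / mean(samples)
        if cv <= max_cv:
            selected.append(app)
    return selected
```

In a real system the per-thread figures would come from hardware performance counters. Here they are plain numbers, so the selection logic can be read on its own.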

A Development of Distributed Parallel Processing algorithm for Power Flow analysis (전력 조류 계산의 분산 병렬처리기법에 관한 연구)

Lee, Chun-Mo;Lee, Hae-Ki
- Proceedings of the KIEE Conference / 2001.07e / pp.134-140 / 2001
Parallel processing has the potential to be used cost-effectively on computationally intensive power system problems. However, this technology is not yet readily available, with respect to both parallel computers and parallel processing schemes. Testing these algorithms to ensure accuracy and evaluating their performance are also issues. Although a significant number of parallel algorithms for power system problems have been developed in the last decade, actual testing on processor architectures is still in its early stages. This paper presents a parallel processing algorithm that provides a basis for solving power flow by Newton's method on a distributed-memory parallel computer. The method assigns the torn blocks of the sparse matrix to the parallel processors, each of which computes its own block. Testing to ensure the accuracy of the developed method was done on a serial computer by simulating a parallel environment.

GLOBAL EXPONENTIAL STABILITY OF BAM FUZZY CELLULAR NEURAL NETWORKS WITH DISTRIBUTED DELAYS AND IMPULSES

Li, Kelin;Zhang, Liping
- Journal of applied mathematics & informatics / v.29 no.1_2 / pp.211-225 / 2011
In this paper, a class of bi-directional associative memory (BAM) fuzzy cellular neural networks with distributed delays and impulses is formulated and investigated. By employing an integro-differential inequality with impulsive initial conditions and the topological degree theory, some sufficient conditions ensuring the existence and global exponential stability of equilibrium point for impulsive BAM fuzzy cellular neural networks with distributed delays are obtained. In particular, the estimate of the exponential convergence rate is also provided, which depends on the delay kernel functions and system parameters. It is believed that these results are significant and useful for the design and applications of BAM fuzzy cellular neural networks. An example is given to show the effectiveness of the results obtained here.
https://doi.org/10.14317/jami.2011.29.1_2.211

The T-tree index recovery for distributed main-memory database systems in ATM switching systems (ATM 교환기용 분산 주기억장치 상주 데이터베이스 시스템에서의 T-tree 색인 구조의 회복 기법)

이승선;조완섭;윤용익
- The Journal of Korean Institute of Communications and Information Sciences / v.22 no.9 / pp.1867-1879 / 1997
DREAM-S is a distributed main-memory database system for the real-time processing of shared operational data in ATM switching systems. DREAM-S has a client-server architecture in which only the server has disk storage, and it provides the T-Tree index structure for efficient access to the data. We propose a recovery technique for the T-Tree index structure in DREAM-S. Although main-memory database systems offer efficient access performance, the database in main memory may be corrupted when a system failure such as a database transaction failure or power failure occurs. Therefore, a recovery technique that recovers the database (including index structures) is essential for fault-tolerant ATM switching systems. The proposed recovery technique relieves the bottleneck of the server processor's disk operations by maintaining the T-Tree index structure only in main memory. In addition, fast recovery is guaranteed even with a large number of client systems, since the T-Tree index structure(s) in each system can be recovered concurrently.

Data Replication and Migration Scheme for Load Balancing in Distributed Memory Environments (분산 인-메모리 환경에서 부하 분산을 위한 데이터 복제와 이주 기법)

Choi, Kitae;Yoon, Sangwon;Park, Jaeyeol;Lim, Jongtae;Bok, Kyoungsoo;Yoo, Jaesoo
- KIISE Transactions on Computing Practices / v.22 no.1 / pp.44-49 / 2016
Recently, data has been growing dramatically along with the growth of social media and digital devices. Distributed memory processing systems have been used to process large amounts of data efficiently. However, if load is concentrated on a certain node in a distributed environment, that node's performance degrades significantly. In this paper, we propose a load balancing scheme to distribute load in a distributed memory environment. The proposed scheme replicates hot data to multiple nodes to manage each node's load, and it migrates data by considering the load of the nodes when nodes are added or removed. The client reduces the number of accesses to the central server by accessing the data node directly through the metadata of the hot data. To show the superiority of the proposed scheme, we compare it with an existing load balancing scheme through performance evaluation.
https://doi.org/10.5626/KTCP.2016.22.1.44

Distributed Simulator for General Control System in CEMTool

Lee, Tai-Ri;Lee, Young-Sam;Lee, Kwan-Ho;Kwon, Wook-Hyun
- 제어로봇시스템학회:학술대회논문집 / 2003.10a / pp.2230-2234 / 2003
This paper proposes a distributed simulator for general control systems in CEMTool. Systems can be described in SIMTool, which is similar to Simulink in MATLAB. For distributed simulation, any system can be separated into several parallel subsystems in SIMTool. The number of parallel subsystems is determined by the system's properties. After separation, the parallel simulator performs initialization, one-step-ahead simulation, block distribution, ordering, and so on. Finally, the simulator creates independent C code and executable files for each subsystem. The whole system runs on several PCs, each executing one subsystem. The subsystems communicate using reflective memory or Ethernet. We conducted several experiments, with a 5-stand cold rolling mill control system as the main target. The parallel simulation showed an effective speedup compared with single-PC simulation.

Study of In-Memory based Hybrid Big Data Processing Scheme for Improve the Big Data Processing Rate (빅데이터 처리율 향상을 위한 인-메모리 기반 하이브리드 빅데이터 처리 기법 연구)

Lee, Hyeopgeon;Kim, Young-Woon;Kim, Ki-Young
- The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.12 no.2 / pp.127-134 / 2019
With the advancement of information technology, the amount of data generated has been growing exponentially every year. In response, research on distributed systems and in-memory based big data processing schemes has been actively underway. Traditional big data processing schemes process big data faster as the number of nodes and the memory capacity increase. However, an increase in the number of nodes inevitably raises the frequency of failures in a big data infrastructure environment, and the number of infrastructure management points and the infrastructure operating costs increase accordingly. In addition, an increase in memory capacity raises the infrastructure cost of a node configuration. Therefore, this paper proposes an in-memory based hybrid big data processing scheme to improve the big data processing rate. The proposed scheme reduces the number of nodes compared to traditional big data processing schemes based on distributed systems by adding a combiner step to a distributed system processing scheme and applying in-memory based processing technology at that step. It decreases the big data processing time by approximately 22%. In the future, realistic performance evaluation in a big data infrastructure environment consisting of more nodes will be required for practical verification of the proposed scheme.
https://doi.org/10.17661/jkiiect.2019.12.2.127

Search Result 211, Processing Time 0.029 seconds
