• Title/Summary/Keyword: Shared Memory Structure (공유 메모리 구조)


Resource Sharing Method to Reduce Duplicate Operation Cost of Multiple Spatial Aggregates in u-GIS Environment (u-GIS 환경에서 다중 공간 집계 질의의 중복연산 비용을 감소시키기 위한 자원공유 기법)

  • Seo, Min-ho;Kim, Sang-Ki;Baek, Sung-Ha;Li, Yan;Lee, Dong-Wook;Bae, Hae-Young
    • Proceedings of the Korea Information Processing Society Conference / 2009.04a / pp.344-347 / 2009
  • Resource sharing techniques that share queues have been studied to avoid duplicate operations and to save memory when executing continuous aggregate queries over data streams. Because existing resource sharing techniques apply only when the predicates of queries match exactly, the u-GIS environment, where multiple spatial aggregate queries with differing predicates are frequently issued, requires a resource sharing technique that can handle overlapping regions efficiently. In this paper, the overlapping regions between queries are grouped by exploiting the property of the R-tree that it groups spatial regions efficiently, and the resources of the overlapping regions are shared through a pane structure. Queries at similar locations are grouped with a single-level R-tree that has no limit on the number of nodes, and the aggregate values of the areas where those queries overlap are shared through panes so that duplicate computation is avoided. The proposed technique can process spatial aggregate queries, uses fewer resources, and shortens query processing time compared with existing hierarchical resource sharing techniques. The performance evaluation shows that the proposed technique reduces memory usage and increases query processing speed.
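The pane-sharing idea in this abstract can be pictured with the minimal C sketch below: the monitored space is divided into a grid of panes, each pane accumulates its aggregate once, and every query sums only the panes its region covers, so panes inside overlapping query regions are computed once and reused. The grid size, the region encoding, and all names are illustrative assumptions, not the paper's implementation (which first groups queries with a single-level R-tree).

```c
/* Illustrative sketch: pane-based sharing of spatial aggregates.
 * Assumed names and sizes; the paper additionally groups queries
 * with a single-level R-tree before sharing panes. */
#include <stdio.h>

#define GRID 8                                   /* panes per axis (assumption) */

typedef struct { int x0, y0, x1, y1; } Region;   /* query region in pane units */

static long pane_sum[GRID][GRID];                /* one aggregate per pane */

/* Accumulate a stream tuple into exactly one pane. */
static void pane_add(int px, int py, long value) {
    pane_sum[px][py] += value;
}

/* Answer a spatial SUM query by combining already-computed pane aggregates.
 * Overlapping queries reuse the same pane values instead of rescanning tuples. */
static long query_sum(Region r) {
    long s = 0;
    for (int x = r.x0; x <= r.x1; x++)
        for (int y = r.y0; y <= r.y1; y++)
            s += pane_sum[x][y];
    return s;
}

int main(void) {
    pane_add(2, 2, 10);
    pane_add(3, 3, 5);
    Region q1 = {1, 1, 3, 3}, q2 = {2, 2, 4, 4};   /* overlapping queries */
    printf("q1=%ld q2=%ld\n", query_sum(q1), query_sum(q2));
    return 0;
}
```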

A Branch Target Buffer Using Shared Tag Memory with TLB (TLB 태그 공유 구조의 분기 타겟 버퍼)

  • Lee, Yong-Hwan
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.2 / pp.899-902 / 2005
  • Pipeline hazards caused by branch instructions are a major factor in the performance degradation of microprocessors. A branch target buffer predicts whether a branch will be taken and supplies the address of the next instruction on the basis of that prediction. If the branch target buffer predicts correctly, the instruction flow is not stalled, which leads to better microprocessor performance. In this paper, the architecture of a tag memory that the branch target buffer and the TLB can share is presented. Because the two separate tag memories used for the branch target buffer and the TLB are replaced by a single shared tag memory, a smaller chip size and faster prediction can be expected. This shared tag architecture is more advantageous for microprocessors that use wider addresses and exploit more instruction-level parallelism.
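As a rough software model of the shared-tag idea, the C sketch below keeps one tag per entry and lets a single tag compare serve both the BTB payload (predicted target) and the TLB payload (frame number) for the same virtual page. The direct-mapped organization, field widths, and names are assumptions for illustration; the paper describes a hardware tag memory, not software.

```c
/* Illustrative software model of a BTB and TLB sharing one tag memory.
 * Direct-mapped; field widths and names are assumptions. */
#include <stdint.h>
#include <stdio.h>

#define SETS 64

typedef struct {
    uint32_t tag;        /* single shared tag: high bits of the virtual page number */
    int      valid;
    uint32_t btb_target; /* predicted branch target (BTB payload) */
    int      btb_taken;  /* taken/not-taken prediction bit */
    uint32_t tlb_frame;  /* physical frame number (TLB payload) */
} SharedEntry;

static SharedEntry table[SETS];

/* One tag compare answers both "do we have a prediction?" and
 * "do we have a translation?" for this address. */
static SharedEntry *lookup(uint32_t vaddr) {
    uint32_t page = vaddr >> 12;                 /* 4 KiB pages assumed */
    SharedEntry *e = &table[page % SETS];
    if (e->valid && e->tag == page / SETS)
        return e;
    return NULL;                                 /* miss in both structures */
}

int main(void) {
    table[1] = (SharedEntry){ .tag = 0, .valid = 1,
                              .btb_target = 0x2000, .btb_taken = 1,
                              .tlb_frame = 0xABC };
    SharedEntry *e = lookup(0x1000);             /* page 1 maps to set 1, tag 0 */
    if (e) printf("target=0x%x frame=0x%x\n", e->btb_target, e->tlb_frame);
    return 0;
}
```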


Implementation of parallel blocked LU decomposition program for utilizing cache memory on GP-GPUs (GP-GPU의 캐시메모리를 활용하기 위한 병렬 블록 LU 분해 프로그램의 구현)

  • Kim, Youngtae;Kim, Doo-Han;Yu, Myoung-Han
    • Journal of Internet Computing and Services / v.14 no.6 / pp.41-47 / 2013
  • GP-GPUs are general-purpose GPUs that apply the multi-threaded hardware originally built for graphics processing to numerical computation. Unlike typical cache memory, GP-GPUs expose cache in the form of shared memory that user programs can access directly. In this research, we implemented a parallel blocked LU decomposition program to utilize this cache memory on GP-GPUs. The parallel blocked LU decomposition program, written in Nvidia CUDA C, ran 7~8 times faster than the non-blocked LU decomposition program in the same GP-GPU computing environment.
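The blocking that makes this shared-memory reuse possible can be seen in the plain-C sketch below: a right-looking blocked LU factorization without pivoting, where the trailing-submatrix update works on B-by-B tiles. This is only a CPU-side illustration under assumed names and sizes; the paper's version is written in CUDA C, where each tile of the trailing update would be staged through the GPU's shared memory.

```c
/* Minimal CPU-side sketch: right-looking blocked LU without pivoting.
 * The block size B plays the role of the shared-memory tile in a CUDA
 * implementation; names and sizes here are assumptions. */
#include <stdio.h>

#define N 8     /* matrix dimension (assumed for the demo) */
#define B 4     /* block size */

static double A[N][N];

/* Unblocked LU on the diagonal block A[k..k+b)[k..k+b). */
static void lu_diag(int k, int b) {
    for (int i = k; i < k + b; i++)
        for (int j = i + 1; j < k + b; j++) {
            A[j][i] /= A[i][i];                     /* multiplier -> L */
            for (int c = i + 1; c < k + b; c++)
                A[j][c] -= A[j][i] * A[i][c];       /* eliminate row j */
        }
}

static void blocked_lu(void) {
    for (int k = 0; k < N; k += B) {
        int b = (k + B <= N) ? B : N - k;
        lu_diag(k, b);
        /* Column panel: L21 = A21 * inv(U11), forward substitution. */
        for (int i = k + b; i < N; i++)
            for (int j = k; j < k + b; j++) {
                for (int c = k; c < j; c++)
                    A[i][j] -= A[i][c] * A[c][j];
                A[i][j] /= A[j][j];
            }
        /* Row panel: U12 = inv(L11) * A12, unit lower triangular solve. */
        for (int i = k; i < k + b; i++)
            for (int j = k + b; j < N; j++)
                for (int c = k; c < i; c++)
                    A[i][j] -= A[i][c] * A[c][j];
        /* Trailing update: A22 -= L21 * U12 (the tile-friendly part). */
        for (int i = k + b; i < N; i++)
            for (int j = k + b; j < N; j++)
                for (int c = k; c < k + b; c++)
                    A[i][j] -= A[i][c] * A[c][j];
    }
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            A[i][j] = (i == j) ? N : 1.0;   /* diagonally dominant test matrix */
    blocked_lu();
    printf("L and U stored in place; A[0][0]=%.3f\n", A[0][0]);
    return 0;
}
```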

A Study on Implementation of a VXIbus System Using Shared Memory Protocol (공유메모리 프로토콜을 이용한 VXIbus 시스템 구현에 관한 연구)

  • 노승환;강민호;김덕진
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.9 / pp.1332-1347 / 1993
  • Traditional instruments are built independently according to their function, and users construct instrumentation systems from those stand-alone instruments. Since the late 1980s, the VXIbus has made it possible to construct instrumentation systems from various modular instruments. In a VXIbus system that uses the word serial protocol, an increase in data size can degrade system performance. In this paper, a shared memory protocol is proposed to overcome this performance degradation. The shared memory protocol is analyzed using a GSPN model and compared with the word serial protocol; the analysis shows that the shared memory protocol performs better. A VXIbus message-based system using the proposed shared memory protocol was built and tested with a signal generating device and an FFT analyzing device. For input signals up to 80 kHz the FFT analysis results are accurate and agree with those of a conventional FFT analyzer, and in the signal generation experiment sine waves from 100 kHz to 1.1 GHz were generated.
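The contrast drawn here is between transferring data word by word through protocol registers and publishing a whole block through a region both sides can access. The POSIX shared memory sketch below in C illustrates only that general idea on a host operating system; it is not the VXIbus shared memory protocol itself, and the segment name and sizes are assumptions.

```c
/* Illustrative only: publishing a data block through a shared region
 * (POSIX shm) instead of transferring it word by word. Not the VXIbus
 * protocol; segment name and size are assumptions. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME "/vxi_demo_block"   /* hypothetical segment name */
#define BLOCK_SIZE 4096

int main(void) {
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, BLOCK_SIZE) < 0) { perror("ftruncate"); return 1; }

    char *block = mmap(NULL, BLOCK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    if (block == MAP_FAILED) { perror("mmap"); return 1; }

    /* The producer writes the whole block once; a reader process that
     * maps the same name sees it without per-word handshaking. */
    memcpy(block, "FFT input frame ...", 20);

    munmap(block, BLOCK_SIZE);
    close(fd);
    shm_unlink(SHM_NAME);
    return 0;
}
```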


Join Operation of Parallel Database System with Large Main Memory (대용량 메모리를 가진 병렬 데이터베이스 시스템의 조인 연산)

  • Park, Young-Kyu
    • Journal of the Korea Society of Computer and Information / v.12 no.3 / pp.51-58 / 2007
  • Because the shared-nothing multiprocessor architecture has advantages in scalability, it has been adopted in many multiprocessor database systems. However, if the data are not uniformly distributed across the processors, the load becomes unbalanced and overall system performance deteriorates. This is the data skew problem, which usually occurs when processing a parallel hash join. Balancing the load before performing the join resolves this problem efficiently and improves overall system performance. In this paper, we present an algorithm that exploits very large main memory to reduce the disk access overhead of load balancing and to solve the data skew problem efficiently. We also present an analytical model of the new algorithm and report a performance study comparing it with other algorithms for handling data skew.
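For readers unfamiliar with the operation being balanced, the sketch below is a minimal in-memory hash join in C: build a hash table on the smaller relation, then probe it with the larger one. It illustrates only the basic join, not the paper's load-balancing algorithm or its skew handling; all names and sizes are assumptions.

```c
/* Minimal in-memory hash join (build on relation R, probe with relation S).
 * Illustrates the operation only; no skew handling or load balancing. */
#include <stdio.h>
#include <stdlib.h>

#define BUCKETS 101

typedef struct Tuple { int key; int payload; struct Tuple *next; } Tuple;

static Tuple *table[BUCKETS];

static void build(int key, int payload) {            /* insert a tuple of R */
    Tuple *t = malloc(sizeof *t);
    t->key = key; t->payload = payload;
    t->next = table[key % BUCKETS];
    table[key % BUCKETS] = t;
}

static void probe(int key, int payload) {             /* match a tuple of S */
    for (Tuple *t = table[key % BUCKETS]; t; t = t->next)
        if (t->key == key)
            printf("join: key=%d R=%d S=%d\n", key, t->payload, payload);
}

int main(void) {
    build(1, 10); build(2, 20); build(2, 21);
    probe(2, 200);          /* joins with both R tuples having key 2 */
    probe(3, 300);          /* no match */
    return 0;
}
```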


Application Software Structure of Compact Nuclear Simulator based on Shared Memory Variables (공유메모리 변수 기반의 CNS 응용 소프트웨어 구조)

  • 박근옥;서용석;이종복
    • Proceedings of the Korean Information Science Society Conference / 2001.10a / pp.544-564 / 2001
  • The CNS (Compact Nuclear Simulator) is an essential tool for training personnel working in the nuclear power industry and belongs to the medium-scale class of nuclear simulators. Because a nuclear simulator requires heterogeneous application software with diverse functions and high complexity, its development takes a long time and is costly. To overcome this, our work pursues aggressive use of commercial tools, adherence to the software life cycle, and development of a simple and clear simulation application software structure. This paper reviews the types and functions of CNS application software and our experience in developing an application software structure based on shared memory variables. It also discusses the development benefits obtained for the CNS application software and directions for developing similar simulators in the future.
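A shared-memory-variable structure of the kind described can be pictured as a table of simulation variables placed in one shared mapping that the cooperating application processes all see. The C sketch below shows that idea with an anonymous shared mapping and two related processes; the variable names, struct layout, and process roles are purely illustrative assumptions, not the CNS design.

```c
/* Illustrative sketch: a table of simulation variables in memory shared
 * by related processes. Names and layout are assumptions, not the CNS design. */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct {
    double reactor_power;    /* hypothetical simulation variables */
    double coolant_temp;
    int    sim_tick;
} SharedVars;

int main(void) {
    /* Anonymous shared mapping: visible to this process and its children. */
    SharedVars *v = mmap(NULL, sizeof *v, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (v == MAP_FAILED) { perror("mmap"); return 1; }

    if (fork() == 0) {                 /* "model" process updates variables */
        v->reactor_power = 98.7;
        v->coolant_temp  = 291.4;
        v->sim_tick      = 1;
        return 0;
    }
    wait(NULL);                        /* "display" process reads them */
    printf("tick=%d power=%.1f temp=%.1f\n",
           v->sim_tick, v->reactor_power, v->coolant_temp);
    munmap(v, sizeof *v);
    return 0;
}
```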


A Study on Efficient Executions of MPI Parallel Programs in Memory-Centric Computer Architecture

  • Lee, Je-Man;Lee, Seung-Chul;Shin, Dongha
    • Journal of the Korea Society of Computer and Information / v.25 no.1 / pp.1-11 / 2020
  • In this paper, we present a technique that executes MPI parallel programs, developed for processor-centric computer architecture, more efficiently on memory-centric computer architecture without program modification. The technique improves performance by replacing the low-speed data communication over the network performed by MPI library functions with high-speed data communication that uses the fast, large shared memory of memory-centric computer architecture. The technique is implemented in two programs. The first is a modified MPI library called MC-MPI-LIB that runs MPI parallel programs more efficiently on memory-centric computer architecture while preserving the semantics of the MPI library functions. The second is a simulation program called MC-MPI-SIM that simulates the performance of memory-centric computer architecture on processor-centric computer architecture. We developed and tested the programs in a distributed-systems environment deployed on Docker-based virtualization. We analyzed the performance of several MPI parallel programs and showed that better performance is achieved on memory-centric computer architecture, especially for MPI parallel programs with high communication overhead.
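Since the technique targets unmodified MPI programs, the kind of code it accelerates looks like the ordinary MPI C exchange below; under MC-MPI-LIB the same MPI_Send/MPI_Recv calls would be served from the large shared memory rather than the network. This is a generic MPI sketch, not code from the paper.

```c
/* Ordinary MPI point-to-point exchange; under MC-MPI-LIB the same calls
 * would be routed through shared memory instead of the network. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Run with at least two ranks, e.g. `mpirun -np 2 ./a.out`.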

Multi-Programmed Simulation of a Shared Memory Multiprocessor System (공유메모리 다중프로세서 시스템의 다중 프로그래밍 모의실험 기법)

  • 최효진;전주식
    • Journal of KIISE:Computer Systems and Theory / v.30 no.3_4 / pp.194-204 / 2003
  • The performance of a shared memory multiprocessor system depends on system software, such as the scheduling policy, as well as on the hardware. Most existing simulators, however, do not support simulation of a multi-programmed environment because they can execute only a single benchmark application at a time. We propose a multi-programmed simulation method for a program-driven simulator, which enables the concurrent execution of multiple parallel workloads contending for limited system resources. Using the proposed method, system developers can measure and analyze in detail the effects of resource conflicts among the concurrent applications, as well as the effects of scheduling policies, on a program-driven simulator. As a result, the proposed multi-programmed simulation provides a more accurate and realistic performance projection for designing a multiprocessor system.

Design and Implementation Systolic Array FFT Processor Based on Shared Memory (공유 메모리 기반 시스토릭 어레이 FFT 프로세서 설계 및 구현)

  • Jeong, Dongmin;Roh, Yunseok;Son, Hanna;Jung, Yongchul;Jung, Yunho
    • Journal of IKEEE / v.24 no.3 / pp.797-802 / 2020
  • In this paper, we present the design and implementation results of an FFT processor that supports 4096-point operation with less memory by merging the several memories used in the base-4 systolic array FFT processor into a single shared memory. Sharing the memory reduces the area and also simplifies the data flow, since data I/O proceeds through one memory. The presented FFT processor was implemented and verified on an FPGA device. The implementation used 51,855 CLB LUTs, 29,712 CLB registers, 8 block RAM tiles, and 450 DSPs, and confirmed that the memory area can be reduced by 65% compared with the existing base-4 systolic array structure.

The Design and Implementation of the ParaC Language (ParaC 언어의 설계 및 구현)

  • Lee, Kyoung-Seok;Woo, Young-Choon;Kim, Jin-Mee;Chi, Dong-Hae
    • The Transactions of the Korea Information Processing Society / v.4 no.11 / pp.2903-2913 / 1997
  • This paper describes the design and implementation of the ParaC language, which supports parallel programming on shared memory and distributed memory parallel machines. ParaC is designed for the effective use of the system resources of scalable parallel systems. The goal is achieved by adding parallel and synchronization constructs for shared address spaces and remote task constructs for distributed address spaces. The paper also shows the translation method; we implemented the translator and the run-time library for parallel execution of the extended constructs.
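To make the translation idea concrete: a parallel construct for a shared address space is typically lowered by a translator into calls on a thread-based run-time library. The C sketch below shows one plausible lowering using POSIX threads; it is a generic illustration under assumed names, not ParaC's actual constructs or run-time API.

```c
/* Generic illustration of lowering a "parallel" construct to a thread
 * run-time; names are assumptions, not ParaC's actual API. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static int shared_counter = 0;                 /* data in the shared address space */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *parallel_body(void *arg) {        /* body of the parallel region */
    int id = *(int *)arg;
    pthread_mutex_lock(&lock);                 /* synchronization construct */
    shared_counter += id;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t th[NTHREADS];
    int ids[NTHREADS];

    for (int i = 0; i < NTHREADS; i++) {       /* fork: start of parallel region */
        ids[i] = i + 1;
        pthread_create(&th[i], NULL, parallel_body, &ids[i]);
    }
    for (int i = 0; i < NTHREADS; i++)         /* join: end of parallel region */
        pthread_join(th[i], NULL);

    printf("shared_counter = %d\n", shared_counter);   /* 1+2+3+4 = 10 */
    return 0;
}
```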
