• Title/Summary/Keyword: distributed memory environment (분산 메모리 환경)

Trends in Lightweight Kernel for Manycore Based High-Performance Computing (매니코어 기반 고성능 컴퓨팅을 지원하는 경량커널 동향)

  • Kim, J.M.;Cha, S.J.;Jeon, S.H.;Koh, K.W.;Jeong, Y.J.;Kim, K.H.;Jung, S.I.
    • Electronics and Telecommunications Trends / v.32 no.4 / pp.48-56 / 2017
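  • In large-scale high-performance computing systems, lightweight kernels have traditionally been deployed on compute nodes to perform only specific computations. In particular, lightweight kernels are developed and used to minimize interference among resources so that parallel programs run with the highest possible performance. Recently, high-performance computing environments equipped with thousands of cores have become necessary not only for parallel programs but also for general applications and large-scale distributed applications. As manycore and memory resources grow in high-performance computing environments, a multi-kernel structure that runs a lightweight kernel together with Linux is preferred as a practical operating system architecture that meets the demand for performance scalability. This article introduces such prior research and reviews recent trends in lightweight kernels used in manycore systems.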

Impact of Process Scheduling on Network Performance over Multi-Core Systems (멀티 코어 시스템에서 통신 프로세스의 스케줄링에 따른 성능 분석)

  • Jang, Hye-Churn;Jin, Hyun-Wook
    • Proceedings of the Korea Information Processing Society Conference / 2009.04a / pp.827-829 / 2009
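  • Multi-core processors are now used in many servers, and the number of cores in a single processor package will continue to increase. However, current operating systems treat multi-core systems almost the same as multiprocessor environments, and attempts at performance optimization that take multi-core characteristics into account remain insufficient. This paper quantitatively analyzes network throughput and the amount of idle core resources while varying the processor affinity of communication processes and network interrupts on multi-core processors with SMP and NUMA architectures. The measurements show that processor affinity does not change communication throughput much but strongly affects the amount of processor resources required. The paper also shows that this effect on processor resources is closely related to the cache-sharing structure and the distributed memory structure of multi-core processors.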
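
As a rough illustration of what varying processor affinity means in practice, the sketch below pins the calling process to a single core with Python's `os.sched_setaffinity`, which is available on Linux only. The helper name `pin_to_core` is hypothetical, and this is not the measurement setup of the paper. Interrupt affinity, which the paper also varies, is configured separately by the operating system and is not shown.

```python
import os


def pin_to_core(core_id, pid=0):
    """Restrict process `pid` (0 = calling process) to a single core.

    Returns the resulting affinity set, or None on platforms that do
    not expose the Linux scheduler affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS or Windows
    os.sched_setaffinity(pid, {core_id})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        allowed = sorted(os.sched_getaffinity(0))
        print("allowed cores before:", allowed)
        print("allowed cores after: ", pin_to_core(allowed[0]))
```

Pinning a communication process to the core that handles the network interrupt, or to a different core, is the kind of variation the abstract describes.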

A Reconfigurable Load and Performance Balancing Scheme for Parallel Loops in a Clustered Computing Environment (클러스터 컴퓨팅 환경에서 병렬루프 처리를 위한 재구성 가능한 부하 및 성능 균형 방법)

  • 김태형
    • Journal of KIISE: Computing Practices and Letters / v.10 no.1 / pp.49-56 / 2004
  • Load imbalance is a serious impediment to achieving good performance in parallel processing. Global load balancing schemes cannot adequately balance parallel tasks generated from a single application. Dynamic loop scheduling methods are known to be useful in balancing parallel loops on shared-memory multiprocessor machines. However, their centralized nature causes a bottleneck even for the relatively small number of processors in a network of workstations because of order-of-magnitude differences in communication overheads. Moreover, improvements to basic loop scheduling methods have not effectively dealt with irregularly distributed workloads in parallel loops, which commonly occur in applications for a network of workstations. In this paper, we present a new reconfigurable and decentralized balancing method for parallel loops on a network of workstations. Since our method supplements traditional load balancing methods with performance balancing, it minimizes the overall execution time.
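
To make the notion of centralized dynamic loop scheduling concrete, here is a minimal sketch of one well-known scheme, guided self-scheduling, in which each request to the central queue receives a chunk of ceil(remaining / workers) iterations. The abstract does not name this scheme, and it is not the decentralized method the paper proposes; the function name `guided_chunks` is hypothetical.

```python
def guided_chunks(num_iterations, num_workers):
    """Return the chunk sizes handed out by guided self-scheduling.

    Each request to the central queue receives
    ceil(remaining / num_workers) iterations.
    """
    if num_iterations < 0 or num_workers < 1:
        raise ValueError("need num_iterations >= 0 and num_workers >= 1")
    chunks = []
    remaining = num_iterations
    while remaining > 0:
        # Integer ceiling division avoids floating-point rounding.
        size = (remaining + num_workers - 1) // num_workers
        chunks.append(size)
        remaining -= size
    return chunks


if __name__ == "__main__":
    print(guided_chunks(100, 4))
```

Every chunk corresponds to one round trip to the central queue, which is cheap on a shared-memory machine but costly over a workstation network. That cost is the bottleneck the abstract refers to.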

ABox Realization Reasoning in Distributed In-Memory System (분산 메모리 환경에서의 ABox 실체화 추론)

  • Lee, Wan-Gon;Park, Young-Tack
    • Journal of KIISE / v.42 no.7 / pp.852-859 / 2015
  • As the amount of knowledge information increases significantly, much progress has been made in studies on how to reason effectively over large-scale ontologies at the level of RDFS or OWL. These reasoning methods are divided into TBox classification and ABox realization. TBox classification mainly deals with integrity and dependencies in the schema, whereas ABox realization mainly handles a variety of issues in instances. Therefore, ABox realization is very important in practical applications. In this paper, we propose a realization method that analyzes the constraints of a specified class so that the reasoning system automatically infers the classes to which instances belong. Unlike conventional methods that rely on a distributed file system based on an object-oriented language, we propose a large-scale ontology reasoning method using Spark, a functional programming-based in-memory system. To verify the effectiveness of the proposed method, we used instances created from the W3C Wine ontology (120 to 600 million triples). In our largest experiment, the proposed system processed 600 million triples and generated 951 million triples in 51 minutes (696 K triples/sec).

Performance Analysis of a Multiprocessor System Using Simulator Based on Parsec (Parsec 기반 시뮬레이터를 이용한 다중처리시스템의 성능 분석)

  • Lee Won-Joo;Kim Sun-Wook;Kim Hyeong-Rae
    • Journal of the Korea Society of Computer and Information / v.11 no.2 s.40 / pp.35-42 / 2006
  • In this paper, we implement a new simulator, based on Parsec, for performance analysis of a parallel digital signal processing distributed shared memory multiprocessor system. The simulator is designed for systems that use the DMA function of the TMS320C6701 DSP chip and local memory with fast access time. Also, because performance parameters are easy to adjust and hardware components are easy to reconfigure, we can analyze system performance in various execution environments. In the simulation, FFT, 2D FFT, matrix multiplication, and FIR filter, which are widely used DSP algorithms, were employed. Using our simulator, results were recorded for different numbers of processors, data sizes, and hardware configurations. The performance of our simulator has been verified by comparing those recorded results.

Performance Comparison of Spatial Split Algorithms for Spatial Data Analysis on Spark (Spark 기반 공간 분석에서 공간 분할의 성능 비교)

  • Yang, Pyoung Woo;Yoo, Ki Hyun;Nam, Kwang Woo
    • Journal of Korean Society for Geospatial Information Science / v.25 no.1 / pp.29-36 / 2017
  • In this paper, we implement a spatial big data analysis prototype based on Spark, an in-memory system, and use it to compare the performance of spatial split algorithms. In cluster computing environments, big data is divided into blocks of a certain size in order to balance the computing load. Existing research showed that, for Hadoop-based spatial big data systems, spatial splitting is more effective than general sequential splitting. A Hadoop-based spatial data system stores raw data as it is in spatially divided blocks. In the proposed Spark-based spatial analysis system, by contrast, spatial data is converted into an in-memory data structure and stored in a spatial block for search efficiency. Therefore, in this paper, we propose an in-memory spatial big data prototype and a spatial split block storage method. We also compare the performance of existing spatial split algorithms in the proposed prototype and present an appropriate spatial split strategy for Spark-based big data systems. In the experiment, we compared the query execution times of the spatial split algorithms and confirmed that the BSP algorithm shows the best performance.
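
The abstract does not describe how its BSP (binary space partitioning) split is implemented, so the following is only a minimal sketch of the general idea under stated assumptions: 2-D points are split recursively at the median along the axis with the larger extent until every block holds at most `capacity` points. The function name `bsp_split` and the median-split rule are illustrative choices, not the paper's method.

```python
def bsp_split(points, capacity):
    """Recursively split 2-D points into blocks of at most `capacity` points.

    Assumption: split at the median along the axis with the larger extent.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if len(points) <= capacity:
        return [list(points)]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Choose the axis along which the points are spread out the most.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # both halves are non-empty because len >= 2
    return bsp_split(ordered[:mid], capacity) + bsp_split(ordered[mid:], capacity)


if __name__ == "__main__":
    grid = [(x, y) for x in range(4) for y in range(4)]
    for block in bsp_split(grid, capacity=4):
        print(block)
```

Because the split position follows the data, blocks end up with similar point counts even when the points are unevenly distributed, which is what balances the load across cluster nodes.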

An Optimization Method for Hologram Generation on Multiple GPU-based Parallel Processing (다중 GPU기반 홀로그램 생성을 위한 병렬처리 성능 최적화 기법)

  • Kook, Joongjin
    • Smart Media Journal / v.8 no.2 / pp.9-15 / 2019
  • Since the computational complexity of hologram generation increases exponentially with the size of the point cloud, parallel processing with the CUDA and/or OpenCL libraries on multiple GPUs has recently become popular. A CUDA kernel must be organized into threads, blocks, and grids appropriately for the number of cores and the memory size of the GPU. In addition, in multi-GPU environments, the work needs to be distributed grid-by-grid, block-by-block, or thread-by-thread according to the number of GPUs. To evaluate the performance of CGH (Computer Generated Hologram) generation, we compared the computational speed on a CPU, on a single GPU, and in multi-GPU environments while gradually increasing the number of points in a point cloud from 10 to 1,000,000. We also present a memory structure design and a calculation method required in CUDA-based parallel processing to accelerate CGH generation in multi-GPU environments.
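
The arithmetic behind distributing a point cloud over several GPUs can be sketched as follows. This is a planning helper only, not the paper's memory design: it assumes one thread per point and 256 threads per block (both illustrative assumptions), and the name `plan_launch` is hypothetical.

```python
def plan_launch(num_points, num_gpus, threads_per_block=256):
    """Split `num_points` across GPUs and size each GPU's grid.

    Assumptions: one thread per point; `threads_per_block` is a tunable
    value chosen here for illustration only.
    """
    if num_gpus < 1 or threads_per_block < 1 or num_points < 0:
        raise ValueError("invalid launch parameters")
    base, extra = divmod(num_points, num_gpus)
    plans = []
    start = 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs take one additional point each.
        count = base + (1 if gpu < extra else 0)
        # Ceiling division: enough blocks to cover every point.
        blocks = (count + threads_per_block - 1) // threads_per_block
        plans.append({"gpu": gpu, "start": start, "count": count, "blocks": blocks})
        start += count
    return plans


if __name__ == "__main__":
    for plan in plan_launch(1_000_000, 3):
        print(plan)
```

Each entry gives the slice of the point cloud a GPU would process and the number of blocks in its grid. The last block on each GPU may be only partly filled, so a real kernel would need a bounds check.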

Development of Information Technology Infrastructures through Construction of Big Data Platform for Road Driving Environment Analysis (도로 주행환경 분석을 위한 빅데이터 플랫폼 구축 정보기술 인프라 개발)

  • Jung, In-taek;Chong, Kyu-soo
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.3 / pp.669-678 / 2018
  • This study developed information technology infrastructures for building a driving environment analysis platform using various big data, such as vehicle sensing data and public data. First, as the H/W component, a small platform server with a parallel structure for distributed big data processing was developed. Next, as the S/W component, programs for big data collection/storage, processing/analysis, and information visualization were developed. The collection S/W was developed as a collection interface using Kafka, Flume, and Sqoop. The storage S/W was divided between a Hadoop distributed file system and a Cassandra DB according to how the data is used. The processing S/W was developed for spatial unit matching and time interval interpolation/aggregation of the collected data by applying the grid index method. The analysis S/W was developed as an analytical tool based on the Zeppelin notebook for applying and evaluating developed algorithms. Finally, the information visualization S/W was developed as a Web GIS engine program for providing and visualizing various driving environment information. As a result of the performance evaluation, the number of executors, the optimal memory capacity, and the number of cores for the development server were derived, and the computation performance was superior to that of other cloud computing environments.

The QCE:A Binding Environment for Distributed Memory Multiprocessors (분산메모리 멀티프로세서 시스템을 위한 바인딩 환경(QCE))

  • Lee, Yong-Du;Kim, Hui-Cheol;Chae, Su-Hwan
    • The Transactions of the Korea Information Processing Society / v.3 no.7 / pp.1719-1726 / 1996
  • In the OR-parallel execution of logic programs, binding environments have a critical impact on performance. This is particularly true for distributed execution on parallel systems with a non-single address space, because in such systems remote accesses across processing elements degrade performance. To solve this problem, some binding methods were previously proposed specifically for a non-single address space. However, compared with the binding methods for a single address space, they are far less efficient due to the overhead of newly introduced operations such as environment closing and back-unification. In this paper, we propose a new binding environment that is a hybrid combining the binding methods for a single address space with those for a non-single address space. It accomplishes high efficiency by making closing operations unnecessary both at unification and at back-unification, while maintaining the restricted accesses.

Garbage Collection Synchronization Technique for Improving Tail Latency of Cloud Databases (클라우드 데이터베이스에서의 꼬리응답시간 감소를 위한 가비지 컬렉션 동기화 기법)

  • Han, Seungwook;Hahn, Sangwook Shane;Kim, Jihong
    • Journal of KIISE / v.44 no.8 / pp.767-773 / 2017
  • In a distributed system environment, such as a cloud database, the tail latency needs to be kept short to ensure uniform quality of service. In this paper, through experiments on a Cassandra database, we show that long tail latency is caused by a lack of memory space, because the database cannot receive any request until free space is reclaimed by writing the buffered data to the storage device. We observed that, since the performance of the storage device determines the amount of time required for writing the buffered data, the performance degradation of a Solid State Drive (SSD) due to garbage collection results in a longer tail latency. We propose a garbage collection synchronization technique, called SyncGC, that performs garbage collection in the Java virtual machine and garbage collection in the SSD concurrently, thus hiding garbage collection overheads in the SSD. Our evaluations on real SSDs show that SyncGC reduces the tail latency of the $99.9^{th}$ and $99.9^{th}$ percentiles by 31% and 36%, respectively.
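
For readers unfamiliar with percentile tail latency, the sketch below computes it with the nearest-rank definition. The abstract does not state which percentile definition the authors used, so this is one common convention rather than their procedure; the function name `percentile` and the sample data are illustrative.

```python
import math
from fractions import Fraction


def percentile(latencies, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not latencies:
        raise ValueError("latencies must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies)
    # Exact rational arithmetic keeps values such as 99.9 from being
    # rounded up by binary floating-point error.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    samples_ms = list(range(1, 1001))  # illustrative latencies: 1..1000 ms
    print(percentile(samples_ms, 99))
    print(percentile(samples_ms, 99.9))
```

With 1,000 samples, a 99.9th-percentile figure is determined by only the two slowest requests, which is why brief stalls such as garbage collection pauses dominate tail latency.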