• Title/Summary/Keyword: computation scalability

A Study on the Scalability of Multi-core-PC Cluster for Seismic Design of Reinforced-Concrete Structures based on Genetic Algorithm (유전알고리즘 기반 콘크리트 구조물의 최적화 설계를 위한 멀티코어 퍼스널 컴퓨터 클러스터의 확장 가능성 연구)

  • Park, Keunhyoung;Choi, Se Woon;Kim, Yousok;Park, Hyo Seon
    • Journal of the Computational Structural Engineering Institute of Korea / v.26 no.4 / pp.275-281 / 2013
  • This paper examines the scalability of a cluster of common personal computers when optimizing reinforced-concrete structures with a genetic algorithm. The goal is to assess the potential of a multi-core PC cluster for optimizing the seismic design of reinforced-concrete structures. As the number of processor cores in the cluster increased, the computation time per generation of the genetic algorithm decreased. After classifying the components of a single personal computer, the expected bottlenecks were estimated and the measured wall-clock time was compared with the prediction of Amdahl's law. The scalability of the cluster showed a complex tendency. To separate physical bottlenecks from algorithmic ones, different population sizes were used in the genetic algorithm cases. With 64 processor cores, the efficiency of the cluster was as low as 31.2% compared with the efficiency predicted by Amdahl's law.
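
For reference, the Amdahl's law baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the parallel fraction `p = 0.95` are assumptions chosen for the example, and the abstract does not state the fraction used in the study.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Efficiency is the speedup divided by the number of cores."""
    return speedup / cores


if __name__ == "__main__":
    p = 0.95  # hypothetical parallel fraction, not taken from the paper
    for n in (1, 8, 64):
        s = amdahl_speedup(p, n)
        print(f"cores={n:3d}  speedup={s:6.2f}  efficiency={parallel_efficiency(s, n):.3f}")
```

A measured efficiency can then be compared against `parallel_efficiency(amdahl_speedup(p, n), n)` for the same core count to see how far a real cluster falls below the ideal curve.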

Service Curve Allocation Schemes for High Network Utilization with a Constant Deadline Computation Cost (상수의 데드라인 계산 비용으로 높은 네트웍 유용도를 얻는 서비스 곡선 할당 방식)

  • 편기현;송준화;이흥규
    • Journal of KIISE:Information Networking / v.30 no.4 / pp.535-544 / 2003
  • Integrated services networks should guarantee end-to-end delay bounds for real-time applications to provide high-quality services. A real-time scheduler is installed on every output port to provide such guaranteed service. However, the scheduling algorithms studied so far have problems with either network utilization or scalability. Here, network utilization indicates how many real-time sessions can be admitted. In this paper, we propose service curve allocation schemes that achieve both high network utilization and scalability in a service curve algorithm. In a service curve algorithm, the adopted service curve allocation scheme determines both network utilization and scalability. Contrary to common belief, we prove that only a part of a service curve is used to compute deadlines, not the entire curve. Based on this fact, we propose service curve allocation schemes that compute deadlines in constant time. We show through a simulation study that our proposed schemes achieve better network utilization than Generalized Processor Sharing (GPS) algorithms, including the multirate algorithm. To our knowledge, the service curve algorithm adopting our schemes achieves the highest network utilization among existing scheduling algorithms with the same scalability.

A Parallel Emulation Scheme for Data-Flow Architecture on Loosely Coupled Multiprocessor Systems (이완 결합형 다중 프로세서 시스템을 사용한 데이터 플로우 컴퓨터 구조의 병렬 에뮬레이션에 관한 연구)

  • 이용두;채수환
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.12 / pp.1902-1918 / 1993
  • Parallel architectures based on the von Neumann computation model are limited as massively parallel architectures because of inherent drawbacks in their architectural features. The data-flow model of computation offers high programmability from the software perspective and high scalability from the hardware perspective. However, practical programming of and experimentation with data-flow architectures are hardly possible because practical data-flow machines are not available. We present a programming environment for performing data-flow computation on conventional parallel machines in general, and on loosely coupled multiprocessor systems in particular. We build an emulator for a tagged-token data-flow architecture on the iPSC/2 hypercube, a loosely coupled multiprocessor system. The emulator is a shallow layer of software executing on an iPSC/2 system, and thus makes the iPSC/2 system work as a data-flow architecture from the programmer's viewpoint. We implement various numerical and non-numerical algorithms in a data-flow assembler language and compare their performance with that of conventional C language versions. Consequently, we verify the effectiveness of this emulator-based programming environment for experimenting with data-flow computation on a conventional parallel machine.

Further Improvement of Direct Solution-based FETI Algorithm (직접해법 기반의 FETI 알고리즘의 개선)

  • Kang, Seung-Hoon;Gong, DuHyun;Shin, SangJoon
    • Journal of the Computational Structural Engineering Institute of Korea / v.35 no.5 / pp.249-257 / 2022
  • This paper presents an improved computational framework for the direct-solution-based finite element tearing and interconnecting (FETI) algorithm. The FETI-local algorithm is further improved herein, and localized Lagrange multipliers are used to define the interface among its subdomains. Selective inverse entry computation, using a property of the Boolean matrix, is employed for the computation of the subdomain interface stiffness and load, in which the original FETI-local algorithm requires a full matrix inverse computation of a high computational cost. In the global interface computation step, the original serial computation is replaced by a parallel multi-frontal method. The performance of the improved FETI-local algorithm was evaluated using a numerical example with 64 million degrees of freedom (DOFs). The computational time was reduced by up to 97.8% compared to that of the original algorithm. In addition, further stable and improved scalability was obtained in terms of a speed-up indicator. Furthermore, a performance comparison was conducted to evaluate the differences between the proposed algorithm and commercial software ANSYS using a large-scale computation with 432 million DOFs. Although ANSYS is superior in terms of computational time, the proposed algorithm has an advantage in terms of the speed-up increase per processor increase.

A CDMA-Based Communication Network for a Multiprocessor SoC (다중 프로세서를 갖는 SoC 를 위한 CDMA 기술에 기반한 통신망 설계)

  • Chun, Ik-Jae;Kim, Bo-Gwan
    • Proceedings of the IEEK Conference / 2005.11a / pp.707-710 / 2005
  • In this paper, we propose a new network for on-chip communication. The network is based on a direct sequence code division multiple access (DS-CDMA) technique. It is suitable for a parallel processing system and drastically reduces the I/O pin count. The network architecture consists mainly of a CDMA-based network interface (CNI), a communication channel, and a synchronizer. The network includes a reverse communication channel to reduce latency. The CNI decouples computation tasks from communication tasks. An extreme truncation is considered to simplify the communication link. For scalability, we use a PN-code reuse method and a hierarchical structure. The network elements have a modular architecture. The communication network is implemented in fully synthesizable Verilog HDL to enhance portability between process technologies.

Efficient Parallel Block-layered Nonbinary Quasi-cyclic Low-density Parity-check Decoding on a GPU

  • Thi, Huyen Pham;Lee, Hanho
    • IEIE Transactions on Smart Processing and Computing / v.6 no.3 / pp.210-219 / 2017
  • This paper proposes a modified min-max algorithm (MMMA) for nonbinary quasi-cyclic low-density parity-check (NB-QC-LDPC) codes and an efficient parallel block-layered decoder architecture corresponding to the algorithm on a graphics processing unit (GPU) platform. The algorithm removes multiplications over the Galois field (GF) in the merger step to reduce decoding latency without any performance loss. The decoding implementation on a GPU for NB-QC-LDPC codes achieves improvements in both flexibility and scalability. To perform the decoding on the GPU, data and memory structures suitable for parallel computing are designed. The implementation results for NB-QC-LDPC codes over GF(32) and GF(64) demonstrate that the parallel block-layered decoding on a GPU accelerates the decoding process to provide a faster decoding runtime, and obtains a higher coding gain under a low $10^{-10}$ bit error rate and low $10^{-7}$ frame error rate, compared to existing methods.

High-Performance Korean Morphological Analyzer Using the MapReduce Framework on the GPU

  • Cho, Shi-Won;Lee, Dong-Wook
    • Journal of Electrical Engineering and Technology / v.6 no.4 / pp.573-579 / 2011
  • To meet the scalability and performance requirements of data analyses, which often involve voluminous data, efficient parallel or concurrent algorithms and frameworks are essential. We present a high-performance Korean morphological analyzer which employs the MapReduce framework on the graphics processing unit (GPU). MapReduce is a programming framework introduced by Google to aid the development of web search applications on a large number of central processing units (CPUs). GPUs are designed as a special-purpose co-processor. Their programming interfaces are typically formulated for graphics applications. Compared to CPUs, GPUs have greater computation power and memory bandwidth; however, GPUs are more difficult to program because of the design of their architectures. The performance of the Korean morphological analyzer using the MapReduce framework on the GPU is evaluated in comparison with the CPU-based model. The proposed Korean Morphological analyzer shows promising scalable performance on distributed computing with the GPU.

NEUTRONICS MODELING AND SIMULATION OF SHARP FOR FAST REACTOR ANALYSIS

  • Yang, W.S.;Smith, M.A.;Lee, C.H.;Wollaber, A.;Kaushik, D.;Mohamed, A.S.
    • Nuclear Engineering and Technology / v.42 no.5 / pp.520-545 / 2010
  • This paper presents the neutronics modeling capabilities of the fast reactor simulation system SHARP, which ANL is developing as part of the U.S. DOE's NEAMS program. We discuss the three transport solvers (PN2ND, SN2ND, and MOCFE) implemented in the UNIC code along with the multigroup cross section generation code $MC^2$-3. We describe the solution methods and modeling capabilities, and discuss the improvement needs for each solver, focusing on massively parallel computation. We present the performance test results against various benchmark problems and ZPR-6 and ZPPR critical experiments. We also discuss weak and strong scalability results for the SN2ND solver on the ZPR-6 critical assembly benchmarks.

Collaborative Recommendations using Adjusted Product Hierarchy : Methodology and Evaluation (재구성된 제품 계층도를 이용한 협업 추천 방법론 및 그 평가)

  • Cho, Yoon-Ho;Park, Su-Kyung;Ahn, Do-Hyun;Kim, Jae-Kyeong
    • Journal of the Korean Operations Research and Management Science Society / v.29 no.2 / pp.59-75 / 2004
  • Recommendation is a personalized information filtering technology that helps customers find the products they would like to purchase. Collaborative filtering makes recommendations by matching a customer's preferences to those of other customers. However, recommendations based on collaborative filtering have two major limitations: sparsity and scalability. To overcome these problems, we suggest using an adjusted product hierarchy, called a grain. This methodology focuses on dimensionality reduction and uses a marketer's specific knowledge or experience to improve recommendation quality. The quality of recommendations using each grain is compared with that of the others in several experiments. The experiments show that using a grain holds the promise of allowing CF-based recommendations to scale to large data sets while producing better recommendations. In addition, our methodology is shown to reduce computation time by a factor of 3 to 4 compared with collaborative filtering.
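
The dimensionality-reduction idea in this abstract can be illustrated with a small sketch. It assumes that products are rolled up into higher-level categories before customers are compared. The helper names, the category mapping, and the use of cosine similarity are assumptions made for illustration, not details given in the abstract.

```python
from math import sqrt


def aggregate_by_category(purchases: dict, category_of: dict) -> dict:
    """Collapse product-level purchase counts into category-level counts."""
    totals = {}
    for product, count in purchases.items():
        category = category_of[product]
        totals[category] = totals.get(category, 0) + count
    return totals


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in set(a) & set(b))
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical data: two customers who bought different products
# from the same category.
category_of = {"espresso": "coffee", "latte": "coffee", "green tea": "tea"}
alice = {"espresso": 2}
bob = {"latte": 1}

product_level = cosine_similarity(alice, bob)
category_level = cosine_similarity(
    aggregate_by_category(alice, category_of),
    aggregate_by_category(bob, category_of),
)
print(product_level, category_level)
```

At the product level the two customers share no items, which is the sparsity problem. After aggregation they overlap, and each customer vector has fewer dimensions to compare.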

Load Balancing Strategies for Network-based Cluster System

  • Jung, Hoon-Jin;Choung Shik Park;Park, Sang-Bang
    • Proceedings of the IEEK Conference / 2000.07a / pp.314-317 / 2000
  • Cluster systems provide attractive scalability in terms of computation power and memory size. With advances in high-speed computer network technology, cluster systems are becoming increasingly competitive with expensive parallel machines. In a parallel program, the load of each task is difficult to predict before execution, and tasks depend on one another in many ways. Load imbalance is an obstacle to system performance. Most research on load balancing has addressed distributed systems, and little has addressed cluster systems. In a cluster system, a dynamic load balancing algorithm evaluates each processor's load at runtime so that the load is distributed evenly across the nodes. However, when the communication cost or node complexity becomes high, having every node take part in the load balancing process is not effective. In such circumstances, it is better to reduce the number of nodes that take part in load balancing. We model cluster systems and propose marginal dynamic load balancing algorithms suited to such circumstances.
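
The idea of limiting how many nodes take part in load balancing can be sketched as follows. This is only an illustrative assumption and not the algorithm proposed in the paper: the selection rule here (a node participates only when its load deviates from the cluster mean by more than `margin`) and both function names are hypothetical.

```python
def select_participants(loads: list, margin: float) -> list:
    """Return indices of nodes whose load differs from the mean by more than margin."""
    mean = sum(loads) / len(loads)
    return [i for i, load in enumerate(loads) if abs(load - mean) > margin]


def rebalance(loads: list, margin: float) -> list:
    """Even out load among participating nodes only; other nodes are untouched."""
    participants = select_participants(loads, margin)
    if not participants:
        return list(loads)
    # Total work is preserved: participants share their combined load equally.
    share = sum(loads[i] for i in participants) / len(participants)
    balanced = list(loads)
    for i in participants:
        balanced[i] = share
    return balanced


# Hypothetical loads for five nodes; the mean is 12.
loads = [10, 11, 9, 30, 0]
print(select_participants(loads, margin=5))
print(rebalance(loads, margin=5))
```

With a larger `margin`, fewer nodes exchange load information, which trades balance quality for lower communication cost.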
