• Title/Summary/Keyword: Computations Execution

A Network-Distributed Design Optimization Approach for Aerodynamic Design of a 3-D Wing (3차원 날개 공력설계를 위한 네트워크 분산 설계최적화)

  • Joh, Chang-Yeol;Lee, Sang-Kyung
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.32 no.10
    • /
    • pp.12-19
    • /
    • 2004
  • An aerodynamic design optimization system for a three-dimensional wing was developed as part of a future MDO framework. The system comprises four modules: geometry design, grid generation, flow solver, and optimizer. All modules are based on commercial software and were programmed to run automatically in batch mode using built-in scripting and journaling. The modules were integrated into the system with programs written in Visual Basic. A distributed computational environment based on network communication was established to save computational time, especially for the time-consuming aerodynamic analyses. The distributed aerodynamic computations were performed in conjunction with a global optimization algorithm, the response surface method, instead of the usual parallel computation based on domain decomposition. Applying the design system to a drag minimization problem demonstrated considerably enhanced efficiency of the design process, and the final design showed a reasonable reduction in drag.
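
The idea of pairing a response surface with distributed analyses can be illustrated with a minimal Python sketch. It is not the authors' system: the `drag` function is a hypothetical stand-in for an expensive aerodynamic analysis, a thread pool stands in for networked machines, and the surface is a one-variable quadratic through three samples rather than a multi-variable least-squares fit.

```python
from concurrent.futures import ThreadPoolExecutor


def drag(x):
    """Hypothetical stand-in for one expensive aerodynamic analysis."""
    return (x - 1.5) ** 2 + 0.02


def fit_quadratic(xs, ys):
    """Return (a, b, c) of a*x^2 + b*x + c passing through three points."""
    (x0, x1, x2), (y0, y1, y2) = xs, ys
    # Lagrange form: each sample contributes y_i / prod(x_i - x_k), k != i.
    d0 = y0 / ((x0 - x1) * (x0 - x2))
    d1 = y1 / ((x1 - x0) * (x1 - x2))
    d2 = y2 / ((x2 - x0) * (x2 - x1))
    a = d0 + d1 + d2
    b = -(d0 * (x1 + x2) + d1 * (x0 + x2) + d2 * (x0 + x1))
    c = d0 * x1 * x2 + d1 * x0 * x2 + d2 * x0 * x1
    return a, b, c


# The sample designs are independent, so they can be analysed concurrently.
samples = [0.0, 1.0, 3.0]
with ThreadPoolExecutor(max_workers=3) as pool:
    values = list(pool.map(drag, samples))

# Minimise the cheap surrogate instead of the expensive analysis.
a, b, c = fit_quadratic(samples, values)
x_best = -b / (2 * a)
```

Because the sample evaluations do not depend on each other, this kind of parallelism needs no domain decomposition inside the flow solver, which matches the contrast drawn in the abstract.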

HTCaaS(High Throughput Computing as a Service) in Supercomputing Environment (슈퍼컴퓨팅환경에서의 대규모 계산 작업 처리 기술 연구)

  • Kim, Seok-Kyoo;Kim, Jik-Soo;Kim, Sangwan;Rho, Seungwoo;Kim, Seoyoung;Hwang, Soonwook
    • The Journal of the Korea Contents Association
    • /
    • v.14 no.5
    • /
    • pp.8-17
    • /
    • 2014
  • Petascale systems (so-called supercomputers) have mainly been used to support communication-intensive, tightly coupled parallel computations based on message-passing interfaces such as MPI (High-Performance Computing, HPC). In contrast, computing paradigms such as High-Throughput Computing (HTC) mainly target compute-intensive applications with relatively low I/O requirements that consist of many loosely coupled tasks requiring no communication between them. In Korea, recently emerging applications from scientific fields such as pharmaceutical research, high-energy physics, and nuclear physics require a very large amount of computing power that cannot be supplied by a single type of computing resource. In this paper, we present HTCaaS (High-Throughput Computing as a Service), which can leverage national distributed computing resources in Korea to support these challenging HTC applications, and describe the details of our system architecture, job execution scenario, and case studies of various scientific applications.
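
The loosely coupled task model described above can be sketched in a few lines of Python. This is only an illustration of the HTC pattern, not the HTCaaS interface: `run_task` and `run_bag_of_tasks` are hypothetical names, and a local thread pool stands in for distributed computing resources.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_task(task_id, x):
    """Hypothetical independent task; it needs no data from any other task."""
    return task_id, x * x


def run_bag_of_tasks(inputs, workers=4):
    """Dispatch independent tasks to a pool and collect results by task id."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_task, i, x) for i, x in enumerate(inputs)]
        # Tasks may finish in any order because they never communicate.
        for future in as_completed(futures):
            task_id, value = future.result()
            results[task_id] = value
    return [results[i] for i in range(len(inputs))]


squares = run_bag_of_tasks([1, 2, 3, 4])
```

The absence of inter-task communication is what distinguishes this pattern from an MPI-style tightly coupled job: any idle worker can take the next task.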

Analysis of Programming Techniques for Creating Optimized CUDA Software (최적화된 CUDA 소프트웨어 제작을 위한 프로그래밍 기법 분석)

  • Kim, Sung-Soo;Kim, Dong-Heon;Woo, Sang-Kyu;Ihm, In-Sung
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.7
    • /
    • pp.775-787
    • /
    • 2010
  • Unlike general-purpose CPUs, GPUs have been specialized as many-core streaming processors and are replacing CPUs in an increasing range of computations thanks to their outstanding parallel computing capacity. In response to this trend, NVIDIA has recently released a new parallel computing architecture called CUDA (Compute Unified Device Architecture), which offers a flexible GPU programming environment for GPGPU (General-Purpose GPU) computing. In general, programmers who use the CUDA API must clearly understand many aspects of the GPU's computing architecture to produce efficient parallel software. In this article, we explain several optimization techniques for CUDA programming that we have verified through extensive experimentation and trial and error, and review how those techniques affect code execution performance. In particular, we use a specific problem as an example to analyze several factors that affect performance, such as effective access to the hierarchical memory system, processor occupancy, and latency hiding. In conclusion, we present several guidelines that may be used effectively in CUDA-based parallel programming.

A Study on the Encrypted Scheme Using Key Management Method Based on the Random Number Rearrangement for the Effective E-Document Management (효율적인 전자문서 관리를 위한 난수 재배열 기반의 키 관리 방법을 이용한 암호화 기법에 관한 연구)

  • Kim, Tae-Wook;Sung, Kyung-Sang;Kim, Jung-Jae;Min, Byoung-Muk;Oh, Hae-Seok
    • The KIPS Transactions:PartC
    • /
    • v.16C no.5
    • /
    • pp.575-582
    • /
    • 2009
  • Despite all the merits of electronic documents, they face security threats such as illegal leakage, destruction, loss, and distortion. Techniques to protect electronic documents against illegal forgery, alteration, and removal are strongly needed. Although various security technologies have been developed for electronic documents, most of them focus on preventing forgery or repudiation. This paper presents some problems in the cryptographic technologies currently used in existing electronic document systems and offers efficient methods for adopting cryptographic algorithms to improve and secure those systems. To validate the performance of the proposed random rearrangement method against existing cryptographic schemes, basic elements were compared, and the proposed method was shown to give better results in both security and efficiency.

Symbolic computation and differential quadrature method - A boon to engineering analysis

  • Rajasekaran, S.
    • Structural Engineering and Mechanics
    • /
    • v.27 no.6
    • /
    • pp.713-739
    • /
    • 2007
  • Nowadays computers can perform symbolic computations in addition to the mere number-crunching operations for which they were originally designed. Symbolic computation opens up exciting possibilities in structural mechanics and engineering. Classical areas have been increasingly neglected due to the advent of computers and general-purpose finite element software. But now classical analysis has reemerged as an attractive computer option thanks to the capabilities of symbolic computation. The repetitive cycles of simultaneous-equation sets required by the finite element technique can be eliminated by solving a single set in symbolic form, thus generating a truly closed-form solution. This consequently saves data preparation, storage, and execution time. The power of symbolic computation is demonstrated by six examples, in which it is applied 1) to solve a coupled shear wall, 2) to generate beam element matrices, 3) to find the natural frequency of a shear frame using the transfer matrix method, 4) to find the stresses of a plate subjected to in-plane loading using Levy's approach, 5) to draw the influence surface for deflection of an isotropic plate simply supported on all sides, and 6) to obtain dynamic equilibrium equations from the Lagrange equation. This paper also presents another computationally efficient and accurate numerical method, the differential quadrature method, which is based on expressing the derivative of a function as a weighted linear sum of the function values at all the mesh points. This method is applied to solve the problems of 1) a coupled shear wall, 2) lateral buckling of thin-walled beams due to moment gradient, 3) buckling of a column, and 4) static and buckling analysis of circular plates of uniform or non-uniform thickness. The numerical results are compared with those available in the existing literature to verify their accuracy.
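
The core idea of differential quadrature, a derivative written as a weighted sum of function values at all mesh points, can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes the commonly used Lagrange-polynomial weights and a Chebyshev-Gauss-Lobatto grid on [0, 1], and the function names are hypothetical.

```python
import math


def dq_weights(x):
    """First-derivative weights A[i][j] so that f'(x_i) ~ sum_j A[i][j] * f(x_j)."""
    n = len(x)
    # M[i] is the product of (x_i - x_k) over all k != i.
    M = [math.prod(x[i] - x[k] for k in range(n) if k != i) for i in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                A[i][j] = M[i] / ((x[i] - x[j]) * M[j])
        # Each row sums to zero because the derivative of a constant is zero.
        A[i][i] = -sum(A[i][j] for j in range(n) if j != i)
    return A


def dq_derivative(A, f):
    """Apply the weight matrix to function values sampled on the grid."""
    return [sum(a * fj for a, fj in zip(row, f)) for row in A]


# Assumed grid: 7 Chebyshev-Gauss-Lobatto points mapped to [0, 1].
n = 7
x = [0.5 * (1 - math.cos(math.pi * i / (n - 1))) for i in range(n)]
A = dq_weights(x)
f = [xi ** 3 for xi in x]
df = dq_derivative(A, f)  # approximates 3 * x**2 at every grid point
```

With n grid points these weights differentiate polynomials of degree up to n - 1 exactly (up to rounding), which is why a small grid can already be accurate for smooth solutions.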

A Vertical Partitioning Algorithm based on Fuzzy Graph (퍼지 그래프 기반의 수직 분할 알고리즘)

  • Son, Jin-Hyun;Choi, Kyung-Hoon;Kim, Myoung-Ho
    • Journal of KIISE:Databases
    • /
    • v.28 no.3
    • /
    • pp.315-323
    • /
    • 2001
  • Vertical partitioning has so far been discussed with the objective of improving query execution performance and system throughput. It can be applied wherever the match between data and queries affects performance, including partitioning of individual files in centralized environments, data distribution in distributed databases, and dividing data among different levels of memory hierarchies. In general, a vertical partitioning algorithm should support n-ary partitioning as well as a globally optimal solution for the generation of all meaningful fragments. Most previous methods, however, are limited in supporting both efficiently. Because the vertical partitioning problem is inherently fuzzy, this fuzziness must be managed properly. In this paper, we propose an efficient vertical $\alpha$-partitioning algorithm based on fuzzy theory. The method can not only generate all meaningful fragments but also support n-ary partitioning without any complex mathematical computations.
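
One way to picture partitioning on a fuzzy graph is an alpha-cut: keep only the edges whose membership value is at least alpha and treat each connected group of attributes as a fragment. The Python sketch below illustrates that general idea only. It is an assumption, not the algorithm proposed in the paper, and the attribute names and affinity values are invented.

```python
def alpha_cut_fragments(attributes, affinity, alpha):
    """Group attributes linked by edges with membership >= alpha."""
    parent = {a: a for a in attributes}

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (a, b), membership in affinity.items():
        if membership >= alpha:
            parent[find(a)] = find(b)

    groups = {}
    for a in attributes:
        groups.setdefault(find(a), []).append(a)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical attributes and fuzzy affinities in [0, 1].
attributes = ["id", "name", "salary", "dept"]
affinity = {("id", "name"): 0.9, ("name", "salary"): 0.4, ("salary", "dept"): 0.8}

fragments = alpha_cut_fragments(attributes, affinity, alpha=0.5)
```

Raising alpha yields more, smaller fragments and lowering it merges them, so the number of fragments is not fixed at two, which is the sense in which such a scheme is n-ary.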

A Case Study of Drug Repositioning Simulation based on Distributed Supercomputing Technology (분산 슈퍼컴퓨팅 기술에 기반한 신약재창출 시뮬레이션 사례 연구)

  • Kim, Jik-Soo;Rho, Seungwoo;Lee, Minho;Kim, Seoyoung;Kim, Sangwan;Hwang, Soonwook
    • Journal of KIISE
    • /
    • v.42 no.1
    • /
    • pp.15-22
    • /
    • 2015
  • In this paper, we present a case study of a drug repositioning simulation based on distributed supercomputing technology that requires highly efficient processing of large-scale computations. Drug repositioning is the application of known drugs and compounds to new indications (i.e., new diseases), and this process requires efficient processing of a large number of docking tasks with relatively short per-task execution times. This workload shows the main characteristics of a Many-Task Computing (MTC) application. As a representative MTC case, we ran a drug repositioning simulation on our HTCaaS system, which can leverage distributed supercomputing infrastructure, and show that efficient task dispatching, dynamic resource allocation and load balancing, reliability, and seamless integration of multiple computing resources are crucial to supporting these challenging scientific applications.

Design of Fault-tolerant MA Migration Scheme based on Encrypted Checkpoints (암호화된 체크포인트를 이용한 결함 허용성을 가지는 이동 에이전트의 이주 기법 설계)

  • 김구수;엄영익
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.13 no.6
    • /
    • pp.77-84
    • /
    • 2003
  • A mobile agent is a program that represents a user in a network and is capable of migrating from one node to another, performing computations on behalf of the user. In this paper, we propose a scheme that can safely recover a mobile agent using the checkpoint saved at a previously visited platform and restart its execution from the point at which the mobile agent terminated abnormally. For security, the mobile agent uses its public key to encrypt the checkpoint, and the home platform uses the mobile agent's private key to decrypt the encrypted checkpoints at the recovery stage. When the home platform receives the checkpoint of the mobile agent, it verifies the correctness of the checkpoint by comparing the message digest generated at checkpoint creation time with the message digest generated at mobile agent recovery time.
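
The digest comparison used to verify a checkpoint can be sketched with Python's standard library. This is a simplified illustration under stated assumptions: SHA-256 and JSON serialization are choices made here, not details from the paper, the function names are hypothetical, and the public-key encryption step is omitted.

```python
import hashlib
import hmac
import json


def make_checkpoint(state):
    """Serialize agent state and compute its digest at checkpoint time."""
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    return payload, digest


def verify_checkpoint(payload, expected_digest):
    """Recompute the digest at recovery time and compare with the stored one."""
    actual = hashlib.sha256(payload).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(actual, expected_digest)


# Hypothetical agent state saved before migrating to the next node.
payload, digest = make_checkpoint({"visited": ["nodeA", "nodeB"], "step": 7})
ok = verify_checkpoint(payload, digest)
tampered_ok = verify_checkpoint(payload + b" ", digest)
```

A matching digest shows only that the checkpoint was not altered after the digest was taken; confidentiality in the scheme comes from the separate encryption step.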

Resolving Memory Bottlenecks in Hardware Accelerators with Data Prefetch

  • Hyein Lee;Jinoo Joung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.6
    • /
    • pp.1-12
    • /
    • 2024
  • Deep learning that delivers faster and more accurate results requires large amounts of storage and computation. Accordingly, many studies use hardware accelerators for fast and accurate calculations. However, data movement between the hardware accelerators and the CPU creates a performance bottleneck. In this paper, we propose a data prefetch strategy that can efficiently reduce this bottleneck. The core idea is to predict the data needed for the next task and load it into local memory while the hardware accelerator (Matrix Multiplication Unit, MMU) performs the current task. The strategy can be enhanced with a dual buffer that allows read and write operations to proceed simultaneously, which reduces the latency and execution time of data transfer. Through simulations, we demonstrate a 24% improvement in hardware accelerator performance, achieved by maximizing parallel processing with dual buffers and reducing bottlenecks between memories with data prefetch.
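
The overlap of computation and data transfer described above can be sketched in software. This Python example is only an analogy for the hardware scheme: `load_tile` and `compute` are hypothetical stand-ins for the memory transfer and the accelerator task, and a background thread plays the role of the second buffer.

```python
from concurrent.futures import ThreadPoolExecutor


def load_tile(tiles, index):
    """Stand-in for a slow transfer from main memory into local memory."""
    return list(tiles[index])


def compute(tile):
    """Stand-in for the accelerator task (here, a sum of squares)."""
    return sum(v * v for v in tile)


def run_with_prefetch(tiles):
    """Process tiles in order while the next tile loads in the background."""
    results = []
    if not tiles:
        return results
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load_tile, tiles, 0)
        for i in range(len(tiles)):
            current = pending.result()  # buffer holding the tile to compute on
            if i + 1 < len(tiles):
                # Start filling the other buffer before computing.
                pending = loader.submit(load_tile, tiles, i + 1)
            results.append(compute(current))
    return results


outputs = run_with_prefetch([[1, 2], [3], [4, 5]])
```

In this sketch the next tile is known in advance, so no prediction is needed; the benefit comes purely from hiding the load time behind the current computation.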

The Performance Analysis of GPU-based Cloth simulation according to the Change of Work Group Configuration (워크 그룹 구성 변화에 따른 GPU 기반 천 시뮬레이션의 성능 분석)

  • Choi, Young-Hwan;Hong, Min;Lee, Seung-Hyun;Choi, Yoo-Joo
    • Journal of Internet Computing and Services
    • /
    • v.18 no.3
    • /
    • pp.29-36
    • /
    • 2017
  • Today, 3D dynamic simulation is closely tied to many industries. In the past, physically based 3D simulation was used mainly in car crash and construction-related fields, but today it also plays an important role in movies and games. Many mathematical computations are needed to represent a 3D object realistically, but it is difficult for a CPU-based application to process the large number of calculations required for simulation in real time. Recently, with advanced graphics hardware and improved architectures, GPUs can be used for general-purpose computation as well as graphics computation, and GPU-based approaches have been applied in various research fields. In this paper, we analyze how the performance of two GPU-based cloth simulation algorithms varies with the execution properties of GPU shaders in order to optimize the performance of GPU-based cloth simulation. The cloth simulation is implemented with a spring-centric algorithm and a node-centric algorithm using GPU parallel computing with GLSL 4.3 compute shaders. We compare the performance of these algorithms as the size and dimension of the work group change. Each test is repeated 10 times over 5,000 frames, and the experimental results are reported as average FPS. The experimental results show that the node-centric algorithm runs faster than the spring-centric algorithm.