• Title/Summary/Keyword: multi-core processing

Search Result 220, Processing Time 0.031 seconds

Development of Double Rotation C-Scanning System and Program for Under-Sodium Viewing of Sodium-Cooled Fast Reactor (소듐냉각고속로 소듐 내부 가시화를 위한 이중회전구동 C-스캔 시스템 및 프로그램 개발)

  • Joo, Young-Sang;Bae, Jin-Ho;Park, Chang-Gyu;Lee, Jae-Han;Kim, Jong-Bum
    • Journal of the Korean Society for Nondestructive Testing
    • /
    • v.30 no.4
    • /
    • pp.338-344
    • /
    • 2010
  • A double rotation C-scanning system and a software program Under-Sodium MultiVIEW have been developed for the under-sodium viewing of a reactor core and in-vessel structures of a sodium-cooled fast reactor KALIMER-600. Double rotation C-scanning system has been designed and manufactured by the reproduction of double rotation plug of a reactor head in KALIMER-600. Hardware system which consists of a double rotating scanner, ultrasonic waveguide sensors, a high power ultrasonic pulser-receiver, a scanner driving module and a multi channel A/D board have been constructed. The functions of scanner control, image mapping and signal processing of Under-Sodium MultiVIEW program have been implemented by using a LabVIEW graphical programming language. The performance of Under-Sodium MultiVIEW program was verified by a double rotation C-scanning test in water.

Memory Management based Hybrid Transactional Memory Scheme for Efficiently Processing Transactions in Multi-core Environment (멀티코어 환경에서 효율적인 트랜잭션 처리를 위한 메모리 관리 기반 하이브리드 트랜잭셔널 메모리 기법)

  • Jang, Yeon-Woo;Kang, Moon-Hwan;Chang, Jae-Woo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2017.04a
    • /
    • pp.795-798
    • /
    • 2017
  • 최근 멀티코어 프로세서가 개발됨에 따라 병렬 프로그래밍은 멀티코어를 효과적으로 활용하기 위한 기법으로 그 중요성이 높아지고 있다. 트랜잭셔널 메모리는 처리 방식에 따라 HTM, STM, HyTM으로 구분되며, 최근 HTM 및 STM 결합한 HyTM 이 활발히 연구되고 있다. 그러나 기존의 HyTM 는 HTM과 STM의 동시성 제어를 위해 블룸필터를 사용하는 반면, 블룸필터의 자체적인 긍정 오류를 해결하지 못한다. 아울러, 트랜잭션 처리를 위한 메모리 할당/해제를 기존의 락 메커니즘을 사용하여 관리한다. 따라서 멀티코어 환경에서 스레드 수가 증가할수록 트랜잭션 처리 효율이 떨어진다. 본 논문에서는 멀티코어 환경에서 효율적인 트랜잭션 처리를 위한 메모리 관리 기반 하이브리드 트랜잭셔널 메모리 기법을 제안한다. 제안하는 기법은 트랜잭션 처리에 최적화된 블룸필터를 제공함으로써, 병렬적으로 동시에 수행되는 서로 다른 환경의 트랜잭션에 대해 일관성 있는 처리를 지원한다. 아울러, CPU 캐시라인에 최적화된 메모리 기법을 통해, 메모리 할당량이 적은 트랜잭션은 로컬 캐시에 할당함으로써 트랜잭션의 빠른 처리를 지원한다.

AN EFFICIENT INCOMPRESSIBLE FREE SURFACE FLOW SIMULATION USING GPU (GPU를 이용한 효율적인 비압축성 자유표면유동 해석)

  • Hong, H.E.;Ahn, H.T.;Myung, H.J.
    • Journal of computational fluids engineering
    • /
    • v.17 no.2
    • /
    • pp.35-41
    • /
    • 2012
  • This paper presents incompressible Navier-Stokes solution algorithm for 2D Free-surface flow problems on the Cartesian mesh, which was implemented to run on Graphics Processing Units(GPU). The INS solver utilizes the variable arrangement on the Cartesian mesh, Finite Volume discretization along Constrained Interpolation Profile-Conservative Semi-Lagrangian(CIP-CSL). Solution procedure of incompressible Navier-Stokes equations for free-surface flow takes considerable amount of computation time and memory space even in modern multi-core computing architecture based on Central Processing Units(CPUs). By the recent development of computer architecture technology, Graphics Processing Unit(GPU)'s scientific computing performance outperforms that of CPU's. This paper focus on the utilization of GPU's high performance computing capability, and presents an efficient solution algorithm for free surface flow simulation. The performance of the GPU implementations with double precision accuracy is compared to that of the CPU code using an representative free-surface flow problem, namely. dam-break problem.

A study on game physics engine focused on real time physics (물리 엔진에 관한 고찰 : 실시간 물리 기술을 중심으로)

  • Ha, You-Jong;Park, Kyoung-Ju
    • Journal of Korea Game Society
    • /
    • v.9 no.5
    • /
    • pp.43-52
    • /
    • 2009
  • This paper analyzes the four game physics engines in terms of real time techniques. Real time physics is the technology that simplifies the physics-based simulation to apply for the real time applications such as game. Our study includes two commercial physics engines, Havok's Physics SDK and NVIDIA's PhysX SDK, and two open source projects, Open Dynamics Engine and Bullet physics engine. As a result, most of them covers rigid body dynamics and some include either deformable body simulation or fluids simulation, or both. For real time simulation, they adopt the simplified numerical methods, the effective in collision detection/response, and also use the parallel processing hardwares, i.e., multi core CPU, Physics processing unit(PPU), or graphics processing unit(GPU).

  • PDF

A Performance Evaluation on Classic Mutual Exclusion Algorithms for Exploring Feasibility of Practical Application (실제 적용 타당성 탐색을 위한 고전적 상호배제 알고리즘 성능 평가)

  • Lee, Hyung-Bong;Kwon, Ki-Hyeon
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.6 no.12
    • /
    • pp.469-478
    • /
    • 2017
  • The mutual exclusion is originally based on the theory of race condition prevention in symmetric multi-processor operating systems. But recently, due to the generalization of multi-core processors, its application range has been rapidly shifted to parallel processing application domain. POSIX thread, WIN32 thread, and Java thread, which are typical parallel processing application development environments, provide a unique mutual exclusion mechanism for each of them. Applications that are very sensitive to performance in these environments may want to reduce the burden of mutual exclusion, even at some cost, such as inconvenience of coding. In this study, we implement Dekker's and Peterson's algorithm in the form of busy-wait and processor-yield in various platforms, and compare the performance of them with the built-in mutual exclusion mechanisms to evaluate the usability of the classic algorithms. The analysis result shows that Dekker's algorithm of processor-yield type is superior to the built-in mechanisms in POSIX and WIN32 thread environments at least 2 times and up to 70 times, and confirms that the practicality of the algorithm is sufficient.

A Low Memory Bandwidth Motion Estimation Core for H.264/AVC Encoder Based on Parallel Current MB Processing (병렬처리 기반의 H.264/AVC 인코더를 위한 저 메모리 대역폭 움직임 예측 코어설계)

  • Kim, Shi-Hye;Choi, Jun-Rim
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.2
    • /
    • pp.28-34
    • /
    • 2011
  • In this paper, we present integer and fractional motion estimation IP for H.264/AVC encoder by hardware-oriented algorithm. In integer motion engine, the reference block is used to share for consecutive current macro blocks in parallel processing which exploits data reusability and reduces off-chip bandwidth. In fractional motion engine, instead of two-step sequential refinement, half and quarter pel are processed in parallel manner in order to discard unnecessary candidate positions and double throughput. The H.264/AVC motion estimation chip is fabricated on a MPW(Multi-Project Wafer) chip using the chartered $0.18{\mu}m$ standard CMOS 1P5M technology and achieves high throughput supporting HDTV 720p 30 fps.

Analysis of several VERA benchmark problems with the photon transport capability of STREAM

  • Mai, Nhan Nguyen Trong;Kim, Kyeongwon;Lemaire, Matthieu;Nguyen, Tung Dong Cao;Lee, Woonghee;Lee, Deokjung
    • Nuclear Engineering and Technology
    • /
    • v.54 no.7
    • /
    • pp.2670-2689
    • /
    • 2022
  • STREAM - a lattice transport calculation code with method of characteristics for the purpose of light water reactor analysis - has been developed by the Computational Reactor Physics and Experiment laboratory (CORE) of the Ulsan National Institute of Science and Technology (UNIST). Recently, efforts have been taken to develop a photon module in STREAM to assess photon heating and the influence of gamma photon transport on power distributions, as only neutron transport was considered in previous STREAM versions. A multi-group photon library is produced for STREAM based on the ENDF/B-VII.1 library with the use of the library-processing code NJOY. The developed photon solver for the computation of 2D and 3D distributions of photon flux and energy deposition is based on the method of characteristics like the neutron solver. The photon library and photon module produced and implemented for STREAM are verified on VERA pin and assembly problems by comparison with the Monte Carlo code MCS - also developed at UNIST. A short analysis of the impact of photon transport during depletion and thermal hydraulics feedback is presented for a 2D core also from the VERA benchmark.

A Study on Machine Learning Compiler and Modulo Scheduler (머신러닝 컴파일러와 모듈로 스케쥴러에 관한 연구)

  • Doosan Cho
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.27 no.1
    • /
    • pp.87-95
    • /
    • 2024
  • This study is on modulo scheduling algorithms for multicore processor in machine learning applications. Machine learning algorithms are designed to perform a large amount of operations such as vectors and matrices in order to quickly process large amounts of data stream. To support such large amounts of computations, processor architectures to support applications such as artificial intelligence, neural networks, and machine learning are designed in the form of parallel processing such as multicore. To effectively utilize these multi-core hardware resources, various compiler techniques are being used and studied. In this study, among these compiler techniques, we analyzed the modular scheduler, which is especially important in one core's computation pipeline. This paper looked at and compared the iterative modular scheduler and the swing modular scheduler, which are the most widely used and studied. As a result, both schedulers provided similar performance results, and when measuring register pressure as an indicator, it was confirmed that the swing modulo scheduler provided slightly better performance. In this study, a technique that divides recurrence edge is proposed to improve the minimum initiation interval of the modulo schedulers.

Analysis on the Thermal Efficiency of Branch Prediction Techniques in 3D Multicore Processors (3차원 구조 멀티코어 프로세서의 분기 예측 기법에 관한 온도 효율성 분석)

  • Ahn, Jin-Woo;Choi, Hong-Jun;Kim, Jong-Myon;Kim, Cheol-Hong
    • The KIPS Transactions:PartA
    • /
    • v.19A no.2
    • /
    • pp.77-84
    • /
    • 2012
  • Speculative execution for improving instruction-level parallelism is widely used in high-performance processors. In the speculative execution technique, the most important factor is the accuracy of branch predictor. Unfortunately, complex branch predictors for improving the accuracy can cause serious thermal problems in 3D multicore processors. Thermal problems have negative impact on the processor performance. This paper analyzes two methods to solve the thermal problems in the branch predictor of 3D multi-core processors. First method is dynamic thermal management which turns off the execution of the branch predictor when the temperature of the branch predictor exceeds the threshold. Second method is thermal-aware branch predictor placement policy by considering each layer's temperature in 3D multi-core processors. According to our evaluation, the branch predictor placement policy shows that average temperature is $87.69^{\circ}C$, and average maximum temperature gradient is $11.17^{\circ}C$. And, dynamic thermal management shows that average temperature is $89.64^{\circ}C$ and average maximum temperature gradient is $17.62^{\circ}C$. Proposed branch predictor placement policy has superior thermal efficiency than the dynamic thermal management. In the perspective of performance, the proposed branch predictor placement policy degrades the performance by 3.61%, while the dynamic thermal management degrades the performance by 27.66%.

The development of parallel computation method for the fire-driven-flow in the subway station (도시철도역사에서 화재유동에 대한 병렬계산방법연구)

  • Jang, Yong-Jun;Lee, Chang-Hyun;Kim, Hag-Beom;Park, Won-Hee
    • Proceedings of the KSR Conference
    • /
    • 2008.06a
    • /
    • pp.1809-1815
    • /
    • 2008
  • This experiment simulated the fire driven flow of an underground station through parallel processing method. Fire analysis program FDS(Fire Dynamics Simulation), using LES(Large Eddy Simulation), has been used and a 6-node parallel cluster, each node with 3.0Ghz_2set installed, has been used for parallel computation. Simulation model was based on the Kwangju-geumnan subway station. Underground station, and the total time for simulation was set at 600s. First, the whole underground passage was divided to 1-Mesh and 8-Mesh in order to compare the parallel computation of a single CPU and Multi-CPU. With matrix numbers($15{\times}10^6$) more than what a single CPU can handle, fire driven flow from the center of the platform and the subway itself was analyzed. As a result, there seemed to be almost no difference between the single CPU's result and the Multi-CPU's ones. $3{\times}10^6$ grid point one employed to test the computing time with 2CPU and 7CPU computation were computable two times and fire times faster than 1CPU respectively. In this study it was confirmed that CPU could be overcome by using parallel computation.

  • PDF