• Title/Summary/Keyword: GPU Computing

Search Result 228, Processing Time 0.026 seconds

GPU Based Feature Profile Simulation for Deep Contact Hole Etching in Fluorocarbon Plasma

  • Im, Yeon-Ho;Chang, Won-Seok;Choi, Kwang-Sung;Yu, Dong-Hun;Cho, Deog-Gyun;Yook, Yeong-Geun;Chun, Poo-Reum;Lee, Se-A;Kim, Jin-Tae;Kwon, Deuk-Chul;Yoon, Jung-Sik;Kim3, Dae-Woong;You, Shin-Jae
    • Proceedings of the Korean Vacuum Society Conference
    • /
    • 2012.08a
    • /
    • pp.80-81
    • /
    • 2012
  • Recently, one of the critical issues in the etching processes of the nanoscale devices is to achieve ultra-high aspect ratio contact (UHARC) profile without anomalous behaviors such as sidewall bowing, and twisting profile. To achieve this goal, the fluorocarbon plasmas with major advantage of the sidewall passivation have been used commonly with numerous additives to obtain the ideal etch profiles. However, they still suffer from formidable challenges such as tight limits of sidewall bowing and controlling the randomly distorted features in nanoscale etching profile. Furthermore, the absence of the available plasma simulation tools has made it difficult to develop revolutionary technologies to overcome these process limitations, including novel plasma chemistries, and plasma sources. As an effort to address these issues, we performed a fluorocarbon surface kinetic modeling based on the experimental plasma diagnostic data for silicon dioxide etching process under inductively coupled C4F6/Ar/O2 plasmas. For this work, the SiO2 etch rates were investigated with bulk plasma diagnostics tools such as Langmuir probe, cutoff probe and Quadruple Mass Spectrometer (QMS). The surface chemistries of the etched samples were measured by X-ray Photoelectron Spectrometer. To measure plasma parameters, the self-cleaned RF Langmuir probe was used for polymer deposition environment on the probe tip and double-checked by the cutoff probe which was known to be a precise plasma diagnostic tool for the electron density measurement. In addition, neutral and ion fluxes from bulk plasma were monitored with appearance methods using QMS signal. Based on these experimental data, we proposed a phenomenological, and realistic two-layer surface reaction model of SiO2 etch process under the overlying polymer passivation layer, considering material balance of deposition and etching through steady-state fluorocarbon layer. The predicted surface reaction modeling results showed good agreement with the experimental data. With the above studies of plasma surface reaction, we have developed a 3D topography simulator using the multi-layer level set algorithm and new memory saving technique, which is suitable in 3D UHARC etch simulation. Ballistic transports of neutral and ion species inside feature profile was considered by deterministic and Monte Carlo methods, respectively. In case of ultra-high aspect ratio contact hole etching, it is already well-known that the huge computational burden is required for realistic consideration of these ballistic transports. To address this issue, the related computational codes were efficiently parallelized for GPU (Graphic Processing Unit) computing, so that the total computation time could be improved more than few hundred times compared to the serial version. Finally, the 3D topography simulator was integrated with ballistic transport module and etch reaction model. Realistic etch-profile simulations with consideration of the sidewall polymer passivation layer were demonstrated.

  • PDF

3D feature profile simulation for nanoscale semiconductor plasma processing

  • Im, Yeon Ho
    • Proceedings of the Korean Vacuum Society Conference
    • /
    • 2015.08a
    • /
    • pp.61.1-61.1
    • /
    • 2015
  • Nanoscale semiconductor plasma processing has become one of the most challenging issues due to the limits of physicochemical fabrication routes with its inherent complexity. The mission of future and emerging plasma processing for development of next generation semiconductor processing is to achieve the ideal nanostructures without abnormal profiles and damages, such as 3D NAND cell array with ultra-high aspect ratio, cylinder capacitors, shallow trench isolation, and 3D logic devices. In spite of significant contributions of research frontiers, these processes are still unveiled due to their inherent complexity of physicochemical behaviors, and gaps in academic research prevent their predictable simulation. To overcome these issues, a Korean plasma consortium began in 2009 with the principal aim to develop a realistic and ultrafast 3D topography simulator of semiconductor plasma processing coupled with zero-D bulk plasma models. In this work, aspects of this computational tool are introduced. The simulator was composed of a multiple 3D level-set based moving algorithm, zero-D bulk plasma module including pulsed plasma processing, a 3D ballistic transport module, and a surface reaction module. The main rate coefficients in bulk and surface reaction models were extracted by molecular simulations or fitting experimental data from several diagnostic tools in an inductively coupled fluorocarbon plasma system. Furthermore, it is well known that realistic ballistic transport is a simulation bottleneck due to the brute-force computation required. In this work, effective parallel computing using graphics processing units was applied to improve the computational performance drastically, so that computer-aided design of these processes is possible due to drastically reduced computational time. Finally, it is demonstrated that 3D feature profile simulations coupled with bulk plasma models can lead to better understanding of abnormal behaviors, such as necking, bowing, etch stops and twisting during high aspect ratio contact hole etch.

  • PDF

Random Partial Haar Wavelet Transformation for Single Instruction Multiple Threads (단일 명령 다중 스레드 병렬 플랫폼을 위한 무작위 부분적 Haar 웨이블릿 변환)

  • Park, Taejung
    • Journal of Digital Contents Society
    • /
    • v.16 no.5
    • /
    • pp.805-813
    • /
    • 2015
  • Many researchers expect the compressive sensing and sparse recovery problem can overcome the limitation of conventional digital techniques. However, these new approaches require to solve the l1 norm optimization problems when it comes to signal reconstruction. In the signal reconstruction process, the transform computation by multiplication of a random matrix and a vector consumes considerable computing power. To address this issue, parallel processing is applied to the optimization problems. In particular, due to huge size of original signal, it is hard to store the random matrix directly in memory, which makes one need to design a procedural approach in handling the random matrix. This paper presents a new parallel algorithm to calculate random partial Haar wavelet transform based on Single Instruction Multiple Threads (SIMT) platform.

Exotic Weeds Classification : Hierarchical Approach with Convolutional Neural Network (외래잡초 분류 : 합성곱 신경망 기반 계층적 구조)

  • Yu, Gwanghyun;Lee, Jaewon;Trong, Vo Hoang;Vu, Dang Thanh;Nguyen, Huy Toan;Lee, JooHwan;Shin, Dosung;Kim, Jinyoung
    • The Journal of Korean Institute of Information Technology
    • /
    • v.17 no.12
    • /
    • pp.81-92
    • /
    • 2019
  • Weeds are a major object which is very harmful to crops. To remove the weeds effectively, we have to classify them accurately and use herbicides. As computing technology has developed, image-based machine learning methods have been studied in this field, specially convolutional neural network(CNN) based models have shown good performance in public image dataset. However, CNN with numerous training parameters and high computational amount. Thus, it works under high hardware condition of expensive GPUs in real application. To solve these problems, in this paper, a hierarchical architecture based deep-learning model is proposed. The experimental results show that the proposed model successfully classify 21 species of the exotic weeds. That is, the model achieve 97.2612% accuracy with a small number of parameters. Our proposed model with a few parameters is expected to be applicable to actual application of network based classification services.

Compression of DNN Integer Weight using Video Encoder (비디오 인코더를 통한 딥러닝 모델의 정수 가중치 압축)

  • Kim, Seunghwan;Ryu, Eun-Seok
    • Journal of Broadcast Engineering
    • /
    • v.26 no.6
    • /
    • pp.778-789
    • /
    • 2021
  • Recently, various lightweight methods for using Convolutional Neural Network(CNN) models in mobile devices have emerged. Weight quantization, which lowers bit precision of weights, is a lightweight method that enables a model to be used through integer calculation in a mobile environment where GPU acceleration is unable. Weight quantization has already been used in various models as a lightweight method to reduce computational complexity and model size with a small loss of accuracy. Considering the size of memory and computing speed as well as the storage size of the device and the limited network environment, this paper proposes a method of compressing integer weights after quantization using a video codec as a method. To verify the performance of the proposed method, experiments were conducted on VGG16, Resnet50, and Resnet18 models trained with ImageNet and Places365 datasets. As a result, loss of accuracy less than 2% and high compression efficiency were achieved in various models. In addition, as a result of comparison with similar compression methods, it was verified that the compression efficiency was more than doubled.

Parallel Implementations of Digital Focus Indices Based on Minimax Search Using Multi-Core Processors

  • HyungTae, Kim;Duk-Yeon, Lee;Dongwoon, Choi;Jaehyeon, Kang;Dong-Wook, Lee
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.2
    • /
    • pp.542-558
    • /
    • 2023
  • A digital focus index (DFI) is a value used to determine image focus in scientific apparatus and smart devices. Automatic focus (AF) is an iterative and time-consuming procedure; however, its processing time can be reduced using a general processing unit (GPU) and a multi-core processor (MCP). In this study, parallel architectures of a minimax search algorithm (MSA) are applied to two DFIs: range algorithm (RA) and image contrast (CT). The DFIs are based on a histogram; however, the parallel computation of the histogram is conventionally inefficient because of the bank conflict in shared memory. The parallel architectures of RA and CT are constructed using parallel reduction for MSA, which is performed through parallel relative rating of the image pixel pairs and halved the rating in every step. The array size is then decreased to one, and the minimax is determined at the final reduction. Kernels for the architectures are constructed using open source software to make it relatively platform independent. The kernels are tested in a hexa-core PC and an embedded device using Lenna images of various sizes based on the resolutions of industrial cameras. The performance of the kernels for the DFIs was investigated in terms of processing speed and computational acceleration; the maximum acceleration was 32.6× in the best case and the MCP exhibited a higher performance.

Development and Validation of the GPU-based 3D Dynamic Analysis Code for Simulating Rock Fracturing Subjected to Impact Loading (충격 하중 시 암석의 파괴거동해석을 위한 GPGPU 기반 3차원 동적해석기법의 개발과 검증 연구)

  • Min, Gyeong-Jo;Fukuda, Daisuke;Oh, Se-Wook;Cho, Sang-Ho
    • Explosives and Blasting
    • /
    • v.39 no.2
    • /
    • pp.1-14
    • /
    • 2021
  • Recently, with the development of high-performance processing devices such as GPGPU, a three-dimensional dynamic analysis technique that can replace expensive rock material impact tests has been actively developed in the defense and aerospace fields. Experimentally observing or measuring fracture processes occurring in rocks subjected to high impact loads, such as blasting and earth penetration of small-diameter missiles, are difficult due to the inhomogeneity and opacity of rock materials. In this study, a three-dimensional dynamic fracture process analysis technique (3D-DFPA) was developed to simulate the fracture behavior of rocks due to impact. In order to improve the operation speed, an algorithm capable of GPGPU operation was developed for explicit analysis and contact element search. To verify the proposed dynamic fracture process analysis technique, the dynamic fracture toughness tests of the Straight Notched Disk Bending (SNDB) limestone samples were simulated and the propagation of the reflection and transmission of the stress waves at the rock-impact bar interfaces and the fracture process of the rock samples were compared. The dynamic load tests for the SNDB sample applied a Pulse Shape controlled Split Hopkinson presure bar (PS-SHPB) that can control the waveform of the incident stress wave, the stress state, and the fracture process of the rock models were analyzed with experimental results.

An Accelerated Approach to Dose Distribution Calculation in Inverse Treatment Planning for Brachytherapy (근접 치료에서 역방향 치료 계획의 선량분포 계산 가속화 방법)

  • Byungdu Jo
    • Journal of the Korean Society of Radiology
    • /
    • v.17 no.5
    • /
    • pp.633-640
    • /
    • 2023
  • With the recent development of static and dynamic modulated brachytherapy methods in brachytherapy, which use radiation shielding to modulate the dose distribution to deliver the dose, the amount of parameters and data required for dose calculation in inverse treatment planning and treatment plan optimization algorithms suitable for new directional beam intensity modulated brachytherapy is increasing. Although intensity-modulated brachytherapy enables accurate dose delivery of radiation, the increased amount of parameters and data increases the elapsed time required for dose calculation. In this study, a GPU-based CUDA-accelerated dose calculation algorithm was constructed to reduce the increase in dose calculation elapsed time. The acceleration of the calculation process was achieved by parallelizing the calculation of the system matrix of the volume of interest and the dose calculation. The developed algorithms were all performed in the same computing environment with an Intel (3.7 GHz, 6-core) CPU and a single NVIDIA GTX 1080ti graphics card, and the dose calculation time was evaluated by measuring only the dose calculation time, excluding the additional time required for loading data from disk and preprocessing operations. The results showed that the accelerated algorithm reduced the dose calculation time by about 30 times compared to the CPU-only calculation. The accelerated dose calculation algorithm can be expected to speed up treatment planning when new treatment plans need to be created to account for daily variations in applicator movement, such as in adaptive radiotherapy, or when dose calculation needs to account for changing parameters, such as in dynamically modulated brachytherapy.