• Title/Summary/Keyword: gpu


Development and Validation of the GPU-based 3D Dynamic Analysis Code for Simulating Rock Fracturing Subjected to Impact Loading (충격 하중 시 암석의 파괴거동해석을 위한 GPGPU 기반 3차원 동적해석기법의 개발과 검증 연구)

  • Min, Gyeong-Jo; Fukuda, Daisuke; Oh, Se-Wook; Cho, Sang-Ho
    • Explosives and Blasting, v.39 no.2, pp.1-14, 2021
  • Recently, with the development of high-performance processing devices such as GPGPUs, three-dimensional dynamic analysis techniques that can replace expensive rock impact tests have been actively developed in the defense and aerospace fields. Experimentally observing or measuring the fracture processes that occur in rocks subjected to high impact loads, such as blasting and earth penetration by small-diameter missiles, is difficult due to the inhomogeneity and opacity of rock materials. In this study, a three-dimensional dynamic fracture process analysis technique (3D-DFPA) was developed to simulate the fracture behavior of rock under impact. To improve computation speed, a GPGPU-capable algorithm was developed for the explicit analysis and the contact element search. To verify the proposed technique, dynamic fracture toughness tests of Straight Notched Disk Bending (SNDB) limestone samples were simulated, and the reflection and transmission of stress waves at the rock-impact bar interfaces and the fracture process of the rock samples were compared. The dynamic load tests on the SNDB samples were simulated with a pulse-shape-controlled Split Hopkinson Pressure Bar (PS-SHPB), which can control the waveform of the incident stress wave, and the stress states and fracture processes of the rock models were compared with the experimental results.
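The abstract describes parallelizing the explicit analysis step on a GPGPU. Below is a minimal, hypothetical sketch (Python with Numba CUDA, not the authors' 3D-DFPA code) of the kind of node-per-thread explicit time-integration update such a solver parallelizes; the array names, sizes, and time step are illustrative assumptions, and a CUDA-capable GPU is required.

```python
# Minimal sketch (not the authors' 3D-DFPA code): a node-per-thread explicit
# central-difference update, the kind of inner loop the abstract describes
# parallelizing with GPGPU. Values and sizes are illustrative.
import numpy as np
from numba import cuda

@cuda.jit
def explicit_update(force, mass, vel, disp, dt):
    i = cuda.grid(1)                      # one GPU thread per nodal DOF
    if i < force.size:
        acc = force[i] / mass[i]          # a = f / m
        vel[i] += acc * dt                # v <- v + a*dt
        disp[i] += vel[i] * dt            # u <- u + v*dt

n_dof = 3 * 100_000                       # e.g. 100k nodes x 3 components
force = cuda.to_device(np.random.rand(n_dof).astype(np.float32))
mass  = cuda.to_device(np.ones(n_dof, dtype=np.float32))
vel   = cuda.to_device(np.zeros(n_dof, dtype=np.float32))
disp  = cuda.to_device(np.zeros(n_dof, dtype=np.float32))

threads = 256
blocks = (n_dof + threads - 1) // threads
explicit_update[blocks, threads](force, mass, vel, disp, np.float32(1e-7))
```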

Impacts of Seasonal and Interannual Variabilities of Sea Surface Temperature on its Short-term Deep-learning Prediction Model Around the Southern Coast of Korea (한국 남부 해역 SST의 계절 및 경년 변동이 단기 딥러닝 모델의 SST 예측에 미치는 영향)

  • Ju, Ho-Jeong; Chae, Jeong-Yeob; Lee, Eun-Joo; Kim, Young-Taeg; Park, Jae-Hun
    • The Sea: Journal of the Korean Society of Oceanography, v.27 no.2, pp.49-70, 2022
  • Sea Surface Temperature (SST), one of the key ocean variables, has a significant impact on climate, marine ecosystems, and human activities, so SST prediction has always been an important issue. Recently, deep learning has drawn much attention, since it can predict SST by training on past SST patterns. Compared to numerical simulations, a deep learning model is highly efficient, since it can estimate nonlinear relationships within the input data. With the recent development of the Graphics Processing Unit (GPU), large amounts of data can be processed repeatedly and rapidly. In this study, short-term SST is predicted with a Convolutional Neural Network (CNN)-based U-Net that can handle spatiotemporal data concurrently and overcome the drawbacks of previously existing deep learning-based models. The SST prediction performance depends on the seasonal and interannual SST variabilities around the southern coast of Korea: the predicted SST shows a wide range of variance during spring and summer and a small range during fall and winter, and the wide range of variance also has a significant correlation with changes in the Pacific Decadal Oscillation (PDO) index. These results are found to be affected by the intensity of the seasonal and PDO-related interannual SST fronts and their variations along the southern Korean seas. This study implies that the SST prediction performance of the developed deep learning model can vary significantly with the seasonal and interannual variabilities in SST.
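As a rough illustration of the approach described in the abstract, the following is a minimal U-Net-style encoder-decoder sketch in Keras that maps a stack of past SST maps to a predicted map. It is not the authors' architecture; the input size, depth, and filter counts are illustrative assumptions.

```python
# A tiny U-Net-style model: past daily SST maps stacked as channels in,
# one predicted SST map out. Depth and sizes are illustrative.
from tensorflow import keras
from tensorflow.keras import layers

def tiny_sst_unet(time_steps=7, h=64, w=64):
    inp = layers.Input(shape=(h, w, time_steps))          # past SST maps as channels
    c1 = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(64, 3, padding="same", activation="relu")(p1)
    u1 = layers.UpSampling2D()(c2)
    m1 = layers.Concatenate()([u1, c1])                    # U-Net skip connection
    c3 = layers.Conv2D(32, 3, padding="same", activation="relu")(m1)
    out = layers.Conv2D(1, 1, activation="linear")(c3)     # predicted SST map
    return keras.Model(inp, out)

model = tiny_sst_unet()
model.compile(optimizer="adam", loss="mse")
model.summary()
```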

Real-time Steel Surface Defects Detection Application based on Yolov4 Model and Transfer Learning (Yolov4와 전이학습을 기반으로한 실시간 철강 표면 결함 검출 연구)

  • Bok-Kyeong Kim; Jun-Hee Bae; Nguyen Viet Hoan; Yong-Eun Lee; Young Seok Ock
    • The Journal of Bigdata, v.7 no.2, pp.31-41, 2022
  • Steel is one of the most fundamental materials in the mechanical industry, but product quality is greatly affected by surface defects in the steel. Researchers therefore pay attention to the need for surface defect detectors, and deep learning methods are the current trend in object detection. There are still limitations and room for improvement; for example, related works focus on developing models but do not take real-time deployment, with its practical implications for industrial settings, into account. In this paper, a real-time application for steel surface defect detection based on YOLOv4 is proposed. First, since the aim of this work is to deploy the model in a real-time application, we studied related works in this field, particularly focusing on one-stage detectors and the YOLO family, one of the most widely used algorithms for real-time object detection. Second, using pre-trained YOLOv4 Darknet models and transfer learning, we trained and tested on NEU-DET, an open-source hot-rolled steel defect dataset, applying our system to four typical types of steel surface defects: patches, pitted surface, inclusion, and scratches. Third, we evaluated the real-time performance of the trained YOLOv4 model for deployment, achieving 87.1% mAP@0.5 and over 60 fps with GPU processing.
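For a sense of how such a trained YOLOv4 Darknet model can be deployed for real-time detection, here is a hedged sketch using OpenCV's DNN module. This is not the authors' deployment code: the file names, class list, and camera source are illustrative assumptions, and a CUDA-enabled OpenCV build is assumed for the GPU backend.

```python
# Hypothetical real-time defect detection loop with a Darknet-format YOLOv4
# model loaded through OpenCV DNN. Weights/config names are placeholders.
import cv2

CLASSES = ["patches", "pitted_surface", "inclusion", "scratches"]

net = cv2.dnn.readNetFromDarknet("yolov4-neu.cfg", "yolov4-neu.weights")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)   # run inference on the GPU
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

cap = cv2.VideoCapture(0)                            # camera over the steel line
while True:
    ok, frame = cap.read()
    if not ok:
        break
    class_ids, scores, boxes = model.detect(frame, 0.5, 0.4)  # conf, NMS thresholds
    for cid, score, box in zip(class_ids, scores, boxes):
        x, y, w, h = box
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, f"{CLASSES[int(cid)]}: {float(score):.2f}",
                    (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cv2.imshow("defects", frame)
    if cv2.waitKey(1) == 27:                          # Esc to quit
        break
```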

An Accelerated Approach to Dose Distribution Calculation in Inverse Treatment Planning for Brachytherapy (근접 치료에서 역방향 치료 계획의 선량분포 계산 가속화 방법)

  • Byungdu Jo
    • Journal of the Korean Society of Radiology, v.17 no.5, pp.633-640, 2023
  • With the recent development of static and dynamic intensity-modulated brachytherapy methods, which use radiation shielding to modulate the delivered dose distribution, the number of parameters and the amount of data required for dose calculation in inverse treatment planning and in treatment plan optimization algorithms suited to the new directional, intensity-modulated brachytherapy are increasing. Intensity-modulated brachytherapy enables accurate dose delivery, but the increased parameters and data also increase the elapsed time required for dose calculation. In this study, a GPU-based, CUDA-accelerated dose calculation algorithm was constructed to limit this increase. The calculation was accelerated by parallelizing both the construction of the system matrix of the volume of interest and the dose calculation itself. All algorithms were run in the same computing environment, with an Intel CPU (3.7 GHz, 6 cores) and a single NVIDIA GTX 1080 Ti graphics card, and only the dose calculation time was measured, excluding the additional time required for loading data from disk and for preprocessing. The accelerated algorithm reduced the dose calculation time by a factor of about 30 compared to the CPU-only calculation. The accelerated dose calculation algorithm can be expected to speed up treatment planning when new plans must be created to account for daily variations in applicator position, as in adaptive radiotherapy, or when the dose calculation must account for changing parameters, as in dynamically modulated brachytherapy.
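The core idea, computing the system matrix and the dose for all voxel-dwell pairs in parallel on the GPU, can be sketched with CuPy as below. This is not the author's CUDA implementation: the inverse-square kernel stands in for the full TG-43 dose-rate formalism, and the grid size, dwell positions, and dwell times are illustrative.

```python
# Minimal CuPy sketch of the parallelization the abstract describes: build the
# source-to-voxel system matrix and the dose for all pairs at once on the GPU.
import numpy as np
import cupy as cp

n_voxels, n_dwells = 256 * 256, 200
voxels = cp.asarray(np.random.rand(n_voxels, 3).astype(np.float32) * 10.0)  # cm
dwells = cp.asarray(np.random.rand(n_dwells, 3).astype(np.float32) * 10.0)  # cm
times  = cp.asarray(np.random.rand(n_dwells).astype(np.float32))            # s

# System matrix A: every (voxel, dwell) entry is computed in parallel on the GPU.
r = cp.linalg.norm(voxels[:, None, :] - dwells[None, :, :], axis=2)   # (V, D)
A = 1.0 / cp.maximum(r, 0.05) ** 2                                    # clamp r near 0

dose = A @ times                                                      # dose per voxel
print(float(dose.max()))
```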

Transfer Learning using Multiple ConvNet Layers Activation Features with Principal Component Analysis for Image Classification (전이학습 기반 다중 컨볼류션 신경망 레이어의 활성화 특징과 주성분 분석을 이용한 이미지 분류 방법)

  • Byambajav, Batkhuu; Alikhanov, Jumabek; Fang, Yang; Ko, Seunghyun; Jo, Geun Sik
    • Journal of Intelligence and Information Systems, v.24 no.1, pp.205-225, 2018
  • The Convolutional Neural Network (ConvNet) is one class of powerful deep neural networks that can analyze and learn hierarchies of visual features. The first such neural network (the Neocognitron) was introduced in the 1980s, but at that time neural networks were not broadly used in industry or academia because of the shortage of large-scale datasets and low computational power. A few decades later, in 2012, Krizhevsky made a breakthrough in the ILSVRC-12 visual recognition competition using a Convolutional Neural Network, and that breakthrough revived interest in neural networks. The success of Convolutional Neural Networks rests on two main factors: the emergence of advanced hardware (GPUs) for sufficient parallel computation, and the availability of large-scale datasets such as ImageNet (ILSVRC) for training. Unfortunately, many new domains are bottlenecked by these factors. For most domains it is difficult, and requires considerable effort, to gather a large-scale dataset to train a ConvNet; moreover, even with a large-scale dataset, training a ConvNet from scratch requires expensive resources and is time-consuming. These two obstacles can be addressed with transfer learning, a method for transferring knowledge from a source domain to a new domain. There are two major transfer learning cases: using the ConvNet as a fixed feature extractor, and fine-tuning the ConvNet on a new dataset. In the first case, a pre-trained ConvNet (for example, trained on ImageNet) computes feed-forward activations of an image, and activation features are extracted from specific layers. In the second case, the ConvNet classifier is replaced and retrained on the new dataset, and the weights of the pre-trained network are then fine-tuned with backpropagation. In this paper, we focus only on using multiple ConvNet layers as a fixed feature extractor. However, applying high-dimensional features extracted directly from multiple ConvNet layers is still a challenging problem. We observe that features extracted from multiple ConvNet layers capture different characteristics of the image, which means a better representation can be obtained by finding the optimal combination of layers. Based on that observation, we propose to employ multiple ConvNet layer representations for transfer learning instead of a single-layer representation. Overall, our pipeline has three steps. First, images from the target task are fed forward through the pre-trained AlexNet, and the activation features of its three fully connected layers are extracted. Second, the activation features of these three layers are concatenated to obtain a multiple-layer representation, because it carries more information about the image; when the three fully connected layer features are concatenated, the resulting image representation has 9,192 (4,096 + 4,096 + 1,000) dimensions. However, features extracted from multiple ConvNet layers are redundant and noisy since they come from the same ConvNet. Thus, in the third step, we use Principal Component Analysis (PCA) to select salient features before the training phase. With salient features, the classifier can classify images more accurately and the performance of transfer learning can be improved.
To evaluate the proposed method, experiments were conducted on three standard datasets (Caltech-256, VOC07, and SUN397) to compare multiple ConvNet layer representations against a single-layer representation, using PCA for feature selection and dimension reduction. Our experiments demonstrated the importance of feature selection for the multiple-layer representation. Moreover, our proposed approach achieved 75.6% accuracy compared to 73.9% for the FC7 layer on Caltech-256, 73.1% compared to 69.2% for the FC8 layer on VOC07, and 52.2% compared to 48.7% for the FC7 layer on SUN397. We also showed that our approach achieved superior performance, with accuracy improvements of 2.8%, 2.1%, and 3.1% on Caltech-256, VOC07, and SUN397 respectively, compared to existing work.
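A hedged sketch of the described pipeline, concatenating activations from several late layers of a pre-trained ConvNet, reducing them with PCA, and training a linear classifier, is given below. Keras ships VGG16 pre-trained, so it stands in here for the AlexNet used in the paper; its fc1/fc2/predictions layers give the same 4096 + 4096 + 1000 = 9192-dimensional concatenation. The PCA dimensionality and the linear SVM are illustrative choices, not the authors'.

```python
# Multi-layer ConvNet features + PCA + linear classifier (illustrative sketch).
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras import Model
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

base = VGG16(weights="imagenet")                       # pre-trained, fixed extractor
extractor = Model(base.input,
                  [base.get_layer(n).output for n in ("fc1", "fc2", "predictions")])

def multi_layer_features(images):
    """images: (N, 224, 224, 3) float array of raw RGB pixels."""
    fc1, fc2, fc8 = extractor.predict(preprocess_input(images.copy()))
    return np.concatenate([fc1, fc2, fc8], axis=1)     # (N, 9192)

# Usage with a target dataset (X_train/X_test images, y_train/y_test labels):
# feats = multi_layer_features(X_train)
# pca = PCA(n_components=512).fit(feats)               # keep the salient directions
# clf = LinearSVC().fit(pca.transform(feats), y_train)
# acc = clf.score(pca.transform(multi_layer_features(X_test)), y_test)
```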

Deep Learning Architectures and Applications (딥러닝의 모형과 응용사례)

  • Ahn, SungMahn
    • Journal of Intelligence and Information Systems, v.22 no.2, pp.127-142, 2016
  • A deep learning model is a kind of neural network that allows multiple hidden layers. There are various deep learning architectures, such as convolutional neural networks, deep belief networks, and recurrent neural networks. They have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition, and bioinformatics, where they have been shown to produce state-of-the-art results on various tasks. Among these architectures, convolutional neural networks and recurrent neural networks are classified as supervised learning models, and in recent years these supervised models have gained more popularity than unsupervised models such as deep belief networks, because they have shown notable applications in the fields mentioned above. Deep learning models can be trained with the backpropagation algorithm. Backpropagation is an abbreviation of "backward propagation of errors" and is a common method of training artificial neural networks, used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of an error function with respect to all the weights in the network; the gradient is fed to the optimization method, which in turn uses it to update the weights in an attempt to minimize the error function. Convolutional neural networks use a special architecture that is particularly well suited to classifying images. This architecture makes convolutional networks fast to train, which in turn helps us train deep, multi-layer networks that are very good at classifying images. These days, deep convolutional networks are used in most neural networks for image recognition. Convolutional neural networks use three basic ideas: local receptive fields, shared weights, and pooling. By local receptive fields, we mean that each neuron in the first (or any) hidden layer is connected to a small region of the input (or previous layer's) neurons. Shared weights mean that the same weights and bias are used for each local receptive field, so all the neurons in a hidden layer detect exactly the same feature, just at different locations in the input image. In addition to the convolutional layers just described, convolutional neural networks also contain pooling layers, which are usually placed immediately after convolutional layers and simplify the information in the output of the convolutional layer. Recent convolutional network architectures have 10 to 20 hidden layers and billions of connections between units. Training deep networks took weeks several years ago, but thanks to progress in GPUs and algorithmic enhancements, training time has been reduced to several hours. Neural networks with time-varying behavior are known as recurrent neural networks, or RNNs. A recurrent neural network is a class of artificial neural network in which connections between units form a directed cycle; this creates an internal state that allows the network to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. Early RNN models turned out to be very difficult to train, harder even than deep feedforward networks, because of the unstable gradient problem (vanishing and exploding gradients): the gradient can get smaller and smaller as it is propagated back through layers, which makes learning in early layers extremely slow. The problem actually gets worse in RNNs, since gradients are propagated backward not just through layers but also through time; if the network runs for a long time, the gradient can become extremely unstable and hard to learn from. It has since become possible to incorporate an idea known as long short-term memory units (LSTMs) into RNNs. LSTMs make it much easier to get good results when training RNNs, and many recent papers make use of LSTMs or related ideas.
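The three CNN ideas described in the abstract, local receptive fields with shared weights plus pooling, can be made concrete with a small Keras model; the 28x28 input and layer sizes below are illustrative, not taken from the paper.

```python
# A minimal convolutional network: Conv2D layers implement local receptive
# fields with shared weights, MaxPooling2D layers simplify their output.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 5, activation="relu"),   # 5x5 local receptive fields, shared weights
    layers.MaxPooling2D(2),                    # pooling simplifies the conv output
    layers.Conv2D(64, 5, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),    # class scores
])
# Trained with backpropagation plus a gradient-based optimizer, as described above.
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
model.summary()
```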

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn; Chung, Yeojin; Lee, Jaejoon; Yang, Jiheon
    • Journal of Intelligence and Information Systems, v.23 no.2, pp.71-88, 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently, since they are probabilistic models based on the frequency of each unit in the training set. Recently, as deep learning algorithms have developed, recurrent neural network (RNN) models and long short-term memory (LSTM) models have been widely used as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependency between objects that are entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary becomes very large and the model complexity increases. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to introduce errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that makes up Korean text. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was done with Old Testament texts using the deep learning package Keras with a Theano backend. After pre-processing the texts, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed input vectors of 20 consecutive characters and an output of the following 21st character. In total, 1,023,411 input-output vector pairs were included in the dataset, divided into training, validation, and test sets in a 70:15:15 proportion. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the time taken to train each model. All the optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, clearly superior to those of the stochastic gradient algorithm, which also took the longest time to train for both the 3- and 4-LSTM models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, yet its validation loss and perplexity were not improved significantly and were even worse under some conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM model.
Although there were slight differences in the completeness of the generated sentences between the models, the sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost perfect grammatically. The results of this study are expected to be widely used for Korean language processing in the fields of natural language processing and speech recognition, which are the basis of artificial intelligence systems.
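A hedged Keras sketch of the phoneme-level model described above follows (the paper used Keras with a Theano backend; tf.keras stands in here): 20 one-hot-encoded characters in, a distribution over the 21st out, with a stack of three or four LSTM layers. The hidden size is an illustrative assumption.

```python
# Phoneme-level LSTM language model sketch: predict the 21st character from
# the preceding 20, over a 74-character Korean phoneme/punctuation vocabulary.
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN, VOCAB = 20, 74          # 20-character window, 74 unique characters

def phoneme_lstm(n_lstm_layers=3, hidden=256):
    model = keras.Sequential()
    model.add(keras.Input(shape=(SEQ_LEN, VOCAB)))        # one-hot phoneme sequence
    for i in range(n_lstm_layers):
        # every layer but the last returns the full sequence to feed the next LSTM
        model.add(layers.LSTM(hidden, return_sequences=(i < n_lstm_layers - 1)))
    model.add(layers.Dense(VOCAB, activation="softmax"))  # distribution over next phoneme
    return model

model = phoneme_lstm(n_lstm_layers=4)
# Any of the optimizers compared in the paper (Adagrad, RMSprop, Adam, ...) fits here.
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```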