• Title/Summary/Keyword: Parallel Computing

Search Result 807, Processing Time 0.027 seconds

Benchmark Results of a Monte Carlo Treatment Planning system (몬데카를로 기반 치료계획시스템의 성능평가)

  • Cho, Byung-Chul
    • Progress in Medical Physics
    • /
    • v.13 no.3
    • /
    • pp.149-155
    • /
    • 2002
  • Recent advances in radiation transport algorithms, computer hardware performance, and parallel computing make the clinical use of Monte Carlo based dose calculations possible. To compare the speed and accuracies of dose calculations between different developed codes, a benchmark tests were proposed at the XIIth ICCR (International Conference on the use of Computers in Radiation Therapy, Heidelberg, Germany 2000). A Monte Carlo treatment planning comprised of 28 various Intel Pentium CPUs was implemented for routine clinical use. The purpose of this study was to evaluate the performance of our system using the above benchmark tests. The benchmark procedures are comprised of three parts. a) speed of photon beams dose calculation inside a given phantom of 30.5 cm$\times$39.5 cm $\times$ 30 cm deep and filled with 5 ㎣ voxels within 2% statistical uncertainty. b) speed of electron beams dose calculation inside the same phantom as that of the photon beams. c) accuracy of photon and electron beam calculation inside heterogeneous slab phantom compared with the reference results of EGS4/PRESTA calculation. As results of the speed benchmark tests, it took 5.5 minutes to achieve less than 2% statistical uncertainty for 18 MV photon beams. Though the net calculation for electron beams was an order of faster than the photon beam, the overall calculation time was similar to that of photon beam case due to the overhead time to maintain parallel processing. Since our Monte Carlo code is EGSnrc, which is an improved version of EGS4, the accuracy tests of our system showed, as expected, very good agreement with the reference data. In conclusion, our Monte Carlo treatment planning system shows clinically meaningful results. Though other more efficient codes are developed such like MCDOSE and VMC++, BEAMnrc based on EGSnrc code system may be used for routine clinical Monte Carlo treatment planning in conjunction with clustering technique.

  • PDF

Development and evaluation of a 2-dimensional land surface flood analysis model using uniform square grid (정형 사각 격자 기반의 2차원 지표면 침수해석 모형 개발 및 평가)

  • Choi, Yun-Seok;Kim, Joo-Hun;Choi, Cheon-Kyu;Kim, Kyung-Tak
    • Journal of Korea Water Resources Association
    • /
    • v.52 no.5
    • /
    • pp.361-372
    • /
    • 2019
  • The purpose of this study is to develop a two-dimensional land surface flood analysis model based on uniform square grid using the governing equations except for the convective acceleration term in the momentum equation. Finite volume method and implicit method were applied to spatial and temporal discretization. In order to reduce the execution time of the model, parallel computation techniques using CPU were applied. To verify the developed model, the model was compared with the analytical solution and the behavior of the model was evaluated through numerical experiments in the virtual domain. In addition, inundation analyzes were performed at different spatial resolutions for the domestic Janghowon area and the Sebou river area in Morocco, and the results were compared with the analysis results using the CAESER-LISFLOOD (CLF) model. In model verification, simulation results were well matched with the analytical solution, and the flow analyses in the virtual domain were also evaluated to be reasonable. The results of inundation simulations in the Janghowon and the Sebou river area by this study and CLF model were similar with each other and for Janghowon area, the simulation result was also similar to the flooding area of flood hazard map. The different parts in the simulation results of this study and the CLF model were compared and evaluated for each case. The results of this study suggest that the model proposed in this study can simulate the flooding well in the floodplain. However, in case of flood analysis using the model presented in this study, the characteristics and limitations of the model by domain composition method, governing equation and numerical method should be fully considered.

An Installation and Model Assessment of the UM, U.K. Earth System Model, in a Linux Cluster (U.K. 지구시스템모델 UM의 리눅스 클러스터 설치와 성능 평가)

  • Daeok Youn;Hyunggyu Song;Sungsu Park
    • Journal of the Korean earth science society
    • /
    • v.43 no.6
    • /
    • pp.691-711
    • /
    • 2022
  • The state-of-the-art Earth system model as a virtual Earth is required for studies of current and future climate change or climate crises. This complex numerical model can account for almost all human activities and natural phenomena affecting the atmosphere of Earth. The Unified Model (UM) from the United Kingdom Meteorological Office (UK Met Office) is among the best Earth system models as a scientific tool for studying the atmosphere. However, owing to the expansive numerical integration cost and substantial output size required to maintain the UM, individual research groups have had to rely only on supercomputers. The limitations of computer resources, especially the computer environment being blocked from outside network connections, reduce the efficiency and effectiveness of conducting research using the model, as well as improving the component codes. Therefore, this study has presented detailed guidance for installing a new version of the UM on high-performance parallel computers (Linux clusters) owned by individual researchers, which would help researchers to easily work with the UM. The numerical integration performance of the UM on Linux clusters was also evaluated for two different model resolutions, namely N96L85 (1.875° ×1.25° with 85 vertical levels up to 85 km) and N48L70 (3.75° ×2.5° with 70 vertical levels up to 80 km). The one-month integration times using 256 cores for the AMIP and CMIP simulations of N96L85 resolution were 169 and 205 min, respectively. The one-month integration time for an N48L70 AMIP run using 252 cores was 33 min. Simulated results on 2-m surface temperature and precipitation intensity were compared with ERA5 re-analysis data. The spatial distributions of the simulated results were qualitatively compared to those of ERA5 in terms of spatial distribution, despite the quantitative differences caused by different resolutions and atmosphere-ocean coupling. In conclusion, this study has confirmed that UM can be successfully installed and used in high-performance Linux clusters.

Comparison of the wall clock time for extracting remote sensing data in Hierarchical Data Format using Geospatial Data Abstraction Library by operating system and compiler (운영 체제와 컴파일러에 따른 Geospatial Data Abstraction Library의 Hierarchical Data Format 형식 원격 탐사 자료 추출 속도 비교)

  • Yoo, Byoung Hyun;Kim, Kwang Soo;Lee, Jihye
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.21 no.1
    • /
    • pp.65-73
    • /
    • 2019
  • The MODIS (Moderate Resolution Imaging Spectroradiometer) data in Hierarchical Data Format (HDF) have been processed using the Geospatial Data Abstraction Library (GDAL). Because of a relatively large data size, it would be preferable to build and install the data analysis tool with greater computing performance, which would differ by operating system and the form of distribution, e.g., source code or binary package. The objective of this study was to examine the performance of the GDAL for processing the HDF files, which would guide construction of a computer system for remote sensing data analysis. The differences in execution time were compared between environments under which the GDAL was installed. The wall clock time was measured after extracting data for each variable in the MODIS data file using a tool built lining against GDAL under a combination of operating systems (Ubuntu and openSUSE), compilers (GNU and Intel), and distribution forms. The MOD07 product, which contains atmosphere data, were processed for eight 2-D variables and two 3-D variables. The GDAL compiled with Intel compiler under Ubuntu had the shortest computation time. For openSUSE, the GDAL compiled using GNU and intel compilers had greater performance for 2-D and 3-D variables, respectively. It was found that the wall clock time was considerably long for the GDAL complied with "--with-hdf4=no" configuration option or RPM package manager under openSUSE. These results indicated that the choice of the environments under which the GDAL is installed, e.g., operation system or compiler, would have a considerable impact on the performance of a system for processing remote sensing data. Application of parallel computing approaches would improve the performance of the data processing for the HDF files, which merits further evaluation of these computational methods.

Real-time Color Recognition Based on Graphic Hardware Acceleration (그래픽 하드웨어 가속을 이용한 실시간 색상 인식)

  • Kim, Ku-Jin;Yoon, Ji-Young;Choi, Yoo-Joo
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.1
    • /
    • pp.1-12
    • /
    • 2008
  • In this paper, we present a real-time algorithm for recognizing the vehicle color from the indoor and outdoor vehicle images based on GPU (Graphics Processing Unit) acceleration. In the preprocessing step, we construct feature victors from the sample vehicle images with different colors. Then, we combine the feature vectors for each color and store them as a reference texture that would be used in the GPU. Given an input vehicle image, the CPU constructs its feature Hector, and then the GPU compares it with the sample feature vectors in the reference texture. The similarities between the input feature vector and the sample feature vectors for each color are measured, and then the result is transferred to the CPU to recognize the vehicle color. The output colors are categorized into seven colors that include three achromatic colors: black, silver, and white and four chromatic colors: red, yellow, blue, and green. We construct feature vectors by using the histograms which consist of hue-saturation pairs and hue-intensity pairs. The weight factor is given to the saturation values. Our algorithm shows 94.67% of successful color recognition rate, by using a large number of sample images captured in various environments, by generating feature vectors that distinguish different colors, and by utilizing an appropriate likelihood function. We also accelerate the speed of color recognition by utilizing the parallel computation functionality in the GPU. In the experiments, we constructed a reference texture from 7,168 sample images, where 1,024 images were used for each color. The average time for generating a feature vector is 0.509ms for the $150{\times}113$ resolution image. After the feature vector is constructed, the execution time for GPU-based color recognition is 2.316ms in average, and this is 5.47 times faster than the case when the algorithm is executed in the CPU. Our experiments were limited to the vehicle images only, but our algorithm can be extended to the input images of the general objects.

Determination of Equivalent Hydraulic Conductivity of Rock Mass Using Three-Dimensional Discontinuity Network (삼차원 불연속면 연결망을 이용한 암반의 등가수리전도도 결정에 대한 연구)

  • 방상혁;전석원;최종근
    • Tunnel and Underground Space
    • /
    • v.13 no.1
    • /
    • pp.52-63
    • /
    • 2003
  • Discontinuities such as faults, fractures and joints in rock mass play the dominant role in the mechanical and hydraulic properties of the rock mass. The key factors that influence on the flow of groundwater are hydraulic and geometric characteristics of discontinuities and their connectivity. In this study, a program that analyzes groundwater flow in the 3D discontinuity network was developed on the assumption that the discontinuity characteristics such as density, trace length, orientation and aperture have particular distribution functions. This program generates discontinuities in a three-dimensional space and analyzes their connectivity and groundwater flow. Due to the limited computing capacity In this study, REV was not exactly determined, but it was inferred to be greater than 25$\times$25$\times$25 ㎥. By calculating the extent of aperture that influences on the groundwater flow, it was found that the discontinuities with the aperture smaller than 30% of the mean aperture had little influence on the groundwater flow. In addition, there was little difference in the equivalent hydraulic conductivity for the the two cases when considering and not considering the boundary effect. It was because the groundwater flow was mostly influenced by the discontinuities with large aperture. Among the parameters considered in this study, the length, aperture, and orientation of discontinuities had the greatest influence on the equivalent hydraulic conductivity of rock mass in their order. In case of existence of a fault in rock mass, elements of the equivalent hydraulic conductivity tensor parallel to the fault fairly increased in their magnitude but those perpendicular to the fault were increased in a very small amount at the first stage and then converged.

Hardware Approach to Fuzzy Inference―ASIC and RISC―

  • Watanabe, Hiroyuki
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 1993.06a
    • /
    • pp.975-976
    • /
    • 1993
  • This talk presents the overview of the author's research and development activities on fuzzy inference hardware. We involved it with two distinct approaches. The first approach is to use application specific integrated circuits (ASIC) technology. The fuzzy inference method is directly implemented in silicon. The second approach, which is in its preliminary stage, is to use more conventional microprocessor architecture. Here, we use a quantitative technique used by designer of reduced instruction set computer (RISC) to modify an architecture of a microprocessor. In the ASIC approach, we implemented the most widely used fuzzy inference mechanism directly on silicon. The mechanism is beaded on a max-min compositional rule of inference, and Mandami's method of fuzzy implication. The two VLSI fuzzy inference chips are designed, fabricated, and fully tested. Both used a full-custom CMOS technology. The second and more claborate chip was designed at the University of North Carolina(U C) in cooperation with MCNC. Both VLSI chips had muliple datapaths for rule digital fuzzy inference chips had multiple datapaths for rule evaluation, and they executed multiple fuzzy if-then rules in parallel. The AT & T chip is the first digital fuzzy inference chip in the world. It ran with a 20 MHz clock cycle and achieved an approximately 80.000 Fuzzy Logical inferences Per Second (FLIPS). It stored and executed 16 fuzzy if-then rules. Since it was designed as a proof of concept prototype chip, it had minimal amount of peripheral logic for system integration. UNC/MCNC chip consists of 688,131 transistors of which 476,160 are used for RAM memory. It ran with a 10 MHz clock cycle. The chip has a 3-staged pipeline and initiates a computation of new inference every 64 cycle. This chip achieved an approximately 160,000 FLIPS. The new architecture have the following important improvements from the AT & T chip: Programmable rule set memory (RAM). On-chip fuzzification operation by a table lookup method. On-chip defuzzification operation by a centroid method. Reconfigurable architecture for processing two rule formats. RAM/datapath redundancy for higher yield It can store and execute 51 if-then rule of the following format: IF A and B and C and D Then Do E, and Then Do F. With this format, the chip takes four inputs and produces two outputs. By software reconfiguration, it can store and execute 102 if-then rules of the following simpler format using the same datapath: IF A and B Then Do E. With this format the chip takes two inputs and produces one outputs. We have built two VME-bus board systems based on this chip for Oak Ridge National Laboratory (ORNL). The board is now installed in a robot at ORNL. Researchers uses this board for experiment in autonomous robot navigation. The Fuzzy Logic system board places the Fuzzy chip into a VMEbus environment. High level C language functions hide the operational details of the board from the applications programme . The programmer treats rule memories and fuzzification function memories as local structures passed as parameters to the C functions. ASIC fuzzy inference hardware is extremely fast, but they are limited in generality. Many aspects of the design are limited or fixed. We have proposed to designing a are limited or fixed. We have proposed to designing a fuzzy information processor as an application specific processor using a quantitative approach. The quantitative approach was developed by RISC designers. In effect, we are interested in evaluating the effectiveness of a specialized RISC processor for fuzzy information processing. As the first step, we measured the possible speed-up of a fuzzy inference program based on if-then rules by an introduction of specialized instructions, i.e., min and max instructions. The minimum and maximum operations are heavily used in fuzzy logic applications as fuzzy intersection and union. We performed measurements using a MIPS R3000 as a base micropro essor. The initial result is encouraging. We can achieve as high as a 2.5 increase in inference speed if the R3000 had min and max instructions. Also, they are useful for speeding up other fuzzy operations such as bounded product and bounded sum. The embedded processor's main task is to control some device or process. It usually runs a single or a embedded processer to create an embedded processor for fuzzy control is very effective. Table I shows the measured speed of the inference by a MIPS R3000 microprocessor, a fictitious MIPS R3000 microprocessor with min and max instructions, and a UNC/MCNC ASIC fuzzy inference chip. The software that used on microprocessors is a simulator of the ASIC chip. The first row is the computation time in seconds of 6000 inferences using 51 rules where each fuzzy set is represented by an array of 64 elements. The second row is the time required to perform a single inference. The last row is the fuzzy logical inferences per second (FLIPS) measured for ach device. There is a large gap in run time between the ASIC and software approaches even if we resort to a specialized fuzzy microprocessor. As for design time and cost, these two approaches represent two extremes. An ASIC approach is extremely expensive. It is, therefore, an important research topic to design a specialized computing architecture for fuzzy applications that falls between these two extremes both in run time and design time/cost. TABLEI INFERENCE TIME BY 51 RULES {{{{Time }}{{MIPS R3000 }}{{ASIC }}{{Regular }}{{With min/mix }}{{6000 inference 1 inference FLIPS }}{{125s 20.8ms 48 }}{{49s 8.2ms 122 }}{{0.0038s 6.4㎲ 156,250 }} }}

  • PDF