• Title/Summary/Keyword: Parallel computation

Search Result 592, Processing Time 0.026 seconds

Calculation of Sputter Yield using Monte Carlo Techniques (몬테카를로 방식에 의한 스퍼터율 계산에 관한 연구)

  • 반용찬;이제희;원태영
    • Journal of the Korean Institute of Telematics and Electronics D
    • /
    • v.35D no.12
    • /
    • pp.59-67
    • /
    • 1998
  • In this paper, a rigorous three-dimensional Monte Carlo approach to simulate the sputter yield as a function of the incident ion energy and the incident angle as well as the atomic ejection distribution of the target is presented. The sputter yield of the target atom (Cu, Al) has been calculated for the different species of the incident atoms with the incident energy range of 10 eV ~ 100 KeV, which coincides with the previously reported experimental results. According to the simulation results, the calculated sputter yield tends to increase with the amount of the energy of the incident atoms. Our simulation revealed that the maximum sputter yield can be obtained for the incident atom with 10 KeV for the heavy ion, while the maximum sputter yield for the light ion is for the incident atoms with an energy less than 1 KeV. The sputter yield increases with angle of incidence and seems to have the maximum value at 68$^{\circ}$. For angular distributions of the sputtered particle, the atoms in the direction normal to the surface increase with angle of incidence. Furthermore, we has conducted the parallel computation on CRAY T3E supercomputer and built a GUI(Graphic User Interface) system running the sputter simulator.

  • PDF

Implementation of a 3D Graphics Hardwired T&L Accelerator based on a SoC Platform for a Mobile System (SoC 플랫폼 기반 모바일용 3차원 그래픽 Hardwired T&L Accelerator 구현)

  • Lee, Kwang-Yeob;Koo, Yong-Seo
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.9
    • /
    • pp.59-70
    • /
    • 2007
  • In this paper, we proposed an effective T&L(Transform & Lighting) Processor architecture for a real time 3D graphics acceleration SoC(System on a Chip) in a mobile system. We designed Floating point arithmetic IPs for a T&L processor. And we verified IPs using a SoC Platform. Designed T&L Processor consists of 24 bit floating point data format and 16 bit fixed point data format, and supports the pipeline keeping the balance between Transform process and Lighting process using a parallel computation of 3D graphics. The delay of pipeline processing only Transform operation is almost same as the delay processing both Transform operation and Lighting operation. Designed T&L Processor is implemented and verified using a SoC Platform. The T&L Processor operates at 80MHz frequency in Xilinx-Virtex4 FPGA. The processing speed is measured at the rate of 20M Vertexes/sec.

Implementation of MPEG/Audio Decoder based on RISC Processor With Minimized DSP Accelerator (DSP 가속기가 내장된 RISC 프로세서 기반 MPEG/Audio 복호화기의 구현)

  • Bang Kyoung Ho;Lee Ken Sup;Park Young Cheol;Youn Dae Hee
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.12C
    • /
    • pp.1617-1622
    • /
    • 2004
  • MPEG/Audio decoder for mobile multimedia systems requires low power consumption. Implementations of AV decoder using a single RISC processor often need high power consumption owing to cash-miss in case of insufficient cash memory. In this paper, we present a MPEG/Audio decoder for mobile handset applications and implement it on a RISC processor embedding a minimized DSP accelerator. Audio decoding algorithm is splined into two parts; computation intensive and control intensive parts. Those parts we, respectively, allocated to DSP and RISC core, which are designed to run in parallel to increase the processing efficiency. The proposed system implements MP3 and AAC decoders at l7MHz and 24MHz clocks, which are reductions of 48% and 40% of complexities in comparison with implementations on a single RISC processor. The proposed method is adequate for mobile multimedia applications with insufficient cash memory.

Realization of the Pulse Doppler Radar Signal Processor with an Expandable Feature using the Multi-DSP Based Morocco-2 Board (다중 DSP 구조의 Morocco-2 보드를 이용한 확장성을 갖는 펄스 도플러 레이다 신호처리기 구현)

  • 조명제;임중수
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.12 no.7
    • /
    • pp.1147-1156
    • /
    • 2001
  • In this paper, a new design architecture of radar signal processor in real time is proposed. It has been designed and implemented under the consideration to minimize the inter-processor communication overhead and to maintain the coherence in Doppler pulse domain and in range domain. Its structure can be easily reconfigured and reprogrammed in accordance with an addition of function algorithm or a modification of operational scenario. As we designed a task configuration for parallel processing from measures of computation time for function algorithms and transmission time for results by signal processing, data exchange between processors for performing of function algorithms could be fully removed. Morocco-2 board equipped ADSP-21060 processor of Analog Devices inc. and APEX-3.2 developed for SHARC DSP were used to construct the radar signal processor.

  • PDF

An Efficient Angular Space Partitioning Based Skyline Query Processing Using Sampling-Based Pruning (데이터 샘플링 기반 프루닝 기법을 도입한 효율적인 각도 기반 공간 분할 병렬 스카이라인 질의 처리 기법)

  • Choi, Woosung;Kim, Minseok;Diana, Gromyko;Chung, Jaehwa;Jung, Soonyong
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.1
    • /
    • pp.1-8
    • /
    • 2017
  • Given a multi-dimensional dataset of tuples, a skyline query returns a subset of tuples which are not 'dominated' by any other tuples. Skyline query is very useful in Big data analysis since it filters out uninteresting items. Much interest was devoted to the MapReduce-based parallel processing of skyline queries in large-scale distributed environment. There are three requirements to improve parallelism in MapReduced-based algorithms: (1) workload should be well balanced (2) avoid redundant computations (3) Optimize network communication cost. In this paper, we introduce MR-SEAP (MapReduce sample Skyline object Equality Angular Partitioning), an efficient angular space partitioning based skyline query processing using sampling-based pruning, which satisfies requirements above. We conduct an extensive experiment to evaluate MR-SEAP.

A Simple Stereo Matching Algorithm using PBIL and its Alternative (PBIL을 이용한 소형 스테레오 정합 및 대안 알고리즘)

  • Han Kyu-Phil
    • The KIPS Transactions:PartB
    • /
    • v.12B no.4 s.100
    • /
    • pp.429-436
    • /
    • 2005
  • A simple stereo matching algorithm using population-based incremental learning(PBIL) is proposed in this paper to decrease the general problem of genetic algorithms, such as memory consumption and inefficiency of search. PBIL is a variation of genetic algorithms using stochastic search and competitive teaming based on a probability vector. The structure of PBIL is simpler than that of other genetic algorithm families, such as serial and parallel ones, due to the use of a probability vector. The PBIL strategy is simplified and adapted for stereo matching circumstances. Thus, gene pool, chromosome crossover, and gene mutation we removed, while the evolution rule, that fitter chromosomes should have higher survival probabilities, is preserved. As a result, memory space is decreased, matching rules are simplified and computation cost is reduced. In addition, a scheme controlling the distance of neighbors for disparity smoothness is inserted to obtain a wide-area consistency of disparities, like a result of coarse-to-fine matchers. Because of this scheme, the proposed algorithm can produce a stable disparity map with a small fixed-size window. Finally, an alterative version of the proposed algorithm without using probability vector is also presented for simpler set-ups.

A Fast SAD Algorithm for Area-based Stereo Matching Methods (영역기반 스테레오 영상 정합을 위한 고속 SAD 알고리즘)

  • Lee, Woo-Young;Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications
    • /
    • v.7 no.2
    • /
    • pp.8-12
    • /
    • 2012
  • Area-based stereo matchng algorithms are widely used for image analysis for stereo vision. SAD (Sum of Absolute Difference) algorithm is one of well known area-based stereo matchng algorithms with the characteristics of data intensive computing application. Therefore, it requires very high computation capabilities and its processing speed becomes very slow with software realization. This paper proposes a fast SAD algorithm utilizing SSE (Streaming SIMD Extensions) instructions based on SIMD (Single Instruction Multiple Data) parallism. CPU supporing SSE instructions has 16 XMM registers with 128 bits. For the performance evaluation of the proposed scheme, we compare the processing speed between SAD with/without SSE instructions. The proposed scheme achieves four times performance improvement over the general SAD, which shows the possibility of the software realization of real time SAD algorithm.

An Investigation of the Performance of the Colored Gauss-Seidel Solver on CPU and GPU (Coloring이 적용된 Gauss-Seidel 해법을 통한 CPU와 GPU의 연산 효율에 관한 연구)

  • Yoon, Jong Seon;Jeon, Byoung Jin;Choi, Hyoung Gwon
    • Transactions of the Korean Society of Mechanical Engineers B
    • /
    • v.41 no.2
    • /
    • pp.117-124
    • /
    • 2017
  • The performance of the colored Gauss-Seidel solver on CPU and GPU was investigated for the two- and three-dimensional heat conduction problems by using different mesh sizes. The heat conduction equation was discretized by the finite difference method and finite element method. The CPU yielded good performance for small problems but deteriorated when the total memory required for computing was larger than the cache memory for large problems. In contrast, the GPU performed better as the mesh size increased because of the latency hiding technique. Further, GPU computation by the colored Gauss-Siedel solver was approximately 7 times that by the single CPU. Furthermore, the colored Gauss-Seidel solver was found to be approximately twice that of the Jacobi solver when parallel computing was conducted on the GPU.

The efficient implementation of the multi-channel active noise controller using a low-cost microcontroller unit (저가 microcontoller unit을 이용한 효율적인 다채널 능동 소음 제어기 구현)

  • Chung, Ik Joo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.38 no.1
    • /
    • pp.9-22
    • /
    • 2019
  • In this paper, we propose a method that can be applied to the efficient implementation of multi-channel active noise controller. Since the normalized MFxLMS (Modified Filtered-x Least Mean Square) algorithm for the multi-channel active noise control requires a large amount of computation, the difficulty has lied in implementing the algorithm using a low-cost MCU (Microcontoller Unit). We implement the multi-channel active noise controller efficiently by optimizing the software based on the features of the MCU. By maximizing the usage of single-cycle MAC (Multiply- Accumulate) operations and minimizing move operations of the delay memory, we can achieve more than 3 times the performance in the aspect of computational optimization, and by parellel processing using the auxillary processor included in the MCU, we can also obtain more than 4 times the performance. In addition, the usage of additional parts can be minimized by maximizing the usage of the peripherals embedded in the MCU.

Efficient implementation of AES CTR Mode for a Mobile Environment (모바일 환경을 위한 AES CTR Mode의 효율적 구현)

  • Park, Jin-Hyung;Paik, Jung-Ha;Lee, Dong-Hoon
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.21 no.5
    • /
    • pp.47-58
    • /
    • 2011
  • Recently, there are several technologies for protecting information in the lightweight device, One of them, the AES[1] algorithm and CRT mode, is used for numerous services(e,g, OMA DRM, VoIP, IPTV) as encryption technique for preserving confidentiality. Although it is possible that the AES algorithm CRT mode can parallel process transmitting data, IPTV Set-top Box or Mobile Device that uses these streaming service has limited computation-ability. So optimizing crypto algorithm and enhancing its efficiency for those environment have become an important issue. In this paper, we propose implementation method that can improve efficiency of the AES-CRT Mode by improving algorithm logics. Moreover, we prove the performance of our proposal on the mobile device which has limited capability.