• Title/Summary/Keyword: Parallel computation

Search Result 591, Processing Time 0.033 seconds

Penalized Likelihood Regression: Fast Computation and Direct Cross-Validation

  • Kim, Young-Ju;Gu, Chong
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2005.05a
    • /
    • pp.215-219
    • /
    • 2005
  • We consider penalized likelihood regression with exponential family responses. Parallel to recent development in Gaussian regression, the fast computation through asymptotically efficient low-dimensional approximations is explored, yielding algorithm that scales much better than the O($n^3$) algorithm for the exact solution. Also customizations of the direct cross-validation strategy for smoothing parameter selection in various distribution families are explored and evaluated.

  • PDF

A Parallel Algorithm of Davidson Method for Eigenproblems (고유치 솔버 Davidson Method 의 병렬화)

  • Kim, Hyoung-Joong;Zhu, Yu
    • Proceedings of the KIEE Conference
    • /
    • 1997.07a
    • /
    • pp.12-14
    • /
    • 1997
  • The analysis of eigenvalue and eigenvector is a crucial procedure for many electromagnetic computation problems. However, eigenpair computation is timing-consuming task. Thus, its parallelization is required for designing large-scale and precision three-dimensional electromagnetic machines. In this paper, the Davidson method is parallelized on a cluster of workstations. Performance of the parallelization scheme is reported. This scheme is applied to a ridged waveguide design problem.

  • PDF

New Parallel MDC FFT Processor for Low Computation Complexity (연산복잡도 감소를 위한 새로운 8-병렬 MDC FFT 프로세서)

  • Kim, Moon Gi;Sunwoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.3
    • /
    • pp.75-81
    • /
    • 2015
  • This paper proposed the new eight-parallel MDC FFT processor using the eight-parallel MDC architecture and the efficient scheduling scheme. The proposed FFT processor supports the 256-point FFT based on the modified radix-$2^6$ FFT algorithm. The proposed scheduling scheme can reduce the number of complex multipliers from eight to six without increasing delay buffers and computation cycles. Moreover, the proposed FFT processor can be used in OFDM systems required high throughput and low hardware complexity. The proposed FFT processor has been designed and implemented with a 90nm CMOS technology. The experimental result shows that the area of the proposed FFT processor is $0.27mm^2$. Furthermore, the proposed eight-parallel MDC FFT processor can achieve the throughput rate up to 2.7 GSample/s at 388MHz.

Parallel Algorithm for Optimal Stack Filters on MCC and CCC (MCC 및 CCC에서의 최적 스택 필터를 위한 병렬 알고리즘)

  • Jeon, Byeong-Mun;Jeong, Chang-Seong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.10
    • /
    • pp.1185-1193
    • /
    • 1999
  • 최적 스택 필터는 시그널 또는 영상의 임의의 특성 정보를 보존하고자 하는 요구조건에 의해 강제된 구조적 제약 하에서 최대의 잡음제거 효과를 얻을 수 있다. 그리고 임계치 분할 특성과 양의 부울 함수에 기반한 이진 영역에서의 처리 특성은 이 필터가 높은 병렬성을 갖고 있음을 보여준다. 본 논문에서는 두 개의 병렬 계산 모델 MCC(Mesh-Connected Computer)와 CCC(Cube-Connected Computer)에서 최적 스택 필터를 위한 1차원 병렬 알고리즘을 개발한다. 최적 스택 필터의 실행 시간은 주로 이진 median 연산에 의해 결정되고 본 논문에서 제안된 알고리즘은 선형 분리성에 의해 이 연산을 구현한다. 이를 바탕으로, M 레벨의 1-D 시그널의 길이가 L이고 윈도우 폭이 N이라고 가정할 때, 제안된 알고리즘은 {{{{root M times root M`` MCC에서 O(L sqrt{M}`) 시간에 그리고 M 개의 PE를 갖는 CCC에서 O(L log M)시간에 수행될 수 있다. 또한 잡음을 더욱 효과적으로 제거하기 위해 윈도우 폭 N을 증가시킬 때, 제안된 병렬 알고리즘의 계산 시간은 일정하게 유지됨을 보인다.Abstract An optimal stack filter achieves the maximum noise attenuation under the structural constraints imposed by the requirement of preserving certain signal or image features. And the filter provides a high parallelism due to the principles of threshold decomposition and binary processing based on positive Boolean functions(PBFs). In this paper, we develop an one-dimensional parallel algorithm for the optimal stack filter on two parallel computation models, MCC(Mesh-Connected Computer) and CCC(Cube-Connected Computer). The running time of the optimal stack filter depends mainly on the binary median operation and our algorithm realizes this operation by the linear separability. Based on this scheme, our parallel algorithm can be performed in {{{{O(L sqrt{M}`) MCC and inO(L log M) time on CCC with M PEs, when the length of M``-valued 1-D signal is L`` and window width is N`` Also, we show that the computation time of our parallel algorithm keeps constant when the window width N increases in order to achieve the best noise attenuation.

Parallel Structure Design Method for Mass Spring Simulation (질량스프링 시뮬레이션을 위한 병렬 구조 설계 방법)

  • Sung, Nak-Jun;Choi, Yoo-Joo;Hong, Min
    • Journal of the Korea Computer Graphics Society
    • /
    • v.25 no.3
    • /
    • pp.55-63
    • /
    • 2019
  • Recently, the GPU computing method has been utilized to improve the performance of the physics simulation field. In particular, in the case of a deformed object simulation requiring a large amount of computation, a GPU-based parallel processing algorithm is required to guarantee real-time performance. We have studied the parallel structure design method to improve the performance of the mass spring simulation method which is one of the methods of implementing the deformation object simulation. We used OpenGL's GLSL, a graphics library that allows direct access to the GPU, and implemented the GPGPU environment using an independent pipeline, the compute shader. In order to verify the effectiveness of the parallel structure design method, the mass - spring system was implemented based on CPU and GPU. Experimental results show that the proposed method improves computation speed by about 6,000% compared to the CPU Environment. It is expected that the lightweight simulation technology can be effectively applied to the augmented reality and the virtual reality field by using the design method proposed later in this research.

A New Algorithm and High-Performance Hardware Design for 2-Dimensional Parallel Generation of Digital Hologram (디지털 홀로그램의 2차원적인 병렬 생성을 위한 알고리즘 및 고성능 하드웨어 설계)

  • Yang, Wol-Sung;Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.1
    • /
    • pp.133-142
    • /
    • 2012
  • In this paper, we propose and implement a high-speed algorithm for CGH that is to calculate digital hologram by modeling the interference phenomenon for tow lights. This algorithm changes the computation equations into a parallel-computable ones and implements it with a structure consisting of two kinds of cells (initial calculation cell, and update calculation cell). The parallel computation algorithm is to get the rest hologram pixels concurrently after calculation the first hologram column. Here, the initial calculation cells compute the first column of the hologram and the update calculation cells compute the rest of the hologram. The two kinds of cells performs a pipeline operation to complete the operations of the two cells at the same time. A CGH calculator to compute the hole hologram for a light source is structured by arranging the two kinds of cells. Results from simulation showed that the maximum operation frequency is about 215MHz. So, experiments are performed by setting this frequency and the same environments as the method showing the best performance. As the results, the proposed one could complete the computation of 81.75 CGH frames per second, while the previous method computes 62.9 CGH frames per second.

The syntax of Linear logic (선형논리의 통사론)

  • Cheong, Kye-Seop
    • Journal for History of Mathematics
    • /
    • v.25 no.3
    • /
    • pp.29-39
    • /
    • 2012
  • As a product of modern proof theory, linear logic is a new form of logic developed for the purpose of enhancing programming language by Professor Jean-Yves Girard of Marseille University (France) in 1987 by supplementing intuitionist logic in a sophisticated manner. Thus, linear logic' s connectives can be explained using information processing terms such as sequentiality and parallel computation. For instance, A${\otimes}$B shows two processes, A and B, carried out one after another. A&B is linked to an internal indeterminate, allowing an observer to select either A or B. A${\oplus}$B is an external indeterminate, and as such, an observer knows that either A or B holds true, but does not know which process will be true. A ${\wp}$ B signifies parallel computation of process A and process B; linear negative exhibits synchronization, that is, in order for the process A to be carried out, both A and $A^{\bot}$ have to be accomplished simultaneously. Since the field of linear logic is not very active in Korea at present, this paper deals only with syntax aspect of linear logic in order to arouse interest in the subject, leaving semantics and proof nets for future studies.

IPC-based Dynamic SM management on GPGPU for Executing AES Algorithm

  • Son, Dong Oh;Choi, Hong Jun;Kim, Cheol Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.2
    • /
    • pp.11-19
    • /
    • 2020
  • Modern GPU can execute general purpose computation on the graphic processing unit, and provide high performance by exploiting many core on GPU. To run AES algorithm efficiently, parallel computational resources are required. However, computational resource of CPU architecture are not enough to cryptographic algorithm such as AES whereas GPU architecture has mass parallel computation resources. Therefore, this paper reduce the time to execute AES by employing parallel computational resource on GPGPU. Unfortunately, AES cannot utilize computational resource on GPGPU since it isn't suitable to GPGPU architecture. In this paper, IPC based dynamic SM management technique are proposed to efficiently execute AES on GPGPU. IPC based dynamic SM management can increase and decrease the number of active SMs by using IPC in run-time. According to simulation results, proposed technique improve the performance by increasing resource utilization compared to baseline GPGPU architecture. The results show that AES improve the performance by 41.2% on average.

Shape Design Sensitivity Analysis of Dynamic Crack Propagation Problems using Peridynamics and Parallel Computation (페리다이나믹스 이론과 병렬연산을 이용한 균열진전 문제의 형상 설계민감도 해석)

  • Kim, Jae-Hyun;Cho, Seonho
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.27 no.4
    • /
    • pp.297-303
    • /
    • 2014
  • Using the bond-based peridynamics and the parallel computation with binary decomposition, an adjoint shape design sensitivity analysis(DSA) method is developed for the dynamic crack propagation problems. The peridynamics includes the successive branching of cracks and employs the explicit scheme of time integration. The adjoint variable method is generally not suitable for path-dependent problems but employed since the path of response analysis is readily available. The accuracy of analytical design sensitivity is verified by comparing it with the finite difference one. The finite difference method is susceptible to the amount of design perturbations and could result in inaccurate design sensitivity for highly nonlinear peridynamics problems with respect to the design. It turns out that $C^1$-continuous volume fraction is necessary for the accurate evaluation of shape design sensitivity in peridynamic discretization.

High-resolution Urban Flood Modeling using Cellular Automata-based WCA2D in the Oncheon-cheon Catchment in Busan, South Korea (셀룰러 오토마타 기반 WCA2D 모형을 이용한 부산 온천천 유역 고해상도 도시 침수 해석)

  • Choi, Hyeonjin;Lee, Songhee;Woo, Hyuna;Noh, Seong Jin
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.43 no.5
    • /
    • pp.587-599
    • /
    • 2023
  • As climate change increasesthe frequency and risk of flooding in major cities around theworld, the importance ofsimulation technology that can quickly and accurately analyze high-resolution 2D flooding information in large-scale areasis emerging. The physically-based approaches based on the Shallow Water Equations (SWE) often requires huge computer resources hindering high-resolution flood prediction. This study investigated the theoretical background of Weighted Cellular Automata 2D (WCA2D), which simulates spatio-temporal changes offlooding using transition rules and weight-based system, and assessed feasibility to simulate pluvial flooding in the urbancatchment, theOncheon-cheon catchmentinBusan, SouthKorea.Inaddition,the computation performancewas compared by applying versions using OpenComputing Language (OpenCL) andOpenMulti-Processing (OpenMP) parallel computing techniques. Simulationresultsshowed that the maximuminundation depthmap by theWCA2Dmodel cansimilarly reproduce historical inundation maps. Also, it can precisely simulate spatio-temporal changes of flooding extent in the urban catchment with complex topographic characteristics. For computation efficiency, parallel computing schemes, theOpenCLandOpenMP, improved the computation by about 8~14 and 5~6 folds respectively, compared to the sequential computation.