• Title/Summary/Keyword: Partial parallel

An efficient parallel solution algorithm on the linear second-order partial differential equations with large sparse matrix being based on the block cyclic reduction technique (Block Cyclic Reduction 기법에 의한 대형 Sparse Matrix 선형 2계편미분방정식의 효율적인 병렬 해 알고리즘)

  • 이병홍;김정선
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.15 no.7
    • /
    • pp.553-564
    • /
    • 1990
  • The coefficient matrix of linear second-order partial differential equations in the general form is partitioned into (n-1)x(n-1) submatrices and transformed into a block tridiagonal system. The cyclic odd-even reduction technique is then applied to this system with large-grain data granularity, yielding a block cyclic reduction algorithm that solves for the unknown vectors of the system. However, this block cyclic reduction technique is not well suited to parallel processing systems because its degree of parallelism changes at every computing stage. A new algorithm for solving linear second-order partial differential equations is therefore presented, based on a block cyclic reduction technique modified to keep its parallelism constant and to greatly reduce its execution time. The two algorithms are compared and studied.
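
The odd-even elimination step that this abstract refers to can be illustrated with a minimal scalar sketch. It is not the paper's block algorithm: it assumes a scalar tridiagonal system with n = 2^k - 1 unknowns and no pivoting, and the function name `cyclic_reduction` is hypothetical. Note how each recursion level works on roughly half as many equations, which is the changing parallelism the abstract mentions.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by odd-even cyclic reduction (scalar sketch).

    a: sub-diagonal (a[0] must be 0), b: diagonal,
    c: super-diagonal (c[-1] must be 0), d: right-hand side.
    Assumes len(b) == 2**k - 1 and that no pivoting is needed.
    """
    n = len(b)
    if (n + 1) & n:
        raise ValueError("number of unknowns must be 2**k - 1")
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: eliminate the even-indexed unknowns from every odd-indexed
    # equation. Each iteration is independent, so this loop could run in parallel.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1]
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - gamma * a[i + 1])
        rc.append(-gamma * c[i + 1])
        rd.append(d[i] - alpha * d[i - 1] - gamma * d[i + 1])

    # The reduced system is again tridiagonal, with (n - 1) / 2 unknowns.
    reduced = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitution: recover the even-indexed unknowns (also independent).
    x = [0.0] * n
    for k, i in enumerate(range(1, n, 2)):
        x[i] = reduced[k]
    for i in range(0, n, 2):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
    return x


# Usage: a 7x7 system with stencil (-1, 2, -1) whose exact solution is 1..7.
a = [0.0] + [-1.0] * 6
b = [2.0] * 7
c = [-1.0] * 6 + [0.0]
d = [0.0] * 6 + [8.0]
solution = cyclic_reduction(a, b, c, d)
```

In a block version, the scalar divisions would become solves with the diagonal blocks; that part is not shown here.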


Lubrication Characteristics of Surface Textured Parallel Thrust Bearing with Ellipsoidal Dimples (타원체 딤플로 Texturing한 평행 스러스트 베어링의 윤활특성)

  • Park, Tae-Jo;Kim, Min-Gyu
    • Tribology and Lubricants
    • /
    • v.32 no.5
    • /
    • pp.147-153
    • /
    • 2016
  • Friction reduction between machine components is important for improving their efficiency and lifespan. In recent years, surface texturing has received considerable attention as a viable means to enhance the efficiency and tribological performance of sliding mechanical components such as parallel thrust bearings, mechanical face seals, and piston rings. In this study, we perform lubrication analysis to investigate the effect of dimple shapes and orientations on the lubrication characteristics of a surface-textured parallel thrust bearing. The numerical analysis involves solving the continuity and Navier-Stokes equations using a commercial computational fluid dynamics (CFD) code, FLUENT. The simulations use hemispherical dimples and semiellipsoidal dimples in different orientations. We compare pressure and streamline distributions, load capacity, friction force, and leakage flow rate for different numbers of dimples and orientations. We find that the dimple shapes, their orientations, and the number of dimples counted from the inlet all influence the lubrication characteristics. The results show that partial texturing of the bearing inlet region and ellipsoidal dimples with the major axis aligned along the lubricant flow direction exhibit the best lubrication characteristics in terms of higher load capacity and lower friction. The results can be used in the design of optimum dimple characteristics for parallel thrust bearings, for which further research is required.

One-node and two-node hybrid coarse-mesh finite difference algorithm for efficient pin-by-pin core calculation

  • Song, Seongho;Yu, Hwanyeal;Kim, Yonghee
    • Nuclear Engineering and Technology
    • /
    • v.50 no.3
    • /
    • pp.327-339
    • /
    • 2018
  • This article presents a new global-local hybrid coarse-mesh finite difference (HCMFD) method for efficient parallel calculation of pin-by-pin heterogeneous core analysis. In the HCMFD method, the one-node coarse-mesh finite difference (CMFD) scheme is combined with a nodal expansion method (NEM)-based two-node CMFD method in a nonlinear way. In the global-local HCMFD algorithm, the global problem is a coarse-mesh eigenvalue problem, whereas the local problems are fixed-source problems with incoming partial current boundary conditions, and they can be solved in parallel. The global problem is formulated by one-node CMFD, in which two correction factors on an interface are introduced to preserve both the surface-average flux and the net current. Meanwhile, for accurate and efficient pin-wise core analysis, the local problem is solved by the conventional NEM-based two-node CMFD method. We investigated the numerical characteristics of the HCMFD method for a few benchmark problems and compared them with the conventional two-node NEM-based CMFD algorithm. In this study, the HCMFD algorithm was also parallelized with the OpenMP parallel interface, and its numerical performance was evaluated for several benchmarks.

A PARALLEL PRECONDITIONER FOR GENERALIZED EIGENVALUE PROBLEMS BY CG-TYPE METHOD

  • MA, SANGBACK;JANG, HO-JONG
    • Journal of the Korean Society for Industrial and Applied Mathematics
    • /
    • v.5 no.2
    • /
    • pp.63-69
    • /
    • 2001
  • In this study, we are concerned with computing in parallel a few of the smallest eigenvalues and their corresponding eigenvectors of the eigenvalue problem $Ax={\lambda}Bx$, where A is symmetric and B is symmetric positive definite. Both A and B are large and sparse. Recently, iterative algorithms based on the optimization of the Rayleigh quotient have been developed, and the CG scheme for the optimization of the Rayleigh quotient has proven to be a very attractive and promising technique for computing a few of the smallest eigenvalues of large sparse eigenproblems. As in the case of a system of linear equations, successful application of the CG scheme to eigenproblems also depends on the preconditioning technique. A proper choice of the preconditioner significantly improves the convergence of the CG scheme. The idea underlying the present work is a parallel computation of the Multi-Color Block SSOR preconditioning for the CG optimization of the Rayleigh quotient together with deflation techniques. Multi-Coloring is a simple technique to obtain parallelism of order n, where n is the dimension of the matrix. Block SSOR is a symmetric preconditioner which is expected to minimize the interprocessor communication due to the blocking. We implemented the method on the CRAY-T3E with 128 nodes. The MPI (Message Passing Interface) library was adopted for the interprocessor communications. The test problems were drawn from the discretizations of partial differential equations by finite difference methods.
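
For reference, the quantity that such a CG-type method minimizes can be written out. This is the standard Rayleigh quotient of the pencil $(A, B)$ and its gradient; the symbols $M$ (preconditioner), $p_k$ (search direction), and $\beta_k$ are notation assumed here and are not taken from the paper:

$$\rho(x) = \frac{x^{T} A x}{x^{T} B x}, \qquad \nabla\rho(x) = \frac{2}{x^{T} B x}\,\bigl(A x - \rho(x)\, B x\bigr).$$

The minimum of $\rho$ over nonzero $x$ is the smallest eigenvalue $\lambda_{\min}$, and the gradient vanishes exactly at eigenvectors. A preconditioned iteration of this kind typically searches along directions of the form $p_k = -M^{-1}\nabla\rho(x_k) + \beta_k\, p_{k-1}$, which is where a preconditioner such as Multi-Color Block SSOR would enter.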


A Parallel Combinatory OFDM System with Weighted Phase Subcarriers

  • Zheng, Hui;Shrestha, Robin;Hwang, Jae-Ho;Kim, Jae-Mong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.6 no.1
    • /
    • pp.322-340
    • /
    • 2012
  • Orthogonal Frequency Division Multiplexing (OFDM) is usually regarded as a spectrally efficient multicarrier modulation technique, yet it suffers from a high peak-to-average power ratio (PAPR). Among the existing PAPR reduction techniques for OFDM systems, side-information-based techniques such as the partial transmit sequence (PTS) and selective mapping (SLM) schemes have attracted the most attention. However, transmitting side information causes some spectral loss and does not significantly improve the bit error rate (BER) performance. Parallel combinatory (PC) OFDM yields higher spectral efficiency (SE) and better BER performance on Gaussian channels, while offering only a slight PAPR improvement over the ordinary OFDM system. This investigation aimed to design a 'perfect' OFDM system. We introduce side information to rotate the subcarrier phases of our novel PC-OFDM system structure, and call this new system the SIPC (Side Information based Parallel Combinatory)-OFDM system. The proposed system achieves better PAPR and SE performance. In addition, considering the tradeoff of system parameters, the proposed system also has the properties of a higher BER.
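
To make the PAPR metric concrete, here is a minimal sketch that computes it for one OFDM symbol with a direct inverse DFT. It does not implement PTS, SLM, or the SIPC-OFDM scheme; the function names and the oversampling factor are assumptions chosen for illustration. It only shows that the subcarrier phases determine the PAPR, which is the degree of freedom that phase-rotation schemes exploit.

```python
import cmath
import math


def ofdm_time_signal(symbols, oversample=4):
    """Inverse DFT of one OFDM symbol, evaluated on an oversampled time grid."""
    n = len(symbols)
    m = n * oversample
    return [
        sum(s * cmath.exp(2j * math.pi * k * t / m) for k, s in enumerate(symbols))
        / math.sqrt(n)
        for t in range(m)
    ]


def papr_db(signal):
    """Peak-to-average power ratio of a sampled signal, in dB."""
    powers = [abs(v) ** 2 for v in signal]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))


# Identical phases on every subcarrier add up coherently at t = 0, which is
# the worst case for unit-magnitude symbols.
aligned = papr_db(ofdm_time_signal([1, 1, 1, 1]))
# Flipping the phase of a single subcarrier already lowers the peak.
rotated = papr_db(ofdm_time_signal([1, 1, 1, -1]))
```

For N unit-magnitude subcarriers with identical phases the PAPR equals 10 log10(N) dB, so `aligned` is about 6.02 dB here.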

Comparison of Parallel Preconditioners for Solving Large Sparse Linear Systems on a Massively Parallel Machine (대형이산 행렬 시스템의 초대형병렬컴퓨터에서의 해법을 위한 병렬준비 행렬의 비교)

  • Ma, Sang-Baek
    • The Transactions of the Korea Information Processing Society
    • /
    • v.2 no.4
    • /
    • pp.535-542
    • /
    • 1995
  • In this paper we present two preconditioners for solving large sparse linear systems arising from elliptic partial differential equations on massively parallel machines, such as the CM-5. Most massively parallel machines rely heavily on message passing for interprocessor communication, but under current manufacturing standards the cost of communication is very high compared with that of floating-point arithmetic. We therefore need an algorithm that minimizes the amount of interprocessor communication on massively parallel machines. By conducting experiments on the CM-5, we show that the Block SOR (Successive Over-Relaxation) method coupled with the multi-coloring technique is one such preconditioner. We also implemented the ADI (Alternating Direction Implicit) method on the CM-5, which has conventionally been one of the most powerful parallel preconditioners. Our experiments show that the Block SOR method coupled with the multi-coloring technique could yield a speedup with 50% efficiency as the number of processors ranges from 16 to 512 for a matrix of dimension 512x512. On the other hand, the ADI method shows very poor performance.
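
The multi-coloring idea can be sketched with the simplest case, a two-color (red-black) point SOR sweep for the 5-point Poisson stencil. This is an illustrative stand-in, not the paper's Block SOR preconditioner: it runs serially, uses SOR as a plain solver, and the function name, grid layout, and relaxation factor are assumptions. The point is that all grid nodes of one color depend only on nodes of the other color, so each half-sweep could be updated in parallel.

```python
def red_black_sor(f, h, omega=1.5, sweeps=200):
    """Red-black SOR for -laplace(u) = f on a square grid, zero boundary values.

    f: n x n list of source values, h: grid spacing.
    Returns the n x n solution; boundary entries stay at zero.
    """
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        for color in (0, 1):  # 0 = "red" nodes, 1 = "black" nodes
            # Nodes of the same color never neighbor each other, so the
            # updates inside this half-sweep are mutually independent.
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 != color:
                        continue
                    gauss_seidel = 0.25 * (
                        u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                        + h * h * f[i][j]
                    )
                    u[i][j] += omega * (gauss_seidel - u[i][j])
    return u


# Usage: unit source on a 9 x 9 grid covering the unit square.
n = 9
h = 1.0 / (n - 1)
f = [[1.0] * n for _ in range(n)]
u = red_black_sor(f, h)
```

A block variant would assign colors to whole blocks of unknowns rather than single nodes, trading some parallelism for less communication between processors.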


Development of Optimum Parameters Sampling Program for Mica Capacitor Design (마이카 커패시터 설계를 위한 최적 파라미터 추출 프로그램 개발)

  • Kim, Jae-Wook;Ryu, Chang-Keun
    • Journal of IKEEE
    • /
    • v.13 no.2
    • /
    • pp.194-199
    • /
    • 2009
  • In this study, ultra-high-voltage (170kV AC), reliable 80pF mica capacitors for partial discharge system applications were investigated. For the capacitor design, a program was developed to sample the series and parallel parameters. Mica was used as the dielectric of the capacitors. Using a conservative design rule, over 3 individual 50$\mu$m thick mica sheets with a size of 30mm$\times$35mm were used with lead foils to form a parallel capacitor element, and 20 mica sheets were interleaved with lead foils to form a series stack of parallel capacitor elements to meet the requirements of the capacitors. The dimensions of the fabricated 80pF capacitor for 17kV AC were 90mm$\times$90mm. The high-frequency characteristics of the capacitance (C) and dissipation factor (D) of the developed capacitors were measured using a capacitance meter. The developed capacitor exhibited a C of 79.5pF, a D of 0.001% over the frequency range of 150kHz to 50MHz, and a self-resonant frequency of 65MHz.
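
The series and parallel parameters mentioned in this abstract follow the textbook combination rules, sketched below. This is not the authors' program: the function names are hypothetical, the parallel-plate formula ignores fringing fields and foil thickness, and the relative permittivity must be supplied by the caller because the abstract does not state it.

```python
EPS0 = 8.854e-12  # vacuum permittivity in F/m


def plate_capacitance(eps_r, area_m2, thickness_m):
    """Ideal parallel-plate capacitance of one dielectric sheet."""
    return EPS0 * eps_r * area_m2 / thickness_m


def parallel(capacitances):
    """Capacitances connected in parallel add directly."""
    return sum(capacitances)


def series(capacitances):
    """Capacitances connected in series add as reciprocals."""
    return 1.0 / sum(1.0 / c for c in capacitances)


def stack_capacitance(element_c, n_series):
    """Series stack of n identical elements; each sees 1/n of the voltage."""
    return series([element_c] * n_series)
```

A series stack lowers the total capacitance to `element_c / n_series` while dividing the applied voltage across the elements, which is the usual motivation for stacking in high-voltage designs.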


Adaptive Parallel and Iterative QRDM Detection Algorithms based on the Constellation Set Grouping (성상도 집합 그룹핑 기반의 적응형 병렬 및 반복적 QRDM 검출 알고리즘)

  • Mohaisen, Manar;An, Hong-Sun;Chang, Kyung-Hi;Koo, Bon-Tae;Baek, Young-Seok
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.35 no.2A
    • /
    • pp.112-120
    • /
    • 2010
  • In this paper, we propose semi-ML adaptive parallel QRDM (APQRDM) and iterative QRDM (AIQRDM) algorithms based on set grouping. Using the set grouping, the tree-search stage of the QRDM algorithm is divided into partial detection phases (PDPs). Therefore, when the tree-search stage of QRDM is divided into 4 PDPs, the APQRDM latency is one fourth of that of the QRDM, and the hardware requirements of AIQRDM are approximately one fourth of those of QRDM. Moreover, simulation results show that in a $4{\times}4$ system and at Eb/N0 of 12 dB, APQRDM decreases the average computational complexity to approximately 43% of that of the conventional QRDM. Also, at Eb/N0 of 0 dB, AIQRDM reduces the computational complexity to about 54% and the average number of metric comparisons to approximately 10% of those required by the conventional QRDM and AQRDM.

Random Partial Haar Wavelet Transformation for Single Instruction Multiple Threads (단일 명령 다중 스레드 병렬 플랫폼을 위한 무작위 부분적 Haar 웨이블릿 변환)

  • Park, Taejung
    • Journal of Digital Contents Society
    • /
    • v.16 no.5
    • /
    • pp.805-813
    • /
    • 2015
  • Many researchers expect that compressive sensing and sparse recovery can overcome the limitations of conventional digital techniques. However, these new approaches require solving $\ell_1$-norm optimization problems for signal reconstruction. In the signal reconstruction process, the transform computation, a multiplication of a random matrix by a vector, consumes considerable computing power. To address this issue, parallel processing is applied to the optimization problems. In particular, because of the huge size of the original signal, it is hard to store the random matrix directly in memory, so a procedural approach to handling the random matrix is needed. This paper presents a new parallel algorithm to calculate the random partial Haar wavelet transform on a Single Instruction Multiple Threads (SIMT) platform.
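
One plausible reading of a "random partial" transform with a procedurally handled random matrix is sketched below: compute orthonormal Haar coefficients and keep a random subset of them, regenerating the subset from a seed instead of storing a sampling matrix. This is an assumption for illustration only; the paper's actual definition and its SIMT kernel are not given in the abstract, and the function names are hypothetical. The sketch is serial Python, not GPU code.

```python
import math
import random


def haar_transform(x):
    """Orthonormal Haar wavelet transform; len(x) must be a power of two."""
    out = list(x)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    while n > 1:
        half = n // 2
        # Each pair (2i, 2i+1) is processed independently of the others.
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + diff
        n = half
    return out


def random_partial_haar(x, m, seed=0):
    """Keep m Haar coefficients at indices drawn reproducibly from a seed.

    Only the seed has to be remembered; the row selection is regenerated
    on demand rather than stored as an explicit matrix.
    """
    rows = sorted(random.Random(seed).sample(range(len(x)), m))
    coeffs = haar_transform(x)
    return rows, [coeffs[r] for r in rows]


# Usage: 3 measurements of an 8-sample signal.
rows, measurements = random_partial_haar([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0], 3, seed=42)
```

Because the transform is orthonormal, the full coefficient vector has the same energy as the input signal.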

Design of Partial Product Accumulator using Multi-Operand Decimal CSA and Improved Decimal CLA (다중 피연산자 십진 CSA와 개선된 십진 CLA를 이용한 부분곱 누산기 설계)

  • Lee, Yang;Park, TaeShin;Kim, Kanghee;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.11
    • /
    • pp.56-65
    • /
    • 2016
  • In this paper, a tree architecture composed of multi-operand decimal CSAs and an improved decimal CLA is proposed in order to reduce the delay and area of the partial product accumulation (PPA) of a parallel decimal multiplier. The proposed tree of multi-operand CSAs reduces the partial products quickly. Since the input range of the recoder of the CSA is limited, the CSA can use the simplest logic. In addition, using multi-operand decimal CSAs to add decimal numbers that have a limited range at specific locations of the architecture reduces the partial products efficiently. The final BCD result is also obtained faster by improving the logic of the decimal CLA. To evaluate the performance of the proposed partial product accumulation, synthesis was performed using Design Compiler with a 180 nm CMOS technology library. Synthesis results show that the delay of the proposed partial product accumulation is reduced by 15.6% and its area by 16.2% compared with the general method. The total delay and area are still reduced even though the delay and area of the CLA increase.
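
The carry-save principle behind partial product accumulation can be shown with a small behavioral model. It is a sketch under assumptions, not the proposed circuit: it ignores the recoder and gate-level structure, the function names are hypothetical, and digits are stored least significant first. Each digit column is reduced independently to a sum digit and a carry digit, and only the final addition propagates carries, which is the role a CLA plays in hardware.

```python
def to_digits(value, width):
    """Decimal digits of a non-negative integer, least significant first."""
    return [(value // 10 ** i) % 10 for i in range(width)]


def from_digits(digits):
    """Inverse of to_digits."""
    return sum(d * 10 ** i for i, d in enumerate(digits))


def decimal_csa(operands):
    """Multi-operand decimal carry-save step on equal-width digit vectors.

    Returns (sum_digits, carry_digits), one position wider than the inputs.
    Limited to 11 operands so that every column carry is a single digit.
    """
    if not 1 <= len(operands) <= 11:
        raise ValueError("between 1 and 11 operands supported")
    width = len(operands[0])
    sums, carries = [], [0]
    for pos in range(width):  # columns are independent of each other
        total = sum(op[pos] for op in operands)
        sums.append(total % 10)      # digit that stays in this column
        carries.append(total // 10)  # carry moved to the next column
    sums.append(0)
    return sums, carries


# Usage: accumulate three 3-digit operands.
ops = [to_digits(v, 3) for v in (987, 654, 321)]
sum_vec, carry_vec = decimal_csa(ops)
# One carry-propagating addition produces the final result.
result = from_digits(sum_vec) + from_digits(carry_vec)
```

Here `sum_vec` encodes 852 and `carry_vec` encodes 1110, which add up to 1962.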