• Title/Summary/Keyword: partial parallel algorithm

Implementation of a parallel traversal scheme for O(n!) search space exploiting cost constraint (비용 제약조건을 이용한 병렬 O(n!) 서치 스페이스 탐색 기법의 구현)

  • Lee, Junghoon
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2010.11a
    • /
    • pp.1501-1502
    • /
    • 2010
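  • With the spread of dual-core and multi-core platforms, applications with high time complexity can now run on a user's computer or handheld device and provide a variety of services. This paper implements a dual-thread search-space traversal algorithm for the multi-destination visiting problem, which arises when a tour schedule must be decided efficiently. The problem is a variant of the Traveling Salesman Problem with O(n!) time complexity, and because the searches are independent, each thread can look for the optimal schedule in parallel. In addition, whenever the cost of a partial path already exceeds the best cost found so far, the search below that path is pruned, which improves performance considerably. On a platform with a 2.4 GHz Intel(R) Core Duo CPU and 3 GB of memory, the implemented service finds the optimal visiting schedule for 11 destinations in 14.196 seconds with the single-thread version, 6.411 seconds with the dual-thread version, and 0.14 seconds with the dual-thread version that applies the cost constraint.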
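The pruning idea in this abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `tsp_two_threads`, the fixed start at destination 0, the open (non-returning) path, and the even/odd split of first stops between the two threads are all assumptions made for the example. Costs are assumed non-negative, which is what makes pruning on the partial cost safe.

```python
import threading


def tsp_two_threads(dist):
    """Find the cheapest open path that starts at node 0 and visits every node.

    dist[i][j] is the non-negative cost of going from i to j (len(dist) >= 2).
    Two threads each explore a disjoint set of first stops and share the best
    cost found so far, so either thread can prune using the other's result.
    """
    n = len(dist)
    best = {"cost": float("inf"), "path": None}
    lock = threading.Lock()

    def search(path, visited, cost):
        # Cost constraint: a partial path that already costs at least as much
        # as the best complete path cannot lead to an improvement.
        if cost >= best["cost"]:
            return
        if len(path) == n:
            with lock:  # re-check under the lock before updating
                if cost < best["cost"]:
                    best["cost"] = cost
                    best["path"] = path[:]
            return
        last = path[-1]
        for nxt in range(n):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                search(path, visited, cost + dist[last][nxt])
                path.pop()
                visited.remove(nxt)

    def worker(first_stops):
        for s in first_stops:
            search([0, s], {0, s}, dist[0][s])

    stops = list(range(1, n))
    halves = [stops[0::2], stops[1::2]]  # hypothetical work split
    threads = [threading.Thread(target=worker, args=(h,)) for h in halves]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return best["cost"], best["path"]
```

In CPython the global interpreter lock keeps these two threads from running CPU-bound work truly in parallel, so the sketch shows the structure of the search and the shared bound rather than the speedup reported in the abstract.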

An Efficient Parallelization Implementation of PU-level ME for Fast HEVC Encoding (고속 HEVC 부호화를 위한 효율적인 PU레벨 움직임예측 병렬화 구현)

  • Park, Soobin;Choi, Kiho;Park, Sang-Hyo;Jang, Euee Seon
    • Journal of Broadcast Engineering
    • /
    • v.18 no.2
    • /
    • pp.178-184
    • /
    • 2013
  • In this paper, we propose an efficient technique for parallelizing PU-level motion estimation (ME) in high efficiency video coding (HEVC), the next-generation video coding standard, to reduce the time complexity of video encoding. Real-time encoding is difficult because ME accounts for a large share of encoder complexity (about 80 percent). Various techniques have been studied to address this, among them parallelization, which must be considered carefully in algorithm-level ME design. In this regard, a merge estimation method based on the merge estimation region (MER), which allows ME to be performed in parallel, has been proposed; however, MER-based parallel ME still has unresolved problems that prevent an ideal implementation in the HEVC test model (HM). We therefore propose two strategies for implementing stable parallel ME using MER in HM. Experimental results show that the proposed method reduces encoding time by 25.64 percent on average compared with HM using sequential ME.

Enhanced NOW-Sort on a PC Cluster with a Low-Speed Network (저속 네트웍 PC 클러스터상에서 NOW-Sort의 성능향상)

  • Kim, Ji-Hyoung;Kim, Dong-Seung
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.10
    • /
    • pp.550-560
    • /
    • 2002
  • External sort on cluster computers requires not only fast internal sorting computation but also careful scheduling of disk input and output and interprocessor communication through networks. This is because the overall time for the execution is determined by reflecting the times for all the jobs involved, and the portion for interprocessor communication and disk I/O operations is significant. In this paper, we improve the sorting performance (sorting throughput) on a cluster of PCs with a low-speed network by developing a new algorithm that enables even distribution of load among processors, and optimizes the disk read and write operations with other computation/communication activities during the sort. Experimental results support the effectiveness of the algorithm. We observe the algorithm reduces the sort time by 45% compared to the previous NOW-sort[1], and provides more scalability in the expansion of the computing nodes of the cluster as well.

Distributed Test Method using Logical Clock (Logical Clock을 이용한 분산 시험)

  • Choi, Young-Joon;Kim, Myeong-Chul;Seol, Soon-Uk
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.28 no.9
    • /
    • pp.469-478
    • /
    • 2001
  • It is difficult to test a distributed system because concurrent events must be controlled. Existing work does not propose a formal test sequence generation algorithm, and the number of messages is large because of synchronization. In this paper, we propose a formal test sequence generation algorithm that uses a logical clock to control concurrent events. It solves the control-observation problem and makes the test results reproducible. It also provides a generic solution, in that the algorithm can be used for any communication paradigm. In distributed testing, the number of channels among the testers increases non-linearly with the number of distributed objects. We propose a new remote test architecture to solve this problem. An SDL tool is used to verify the correctness of the proposed algorithm, and the algorithm is applied, as a case study, to the message exchange for establishing a Q.2971 point-to-multipoint call/connection.

PMSM Sensorless Control using Parallel Reduced-Order Extended Kalman Filter (병렬형 칼만 필터를 사용한 영구 자석 동기 전동기의 센서리스 제어)

  • Jang, Jin-Su;Park, Byoung-Gun;Kim, Tae-Sung;Lee, Dong-Myung;Hyun, Dong-Seok
    • The Transactions of the Korean Institute of Power Electronics
    • /
    • v.13 no.5
    • /
    • pp.336-343
    • /
    • 2008
  • This paper proposes a novel sensorless control scheme for a Permanent Magnet Synchronous Motor (PMSM) using a parallel reduced-order Extended Kalman Filter (EKF). The proposed scheme obtains rotor position and speed from the back-EMF estimated by the reduced-order EKF, and it greatly reduces computation time by using a parallel structure that operates in turns at every sampling time. The proposed scheme therefore retains the merits of the conventional EKF while partially overcoming its parameter sensitivity. It can also estimate rotor speed and position reliably by using new algorithms chosen according to the driving region. Experimental results show the validity of the proposed estimation technique, and, to verify its merit, the new reduced-order EKF algorithm is compared with a conventional EKF algorithm in terms of computation time.

Studies of Parallelism and Performance Enhancements of Computing View Factor for Satellite Thermal Analysis (인공위성 열해석을 위한 복사형상계수 계산기법의 병렬화 및 성능향상 기법 연구)

  • Kim, Min-Ki
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.43 no.12
    • /
    • pp.1079-1088
    • /
    • 2015
  • This paper introduces the parallelization and performance enhancement of view factor calculation in KSDS, developed by KARI. View factors are essential parameters in the radiation thermal analysis of a spacecraft, and the amount of computation they require is not negligible. In particular, view factors must be integrated independently at each position of the orbit because the relative displacement between the solar panel and the main body of a satellite varies with position on the orbit. This paper presents several ways of parallelizing view factor computation and their performance, the detection of obstructions with a spatial search algorithm based on a KD-Tree, and a technique, called updating the fractional view factor matrix, that reduces view factor calculation for a satellite with relative motion between the solar panel and the main body.

Design of Montgomery Algorithm and Hardware Architecture over Finite Fields (유한 체상의 몽고메리 알고리즘 및 하드웨어 구조 설계)

  • Kim, Kee-Won;Jeon, Jun-Cheol
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.18 no.2
    • /
    • pp.41-46
    • /
    • 2013
  • Finite field multipliers are basic building blocks in many applications such as error-control coding, cryptography, and digital signal processing. Recently, many semi-systolic architectures have been proposed for multiplication over finite fields. The Montgomery multiplication algorithm is also well known as an efficient arithmetic algorithm. In this paper, we derive an efficient multiplication algorithm and propose an efficient semi-systolic Montgomery multiplier based on the polynomial basis. We select an ideal Montgomery factor that is suitable for parallel computation, so our architecture is divided into two parts that can be computed simultaneously. According to our analysis, the architecture reduces time complexity by 30% to 50% compared with typical architectures.
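For readers unfamiliar with Montgomery multiplication over a binary field, the following is a minimal bit-serial sketch in the polynomial basis. It uses the classic Montgomery factor x^m and is not the factor or the two-part parallel structure proposed in the paper, which the abstract does not specify. The function names and the choice of GF(2^8) with the polynomial 0x11B in the usage note are assumptions for illustration. Field elements are stored as integers whose bits are polynomial coefficients.

```python
def gf2_mul_mod(a, b, f, m):
    """Reference shift-and-add product a*b mod f in GF(2^m).

    f is the irreducible polynomial including its x^m term.
    """
    r = 0
    for i in range(m):
        if (b >> i) & 1:
            r ^= a          # add a * x^i when bit i of b is set
        a <<= 1             # a := a * x
        if (a >> m) & 1:
            a ^= f          # reduce when the degree reaches m
    return r


def montgomery_mul(a, b, f, m):
    """Bit-serial Montgomery product: returns a*b*x^(-m) mod f.

    Each step adds a_i * b, cancels the constant term by adding f
    (f has constant term 1), then divides by x with a right shift.
    """
    c = 0
    for i in range(m):
        if (a >> i) & 1:
            c ^= b
        if c & 1:
            c ^= f
        c >>= 1
    return c
```

Multiplying the Montgomery result by x^m mod f recovers the ordinary product; for GF(2^8) with f = 0x11B, x^8 mod f is 0x1B, so `gf2_mul_mod(montgomery_mul(a, b, 0x11B, 8), 0x1B, 0x11B, 8)` equals `gf2_mul_mod(a, b, 0x11B, 8)`.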

Design of Cryptographic Coprocessor for SEED Algorithm (SEED 알고리즘용 암호 보조 프로세서의 설계)

  • 최병윤
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.9B
    • /
    • pp.1609-1617
    • /
    • 2000
  • This paper describes the design of a cryptographic coprocessor that implements the SEED algorithm. To balance the trade-off between area and speed, the coprocessor divides one round operation into three subrounds and executes each subround in one clock cycle. To improve clock frequency, an online precomputation scheme for the round key is used. To suit various applications, four operating modes, ECB, CBC, CFB, and OFB, are supported. In addition, to eliminate performance degradation caused by data input and output time between the host computer and the coprocessor, a background input/output method is used. The cryptographic coprocessor is designed using $0.25\,\mu\mathrm{m}$ CMOS technology and consists of about 29,300 gates. Its peak performance is an encryption or decryption rate of about 237 Mbps at a 100 MHz clock frequency in ECB mode.

An Efficient 4$\times$4 Integer Transform Algorithm on SIMD (SIMD 기반의 효율적인 4$\times$4 정수변환 방법)

  • 유상준;오승준;안창범
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.10a
    • /
    • pp.55-57
    • /
    • 2004
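  • The DCT (Discrete Cosine Transform) is a core part of existing block-based video compression coding techniques. Many fast methods have been proposed, and more recently fast methods that exploit SIMD parallel architectures have appeared. This paper proposes an algorithm for optimizing the speed of the 4$\times$4 integer transform on processors with SIMD instructions. The proposed algorithm can be extended to 128-bit SIMD instructions and can also be applied to the Hadamard transform, which has a similar structure. When implemented on a Pentium 4 2.4G, the proposed method is 4.34 times faster with 64-bit SIMD instructions and 6.77 times faster with 128-bit SIMD instructions than the 4$\times$4 integer transform of the H.264 reference encoder.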
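As background for the transform this abstract optimizes, here is a scalar sketch of the standard H.264 4×4 forward core transform, Y = Cf · X · Cf^T, computed with the usual add/subtract butterfly. It is not the paper's SIMD algorithm; the function names are illustrative assumptions. The point of the sketch is that each butterfly stage consists of independent additions and subtractions, which is the kind of work that maps naturally onto SIMD lanes.

```python
# Core matrix of the H.264 4x4 forward integer transform.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def butterfly4(a, b, c, d):
    """Multiply CF by the vector (a, b, c, d) using only adds and doublings."""
    s0, s1 = a + d, b + c
    s2, s3 = b - c, a - d
    return [s0 + s1, 2 * s3 + s2, s0 - s1, s3 - 2 * s2]


def forward_transform(block):
    """Return CF * block * CF^T for a 4x4 block given as a list of rows."""
    rows = [butterfly4(*r) for r in block]        # block * CF^T, row by row
    cols = [butterfly4(*c) for c in zip(*rows)]   # CF applied to each column
    return [list(r) for r in zip(*cols)]          # transpose back to rows


def forward_transform_matrix(block):
    """Same result by direct matrix multiplication, for cross-checking."""
    tmp = [[sum(CF[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]
```

The butterfly needs 8 additions or subtractions and 2 doublings per 4-element vector, compared with 16 multiplications and 12 additions for the direct matrix product.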

Efficient Power Allocation Algorithm for Wireless Networks (무선망의 효율적 전력 할당 알고리즘)

  • Ahn, Hong-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.16 no.1
    • /
    • pp.103-108
    • /
    • 2016
  • In communication systems, the problem of maximizing the mutual information between the input and output of a channel composed of several subchannels under a total power constraint has a solution with a waterfilling structure. OFDM and MIMO channels can be decomposed into parallel subchannels given CSI. Waterfilling solves the problem of allocating power optimally to these subchannels so that the rate approaches the channel capacity under the total power constraint. In waterfilling, more power is allotted to good channels (high SNR) and less or no power to bad channels, which increases the rate of the good channels and achieves the channel capacity. Waterfilling finds the exact water level that satisfies the power constraint by using an iterative algorithm to estimate and update the water level. This process repeatedly requires partial sums of the inverse squared subchannel gains. In this paper, we reduce the computation time of the waterfilling algorithm by replacing the partial sum computation with a lookup into an array of partial sums precomputed in the initialization phase.
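The idea of replacing repeated partial-sum computation with a precomputed array can be sketched as follows. This is an illustration under stated assumptions, not the author's algorithm: noise power is normalized to 1, so subchannel i with gain g_i receives power max(0, level - 1/g_i^2), and the function name `waterfill` and its linear search over the number of active subchannels are choices made for the example. With k active subchannels, the water level is (P + S_k) / k, where S_k is the sum of the k smallest inverse squared gains, so each candidate level costs one array lookup.

```python
from itertools import accumulate


def waterfill(gains, total_power):
    """Allocate total_power (> 0) over subchannels with the given gains.

    Returns (water_level, powers), with powers in the original channel order.
    Assumes unit noise power, so the "floor" of channel i is 1 / gains[i]**2.
    """
    # Sort floors ascending, remembering each channel's original index.
    floors = sorted((1.0 / (g * g), i) for i, g in enumerate(gains))
    # Precompute partial sums once: prefix[k-1] = sum of the k lowest floors.
    prefix = list(accumulate(n for n, _ in floors))

    # Drop the worst channels until the water level covers the highest
    # remaining floor; each candidate level is a single array lookup.
    k = len(floors)
    while k > 1 and (total_power + prefix[k - 1]) / k <= floors[k - 1][0]:
        k -= 1
    level = (total_power + prefix[k - 1]) / k

    powers = [0.0] * len(gains)
    for n, i in floors[:k]:
        powers[i] = level - n
    return level, powers
```

For example, with gains `[1.0, 0.5]` the floors are 1 and 4. A budget of 1 leaves the weaker channel dry (level 2, powers `[1.0, 0.0]`), while a budget of 5 fills both (level 5, powers `[4.0, 1.0]`).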