Search | Korea Science

Design of Parallel Algorithms for Conventional Matched-Field Processing over Array of DSP Processors (다중 DSP 프로세서 기반의 병렬 수중정합장처리 알고리즘 설계)

Kim, Keon-Wook
Journal of the Institute of Electronics Engineers of Korea SP / v.44 no.4 s.316 / pp.101-108 / 2007
Parallel processing algorithms, coupled with advanced networking and distributed computing architectures, improve the overall computational performance, dependability, and versatility of a digital signal processing system. In this paper, novel parallel algorithms are introduced and investigated for an advanced sonar algorithm, conventional matched-field processing (CMFP). Based on a specific domain, each parallel algorithm decomposes the sequential workload in order to obtain scalable parallel speedup. Depending on the processing requirements of the algorithm, the computational performance of each parallel algorithm shows different characteristics. The high-complexity algorithm, CMFP, shows scalable parallel performance on the array of DSP processors. The impact on parallel performance of workload balancing, communication scheme, algorithm complexity, processor speed, network performance, and testbed configuration is explored.

Analysis of Barrier Waiting Time and A Synchronization Primitive for High Processor Utilization (배리어 대기시간의분석과 높은 프로세서 효율을 위한 동기화 프리미티브)

Jeong, In-Beom;Lee, Jun-Won
Journal of KIISE:Computer Systems and Theory / v.26 no.2 / pp.189-197 / 1999
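Barrier primitives are used to synchronize the processors that take part in a computation when a parallel program runs. However, processors that arrive at a barrier early must wait there until the remaining processors arrive, which lowers processor utilization. In this paper, parallel programs are run with various grain sizes to identify the causes of barrier waiting time. We examine how, even though all processors execute the same number of grains, the instructions and cache misses that vary with grain size affect barrier waiting time. In addition, to reduce blind waiting time at the barrier, we propose a two-phase barrier that performs the synchronization function in two phases. Simulation results show that the grain size of a parallel program affects barrier waiting time and that the proposed two-phase barrier reduces waiting time at the barrier compared with existing barrier primitives.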
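The abstract does not spell out how the two-phase barrier works, so the following is only a minimal sketch of one common way to split barrier synchronization into two phases (often called a split-phase or fuzzy barrier). The class name `SplitPhaseBarrier`, the methods `arrive` and `wait`, and the demo function are hypothetical and are not taken from the paper.

```python
import threading


class SplitPhaseBarrier:
    """Reusable barrier split into a non-blocking arrive() and a blocking wait().

    Hypothetical illustration; not the primitive proposed in the paper.
    """

    def __init__(self, parties):
        self._parties = parties
        self._count = 0
        self._generation = 0
        self._cond = threading.Condition()

    def arrive(self):
        """Phase 1: announce arrival without blocking and return a ticket."""
        with self._cond:
            ticket = self._generation
            self._count += 1
            if self._count == self._parties:
                # The last arrival opens the barrier and resets it for reuse.
                self._count = 0
                self._generation += 1
                self._cond.notify_all()
            return ticket

    def wait(self, ticket):
        """Phase 2: block only while some party has not yet arrived."""
        with self._cond:
            while self._generation == ticket:
                self._cond.wait()


def run_demo(n_workers=4):
    """Each worker publishes a value, overlaps local work, then reads all values."""
    barrier = SplitPhaseBarrier(n_workers)
    partial = [0] * n_workers
    totals = [0] * n_workers

    def worker(i):
        partial[i] = (i + 1) ** 2        # result the other workers depend on
        ticket = barrier.arrive()        # phase 1: signal, do not block
        local = sum(range(1000))         # independent work overlaps the waiting
        barrier.wait(ticket)             # phase 2: block only if still needed
        totals[i] = sum(partial)         # safe: every worker has published
        return local

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

The design point is that work placed between `arrive` and `wait` replaces time a processor would otherwise spend idle at a conventional barrier.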

An Inquiry into ‘Multi-Dimensional Construction for UOWHF’: In case of using Finite Processors (‘UOWHF에 대한 다차원 구성 방법’에 대한 고찰： 유한개의 프로세서를 사용한 경우)

장동훈;이원일;이상진;성수학
Proceedings of the Korea Institutes of Information Security and Cryptology Conference / 2003.07a / pp.62-66 / 2003
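Cryptographers have so far proposed several constructions for UOWHFs, namely BLH [1], XLH [1], BTH [1], XTH [1], the Shoup construction [8], the Sarkar construction [5], and the multi-dimensional construction [2]. Of these, BLH, XLH, and the Shoup construction use only a single processor. In contrast, BTH, XTH, Sarkar's construction, and the multi-dimensional construction are parallel constructions that are efficient in terms of processing speed. However, BTH, XTH, Sarkar's construction, and the multi-dimensional construction require the number of processors and the memory size to grow with the length of the input message. In [6], Sarkar was the first to propose a construction (PUA) that can be processed in parallel with only a finite number of processors and limited memory. However, the PUA construction proposed in [6] is less efficient than Shoup's construction in terms of key expansion length. This paper proposes, for the first time, a ‘multi-dimensional construction using a finite number of processors’ that supports parallel processing with a finite number of processors and limited memory while matching the Shoup construction in key expansion length.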

Design and Performance Analysis of a Parallel Optimal Branch-and-Bound Algorithm for MIN-based Multiprocessors (MIN-based 다중 처리 시스템을 위한 효율적인 병렬 Branch-and-Bound 알고리즘 설계 및 성능 분석)

Yang, Myung-Kook
Journal of IKEEE / v.1 no.1 s.1 / pp.31-46 / 1997
In this paper, a parallel Optimal Best-First search Branch-and-Bound (B&B) algorithm (pobs) is designed and evaluated for MIN-based multiprocessor systems. The proposed algorithm decomposes a problem into G subproblems, where each subproblem is processed on a group of P processors. Each processor group uses the sub-Global Best-First search technique to find a local solution. The local solutions are broadcast through the network to compute the global solution. This broadcast provides not only the comparison of the G local solutions but also load balancing among the processor groups. A performance analysis is then conducted to estimate the speed-up of the proposed parallel B&B algorithm. The analytical model is developed based on the probabilistic properties of the B&B algorithm. It considers both computation time and communication overhead to evaluate the realistic performance of the algorithm in a parallel processing environment. To validate the proposed evaluation model, the parallel B&B algorithm is also simulated on a MIN-based system. The results from analysis and simulation match closely. It is also shown that the proposed Optimal Best-First search B&B algorithm performs better than other reported schemes thanks to several advantageous features: fewer subproblem evaluations, better load balancing, and a limited scope of remote communication.
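For readers unfamiliar with best-first branch-and-bound, here is a minimal sequential sketch of the search kernel, assuming a 0/1 knapsack problem as the example. It is not the paper's pobs algorithm: the problem choice, the function name `knapsack_best_first`, and the fractional bound are illustrative assumptions, and the processor grouping and broadcast steps are left out.

```python
import heapq


def knapsack_best_first(values, weights, capacity):
    """Best-first branch-and-bound for 0/1 knapsack; returns the best total value.

    Assumes all weights are positive. Illustrative only.
    """
    # Sort by value density so the greedy fractional fill is a valid upper bound.
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    n = len(items)

    def bound(level, value, weight):
        """Optimistic bound: fill the remaining room, allowing one fractional item."""
        b, room = value, capacity - weight
        for v, w in items[level:]:
            if w <= room:
                room -= w
                b += v
            else:
                b += v * room / w
                break
        return b

    best = 0
    # heapq is a min-heap, so store the negated bound to pop the most promising node.
    heap = [(-bound(0, 0, 0), 0, 0, 0)]
    while heap:
        neg_b, level, value, weight = heapq.heappop(heap)
        if -neg_b <= best or level == n:
            continue  # prune: this subtree cannot beat the incumbent
        v, w = items[level]
        # Branch 1: take the item if it fits.
        if weight + w <= capacity:
            best = max(best, value + v)
            heapq.heappush(
                heap, (-bound(level + 1, value + v, weight + w), level + 1, value + v, weight + w)
            )
        # Branch 2: skip the item.
        heapq.heappush(heap, (-bound(level + 1, value, weight), level + 1, value, weight))
    return best
```

In a parallel setting along the lines the abstract describes, each processor group would run a search like this on its own subproblem and exchange incumbent values such as `best`, so that every group can prune against the global best.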

Join Operation of Parallel Database System with Large Main Memory (대용량 메모리를 가진 병렬 데이터베이스 시스템의 조인 연산)

Park, Young-Kyu
Journal of the Korea Society of Computer and Information / v.12 no.3 / pp.51-58 / 2007
Because the shared-nothing multiprocessor architecture scales well, it has been adopted in many multiprocessor database systems. However, if the data are not uniformly distributed across the processors, the load becomes unbalanced and overall system performance deteriorates. This is the data skew problem, which commonly occurs when processing parallel hash joins. Balancing the load before performing the join resolves this problem efficiently and improves overall system performance. In this paper, we present an algorithm that exploits very large main memory to reduce disk access overhead during load balancing and to solve the data skew problem efficiently. We also present an analytical model of the new algorithm and the results of a performance study comparing it with other algorithms for handling data skew.

Performance Analysis of a MIND-Type Parallel Computer Using Simulation (시뮬레이션을 이용한 MIND 형 병렬 컴퓨터의 성능분석)

Kim, Jong-Hyeon
ETRI Journal / v.10 no.3 / pp.101-112 / 1988
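In this study, we designed the architecture of a parallel computer system for scientific computing, developed a software simulator of the designed architecture, and analyzed the system's performance through a variety of simulations. The designed system is based on the H/V-bus parallel processing system architecture, extended with a high-speed interprocessor communication mechanism for scientific computing. The simulator, developed with SLAM II and FORTRAN, uses system variables so that the number and speed of the processors and the speed of the communication mechanism can be changed easily, and it was used to analyze system performance under various conditions. In addition, benchmarks were solved on the simulator to measure and analyze how the speeds of the processors and of the communication mechanism affect overall system performance while real programs are running.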

The Improved Processor Bound for Parallel Exponentiation in GF(2^n) (GF(2^n)상에서 병렬 멱승 연산의 프로세서 바운드 향상 기법)

김윤정;박근수;조유근
Proceedings of the Korean Information Science Society Conference / 2000.04a / pp.701-703 / 2000
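This paper describes two improvements to parallel exponentiation in GF(2^n) with a normal basis representation. The first is a method that uses fewer processors than previously known methods when the number of rounds is fixed at [log k]+[log[n/k]], where k is the window length. The second shows, through asymptotic analysis, that parallel exponentiation in GF(2^n) can be performed in $O(\log n)$ rounds with $O(n/\log^2 n)$ processors. This gives a processor $\times$ round bound of $O(n/\log n)$, which improves on the previously known $O(n)$.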
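As a worked step for the final bound, assuming the processor count reads $O(n/\log^2 n)$ (the only reading consistent with the stated product), multiplying the processor count by the round count gives:

$$O\!\left(\frac{n}{\log^2 n}\right) \times O(\log n) = O\!\left(\frac{n}{\log n}\right)$$

which is smaller than the earlier $O(n)$ bound by a factor of $\log n$.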

Compression-Based Volume Rendering on Distributed Memory Parallel Computers (분산 메모리 구조를 갖는 병렬 컴퓨터 상에서의 압축 기반 볼륨 렌더링)

Koo, Gee-Bum;Park, Sang-Hun;Song, Dong-Sub;Ihm, In-Sung
Journal of KIISE:Computing Practices and Letters / v.6 no.5 / pp.457-467 / 2000
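This paper proposes a parallel ray-casting method for effectively visualizing very large volume data on distributed-memory parallel computers. Built on data compression, the method aims to improve parallel rendering performance by quickly decompressing compressed data held in a processor's own local memory rather than reading data from another processor's memory. It improves performance by combining the strengths of both object-order and image-order traversal algorithms. Specifically, it exploits coherence in object space and in image space, respectively, by traversing a block-based min-max octree and by applying a real-time quadtree that dynamically maintains the opacity value of each pixel. The proposed compression-based parallel volume rendering method is implemented to minimize interprocessor communication during rendering, a property that is very useful in more practical distributed environments such as clusters of PCs and workstations, where data communication between processors is considerably expensive. Experiments were performed on a Cray T3E parallel computer using the Visible Man data.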

Acceleration for Removing Sea-fog using Graphic Processors and Parallel Processing (그래픽 프로세서를 이용한 병렬연산 기반 해무 제거 고속화)

Kim, Young-doo;Kwak, Jae-min;Seo, Young-ho;Choi, Hyun-jun
Journal of Advanced Navigation Technology / v.21 no.5 / pp.485-490 / 2017
In this paper, we propose a technique for high-speed removal of sea-fog using graphics processors. The technique uses a host processor (CPU) and several graphics processors (GPUs) capable of parallel processing to remove sea-fog from the input image. In the sea-fog removal process, the dark channel extraction, the maximum brightness channel extraction, and the calculation of the transmission are performed by the host processor, while the refinement of the transmission by applying the bidirectional filter is performed in parallel on the graphics processors. To verify the proposed parallel processing method, a verification environment was built with three NVIDIA GTX 1070 GPUs. Processing takes about 140 ms when implemented with one graphics processor and 26 ms when implemented using OpenMP and multiple GPGPUs. The proposed GPU-based parallel processing algorithm can be used for safe navigation, port control, and monitoring systems.
https://doi.org/10.12673/jant.2017.21.5.485

Improved Parallel Loop Scheduling Algorithm on Shared Memory Systems (공유메모리 시스템에서 개선된 병렬 루프 스케쥴링 알고리즘)

이영규;박두순
Proceedings of the Korea Multimedia Society Conference / 2000.04a / pp.453-457 / 2000
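To achieve optimal scheduling in a parallel system environment, parallelizable iterations must be scheduled so that synchronization overhead is minimized and load balance is achieved. Each processor computes a chunk of iterations from memory and is then assigned that chunk for execution. Because the processors access this memory under mutual exclusion, considerable overhead and bottlenecks arise. In addition, when the execution times of the iterations in the chunks assigned to the processors differ from one another, load imbalance results and ultimately degrades the overall schedule. To achieve optimal scheduling, this paper therefore identifies the problems of existing scheduling methods, proposes an improved parallel loop algorithm that takes data locality and processor affinity into account, and shows through performance evaluation that the algorithm is an improvement.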
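To make the chunk trade-off concrete, the sketch below compares the chunk sizes produced by two well-known existing schemes, static block scheduling and guided self-scheduling. It is not the improved algorithm proposed in the paper, and the function names are hypothetical.

```python
import math


def static_chunks(n_iterations, n_processors):
    """Static scheduling: one near-equal chunk per processor, one assignment each."""
    base, extra = divmod(n_iterations, n_processors)
    return [base + (1 if i < extra else 0) for i in range(n_processors)]


def guided_chunks(n_iterations, n_processors):
    """Guided self-scheduling: each request takes ceil(remaining / P) iterations."""
    chunks = []
    remaining = n_iterations
    while remaining > 0:
        size = math.ceil(remaining / n_processors)
        chunks.append(size)
        remaining -= size
    return chunks


if __name__ == "__main__":
    print(static_chunks(100, 4))
    print(guided_chunks(100, 4))
```

Static scheduling needs only one assignment per processor but cannot adapt when iteration times differ. Guided self-scheduling hands out shrinking chunks, which evens out the finish times at the cost of more mutually exclusive accesses to the shared loop counter, the overhead the abstract describes.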

