Search | Korea Science

Parallelization of CUSUM Test in a CUDA Environment (CUDA 환경에서 CUSUM 검증의 병렬화)

Son, Changhwan;Park, Wooyeol;Kim, HyeongGyun;Han, KyungSook;Pyo, Changwoo
- KIISE Transactions on Computing Practices
- v.21 no.7
- pp.476-481
- 2015
We have parallelized the cumulative sum (CUSUM) test of NIST's statistical random number test suite in a CUDA environment. Storing random walks in an array instead of in scalar variables eliminates data dependence. The change in data structure makes it possible to apply parallel scans, scatters, and reductions at each stage of the test. In addition, serial data exchanges between the CPU and the GPU are eliminated by migrating the CPU's tasks to the GPU. Finally, we have optimized global memory accesses. The overall speedup is 23 times over the sequential version. Our results contribute to improving the security of random numbers for cryptographic keys as well as reducing the time needed to evaluate randomness.
https://doi.org/10.5626/KTCP.2015.21.7.476
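
The abstract's central idea, keeping the whole random walk in an array so that the test becomes a scan followed by a reduction, can be sketched sequentially as follows. This is a minimal illustration rather than the authors' CUDA code: the function name `cusum_statistic` is hypothetical, only the forward mode is shown, and the p-value computation is omitted.

```python
from itertools import accumulate


def cusum_statistic(bits):
    """Return max |S_k| over the forward random walk of a 0/1 sequence."""
    # Map step: 0 -> -1, 1 -> +1. Every element is independent of the others.
    steps = [2 * b - 1 for b in bits]
    # Scan step: inclusive prefix sum. The whole walk is stored in an array,
    # which is the form a parallel scan would produce on a GPU.
    walk = list(accumulate(steps))
    # Reduction step: maximum absolute excursion of the walk.
    return max(abs(s) for s in walk)


if __name__ == "__main__":
    print(cusum_statistic([1, 0, 1, 1, 0, 1, 0, 1, 1, 1]))
```

Each of the three stages (map, scan, max-reduction) has a well-known parallel counterpart, which is why the array form removes the loop-carried dependence of a scalar running sum.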

The Simplified Coarse FFT Window Position Recovery Algorithm for OFDM System (OFDM 시스템의 단순화된 대략적인 FFT 윈도우 위치 복원 알고리즘)

박소라;도상현;김동규;최형진;최장진
- Proceedings of the Korean Society of Broadcast Engineers Conference
- 1997.11a
- pp.107-110
- 1997
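OFDM is a form of multi-carrier modulation that transmits the signal in parallel over many subcarriers. To reduce inter-symbol interference, an OFDM system inserts a guard interval before transmission. The guard interval is a copy of part of the useful data interval, prepended to the signal; it is redundant and is not used in demodulation. The received OFDM signal is demodulated with an FFT, whose input must contain only the useful data portion and exclude the guard interval. The FFT window is what removes the guard interval. This paper proposes an algorithm for coarse FFT window position recovery that is unaffected by carrier frequency offset, and evaluates its performance by computer simulation under AWGN and 20 multipath channels.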
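
Because the guard interval repeats the tail of the useful data, a coarse window position can be found by correlating the received samples with a copy of themselves delayed by one FFT length. The sketch below shows this generic guard-interval correlation, not necessarily the algorithm proposed in the paper; the function name `coarse_window_start` and its parameters are assumptions made for illustration. Taking the magnitude of the correlation makes the estimate insensitive to a carrier frequency offset, since the offset only rotates every product by the same phase.

```python
import cmath


def coarse_window_start(rx, fft_size, guard_len):
    """Estimate the index of the first useful sample of one OFDM symbol."""
    best_d, best_metric = 0, -1.0
    last = len(rx) - (fft_size + guard_len)
    for d in range(last + 1):
        # Correlate a guard-length block with the block one FFT length later.
        corr = sum(rx[d + k] * rx[d + k + fft_size].conjugate()
                   for k in range(guard_len))
        metric = abs(corr)  # magnitude discards the common phase rotation
        if metric > best_metric:
            best_d, best_metric = d, metric
    # The correlation peaks at the start of the guard interval; the FFT
    # window begins guard_len samples later.
    return best_d + guard_len


if __name__ == "__main__":
    n_fft, n_guard, delay = 16, 4, 5
    data = [cmath.exp(1j * 0.7 * n * n) for n in range(n_fft)]
    symbol = data[-n_guard:] + data          # prepend the cyclic guard
    rx = [0j] * delay + symbol + [0j] * 7    # silence before and after
    print(coarse_window_start(rx, n_fft, n_guard))
```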

Systolic Array Implementation for 2-D IIR Digital Filter and Design of PE Cell (2-D IIR 디지탈필터의 시스토릭 어레이 실현 및 PE셀 설계)

박노경;문대철;차균현
- The Journal of the Acoustical Society of Korea
- v.12 no.1E
- pp.39-47
- 1993
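We present a method for realizing a 2-D IIR digital filter as a systolic array. The systolic array is built by partially realizing the filter as 1-D IIR digital filters and then cascading them. Cascading the partially realized systolic arrays reduces the number of elements needed for signal delay. Each 1-D systolic array is derived by designing a dependence graph (DG) under a local-communication approach and mapping it to a signal flow graph (SFG). The derived structure is very simple and achieves very high throughput: once input samples are supplied, it produces a new output every sampling period. Because the systolic realization of the 2-D IIR digital filter is regular and modular, with local interconnection and highly synchronous multiprocessing, it is well suited to VLSI implementation. In addition, the multiplier of the PE cell is designed for a high degree of parallelism, based on the modified Booth algorithm and Ling's algorithm.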

Architecture design of small Reed-Solomon decoder by Berlekamp-Massey algorithm (Berlekamp-Massey 알고리즘을 이용한 소형 Reed-Solomon 디코우더의 아키텍쳐 설계)

Chun, Woo-Hyung;Song, Nag-Un
- The Transactions of the Korea Information Processing Society
- v.7 no.1
- pp.306-312
- 2000
This paper proposes an efficient architecture for a small Reed-Solomon decoder that uses a 3-stage pipeline. In decoding, the error-locator polynomial is obtained by the Berlekamp-Massey algorithm (BMA) using a fast iteration method; the syndrome polynomial, which is computationally demanding, is computed in parallel using a ROM table; and the roots of the error-locator polynomial are found by Chien search using a ROM table. The proposed decoder is confirmed to correct 3-symbol random errors and to achieve a decoding rate of 124 Mbps with a 25 MHz system clock.
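
The table-driven Chien search mentioned in the abstract can be illustrated in software with exponent and logarithm tables standing in for the ROM. The sketch below works over GF(2^3) with the primitive polynomial x^3 + x + 1, chosen only to keep the tables small; the field size used in the paper is not stated in the abstract, and the names `gf_mul`, `poly_eval`, and `chien_search` are hypothetical.

```python
FIELD_ORDER = 7        # number of nonzero elements in GF(2^3)
PRIM_POLY = 0b1011     # x^3 + x + 1

# Lookup tables playing the role of the ROM: EXP[i] = alpha^i, LOG[alpha^i] = i.
EXP = [0] * (2 * FIELD_ORDER)
LOG = [0] * (FIELD_ORDER + 1)
_x = 1
for _i in range(FIELD_ORDER):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0b1000:
        _x ^= PRIM_POLY
for _i in range(FIELD_ORDER, 2 * FIELD_ORDER):
    EXP[_i] = EXP[_i - FIELD_ORDER]  # duplicate so that gf_mul needs no modulo


def gf_mul(a, b):
    """Multiply two field elements by adding their logarithms."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_eval(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


def chien_search(locator):
    """Return every position i for which alpha^(-i) is a root of the locator."""
    positions = []
    for i in range(FIELD_ORDER):
        x_inv = EXP[(FIELD_ORDER - i) % FIELD_ORDER]  # alpha^(-i)
        if poly_eval(locator, x_inv) == 0:
            positions.append(i)
    return positions


if __name__ == "__main__":
    # Locator (1 + alpha^2 x)(1 + alpha^5 x), i.e. errors at positions 2 and 5.
    locator = [1, EXP[2] ^ EXP[5], gf_mul(EXP[2], EXP[5])]
    print(chien_search(locator))
```

In hardware the candidate positions would typically be evaluated in parallel or one per clock; the loop here only shows what each evaluation computes.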

Parallelization of Probabilistic RoadMap for Generating UAV Path on a DTED Map (DTED 맵에서 무인기 경로 생성을 위한 Probabilistic RoadMap 병렬화)

Noh, Geemoon;Park, Jihoon;Min, Chanoh;Lee, Daewoo
- Journal of the Korean Society for Aeronautical & Space Sciences
- v.50 no.3
- pp.157-164
- 2022
In this paper, we describe how to implement mountainous terrain, radar, and an air defense network for UAV path planning in a 3-D environment, and we perform path planning and re-planning with the Probabilistic RoadMap (PRM) algorithm, a sampling-based path planning algorithm. In the original PRM algorithm, the check for obstacles between nodes is performed for each pair of nodes, one after another, so the amount of computation depends heavily on the number of nodes and on the link distance between them. To address this, the proposed LineGridMask method simplifies the obstacle check and reduces path planning time through parallelization. Compared with the existing PRM algorithm, computation time was reduced by up to 88% in path planning and by up to 94% in re-planning.
https://doi.org/10.5139/JKSAS.2022.50.3.157

Implementation of LDPC Decoder using High-speed Algorithms in Standard of Wireless LAN (무선 랜 규격에서의 고속 알고리즘을 이용한 LDPC 복호기 구현)

Kim, Chul-Seung;Kim, Min-Hyuk;Park, Tae-Doo;Jung, Ji-Won
- Journal of the Korea Institute of Information and Communication Engineering
- v.14 no.12
- pp.2783-2790
- 2010
In this paper, we first review LDPC codes in general and a belief propagation algorithm that works in the logarithmic domain. LDPC codes, which were adopted in the 802.11n wireless local area network (WLAN) standard, require a large amount of computation because of the large coded block size and the number of iterations. We therefore present three low-complexity algorithms for LDPC codes. First, sequential decoding with partial groups is proposed. It has the same hardware complexity as the conventional decoding algorithm and needs fewer iterations for the same performance. Second, we apply an early stop algorithm, which eliminates unnecessary iterations. Third, an early detection method is proposed to reduce computational complexity: using a confidence criterion, some bit nodes and check node edges are detected early during decoding. Simulations show that the subset algorithm halves the number of iterations, that the early stop algorithm saves more than one iteration, and that the early detection method reduces computational complexity by about 30% in the case of check node update and by 94% in the case of check node update compared with the conventional scheme. The LDPC decoder was implemented in Xilinx System Generator and targeted to a Xilinx Virtex-5 xc5vlx155t FPGA. With all three algorithms applied, device usage is about 45% lower and decoding is about twice as fast as with the conventional scheme.
https://doi.org/10.6109/jkiice.2010.14.12.2783
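
The early stop idea can be shown with a much simpler decoder than the one in the paper. The sketch below swaps belief propagation for hard-decision bit flipping and uses the parity-check matrix of a (7,4) Hamming code as a stand-in for an 802.11n LDPC matrix; both substitutions, and the names `syndrome` and `bit_flip_decode`, are assumptions made for illustration. The point is only the stopping rule: iteration ends as soon as every parity check is satisfied, not after a fixed maximum.

```python
def syndrome(H, bits):
    """Return the parity of each check row of H applied to bits."""
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]


def bit_flip_decode(H, received, max_iters=10):
    """Hard-decision bit-flipping decoder with early stopping.

    Returns (decoded_bits, iterations_used).
    """
    bits = list(received)
    for iteration in range(max_iters):
        s = syndrome(H, bits)
        if not any(s):
            # Early stop: all checks are satisfied, further iterations are wasted.
            return bits, iteration
        # Count, for every bit, how many failed checks it takes part in.
        votes = [sum(s[r] for r in range(len(H)) if H[r][c])
                 for c in range(len(bits))]
        worst = max(votes)
        # Flip the bits involved in the largest number of failed checks.
        bits = [b ^ 1 if v == worst else b for b, v in zip(bits, votes)]
    return bits, max_iters


if __name__ == "__main__":
    H = [
        [1, 0, 1, 0, 1, 0, 1],
        [0, 1, 1, 0, 0, 1, 1],
        [0, 0, 0, 1, 1, 1, 1],
    ]
    # All-zero codeword with a single error in the last position.
    print(bit_flip_decode(H, [0, 0, 0, 0, 0, 0, 1]))
```

Here one flipping pass corrects the error and the loop exits on the next syndrome check, so nine of the ten permitted iterations are never run.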

Improved Parallel Thinning Algorithm for Fingerprint image Processing (지문영상 처리를 위한 개선된 병렬 세선화 알고리즘)

권준식
- Journal of the Institute of Electronics Engineers of Korea SP
- v.41 no.3
- pp.73-81
- 2004
Thinning is an important preprocessing step for extracting reliable features from fingerprint images. In this paper, we propose a robust parallel thinning algorithm that preserves the connectivity of the binarized fingerprint image, produces the thinnest possible skeleton (1 pixel wide), and stays very close to the medial axis. The proposed method repeats three sub-iterations. The first sub-iteration removes only the outer boundary pixels by using the interior points. The second sub-iteration finds skeletons 2 pixels wide in order to extract one-sided skeletons. The third sub-iteration prunes the unnecessary pixels from the 2-pixel-wide skeletons obtained. As a result, the proposed algorithm is robust to rotation and noise and produces a balanced medial axis. To evaluate its performance, we compare it with previous algorithms and analyze the results.

A Co-design Method for JPEG2000 Video Compression System in Telemetry using DSP and FPGA (DSP와 FPGA의 Co-design을 이용한 원격측정용 임베디드 JPEG2000 시스템구현)

Yu, Jae-Taeg;Hyun, Myung-Han;Nam, Ju-Hun
- Journal of the Korean Society for Aeronautical & Space Sciences
- v.39 no.9
- pp.896-903
- 2011
In this paper, a co-design method for a JPEG2000 video compression system using a DSP and an FPGA is presented. Profiling the complexity of the JPEG2000 algorithm shows that the MQ-coder is the most complex part. We therefore implement the MQ-coder on the FPGA in VHDL so that it can be processed in parallel, which reduces the complexity. To verify the performance of the MQ-coder, JBIG2 standard test vectors and images are used. The experimental results show that the proposed MQ-coder is approximately 3 times faster than the previous software MQ-coder.
https://doi.org/10.5139/JKSAS.2011.39.9.896

A Thread Partitioning of Conditional Expression of Non-Strict Programs for Multithreaded Models (다중스레드 모델을 위한 Non-Strict 프로그램의 조건식 스레드 분할)

조선문;김기태;고훈준;이갑래;유원희
- Proceedings of the Korean Information Science Society Conference
- 2001.04a
- pp.67-69
- 2001
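The multithreaded model is highly effective for large-scale parallel systems because it can address the problems of long memory reference latency and synchronization. When compiling non-strict functional programs for a multithreaded parallel machine, the most important task is to identify the parts that can execute sequentially and partition them into threads. The goal of thread partitioning is to make threads larger so as to minimize the number of synchronizations and inter-thread context switches that occur while the non-strict functional program runs. This paper proposes a thread partitioning algorithm for conditional expressions that partitions non-strict functional programs into larger threads.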

The Enhanced Thread Partitioning of Conditional Expressions of Non-Strict Programs (Non-Strict 프로그램 조건식의 향상된 스레드 분할)

Jo, Sun-Moon;Yang, Chang-Mo;Yoo, Weon-Hee
- Proceedings of the Korea Information Processing Society Conference
- 2000.04a
- pp.277-280
- 2000
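When compiling functional programs for a multithreaded parallel machine, thread partitioning means identifying the parts of a program whose execution order is known at compile time, and which can therefore be statically scheduled, and collecting them into threads. In a conditional expression, the execution order is either predicate -> then-expression or predicate -> else-expression, so it cannot be determined at compile time. Existing partitioning algorithms therefore split the predicate, the then-expression, and the else-expression of a conditional into basic blocks and apply local partitioning to each. This restriction can be relaxed: if the definition of a thread is slightly modified to allow branching within a thread, a better partitioning can be obtained. Branching within a thread does not violate the basic principles of thread partitioning (it does not reduce parallelism, increase the number of synchronizations, or cause deadlock) and can instead increase thread length or reduce the number of synchronizations. This paper proposes a method that improves thread partitioning by merging the three basic blocks of a conditional expression into one or two basic blocks.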
