• Title/Summary/Keyword: 병렬 최적 구현

Search Result 88, Processing Time 0.023 seconds

Comparison on Various Acquisition Method for GPS L1 C/A (GPS L1 C/A 기반의 신호 획득부 구현 및 비교)

  • Park, Jiwoon;Yoo, Hoyoung
    • Journal of IKEEE
    • /
    • v.24 no.2
    • /
    • pp.649-653
    • /
    • 2020
  • GPS is a representative satellite navigation system that provides users with accurate location and time information. GPS L1 C / A is opened for civilian and thus utilized in various fields. When the satellite signal reaches the receiver, signal acquisition unit of the digital signal processing hardware searches and acquires the signal among visible satellites. The signal acquisition unit has different implementation methods depending on the signal searching method, such as serial search acquisition, parallel frequency search, parallel code phase search. In this paper, we compare and analyze the three representative acquisition hardwares using live GPS L1 C/A signals. According to the comparison, the parallel code phase search acquisition outperforms the other methods due to reduction of the number of the searchings and a high resolution.

Implementation of Parallel Local Alignment Method for DNA Sequence using Apache Spark (Apache Spark을 이용한 병렬 DNA 시퀀스 지역 정렬 기법 구현)

  • Kim, Bosung;Kim, Jinsu;Choi, Dojin;Kim, Sangsoo;Song, Seokil
    • The Journal of the Korea Contents Association
    • /
    • v.16 no.10
    • /
    • pp.608-616
    • /
    • 2016
  • The Smith-Watrman (SW) algorithm is a local alignment algorithm which is one of important operations in DNA sequence analysis. The SW algorithm finds the optimal local alignment with respect to the scoring system being used, but it has a problem to demand long execution time. To solve the problem of SW, some methods to perform SW in distributed and parallel manner have been proposed. The ADAM which is a distributed and parallel processing framework for DNA sequence has parallel SW. However, the parallel SW of the ADAM does not consider that the SW is a dynamic programming method, so the parallel SW of the ADAM has the limit of its performance. In this paper, we propose a method to enhance the parallel SW of ADAM. The proposed parallel SW (PSW) is performed in two phases. In the first phase, the PSW splits a DNA sequence into the number of partitions and assigns them to multiple nodes. Then, the original Smith-Waterman algorithm is performed in parallel at each node. In the second phase, the PSW estimates the portion of data sequence that should be recalculated, and the recalculation is performed on the portions in parallel at each node. In the experiment, we compare the proposed PSW to the parallel SW of the ADAM to show the superiority of the PSW.

Performance Improvement of Parallel Processing System through Runtime Adaptation (실행시간 적응에 의한 병렬처리시스템의 성능개선)

  • Park, Dae-Yeon;Han, Jae-Seon
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.7
    • /
    • pp.752-765
    • /
    • 1999
  • 대부분 병렬처리 시스템에서 성능 파라미터는 복잡하고 프로그램의 수행 시 예견할 수 없게 변하기 때문에 컴파일러가 프로그램 수행에 대한 최적의 성능 파라미터들을 컴파일 시에 결정하기가 힘들다. 본 논문은 병렬 처리 시스템의 프로그램 수행 시, 변화하는 시스템 성능 상태에 따라 전체 성능이 최적화로 적응하는 적응 수행 방식을 제안한다. 본 논문에서는 이 적응 수행 방식 중에 적응 프로그램 수행을 위한 이론적인 방법론 및 구현 방법에 대해 제안하고 적응 제어 수행을 위해 프로그램의 데이타 공유 단위에 대한 적응방식(적응 입도 방식)을 사용한다. 적응 프로그램 수행 방식은 프로그램 수행 시 하드웨어와 컴파일러의 도움으로 프로그램 자신이 최적의 성능을 얻을 수 있도록 적응하는 방식이다. 적응 제어 수행을 위해 수행 시에 병렬 분산 공유 메모리 시스템에서 프로세서 간 공유될 수 있은 데이타의 공유 상태에 따라 공유 데이타의 크기를 변화시키는 적응 입도 방식을 적용했다. 적응 입도 방식은 기존의 공유 메모리 시스템의 공유 데이타 단위의 통신 방식에 대단위 데이타의 전송 방식을 사용자의 입장에 투명하게 통합한 방식이다. 시뮬레이션 결과에 의하면 적응 입도 방식에 의해서 하드웨어 분산 공유 메모리 시스템보다 43%까지 성능이 개선되었다. Abstract On parallel machines, in which performance parameters change dynamically in complex and unpredictable ways, it is difficult for compilers to predict the optimal values of the parameters at compile time. Furthermore, these optimal values may change as the program executes. This paper addresses this problem by proposing adaptive execution that makes the program or control execution adapt in response to changes in machine conditions. Adaptive program execution makes it possible for programs to adapt themselves through the collaboration of the hardware and the compiler. For adaptive control execution, we applied the adaptive scheme to the granularity of sharing adaptive granularity. Adaptive granularity is a communication scheme that effectively and transparently integrates bulk transfer into the shared memory paradigm, with a varying granularity depending on the sharing behavior. Simulation results show that adaptive granularity improves performance up to 43% over the hardware implementation of distributed shared memory systems.

Design and Implementation of a Scalable Framework for Parallel Program Performance Visualization (병렬 프로그램 성능가시화를 위한 확장성 있는 프레임워크 설계 및 구현)

  • Moon, Sang-Su;Moon, Young-Shik;Kim, Jung-Sun
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.7 no.2
    • /
    • pp.109-120
    • /
    • 2001
  • In this paper, we propose the design and implementation of a portable, extensible, and efficient performance visualization framework for high performance parallel program development. The framework adopts a layered architecture:consists of three independent layers instrumentation layer, trace interface layer and visualization layer. The instrumentation layer was constructed as an ECL which captures generated events, and the EDL/JPAL constitutes the trace interface layer to provide problem-oriented interfaces between visualization layer and instrumentation layer. Finally, the visualization layer was designed as plug-and-play style for easy elimination, addition and composition of various filters, views and view groups, The proposed performance visualization framework is expected to be used as an independent performance debugging and analysis tool and as a core component in an integrated parallel programming environment.

  • PDF

Parallel Multistage Interconnection Switching Network for Broadband ISDN (광대역 ISDN을 위한 병렬 다단계 상호 연결 스위치 네트워크)

  • 박병수
    • Proceedings of the KAIS Fall Conference
    • /
    • 2002.11a
    • /
    • pp.209-211
    • /
    • 2002
  • 본 연구에서는 비교적 하드웨어의 복잡성보다는 효율적인 Routing 알고리즘을 통하여 스위칭 네트워크의 성능을 향상시킬 수 있는 Sort-Banyan을 기본으로 한 스위칭 구조를 근간으로 하여 하드웨어 구조의 개선과 그에 맞는 최적의 Routing 알고리즘을 개발하고자 한다. 따라서 고속 통신망의 스위치 네트워크를 구현하기 위해 두 단계의 패킷 분배 결정 알고리즘을 구성하고 그에 따라 분배를 결정하여 패킷이 전송될 때 출력 단에서 충돌이 발생하지 않도록 사전에 선택적으로 전송함으로서 패킷의 손실을 방지하는 패킷 스위치 네트워크를 제안한다.

Optimal control of DSTATCOM for voltage sag compensation (EMTDC를 이용한 배전 선로 전압 보상을 위한 병렬 보상기의 최적 제어기 구현)

  • Jung, Soo-Young;Moon, Seung-Il;Kim, Tae-Hyun;Han, Byung-Moon
    • Proceedings of the KIEE Conference
    • /
    • 2001.11b
    • /
    • pp.320-322
    • /
    • 2001
  • 본 논문에서는 전압 sag보상을 하기 위한 DSTATCOM 제어기를 설계하고 EMTDC/PSCAD로 확인하였다. DSTATCOM의 전류성분을 d,q분해 해석을 통하여 상태방정식을 유도하고 부하모델과 네트워크의 제약조건을 결합 모델을 제시하였다. 1선 지락 사고시 PI 제어시보다 LQR 제어의 응답 특성이 우수함을 검증하고 전압 sag가 개선됨을 보였다.

  • PDF

Exploration of Optimization Environment for CUDA-based Cholesky Decomposition (CUDA 기반 숄레스키 분해 성능 최적화 환경 탐색)

  • Junbeom Kang;Myungho Lee;Neungsoo Park
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2024.05a
    • /
    • pp.15-17
    • /
    • 2024
  • 최근 다양한 연구 분야에서는 CUDA 프레임워크를 이용하여 병렬 처리를 통해 연산 시간을 단축하는데 성공하고 있다. 이 중 숄레스키 분해는 양의 정부호 행렬을 하삼각행렬로 분해하는 과정에서 많은 행렬 곱셈이 요구되어 GPU 의 구조적 특징을 활용하면 상당한 가속화가 가능하다. 따라서 이 논문에서는 CUDA 코어에 연산을 할당할 때, 핵심 요소인 블록의 개수와 블록 당 쓰레드 개수를 조절할 수 있는 병렬 숄레스키 분해 연산 프로그램을 구현하였다. 서로 다른 세 종류의 행렬 크기에 대해 다양한 블록 수-쓰레드 수 환경을 설정하여 가속화 정도를 측정한 결과, 각 행렬 별 최적 환경에서 동일 그룹 내 최장 시간 대비, 1000x1000 행렬에서는 약 1.80 배, 2000x2000 행렬에서는 약 2.94 배의 추가적인 가속화를 달성하였다.

Parallel Sorting Algorithm by Median-Median (중위수의 중위수에 의한 병렬 분류 알고리즘)

  • Min, Yong-Sik
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.1E
    • /
    • pp.14-21
    • /
    • 1995
  • This paper presents a parallel sorting algorithm suitable for the SIMD multiprocessor. The algorithm finds pivots for partitioning the data into ordered subsets. The data can be evenly distributed to be sorted since it uses the probability theory. For n data elements to be sorted on p processors, when $n{\geq}p^2$, the algorithm is shown to be asymptotically optimal. In practice, sorting 8 million data items on 64 processors achieved a 48.43-fold speedup, while the PSRS required a 44.4-fold speedup. On a variety of shared and distributed memory machines, the algorithm achieved better than half-linear speedups.

  • PDF

Parallel Implementation of LSH Using SSE and AVX (SSE와 AVX를 활용한 LSH의 병렬 최적 구현)

  • Pack, Cheolhee;Kim, Hyun-il;Hong, Dowon;Seo, Changho
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.26 no.1
    • /
    • pp.31-39
    • /
    • 2016
  • Hash function is a cryptographic primitive which conduct authentication, signature and data integrity. Recently, Wang et al. found collision of standard hash function such as MD5, SHA-1. For that reason, National Security Research Institute in Korea suggests a secure structure and efficient hash function, LSH. LSH consists of three steps, initialization, compression, finalization and computes hash value using addition in modulo $2^W$, bit-wise substitution, word-wise substitution and bit-wise XOR. These operation is parallelizable because each step is independently conducted at the same time. In this paper, we analyse LSH structure and implement it over SIMD-SSE, AVX and demonstrate the superiority of LSH.

Optimal-synchronous Parallel Simulation for Large-scale Sensor Network (대규모 센서 네트워크를 위한 최적-동기식 병렬 시뮬레이션)

  • Kim, Bang-Hyun;Kim, Jong-Hyun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.5
    • /
    • pp.199-212
    • /
    • 2008
  • Software simulation has been widely used for the design and application development of a large-scale wireless sensor network. The degree of details of the simulation must be high to verify the behavior of the network and to estimate its execution time and power consumption of an application program as accurately as possible. But, as the degree of details becomes higher, the simulation time increases. Moreover, as the number of sensor nodes increases, the time tends to be extremely long. We propose an optimal-synchronous parallel discrete-event simulation method to shorten the time in a large-scale sensor network simulation. In this method, sensor nodes are partitioned into subsets, and each PC that is interconnected with others through a network is in charge of simulating one of the subsets. Results of experiments using the parallel simulator developed in this study show that, in the case of the large number of sensor nodes, the speedup tends to approach the square of the number of PCs participating in the simulation. In such a case, the ratio of the overhead due to parallel simulation to the total simulation time is so small that it can be ignored. Therefore, as long as PCs are available, the number of sensor nodes to be simulated is not limited. In addition, our parallel simulation environment can be constructed easily at the low cost because PCs interconnected through LAN are used without change.