Parallel Algorithms for Finding Consensus of Circular Strings

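  • Dong Hee Kim (Department of Computer and Information Engineering, Inha University) ;
  • Jeong Seop Sim (Department of Computer and Information Engineering, Inha University)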
  • Received : 2014.11.20
  • Accepted : 2014.12.29
  • Published : 2015.03.15

Abstract

The consensus problem is to find a representative string, called a consensus, of a given set S of k strings. Circular strings differ from linear strings in that the last symbol precedes the first symbol. Given a set S of circular strings of length n over an alphabet $\Sigma$, we first present an $O(|\Sigma| n \log n)$-time parallel algorithm that uses O(n) threads to find a consensus of S minimizing both the radius and the distance sum when k=3. We then present an $O(|\Sigma| n^2 \log n)$-time parallel algorithm that uses O(n) threads to find a consensus of S minimizing the distance sum when k=4. Finally, we compare the execution times of CUDA implementations of our algorithms with those of the corresponding sequential algorithms.
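To make the terms concrete, the following is a minimal sequential sketch of the problem itself, not of the parallel algorithms. It assumes that the distance between strings is the Hamming distance, that the distance sum is the total distance from the consensus to all strings in S, and that the radius is the largest such distance. The function names (`rotate`, `hamming`, `majority_consensus`, `brute_force_sum_consensus`) are hypothetical and chosen only for illustration.

```python
from itertools import product


def rotate(s, r):
    """Return the circular string s read starting from position r."""
    r %= len(s)
    return s[r:] + s[:r]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def majority_consensus(aligned):
    """Column-wise majority string of already aligned strings.

    For a fixed alignment this minimizes the Hamming distance sum.
    Ties are broken by the smallest symbol so the result is deterministic.
    """
    out = []
    for column in zip(*aligned):
        best = min(set(column), key=lambda c: (-column.count(c), c))
        out.append(best)
    return "".join(out)


def brute_force_sum_consensus(strings):
    """Exhaustively try every rotation of every string except the first.

    Returns (distance_sum, radius, consensus, rotations) for the first
    alignment found with the smallest distance sum. The search space has
    n**(k-1) alignments, so this is only practical for tiny inputs.
    """
    first, rest = strings[0], strings[1:]
    n = len(first)
    best = None
    for rots in product(range(n), repeat=len(rest)):
        aligned = [first] + [rotate(s, r) for s, r in zip(rest, rots)]
        consensus = majority_consensus(aligned)
        distances = [hamming(consensus, a) for a in aligned]
        total, radius = sum(distances), max(distances)
        if best is None or total < best[0]:
            best = (total, radius, consensus, (0,) + rots)
    return best
```

Fixing the first string is enough because rotating all strings by the same amount rotates the consensus without changing any distance. The sketch optimizes the distance sum only and merely reports the radius of that solution.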

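The consensus string problem is, given a set S of k strings, to find a single string that represents S, called a consensus string. Unlike an ordinary string, a circular string is one whose first and last symbols are joined so that it forms a circle. In this paper, for a set S of circular strings of length n with k=3, we first present an algorithm that solves the consensus string problem considering both the radius and the distance sum in parallel, in $O(|\Sigma| n \log n)$ time using O(n) threads. Here, $\Sigma$ is the alphabet from which the symbols of each string are drawn. Next, for a set S of circular strings of length n with k=4, we present an algorithm that solves the consensus string problem based on the distance sum in parallel, in $O(|\Sigma| n^2 \log n)$ time using O(n) threads. We then implement the parallel algorithms for both problems using CUDA and report how their running times compare with those of the sequential algorithms.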
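The $|\Sigma|$ factor in the stated bounds suggests work done once per symbol. One common way such a factor arises, offered here as an assumption and not as a description of the paper's algorithm, is to count matches between two strings at every rotation one symbol at a time, using 0/1 indicator vectors. The sketch below does this naively in $O(|\Sigma| n^2)$ time. Replacing the inner sum with an FFT-based circular correlation would bring each symbol's pass down to $O(n \log n)$. The function name `mismatches_per_rotation` is hypothetical.

```python
def mismatches_per_rotation(a, b, alphabet):
    """Hamming distance between a and every rotation of b.

    Entry r of the result is the number of positions i at which
    a[i] differs from b[(i + r) % n], where n = len(a) = len(b).
    """
    n = len(a)
    matches = [0] * n
    for sym in alphabet:
        # Indicator vectors: 1 where the string holds the current symbol.
        ia = [1 if c == sym else 0 for c in a]
        ib = [1 if c == sym else 0 for c in b]
        # Each rotation r is independent of the others, so in a parallel
        # setting one thread per rotation could compute its own entry.
        for r in range(n):
            matches[r] += sum(ia[i] * ib[(i + r) % n] for i in range(n))
    return [n - m for m in matches]
```

Because no rotation depends on another, the outer loop over `r` is the natural place to assign one rotation per thread, which is consistent with the O(n) thread count mentioned in the abstract.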

Acknowledgement

Supported by: National Research Foundation of Korea
