Parallel Algorithms for Finding δ-approximate Periods and γ-approximate Periods of Strings over Integer Alphabets

  • Received: 2017.02.20
  • Reviewed: 2017.04.11
  • Published: 2017.08.15

Abstract

Repetitive strings have been studied in diverse fields such as data compression and bioinformatics. This paper studies repetitions in strings that can be represented over integer alphabets, such as musical sequences and stock price indices. Recently, two problems on approximate periods of strings over integer alphabets were introduced: finding minimum $\delta$-approximate periods and finding minimum $\gamma$-approximate periods. Each problem can be solved in $O(n^2)$ time, where $n$ is the length of the string. In this paper, we present two parallel algorithms, one for each problem, each of which runs in $O(n)$ time using $O(n^2)$ threads. The experimental results show that, for $n = 10{,}000$, our parallel algorithms run approximately 19.7 times faster than the sequential algorithm for finding minimum $\delta$-approximate periods and approximately 40.08 times faster for finding minimum $\gamma$-approximate periods.
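The abstract does not restate the definitions, so the following Python sketch is only an illustration under stated assumptions, not the algorithm of the paper. It uses the usual notions from the $\delta\gamma$-matching literature: two equal-length integer strings $\delta$-match when every position differs by at most $\delta$, and $\gamma$-match when the position-wise differences sum to at most $\gamma$. The period costs below rest on an assumed reading: the string is cut into consecutive blocks of length `m` (the last block may be shorter), the $\delta$ cost is the smallest $\delta$ for which some integer string `p` $\delta$-matches every block, and the $\gamma$ cost is the smallest total of absolute differences over all blocks. All function names are hypothetical.

```python
from statistics import median_low


def delta_distance(x, y):
    """Largest position-wise gap; x and y delta-match iff this is <= delta."""
    return max(abs(a - b) for a, b in zip(x, y))


def gamma_distance(x, y):
    """Sum of position-wise gaps; x and y gamma-match iff this is <= gamma."""
    return sum(abs(a - b) for a, b in zip(x, y))


def columns(t, m):
    """Group t[j], t[j+m], t[j+2m], ... for each offset j < m (1 <= m <= len(t))."""
    return [t[j::m] for j in range(m)]


def min_delta_for_length(t, m):
    """ASSUMED definition: smallest delta over all integer periods p of length m.

    The best p[j] is the midpoint of column j's range, so the cost of a
    column is half its range, rounded up because p is an integer string.
    """
    return max((max(c) - min(c) + 1) // 2 for c in columns(t, m))


def min_gamma_for_length(t, m):
    """ASSUMED definition: smallest total absolute difference over all blocks.

    A column median minimizes the sum of absolute deviations. Sorting makes
    this O(n log n) per length, which is slower than a linear-time selection.
    """
    total = 0
    for c in columns(t, m):
        med = median_low(c)
        total += sum(abs(v - med) for v in c)
    return total


if __name__ == "__main__":
    t = [1, 5, 2, 6, 1, 4]
    for m in range(1, len(t) + 1):
        print(m, min_delta_for_length(t, m), min_gamma_for_length(t, m))
```

Every column of every candidate length can be processed independently under these assumptions, which is the kind of structure a many-thread GPU implementation can exploit; how the paper actually assigns work to its $O(n^2)$ threads is not stated in the abstract.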

Project Information

Research project supervising agency: National Research Foundation of Korea
