Search | Korea Science

Kim, Sung-Kwon;Kim, Soo-Cheol;Cho, Jung-Sik
- The KIPS Transactions:PartA
- /
- v.15A no.4
- /
- pp.239-242
- /
- 2008
We study the problem of finding the maximum suffix of a string on the external memory model of computation with one disk. In this model, we are primarily interested in designing algorithms that reduce the number of I/Os between the disk and the internal memory. A string of length N has N suffixes and among these, the lexicographically largest one is called the maximum suffix of the string. Finding the maximum suffix of a string plays a crucial role in solving some string problems. In this paper, we present an external memory algorithm for computing the maximum suffix of a string of length N. The algorithm uses four blocks in the internal memory and performs at most 4(N/L) disk I/Os, where L is the size of a block.
https://doi.org/10.3745/KIPSTA.2008.15-A.4.239 인용 PDF KSCI

배진욱;이석호
- Journal of KIISE:Databases
- /
- v.30 no.2
- /
- pp.168-175
- /
- 2003
Until now, substring selectivities have been estimated by two steps. First step is to build up a count-suffix tree, which has statistical information about substrings, and second step is to estimate substring selectivity using it. However, it's actually impossible to build up a count-suffix tree from biological sequences because their lengths are too long. So, this paper proposes a novel data structure, count q-gram tree, consisting of fixed length substrings. The Count q-gram tree retains the exact counts of all substrings whose lengths are equal to or less than q and this tree is generated in 0(N) time and in site not subject to total length of all sequences, N. This paper also presents an estimation technique, k-MO. k-MO can choose overlapping length of splitted substrings from a query string, and this choice will affect accuracy of selectivity and query processing time. Experiments show k-MO can estimate very accurately.
PDF KSCI