Abstract
The problem of computing tandem repetitions with K possible mismatches is studied. Two main definitions are considered, and for both of them an O(nK log K + S) algorithm is proposed (S the size of the output). This improves, in particular, the bound obtained in [17].
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
G. Benson. An algorithm for finding tandem repeats of unspecified pattern size. In S. Istrail, P. Pevzner, and M. Waterman, editors, Proceedings of the 2nd Annual International Conference on Computational Molecular Biology (RECOMB 98), pages 20–29. ACM Press, 1998.
G. Benson. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research, 27(2):573–580, 1999.
G. Benson and M. Waterman. A method for fast database search for all k-nucleotide repeats. Nucleic Acids Research, 22:4828–4836, 1994.
R. Cole and R. Hariharan. Approximate string matching: A simpler faster algorithm. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 463–472, San Francisco, California, 25–27 January 1998.
M. Crochemore and W. Rytter. Squares, cubes, and time-space efficient string searching. Algorithmica, 13:405–425, 1995.
M. Crochemore. An optimal algorithm for computing the repetitions in a word. Information Processing Letters, 12:244–250, 1981.
M. Crochemore. Recherche linéaire d’un carré dans un mot. Comptes Rendus Acad. Sci. Paris Sér. I Math., 296:781–784, 1983.
M. Giraud and G. Kucherov. Maximal repetitions and application to DNA sequences. In Proceedings of the Journées Ouvertes: Biologie, Informatique et Mathématiques, pages 165–172, Montpelier, 3–5 mai 2000.
Z. Galil and J. Seiferas. Time-space optimal string matching. Journal of Computer and System Sciences, 26(3):280–294, 1983.
D. Gusfield. Algorithms on Strings, Trees, and Sequences. Computer Science and Computational Biology. Cambridge University Press, 1997.
C.S. Iliopoulos, D. Moore, and W.F. Smyth. A characterization of the squares in a Fibonacci string. Theoretical Computer Science, 172:281–291, 1997.
R. Kolpakov and G. Kucherov. Finding maximal repetitions in a word in linear time. In Proceedings of the 1999 Symposium on Foundations of Computer Science, New York (USA). IEEE Computer Society, October 17–19 1999.
R. Kolpakov and G. Kucherov. On maximal repetitions in words. In Proceedings of the 12-th International Symposium on Fundamentals of Computation Theory, 1999, Iasi (Romania), Lecture Notes in Computer Science, August 30–September 3 1999.
R. Kolpakov and G. Kucherov. Finding approximate repetitions under hamming distance. Technical Report RR-4163, INRIA, avril 2001. available at http://www.inria.fr/rrrt/rr-4163.html.
S. R. Kosaraju. Computation of squares in string. In M. Crochemore and D. Gusfield, editors, Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching, number 807 in Lecture Notes in Computer Science, pages 146–150. Springer Verlag, 1994.
M. Lothaire. Combinatorics on Words, volume 17 of Encyclopedia of Mathematics and Its Applications. Addison Wesley, 1983.
G. Landau and J. Schmidt. An algorithm for approximate tandem repeats. In A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, editors, Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching, number 684 in Lecture Notes in Computer Science, pages 120–133, Padova, Italy, 1993. Springer-Verlag, Berlin.
G. Landau, J. Schmidt, and D. Sokol. An algorithm for approximate tandem repeats. Journal of Computational Biology, 8(1):1–18, 2001.
M. G. Main. Detecting leftmost maximal periodicities. Discrete Applied Mathematics, 25:145–153, 1989.
M.G. Main and R.J. Lorentz. An O(n log n) algorithm for finding all repetitions in a string. Journal of Algorithms, 5(3):422–432, 1984.
E. Rivals, O. Delgrange, J-P. Delahaye, and M. Dauchet. A first step towards chromosome analysis by compression algorithms. In N.G. Bourbakis, editor, Proceedings of the 1st IEEE Symposium on Intelligence in Neural and Biological Systems (INBS), Herndon, VA. IEEE Computer Society, May 29–31 1995.
M. Rodeh, V.R. Pratt, and S. Even. Linear algorithm for data compression via string matching. Journal of the ACM, 28(1):16–24, Jan 1981.
J. Stoye and D. Gusfield. Linear time algorithms for finding and representing all the tandem repeats in a string. Technical Report CSE-98-4, Computer Science Department, University of California, Davis, 1998.
A.O. Slisenko. Detection of periodicities and string matching in real time. Journal of Soviet Mathematics, 22:1316–1386, 1983.
M.-F. Sagot and E.W. Myers. Identifying satellites in nucleic acid sequences. In fS. Istrail, P. Pevzner, and M. Waterman, editors, Proceedings of the 2nd Annual International Conference on Computational Molecular Biology (RECOMB 98), pages 234–242. ACM Press, 1998.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kolpakov, R., Kucherov, G. (2001). Finding Approximate Repetitions under Hamming Distance. In: auf der Heide, F.M. (eds) Algorithms — ESA 2001. ESA 2001. Lecture Notes in Computer Science, vol 2161. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44676-1_14
Download citation
DOI: https://doi.org/10.1007/3-540-44676-1_14
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42493-2
Online ISBN: 978-3-540-44676-7
eBook Packages: Springer Book Archive