Abstract
Regular expressions, or simply regex, have been widely used for decades as a powerful pattern-matching and text-extraction tool. Although they provide a powerful and flexible notation for defining and retrieving patterns from text, their syntax and grammatical rules are not easy to use, or even to understand. Any regex can be represented as a Deterministic or Non-Deterministic Finite Automaton, so it is possible to design a representation from which a regex can be built automatically, together with an optimization algorithm able to find the best regex in terms of complexity. This paper introduces both a graph-based representation for regex and a heuristic-based evolutionary computing algorithm that exploits grammatical features of this language, and applies them to a particular data extraction problem.
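As a rough illustration of the general idea of evolving a regex from examples while penalizing complexity, the following minimal sketch may help. It is not the authors' graph-based representation or their algorithm, neither of which is detailed in the abstract. The token set `TOKENS`, the functions `fitness`, `mutate` and `evolve`, the length-based complexity penalty, and the simple keep-the-better-candidate loop are all assumptions made for this example.

```python
import random
import re

# Hypothetical building blocks for candidate regexes (an assumption for this
# sketch; the abstract does not list the symbols used in the paper).
TOKENS = [r"\d", r"[a-z]", r"\.", "@", "+", "*", "?"]


def fitness(pattern, positives, negatives, length_penalty=0.01):
    """Fraction of examples classified correctly, minus a length penalty."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        return -1.0  # syntactically invalid candidates get the worst score
    hits = sum(1 for s in positives if compiled.fullmatch(s))
    hits += sum(1 for s in negatives if not compiled.fullmatch(s))
    accuracy = hits / (len(positives) + len(negatives))
    # Pattern length stands in for "complexity" here (an assumption).
    return accuracy - length_penalty * len(pattern)


def mutate(genome, rng):
    """Return a copy of the token list with one insert, delete or replace."""
    child = list(genome)
    op = rng.choice(["insert", "delete", "replace"])
    if op == "insert" or not child:
        child.insert(rng.randint(0, len(child)), rng.choice(TOKENS))
    elif op == "delete":
        del child[rng.randrange(len(child))]
    else:
        child[rng.randrange(len(child))] = rng.choice(TOKENS)
    return child


def evolve(positives, negatives, generations=500, max_len=8, seed=0):
    """Mutation-only search: keep the child whenever it is at least as fit."""
    rng = random.Random(seed)
    best = []  # start from the empty (always valid) pattern
    best_score = fitness("", positives, negatives)
    for _ in range(generations):
        child = mutate(best, rng)
        if len(child) > max_len:
            continue  # bound the genome size
        score = fitness("".join(child), positives, negatives)
        if score >= best_score:
            best, best_score = child, score
    return "".join(best), best_score


if __name__ == "__main__":
    pattern, score = evolve(["1", "42", "2010"], ["a", "", "x1"])
    print(pattern, round(score, 3))
```

The score combines accuracy on labelled strings with a penalty on pattern length, so that among equally accurate candidates the shorter regex is preferred. A population-based algorithm with crossover, or a grammar that only generates valid expressions, would be natural refinements.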
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
González-Pardo, A., Barrero, D.F., Camacho, D., R-Moreno, M.D. (2010). A Case Study on Grammatical-Based Representation for Regular Expression Evolution. In: Demazeau, Y., et al. Trends in Practical Applications of Agents and Multiagent Systems. Advances in Intelligent and Soft Computing, vol 71. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12433-4_45
DOI: https://doi.org/10.1007/978-3-642-12433-4_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12432-7
Online ISBN: 978-3-642-12433-4
eBook Packages: Engineering, Engineering (R0)