Abstract
Data extraction from the World Wide Web is a well-known, unsolved, and critical problem in the design of complex information systems. The problem concerns the extraction, management, and reuse of the huge amount of Web data available. These data are usually highly heterogeneous, volatile, and of low quality (i.e., they contain format and content mistakes), so it is quite hard to build reliable systems on top of them. This chapter proposes an Evolutionary Computation approach, based on Genetic Algorithms and regular expressions, to the problem of automatically learning software entities. These entities, also called wrappers, are learned from examples and can extract certain kinds of Web data structures.
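To make the idea concrete, the following is a minimal sketch of how a Genetic Algorithm might evolve a regular expression from examples. It is not the chapter's actual method: the abstract gives no encoding or fitness details, so the token alphabet `ALPHABET`, the fixed-length chromosome, the `fitness` scoring rule, and the `evolve` parameters are all assumptions made for illustration.

```python
import random
import re

# Hypothetical gene alphabet: each gene is one regex token (an assumption,
# not the encoding used in the chapter).
ALPHABET = [r"\d", r"[a-z]", r"\.", "@", "+", "-"]


def fitness(pattern, positives, negatives):
    """Score a candidate regex: fully matched positives plus rejected negatives.

    Patterns that are not valid regular expressions score -1.
    """
    try:
        compiled = re.compile(pattern)
    except re.error:
        return -1
    hits = sum(1 for s in positives if compiled.fullmatch(s))
    rejections = sum(1 for s in negatives if not compiled.fullmatch(s))
    return hits + rejections


def mutate(genes, rng):
    """Replace one randomly chosen gene with a random token."""
    child = list(genes)
    child[rng.randrange(len(child))] = rng.choice(ALPHABET)
    return child


def crossover(a, b, rng):
    """One-point crossover between two equal-length chromosomes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(positives, negatives, length=6, pop_size=40, generations=60, seed=0):
    """Evolve a fixed-length token sequence; return (pattern, score)."""
    rng = random.Random(seed)

    def score(genes):
        return fitness("".join(genes), positives, negatives)

    population = [
        [rng.choice(ALPHABET) for _ in range(length)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    best = max(population, key=score)
    return "".join(best), score(best)


if __name__ == "__main__":
    # Toy examples (invented for this sketch): strings shaped like "12-34".
    pattern, best_score = evolve(["12-34", "7-8"], ["ab", "1.2", "@@"])
    print(pattern, best_score)
```

Whether the search finds a perfect wrapper depends on the alphabet, the chromosome length, and the random seed. The sketch only illustrates the loop of scoring candidates against examples, selecting the fittest, and recombining them.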
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Barrero, D.F., Camacho, D., R-Moreno, M.D. (2009). Automatic Web Data Extraction Based on Genetic Algorithms and Regular Expressions. In: Cao, L. (eds) Data Mining and Multi-agent Integration. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0522-2_9
DOI: https://doi.org/10.1007/978-1-4419-0522-2_9
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-0521-5
Online ISBN: 978-1-4419-0522-2
eBook Packages: Computer Science, Computer Science (R0)