Abstract Tree structured data such as HTML/XML files are represented by rooted trees with ordered children and edge labels. Knowledge representations for tree structured data are quite important to discover interesting features which such tree structured data have. In this paper, as a representation of structural features we propose a structured ordered tree pattern, called a term tree, which is a rooted tree pattern consisting of ordered children and structured variables. A variable in a term tree can be substituted by an arbitrary tree.
Deciding whether or not each given tree structured data has structural features is a core problem for data mining of large tree structured data. We consider a problem of deciding whether or not a term tree t matches a tree T, that is, T is obtained from t by substituting some trees for variables in t. Such a problem is called a membership problem for t and T. Given a term tree t and a tree T, we present an O(nN) time algorithm of solving the membership problem for t and T, where n and N are the numbers of vertices in t and T, respectively. We also report some experiments on applying our matching algorithm to a collection of real Web documents.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
S. Abiteboul, P. Buneman, and D. Suciu. Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, 2000.
T. R. Amoth, P. Cull, and P. Tadepalli. Exact learning of tree patterns from queries and counterexamples. Proc. COLT-98, ACM Press, pages 175–186, 1998.
T. R. Amoth, P. Cull, and P. Tadepalli. Exact learning of unordered tree patterns from queries. Proc. COLT-99, ACM Press, pages 323–332, 1999.
D. Angluin. Finding patterns common to a set of strings. Journal of Computer and System Science, 21:46–62, 1980.
H. Arimura, H. Sakamoto, and S. Arikawa. Efficient learning of semi-structured data from queries. Proc. ALT-2001, Springer-Verlag, LNAI 2225, pages 315–331, 2001.
M. Fernandez and D. Suciu. Optimizing regular path expressions using graph schemas. Proceedings of the 14th Internatinal Conference on Data Engineering (ICDE-98), IEEE Computer Society, pages 14–23, 1998.
S. Matsumoto, Y. Hayashi, and T. Shoudai. Polynomial time inductive inference of regular term tree languages from positive data. Proc. ALT-97, Springer-Verlag, LNAI 1316, pages 212–227, 1997.
S. Matsumoto, T. Shoudai, T. Miyahara, and T. Uchida. Learning of Finite Unions vof Tree Patterns with Internal Structured Variables from Queries. Proc. AI-2002, Springer-Verlag, LNAI (to appear), 2002.
T. Miyahara, T. Shoudai, T. Uchida, K. Takahashi, and H. Ueda. Polynomial time matching algorithms for tree-like structured patterns in knowledge discovery. Proc. PAKDD-2000, Springer-Verlag, LNAI 1805, pages 5–16, 2000.
T. Miyahara, T. Shoudai, T. Uchida, K. Takahashi, and H. Ueda. Discovery of frequent tree structuted patterns in semistructured web documents. Proc. PAKDD-2001, Springer-Verlag, LNAI 2035, pages 47–52, 2001.
T. Miyahara, Y. Suzuki, T. Shoudai, T. Uchida, K. Takahashi, and H. Ueda. Discovery of frequent tag tree patterns in semistructured web documents. Proc. PAKDD-2002, Springer-Verlag, LNAI 2336, pages 341–355, 2002.
T. Shinohara. Polynomial time inference of extended regular pattern languages. In Springer-Verlag, LNCS 147, pages 115–127, 1982.
T. Shoudai, T. Miyahara, T. Uchida, and S. Matsumoto. Inductive inference of regular term tree languages and its application to knowledge discovery. Information Modelling and Knowledge Bases XI, IOS Press, pages 85–102, 2000.
T. Shoudai, T. Uchida, and T. Miyahara. Polynomial time algorithms for finding unordered term tree patterns with internal variables. Proc. FCT-2001, Springer-Verlag, LNCS 2138, pages 335–346, 2001.
Y. Suzuki, R. Akanuma, T. Shoudai, T. Miyahara, and T. Uchida. Polynomial time inductive inference of ordered tree patterns with internal structured variables from positive data. Proc. COLT-2002, Springer-Verlag, LNAI 2375, pages 169–184, 2002.
Y. Suzuki, T. Shoudai, T. Miyahara, and T. Uchida. Ordered Term Tree Languages Which Are Polynomial Time Inductively Inferable from Positive Data. Proc. ALT-2002, Springer-Verlag, LNAI (to appear), 2002.
K. Wang and H. Liu. Discovering structural association of semistructured data. IEEE Trans. Knowledge and Data Engineering, 12:353–371, 2000.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Suzuki, Y., Inomae, K., Shoudai, T., Miyahara, T., Uchida, T. (2003). A Polynomial Time Matching Algorithm of Structured Ordered Tree Patterns for Data Mining from Semistructured Data. In: Matwin, S., Sammut, C. (eds) Inductive Logic Programming. ILP 2002. Lecture Notes in Computer Science(), vol 2583. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36468-4_18
Download citation
DOI: https://doi.org/10.1007/3-540-36468-4_18
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00567-4
Online ISBN: 978-3-540-36468-9
eBook Packages: Springer Book Archive