Abstract
Many documents, such as Web documents and XML files, have no rigid structure, and the number of such semistructured documents is increasing rapidly. We propose a new method for discovering frequent tree-structured patterns in semistructured Web documents. Specifically, we consider the data mining problem of finding all maximally frequent tag tree patterns in semistructured data. A tag tree pattern is an edge-labeled tree that has hyperedges as variables. An edge label is a tag or a keyword in a Web document, and a variable can be replaced by any tree. A tag tree pattern is therefore well suited to representing tree-structured patterns in semistructured Web documents. We present an algorithm for finding all maximally frequent tag tree patterns and report experimental results obtained by applying it to XML documents.
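To make the notion of a tag tree pattern concrete, the following is a minimal sketch of pattern matching and frequency counting. It is not the algorithm of the paper. It rests on simplifying assumptions that the abstract does not state: children are ordered, every variable joins one parent to one child, and a variable may be replaced by any tree with at least one edge. The names `matches`, `frequency`, and `descendants`, and the sample documents, are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

# A tree is the ordered list of its root's outgoing edges.
# Each edge is (label, subtree). In a pattern, label None marks a variable.
Tree = List[Tuple[Optional[str], "Tree"]]


def descendants(block: Tree) -> Iterator[Tree]:
    """Yield the child list of every node reachable through the edges in block."""
    for _, sub in block:
        yield sub
        yield from descendants(sub)


def matches(pattern: Tree, tree: Tree) -> bool:
    """True if the pattern's edges account for all of the tree's edges, in order."""

    def seq(i: int, j: int) -> bool:
        if i == len(pattern):
            return j == len(tree)  # every data edge must be consumed
        label, sub = pattern[i]
        if label is not None:
            # A tag or keyword edge must match exactly one data edge.
            return (j < len(tree) and tree[j][0] == label
                    and matches(sub, tree[j][1]) and seq(i + 1, j + 1))
        # Assumption: a variable absorbs one or more consecutive sibling edges,
        # and its child end lands on some node inside the absorbed block.
        for k in range(j + 1, len(tree) + 1):
            if seq(i + 1, k) and any(matches(sub, d) for d in descendants(tree[j:k])):
                return True
        return False

    return seq(0, 0)


def frequency(pattern: Tree, docs: List[Tree]) -> float:
    """Fraction of documents that the pattern matches."""
    return sum(matches(pattern, d) for d in docs) / len(docs)


# Hypothetical pattern: a "book" with a "title" followed by anything non-empty.
pattern: Tree = [("book", [("title", []), (None, [])])]
docs: List[Tree] = [
    [("book", [("title", []), ("author", [("name", [])])])],
    [("book", [("title", [])])],
    [("article", [("title", []), ("year", [])])],
]
print([matches(pattern, d) for d in docs])  # [True, False, False]
```

Under these assumptions, a pattern would count as frequent when `frequency` reaches a chosen threshold. Finding the maximally frequent patterns requires a search over candidate patterns, which this sketch does not attempt.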
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Miyahara, T., Shoudai, T., Uchida, T., Takahashi, K., Ueda, H. (2001). Discovery of Frequent Tree Structured Patterns in Semistructured Web Documents. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_8
DOI: https://doi.org/10.1007/3-540-45357-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4
eBook Packages: Springer Book Archive