Abstract
In this paper, we study efficient computation of tree similarity for ordered trees based on compressed subtree enumeration. The compressed subtree enumeration is a new paradigm of enumeration algorithms that enumerates all subtrees of an input tree T in the form of their compressed bit signatures. For the task of enumerating all compressed bit signatures of k-subtrees in an ordered tree T, we first present an enumeration algorithm in O(k)-delay, and then, present another enumeration algorithm in constant-delay using O(n) time preprocessing that directly outputs bit signatures. These algorithms are designed based on bit-parallel speed-up technique for signature maintenance. By experiments on real and artificial datasets, both algorithms showed approximately 22% to 36% speed-up over the algorithms without bit-parallel signature maintenance.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Zaki, M.J.: Efficiently mining frequent trees in a forest. In: Proc. KDD 2002, pp. 71–80 (2002)
Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., Arikawa, S.: Efficient substructure discovery from large semi-structured data. In: Proc. SDM 2002 (2002)
Chim, H., Deng, X.: A new suffix tree similarity measure for document clustering. In: Proc. WWW 2007, pp. 121–130 (2007)
Collins, M., Duffy, N.: Convolution kernels for natural language. In: Proc. of Advances in Neural Information Processing Systems, NIPS, pp. 625–632 (2001)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. The MIT Press (2001)
Goldberg, L.A.: Polynomial space polynomial delay algorithms for listing families of graphs. In: Proc. ACM STOC 1993, pp. 218–225. ACM (1993)
Jansson, J., Sadakane, K., Sung, W.-K.: CRAM: Compressed random access memory. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer, R. (eds.) ICALP 2012, Part I. LNCS, vol. 7391, pp. 510–521. Springer, Heidelberg (2012)
Kashima, H., Koyanagi, T.: Kernels for semi-structured data. In: Proc. 19th ICML 2002, pp. 291–298. Morgan Kaufmann Publishers Inc. (2002)
Kimura, D., Kuboyama, T., Shibuya, T., Kashima, H.: A subpath kernel for rooted unordered trees. In: Huang, J.Z., Cao, L., Srivastava, J. (eds.) PAKDD 2011, Part I. LNCS, vol. 6634, pp. 62–74. Springer, Heidelberg (2011)
Kuboyama, T., Hirata, K., Aoki-Kinoshita, K.F.: An efficient unordered tree kernel and its application to glycan classification. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS(LNAI), vol. 5012, pp. 184–195. Springer, Heidelberg (2008)
Kuboyama, T., Hirata, K., Kashima, H., Aoki-Kinoshita, K.F., Yasuda, H.: A spectrum tree kernel. Information and Media Technologies 22(2), 292–299 (2007)
Kudo, T., Maeda, E., Matsumoto, Y.: An application of boosting to graph classification. In: Proc. NIPS 2004 (2004)
Lakkaraju, P., Gauch, S., Speretta, M.: Document similarity based on concept tree distance. In: Proc. 19th ACM HT 2008, pp. 127–132 (2008)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to information retrieval. Cambridge University Press (2008)
Nakano, S.: Efficient generation of plane trees. IPL 84(3), 167–172 (2002)
Tsuda, K., Kudo, T.: Clustering graphs by weighted substructure mining. In: Proc. 23rd ICML, pp. 953–960. ACM (2006)
Wasa, K., Kaneta, Y., Uno, T., Arimura, H.: Constant time enumeration of bounded-size subtrees in trees and its application. In: Gudmundsson, J., Mestre, J., Viglas, T. (eds.) COCOON 2012. LNCS, vol. 7434, pp. 347–359. Springer, Heidelberg (2012)
Xin, D., Han, J., Yan, X., Cheng, H.: Mining compressed frequent-pattern sets. In: Proc. VLDB 2005, pp. 709–720 (2005)
Zaki, M.J.: Efficiently mining frequent trees in a forest. In: Proc. KDD 2002, pp. 71–80 (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wasa, K., Hirata, K., Uno, T., Arimura, H. (2013). Faster Algorithms for Tree Similarity Based on Compressed Enumeration of Bounded-Sized Ordered Subtrees. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds) Similarity Search and Applications. SISAP 2013. Lecture Notes in Computer Science, vol 8199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41062-8_8
Download citation
DOI: https://doi.org/10.1007/978-3-642-41062-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41061-1
Online ISBN: 978-3-642-41062-8
eBook Packages: Computer ScienceComputer Science (R0)