Abstract
A tree pattern p is a first-order term in formal logic, and the language of p is the set of all the tree patterns obtainable by replacing each variable in p with a tree pattern containing no variables. We consider the inductive inference of the unions of these languages from positive examples using strategies that guarantee some forms of minimality during the learning process. By a result in our earlier work, the existence of a characteristic set for each language in a class \({\mathcal L }\) (within \({\mathcal L }\)) implies that \({\mathcal L }\) can be identified in the limit by a learner that simply conjectures a hypothesis containing the examples, that is minimal in the number of elements of up to an appropriate size. Furthermore, if there is a size ℓ such that each candidate hypothesis has a characteristic set (within the languages in \({\mathcal L }\) that intersects non-emptily with the examples) that consists only of elements of up to size ℓ, then the hypotheses containing the least number of elements of up to size ℓ are at the same time minimal with respect to inclusion. In this paper we show how to determine such a size ℓ for the unions of the tree pattern languages, and hence allowing us to learn the class using hypotheses that fulfill the two mentioned notions of minimality.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Angluin, D.: Inductive inference of formal languages from positive data. Information and Control 45, 117–135 (1980)
Angluin, D.: Inference of reversible languages. J. of the ACM 29, 741–765 (1982)
Arimura, H., Ishizaka, H., Shinohara, T.: Learning unions of tree patterns using queries. Theoretical Computer Science 185(1), 47–62 (1997)
Arimura, H., Shinohara, T., Otsuki, S.: A polynomial time algorithm for finding finite unions of tree pattern languages. In: Brewka, G., Jantke, K.P., Schmitt, P.H. (eds.) NIL 1991. LNCS (LNAI), vol. 659, pp. 118–131. Springer, Heidelberg (1993)
Arimura, H., Shinohara, T., Otsuki, S.: Polynomial time inference of unions of two tree pattern languages. IEICE Transactions on Information and Systems E75-D(7), 426–434 (1992)
Arimura, H., Shinohara, T., Otsuki, S.: Finding minimal generalizations for unions of pattern languages and its application to inductive inference from positive data. In: Enjalbert, P., Mayr, E.W., Wagner, K.W. (eds.) STACS 1994. LNCS, vol. 775, pp. 649–660. Springer, Heidelberg (1994)
Arimura, H., Shinohara, T., Otsuki, S., Ishizaka, H.: A generalization of the least general generalization. Machine Intelligence 13 13, 59–85 (1994)
Chan, C., Garofalakis, M., Rastogi, R.: RE-tree: an efficient index structure for regular expressions. The VLDB Journal 12(2), 102–119 (2003)
Gold, E.M.: Language identification in the limit. Information and Control 10, 447–474 (1967)
W3C XML Core Working Group. Extensible Markup Language (XML) 1.0, 3rd edn. W3C Recommendation (2004)
Kobayashi, S., Yokomori, T.: Identifiability of subspaces and homomorphic images of zero-reversible languages. In: Proc. of Algorithmic Learning Theory, 8th Int. Conf., ALT 1997. LNCS (LNAI), vol. 1316, pp. 48–61. Springer, Heidelberg (1997)
Ng, Y.K., Ono, H., Shinohara, T.: Measuring over-generalization in the minimal multiple generalizations of biosequences. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds.) DS 2005. LNCS (LNAI), vol. 3735, pp. 176–188. Springer, Heidelberg (2005)
Ng, Y.K., Shinohara, T.: Inferring unions of the pattern languages by the most fitting covers. In: Jain, S., Simon, H.U., Tomita, E. (eds.) ALT 2005. LNCS (LNAI), vol. 3734, pp. 269–282. Springer, Heidelberg (2005)
Ng, Y.K., Shinohara, T.: Finding consensus patterns in very scarce biosequence samples from their minimal multiple generalizations. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 540–545. Springer, Heidelberg (2006)
Plotkin, G.D.: A note on inductive generalization. Machine Intelligence 5, 153–163 (1970)
Sato, M.: Inductive inference of formal languages. Bulletin of Informatics and Cybernetics 27(1), 85–106 (1995)
Shinohara, T.: Polynomial time inference of extended regular pattern languages. In: Goto, E., Nakajima, R., Yonezawa, A., Nakata, I., Furukawa, K. (eds.) RIMS 1982. LNCS, vol. 147, pp. 115–127. Springer, Heidelberg (1983)
Wright, K.: Identification of unions of languages drawn from an identifiable class. In: Proc. of the Second Annual Workshop on Computational Learning Theory, pp. 328–333. Morgan Kaufmann, San Francisco (1989)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ng, Y.K., Shinohara, T. (2006). Characteristic Sets for Inferring the Unions of the Tree Pattern Languages by the Most Fitting Hypotheses. In: Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2006. Lecture Notes in Computer Science(), vol 4201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11872436_25
Download citation
DOI: https://doi.org/10.1007/11872436_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45264-5
Online ISBN: 978-3-540-45265-2
eBook Packages: Computer ScienceComputer Science (R0)