Abstract
In this paper, we derive an EM algorithm for nonlinear state space models and use it to jointly estimate the neural network weights, the model uncertainty and the noise in the data. In the E-step, we apply a forward-backward Rauch-Tung-Striebel smoother to compute smoothed estimates of the network weights. In the M-step, we derive closed-form expressions for updating the model uncertainty and the measurement noise. We find that the method is simple, stable and powerful.
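To make the structure of the algorithm concrete, the following is a minimal sketch of the EM recursion for the simpler linear-Gaussian case, in which the network weights follow a random walk and each observation is a linear function of them. This is an illustration under stated assumptions, not the paper's implementation: the function name em_random_walk, the initial noise guesses and the prior on the weights are all invented for the example. In the nonlinear setting treated in the paper, the regressor H_t would be replaced by the Jacobian of the network output with respect to the weights, in the manner of extended Kalman filtering.

```python
import numpy as np

def em_random_walk(y, H, n_iter=20):
    """EM sketch for the linear-Gaussian state-space model
        w_t = w_{t-1} + q_t,   q_t ~ N(0, Q)   (random-walk weights)
        y_t = H_t w_t + r_t,   r_t ~ N(0, R)   (scalar observations)
    E-step: Kalman filter + Rauch-Tung-Striebel smoother over the weights.
    M-step: closed-form re-estimation of Q (model uncertainty) and R (noise).
    """
    T, d = H.shape
    Q, R = np.eye(d), 1.0                      # illustrative initial guesses
    w0, P0 = np.zeros(d), 10.0 * np.eye(d)     # illustrative weight prior

    for _ in range(n_iter):
        # ---- E-step, forward pass: Kalman filter ----
        wp = np.zeros((T, d)); Pp = np.zeros((T, d, d))   # predicted
        wf = np.zeros((T, d)); Pf = np.zeros((T, d, d))   # filtered
        w, P = w0, P0
        for t in range(T):
            wp[t], Pp[t] = w, P + Q                       # time update
            s = H[t] @ Pp[t] @ H[t] + R                   # innovation variance
            k = Pp[t] @ H[t] / s                          # Kalman gain
            w = wp[t] + k * (y[t] - H[t] @ wp[t])         # measurement update
            P = Pp[t] - np.outer(k, H[t] @ Pp[t])
            wf[t], Pf[t] = w, P

        # ---- E-step, backward pass: RTS smoother ----
        ws, Ps = wf.copy(), Pf.copy()
        J = np.zeros((T - 1, d, d))                       # smoother gains
        for t in range(T - 2, -1, -1):
            J[t] = np.linalg.solve(Pp[t + 1], Pf[t].T).T  # Pf[t] @ inv(Pp[t+1])
            ws[t] = wf[t] + J[t] @ (ws[t + 1] - wp[t + 1])
            Ps[t] = Pf[t] + J[t] @ (Ps[t + 1] - Pp[t + 1]) @ J[t].T

        # ---- M-step: closed-form updates of the noise levels ----
        # Lag-one smoothed covariance Cov(w_t, w_{t-1} | y_{1:T}) = Ps[t] J[t-1]'
        C = np.einsum('tij,tkj->tik', Ps[1:], J)
        dw = ws[1:] - ws[:-1]
        Q = (np.einsum('ti,tj->tij', dw, dw)
             + Ps[1:] + Ps[:-1] - C - C.transpose(0, 2, 1)).mean(axis=0)
        resid = y - np.einsum('ti,ti->t', H, ws)
        R = np.mean(resid ** 2 + np.einsum('ti,tij,tj->t', H, Ps, H))
    return ws, Q, R
```

Each EM iteration is guaranteed (in this linear-Gaussian case) not to decrease the data likelihood: the smoother supplies the sufficient statistics of the weight trajectory, and the M-step maximizes the expected complete-data log-likelihood over Q and R given those statistics.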