Abstract
In this paper we propose a linear regression model for multivariate modal symbolic data. The observed variables are probabilistic modal variables in the sense of Bock and Diday (2000), i.e. variables whose realizations are frequency or probability distributions. The parameters are estimated by a least squares method based on a suitable squared distance between the predicted and the observed modal symbolic data: the squared ℓ2 Wasserstein distance. Measures of goodness of fit are also presented, and an application to real data corroborates the proposed method.
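As a minimal illustration of the distance named in the abstract (a sketch under stated assumptions, not the authors' implementation): for two univariate distributions with quantile functions \(F^{-1}\) and \(G^{-1}\), the squared ℓ2 Wasserstein distance is \(d_W^2(F,G) = \int_0^1 \left(F^{-1}(t) - G^{-1}(t)\right)^2 dt\). For two empirical samples of equal size with equal weights, this reduces to the mean squared difference of the sorted values. The function name `squared_wasserstein_l2` and the equal-size restriction are assumptions made for this example only.

```python
def squared_wasserstein_l2(x, y):
    """Squared l2 Wasserstein distance between two equal-size empirical samples.

    Assumption for this sketch: both samples have the same number of points
    and every point carries the same weight, so the empirical quantile
    functions are step functions that can be matched rank by rank.
    """
    if len(x) != len(y):
        raise ValueError("this sketch only handles samples of equal size")
    if len(x) == 0:
        raise ValueError("samples must be non-empty")
    # Sorting yields the empirical quantiles at levels (i + 0.5) / n.
    xs = sorted(float(v) for v in x)
    ys = sorted(float(v) for v in y)
    # Mean of squared quantile differences approximates the integral over [0, 1].
    return sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)


# Usage: the second sample is the first one shifted by 1 (order is irrelevant).
d2 = squared_wasserstein_l2([0.0, 1.0, 2.0], [3.0, 1.0, 2.0])
```

A pure location shift by a constant c gives a squared distance of c², which makes this a convenient sanity check for any implementation.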
Notes
- 1.
Note that if x is a vector of scalars, Eq. (8) becomes \(\mathbf{x}^{T}\mathbf{y} = \sum_{i=1}^{n} x_{i} \cdot \bar{y}_{i}\).
- 2.
- 3.
We supply the full table of histogram data, the MATLAB™ code and workspace upon request.
References
Billard, L., & Diday, E. (2006). Symbolic data analysis: conceptual statistics and data mining. New York: Wiley.
Bock, H., & Diday, E. (2000). Analysis of symbolic data: exploratory methods for extracting statistical information from complex data. New York: Springer.
Cuesta-Albertos, J. A., Matrán, C., & Tuero-Díaz, A. (1997). Optimal transportation plans and convergence in distribution. Journal of Multivariate Analysis, 60, 72–83.
Dias, S., & Brito, P. (2011). A new linear regression model for histogram-valued variables. In 58th ISI World Statistics Congress. Dublin, Ireland.
Diday, E., & Noirhomme-Fraiture, M. (2008). Symbolic data analysis and the SODAS software. New York: Wiley.
Dueñas, C., Fernández, M., Cañete, S., Carretero, J., & Liger, E. (2002). Assessment of ozone variations and meteorological effects in an urban area in the Mediterranean coast. Science of The Total Environment, 299(1–3), 97–113.
Gibbs, A., & Su, F. (2002). On choosing and bounding probability metrics. International Statistical Review, 70(3), 419–435.
Irpino, A., & Romano, E. (2007). Optimal histogram representation of large data sets: Fisher vs piecewise linear approximation. Revue des Nouvelles Technologies de l’Information, RNTI-E-9, 99–110.
Lawson, C. L., & Hanson, R. J. (1974). Solving least squares problems. Englewood Cliffs, NJ: Prentice Hall.
Verde, R., & Irpino, A. (2007). Dynamic clustering of histogram data: Using the right metric. In P. Brito, G. Cucumel, P. Bertrand, & F. De Carvalho (Eds.) Selected contributions in data analysis and classification (Chap. 12, pp. 123–134). Berlin, Heidelberg: Springer.
Verde, R., & Irpino, A. (2008). Comparing histogram data using a Mahalanobis-Wasserstein distance. In P. Brito (Ed.) COMPSTAT 2008 (Chap. 7, pp. 77–89). Heidelberg: Physica-Verlag HD.
Verde, R., & Irpino, A. (2010). Ordinary least squares for histogram data based on Wasserstein distance. In Y. Lechevallier, & G. Saporta (Eds.) Proceedings of COMPSTAT’2010 (Chap. 60, pp. 581–588). Heidelberg: Physica-Verlag HD.
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Irpino, A., Verde, R. (2013). A Metric Based Approach for the Least Square Regression of Multivariate Modal Symbolic Data. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_19
DOI: https://doi.org/10.1007/978-3-319-00032-9_19
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00031-2
Online ISBN: 978-3-319-00032-9
eBook Packages: Mathematics and Statistics (R0)