Abstract
Distribution testing deals with what information can be deduced about an unknown distribution over \(\{1,\ldots ,n\}\), where the algorithm is only allowed to obtain a relatively small number of independent samples from the distribution. In the extended conditional sampling model, the algorithm is also allowed to obtain samples from the restriction of the original distribution on subsets of \(\{1,\ldots ,n\}\). In 2015, Canonne, Diakonikolas, Gouleakis and Rubinfeld unified several previous results, and showed that for any property of distributions satisfying a “decomposability” criterion, there exists an algorithm (in the basic model) that can distinguish with high probability distributions satisfying the property from distributions that are far from it in the variation distance. We present here a more efficient yet simpler algorithm for the basic model, as well as very efficient algorithms for the conditional model, which until now was not investigated under the umbrella of decomposable properties. Additionally, we provide an algorithm for the conditional model that handles a much larger class of properties. Our core mechanism is an algorithm for efficiently producing an interval-partition of \(\{1,\ldots ,n\}\) that satisfies a “fine-grain” quality. We show that with such a partition at hand we can avoid the search for the “correct” partition of \(\{1,\ldots ,n\}\).
Similar content being viewed by others
Notes
The lower bounds for unconditional and non-adaptive conditional testing of L-decomposable properties with \(L=1\) are exactly the lower bounds for uniformity testing; the lower bound for adaptive conditional testing follows easily from the proved existence of properties that have no sub-linear complexity adaptive conditional tests; finally, the lower bound for properties k-characterized by atlases with \(k=1\) is just a bound for a symmetric property constructed there. About the last one, we conjecture that there exist properties with much higher lower bounds.
References
Alon, N., Andoni, A., Kaufman, T., Matulef, K., Rubinfeld, R., Xie, N.: Testing k-wise and almost k-wise independence. In: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing (New York, NY, USA), STOC ’07, ACM, pp. 496–505 (2007)
Acharya, J., Canonne, C.L., Kamath, G.: A chasm between identity and equivalence testing with conditional queries. In: Garg, N., Jansen, K., Rao, A., Rolim, J.D.P., (eds.) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2015, August 24–26, 2015, Princeton, NJ, USA, LIPIcs, vol. 40, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, pp. 449–466 (2015)
Acharya, J., Daskalakis, C., Kamath, G.: Optimal testing for properties of distributions. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7–12, 2015, Montreal, Quebec, Canada, pp. 3591–3599 (2015)
Batu, T., Fortnow, L., Fischer, E., Kumar, R., Rubinfeld, R., White, P.: Testing random variables for independence and identity. In: 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, 14–17 October 2001, Las Vegas, Nevada, USA, pp. 442–451 (2001)
Batu, T., Fortnow, L., Rubinfeld, R., Smith, W.D., White, P.: Testing that distributions are close. In: 41st Annual Symposium on Foundations of Computer Science, FOCS 2000, 12–14 November 2000, Redondo Beach, California, USA, pp. 259–269. IEEE Computer Society (2000)
Batu, T., Kumar, R., Rubinfeld, R.: Sublinear algorithms for testing monotone and unimodal distributions,. In: Babai, L. (ed.) Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13–16, 2004, pp. 381–390. ACM (2004)
Canonne, C.L.: A survey on distribution testing: your data is big. But is it blue? Electron. Colloq. Comput. Complex. (ECCC) 22, 63 (2015)
Canonne, C.L., Diakonikolas, I., Gouleakis, T., Rubinfeld, R.: Testing shape restrictions of discrete distributions. In: 33rd Symposium on Theoretical Aspects of Computer Science, STACS 2016, February 17–20, 2016, Orléans, France, pp. 25:1–25:14 (2016)
Chakraborty, S., Fischer, E., Goldhirsh, Y., Matsliah, A.: On the power of conditional samples in distribution testing. SIAM J. Comput. 45(4), 1261–1296 (2016)
Canonne, C.L., Ron, D., Servedio, R.A.: Testing probability distributions using conditional samples. SIAM J. Comput. 44(3), 540–616 (2015)
Diakonikolas, I.: Learning structured distributions. In: Handbook of Big Data, p. 267 (2016)
Diakonikolas, I., Kane, D.M.: A new approach for testing properties of discrete distributions. In: IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9–11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pp. 685–694 (2016)
Diakonikolas, I., Lee, H.K., Matulef, K., Onak, K., Rubinfeld, R., Servedio, R.A., Wan, A.: Testing for concise representations, In: 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), October 20–23, 2007, Providence, RI, USA, Proceedings, pp. 549–558 (2007)
Goldreich, O., Ron, D.: On testing expansion in bounded-degree graphs. Electron. Colloquium Comput. Complex. (ECCC) 7(20) (2000)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963). MR 0144363
Indyk, P., Levi, R., Rubinfeld, R.: Approximating and testing k-histogram distributions in sub-linear time. In: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20–24, 2012, pp. 15–22 (2012)
Paninski, L.: A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Trans. Inf. Theory 54(10), 4750–4755 (2008)
Servedio, R.A.: Testing by implicit learning: a brief survey. In: Property Testing—Current Research and Surveys [outgrow of a workshop at the Institute for Computer Science (ITCS) at Tsinghua University, January 2010], pp. 197–210 (2010)
Valiant, G., Valiant, P.: A CLT and tight lower bounds for estimating entropy. Electron. Colloq. Comput. Complex. (ECCC) 17, 179 (2010)
Valiant, G., Valiant, P.: An automatic inequality prover and instance optimal identity testing. SIAM J. Comput. 46(1), 429–455 (2017)
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A preliminary version with less refined bounds appeared in the Proceedings of the 34th STACS (2017).
Rights and permissions
About this article
Cite this article
Fischer, E., Lachish, O. & Vasudev, Y. Improving and Extending the Testing of Distributions for Shape-Restricted Properties. Algorithmica 81, 3765–3802 (2019). https://doi.org/10.1007/s00453-019-00598-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00453-019-00598-1