Abstract
We study the problem of utilizing human intelligence to categorize a large number of objects. In this problem, given a category hierarchy and a set of objects, we can ask humans whether an object belongs to a category, and our goal is to find the most cost-effective strategy for locating the appropriate category in the hierarchy for each object, such that the cost (i.e., the number of questions asked) is minimized. This problem has many important applications, including image classification and product categorization. We develop an online framework in which the category distribution is gradually learned and an effective order of questions is thus adaptively determined. We prove that the problem is computationally intractable even if the true category distribution is known in advance. We develop an approximation algorithm and prove that it achieves an approximation factor of 2. We also show that the problem admits a fully polynomial-time approximation scheme. Furthermore, we propose an online strategy that achieves nearly the same performance guarantee as the offline optimal strategy, even without prior knowledge of the category distribution. We also develop effective techniques to tolerate crowd errors. Experiments on a real crowdsourcing platform demonstrate the effectiveness of our method.
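As a rough illustration of the interaction model described above (not the paper's cost-optimal strategy), categorization can be sketched as a top-down search over the hierarchy in which each crowd question tests membership in a candidate category. All names below (`categorize`, `ask`, the toy hierarchy) are hypothetical, and the first-match descent shown here ignores the category distribution that the paper's algorithms exploit.

```python
# Hypothetical sketch of crowd-aided categorization by top-down search.
# ask(obj, cat) stands in for a crowd question "does obj belong to cat?".

def categorize(obj, root, children, ask):
    """Walk down from the root, descending into the first child category
    the crowd confirms; stop when no child matches."""
    node, questions = root, 0
    while True:
        for child in children.get(node, []):
            questions += 1
            if ask(obj, child):  # one crowd question
                node = child
                break
        else:
            # no child was confirmed: node is the object's category
            return node, questions

# Toy hierarchy: root -> {electronics, clothing}, electronics -> {phones}
children = {"root": ["electronics", "clothing"], "electronics": ["phones"]}
truth = {"iPhone": ["root", "electronics", "phones"]}
ask = lambda obj, cat: cat in truth[obj]
print(categorize("iPhone", "root", children, ask))  # ('phones', 2)
```

The number of questions depends on the order in which children are probed, which is exactly the quantity the paper's adaptive strategies optimize.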
Change history
30 July 2023
This article was revised due to an update in the article title.
Notes
An algorithm A is an FPTAS for an optimization problem P if, for any instance \({\mathcal {I}}\) and any constant \(\epsilon > 0\), A computes a solution S in time polynomial in the problem size n and \(1/\epsilon \), such that \(|\textsf {OPT} - \textsf {val}(S)| \le \epsilon \cdot \textsf {OPT}\), where \(\textsf {OPT}\) is the value of the optimal solution and \(\textsf {val}(S)\) is the value of the solution returned by A.
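A minimal sketch of the guarantee in this definition, for a minimization problem; the function name and the numbers are illustrative assumptions, not part of the paper.

```python
# Hypothetical check of the FPTAS guarantee |OPT - val| <= eps * OPT
# for a minimization problem.

def within_fptas_bound(opt, val, eps):
    """Return True iff the solution value val is within a (1 + eps)
    multiplicative factor of the optimum opt."""
    return abs(opt - val) <= eps * opt

print(within_fptas_bound(opt=100.0, val=104.0, eps=0.05))  # True
print(within_fptas_bound(opt=100.0, val=110.0, eps=0.05))  # False
```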
Throughout the paper, for any category u, we use u to denote both its corresponding node in \(\mathcal {T}\) and the question node asked at u in \(\mathcal {D}\).
Here we only assign non-negative weights to the nodes; these can be normalized to obtain the probabilities.
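The normalization mentioned in this note is the standard one; a one-function sketch with illustrative weights (the node names and values are assumptions):

```python
# Normalize non-negative node weights into a probability distribution,
# as the note suggests.

def normalize(weights):
    """Map each node's weight to weight / total, so the values sum to 1."""
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}

probs = normalize({"a": 2.0, "b": 1.0, "c": 1.0})
print(probs)  # {'a': 0.5, 'b': 0.25, 'c': 0.25}
```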
In the decision trees, we omit the leaves with zero weight, since they do not contribute to the costs.
If we can randomly permute all the objects, we can show that the true distribution can indeed be learned by the law of large numbers. However, the objects may arrive in an online fashion, and a random-permutation pre-processing step may not always be possible.
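The law-of-large-numbers argument in this note can be illustrated with a small simulation: when arrivals are drawn in random order, the running category frequencies converge to the true distribution. This is only a simulation under an assumed two-category distribution, not the paper's online strategy.

```python
# Hypothetical simulation: empirical category frequencies from a randomly
# ordered stream converge to the true distribution (law of large numbers).
import random
from collections import Counter

def estimate(stream):
    """Empirical frequency of each category seen so far."""
    counts = Counter(stream)
    n = len(stream)
    return {cat: c / n for cat, c in counts.items()}

random.seed(0)
true_dist = {"a": 0.7, "b": 0.3}  # assumed true category distribution
stream = random.choices(list(true_dist), weights=true_dist.values(), k=10000)
est = estimate(stream)
# est["a"] should be close to 0.7 after many arrivals
```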
This can be trivially extended to the batched algorithm.
Acknowledgements
This paper was supported by NSF of China (61925205, 61632016), Huawei, TAL education, and Beijing National Research Center for Information Science and Technology (BNRist).
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Y., Wu, X., Jin, Y. et al. Adaptive algorithms for crowd-aided categorization. The VLDB Journal 31, 1311–1337 (2022). https://doi.org/10.1007/s00778-021-00685-2