
Adaptive algorithms for crowd-aided categorization

  • Special Issue Paper
  • Published in The VLDB Journal


Abstract

We study the problem of utilizing human intelligence to categorize a large number of objects. In this problem, given a category hierarchy and a set of objects, we can ask humans to check whether an object belongs to a category, and our goal is to find the most cost-effective strategy to locate the appropriate category in the hierarchy for each object, such that the cost (i.e., the number of questions to ask humans) is minimized. There are many important applications of this problem, including image classification and product categorization. We develop an online framework in which the category distribution is gradually learned and an effective order of questions is thus adaptively determined. We prove that even if the true category distribution is known in advance, the problem is computationally intractable. We develop an approximation algorithm and prove that it achieves an approximation factor of 2. We also show that there is a fully polynomial-time approximation scheme for the problem. Furthermore, we propose an online strategy that achieves nearly the same performance guarantee as the offline optimal strategy, even when there is no prior knowledge of the category distribution. We develop effective techniques to tolerate crowd errors. Experiments on a real crowdsourcing platform demonstrate the effectiveness of our method.
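
To make the setting above concrete, the following Python sketch simulates crowd-aided categorization on a toy hierarchy: each crowd question asks whether an object belongs to a category, and at every step the children of the current category are probed in decreasing order of weights learned from previously categorized objects. The hierarchy, the ask_crowd oracle, and this greedy ordering are assumptions made only for illustration; the sketch is not the paper's 2-approximation algorithm or its online strategy.

```python
# A self-contained simulation of the problem setting: categorize objects by
# asking yes/no membership questions against a category hierarchy, ordering
# the questions by weights learned from previously categorized objects.
# The hierarchy, the ask_crowd oracle, and the greedy child ordering are
# illustrative assumptions only -- this is NOT the paper's 2-approximation
# or its online algorithm.

from collections import defaultdict

# Hypothetical category hierarchy: parent category -> child categories.
TREE = {
    "root":    ["animal", "vehicle"],
    "animal":  ["dog", "cat"],
    "vehicle": ["car", "bike"],
}

def descendants(category):
    """All categories in the subtree rooted at `category` (including itself)."""
    out = {category}
    for child in TREE.get(category, []):
        out |= descendants(child)
    return out

def ask_crowd(obj, category, true_category):
    """Stand-in for one crowdsourcing question:
    'does obj belong to category?' -- answered here by a noiseless oracle."""
    return true_category in descendants(category)

# Weights learned online; they determine the order in which children are probed.
counts = defaultdict(lambda: 1)  # uniform pseudo-counts before any data

def categorize(obj, true_category):
    """Walk down the hierarchy, asking about the most likely child first.
    Returns the located category and the number of questions spent."""
    questions, node, path = 0, "root", ["root"]
    while TREE.get(node):
        for child in sorted(TREE[node], key=lambda c: -counts[c]):
            questions += 1
            if ask_crowd(obj, child, true_category):
                node = child
                path.append(child)
                break
        else:                      # no child matched: stop at the current category
            break
    for visited in path:           # update the learned distribution along the path
        counts[visited] += 1
    return node, questions

# Objects arriving in an online fashion.
for obj, label in [("o1", "dog"), ("o2", "dog"), ("o3", "car"), ("o4", "dog")]:
    print(obj, categorize(obj, label))
```

Running the loop shows the adaptive effect: after a few "dog" objects, the "animal" branch is probed first, so frequent categories are located with fewer questions.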


Change history

  • 30 July 2023

    This article was revised due to an update in the article title.

Notes

  1. http://www.mturk.com/.

  2. An algorithm A is an FPTAS for an optimization problem P if, for any instance \({\mathcal {I}}\) of P and any constant \(\epsilon > 0\), A computes a solution S in time polynomial in the problem size n and \(1/\epsilon \), whose value \(\textsf {val}(S)\) satisfies \(|\textsf {OPT} - \textsf {val}(S)| \le \epsilon \, \textsf {OPT}\), where \(\textsf {OPT}\) denotes the value of an optimal solution.

  3. Throughout the paper, for any category u, we use u to denote both its corresponding node in \(\mathcal {T}\) and the question node asked at u in \(\mathcal {D}\).

  4. Here we only assign non-negative weights to the nodes, which could be normalized to obtain the probabilities.

  5. In the decision trees, we omit the leaves with zero weight, since they do not contribute to the costs (a small numeric sketch of this cost measure follows these notes).

  6. If we can randomly permute all the objects, we can show that the true distribution can indeed be learned according to the law of large numbers. However, the objects may come in an online fashion, and a random permutation pre-processing step may not always be possible.

  7. We are inspired by the ideas in [29] and [15].

  8. This could be trivially extended to the batched algorithm.

  9. http://image-net.org/.
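
As a numeric companion to notes 4 and 5, the following sketch shows how non-negative node weights can be normalized into probabilities, and how the expected cost of a decision tree is then the probability-weighted number of questions (leaf depth), with zero-weight leaves contributing nothing. The tree depths and weights below are hypothetical; the snippet is only meant to make the cost measure concrete.

```python
# Expected cost of a (hypothetical) decision tree, illustrating notes 4 and 5:
# non-negative node weights are normalized into probabilities, and the expected
# cost is the probability-weighted number of questions (leaf depth); leaves with
# zero weight are dropped because they add nothing to the sum.
# The weights and depths are made up for this example.

depth  = {"dog": 2, "cat": 2, "car": 3, "bike": 3, "other": 1}   # questions to reach each leaf
weight = {"dog": 5, "cat": 3, "car": 2, "bike": 0, "other": 0}   # non-negative weights (note 4)

total = sum(weight.values())
prob = {leaf: w / total for leaf, w in weight.items() if w > 0}  # normalize; drop zero-weight leaves (note 5)

expected_cost = sum(prob[leaf] * depth[leaf] for leaf in prob)
print(expected_cost)  # (5*2 + 3*2 + 2*3) / 10 = 2.2
```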

References

  1. Bragg, J., Weld, D.S., et al.: Crowdsourcing multi-label classification for taxonomy creation. In: AAAI, (2013)

  2. Chilton, L.B., Little, G., Edge, D., Weld, D.S., Landay, J.A.: Cascade: Crowdsourcing taxonomy creation. In: CHI, pp. 1999–2008. ACM, (2013)

  3. Cicalese, F., Jacobs, T., Laber, E., Molinaro, M.: On the complexity of searching in trees and partially ordered structures. Theor. Comput. Sci. 412(50), 6879–6896 (2011)

  4. Cicalese, F., Jacobs, T., Laber, E., Molinaro, M.: Improved approximation algorithms for the average-case tree searching problem. Algorithmica 68(4), 1045–1074 (2014)

  5. Cicalese, F., Jacobs, T., Laber, E.S., Molinaro, M.: On greedy algorithms for decision trees. In: ISAAC, pp. 206–217, (2010)

  6. Das Sarma, A., Parameswaran, A., Garcia-Molina, H., Halevy, A.: Crowd-powered find algorithms. In: ICDE, pp. 964–975, (2014)

  7. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR, pp. 248–255, (2009)

  8. Fan, J., Li, G., Ooi, B.C., Tan, K., Feng, J.: iCrowd: an adaptive crowdsourcing framework. In: SIGMOD, pp. 1015–1030, (2015)

  9. Gao, Y., Parameswaran, A.: Finish them!: pricing algorithms for human computation. PVLDB 7(14), 1965–1976 (2014)

  10. Gharibshah, Z., Zhu, X., Hainline, A., Conway, M.: Deep learning for user interest and response prediction in online display advertising. Data Sci. Eng. 5(1), 12–26 (2020)

  11. Ipeirotis, P.G., Provost, F., Sheng, V.S., Wang, J.: Repeated labeling using multiple noisy labelers. Data Mining Knowl. Discov. 28(2), 402–441 (2014)

  12. Kalai, A., Vempala, S.: Efficient algorithms for online decision problems. J. Comput. Syst. Sci. 71(3), 291–307 (2005)

  13. Kaplan, H., Lotosh, I., Milo, T., Novgorodov, S.: Answering planning queries with the crowd. PVLDB 6(9), 697–708 (2013)

  14. Karger, D.R., Oh, S., Shah, D.: Iterative learning for reliable crowdsourcing systems. In: NIPS, pp. 1953–1961, (2011)

  15. Kundu, S., Misra, J.: A linear tree partitioning algorithm. SIAM J. Comput. 6(1), 151–154 (1977)

  16. Li, G.: Human-in-the-loop data integration. Proc. VLDB Endow. 10(12), 2006–2017 (2017)

  17. Li, G., Chai, C., Fan, J., et al.: CDB: a crowd-powered database system. PVLDB 11(12), 1926–1929 (2018)

  18. Li, G., Wang, J., Zheng, Y., Franklin, M.J.: Crowdsourced data management: a survey. IEEE Trans. Knowl. Data Eng. 28(9), 2296–2319 (2016)

  19. Li, G., Zheng, Y., Fan, J., Wang, J., Cheng, R.: Crowdsourced data management: overview and challenges. In: SIGMOD, pp. 1711–1716, (2017)

  20. Li, K., Li, G.: Approximate query processing: What is new and where to go? Data Sci. Eng. 3(4), 379–397 (2018)

  21. Li, M., Wang, H., Li, J.: Mining conditional functional dependency rules on big data. Big Data Mining Anal. 3(1), 68 (2020)

  22. Lin, C., Weld, D.S., et al.: To re(label), or not to re(label). In: AAAI, (2014)

  23. Liu, B.: Sentiment analysis and opinion mining. Synth. Lect. Human Lang. Technol. 5(1), 1–167 (2012)

  24. Marcus, A., Karger, D., Madden, S., Miller, R., Oh, S.: Counting with the crowd. PVLDB 6(2), 109–120 (2012)

  25. Marcus, A., Wu, E., Karger, D., Madden, S., Miller, R.: Human-powered sorts and joins. PVLDB 5(1), 13–24 (2011)

  26. Marcus, A., Wu, E., Madden, S., Miller, R.C.: Crowdsourced databases: query processing with people. In: CIDR, pp. 211–214, (2011)

  27. Ni, J., Li, J., McAuley, J.: Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In: EMNLP, pp. 188–197, (2019)

  28. Parameswaran, A., Boyd, S., Garcia-Molina, H., Gupta, A., Polyzotis, N., Widom, J.: Optimal crowd-powered rating and filtering algorithms. PVLDB 7(9), 685–696 (2014)

  29. Parameswaran, A., Sarma, A.D., Garcia-Molina, H., Polyzotis, N., Widom, J.: Human-assisted graph search: it’s okay to ask questions. PVLDB 4(5), 267–278 (2011)

  30. Parameswaran, A.G., Garcia-Molina, H., Park, H., Polyzotis, N., Ramesh, A., Widom, J.: Crowdscreen: algorithms for filtering data with humans. In: SIGMOD, pp. 361–372, (2012)

  31. Parameswaran, A.G., Park, H., Garcia-Molina, H., Polyzotis, N., Widom, J.: Deco: declarative crowdsourcing. In: CIKM, pp. 1203–1212, (2012)

  32. Sun, Y., Singla, A., Fox, D., Krause, A.: Building hierarchies of concepts via crowdsourcing. In: IJCAI, pp. 844–851, (2015)

  33. Tao, Y., Li, Y., Li, G.: Interactive graph search. In: SIGMOD, pp. 1393–1410, (2019)

  34. Tian, S., Mo, S., Wang, L., Peng, Z.: Deep reinforcement learning-based approach to tackle topic-aware influence maximization. Data Sci. Eng. 5(1), 1–11 (2020)

  35. Venetis, P., Garcia-Molina, H., Huang, K., Polyzotis, N.: Max algorithms in crowdsourcing environments. In: WWW, pp. 989–998, (2012)

  36. Vesdapunt, N., Bellare, K., Dalvi, N.: Crowdsourcing algorithms for entity resolution. PVLDB 7(12), 1071–1082 (2014)

  37. Wang, J., Kraska, T., Franklin, M.J., Feng, J.: CrowdER: crowdsourcing entity resolution. PVLDB 5(11), 1483–1494 (2012)

  38. Wang, J., Li, G., Kraska, T., Franklin, M.J., Feng, J.: Leveraging transitive relations for crowdsourced joins. In: SIGMOD, pp. 229–240, (2013)

  39. Wang, Y., Yao, Y., Tong, H., Xu, F., Lu, J.: A brief review of network embedding. Big Data Mining Anal. 2(1), 35 (2019)

  40. Wang, Y., Yuan, Y., Ma, Y., Wang, G.: Time-dependent graphs: definitions, applications, and algorithms. Data Sci. Eng. 4(4), 352–366 (2019)

  41. Whang, S.E., Lofgren, P., Garcia-Molina, H.: Question selection for crowd entity resolution. PVLDB 6(6), 349–360 (2013)

  42. Zhang, C.J., Tong, Y., Chen, L.: Where to: crowd-aided path selection. PVLDB 7(14), 2005–2016 (2014)

  43. Zheng, Y., Li, G., Cheng, R.: DOCS: domain-aware crowdsourcing system. PVLDB 10(4), 361–372 (2016)

  44. Zheng, Y., Li, G., Li, Y., Shan, C., Cheng, R.: Truth inference in crowdsourcing: Is the problem solved? PVLDB 10(5), 541–552 (2017)

  45. Zheng, Y., Wang, J., Li, G., Cheng, R., Feng, J.: QASCA: a quality-aware task assignment system for crowdsourcing applications. In: SIGMOD, pp. 1031–1046, (2015)


Acknowledgements

This paper was supported by NSF of China (61925205, 61632016), Huawei, TAL education, and Beijing National Research Center for Information Science and Technology (BNRist).

Author information

Corresponding author

Correspondence to Guoliang Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Li, Y., Wu, X., Jin, Y. et al. Adaptive algorithms for crowd-aided categorization. The VLDB Journal 31, 1311–1337 (2022). https://doi.org/10.1007/s00778-021-00685-2

  • DOI: https://doi.org/10.1007/s00778-021-00685-2
