Abstract
Crowdsourcing is a new computing approach in which tasks are outsourced to a large number of human workers. It has attracted attention not only from industry but also from various academic communities. Amazon Mechanical Turk (AMT) was the first commercial platform to offer crowdsourcing services to its customers, and it is often referred to as a platform supplying ‘artificial’ artificial intelligence. Recent research efforts have not addressed the community structure of large-scale crowdsourcing platforms. In this work, we discuss detailed statistics of the popular AMT marketplace to provide insights into task properties and requester behavior. We present a model that automatically infers requester communities from task keywords, using hierarchical clustering to identify relations between the keywords associated with tasks. We present novel techniques to rank communities and requesters using a graph-based algorithm. Furthermore, we introduce models and methods for discovering relevant crowdsourcing brokers, who can act as intermediaries between requesters and platforms such as AMT.
About this article
Cite this article
Schall, D., Skopik, F. Social network mining of requester communities in crowdsourcing markets. Soc. Netw. Anal. Min. 2, 329–344 (2012). https://doi.org/10.1007/s13278-012-0080-x
DOI: https://doi.org/10.1007/s13278-012-0080-x