Abstract
Identity data leaks on the Internet are more relevant than ever. Almost every week, news reports describe leaked databases containing more than a million user accounts, and smaller, but no less dangerous, leaks occur several times a day. The public availability of such leaked data is a major threat to the victims, but it also creates an opportunity to learn about both the security practices of service providers and the behavior of users when choosing passwords. Our goal is to analyze this data and generate knowledge that can be used to raise security awareness and to improve security itself. This paper presents a novel approach to processing and analyzing the vast majority of large and small leaks. We moved from a semi-manual process to a fully automated one that requires minimal human interaction. Our contribution is the concept and a prototype implementation of a leak processing workflow that includes the extraction of digital identities from structured and unstructured leak files, the identification of hash routines, and a quality control step to ensure leak authenticity. By making use of parallel and distributed programming, we are able to make leaks available for analysis and notification almost immediately after they have been published. Based on the data collected, this paper reveals how easy it is for criminals to collect large numbers of passwords that are stored in plain text or only weakly hashed. We publish these results in the hope of increasing not only the security awareness of Internet users but also technical security on the service provider side.
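The extraction and hash-identification steps named in the abstract can be illustrated with a minimal Python sketch. It assumes that a leak file is read line by line and that credentials appear as an e-mail address followed by a separator and a password or hash. The names Identity, extract_identities and guess_hash_type, the separator set, and the format table are hypothetical illustrations, not the authors' implementation. Hash routines are guessed purely from the shape of the string (length, character set, prefixes such as $2y$), in the spirit of the hashcat example-hash list cited in the notes.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional


class Identity(NamedTuple):
    """One extracted credential record (hypothetical structure for this sketch)."""
    email: str
    secret: str               # password or hash exactly as found in the leak
    hash_type: Optional[str]  # None: no known hash format matched (plaintext candidate)


# Simplified e-mail pattern; deliberately not RFC 5322 complete.
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

# E-mail, then a common dump separator (assumed set: : ; , | tab or space),
# optionally wrapped in quotes as in SQL or CSV dumps, then the secret.
# Limitation: secrets containing whitespace or quotes are truncated.
LINE_RE = re.compile(
    r"(?P<email>" + EMAIL + r")"
    r"['\"]?\s*[:;,|\t ]\s*['\"]?"
    r"(?P<secret>[^\s'\"]+)"
)

# Shape-based heuristics; length-only matches are ambiguous
# (e.g. 32 hex digits could be MD5 or NTLM).
HASH_PATTERNS = [
    ("bcrypt", re.compile(r"\$2[abxy]\$\d{2}\$[./A-Za-z0-9]{53}")),
    ("md5crypt", re.compile(r"\$1\$[./A-Za-z0-9]{1,8}\$[./A-Za-z0-9]{22}")),
    ("sha256", re.compile(r"[0-9a-fA-F]{64}")),
    ("sha1", re.compile(r"[0-9a-fA-F]{40}")),
    ("md5", re.compile(r"[0-9a-fA-F]{32}")),
]


def guess_hash_type(secret: str) -> Optional[str]:
    """Return the first hash family whose whole-string format matches, else None."""
    for name, pattern in HASH_PATTERNS:
        if pattern.fullmatch(secret):
            return name
    return None


def extract_identities(lines: Iterable[str]) -> Iterator[Identity]:
    """Yield an Identity for every line containing an e-mail/secret pair."""
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # headers, comments and other noise are skipped
        secret = match.group("secret")
        # Lower-casing the address is a normalization choice made for this sketch.
        yield Identity(match.group("email").lower(), secret, guess_hash_type(secret))


if __name__ == "__main__":
    sample = [
        "bob@example.org:hunter2",
        "-- dump header without identities",
        "INSERT INTO users VALUES (1,'alice@example.com','5f4dcc3b5aa765d61d8327deb882cf99');",
    ]
    for identity in extract_identities(sample):
        print(identity)
```

Because every line is handled independently, such a loop could be split across worker processes or stream-processing tasks; the sketch does not show how the described system actually distributes its work. Shape-based guessing is only a first filter: a 32-digit hex string is reported as md5 even though NTLM hashes have the same format, and a plaintext password that happens to look like a hash is misclassified.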
Notes
Identity Leak Checker: https://sec.hpi.de/ilc.
As of Nov. 18, 2016.
BreachAlarm: https://breachalarm.com/.
Survela: https://survela.com/.
HaveIBeenPwned: https://haveibeenpwned.com/.
sqlmap: http://sqlmap.org/.
Slang term for "documents"; here, a listing of very specific personal information about an individual.
Instance with 32 GB RAM and up to 32 cores (16 physical, 16 hyper-threaded) on Xeon E5-2630 v3.
Each node is a virtual machine with 8 GB RAM and 6 cores of Xeon E5-2630 v3.
Vigilante.pw: https://vigilante.pw.
Hashcat example hashes: https://hashcat.net/wiki/doku.php?id=example_hashes.
Acknowledgements
We thank the students of our class “Dark Web Monitoring and Analysis of Leak Data” in the winter semester 2014/15 for their ideas on the topic and for the work they dedicated to the implementation. We would also like to thank our student assistants Larissa Hoffäller and Marvin Thiele for their support in the experiments we conducted. Additionally, we appreciate the support of our colleagues Marian Gawron and Martin Ussath on certain research questions.
Cite this article
Jaeger, D., Graupner, H., Pelchen, C. et al. Fast Automated Processing and Evaluation of Identity Leaks. Int J Parallel Prog 46, 441–470 (2018). https://doi.org/10.1007/s10766-016-0478-6