Abstract
One of the staples of network defense is blocking traffic to and from a list of “known bad” sites on the Internet. However, few organizations are in a position to produce such a list themselves, so pragmatically this approach depends on third-party “threat intelligence” providers who specialize in distributing feeds of unwelcome IP addresses. Yet the choice to use such a strategy, let alone which data feeds are trusted for this purpose, is rarely made public, and thus little is understood about how these techniques are deployed in the wild. To explore this issue, we have designed and implemented a technique to infer proactive traffic blocking on a remote host and, through a series of measurements, to associate that blocking with the use of particular IP blocklists. In a pilot study of 220K US hosts, we find that as many as one quarter of the hosts appear to blocklist based on some source of threat intelligence data, and about 2% use one of the 9 particular third-party blocklists that we evaluated.
Notes
1. One exception is the recent work of Bouwman et al. [7], which explores aspects of this question through interviews with over a dozen security professionals.
References
Afroz, S., Tschantz, M.C., Sajid, S., Qazi, S.A., Javed, M., Paxson, V.: Exploring Server-side Blocking of Regions. Technical report, ICSI (2018)
Anderson, D.: Splinternet behind the great firewall of China. Queue 10(11), 40–49 (2012)
antirez: new TCP scan method. https://seclists.org/bugtraq/1998/Dec/79
Aryan, S., Aryan, H., Halderman, J.A.: Internet censorship in Iran: a first look. In: Proceedings of the 3rd USENIX Workshop on Free and Open Communications on the Internet (FOCI) (2013)
Bellovin, S.M.: A Technique for Counting NATted Hosts. In: Proceedings of the 2nd Internet Measurement Conference (IMC), pp. 267–272 (2002)
Bhutani, A., Wadhwani, P.: Threat Intelligence Market Size By Component, By Format Type, By Deployment Type, By Application, Industry Analysis Report, Regional Outlook, Growth Potential, Competitive Market Share and Forecast, 2019–2025 (2019)
Bouwman, X., Griffioen, H., Egbers, J., Doerr, C., Klievink, B., van Eeten, M.: A different cup of TI? The added value of commercial threat intelligence. In: Proceedings of the 29th USENIX Security Symposium (USENIX Security), pp. 433–450, August 2020
CAIDA: Inferred AS to Organization Mapping Dataset. https://www.caida.org/data/as_organizations.xml
Censys - Public Internet Search Engine. https://censys.io/
Clayton, R., Murdoch, S.J., Watson, R.N.M.: Ignoring the great firewall of China. In: Danezis, G., Golle, P. (eds.) PET 2006. LNCS, vol. 4258, pp. 20–35. Springer, Heidelberg (2006). https://doi.org/10.1007/11957454_2
Ensafi, R., Knockel, J., Alexander, G., Crandall, J.R.: Detecting intentional packet drops on the internet via TCP/IP side channels. In: Faloutsos, M., Kuzmanovic, A. (eds.) PAM 2014. LNCS, vol. 8362, pp. 109–118. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-04918-2_11
FireHOL IP Lists - All Cybercrime IP Feeds. http://iplists.firehol.org/
Hao, S., Kantchelian, A., Miller, B., Paxson, V., Feamster, N.: PREDATOR: proactive recognition and elimination of domain abuse at time-of-registration. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), pp. 1568–1579. ACM (2016)
Hao, S., Thomas, M., Paxson, V., Feamster, N., Kreibich, C., Grier, C., Hollenbeck, S.: Understanding the Domain Registration Behavior of Spammers. In: Proceedings of the ACM Internet Measurement Conference (IMC), pp. 63–76. ACM (2013)
IP2Location: IP Address to Identify Geolocation. https://www.ip2location.com/
IPdeny IP country blocks. https://www.ipdeny.com/
IPIP.net: The Best IP Geolocation Database. https://en.ipip.net/
Khattak, S., et al.: Do you see what I see? Differential treatment of anonymous users. In: Proceedings of the Network and Distributed System Security Symposium (NDSS) (2016)
Klein, A., Pinkas, B.: From IP ID to device ID and KASLR bypass. In: Proceedings of the 28th USENIX Security Symposium (USENIX Security), pp. 1063–1080 (2019)
Kührer, M., Rossow, C., Holz, T.: Paint it black: evaluating the effectiveness of malware blacklists. In: Stavrou, A., Bos, H., Portokalidis, G. (eds.) RAID 2014. LNCS, vol. 8688, pp. 1–21. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11379-1_1
Lemon, J.: Resisting SYN Flood DoS attacks with a SYN Cache. In: Proceedings of the BSD Conference (BSDCon), pp. 89–97. USENIX Association, USA (2002)
Li, G.: An Empirical Analysis on Threat Intelligence: Data Characteristics and Real-World Uses. Ph.D. thesis, UC San Diego (2020)
Li, V.G., Dunn, M., Pearce, P., McCoy, D., Voelker, G.M., Savage, S.: Reading the Tea leaves: a comparative analysis of threat intelligence. In: Proceedings of the 28th USENIX Security Symposium (USENIX Security), pp. 851–867, August 2019
MaxMind: IP Geolocation and Online Fraud Prevention. https://www.maxmind.com/
McDonald, A., et al.: 403 forbidden: a global view of CDN geoblocking. In: Proceedings of the ACM Internet Measurement Conference (IMC), pp. 218–230 (2018)
NetAcuity. https://www.digitalelement.com/
OpenNet Initiative: Survey of Government Internet Filtering Practices Indicates Increasing Internet Censorship, May 2007
Park, J.C., Crandall, J.R.: Empirical study of a national-scale distributed intrusion detection system: backbone-level filtering of HTML responses in China. In: IEEE 30th International Conference on Distributed Computing Systems (ICDCS), pp. 315–326. IEEE (2010)
Pearce, P., Ensafi, R., Li, F., Feamster, N., Paxson, V.: Augur: internet-wide detection of connectivity disruptions. In: Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 427–443. IEEE (2017)
Pitsillidis, A., Kanich, C., Voelker, G.M., Levchenko, K., Savage, S.: Taster’s choice: a comparative analysis of spam feeds. In: Proceedings of the ACM Internet Measurement Conference (IMC), pp. 427–440. Boston, MA, November 2012
Ponemon Institute LLC: Third Annual Study on Changing Cyber Threat Intelligence: There Has to Be a Better Way (January 2018)
Postel, J.: RFC0791: Internet Protocol (1981)
Ramachandran, A., Feamster, N., Dagon, D.: Revealing Botnet Membership Using DNSBL Counter-Intelligence. SRUTI 6 (2006)
Shackleford, D.: Cyber Threat Intelligence Uses, Successes and Failures: The SANS 2017 CTI Survey. Technical Report, SANS (2017)
Sheng, S., Wardman, B., Warner, G., Cranor, L.F., Hong, J., Zhang, C.: An empirical analysis of phishing blacklists. In: Proceedings of the Conference on Email and Anti-Spam (CEAS) (2009)
Singh, R., et al.: Characterizing the nature and dynamics of tor exit blocking. In: Proceedings of the 26th USENIX Security Symposium (USENIX Security), pp. 325–341 (2017)
Sinha, S., Bailey, M., Jahanian, F.: Shades of grey: on the effectiveness of reputation-based “blacklists”. In: Proceedings of the 3rd International Conference on Malicious and Unwanted Software (MALWARE), pp. 57–64. IEEE (2008)
Spring, N., Mahajan, R., Wetherall, D.: Measuring ISP topologies with Rocketfuel. ACM SIGCOMM Comput. Commun. Rev. (CCR) 32(4), 133–145 (2002)
Thomas, K., Amira, R., Ben-Yoash, A., Folger, O., Hardon, A., Berger, A., Bursztein, E., Bailey, M.: The abuse sharing economy: understanding the limits of threat exchanges. In: Monrose, F., Dacier, M., Blanc, G., Garcia-Alfaro, J. (eds.) RAID 2016. LNCS, vol. 9854, pp. 143–164. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45719-2_7
Tounsi, W., Rais, H.: A survey on technical threat intelligence in the age of sophisticated cyber attacks. Comput. Secur. 72, 212–233 (2018)
University of Oregon Route Views Project. http://www.routeviews.org/routeviews/
Best National University Rankings. https://www.usnews.com/best-colleges/rankings/national-universities, January 2020
Zittrain, J., Edelman, B.: Internet filtering in China. IEEE Internet Computing 7(2), 70–77 (2003)
A Inference Technique Details
Our technique, while simple in theory, must handle real-world conditions, including packet loss, packet reordering in transit, and other traffic arriving at reflectors. The inference method therefore needs to be efficient, accurate, and low overhead. Blocklists can change frequently, leaving only a short window in which to infer stable behavior, so the measurement must finish in a reasonable amount of time. The method should also have low false positive and false negative rates so that we can be confident in the results. Finally, it should require as few packets as possible to reduce the potential impact on reflectors.
The first step is to find reflectors suitable for our measurement technique. Recall that a suitable reflector should have minimal background traffic and should not sit behind a network that performs ingress filtering of spoofed packets. To find quiescent hosts—reflectors with low background traffic—we send 24 probes to each candidate host, 1 per second, and repeat the experiment 5 times at different times of day. We then select only hosts where at least 30% of their IP ID increments equal 1—that is, the host received no extra traffic during that one-second interval. The 30% threshold selects hosts that are largely “quiet” and thus more likely to yield a perfect signal during the experiment. Next, to identify hosts behind ingress filtering, we acquired 7 vantage points around the world to exercise different paths to each reflector. We sent packets from our measurement machine to each host with source addresses spoofed to those of the 7 vantage points, and then collected the responses at each vantage point. We select only hosts that send responses to all 7 vantage points, meaning no spoofed packets were dropped on any of the exercised network paths.
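For concreteness, below is a minimal sketch of the quiescence test using Scapy. The destination port, constants, and function names are illustrative assumptions rather than details from the paper, and the companion ingress-filtering check (which requires collectors at the 7 vantage points) is omitted.

```python
import time
from scapy.all import IP, TCP, sr1  # requires root privileges for raw sockets

PROBES = 24            # probes per round (24, one per second)
QUIET_FRACTION = 0.30  # fraction of +1 deltas required to call a host quiescent

def ipid_deltas(candidate_ip, dport=80, count=PROBES):
    """Probe a host once per second and return consecutive IP ID deltas."""
    ids = []
    for _ in range(count):
        t0 = time.time()
        # A SYN-ACK elicits a RST whose IP header carries the host's IP ID
        resp = sr1(IP(dst=candidate_ip) / TCP(dport=dport, flags="SA"),
                   timeout=1, verbose=False)
        ids.append(resp[IP].id if resp is not None else None)
        time.sleep(max(0, 1 - (time.time() - t0)))  # keep probes 1 s apart
    return [(b - a) % 65536 for a, b in zip(ids, ids[1:])
            if a is not None and b is not None]

def is_quiescent(candidate_ip):
    deltas = ipid_deltas(candidate_ip)
    return bool(deltas) and sum(d == 1 for d in deltas) / len(deltas) >= QUIET_FRACTION
```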
Next, we describe how we infer whether a given reflector blocks an IP using multiple trials. We define a trial as a single experiment that tests whether a reflector blocks one blocklist IP. Figure 6 shows the process of one trial. For each trial, the measurement machine sends five consecutive probe packets to the reflector, one second apart. In our experiments the probe packets are TCP SYN-ACK packets, and we read IP IDs from the responding RST packets. Between the third and fourth probes, the measurement machine sends five spoofed packets, also TCP SYN-ACKs, with the source address set to the blocklist IP; between the fourth and fifth probes it sends another five spoofed packets. Each burst of five spoofed packets is sent 0.15 s apart, spreading it across the one-second window between two probes.
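The following sketch illustrates the packet schedule of a single trial under the same assumptions (Scapy, raw-socket privileges, an illustrative destination port); it is not the authors’ measurement code. Passing blocklist_ip=None omits the spoofed packets, which is the configuration used for the false-negative check in Appendix A.2.

```python
import time
from scapy.all import IP, TCP, send, sr1  # requires root privileges

def run_trial(reflector, blocklist_ip, dport=80):
    """One trial: 5 probes 1 s apart; 5 spoofed SYN-ACKs after the 3rd and
    again after the 4th probe. Returns the RST IP IDs and arrival times."""
    ipids, arrivals = [], []
    for i in range(5):
        t0 = time.time()
        resp = sr1(IP(dst=reflector) / TCP(dport=dport, flags="SA"),
                   timeout=1, verbose=False)
        if resp is not None:
            ipids.append(resp[IP].id)
            arrivals.append(time.time())
        if blocklist_ip is not None and i in (2, 3):   # after 3rd and 4th probes
            for _ in range(5):
                # source address spoofed to the blocklist IP under test
                send(IP(src=blocklist_ip, dst=reflector) /
                     TCP(dport=dport, flags="SA"), verbose=False)
                time.sleep(0.15)   # spread the burst across the 1 s window
        time.sleep(max(0, 1 - (time.time() - t0)))  # pad to the next probe slot
    return ipids, arrivals
```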
We then inspect the increases between the IP IDs in the packets received by the measurement machine. Ideally, assuming no additional traffic and no packet loss, the IP ID should increase by exactly one between consecutive probes. For the last two deltas, since we send spoofed packets between our probe packets, the observed IP ID increases will differ depending on the host’s blocking behavior.
If the reflector does not block the blocklist IP, we will observe an IP ID increase sequence in the received RST responses of [+1, +1, +6, +6]. The last two deltas are +6 because the reflector responds to the spoofed packets, increasing its IP ID by 5, and our probe packet increases it by another 1, for +6 in total.
On the other hand, if the reflector blocks the blocklist IP, we will observe an IP ID increase sequence of [+1, +1, +1, +1]. The last two deltas are +1 because the reflector drops the spoofed packets, causing no extra change in the IP ID.
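As a worked example with hypothetical IP ID values, the two cases produce the following delta sequences (wrap-around at 2^16 is handled explicitly):

```python
# Hypothetical IP ID values observed across the five probes:
not_blocked = [4021, 4022, 4023, 4029, 4035]
deltas = [(b - a) % 65536 for a, b in zip(not_blocked, not_blocked[1:])]
assert deltas == [1, 1, 6, 6]   # reflector answered the 10 spoofed packets

blocked = [4021, 4022, 4023, 4024, 4025]
deltas = [(b - a) % 65536 for a, b in zip(blocked, blocked[1:])]
assert deltas == [1, 1, 1, 1]   # spoofed packets were silently dropped
```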
The first three probes—corresponding to the first two IP ID deltas—act as a control; the last two “probe and spoof” patterns perform the actual experiment. Seeing the initial two +1 deltas indicates the host is in a quiet period (no extra network traffic), so we can be more confident that the subsequent IP ID jumps (+6 in our case) are caused by our experiment. While the numbers chosen for the experiment may seem arbitrary, there is a rationale behind them, which we discuss in the following sections.
A.1 Inference Criteria
We now look at the criteria for inferring whether a reflector blocks a blocklist IP. From the measurement machine, our only visibility into the reflector is the IP IDs it returns; moreover, we want to be conservative when inferring blocking. Our approach is therefore to repeat the same trial between a reflector and a blocklist IP until we obtain a “perfect signal”—a response that meets all of the criteria below:
1. The measurement machine received exactly five RST responses from the reflector.
2. The five responses arrived one second apart, consecutively.
3. The IP ID increase sequence is either [+1, +1, +6, +6], which we interpret as no blocking, or [+1, +1, +1, +1], which we interpret as blocking.
4. If any of the above three criteria are not met, we repeat the same trial, up to 15 times before giving up.
The first requirement ensures no packet loss. The second ensures that the responses we receive reflect the real IP ID changes at the reflector. The Internet does not guarantee packet ordering: although we send one probe per second, the probes may not arrive at the reflector in that order, in which case the IP ID sequence in the responses would not represent the true order of IP ID changes at the host. By requiring that responses arrive no less than 0.85 s and no more than 1.15 s apart, we minimize the probability of reordering.
The third requirement is the core of our inference logic. Because we ignore everything other than an IP ID increase sequence of [+1, +1, +1, +1] or [+1, +1, +6, +6], our inference of blocking is conservative. If we saw [+1, +1, +1, +1] while the reflector does not actually block the blocklist IP, then all 10 spoofed packets must have been lost. Conversely, if we saw [+1, +1, +6, +6] while the reflector actually blocks the blocklist IP, then the reflector must have generated exactly five extra packets during each of the last two seconds. Both cases are very unlikely, as we demonstrate next with an analysis of false positives and false negatives.
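A minimal sketch of this “perfect signal” filter and the retry loop, assuming the run_trial() helper from the earlier sketch, might look as follows; the function names and return values are illustrative.

```python
def perfect_signal(ipids, arrivals):
    """Return 'blocked'/'not blocked' for a perfect signal, else None."""
    if len(ipids) != 5 or len(arrivals) != 5:
        return None                                   # criterion 1: all 5 RSTs
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    if any(g < 0.85 or g > 1.15 for g in gaps):
        return None                                   # criterion 2: ~1 s spacing
    deltas = [(b - a) % 65536 for a, b in zip(ipids, ipids[1:])]
    if deltas == [1, 1, 1, 1]:
        return "blocked"                              # criterion 3
    if deltas == [1, 1, 6, 6]:
        return "not blocked"                          # criterion 3
    return None                                       # anything else: retry

def infer_blocking(reflector, blocklist_ip, max_trials=15):
    for _ in range(max_trials):
        verdict = perfect_signal(*run_trial(reflector, blocklist_ip))
        if verdict is not None:
            return verdict
    return "unknown"                                  # give up after 15 trials
```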
A.2 False Positive and False Negative Analysis
For our experiment, a false positive occurs when a reflector is not blocking a blocklist IP but we mistakenly conclude that it is; a false negative occurs when a reflector is blocking a blocklist IP but we mistakenly conclude that it is not. We evaluate both rates empirically by running controlled experiments against all reflectors under consideration.
For the false positive evaluation, we first acquire a list of IPs that are verifiably not blocked by reflectors: since we own these IPs, we can confirm this by probing the reflectors directly from them. We acquired and tested 1,265 IPs from five different /24s. We then probe the reflectors and send spoofed packets with source addresses set to these pre-selected IPs. Since these IPs are not blocked, observing an IP ID increase sequence of [+1, +1, +1, +1] indicates a false positive.
For false negatives, we run the experiment with only probe packets and no spoofed packets, which is equivalent to the reflector blocking the spoofed IP. If we observe an IP ID increase sequence of [+1, +1, +6, +6], it must have been caused by background traffic at the reflector and is therefore a false negative.
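Under the same assumptions (the run_trial() and perfect_signal() sketches above, and a hypothetical control_ips list standing in for the 1,265 addresses we control), the error rates could be tallied roughly as follows:

```python
def measure_error_rates(reflectors, control_ips, trials=15):
    """Tally false positives (control IPs misread as blocked) and false
    negatives (probe-only trials misread as not blocked)."""
    fp = fp_n = fn = fn_n = 0
    for reflector in reflectors:
        # False positives: spoof from addresses we own and know are not
        # blocked; a 'blocked' verdict requires losing all 10 spoofed packets.
        for ip in control_ips:
            for _ in range(trials):
                fp_n += 1
                fp += perfect_signal(*run_trial(reflector, ip)) == "blocked"
        # False negatives: probes only (no spoofed packets), mimicking a
        # blocking reflector; a 'not blocked' verdict means background traffic
        # alone added exactly +5 in each of the last two one-second windows.
        for _ in range(trials):
            fn_n += 1
            fn += perfect_signal(*run_trial(reflector, None)) == "not blocked"
    return fp / max(fp_n, 1), fn / max(fn_n, 1)
```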
Although we present the design with five spoofed packets in each of the last two seconds, we also experimented with a range of values and measured their false positive and false negative rates. For every reflector we ran 15 trials with 3, 4, 5, 6, and 7 spoofed packets, and repeated the experiment on a different day. The results are shown in Fig. 7.
We must trade off keeping the false positive and false negative rates low against generating as little traffic as possible, and we choose 5 spoofed packets as the balance. With 5 spoofed packets, we obtain a false positive rate of 2.5e-5 and a false negative rate of 8.5e-5. We also experimented with sending 4 probe packets (yielding 3 IP ID deltas) and 6 probe packets (yielding 5 deltas). With only 3 deltas we suffer a higher false negative rate, since it is easier for extra traffic at the reflector to produce the same IP ID increase sequence. With 6 probes we prolong the experiment, making it harder to obtain a “perfect signal”. Thus, 5 probe packets with 5 spoofed packets in between is a good balance between the competing factors.