Web wide crawl with initial seed list and crawler configuration from March 2011. This crawl uses the new HQ software for distributed crawling, written by Kenji Nagahashi.
What's in the data set:
Crawl start date: 09 March, 2011
Crawl end date: 23 December, 2011
Number of captures: 2,713,676,341
Number of unique URLs: 2,273,840,159
Number of hosts: 29,032,069
The seed list for this crawl was a list of Alexa's top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT) crawler software and respected robots.txt directives. The scope of the crawl was not limited except for a few manually excluded sites.
However, this was a somewhat experimental crawl for us, as we were using newly minted software to feed URLs to the crawlers, and we know there were some operational issues with it. For example, in many cases we may not have crawled all of the embedded and linked objects in a page, since the URLs for these resources were added to queues that quickly grew bigger than the intended size of the crawl (and therefore we never got to them). We also included repeated crawls of some Argentinian government sites, so results broken down by country will be somewhat skewed.
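The queue problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how HQ or Heritrix is implemented: it assumes a single first-in, first-out queue and a fixed fetch budget, and the function name `crawl`, the `links` mapping, and the example URLs are all hypothetical. It shows how embedded resources discovered late can be left in the queue when the budget runs out.

```python
from collections import deque


def crawl(seeds, links, budget):
    """Breadth-first crawl that stops after `budget` fetches.

    `links` maps a URL to the URLs discovered on it; it stands in
    for real fetching and link extraction (an assumption of this sketch).
    Returns the fetched URLs and the URLs still waiting in the queue.
    """
    frontier = deque(seeds)
    seen = set(seeds)
    fetched = []
    while frontier and len(fetched) < budget:
        url = frontier.popleft()
        fetched.append(url)
        # Newly discovered URLs go to the back of the queue.
        for found in links.get(url, []):
            if found not in seen:
                seen.add(found)
                frontier.append(found)
    return fetched, list(frontier)


# Hypothetical toy web: each seed page embeds two resources.
links = {
    "a.example/": ["a.example/logo.png", "a.example/style.css"],
    "b.example/": ["b.example/logo.png", "b.example/style.css"],
}
fetched, never_reached = crawl(["a.example/", "b.example/"], links, budget=3)
print("fetched:", fetched)
print("never reached:", never_reached)
```

With a budget of three fetches, both seed pages and one embedded image are captured, while the remaining three embedded resources stay queued and are never crawled.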
We have made many changes to how we do these wide crawls since this particular example, but we wanted to make the data available "warts and all" for people to experiment with. We have also done some further analysis of the content.
If you would like access to this set of crawl data, please contact us at info at archive dot org and let us know who you are and what you're hoping to do with it. We may not be able to say "yes" to all requests, since we're just figuring out whether this is a good idea, but everyone will be considered.
Arctic air and record snow falls gripped the northern hemisphere yesterday, inflicting hardship and havoc from China, across Russia to Western Europe and over the US plains.
There were few precedents for the global sweep of extreme cold and ice that killed dozens in India, paralysed life in Beijing and threatened the Florida orange crop. Chicagoans sheltered from a potentially killer freeze, Paris endured sunny Siberian cold, Italy dug itself out of snowdrifts and Poland counted at least 13 deaths in record low temperatures of about minus 25C (-13F).
The heaviest snow yesterday hit northeastern Asia, which is suffering its worst winter weather for 60 years. More than 25 centimetres (10in) of snow covered Seoul, the South Korean capital — the heaviest fall since records began in 1937.
In China, Beijing and the nearby port city of Tianjin had the deepest snow since 1951, with falls of up to 8in and temperatures of minus 10C. In the far north of China, the temperature fell to minus 32C. More than two million Beijing and Tianjin pupils were sent home and 1,200 flights were delayed or cancelled at Beijing’s international airport.
The same far-eastern weather system took its toll on Sakhalin, the Russian island off Siberia, which was hit by blizzards and avalanches. Farther west, in northern and eastern India, more than 60 people, mainly homeless, died of exposure. Thousands of schools were closed. In Uttar Pradesh, the state neighbouring Nepal, the authorities spent £1.3 million on blankets and firewood for needy households.
Western Russia suffered a deep freeze as snow swept across the Baltic and north-central Europe, leaving the worst devastation in Poland, where 13 people died, bringing the toll from the cold this winter to 122.
Up to ten skiers died or were missing in avalanches. The worst incident was in the Diemtig Valley in Switzerland on Sunday, when avalanches hit a group of skiers and then the rescuers who went to their aid. Eight people were pulled from the snow alive, but four died, including an emergency doctor, and three more were missing.
In Italy, emergency services struggled with rare cold and ice. Motorways in the northeast were closed and military helicopters were sent to Sicily with medical aid.
In the United States, heavy snow fell again on the northeast.
In Burlington, Vermont, a record 33in of snow fell in a weekend storm. The previous record in a three-day period was set in 1969. Residents of the Northern Plains were warned to expect lethally cold temperatures of about minus 30C.
The icy conditions of Western Europe, which broke records in half a dozen countries in December, are expected to last for at least another week.
Guo Hu, the head of the Beijing Meteorological Bureau, linked this week’s conditions to unusual atmospheric patterns caused by global warming.
Meteorologists were also trying to find a pattern in the heavy rains that have hit equatorial regions and the southern hemisphere in the past week.
At least 20 people have been killed in flash floods in Kenya after torrential rains made thousands homeless.
In Australia, the authorities declared a natural disaster along the Castlereagh River as it peaked after torrential rain, forcing 1,200 residents to abandon their homes for high ground.
In Brazil, the death toll from flooding and mudslides over the past four days rose above 80.
Closer to home, forecasters have warned Britons to brace themselves for a freezing cold, bleak new year — this winter is set to be the coldest for more than 30 years.
© Times Newspapers Ltd 2010 Registered in England No. 894646 Registered office: 3 Thomas More Square, London, E98 1XY