Publisher beware

February 13th, 2006

schedule.willbenton.com is where I’ve published my iCal for the last couple of years. It’s worked out pretty well: I update something in iCal, it gets webdavved on over to the server, and the PHP iCalendar script does an admirable job of making it look OK. PHP iCalendar will happily generate dynamic web pages for every day, week, month, and year from the UNIX epoch until 2038. As one might imagine, Google goes after this like Cookie Monster would a box of Oreos:

[Image: googlebot runs amok]

The above image is taken from the web statistics for schedule.willbenton.com for this month so far (i.e., 12 full days). Google sucked down 9.15 GB of bandwidth, presumably trawling for activities I engaged in over the previous 36 years or will engage in over the upcoming 32. By comparison, in the same period, four actual distinct humans checked my schedule (presumably my colleagues, trying to set up lunch), consuming a total of 1.87 MB.

The real kicker is that I had a robots.txt file; unfortunately, it excluded web spiders from “willbenton.com/schedule” but not from its alias “schedule.willbenton.com.” Crawlers look up robots.txt separately for each hostname, so a rule written for a path on the main site does nothing for a subdomain serving the same pages. Since the Google spider seems to have gotten much more aggressive lately, you should make sure you have a robots.txt file installed, on every hostname that serves your calendar, if you use PHP iCalendar. (There’s also a fairly recent version of PHP iCalendar available that fixes a security bug.)
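The mismatch is easy to reproduce with Python’s standard `urllib.robotparser`. What follows is only a sketch: the rule text is a guess at what the two robots.txt files might contain (the real file isn’t reproduced here), and the `day.php?getdate=...` URLs are hypothetical examples of calendar pages, not addresses taken from the server logs.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Assumed contents of the main site's robots.txt: it blocks the /schedule path.
main_site_rules = parse_rules("User-agent: *\nDisallow: /schedule\n")

# Assumed fix: a robots.txt served by the subdomain itself, blocking everything.
subdomain_rules = parse_rules("User-agent: *\nDisallow: /\n")

# Hypothetical calendar URLs; the same page reached through each hostname.
via_path = "http://willbenton.com/schedule/day.php?getdate=19730101"
via_alias = "http://schedule.willbenton.com/day.php?getdate=19730101"

# Rules are matched against the URL path only, so the /schedule rule
# covers the first address but never matches anything on the alias.
path_allowed = main_site_rules.can_fetch("Googlebot", via_path)
alias_allowed = main_site_rules.can_fetch("Googlebot", via_alias)
alias_allowed_after_fix = subdomain_rules.can_fetch("Googlebot", via_alias)

print(path_allowed, alias_allowed, alias_allowed_after_fix)
```

Under these assumptions, the path-based rule protects the main-site address but leaves every page on the alias open to crawling; only a robots.txt served from the alias hostname itself closes the gap.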

One wonders how many of Google’s octuple-kajillion indexed web pages are someone’s electronic schedule from 1973.
