I was just looking on the National Science Foundation's web site to try to find the Index of FOIA Frequently Requested Documents. The Index is mentioned in the NSF's Public Information Handbook. When I couldn't find the Index, I realized the Handbook was written in 1999, and perhaps an older version of the NSF web site had a copy of the Index. So I went to the Internet Archive's trusty Wayback Machine, and put in the NSF's web address. Yesterday when I looked at the results page, there were no results; the only information returned was a statement that the site had been blocked by robots.txt. Today, the Wayback Machine's results page shows each instance when the site was archived, from 1997 to 2005, but when you click on a link, the resulting page is empty and has this message: "We're sorry, access to http://www.nsf.gov/ has been blocked by the site owner via robots.txt."
Why? The archived web pages contain valuable information for researchers, scientists, journalists, and the general public. The web pages are agency documents, and, having been published on the Internet, are part of the FOIA public domain. See Let the People Know the Facts: Can Government Information Removed From the Internet Be Reclaimed?, 98 L. Libr. J. 1, 23 (2006). Since the Federal Records Act (44 U.S.C. 3301 et seq.) prohibits destruction of agency information, the older versions of the web site should be archived somewhere. So who is hiding information, and why? Every single version of the NSF site I tried to view was blocked.
If you know of another Federal agency that has refused to allow access to its archived web sites, I'd like to hear about it. If you know why agencies are blocking access, or have a theory, I'd like to hear about that, too. [Posted by Susan Nevelow Mart]
7/7/06 - Ed. Note: Be sure to read Bill's comments below based on his investigation - Mary Minow
A little more information is here. Summary: I think the glitch was temporary, and the Wayback Machine indexes NSF just fine. The Index of FOIA Frequently Requested Documents, however, does not appear to be available.
Posted by: Bill Hooker | July 08, 2006 at 08:08 PM
I first read your headline as "Is the *NSA* hiding something?"
My response to which was "duh!"
Posted by: Steve Lawson | July 08, 2006 at 02:45 PM
Thanks Bill - Good legwork. - Mary Minow
Posted by: Mary | July 08, 2006 at 08:44 AM
Got a response in short order:
--------
NSF blocks all indexing of the site between 7AM and 7PM ET, our peak traffic hours, for the convenience of our users. However, there is no block on the site from 7PM to 7AM ET. This is standard policy for most high traffic sites. The owner of http://www.archive.org/index.php need only comply with our policy in order to index our pages.
---------
So, neither malice nor stupidity by the look of it. I'd never heard of such a policy, but it makes sense.
Posted by: Bill Hooker | July 07, 2006 at 11:07 AM
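For readers unfamiliar with robots.txt, the time-of-day policy described in the webmaster's reply can be sketched in a few lines of Python. This is only an illustration, not NSF's actual configuration: the two robots.txt bodies, the helper names `robots_txt_for_hour` and `crawler_allowed`, and the use of `ia_archiver` as the archive crawler's user-agent string are all assumptions made for the example. The standard-library `urllib.robotparser` module does the rule matching.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt bodies; the real NSF file is not quoted in the reply.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # every crawler is shut out
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # an empty Disallow permits everything


def robots_txt_for_hour(hour_et: int) -> str:
    """Hypothetical helper: the robots.txt served at a given Eastern Time hour.

    Models the stated policy: blocked from 7AM to 7PM ET, open otherwise.
    """
    return BLOCK_ALL if 7 <= hour_et < 19 else ALLOW_ALL


def crawler_allowed(hour_et: int,
                    agent: str = "ia_archiver",        # assumed user-agent name
                    url: str = "http://www.nsf.gov/") -> bool:
    """Return True if a well-behaved crawler may fetch `url` at that hour."""
    parser = RobotFileParser()
    parser.parse(robots_txt_for_hour(hour_et).splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for hour in (6, 12, 22):
        print(hour, crawler_allowed(hour))
```

Under these assumptions, a crawler that checks robots.txt at noon ET is refused, while one that checks at 10PM ET is allowed in, which is consistent with the reply's statement that the archive "need only comply with our policy."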
Perhaps we should not attribute to malice that which is adequately explained by stupidity. My guess is that, rather than some sinister plot to remove information, someone somewhere thinks that they are fighting the good fight against spam by way of robots.txt.
That said, we are talking about an administration in which it's hard to say whether malice or stupidity is in greater oversupply, so I wrote to the webmaster to ask about this policy. I'll let you know if I get an answer.
Posted by: Bill Hooker | July 07, 2006 at 09:12 AM