Thursday, July 14, 2005

Thoughts on The Wayback Machine Kerfuffle

The Internet Archive, a.k.a. the Wayback Machine, is being sued by a firm called Healthcare Advocates for storing copies of old web pages. (See Good Morning Silicon Valley, for example.) These archived pages are causing the company heartburn in a separate trademark dispute, so it's unhappy. Further, for some reason, the pages were allegedly stored even though a "robots.txt" file flagged them as not to be archived, cached, spidered, etc.

The case has generated the predictable throwing up of hands in disgust throughout the online world. As Good Morning Silicon Valley's John Paczkowski succinctly puts it: "Uh, you published that information to a public medium ..." Now I'm certainly sympathetic with the Internet Archive here. At some level, the archiving and caching of publicly-displayed web pages seems almost part of the fabric of the Web and the way it works. However, I'm less convinced than some others that this is Much Ado About Nothing. I preface the following comments and observations with a standard "I Am Not a Lawyer"--and would welcome any on-point case law that might be relevant here.

I think we can all stipulate that web pages and such are copyrighted material and freely displaying them to the public doesn't reduce or eliminate that copyright in any way.

I do agree with John that the robots.txt angle seems a bit wacky.
Why? The robots.txt protocol is purely advisory. It has no legal bearing whatsoever. "Robots.txt is a voluntary mechanism," said Martijn Koster, a Dutch software engineer and the author of a comprehensive tutorial on the robots.txt convention. "It is designed to let Web site owners communicate their wishes to cooperating robots. Robots can ignore robots.txt."

Ignoring robots.txt may be bad manners, but it's hard to see the legal significance. (There are perhaps analogs in physical trespass laws--posting your property and the like--but my understanding is that the details of such are typically governed by explicit state and local laws.)
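To make the "purely advisory" point concrete, here is a minimal sketch of what a cooperating robot does, using Python's standard urllib.robotparser module. The robots.txt contents, the "ExampleArchiveBot" user agent, and the example.com URL are all made up for illustration; nothing here is taken from the actual case or from the Internet Archive's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every robot to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleArchiveBot" and the URL are illustrative assumptions.
    url = "http://example.com/old-page.html"
    if may_fetch(ROBOTS_TXT, "ExampleArchiveBot", url):
        print("polite robot: fetching", url)
    else:
        print("polite robot: skipping", url)
```

The check is something the robot chooses to run on its own side. A robot that never calls may_fetch can request the page anyway, which is exactly the sense in which the mechanism is voluntary.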

However--and here I perhaps stray into less charted territory--what exactly gives anyone permission to copy and archive web sites anyway? Certainly, there's no explicit permission, something like a robots.txt file in reverse, that affirmatively gives the right to replicate, store, transmit, archive, etc. web pages. I suppose the theory is that there is some sort of implicit permission based on custom and social contract. Which seems a rather loosey-goosey state of affairs.

I can't think of any really good analogs here. Yes, I can record TV and radio--but only for my personal use. It's quite well established I can't put those recordings on a server for all to access. Usenet postings might be the most analogous situation; they're now archived as Google Groups and in more fragmentary form elsewhere. However, as far as I know, the legal status of Usenet and other types of online postings doesn't have much case law underpinning it. Furthermore, I think one could easily argue that such postings have a more explicit element of transmission of content out into the world--with the full knowledge that said content will be forwarded and stored for at least some interval--than Web pages which reside on a controlled site.

Nor can I see the exemplary historical service that the Internet Archive is providing with its activities having any bearing. "Preservation of the past" may be a social good, but it's got little to do with copyright law. After all, Abandonware has the same legal status as any other warez in the absence of the copyright owner's explicit permission to release it into the wild.

From where I sit, robots.txt certainly seems like a red herring in this case--given the lack of laws compelling its observance. But there's a much larger issue of caching and archiving that seems to rest on very sandy foundations.

1 comment:

Anonymous said...

parallels. - the only thing i can think of is Library of congress/british library - which store copies of all copyrighted materials. its a good question