Thursday, December 31, 2015
Raiders of the Lost Web
If a Pulitzer-finalist 34-part series of investigative journalism can vanish from the web, anything can.
by ADRIENNE LAFRANCE, OCT 14, 2015
The web, as it appears at any one moment, is a phantasmagoria. It’s not a place in any reliable sense of the word. It is not a repository. It is not a library. It is a constantly changing patchwork of perpetual nowness.
You can't count on the web, okay? It’s unstable. You have to know this.
Digital information itself has all kinds of advantages. It can be read by machines, sorted and analyzed in massive quantities, and disseminated instantaneously. “Except when it goes, it really goes,” said Jason Scott, an archivist and historian for the Internet Archive. “It’s gone gone. A piece of paper can burn and you can still kind of get something from it. With a hard drive or a URL, when it’s gone, there is just zero recourse.”
There are exceptions. The Internet Archive’s Wayback Machine has a trove of cached web pages going back to 1996. Scott and his colleagues are saving tens of petabytes of data, chasing an ideal that doubles as their motto: Universal Access to All Knowledge. The trove they’ve built is extraordinary, but it’s far from comprehensive. Today’s web is more dynamic than ever and therefore more at risk than it sometimes seems.