God Willing and the Bits Don’t Rot

Johanna Drucker’s “Pixel Dust: Illusions of Innovation in Scholarly Publishing,” at the Los Angeles Review of Books, notes that “the fate of the humanities is being influenced by a campaign of misinformation.” Turns out, she’s participating.

There’s plenty to challenge in Drucker’s piece, but I want to focus on one part that was pointed out by my friend Bob Stein, who posted it on Facebook asking about the veracity of this specific claim by Drucker:

the fact is that every use of a file degrades and changes it, that “bit rot” sets in as soon as a file is made, and that no two copies of any file are ever precisely the same as another.

What follows is an expanded version of what I posted in response to Bob on Facebook:

Drucker misunderstands the nature of digital representation (as well as internet jokes; I’ll get to the latter in a moment). The actual bits that make up a file—take something like a text file in 8-bit ASCII—do not spontaneously rot or change from routine access and therefore become somehow different from one copy to the next. A file can be corrupted, for sure, but it does not simply “rot.” And there are checks against corruption, which I discuss below.
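
To make that concrete, here’s a quick sketch in Python (my own illustration, not anything from Drucker’s piece or from any particular tool) that prints the exact 8-bit pattern of each byte in a short ASCII string. Run it a thousand times, or on a faithful copy of the same data, and the bits come out identical every time.

```python
# A short ASCII string, printed byte by byte as decimal value, character,
# and the exact 8-bit pattern stored on disk or in memory. Reading or
# copying these bytes does not alter them; the same input always yields
# the same bits.
text = "bits don't rot"
for byte in text.encode("ascii"):
    print(f"{byte:3d}  {chr(byte)!r:>5}  {byte:08b}")
```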

You can download a hex editor that will let you see the actual 0s and 1s that make up something like an ASCII text file (eight 0s and 1s per character, or byte, in 8-bit ASCII). And with painstaking work, you can compare those bit representations and see no difference, file to file or copy to copy over time. You can also compute a checksum on a file: the output of a mathematical formula that helps verify a file’s integrity in a transfer from Point A to Point B, or serves as a quick way to confirm that a file has not been tampered with. Authors of open-source software typically publish the computed hash for a given file or set of files, so that users of the software or fellow hackers can check its integrity. If spontaneous bit rot were possible, a checksum would be useless for anything other than saying, “Yes, that’s probably a copy of the file. Probably. Maybe.”
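
As a minimal sketch of that kind of check, here’s how a copy can be verified with Python’s standard hashlib module; the filenames are hypothetical stand-ins. If the two digests match, the copy is bit-for-bit identical to the original.

```python
import hashlib
import shutil

def sha256_of(path: str) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

# "original.txt" is a hypothetical file; copy it and hash both versions.
shutil.copyfile("original.txt", "copy.txt")
assert sha256_of("original.txt") == sha256_of("copy.txt")
print("identical:", sha256_of("original.txt"))
```

Flip even a single bit in the copy and the digests diverge completely, which is exactly why published hashes work as an integrity and tamper check.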

What “bit rot” probably refers to, as Drucker is using it, is the physical degradation of storage media. But it’s not really the “bits” that rot; it’s the media itself (the magnetic coating on a hard disk platter; the polycarbonate and foil of a CD-ROM). And because the density of bits is so extreme, when media fails, it’s far more than an individual bit here or there that gets lost; whole swaths of data go at once. Thus your garden-variety hard disk failure, etc.

But as for a file’s bits changing upon access, and Drucker’s claim that “no two copies of any file are ever precisely the same”: that’s simply not true (at least in binary computing; quantum computing is a different matter). Files may get moved from place to place over the course of their lives on a single drive (especially in the old days of defragmenting drives), but nothing in routine read/write operations changes a 0 to a 1 or vice versa under normal use. A miswritten bit is a read/write error, not “rot” in any organic or material sense.

The problem of ensuring the integrity of digital data—that is, checking against read/write errors—is practically as old as digital computing itself. One of the earliest forms of checksum computation was the parity bit.

Fun fact: the 1963 ASCII specification defined a 7-bit character set. An 8-bit set would have been theoretically possible at the time, but for one problem: transmission of digital data often took place on paper tape. And paper tape machines were engineered to support things like the simpler 5-bit Baudot codes (Google it).

To ensure the integrity of transmitted data, the tape machines actually supported 6 bits: the 5 bits of data, plus a parity bit. The parity bit was a rudimentary form of data checking on the number of 1s transmitted/holes punched for each byte (line of holes) in the data. If someone were checking for even parity, then the parity bit would be punched on all lines with an odd number of 1s (and remain unpunched on lines with an even number of 1s). If a received line/byte added up to an odd number of 1s, the machine operator knew immediately that the data had been corrupted.
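
Here’s a rough sketch of that even-parity arithmetic in Python; the 5-bit value is made up for illustration. The sender appends whatever bit makes the count of 1s even, and the receiver simply re-counts.

```python
def even_parity_bit(data_bits: str) -> str:
    """Return the bit that makes the total count of 1s even."""
    return "1" if data_bits.count("1") % 2 else "0"

def parity_ok(line: str) -> bool:
    """A received line (data bits plus parity bit) checks out only if its 1s sum to an even number."""
    return line.count("1") % 2 == 0

# A made-up 5-bit Baudot-style value: three 1s, so the parity hole gets punched.
data = "10110"
line = data + even_parity_bit(data)
print(line, parity_ok(line))            # 101101 True

# Flip a single bit in transit and the check fails at once.
corrupted = "100101"
print(corrupted, parity_ok(corrupted))  # 100101 False
```

A single flipped bit (or mispunched hole) changes the parity of the count, which is why the operator could spot corruption immediately; two flipped bits would slip past, which is one reason later checks grew more elaborate.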

So why did paper tape force a 7-bit ASCII set (and thus marginalize even other Western alphabets with more than 26 letters)? The answer is forehead-slappingly funny: paper-tape engineers determined that the tape could accommodate 7 bits plus a parity bit, for a total of 8 bits. Trying to cram 8 bits of data plus a parity bit onto paper tape would cause the paper to tear. So seven bits reigned supreme.

As data transmission and storage have grown more sophisticated, so too have the specifications for data checking. For some light beach reading, check out the IETF’s RFC for computing Internet checksums on TCP/IP and other protocols.
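
The RFC in question is presumably RFC 1071, “Computing the Internet Checksum,” and its core method fits in a few lines. Here’s a rough Python sketch of that ones’-complement approach; the payload bytes are a made-up example.

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit words, then complemented (RFC 1071 style)."""
    if len(data) % 2:                    # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold any carry back in
    return ~total & 0xFFFF

packet = b"a made-up payload"
print(hex(internet_checksum(packet)))

# To verify, a receiver runs the same ones'-complement sum over the data
# plus the transmitted checksum; an intact packet yields 0xFFFF (all ones),
# i.e. the final complement comes out to zero.
```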

Again: bits don’t rot. They can become corrupted, typically with the loss of an entire file (and more), but individual bits do not spontaneously change state from 0 to 1 or from 1 to 0.

Drucker’s Wikipedia page lists her various positions and honors, including various prestigious digital humanities fellowships. Her faculty page at UCLA notes her expertise in a single word: Preservation.

The word “digital” refers to a mode of representation. In binary computing, it refers to representation as 0s and 1s. Whether those 0s and 1s are stored on an old-school punch card, in the cloud, or on an M-DISC (the latter promises data storage lasting up to a thousand years), bits don’t rot. Someone with Drucker’s caliber of expertise in DH and preservation should know this.

But don’t take my word for it: the US Library of Congress maintains an entire website on digital preservation at digitalpreservation.gov/. Had Professor Drucker run a simple Google search for “bit rot” limited to the digitalpreservation.gov domain, she would have found numerous white papers and PDF’d presentation slides discounting the notion of “bit rot.” There’s this presentation by Cory Snavely, Library IT Core Services manager at the University of Michigan, who describes “bit rot” as a “non-issue.”

There is even a white paper authored by none other than Clay Shirky, who observes even more directly that “In Internet lore, [there is a] problem…called ‘bit rot,’ a humorous and notional condition in which digital data is said to decay with time.”

Humorous and notional. “Bit rot” is actually a joke—one used by software developers and programmers to pass off blame for old, no longer functioning software on a phenomenon that’s a fiction. The digital equivalent of “My dog ate it.”