You hear this a lot. Someone wants to make a point about censorship, cryptography, or the futility of banning a particular blob of data. The argument goes like this: a file is just a sequence of 1s and 0s, and numbers are also sequences of 1s and 0s when written in binary. Therefore, banning a file is like banning a number. What next? Will the government outlaw 85?
It’s a neat rhetorical trick. But every time I see it, I wince just a little, because here’s the awkward truth…
Sometimes, digital files aren’t numbers.
How long has this been going on?
Start with the simplest case, the empty file. A file of length zero. That’s a perfectly valid file and I have many copies of it, but what number is it?
You might say “zero”.
Fine — but then what number is a file containing a single zero byte? It can’t also be zero; that slot is taken.
Even if you carve out a special rule for the empty file (perhaps negative 1), you’re still not done. Suppose you want to encode a number as a file. As a 32‑bit integer? A 64‑bit integer? A 128‑bit integer? All of these are different files but they’re the same number.
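Here is a rough Python sketch of both collisions. It assumes the bytes are read as a big-endian unsigned integer; the byte order and the variable names are my choices for illustration, and nothing in the argument depends on them.

```python
import struct

# Three different files: zero bytes long, one zero byte, two zero bytes.
files = [b"", b"\x00", b"\x00\x00"]

# Read each one as a big-endian unsigned integer (the byte order is an assumption).
as_numbers = [int.from_bytes(f, "big") for f in files]
print(as_numbers)  # [0, 0, 0]: three distinct files, one number

# Going the other way: the number 85 written at three common widths.
as_files = [
    struct.pack(">I", 85),     # 32-bit integer, 4 bytes
    struct.pack(">Q", 85),     # 64-bit integer, 8 bytes
    (85).to_bytes(16, "big"),  # 128-bit integer, 16 bytes
]
print([len(f) for f in as_files])  # [4, 8, 16]: one number, three distinct files
```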
Numbers are infinite objects. Even the humble 42 has infinitely many zeros to the left and infinitely many zeros to the right after the decimal point. All numbers do, not just answers to metaphysical questions.
Files are finite objects. They have a length. Numbers don’t.
That difference matters.
Leading me along…
A favourite stunt in crypto‑activist circles is to turn a small text file into a prime number, but the trick only works because of a convenient accident.
Almost no real‑world file formats begin with a zero byte.
So when you convert that prime number back into the file, you simply chop off all the infinite leading zeros and start at the first non‑zero byte. It works because the file didn’t need those zeros anyway. But this is a convention, not a truth of nature.
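A minimal sketch of that round trip in Python, assuming big-endian byte order. The helper names `file_to_number` and `number_to_file` are hypothetical, made up for this example.

```python
def file_to_number(data: bytes) -> int:
    # Read the whole file as one big-endian unsigned integer.
    return int.from_bytes(data, "big")


def number_to_file(n: int) -> bytes:
    # Write n using the fewest bytes that can hold it, which is the same
    # thing as starting at the first non-zero byte.
    length = (n.bit_length() + 7) // 8
    return n.to_bytes(length, "big")


text = b"Hello"
print(number_to_file(file_to_number(text)) == text)  # True: no leading zero byte, so nothing is lost

padded = b"\x00\x00Hello"
recovered = number_to_file(file_to_number(padded))
print(recovered)  # b'Hello': the two leading zero bytes did not survive
```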
The best‑known counterexample is big‑endian UTF‑16 text saved without a byte‑order mark: if the first character is ASCII, the first byte will be 0x00. Beyond that, raw data formats such as uncompressed pixel dumps and PCM samples might begin with zero bytes if the first value happens to be zero. But structured formats almost never do.
And that’s deliberate. Most file formats begin with a header or “magic number” that identifies the type. These signatures are chosen to be distinctive, and often printable, which means the humble zero byte is out. Those file prefixes weren’t chosen to enable the prime number trick, but for more mundane practical reasons.
Channelling Gödel
If you really want to treat every possible file as a number, you can, but you must choose a convention that encodes both the bits and the length. Only then do you get a true one‑to‑one mapping.
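One possible convention, sketched in Python, is to number files in order of length first and content second. This is only an illustration of a length-aware mapping, not necessarily the scheme hinted at later. The function names `offset`, `file_to_index` and `index_to_file` are hypothetical.

```python
def offset(length: int) -> int:
    # How many files are strictly shorter than `length` bytes:
    # 256**0 + 256**1 + ... + 256**(length - 1).
    return (256**length - 1) // 255


def file_to_index(data: bytes) -> int:
    # Skip past every shorter file, then count within files of this length.
    return offset(len(data)) + int.from_bytes(data, "big")


def index_to_file(n: int) -> bytes:
    # Recover the length first, then the content.
    length = 0
    while offset(length + 1) <= n:
        length += 1
    return (n - offset(length)).to_bytes(length, "big")


for f in [b"", b"\x00", b"\xff", b"\x00\x00"]:
    print(f, file_to_index(f))
# b'' 0
# b'\x00' 1
# b'\xff' 256
# b'\x00\x00' 257
```

Every file gets its own non-negative integer and every non-negative integer decodes to exactly one file, but only because the length is baked into the arithmetic.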
And once you’ve done that, you’ve quietly admitted the point.
A file is not just a number. A file is a number plus its length.
Which is exactly the thing the slogan tries to ignore. Folding both into a single number takes a little help from Gödel, but that’s a story for another day.




