Python’s Truthiness: A Code Smell Worth Sniffing

Python lets you use any value in if and while conditions, not just Booleans. This permissiveness relies on a concept known as “truthiness”: values like None, 0, "", and empty collections are treated as false, and everything else as true.
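
To make those rules concrete, here’s a quick demonstration you can paste into a REPL (my own illustration, not part of the original argument):

for value in [None, 0, "", [], {}, 42, "hello", [0]]:
    print(repr(value), "->", bool(value))
# None, 0, '', [] and {} all print False; 42, 'hello' and [0] print True.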

It’s convenient. It’s idiomatic. It’s also, I argue, a code smell.

The Bug That Changed My Mind

Here’s a real example from one of my side projects:

from typing import Optional

def final_thing_count() -> Optional[int]:
    """Returns an integer count of things, but only
    if the number of things is complete. If there may
    yet be things to come, returns None to indicate
    the count isn't final."""
    # Code Redacted.

current_thing_count: Optional[int] = final_thing_count()
if current_thing_count:
    # Process the final count of things.
    ...

Looks fine, right? But it’s broken.

If final_thing_count() returns 0, a perfectly valid and reasonable number of things, the condition fails. The code treats it the same as None, which was meant to signal “not ready yet”. The fix?

if current_thing_count is not None:
    # Process the final count of things.
    ...

Now the intent is clear. We care whether the value is None or not, not whether it’s “truthy.” Zero is no longer misclassified.

Even if the function never returns zero, I’d still argue the revised version is better. It’s explicit. It’s readable. It doesn’t require the reader to know Python’s rules for truthiness.

I’ve looked at a wide variety of Python code, and every time I see truthiness used, I ask whether it would be clearer if it were explicit. The answer is almost always yes. The one exception I’ve observed is this use of the or operator…

# Read string, replacing None with an empty string.
my_string = read_optional_string() or ""

This is elegant and expressive. If Python ever adopted stricter Boolean rules, I’d hope this idiom would be preserved or replaced with a dedicated None-coalescing operator.
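
For comparison, here’s the same idiom spelled out with an explicit None check, roughly what a dedicated None-coalescing operator would capture in a single step (a small sketch reusing read_optional_string from above):

maybe_string = read_optional_string()
my_string = maybe_string if maybe_string is not None else ""

For an Optional[str] the two forms behave identically; the explicit version just makes the None handling visible, at the cost of a temporary variable.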

“I used to know who I was. Now I look in the mirror and I’m not so sure. Lord, I don’t want to listen to the lies anymore!”

C# Chose Clarity Over Convenience

In C, you could write terse, expressive loops like this:

while (*p++) { /* do something */ }

This worked because C, similar to Python, treated any non-zero value as true, and NULL, 0, '\0', and 0.0 as false. It was flexible but also fragile. A mistyped condition or misunderstood return value could silently do the wrong thing.

C# broke from that tradition. It introduced a proper Boolean type and made it a requirement for conditional expressions. If you try to use an int, string, or object directly in an if statement, the compiler rejects it.

Yes, we lost the ability to write one-line loops that stop on a null. But we gained something more valuable: a guardrail against a whole class of subtle bugs. The language nudges you toward clarity. You must say what you mean.

I believe that was the right trade.

A Modest Proposal (and a Practical One)

I’d love it if Python could raise an error when a non-Boolean value is used in a Boolean context. It would need to be an opt-in feature via a “__future__” import or similar, as it would break too much otherwise.

This would allow developers to opt into a stricter, more explicit style. It would catch bugs like the one I described and make intent more visible to readers and reviewers.

As a secondary proposal, linter applications could help bridge the gap. They could flag uses of implicit truthiness, especially in conditionals involving optional types, and suggest more explicit alternatives. This would nudge developers toward clarity without requiring language changes.
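
As a sketch of what such a lint rule might look like, here’s a deliberately naive checker built on Python’s ast module. It flags if and while tests that are just a bare name; the rule, the class name, and the wording of the warning are all my own invention, purely to illustrate the idea:

import ast

class ImplicitTruthinessVisitor(ast.NodeVisitor):
    """Warn about `if x:` / `while x:` where x is a bare name."""

    def __init__(self):
        self.warnings = []

    def _check_test(self, node):
        if isinstance(node.test, ast.Name):
            self.warnings.append(
                f"line {node.lineno}: `{node.test.id}` used as a bare condition; "
                "consider an explicit check such as `is not None`"
            )
        self.generic_visit(node)

    visit_If = _check_test
    visit_While = _check_test

# Sample code to check; it is only parsed, never executed.
source = """
current_thing_count = final_thing_count()
if current_thing_count:
    process(current_thing_count)
"""
checker = ImplicitTruthinessVisitor()
checker.visit(ast.parse(source))
print("\n".join(checker.warnings))
# line 3: `current_thing_count` used as a bare condition; ...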

Until then, I’ll continue treating truthiness as a code smell. Not always fatal, but worth sniffing.

“I saw your picture in the magazine I read. Imagined footlights on the wooden fence ahead. Sometimes I felt you were controlling me instead.”

Credits
📸 “100_2223” by paolo. (Creative Commons)
📸 “Elephant” by Kent Wang. (Creative Commons)

The Expanding Universe of Unicode

Before Unicode, digital text lived in a fragmented world of 8-bit encodings. ASCII had settled in as the good-enough-for-English core, taking up the first half of codes, but the other half was a mish-mash of regional code pages that mapped characters differently depending on locale. One set for accented Latin letters, another set for Cyrillic.

Each system carried its own assumptions, collisions, and blind spots. Unicode emerged as a unifying vision: a single character set for all human languages, built on a 16-bit foundation. All developers had to do was swap their 8-bit loops for 16-bit loops. Some bristled that half the bytes were all zeros, but this was for the greater good.

16 bits made for 65,536 code points. It was a bold expansion from the cramped quarters of ASCII, a ceremonial leap into linguistic universality. This was enough, it was thought, to encode the entirety of written expression. After all, how many characters could the world possibly need?

“Remember this girls. None of you can be first, but all of you can be next.”

🐹 I absolutely UTF-8 those zero bytes.

It was in this world of 16-bit Unicode that UTF-8 emerged. It had the notable benefit of being compatible with 7-bit ASCII, using the second half of the 8-bit code space, the values ASCII never touches, to encode the non-ASCII side of Unicode as multi-byte sequences.

If your code knew how to work with ASCII it would probably work with UTF-8 without any changes needed. So long as it passed over those multi-byte sequences without attempting to interpret them, you’d be fine. The trade-off was that while ASCII characters only took up one byte, most of Unicode took three bytes, with the letters-with-accents occupying the two-bytes-per-character range.
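
A quick check of those byte counts in modern Python (my own illustration, not part of the original history):

for ch in ["A", "é", "中"]:
    print(ch, "->", len(ch.encode("utf-8")), "byte(s)")
# A -> 1 byte(s), é -> 2 byte(s), 中 -> 3 byte(s)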

Three bytes wasn’t the hard limit of UTF-8, though. The initial design allowed for up to 31-bit character codes. Plenty of room for expansion!

🔨 Knocking on UTF-16’s door.

As linguistic diversity, historical scripts, emoji, and symbolic notations clamoured for representation, the Unicode Consortium realised their neat two-byte packages would not be enough and needed to be extended. The world could have moved over to UTF-8, where there was plenty of room, but too many systems had 16-bit Unicode baked in.

The community that doggedly stuck with ASCII and its eight-bits-per-character design must have felt a bit smug seeing the rest of the world move to 16-bit Unicode. They stuck with their good-enough-for-English encoding and were rewarded with UTF-8, with its ASCII compatibility and plenty of room for expansion. Meanwhile, those early adopters who made the effort to move to the purity of a fixed-size 16-bit encoding were told that their characters weren’t going to be fixed size any more.

This, then, was the plan to move beyond the 65,536 limit. Two unused blocks of 1024 codes were set aside. If you wanted a character in the original range of 16-bit values, you’d use the 16-bit code as normal, but if you wanted a character from the new extended space, you had to put two 16-bit codes from these blocks together. The first 16-bit code gave you 10 bits (1024 = 2¹⁰) and the second gave you 10 more, making 20 bits in total.

(Incidentally, we need two separate blocks to allow for self-synchronization. If we only had one block of 1024 codes, we could not drop into the middle of a stream of 16-bit codes and simply start reading. It is only by having two blocks that, if the first 16-bit code you read is from the second block, you know to discard it and continue afresh from the next one.)
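
Here’s a small Python sketch of that arithmetic (my own illustration), splitting the Plane 1 character U+1F600 into its two surrogate codes:

def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its two 16-bit surrogate codes."""
    offset = code_point - 0x10000      # the 20 bits of useful information
    high = 0xD800 + (offset >> 10)     # first block carries the top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # second block carries the bottom 10 bits
    return high, low

print([hex(c) for c in to_surrogate_pair(0x1F600)])   # ['0xd83d', '0xde00']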

The original Unicode was rechristened the “Basic Multilingual Plane”, or plane zero, while the 20-bit codes allowed by this new encoding were split into 16 separate “planes” of 65,536 codes each, numbered from 1 to (hexadecimal) 10. UTF-16, with its million-and-a-bit possible codes, was born.

UTF-8 was standardized to match UTF-16 limits. Plane zero characters were represented by one, two or three byte sequences as before, but the new extended planes required four byte sequences. The longer byte sequences were still there but cordoned off with a “Here be dragons” sign, their byte patterns declared meaningless.
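
Continuing the earlier check (again my own illustration): a Plane 1 character now takes four bytes, and Python’s decoder refuses the lead byte of the old longer forms outright:

print(len("😀".encode("utf-8")))             # 4 -- a Plane 1 character
try:
    b"\xf8\x88\x80\x80\x80".decode("utf-8")  # a pre-restriction 5-byte pattern
except UnicodeDecodeError as err:
    print(err)                               # ... invalid start byte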

“Don’t need quarters, don’t need dimes, to call a friend of mine. Don’t need computer or TV to have a real good time.”

🧩 What If We Run Out Again?

Unicode’s architects once believed 64K code points would suffice. Then they expanded to a little over a million. But what if we run out again?

It’s not as far-fetched as it sounds. Scripts evolve. Emoji proliferate. Symbolic domains—mathematical, musical, magical—keep expanding. And if humanity ever starts encoding dreams, gestures, or interspecies diplomacy, we might need more.

Fortunately, UTF-8 is quietly prepared. Recall that its original design allowed for up to 31-bit code points, using up to six bytes per character. The technical definition of UTF-8 restricts itself to 21 bits, but the scaffolding for expansion is still there.

On the other hand, UTF-16 was never designed to handle more than its million-or-so codes. There’s no large unused range of codes left in plane zero with which to add more bits. But what if we need more?

For now, we can relax a little because we’re way short. Of the 17 planes, only the first four and last three have any codes allocated to them. Ten planes are unused. Could we pull the same trick with that unused space again?

🧮 An Encoding Scheme for UTF-16X

Let’s say we do decide to extend UTF-16 to 31 bits in order to match UTF-8’s original ceiling. Here’s a proposal:

  • Planes C and D (0xC0000 to 0xDFFFF) are mostly unused, aside from two reserved codes at the end of each.
  • We designate 49152 codes (2¹⁴ + 2¹⁵) from each plane as encoding units. This number is just above √2³¹, making it a natural fit.
  • A Plane C code followed by a Plane D code forms a composite: (C×49152+D)
  • This yields over 2.4 billion combinations, which is more than enough to cover the 31-bit space.

This leaves us with these encoding patterns:

  • Basic Unicode is represented by a single 16-bit code.
  • The 16 extended planes by two 16-bit codes.
  • The remaining 31-bit space as a Plane C code followed by a Plane D code; since each of those is itself a surrogate pair, that makes four 16-bit codes.

This scheme would require new decoder logic, but it mirrors the original surrogate pair trick with mathematical grace. It’s a ritual echo, scaled to the future. Code that only knows about the 17 planes will continue to work with this encoding as long as it simply passes the codes along rather than trying to apply any meaning to them, just like UTF-8 does.

🔥 An Encoding and Decoding Example

Let’s say we want to encode a hypothetical code point, 123456789, using the UTF-16X proposal above.

To encode into a plane C and plane D pair, divide and mod by 49152:

  • Plane C index: C = floor(123456789 / 49152) = 2511
  • Plane D index: D = 123456789 % 49152 = 36117

To get the actual UTF-16 values, add each index to its plane’s base:

  • Plane C code: 0xC0000 + 2511 = 0xC09CF
  • Plane D code: 0xD0000 + 36117 = 0xD8D15

To decode these two UTF-16 codes back, mask off the plane bits to recover the two indices, then multiply and add:

2511 × 49152 + 36117 = 123456789
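
Putting the proposal into code, here’s a minimal Python sketch of the pairing. It assumes each plane’s first 49,152 codes serve as the encoding units; the proposal fixes the unit count but not the exact layout, so treat the constants as illustrative.

UNIT_COUNT = 49152         # 2**14 + 2**15 encoding units per plane
PLANE_C_BASE = 0xC0000     # start of Plane C
PLANE_D_BASE = 0xD0000     # start of Plane D

def utf16x_encode(code_point):
    """Split an extended code point into a (Plane C, Plane D) pair."""
    if not 0 <= code_point < UNIT_COUNT * UNIT_COUNT:
        raise ValueError("code point out of range for a C/D pair")
    c_index, d_index = divmod(code_point, UNIT_COUNT)
    return PLANE_C_BASE + c_index, PLANE_D_BASE + d_index

def utf16x_decode(c_code, d_code):
    """Recombine a (Plane C, Plane D) pair into the original code point."""
    c_index = c_code - PLANE_C_BASE
    d_index = d_code - PLANE_D_BASE
    if not (0 <= c_index < UNIT_COUNT and 0 <= d_index < UNIT_COUNT):
        raise ValueError("not a valid Plane C / Plane D pair")
    return c_index * UNIT_COUNT + d_index

# The worked example above:
assert utf16x_encode(123456789) == (0xC09CF, 0xD8D15)
assert utf16x_decode(0xC09CF, 0xD8D15) == 123456789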

🧠 Reader’s Exercise

Try rewriting the encoding and decoding steps above using only bitwise operations. Remember that 49,152 was chosen for its bit pattern and that you can replace multiplication and division with combinations of shifts and additions.

🌌 The Threshold of Plane B

Unicode’s expansion has been deliberate, almost ceremonial. Planes 4 through A remain largely untouched, a leisurely frontier for future scripts, symbols, and ceremonial glyphs. We allocate codes as needed, with time to reflect, revise, and ritualize.

But once Plane B begins to fill—once we cross into 0xB0000—we’ll be standing at a threshold. That’s the moment to decide how, or if, we go beyond.

As I write this, around a third of all possible code-points have been allocated. What will we be thinking that day in the future? Will those last few blocks be enough for what we need? Whatever we choose, it should be deliberate. Not just a technical fix, but a narrative decision. A moment of protocol poetry.

Because encoding isn’t just compression—it’s commitment. And Plane B is where the future begins.

“I could say Bella Bella, even Sehr Wunderbar. Each language only helps me tell you how grand you are.”

Credits
📸 “Dasha in a bun” by Keri Ivy. (Creative Commons)
📸 “Toco Toucan” by Bernard Dupont. (Creative Commons)
📸 “No Public Access” by me.
🤖 Editorial assistance and ceremonial decoding provided by Echoquill, my AI collaborator.

Sixgate Part 1½: Why Not Tunnel Services?

I am grateful to “Lenny S” for making a comment on part two of this series, as it has revealed that I really need to make it clearer exactly what the point of Sixgate is. IPv6-over-IPv4 tunnels have existed for decades but I’m trying to solve a different problem.

(If you’ve no idea what any of this is about, maybe start at part one, you dingus.)

The Server’s Problem

Technologies like CGNAT have made IPv4 “good enough” for ISPs. Moving to IPv6 would require a whole bunch of new equipment, and no-one other than a few nerds is actually asking for it. This has been the status quo for decades, and we realistically expect large chunks of the Internet to be unable to receive incoming connections.

As a server operator, I might want to deploy new services on IPv6‑only infrastructure. There’s no good equivalent of CGNAT on the server side and IPv4 addresses are scarce, expensive and require careful planning. I want to stop burning through them just to keep compatibility alive, but I can’t do that while many of my customers are still behind IPv4‑only ISPs.

From the user’s perspective, their internet connection “just works”. They don’t know what IPv4 or IPv6 is, and they shouldn’t have to. If they try to connect to my service and it fails, they won’t start thinking they should sign up for a tunnel; they’ll think, quite reasonably, that my website is broken.

Tunnels put the burden on the least‑equipped party: the end‑user.

  • They require sign‑ups, configuration, and sometimes payment.
  • They assume technical knowledge that most customers simply don’t have.
  • They create friction at exactly the wrong place: the moment a customer is deciding whether my service is trustworthy.

Telling a potential customer to “go fix your internet” is not a viable business model.

“Your smile is like a breath of spring. Your voice is soft like summer rain. I cannot compete with you.”

The Sixgate Approach

This is where Sixgate changes the equation. Instead of asking customers to fix their connectivity, it makes the gateway discoverable through DNS.

  • The SRV record tells the client where the gateway is.
  • The client software (browser, OS, or library) can then use the gateway invisibly.

From the customer’s perspective, nothing changes. They click a link and it works. The SRV lookup adds a moment’s pause, but that’s the price of invisibility. No sign‑ups, no extra services, no confusion.
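
As a rough sketch of that discovery step, here’s what a client-side lookup might look like in Python with the dnspython library. The service label _sixgate._tcp is my own placeholder; this series hasn’t pinned down the real record name.

import dns.resolver

def find_gateway(domain):
    """Return (host, port) for the gateway a domain advertises, or None."""
    try:
        answers = dns.resolver.resolve(f"_sixgate._tcp.{domain}", "SRV")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return None                      # no gateway advertised; connect natively
    # Lowest priority number wins; weight breaks ties, higher first.
    best = min(answers, key=lambda r: (r.priority, -r.weight))
    return str(best.target).rstrip("."), best.port

# Usage: find_gateway("example.com") returns the advertised gateway, or None.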

The SRV record is the keystone of Sixgate’s design. Without it, the bridge collapses into a pile of disconnected ideas. With that SRV record retrieved, the client doesn’t need to sign up for an account or perform a pre-connection ceremony. The remote network has provided the gateway; it wants you to be able to connect with as little fuss as possible. Everything else rests on that stone. Place it firmly, and the whole arch of compatibility stands.

The Tipping Point

Over time, as more clients can connect to IPv6 servers either natively or through Sixgate, we reach a tipping point. Enough of the world can reach IPv6 that new data centres can seriously consider not deploying IPv4 at all.

That’s the goal. As long as server networks still need IPv4, we’re still going to have the problems of IPv4. If we can work around the ISPs who won’t update their equipment then IPv6 might finally stand on its own.

In Part Two, we’ll explore how Sixgate works under the hood: the SRV lookup, the encapsulation, the stateless routing, and the embedded IPv4 identity that makes it all possible.

Credits
📸 “DSC08463 – El Galeón” by Dennis Jarvis. (Creative Commons)
🤖 With thanks to Echoquill, my robotic assistant, for helping shape this interlude — from the keystone to the tipping point and liberal use of em-dashes.