Why I willingly bought a Windows Phone

Without shame or apology, I use a Windows Phone. A bright orange Lumia 630. I purchased it with my own money. No-one pushed me to it or chose it for me. It was entirely my decision.

But why?!

Phones

My story starts in 2012 when I had outgrown my aging Symbian phone. After considering a number of options, I purchased an Android based Samsung Galaxy S2.

I had considered an iPhone at the time, but the main reason I didn’t was that I’d have to buy into the Apple ecosystem, which just wasn’t for me. My primary computer platforms were Windows based and moving to iPhone would be a big culture shock. My Samsung instead fitted into that world quite neatly and I’d remain happy with my choice for years.

Stage Fright!

In 2015, a security vulnerability (known as Stage Fright) was found in many versions of Android, including the one on my phone. All it would take was for someone to send me a malicious text message in the night and my phone would be taken over.

Not to worry, new phones had already been fixed and I was sure it would only be matter of time before that same fix would be pushed out to older phones like mine. Every day for a few weeks, I’d go into the phone’s check-for-updates system to see if a fix was available. Every day, there wasn’t. I’d call tech support to ask when (not if) a fix would become available. “Soon” was always the infuriatingly non-specific answer, occasionally along with the subtle suggestion that maybe I should buy a new handset instead.

Finally, I just couldn’t take it any more and gave up. My phone, despite being only three years old was considered too old to be updated. The risk of keeping it switched on, waiting for a drive-by attacker, was giving me too much stress. I switched the phone off and put it away, never to be used again.

Normally, there will come a natural time with each phone I use when I start to feel it is time to upgrade, having simply outgrown the old one. When that happens, I keep using the old one while I take my time to consider my choices. This time was different.

It was clear to me now that the Android ecosystem had a problem. Security vulnerabilities were not being taken seriously by the handset makers who would rather I just purchased a new device instead. If I had bought a new Android phone back then, I’d be supporting that attitude with my cash!

Choices

Having lost trust in Android, I was left choosing between Apple or Microsoft. At first, I wasn’t even considering Windows Phone, having had bad experiences with the platform some ten years earlier. Faced with an iPhone as my only choice left, I was willing to give the new Windows Phone a try.

Trying out a Lumia 630, I was pleasantly surprised. The tile concept was a welcome relief from the “Space Invader” style rows-of-icons that dominate the rest of the market. Suitably impressed with the whole package I ended up buying one and I’ve not looked back. (Except to write this.)

The lack of apps for this platform is a little annoying, but I get by. I have instant-messaging, a podcast player, a weather tile on the home screen and a few others. For everything else, I use a number of “M Dot” websites. (m.facebook.com, m.youtube.com, etc.)

The Future

How long, after having purchased a smartphone, is it reasonable to expect support in the form of security updates? Back when “Stage Fright” happened, I found that answer for the Android ecosystem was 1½ years. That’s just way too short in my book.

My Lumia 630 is around two years old as I write this and I’ve just installed an update that fixes the WPA2 “KRACK” bug. If I had purchased another Android based phone back in 2015, would I now have an update for this new bug? (Or, would I be back down the shops spending more money to enrich the handset makers who are laughing at the chump that I am…)

While I’m not planning on replacing my phone any time soon, its likely I will feel I’ve outgrown it in maybe a couple of years down the road, especially as Microsoft have announced they will not be actively developing it any more except for those security updates. When that day comes, I hope Android will have taken a tip from Microsoft on how to do updates right.

Picture Credits
Microsoft Lumia 630 running Podcast Lounge. By me, ironically enough, using an iPhone.
Tension, 91/365 by Matt Harris.
Future by “Legosz”.
(Pictures are Creative Commons licensed.)

Is your API broken?

“Welcome to the Example Rutabaga Company. We’ve got a simple REST API for all your rutabaga needs!”

Indeed, it is simple…

   POST https://rutabaga.example.com/Order/ HTTP/1.1
   Content-Type: application/json

   {"Quantity": 5800,
    "Quality": "Tasty!",
    "DeliverTo": "123 Fake Street, New Orleans"}

Send this and you’ll either get an error or an “OK” response with a tracking ID inside. Later, you’ll get several thousand tasty rutabagas in the post. What could go wrong?

Everything.

Schrödinger’s Response

From the client’s point of view, there’s a clear action to take depending on the response code.

  • 200, log the tracking ID.
  • 5xx, try again later.

But what if there’s no response? Perhaps your friendly HTTP client library code has thrown an exception because the connection has broken down. These errors are unavoidable, especially when the client is on a mobile device. What should we do in this situation?

You could try again later? But hang on, this violates the thing that makes POST different from GET and PUT. (GET and PUT are designed to be repeatable, but POST requests are express calls to take action.)

You might reason that the first POST request failed, so you’re not actually repeating anything, but aren’t you? There are two possibilities when you get an error from any sort of network request.

  1. The request was lost on the way and the remote server did not handle the request.
  2. The request arrived and was handled, but the response to the client was lost.

If A, we’re fine to repeat the POST. No problem.
If B, the remote server is already in the process of shipping a truckload of rutabagas to you and has no idea the response got lost. Repeat that request and you’ll end up with two truckloads of rutabagas.

But this is the point, the client has no way of knowing if its A or B. The only entity that knows is the server and we can’t talk to it.

For a surprising number of APIs I’ve written client code for, that’s the end of the story. The API simply has no reliable way for the client to find out what happened.

How does your API handle this situation? Is your API broken?

Opening the box

One way an API designer could resolve this issue is to provide a way to look up the order history.

This is probably what you’d do if (say) you were shopping online and your internet connection died just as you hit the Complete Purchase button. Once you got back online, you’d check to see if the order was in the system before repeating the order.

Sounds simple? This would work but be careful, for alas, this approach has lots of caveats. Fortunately none of them are really insurmountable.

Beware of false duplicates

Say you’re in this worst case scenario and your link to the server has just been restored. Your code dutifully downloads the list of outstanding orders and finds one for 5800 rutabagas. Job done?

Wait! Was that your order? Maybe the account holder deliberately made another identical order from a different machine. We don’t know – We can’t know.

This can be resolved by ensuring the client has the opportunity to supply its own way to identify the the initial request – perhaps with a client supplied ID – and allowing for a lookup later on.

How long should we keep that ID around?

Expire ID records too quickly and a client that’s been offline for a prolonged amount of time will not be able to resynchronize. Store the IDs forever and that would be a waste of space.

You may have a figure in mind that’s reasonable. If not, add an occasional reconciliation of expired IDs to your API.

Who chooses the ID?

The client should be able to freely chose an ID. You may be looking at your database and thinking there’s a field supplied by the client that’s already got a no-duplicates constraint. If those values came from a source external to the client, it won’t be able to control the uniqueness of those important values. That external entity might very well be feeding identical records into the system through different channels and the client won’t know if that duplicate it found was their own or someone else’s.

Whose ID is it anyway?

Make sure the client has a clear space from which to select IDs. We can’t have multiple users all counting from 1 because you’ll get collisions very quickly. A GUID would work as long as they are generated correctly. Maybe if the API requires that the client log-in first, the server could track IDs on a per-user basis, but not all APIs require a log-in or pre-registration.

Avoid colliding with prior attempts still being processed.

Consider this: A client attempts to send a request to a server, but the connection fails with a time-out error. Thirty seconds later, the client asks the server if that prior request made it, which it answers “No”. Time to repeat that first attempt?

But wait! That first attempt timed out because the server was unexpectantly busy and has only just started dealing with your first request.

You can mitigate this (probably rare) scenario by making sure the server will return an error to the second POST request. Almost all DBs allow for any field or combination of fields to have a uniqueness constraint and the error will just happen if this scenario ended up playing out.

Do you have a ticket?

There’s another protocol that works in a similar way but puts the server in control of the IDs, at the cost of requiring two separate phases. (The actual request could be carried along with either the first or second phases.)

The first phase has the client asking the server for an ID while the second phase has the client committing to complete the transaction with that ID.

This protocol does require that when the client begins phase two, they have committed to not return to phase one for this transaction. The client must also store that ID and be ready to use it for when the connection has been restored. Similarly, the server needs to agree that it only starts processing a transaction once the second phase request has arrived.

This two-phase approach covers for failures at any step along the conversation, so long as the client and server stick to the agreement.

  • If the first request is lost, there’s no problem in repeating the first phase.
  • If the first response is lost, the server will have allocated an ID that will never be committed, but will be left indefinitely in an uncommitted state. (A later occasional reconciliation of orphaned IDs would be useful here.)
  • If the second request is lost, the client can later repeat the commitment of the transaction after checking its state using the ID it received in the first phase.
  • If the second response is lost, the client can later check the state of the transaction using the ID and see that it is already committed.

This protocol has a similar caveat from the earlier plan – How long should the server keep track of used ID numbers? The server will be left with IDs that will never be committed as well as committed IDs that the client might still need to check up upon later. Again, you may wish to come up with reasonable time limits or allow for a reconciliation of IDs later on.

While this protocol might be considered more complicated because of the two phases of conversation, there are fewer caveats to this plan and fewer oportunities for things to go wrong. This is my personal favorite.

Do I really need to do this?

As I write this I’m also working on a small web service that uses a REST API with POST requests, but taking none of the advice I offer on this page. Why not? Simply that the cost of the resources being allocated by this API-to-be are so close to zero that making the effort to implement the API robustly is just not worth it in this particular case.

But consider, even if you’re not transmitting invoices worth thousands of dollars, do you really want duplicates turning up?

Picture Credits
“Rutabagas” by Dale Calder
“Barney the cat” by Bill P. Godfrey (me).
“Rutabaga 2” by Dolan Halbrook
“Commit no nuisance” by Pat Joyce

NEVER sanitize your inputs!

I’ve seen this cartoon being linked-to in so many comment threads and forums. Anytime its even a little bit applicable, someone will post a link to this cartoon. It has become so pervasive that if you search Google for “327”, it’ll be the third link returned, right after the Wikipedia pages for the year and the car.

Search “328” and the next XKCD is no-where to be seen.

The lesson, according to this character and so many real people on the internet, is to sanitize your inputs. The school in the cartoon didn’t sanitize its inputs – and one of its database tables got deleted!

Ask anyone about developing websites and they will tell you the first lesson is always to sanitize your inputs. In this day and age you’d have to be crazy not to sanitize your inputs.

Trouble is, sanitizing your inputs is very bad advice.

What went wrong at the school?

A quick aside for what’s going on in this cartoon. A new student named…
      Robert'); DROP TABLE Students; --
… joins a school and the administrators dutifully add a record to their database for the new student. The software takes the new student’s name and builds an SQL instruction.

    string sqlcmd = "INSERT INTO Students (name) VALUES ('" + name + "')";
    // INSERT INTO Students (name) VALUES ('Wilhelm von Hackensplat')

With normal names, the string would be a perfectly valid SQL command which will add a new record into the table named Students. But what about our friend Bobby Tables?

   INSERT INTO Students (name) VALUES ('Robert'); DROP TABLE Students; --')

Because that single-quote character wasn’t sanitized away, an extra command to drop the Students table crept in. This is what we know in the trade as an “SQL Injection” attack, as some unintentional SQL got injected in.

So let’s sanitize it?

We can’t allow people to go about running arbitrary SQL commands willy-nilly. Something must be done!

That single-quote character in the student’s name is clearly the problem, so we’ll take it out while building the SQL command. This fixes the command and you won’t find database tables disappearing. So why do I call this bad advice?

Trouble is, the single-quote character has a bit of a split personality. As well as being a quote, it’s also an apostrophe. Real people have real names with apostrophes and if you’ve ever seen a name where one has clearly been dropped, you’ve seen the mark of the sanitizer.

Perhaps this is why some Irish people prefer to spell their name using the letter Ó. After years of having their name mangled by naive software developers, they made a new letter.

So forget sanitizing your inputs. What you need to do instead is to contain your inputs.

Contain my inputs?

The error made by the programmers at the school was that they failed to contain Bobby’s name. A student’s name is just a sequence of characters, so you need to use it in a way that could only be a sequence of characters.

Lucky for us, all good SQL access libraries support parameters. Instead, you write the command but with placeholders for the values to be added in little boxes later.

   INSERT INTO Students (name) VALUES (@name)

Here, there’s a clear demarcation between what’s the SQL command and what’s the value from outside. The student’s name is inside the little box where the apostrophe is just another character. The name has been contained and that destructive command inside can’t break out.

But that’s what we mean by “sanitize”!

Then you should stop calling it that. The word “sanitize” is a common enough word and most people understand it as a word for cleaning – removing the bad stuff and keeping the good stuff.

  “Did you sanitize the kitchen worktop?”
  “Yes. I put it in that sealed box over there.”
  “That’s not sanitizing!”

  “When I use a word, it means just what I choose it to mean. Neither more nor less.”

There is a real problem with software not accepting names with apostrophes, as discussed earlier. Real software developers are listening to the advice to sanitize and interpreting it to mean they should have the bad characters removed.

Isn’t sanitization still needed with HTML?

HTML has a similar problem with injection. Say you’re building a website that can take comments from the public, like this one, you’d want to prevent people from leaving comments with bits of scripting code inside.

   "I <i>love</i> this website! <script>alert('Baron von Hackensplat Was Here');</script>"

Its fine to allow the emphasis, but if your website also publishes the script, anyone else visiting your site will end up running that script.

Unfortunately, HTML doesn’t support a nice little box from whence nothing can escape, so we need to provide that box of containment ourselves. Any HTML from the public should be parsed and rewritten as safe-HTML, where only a safe subset of tags are allowed.

You might argue that this amounts to sanitization, but it betrays a bad mental model. Okay, you’ve dealt with the big problem, but forgotten about the little problems.

Have you ever seen a comment thread where, starting part way down the page, everything is in italics? This is caused by someone opening italics but not closing them. If your mental model is to sanitize, your natural reaction would be remove the ability to use italics. If your mental model is instead to contain, you know that italics is really harmless and just needs to be closed when left open.

Cross-out Cross Site Scripting

In closing, I’d just like to appeal to the industry to drop the phrase “Cross-Site-Scripting” and call it “HTML Injection” instead.

Any scripting that you didn’t write or don’t trust, cross-site or not, is a very bad thing to have on your website. Putting “Scripting” in the name makes people think of scripting as the problem but its so much more than that.

Calling it “HTML Injection” draws an obvious parallel with “SQL Injection”. Its the same problem with the same solution.

Credits: XKCD 327 – Exploits of a mom by Randall Munroe.
“When I use a word…” is a quote from Lewis Carroll’s “Through the Looking-Glass”.
Second: sanitize the gloves by Thomas Cizauskas.
Fun with cling film by Elizabeth Gomm.

I need a good podcast catcher (and a bit of a rant)

I listen to podcasts on my daily commute. These are radio shows that can be downloaded over the internet and listened to later. However, to keep up with a weekly show, I’d have to – every week – visit the show’s website and manually download the latest episode. That would get real tedious real fast. To resolve the tedium for us all, the podcast catcher app was invented.

Podcast catchers allow me to list all the shows I want to listen to. Every day or so, it automatically checks each show on the list to see there are any new episodes for me. If it finds any, it downloads them and plays them for me.

Currently, I use Google’s ‘Listen’ app, but that service is about to be closed down with the imminent closure of Google Reader. I need to replace it. I’ve downloaded a handful of alternative apps, but they all lacked a feature I find essential. I remain a little flabbergasted that any podcast app out there does it any other way.

“She smoothes her hair with automatic hand and puts a record on the gramophone.”

My daily commute is ~45 minutes of driving each way, so for me, a good player needs an Auto-Play mode. When one show finishes, another should start playing right away. There’s very few places I could safely pull-over and having to push buttons while I’m driving is right out.

But not just any Auto-Play mode. Oh no. All the apps I tried had an Auto-Play mode, but they all did it so very badly.

Ask yourself – When a show finishes playing and Auto-Play is switched on, which show from the list of unplayed shows should your app select to play next?
   A. The one that’s been waiting in the queue longest.
   B. The one that appears next in the list when sorted by episode title.

Did you pick A or B? Sorry, they’re both wrong, and yet these were the only options available on an awful lot of podcast apps.

The right answer, is to play the one the user has queued up next. The “In the order I want” sort criteria. No really, who is actually asking for the order of play-back to be strictly enforced? Would anything else, perhaps, offend your sense of politeness?

   “You want to listen to the latest Cognitive Dissonance show? But what about this episode of Hanselminutes? It has been waiting paitiently in line and this is its turn to be played.”
   “I say! That would be jolly impolite of me. Don’t want to hurt the feelings of those audio files. Pip pip!”

“I sat upon the shore, fishing with the arid plain behind me. Shall I at least set my lands in order?”

With Google Listen, new episodes join the listening queue, but I can arrange them in the order I like. If I’m just not in the mood for the next episode in line, I’ll select another episode that I do want to listen to and bring it to the top using the ‘Move to the top of queue’ button.

Once I’m happy with my selection of the next hour or so’s worth of stuff at the top of the queue, I hit play and drive off. As the first show finishes, its taken off the queue and the next episode I had queued up starts playing, all without any interaction.

The few alternative apps I downloaded did not offer this. It seems such a simple thing and yet I can’t imagine the insanity of not being able to control the playing order.

If one, settling a pillow by her head should say, “That is not what I meant at all.”

Some people reading this, I’m sure, are thinking “He wants a playlist manager”.

To manage a playlist, you’d need to first create a playlist and give it a name. Then you’d need to add shows to the list and save it. Then once its played you’d need to delete that playlist and start a new one.

No. That’s just another level of insanity. All I want is a button on each episode labelled ‘Move to the top of the queue’. That’s it. If I have to perform some ritual every day to create a new playlist or whatever before I can get that button, I’m not going to be happy. Life is too short for pointless ritual.

Maybe if your UI is so user friendly that the ritualistic parts of your playlist manager just disappear, that’s fine but that’s not what I’ve seen out there.

“Oh, I have to chose a name for this new playlist. Why not just pick a random name for me? I’m only going to delete it in an hour’s time anyway.”

So there is my plea. Does anyone please know of a podcast app for Android phones that implements its Auto-Play mode… correctly? I will happily pay a reasonable subscription fee for good quality software.

If you’re an app developer and your podcast app does it correctly, please feel free to use this page’s comments for some free publicity. On the other hand if your app doesn’t do it right, please treat this page as a bug report.

Picture credits:
Day 30.06 Voices on the radio!” by Frerieke on Flickr.
Listening to Radio Karnali” by the BBC World Service.
The section titles were borrowed from The Waste Land and The Love-Song of J. Alfred Prufrock, both by T.S. Eliot.

PHP – Some strings are more equal than others

You may have recently read about the PHP programming language, when it was found that if you compare the two strings "9223372036854775807" and "9223372036854775808" with the == operator, PHP will report these as identical. Most of the time PHP does the right thing, but you need to be careful about these exceptions to the rule.

This was reported as a bug to the people who maintain PHP, but they responded that regarding these two strings as equal was really the correct thing to do. Programmers who feel these two strings should be treated as different should instead use the === operator. This operator checks if two strings are equal, but this time, means it!

But this isn’t the end of the story…

While === is fine for strings containing only digits, there’s a little known feature of Unicode where you can express an accented letter either by a single character such as 'é' (U+00E9), or by using a regular ascii 'e' (U+0065) and then adding a special character (U+0301) which means “put an accent on that last character”. If you want to compare two strings that are the same except they each use different ways of expressing an 'é', you need to add another equal sign and use ==== to differentiate them, as === will see them as equal.

There’s a similar rule about the Unicode smiley face character ‘‘ (U+263A) and the more familiar colon-bracket smiley ':)'. These will compare equal unless you use the ==== operator. As well as that, all of these comparison operators see both the white smiley face ‘‘ and black smiley face ‘‘ (U+263B) as identical, unless php.ini has the ‘Racist’ setting switched on.

Even the ==== operator isn’t the end of the matter. This can’t tell the difference between serif and sans-serif text. Most programmers are happy to treat these as equivalent, but if the text is highly secure, you need the ===== operator which knows that ‘A‘ and ‘A‘ are different.

But the ultimate equality operator is the six equal sign ====== operator. As I write this, no-one has found two values where x======y returns true, even when x and y are copies. Some mathematicians suspect there are no such pairs of values, but a mathematical proof remains elusive.

Picture credits:
‘Equal in stature’ by Kevin Dooley (CC-BY)
‘Equal Opportunity Employment’ by flickr user ‘pasukaru76’ (CC-BY)

Clever and totally pointless – my first publication

Way back in the early 90s, I subscribed to a magazine (think of it like a big website but printed on paper and sent through the post) called ‘PC Plus’. It included a section called “Wilf’s Programmers Workshop” where every month, Mr Wilf Hey would present a project (usually written in GW-Basic) and discuss the principles at work. It was here where I first managed to get something clever into print, except I didn’t do it quite right.

There would usually be a brief digression at the end of his section, and in one issue, he discussed the idea of a “quine”, a program whose only function is to generate its own source code.

printf(f,34,f,34,10);

It was from this I had an idea of a creative way to produce a quine of my own. I just had to be liberal about the definition of a programming language. Here’s my (faulty) recollection of Mr Hey’s write-up of my entry…

We had a clever entry to our discussion of self-replicating programs from Bill Godfrey who sent in a floppy disk, and it meets the rules of the game.

Run the program SELFREP.EXE and it produces the “source”, PKZIP.EXE itself. He supplies a batch file which recompiles the program. First, PKZIP “compiles” SELFREP.OBJ (instead of .ZIP) and then the “linker” ZIP2EXE is invoked to produce the completed executable program.

Unfortunately, because Mr Godfrey didn’t write PKZIP, he’s technically disqualified from this contest.

Once the initial excitement of appearing in print wore off, I was kicking myself for not thinking my idea through. I only used PKZIP.EXE as the source file because I needed a file to be the source code, and PKZIP itself seemed the most applicable for that role. That decision alone disqualified me.

What I should have done is supply some “source code” such as…
   /* A self replicating program by Bill Godfrey. */
   Go();

The batch file should have just compiled (zipped) that two line text file and then linked (zip2exe) it. Running the generated EXE would have produced the same two line text file back. It would have totally complied with the rules and I would not have been disqualified! Grrrr…

I’ve long since lost that edition of PC Plus. If anyone reading this has a copy, I’d love a scan of that page please.

Picture credits
“Reading a magazine” by flickr user “ZaCky ॐ”.
“Danger – Self Replicating Device!” by Sam Ley, aka flickr user “phidauex”.

Vinegar – refined Vigenère – can you break my cipher?

I’m idly interested in cryptography, the art of scrambling a message so that it can be transmitted securely, and only someone with the magic key can understand the message.

When I was young, I designed a cryptographic algorithm. I thought I was so clever, but just because *I* couldn’t break it, that doesn’t make it secure.

In this article, I present my naive cryptographic algorithm. It’s very flawed, so please don’t use it for anything important. Can you find the flaw?

This article will start with some background on substitution ciphers and the Vigenère cipher, which my method was based upon. Then, we’ll look at my big idea itself, Vinegar. To keep it interesting, there’s a little code breaking challenge as well. Enjoy!


How Etaoin Shrdlu defeated substitution ciphers.

Like most children, my first encounter with cryptography was a substitution cipher. A friend gave me a sheet of paper with each of the 26 letters and wacky squiggle next to each one. This would be our secret code. Replace each letter with it’s squiggle and it would just look like a bunch of squiggles.

We thought it was unbreakable, but it wasn’t. This sort of code can be cracked by knowing that E is the most common letter in English, so the most common squiggle in the hidden message is probably an E. The next most common squiggle is probably a T. Once you’ve covered the twelve most common letters in English; E, T, A, O, I, N, S, H, R, D, L and U;
♦ou ♦an easil♦ ♦or♦ out ♦hat the other ♦issin♦ letters ♦ould ♦e.

What we need is a cipher the produces a coded message with an equal mixture of symbols. Enter Vigenère.

The Vigenère cipher

Centuries ago, the choice of people who wanted to communicate in private was Vigenère. Here’s how it works.

          A 0
Z 25 B 1
Y 24 C 2
X 23 D 3
W 22 E 4
V 21 F 5
U 20 G 6
T 19 H 7
S 18 I 8
R 17 J 9
Q 16 K 10
P 15 L 11
O 14 M 12
N 13

(This was meant to be circular.)

The key to understanding a lot of cryptography is that the 26 letters of the English language can be used as numbers. On this chart, each letter has been given a number. Now, it’s possible to do simple calculations with letters. What’s C+C? The answer is E, because 2+2=4.

Don’t get excited, but you can do subtraction as well. K-H is D, because 10-7=3.

You may be wondering what should happen if you add past ‘Z’ or subtract past ‘A’. For our purposes, imagine the 26 letters on a clock face in a circle. On a normal clock, when a hand ticks past the 12, it moves onto 1, not 13. Its the same with the Vigenère clock of letters. After the letter ‘Z’, is the letter ‘A’.

Finally, because we’ve constrained our system to these 26 values, adding ‘B’ (1) turns out to be the same as subtracting ‘Z’ (25). Regardless of where you start, performing either +B or -Z will end at the same letter. In fact, all of the 26 possible additions will have an equivalent subtraction. You can find each letter’s pair on the chart by looking for the letter across from the other. ‘A’ and ‘N’ are self pairing; +A is the same as -A, and +N is the same as -N.

Working within this system, we can use this to encrypt secret messages. Imagine Bob and Carol wish to communicate in private, but the bad guys can read their messages. To stop the bad guys, Bob and Carol meet up in advance and agreed the code they will use future.

Later, Bob wants to send the message “But now you will die by chainsaw” to Carol. (He’s a fan of Internet cartoons.) Now we have the system of adding letters together, Bob can perform a simple calculation on each letter of the secret plain text message. He takes the previously agreed keyword, “WILHELM”, and adds each letter of the keyword to each letter of the plain text, repeating the keyword as often as needed;

  BUT NOW YOU WILL DIE BY CHAINSAW
+ WIL HEL MWI LHEL MWI LH ELMWILHE
  XCE USH KKC HPPW PEM MF GSMEVDHA

When Carol receives the encoded message, she can get back the plain-text by subtracting the keyword.

  XCE USH KKC HPPW PEM MF GSMEVDHA
- WIL HEL MWI LHEL MWI LH ELMWILHE
  BUT NOW YOU WILL DIE BY CHAINSAW

Vigenère fixes the flaw with substitution ciphers because all the letter Es in the original message will all (mostly) come out as different letters.

Vigenère eventually fell out of use once a new flaw was discovered. Frequency analysis was still hiding there. If you know that the length of the keyword is 7, then you know that every 7th letter was encoded with the same letter from the keyword. So if you circle the 1st, 8th, 15th, etc letter of a long enough hidden message, the most frequent letter of those circled letters is probably an ‘E’, etc. Repeat for each group of letters 7 spaces apart and you can work out what the plain-text message was.

(How did we know the keyword was 7 letters long? There are ways. Wikipedia have an in-depth description but you don’t need to know how for this puzzle.) If you want to experiment, “Sharky” has a rather splendid web-app to perform the encoding and decoding.

Vinegar – Refined Vigenère

This is my improvement to the Vigenère cipher, which I called Vinegar. (Because that’s close to how I kept mispronouncing it.)

The problem with Vigenère is that the keyword is repeated, and that repetition exposed the vulnerability. What we need is a keyword that’s long without repeating, but small enough to be remembered.

Vinegar takes a 17 letter keyword and expands it to 210 letters. With that long 210 letter keyword, you can use Vigenère without having to repeat the keyword, and it’s the repetition in Vigenère that exposes it’s vulnerability.

Why 17? Because it’s the sum of the first four prime numbers. 2+3+5+7=17. We’ll use the keyword “WILHELMVONHACKENS” for this example.

Split the keyword up into groups of 2, 3, 5 and 7 letters. WI, LHE, LMVON and HACKENS.

Repeat each sub-key to make up 210 letters:

  WIWIWIWIWIWIWIWIWIWIWIWIWI...
  LHELHELHELHELHELHELHELHELH...
  LMVONLMVONLMVONLMVONLMVONL...
+ HACKENSHACKENSHACKENSHACKE...

Because each group has a prime number length, the four groups will effectively expand into a 210 letter Vigenère keyword for the price of a 17 letter keyword. (210 is 2x3x5x7.) Add each column together to get the long key…

ZBXRUKLROI YCPVUERRZP DMYCEEZGYZ ULZUIETMYK BQJDPOTUNJ LHIEHSTOTJ WONOQZDOBY VYENRRHOVE VJLSBAOYVM KIVJABGCVG QIGQFLPJFG YXFAWKQBJG SDFLDPAKQQ SLUKNGZLIU SFAKYNEVRB CFIZXXVUST GFCFXICZCC NPCNMHMQBD FTCHSHXXGN OAMHAWWHXM PSQHUCWSER

To avoid the Vigenère vulnerability, we can only use a 17 letter keyword once per 210 letters. So if you are encoding a longer message, use the last 17 letters of each 210 block for a new keyword to use for the next block of 210 characters.

So there’s my cipher. It has a flaw. Can you work it out? To try your hand, I came up with a random 17 letter codeword and expanded it to 210 letters. I then used that long key to encrypt my secret message. (My message is a little shorter than 210 letters, so I left some at the end un-used.) The plain-text is perfectly normal English. Spaces and punctuation are retained from the plain text.

Iwy seix zfvdzykjxm moebj dkaavmin vjkehleozp atir sdvwkvm cf hhbd vw gauj ty qzintte av mjbo xr xxnb whuieift, zaed jmioidh xv ts xtt elmv fg zok xgwnlpn vbues mp irmc twpb, yebhaoz rdlnrpbj jgg kzmlkyah vo dvta hn jzfxggaxcq.

Can you decode my message without the key? Post me a comment. Enjoy.

Picture Credits:
I’m lying by Taylor Dawn Fortune
Local Praire Dog Gossip by Art G
With grateful thanks to Richard Heathfield.

The cult of 140

Apparently, women don’t understand the offside rule. At least that’s according to some TV sports pundit who lost his job recently.

I don’t really understand the offside rule either, so I wrote this on my facebook page in response to the news.

The key to understanding the offside rule is that it doesn’t really matter what the rule is.

Make up any old rubbish, like “Goal keepers must be pipe smokers” and call that the offside rule. It is just as meaningful.

Meh. Hardly my best work, but I thought it just about good enough to post it on my twitter feed too.

That’s where I met… the cult.

FlügenWeb, Späcecode, TwitZöne, Ass Möde

Set in stone.

Twitter is famously limited to 140 characters. My message went over that limit by 78 characters. What to do?

“If it’s too long for 140 characters, make it a blog post and post a message with a link.”
Okay, but really? “Read my hilarious thought on the offside rule! http://bit.ly/√ế№Ω” (75 characters to spare! Yay!)

So my twitter readers would see my teaser message. A few may even be bothered enough to follow the link, but they would be disappointed to have made the effort of loading the page only to get such a short message.

Remember, Twitter is for short messages like mine. What can I do keeping within the Twitter ecosystem?

“The 140 limit forces people to concentrate on what’s important. Cut out the flab!”
Okay. I started with the counter at 78 characters over. Time to start trimming down until it fits. I finally got it down to…

“The key to the offside rule is that it doesn’t matter what it is. Making up some rubbish and calling it the offside rule is as meaningful.”

It was already a rather poor piece of writing when I started. Now, I couldn’t even find space for the bit about pipe smoking goal keepers. Just take it away and put it out of it’s misery!

So I’d like to challenge the 140 character advocates out there. Can you improve on my effort? Take my original message, trim it down to 140 characters and post it as a comment.

<Update> An anonymous commenter came up with
“It doesn’t matter what the offside rule is. It could be any old rubbish like “Goal keepers must be pipe smokers”. It is just as meaningful.”.
That’s probably the best the could have been done within the 140 limit, but this is the point; Is this shorter version better than my original version? In my biased opinion, no. The whole point of my message was about understanding the offside rule. Lose that word and it looks like I’m commenting on football itself.

It seems there isn’t enough room for big complicated words like “understanding”.</Update>

(Pre-emptive snarky comment: I’ve trimmed out all the bad parts of your message. I can’t post it because there’s none left!)

Picture credits:
“little ref” by Richard Boak.
“140” by Gabriela Grosseck.

‘First Past the Post’ isn’t.

I’m a bit of a nerd for vote counting systems. So I discovered with delight that in the UK, there will soon be a referendum on changing the way votes are counted. As I write this, most of the UK (and the USA) uses a method we call “First past the post”.

It’s called “First past the post”, but it isn’t. The name is about as ridiculous as calling North Korea;  “The Democratic People’s Republic of Korea”. The country is on the Korean peninsula, so at least that name is a little bit honest.

Put what you know to one side and think about what a vote counting system called “first past the post” should look like by the name alone. It’s a clear analogy to running in a race with a “post” at the end of the track. So there’s a predetermined number of votes and the first candidate to get past that threshold wins?

Nope. “First past the post” doesn’t work that way at all.

Imagine a vote of 100 voters selecting one of three candidates; A, B and C.

A 45 votes
B 30 votes
C 25 votes

If we put the “post” at 51 votes (a simple majority), then all three candidates lost. They all fell over and dropped out of the race before anyone reached the finish. Three pathetic failures. Rather than declaring no-one the winner, the people organising the race then dig up the post and re-plant it just behind A, as if the post was always there. He is declared the first one to pass the post.

If that wasn’t confusing enough, the other counting system to be offered in the referendum, “Alternative Vote” or “AV”, does indeed look like how you’d imagine “first past the post” to work. To continue the analogy with AV, candidate C drops out of the race after the first round. The second preferences of C’s voters are counted and added to A’s and B’s total.

A 45 + 3 votes 48 votes
B 30 + 22 votes 52 votes
C 25 votes

Hurrah! After two rounds, B was the first candidate past the post. A had an early lead but couldn’t keep up as far as the finishing line, overtaken by B in the closing straight.

So remember, the vote counting system that works in a “first past the post” manner is the one called “Alternative Vote”. The one without anything like a “post” is called “First past the post”. Clear?

Picture credits:
Vote Goat by Jeremy Richardson (Mr Jaded on flickr).
Racing demons by Simon Webster (shaggy359 on flickr).

Digital photography is not rocket science. It just seems that way.

Here’s a TV advert for a camera touting the benefits of film cameras over digital cameras. I’m almost inclined to wonder if this advert is a parody, but even so, it has a point.

Let’s watch…

Photography for technophobes.

I’m reminded of when I was lending my digital camera to a friend some time ago. She knew how to use a film camera, but the technological revolution had, alas, left her behind.

She had no problem with the LCD display on the back. This was why she wanted to borrow my camera in the first place after she saw me using it. Taking a picture while holding the camera at arms length is a lot easier than holding it up to the eye.

Showing her how to browse old pictures took a bit of teaching but she soon picked it up. It helped a lot that this camera had a big switch with only two settings; taking-pictures or looking-at-pictures.

The big stumbling point was when I showed her how to use memory cards. I tried to explain how it stores pictures, but I got a lot of blank looks. I finally said “This card is like the film.” There was a sudden look of understanding on her face.

The analogy to traditional film cameras worked perfectly. I told her that the photo shops will develop (print) her pictures, produce negatives (make a CD copy) and clean the film off to be reused again. If she needed more film, she could buy some by asking for a “128 MB SD” at the shops (which might tell you when this story took place).

Embrace the metaphor!

Film cameras are devices that direct photon particles in order to induce chemical reactions in silver halide particles mounted on sheets of cellulose acetate.

Somehow, the camera industry managed to sell us cameras without having to give us chemistry lessons first. And yet, we all need computer science lessons to use digital cameras. People never really cared about the chemical processes of film photography and we shouldn’t have to care about bits, megabytes and other pieces of jargon that can be abstracted away.

So, here are my suggestions for the digital camera industry.

1. Standardise!
Why are there so many memory card formats? As far as I can tell, they’re all flash memory chips contained in differently shaped blobs of plastic. The industry needs to pick one shape of blob and stick with it. No inventing new blobs unless there’s a really good reason to.

2. Call memory cards, ‘digital film’.
Embrace all the metaphors. If the world already has a name for something, don’t come up with a different name for it.

3. Tell me how many pictures it can store, not how many gigabytes.
This one will be tricky, as the size of a picture depends on the number of pixels. So while I don’t think we could realistically get rid of the “GB”, cameras need to help the user by telling us how many pictures are in a “GB” at that particular time.

4. Cameras should come with a reasonably sized card as standard.
How would you feel if you bought a camera, but later found the lens was extra? Digital film (getting used to the phrase yet?) is reusable and will probably last as long as the camera itself. So why not bundle it with the camera and save your customers the hassle.

5. Photo printing shops to provide archival DVDs as a standard part of the service.
People using film cameras expected their negatives as part of the service. Copying a few gigabytes full of pictures to a DVD should be cheap enough that it could be offered free to anyone who wants to print a vacation’s worth of snaps.

Hang on, did that advert just say two cameras for ten dollars? Forget everything I just wrote, that’s a bargain!

Picture credits:
‘Film and SD card’ by ‘sparkieblues’ of flickr
‘Leica’ by ‘AMERICANVIRUS’ of flickr