Clever and totally pointless – my first publication

Way back in the early 90s, I subscribed to a magazine (think of it like a big website but printed on paper and sent through the post) called ‘PC Plus’. It included a section called “Wilf’s Programmers Workshop” where every month, Mr Wilf Hey would present a project (usually written in GW-Basic) and discuss the principles at work. It was here that I first managed to get something clever into print, except I didn’t do it quite right.

There would usually be a brief digression at the end of his section, and in one issue, he discussed the idea of a “quine”, a program whose only function is to generate its own source code.

printf(f,34,f,34,10);
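To show the idea in a form you can run, here is a minimal sketch of a quine in Python. The language is my choice for illustration and is not what appeared in the magazine. There are no comments in it on purpose, because any comment would have to be reproduced in the output too. The trick is that the string `s` is a template for the whole program, and `%r` drops a quoted copy of `s` back into itself.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Save those two lines to a file and run it, and the output should be the same two lines.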

From this I had an idea for a creative way to produce a quine of my own. I just had to be liberal about the definition of a programming language. Here’s my (faulty) recollection of Mr Hey’s write-up of my entry…

We had a clever entry to our discussion of self-replicating programs from Bill Godfrey who sent in a floppy disk, and it meets the rules of the game.

Run the program SELFREP.EXE and it produces the “source”, PKZIP.EXE itself. He supplies a batch file which recompiles the program. First, PKZIP “compiles” SELFREP.OBJ (instead of .ZIP) and then the “linker” ZIP2EXE is invoked to produce the completed executable program.

Unfortunately, because Mr Godfrey didn’t write PKZIP, he’s technically disqualified from this contest.

Once the initial excitement of appearing in print wore off, I was kicking myself for not thinking my idea through. I only used PKZIP.EXE as the source file because I needed a file to be the source code, and PKZIP itself seemed the most applicable for that role. That decision alone disqualified me.

What I should have done is supply some “source code” such as…
   /* A self replicating program by Bill Godfrey. */
   Go();

The batch file should have just compiled (zipped) that two-line text file and then linked (zip2exe) it. Running the generated EXE would have produced the same two-line text file back. It would have totally complied with the rules and I would not have been disqualified! Grrrr…

I’ve long since lost that edition of PC Plus. If anyone reading this has a copy, I’d love a scan of that page please.

Picture credits
“Reading a magazine” by flickr user “ZaCky ॐ”.
“Danger – Self Replicating Device!” by Sam Ley, aka flickr user “phidauex”.

Vinegar – refined Vigenère – can you break my cipher?

I’m idly interested in cryptography, the art of scrambling a message so that it can be transmitted securely, and only someone with the magic key can understand the message.

When I was young, I designed a cryptographic algorithm. I thought I was so clever, but just because *I* couldn’t break it, that doesn’t make it secure.

In this article, I present my naive cryptographic algorithm. It’s very flawed, so please don’t use it for anything important. Can you find the flaw?

This article will start with some background on substitution ciphers and the Vigenère cipher, which my method was based upon. Then, we’ll look at my big idea itself, Vinegar. To keep it interesting, there’s a little code breaking challenge as well. Enjoy!


How Etaoin Shrdlu defeated substitution ciphers.

Like most children, my first encounter with cryptography was a substitution cipher. A friend gave me a sheet of paper with each of the 26 letters and a wacky squiggle next to each one. This would be our secret code. Replace each letter with its squiggle and it would just look like a bunch of squiggles.

We thought it was unbreakable, but it wasn’t. This sort of code can be cracked by knowing that E is the most common letter in English, so the most common squiggle in the hidden message is probably an E. The next most common squiggle is probably a T. Once you’ve covered the twelve most common letters in English; E, T, A, O, I, N, S, H, R, D, L and U;
♦ou ♦an easil♦ ♦or♦ out ♦hat the other ♦issin♦ letters ♦ould ♦e.
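Counting symbols is easy to try for yourself. This is a small sketch in Python (my choice of language; the function name and the sample sentence are made up for illustration) that counts how often each letter appears. It runs on plain English here to show E and T rising to the top, but the same count works on a page of squiggles.

```python
from collections import Counter

def letter_frequencies(text):
    """Count letters only, ignoring case, spaces and punctuation."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    return Counter(letters).most_common()

# A made-up sample sentence, not taken from any real message.
sample = "MEET ME BY THE TREE NEAR THE GREEN GATE AT SEVEN"
for letter, count in letter_frequencies(sample)[:2]:
    print(letter, count)
```

With this sample, E comes first with 13 appearances and T second with 6.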

What we need is a cipher that produces a coded message with an equal mixture of symbols. Enter Vigenère.

The Vigenère cipher

Centuries ago, the choice of people who wanted to communicate in private was Vigenère. Here’s how it works.

      A 0
Z 25       B 1
Y 24       C 2
X 23       D 3
W 22       E 4
V 21       F 5
U 20       G 6
T 19       H 7
S 18       I 8
R 17       J 9
Q 16       K 10
P 15       L 11
O 14       M 12
      N 13

(This was meant to be circular.)

The key to understanding a lot of cryptography is that the 26 letters of the English language can be used as numbers. On this chart, each letter has been given a number. Now, it’s possible to do simple calculations with letters. What’s C+C? The answer is E, because 2+2=4.

Don’t get excited, but you can do subtraction as well. K-H is D, because 10-7=3.

You may be wondering what should happen if you add past ‘Z’ or subtract past ‘A’. For our purposes, imagine the 26 letters on a clock face in a circle. On a normal clock, when a hand ticks past the 12, it moves onto 1, not 13. It’s the same with the Vigenère clock of letters. After the letter ‘Z’ comes the letter ‘A’.

Finally, because we’ve constrained our system to these 26 values, adding ‘B’ (1) turns out to be the same as subtracting ‘Z’ (25). Regardless of where you start, performing either +B or -Z will end at the same letter. In fact, all of the 26 possible additions will have an equivalent subtraction. You can find each letter’s pair on the chart by looking for the letter across from the other. ‘A’ and ‘N’ are self pairing; +A is the same as -A, and +N is the same as -N.
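The letter arithmetic can be sketched in a few lines of Python. The function names `add` and `sub` are mine, and the sketch assumes capital letters A to Z only. The `% 26` is the clock face: it wraps anything past ‘Z’ back round to ‘A’.

```python
A = ord("A")

def add(x, y):
    """Add two letters on the 26-letter clock, e.g. C + C = E."""
    return chr((ord(x) - A + ord(y) - A) % 26 + A)

def sub(x, y):
    """Subtract letter y from letter x on the same clock, e.g. K - H = D."""
    return chr((ord(x) - ord(y)) % 26 + A)

print(add("C", "C"))  # E
print(sub("K", "H"))  # D
# Adding B lands on the same letter as subtracting Z, wherever you start.
print(all(add(ch, "B") == sub(ch, "Z") for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
```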

Working within this system, we can encrypt secret messages. Imagine Bob and Carol wish to communicate in private, but the bad guys can read their messages. To stop the bad guys, Bob and Carol meet up in advance and agree on the code they will use in future.

Later, Bob wants to send the message “But now you will die by chainsaw” to Carol. (He’s a fan of Internet cartoons.) Now that we have the system of adding letters together, Bob can perform a simple calculation on each letter of the secret plain text message. He takes the previously agreed keyword, “WILHELM”, and adds each letter of the keyword to each letter of the plain text, repeating the keyword as often as needed;

  BUT NOW YOU WILL DIE BY CHAINSAW
+ WIL HEL MWI LHEL MWI LH ELMWILHE
  XCE USH KKC HPPW PEM MF GSMEVDHA

When Carol receives the encoded message, she can get back the plain-text by subtracting the keyword.

  XCE USH KKC HPPW PEM MF GSMEVDHA
- WIL HEL MWI LHEL MWI LH ELMWILHE
  BUT NOW YOU WILL DIE BY CHAINSAW
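Bob’s and Carol’s sums can be reproduced with a short Python sketch. The function name `vigenere` and its `sign` parameter are my own invention, and the sketch assumes the keyword is in capital letters. Spaces pass straight through and do not use up a keyword letter, which matches the worked example.

```python
def vigenere(text, keyword, sign=1):
    """Add (sign=1) or subtract (sign=-1) the repeating keyword, letter by letter.

    Spaces and punctuation pass through and do not use up a key letter.
    """
    out = []
    k = 0  # how many keyword letters have been used so far
    for ch in text.upper():
        if "A" <= ch <= "Z":
            shift = ord(keyword[k % len(keyword)]) - ord("A")
            out.append(chr((ord(ch) - ord("A") + sign * shift) % 26 + ord("A")))
            k += 1
        else:
            out.append(ch)
    return "".join(out)

secret = vigenere("BUT NOW YOU WILL DIE BY CHAINSAW", "WILHELM")
print(secret)                                # what Bob sends
print(vigenere(secret, "WILHELM", sign=-1))  # what Carol reads
```

The first line printed should be the coded message from the example, and the second should be the original plain text.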

Vigenère fixes the flaw with substitution ciphers because the letter Es in the original message will (mostly) come out as different letters.

Vigenère eventually fell out of use once a new flaw was discovered. Frequency analysis was still hiding there. If you know that the length of the keyword is 7, then you know that every 7th letter was encoded with the same letter from the keyword. So if you circle the 1st, 8th, 15th, etc letter of a long enough hidden message, the most frequent letter of those circled letters is probably an ‘E’, etc. Repeat for each group of letters 7 spaces apart and you can work out what the plain-text message was.
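Here is a rough Python sketch of that circling exercise. It assumes you already know the keyword length, and it makes the crude guess that the most frequent letter in each group stands for ‘E’. The function names are made up for illustration, and on a short message the guess will often be wrong.

```python
from collections import Counter

def split_into_columns(ciphertext, key_length):
    """Group together the letters that were encoded with the same keyword letter."""
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    return ["".join(letters[i::key_length]) for i in range(key_length)]

def guess_key_letter(column):
    """Assume the most frequent letter in the column is an encoded E."""
    most_common = Counter(column).most_common(1)[0][0]
    # cipher letter = plain letter + key letter, so key letter = cipher letter - E
    return chr((ord(most_common) - ord("E")) % 26 + ord("A"))

columns = split_into_columns("XCE USH KKC HPPW PEM MF GSMEVDHA", 7)
print(columns)
```

With a long enough message, running `guess_key_letter` on each column would give a first guess at the keyword, one letter at a time.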

(How did we know the keyword was 7 letters long? There are ways. Wikipedia have an in-depth description but you don’t need to know how for this puzzle.) If you want to experiment, “Sharky” has a rather splendid web-app to perform the encoding and decoding.

Vinegar – Refined Vigenère

This is my improvement to the Vigenère cipher, which I called Vinegar. (Because that’s close to how I kept mispronouncing it.)

The problem with Vigenère is that the keyword is repeated, and that repetition exposed the vulnerability. What we need is a keyword that’s long without repeating, but small enough to be remembered.

Vinegar takes a 17 letter keyword and expands it to 210 letters. With that long 210 letter keyword, you can use Vigenère without having to repeat the keyword, and it’s the repetition in Vigenère that exposes its vulnerability.

Why 17? Because it’s the sum of the first four prime numbers. 2+3+5+7=17. We’ll use the keyword “WILHELMVONHACKENS” for this example.

Split the keyword up into groups of 2, 3, 5 and 7 letters. WI, LHE, LMVON and HACKENS.

Repeat each sub-key to make up 210 letters:

  WIWIWIWIWIWIWIWIWIWIWIWIWI...
  LHELHELHELHELHELHELHELHELH...
  LMVONLMVONLMVONLMVONLMVONL...
+ HACKENSHACKENSHACKENSHACKE...

Because each group has a prime number length, the four groups will effectively expand into a 210 letter Vigenère keyword for the price of a 17 letter keyword. (210 is 2x3x5x7.) Add each column together to get the long key…

ZBXRUKLROI YCPVUERRZP DMYCEEZGYZ ULZUIETMYK BQJDPOTUNJ LHIEHSTOTJ WONOQZDOBY VYENRRHOVE VJLSBAOYVM KIVJABGCVG QIGQFLPJFG YXFAWKQBJG SDFLDPAKQQ SLUKNGZLIU SFAKYNEVRB CFIZXXVUST GFCFXICZCC NPCNMHMQBD FTCHSHXXGN OAMHAWWHXM PSQHUCWSER
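The expansion is mechanical enough to sketch in Python. The function name `expand_key` and its `lengths` parameter are mine. The sketch assumes a keyword of exactly 17 capital letters and does nothing beyond the key expansion described here.

```python
def expand_key(keyword, lengths=(2, 3, 5, 7)):
    """Expand a 17-letter keyword into a 210-letter key by adding repeated sub-keys."""
    if len(keyword) != sum(lengths):
        raise ValueError("keyword must have %d letters" % sum(lengths))
    # Cut the keyword into sub-keys of 2, 3, 5 and 7 letters.
    subkeys, start = [], 0
    for n in lengths:
        subkeys.append(keyword[start:start + n])
        start += n
    total = 1
    for n in lengths:
        total *= n  # 2 * 3 * 5 * 7 = 210
    # Add the repeated sub-keys together column by column, modulo 26.
    key = []
    for i in range(total):
        column_sum = sum(ord(sub[i % len(sub)]) - ord("A") for sub in subkeys)
        key.append(chr(column_sum % 26 + ord("A")))
    return "".join(key)

long_key = expand_key("WILHELMVONHACKENS")
print(len(long_key))
print(long_key[:10])
```

This should print 210 and then ZBXRUKLROI, the first block of the long key above.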

To avoid the Vigenère vulnerability, we can only use a 17 letter keyword once per 210 letters. So if you are encoding a longer message, use the last 17 letters of each 210 block for a new keyword to use for the next block of 210 characters.

So there’s my cipher. It has a flaw. Can you work it out? To try your hand, I came up with a random 17 letter codeword and expanded it to 210 letters. I then used that long key to encrypt my secret message. (My message is a little shorter than 210 letters, so I left some at the end un-used.) The plain-text is perfectly normal English. Spaces and punctuation are retained from the plain text.

Iwy seix zfvdzykjxm moebj dkaavmin vjkehleozp atir sdvwkvm cf hhbd vw gauj ty qzintte av mjbo xr xxnb whuieift, zaed jmioidh xv ts xtt elmv fg zok xgwnlpn vbues mp irmc twpb, yebhaoz rdlnrpbj jgg kzmlkyah vo dvta hn jzfxggaxcq.

Can you decode my message without the key? Post me a comment. Enjoy.

Picture Credits:
I’m lying by Taylor Dawn Fortune
Local Praire Dog Gossip by Art G
With grateful thanks to Richard Heathfield.

The cult of 140

Apparently, women don’t understand the offside rule. At least that’s according to some TV sports pundit who lost his job recently.

I don’t really understand the offside rule either, so I wrote this on my Facebook page in response to the news.

The key to understanding the offside rule is that it doesn’t really matter what the rule is.

Make up any old rubbish, like “Goal keepers must be pipe smokers” and call that the offside rule. It is just as meaningful.

Meh. Hardly my best work, but I thought it just about good enough to post on my Twitter feed too.

That’s where I met… the cult.

FlßgenWeb, Späcecode, TwitZÜne, Ass MÜde

Set in stone.

Twitter is famously limited to 140 characters. My message went over that limit by 78 characters. What to do?

“If it’s too long for 140 characters, make it a blog post and post a message with a link.”
Okay, but really? “Read my hilarious thought on the offside rule! http://bit.ly/√ế№Ω” (75 characters to spare! Yay!)

So my Twitter readers would see my teaser message. A few may even be bothered enough to follow the link, but they would be disappointed to have made the effort of loading the page only to get such a short message.

Remember, Twitter is for short messages like mine. What can I do keeping within the Twitter ecosystem?

“The 140 limit forces people to concentrate on what’s important. Cut out the flab!”
Okay. I started with the counter at 78 characters over. Time to start trimming down until it fits. I finally got it down to…

“The key to the offside rule is that it doesn’t matter what it is. Making up some rubbish and calling it the offside rule is as meaningful.”

It was already a rather poor piece of writing when I started. Now, I couldn’t even find space for the bit about pipe smoking goal keepers. Just take it away and put it out of its misery!

So I’d like to challenge the 140 character advocates out there. Can you improve on my effort? Take my original message, trim it down to 140 characters and post it as a comment.

<Update> An anonymous commenter came up with
“It doesn’t matter what the offside rule is. It could be any old rubbish like “Goal keepers must be pipe smokers”. It is just as meaningful.”.
That’s probably the best that could have been done within the 140 limit, but this is the point: is this shorter version better than my original version? In my biased opinion, no. The whole point of my message was about understanding the offside rule. Lose that word and it looks like I’m commenting on football itself.

It seems there isn’t enough room for big complicated words like “understanding”.</Update>

(Pre-emptive snarky comment: I’ve trimmed out all the bad parts of your message. I can’t post it because there’s none left!)

Picture credits:
“little ref” by Richard Boak.
“140” by Gabriela Grosseck.

‘First Past the Post’ isn’t.

I’m a bit of a nerd for vote counting systems. So I discovered with delight that in the UK, there will soon be a referendum on changing the way votes are counted. As I write this, most of the UK (and the USA) uses a method we call “First past the post”.

It’s called “First past the post”, but it isn’t. The name is about as ridiculous as calling North Korea “The Democratic People’s Republic of Korea”. The country is on the Korean peninsula, so at least that name is a little bit honest.

Put what you know to one side and think about what a vote counting system called “first past the post” should look like by the name alone. It’s a clear analogy to running in a race with a “post” at the end of the track. So there’s a predetermined number of votes and the first candidate to get past that threshold wins?

Nope. “First past the post” doesn’t work that way at all.

Imagine a vote of 100 voters selecting one of three candidates; A, B and C.

A 45 votes
B 30 votes
C 25 votes

If we put the “post” at 51 votes (a simple majority), then all three candidates lost. They all fell over and dropped out of the race before anyone reached the finish. Three pathetic failures. Rather than declaring no-one the winner, the people organising the race then dig up the post and re-plant it just behind A, as if the post was always there. He is declared the first one to pass the post.

If that wasn’t confusing enough, the other counting system to be offered in the referendum, “Alternative Vote” or “AV”, does indeed look like how you’d imagine “first past the post” to work. To continue the analogy with AV, candidate C drops out of the race after the first round. The second preferences of C’s voters are counted and added to A’s and B’s totals.

A 45 + 3 votes = 48 votes
B 30 + 22 votes = 52 votes
C 25 votes (eliminated)

Hurrah! After two rounds, B was the first candidate past the post. A had an early lead but couldn’t keep up as far as the finishing line, overtaken by B in the closing straight.
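For the nerds, here is a small Python sketch of both counts using the same 100 voters. The ballot format (a list of candidates in order of preference) and the function names are my own assumptions, ties are broken arbitrarily, and a real election count has many more rules than this.

```python
from collections import Counter

def first_past_the_post(ballots):
    """Whoever has the most first preferences wins, majority or not."""
    tally = Counter(ballot[0] for ballot in ballots)
    return tally.most_common(1)[0]

def alternative_vote(ballots):
    """Eliminate the last-placed candidate until someone passes half the votes."""
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        # Each ballot counts for its highest preference still in the race.
        tally = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()):
            return leader, votes  # first past the post, for real this time
        remaining.remove(min(tally, key=tally.get))

# 45 vote A, 30 vote B, 25 vote C (3 with A second, 22 with B second).
ballots = (
    [["A"]] * 45 + [["B"]] * 30
    + [["C", "A"]] * 3 + [["C", "B"]] * 22
)
print(first_past_the_post(ballots))
print(alternative_vote(ballots))
```

The first count hands the win to A on 45 votes. The second eliminates C and finishes with B on 52.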

So remember, the vote counting system that works in a “first past the post” manner is the one called “Alternative Vote”. The one without anything like a “post” is called “First past the post”. Clear?

Picture credits:
Vote Goat by Jeremy Richardson (Mr Jaded on flickr).
Racing demons by Simon Webster (shaggy359 on flickr).

Digital photography is not rocket science. It just seems that way.

Here’s a TV advert for a camera touting the benefits of film cameras over digital cameras. I’m almost inclined to wonder if this advert is a parody, but even so, it has a point.

Let’s watch…

Photography for technophobes.

I’m reminded of when I was lending my digital camera to a friend some time ago. She knew how to use a film camera, but the technological revolution had, alas, left her behind.

She had no problem with the LCD display on the back. This was why she wanted to borrow my camera in the first place after she saw me using it. Taking a picture while holding the camera at arm’s length is a lot easier than holding it up to the eye.

Showing her how to browse old pictures took a bit of teaching but she soon picked it up. It helped a lot that this camera had a big switch with only two settings; taking-pictures or looking-at-pictures.

The big stumbling point was when I showed her how to use memory cards. I tried to explain how it stores pictures, but I got a lot of blank looks. I finally said “This card is like the film.” There was a sudden look of understanding on her face.

The analogy to traditional film cameras worked perfectly. I told her that the photo shops will develop (print) her pictures, produce negatives (make a CD copy) and clean the film off to be reused again. If she needed more film, she could buy some by asking for a “128 MB SD” at the shops (which might tell you when this story took place).

Embrace the metaphor!

Film cameras are devices that direct photon particles in order to induce chemical reactions in silver halide particles mounted on sheets of cellulose acetate.

Somehow, the camera industry managed to sell us cameras without having to give us chemistry lessons first. And yet, we all need computer science lessons to use digital cameras. People never really cared about the chemical processes of film photography and we shouldn’t have to care about bits, megabytes and other pieces of jargon that can be abstracted away.

So, here are my suggestions for the digital camera industry.

1. Standardise!
Why are there so many memory card formats? As far as I can tell, they’re all flash memory chips contained in differently shaped blobs of plastic. The industry needs to pick one shape of blob and stick with it. No inventing new blobs unless there’s a really good reason to.

2. Call memory cards, ‘digital film’.
Embrace all the metaphors. If the world already has a name for something, don’t come up with a different name for it.

3. Tell me how many pictures it can store, not how many gigabytes.
This one will be tricky, as the size of a picture depends on the number of pixels. So while I don’t think we could realistically get rid of the “GB”, cameras need to help the user by telling us how many pictures are in a “GB” at that particular time.

4. Cameras should come with a reasonably sized card as standard.
How would you feel if you bought a camera, but later found the lens was extra? Digital film (getting used to the phrase yet?) is reusable and will probably last as long as the camera itself. So why not bundle it with the camera and save your customers the hassle?

5. Photo printing shops to provide archival DVDs as a standard part of the service.
People using film cameras expected their negatives as part of the service. Copying a few gigabytes full of pictures to a DVD should be cheap enough that it could be offered free to anyone who wants to print a vacation’s worth of snaps.

Hang on, did that advert just say two cameras for ten dollars? Forget everything I just wrote, that’s a bargain!

Picture credits:
‘Film and SD card’ by ‘sparkieblues’ of flickr
‘Leica’ by ‘AMERICANVIRUS’ of flickr

reddit’d (Followup to ‘Construct Something Else’)

Fame at last! Fame at last!

My last piece, “Construct something else!” got a bit of attention when someone posted it on reddit. That was unexpected.

Remember the rule; If you publish something that’s a bad idea in hindsight, post a “clarification” article claiming you’ve been misunderstood and that you never thought it was a good idea in the first place. Then hide in the shower.

You see, I think I’ve been misunderstood. I was reading stackoverflow and I found the question asking about C# constructors. There was the comment from Eric Lippert, talking about the possibility of implementing this feature, but they were lacking a good reason to undertake the effort. Then I remembered I had exactly what he was looking for, a real-world use case! So I wrote up my experiences in a blog post and left a comment on the stack overflow question.

I thought I was rather clear that I was just providing Mr Lippert with a use case, rather than actually advocating it. Nonetheless, some people mistakenly took my post as advocacy and responded as such. Now if you’ll excuse me, I’m going to go have a shower.

🙂

But seriously, I remain of the opinion that implementing pseudo-constructors would be a good thing, but probably not worth the time for Microsoft to implement. But first, a quick aside to clarify (there’s that word again) how it could work. Just so we’re all clear (!) on what it is I’m advocating.

A pseudo-constructor would essentially be a static factory function. Call it, and it returns an instance of the class, perhaps using a private real-constructor inside. The only difference being that it can be called using the new operator. The compiler sees that the parameters match the pseudo-constructor signature and it generates code to call that static function instead. From an MSIL/CIL view, it’s just like a normal static function.

So why would this be a good thing?

Changing the interface without changing caller code.

This is the reason I raised in my original post. If version 1 of a DLL has a real-constructor, version 2 can use a pseudo-constructor in its place. The caller code would have to be recompiled, but the C# code would not need to be modified.

Intellisense™ simplification.

How many times have you needed an instance of a particular class, typed new ClassName, only for Intellisense™ to show that no constructors are available? You slap your forehead and remember that this class uses static factory functions instead. If these could be called with a new operator, they would all appear in the same list.

(I suspect this was the original motivation for considering the feature in the first place.)

That’s it?

There are a few good reasons not to make this change, which I’ll briefly discuss. Enjoy.

They won’t be real constructors.
(Thanks to reddit user “grauenwolf”.)

Sometimes, only a real constructor will do. When writing a subclass constructor, you can call the base class’s constructor just before the first opening brace using the base keyword. This would have to be a real constructor call, as you can’t just decide which base class to use at run-time.

Adding pseudo constructors doesn’t take away real constructors, but it might lead to confusion when people see that a base class’s constructors have gone missing.

You don’t need it.
(Thanks to “Anthony” for commenting on the original post.)

You can do all this by making a class full of delegate instances. The constructor can select what functions to fill into those delegates at run-time. Add some [Obsolete] attributes so anyone writing new code will code against the new preferred objects.


So I don’t think this new feature would break anything, except it would be taking up the time of the clever people at Microsoft. Nice to have, but we don’t need it.

If you’re in the mood for discussing future directions of the C# language, please take a look at my earlier piece on destructors for structs. I’m interested in any thoughts on the subject or any reasons why it wouldn’t work.

I hope I’ve gained a little bit of a readership from this experience. If you’re reading this, please leave a comment. Without comments, we’re just bumping around in a closed system and tending towards entropy. Here are some nice charts for a bit of insight on the reddit people.


Picture credit:
“The Walk of Fame” by flickr user Storm Crypt.
Readership charts by blogger.

Construct something else! (C#)

Please read my follow-up post after reading this one.

Quoth rjw on stackoverflow

Given the following client code:

    var obj = new Class1();

Is there any way to modify the constructor of Class1 so that it will actually return a subclass (or some other alternate implementation) instead?

C# compiler guru, Eric Lippert commented…

We are considering adding a feature “extension new” which would essentially allow you to make a static factory method that would be called when the “new” operator is used, much as extension methods are called when the “.” operator is used. It would be a nice syntactic sugar for the factory. If you have a really awesome scenario where this sort of pattern would be useful, I’d love to see an example.

I have one!

Version one of our DLL had a class that wrapped a connection to a remote server.

    using (var connect = new ExampleConnection("service.example.com"))
    {
        connect.DoStuff(42);
    }

It worked great. Our customers were very happy with it and developed lots of code to use our little DLL. Life was good.

Time passes and our customers ask us to add support for a different type of server that does a similar job but with a very different protocol. No problem, we develop a new class called DifferentConnection and just to be helpful, both ExampleConnection and DifferentConnection implement a common interface object.

We’re about to release version two to our customers, but a common response comes back;

“This is good, but we were hoping your library would automatically detect which variety of server it’s talking to. Also, we really don’t want to change our code. We want to just drop your updated DLL into the install folder, but we’ll recompile our EXE if we really have to.”

With these new requirements, ExampleConnection had to become a class that supported both varieties of remote server. The constructor has to perform the auto-detect, and all of the public functions now begin with an if statement, selecting for which variety of remote server is in use.

If we’d had a bit more foresight, we would have supplied a static Connect function that wrapped a private constructor. That way, version two of this function could have returned a subclass object instead. But we didn’t. There are costs to writing code that way, so you wouldn’t do it unless there was a clear point to it. If a normal constructor could return a subclass instead, there would be no problem.

Mr Lippert, I hope this provides the justification you need to add this to .NET 5, but I’d much rather have destructors on structs instead. I also want a pony.

Picture credit: ‘LEGO Mini Construction Site’ by flickr user ‘bucklava’.
(I don’t really want a pony.)

UPDATE: Someone submitted this to reddit. Lots of discussion there.
UPDATE(2): Follow-up post.

Google snooping WiFi? Don’t panic! Don’t panic!

Google got into a bit of hot water when it emerged that while their cars drove around taking pictures for their Street View service, they collected and stored people’s private WiFi traffic. People have understandably got angry with Google for doing this, but I think some demystification is in order.


Did they collect my private data?

If your WiFi access point uses WPA with a good pass-key, don’t worry. Your network traffic is encrypted and is just noise without that pass-key.

If your WiFi is “open”, then anyone within range can collect and look at your network traffic. I would be more worried about that creepy guy in the van parked around the corner, maliciously snooping on you, spamming and browsing dodgy websites. Worrying about Google would be way down my list. Take this opportunity to switch on WPA on your access point. This article will still be here afterwards.


How come their camera cars collect WiFi data at all?

It can be used to supplement or replace GPS. Google are in the mapping and navigation business, and knowing where you are is essential to helping you get where you are going.

If you go out to some random spot in a built-up area and switch on your laptop’s WiFi gizmo, you’ll find several access points, both public and private, all with a variety of weird names. Make a list of those access points and their signal strength, compare it against a list of known access-points and their previously monitored location, do a few calculations and you’ll have your location.

No need for GPS electronics, just use the same WiFi electronics your laptop will have anyway.
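I don’t know what calculations Google actually use, so treat this Python sketch as one plausible version of “a few calculations” and nothing more. It takes a weighted average of the known access point positions, giving more weight to stronger signals. The function name, the access point names, the coordinates and the signal readings are all made up for illustration.

```python
def estimate_position(observed, known_locations):
    """Weighted average of known access point positions.

    observed: {name: signal strength in dBm}, e.g. -40 is strong, -90 is weak.
    known_locations: {name: (latitude, longitude)} from an earlier survey.
    """
    total = lat_sum = lon_sum = 0.0
    for name, dbm in observed.items():
        if name not in known_locations:
            continue  # an access point nobody has mapped yet
        weight = 10 ** (dbm / 10.0)  # dBm to milliwatts: stronger means heavier
        lat, lon = known_locations[name]
        lat_sum += weight * lat
        lon_sum += weight * lon
        total += weight
    if total == 0:
        return None  # nothing recognisable in range
    return lat_sum / total, lon_sum / total

# Hypothetical survey data and a hypothetical scan from a laptop.
known = {"CoffeeShop": (52.2570, -1.1600), "Library": (52.2580, -1.1620)}
scan = {"CoffeeShop": -48, "Library": -71, "SomeonesRouter": -60}
print(estimate_position(scan, known))
```

The answer lands between the two known access points, much closer to the one with the stronger signal.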

That’s very nice, but even my WiFi address is private. They shouldn’t have collected even that.

Is it? By necessity, your WiFi access point has to broadcast its identity to the public in the clear. Your neighbours might be using WiFi too, possibly within range of your own laptop. When your laptop hears something broadcast, it loads the packet and looks to see if it’s from an access point it knows about. It’ll be receiving lots of noise from your neighbours and silently throwing away anything it’s not interested in.

Now apply the principle that no-one else should even look at a packet’s identity. You’ll have no way of knowing which packets are yours and which are someone else’s unless you do look at the access point’s identity. It’s part of the protocol.

But they collected private traffic as well as just the access point’s identity. How could that be accidental?

Even when you are only interested in the identity of an access point, you need to collect a whole packet before it’s useful. The trouble is that radio on its own is subject to noise and interference. To fix this, the clever people that designed the WiFi protocols added a noise check. Before a packet is broadcast, some simple calculations are done on the content of the packet and the result is added on the end. The recipient takes the packet and performs the same calculation on the content. If the result the recipient ends up with is the same as the number on the end of the packet received, it can be reasonably sure the packet arrived without errors.

For this to work, the recipient needs the whole packet. If they only listened to the first bit where the sender’s identity is stored, there is a risk of noise creeping in, masquerading as correct data.
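The noise check can be sketched in Python using the CRC-32 function from the standard `zlib` module. WiFi frames do carry a 32-bit CRC on the end, but the packet contents, layout and byte order below are made up for illustration and are not the real frame format.

```python
import zlib

def add_check(payload: bytes) -> bytes:
    """Sender: append a 4-byte CRC-32 of the payload to the end of the packet."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_ok(packet: bytes) -> bool:
    """Recipient: recompute the CRC-32 and compare it with the one received."""
    payload, received = packet[:-4], packet[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received

packet = add_check(b"from: access-point-1 | hello")
print(check_ok(packet))          # True

damaged = bytearray(packet)
damaged[3] ^= 0x01               # flip one bit, as radio noise might
print(check_ok(bytes(damaged)))  # False
```

Notice that `check_ok` needs every byte of the packet. There is no way to check just the sender’s identity at the front.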

But why did they store the whole packet after the error check has passed?

Very little of a software developer’s work is making things from scratch. Instead, we reuse and build upon work done in the past. We make components that can be reused for different things.

I can only speculate here, but I imagine that when Google put this project together, they would have taken a generic WiFi receiver component which has been well tested and trusted rather than build an entirely new one. The packet is the natural unit of a WiFi receiver, so it would be expected that generic components designed to deal with WiFi traffic would store whole packets as a matter of routine.

Wouldn’t they have noticed a huge data file if they were only planning to store a fraction of what they did collect?

They would have been taking pictures and collecting many image files at the same time, so the space taken up by captured WiFi traffic would be a small proportion. Even if they were only collecting WiFi locations, the amount of storage that would be required in the field isn’t quite so predictable. Databases aren’t simple files where one item is stored one after the other, but are complex structures with indexing and redundant copies.

I imagine that if I were an engineer at Google and I wanted an estimate of how many hard disks to buy, I would send the car out on a short test journey and see how big the database was when it came back. Multiply that figure by however far the car will be going and that’s how much storage I’ll need. Hard disks are not that expensive these days, so spending engineers’ time working on reducing the amount of storage needed might not be a good economy.

Even so, collecting private network traffic is illegal. If I were caught eavesdropping, I probably wouldn’t get away with it.

(I’m not a lawyer, and this is not legal advice. If you take legal advice from a software engineer, you’re insane.)

If Google were taken to a criminal court over this, they could show that there was no intention to eavesdrop as I’ve outlined. If they take steps to securely destroy the additional collected data, no-one has been harmed here. Prosecuting this “crime” would be a petty reaction to a simple oversight.

But I don’t trust Google to not look at and abuse the collected private data.

If you’re not using WPA, your private data has been broadcast to all and sundry in range since you started using it, and you’re only worried now?

Picture credit: ‘Shot of Daventry area while cycling’ by… me!

Wishing for a destructor (C#)

 

I like the C# programming language. It feels like C++ done right, divesting itself of much of the C legacy that complicates matters so much. When I do programming, I prefer to use this language. Having to go back and deal with C++ just doesn’t give me that warm feeling like it used to.

But I have a pet peeve: there is something I miss from C++.

Choose… Choose the form of the Destructor!

Quick recap. Here’s a brief C++ function…

   void MyFunction()
   {
       string x;
       MyOtherFunction(x); /* Pass by value. */
   }

Doesn’t look like much is going on, but there are five function calls in there.

  1. A constructor function is called to build x.
  2. A copy constructor function is called to copy x for the function call.
  3. ‘MyOtherFunction’ is called.
  4. A destructor function is called to tidy up the copy of x.
  5. The same destructor function is called to tidy up x itself.

The clever bit is that the compiler has worked out when objects go “out of scope” and inserted calls to each object’s destructor function in exactly the right place. Even “anonymous” objects are tidied up. Say a function is called that returns an object, but the caller just ignores the return value. The compiler spots it and inserts the destructor call in just the right place.
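To see those five calls happen, here is a minimal C++ sketch. It assumes a made-up class, Traced, standing in for string, and a made-up helper, TraceMyFunction, that collects the log. Neither name comes from any real library; they exist only to record each call in the order it is made.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Every special member function appends a line here, so the order of
// the calls can be inspected afterwards.
std::vector<std::string> g_log;

// Hypothetical stand-in for string that logs its own lifetime events.
struct Traced {
    Traced() { g_log.push_back("construct"); }
    Traced(const Traced&) { g_log.push_back("copy-construct"); }
    ~Traced() { g_log.push_back("destruct"); }
};

// Takes its parameter by value, so the caller has to make a copy.
void MyOtherFunction(Traced) { g_log.push_back("MyOtherFunction"); }

void MyFunction() {
    Traced x;
    MyOtherFunction(x); /* Pass by value. */
} // x goes out of scope here and the compiler inserts its destructor call.

// Runs MyFunction and returns the list of calls it caused.
std::vector<std::string> TraceMyFunction() {
    g_log.clear();
    MyFunction();
    return g_log;
}
```

Calling TraceMyFunction() should give back five entries in the same order as the list above: a construct, a copy-construct, the function call itself, and then two destructs.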

C# doesn’t do this. Instead, from the very beginning of the language, unused objects are “garbage collected”. Every so often, some code will run that goes over everything built by the program and sees if it’s being used anywhere. Anything that can’t be traced to running code is removed. Doing it this way allows the programmer to share objects between two different areas of code, without having to worry about which one has responsibility for tidying up.

I imagine that when the very clever people at Microsoft designed the C# language, they had already decided to use garbage collection, and so concluded that this messing about with destructors was no longer needed. No need to insert a function call into code, just leave the object lying around and the garbage collector will deal with it.

This would all be great if memory were the only resource we have to keep track of. Open file handles, database connections and the like must all be closed in a deterministic manner, instead of at some unknown time in the future when memory is about to run out.

Microsoft didn’t leave us completely out on a limb. Classes that need to be tidied up can be written to implement the IDisposable interface, which allows the using block to work.

   using (SqlConnection con = new SqlConnection(db))
   {
       /* Use con. */
   } /* Dispose con. */

With the using block, just like with the C++ destructors shown above, the compiler inserts a call to the tidy-up function at the end of the block. Even if there’s a return or throw statement in the middle, it’ll make sure everything is tidied up when the code leaves the using block.
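For comparison, here is a rough C++ sketch of the same guarantee, where the destructor plays the part of the tidy-up call. The Connection class and the UseConnection and Run functions are invented for illustration and have nothing to do with SqlConnection. The sketch leaves the scope three ways (falling off the end, an early return and a throw) and records what happened each time.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::string> g_events;

// Hypothetical resource. Its destructor stands in for Dispose().
struct Connection {
    Connection() { g_events.push_back("open"); }
    ~Connection() { g_events.push_back("close"); }
    Connection(const Connection&) = delete;            // one owner only
    Connection& operator=(const Connection&) = delete;
};

// mode 0: normal exit, mode 1: early return, mode 2: throw.
int UseConnection(int mode) {
    Connection con;
    if (mode == 1) return 1;
    if (mode == 2) throw std::runtime_error("query failed");
    g_events.push_back("work");
    return 0;
} // However the function is left, con's destructor runs on the way out.

// Runs one mode and returns the recorded events.
std::vector<std::string> Run(int mode) {
    g_events.clear();
    try {
        UseConnection(mode);
    } catch (const std::runtime_error&) {
        g_events.push_back("caught");
    }
    return g_events;
}
```

In all three modes a "close" is recorded, and in the throwing case it is recorded before "caught", because the stack is unwound before the handler runs.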

But why have the using block at all? If you forget to include the using block, the tidy-up code won’t be called (unless you invoke it manually) and you won’t even get a compiler warning. (You don’t get the warning for very good reasons which I won’t go into right now.)

Even when you use using correctly, adding a using block to a function means introducing an additional block, with all the block-visibility issues and additional indenting that implies.

Structs to the rescue

Fortunately, C# and .NET come with a type of object called a struct. Structs are similar to classes except they are solid value types rather than references to data floating in the ether. The practical difference is that when a struct value is copied (such as when passed into a function as a parameter) and you change the value of one of the copies, the other copy stays the same.

In contrast, when you copy a class value, you’re instead just making a copy of the reference, so both point to the same data. Change the contents of one, and the other changes value too, because there is no “other”.
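The difference can be sketched in C++, with the caveat that the mapping is only an analogy of mine: a plain C++ object behaves roughly like a C# struct, while a std::shared_ptr behaves roughly like a C# class reference. The Point type and both function names are made up for the example.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type with one field to change.
struct Point {
    int x = 0;
};

// Value semantics (like a C# struct): b is an independent copy of a.
int CopyValueThenChange() {
    Point a;
    Point b = a;
    b.x = 42;
    return a.x; // a is untouched by the change to b
}

// Reference semantics (like a C# class): a and b share one Point.
int CopyReferenceThenChange() {
    std::shared_ptr<Point> a = std::make_shared<Point>();
    std::shared_ptr<Point> b = a; // copies the pointer, not the Point
    b->x = 42;
    return a->x; // the change made through b is visible through a
}
```

CopyValueThenChange() returns 0 because the original was never touched. CopyReferenceThenChange() returns 42 because there was only ever one Point.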

So what if, when a struct appears in code, it came with an automatic using block attached? That way, we could open files or database connections just by introducing one in code and it would be tidied up in a deterministic way.

To complete the job, we would need mechanisms to support copy constructors and assignment as well as the final destructor call, just like the C++ people are used to.

I’ve been nursing this peeve and whining about it for so long that I’m even boring myself. I plan this to be my last word on the topic and in future I’ll just post links to this article. Enjoy.

Picture credits
“staypuft_3feb2009_0621” by patrick h. lauke on flickr
“choose determinism” by alyceobvious on flickr
“John E. Cox Memorial Bridge” by Elizabeth Thomsen on flickr

Paying for Power

Being an evil genius, I’m obsessed with getting as much power as possible. If only I could get power for nothing, but alas, I have to pay for it.

In England, and most of the western world, we have a well established system of sending electricity from the power stations to me and sending money in the opposite direction. It works, but I think we can improve on it.

An Electron’s Journey

Electricity starts life at the power stations. They sell their power supply on the grid at market rates, competing with other power stations. The price of electricity fluctuates over the day. If the price goes down far enough, they might switch off the generators, keeping their raw materials for when the price goes up. A wind farm can’t keep stocks of wind in reserve, so they will stay online all the time regardless of the price.

We, the public, never see those fluctuations in price. Instead, we purchase electricity from a supplier who deals with the power stations. The suppliers usually charge us a fixed amount per unit of energy, sometimes having a daytime rate and an overnight rate, but the price they charge us is fairly stable, only changing every few months.

(As well as the suppliers, we also pay the companies that maintain the grid system and meters in our homes. This article is not about them.)

When all is said and done, what do the suppliers actually do? They don’t generate the electricity and they don’t bring it to us. They are middle-men who flatten out the price, charging a bit more than the expected average price, like an insurance premium, to compensate for the risk of over-demand and price rises. Do we need that service? We have insurance to spread the risk of unexpected events, not for the everyday costs of life.

What if, instead, we had a minimal supplier that just handles the accountancy at a low cost, quoting a price that changes every five minutes to track the wholesale price? (Perhaps with an easy to use gizmo that displays the current price.) With this type of supplier, we would probably save money over the long term. After all, we wouldn’t be paying that insurance premium any more.

But more important than that, it would give us an interest in when we use electricity. At the moment, we really don’t care that the price of electricity rises dramatically during the adverts on popular TV shows. We all switch on our kettles at the same time, not really caring about the economics. If we felt the rise in price, we might plan our tea making better to avoid these peaks and save some money.

This plan wouldn’t have worked when the grid was originally built, but computer and communications technology have advanced to the point where we can finally think about pulling down the old ways of working. I’m looking forward to it.

Picture credits.
Nuclear power by koert michiels on flickr.
insurance prohibits ladders by stallio on flickr.