I want a less powerful programming language for Christmas.

I’m writing this because I’m hoping someone will respond, telling me that what I want already exists. I have a specific itch and my suspicion is that developing a whole programming language and runtime is the only way to scratch that itch.

Please tell me I’m wrong.

Dear Father Christmas…

If you’ve ever written a web service, you’ve almost certainly had situations where you’ve taken a bunch of bytes from a completely untrusted stranger and passed those bytes into a JSON parser. What’s more you’ll have done that without validating the bytes first.

Processing your inputs without sanitizing it first? Has Bobby Tables taught us nothing?

You can do this safely because that JSON parser will have been designed to be used in this manner and will be safe in the face of hostile inputs. If you did try feeding the bytes of an EXE file into a JSON parser, it’ll very quickly reject it complaining that “MZ” isn’t an opening brace and refuse to continue beyond that. The worst a hostile user could do is put rude messages inside the JSON strings.

{ "You": "A complete \uD83D\uDC18 head!" }

Now take that idea and think about what if you did have a web service where completely unauthenticated users could use any request body they liked and your service would run that request body in a copy of Python as the program source code.

Hopefully, you’ve just now remarked that it would be a very bad idea, up there with Napoleon’s idea to make his brother the King of Spain. But that’s exactly what I want to do. I want to write a web service that accepts Python code from complete strangers and actually run that code.

(And also make my brother the King of Spain. He’d be great!)

“Hang on to your hopes, my friend. That’s an easy thing to say. But if your hopes should pass away, simply pretend that you can build them again.”

At the gates of dawn

Some time in the early 90s, I had a game called “C Robots”.

This is a game where four tanks are in an arena, driving around and firing missiles at each other. But instead of humans controlling those tanks, each tank was controlled by a program written by the human player. The game controller would keep track of each tank and any missiles in flight, passing back control to each tank’s controller program to let it decide what its next move will be.

For 90s me, programming a robot appealed to me but the tank battle part did not appeal so much. I really wanted to make a robot to play other games that might not involve tanks. At the time, there were two games I enjoyed playing with school friends, Dots-and-Boxes and Rummy. I had an idea of what made good strategies for these specific games, so I thought building those strategies into code might make for a good intellectual exercise.

Decades passed and I built a simple game controller system which I (rather pompously) called “Tourk“. I had made a start on the controllers for a handful of games but I hadn’t gotten around to actually writing actual competitive players, only simple random ones that were good for testing. I imagined that before long, people would write their own players, send them in to me and I’d compile them all together. After I’d let it ran for a million games in a tournament I’d announce the winner.

If anyone had actually written a player and sent it in, my first step would have been to inspect the submitted code thoroughly. These would have been actual C programs and could have done anything a C program could do, including dropping viruses on my hard disk, so inspecting that code would have been very important. Looking back, I’m glad no-one actually did that.

But this was one thing C Robots got right, even if it wasn’t planned that way. Once it compiled the player’s C code, it would run that code in a restricted runtime. Your player code could never go outside its bounds because there’s no instructions in the C Robots runtime to do that. This meant that no-one could use this as an attack vector. (But don’t quote me on that. I’ve not actually audited the code.)

“I never ever ask where do you go. I never ever ask what do you do. I never ever ask what’s in your mind. I never ever ask if you’ll be mine.”

Will the runtime do it?

Could maybe the dot-net runtime or the Python runtime have the answer?

This was one of the first questions I asked on the (then) new Stack Overflow. The answer sent me to Microsoft’s page on “Code Access Security” and if you follow that link now, it says this feature is no longer supported.

Wondering more recently if Python might have an option to do what I wanted, I asked on Hacker News if there was a way to run Python in the way I wanted. There were a few comments but it didn’t get enough up-votes and disappeared fairly quickly. What little discussion we had was more to do with a side issue than the actual question I was asking.

I do feel that the answer might still be here. There’s quite possibly some flag on the runtime that will make any call to an extern function impossible. The Python runtime without the “os” package would seem to get 90% of the way there, but I don’t know enough about it to be certain enough that this won’t have left any holes open.

“We’re all someone’s daughter. We’re all someone’s son.”

Sanitize Your inputs?

Maybe I should listen to Bobby Tables and sanitize my inputs before running them.

Keep the unrestricted runtime, but before we invoke it to run the potentially hostile code, scan it to check it won’t do any bad things.

Simple arithmetic in a loop? That’s fine.
Running a remote access trojan? No.

Once it has passed the test, you should be able to allow the code to run, confident it won’t do anything bad because you’ve already checked it won’t. This approach appeals to me because once that initial test has passed the code for non-hostility, we can allow the runtime to go at full speed.

The problem with this approach are all the edge cases and finding that line between simple arithmetic and remote-access-trojans. You need to allow enough for the actually-not-hostile code to do useful things, but not enough that a hostile user could exploit.

Joining strings together is fine but passing that string into eval is not.
Writing text to stdout is fine but writing into a network socket is not.

Finding that line is going to be difficult. The best approach would be to start with nothing-is-allowed, but when considering what to add, first investigate what would be possible by adding that facility to allowed list. Because it can be used for bad things, eval would never be on that allowed list.

If there’s a function with a million useful things it can do but one bad thing, that function must never be allowed.

“We can go where we want to. A place they’ll never find. We can act like we come from out of this world and leave the real one far behind.”

Ask the Operating System?

I told a colleague about this post while I was still writing it and he mentioned that operating systems can have restrictions placed on programs it runs. He showed me his Mac and there was a utility that listed all the apps he was running and all the permissions it had. It reminded me that my Android phone does something similar. If any apps wants to interact with anything outside its realm, it has to ask first. This is why I’m happy to install apps on my Android phone but not on my Windows laptop.

This would be great, but how do I, a numpty developer, harness this power? What do I do if I want to launch a process (such as the Python runtime) but with all the permissions turned off? It feels like this will be the solution but my searching isn’t coming up with a practical answer.

My hope is that there’s a code library whose job it is to launch processes in this super restricted mode. It’ll work out which OS it is running on, do the necessary magic OS calls and finally launch the process in that super-restricted mode.

“If I was an astronaut I’d be floating in mid air. A broken heart would just belong to someone else down there. I would be the centre of my lonely universe. I’m only human and I’m crashing in the dark.”

Mmmm coffee!

The good people developing web browsers back in the 90s had the same need as me. They wanting to add a little interactivity to web pages, but without having to wait for a round trip back to the server over dialup, so they came up with a language they named JS.

As you read this page, your browser is running some code I supplied to you. That code can’t open up your files on your local device. If anyone did actually find a way to do that, the browser developers would call that a serious bug and push out an emergency update. So could JS be the solution I’m looking for?

As much as it sounds perfect, that JS runtime is inside the browser. If I have some JS code in my server process, how do I get that code into a browser process? Can I even run a web browser on a server without some sort of desktop environment?

The only project I know of where someone has taken JS outside of a browser is node-js. That might be the answer but I have written programs using node-js that load and save files. If this is the answer then I’d need to know how to configure the runtime to run the way I want.

“Play the game, fight the fight, but what’s the point on a beautiful night? Arm in arm, hand in hand. We all stand together.”

Is there an answer?

I began this post expressing my suspicion that the solution is to write my own runtime, designed from first-principles to run in a default-deny mode. I still wonder if that’s the case. I hope someone will read this post and maybe comment with the unknown option on the Python runtime that does exactly what I want.

In the meantime, I have another post in the works as with my thoughts on how this runtime and programming language could work. I hope I can skip it.

Gronda-Gronda.

Picture Credits
📸 “Snow Scot” by Peeja. (With permission.)
📸 “Meeting a Robot” by my anonymous wife. (With permission)
📸 “Great Dane floppy ears” by Sheila Sund. (Creative Commons)
📸 “Fun with cling film” by Elizabeth Gomm. (Creative Commons)
📸 “Rutabaga Ball 2” by Terrence McNally. (Creative Commons)
📸 “Nice day for blowing the cobwebs off” by Jurassic Snark. (With permission.)

(And just in case advocating for your brother to be made King of Spain is treason or something, I don’t actually want to do that. It was a joke.)

Write Your Own POP3 Service

So, you want to write a POP3 service? That’s great. In this post, we’ll walk through building a simple POP3 service that uses a folder full of EML files as a mailbox and serves them to anyone logging in.

Getting Started

I’m assuming you are already set-up to be writing and building C# code. If you have Windows, the free version of Visual Studio 2019 is great. (Or use a more recent version if one exists.) Visual Studio Code is great on Linux too.

Download and build billpg industries POP3 Listener. Open up a new console app project and include the billpg,POP3Listener.dll file as a reference. You’ll find the code for this project on the same github in its own folder.

using System;
using System.IO;
using System.Collections.Generic;
using System.Net;
using System.Linq;
using billpg.pop3;

namespace BuildYourOwnPop3Service
{
    class Program
    {
        static void Main()
        {
            /* Launch POP3. */
            var pop3 = new POP3Listener();
            pop3.ListenOn(IPAddress.Loopback, 110, false);

            /* Keep running until the process is killed. */
            while (true) System.Threading.Thread.Sleep(10000);
        }
    }
}

This is the bare minimum to run a POP3 service. It’ll only accept local connections. If you’re running on Linux, you may need to change the port you’re listening on to 1100. Either way, try connecting to it. You can set up your mail reader or use telnet to connect in and type commands.

Accepting log-in requests.

You’ll notice that any username and password combination fails. This is because you’ve not set up your Provider object yet. If you don’t set one up, the default null-provider just rejects all attempts to log in. Let’s write one.

/* Add just before the ListenOn call. */
pop3.Provider = new MyProvider();

/* New class separate from the Program class. */
class MyProvider : IPOP3MailboxProvider
{
}

This won’t compile because MyProvider doesn’t meet the requirements of the interface. Let’s add those.

/* Inside the MyProvider class. */
public string Name => "My Provider";

public IPOP3Mailbox Authenticate(
    IPOP3ConnectionInfo info, 
    string username, 
    string password)
{
    return null;
}

Now, the service is just as unyielding to attempts to log-in, but we can confirm our provider code is running by adding a breakpoint to the Authenticate function. Now, when we attempt to log-in, we can see that the service has collected a username and password and is asking us if these are correct credentials or not. Returning a NULL means they’re not.

This might be a good opportunity to take a look at the info parameter. All of the functions where the listener calls to the provider will include this object, providing you with the client’s IP address, IDs, user names, etc. You don’t have to make use of them but your code may find the information useful.

A basic mailbox with no messages.

We can change our Authenticate function to actually test credentials. For our play project we’ll just accept one combination of user-name and password.

if (username == "me" && password == "passw0rd")
    return new MyMailbox();
else
    return null;

This will fail compilation because we’ve not written MyMailbox yet. Let’s go ahead and do that.

class MyMailbox : IPOP3Mailbox
{
}

Again, we’ll need to write all the requirements of the interface before we can run. So we can move on quickly, let’s provide just the minimum.

The first thing we’ll need is a list of the available messages. We’ll return an empty collection for now.

public IList<string> ListMessageUniqueIDs(
    IPOP3ConnectionInfo info)
    => new List<string>();

The service needs to know if a mailbox is read-only or not. Let’s say it isn’t.

public bool MailboxIsReadOnly(
    IPOP3ConnectionInfo info)
    => false;

The service might sometimes need to know is a message exists or not. For now, it doesn’t.

public bool MessageExists(
    IPOP3ConnectionInfo info,
    string uniqueID)
    => false;

The client might request the size of a message before it downloads it and the service will pass the request along to the provider. I’ve often suspected that clients don’t really need this so let’s just return your favorite positive integer.

public long MessageSize(
   IPOP3ConnectionInfo info, 
   string uniqueID)
   => 58;

The client will, in due course, request the contents of a message, but won’t because both the list-messages and message-exists will deny the existence of any messages, so for now, we can just return null.

public IMessageContent MessageContents(
    IPOP3ConnectionInfo info, 
    string uniqueID)
    => null;

Finally, we need to handle message deletion. Again, we don’t need to do anything just yet.

public void MessageDelete(
    IPOP3ConnectionInfo info, 
    IList<string> uniqueIDs)
{}

And we’re done. Run the code and log-in. Your mailbox will be perpetually empty but you can add breakpoints and confirm everything is running.

List the messages.

Now, let’s actually start with something useful. Let’s change our ListMessageUniqueIDs to return a list of filenames from a folder. You’ll want to replace the value of FOLDER with something that works for you.

const string FOLDER = @"C:\MyMailbox\";

public IList<string> ListMessageUniqueIDs(
    IPOP3ConnectionInfo info)
    => Directory.GetFiles(FOLDER)
           .Select(Path.GetFileName)
           .ToList();

public bool MessageExists(
    IPOP3ConnectionInfo info, 
    string uniqueID)
    => ListMessageUniqueIDs(info)
           .Contains(uniqueID);

Let’s also place an EML file into our mailbox folder. If you don’t have an EML file to hand, you can write your own using notepad. (It doesn’t care if the file has a “.txt” extension.)

Subject: I'm a very simple EML file.
From: me@example.com
To: you@example.com

Message body goes after a blank line.

If we save that into our mailbox folder and run up the POP3 service, we’ll see there’s a message available. It won’t be able to download it though.

Download the message,

The MessageContents function expects an new object that implements the IMessageContent interface.

/* Replace the MessageContents function. */
public IMessageContent MessageContents(
    IPOP3ConnectionInfo info, 
    string uniqueID)
{
    if (MessageExists(info, uniqueID))
        return new MyMessageContents(
                       Path.Combine(FOLDER, uniqueID));
    else
        return null;
}

/* New class. */
class MyMessageContents : IMessageContent
{
    List<string> lines;
    int index;

    public MyMessageContents(string path)
    {
        lines = File.ReadAllLines(path).ToList();
        index = 0;
    }

    public string NextLine()
        => (index < lines.Count) ? lines[index++] : null;

    public void Close()
    {
    }
}

This shows the requirements of the object that regurgitates a single message’s contents. A function that returns the next line, one-by-one, and another that’s called to close down the stream. The Close function could close opened file streams or delete temporary files, but we don’t need it to do anything in our play project.

Note that the command handling code inside this library has an extension that allows the client to ask for a message by an arbitrary unique ID. Make sure your code doesn’t allow, for example, “../../../../my-secret-file.txt”. Observe the code above checks that the requested unique ID is in the list of acceptable message IDs by going through MessageExists.

Delete messages.

The interface to delete messages passes along a collection of string IDs. This is necessary because the protocol requires that a set of messages are deleted in an atomic manner. Either all of them are deleted or none of them are deleted. We can’t have a situation where some of messages are deleted but some are still there.

But since this is just a play project, we can play fast and loose with such things.

public void MessageDelete(
     IPOP3ConnectionInfo info, 
     IList<string> uniqueIDs)
{
    foreach (var toDelete in uniqueIDs)
        if (MessageExists(info, toDelete))
            File.Delete(Path.Combine(FOLDER, toDelete));
}

What now?

I hope you enjoyed building your very own POP3 service using the POP3 Listener component. The above was a simple project to get you going.

billpg.com

Maybe think about your service could handle multiple users and how you’d check their passwords. What would be a good way to achieve atomic transactions on delete? What happens if someone deletes the file in a mailbox folder just as they’re about to download it?

If you do encounter an issue or you have a question, please open an issue on the project’s github page.

NEVER sanitize your inputs!

I’ve seen this cartoon being linked-to in so many comment threads and forums. Anytime its even a little bit applicable, someone will post a link to this cartoon. It has become so pervasive that if you search Google for “327”, it’ll be the third link returned, right after the Wikipedia pages for the year and the car.

Search “328” and the next XKCD is no-where to be seen.

The lesson, according to this character and so many real people on the internet, is to sanitize your inputs. The school in the cartoon didn’t sanitize its inputs – and one of its database tables got deleted!

Ask anyone about developing websites and they will tell you the first lesson is always to sanitize your inputs. In this day and age you’d have to be crazy not to sanitize your inputs.

Trouble is, sanitizing your inputs is very bad advice.

What went wrong at the school?

A quick aside for what’s going on in this cartoon. A new student named…
      Robert'); DROP TABLE Students; --
… joins a school and the administrators dutifully add a record to their database for the new student. The software takes the new student’s name and builds an SQL instruction.

    string sqlcmd = "INSERT INTO Students (name) VALUES ('" + name + "')";
    // INSERT INTO Students (name) VALUES ('Wilhelm von Hackensplat')

With normal names, the string would be a perfectly valid SQL command which will add a new record into the table named Students. But what about our friend Bobby Tables?

   INSERT INTO Students (name) VALUES ('Robert'); DROP TABLE Students; --')

Because that single-quote character wasn’t sanitized away, an extra command to drop the Students table crept in. This is what we know in the trade as an “SQL Injection” attack, as some unintentional SQL got injected in.

So let’s sanitize it?

We can’t allow people to go about running arbitrary SQL commands willy-nilly. Something must be done!

That single-quote character in the student’s name is clearly the problem, so we’ll take it out while building the SQL command. This fixes the command and you won’t find database tables disappearing. So why do I call this bad advice?

Trouble is, the single-quote character has a bit of a split personality. As well as being a quote, it’s also an apostrophe. Real people have real names with apostrophes and if you’ve ever seen a name where one has clearly been dropped, you’ve seen the mark of the sanitizer.

Perhaps this is why some Irish people prefer to spell their name using the letter Ă“. After years of having their name mangled by naive software developers, they made a new letter.

So forget sanitizing your inputs. What you need to do instead is to contain your inputs.

Contain my inputs?

The error made by the programmers at the school was that they failed to contain Bobby’s name. A student’s name is just a sequence of characters, so you need to use it in a way that could only be a sequence of characters.

Lucky for us, all good SQL access libraries support parameters. Instead, you write the command but with placeholders for the values to be added in little boxes later.

   INSERT INTO Students (name) VALUES (@name)

Here, there’s a clear demarcation between what’s the SQL command and what’s the value from outside. The student’s name is inside the little box where the apostrophe is just another character. The name has been contained and that destructive command inside can’t break out.

But that’s what we mean by “sanitize”!

Then you should stop calling it that. The word “sanitize” is a common enough word and most people understand it as a word for cleaning – removing the bad stuff and keeping the good stuff.

  “Did you sanitize the kitchen worktop?”
  “Yes. I put it in that sealed box over there.”
  “That’s not sanitizing!”

  “When I use a word, it means just what I choose it to mean. Neither more nor less.”

There is a real problem with software not accepting names with apostrophes, as discussed earlier. Real software developers are listening to the advice to sanitize and interpreting it to mean they should have the bad characters removed.

Isn’t sanitization still needed with HTML?

HTML has a similar problem with injection. Say you’re building a website that can take comments from the public, like this one, you’d want to prevent people from leaving comments with bits of scripting code inside.

   "I <i>love</i> this website! <script>alert('Baron von Hackensplat Was Here');</script>"

Its fine to allow the emphasis, but if your website also publishes the script, anyone else visiting your site will end up running that script.

Unfortunately, HTML doesn’t support a nice little box from whence nothing can escape, so we need to provide that box of containment ourselves. Any HTML from the public should be parsed and rewritten as safe-HTML, where only a safe subset of tags are allowed.

You might argue that this amounts to sanitization, but it betrays a bad mental model. Okay, you’ve dealt with the big problem, but forgotten about the little problems.

Have you ever seen a comment thread where, starting part way down the page, everything is in italics? This is caused by someone opening italics but not closing them. If your mental model is to sanitize, your natural reaction would be remove the ability to use italics. If your mental model is instead to contain, you know that italics is really harmless and just needs to be closed when left open.

Cross-out Cross Site Scripting

In closing, I’d just like to appeal to the industry to drop the phrase “Cross-Site-Scripting” and call it “HTML Injection” instead.

Any scripting that you didn’t write or don’t trust, cross-site or not, is a very bad thing to have on your website. Putting “Scripting” in the name makes people think of scripting as the problem but its so much more than that.

Calling it “HTML Injection” draws an obvious parallel with “SQL Injection”. Its the same problem with the same solution.

Credits: XKCD 327 – Exploits of a mom by Randall Munroe.
“When I use a word…” is a quote from Lewis Carroll’s “Through the Looking-Glass”.
Second: sanitize the gloves by Thomas Cizauskas.
Fun with cling film by Elizabeth Gomm.