Part 2

If you can't change it than how do you use it?

You use complex immutable value types the same way you use simple ones:

int i = 0;

i = i + 1;

i = i + 2;

print i;

(We expect this to output '3')

Becomes:

ImTree i = new ImTree("a",1);

i = i.Insert("b",2);

i = i.Insert("c",3);

print i;

The pattern is exactly the same, and we expect the output to be something like '{"a" 1} {"b" 2} {"c" 3}'.

Each call to ImTree.Insert(Key, Value) creates a new tree and returns it; just as a call to +(int,int) creates a new integer and returns it. Most people are comfortable with overwriting integers, since their storage is inline, so what happens to the ImTree that gets overwritten? And what if the ImTree has a lot of nodes, wouldn't making a new one get expensive? And where did "HashTable" go?

Let's rewind a bid. We can't build a true hash table as a value type without doing complete copies, and that would get really slow as the table size increased. What we can do is build a binary search tree and sort it on the key type. This isn't as fast as a hash table for lookups, but creating a new tree based on inserting or deleting an entry can be done much faster. That's because large portions of the tree can be shared with the new tree. I'm calling this new type of tree an ImTree, short for immutable tree.

Now when one value "overwrites" another we're not really overwriting it.

int i = 0;

i = 1;

The 1 doesn't overwrite the 0; we've just changes what i is, not what 0 is. That means in:

ImTree i = new ImTree("a", 1);

i = i.Insert("b", 2);

The ImTree '{"a",1}' still exists after we change what i is, just as 0 still exists above.

Of course that takes up space, and if know one is using that ImTree anymore we'll want to get rid of it. How do we know if some code somewhere is using it? In a mutlithreaded environment we can't. Seriously, we can't. If we handed that ImTree off to another thread, or even some opaque third party library, then it could still be being used by that code.

Thankfully, we don't have to worry about it; instead we let the geniuses that came up with garbage collection deal with it. You see the system can figure it out, and it can do it in a fashion that is much more efficient than we can. Sure we could try and reference out the ImTrees and free them when the count hit zero, but that would mean either putting a lock around every increment and decrement (very very slow), or using atomics to increment and decrement (still slow). Today's modern garbage collectors avoid all that but using techniques that allow them to batch up the analysis of the heap and minimize cross thread contention. So use a language like C# (or anything that runs on CLR), or haskell, or basically anything with a real decent garbage collector if you're going to go multithreaded.

Okay, so how do we create the new tree quickly?

Let's look at a simple case of adding "H" to this tree:

The red nodes have to be created, but the black nodes already exist. So rather than having to copy the whole tree we can reuse the C sub-tree and only build the new nodes needed. We don't have to worry about someone coming along and changing the C sub-tree later because, like the whole tree, the sub-trees are immutable.

If your familiar with normal binary trees you might be wondering why H didn't end up as a right child of F. In a regular binary tree you insert at the leafs, but for our immutable tree we're going to perform all operations at the root and we've going to keep the tree well balanced. Exactly how we do this is the subject of the next few installments.

Page updated

Report abuse