A part of the Kashyyyk2 IRC client is a set of circular buffers that hold incoming message contents for each server. Every server has a 640 byte buffer that is filled, and on every write into the buffer we attempt to read out as many messages as possible. This should be pretty familiar to you if you've dealt with message delimiting over TCP before.

Although Kashyyyk2 doesn't really have enough unit testing, I have been writing tests for subsystems whenever I encounter unexpected behavior. I know this kind of "Innocent until proven buggy" development isn't ideal, but it's also common for projects like this that are in part based on existing code that doesn't have serious unit tests.

One of the tests I wrote for the buffer is a test for what happens when wraparound occurs. This is when the write index rolls over to the start of the buffer's memory. This is the "buggy" part of every circular buffer ever. It requires careful handling of when to reset the write index, when to reset the read index, what you consider a zero-lengh buffer, whether a buffer appears empty or full, etc.

The code in the current unit tests is (as I discovered later) incorrect when testing for this. Let me explain exactly how this buffer works.

When we try to read data from the IRC server, we read in 312 bytes. This number is largely arbitrary. The IRC standard says that no message should be more than 256 characters long including the crlf, but I've tested this, and some servers allow up to 257 bytes including the crlf, and some allow up to 256 NOT including the crlf. So I chose a number that is clearly too large.

Upon reading up to 312 bytes, we copy this into the buffer for the server. The client is then expected to try to get messages out of the buffer until no new messages are yielded (as a sidenote, this would be an excellent place to use a coroutine, if C or C++ had them). The buffer is scanned for messages by starting at the read index, and checking until we find a crlf, or until we hit the write index. If we find a crlf, the data from the read index until the crlf is copied out, and then the read index is placed just after it. If we hit the write index before finding a crlf, then we report no available messages.

If you're curious why all this is necessary, keep in mind two things. One, because of how TCP network stacks function, there is no guarantee that a single send() call on the server will cause a single equivalent recv() on the client. Second, even if that were true, the server is fully allowed to group multiple messages into a single send(). Or even worse, the server could send() half of a message, wait an arbitrary amount of time (literally minutes if it wanted), and then send the second half. The server could even send the first half of another message along with it!

The solution is that you must buffer the results. You could optimize some cases, like each send() resulting in a single recv() (which is fairly common), or optimizing that most recv() calls will receive some number of whole messages. But you must be able to handle what happens when this isn't true.

We choose a size able to store two entire messages for the worst case scenario, where the we recv() all but the last byte (lf) of a max-sized message, and then the next recv() is that last byte, plus another max-sized message. There are ways to shrink the buffer down to exactly one message size, but buffering two max-size messages-worth of data reduces the number of recv calls (which is a real win).

So when I was getting some mangled messages, I decided to write some unit tests on the buffer. I wrote some tests that wrote one message, many messages, tried to read when there would be no new messages (from a new buffer, after one write, after many writes), etc. One test I really wanted was to force rollover. So I wrote out 311 bytes of message data, read in the first few messages to move the read index forward and make room, and then wrote two more messages so that one message would cross the end of the buffer, and another would be written fully with the write index rolled over.

But this is wrong. The buffer is 640 bytes, not 312. And I (smartly?) hid this fact. The buffer is a black box, all the client is told is to write up to 312 bytes, try to get messages out, repeat.
So how would you write a unit test to check rollover?
I got the answer today. I encountered the following quote while reading about FailFast on c2:

Whenever I write a CircularBuffer, I always initialize the pointers just before the the end of the buffer (rather than at the very beginning).

That way, if I mess up the wrap-around code, I don't get a HeisenBug that bites only after hours of operation. It becomes immediately obvious after the first few inserts.

(The wrap-around code is especially tricky when inserting variable-length objects.)

-- DavidCary

Well...that really does solve it. Although you are now relying on the implementation to have this property, it seems like a clear win to do so. Suddenly the hello-world test checks for correct rollover.

I'm not 100% sure this really qualifies as TDD or Fail Fast, though. This makes all the unit tests implicitly test more than one case (both rollover and the intended test case), although truly testing "just one thing" is impossible. And this doesn't exactly sound like Fail Fast to me, it's more like the converse. Instead of failing as soon as we detect any error, we are setting up the expected failure case as soon as we can. But it seems like a clearly beneficial defensive method.