Last night I squashed a bug that has plagued me for several years. Its detection provides a good tale.
First, the bug itself. Here's the code containing the bug, for those of you who actually read C++:
while (k < LongTermListCount)
{
    LongTermList[k] = LongTermList[k + 1];
    z = LongTermList[k];
    Plan[i].PlanLongTermIndex = k;
    ++k;
}
Now, the bug might not be obvious, but it's right there. The line that refers to "Plan[i]" should instead refer to "Plan[z]". After all, what's the point of calculating z if we never use it? This kind of mistake is quite common in programming and is always hard to detect by inspection. Normally such bugs yield outrageous malfunctions that lead us to them immediately. But this bug was different; its malicious effects were subtle and, worst of all, required the conjunction of TWO unlikely circumstances: a character must die while there are two or more unrealized plans waiting in the long-term plan list. (Long-term plans are those that require such a long preparation time that they must be scheduled to take place on some future date, rather than on the day that they are conceived, as is usually the case.)
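For those who read C++, here is a minimal sketch of what the corrected loop might look like in a self-contained form. It rests on assumptions the snippet above does not confirm: that the long-term list stores indices into the plan array, and that the loop exists to close the gap left when one entry is removed. The names PlanRecord and RemoveLongTermEntry, the use of std::vector, the -1 marker for "not in the list", and the loop bound are all hypothetical choices made for illustration, not the engine's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the engine's plan record; only the field
// named in the original snippet is modeled.
struct PlanRecord {
    int PlanLongTermIndex = -1;  // slot in the long-term list, -1 if absent (assumed convention)
};

// Removes the entry at position removeAt by shifting every later entry
// down one slot, then fixes the back-pointer of each plan that moved.
// Assumes the list holds valid indices into plans.
void RemoveLongTermEntry(std::vector<int>& longTermList,
                         std::vector<PlanRecord>& plans,
                         std::size_t removeAt)
{
    assert(removeAt < longTermList.size());

    // The removed plan no longer occupies a slot in the list.
    plans[longTermList[removeAt]].PlanLongTermIndex = -1;

    for (std::size_t k = removeAt; k + 1 < longTermList.size(); ++k) {
        longTermList[k] = longTermList[k + 1];
        int z = longTermList[k];  // the plan that just moved into slot k
        // The fix: update the plan that moved (z), not some unrelated plan i.
        plans[z].PlanLongTermIndex = static_cast<int>(k);
    }

    longTermList.pop_back();  // drop the now-duplicated last slot
}
```

With the buggy version, every pass through the loop would overwrite the same plan's index while the plans that actually moved kept pointing at their old slots, which is the kind of quiet corruption that only surfaces much later.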
The detection of the bug involved three fortuitous events. First, Laura Mixon had reported several cases of the infamous CollectPlanGarbage bug. I had seen this bug for several years, but had never been able to trap it; every time I looked hard for it, the circumstances weren't correct and the bug was invisible. But Laura had observed several instances of the bug, so I knew that it was lurking out there somewhere. Second, Laura had also reported some apparent anomalies in the handling of RemoveActor, the routine that kills off a character. Third, and this was the real stroke of luck, I sat down with Shattertown to sniff around for a completely different problem and stumbled upon a situation that triggered the CollectPlanGarbage bug. This was a big break; if I could trigger the bug reliably, I could track it down. Sure enough, the procedure I followed was repeatable: every time I carried out the same sequence of steps, the bug triggered.
From that point forward, there was no question as to the outcome. Any bug that shows itself reliably is subject to the old Army maxim: "If I can see it, I can shoot it; if I can shoot it, I can hit it; if I can hit it, I can kill it." However, tracing it backwards from CollectPlanGarbage to the code segment above proved to be a tortuous hunt. The story engine is a very complex piece of code, and the delay between the bug's infection and the appearance of its symptoms was millions of machine cycles. I tracked the fault backward through four different routines before I found the culprit.
I'm sure that there are many more bugs lurking in the story engine. I'll get them, too, someday. But this was the Jesse James of bugs, and for now, I can sleep soundly at night.