Hunting a bug in the Amstrad CPC version of Shinobi, by Richard Aplin
Richard Aplin is the developer of the Amstrad CPC port (1989) of the arcade game Shinobi (1987). During its development he spent two weeks hunting a rare bug in Shinobi, and he tells the story on Twitter. With his permission I am publishing on this site all of the text posted on Twitter (the photos will take more time).
As for the French translation of the text below, well, that takes time, so it will follow shortly.
Back in the days of 8-bit micros, in this case the Amstrad CPC (Z80-based, popular in Europe), I worked for a games company, rewriting a conversion of the arcade game Shinobi for the Amstrad CPC.
There were all sorts of geeky, tweaky technical tricks to make it run fast and look nice (coded in assembly language), and all went well: the game turned out pretty nice, ran fast and played pretty well. Until one day, when it was nearly done, play testers reported that it occasionally crashed on one level. Boom! Reset. It was really hard to reproduce.
Nobody could come up with anything they actually _did_ to make it crash (or not); it was about one time in 20 (IIRC), when fighting a boss.
I had nothing fancy like a logic analyzer or in-circuit emulator or other hardware debugging tools, just a regular retail computer. So it just crashed, once in a while, when playing one specific part of the game.
Sure, just some coding bug, like any other, not uncommon. But WHERE and WHY and HOW!? And why so hard to reproduce? You could play for hours with no problem, but just when it seemed like it was a ghost, or maybe fixed for no known reason... BOOM, reset.
This went ON and ON and I was utterly mystified. I just could _not_ induce this bug to happen more often (the key to finding and fixing it). No matter what I tried, I could not even find a way to reproduce it reliably, let alone the root cause.
I started doing stuff like checksumming the RAM every frame, looking for some sort of random corruption, putting all sorts of checks in there that slowed it down to a crawl, and still nothing. The bug seemingly came and went as it pleased, never in quite the same place.
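As an illustration of the per-frame checksumming idea, here is a minimal sketch in Python rather than the original Z80 assembly. The memory layout (program code at 0x4000-0x7FFF), the 256-byte page size and the function names are all assumptions made for the example, not details of the real game. Checksumming per page instead of over the whole region narrows down where the damage is.

```python
# Hypothetical layout: a 64 KB address space with program code assumed
# to live at 0x4000-0x7FFF (not taken from the real game).
RAM_SIZE = 0x10000
CODE_START, CODE_END = 0x4000, 0x8000
BLOCK = 0x100  # checksum granularity: one sum per 256-byte page


def page_checksums(ram, start=CODE_START, end=CODE_END, block=BLOCK):
    """Return one 16-bit additive checksum per page of the code region."""
    return [sum(ram[a:a + block]) & 0xFFFF for a in range(start, end, block)]


def find_corrupted_pages(ram, reference):
    """Return the start address of every page whose checksum has changed."""
    current = page_checksums(ram)
    return [CODE_START + i * BLOCK
            for i, (old, new) in enumerate(zip(reference, current))
            if old != new]


ram = bytearray(RAM_SIZE)
for addr in range(CODE_START, CODE_END):
    ram[addr] = (addr * 7 + 3) & 0xFF  # stand-in for real program bytes

# Taken once, while the code is known to be good.
reference = page_checksums(ram)

# A normal frame: nothing has changed.
clean = find_corrupted_pages(ram, reference)

# Simulate a stray increment of one code byte, then check again.
ram[0x5A37] = (ram[0x5A37] + 1) & 0xFF
damaged = find_corrupted_pages(ram, reference)
print([hex(a) for a in damaged])  # ['0x5a00']
```

A plain additive checksum can miss two changes that cancel out, but it is enough to catch a single byte being incremented, and it is cheap enough to imagine running every frame.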
Until one day I got lucky. I caught it in the act! One single byte in the middle of my program code got trashed - and this time I caught it _before_ the whole thing blew up. But how?! What on earth was causing this? (OH for a hardware logic analyzer!)
Finally, _finally_, after probably two weeks of solid bug-hunting and hair-tearing I found it.
So back in those days, it was customary for the game's music to be written by someone else and provided as a binary blob of code plus data (e.g. 4 KB) that you would just call once a frame, and it took care of controlling the sound chip and playing whatever music tracks were needed.
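The "call it once a frame" arrangement can be sketched roughly as follows. This is a Python model, not the real player: the `MusicBlob` class, its `tick` method, the note data and the 16-entry register list are all hypothetical stand-ins. The point it tries to show is that the player runs in the same unprotected address space as the game.

```python
class MusicBlob:
    """Stand-in for the opaque music player: code plus data, no source."""

    def __init__(self, tune):
        self._tune = list(tune)  # hypothetical note data
        self._pos = 0

    def tick(self, ram, sound_regs):
        """Advance the tune by one frame and write to the simulated sound chip."""
        note = self._tune[self._pos % len(self._tune)]
        sound_regs[0] = note     # hypothetical register 0: tone period
        self._pos += 1
        # Note that the player is handed the whole of RAM: nothing stops
        # it from reading or writing outside its own data.


def run_frames(player, ram, frames):
    """Run a bare-bones game loop and record what the player wrote each frame."""
    sound_regs = [0] * 16        # simulated sound chip registers (assumed count)
    history = []
    for _ in range(frames):
        # ... game logic and rendering would go here ...
        player.tick(ram, sound_regs)  # the single call per frame
        history.append(sound_regs[0])
    return history


ram = bytearray(0x10000)
history = run_frames(MusicBlob([10, 20, 30]), ram, 5)
print(history)  # [10, 20, 30, 10, 20]
```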
And it turned out that the music player (I didn't have source code) had a bug in it. Not a big bug. A teeeeeny little bug. It didn't audibly affect the music at all, but one _single_ note, on one channel, in one of the tunes on one of the levels, used a wrong data byte.
Normally, when that single duff musical note played, nothing bad happened; it was a fairly harmless bug. However, it caused the music player to read the wrong byte of RAM: not just off by one byte, but off by tens of KBytes... in fact, it ended up reading a byte of the display RAM.
If I recall correctly, it ended up - at that instant - reading a single byte of the display right around where this green circle is, in the upper left corner.
And when that single bad musical note in the tune played, IF the top bit (and only the top bit) of that pixel was a 1, it would then take that as a memory address ELSEWHERE in RAM and increment that location, which corrupted a single byte inside my program code. That led to a subsequent crash - not quite immediate, but a couple of seconds later, when a baddie decided to shoot at you.
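To make that chain of events concrete, here is a rough Python simulation. Every address in it is made up for illustration: 0xC050 is simply a byte inside the CPC's default screen area (which starts at 0xC000; the game may well have put its screen elsewhere), and 0x5A37 stands in for the program byte that got hit. How the real player turned the byte it read into a target address is not described in the thread, so the sketch hard-codes the target and only tests bit 7.

```python
RAM_SIZE = 0x10000
SCREEN_BYTE_ADDR = 0xC050  # hypothetical: a byte near the top-left of display RAM
CODE_BYTE_ADDR = 0x5A37    # hypothetical: the program byte that ends up incremented


def play_bad_note(ram):
    """Model the one faulty note. Returns True if memory was corrupted."""
    pixel_byte = ram[SCREEN_BYTE_ADDR]  # the stray read, tens of KB off target
    if pixel_byte & 0x80:               # assumption: only the top bit is tested
        # Increment with 8-bit wraparound, as a Z80 memory increment would.
        ram[CODE_BYTE_ADDR] = (ram[CODE_BYTE_ADDR] + 1) & 0xFF
        return True
    return False


ram = bytearray(RAM_SIZE)
ram[CODE_BYTE_ADDR] = 0x3E              # stand-in for one byte of program code

# Case 1: the pixel byte has its top bit clear, so the note is harmless.
ram[SCREEN_BYTE_ADDR] = 0x00
harmless = play_bad_note(ram)
code_after_harmless = ram[CODE_BYTE_ADDR]

# Case 2: the scrolling display happens to put a top-bit-set byte there.
ram[SCREEN_BYTE_ADDR] = 0x80
corrupted = play_bad_note(ram)
code_after_corrupt = ram[CODE_BYTE_ADDR]

print(harmless, hex(code_after_harmless))  # False 0x3e
print(corrupted, hex(code_after_corrupt))  # True 0x3f
```

Seen this way, the low reproduction rate makes sense: the corruption depends on what happens to be on screen at one address during one particular frame.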
This was a 2D scrolling game of course, so you were constantly jumping around and doing stuff, fighting ninjas. The crash only happened if ONE pixel of the display was a certain color at the instant that one single note of ONE background tune played, and it wasn't in my code.
I vividly remember finally discovering the root cause (disassembling and patching the music player, finally catching it 'in the act', and figuring out the long chain of events). To this day I have never had a bug as hard as that one. It was SO rewarding to find.
I've had a bunch of equally obscure bugs in the decades since then, but the tools got so much better (protected memory, logic analyzers, CPUs that don't just explode on contact with bad data) that nothing has ever been as difficult to track down as that one. It was a great lesson: if you persevere for long enough you will win in the end against a computer - bugs can't hide forever. Ever since then I've known it's just a matter of time and patience. Nowadays I relish a good bug, I rub my hands and chuckle - the game is ON! ;-).