Debugging an intermittent crash in a sea of code!

Cosmium · Post by **Cosmium** » Sun Apr 30, 2023 8:07 pm

Wondered if anyone had a good approach to debugging rare, intermittent crashes in Z80 code.

My normal method of debugging is to print expected values on screen and delve deeper if they don't align with my expectations, or use the debugger to step through the recently added code, or add breakpoints (individually, or in a range) in the general area that seems to be the culprit.

Trouble is, at this point there are about 14,500 lines of source code and it's hard to know where to look! I've tried to understand the game conditions before the crash occurs, but there doesn't seem to be a pattern. I've considered using an RZX recording to get closer to the problem but am not familiar with the process or whether it could help with debugging.

Short of stepping through the code and hoping for the crash to occur (which resets the Spectrum back to the (c) message), are there any tips for debugging this sort of rare crash? Any usual suspects that cause this type of thing?!

WhatHoSnorkers · Post by **WhatHoSnorkers** » Sun Apr 30, 2023 8:35 pm

Would looking at ERRSP help at all?

sn3j · Post by **sn3j** » Sun Apr 30, 2023 8:40 pm

If it's a bad jp (hl) or ret, you could put a 207,26 in front of the stack, and keep the ram in front of that all zeroes.
So you have a ~50% chance the faulty jump ends up in that contiguous area of zeros and finally hits the 207.

Ralf · Post by **Ralf** » Sun Apr 30, 2023 8:42 pm

Does your crash make the Spectrum reset and go back to BASIC, or does it just hang the program?

If your crash resets the Spectrum then I could have a tip. In some emulators like Spin you could set a breakpoint which fires when ROM code is executed (the currently executed instruction is at an address in the range 0-16383).

Then, when your breakpoint fires, try to step one instruction back and if you succeed, you will know where your program crashed.

Unfortunately in most emulators you cannot just undo your last instruction which is a shame because it would be really useful. But you may think of some workarounds.

edjones · Post by **edjones** » Sun Apr 30, 2023 9:05 pm

AndyC · Post by **AndyC** » Sun Apr 30, 2023 9:30 pm

If you have a "main loop" type arrangement, try monitoring the value of SP on every pass. If it's constantly decreasing, you're probably leaking stack space and that almost inevitably leads to a crash at some point.

bob_fossil · Post by **bob_fossil** » Sun Apr 30, 2023 11:05 pm

It sounds like it's stack related. All my spectacular game development crashes back to the copyright prompt or other sections of the ROM were due to mismatched pushes and pops. As you're using an emulator you could try adding write breakpoints to the addresses around the stack area to see if you're overflowing your stack with too many pushes or underflowing into the memory above your starting SP with too many pops.

Morkin · Post by **Morkin** » Sun Apr 30, 2023 11:28 pm

Yep, seconded (not that we're voting), for me just about every intermittent crash that happened after a while of play testing was caused by a PUSH/POP mismatch.

I figured that as it wasn't happening a lot, it was probably related to an activity not happening every game loop (otherwise it'd crash pretty much straight away), so I narrowed it down that way. Still took ages.
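With a large source file, a crude first pass for PUSH/POP mismatches can be scripted. The following is only a rough sketch in Python, not a tool anyone in this thread used: `push_pop_imbalance` is a hypothetical helper that assumes every routine starts at a label ending in a colon and that `;` starts a comment. It simply counts PUSH and POP between labels, so it knows nothing about branches, local labels inside a routine, or deliberate stack tricks, and anything it flags is a place to look rather than a confirmed bug.

```python
import re

# Hypothetical helper: per-label PUSH/POP count for Z80 assembly source text.
LABEL_RE = re.compile(r"^([A-Za-z_.][\w.]*):")

def push_pop_imbalance(source):
    """Return {label: pushes - pops} for every label whose counts differ."""
    balance = {}
    current = None
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        match = LABEL_RE.match(line)
        if match:                             # a new routine starts here
            current = match.group(1)
            balance.setdefault(current, 0)
            line = line[match.end():].strip() # an instruction may follow the label
            if not line:
                continue
        if current is None:
            continue
        opcode = line.split()[0].lower()
        if opcode == "push":
            balance[current] += 1
        elif opcode == "pop":
            balance[current] -= 1
    return {name: n for name, n in balance.items() if n != 0}
```

Feeding it the whole source as one string returns only the labels worth checking; a positive number means more pushes than pops under that label.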

cmal · Post by **cmal** » Sun Apr 30, 2023 11:36 pm

SpecEmu has a useful tool in the debugger to look at the execution history. In the debugger click on View -> Execution History. It's the same as looking at the stack pointer memory but it's quicker and less confusing when SP gets modified in the code.

deanysoft · Post by **deanysoft** » Mon May 01, 2023 12:08 am

The trace method is likely to be the only way to track down a truly intermittent crash but if you're not using a dev system that supports it, it could be quite a process to get your code onto one that does. Then you have the system tracing everything until you get the crash. I've not seen the specemu trace system (other than on that video) but usually you end up with a huge list file of instructions executed and after the crash you simply wind back through it until you see your code go awry. It can help if you can hide the interrupts that are processed as that will bulk your trace file. You could turn interrupts off but if that's the source of the crash...!
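If the trace can be reduced to a plain list of program counter values, the "wind back" step for a crash that resets to the ROM can be partly automated. This is a minimal sketch under loud assumptions: the trace format is emulator-specific, so the input here is simply a Python list of PC values already parsed from it, `last_ram_pc_before_rom` is a hypothetical name, and the program is assumed never to enter ROM on purpose (no ROM calls, and IM 2 rather than IM 1 interrupts).

```python
ROM_END = 0x4000  # on a 48K Spectrum the ROM occupies 0x0000-0x3FFF

def last_ram_pc_before_rom(trace, rom_end=ROM_END):
    """Return the RAM address executed just before the first entry into ROM.

    trace is an iterable of program counter values in execution order.
    Returns None if execution never moves from RAM into ROM.
    """
    previous = None
    for pc in trace:
        came_from_ram = previous is not None and previous >= rom_end
        if pc < rom_end and came_from_ram:
            return previous
        previous = pc
    return None
```

The returned address is where the bad jump or RET was executed, which is usually a better starting point for reading the trace backwards than the crash itself.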

Personally, I've never needed this trace facility on the ZX like I used to on 64180 or 68K MICEd up hardware (nothing crashes like an OCTART handler). My crashes on the Spectrum are usually, as others have stated...

- stack based
- dumping a word over neighbouring bytes (this is a favourite! sometimes I crap on another variable, sometimes some code - endless fun)
- indexing off the end of a table to get out of bound values
- self modding the code with an out of range value
- drawing a sprite in the wrong place or at the wrong size

It's probably well after the event now but if you don't already, try and keep some source control (git etc) and regularly use it. If you suddenly notice crashes occurring, you can at least compare your recent changes and scrutinise your additions.

You could also simply start commenting routines out bit by bit. Run your code until the crash occurs OR doesn't occur. Either way, you learn something useful.

Ast A. Moore · Post by **Ast A. Moore** » Mon May 01, 2023 5:49 am

Yup, if it’s a runaway stack issue, RZX is your friend. It helped me out with a couple of bizarre and nasty (and intermittent) issues.

RZX is really pretty straightforward. You just record your game; then you simply play back the recording, mark down the approximate time the crash occurs, and break into the debugger slightly before that on the next playback.

Cosmium · Post by **Cosmium** » Mon May 01, 2023 7:52 am

Some fantastic suggestions, thanks!

And when I track this bug down I'll post back here what it was.

Cosmium · Post by **Cosmium** » Wed May 24, 2023 4:22 am

Aha!

Using some of the neat debugging techniques suggested here I was thankfully able to find and fix the intermittent crash I was experiencing.

It was to do with an interaction between my IM2 service routine and code in the main game loop. I'd assumed the interrupt code and the game code operated on their own code and data. I'd assumed wrong!

There was one particular shared subroutine that relies on self modifying code for speed. To provide the correct starting point for an unrolled LDI copy, it writes a calculated value into the JR offset enabling a jump into a block of LDIs terminated with a JP PE, which then loops back for the remaining blocks of bytes to copy.

The game code occasionally calls this LDI copy subroutine, but under very rare circumstances the mode 2 interrupt happens just at the precise moment the JR offset's been written, and just before the very next instruction (the JR) has executed.

If the interrupt routine happens to also execute the same LDI copy subroutine, it modifies the same JR offset so that by the time the interrupt is over, the game code's JR offset is invalid and an incorrect number of LDI copies occur, meaning the JP PE at the end of the loop fails, and LDIs continue past the expected endpoint.
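For anyone who wants to see why a wrong entry point is so destructive, here is a toy model of the race in Python. It is not the actual routine: the block size of 16 LDIs and all the names are assumptions, and the patched value is stored as "LDIs to skip" rather than a real JR displacement (each LDI is two bytes, so the real operand would be twice that). The only Z80 facts it relies on are that LDI decrements BC and that JP PE after an LDI is taken while BC is non-zero.

```python
BLOCK = 16          # LDIs per unrolled block (assumed size)
RUNAWAY = 0x10000   # stop the model after this many LDIs

def entry_skip(count):
    """LDIs to skip on the first pass so BC reaches zero exactly at a block end."""
    return (BLOCK - count % BLOCK) % BLOCK

def unrolled_copy(count, skip):
    """Model the LDI block for count >= 1; return (bytes_copied, ended_cleanly)."""
    bc, copied, pos = count, 0, skip
    while copied < RUNAWAY:
        for _ in range(pos, BLOCK):
            bc = (bc - 1) & 0xFFFF   # each LDI decrements BC (16-bit wrap)
            copied += 1
        if bc == 0:                  # JP PE not taken: last LDI left BC at zero
            return copied, True
        pos = 0                      # JP PE taken: next pass runs a full block
    return copied, False             # never saw BC == 0 at a block end

def game_copy(count, isr_count=None):
    """Game-side call; isr_count models an interrupt between the write and the JR."""
    cell = {"skip": entry_skip(count)}         # game patches the shared operand
    if isr_count is not None:                  # interrupt fires right here
        cell["skip"] = entry_skip(isr_count)   # ISR patches the same operand
        unrolled_copy(isr_count, cell["skip"]) # ISR's own copy works fine
    return unrolled_copy(count, cell["skip"])  # game's JR uses whatever is there
```

In this model `game_copy(20)` copies exactly 20 bytes and ends cleanly, while `game_copy(20, isr_count=10)` never sees BC at zero when the JP PE is reached, so the copy keeps going until the model gives up. On real hardware that means the LDIs trample memory well past the intended endpoint.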

Anyway I've successfully modified the code to avoid this scenario going forward, and am very happy to have the expanded range of debugging techniques offered here "in the toolbox", ready for next time. Much appreciated

WhatHoSnorkers · Post by **WhatHoSnorkers** » Wed May 24, 2023 9:46 am

Brilliant you fixed it, a clever speed technique and awesome that you've told us what it was!

Morkin · Post by **Morkin** » Wed May 24, 2023 11:12 am

Fair play, I would never have figured that one out

PROSM · Post by **PROSM** » Wed May 24, 2023 12:09 pm

Great that you managed to fix it up! Bugs arising from concurrency are some of the trickiest ones to identify.

Bedazzle · Post by **Bedazzle** » Tue May 30, 2023 4:12 pm

I'll add my 5 cents.
On a 128K machine, another cause is having the wrong RAM page switched in: the program expects that page to contain some code, while it does not...
