Debugging an intermittent crash in a sea of code!
Wondered if anyone had a good approach to debugging rare, intermittent crashes in Z80 code.
My normal method of debugging is to print expected values on screen and delve deeper if they don't align with my expectations, or use the debugger to step through the recently added code, or add breakpoints (individually, or in a range) in the general area that seems to be the culprit.
Trouble is, at this point there are about 14,500 lines of source code and it's hard to know where to look! I've tried to understand the game conditions before the crash occurs, but there doesn't seem to be a pattern. I've considered using an RZX recording to get closer to the problem but am not familiar with the process, or whether it could help with debugging.
Short of stepping through the code and hoping for the crash to occur (which resets the Spectrum back to the (c) message), are there any tips for debugging these sorts of rare crashes? Any usual suspects that cause this type of thing?!
Cosmium
https://cosmium.itch.io/
- WhatHoSnorkers
- Manic Miner
- Posts: 254
- Joined: Tue Dec 10, 2019 3:22 pm
Re: Debugging an intermittent crash in a sea of code!
Would looking at ERRSP help at all?
I have a little YouTube channel of nonsense
https://www.youtube.com/c/JamesOGradyWhatHoSnorkers
Re: Debugging an intermittent crash in a sea of code!
If it's a bad JP (HL) or RET, you could put the bytes 207,26 (RST 8 followed by an error code) in front of the stack, and keep the RAM in front of that all zeroes.
So you have a ~50% chance the faulty jump ends up in that contiguous area of zeros and finally hits the 207.
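To make the idea concrete, here is a rough Python model of it (a toy, not Z80 code). It assumes the zeroed area behaves as a run of NOPs, so a stray jump landing anywhere inside it slides forward until it reaches the 207. The names `build_sled` and `slide` and the 256-byte size are made up for the illustration; only the 207,26 pair comes from the post.

```python
NOP = 0x00
RST_08 = 0xCF  # decimal 207, the ROM error restart


def build_sled(size, error_code=26):
    """Return `size` bytes: zeros (NOPs) followed by RST 08 and its code byte."""
    if size < 2:
        raise ValueError("need room for the two trap bytes")
    return bytearray([NOP] * (size - 2) + [RST_08, error_code])


def slide(mem, pc):
    """Follow NOPs from `pc`; return the address of the first non-NOP opcode."""
    while mem[pc] == NOP:
        pc += 1
    return pc


sled = build_sled(256)
landing_points = [0, 17, 200, 253]  # arbitrary places a bad jump might land
stops = [slide(sled, pc) for pc in landing_points]
print(stops)
```

Every landing point inside the zeroed area ends at the same trap address, which is what turns a random reset into a repeatable error report.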
Last edited by sn3j on Sun Apr 30, 2023 8:47 pm, edited 1 time in total.
POKE 23614,10: STOP 1..0 hold, SS/m/n colors, b/spc toggle
Re: Debugging an intermittent crash in a sea of code!
Does your crash make the Spectrum reset and go back to BASIC, or does it just hang the program?
If your crash resets the Spectrum then I might have a tip. In some emulators, like Spin, you can set a breakpoint which fires when ROM code is executed (the currently executing instruction is at an address in the range 0-16383).
Then, when your breakpoint fires, try to step one instruction back and if you succeed, you will know where your program crashed.
Unfortunately, in most emulators you cannot just undo your last instruction, which is a shame because it would be really useful. But you may think of some workarounds.
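One possible workaround, sketched in Python rather than tied to any particular emulator: keep a short rolling history of program counter values, and dump it the moment execution enters ROM. The function name `find_rom_entry` and the list-of-PCs input are assumptions for the sketch; how you get the PC values out of your emulator will vary.

```python
from collections import deque

ROM_END = 0x4000  # ROM occupies 0x0000-0x3FFF on a 48K Spectrum


def find_rom_entry(pcs, history=8):
    """Return the last `history` PCs seen before execution first enters ROM.

    Returns None if the PC never drops below ROM_END.
    """
    recent = deque(maxlen=history)  # old entries fall off automatically
    for pc in pcs:
        if pc < ROM_END:
            return list(recent)
        recent.append(pc)
    return None


# A made-up run: game code, a stray jump target, then address 0 (reset).
pcs = [0x8000, 0x8003, 0x8006, 0x9ABC, 0x0000]
print([hex(pc) for pc in find_rom_entry(pcs, history=3)])
```

This only helps if the program never enters ROM on purpose. If it calls ROM routines or runs with the standard IM 1 interrupt, the check would need to be narrowed to the addresses that indicate a reset.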
Re: Debugging an intermittent crash in a sea of code!
If you have a "main loop" type arrangement, try monitoring the value of SP on every pass. If it's constantly decreasing, you're probably leaking stack space and that almost inevitably leads to a crash at some point.
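As a rough illustration of what to look for, here is a Python sketch that takes SP values sampled at the same point of each main loop pass and flags a steady downward drift. The sample values and the function names are invented for the example.

```python
def sp_drift(samples):
    """SP change from the first to the last sample (negative = stack growing)."""
    return samples[-1] - samples[0]


def looks_like_leak(samples):
    """True if SP never rises between passes and ends lower than it started."""
    never_rises = all(b <= a for a, b in zip(samples, samples[1:]))
    return never_rises and samples[-1] < samples[0]


healthy = [0xFF40] * 6
leaking = [0xFF40, 0xFF40, 0xFF3E, 0xFF3E, 0xFF3C, 0xFF3A]

print(looks_like_leak(healthy), sp_drift(healthy))
print(looks_like_leak(leaking), sp_drift(leaking))
```

A healthy loop shows the same SP on every pass; a leak typically shows up as a drop of two bytes at a time, one for each unmatched PUSH or CALL.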
- bob_fossil
- Manic Miner
- Posts: 661
- Joined: Mon Nov 13, 2017 6:09 pm
Re: Debugging an intermittent crash in a sea of code!
It sounds like it's stack related. All my spectacular game development crashes back to the copyright prompt or other sections of the ROM were due to mismatched pushes and pops. As you're using an emulator, you could try adding write breakpoints to the addresses around the stack area to see if you're overflowing your stack with too many pushes, or underflowing into the memory above your starting SP with too many pops.
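To show which addresses are worth watching, here is a small Python sketch that sorts a write address into "below the stack", "above the stack" or neither. The stack location, its size and the width of the watched bands are all hypothetical values; substitute your own.

```python
STACK_TOP = 0xFF40   # hypothetical: the value SP is initialised to
STACK_SIZE = 0x0100  # hypothetical: bytes reserved for the stack
GUARD = 16           # width of each watched band, in bytes


def classify_write(addr):
    """Say whether a write address falls in a guard band around the stack."""
    low = STACK_TOP - STACK_SIZE
    if low - GUARD <= addr < low:
        return "overflow"   # too many pushes: writing below the reserved area
    if STACK_TOP <= addr < STACK_TOP + GUARD:
        return "underflow"  # SP rose above its start, then something pushed
    return "ok"


for addr in (0xFF3E, 0xFE3F, 0xFF40):
    print(hex(addr), classify_write(addr))
```

Note that a surplus POP only reads memory, so a write breakpoint above the starting SP fires on the next PUSH or CALL after the underflow, not on the POP itself.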
Re: Debugging an intermittent crash in a sea of code!
Yep, seconded (not that we're voting). For me, just about every intermittent crash that happened after a while of play testing was caused by a PUSH/POP mismatch.
I figured that as it wasn't happening a lot, it was probably related to an activity not happening every game loop (otherwise it'd crash pretty much straight away), so I narrowed it down that way. Still took ages.
My Speccy site: thirdharmoniser.com
Re: Debugging an intermittent crash in a sea of code!
SpecEmu has a useful tool in the debugger to look at the execution history. In the debugger click on View -> Execution History. It's the same as looking at the stack pointer memory but it's quicker and less confusing when SP gets modified in the code.
Re: Debugging an intermittent crash in a sea of code!
The trace method is likely to be the only way to track down a truly intermittent crash, but if you're not using a dev system that supports it, it could be quite a process to get your code onto one that does. Then you have the system tracing everything until you get the crash. I've not seen the SpecEmu trace system (other than on that video), but usually you end up with a huge list file of instructions executed, and after the crash you simply wind back through it until you see your code go awry. It can help if you can hide the interrupts that are processed, as they will bulk out your trace file. You could turn interrupts off, but if that's the source of the crash...!
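The wind-back step can be sketched in a few lines of Python. The trace format here (one hexadecimal PC per line) is made up for the example and is not SpecEmu's actual output, and the address ranges for the interrupt handler and the program are placeholders.

```python
def parse_trace(text):
    """Parse one hexadecimal PC per line (a made-up format for this sketch)."""
    return [int(token, 16) for token in text.split()]


def hide_range(pcs, start, end):
    """Drop PCs inside [start, end), e.g. an interrupt handler."""
    return [pc for pc in pcs if not (start <= pc < end)]


def last_good_index(pcs, code_start, code_end):
    """Index of the last PC still inside the program's own code, or -1."""
    for i in range(len(pcs) - 1, -1, -1):
        if code_start <= pcs[i] < code_end:
            return i
    return -1


trace = "8000\n8003\nFE00\nFE01\n8006\n5B00\n5B01\n0000"
visible = hide_range(parse_trace(trace), 0xFE00, 0xFE10)  # hide the handler
i = last_good_index(visible, 0x8000, 0xC000)
print(i, hex(visible[i]))
```

Filtering by address range will miss handler code that calls subroutines elsewhere, so treat the hidden range as a way of shrinking the file rather than as proof that the interrupt is innocent.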
Personally, I've never needed this trace facility on the ZX like I used to on 64180 or 68K MICEd up hardware (nothing crashes like an OCTART handler). My crashes on the Spectrum are usually, as others have stated...
- stack based
- dumping a word over neighbouring bytes (this is a favourite! sometimes I crap on another variable, sometimes some code - endless fun)
- indexing off the end of a table to get out of bound values
- self modding the code with an out of range value
- drawing a sprite in the wrong place or at the wrong size
It's probably well after the event now but if you don't already, try and keep some source control (git etc) and regularly use it. If you suddenly notice crashes occurring, you can at least compare your recent changes and scrutinise your additions.
You could also simply start commenting routines out bit by bit. Run your code until the crash occurs OR doesn't occur. Either way, you learn something useful.
- Ast A. Moore
- Rick Dangerous
- Posts: 2641
- Joined: Mon Nov 13, 2017 3:16 pm
Re: Debugging an intermittent crash in a sea of code!
Yup, if it’s a runaway stack issue, RZX is your friend. It helped me out with a couple of bizarre and nasty (and intermittent) issues.
RZX is really pretty straightforward. You just record your game; then you simply play back the recording, mark down the approximate time the crash occurs, and break into the debugger slightly before that on the next playback.
Every man should plant a tree, build a house, and write a ZX Spectrum game.
Author of A Yankee in Iraq, a 50 fps shoot-’em-up—the first game to utilize the floating bus on the +2A/+3,
and zasm Z80 Assembler syntax highlighter.
Re: Debugging an intermittent crash in a sea of code!
Some fantastic suggestions, thanks!
And when I track this bug down I'll post back here what it was.
Cosmium
https://cosmium.itch.io/
Re: Debugging an intermittent crash in a sea of code!
Aha!
Using some of the neat debugging techniques suggested here I was thankfully able to find and fix the intermittent crash I was experiencing.
It was to do with an interaction between my IM2 service routine and code in the main game loop. I'd assumed the interrupt code and the game code operated on their own code and data. I'd assumed wrong!
There was one particular shared subroutine that relies on self modifying code for speed. To provide the correct starting point for an unrolled LDI copy, it writes a calculated value into the JR offset enabling a jump into a block of LDIs terminated with a JP PE, which then loops back for the remaining blocks of bytes to copy.
The game code occasionally calls this LDI copy subroutine, but under very rare circumstances the mode 2 interrupt happens just at the precise moment the JR offset's been written, and just before the very next instruction (the JR) has executed.
If the interrupt routine happens to also execute the same LDI copy subroutine, it modifies the same JR offset, so that by the time the interrupt is over the game code's JR offset is invalid and an incorrect number of LDI copies occur. This means the JP PE at the end of the loop fails, and the LDIs continue past the expected endpoint.
Anyway I've successfully modified the code to avoid this scenario going forward, and am very happy to have the expanded range of debugging techniques offered here "in the toolbox", ready for next time. Much appreciated
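For anyone trying to picture the race, here is a toy Python model of it. It is not the actual game code: `CopyRoutine`, `main_copy` and `isr` are invented names, the patched byte is reduced to a single attribute, and the offsets 7 and 3 are arbitrary.

```python
class CopyRoutine:
    """Models a routine that patches its own jump offset, then jumps through it."""

    def __init__(self):
        self.patched_offset = 0  # stands in for the operand byte of the JR

    def write_offset(self, offset):
        self.patched_offset = offset  # the self-modifying write

    def execute_jump(self):
        return self.patched_offset  # the JR uses whatever is there now


def isr(routine):
    """Interrupt handler that happens to use the same copy routine."""
    routine.write_offset(3)
    routine.execute_jump()


def main_copy(routine, offset, interrupt=None):
    """Game-code call; `interrupt` fires between the write and the jump."""
    routine.write_offset(offset)
    if interrupt is not None:
        interrupt(routine)
    return routine.execute_jump()


routine = CopyRoutine()
print(main_copy(routine, 7))                 # no interrupt in the window
print(main_copy(routine, 7, interrupt=isr))  # interrupt lands in the window
```

The second call comes back with the handler's offset instead of its own, which is the stale-offset situation described above. The post doesn't say which fix was used, so none is modelled here.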
Cosmium
https://cosmium.itch.io/
- WhatHoSnorkers
- Manic Miner
- Posts: 254
- Joined: Tue Dec 10, 2019 3:22 pm
Re: Debugging an intermittent crash in a sea of code!
Brilliant you fixed it, a clever speed technique and awesome that you've told us what it was!
I have a little YouTube channel of nonsense
https://www.youtube.com/c/JamesOGradyWhatHoSnorkers
Re: Debugging an intermittent crash in a sea of code!
Fair play, I would never have figured that one out
My Speccy site: thirdharmoniser.com
- PROSM
- Manic Miner
- Posts: 476
- Joined: Fri Nov 17, 2017 7:18 pm
- Location: Sunderland, England
Re: Debugging an intermittent crash in a sea of code!
Great that you managed to fix it up! Bugs arising from concurrency are some of the trickiest ones to identify.
All software to-date
Working on something, as always.
Re: Debugging an intermittent crash in a sea of code!
I'll add my 5 cents.
On the 128K: if the wrong RAM page is paged in, and that page is supposed to contain some code but doesn't...