Thinking about this, I don't want to add a big artificial delay in there. But I could do the sort of thing I did with Buzzsaw+, and integrate the beeper loop with the screen copy loop (in Buzzsaw+ a beeper 0/1 was generated after each line of multicolour was copied).
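To make the interleaving idea concrete, here's a rough model of it in Python rather than Z80. It is only a sketch: `copy_with_beeper`, `shadow_lines` and `half_period_lines` are names made up for illustration, and the "copy" is just a stand-in for the real per-line copy. The point is that one speaker bit gets produced after each line, so the pitch is set by how many lines pass between toggles.

```python
def copy_with_beeper(shadow_lines, half_period_lines):
    """Copy lines one at a time, producing one speaker bit after each line.

    half_period_lines (hypothetical parameter) is how many copied lines
    pass between speaker toggles; smaller values give a higher pitch.
    """
    screen = []          # stand-in for display memory
    speaker_bits = []    # the 0/1 value written to the speaker after each line
    level = 0
    countdown = half_period_lines
    for line in shadow_lines:
        screen.append(bytes(line))      # stand-in for the real line copy
        countdown -= 1
        if countdown == 0:              # time to flip the speaker
            level ^= 1
            countdown = half_period_lines
        speaker_bits.append(level)
    return screen, speaker_bits


# Eight dummy 32-byte lines, toggling the speaker every 2 lines.
screen, bits = copy_with_beeper([bytes(32)] * 8, half_period_lines=2)
print(bits)
```

In this model every line takes the same time, so the toggles come out evenly spaced. On real hardware that only holds if each pass through the copy loop takes a fixed number of T-states.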
The screen copy loop takes around 1.4 frames. If the game can run in 3 frames then that's nearly a 50% duty cycle for sound effects (1.4 out of 3, so about 47%). I suppose I could artificially extend that period of speaker control, but I doubt I could do anything else productive with that extra time other than twiddle the speaker. If I entered a different loop to do something else then the sound frequency would shift (unless the loop had exactly the same timing as the screen copy). And I wouldn't want to extend it so far that the game goes any slower than 4 frames.
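The arithmetic behind those figures, as a quick sketch. The 1.4, 3 and 4 frame numbers are the ones above. The assumption I'm adding is that the rest of the game's work stays at 3 - 1.4 = 1.6 frames, so any extension up to the 4-frame limit goes entirely to the speaker loop. The function name is made up for illustration.

```python
COPY_FRAMES = 1.4      # screen copy time, from the estimate above
GAME_FRAMES = 3        # frames per game update
MAX_GAME_FRAMES = 4    # slowest update rate I'd accept


def sound_duty(speaker_frames, total_frames):
    """Fraction of each game update during which the speaker is driven."""
    return speaker_frames / total_frames


# Sound only during the screen copy, game running in 3 frames.
duty_now = sound_duty(COPY_FRAMES, GAME_FRAMES)

# Assumption: the other work stays at 1.6 frames, and the speaker loop is
# padded out until the whole update takes 4 frames.
other_work = GAME_FRAMES - COPY_FRAMES
extended_speaker = MAX_GAME_FRAMES - other_work
duty_extended = sound_duty(extended_speaker, MAX_GAME_FRAMES)

print(f"copy only: {duty_now:.0%}")
print(f"extended to 4 frames: {duty_extended:.0%}")
```

So padding the speaker loop by a whole frame would only take the sound from roughly 47% to roughly 60% of the time, at the cost of a slower game.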
Come to think of it, the timing of a screen copy is going to be all over the place as it goes in and out of contention, so maybe that's not the best idea I've had...