Constant time memory access

djnzx48
Manic Miner
Posts: 729
Joined: Wed Dec 06, 2017 2:13 am
Location: New Zealand

Constant time memory access

Post by djnzx48 »

This is a problem I'm having with contention. I want to have some contended memory accesses, but they need to take the same number of T-states to complete each time, regardless of where the beam is on the screen when the access occurs. I can't sync to the 50Hz refresh using either interrupts or the floating bus. The approaches I thought of were:
  • Keep track of the number of T-states elapsed, and use this to work out where the beam is and whether contention will be applied.
  • Use the floating bus to find the beam position, but this requires a certain pattern on the screen and won't work in the border area.
Is it possible to do this? Apparently certain IO ports are contended at all times, not just when the beam is visible, so can I use port accesses as padding between the memory accesses to get a constant time overall?
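The first approach amounts to a function from "T-states since the interrupt" to "extra delay for a contended access". Below is a minimal Python sketch for the 48K only. It assumes the comp.sys.sinclair FAQ timings: first contended T-state 14335, 224 T-states per scanline, 192 bitmap lines with 128 contended T-states each, the delay pattern 6,5,4,3,2,1,0,0, and 69888 T-states per frame. The name `contention_delay_48k` is hypothetical, and some sources count T-states one off from this convention.

```python
FRAME_48K = 69888            # T-states per frame (312 lines x 224), assumed FAQ value
FIRST_CONTENDED_48K = 14335  # first contended T-state, FAQ convention
LINE_48K = 224               # T-states per scanline
PATTERN = (6, 5, 4, 3, 2, 1, 0, 0)  # extra T-states, repeating every 8 T-states


def contention_delay_48k(t):
    """Extra T-states added to a contended memory access that starts at
    T-state t, counted from the start of the frame (the interrupt)."""
    offset = t % FRAME_48K - FIRST_CONTENDED_48K
    if offset < 0:
        return 0                      # top border: no contention yet
    line, pos = divmod(offset, LINE_48K)
    if line >= 192 or pos >= 128:
        return 0                      # bottom border, or side border/retrace
    return PATTERN[pos % 8]


for t in (14333, 14335, 14337, 14341, 14463):
    print(t, contention_delay_48k(t))   # delays: 0, 6, 4, 0, 0
```

This only helps if the code really knows t exactly, which is the difficult part discussed in the replies.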
Ast A. Moore
Rick Dangerous
Posts: 2640
Joined: Mon Nov 13, 2017 3:16 pm

Re: Constant time memory access

Post by Ast A. Moore »

I don’t think what you’re looking for is attainable, at least not easily. There’s a reason custom tape loaders, being timing-sensitive pieces of code, always sat in non-contended memory.

The problem with counting T states is that there’s no clear reference point other than the interrupt: whenever the ULA holds up the CPU, your count stops matching where the beam actually is. Otherwise, it would be trivial; you could use the R register for counting instructions.

IO contention is machine-specific, unfortunately. For example, all even ports are contended on 48K/128K/+2 machines, but not on the +2A/+3. It’s complicated even further by the fact that the same ports will be contended differently depending on the high address byte, and the contention patterns are tricky to calculate.
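The FAQ’s table for I/O contention on the 48K can be written as a small lookup, which shows how the high address byte changes the pattern for the same low byte. This is a sketch that assumes that table and covers the 48K only: "C:n" means the ULA may delay the cycle before n T states elapse, and "N:n" means n T states with no check. The function name is hypothetical.

```python
def io_contention_pattern_48k(port):
    """Contention pattern for an IN/OUT on a 16-bit port, 48K only.

    Follows the I/O contention table in the comp.sys.sinclair FAQ:
    'C:n' = contention check, then n T-states; 'N:n' = n T-states, no check.
    """
    high_byte = port >> 8
    high_in_contended_range = 0x40 <= high_byte <= 0x7F  # looks like a 0x4000-0x7FFF address
    ula_port = (port & 1) == 0                            # the ULA answers even ports

    if not high_in_contended_range:
        return ["N:1", "C:3"] if ula_port else ["N:4"]
    return ["C:1", "C:3"] if ula_port else ["C:1", "C:1", "C:1", "C:1"]


for port in (0x00FE, 0x00FF, 0x40FE, 0x7FFD):
    print(hex(port), io_contention_pattern_48k(port))
```

Only the odd port with a high byte outside 0x40-0x7F escapes contention checks entirely.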

However, depending on how accurate you want to be, you could still use the floating bus and test if the beam has left the border area, for example. It won’t be a hundred percent universal, though.

Perhaps, you could share more information on what it is exactly you’re trying to achieve. Then we’d be able to come up with an alternative solution.
Every man should plant a tree, build a house, and write a ZX Spectrum game.

Author of A Yankee in Iraq, a 50 fps shoot-’em-up—the first game to utilize the floating bus on the +2A/+3,
and zasm Z80 Assembler syntax highlighter.

Re: Constant time memory access

Post by djnzx48 »

Yeah, I didn't really expect it to be that feasible. With regard to counting T-states, I was planning on keeping every execution path the same length so that I know beforehand how long everything will take. The aim was basically to get a constant time sound loop that also accesses contended memory. I wouldn't be averse to keeping it machine-specific either, but contention patterns are pretty confusing for me. I was wondering if it were possible to use some specific instruction combination, like how a contended memory address is used for synchronisation in a floating bus loop.

The floating bus approach could work, but if more than a few bytes were being accessed, the raster might reach the screen area before the routine has finished. I'm not sure about the counting T-states option either.

Re: Constant time memory access

Post by Ast A. Moore »

Without filling at least one line of the bitmap area with particular values for the ULA to read, I really don’t know how you can precisely determine where you are on the screen. Perhaps, since it’s an audio engine, you could get away with an approximation and only determine the moment (give or take a few T states) when you’re outside the top/bottom border, and then use the known contention pattern to adjust your own timings. Whether the effect on the sound generation will be too noticeable is down to experimentation.
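This approximation relies on the one long stretch per frame with no contention at all: from the end of the last bitmap line, through the bottom border, retrace and top border, up to the first contended T state of the next frame. A minimal sketch of its extent, assuming the comp.sys.sinclair FAQ 48K timings (the names are hypothetical):

```python
FRAME_48K = 69888            # T-states per frame
FIRST_CONTENDED_48K = 14335  # first contended T-state, FAQ convention
LINE_48K = 224               # T-states per scanline
SCREEN_LINES = 192
CONTENDED_PER_LINE = 128

# T-state just after the last contended T-state of bitmap line 191
END_OF_CONTENTION = (FIRST_CONTENDED_48K
                     + (SCREEN_LINES - 1) * LINE_48K
                     + CONTENDED_PER_LINE)


def in_contention_free_window(t):
    """True if frame T-state t lies in the long stretch between the end of
    the last bitmap line and the first contended T-state of the next frame.
    False does not mean 'contended': side borders are also uncontended."""
    t %= FRAME_48K
    return t < FIRST_CONTENDED_48K or t >= END_OF_CONTENTION


window = FRAME_48K - END_OF_CONTENTION + FIRST_CONTENDED_48K
print(END_OF_CONTENTION, window)  # 57247 26976
```

That is roughly 27,000 of the 69,888 T states in a frame during which a sound loop’s contended accesses would run at their nominal speed, provided each access also finishes before the window closes.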

Re: Constant time memory access

Post by djnzx48 »

I just checked the contention details for each instruction, and saw that LDI only puts HL on the bus once in 16 T-states. Does this mean that unrolled LDIs copying from contended memory won't suffer from any contention delays?

Re: Constant time memory access

Post by Ast A. Moore »

djnzx48 wrote: Sat Jan 26, 2019 3:24 am LDI only puts HL on the bus once in 16 T-states. Does this mean that unrolled LDIs copying from contended memory won't suffer from any contention delays?
On the contrary, it will be one of the most affected instructions. Delays will apply to the fetching of the instruction itself (which is two bytes long), and then to reading and writing the data byte whenever the source or destination address is in contended memory. For a more accurate breakdown, see this.

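Let’s take LDI as an example for a couple of situations. For simplicity, let’s assume we’re on a 48K machine. The breakdown of this instruction is pc:4,pc+1:4,hl:3,de:3,de:1 × 2. In the sums below, only the last two digits of the starting T state are written, so 33 stands for 14333.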

1. LDI sits in contended memory, HL and DE point to non-contended memory, and the fetch happens at T state 14333.
33 + 0 (since T state 14333 doesn’t impose any delay) + 4 (fetch first byte) + 4 (T state 14337 adds 4 T states of delay) + 4 (fetch second byte) + 3 + 3 + 1 × 2 = 53
53 - 33 = 20 T states

2. LDI sits in contended memory, HL and DE point to non-contended memory, and the fetch happens at T state 14337.
37 + 4 (T state 14337 adds 4 T states of delay) + 4 (fetch first byte) + 4 (T state 14345 adds 4 T states of delay) + 4 (fetch second byte) + 3 + 3 + 1 × 2 = 61
61 - 37 = 24 T states

3. LDI sits in contended memory, HL and DE point to contended memory, and the fetch happens at T state 14333.
33 + 0 (since T state 14333 doesn’t impose any delay) + 4 (fetch first byte) + 4 (T state 14337 adds 4 T states of delay) + 4 (fetch second byte) + 4 (T state 14345 adds 4 T states of delay) + 3 + 5 (T state 14352 adds 5 T states of delay) + 3 + 5 (T state 14360 adds 5 T states of delay) + 1 + 0 (T state 14366 falls on one of the zero-delay slots of the 6,5,4,3,2,1,0,0 pattern, so no delay) + 1 = 67
67 - 33 = 34 T states
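The three cases can be checked mechanically by walking the pc:4,pc+1:4,hl:3,de:3,de:1 × 2 breakdown and inserting the ULA delay before every step whose address is contended. A Python sketch under assumed 48K FAQ timings (the 6,5,4,3,2,1,0,0 pattern starting at 14335); `delay_48k` and `run` are hypothetical helpers, and the sketch ignores everything except memory contention.

```python
PATTERN = (6, 5, 4, 3, 2, 1, 0, 0)


def delay_48k(t):
    """Contention delay for a contended access starting at frame T-state t
    (48K; no frame wrap-around, which these examples don't need)."""
    offset = t - 14335
    if offset < 0:
        return 0
    line, pos = divmod(offset, 224)
    if line >= 192 or pos >= 128:
        return 0
    return PATTERN[pos % 8]


# LDI broken down as pc:4, pc+1:4, hl:3, de:3, de:1 x 2
LDI = [("pc", 4), ("pc+1", 4), ("hl", 3), ("de", 3), ("de", 1), ("de", 1)]


def run(steps, t, contended):
    """Execute `steps` from T-state t. Before each step whose address name is
    in `contended`, wait for the ULA. Returns the T-state at the end."""
    for addr, cycles in steps:
        if addr in contended:
            t += delay_48k(t)
        t += cycles
    return t


code = {"pc", "pc+1"}   # the LDI opcode bytes live in contended memory
case1 = run(LDI, 14333, code) - 14333
case2 = run(LDI, 14337, code) - 14337
case3 = run(LDI, 14333, code | {"hl", "de"}) - 14333
print(case1, case2, case3)  # 20 24 34
```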

As you can see, LDI execution times are all over the place.

Re: Constant time memory access

Post by djnzx48 »

Right, I think I get how it works now. But if the actual LDI instructions are in uncontended memory, only HL is contended, and the instruction starts at 14333, then there'll be no delays, right? Except that on the 128K machines a scanline is 228 T-states long, which isn't divisible by 8, so I suppose there would be an occasional 4 T-state delay at the start of each scanline, but nothing too major.
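For the 48K case (the machine djnzx48 says later in the thread that he meant), the claim can be checked with a simple simulation: code in uncontended memory, only HL contended, first LDI at 14333. This sketch assumes the comp.sys.sinclair FAQ 48K timings and that nothing else, such as interrupts or other instructions, gets in the way; the helper names are hypothetical.

```python
FRAME = 69888                       # T-states per 48K frame
PATTERN = (6, 5, 4, 3, 2, 1, 0, 0)


def delay_48k(t):
    """Contention delay for a contended access starting at frame T-state t (48K)."""
    offset = t % FRAME - 14335
    if offset < 0:
        return 0
    line, pos = divmod(offset, 224)
    if line >= 192 or pos >= 128:
        return 0
    return PATTERN[pos % 8]


# LDI as pc:4, pc+1:4, hl:3, de:3, de:1 x 2
LDI = [("pc", 4), ("pc+1", 4), ("hl", 3), ("de", 3), ("de", 1), ("de", 1)]


def ldi_durations(start, count, contended):
    """Run `count` back-to-back LDIs from T-state `start`; return each one's length."""
    t = start
    durations = []
    for _ in range(count):
        begin = t
        for addr, cycles in LDI:
            if addr in contended:
                t += delay_48k(t)
            t += cycles
        durations.append(t - begin)
    return durations


# Code and DE uncontended, HL contended; 5000 LDIs span more than one frame.
durations = ldi_durations(14333, 5000, {"hl"})
print(set(durations))  # {16}
```

Every HL read lands at 14341 plus a multiple of 16, six T-states into the 8 T-state contention cycle, where the delay is 0. Since 16, 224 and 69888 are all multiples of 8, that position never changes across scanlines or frames. Starting two T-states later, at 14335, the very first LDI already takes 22 T-states.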

Re: Constant time memory access

Post by Ast A. Moore »

djnzx48 wrote: Sun Jan 27, 2019 3:18 am Right, I think I get how it works now.
Heh. Not quite, I don’t think.
djnzx48 wrote: Sun Jan 27, 2019 3:18 am if the actual LDI instructions are in uncontended memory, only HL is contended, and the instruction starts at 14333, then there'll be no delays, right? Except on the 128K machines 228 isn't divisible by 8, so I suppose there would be an occasional 4 T-state delay at the start of each scanline, but nothing too major.
Let’s assume a 128K/+2 Spectrum, the LDI instruction sitting in non-contended memory, HL pointing to contended memory, DE pointing to non-contended memory, and the instruction starting at T state 14333.

LDI instruction breakdown is the same: pc:4,pc+1:4,hl:3,de:3,de:1 × 2.

33 + 0 + 4 + 0 + 4 + 0 + 3 + 0 + 3 + 1 + 1 = 49
49 - 33 = 16 T states

All is good: the instruction finished before T state 14361, which is when contention begins on the 128K/+2.

Let’s assume the next LDI instruction immediately follows the previous one.

49 + 0 + 4 + 0 + 4 + 0 + 3 + 0 + 3 + 1 + 1 = 65
65 - 49 = 16 T states

Again, the HL fetch finished before T state 14361, so we’re good.

Now, let’s look at the next LDI instruction.

65 + 0 + 4 + 0 + 4 + 2 (T state 14373 adds 2 T states of delay) + 3 + 0 + 3 + 1 + 1 = 83

83 - 65 = 18 T states

This time the fetch of data from memory location pointed to by HL was delayed by two T states.
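The 128K/+2 walkthrough can be reproduced the same way. This sketch assumes the comp.sys.sinclair FAQ 128K timings: contention starting at 14361 with the same 6,5,4,3,2,1,0,0 pattern, 228 T states per scanline and 70908 per frame. Only HL is contended, as in the example, and the names are hypothetical.

```python
FRAME_128K = 70908                  # 311 scanlines x 228 T-states
PATTERN = (6, 5, 4, 3, 2, 1, 0, 0)


def delay_128k(t):
    """Contention delay for a contended access starting at frame T-state t (128K/+2)."""
    offset = t % FRAME_128K - 14361
    if offset < 0:
        return 0
    line, pos = divmod(offset, 228)
    if line >= 192 or pos >= 128:
        return 0
    return PATTERN[pos % 8]


# LDI as pc:4, pc+1:4, hl:3, de:3, de:1 x 2
LDI = [("pc", 4), ("pc+1", 4), ("hl", 3), ("de", 3), ("de", 1), ("de", 1)]

t = 14333
durations = []
for _ in range(3):              # three back-to-back LDIs, as in the example
    begin = t
    for addr, cycles in LDI:
        if addr == "hl":        # only the HL read is contended here
            t += delay_128k(t)
        t += cycles
    durations.append(t - begin)
print(durations)  # [16, 16, 18]
```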

Note that I chose 14333 more or less arbitrarily in my example, just so the instruction barely touches the first contended T states. How much delay you get depends on where the instruction is located and where HL and DE point. On the +2A/+3, for instance, the third LDI instruction from the example above will take 21 T states, because the contention breakdown of this instruction is different there. So the timing isn’t arbitrary, of course, but neither is it trivial to calculate.

As I mentioned before, all these calculations might be strictly an exercise in futility. I suggest you experiment and tinker based on the observable results. It’s even easier to do with a good emulator and a debugger/monitor.

Re: Constant time memory access

Post by djnzx48 »

OK, but I was talking about 14333 T-states on the 48K Spectrum. ;) I guess what I was trying to say was that there would be an occasional delay to sync up with the contention, but the majority of the LDIs would execute in 16 T-states. Anyway, thanks for the explanations, they've been quite helpful to me.