Double buffering on the Spectrum

ketmar
Manic Miner
Posts: 701
Joined: Tue Jun 16, 2020 5:25 pm
Location: Ukraine

Re: Double buffering on the Spectrum

Post by ketmar »

i've almost finished reversing my old code (and put a lot more comments in it). there's still one bug left (i have to double-check whether it is mine, or whether i did something the original code wasn't designed for). i'll fix it (i hope ;-), and then i will try to write a post about it all, with code samples and some explanations.
MonkZy
Manic Miner
Posts: 279
Joined: Thu Feb 08, 2018 1:01 pm

Post by MonkZy »

ketmar wrote: Fri Jun 19, 2020 10:53 am i will try to write a post about it all, with code samples and some explanations.
I look forward to this.
ketmar
Manic Miner
Posts: 701
Joined: Tue Jun 16, 2020 5:25 pm
Location: Ukraine

Post by ketmar »

sorry for bumping, but i want to say that i'm still working on that post/article. it is somewhat hard to split my time between various things, and i want to explain the code and the idea behind it instead of simply dumping it on you, so it requires some more time. basically, i'm going through the code line-by-line, adding explanations where i think they're needed, and creating a small sample game to show how one can connect all the pieces together to get something interesting.

it looks like it is becoming more of a Yet Another Game Programming Tutorial, this time for people who know asm and want to do some advanced stuff. dunno if the world needs another one of those, but at least creating it is fun.
Art
Manic Miner
Posts: 206
Joined: Fri Jul 17, 2020 7:21 pm

Post by Art »

I'm trying to speed up my game, which draws to a memory buffer and then copies it to the screen. It buffers the entire 6912 screen bytes, so I can use a simple ldir, but I also tried the push/pop way. I wrote the following routine and it works, but the screen is a bit corrupted: two columns are doubled on the screen, and when I move in the game the screen gets more and more corrupted. I admit that I don't really understand exactly how the stack works (I read some explanations, but it's still not clear). Is anything wrong with my routine, and should it be faster than ldir?

Code:

         di                  ; disable interrupt
         ld (stack),sp       ; save stack address
         ld ix,buffer        ; start of buffer
         ld iy,16384+15      ; start of screen + 15
         ld a,216            ; loop counter, 216*32=6912
scrcopy  ld i,a              ; counter to i

         ld sp,ix            ; stack is on ix in buffer
         pop af              ; 16 bytes from buffer to registers
         pop bc
         pop de
         pop hl
         exx
         pop af
         pop bc
         pop de
         pop hl
         exx
         ld sp,iy            ; stack is on iy in screen
         exx
         push hl             ; 16 bytes from registers to screen in reverse order
         push de
         push bc
         push af
         exx
         push hl
         push de
         push bc
         push af
         ld bc,16
         add ix,bc           ; increase buffer address by 16
         add iy,bc           ; increase screen address by 16
         
         ld sp,ix            ; stack is on ix in buffer
         pop af              ; 16 bytes from buffer to registers
         pop bc
         pop de
         pop hl
         exx
         pop af
         pop bc
         pop de
         pop hl
         exx
         ld sp,iy            ; stack is on iy in screen
         exx
         push hl             ; 16 bytes from registers to screen in reverse order
         push de
         push bc
         push af
         exx
         push hl
         push de
         push bc
         push af
         ld bc,16
         add ix,bc           ; increase buffer address by 16
         add iy,bc           ; increase screen address by 16
         
         ld a,i              ; counter to a
         dec a               ; decrease counter by 1
         jr nz,scrcopy       ; if counter is not 0, copy next 32 bytes
         ld sp,(stack)       ; restore stack address
         ei                  ; enable interrupt
         ret
catmeows
Manic Miner
Posts: 718
Joined: Tue May 28, 2019 12:02 pm
Location: Prague

Post by catmeows »

Hi, Art.
The EXX instruction swaps the BC, DE and HL pairs only. AF has to be swapped separately, with the instruction EX AF,AF'.
What happens in your routine is that you load the AF register twice from two different locations without swapping it in between, so the first value is lost. That is why the screen gets corrupted.
Proud owner of Didaktik M
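To make the AF point concrete, here is a minimal sketch (not taken from any poster's code) that copies just four bytes through AF and AF'. The labels `src`, `dst` and `oldsp` and the buffer address are assumptions made up for the illustration. It also shows why the destination pointer has to start one past the end of the block: PUSH decrements SP before it writes.

```asm
src      equ  49152          ; hypothetical source address (assumption)
dst      equ  16384          ; start of screen memory

copy4    di                  ; SP will point at data, so no interrupts
         ld   (oldsp),sp     ; save the real stack pointer
         ld   sp,src
         pop  af             ; bytes 0-1 into AF
         ex   af,af'         ; park them in AF' (EXX would not do this)
         pop  af             ; bytes 2-3 into the other AF
         ld   sp,dst+4       ; PUSH pre-decrements: start past the end
         push af             ; writes dst+2, dst+3
         ex   af,af'         ; bring the first pair back
         push af             ; writes dst+0, dst+1
         ld   sp,(oldsp)     ; restore the real stack pointer
         ei                  ; assumes interrupts were enabled on entry
         ret

oldsp    dw   0
```

The same pattern extends to 16 bytes per block by adding BC, DE and HL on each side of an EXX. Note that AF' is overwritten, which may matter if other code relies on it.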
Joefish
Rick Dangerous
Posts: 2058
Joined: Tue Nov 14, 2017 10:26 am

Post by Joefish »

LDIR is slow because it checks the loop status after copying each byte.
If you do LDI 32 times and write your own loop around it (using A as a counter) it will copy a lot faster, comparable to PUSH/POP copying. PUSH/POP is just preferred for multicolour routines as it caches the reads then does all the writes in one quick go.
Art
Manic Miner
Posts: 206
Joined: Fri Jul 17, 2020 7:21 pm

Post by Art »

Thank you for the answers! I added ex af,af' in several places and changed the screen start to 16384+16, and now the screen looks OK. But when I move in the game, the vector graphics start to be computed wrongly and the lines move everywhere. When I use ldir, it's OK, and I have no idea why. So the push/pop way is probably too complicated for this case. I will try the ldi way.
Art
Manic Miner
Posts: 206
Joined: Fri Jul 17, 2020 7:21 pm

Post by Art »

So, I used LDI, it works well and doesn't require another counter:

Code:

         ld hl,buffer
         ld de,16384
         ld bc,6912
scrcopy  ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         jp pe,scrcopy
         ret
megaribi
Drutt
Posts: 15
Joined: Thu Nov 25, 2021 4:31 pm

Post by megaribi »

I would say that Jet Set Willy is a "quintuple buffering" game:
1) The room descriptions are stored at higher addresses, 256 bytes per room.
2) When Willy enters a room, the corresponding 256 bytes are copied to a fixed location.
3) The room description is expanded into a 4.5K buffer holding the empty room.
4) Every frame, the empty room is copied to another 4.5K buffer, which is then filled in with Willy's sprite and the guards' sprites, with collision detection.
5) The prepared buffer is copied to the actual video memory.
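Steps 4 and 5 of the list above could be sketched roughly as follows. This is only an illustration of the idea, not Jet Set Willy's actual code: the buffer addresses and the labels `cleanbuf`, `workbuf`, `frame` and `drawspr` are made up, and it assumes each 4.5K buffer is laid out like the top two thirds of the screen (4096 pixel bytes followed by 512 attribute bytes).

```asm
cleanbuf equ  49152          ; hypothetical: empty room, 4608 bytes
workbuf  equ  53760          ; hypothetical: per-frame work buffer, 4608 bytes

frame    ld   hl,cleanbuf    ; step 4: start from the empty room
         ld   de,workbuf
         ld   bc,4608        ; 4096 pixel bytes + 512 attribute bytes
         ldir
         call drawspr        ; draw sprites into workbuf, never on screen
         ld   hl,workbuf     ; step 5: pixels for the top two thirds
         ld   de,16384
         ld   bc,4096
         ldir
         ld   de,22528       ; HL now points at workbuf+4096 (attributes)
         ld   bc,512
         ldir
         ret

drawspr  ret                 ; placeholder for sprite drawing and collisions
```

Because the sprites are only ever drawn into the work buffer, nothing has to be erased: the next frame simply starts again from the clean copy.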
catmeows
Manic Miner
Posts: 718
Joined: Tue May 28, 2019 12:02 pm
Location: Prague

Post by catmeows »

Does anyone know how much faster PUSH/POP is than LDI when copying to contended memory? I estimate it could be about 16 cycles/byte for POP/PUSH and about 20 cycles/byte for LDI, but I never counted it for real.
Proud owner of Didaktik M
Bedazzle
Manic Miner
Posts: 305
Joined: Sun Mar 24, 2019 9:03 am

Post by Bedazzle »

Joefish wrote: Tue Dec 21, 2021 9:37 pm If you do LDI 32 times and write your own loop around it (using A as a counter) it will copy a lot faster, comparable to PUSH/POP copying.
POP+PUSH is 12 T-states per byte, while LDI is 16 T-states per byte.
25%
catmeows
Manic Miner
Posts: 718
Joined: Tue May 28, 2019 12:02 pm
Location: Prague

Post by catmeows »

Bedazzle wrote: Wed Dec 22, 2021 8:31 am
Joefish wrote: Tue Dec 21, 2021 9:37 pm If you do LDI 32 times and write your own loop around it (using A as a counter) it will copy a lot faster, comparable to PUSH/POP copying.
POP+PUSH is 12 T-states per byte, while LDI is 16 T-states per byte.
25%
Nope, you forgot contention and loop management.
Proud owner of Didaktik M
megaribi
Drutt
Posts: 15
Joined: Thu Nov 25, 2021 4:31 pm

Post by megaribi »

catmeows wrote: Wed Dec 22, 2021 1:07 am Does anyone know how much faster PUSH/POP is than LDI when copying to contended memory? I estimate it could be about 16 cycles/byte for POP/PUSH and about 20 cycles/byte for LDI, but I never counted it for real.
You can't tell without specifying when the copy happens and how much data you copy.
https://sinclair.wiki.zxnet.co.uk/wiki/Contended_memory
If the copying is done immediately after the interrupt arrives, or during horizontal retrace, and you need to copy 18 bytes, the LDI variant is 308 cycles (17 cycles per byte):

Code:

LD HL,source  ; 10 cycles
LD DE,destination ; 10 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
And the POP/PUSH variant is 233 cycles (13 cycles per byte):

Code:

LD SP,source ; 10 cycles
POP AF ; 10 cycles
POP DE ; 10 cycles
POP HL ; 10 cycles
POP BC ; 10 cycles
EXX ; 4 cycles
POP DE ; 10 cycles
POP HL ; 10 cycles
POP BC ; 10 cycles
POP IX ; 14 cycles
POP IY ; 14 cycles
LD SP,destination+18 ; 10 cycles
PUSH IY ; 15 cycles
PUSH IX ; 15 cycles
PUSH BC ; 11 cycles
PUSH HL ; 11 cycles
PUSH DE ; 11 cycles
EXX ; 4 cycles
PUSH BC ; 11 cycles
PUSH HL ; 11 cycles
PUSH DE ; 11 cycles
PUSH AF ; 11 cycles
However, if the transfer starts 14335 cycles after the interrupt arrives, and assuming that we copy from non-contended memory, we can statistically expect each transfer to add about 3 cycles, so the LDI variant would be 357 cycles (about 19 cycles per byte) and the POP/PUSH variant 287 cycles (15 cycles per byte).
Joefish
Rick Dangerous
Posts: 2058
Joined: Tue Nov 14, 2017 10:26 am

Post by Joefish »

Also, you have to completely unroll the PUSH/POP code to get that speed. If you loop it, like the LDI version, then you have to do calculated stack pointer changes, and that takes even longer.
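As a rough illustration of that overhead, here is one way a looped PUSH/POP screen copy might look. It is a sketch under assumptions, not anyone's posted routine: the labels `ppcopy`, `pploop`, `savesp` and `count` and the buffer address are made up, and the register saving reflects the usual advice that the Spectrum ROM interrupt routine relies on IY and that BASIC expects HL' to survive.

```asm
buffer   equ  49152          ; hypothetical back-buffer address (assumption)
screen   equ  16384          ; start of display memory

ppcopy   di                  ; SP will point at data, so no interrupts
         push ix             ; preserve registers the caller may rely on
         push iy
         exx
         push hl             ; HL'
         exx
         ld   (savesp),sp
         ld   ix,buffer      ; source: start of the current block
         ld   iy,screen+16   ; destination: END of block (PUSH pre-decrements)
         ld   hl,432         ; 432 blocks * 16 bytes = 6912
         ld   (count),hl
pploop   ld   sp,ix
         pop  af             ; 16 bytes into AF..HL and AF'..HL'
         pop  bc
         pop  de
         pop  hl
         ex   af,af'
         exx
         pop  af
         pop  bc
         pop  de
         pop  hl
         ld   sp,iy
         push hl             ; write back in reverse order
         push de
         push bc
         push af
         ex   af,af'
         exx
         push hl
         push de
         push bc
         push af
         ld   bc,16          ; from here on: pure loop bookkeeping
         add  ix,bc
         add  iy,bc
         ld   hl,(count)
         dec  hl
         ld   (count),hl
         ld   a,h
         or   l
         jp   nz,pploop
         ld   sp,(savesp)
         exx
         pop  hl             ; restore HL'
         exx
         pop  iy
         pop  ix
         ei                  ; assumes interrupts were enabled on entry
         ret

savesp   dw   0
count    dw   0
```

Going by the standard uncontended timings, and if the tally is right, one pass of this loop costs about 300 T-states for 16 bytes, or roughly 18.75 per byte. That is slower than a plain run of LDIs at 16, because the two `ld sp` loads, the pointer additions and the counter add around 116 T-states to every block. Copying more bytes per pass, or unrolling completely, is what brings the figure down.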
rastersoft
Microbot
Posts: 151
Joined: Mon Feb 22, 2021 3:55 pm

Post by rastersoft »

I had the same problem. A run of LDIs needs 24 T-states per instruction, because the destination address is still on the bus during the last two T-states of the instruction, which causes contention. That's why PUSH/POP is much faster.

You can check how I did a nearly full screen copy (including attributes) using PUSH/POP in the code of Escape from M.O.N.J.A.S.