Double buffering on the Spectrum
Re: Double buffering on the Spectrum
i almost reversed my old code (and put a lot more comments in it). there's still one bug left (i have to double-check whether it is mine, or whether i did something the original code wasn't designed for). i'll fix it (i hope ;-), and then i will try to write a post about it all, with code samples and some explanations.
Re: Double buffering on the Spectrum
sorry for bumping, but i want to say that i'm still working on that post/article. it is somewhat hard to split my time between various things, and i want to explain the code and the idea behind it instead of simply dumping it on you, so it requires some more time. basically, i'm going through the code line-by-line, adding explanations where i think they're needed, and creating a small sample game to show how one can connect all the pieces together to get something interesting.
it looks like it is becoming more of a Yet Another Game Programming Tutorial, this time for people who know asm and want to do some advanced stuff. dunno if the world needs another one of those, but at least creating it is fun.
Re: Double buffering on the Spectrum
I'm trying to speed up my game, which draws to a memory buffer and then copies it to the screen. It buffers all 6912 screen bytes, so I can use a simple ldir, but I'm also trying the push/pop way. I wrote the following routine and it works, but the screen is a bit corrupted: two columns are doubled on the screen, and as I move in the game the screen gets more and more corrupted. I admit that I don't really understand how exactly the stack works (I read some explanations, but it's still not clear). Is anything wrong with my routine, and should it be faster than ldir?
Code: Select all
di ; disable interrupt
ld (stack),sp ; save stack address
ld ix,buffer ; start of buffer
ld iy,16384+15 ; start of screen + 15
ld a,216 ; loop counter, 216*32=6912
scrcopy ld i,a ; counter to i
ld sp,ix ; stack is on ix in buffer
pop af ; 16 bytes from buffer to registers
pop bc
pop de
pop hl
exx
pop af
pop bc
pop de
pop hl
exx
ld sp,iy ; stack is on iy in screen
exx
push hl ; 16 bytes from registers to screen in reverse order
push de
push bc
push af
exx
push hl
push de
push bc
push af
ld bc,16
add ix,bc ; increase buffer address by 16
add iy,bc ; increase screen address by 16
ld sp,ix ; stack is on ix in buffer
pop af ; 16 bytes from buffer to registers
pop bc
pop de
pop hl
exx
pop af
pop bc
pop de
pop hl
exx
ld sp,iy ; stack is on iy in screen
exx
push hl ; 16 bytes from registers to screen in reverse order
push de
push bc
push af
exx
push hl
push de
push bc
push af
ld bc,16
add ix,bc ; increase buffer address by 16
add iy,bc ; increase screen address by 16
ld a,i ; counter to a
dec a ; decrease counter by 1
jr nz,scrcopy ; if counter is not 0, copy next 32 bytes
ld sp,(stack) ; restore stack address
ei ; enable interrupt
ret
Re: Double buffering on the Spectrum
Hi, Art.
The EXX instruction swaps only the BC, DE and HL pairs. AF has to be swapped with a separate instruction, EX AF,AF'.
As written, you load AF twice from two different locations without swapping it in between, so the first value is overwritten, and that is why the screen gets corrupted.
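A minimal sketch of one corrected 16-byte transfer. It assumes interrupts are disabled, IX points at the next 16 buffer bytes, and IY points one byte past the end of the 16 destination bytes (PUSH decrements SP before it writes, so the first pass would start at 16384+16):
Code: Select all
        ld sp,ix        ; SP -> next 16 source bytes
        pop af          ; bytes 0-7 into the first register set
        pop bc
        pop de
        pop hl
        ex af,af'       ; bank AF separately, EXX does not touch it
        exx             ; bank BC, DE and HL
        pop af          ; bytes 8-15 into the other register set
        pop bc
        pop de
        pop hl
        ld sp,iy        ; SP -> one past the end of the destination block
        push hl         ; write bytes 15-8, highest address first
        push de
        push bc
        push af
        exx             ; back to the first register set
        ex af,af'
        push hl         ; write bytes 7-0
        push de
        push bc
        push af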
Proud owner of Didaktik M
Re: Double buffering on the Spectrum
LDIR is slow because it checks the loop status after copying each byte: before contention it costs 21 T-states per byte, against 16 for a single LDI.
If you do LDI 32 times and write your own loop around it (using A as a counter), it will copy a lot faster, comparable to PUSH/POP copying. PUSH/POP is just preferred for multicolour routines because it caches the reads, then does all the writes in one quick go.
Re: Double buffering on the Spectrum
Thank you for the answers! I added ex af,af' in several places, changed 16384+15 to 16384+16, and the screen looks OK. But when I move in the game, the vector graphics start to be computed wrongly and the lines move everywhere. When I use ldir, it's OK; I have no idea why. So the push/pop way is probably too complicated for this case. I will try the ldi way.
Re: Double buffering on the Spectrum
So, I used LDI, it works well and doesn't require another counter:
Code: Select all
ld hl,buffer
ld de,16384
ld bc,6912
scrcopy ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
ldi
jp pe,scrcopy
ret
Re: Double buffering on the Spectrum
I would say that Jet Set Willy is a "quintuple buffering" game:
1) The room descriptions are stored at higher addresses, 256 bytes per room.
2) When Willy enters a room, its 256 bytes are copied to a fixed location.
3) The room description is expanded into a 4.5K buffer holding the empty room.
4) Every frame, the empty room is copied to another 4.5K buffer, which is then filled in with Willy's sprite and the guards' sprites, with collision detection.
5) The prepared buffer is copied to the actual video memory.
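Steps 4 and 5 could be sketched roughly as below. All labels are hypothetical and are not the game's real addresses. The sketch also assumes that each 4.5K buffer is 4096 pixel bytes in screen order followed by 512 attribute bytes, which the description above does not confirm, and it uses plain LDIR for clarity rather than speed:
Code: Select all
frame   ld hl,empty_room    ; step 4: start from the clean room image
        ld de,work_buf
        ld bc,4608          ; 4096 pixel bytes + 512 attribute bytes
        ldir
        call draw_sprites   ; hypothetical: draw Willy and the guards, test collisions
        ld hl,work_buf      ; step 5: pixels for the top two thirds of the display
        ld de,16384
        ld bc,4096
        ldir
        ld de,22528         ; HL now points at the 512 attribute bytes
        ld bc,512
        ldir
        ret
draw_sprites
        ret                 ; stub so the sketch assembles
empty_room
        defs 4608           ; filled when the room is expanded (step 3)
work_buf
        defs 4608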
Re: Double buffering on the Spectrum
Does anyone know how much faster PUSH/POP is than LDI when copying to contended memory? I estimate it could be about 16 cycles/byte for POP/PUSH and about 20 cycles/byte for LDI, but I never counted it for real.
Re: Double buffering on the Spectrum
Nope, you forgot contention and loop management.
Re: Double buffering on the Spectrum
You cannot tell without specifying when the copy runs and how much data it moves.
https://sinclair.wiki.zxnet.co.uk/wiki/Contended_memory
If the copy is done immediately after the interrupt arrives, or during horizontal retrace, and you need to copy 18 bytes, the LDI variant takes 308 cycles (about 17 cycles per byte):
Code: Select all
LD HL,source ; 10 cycles
LD DE,destination ; 10 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
The POP/PUSH variant of the same 18-byte copy takes 233 cycles (about 13 cycles per byte):
Code: Select all
LD SP,source ; 10 cycles
POP AF ; 10 cycles
POP DE ; 10 cycles
POP HL ; 10 cycles
POP BC ; 10 cycles
EXX ; 4 cycles
POP DE ; 10 cycles
POP HL ; 10 cycles
POP BC ; 10 cycles
POP IX ; 14 cycles
POP IY ; 14 cycles
LD SP,DEST+18 ; 10 cycles
PUSH IY ; 15 cycles
PUSH IX ; 15 cycles
PUSH BC ; 11 cycles
PUSH HL ; 11 cycles
PUSH DE ; 11 cycles
EXX ; 4 cycles
PUSH BC ; 11 cycles
PUSH HL ; 11 cycles
PUSH DE ; 11 cycles
PUSH AF ; 11 cycles
Re: Double buffering on the Spectrum
Also, you have to completely unroll the PUSH/POP code to get that speed. If you loop it, like the LDI version, then you have to do calculated stack pointer changes, and that takes even longer.
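A rough sketch of what the looped form costs, using only four register pairs per pass to keep it short. The cycle counts are the standard uncontended Z80 timings; old_sp and passes are hypothetical variables, buffer is assumed to be a 6912-byte source label as in the routine earlier in the thread, and saving IY assumes the ROM interrupt handler is in use:
Code: Select all
copy    di                  ; nothing may interrupt while SP points at data
        push iy             ; the ROM interrupt routine expects IY unchanged
        ld (old_sp),sp
        ld ix,buffer        ; source pointer
        ld iy,16384+8       ; one past the first 8 destination bytes
        ld hl,864           ; 6912 / 8 passes
        ld (passes),hl
loop    ld sp,ix            ; 10  overhead
        pop af              ; 10
        pop bc              ; 10
        pop de              ; 10
        pop hl              ; 10
        ld sp,iy            ; 10  overhead
        push hl             ; 11
        push de             ; 11
        push bc             ; 11
        push af             ; 11
        ld bc,8             ; 10  overhead from here to the jump
        add ix,bc           ; 15
        add iy,bc           ; 15
        ld hl,(passes)      ; 16
        dec hl              ; 6
        ld (passes),hl      ; 16
        ld a,h              ; 4
        or l                ; 4
        jp nz,loop          ; 10
        ld sp,(old_sp)
        pop iy
        ei
        ret
old_sp  defw 0
passes  defw 0
; Per pass: 84 T-states of POP/PUSH plus 116 of loop management, so 200 T-states
; for 8 bytes, about 25 per byte, against 16 for an unrolled LDI. Moving more
; register pairs per pass shrinks the overhead share but never removes it.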
- rastersoft
Re: Double buffering on the Spectrum
I had the same problem: in contended memory a run of LDIs needs 24 T-states per instruction, because the destination address is held during the last two T-states of the instruction, which causes contention. That's why PUSH/POP is much faster.
You can check how I did a nearly full screen copy (including attributes) using PUSH/POP in the code of Escape from M.O.N.J.A.S.