Double buffering on the Spectrum

druellan
Dynamite Dan
Posts: 1466
Joined: Tue Apr 03, 2018 7:19 pm

Re: Double buffering on the Spectrum

Post by druellan »

Ast A. Moore wrote: Tue Apr 17, 2018 6:59 pm Curiously enough, in his blog post
Seems he was using a BRIGHT 1 black attribute as a delimiter. That's clever:

Image
I didn't fully appreciate that Sidewize / Crosswize run at 50fps.
Me neither, but when I learned about EmuZWin's real FPS indicator, I started testing games. I loaded Sidewize because I could, and I was instantly blown away again by how well it performs.

Image

I now notice the flickering; EmuZWin doesn't seem to do a great job of emulating this.
robpearmain
Drutt
Posts: 21
Joined: Thu Jun 13, 2019 11:52 pm
Location: York, UK

Post by robpearmain »

Joefish wrote: Tue Apr 17, 2018 5:15 pm


Cobra
This one is more complicated - there is no background buffer; the scenery is redrawn from scratch each frame. There are a very limited number of scenic blocks, but it's actually quicker than a scroll+copy. Note also that the scenery now scrolls in 2-pixel steps.
The pre-shifted scenery blocks are loaded into the registers AF, BC, DE and HL then the scenery is drawn by a series of PUSHes, PUSHing the relevant pattern onto the screen at that point. It's wasteful as a scrolling pattern may be half-in and half-out of a character position. So the patterns for solid platforms might be:

AF = empty space
BC = start of platform
DE = continuous platform
HL = end of platform

But before this happens, the program must first re-write the list of PUSH instructions to match the level data. The PUSH list for each row of scenery blocks is re-used to draw all 16 pixel rows. It only has to POP four registers to fetch the scenery patterns for an entire pixel row, so it's quicker than using POP+PUSH to copy a row of data (just as many PUSHes, but fewer POPs).
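
A minimal sketch of that re-writing step, assuming a made-up encoding in which each block in the level data is a code from 0 to 3 naming the register pair to PUSH. The labels row_tiles, push_list and row_done, the 14-block row width and the encoding itself are all hypothetical and not taken from Cobra. Only the opcode values are standard Z80: PUSH BC, DE, HL and AF are $C5, $D5, $E5 and $F5.

```asm
; Sketch only: labels, row width and tile encoding are assumptions.
; Tile codes: 0 = PUSH BC, 1 = PUSH DE, 2 = PUSH HL, 3 = PUSH AF.
ROW_PUSHES equ 14           ; assumed number of 16-pixel blocks per row
row_done   equ 0            ; placeholder: address where drawing resumes

build_row:
        ld hl,row_tiles     ; level data for this row of blocks
        ld de,push_list     ; where the PUSH opcodes are written
        ld b,ROW_PUSHES
next:   ld a,(hl)           ; tile code 0..3
        inc hl
        add a,a
        add a,a
        add a,a
        add a,a             ; code * 16
        add a,$C5           ; $C5/$D5/$E5/$F5 = PUSH BC/DE/HL/AF
        ld (de),a           ; store the opcode in the list
        inc de
        djnz next
        ex de,hl            ; HL = first byte after the PUSH list
        ld de,row_done
        ld (hl),$C3         ; terminate the list with JP row_done
        inc hl
        ld (hl),e           ; low byte of the jump target
        inc hl
        ld (hl),d           ; high byte of the jump target
        ret

row_tiles: db 3,3,0,1,1,1,2,3,3,3,3,3,3,3   ; example row: one platform
push_list: ds ROW_PUSHES+3                  ; opcodes plus the 3-byte JP
```

The list ends in a JP rather than a RET because SP points into screen memory while the list runs, so there is no return address on the stack to pop.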

The scroll can also draw things one-character over, so it only needs scenery pre-shifted to 4 positions (to cover each pattern moving across 8 pixels, not 16). The blacked-off borders either side of the screen are uneven; there's an extra character on one side to hide the scenery redraw moving about, as well as any sprite overlap.

It can have a different set of patterns on each row of the screen, and because of the character-position adjustment it can have either:
- one pattern that can be repeated to any length (e.g. platforms which need 3 pre-shifted patterns to start, repeat and end)
- or 3 separate patterns of a narrow upright object (e.g. pipes up to 10 pixels wide, which always just fit within one 16 pixel wide pre-shifted block)
- or one platform that must start and end within 24 pixels and one narrow upright object.

Finally, there's a faster parallax bit at the bottom of the screen drawn by just PUSHing a 16-bit wide pattern right across the screen. This is programmed to occur twice per game cycle. The result is that this part scrolls at 50fps, making everything seem a bit smoother.
This is great information. What about when a row changes graphics partway along: how did that work? (See image attached.)

Image
catmeows
Manic Miner
Posts: 711
Joined: Tue May 28, 2019 12:02 pm
Location: Prague

Post by catmeows »

robpearmain wrote: Fri Jun 14, 2019 12:07 am

This is great information. What about when a row changes graphics partway along: how did that work? (See image attached.)

Image
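By loading new values in the middle of the line: the code saves SP, points it at the data IX refers to, POPs fresh values into BC, DE and AF, then restores SP and carries on PUSHing.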

Code: Select all

	push bc			 ; 89d2 c5 $0bts
	push af			 ; 89d3 f5 $0bts
	push de			 ; 89d4 d5 $0bts
	push bc			 ; 89d5 c5 $0bts
	push hl			 ; 89d6 e5 $0bts
	push hl			 ; 89d7 e5 $0bts
	push hl			 ; 89d8 e5 $0bts
	ld ($8bf0), sp		 ; 89d9 ed 73 f0 8b $14ts
	ld sp, ix		 ; 89dd dd f9 $0ats
	pop bc			 ; 89df c1 $0ats
	pop de			 ; 89e0 d1 $0ats
	pop af			 ; 89e1 f1 $0ats
	ld sp, ($8bf0)		 ; 89e2 ed 7b f0 8b $14ts
	push af			 ; 89e6 f5 $0bts
	push de			 ; 89e7 d5 $0bts
	push de			 ; 89e8 d5 $0bts
	push de			 ; 89e9 d5 $0bts
	push de			 ; 89ea d5 $0bts
	push de			 ; 89eb d5 $0bts
	push de			 ; 89ec d5 $0bts
	jp $9ced		 ; 89ed c3 ed 9c $0ats
	
Proud owner of Didaktik M
Joefish
Rick Dangerous
Posts: 2041
Joined: Tue Nov 14, 2017 10:26 am

Post by Joefish »

Could it really do that?
OK, my 50Hz scrolling demo couldn't, but then there was no 'slack time' in it. I tried to keep mine running at a constant speed, whereas with all its sprites, Cobra varies a lot in execution time. Mine used the alternate registers for an alternate set of platforms, but could only swap them between rows, not columns. I just assumed Cobra did the same.
I did develop some code that used the alternate BC/DE/HL registers (or just as easily AF'/IX/IY) for an extra platform type within a row, but it takes longer to execute.
g0blinish
Manic Miner
Posts: 281
Joined: Sun Jun 17, 2018 2:54 pm

Post by g0blinish »

maybe not related to the topic, but I used some other tricks for AY Mehademo 3

http://g0blinish.ucoz.ru/forblog2/mudademo.rar
(sources inside, compile with sjasm)
ketmar
Manic Miner
Posts: 611
Joined: Tue Jun 16, 2020 5:25 pm
Location: Ukraine

Post by ketmar »

sorry for necroposting, but i happened to stumble upon Joffa's name here, and simply couldn't stop myself. i reversed Firefly engine around 10 years ago (man, i'm old now!), extracted map blitting parts from it, and even made a simple editor. i intended to use that in my game, but never wrote it. it works mostly like Cobra, using low part of the screen for the status bar, and for floating bus vsyncing. the map is 32x32 tiles of 16x16 pixels, with wraparound.
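
A minimal sketch of how a 32x32 tile map with wraparound can be addressed, assuming one byte per tile stored row by row. The labels map and tile_addr and that layout are assumptions for illustration, not Firefly's actual format or ketmar's code.

```asm
; Sketch only: 'map', 'tile_addr' and the one-byte-per-tile,
; row-major layout are assumptions.
; In:  B = tile row, C = tile column (any 8-bit value; both wrap at 32)
; Out: HL = address of the tile's byte in the map (A and DE are changed)
tile_addr:
        ld a,b
        and 31              ; row mod 32 gives the vertical wraparound
        ld l,a
        ld h,0
        add hl,hl
        add hl,hl
        add hl,hl
        add hl,hl
        add hl,hl           ; HL = row * 32; low five bits of L are now 0
        ld a,c
        and 31              ; column mod 32 gives the horizontal wraparound
        or l                ; drop the column into the low five bits
        ld l,a
        ld de,map
        add hl,de           ; HL = map + row * 32 + column
        ret

map:    ds 32*32            ; 1024 bytes, one per 16x16 tile
```

Because both dimensions are powers of two, wrapping is a single AND per coordinate, which is presumably part of the appeal of a 32x32 map.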

if anybody's interested, i can try to reverse-engineer my code (i lost most of my notes and such, and my asm source is not really well commented ;-), and publish it here. actually, the code is quite easy to follow (i think ;-), so i may throw in more comments, and turn it into some kind of simple example/tutorial. i think Joffa would be happy to see others learning from his tricks...
Ast A. Moore
Rick Dangerous
Posts: 2640
Joined: Mon Nov 13, 2017 3:16 pm

Post by Ast A. Moore »

ketmar wrote: Wed Jun 17, 2020 5:04 am i reversed Firefly engine . . . . it works mostly like Cobra, using low part of the screen for the status bar, and for floating bus vsyncing
Curious. I haven’t noticed any floating bus polling in Firefly. Besides, the game runs just fine on a +2A/+3. Do you think you could share the relevant part of Firefly’s code with us?
Every man should plant a tree, build a house, and write a ZX Spectrum game.

Author of A Yankee in Iraq, a 50 fps shoot-’em-up—the first game to utilize the floating bus on the +2A/+3,
and zasm Z80 Assembler syntax highlighter.
ketmar
Manic Miner
Posts: 611
Joined: Tue Jun 16, 2020 5:25 pm
Location: Ukraine

Post by ketmar »

yeah, that may have been added by me to have some more time to do things, it is hard to tell now. as i said, i lost most of my notes and original disassemblies, there's only the "final" extracted engine left. if i find my original notes, i'll double-check. the blitter itself doesn't really need that, you can simply "chase the beam", of course.

but tbh, i don't have high hopes here. i think i used a mix of IDA and my own tools, and i have neither IDA nor those tools anymore.

yeah, judging from git history (which is very short ;-), floating bus vsync was added by me. so sorry for the misinformation. still, the "builder" and "blitter" code is almost unmodified (and it seems i introduced a small bug there...).

when i find some more time, i will comment it properly and make a post. i think it will be interesting to see how exactly Joffa did his magic. and if i find any of my old notes, i'll add them.

p.s.: ah, another thing i remember about Firefly is that it is using two types of compression for maps. i don't remember exact algos, but i think that one was your usual RLE, and another one was kind of LZ coding (but somewhat twisted). or maybe just RLE with another counter size.


p.p.s.: yeah, Firefly is one of my favorite Speccy games. i remember seeing its fantastic 8-way smooth scrolling for the first time, and my jaw dropped. so dissecting Firefly was a natural choice. ;-)
Ast A. Moore
Rick Dangerous
Posts: 2640
Joined: Mon Nov 13, 2017 3:16 pm

Post by Ast A. Moore »

Firefly is by far my favorite of Joffa’s games. Pretty brilliant concept and implementation.
ketmar
Manic Miner
Posts: 611
Joined: Tue Jun 16, 2020 5:25 pm
Location: Ukraine

Post by ketmar »

Ast A. Moore wrote: Wed Jun 17, 2020 2:53 pm Firefly is by far my favorite of Joffa’s games. Pretty brilliant concept and implementation.
yeah, i absolutely agree. and it aged very well: it still looks fantastic, and it is insanely addictive. i never really liked Cobra or Green Beret (excellent technical implementation, but the gameplay is not my cup of tea). but i must confess that i am still playing Firefly from time to time, and it is still as good as it was the first time.

randomized world map is a brilliant touch too: it always looks like there are more unexplored maps there, even if you know that there are not so many of them. and each game feels like a fresh one.

eh, i can keep praising Firefly the whole day... ;-)
ketmar
Manic Miner
Posts: 611
Joined: Tue Jun 16, 2020 5:25 pm
Location: Ukraine

Post by ketmar »

i almost reversed my old code (and put a lot more comments in it). there's still one bug left (i have to double-check if it is mine, or i did something the original code wasn't designed for). i'll fix it (i hope ;-), and then i will try to write a post about it all, with code samples and some explanations.
MonkZy
Manic Miner
Posts: 278
Joined: Thu Feb 08, 2018 1:01 pm

Post by MonkZy »

ketmar wrote: Fri Jun 19, 2020 10:53 am i will try to write a post about it all, with code samples and some explanations.
I look forward to this.
ketmar
Manic Miner
Posts: 611
Joined: Tue Jun 16, 2020 5:25 pm
Location: Ukraine

Post by ketmar »

sorry for bumping, but i want to say that i'm still working on that post/article. it is somewhat hard to split my time between various things, and i want to explain the code and the idea behind it instead of simply dumping it on you, so it requires some more time. basically, i'm going through the code line-by-line, adding explanations where i think they're needed, and creating a small sample game to show how one can connect all the pieces together to get something interesting.

it looks like it is becoming more of a Yet Another Game Programming Tutorial, this time for people who know asm and want to do some advanced stuff. dunno if the world needs another one of those, but at least creating it is fun.
Art
Manic Miner
Posts: 204
Joined: Fri Jul 17, 2020 7:21 pm

Post by Art »

I'm trying to speed up my game, which draws to a memory buffer and then copies it to the screen. It buffers all 6912 screen bytes, so I can use a simple ldir, but I'm also trying the push/pop way. I wrote the following routine and it works, but the screen is a bit corrupted: two columns are doubled, and as I move in the game the screen gets more and more corrupted. I admit that I don't really understand exactly how the stack works (I've read some explanations, but it's still not clear). Is anything wrong with my routine, and should it be faster than ldir?

Code: Select all

         di                  ; disable interrupt
         ld (stack),sp       ; save stack address
         ld ix,buffer        ; start of buffer
         ld iy,16384+15      ; start of screen + 15
         ld a,216            ; loop counter, 216*32=6912
scrcopy  ld i,a              ; counter to i

         ld sp,ix            ; stack is on ix in buffer
         pop af              ; 16 bytes from buffer to registers
         pop bc
         pop de
         pop hl
         exx
         pop af
         pop bc
         pop de
         pop hl
         exx
         ld sp,iy            ; stack is on iy in screen
         exx
         push hl             ; 16 bytes from registers to screen in reverse order
         push de
         push bc
         push af
         exx
         push hl
         push de
         push bc
         push af
         ld bc,16
         add ix,bc           ; increase buffer address by 16
         add iy,bc           ; increase screen address by 16
         
         ld sp,ix            ; stack is on ix in buffer
         pop af              ; 16 bytes from buffer to registers
         pop bc
         pop de
         pop hl
         exx
         pop af
         pop bc
         pop de
         pop hl
         exx
         ld sp,iy            ; stack is on iy in screen
         exx
         push hl             ; 16 bytes from registers to screen in reverse order
         push de
         push bc
         push af
         exx
         push hl
         push de
         push bc
         push af
         ld bc,16
         add ix,bc           ; increase buffer address by 16
         add iy,bc           ; increase screen address by 16
         
         ld a,i              ; counter to a
         dec a               ; decrease counter by 1
         jr nz,scrcopy       ; if counter is not 0, copy next 32 bytes
         ld sp,(stack)       ; restore stack address
         ei                  ; enable interrupt
         ret
catmeows
Manic Miner
Posts: 711
Joined: Tue May 28, 2019 12:02 pm
Location: Prague

Post by catmeows »

Hi, Art.
The EXX instruction swaps the BC, DE and HL pairs only. AF can be swapped with a separate instruction, EX AF,AF'.
You load the AF register twice from two different locations without swapping it in between, so the first value is overwritten, and that is why the screen gets corrupted.
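
A minimal sketch of the routine with that fix applied, not a tested drop-in replacement. It pairs every EXX with an EX AF,AF'. It also makes three changes of my own: the destination starts at 16384+16 rather than +15 (PUSH decrements SP before writing, so SP has to start one byte past the end of the 16-byte chunk), the loop counter lives in memory instead of the I register, and IY is saved and restored. The labels saved_sp and chunks are assumptions. The alternate registers are not preserved.

```asm
; Sketch only: 'saved_sp' and 'chunks' are assumed labels.
copy_screen:
        di                  ; SP will point at data, so no interrupts
        push iy             ; keep IY for whoever relies on it
        ld (saved_sp),sp    ; remember the real stack pointer
        ld hl,432           ; 432 chunks * 16 bytes = 6912
        ld (chunks),hl
        ld ix,buffer        ; source: start of the first chunk
        ld iy,16384+16      ; destination: one past the end of the chunk
chunk:  ld sp,ix
        pop af              ; bytes 0-1
        pop bc              ; bytes 2-3
        pop de              ; bytes 4-5
        pop hl              ; bytes 6-7
        ex af,af'           ; swap AF with AF'
        exx                 ; swap BC, DE, HL with BC', DE', HL'
        pop af              ; bytes 8-9
        pop bc              ; bytes 10-11
        pop de              ; bytes 12-13
        pop hl              ; bytes 14-15
        ld sp,iy
        push hl             ; PUSH works downwards: write in reverse order
        push de
        push bc
        push af
        exx                 ; back to the first register set
        ex af,af'
        push hl
        push de
        push bc
        push af
        ld bc,16
        add ix,bc           ; next source chunk
        add iy,bc           ; next destination chunk
        ld hl,(chunks)      ; registers are free again, so count here
        dec hl
        ld (chunks),hl
        ld a,h
        or l
        jr nz,chunk
        ld sp,(saved_sp)    ; back to the real stack
        pop iy
        ei
        ret

saved_sp: dw 0
chunks:   dw 0
buffer:   ds 6912           ; the off-screen copy of the display
```

By my count this loop costs about 302 T-states per 16 bytes before contention, roughly 19 per byte, so in looped form it is only a little quicker than LDIR. Most of the overhead is the pointer arithmetic and the counter, which an unrolled version avoids.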
Joefish
Rick Dangerous
Posts: 2041
Joined: Tue Nov 14, 2017 10:26 am

Post by Joefish »

LDIR is slow because it repeats itself after every byte copied, checking whether BC has reached zero: that costs 21 T-states per byte against 16 for a plain LDI.
If you do LDI 32 times and write your own loop around it (using A as a counter) it will copy a lot faster, comparable to PUSH/POP copying. PUSH/POP is just preferred for multicolour routines as it caches the reads then does all the writes in one quick go.
Art
Manic Miner
Posts: 204
Joined: Fri Jul 17, 2020 7:21 pm

Post by Art »

Thank you for the answers! I added ex af,af' in several places, changed the start address to 16384+16, and the screen looks OK. But when I move in the game, the vector graphics start to be computed wrongly and the lines go everywhere. With ldir it's OK; I have no idea why. So the push/pop way is probably too complicated for this case. I will try the ldi way.
Art
Manic Miner
Posts: 204
Joined: Fri Jul 17, 2020 7:21 pm

Post by Art »

So, I used LDI. It works well and doesn't need another counter, because LDI decrements BC itself and leaves the P/V flag set until BC reaches zero:

Code: Select all

         ld hl,buffer
         ld de,16384
         ld bc,6912
scrcopy  ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         ldi
         jp pe,scrcopy       ; P/V stays set while BC is not 0
         ret
megaribi
Drutt
Posts: 15
Joined: Thu Nov 25, 2021 4:31 pm

Post by megaribi »

I would say that Jet Set Willy is a "quintuple buffering" game:
1) The room descriptions are stored at higher addresses, 256 bytes per room
2) When Willy enters a room, the corresponding 256 bytes are copied to a predetermined place
3) The room description is expanded into a 4.5K buffer holding the empty room
4) Every frame, the empty room is copied to another 4.5K buffer, which is then filled with Willy's sprite and the guards' sprites, with collision detection
5) The prepared buffer is copied to the actual video memory
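
Steps 4 and 5 can be sketched as below. This is an illustration of the flow, not Jet Set Willy's code: all labels are hypothetical, draw_sprites is an empty placeholder, and each 4.5K buffer is assumed to hold 4096 pixel bytes in screen order followed by 512 attribute bytes.

```asm
; Sketch only: labels and buffer layout are assumptions.
ROOM_SIZE equ 4608          ; 4096 pixel bytes + 512 attribute bytes = 4.5K

frame:  ld hl,empty_room    ; step 4: start from the clean room image
        ld de,work_buf
        ld bc,ROOM_SIZE
        ldir
        call draw_sprites   ; draw Willy and the guards into work_buf
        ld hl,work_buf      ; step 5: pixels go to the top two thirds
        ld de,16384         ; of the display file
        ld bc,4096
        ldir
        ld de,22528         ; HL now points at the buffered attributes
        ld bc,512           ; 16 character rows * 32 columns
        ldir
        ret

draw_sprites:
        ret                 ; placeholder for sprite drawing and collisions

empty_room: ds ROOM_SIZE
work_buf:   ds ROOM_SIZE
```

Drawing into work_buf rather than the screen means sprites never have to be erased: the next frame simply starts again from the clean copy.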
catmeows
Manic Miner
Posts: 711
Joined: Tue May 28, 2019 12:02 pm
Location: Prague

Post by catmeows »

Does anyone know how much faster PUSH/POP is than LDI when copying to contended memory? I estimate it could be about 16 cycles/byte for POP/PUSH and about 20 cycles/byte for LDI, but I never counted it for real.
Bedazzle
Manic Miner
Posts: 303
Joined: Sun Mar 24, 2019 9:03 am

Post by Bedazzle »

Joefish wrote: Tue Dec 21, 2021 9:37 pm If you do LDI 32 times and write your own loop around it (using A as a counter) it will copy a lot faster, comparable to PUSH/POP copying.
POP+PUSH is 12 T-states per byte, while LDI is 16 T-states per byte.
25%
catmeows
Manic Miner
Posts: 711
Joined: Tue May 28, 2019 12:02 pm
Location: Prague

Post by catmeows »

Bedazzle wrote: Wed Dec 22, 2021 8:31 am
Joefish wrote: Tue Dec 21, 2021 9:37 pm If you do LDI 32 times and write your own loop around it (using A as a counter) it will copy a lot faster, comparable to PUSH/POP copying.
POP+PUSH is 12 T-states per byte, while LDI is 16 T-states per byte.
25%
Nope, you forgot contention and loop management.
megaribi
Drutt
Posts: 15
Joined: Thu Nov 25, 2021 4:31 pm

Post by megaribi »

catmeows wrote: Wed Dec 22, 2021 1:07 am Does anyone know how much faster PUSH/POP is than LDI when copying to contended memory? I estimate it could be about 16 cycles/byte for POP/PUSH and about 20 cycles/byte for LDI, but I never counted it for real.
You can't tell without specifying when the copy happens and how much data you copy.
https://sinclair.wiki.zxnet.co.uk/wiki/Contended_memory
If copying is done immediately after the interrupt arrives, or during horizontal retrace, and you need to copy 18 bytes, the LDI variant is 308 cycles (17 cycles per byte):

Code: Select all

LD HL,source  ; 10 cycles
LD DE,destination ; 10 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
LDI ; 16 cycles
And the POP/PUSH variant is 233 cycles (about 13 cycles per byte):

Code: Select all

LD SP,source  ; 10 cycles
POP AF ; 10 cycles
POP DE ; 10 cycles
POP HL ; 10 cycles
POP BC ; 10 cycles
EXX  ; 4 cycles
POP DE  ; 10 cycles
POP HL ; 10 cycles
POP BC ; 10 cycles
POP IX ; 14 cycles
POP IY  ; 14 cycles
LD SP,DEST+18 ; 10 cycles
PUSH IY  ; 15 cycles
PUSH IX  ; 15 cycles
PUSH BC  ; 11 cycles
PUSH HL ; 11 cycles
PUSH DE ; 11 cycles
EXX  ; 4 cycles
PUSH BC ; 11 cycles
PUSH HL ; 11 cycles
PUSH DE ; 11 cycles
PUSH AF ; 11 cycles
However, if the transfer occurs 14335 cycles after the interrupt arrives, assuming that we copy from non-contended memory, statistically we can assume that each transfer will add about 3 cycles, so the LDI variant would be 357 cycles (nearly 20 cycles per byte) and the POP/PUSH variant 287 cycles (nearly 16 cycles per byte).
Joefish
Rick Dangerous
Posts: 2041
Joined: Tue Nov 14, 2017 10:26 am

Post by Joefish »

Also, you have to completely unroll the PUSH/POP code to get that speed. If you loop it, like the LDI version, then you have to do calculated stack pointer changes, and that takes even longer.
rastersoft
Microbot
Posts: 151
Joined: Mon Feb 22, 2021 3:55 pm

Post by rastersoft »

I had the same problem, and a bunch of LDIs need 24 T-states per instruction because the destination address is kept in the last two T-states of the instruction, causing contention. That's why PUSH/POP is much faster.

You can check how I did a nearly full screen copy (including attributes) using PUSH/POP in the code of Escape from M.O.N.J.A.S.