Optimisation Challenge, for the benefit of new programmers

The place for codemasters or beginners to talk about programming any language for the Spectrum.
Ralf
Rick Dangerous
Posts: 2288
Joined: Mon Nov 13, 2017 11:59 am
Location: Poland

Re: Optimisation Challenge, for the benefit of new programmers

Post by Ralf »

Your "crap" version appears to use undocumented Z80 instructions, makes extensive use of the alternate registers, nicely sets up a LDI instruction, and unrolls a loop. That's all before it does something clever to handle the screen address, which I don't understand. I've a feeling your "crap" code is already streets ahead of what the vast majority of programmers could write, let alone "new" ones.
Yes, you shouldn't show a standard, decent code and call it crap.
User avatar
Pobulous
Dynamite Dan
Posts: 1365
Joined: Wed Nov 15, 2017 12:51 pm

Re: Optimisation Challenge, for the benefit of new programmers

Post by Pobulous »

OK, this version implements the border rule, and can be adjusted with the EQUs at the top, and I've included a routine to convert graphics from attr, line1-8, attr, line1-8 etc into attr*256, line1*256, line2*256, etc. USR entry point for conversion code is 32896.

All data starting addresses need to be 256 byte aligned.

I've no idea how fast it is compared to other solutions. On the plus side it's not too hard to follow and has some optimisations....

Code: Select all

TopEdge 	EQU 64 		;(NumRows*32)
BottomEdge	EQU 192		;(256-(NumRows*32)
LeftEdge	EQU 2		;NumCols
RightEdge	EQU 30		;32-NumCols


ORG 32768

	DI				;Disable interrupts for speed
	
	LD DE, 16384
	LD BC, 22528
	
	LD HL, CharMap

	LD A,E
	CP TopEdge
	JP NC, TestLeftSide
	LD A,TopEdge
	LD E,A
	LD L,A
	LD C,A

PrintLoop:

			LD A,D

TestBottomEdge:
			CP 80
			LD A,E
			JP NZ, TestLeftSide
			CP BottomEdge
			JP C, TestLeftSide
			EI				;Finished, Return
			RET

TestLeftSide:
			AND 31
			CP LeftEdge
			JP NC, TestRightSide
			LD A,LeftEdge
			ADD A,E
			LD E,A
			LD L,A
			LD C,A
			JP PrintLoop
			
TestRightSide:
			CP RightEdge
			JP C, PrintChar
			LD A, 31
			OR E
			LD E,A
			LD L,A
			LD C,A
			JP SkipPrinting	

PrintChar:
			LD A,(HL)		;Get next CharMap character to print
					
			CP 8			;chars 0-(n-1) (0-7 in this case) will only update attr data (set to 0 to disable this feature)
			
			PUSH HL			;Store CharMap location
			LD H,UDGs/256	;Get base address of UDGs
			LD L,A			;Select UDG 1-255
			LD A,(HL)		;copy ATTR data
			LD (BC),A
			
			JP C,PrintATTROnly
			
			INC H			;Get first line of chardata
			PUSH DE			;Store current screen address

			LD A,(HL)		;copy 8 lines of UDG data to screen
			LD (DE),A
			INC H
			INC D
			LD A,(HL)
			LD (DE),A
			INC H
			INC D
			LD A,(HL)
			LD (DE),A
			INC H
			INC D
			LD A,(HL)
			LD (DE),A
			INC H
			INC D
			LD A,(HL)
			LD (DE),A
			INC H
			INC D
			LD A,(HL)
			LD (DE),A
			INC H
			INC D
			LD A,(HL)
			LD (DE),A
			INC H
			INC D
			LD A,(HL)
			LD (DE),A

			POP DE			;Restore screen address
			
PrintATTROnly:
			POP HL			;Restore CharMap location
			
SkipPrinting:
			INC BC			;Advance to next ATTR position
			INC HL			;Advance to next CharMap character
			INC E			;Advance screen address

		JP NZ, PrintLoop	;Loop until E wraps to zero - signals a third of the screen is completed
		
		
		LD A,D				;Adjust DE to point to next screen third
		ADD A,8
		LD D,A
		
		CP 88				;If MSB of screen addr is now 88, then we are done.
	JP C, TestBottomEdge	;Do next screen third
	
	EI
	RET


	
PrepUDGS:				;Run this before main code if required.
	LD DE,UDGs
	LD HL,RawUDGs		;Graphics in ATTR followed by 8 lines of pixels
	
UDGLineLoop:

		PUSH BC			;Save outer loop counter
		PUSH DE
		
		LD B,9
UDGLoop:
			LD A,(HL)
			LD (DE),A
			INC HL
			INC D
			
		DJNZ UDGLoop
				
		POP DE
		INC E
		
	JP NZ, UDGLineLoop
	
	RET


ORG 49152
CharMap:
	db 8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8
	db 8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8
	db 8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,8,8,8
	
	db 8,8,8,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,0,0,0,0,0,0,  10,11,0,0,0,1,0,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,9,10,11,10,11,0,  1,0,9,10,11,1,0,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,1,1,1,1,13,9,  1,9,1,12,11,1,11,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,1,12,13,12,13,0,  1,0,1,12,13,1,1,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,12,13,0,0,0,0,12,  13,0,0,0,0,0,0,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,8,8,8
	
	db 8,8,8,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,8,8,8
	db 8,8,8,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,8,8,8
	db 8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8
	db 8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8
	db 8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8,  8,8,8,8,8,8,8,8


UDGs:				;256 8 byte arrays of attribs abd characters aligned on 256 byte boundary, interleaved
					;256 ATTR bytes, followed by 256 top rows of UDGS, then 256 2nd rows, etc
	db  0,0,8,16,24,32,40,48,56,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
	db  24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
	db  50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75
	db  76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101
	db  102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
	db  128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153
	db  154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179
	db  180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205
	db  206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231
	db  232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254
	db  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db  0,0,0,0,0,0,0,0,0,60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12
	db  0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60,8,126,60,126,60,60
	db  0,32,0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60
	db  8,126,60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16
	db  0,0,60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0,0
	db  0,0,0,0,0,16,0,0,60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12,0,64
	db  16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60,8,126,60,126,60,60,0,32
	db  0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60,8,126
	db  60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0
	db  60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0
	db  0,0,0,0,0,0,0,0,0,70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16
	db  60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66,24,64,64,2,66,66
	db  56,32,28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66
	db  24,64,64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56
	db  68,68,70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104,120
	db  56,120,60,28,56,56,68,68,70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16,60,64
	db  0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66,24,64,64,2,66,66,56,32
	db  28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66,24,64
	db  64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68
	db  70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104
	db  0,0,0,0,0,0,0,0,0,74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24
	db  68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12,40,124,124,4,60,66
	db  4,60,32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12
	db  40,124,124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16
	db  68,68,74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84,68
	db  68,68,68,32,64,16,68,68,74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24,68,120
	db  48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12,40,124,124,4,60,66,4,60
	db  32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12,40,124
	db  124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68
	db  74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84
	db  0,0,0,0,0,0,0,0,0,82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16
	db  68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2,72,2,66,8,66,62
	db  60,34,32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2
	db  72,2,66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16
	db  68,40,82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84,68
	db  68,68,68,32,56,16,68,40,82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16,68,68
	db  16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2,72,2,66,8,66,62,60,34
	db  32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2,72,2
	db  66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40
	db  82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84
	db  0,0,0,0,0,0,0,0,0,98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16
	db  60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66,126,66,66,16,66,2
	db  68,34,32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66
	db  126,66,66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16
	db  68,40,98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84,68
	db  68,120,60,32,4,16,68,40,98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16,60,68
	db  16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66,126,66,66,16,66,2,68,34
	db  32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66,126,66
	db  66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40
	db  98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84
	db  0,0,0,0,0,0,0,0,0,60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16
	db  4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60,8,60,60,16,60,60
	db  60,60,28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60
	db  8,60,60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12
	db  56,16,60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84,68
	db  56,64,4,32,120,12,56,16,60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16,4,68
	db  56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60,8,60,60,16,60,60,60,60
	db  28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60,8,60
	db  60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16
	db  60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84
	db  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db  56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db  0,0,0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0
	db  0,0,0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0
	db  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0,0
	db  0,64,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,56,0
	db  0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db  0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0,0,0
	db  0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0
	db  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0	


RawUDGs:
	db 9*0, 0,0,0,0,0,0,0,0
	db 9*3+64, 0,0,0,0,0,0,0,0
	db 9*2, 0,0,0,0,0,0,0,0
	db 9*3, 0,0,0,0,0,0,0,0
	db 9*4, 0,0,0,0,0,0,0,0
	db 9*5, 0,0,0,0,0,0,0,0
	db 9*6, 0,0,0,0,0,0,0,0
	db 9*7, 0,0,0,0,0,0,0,0

	db	01110000b,    11001100b,01100110b,00110011b,10011001b,11001100b,01100110b,00110011b,10011001b
	
	db	01000011b,    00111100b,01111110b,11111111b,11111111b,11111111b,11111111b,01111110b,00111100b
	
	db	01000011b,    00000000b,00000011b,00001111b,00011111b,00111111b,00111111b,01111111b,01111111b
	db	01000011b,    00000000b,11000000b,11110000b,11111000b,11111100b,11111100b,11111110b,11111110b
	db	01000011b,    01111111b,01111111b,00111111b,00111111b,00011111b,00001111b,00000011b,00000000b
	db	01000011b,    11111110b,11111110b,11111100b,11111100b,11111000b,11110000b,11000000b,00000000b
Output using Joefish's image data.
Image
User avatar
Joefish
Rick Dangerous
Posts: 2058
Joined: Tue Nov 14, 2017 10:26 am

Re: Optimisation Challenge, for the benefit of new programmers

Post by Joefish »

dfzx wrote: Fri Apr 29, 2022 3:00 pm
Joefish wrote: Thu Apr 28, 2022 7:57 pm In that case, here's a crap version with no optimisation for a start.
It does the job, but not in any particular rush.
That takes just about exactly 5 stopwatch seconds to run 100 times (in a BASIC loop). So 50ms per iteration as a starting point. :)
Probably better to use a loop of machine code calls, as BASIC calls themselves are pretty slow.
User avatar
Pobulous
Dynamite Dan
Posts: 1365
Joined: Wed Nov 15, 2017 12:51 pm

Re: Optimisation Challenge, for the benefit of new programmers

Post by Pobulous »

Slight bit of optimisation, fixed a bug in the graphic preparing routine, included the output of that routine from joefish's UDGs, added a benchmark routine, added more comments.

Benchmarks at around 455 50ths of a second for 256 passes in 48K in Fuse. 128K machines are slightly faster at 449 in 128K or 48K mode.
Either way it's just over 35ms per pass.

Code: Select all

ScreenAddr	EQU	16384		;These could be changed to target 2nd Screen on 128K machines
ATTRAddr	EQU 22528

;These four are used to set the non-printing margins
TopEdge 	EQU 64 		;(NumRows*32)  (Maximum 7 lines)
BottomEdge	EQU 192		;(256-(NumRows*32) (Maximum 8 lines)
LeftEdge	EQU 2		;NumCols
RightEdge	EQU 30		;32-NumCols
EmptyUDGs	EQU 2		;UDGs 0-(n-1) (0-1 in this case) will only update attr data (set to 0 to disable this feature)



ORG 32768

	DI					;Disable interrupts for speed
DisplayScreen:	
	LD DE, ScreenAddr	;Initialise the main registers - Worthy of note is that the LSB of ScreenAddr, ATTRAddr and CharMap will be equal throughout the runtime.
	LD BC, ATTRAddr
	LD HL, CharMap		;Must be divisible by 256

	LD A,TopEdge		;Check for non-printing margin
	AND A
	JP Z, TestLeftSide	;If TopEdge is larger than current position, then set current position print ScreenAddr LSB to TopEdge
	LD E,A			
	LD L,A				;ATTRAddr and CharMap LSB can be updated likewise
	LD C,A

PrintLoop:

			LD A,D				;We will be testing if we are in bottom third of screen and if so, testing if we are in non-printing border area

TestBottomEdge:					;Entry point for if A already contains MSB of ScreenAddr
			CP 80				;Are we in bottom 3rd of screen?
			LD A,E				;LSB is needed to be tested whatever portion of the screen we are in
			JP NZ, TestLeftSide	;Jump if not in bottom third
			CP BottomEdge		;Test if we are in bottom non-printing area
			JP C, TestLeftSide	;Jump if not
			EI					;Finished printing, Enable Interrupts and return
			RET

TestLeftSide:
			AND 31					;A contains LSB of ScreenAddr, lowest 5 bits contains X position
			CP LeftEdge				;Are we in left non-printing border?
			JP NC, TestRightSide	;Jump to next test if not
			LD A,LeftEdge			;Add border offset to LSB ScreenAddr
			ADD A,E
			LD E,A					
			LD L,A					;ATTRAddr and CharMap LSB can be updated likewise
			LD C,A
			JP PrintLoop			;Skip over right border test
			
TestRightSide:
			CP RightEdge			;compare screen X position with right-hand non-printing border
			JP C, PrintChar			;Jump into printing routine if not in border
			LD A, 31				;Set ScreenAddr position to x=31 on the current line
			OR E
			LD E,A
			LD L,A					;ATTRAddr and CharMap LSB can be updated likewise
			LD C,A
			JP SkipPrinting			;Skip Printing, via checking for end of third of screen.

PrintChar:
			LD A,(HL)			;Get next CharMap character to print
					
			CP EmptyUDGs		;Test if will only be updating ATTR	
			
			PUSH HL				;Store CharMap location
			LD H,UDGs/256		;Get base address of UDGs
			LD L,A				;Select UDG 1-255
			LD A,(HL)			;copy ATTR data
			LD (BC),A
			
			JP C,PrintATTROnly	;Jump if updating ATTR only
			
			INC H				;Get first line of chardata
			PUSH DE				;Store current screen address

			LD A,(HL)			;copy 8 lines of UDG data to screen
			LD (DE),A
			INC H
			INC D
			LD A,(HL)
			LD (DE),A
			INC H
			INC D
			LD A,(HL)
			LD (DE),A
			INC H
			INC D
			LD A,(HL)
			LD (DE),A
			INC H
			INC D
			LD A,(HL)
			LD (DE),A
			INC H
			INC D
			LD A,(HL)
			LD (DE),A
			INC H
			INC D
			LD A,(HL)
			LD (DE),A
			INC H
			INC D
			LD A,(HL)
			LD (DE),A

			POP DE			;Restore screen address
			
PrintATTROnly:
			POP HL			;Restore CharMap location
			
SkipPrinting:
			INC BC			;Advance to next ATTR position
			INC HL			;Advance to next CharMap character
			INC E			;Advance screen address

		JP NZ, PrintLoop	;Loop until E wraps to zero - signals a third of the screen is completed
		
		
		LD A,D				;Adjust DE to point to next screen third
		ADD A,8
		LD D,A
		
		CP 88				;If MSB of screen addr is now 88, then we are done.
	JP C, TestBottomEdge	;Do next screen third
	
	EI
	RET


	
PrepUDGS:				;Run this before main code if required.
	LD DE,UDGs
	LD HL,RawUDGs		;Graphics in ATTR followed by 8 lines of pixels
	
UDGLineLoop:
		PUSH DE
		
		LD B,9
UDGLoop:
			LD A,(HL)
			LD (DE),A
			INC HL
			INC D
			
		DJNZ UDGLoop
				
		POP DE
		INC E
		
	JP NZ, UDGLineLoop	;Loop will finish after 256 iterations when E overflows back to zero
	
	RET
	
Benchmark:				;Run the DisplayScreen routine 256 times and return elapsed time in 50ths of a second
	LD BC,0
	LD (23672),BC		;Set lower 2 bytes of Frames to 0
SpeedLoop:
	PUSH BC				;Run Routine 256 times
	CALL DisplayScreen
	POP BC
	DJNZ SpeedLoop
	LD BC,(23672)		;Return elapsed time in 50ths of a second
	RET
	
ORG 49152
CharMap:
	db 	2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2
	db 	2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2
	db 	2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,2,2,2
	
	db 	2,2,2,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,0,0,0,0,0,0,  4,5,0,0,0,1,0,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,3,4,5,4,5,0,  1,0,3,4,5,1,0,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,1,1,1,1,7,3,  1,3,1,6,5,1,5,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,1,6,7,6,7,0,  1,0,1,6,7,1,1,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,6,7,0,0,0,0,6,  7,0,0,0,0,0,0,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,2,2,2
	
	db 	2,2,2,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0,  0,0,0,0,0,2,2,2
	db 	2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2
	db 	2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2
	db 	2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2,  2,2,2,2,2,2,2,2


UDGs:				;256 8 byte arrays of attribs and characters aligned on 256 byte boundary, interleaved
					;256 ATTR bytes, followed by 256 top rows of UDGS, then 256 2nd rows, etc
	db 0,91,112,67,67,67,67,67,0,0,0,0,0,0,0,0			;attribs
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	
	db 0,0,204,60,0,0,127,254,0,0,0,0,0,0,0,0			;1st row of gfx data
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	
	db 0,0,102,126,3,192,127,254,0,0,0,0,0,0,0,0		;2nd row of gfx data
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	
	db 0,0,51,255,15,240,63,252,0,0,0,0,0,0,0,0			;3rd row of gfx data
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	
	db 0,0,153,255,31,248,63,252,0,0,0,0,0,0,0,0		;4th row of gfx data
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	
	db 0,0,204,255,63,252,31,248,0,0,0,0,0,0,0,0		;5th row of gfx data
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	
	db 0,0,102,255,63,252,15,240,0,0,0,0,0,0,0,0		;6th row of gfx data
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	
	db 0,0,51,126,127,254,3,192,0,0,0,0,0,0,0,0			;7th row of gfx data
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	
	db 0,0,153,60,127,254,0,0,0,0,0,0,0,0,0,0			;8th row of gfx data
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
	db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


RawUDGs:
	db 9*0, 0,0,0,0,0,0,0,0
	db 9*3+64, 0,0,0,0,0,0,0,0

	db	01110000b,    11001100b,01100110b,00110011b,10011001b,11001100b,01100110b,00110011b,10011001b
	
	db	01000011b,    00111100b,01111110b,11111111b,11111111b,11111111b,11111111b,01111110b,00111100b
	
	db	01000011b,    00000000b,00000011b,00001111b,00011111b,00111111b,00111111b,01111111b,01111111b
	db	01000011b,    00000000b,11000000b,11110000b,11111000b,11111100b,11111100b,11111110b,11111110b
	db	01000011b,    01111111b,01111111b,00111111b,00111111b,00011111b,00001111b,00000011b,00000000b
	db	01000011b,    11111110b,11111110b,11111100b,11111100b,11111000b,11110000b,11000000b,00000000b
Now I need to find a way to utilise the routine in the crap game compo.
User avatar
Pobulous
Dynamite Dan
Posts: 1365
Joined: Wed Nov 15, 2017 12:51 pm

Re: Optimisation Challenge, for the benefit of new programmers

Post by Pobulous »

I'm looking at building on this routine to actually display some gameplay.

One thing that struck me is that the other parts that need to be done (outside the routine) are:

Have a background map of tiles for non-moving elements of the screen and initialise the CharMap from this
Have some code to put moveable elements into the CharMap

Then every frame:

Display CharMap (using above created routine)
Execute program logic
Copy background map tiles to CharMap
Copy sprites to CharMap
Loop to DisplayCharMap

A useful optimisation is to have UDG 0 be a do nothing rather than just update ATTRs (I had it this way originally)

Now the method is:
Copy levelmap to background map
Initialise CharMap by LDIRing background map tiles to CharMap

Display CharMap (using above created routine, skipping over zero UDGs)
Execute program logic
Copy background map tiles to CharMap, setting any unchanged tiles to zero in CharMap, and leaving any zeroed tiles as are
Copy sprites to CharMap
Loop to DisplayCharMap


In most game types a lot of the map won't change, but may contain background graphics. Could be the dots and walls in pacman, or the mushrooms in centipede, or ladders and platforms, etc.
Setting those to zero and skipping drawing them will speed things up - kind of a simplified dirty rectangles.
User avatar
Joefish
Rick Dangerous
Posts: 2058
Joined: Tue Nov 14, 2017 10:26 am

Re: Optimisation Challenge, for the benefit of new programmers

Post by Joefish »

I think trying to keep track of unchanged tiles just makes the program far more complicated than it needs to be.

Certainly if you've got a scrolling background, you don't want 'holes' in the rendering, which is why I don't bother with a skippable UDG #0.

The simplest program loop is simply to copy in your background again across the whole character map. The hidden border area lets you do this from a block-based scenery map, without having to worry about trimming those blocks to fit exactly.

Then add your sprites over the top. Now when copying the sprites into the character map (or even copying parallax layers of foreground scenery), then you probably want a transparent, 'don't draw me', character 0, so you can leave gaps in your sprites to show the background.

But then when you finally call the render function, it renders the whole screen without gaps.
User avatar
ketmar
Manic Miner
Posts: 700
Joined: Tue Jun 16, 2020 5:25 pm
Location: Ukraine

Re: Optimisation Challenge, for the benefit of new programmers

Post by ketmar »

for the real game engine, you actually can get away with the backbuffer only for 32x8 pixels (and attrs), and some flags in tilemap. the idea is that your tiles has flags "not changed", "blit", "dirty", and each character row has a list of sprites touching it. now, the main loop is something like this:

Code: Select all

clear sprite lists
loop over all sprites, put each sprite in row lists
loop over all tile rows, for each row (using our 1-row backbuffer):
  render all sprites into backbuffer, also mark all touched tiles as "dirty" here
  now loop over all 32 tiles:
    "not changed": ignore
    "blit": copy tile from gfx data directly to screen, mark it as "not changed"
    "dirty": copy backbuffer gfx, mark tile as "blit"
by decrementing tile flag (dirty->blit->unchanged), you also get automatic erasing of printed sprites for free, no need to track them. ;-)

this way you can support a lot of sprites, background tile animations, and still render only changed screen parts. i'm planning to write a demo engine based on this… for several years, lol.

p.s.: and thanks to "row sorting" step, you can do fast collision detection for many sprites too: you already have almost all the info ready. i.e. each sprite marks tiles as "dirty", so you can use that to perform "broad phase" of coldet (checking if there is a possible collision), and use precise check only when necessary. i.e. time spent on collision checking is almost independed of the number of active sprites.
User avatar
Joefish
Rick Dangerous
Posts: 2058
Joined: Tue Nov 14, 2017 10:26 am

Re: Optimisation Challenge, for the benefit of new programmers

Post by Joefish »

dfzx wrote: Sat Apr 30, 2022 8:13 pm @Joefish I think you might have a dose of excessive average familiarity, so per this XKCD comic:

Image

Your "crap" version appears to use undocumented Z80 instructions, makes extensive use of the alternate registers, nicely sets up a LDI instruction, and unrolls a loop. That's all before it does something clever to handle the screen address, which I don't understand. I've a feeling your "crap" code is already streets ahead of what the vast majority of programmers could write, let alone "new" ones. :lol:
Well, yes, 'crap' for me is to just dump some stuff to get to the first thing that actually seems to work. For some reason I'm averse to in-memory variable storage and the stack and try to do everything in registers (hence extensive use of the 'undocumented' IXH , IXL etc. instructions as caches where someone else might use PUSH and POP). Although in my defence, they are some of the most well-documented 'undocumented' features of anything ever! :lol:

The only 'optimisation' change I made was realising that the attributes should come before the pixels in the UDG definition rather than after, to make it easier to copy the attribute but skip the pixel step. In assembly it's perfectly possible to write functions that skip operations by entering the function later rather than exiting it early, but you really don't want to go there...
User avatar
Joefish
Rick Dangerous
Posts: 2058
Joined: Tue Nov 14, 2017 10:26 am

Re: Optimisation Challenge, for the benefit of new programmers

Post by Joefish »

Anyway, my idea for a scrolling game using this system is to split the UDGs roughly in half, the first 128 for scenery and the rest for sprites and animations. The first half are then split again; 64 for foreground stuff and the other 64 split, yet again, four ways for the parallax. That leaves 16 UDGs, so either an 8x2 or 4x4 grid, with 4 frames of 2-pixel pre-shifting, rotated in-place.

With your foreground character map you have 0s in it where you want the background scroll to show through, so you copy up the foreground, then insert the relevant background characters in the zero spaces. Anything that's still 0 just gets rendered as a black (blue?) square UDG at the final stage.

The slight drawback with this is if you want bands of colour in your background, your sprite UDGs can't easily change PAPER colours to match. So if your parallax is a band between two colours (e.g. sky blue to black), best keep it high up the screen where your character can't jump. If it's lower down, stick with all black PAPER.
...Or stick with very chunky sprites that always have black PAPER.
...Or modify the renderer to OR in different PAPER colours per row as it renders the screen. Or AND the colours in, then if your UDGs have either white or black PAPER you can control which ones show background colour.
...Or re-write the code so that there are two bytes per cell in the character map, one for UDGs (pixels only) and the second byte for attributes.
But we'll leave those as an exercise for anyone who thinks they've mastered this renderer first!
Last edited by Joefish on Fri May 06, 2022 3:20 pm, edited 1 time in total.
User avatar
PeterJ
Site Admin
Posts: 6878
Joined: Thu Nov 09, 2017 7:19 pm
Location: Surrey, UK

Re: Optimisation Challenge, for the benefit of new programmers

Post by PeterJ »

It would be nice to have some more of these types of things, though perhaps aimed at a rather lower level of experience. The sorts of assumptions that Jonathan makes in his creating arcade game book would be perfect.

Image

I'm only on chapter 5, but I understand his code (and feel comfortable adapting it), and if I'm struggling I consult the Z80 documentation. It's amazing what you can achieving with load, inc, dec, pop, push, compare, call and jumps!
User avatar
Joefish
Rick Dangerous
Posts: 2058
Joined: Tue Nov 14, 2017 10:26 am

Re: Optimisation Challenge, for the benefit of new programmers

Post by Joefish »

It's just meant to be a tool. A one-level up assembly-friendly version of CharAde. You don't necessarily need to understand all the code in a library function in order to use it. The point here is to get some expert optimisation of a routine for rendering to the screen from a character map. The level of knowledge a programmer then needs in order to manipulate that character-mapped version of the screen (and the time and memory costs of doing so) are much reduced.
User avatar
PeterJ
Site Admin
Posts: 6878
Joined: Thu Nov 09, 2017 7:19 pm
Location: Surrey, UK

Re: Optimisation Challenge, for the benefit of new programmers

Post by PeterJ »

Got you @Joefish. Thanks!
User avatar
Pobulous
Dynamite Dan
Posts: 1365
Joined: Wed Nov 15, 2017 12:51 pm

Re: Optimisation Challenge, for the benefit of new programmers

Post by Pobulous »

@PeterJ It might be worth having a go at creating a simple routine as the original description. It's a small enough routine to be manageable, but with lots of ways to go, and a chance to gain some extra understanding of the subtleties of the Spectrum screen layout.

For instance, it had never dawned on me before that the least significant byte of the addresses of the pixels and the attributes is always the same.

The trickiest part is needing to juggle 4 addresses, plus some loop variables - you soon run out of registers. I started out by using the IX and IY pairs, then slowly optimised those away.

I'm sure my code can be sped up, but it's been fun coding it and optimising it (to some extent), and now I'm slowly building out the rest of the parts needed to make it into an actual engine for a game (character movement only mind you!). One small routine at a time. That's also helped me to find and fix some edge cases that weren't quite correct.
User avatar
ketmar
Manic Miner
Posts: 700
Joined: Tue Jun 16, 2020 5:25 pm
Location: Ukraine

Re: Optimisation Challenge, for the benefit of new programmers

Post by ketmar »

the worst thing with the original task is 8x8 tiles. if only they could be 16x16… then the whole bag of stack tricks could be used (like i did with attribute copying in one of my solutions).

p.s.: and strict restrictions on additional memory… no JIT compiling, no tilemap massaging… sigh
User avatar
Joefish
Rick Dangerous
Posts: 2058
Joined: Tue Nov 14, 2017 10:26 am

Re: Optimisation Challenge, for the benefit of new programmers

Post by Joefish »

ketmar wrote: Sat May 07, 2022 12:16 am the worst thing with the original task is 8x8 tiles. if only they could be 16x16… then the whole bag of stack tricks could be used (like i did with attribute copying in one of my solutions).
p.s.: and strict restrictions on additional memory… no JIT compiling, no tilemap massaging… sigh
By all means start another thread! But if you're rendering a screen from larger tiles, what are you going to do for sprites? Or scrolling?
User avatar
ketmar
Manic Miner
Posts: 700
Joined: Tue Jun 16, 2020 5:25 pm
Location: Ukraine

Re: Optimisation Challenge, for the benefit of new programmers

Post by ketmar »

Joefish wrote: Tue May 17, 2022 12:04 pmBy all means start another thread! But if you're rendering a screen from larger tiles, what are you going to do for sprites? Or scrolling?
oh, it really depends of the game requirements, so it's quite hard to create something truly universal. like Firefly engine, which is rendering directly to the screen, and can cope only with several sprites… but hey, 25 FPS! or Scrappy Doo engine, which is 16 FPS, and renders everything to the backbuffer, then waits for the ray to start drawing the screen, and blits backbuffer… decisions, decisions…

it would prolly be possible to clone Potsworth engine (with seems to be the next version of Scrappy Doo engine) to use it in various platformers, i think. maybe i really should start a thread like yours for that. have to think about it…
User avatar
Joefish
Rick Dangerous
Posts: 2058
Joined: Tue Nov 14, 2017 10:26 am

Re: Optimisation Challenge, for the benefit of new programmers

Post by Joefish »

A tiled engine could do platforms or mazes. And if you can have some way of prioritising tiles over sprites you can even do those pseudo-overhead RPG type views, where your sprite can go behind the tips of building roofs or tops of trees.

That's fairly easily done on a character-based engine by simply defining roof UDGs and the tops of walls as high numbers, and paths and fronts of buildings, walls, etc. as low number UDGs. Then write some code so that your sprite UDGs can only be drawn over the low numbers. So when you overlap with the top of a wall, your feet disappear. But when you overlap the lower part of the same wall you appear in front.
User avatar
Pobulous
Dynamite Dan
Posts: 1365
Joined: Wed Nov 15, 2017 12:51 pm

Re: Optimisation Challenge, for the benefit of new programmers

Post by Pobulous »

I'm still working on building with the tile engine I posted earlier.

I've created the sprite routines, which highlighted some bugs and odd edge-cases that I needed to handle, as well as further optimisations.

This is the current state with two moving sprites plus a player-controlled sprite. There's 2 halts per frame to slow things down (with standard ROM interrupts enabled for the halts). Player sprite is controlled with 6789.



Another useful thing for new programmers would be some guidance on implementing sounds - especially using the beeper. I have no idea how to make useful beeper sounds without the game pausing.
User avatar
Joefish
Rick Dangerous
Posts: 2058
Joined: Tue Nov 14, 2017 10:26 am

Re: Optimisation Challenge, for the benefit of new programmers

Post by Joefish »

I'm now less convinced that interleaved UDG data is the fastest way of doing this...
User avatar
Pobulous
Dynamite Dan
Posts: 1365
Joined: Wed Nov 15, 2017 12:51 pm

Re: Optimisation Challenge, for the benefit of new programmers

Post by Pobulous »

I wouldn't form an opinion on that based on the efficiency of my code, TBH.

Also, most of the time it's not actually drawing any of the UDGs, due to it skipping unchanged tiles.
User avatar
Pobulous
Dynamite Dan
Posts: 1365
Joined: Wed Nov 15, 2017 12:51 pm

Re: Optimisation Challenge, for the benefit of new programmers

Post by Pobulous »

This TAP file might show the performance of the routine more clearly.
(I have done a small reordering so it ought to be slightly improved, too)

Only one HALT is being used now, and using border to show the speed of the copy to screen routine - that should also help me test future optimisations without counting t-states. It's using about 1/3 of a frame for this routine - the white part of the border.

The cyan border is the routine to update the tilemap - it's doing more than copying 768 characters as it's responsible for zeroing unchanged tiles. Some optimisation here would be useful.

User avatar
Joefish
Rick Dangerous
Posts: 2058
Joined: Tue Nov 14, 2017 10:26 am

Re: Optimisation Challenge, for the benefit of new programmers

Post by Joefish »

Pobulous wrote: Tue May 17, 2022 8:52 pm I wouldn't form an opinion on that based on the efficiency of my code, TBH.
Also, most of the time it's not actually drawing any of the UDGs, due to it skipping unchanged tiles.
I was thinking the optimisations are a trade-off between getting to the UDG as fast as possible versus drawing it as fast as possible.
Getting the address of the UDG the way I did it (multiplying the UDG ID by 9 using BC) is quite slow because the shift instructions involved are two-byte instructions, meaning it takes 16 cycles to double BC. And you need to do that three times to get x8. You can do the doubling quicker in HL with ADD HL,HL (11 cycles) but then you have to use EX DE,HL to add the x1 and x8 together in a separate register (DE) to get the x9. Or, you can use a look-up table to get the address of the UDG (only 1/2 K needed) and that can be interleaved for speed. So it's:

Code: Select all

LD H,table/256
LD L,A
LD A,(HL)
INC H
LD H,(HL)
LD L,A
which is 33 cycles. Obviously slower than just going straight to the address in an interleaved set of UDGs with just:

Code: Select all

LD H,udgs/256
LD L,A
at 11 cycles, but only 22 more.
Now if you look at where the UDG is drawn, for the pixels, the interleaved version uses:

Code: Select all

LD A,(HL)
LD (DE),A
INC H
INC D
at 22 cycles per byte, or 44 cycles for two bytes. Skipping the INCs from the last iteration, that's 168 cycles per character.
But with the linear version you can do an initial stack manipulation (if interrupts are disabled, you're using alternate and IX/IY registers for spares rather than the stack, and you've preserved SP for later):

Code: Select all

LD SP,HL
EX DE,HL
(10 cycles) followed by:

Code: Select all

POP BC
LD (HL),B
INC H
LD (HL),C
INC H
at 32 cycles for two bytes. Again, skipping the last INC, that's 134 cycles per character. Add on our 22 cycle penalty from earlier and we're still at 156 cycles, still 12 quicker than the interleaved version - assuming we can get the appropriate screen addresses in the right registers at the right time. The UDG bytes are then in a natural, linear order, and you don't have to define all 256 if you don't want to. But you do need a 1/2 K lookup table.
User avatar
Pobulous
Dynamite Dan
Posts: 1365
Joined: Wed Nov 15, 2017 12:51 pm

Re: Optimisation Challenge, for the benefit of new programmers

Post by Pobulous »

Ah, I get what you're saying.

A saving of 12 T-states per char isn't a huge margin and could end up being consumed elsewhere within the loop by needing to juggle registers around, especially given that you'll need to do all the chars without using PUSH, as restoring SP would cost at least 10 T-states.

Would be interesting to see the difference in timing of the whole loop.

You don't need to define all 256 UDGs with interleaving, running the code to interleave them will just fill the undefined ones with whatever happens to be in memory.

I've concentrated on avoiding drawing the UDGs at all as much as possible, but with a solution that's best suited to filp screen.
User avatar
Joefish
Rick Dangerous
Posts: 2058
Joined: Tue Nov 14, 2017 10:26 am

Re: Optimisation Challenge, for the benefit of new programmers

Post by Joefish »

Not a huge margin, and as you say, juggling the registers to make it work could be costly. I'm trying to work out how to do all the loops and counting in registers working back from that core routine. I still need to be able to use an LDI to do the attribute. After I do that and LD SP,HL (6t) I can do EXX (4t) to get the screen address from the alternate registers. You could say that's my 12T margin almost gone, but I was doing the EXX already, so maybe only half-gone! ;)

There's also an issue that the contents of either BC or DE needs to be empty so that the POP to fetch the pixel data doesn't trash anything...

Was it yours where the loop counters are in IXL etc.? Good choice, particularly for the row counter, as they're rarely accessed compared to everything else. I only recently noticed that DJNZ is hardly worth making the effort to fit in, saving only a single cycle over DEC B and JP NZ on all but the last loop, and that method works on any register.

Interleaving sprites does really force you to use all 256 though. If you leave out a few it still puts data onto 9 separate 256-byte page boundaries, leaving at least 2K fairly useless!

I also think a better test is to render a screen that's 50% actual UDGs, so will set something up. I didn't intend for everyone to copy my first run! :lol:
User avatar
ketmar
Manic Miner
Posts: 700
Joined: Tue Jun 16, 2020 5:25 pm
Location: Ukraine

Re: Optimisation Challenge, for the benefit of new programmers

Post by ketmar »

Joefish wrote: Thu May 19, 2022 11:09 amI didn't intend for everyone to copy my first run! :lol:
it gives a common ground for testing. and it also frees us from designing our own udg and screens. ;-)
Post Reply