Yes, you shouldn't show a standard, decent code and call it crap.Your "crap" version appears to use undocumented Z80 instructions, makes extensive use of the alternate registers, nicely sets up a LDI instruction, and unrolls a loop. That's all before it does something clever to handle the screen address, which I don't understand. I've a feeling your "crap" code is already streets ahead of what the vast majority of programmers could write, let alone "new" ones.
Optimisation Challenge, for the benefit of new programmers
Re: Optimisation Challenge, for the benefit of new programmers
Re: Optimisation Challenge, for the benefit of new programmers
OK, this version implements the border rule, and can be adjusted with the EQUs at the top, and I've included a routine to convert graphics from attr, line1-8, attr, line1-8 etc into attr*256, line1*256, line2*256, etc. USR entry point for conversion code is 32896.
All data starting addresses need to be 256 byte aligned.
I've no idea how fast it is compared to other solutions. On the plus side it's not too hard to follow and has some optimisations....
Output using Joefish's image data.
All data starting addresses need to be 256 byte aligned.
I've no idea how fast it is compared to other solutions. On the plus side it's not too hard to follow and has some optimisations....
Code: Select all
TopEdge EQU 64 ;(NumRows*32)
BottomEdge EQU 192 ;(256-(NumRows*32)
LeftEdge EQU 2 ;NumCols
RightEdge EQU 30 ;32-NumCols
ORG 32768
DI ;Disable interrupts for speed
LD DE, 16384
LD BC, 22528
LD HL, CharMap
LD A,E
CP TopEdge
JP NC, TestLeftSide
LD A,TopEdge
LD E,A
LD L,A
LD C,A
PrintLoop:
LD A,D
TestBottomEdge:
CP 80
LD A,E
JP NZ, TestLeftSide
CP BottomEdge
JP C, TestLeftSide
EI ;Finished, Return
RET
TestLeftSide:
AND 31
CP LeftEdge
JP NC, TestRightSide
LD A,LeftEdge
ADD A,E
LD E,A
LD L,A
LD C,A
JP PrintLoop
TestRightSide:
CP RightEdge
JP C, PrintChar
LD A, 31
OR E
LD E,A
LD L,A
LD C,A
JP SkipPrinting
PrintChar:
LD A,(HL) ;Get next CharMap character to print
CP 8 ;chars 0-(n-1) (0-7 in this case) will only update attr data (set to 0 to disable this feature)
PUSH HL ;Store CharMap location
LD H,UDGs/256 ;Get base address of UDGs
LD L,A ;Select UDG 1-255
LD A,(HL) ;copy ATTR data
LD (BC),A
JP C,PrintATTROnly
INC H ;Get first line of chardata
PUSH DE ;Store current screen address
LD A,(HL) ;copy 8 lines of UDG data to screen
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
POP DE ;Restore screen address
PrintATTROnly:
POP HL ;Restore CharMap location
SkipPrinting:
INC BC ;Advance to next ATTR position
INC HL ;Advance to next CharMap character
INC E ;Advance screen address
JP NZ, PrintLoop ;Loop until E wraps to zero - signals a third of the screen is completed
LD A,D ;Adjust DE to point to next screen third
ADD A,8
LD D,A
CP 88 ;If MSB of screen addr is now 88, then we are done.
JP C, TestBottomEdge ;Do next screen third
EI
RET
PrepUDGS: ;Run this before main code if required.
LD DE,UDGs
LD HL,RawUDGs ;Graphics in ATTR followed by 8 lines of pixels
UDGLineLoop:
PUSH BC ;Save outer loop counter
PUSH DE
LD B,9
UDGLoop:
LD A,(HL)
LD (DE),A
INC HL
INC D
DJNZ UDGLoop
POP DE
INC E
JP NZ, UDGLineLoop
RET
ORG 49152
CharMap:
db 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8
db 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8
db 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,0,0,0,0,0,0, 10,11,0,0,0,1,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,9,10,11,10,11,0, 1,0,9,10,11,1,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,1,1,1,1,13,9, 1,9,1,12,11,1,11,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,1,12,13,12,13,0, 1,0,1,12,13,1,1,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,12,13,0,0,0,0,12, 13,0,0,0,0,0,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,8,8,8
db 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8
db 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8
db 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8
UDGs: ;256 8 byte arrays of attribs abd characters aligned on 256 byte boundary, interleaved
;256 ATTR bytes, followed by 256 top rows of UDGS, then 256 2nd rows, etc
db 0,0,8,16,24,32,40,48,56,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
db 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
db 50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75
db 76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101
db 102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
db 128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153
db 154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179
db 180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205
db 206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231
db 232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12
db 0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60,8,126,60,126,60,60
db 0,32,0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60
db 8,126,60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16
db 0,0,60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0,0
db 0,0,0,0,0,16,0,0,60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12,0,64
db 16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60,8,126,60,126,60,60,0,32
db 0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60,8,126
db 60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0
db 60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0
db 0,0,0,0,0,0,0,0,0,70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16
db 60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66,24,64,64,2,66,66
db 56,32,28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66
db 24,64,64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56
db 68,68,70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104,120
db 56,120,60,28,56,56,68,68,70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16,60,64
db 0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66,24,64,64,2,66,66,56,32
db 28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66,24,64
db 64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68
db 70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104
db 0,0,0,0,0,0,0,0,0,74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24
db 68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12,40,124,124,4,60,66
db 4,60,32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12
db 40,124,124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16
db 68,68,74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84,68
db 68,68,68,32,64,16,68,68,74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24,68,120
db 48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12,40,124,124,4,60,66,4,60
db 32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12,40,124
db 124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68
db 74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84
db 0,0,0,0,0,0,0,0,0,82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16
db 68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2,72,2,66,8,66,62
db 60,34,32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2
db 72,2,66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16
db 68,40,82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84,68
db 68,68,68,32,56,16,68,40,82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16,68,68
db 16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2,72,2,66,8,66,62,60,34
db 32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2,72,2
db 66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40
db 82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84
db 0,0,0,0,0,0,0,0,0,98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16
db 60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66,126,66,66,16,66,2
db 68,34,32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66
db 126,66,66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16
db 68,40,98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84,68
db 68,120,60,32,4,16,68,40,98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16,60,68
db 16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66,126,66,66,16,66,2,68,34
db 32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66,126,66
db 66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40
db 98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84
db 0,0,0,0,0,0,0,0,0,60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16
db 4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60,8,60,60,16,60,60
db 60,60,28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60
db 8,60,60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12
db 56,16,60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84,68
db 56,64,4,32,120,12,56,16,60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16,4,68
db 56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60,8,60,60,16,60,60,60,60
db 28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60,8,60
db 60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16
db 60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0,0
db 0,64,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,56,0
db 0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0
RawUDGs:
db 9*0, 0,0,0,0,0,0,0,0
db 9*3+64, 0,0,0,0,0,0,0,0
db 9*2, 0,0,0,0,0,0,0,0
db 9*3, 0,0,0,0,0,0,0,0
db 9*4, 0,0,0,0,0,0,0,0
db 9*5, 0,0,0,0,0,0,0,0
db 9*6, 0,0,0,0,0,0,0,0
db 9*7, 0,0,0,0,0,0,0,0
db 01110000b, 11001100b,01100110b,00110011b,10011001b,11001100b,01100110b,00110011b,10011001b
db 01000011b, 00111100b,01111110b,11111111b,11111111b,11111111b,11111111b,01111110b,00111100b
db 01000011b, 00000000b,00000011b,00001111b,00011111b,00111111b,00111111b,01111111b,01111111b
db 01000011b, 00000000b,11000000b,11110000b,11111000b,11111100b,11111100b,11111110b,11111110b
db 01000011b, 01111111b,01111111b,00111111b,00111111b,00011111b,00001111b,00000011b,00000000b
db 01000011b, 11111110b,11111110b,11111100b,11111100b,11111000b,11110000b,11000000b,00000000b
Re: Optimisation Challenge, for the benefit of new programmers
Re: Optimisation Challenge, for the benefit of new programmers
Slight bit of optimisation, fixed a bug in the graphic preparing routine, included the output of that routine from joefish's UDGs, added a benchmark routine, added more comments.
Benchmarks at around 455 50ths of a second for 256 passes in 48K in Fuse. 128K machines are slightly faster at 449 in 128K or 48K mode.
Either way it's just over 35ms per pass.
Now I need to find a way to utilise the routine in the crap game compo.
Benchmarks at around 455 50ths of a second for 256 passes in 48K in Fuse. 128K machines are slightly faster at 449 in 128K or 48K mode.
Either way it's just over 35ms per pass.
Code: Select all
ScreenAddr EQU 16384 ;These could be changed to target 2nd Screen on 128K machines
ATTRAddr EQU 22528
;These four are used to set the non-printing margins
TopEdge EQU 64 ;(NumRows*32) (Maximum 7 lines)
BottomEdge EQU 192 ;(256-(NumRows*32) (Maximum 8 lines)
LeftEdge EQU 2 ;NumCols
RightEdge EQU 30 ;32-NumCols
EmptyUDGs EQU 2 ;UDGs 0-(n-1) (0-1 in this case) will only update attr data (set to 0 to disable this feature)
ORG 32768
DI ;Disable interrupts for speed
DisplayScreen:
LD DE, ScreenAddr ;Initialise the main registers - Worthy of note is that the LSB of ScreenAddr, ATTRAddr and CharMap will be equal throughout the runtime.
LD BC, ATTRAddr
LD HL, CharMap ;Must be divisible by 256
LD A,TopEdge ;Check for non-printing margin
AND A
JP Z, TestLeftSide ;If TopEdge is larger than current position, then set current position print ScreenAddr LSB to TopEdge
LD E,A
LD L,A ;ATTRAddr and CharMap LSB can be updated likewise
LD C,A
PrintLoop:
LD A,D ;We will be testing if we are in bottom third of screen and if so, testing if we are in non-printing border area
TestBottomEdge: ;Entry point for if A already contains MSB of ScreenAddr
CP 80 ;Are we in bottom 3rd of screen?
LD A,E ;LSB is needed to be tested whatever portion of the screen we are in
JP NZ, TestLeftSide ;Jump if not in bottom third
CP BottomEdge ;Test if we are in bottom non-printing area
JP C, TestLeftSide ;Jump if not
EI ;Finished printing, Enable Interrupts and return
RET
TestLeftSide:
AND 31 ;A contains LSB of ScreenAddr, lowest 5 bits contains X position
CP LeftEdge ;Are we in left non-printing border?
JP NC, TestRightSide ;Jump to next test if not
LD A,LeftEdge ;Add border offset to LSB ScreenAddr
ADD A,E
LD E,A
LD L,A ;ATTRAddr and CharMap LSB can be updated likewise
LD C,A
JP PrintLoop ;Skip over right border test
TestRightSide:
CP RightEdge ;compare screen X position with right-hand non-printing border
JP C, PrintChar ;Jump into printing routine if not in border
LD A, 31 ;Set ScreenAddr position to x=31 on the current line
OR E
LD E,A
LD L,A ;ATTRAddr and CharMap LSB can be updated likewise
LD C,A
JP SkipPrinting ;Skip Printing, via checking for end of third of screen.
PrintChar:
LD A,(HL) ;Get next CharMap character to print
CP EmptyUDGs ;Test if will only be updating ATTR
PUSH HL ;Store CharMap location
LD H,UDGs/256 ;Get base address of UDGs
LD L,A ;Select UDG 1-255
LD A,(HL) ;copy ATTR data
LD (BC),A
JP C,PrintATTROnly ;Jump if updating ATTR only
INC H ;Get first line of chardata
PUSH DE ;Store current screen address
LD A,(HL) ;copy 8 lines of UDG data to screen
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
POP DE ;Restore screen address
PrintATTROnly:
POP HL ;Restore CharMap location
SkipPrinting:
INC BC ;Advance to next ATTR position
INC HL ;Advance to next CharMap character
INC E ;Advance screen address
JP NZ, PrintLoop ;Loop until E wraps to zero - signals a third of the screen is completed
LD A,D ;Adjust DE to point to next screen third
ADD A,8
LD D,A
CP 88 ;If MSB of screen addr is now 88, then we are done.
JP C, TestBottomEdge ;Do next screen third
EI
RET
PrepUDGS: ;Run this before main code if required.
LD DE,UDGs
LD HL,RawUDGs ;Graphics in ATTR followed by 8 lines of pixels
UDGLineLoop:
PUSH DE
LD B,9
UDGLoop:
LD A,(HL)
LD (DE),A
INC HL
INC D
DJNZ UDGLoop
POP DE
INC E
JP NZ, UDGLineLoop ;Loop will finish after 256 iterations when E overflows back to zero
RET
Benchmark: ;Run the DisplayScreen routine 256 times and return elapsed time in 50ths of a second
LD BC,0
LD (23672),BC ;Set lower 2 bytes of Frames to 0
SpeedLoop:
PUSH BC ;Run Routine 256 times
CALL DisplayScreen
POP BC
DJNZ SpeedLoop
LD BC,(23672) ;Return elapsed time in 50ths of a second
RET
ORG 49152
CharMap:
db 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
db 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
db 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 4,5,0,0,0,1,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,3,4,5,4,5,0, 1,0,3,4,5,1,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,1,1,1,1,7,3, 1,3,1,6,5,1,5,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,1,6,7,6,7,0, 1,0,1,6,7,1,1,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,6,7,0,0,0,0,6, 7,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
db 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
db 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
db 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
UDGs: ;256 8 byte arrays of attribs and characters aligned on 256 byte boundary, interleaved
;256 ATTR bytes, followed by 256 top rows of UDGS, then 256 2nd rows, etc
db 0,91,112,67,67,67,67,67,0,0,0,0,0,0,0,0 ;attribs
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,204,60,0,0,127,254,0,0,0,0,0,0,0,0 ;1st row of gfx data
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,102,126,3,192,127,254,0,0,0,0,0,0,0,0 ;2nd row of gfx data
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,51,255,15,240,63,252,0,0,0,0,0,0,0,0 ;3rd row of gfx data
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,153,255,31,248,63,252,0,0,0,0,0,0,0,0 ;4th row of gfx data
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,204,255,63,252,31,248,0,0,0,0,0,0,0,0 ;5th row of gfx data
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,102,255,63,252,15,240,0,0,0,0,0,0,0,0 ;6th row of gfx data
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,51,126,127,254,3,192,0,0,0,0,0,0,0,0 ;7th row of gfx data
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,153,60,127,254,0,0,0,0,0,0,0,0,0,0 ;8th row of gfx data
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
RawUDGs:
db 9*0, 0,0,0,0,0,0,0,0
db 9*3+64, 0,0,0,0,0,0,0,0
db 01110000b, 11001100b,01100110b,00110011b,10011001b,11001100b,01100110b,00110011b,10011001b
db 01000011b, 00111100b,01111110b,11111111b,11111111b,11111111b,11111111b,01111110b,00111100b
db 01000011b, 00000000b,00000011b,00001111b,00011111b,00111111b,00111111b,01111111b,01111111b
db 01000011b, 00000000b,11000000b,11110000b,11111000b,11111100b,11111100b,11111110b,11111110b
db 01000011b, 01111111b,01111111b,00111111b,00111111b,00011111b,00001111b,00000011b,00000000b
db 01000011b, 11111110b,11111110b,11111100b,11111100b,11111000b,11110000b,11000000b,00000000b
Re: Optimisation Challenge, for the benefit of new programmers
I'm looking at building on this routine to actually display some gameplay.
One thing that struck me is that the other parts that need to be done (outside the routine) are:
Have a background map of tiles for non-moving elements of the screen and initialise the CharMap from this
Have some code to put moveable elements into the CharMap
Then every frame:
Display CharMap (using above created routine)
Execute program logic
Copy background map tiles to CharMap
Copy sprites to CharMap
Loop to DisplayCharMap
A useful optimisation is to have UDG 0 be a do nothing rather than just update ATTRs (I had it this way originally)
Now the method is:
Copy levelmap to background map
Initialise CharMap by LDIRing background map tiles to CharMap
Display CharMap (using above created routine, skipping over zero UDGs)
Execute program logic
Copy background map tiles to CharMap, setting any unchanged tiles to zero in CharMap, and leaving any zeroed tiles as are
Copy sprites to CharMap
Loop to DisplayCharMap
In most game types a lot of the map won't change, but may contain background graphics. Could be the dots and walls in pacman, or the mushrooms in centipede, or ladders and platforms, etc.
Setting those to zero and skipping drawing them will speed things up - kind of a simplified dirty rectangles.
One thing that struck me is that the other parts that need to be done (outside the routine) are:
Have a background map of tiles for non-moving elements of the screen and initialise the CharMap from this
Have some code to put moveable elements into the CharMap
Then every frame:
Display CharMap (using above created routine)
Execute program logic
Copy background map tiles to CharMap
Copy sprites to CharMap
Loop to DisplayCharMap
A useful optimisation is to have UDG 0 be a do nothing rather than just update ATTRs (I had it this way originally)
Now the method is:
Copy levelmap to background map
Initialise CharMap by LDIRing background map tiles to CharMap
Display CharMap (using above created routine, skipping over zero UDGs)
Execute program logic
Copy background map tiles to CharMap, setting any unchanged tiles to zero in CharMap, and leaving any zeroed tiles as are
Copy sprites to CharMap
Loop to DisplayCharMap
In most game types a lot of the map won't change, but may contain background graphics. Could be the dots and walls in pacman, or the mushrooms in centipede, or ladders and platforms, etc.
Setting those to zero and skipping drawing them will speed things up - kind of a simplified dirty rectangles.
Re: Optimisation Challenge, for the benefit of new programmers
I think trying to keep track of unchanged tiles just makes the program far more complicated than it needs to be.
Certainly if you've got a scrolling background, you don't want 'holes' in the rendering, which is why I don't bother with a skippable UDG #0.
The simplest program loop is simply to copy in your background again across the whole character map. The hidden border area lets you do this from a block-based scenery map, without having to worry about trimming those blocks to fit exactly.
Then add your sprites over the top. Now when copying the sprites into the character map (or even copying parallax layers of foreground scenery), then you probably want a transparent, 'don't draw me', character 0, so you can leave gaps in your sprites to show the background.
But then when you finally call the render function, it renders the whole screen without gaps.
Certainly if you've got a scrolling background, you don't want 'holes' in the rendering, which is why I don't bother with a skippable UDG #0.
The simplest program loop is simply to copy in your background again across the whole character map. The hidden border area lets you do this from a block-based scenery map, without having to worry about trimming those blocks to fit exactly.
Then add your sprites over the top. Now when copying the sprites into the character map (or even copying parallax layers of foreground scenery), then you probably want a transparent, 'don't draw me', character 0, so you can leave gaps in your sprites to show the background.
But then when you finally call the render function, it renders the whole screen without gaps.
Re: Optimisation Challenge, for the benefit of new programmers
for the real game engine, you actually can get away with the backbuffer only for 32x8 pixels (and attrs), and some flags in tilemap. the idea is that your tiles has flags "not changed", "blit", "dirty", and each character row has a list of sprites touching it. now, the main loop is something like this:
by decrementing tile flag (dirty->blit->unchanged), you also get automatic erasing of printed sprites for free, no need to track them. ;-)
this way you can support a lot of sprites, background tile animations, and still render only changed screen parts. i'm planning to write a demo engine based on this… for several years, lol.
p.s.: and thanks to "row sorting" step, you can do fast collision detection for many sprites too: you already have almost all the info ready. i.e. each sprite marks tiles as "dirty", so you can use that to perform "broad phase" of coldet (checking if there is a possible collision), and use precise check only when necessary. i.e. time spent on collision checking is almost independed of the number of active sprites.
Code: Select all
clear sprite lists
loop over all sprites, put each sprite in row lists
loop over all tile rows, for each row (using our 1-row backbuffer):
render all sprites into backbuffer, also mark all touched tiles as "dirty" here
now loop over all 32 tiles:
"not changed": ignore
"blit": copy tile from gfx data directly to screen, mark it as "not changed"
"dirty": copy backbuffer gfx, mark tile as "blit"
this way you can support a lot of sprites, background tile animations, and still render only changed screen parts. i'm planning to write a demo engine based on this… for several years, lol.
p.s.: and thanks to "row sorting" step, you can do fast collision detection for many sprites too: you already have almost all the info ready. i.e. each sprite marks tiles as "dirty", so you can use that to perform "broad phase" of coldet (checking if there is a possible collision), and use precise check only when necessary. i.e. time spent on collision checking is almost independed of the number of active sprites.
Re: Optimisation Challenge, for the benefit of new programmers
Well, yes, 'crap' for me is to just dump some stuff to get to the first thing that actually seems to work. For some reason I'm averse to in-memory variable storage and the stack and try to do everything in registers (hence extensive use of the 'undocumented' IXH , IXL etc. instructions as caches where someone else might use PUSH and POP). Although in my defence, they are some of the most well-documented 'undocumented' features of anything ever!dfzx wrote: ↑Sat Apr 30, 2022 8:13 pm @Joefish I think you might have a dose of excessive average familiarity, so per this XKCD comic:
Your "crap" version appears to use undocumented Z80 instructions, makes extensive use of the alternate registers, nicely sets up a LDI instruction, and unrolls a loop. That's all before it does something clever to handle the screen address, which I don't understand. I've a feeling your "crap" code is already streets ahead of what the vast majority of programmers could write, let alone "new" ones.
The only 'optimisation' change I made was realising that the attributes should come before the pixels in the UDG definition rather than after, to make it easier to copy the attribute but skip the pixel step. In assembly it's perfectly possible to write functions that skip operations by entering the function later rather than exiting it early, but you really don't want to go there...
Re: Optimisation Challenge, for the benefit of new programmers
Anyway, my idea for a scrolling game using this system is to split the UDGs roughly in half, the first 128 for scenery and the rest for sprites and animations. The first half are then split again; 64 for foreground stuff and the other 64 split, yet again, four ways for the parallax. That leaves 16 UDGs, so either an 8x2 or 4x4 grid, with 4 frames of 2-pixel pre-shifting, rotated in-place.
With your foreground character map you have 0s in it where you want the background scroll to show through, so you copy up the foreground, then insert the relevant background characters in the zero spaces. Anything that's still 0 just gets rendered as a black (blue?) square UDG at the final stage.
The slight drawback with this is if you want bands of colour in your background, your sprite UDGs can't easily change PAPER colours to match. So if your parallax is a band between two colours (e.g. sky blue to black), best keep it high up the screen where your character can't jump. If it's lower down, stick with all black PAPER.
...Or stick with very chunky sprites that always have black PAPER.
...Or modify the renderer to OR in different PAPER colours per row as it renders the screen. Or AND the colours in, then if your UDGs have either white or black PAPER you can control which ones show background colour.
...Or re-write the code so that there are two bytes per cell in the character map, one for UDGs (pixels only) and the second byte for attributes.
But we'll leave those as an exercise for anyone who thinks they've mastered this renderer first!
With your foreground character map you have 0s in it where you want the background scroll to show through, so you copy up the foreground, then insert the relevant background characters in the zero spaces. Anything that's still 0 just gets rendered as a black (blue?) square UDG at the final stage.
The slight drawback with this is if you want bands of colour in your background, your sprite UDGs can't easily change PAPER colours to match. So if your parallax is a band between two colours (e.g. sky blue to black), best keep it high up the screen where your character can't jump. If it's lower down, stick with all black PAPER.
...Or stick with very chunky sprites that always have black PAPER.
...Or modify the renderer to OR in different PAPER colours per row as it renders the screen. Or AND the colours in, then if your UDGs have either white or black PAPER you can control which ones show background colour.
...Or re-write the code so that there are two bytes per cell in the character map, one for UDGs (pixels only) and the second byte for attributes.
But we'll leave those as an exercise for anyone who thinks they've mastered this renderer first!
Last edited by Joefish on Fri May 06, 2022 3:20 pm, edited 1 time in total.
Re: Optimisation Challenge, for the benefit of new programmers
It would be nice to have some more of these types of things, though perhaps aimed at a rather lower level of experience. The sorts of assumptions that Jonathan makes in his creating arcade game book would be perfect.
I'm only on chapter 5, but I understand his code (and feel comfortable adapting it), and if I'm struggling I consult the Z80 documentation. It's amazing what you can achieving with load, inc, dec, pop, push, compare, call and jumps!
I'm only on chapter 5, but I understand his code (and feel comfortable adapting it), and if I'm struggling I consult the Z80 documentation. It's amazing what you can achieving with load, inc, dec, pop, push, compare, call and jumps!
Re: Optimisation Challenge, for the benefit of new programmers
It's just meant to be a tool. A one-level up assembly-friendly version of CharAde. You don't necessarily need to understand all the code in a library function in order to use it. The point here is to get some expert optimisation of a routine for rendering to the screen from a character map. The level of knowledge a programmer then needs in order to manipulate that character-mapped version of the screen (and the time and memory costs of doing so) are much reduced.
Re: Optimisation Challenge, for the benefit of new programmers
@PeterJ It might be worth having a go at creating a simple routine as the original description. It's a small enough routine to be manageable, but with lots of ways to go, and a chance to gain some extra understanding of the subtleties of the Spectrum screen layout.
For instance, it had never dawned on me before that the least significant byte of the addresses of the pixels and the attributes is always the same.
The trickiest part is needing to juggle 4 addresses, plus some loop variables - you soon run out of registers. I started out by using the IX and IY pairs, then slowly optimised those away.
I'm sure my code can be sped up, but it's been fun coding it and optimising it (to some extent), and now I'm slowly building out the rest of the parts needed to make it into an actual engine for a game (character movement only mind you!). One small routine at a time. That's also helped me to find and fix some edge cases that weren't quite correct.
For instance, it had never dawned on me before that the least significant byte of the addresses of the pixels and the attributes is always the same.
The trickiest part is needing to juggle 4 addresses, plus some loop variables - you soon run out of registers. I started out by using the IX and IY pairs, then slowly optimised those away.
I'm sure my code can be sped up, but it's been fun coding it and optimising it (to some extent), and now I'm slowly building out the rest of the parts needed to make it into an actual engine for a game (character movement only mind you!). One small routine at a time. That's also helped me to find and fix some edge cases that weren't quite correct.
Re: Optimisation Challenge, for the benefit of new programmers
the worst thing with the original task is 8x8 tiles. if only they could be 16x16… then the whole bag of stack tricks could be used (like i did with attribute copying in one of my solutions).
p.s.: and strict restrictions on additional memory… no JIT compiling, no tilemap massaging… sigh
p.s.: and strict restrictions on additional memory… no JIT compiling, no tilemap massaging… sigh
Re: Optimisation Challenge, for the benefit of new programmers
By all means start another thread! But if you're rendering a screen from larger tiles, what are you going to do for sprites? Or scrolling?ketmar wrote: ↑Sat May 07, 2022 12:16 am the worst thing with the original task is 8x8 tiles. if only they could be 16x16… then the whole bag of stack tricks could be used (like i did with attribute copying in one of my solutions).
p.s.: and strict restrictions on additional memory… no JIT compiling, no tilemap massaging… sigh
Re: Optimisation Challenge, for the benefit of new programmers
oh, it really depends of the game requirements, so it's quite hard to create something truly universal. like Firefly engine, which is rendering directly to the screen, and can cope only with several sprites… but hey, 25 FPS! or Scrappy Doo engine, which is 16 FPS, and renders everything to the backbuffer, then waits for the ray to start drawing the screen, and blits backbuffer… decisions, decisions…
it would prolly be possible to clone Potsworth engine (with seems to be the next version of Scrappy Doo engine) to use it in various platformers, i think. maybe i really should start a thread like yours for that. have to think about it…
Re: Optimisation Challenge, for the benefit of new programmers
A tiled engine could do platforms or mazes. And if you can have some way of prioritising tiles over sprites you can even do those pseudo-overhead RPG type views, where your sprite can go behind the tips of building roofs or tops of trees.
That's fairly easily done on a character-based engine by simply defining roof UDGs and the tops of walls as high numbers, and paths and fronts of buildings, walls, etc. as low number UDGs. Then write some code so that your sprite UDGs can only be drawn over the low numbers. So when you overlap with the top of a wall, your feet disappear. But when you overlap the lower part of the same wall you appear in front.
That's fairly easily done on a character-based engine by simply defining roof UDGs and the tops of walls as high numbers, and paths and fronts of buildings, walls, etc. as low number UDGs. Then write some code so that your sprite UDGs can only be drawn over the low numbers. So when you overlap with the top of a wall, your feet disappear. But when you overlap the lower part of the same wall you appear in front.
Re: Optimisation Challenge, for the benefit of new programmers
I'm still working on building with the tile engine I posted earlier.
I've created the sprite routines, which highlighted some bugs and odd edge-cases that I needed to handle, as well as further optimisations.
This is the current state with two moving sprites plus a player-controlled sprite. There's 2 halts per frame to slow things down (with standard ROM interrupts enabled for the halts). Player sprite is controlled with 6789.
Another useful thing for new programmers would be some guidance on implementing sounds - especially using the beeper. I have no idea how to make useful beeper sounds without the game pausing.
I've created the sprite routines, which highlighted some bugs and odd edge-cases that I needed to handle, as well as further optimisations.
This is the current state with two moving sprites plus a player-controlled sprite. There's 2 halts per frame to slow things down (with standard ROM interrupts enabled for the halts). Player sprite is controlled with 6789.
Another useful thing for new programmers would be some guidance on implementing sounds - especially using the beeper. I have no idea how to make useful beeper sounds without the game pausing.
Re: Optimisation Challenge, for the benefit of new programmers
I'm now less convinced that interleaved UDG data is the fastest way of doing this...
Re: Optimisation Challenge, for the benefit of new programmers
I wouldn't form an opinion on that based on the efficiency of my code, TBH.
Also, most of the time it's not actually drawing any of the UDGs, due to it skipping unchanged tiles.
Also, most of the time it's not actually drawing any of the UDGs, due to it skipping unchanged tiles.
Re: Optimisation Challenge, for the benefit of new programmers
This TAP file might show the performance of the routine more clearly.
(I have done a small reordering so it ought to be slightly improved, too)
Only one HALT is being used now, and using border to show the speed of the copy to screen routine - that should also help me test future optimisations without counting t-states. It's using about 1/3 of a frame for this routine - the white part of the border.
The cyan border is the routine to update the tilemap - it's doing more than copying 768 characters as it's responsible for zeroing unchanged tiles. Some optimisation here would be useful.
(I have done a small reordering so it ought to be slightly improved, too)
Only one HALT is being used now, and using border to show the speed of the copy to screen routine - that should also help me test future optimisations without counting t-states. It's using about 1/3 of a frame for this routine - the white part of the border.
The cyan border is the routine to update the tilemap - it's doing more than copying 768 characters as it's responsible for zeroing unchanged tiles. Some optimisation here would be useful.
Re: Optimisation Challenge, for the benefit of new programmers
I was thinking the optimisations are a trade-off between getting to the UDG as fast as possible versus drawing it as fast as possible.
Getting the address of the UDG the way I did it (multiplying the UDG ID by 9 using BC) is quite slow because the shift instructions involved are two-byte instructions, meaning it takes 16 cycles to double BC. And you need to do that three times to get x8. You can do the doubling quicker in HL with ADD HL,HL (11 cycles) but then you have to use EX DE,HL to add the x1 and x8 together in a separate register (DE) to get the x9. Or, you can use a look-up table to get the address of the UDG (only 1/2 K needed) and that can be interleaved for speed. So it's:
Code: Select all
LD H,table/256
LD L,A
LD A,(HL)
INC H
LD H,(HL)
LD L,A
Code: Select all
LD H,udgs/256
LD L,A
Now if you look at where the UDG is drawn, for the pixels, the interleaved version uses:
Code: Select all
LD A,(HL)
LD (DE),A
INC H
INC D
But with the linear version you can do an initial stack manipulation (if interrupts are disabled, you're using alternate and IX/IY registers for spares rather than the stack, and you've preserved SP for later):
Code: Select all
LD SP,HL
EX DE,HL
Code: Select all
POP BC
LD (HL),B
INC H
LD (HL),C
INC H
Re: Optimisation Challenge, for the benefit of new programmers
Ah, I get what you're saying.
A saving of 12 T-states per char isn't a huge margin and could end up being consumed elsewhere within the loop by needing to juggle registers around, especially given that you'll need to do all the chars without using PUSH, as restoring SP would cost at least 10 T-states.
Would be interesting to see the difference in timing of the whole loop.
You don't need to define all 256 UDGs with interleaving, running the code to interleave them will just fill the undefined ones with whatever happens to be in memory.
I've concentrated on avoiding drawing the UDGs at all as much as possible, but with a solution that's best suited to filp screen.
A saving of 12 T-states per char isn't a huge margin and could end up being consumed elsewhere within the loop by needing to juggle registers around, especially given that you'll need to do all the chars without using PUSH, as restoring SP would cost at least 10 T-states.
Would be interesting to see the difference in timing of the whole loop.
You don't need to define all 256 UDGs with interleaving, running the code to interleave them will just fill the undefined ones with whatever happens to be in memory.
I've concentrated on avoiding drawing the UDGs at all as much as possible, but with a solution that's best suited to filp screen.
Re: Optimisation Challenge, for the benefit of new programmers
Not a huge margin, and as you say, juggling the registers to make it work could be costly. I'm trying to work out how to do all the loops and counting in registers working back from that core routine. I still need to be able to use an LDI to do the attribute. After I do that and LD SP,HL (6t) I can do EXX (4t) to get the screen address from the alternate registers. You could say that's my 12T margin almost gone, but I was doing the EXX already, so maybe only half-gone!
There's also an issue that the contents of either BC or DE needs to be empty so that the POP to fetch the pixel data doesn't trash anything...
Was it yours where the loop counters are in IXL etc.? Good choice, particularly for the row counter, as they're rarely accessed compared to everything else. I only recently noticed that DJNZ is hardly worth making the effort to fit in, saving only a single cycle over DEC B and JP NZ on all but the last loop, and that method works on any register.
Interleaving sprites does really force you to use all 256 though. If you leave out a few it still puts data onto 9 separate 256-byte page boundaries, leaving at least 2K fairly useless!
I also think a better test is to render a screen that's 50% actual UDGs, so will set something up. I didn't intend for everyone to copy my first run!
There's also an issue that the contents of either BC or DE needs to be empty so that the POP to fetch the pixel data doesn't trash anything...
Was it yours where the loop counters are in IXL etc.? Good choice, particularly for the row counter, as they're rarely accessed compared to everything else. I only recently noticed that DJNZ is hardly worth making the effort to fit in, saving only a single cycle over DEC B and JP NZ on all but the last loop, and that method works on any register.
Interleaving sprites does really force you to use all 256 though. If you leave out a few it still puts data onto 9 separate 256-byte page boundaries, leaving at least 2K fairly useless!
I also think a better test is to render a screen that's 50% actual UDGs, so will set something up. I didn't intend for everyone to copy my first run!
Re: Optimisation Challenge, for the benefit of new programmers
it gives a common ground for testing. and it also frees us from designing our own udg and screens. ;-)