Optimisation Challenge, for the benefit of new programmers
Optimisation Challenge, for the benefit of new programmers
Here's a challenge, based on what I was doing with the CharAde engine, to help new programmers. It could help with that parallax programming thread too. It's based on the simplest, earliest bit of machine code anyone should learn - drawing a single character on the Speccy screen. The challenge is simple - paint the screen as fast as possible from a character map.
Basic outline:
1. Model, in memory, a 32x24 grid in linear, row-by-row order, with one byte per character (768 bytes). The user should be able to select this to be anywhere in memory. You may specify that it has to start on a 256-byte boundary, if that helps. Ideally there should be no need to duplicate or expand any of this data for rendering purposes (see rule #5).
2. Elsewhere in memory you have a definition of 256 8x8 pixel UDGs, each with a unique attribute (9 bytes per UDG). This may be at a fixed position in memory or flexible, or on a 256-byte boundary, whatever helps. And the data storage may be in any order (linear, interleaved, attributes separate from pixels, whatever is needed to optimise processing). Ideally there should be no duplication or expansion of this data for rendering purposes (see rule #5).
3. The first UDG (or a selectable number of initial UDGs from the set of 256) may be designed with no pixels set, only attributes with matching INK and PAPER, such that they only need the attribute to be copied to the screen when rendering. The pixels under them need not be copied up or erased when rendering. This is to allow plain coloured areas to render faster. Though if it doesn't work out faster then don't bother with rule #3!
4. A two-character wide border around the edges of the grid need not be rendered. Only the 28x20 inner grid should be rendered, centred, on the screen. Or, preferably, rendered one character higher than centre such that - although it is the centre of the grid being rendered - the result has one blank character row above, 2 to the sides, and 3 below. For scores and stuff. Or render the central grid, shifted right to the top of the screen so it has four blank rows below. Rule #4 is entirely optional. But kind of nice to have.
5. The code may have an initial 'preparation' stage based around the supplied set of UDGs, but it must not be necessary to call that preparation function again if only the character map changes. Altogether, the code may use up to 2 K as workspace or for expanded data. But preferably, none.
6. The final code should be completely free for anyone to use, credit or not, though obviously a mention would be nice.
7. Please also supply the 768 byte character map and 2304 byte set of UDGs from your example / demo as binaries or ASM defb listings, so people can try them out in their own engines and compare to yours for performance.
So, the aim is to complete the rendering in as fast a time as possible, using whatever optimisations you like. Though try to avoid data expansions that occupy tons more memory to achieve it. That should give a generic piece of code for new programmers who can then populate a screen (background and sprites) from UDGs - which has to be easier than filling out the Speccy's notoriously awkward screen memory - and quickly turn that into a display.
Basic outline:
1. Model, in memory, a 32x24 grid in linear, row-by-row order, with one byte per character (768 bytes). The user should be able to select this to be anywhere in memory. You may specify that it has to start on a 256-byte boundary, if that helps. Ideally there should be no need to duplicate or expand any of this data for rendering purposes (see rule #5).
2. Elsewhere in memory you have a definition of 256 8x8 pixel UDGs, each with a unique attribute (9 bytes per UDG). This may be at a fixed position in memory or flexible, or on a 256-byte boundary, whatever helps. And the data storage may be in any order (linear, interleaved, attributes separate from pixels, whatever is needed to optimise processing). Ideally there should be no duplication or expansion of this data for rendering purposes (see rule #5).
3. The first UDG (or a selectable number of initial UDGs from the set of 256) may be designed with no pixels set, only attributes with matching INK and PAPER, such that they only need the attribute to be copied to the screen when rendering. The pixels under them need not be copied up or erased when rendering. This is to allow plain coloured areas to render faster. Though if it doesn't work out faster then don't bother with rule #3!
4. A two-character wide border around the edges of the grid need not be rendered. Only the 28x20 inner grid should be rendered, centred, on the screen. Or, preferably, rendered one character higher than centre such that - although it is the centre of the grid being rendered - the result has one blank character row above, 2 to the sides, and 3 below. For scores and stuff. Or render the central grid, shifted right to the top of the screen so it has four blank rows below. Rule #4 is entirely optional. But kind of nice to have.
5. The code may have an initial 'preparation' stage based around the supplied set of UDGs, but it must not be necessary to call that preparation function again if only the character map changes. Altogether, the code may use up to 2 K as workspace or for expanded data. But preferably, none.
6. The final code should be completely free for anyone to use, credit or not, though obviously a mention would be nice.
7. Please also supply the 768 byte character map and 2304 byte set of UDGs from your example / demo as binaries or ASM defb listings, so people can try them out in their own engines and compare to yours for performance.
So, the aim is to complete the rendering in as fast a time as possible, using whatever optimisations you like. Though try to avoid data expansions that occupy tons more memory to achieve it. That should give a generic piece of code for new programmers who can then populate a screen (background and sprites) from UDGs - which has to be easier than filling out the Speccy's notoriously awkward screen memory - and quickly turn that into a display.
Re: Optimisation Challenge, for the benefit of new programmers
Well, how is the challenge different from Charade ? (really, just asking, I'm not sure about its specs)
IMHO, emulation of full 32x24 text mode would be perhaps more versatile.
IMHO, emulation of full 32x24 text mode would be perhaps more versatile.
Proud owner of Didaktik M
Re: Optimisation Challenge, for the benefit of new programmers
Primarily, CharAde is designed to be called from BASIC. This is for a routine that can be used with some first forays into machine code. Also to have the source code available so people can hack, modify, or improve it further.
CharAde also kind of over-does it with the optimisation in some areas. For a start, it only manages 64 UDGs at a time. It expands each 9-byte UDG into its own little 32-byte long function for actually drawing that particular UDG on the screen, then it calls those functions in a cascade to render each row. It's fast, but it requires quite an excessive amount of data expansion.
I'll post my own version of this in a few days, if no-one else wants to start it off.
The reasons for windowing the view, rather than rendering the whole character map, are for various gaming-related advantages. The first is that if you're drawing sprites as UDGs, you can over-print them off the edge of the visible area rather than having to clip part of the sprite at the screen edge. 2x2 sprites can go right into the border area and vanish. 3x3 sprites can go two whole characters off the edge before they need to be pulled. It's the equivalent of the all-black attributes at the sides of the screen in Hysteria or Cobra to hide the sprite over-printing. Another reason is the same for when drawing the scenery from blocks. You can over-print some blocks off the edge of the visible screen rather than having to clip them. And finally, by forcing at least one break into the copying of a line, it makes it easier to adapt the code to use a different sized window and a larger character map, should you want to use that as a way of scrolling. So you could do your 32x24 character render, but do it from (for example) a 128x24 character map, then move the window along it to make the display scroll. But I'm not asking for that initially; it's just one way the function could be adapted once we've got the most efficient method working.
So as a nod to flexibility, it would be advantageous if this routine is parameterised. i.e. the height and width of the character-mapped screen are defined by constants at the start of the routine so it can easily be expanded (to e.g. 64x24 or wider for a horizontal level) then the window into the map that's rendered to the screen be made movable by changing or POKEing a value somewhere, to roughly replicate hardware scrolling. But the main test is to do the 28x20 window from a 32x24 map (as stated above) as fast as possible.
CharAde also kind of over-does it with the optimisation in some areas. For a start, it only manages 64 UDGs at a time. It expands each 9-byte UDG into its own little 32-byte long function for actually drawing that particular UDG on the screen, then it calls those functions in a cascade to render each row. It's fast, but it requires quite an excessive amount of data expansion.
I'll post my own version of this in a few days, if no-one else wants to start it off.
The reasons for windowing the view, rather than rendering the whole character map, are for various gaming-related advantages. The first is that if you're drawing sprites as UDGs, you can over-print them off the edge of the visible area rather than having to clip part of the sprite at the screen edge. 2x2 sprites can go right into the border area and vanish. 3x3 sprites can go two whole characters off the edge before they need to be pulled. It's the equivalent of the all-black attributes at the sides of the screen in Hysteria or Cobra to hide the sprite over-printing. Another reason is the same for when drawing the scenery from blocks. You can over-print some blocks off the edge of the visible screen rather than having to clip them. And finally, by forcing at least one break into the copying of a line, it makes it easier to adapt the code to use a different sized window and a larger character map, should you want to use that as a way of scrolling. So you could do your 32x24 character render, but do it from (for example) a 128x24 character map, then move the window along it to make the display scroll. But I'm not asking for that initially; it's just one way the function could be adapted once we've got the most efficient method working.
So as a nod to flexibility, it would be advantageous if this routine is parameterised. i.e. the height and width of the character-mapped screen are defined by constants at the start of the routine so it can easily be expanded (to e.g. 64x24 or wider for a horizontal level) then the window into the map that's rendered to the screen be made movable by changing or POKEing a value somewhere, to roughly replicate hardware scrolling. But the main test is to do the 28x20 window from a 32x24 map (as stated above) as fast as possible.
Re: Optimisation Challenge, for the benefit of new programmers
I'm ashamed to say that I've never looked at Charade @Joefish, but have just downloaded the PDF for some afternoon reading.
Re: Optimisation Challenge, for the benefit of new programmers
I just noticed that in the latest CharAde PDF, some of the diagrams have screwed up and the diagram arrows have all moved out of place. Not sure how that happened... I need to update it again. I still need to come up with a few more demo programs but I keep getting distracted by other projects!
Re: Optimisation Challenge, for the benefit of new programmers
Thanks @Joefish,
Could you post on the forum when you have had time to update?
Cheers
Could you post on the forum when you have had time to update?
Cheers
Re: Optimisation Challenge, for the benefit of new programmers
Ah, CharAde... one of my fav tools, I can't get why am I the only one using it. It's really powerful and very flexible, and you can mix it with asm, compiled BASIC and etc.
But well...
But well...
Re: Optimisation Challenge, for the benefit of new programmers
@Joefish, give me some time, please. i hope to have a time window at the next week, and i feel that doing something after you publish your own would be waste of time. ;-)
Re: Optimisation Challenge, for the benefit of new programmers
Don't worry, I'll deliberately post a rubbish one to start with!
Re: Optimisation Challenge, for the benefit of new programmers
I'm working on one, with the helpful optimisation of the graphic data being interleaved, so all 256 of the first lines, then all 256 of the 2nd line, etc, followed by the attr data.
I suppose I ought to provide a routine to do the interleaving, as it's sposed to be simple to use!
I suppose I ought to provide a routine to do the interleaving, as it's sposed to be simple to use!
Re: Optimisation Challenge, for the benefit of new programmers
you just stole my idea! ;-)
Re: Optimisation Challenge, for the benefit of new programmers
In that case, here's a crap version with no optimisation for a start.
It does the job, but not in any particular rush.
This uses PASMO syntax.
It does the job, but not in any particular rush.
This uses PASMO syntax.
Code: Select all
ENTRY_POINT EQU 32768
CHAR_MAP EQU 40000
UDGs EQU 50000
EMPTY_UDGs EQU 2
SCREEN EQU 16384
ATTRS EQU 22528
SCREEN_X_OFFSET EQU 2
SCREEN_Y_OFFSET EQU 2
SCREEN_OFFSET EQU SCREEN_X_OFFSET + (32 * SCREEN_Y_OFFSET)
MAP_COLS EQU 32
MAP_ROWS EQU 24
WINDOW_COLS EQU 28
WINDOW_ROWS EQU 20
WINDOW_X_OFFSET EQU 2
WINDOW_Y_OFFSET EQU 2
WINDOW_OFFSET EQU WINDOW_X_OFFSET + (MAP_COLS * WINDOW_Y_OFFSET)
org ENTRY_POINT
ld a,1
out (254),a
ld bc,SCREEN+SCREEN_OFFSET ; BC = Screen address
ld de,ATTRS+SCREEN_OFFSET ; DE = Attribute address
exx ; Alternate registers
push hl ; Save HL' and restore before exit. Also don't mess with IY.
ld de,CHAR_MAP+WINDOW_OFFSET ; DE' = Character Map address
ld c,WINDOW_ROWS ; C' = Row loop counter
row_loop
ld b,WINDOW_COLS ; B' = Column (character) loop counter
col_loop
ld a,(de) ; Get character code from Character Map
inc de
exx ; Back to regular registers.
ld ixh,b
ld ixl,c ; Save Screen address
ld hl,UDGs ; Calculate address of UDG in HL, messing with BC
ld b,0
ld c,a
add hl,bc ; HL = UDGs + (1 x code)
sla c
rl b ; BC x 2
sla c
rl b ; BC x 4
sla c
rl b ; BC x 8
add hl,bc ; HL = UDGs + (9 x code)
ldi ; Copy attribute of UDG, advancing DE and HL
ld b,ixh
ld c,ixl ; Restore Screen address in BC
cp EMPTY_UDGs
jp c,skip_pixels ; Jump over pixel drawing for low UDG numbers
REPT 7
ld a,(hl)
ld (bc),a
inc hl
inc b
ENDM
ld a,(hl)
ld (bc),a ; Copy 8 bytes of UDG data
ld b,ixh ; Restore Screen address in BC (hi-byte B) to before UDG was drawn
skip_pixels
inc c ; Advance screen address to next character
exx
djnz col_loop ; Repeat for all chars in row (count in B').
; Note we re-enter the loop with alternate registers active.
exx ; Back to regular registers
ld a,c
add a,32-WINDOW_COLS ; Advance Screen address BC over skipped columns
ld c,a
jp nc,block_skip
ld a,8 ; If there was a carry from the lo-byte,
add a,b ; add 8 to the hi-byte for the next screen third
ld b,a
block_skip
ld a,e
add a,32-WINDOW_COLS ; Advance Attribute address DE over skipped columns
ld e,a
ld a,0
adc a,d ; Pickup any 8-bit carried bit from E in D
ld d,a
exx ; Alternate registers again
ld a,e
add a,MAP_COLS-WINDOW_COLS ; Advance Character Map address DE' over skipped columns
ld e,a
ld a,0
adc a,d ; Pickup any 8-bit carried bit from E' in D'
ld d,a
dec c
jp nz,row_loop ;Repeat for all rows (count in C')
pop hl ; Restore HL' for BASIC
exx
ret ; And return.
org CHAR_MAP
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 4,5,0,0,0,1,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,3,4,5,4,5,0, 1,0,3,4,5,1,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,1,1,1,1,7,3, 1,3,1,6,5,1,5,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,1,6,7,6,7,0, 1,0,1,6,7,1,1,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,6,7,0,0,0,0,6, 7,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
org UDGs
; Attribute, then data.
defb 00000000b, 00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b
defb 01011011b, 00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b
defb 01110000b, 11001100b,01100110b,00110011b,10011001b,11001100b,01100110b,00110011b,10011001b
defb 01000011b, 00111100b,01111110b,11111111b,11111111b,11111111b,11111111b,01111110b,00111100b
defb 01000011b, 00000000b,00000011b,00001111b,00011111b,00111111b,00111111b,01111111b,01111111b
defb 01000011b, 00000000b,11000000b,11110000b,11111000b,11111100b,11111100b,11111110b,11111110b
defb 01000011b, 01111111b,01111111b,00111111b,00111111b,00011111b,00001111b,00000011b,00000000b
defb 01000011b, 11111110b,11111110b,11111100b,11111100b,11111000b,11110000b,11000000b,00000000b
defs 9*(256-8),0
end ENTRY_POINT
Re: Optimisation Challenge, for the benefit of new programmers
How do we measure the speed? Is there a standard way of doing that?
Derek Fountain, author of the ZX Spectrum C Programmer's Getting Started Guide and various open source games, hardware and other projects, including an IF1 and ZX Microdrive emulator.
Re: Optimisation Challenge, for the benefit of new programmers
Good question! Don't know really!
For optimising, it'll be about counting the clock cycles of every operation.
For judging, I guess you'd have to do a standard CALL / RET of the routine, say, 1000 times (without any VBL synching) with a specific test-case image and just time it with a stopwatch. You should probably disable interrupts too (some routines might impose that anyway) so you can't rely on the Speccy for timing.
For optimising, it'll be about counting the clock cycles of every operation.
For judging, I guess you'd have to do a standard CALL / RET of the routine, say, 1000 times (without any VBL synching) with a specific test-case image and just time it with a stopwatch. You should probably disable interrupts too (some routines might impose that anyway) so you can't rely on the Speccy for timing.
Re: Optimisation Challenge, for the benefit of new programmers
Please do. Note that Rule #5, data expansion, does not apply or limit you here as you could easily have a UDG editor that saves data in your chosen format. The data is still the same size, even if re-ordered.Pobulous wrote: ↑Thu Apr 28, 2022 7:04 pm I'm working on one, with the helpful optimisation of the graphic data being interleaved, so all 256 of the first lines, then all 256 of the 2nd line, etc, followed by the attr data.
I suppose I ought to provide a routine to do the interleaving, as it's sposed to be simple to use!
The screen character map should remain in linear order, so as not to be a burden on the programmer. If you want to pre-process that into a different order, or expand it (e.g convert bytes to word addresses) then that WILL come off your run time and 2K cache limit. Though even then the 2K is just a guide. If you need 2.5K to pull off something blisteringly fast, by all means make the case for it. Just don't make something that needs all 48K just to run once!
Now if some smart-arse wants to have a character map in vertical linear order, instead of horizontal, I guess that's technically allowed. Probably best in that particular case to expand it to 32 high so it's easier to calculate for printing on. The rendered portion would still be the same size.
Re: Optimisation Challenge, for the benefit of new programmers
That takes just about exactly 5 stopwatch seconds to run 100 times (in a BASIC loop). So 50ms per iteration as a starting point.
Derek Fountain, author of the ZX Spectrum C Programmer's Getting Started Guide and various open source games, hardware and other projects, including an IF1 and ZX Microdrive emulator.
Re: Optimisation Challenge, for the benefit of new programmers
Here's my early entry.
I could do with trying to optimise away using IX and IY, possibly by using the alternate registers.
I managed to optimise away using B for the inner loop, which was nice.
As I'm using IY I have disabled interrupts, which makes speed hard to work out.
Graphics data is 256 attribs, followed by 256 first lines, 256 second lines, etc - does away with multiplying to find the start of the character's data, and just an INC H to get next line, although it does mean you need all 256 characters defined and it took some Excel and text editor acrobatics to convert to db statements.
Whole screen is always updated but UDG0 is a do nothing character, the next (defineable) few characters just update the ATTRs.
Using UDG0 in my CharMap I have defined a border/status area.
Sample output:
I could do with trying to optimise away using IX and IY, possibly by using the alternate registers.
I managed to optimise away using B for the inner loop, which was nice.
As I'm using IY I have disabled interrupts, which makes speed hard to work out.
Graphics data is 256 attribs, followed by 256 first lines, 256 second lines, etc - does away with multiplying to find the start of the character's data, and just an INC H to get next line, although it does mean you need all 256 characters defined and it took some Excel and text editor acrobatics to convert to db statements.
Whole screen is always updated but UDG0 is a do nothing character, the next (defineable) few characters just update the ATTRs.
Using UDG0 in my CharMap I have defined a border/status area.
Sample output:
Code: Select all
ORG 32768
PUSH IY ;Need to preserve IY for return to Basic
DI ;Need to disable interrupts since IY is being used
LD IX, CharMap
LD DE, 16384
LD IY, 22528
LD C, UDGs/256
LD B,3 ;Loop once for each screen third
PrintLoop:
LD A,(IX+0) ;Get next CharMap character to print
INC IX ;Advance to next CharMap character
AND A ;if zero, skip over screen position
JR Z, SkipPrinting
CP 9 ;chars 1-(n-1) (1-8 in this case) will only update attr data (set to 0 or 1 to disable this feature)
LD H,C ;Get base address of UDGs
LD L,A ;Select UDG 1-255
LD A,(HL) ;copy attr data
LD (IY+0),A
JR C,SkipPrinting
INC H
PUSH DE ;Store current screen address
LD A,(HL) ;copy 8 lines of UDG data to screen
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
POP DE ;Restore screen address
SkipPrinting:
INC IY ;Advance to next ATTR position
INC E ;Advance screen address
JR NZ, PrintLoop ;Loop until E wraps to zero - signals a third of the screen is completed
LD A,D ;Adjust DE to point to next screen third
ADD A,8
LD D,A
DJNZ PrintLoop ;Do next screen third
POP IY ;Restore IY and re-enable interrupts before returning to BASIC
EI
RET
ORG 49152
CharMap: ;24 rows of 32 bytes of characters to render, aligned on 256 byte boundary
;0s don't aren't printed, 1-8 only update the attributes
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,0,0,0
db 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0
db 0,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,0,0,0
db 0,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,0,0,0
db 0,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,0,0,0
db 0,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,0,0
db 0,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,0,0,0
db 0,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,0,0,0
db 0,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,0,0,0
db 0,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,0,0,0
db 0,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,0,0,0
db 0,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,0,0,0
db 0,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,0,0,0
db 0,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,0,0,0
db 0,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,0,0,0
db 0,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,0,0,0
db 0,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,0,0,0
db 0,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,0,0,0
db 0,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,0,0,0
db 0,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
UDGs: ;256 8 byte arrays of attribs abd characters aligned on 256 byte boundary, interleaved
;256 ATTR bytes, followed by 256 top rows of UDGS, then 256 2nd rows, etc
db 0,0,8,16,24,32,40,48,56,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
db 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
db 50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75
db 76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101
db 102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
db 128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153
db 154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179
db 180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205
db 206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231
db 232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12
db 0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60,8,126,60,126,60,60
db 0,32,0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60
db 8,126,60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16
db 0,0,60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0,0
db 0,0,0,0,0,16,0,0,60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12,0,64
db 16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60,8,126,60,126,60,60,0,32
db 0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60,8,126
db 60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0
db 60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0
db 0,0,0,0,0,0,0,0,0,70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16
db 60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66,24,64,64,2,66,66
db 56,32,28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66
db 24,64,64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56
db 68,68,70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104,120
db 56,120,60,28,56,56,68,68,70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16,60,64
db 0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66,24,64,64,2,66,66,56,32
db 28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66,24,64
db 64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68
db 70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104
db 0,0,0,0,0,0,0,0,0,74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24
db 68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12,40,124,124,4,60,66
db 4,60,32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12
db 40,124,124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16
db 68,68,74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84,68
db 68,68,68,32,64,16,68,68,74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24,68,120
db 48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12,40,124,124,4,60,66,4,60
db 32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12,40,124
db 124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68
db 74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84
db 0,0,0,0,0,0,0,0,0,82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16
db 68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2,72,2,66,8,66,62
db 60,34,32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2
db 72,2,66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16
db 68,40,82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84,68
db 68,68,68,32,56,16,68,40,82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16,68,68
db 16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2,72,2,66,8,66,62,60,34
db 32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2,72,2
db 66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40
db 82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84
db 0,0,0,0,0,0,0,0,0,98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16
db 60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66,126,66,66,16,66,2
db 68,34,32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66
db 126,66,66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16
db 68,40,98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84,68
db 68,120,60,32,4,16,68,40,98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16,60,68
db 16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66,126,66,66,16,66,2,68,34
db 32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66,126,66
db 66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40
db 98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84
db 0,0,0,0,0,0,0,0,0,60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16
db 4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60,8,60,60,16,60,60
db 60,60,28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60
db 8,60,60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12
db 56,16,60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84,68
db 56,64,4,32,120,12,56,16,60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16,4,68
db 56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60,8,60,60,16,60,60,60,60
db 28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60,8,60
db 60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16
db 60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0,0
db 0,64,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,56,0
db 0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0
Re: Optimisation Challenge, for the benefit of new programmers
This isn't what was asked for, but I'll post it here as a curio or a line in the sand. It's an extremely naive C implementation which uses a loop within a loop and calculations for all addresses. I wrote it to ensure I understood the requirement. I used joefish's data and it produces identical output.
It takes about 600ms per iteration. i.e. you can actually see it drawing the screen. So arguably quite readable, but needs work to make it faster...
Code: Select all
#include <stdint.h>
#include <arch/zx.h>
extern uint8_t char_map[];
extern uint8_t udgs[];
void main(void)
{
uint8_t x,y;
zx_border(INK_BLUE);
for( y=2; y < 22; y++ )
{
for( x=2; x < 30; x++ )
{
uint8_t *screen_addr = zx_cxy2saddr(x, y); // char x,y to screen pixel address
uint8_t *att_addr = zx_cxy2aaddr(x, y); // char x,y to attribute address
uint8_t c = char_map[(y*32)+x]; // pick the character out of the map
// Fill in the 8 scan lines of the screen's char cell from the UDG data
uint8_t i;
for(i=0;i<8;i++)
{
// Find the UDG required, skip the attribute value, pick up the scan line byte
*screen_addr = udgs[ (9*c) + 1 + i ];
// Next screen row down
screen_addr += 256;
}
// Set the colour byte. UDG's attribute value is first in the 9 bytes of data
*att_addr = udgs[ 9*c ];
}
}
return;
}
Derek Fountain, author of the ZX Spectrum C Programmer's Getting Started Guide and various open source games, hardware and other projects, including an IF1 and ZX Microdrive emulator.
Re: Optimisation Challenge, for the benefit of new programmers
If this is on the next... I wrote a dot command that can be used to time things at about 1 second granularity using the real time clock.. https://github.com/jarikomppa/specnext/tree/master/dt
http://iki.fi/sol - my schtuphh
https://github.com/jarikomppa/speccy/ - 48k speccy dev
https://github.com/jarikomppa/specnext/ - specnext dev
https://github.com/jarikomppa/speccy/ - 48k speccy dev
https://github.com/jarikomppa/specnext/ - specnext dev
Re: Optimisation Challenge, for the benefit of new programmers
so you didn't leave me a choice. ;-) 256-unrolled UDG version (not heavily optimised yet); normal->unrolled UDG converter routine included.
compiled .tap demo
Code: Select all
DEBUG equ 0
DEBUG_LOOP equ 0
;; 0x300 bytes (768)
;; need not to be at a 256-byte boundary
TILEMAP equ 0xf000
;; 0x900 bytes (2304)
;; must be at a 256-byte boundary
UDGADDR equ 0xf300
TILE_SKIP_TOP equ 2
TILE_SKIP_BOTTOM equ 2
TILE_SKIP_LEFT equ 2
TILE_SKIP_RIGHT equ 2
TILE_ROW_SKIP equ TILE_SKIP_LEFT+TILE_SKIP_RIGHT
TILEMAP_LAST_VIS_TILE equ TILEMAP+32*24-1-TILE_SKIP_RIGHT-32*TILE_SKIP_BOTTOM
org 32768
IF DEBUG
di ;; for debug
ELSE
;; for basic
exx
push hl
ENDIF
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; to make things easier, i'm unrolling normal UDG here
;; normal UDG is 8 bitmap bytes and attribute
;; you can skip this unrolling, and use prepared data instead
call prepare_udg_set
again:
IF !DEBUG && DEBUG_LOOP
halt
ld a,1
out (#fe),a
ENDIF
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; draw attributes
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
row_attr:
ld bc,TILEMAP_LAST_VIS_TILE
ld de,0x5800+32*24-1-TILE_SKIP_RIGHT-32*TILE_SKIP_BOTTOM
ld hl,UDGADDR+256*8
exx
;; alternate set: counters
ld c,24-TILE_SKIP_TOP-TILE_SKIP_BOTTOM ; height
print_one_row_loop:
;; alternate set: counters
ld b,32-TILE_ROW_SKIP ; width
print_one_tile:
;; here we should be at "counters"
exx
ld a,(bc)
ld l,a
ldd
inc hl ; restore high byte (required for zero tile)
exx
djnz print_one_tile
; next row
dec c
jp z,attr_done
;; alternate set: counters
exx
;; normal set: addresses
; skip invisible tiles
DUP TILE_ROW_SKIP
dec bc
dec de
EDUP
exx
jp print_one_row_loop
attr_done:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; draw bitmap
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; top 1/3
ld bc,TILEMAP+TILE_SKIP_LEFT+32*TILE_SKIP_TOP
ld hl,0x4000+TILE_SKIP_LEFT+32*TILE_SKIP_TOP
exx
ld c,8-TILE_SKIP_TOP ; row count
call do_one_row
;; alternate set (counters)
;; middle 1/3
exx
;; main set (addresses)
DUP TILE_ROW_SKIP
inc bc
EDUP
ld hl,0x4800+TILE_SKIP_LEFT
exx
;; alternate set (counters)
ld c,8 ; row count
call do_one_row
;; alternate set (counters)
;; bottom 1/3
exx
;; main set (addresses)
DUP TILE_ROW_SKIP
inc bc
EDUP
ld hl,0x5000+TILE_SKIP_LEFT
exx
;; alternate set (counters)
ld c,8-TILE_SKIP_BOTTOM ; row count
call do_one_row
;; alternate set (counters)
IF DEBUG
jr $
ENDIF
IF !DEBUG && DEBUG_LOOP
xor a
out (#fe),a
jp again
ENDIF
IF !DEBUG
;; for basic
pop hl
exx
ret
ENDIF
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; in:
;; C': height
;; BC: tilemap
;; HL: scraddr
;; registers must be at the alternate set (counters)
;;
;; out:
;; DE,HL,AF: dead
;; BC: right after last read tilemap byte
;; B'C': dead
;; registers are at the alternate set (counters)
;;
do_one_row:
ld b,32-TILE_ROW_SKIP
;; do it tile by tile
do_one_tile:
;; alternate set (counters)
exx
;; normal set (addresses)
ld a,(bc)
inc bc
or a
jp z,skip_one_tile
ld (do_one_tile_save_addr+1),hl
ld (one_tile_udg_addr+1),a
one_tile_udg_addr:
ld de,UDGADDR
; tile bitmap
DUP 7
ld a,(de)
ld (hl),a
inc d
inc h
EDUP
; avoid two useless increments
ld a,(de)
ld (hl),a
do_one_tile_save_addr:
ld hl,0 ; self-modifying code, fixed above
skip_one_tile:
inc l
exx
djnz do_one_tile
dec c
ret z
; we are done with the row
;; alternate set (counters)
exx
;; normal set (addresses)
DUP TILE_ROW_SKIP
inc bc
inc l
EDUP
exx
jp do_one_row
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unroll linear UDG set to `UDGADDR`
;;
prepare_udg_set:
ld c,UDG_COUNT
ld hl,UDG_SET
ld de,UDGADDR
; char by char
prepare_next_char:
push de
ld b,9
prepare_char_loop:
ld a,(hl)
ld (de),a
inc hl
inc d
djnz prepare_char_loop
pop de
inc e
dec c
jr nz,prepare_next_char
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; linear UDG set
;;
UDG_COUNT equ 8
UDG_SET:
;; bitmap, then attribute
defb 00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b, 00000000b
defb 00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b, 01011011b
defb 11001100b,01100110b,00110011b,10011001b,11001100b,01100110b,00110011b,10011001b, 01110000b
defb 00111100b,01111110b,11111111b,11111111b,11111111b,11111111b,01111110b,00111100b, 01000011b
defb 00000000b,00000011b,00001111b,00011111b,00111111b,00111111b,01111111b,01111111b, 01000011b
defb 00000000b,11000000b,11110000b,11111000b,11111100b,11111100b,11111110b,11111110b, 01000011b
defb 01111111b,01111111b,00111111b,00111111b,00011111b,00001111b,00000011b,00000000b, 01000011b
defb 11111110b,11111110b,11111100b,11111100b,11111000b,11110000b,11000000b,00000000b, 01000011b
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; tilemap
;;
org TILEMAP
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,0,2,0,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,0,0,2,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 4,5,0,0,0,1,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,3,4,5,4,5,0, 1,0,3,4,5,1,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,1,1,1,1,7,3, 1,3,1,6,5,1,5,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,1,6,7,6,7,0, 1,0,1,6,7,1,1,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,6,7,0,0,0,0,6, 7,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unrolled UDG set
;; contains first bytes for all 256 tiles, then next bytes, and so on
org UDGADDR
;INCBIN "udg.bin"
Re: Optimisation Challenge, for the benefit of new programmers
and slightly more optimised version for the case when tilemap is 256-byte aligned:
needs UrAsm to be compiled (mostly for "IF"-conditions, so can be adapted to other asms manually, i believe).
Code: Select all
DEBUG equ 0
DEBUG_LOOP equ 0
;; 0x300 bytes (768)
;; might be at a 256-byte boundary (slightly faster)
TILEMAP equ 0xf000
;; 0x900 bytes (2304)
;; must be at a 256-byte boundary
UDGADDR equ 0xf300
;; top/bottom skip: [0..7]
TILE_SKIP_TOP equ 2
TILE_SKIP_BOTTOM equ 2
;; left/right skip: any sane value, need to have at least one tile visible
TILE_SKIP_LEFT equ 2
TILE_SKIP_RIGHT equ 2
TILE_ROW_SKIP equ TILE_SKIP_LEFT+TILE_SKIP_RIGHT
TILEMAP_LAST_VIS_TILE equ TILEMAP+32*24-1-TILE_SKIP_RIGHT-32*TILE_SKIP_BOTTOM
org 32768
IF DEBUG
di ;; for debug
ELSE
;; for basic
exx
push hl
ENDIF
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; to make things easier, i'm unrolling normal UDG here
;; normal UDG is 8 bitmap bytes and attribute
;; you can skip this unrolling, and use prepared data instead
call prepare_udg_set
again:
IF !DEBUG && DEBUG_LOOP
halt
ld a,1
out (#fe),a
ENDIF
;; call main blitting routine
call blit_tilemap
IF DEBUG
jr $
ENDIF
IF !DEBUG && DEBUG_LOOP
xor a
out (#fe),a
jp again
ENDIF
IF !DEBUG
;; for basic
pop hl
exx
ENDIF
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; in:
;; none
;;
;; out:
;; BC,DE,HL,AF: dead
;; B'C', D'E', H'L': dead
;;
;; doesn't use IX/IY, or stack tricks
;;
;; tile #0 is "attribute print" (i.e. only attrs will be printed)
;;
blit_tilemap:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; draw attributes
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
ld bc,TILEMAP_LAST_VIS_TILE
ld de,0x5800+32*24-1-TILE_SKIP_RIGHT-32*TILE_SKIP_BOTTOM
ld hl,UDGADDR+256*8
exx
;; alternate set: counters
ld c,24-TILE_SKIP_TOP-TILE_SKIP_BOTTOM ; height
print_one_row_loop:
;; alternate set: counters
ld b,32-TILE_ROW_SKIP ; width
print_one_tile:
;; here we should be at "counters"
exx
ld a,(bc)
ld l,a
ldd
inc hl ; restore high byte (required for zero tile)
exx
djnz print_one_tile
; next row
dec c
jp z,attr_done
;; alternate set: counters
exx
;; normal set: addresses
; skip invisible tiles
DUP TILE_ROW_SKIP
dec bc
dec de
EDUP
exx
jp print_one_row_loop
attr_done:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; draw bitmap
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; top 1/3
ld bc,TILEMAP+TILE_SKIP_LEFT+32*TILE_SKIP_TOP
ld hl,0x4000+TILE_SKIP_LEFT+32*TILE_SKIP_TOP
exx
ld c,8-TILE_SKIP_TOP ; row count
call do_one_row
;; alternate set (counters)
;; middle 1/3
ld bc,TILEMAP+TILE_SKIP_LEFT+32*8
ld hl,0x4800+TILE_SKIP_LEFT
exx
;; alternate set (counters)
ld c,8 ; row count
call do_one_row
;; alternate set (counters)
;; bottom 1/3
ld bc,TILEMAP+TILE_SKIP_LEFT+32*8*2
ld hl,0x5000+TILE_SKIP_LEFT
exx
;; alternate set (counters)
ld c,8-TILE_SKIP_BOTTOM ; row count
call do_one_row
;; alternate set (counters)
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; in:
;; C': height
;; BC: tilemap
;; HL: scraddr
;; registers must be at the alternate set (counters)
;;
;; out:
;; DE,HL,AF: dead
;; BC: right after last read tilemap byte
;; B'C': dead
;; registers are at the alternate set (counters)
;;
do_one_row:
ld b,32-TILE_ROW_SKIP
;; do it tile by tile
do_one_tile:
;; alternate set (counters)
exx
;; normal set (addresses)
ld a,(bc)
IF TILEMAP&0xff
inc bc
ELSE
inc c
ENDIF
or a
jp z,skip_one_tile
ld (do_one_tile_save_addr+1),hl
ld (one_tile_udg_addr+1),a
one_tile_udg_addr:
ld de,UDGADDR
; tile bitmap
DUP 7
ld a,(de)
ld (hl),a
inc d
inc h
EDUP
; avoid two useless increments
ld a,(de)
ld (hl),a
do_one_tile_save_addr:
ld hl,0 ; self-modifying code, fixed above
skip_one_tile:
inc l
exx
djnz do_one_tile
dec c
ret z
; we are done with the row
;; alternate set (counters)
exx
;; normal set (addresses)
IF TILE_ROW_SKIP > 3 && (TILEMAP&0xff) == 0
ld a,TILE_ROW_SKIP
add a,c
ld c,a
ELSE
DUP TILE_ROW_SKIP
IF TILEMAP&0xff
inc bc
ELSE
inc c
ENDIF
EDUP
ENDIF
IF TILE_ROW_SKIP > 3
ld a,TILE_ROW_SKIP
add a,l
ld l,a
ELSE
DUP TILE_ROW_SKIP
inc l
EDUP
ENDIF
exx
jp do_one_row
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unroll linear UDG set to `UDGADDR`
;;
prepare_udg_set:
ld c,UDG_COUNT
ld hl,UDG_SET
ld de,UDGADDR
; char by char
prepare_next_char:
push de
ld b,9
prepare_char_loop:
ld a,(hl)
ld (de),a
inc hl
inc d
djnz prepare_char_loop
pop de
inc e
dec c
jr nz,prepare_next_char
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; linear UDG set
;;
UDG_COUNT equ 8
UDG_SET:
;; bitmap, then attribute
defb 00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b, 00000000b
defb 00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b, 01011011b
defb 11001100b,01100110b,00110011b,10011001b,11001100b,01100110b,00110011b,10011001b, 01110000b
defb 00111100b,01111110b,11111111b,11111111b,11111111b,11111111b,01111110b,00111100b, 01000011b
defb 00000000b,00000011b,00001111b,00011111b,00111111b,00111111b,01111111b,01111111b, 01000011b
defb 00000000b,11000000b,11110000b,11111000b,11111100b,11111100b,11111110b,11111110b, 01000011b
defb 01111111b,01111111b,00111111b,00111111b,00011111b,00001111b,00000011b,00000000b, 01000011b
defb 11111110b,11111110b,11111100b,11111100b,11111000b,11110000b,11000000b,00000000b, 01000011b
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; tilemap
;;
org TILEMAP
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,0,2,0,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,0,0,2,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 4,5,0,0,0,1,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,3,4,5,4,5,0, 1,0,3,4,5,1,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,1,1,1,1,7,3, 1,3,1,6,5,1,5,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,1,6,7,6,7,0, 1,0,1,6,7,1,1,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,6,7,0,0,0,0,6, 7,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unrolled UDG set
;; contains first bytes for all 256 tiles, then next bytes, and so on
org UDGADDR
Re: Optimisation Challenge, for the benefit of new programmers
and i optimised it slightly more, but let's wait for others first. ;-) (also, cleaned up the code too.)
Re: Optimisation Challenge, for the benefit of new programmers
@Joefish I think you might have a dose of excessive average familiarity, so per this XKCD comic:
Your "crap" version appears to use undocumented Z80 instructions, makes extensive use of the alternate registers, nicely sets up a LDI instruction, and unrolls a loop. That's all before it does something clever to handle the screen address, which I don't understand. I've a feeling your "crap" code is already streets ahead of what the vast majority of programmers could write, let alone "new" ones.
I managed to get my C version down from 600ms to 91ms, which is still close to double the time your "crap" version takes. I think I could make it faster if I understood how to work with the screen better. Given that a compiler can't do the clever things an assembly language programmer can I was quite pleased with what I've go to.
Derek Fountain, author of the ZX Spectrum C Programmer's Getting Started Guide and various open source games, hardware and other projects, including an IF1 and ZX Microdrive emulator.
Re: Optimisation Challenge, for the benefit of new programmers
two basic tricks: interleaved tile data, and drawing the screen by 1/3 at a time can do wonders. ;-) with interleaved data you don't have to calculate tile data address at all, and working with only 1/3 of the screen at a time you don't need to calculate screen addresses too. i don't think that it is possible to do it much faster without dynamic codegen/code specialization anyway. at least that's where i hit the ceiling. but i'm not the best Z80 coder out there (not even a very good one), so you don't have to believe my words. ;-)
and let me spam you all with my Final RC version of the code, btw. ;-)
Code: Select all
DEBUG equ 0
DEBUG_LOOP equ 0
DEBUG_NO_ZERO_TILES equ 0
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; options
;;
;; should tile #0 change only attrs?
;; if set, tile #0 will change only attributes, leaving pixels intact
ZERO_TILE_ATTR_ONLY equ 1
;; this is marginally faster when there are many #0 tiles, and `ZERO_TILE_ATTR_ONLY` is set
;; if almost whole tilemap is non-zero, separate attr pass is marginally slower (600-800 tics)
;; note that separate attr pass is temporarily modifying SP
;; so if you're not doing this right after HALT, use combined pass instead
;; the default is safer combined pass
USE_SEPARATE_ATTR_PASS equ 0
;; 0x300 bytes (768)
;; might be at a 256-byte boundary (slightly faster)
TILEMAP equ 0xf000
;; 0x900 bytes (2304)
;; must be at a 256-byte boundary
UDGADDR equ 0xf300
;; top/bottom skip: [0..7]
TILE_SKIP_TOP equ 2
TILE_SKIP_BOTTOM equ 2
;; left/right skip: any sane value, need to have at least one tile visible
;; visible window size must be even for maximum speed, and must be >2
;; note that the code is optimised for at least `2+2` skips (i.e. 4 and more)
TILE_SKIP_LEFT equ 2
TILE_SKIP_RIGHT equ 2
TILE_ROW_SKIP equ TILE_SKIP_LEFT+TILE_SKIP_RIGHT
TILEMAP_LAST_VIS_TILE equ TILEMAP+32*24-1-TILE_SKIP_RIGHT-32*TILE_SKIP_BOTTOM
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; sanity checks
;;
IF TILE_SKIP_TOP < 0 || TILE_SKIP_TOP > 7
FATAL "invalid TILE_SKIP_TOP"
ENDIF
IF TILE_SKIP_BOTTOM < 0 || TILE_SKIP_BOTTOM > 7
FATAL "invalid TILE_SKIP_BOTTOM"
ENDIF
IF TILE_SKIP_LEFT < 0 || TILE_SKIP_LEFT > 30
FATAL "invalid TILE_SKIP_LEFT"
ENDIF
IF TILE_SKIP_RIGHT < 0 || TILE_SKIP_RIGHT > 30
FATAL "invalid TILE_SKIP_RIGHT"
ENDIF
IF TILE_ROW_SKIP < 0 || TILE_ROW_SKIP > 30 || (TILE_ROW_SKIP&1)
FATAL "invalid TILE_SKIP_LEFT/TILE_SKIP_RIGHT combination"
ENDIF
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; main entry
;;
org 32768
IF !DEBUG
;; for basic
exx
push hl
ENDIF
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; to make things easier, i'm unrolling normal UDG here
;; normal UDG is 8 bitmap bytes and attribute
;; you can skip this unrolling, and use prepared data instead
call prepare_udg_set
IF DEBUG_NO_ZERO_TILES
ld hl,TILEMAP
ld bc,32*24
tilefix_loop:
ld a,(hl)
or a
jp nz,tilefix_no_fix
inc a
ld (hl),a
tilefix_no_fix:
inc hl
dec bc
ld a,b
or c
jp nz,tilefix_loop
ENDIF
again:
halt
if DEBUG
di
ELSE
IF DEBUG_LOOP
ld a,1
out (#fe),a
ENDIF
ENDIF
test0:
;; call main blitting routine
call blit_tilemap
test1:
IF DEBUG
jr $
ENDIF
IF !DEBUG && DEBUG_LOOP
xor a
out (#fe),a
jp again
ENDIF
IF !DEBUG
;; for basic
pop hl
exx
ENDIF
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; in:
;; none
;;
;; out:
;; BC,DE,HL,AF: dead
;; B'C', D'E', H'L': dead
;; if `USE_SEPARATE_ATTR_PASS` is zero, IX is dead too
;;
;; if `USE_SEPARATE_ATTR_PASS` is non-zero, uses stack to blit attributes
;; so either use HALT, or restore corrupted screen borders
;;
;; if `ZERO_TILE_ATTR_ONLY` is non-zero, then tile #0 is "attribute print"
;; (i.e. only attrs will be printed)
;;
blit_tilemap:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; draw attributes
;; use stack to copy them
if USE_SEPARATE_ATTR_PASS
ld (blit_attr_saved_sp+1),sp
;; alternate set: counters
ld hl,0x5800+32*24-TILE_SKIP_RIGHT-32*TILE_SKIP_BOTTOM
ld de,0-32 ; constant
ld c,24-TILE_SKIP_TOP-TILE_SKIP_BOTTOM ; height
exx
;; normal set: addresses
ld bc,TILEMAP_LAST_VIS_TILE
ld hl,UDGADDR
print_one_row_loop:
;; normal set: addresses
exx
print_one_row_loop_no_exx:
;; alternate set: counters
ld b,0+(32-TILE_ROW_SKIP)/2 ; width
;; set address to push
ld sp,hl
print_one_tile:
;; here we should be at "counters"
;; 4+7+4+7+6+7+4+7+6+11+4 = 67 tics per 2 tiles
exx ; 4
;; load two tiles
ld a,(bc) ; 7
ld l,a ; 4
ld d,(hl) ; 7
dec bc ; 6
ld a,(bc) ; 7
ld l,a ; 4
ld e,(hl) ; 7
dec bc ; 6
;; and push them
push de ; 11
exx ; 4
djnz print_one_tile
; next row
dec c ; 4
jp z,blit_attr_saved_sp
;; alternate set: counters
;; move hl to the previous screen line
add hl,de ; 11
; skip invisible tiles
IF TILE_ROW_SKIP
exx ; 4
;; normal set: addresses
; fixed 15 tics most of the time (w/o jp)
ld a,c ; 4
sub a,TILE_ROW_SKIP ; 7
ld c,a ; 4
jp nc,print_one_row_loop ; 10
dec b ; 4
jp print_one_row_loop ; 10
ELSE
jp print_one_row_loop_no_exx ; 10
ENDIF
blit_attr_saved_sp:
ld sp,0
endif
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; draw bitmap
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; top 1/3
ld bc,TILEMAP+TILE_SKIP_LEFT+32*TILE_SKIP_TOP
ld hl,0x4000+TILE_SKIP_LEFT+32*TILE_SKIP_TOP
;; normal set (addresses)
exx
;; microoptimisation: load B' to skip one loading
ld bc,256*(32-TILE_ROW_SKIP)+8-TILE_SKIP_TOP ; row count
ld e,b ; E' is used to restore B'
IF !USE_SEPARATE_ATTR_PASS
; IX is attributes address
ld ix,0x5800+TILE_SKIP_LEFT+32*TILE_SKIP_TOP
ENDIF
call do_one_tile ; HACK! jump into subroutine body
;; alternate set (counters)
;; middle 1/3
; B' will be set at do_one_third
ld c,8 ; row count
IF !USE_SEPARATE_ATTR_PASS
; IX is attributes address
ld ix,0x5800+TILE_SKIP_LEFT+32*8
ENDIF
exx
;; normal set (addresses)
ld bc,TILEMAP+TILE_SKIP_LEFT+32*8
ld hl,0x4800+TILE_SKIP_LEFT
call do_one_third
;; alternate set (counters)
;; bottom 1/3
; B' will be set at do_one_third
ld c,8-TILE_SKIP_BOTTOM ; row count
IF !USE_SEPARATE_ATTR_PASS
; IX is attributes address
ld ix,0x5800+TILE_SKIP_LEFT+32*8*2
ENDIF
exx
;; normal set (addresses)
ld bc,TILEMAP+TILE_SKIP_LEFT+32*8*2
ld hl,0x5000+TILE_SKIP_LEFT
;jp do_one_third
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; in:
;; C': height
;; E': width
;; BC: tilemap
;; HL: scraddr
;; registers must be at the normal set (addresses)
;;
;; out:
;; DE,HL,AF: dead
;; BC: right after last read tilemap byte
;; B'C': dead
;; registers are at the alternate set (counters)
;;
do_one_third:
;; normal set (addresses)
exx ; 4
do_one_third_no_exx:
;; alternate set (counters)
ld b,e ; 4
;; do it tile by tile
do_one_tile:
;; alternate set (counters)
exx ; 4
;; normal set (addresses)
ld a,(bc) ; 7
; cheap micro-optimisation if tilemap is 256-byte aligned
IF TILEMAP&0xff
inc bc ; 6
ELSE
inc c ; 4
ENDIF
IF ZERO_TILE_ATTR_ONLY
or a ; 4
jp z,skip_one_tile ; 10
ENDIF
ld d,UDGADDR/256+USE_SEPARATE_ATTR_PASS ; 7
ld e,a ; 4
IF !USE_SEPARATE_ATTR_PASS
; tile attributes
ld a,(de)
ld (ix+0),a
inc d
inc ix
ENDIF
; tile bitmap
DUP 7
ld a,(de) ; 7
ld (hl),a ; 7
inc d ; 4
inc h ; 4
EDUP
; avoid two useless increments
ld a,(de) ; 7
ld (hl),a ; 7
; faster than SMC (16+10) vs (10+11)
ld de,0-256*7+1 ; 10
add hl,de ; 11
exx ; 4
;; alternate set (counters)
djnz do_one_tile ; 13/8
done_row:
; decrement row counter
dec c ; 4
; we are done with the row
IF TILE_ROW_SKIP
ret z ; 11/5 (5 most of the time)
;; alternate set (counters)
exx ; 4
;; normal set (addresses)
; skip screen addresses
; fixed 15 tics
ld a,TILE_ROW_SKIP ; 7
add a,l ; 4
ld l,a ; 4
IF !USE_SEPARATE_ATTR_PASS
; use the fact that we are working by 1/3
; i.e. it's enough to modify only IXL
ld a,TILE_ROW_SKIP ; 7
add a,ixl ; 4+4
ld ixl,a ; 4+4
ENDIF
; skip tilemap bytes
; fixed 15 tics
ld a,TILE_ROW_SKIP ; 7
add a,c ; 4
ld c,a ; 4
IF TILEMAP&0xff
jp nc,do_one_third ; 10
dec b ; 4
ENDIF
jp do_one_third
ELSE
;; full width
;; alternate set (counters)
jp nz,do_one_third_no_exx
ret
ENDIF
; some pasta for tile skip
skip_one_tile:
;; normal set (addresses)
; tile attributes
IF !USE_SEPARATE_ATTR_PASS
ld de,UDGADDR ; 11
ld a,(de)
ld (ix+0),a
inc ix
ENDIF
inc l ; 4
exx ; 4
;; alternate set (counters)
djnz do_one_tile ; 13/8
jp done_row ; 10
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unroll linear UDG set to `UDGADDR`
;;
prepare_udg_set:
ld c,UDG_COUNT
ld hl,UDG_SET
ld de,UDGADDR
; char by char
prepare_next_char:
push de
ld b,9
prepare_char_loop:
ld a,(hl)
ld (de),a
inc hl
inc d
djnz prepare_char_loop
pop de
inc e
dec c
jr nz,prepare_next_char
ret
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; linear UDG set
;;
UDG_COUNT equ 8
UDG_SET:
;; attribute, then bitmap
defb 00000000b, 00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b
defb 01011011b, 00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b,00000000b
defb 01110000b, 11001100b,01100110b,00110011b,10011001b,11001100b,01100110b,00110011b,10011001b
defb 01000011b, 00111100b,01111110b,11111111b,11111111b,11111111b,11111111b,01111110b,00111100b
defb 01000011b, 00000000b,00000011b,00001111b,00011111b,00111111b,00111111b,01111111b,01111111b
defb 01000011b, 00000000b,11000000b,11110000b,11111000b,11111100b,11111100b,11111110b,11111110b
defb 01000011b, 01111111b,01111111b,00111111b,00111111b,00011111b,00001111b,00000011b,00000000b
defb 01000011b, 11111110b,11111110b,11111100b,11111100b,11111000b,11110000b,11000000b,00000000b
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; tilemap
;;
org TILEMAP
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,0,2,0,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,0,0,2,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 4,5,0,0,0,1,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,3,4,5,4,5,0, 1,0,3,4,5,1,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,1,1,1,1,7,3, 1,3,1,6,5,1,5,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,1,6,7,6,7,0, 1,0,1,6,7,1,1,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,6,7,0,0,0,0,6, 7,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0, 0,0,0,0,0,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
defb 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2, 2,2,2,2,2,2,2,2
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unrolled UDG set
;; contains first bytes for all 256 tiles, then next bytes, and so on
org UDGADDR
;INCBIN "udg.bin"
Re: Optimisation Challenge, for the benefit of new programmers
I've removed any need for IX and IY registers, converted all JRs to JPs.
I stil need to write the graphic interleaving routine, so it can use the simple graphic data from Joefish's example routine.
Be interesting to see how they all compare in different scenarios.
I stil need to write the graphic interleaving routine, so it can use the simple graphic data from Joefish's example routine.
Be interesting to see how they all compare in different scenarios.
Code: Select all
ORG 32768
DI ;Disable interrupts for speed
LD DE, 16384
LD BC, 22528
LD HL, CharMap
PrintLoop:
LD A,(HL) ;Get next CharMap character to print
AND A ;if zero, skip over screen position
JP Z, SkipPrinting
CP 9 ;chars 1-(n-1) (1-8 in this case) will only update attr data (set to 0 or 1 to disable this feature)
PUSH HL ;Store CharMap location
LD H,UDGs/256 ;Get base address of UDGs
LD L,A ;Select UDG 1-255
LD A,(HL) ;copy ATTR data
LD (BC),A
JP C,PrintATTROnly
INC H ;Get first line of chardata
PUSH DE ;Store current screen address
LD A,(HL) ;copy 8 lines of UDG data to screen
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
INC H
INC D
LD A,(HL)
LD (DE),A
POP DE ;Restore screen address
PrintATTROnly:
POP HL ;Restore CharMap location
SkipPrinting:
INC BC ;Advance to next ATTR position
INC HL ;Advance to next CharMap character
INC E ;Advance screen address
JP NZ, PrintLoop ;Loop until E wraps to zero - signals a third of the screen is completed
LD A,D ;Adjust DE to point to next screen third
ADD A,8
LD D,A
CP 88 ;If MSB of screen addr is now 88, then we are done.
JP C, PrintLoop ;Do next screen third
EI
RET
ORG 49152
CharMap: ;24 rows of 32 bytes of characters to render, aligned on 256 byte boundary
;0s don't aren't printed, 1-8 only update the attributes
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,0,0,0
db 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0
db 0,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,0,0,0
db 0,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,0,0,0
db 0,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,0,0,0
db 0,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,0,0
db 0,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,0,0,0
db 0,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,0,0,0
db 0,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,0,0,0
db 0,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,0,0,0
db 0,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,0,0,0
db 0,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,0,0,0
db 0,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,0,0,0
db 0,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,0,0,0
db 0,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,0,0,0
db 0,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,0,0,0
db 0,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,0,0,0
db 0,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,0,0,0
db 0,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,0,0,0
db 0,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
UDGs: ;256 8 byte arrays of attribs abd characters aligned on 256 byte boundary, interleaved
;256 ATTR bytes, followed by 256 top rows of UDGS, then 256 2nd rows, etc
db 0,0,8,16,24,32,40,48,56,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
db 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
db 50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75
db 76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101
db 102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
db 128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153
db 154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179
db 180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205
db 206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231
db 232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12
db 0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60,8,126,60,126,60,60
db 0,32,0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60
db 8,126,60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16
db 0,0,60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0,0
db 0,0,0,0,0,16,0,0,60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12,0,64
db 16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60,8,126,60,126,60,60,0,32
db 0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0,60,24,60,60,8,126
db 60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0,0,0,0,0,0,0,16,0,0
db 60,24,60,60,8,126,60,126,60,60,0,32,0,4,0,12,0,64,16,4,32,16,0
db 0,0,0,0,0,0,0,0,0,70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16
db 60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66,24,64,64,2,66,66
db 56,32,28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66
db 24,64,64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56
db 68,68,70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104,120
db 56,120,60,28,56,56,68,68,70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16,60,64
db 0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66,24,64,64,2,66,66,56,32
db 28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68,70,40,66,66,24,64
db 64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104,120,56,120,60,28,56,56,68,68
db 70,40,66,66,24,64,64,2,66,66,56,32,28,4,56,16,60,64,0,0,40,16,104
db 0,0,0,0,0,0,0,0,0,74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24
db 68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12,40,124,124,4,60,66
db 4,60,32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12
db 40,124,124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16
db 68,68,74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84,68
db 68,68,68,32,64,16,68,68,74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24,68,120
db 48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12,40,124,124,4,60,66,4,60
db 32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68,74,8,2,12,40,124
db 124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84,68,68,68,68,32,64,16,68,68
db 74,8,2,12,40,124,124,4,60,66,4,60,32,60,68,24,68,120,48,4,48,16,84
db 0,0,0,0,0,0,0,0,0,82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16
db 68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2,72,2,66,8,66,62
db 60,34,32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2
db 72,2,66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16
db 68,40,82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84,68
db 68,68,68,32,56,16,68,40,82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16,68,68
db 16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2,72,2,66,8,66,62,60,34
db 32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40,82,8,60,2,72,2
db 66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84,68,68,68,68,32,56,16,68,40
db 82,8,60,2,72,2,66,8,66,62,60,34,32,68,120,16,68,68,16,4,48,16,84
db 0,0,0,0,0,0,0,0,0,98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16
db 60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66,126,66,66,16,66,2
db 68,34,32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66
db 126,66,66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16
db 68,40,98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84,68
db 68,120,60,32,4,16,68,40,98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16,60,68
db 16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66,126,66,66,16,66,2,68,34
db 32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40,98,8,64,66,126,66
db 66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84,68,68,120,60,32,4,16,68,40
db 98,8,64,66,126,66,66,16,66,2,68,34,32,68,64,16,60,68,16,4,40,16,84
db 0,0,0,0,0,0,0,0,0,60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16
db 4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60,8,60,60,16,60,60
db 60,60,28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60
db 8,60,60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12
db 56,16,60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84,68
db 56,64,4,32,120,12,56,16,60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16,4,68
db 56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60,8,60,60,16,60,60,60,60
db 28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16,60,62,126,60,8,60
db 60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84,68,56,64,4,32,120,12,56,16
db 60,62,126,60,8,60,60,16,60,60,60,60,28,60,60,16,4,68,56,36,36,12,84
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0,0
db 0,64,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,56,0
db 0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0,0,0,64,6,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,56,0,0,24,0,0,0