How much self modifying code was there and did it do anything?
- bluespikey
- Manic Miner
- Posts: 959
- Joined: Tue Jun 30, 2020 3:54 pm
How much self modifying code was there and did it do anything?
I read somewhere that Joffa's code was so transcendent that it would modify itself. If the game is in a state where there is no reason that a logical validation would ever return true, then comment the check out rather than let it run, etc. Now that's really hard stuff to stay on top of and I wouldn't know how to do it in a modern language. But was it a common tool back in the days of Z80 machine code, or just used by mad geniuses like Joffa? Is there much to be gained from it to make up for the extra complexity of code?
- ParadigmShifter
- Manic Miner
- Posts: 673
- Joined: Sat Sep 09, 2023 4:55 am
Re: How much self modifying code was there and did it do anything?
There's some basic stuff I do.
Save and restore stack pointer when you are about to abuse the stack to copy 16 bits at a time
Code: Select all
ld (.savesp+1), sp ; patch the operand of the ld sp below with the current sp
; change the stack pointer and do stuff
.savesp
ld sp, 0 ; 0 gets overwritten by sp
but you can do that with any register/constant variable (e.g. changing loop counters, setting a flag that another routine will check, etc.).
Another use is writing in opcodes to change a blitter from doing logical or to logical xor so you only need one function
Code: Select all
; HL points to screen
; DE points to graphics data
; version which uses or
ld a, (de)
.logicalop
or (hl) ;or the graphics with what is on screen
ld (hl), a ; write it back
where you can change that from doing an or (hl) to an xor (hl) by writing the single byte opcode to (.logicalop) before calling
or (hl) is #B6
xor (hl) is #AE
Joffa had blocks of code that would emit code, e.g. a sequence of push operations to draw 16 pixels to the screen, so you could push any register pair you want.
His SuperFX music player self modifies a lot and it's way too hard for me to begin to work out what is going on in that (mainly setting loop variables I think).
Self modifying code is a very bad idea on modern CPUs since you have to flush the instruction cache. Code which generates other code also has to override memory block status flags (mark the blocks as executable) to allow data to be executed as code (e.g. just in time compilers).
EDIT: Here's a (not very good - period is only 65536, probably ok for arcade games though) random number generator which writes the new seed into the code when it is done. This is the RNG routine I used in SJOE (although I have a better one now which does not self modify and has a much longer period). It's not too bad if you throw away random numbers every so often (I throw away a random number every frame in the menus and every frame no buttons are being pressed during the game loop).
Code: Select all
get_next_rand:
;f(n+1)=241f(n)+257 ;65536
;175 cycles with the inline seed (181 if the seed is read with ld hl,(nn)), add 17 if called
;Outputs:
; BC was the previous pseudorandom value
; HL is the next pseudorandom value
;Notes:
; You can also use B,C,H,L as pseudorandom 8-bit values
; this will generate all 8-bit values
randSeed:
ld hl, 0 ; 0 gets overwritten with the new seed
ld c, l
ld b, h
add hl, hl
add hl, bc
add hl, hl
add hl, bc
add hl, hl
add hl, bc
add hl, hl
add hl, hl
add hl, hl
add hl, hl
add hl, bc
inc h
inc hl
ld (randSeed+1),hl
ret
EDIT2: A couple more cases explained here
https://wikiti.brandonw.net/index.php?t ... timization
1. Modifying relative jump to do variable length loop unrolling
Code: Select all
ld (jpmodify),a
;...
jpmodify = $+1
jr $00
rrca
rrca
rrca
rrca
rrca
rrca
rrca
rrca
2. Poke the index into indexed instruction (replacing the 0 with another value)
Code: Select all
ld a, (ix+0)
EDIT4: Bear in mind all this is an optimisation technique, you can do everything without self modifying code. If a variable is only read in 1 place you can poke the value to that place rather than using a global variable. You can save bytes by modifying functions slightly to do more than 1 thing. And you can get away with using fewer registers (e.g. no loop variable in the loop unrolling example).
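As a quick sanity check on the RNG above, here is a minimal Python model of the same shift-and-add sequence. It is only a sketch, not Z80 code and not taken from SJOE: the function names are made up, and it assumes 16-bit wraparound on every add, as ADD HL,rr does. It can be used to confirm that the steps compute 241*n + 257 modulo 65536 and that the sequence visits every 16-bit value before repeating.

```python
MASK = 0xFFFF  # HL is 16 bits wide, so every result wraps at 65536


def next_rand(seed):
    """Mirror get_next_rand step by step and return the new seed (HL)."""
    bc = seed                      # ld c,l / ld b,h
    hl = seed
    hl = (hl + hl) & MASK          # add hl,hl -> 2n
    hl = (hl + bc) & MASK          # add hl,bc -> 3n
    hl = (hl + hl) & MASK          # add hl,hl -> 6n
    hl = (hl + bc) & MASK          # add hl,bc -> 7n
    hl = (hl + hl) & MASK          # add hl,hl -> 14n
    hl = (hl + bc) & MASK          # add hl,bc -> 15n
    for _ in range(4):             # four add hl,hl -> 240n
        hl = (hl + hl) & MASK
    hl = (hl + bc) & MASK          # add hl,bc -> 241n
    hl = (hl + 0x100) & MASK       # inc h -> +256
    hl = (hl + 1) & MASK           # inc hl -> +1
    return hl


def period(start=0):
    """Count how many calls it takes to get back to the starting seed."""
    seed, steps = next_rand(start), 1
    while seed != start:
        seed = next_rand(seed)
        steps += 1
    return steps
```

Calling period() walks the whole cycle, which is a cheap way to test any replacement multiplier or increment before committing it to assembly.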
Re: How much self modifying code was there and did it do anything?
I think most of my games have self-modifying code.
Mostly to save bytes when you only have 1K to code but also for speed.
Re: How much self modifying code was there and did it do anything?
Self modifying code tends to fall into one of two categories:
1) Simple stuff like storing a variable inline inside an instruction, so LD HL,nnnn has the nnnn value stored directly in the instruction's operand. Slightly more advanced versions tweak the odd instruction itself to swap an XOR for an OR, or maybe a NOP. The reasons for doing this are usually running out of registers or avoiding duplicating large blocks of very similar code.
2) Full on code generators. The classic example being things like sprite scaling. You have some code that works out the necessary sequence of instructions for scaling a horizontal line and then call this for each line of the sprite, rather than doing all the scaling logic every time. This is harder to do and understand, but the benefits are often massive in terms of performance because you're running specifically tailored code rather than a generic routine.
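To make the second category a bit more concrete, here is a rough Python sketch of the same idea: work out the scaling once, emit straight-line code for one horizontal line, then run that generated code for every line of the sprite. Everything here (make_row_scaler, nearest-neighbour scaling, the sample sprite) is an assumption for illustration and not how any particular Spectrum game did it.

```python
def make_row_scaler(src_width, dst_width):
    """Generate a function that scales one row from src_width to dst_width pixels."""
    # Do the scaling maths once: which source pixel feeds each output pixel.
    indices = [x * src_width // dst_width for x in range(dst_width)]
    # Emit straight-line code with one term per output pixel: no loop, no maths.
    body = ", ".join(f"row[{i}]" for i in indices)
    source = f"def scale_row(row):\n    return [{body}]\n"
    namespace = {}
    exec(source, namespace)  # "assemble" the generated code
    return namespace["scale_row"]


# Build the tailored routine once, then call it for each line of the sprite.
scale_4_to_8 = make_row_scaler(4, 8)
sprite = [[1, 2, 3, 4], [5, 6, 7, 8]]
scaled = [scale_4_to_8(row) for row in sprite]
```

The generator is slower and harder to follow than a plain loop, but its cost is paid once per sprite rather than once per pixel.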
- Ast A. Moore
- Rick Dangerous
- Posts: 2641
- Joined: Mon Nov 13, 2017 3:16 pm
Re: How much self modifying code was there and did it do anything?
I use SMC all the time. Mostly to save on space and speed up code execution, but occasionally I replace opcodes on the fly, too. All very useful stuff for 8-bit micros, where every byte and T state counts.
Every man should plant a tree, build a house, and write a ZX Spectrum game.
Author of A Yankee in Iraq, a 50 fps shoot-’em-up—the first game to utilize the floating bus on the +2A/+3,
and zasm Z80 Assembler syntax highlighter.
Re: How much self modifying code was there and did it do anything?
bluespikey wrote: ↑Thu Jan 04, 2024 12:15 pm Now thats really hard stuff to stay on top of and I wouldn't know how to do it in a modern language.
There's lots of things that are normal in asm (even more so in the older CPUs without memory execution prevention) but which high level languages can't adequately express in first-class language features. Sure you can approximate some of it with inline asm, pointers, code in byte arrays, and private far jumps, but you're really then just using asm techniques by the back door for your specific CPU, and you lose any code portability that the high level languages are supposed to give you.
Like Ast I use SMC all the time, and it's one of the reasons I like asm better than HLLs. My normal code style is to store all variables inside operands too, using operand labels which are a feature in some assemblers like sjasmplus and zeus.
In routines optimized for speed it can really make a big difference. I'd far rather patch a dozen opcodes or operands once during level setup or on a special event, than have a dozen conditional calculations that run every time a loop iterates.
Last edited by Seven.FFF on Thu Jan 04, 2024 7:01 pm, edited 1 time in total.
Robin Verhagen-Guest
SevenFFF / Threetwosevensixseven / colonel32
NXtel • NXTP • ESP Update • ESP Reset • CSpect Plugins
- ParadigmShifter
- Manic Miner
- Posts: 673
- Joined: Sat Sep 09, 2023 4:55 am
Re: How much self modifying code was there and did it do anything?
You definitely wouldn't want to use self modifying code in a high level language these days (since it has to flush the instruction cache), so it's probably a good thing that they make it hard to do.
Closest thing you can do is pointers to functions/virtual functions/interface calls really which are very useful of course. C like languages already implement jump tables if they think it will make your code faster with drop through switch/case statements. (Maybe Java doesn't allow that but Java sucks anyway lol).
Just in time compilers have to generate code on the fly and mark the data blocks they build as executable otherwise the OS will kill your program.
Using the cache efficiently is much more important for program optimisation on modern CPUs than these old techniques, which result in much worse performance there.
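As a rough illustration of the function pointer approach, here is a Python sketch of the OR/XOR blitter from earlier in the thread. Instead of poking #B6 or #AE into the code, the operation is picked once and bound into the routine. The name make_blitter and the list-of-bytes screen are assumptions made up for this example.

```python
import operator


def make_blitter(mode):
    """Pick the logical operation once, like patching the opcode before calling."""
    op = {"or": operator.or_, "xor": operator.xor}[mode]

    def blit(screen, graphics):
        # No per-byte mode test in here: the choice was made up front.
        return [op(s, g) for s, g in zip(screen, graphics)]

    return blit


blit_or = make_blitter("or")
blit_xor = make_blitter("xor")
```

The inner loop stays free of conditionals, which is the same benefit the opcode patch gives on the Z80, without writing to executable memory.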
- ParadigmShifter
- Manic Miner
- Posts: 673
- Joined: Sat Sep 09, 2023 4:55 am
Re: How much self modifying code was there and did it do anything?
Dunno what you mean by that even though I use sjasm lol, can you give an example?
I have used macros to inject opcodes into sequences and skip that if the opcode would be a NOP if that's what you mean.
sjasm macros are lacking a bit since they can't return just an expression, which is annoying. E.g. I can't provide one macro to do most of the work here and have to use 2 very similar macros. A C #define would be lovely instead of this copy/paste malarkey.
Code: Select all
MACRO XYTOSCRADDRHL _x_, _y_
ld hl, SCRBASE + (((_y_&#7)|((_y_&#C0)>>3))<<8)|((_x_&#1F)|((_y_&#38)<<2))
ENDM
MACRO DWXYTOSCRADDR _x_, _y_
dw SCRBASE + (((_y_&#7)|((_y_&#C0)>>3))<<8)|((_x_&#1F)|((_y_&#38)<<2))
ENDM
Reason for those macros is simple: if the assembler can do the maths for you at assembly time, don't do it at runtime.
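For anyone who wants to check the arithmetic those macros do, here is a small Python sketch of the same calculation. It assumes the standard ZX Spectrum display file layout and SCRBASE = #4000 (the post does not state either), with x as the character column and y as the pixel row. The function name is made up.

```python
SCRBASE = 0x4000  # assumed start of the ZX Spectrum display file


def xy_to_scr_addr(x, y):
    """x is the character column (0-31), y is the pixel row (0-191)."""
    # High byte: pixel line within the character cell, plus the screen third.
    high = (y & 0x07) | ((y & 0xC0) >> 3)
    # Low byte: column, plus the character row within the third.
    low = (x & 0x1F) | ((y & 0x38) << 2)
    return SCRBASE + ((high << 8) | low)
```

Stepping y by 1 moves the address by 256 and stepping it by 8 moves it by 32, which is why the formula needs so much bit shuffling and why it is worth having the assembler evaluate it.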
- Ast A. Moore
- Rick Dangerous
- Posts: 2641
- Joined: Mon Nov 13, 2017 3:16 pm
Re: How much self modifying code was there and did it do anything?
ParadigmShifter wrote: ↑Thu Jan 04, 2024 7:05 pm Dunno what you mean by that even though I use sjasm lol, can you give an example?
Code: Select all
ld (new_score+1),a ;write the value of A to memory location
;marked by label new_score plus one byte
...
...
...
new_score ld a,0 ;the code above will modify the operand
;of the LD A,n instruction here
- ParadigmShifter
- Manic Miner
- Posts: 673
- Joined: Sat Sep 09, 2023 4:55 am
Re: How much self modifying code was there and did it do anything?
OK. That's just the basic case: if you only read a variable in a single place, you can store it directly in the code instead of elsewhere. EDIT: I'm pretty sure all assemblers support that too?
It's only really useful if you only read the value in a single place though, it gets worse the more times you access it in the code.
My blitter example of modifying the logical operation gets worse the more places you need to do the logical operation you want, so it gets worse the more you unroll loops.
Re: How much self modifying code was there and did it do anything?
Some time ago while trying a few hacks, I saw some self-modifying code in Penetrator.
There's a routine at 40535/9E57 that sets the A register to either 40 or 24 (#28 or #18). This is then written over the opcode at 40594, changing the instruction to JR (A = 24) or JR Z (A = 40).
The game is quite an early one (1982) - though IMO Veronika Megler and Philip Mitchell were way ahead of their time in programming ability back then.
My Speccy site: thirdharmoniser.com
Re: How much self modifying code was there and did it do anything?
ParadigmShifter wrote: ↑Thu Jan 04, 2024 7:05 pm Dunno what you mean by that even though I use sjasm lol, can you give an example?
Like this. The Test label is defined as the address of the 123 operand byte, not the ld a opcode.
Code: Select all
Test+*: ld a, 123
In zeus it would be this, and you can get fancier with marking up multiple operands and regular labels all on the same lines:
Code: Select all
TestOp: ld a, [Test]123
Test2Op: ld (ix+[Test2a]20), [Test2b]123
Data: db 123, [DataA]456, 789, [DataB]999
ParadigmShifter wrote: ↑Thu Jan 04, 2024 7:53 pm I'm pretty sure all assemblers support that too?
Yes, with manual label calculation anything would support it. I'm talking about normal code style where you're trying to make code readable to signal intent, so anything making that intent clearer using first-class assembler features is great in my book.
ParadigmShifter wrote: ↑Thu Jan 04, 2024 7:53 pm It's only really useful if you only read the value in a single place though, it gets worse the more times you access it in the code.
It's just as useful if you read variables in multiple places. It wouldn't be any different from reading any labelled or non-labelled address from multiple places. You can still pick the location of the operand storage so that it gets max speed benefits from, say, being referenced many times inside a loop.
Re: How much self modifying code was there and did it do anything?
I have been using
JR Label
and
LD A,label mod 256
to skip a JR
- ParadigmShifter
- Manic Miner
- Posts: 673
- Joined: Sat Sep 09, 2023 4:55 am
Re: How much self modifying code was there and did it do anything?
Data alignment is still my favourite optimisation.
If you align data so you know it will never cross a page (256 byte) boundary you can replace inc hl with inc l and so on. (2 T-states saved: inc hl takes 6, inc l takes 4!)
If you align code to a 256 byte boundary you can use just one byte of data to jump to a page aligned routine so a function pointer for half price. (1 byte saved per usage!)
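Here is a small Python model of why the inc l trick is only safe with aligned data. It is just a sketch with made-up helper names: inc l leaves H alone, so it matches inc hl everywhere except on the last byte of a page, where it wraps back to the start of the same page.

```python
def inc_hl(hl):
    """INC HL: full 16-bit increment."""
    return (hl + 1) & 0xFFFF


def inc_l(hl):
    """INC L: only the low byte changes, wrapping within the 256-byte page."""
    return (hl & 0xFF00) | ((hl + 1) & 0x00FF)


def safe_to_use_inc_l(start, length):
    """True if a buffer of `length` bytes at `start` stays inside one page."""
    return length > 0 and (start >> 8) == ((start + length - 1) >> 8)
```

A check like safe_to_use_inc_l is the kind of thing worth asserting at assembly time, since a table that grows by one byte can silently break the faster increment.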
Re: How much self modifying code was there and did it do anything?
ParadigmShifter wrote: ↑Thu Jan 04, 2024 8:46 pm Data alignment is still my favourite optimisation.
If you align data so you know it will never cross a page (256 byte) boundary you can replace inc hl with inc l and so on. (3 jiffies saved!)
If you align code to a 256 byte boundary you can use just one byte of data to jump to a page aligned routine so a function pointer for half price. (1 byte saved per usage!)
That is not self-modifying. I use this a lot to check controls.
An example from the game I am currently working on...
Code: Select all
keytab db %11111101,1,down mod 256 ; A
db %11111011,1,up mod 256 ; Q
db %11011111,1,right mod 256 ; P
db %11011111,2,left mod 256 ; O
db %11111110,2,fire mod 256 ; Z
db 0
jphl ld l,(hl)
jp (hl)
left dec c
jp p,l1
ld c,#3e+3
l1 ld a,play0 mod 256
jr setdir
down ld a,b
rrca
jr c,d1 ; odd value always allows down
dwnbot ld hl,0
call tdown
inc hl
call z,tdown
jp nz,dwnret
d1 inc b
jr mtest
up call whatbg ; get current background
call uptest ; check ladders
jr nz,mtest ; no ladders, no up
doup dec b ; go up 1 position
call whatbg ; get higher background
call uptest ; now check open path
jr z,mtest ; up allowed
undoup inc b ; undo up
fire jr mtest
right inc c
ld a,c
sub #42
jr nz,setdir-2
ld c,a
ld a,play4 mod 256
; add computerplayer
setdir ld (dir+1),a
and further in the game
Code: Select all
ld hl,keytab
keylp ld a,(hl)
inc hl
in a,(254)
cpl
and (hl)
inc hl
push hl
jp nz,jphl ; from keypressed to execute routine
dwnret pop hl
mvret inc hl
ld a,(hl)
or a
jr nz,keylp ; loop until the 0 terminator, then all keys are tested
- ParadigmShifter
- Manic Miner
- Posts: 673
- Joined: Sat Sep 09, 2023 4:55 am
Re: How much self modifying code was there and did it do anything?
Yeah I know it's not self-modifying but since we're talking about optimisation it deserves to be mentioned.
Data alignment is easy to do...aligning code opens up a whole new area of memory saving. If you only ever jump to a function that has low address 0 you can save that byte in your data everywhere you need it (and only reading 1 byte of data is a win too). Just store the high byte, low byte is always 0!
Re: How much self modifying code was there and did it do anything?
bluespikey wrote: ↑Thu Jan 04, 2024 12:15 pm I read somewhere that Joffa's code was so transcendent that it would modify itself. If the game is in a state where there is no reason that a logical validation would ever return true, then comment the check out rather than let it run, etc. Now thats really hard stuff to stay on top of and I wouldn't know how to do it in a modern language. But was it common tool back in the days of Z80 machine code, or just used by mad geniuses like Joffa? Is there much to be gained from it to make up for the extra complexity of code?
actually, what Joffa did is called a "JIT compiler" nowadays. ;-) let's take Firefly as an example: there is a compiler which builds the screen blitting code based on the visible map region. then that code is called to draw the map: no more checks, no extra actions, just raw data copying speed. it is faster to build ("jit-compile") the blitting code each frame than to use a slower universal blitter.
most other times SMC is used to store variables directly in the code (in an "LD" instruction, for example), or to patch some jumps. (that's what my line drawing routine is using, for example).
sorry for being late to the show, and replying before reading all other replies. i guess this all was already written here several times. ;-)
Re: How much self modifying code was there and did it do anything?
ParadigmShifter wrote: ↑Thu Jan 04, 2024 9:10 pm Yeah I know it's not self-modifying but since we're talking about optimisation it deserves to be mentioned.
Data alignment is easy to do...aligning code opens up a whole new area of memory saving. If you only ever jump to a function that has low address 0 you can save that byte in your data everywhere you need it (and only reading 1 byte of data is a win too). Just store the high byte, low byte is always 0!
Main routine of my ZX81 emulator on the ZX Spectrum (there is no faster decision table to reach 256 routines)
Code: Select all
ld a,(de) ; DE is PC, get opcode
ld l,a ; low byte of address now set
ld h,b ; B is constant, highbyte of 256 bytes table
ld h,(hl) ; get highbyte of routine
jp (hl) ; emulate opcode in routine
tab db op00 / 256 ; op00 = nn00
db op01 / 256 ; op01 = nn01
...
db opff / 256 ; opff = nnff
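The dispatch above can be modelled in a few lines of Python. This is only a sketch: the routine addresses are invented, and the one real constraint it checks is that the routine for opcode nn must live at an address whose low byte is nn, so the table only has to hold the high byte.

```python
def build_high_byte_table(routine_addresses):
    """Keep only the high byte of each routine address, one entry per opcode."""
    table = bytearray(256)
    for opcode, address in routine_addresses.items():
        # The trick only works if the routine sits at an address whose
        # low byte equals the opcode (op00 = nn00, op01 = nn01, ...).
        if address & 0xFF != opcode:
            raise ValueError(f"routine for opcode {opcode:#04x} is misplaced")
        table[opcode] = address >> 8
    return table


def dispatch_address(table, opcode):
    """ld l,a / ld h,b / ld h,(hl) / jp (hl): low byte is the opcode itself."""
    return (table[opcode] << 8) | opcode


# Hypothetical routine addresses, chosen only to satisfy the low-byte rule.
routines = {0x00: 0x9000, 0x01: 0x9101, 0xFF: 0xA2FF}
table = build_high_byte_table(routines)
```

The price is that the routines have to be scattered through memory at fixed low bytes, which is a layout problem rather than a speed problem.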
Re: How much self modifying code was there and did it do anything?
The practical examples I see quite often:
1) store SP value directly as argument of LD SP, nnnn
2) update a byte/word value and inject the new value back into the computation code - RNGs use it quite often
3) compute jump argument for JR - very often it works as a replacement for a SWITCH CASE statement
4) various precomputations of a value that is de facto constant inside a loop
Less common practices, seen mostly in decryptors etc.:
5) XOR decryptors overwriting themselves
Code: Select all
loop:
xor (hl)
ld (hl), a
dec hl
jr loop ;loop is terminated by overwriting JR parameter
6) LDIR/LDDR decryptors overwriting themselves
7) copying from a circular buffer to screen where a series of LDI is modified by JR
8) complete code generators, usually constructing PUSH fillers
Proud owner of Didaktik M
Re: How much self modifying code was there and did it do anything?
Yep, that last one is how the scenery blocks in Cobra are rendered. The pixel data for the scrollable tiles of scenery for one line of the screen are loaded into the CPU registers in pairs, then the stack pointer is pointed at screen memory. So then by doing PUSH BC, PUSH DE, etc. a row of scenery tiles can be drawn. On the next pixel line of the screen the next row of pixels for each tile is put in the registers, then the same pattern of PUSH instructions is executed. That particular pattern of PUSH instructions is laid down just ahead of things by self-modifying code that looks at the level map and writes out the corresponding PUSH instructions in order. I seem to recall it can also insert LD instructions in amongst the PUSH instructions to redefine some of the blocks mid-line.
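A very rough Python sketch of that kind of generator is below. It is not Cobra's actual code: the mapping from tiles to register pairs and the function name are assumptions, and it leaves out the LD instructions mentioned above. The only hard facts used are the Z80 opcodes for PUSH BC, PUSH DE and PUSH HL, and that the stack grows downwards.

```python
# Z80 opcodes for PUSH with three of the main register pairs.
PUSH_OPCODES = {"BC": 0xC5, "DE": 0xD5, "HL": 0xE5}


def emit_push_row(tile_registers):
    """Return the PUSH sequence that draws one pixel line of tiles.

    tile_registers lists, left to right, which register pair holds each
    tile's pixel data (a hypothetical mapping derived from the level map).
    The stack grows downwards, so the rightmost tile has to be pushed first.
    """
    return bytes(PUSH_OPCODES[reg] for reg in reversed(tile_registers))


# One visible row of the map, expressed as the register pair per tile.
row_code = emit_push_row(["BC", "DE", "HL", "BC"])
```

Because the emitted bytes depend only on the map, the same sequence can be reused for all eight pixel lines of a tile row while only the register contents change.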
Re: How much self modifying code was there and did it do anything?
Test for a different real key pressed
Code: Select all
keyrd:
ld a,(23560)
keyt cp 0
jr z,keyrd
ld (keyt+1),a ; save latest key pressed
Re: How much self modifying code was there and did it do anything?
catmeows wrote: ↑Sun Jan 21, 2024 10:47 am 5) XOR decryptors overwriting itself
Code: Select all
loop:
xor (hl)
ld (hl), a
dec hl
jr loop ;loop is terminated by overwriting JR parameter
Looks interesting, can you explain that one in a bit more detail? I can't quite get my head around it...
- jpnz
- Manic Miner
- Posts: 328
- Joined: Tue Nov 14, 2017 4:07 pm
- Location: Hamilt[r]on - City Of The Future - NZ
Re: How much self modifying code was there and did it do anything?
Self modifying code is great
It can be used to make "things" relocatable - i.e load some code into memory exactly (within reason) where you want it to run and execution will "fixup" itself
A good example is HiSoft Devpac GENS & MONS
A 1981 relocation article from a TRS-80 chap is here (page 47), but is relevant and a good read
Re: How much self modifying code was there and did it do anything?
NOPs into HALTs
Can be an easy self modify for a slow down/speed up, (waiting for interrupt or not)
CLEAR 23855
Re: How much self modifying code was there and did it do anything?
As HL is decreasing, it will eventually overwrite the second byte of the JR instruction. Since a series of XORs over the chunk is deterministic, you can choose the start value in the A register such that the result of the XOR changes the JR parameter to jump outside the loop.
Code: Select all
encryptStart
ld hl, encrypted-1
ld b, encryptedCount
ld a, $FB ;initial value for encrypting parameter of JR instruction
encryptLoop
xor (hl)
ld (hl), a
inc hl
djnz encryptLoop
ld (decryptSeed+1), a
ret
decrypt:
ld hl, lastEncryptedByte
decryptSeed
ld a, 0
decryptLoop
xor (hl)
ld (hl), a
dec hl
jr encrypted ; assembles to $18 $00; after encryption it becomes "JR decryptLoop" ($18 $FB)
encrypted
;encrypted code follows here
This is the case when we want to jump right after the decrypt loop, but it is perfectly possible to choose the initial encrypt value such that the decrypt loop will jump somewhere else.