How much self modifying code was there and did it do anything?

bluespikey · Post by **bluespikey** » Thu Jan 04, 2024 12:15 pm

I read somewhere that Joffa's code was so transcendent that it would modify itself. If the game is in a state where there is no reason that a logical validation would ever return true, then comment the check out rather than let it run, etc. Now that's really hard stuff to stay on top of and I wouldn't know how to do it in a modern language. But was it a common tool back in the days of Z80 machine code, or just used by mad geniuses like Joffa? Is there much to be gained from it to make up for the extra complexity of code?

ParadigmShifter · Post by **ParadigmShifter** » Thu Jan 04, 2024 12:55 pm

There's some basic stuff I do.

Save and restore the stack pointer when you are about to abuse the stack to copy 16 bits at a time:

Code: Select all

    ld (.savesp+1), sp
    ; change the stack pointer and do stuff
    
.savesp
    ld sp, 0 ; 0 gets overwritten by sp

but you can do that with any register/constant variable (e.g. changing loop counters, setting a flag that another routine will check, etc.).
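For the stack pointer case, a fuller sketch might look something like this. The routine name `fill32` and the target address `fill_end` are made up for illustration, and it assumes interrupts were enabled on entry:

```asm
fill_end equ #5800              ; hypothetical target: one past the last byte to fill

; Fill the 32 bytes below fill_end with the word in DE.
fill32:
        di                      ; an interrupt would push onto our borrowed stack
        ld (.savesp+1), sp      ; store SP into the operand of ld sp,nn below
        ld sp, fill_end         ; pushes write downwards from here
        ld b, 16
.loop:  push de                 ; writes 2 bytes per push
        djnz .loop
.savesp:
        ld sp, 0                ; the 0 has been overwritten with the saved SP
        ei                      ; assumption: interrupts were on when we were called
        ret
```

The only self-modifying part is the `ld sp, 0` operand; the rest is ordinary code.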

another use is writing in opcodes to change a blitter from doing logical or to logical xor so you only need one function

Code: Select all

    ; HL points to screen
    ; DE points to graphics data
    ; version which uses or
    ld a, (de)
.logicalop
    or (hl) ;or the graphics with what is on screen
    ld (hl), a ; write it back

where you can change that from doing an or (hl) to an xor (hl) by writing the single byte opcode to (.logicalop) before calling

or (hl) is #B6
xor (hl) is #AE
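A minimal sketch of how that patch could be wired up. The labels `set_blit_xor`, `set_blit_or`, `set_blit_op`, `blit_byte` and `logicalop` are invented for the example; only the two opcode bytes come from the lines above:

```asm
; Hypothetical helpers: choose the logical operation before drawing.
set_blit_xor:
        ld a, #AE               ; opcode byte for xor (hl)
        jr set_blit_op
set_blit_or:
        ld a, #B6               ; opcode byte for or (hl)
set_blit_op:
        ld (logicalop), a       ; overwrite the one-byte instruction below
        ret

; HL points to screen, DE points to graphics data
blit_byte:
        ld a, (de)
logicalop:
        or (hl)                 ; becomes xor (hl) after set_blit_xor
        ld (hl), a              ; write it back
        ret
```

Call `set_blit_xor` or `set_blit_or` once, then call `blit_byte` as often as needed; the patch stays in place until it is changed again.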

Joffa had blocks of code that would emit code e.g. a sequence of push operations to draw 16 pixels to the screen so you could push any register pair you want

His SuperFX music player self modifies a lot and it's way too hard for me to begin to work out what is going on in that (mainly setting loop variables I think).

Self modifying code is a very bad idea on modern CPUs since you have to flush the instruction cache. Code which generates other code has to override memory block status flags as well (mark them as executable) to allow data to be executed as code (e.g. just in time compilers).

EDIT: Here's a (not very good - period is only 65536, probably ok for arcade games though) random number generator which writes the new seed into the code when it is done. This was the RNG routine I used in SJOE (although I have a better one now which does not self modify and has a much longer period). It's not too bad if you throw away random numbers every so often (I throw away a random number every frame in the menus and every frame no buttons are being pressed during the game loop).

Code: Select all

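get_next_rand:
;f(n+1)=241f(n)+257   ;65536
;175 cycles as written (the 181 often quoted matches a version that fetches the seed with ld hl,(nn)), add 17 if called
;Outputs:
;     BC was the previous pseudorandom value
;     HL is the next pseudorandom value
;Notes:
;     You can also use B,C,H,L as pseudorandom 8-bit values
;     this will generate all 8-bit values
randSeed:
	ld hl, 0		; operand is the seed, rewritten at the end
	ld c, l
	ld b, h			; BC = n
	add hl, hl		; 2n
	add hl, bc		; 3n
	add hl, hl		; 6n
	add hl, bc		; 7n
	add hl, hl		; 14n
	add hl, bc		; 15n
	add hl, hl		; 30n
	add hl, hl		; 60n
	add hl, hl		; 120n
	add hl, hl		; 240n
	add hl, bc		; 241n
	inc h			; +256
	inc hl			; +1, so HL = 241n+257 (mod 65536)
	ld (randSeed+1),hl	; store the new seed in the ld hl above
	ret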

EDIT2: A couple more cases explained here

https://wikiti.brandonw.net/index.php?t ... timization

1. Modifying relative jump to do variable length loop unrolling

Code: Select all

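 ld (jpmodify),a   ; A = number of bytes to skip, i.e. how many rrca to leave out (0-8)
;...
jpmodify = $+1     ; address of the jr displacement byte
 jr $00            ; displacement byte is patched by the store above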
 rrca
 rrca
 rrca
 rrca
 rrca
 rrca
 rrca
 rrca

2. Poke the index into indexed instruction (replacing the 0 with another value)

ld a, (ix+0)
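A small sketch of that second case, with made-up labels `set_field` and `read_field`. It relies on `ld a,(ix+d)` assembling to the three bytes #DD #7E d, so the index is the third byte of the instruction:

```asm
; In: A = offset of the field to read later (hypothetical convention, 0..127)
set_field:
        ld (read_field+2), a    ; third byte of ld a,(ix+d) is the displacement d
        ret

; In: IX = base of the structure.  Out: A = byte at IX + patched offset.
read_field:
        ld a, (ix+0)            ; the 0 is replaced by set_field
        ret
```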

EDIT4: Bear in mind all this is an optimisation technique; you can do everything without self modifying code. If a variable is only read in 1 place you can poke the value to that place rather than using a global variable. You can save bytes by modifying functions slightly to do more than 1 thing. And you can get away with using fewer registers (e.g. no loop variable in the loop unrolling example).

Dr beep · Post by **Dr beep** » Thu Jan 04, 2024 2:09 pm

I think most of my games have selfmodifying code.

Mostly to save bytes when you only have 1K to code but also for speed.

AndyC · Post by **AndyC** » Thu Jan 04, 2024 2:20 pm

Self modifying code tends to fall into one of two categories:

1) Simple stuff like storing a variable inline inside an instruction, so LD HL,nnnn has the nnnn value stored directly in the instruction's operand. Slightly more advanced versions tweak the odd instruction itself, to swap an XOR for an OR or a NOP maybe. The reasons for doing this are usually running out of registers or avoiding duplicating large blocks of very similar code.

2) Full on code generators. The classic example being things like sprite scaling. You have some code that works out the necessary sequence of instructions for scaling a horizontal line and then call this for each line of the sprite, rather than doing all the scaling logic every time. This is harder to do and understand, but the benefits are often massive in terms of performance because you're running specifically tailored code rather than a generic routine.

Ast A. Moore · Post by **Ast A. Moore** » Thu Jan 04, 2024 3:41 pm

I use SMC all the time. Mostly to save on space and speed up code execution, but occasionally I replace opcodes on the fly, too. All very useful stuff for 8-bit micros, where every byte and T state counts.

Seven.FFF · Post by **Seven.FFF** » Thu Jan 04, 2024 6:53 pm

bluespikey wrote: ↑Thu Jan 04, 2024 12:15 pm Now that's really hard stuff to stay on top of and I wouldn't know how to do it in a modern language.

There's lots of things that are normal in asm (even more so in the older CPUs without memory execution prevention) but which high level languages can't adequately express in first-class language features. Sure you can approximate some of it with inline asm, pointers, code in byte arrays, and private far jumps, but you're really then just using asm techniques by the back door for your specific CPU, and you lose any code portability that the high level languages are supposed to give you.

Like Ast I use SMC all the time, and it's one of the reasons I like asm better than HLLs. My normal code style is to store all variables inside operands too, using operand labels which are a feature in some assemblers like sjasmplus and zeus.

In routines optimized for speed it can really make a big difference. I'd far rather patch a dozen opcodes or operands once during level setup or on a special event, than have a dozen conditional calculations that run every time a loop iterates.

ParadigmShifter · Post by **ParadigmShifter** » Thu Jan 04, 2024 7:01 pm

You definitely wouldn't want to use self modifying code in a high level language these days (since it has to flush the instruction cache), so it's probably a good thing they make it hard to do.

The closest thing you can do is pointers to functions/virtual functions/interface calls really, which are very useful of course. Compilers for C-like languages already turn switch/case statements (with drop through) into jump tables if they think it will make your code faster. (Maybe Java doesn't allow that but Java sucks anyway lol).

Just in time compilers have to generate code on the fly and mark the data blocks they build as executable otherwise the OS will kill your program.

Using the cache efficiently is much more important for program optimisation than these old techniques which result in much worse performance.

ParadigmShifter · Post by **ParadigmShifter** » Thu Jan 04, 2024 7:05 pm

Seven.FFF wrote: ↑Thu Jan 04, 2024 6:53 pm My normal code style is to store all variables inside operands too, using operand labels which are a feature in some assemblers like sjasmplus and zeus.

Dunno what you mean by that even though I use sjasm lol, can you give an example?

I have used macros to inject opcodes into sequences and skip that if the opcode would be a NOP if that's what you mean.

sjasm macros are lacking a bit since they can't return just an expression, which is annoying. E.g. I can't provide one macro to do most of the work here and have to use 2 very similar macros. A C #define would be lovely instead of this copy/paste malarkey.

Code: Select all

	MACRO XYTOSCRADDRHL _x_, _y_
	ld hl, SCRBASE + (((_y_&#7)|((_y_&#C0)>>3))<<8)|((_x_&#1F)|((_y_&#38)<<2))
	ENDM

	MACRO DWXYTOSCRADDR _x_, _y_
	dw SCRBASE + (((_y_&#7)|((_y_&#C0)>>3))<<8)|((_x_&#1F)|((_y_&#38)<<2))
	ENDM

EDIT: Those macros turn constant screen coordinates (_x_ = column [0-31] and _y_ = row [0-191]) into screen addresses (SCRBASE is #4000).

Reason for those macros is simple: if the compiler can do the maths for you at compile time, don't do it at runtime
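To show what the assembler ends up emitting, here is a small usage sketch. The macro is repeated so the snippet stands alone, and the three coordinate pairs are arbitrary picks with the addresses worked out by hand from the expression:

```asm
SCRBASE equ #4000

	MACRO XYTOSCRADDRHL _x_, _y_
	ld hl, SCRBASE + (((_y_&#7)|((_y_&#C0)>>3))<<8)|((_x_&#1F)|((_y_&#38)<<2))
	ENDM

	XYTOSCRADDRHL 0, 0      ; top left:      ld hl, #4000
	XYTOSCRADDRHL 16, 96    ; column 16, row 96:  ld hl, #4890
	XYTOSCRADDRHL 31, 191   ; bottom right:  ld hl, #57FF
```

For row 96, (96&#C0)>>3 gives 8 for the high byte and (96&#38)<<2 gives #80 for the low byte, which with column 16 makes #4890. None of that arithmetic costs anything at runtime.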

Ast A. Moore · Post by **Ast A. Moore** » Thu Jan 04, 2024 7:48 pm

ParadigmShifter wrote: ↑Thu Jan 04, 2024 7:05 pm Dunno what you mean by that even though I use sjasm lol, can you give an example?

Code: Select all


		ld (new_score+1),a	;write the value of A to memory location 
					;marked by label new_score plus one byte
	
		...
		...
		...

new_score	ld a,0			;the code above will modify the operand
					;of the LD A,n instruction here

ParadigmShifter · Post by **ParadigmShifter** » Thu Jan 04, 2024 7:53 pm

Ok. That's just a basic example of where you read a variable in a single place you can store it directly in the code instead of elsewhere. EDIT: I'm pretty sure all assemblers support that too?

It's only really useful if you only read the value in a single place though, it gets worse the more times you access it in the code.

My blitter example of modifying the logical operation gets worse the more places you need to do the logical operation you want, so it gets worse the more you unroll loops.

Morkin · Post by **Morkin** » Thu Jan 04, 2024 8:11 pm

Some time ago while trying a few hacks, I saw some self-modifying code in Penetrator.

There's a routine at 40535/9E57 that sets the A register to either 40 or 24. This is then used to change the instruction at 40594 to JR (A = 24) or JR Z (A = 40).

The game is quite an early one (1982) - though IMO Veronika Megler and Philip Mitchell were way ahead of their time in programming ability back then.
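The JR / JR Z switch described above could be sketched like this. It is only an illustration of the idea with invented labels, not the actual Penetrator code:

```asm
; Hypothetical reconstruction of the idea, not taken from the game.
always_branch:
        ld a, #18               ; 24 = opcode for jr e
        jr patch_branch
branch_if_zero:
        ld a, #28               ; 40 = opcode for jr z,e
patch_branch:
        ld (branch), a          ; replace only the opcode byte, the offset stays
        ret

test_and_branch:
        or a                    ; set Z from A
branch:
        jr z, target            ; unconditional jr after always_branch
        inc b                   ; runs only when the branch is not taken
target:
        ret
```

Both instructions are two bytes with the displacement in the second byte, which is why swapping the first byte alone is enough.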

Seven.FFF · Post by **Seven.FFF** » Thu Jan 04, 2024 8:12 pm

ParadigmShifter wrote: ↑Thu Jan 04, 2024 7:05 pm Dunno what you mean by that even though I use sjasm lol, can you give an example?

Like this. Test label is defined as the address of the 123 operand byte, not the ld a opcode.

Code: Select all

Test+*:                 ld a, 123

I don't think it's supported in sjasm, only in sjasmplus. It's in the docs.

In zeus it would be this, and you can get fancier with marking up multiple operands and regular labels all on the same lines:

Code: Select all

TestOp:                 ld a, [Test]123
Test2Op:                ld (ix+[Test2a]20), [Test2b]123
Data:                   db 123, [DataA]456, 789, [DataB]999

ParadigmShifter wrote: ↑Thu Jan 04, 2024 7:53 pm I'm pretty sure all assemblers support that too?

It's only really useful if you only read the value in a single place though, it gets worse the more times you access it in the code.

Yes, with manual label calculation anything would support it. I'm talking about normal code style where you're trying to make code readable to signal intent, so anything making that intent clearer using first-class assembler features is great in my book.

It's just as useful if you read variables in multiple places. It wouldn't be any different from reading any labelled or non-labelled address from multiple places. You can still pick the location of the operand storage so that it gets max speed benefits from, say, being referenced many times inside a loop.

Dr beep · Post by **Dr beep** » Thu Jan 04, 2024 8:40 pm

I have been using
JR Label
and
LD A,label mod 256

to skip a JR

ParadigmShifter · Post by **ParadigmShifter** » Thu Jan 04, 2024 8:46 pm

Data alignment is still my favourite optimisation.

If you align data so you know it will never cross a page (256 byte) boundary you can replace inc hl with inc l and so on. (2 T-states saved: inc hl takes 6, inc l takes 4!)

If you align code to a 256 byte boundary you can use just one byte of data to jump to a page aligned routine so a function pointer for half price. (1 byte saved per usage!)
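A quick sketch of the data case, assuming an sjasmplus-style ALIGN directive. The names `buffer` and `clear_buffer` are invented for the example:

```asm
        ALIGN 256               ; buffer starts on a page boundary
buffer: ds 64                   ; 64 bytes, so it cannot cross into the next page

; Zero the whole buffer.
clear_buffer:
        ld hl, buffer
        ld b, 64
        xor a
.loop:  ld (hl), a
        inc l                   ; 4 T-states instead of 6 for inc hl; H never changes
        djnz .loop
        ret
```

This only works because the assembler guarantees the alignment; move the buffer without the ALIGN and the `inc l` will wrap silently at the page edge.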

Dr beep · Post by **Dr beep** » Thu Jan 04, 2024 8:59 pm

ParadigmShifter wrote: ↑Thu Jan 04, 2024 8:46 pm Data alignment is still my favourite optimisation.

If you align data so you know it will never cross a page (256 byte) boundary you can replace inc hl with inc l and so on. (2 T-states saved: inc hl takes 6, inc l takes 4!)

If you align code to a 256 byte boundary you can use just one byte of data to jump to a page aligned routine so a function pointer for half price. (1 byte saved per usage!)

That is not self-modifying. I use this a lot to check controls.
An example out of the game I am currently working on...

Code: Select all

keytab  db 	%11111101,1,down mod 256 	; A
	db 	%11111011,1,up mod 256   	; Q
	db 	%11011111,1,right mod 256	; P
	db 	%11011111,2,left mod 256	; O
	db 	%11111110,2,fire mod 256	; Z
	db 	0

jphl	ld 	l,(hl)
	jp 	(hl)

left    dec 	c
	jp 	p,l1
	ld 	c,#3e+3
l1	ld 	a,play0 mod 256
	jr 	setdir

down    ld 	a,b
	rrca
	jr 	c,d1		; odd value always allows down

dwnbot	ld 	hl,0
	call 	tdown
	inc 	hl
	call 	z,tdown
	jp 	nz,dwnret	
d1	inc 	b
	jr 	mtest

up	call 	whatbg		; get current background
	call 	uptest		; check ladders
	jr 	nz,mtest	; no ladders, no up

doup	dec 	b		; go up 1 position
	call 	whatbg		; get higher background
	call 	uptest		; now check open path 
	jr 	z,mtest		; up allowed

undoup	inc 	b		; undo up
fire	jr 	mtest

right   inc 	c
	ld 	a,c
	sub 	#42
	jr 	nz,setdir-2
	ld 	c,a
	ld 	a,play4 mod 256
; add computerplayer 
setdir  ld 	(dir+1),a

and further in the game


	ld 	hl,keytab
keylp	ld 	a,(hl)
	inc 	hl
	in 	a,(254)
	cpl
	and 	(hl)
	inc 	hl
	push	hl
	jp 	nz,jphl		; from keypressed to execute routine
dwnret	pop	hl
mvret	inc 	hl
	ld 	a,(hl)
	or 	a
	jr 	nz,keylp	; all keys tested

ParadigmShifter · Post by **ParadigmShifter** » Thu Jan 04, 2024 9:10 pm

Yeah I know it's not self-modifying but since we're talking about optimisation it deserves to be mentioned.

Data alignment is easy to do...aligning code opens up a whole new area of memory saving. If you only ever jump to a function that has low address 0 you can save that byte in your data everywhere you need it (and only reading 1 byte of data is a win too). Just store the high byte, low byte is always 0!

ketmar · Post by **ketmar** » Thu Jan 04, 2024 9:34 pm

bluespikey wrote: ↑Thu Jan 04, 2024 12:15 pm I read somewhere that Joffa's code was so transcendent that it would modify itself. If the game is in a state where there is no reason that a logical validation would ever return true, then comment the check out rather than let it run, etc. Now that's really hard stuff to stay on top of and I wouldn't know how to do it in a modern language. But was it a common tool back in the days of Z80 machine code, or just used by mad geniuses like Joffa? Is there much to be gained from it to make up for the extra complexity of code?

actually, what Joffa did is called a "JIT compiler" nowadays. ;-) let's take Firefly as an example: there is a compiler, which builds screen blitting code based on the visible map region. then that code is called to draw the map: no more checks, no extra actions, just raw data copying speed. it is faster to build ("jit-compile") blitting code each frame instead of trying to use a slower universal blitter.

most other times SMC is used to store variables directly in the code (in an "LD" instruction, for example), or to patch some jumps. (that's what my line drawing routine is using, for example).

sorry for being late to the show, and replying before reading all other replies. i guess this all was already written here several times. ;-)

Dr beep · Post by **Dr beep** » Thu Jan 04, 2024 9:43 pm

ParadigmShifter wrote: ↑Thu Jan 04, 2024 9:10 pm Yeah I know it's not self-modifying but since we're talking about optimisation it deserves to be mentioned.

Data alignment is easy to do...aligning code opens up a whole new area of memory saving. If you only ever jump to a function that has low address 0 you can save that byte in your data everywhere you need it (and only reading 1 byte of data is a win too). Just store the high byte, low byte is always 0!

Main routine of my ZX81 emulator on the ZX Spectrum (there is no faster decision table for reaching 256 routines)

Code: Select all

	ld	a,(de)	; DE is PC, get opcode 
	ld	l,a		; low byte of address now set
	ld	h,b		; B is constant, highbyte of 256 bytes table
	ld	h,(hl)	; get highbyte of routine
	jp	(hl)		; emulate opcode in routine
	
tab	db	op00 / 256		; op00 = nn00
	db	op01 / 256		; op01 = nn01
	...
	db	opff / 256 		; opff = nnff

catmeows · Post by **catmeows** » Sun Jan 21, 2024 10:47 am

The practical examples I see quite often:

1) store SP value directly as argument of LD SP, nnnn

2) update a byte/word value and inject new value back to computation code - RNGs use it quite often

3) compute jump argument for JR - very often it works as replacement for SWITCH CASE statement

4) various precomputations of value that is defacto constant inside loop

Less common practices seen mostly in decryptors etc.:

5) XOR decryptors overwriting themselves

Code: Select all

  loop:
  xor (hl)
  ld (hl), a
  dec hl
  jr loop    ;loop is terminated by overwriting JR parameter

6) LDIR/LDDR decryptors overwriting themselves

7) copying from circular buffer to screen where series of LDI is modified by JR

8) complete code generators, usually constructing PUSH fillers

Joefish · Post by **Joefish** » Sun Jan 21, 2024 11:35 pm

Yep, that last one is how the scenery blocks in Cobra are rendered. The pixel data for the scrollable tiles of scenery for one line of the screen are loaded into the CPU registers in pairs, then the stack pointer pointed at screen memory. So then by doing PUSH BC, PUSH DE, etc. a row of scenery tiles can be drawn. On the next pixel line of the screen the next row of pixels for each tile are put in the registers, then the same pattern of PUSH instructions are called. That particular pattern of PUSH instructions is laid down just ahead of things by self-modifying code, that looks at the level map and writes out the corresponding PUSH instructions in order. I seem to recall it can also insert LD instructions in amongst the PUSH instructions to redefine some of the blocks mid-line.
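As a rough idea of what such a generator involves, here is a sketch. It is not the Cobra code; every label and the map format (one byte per tile, value 0, 1 or 2 picking BC, DE or HL) are assumptions made for the example:

```asm
TILES   equ 16                  ; hypothetical: 16 tiles of 16 pixels per line

; Write one line's worth of PUSH instructions into rowcode.
build_row_code:
        ld hl, rowmap           ; source: tile numbers for this line
        ld de, rowcode          ; destination: buffer that will be executed
        ld b, TILES
.next:  ld a, (hl)              ; tile number 0..2
        add a, a
        add a, a
        add a, a
        add a, a                ; tile * 16
        add a, #C5              ; #C5 push bc, #D5 push de, #E5 push hl
        ld (de), a              ; emit the opcode
        inc hl
        inc de
        djnz .next
        ex de, hl
        ld (hl), #DD            ; finish with jp (ix) = #DD #E9, because
        inc hl                  ; ret is unusable while SP points at the screen
        ld (hl), #E9
        ret

rowmap:  ds TILES               ; filled in from the level map elsewhere
rowcode: ds TILES + 2           ; generated code lives here
```

A caller would disable interrupts, load BC, DE and HL with the pixel pairs, point SP just past the right-hand end of the screen line, put the return address in IX and jump to `rowcode`. PUSH writes downwards in memory, so the first opcode emitted draws the rightmost tile.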

Dr beep · Post by **Dr beep** » Mon Jan 22, 2024 6:11 am

catmeows wrote: ↑Sun Jan 21, 2024 10:47 am The practical examples I see quite often:

Less common practices seen mostly in decryptors etc.:

5) XOR decryptors overwriting themselves

Testing whether a genuinely different key has been pressed:

Code: Select all

keyrd:
     ld a,(23560)
keyt cp 0
     jr z,keyrd
     ld (keyt+1),a ; save latest key pressed

Morkin · Post by **Morkin** » Mon Jan 22, 2024 8:47 am

catmeows wrote: ↑Sun Jan 21, 2024 10:47 am 5) XOR decryptors overwriting themselves
Code: Select all
  loop:
  xor (hl)
  ld (hl), a
  dec hl
  jr loop    ;loop is terminated by overwriting JR parameter

Looks interesting, can you explain that one in a bit more detail? I can't quite get my head around it...

jpnz · Post by **jpnz** » Mon Jan 22, 2024 9:45 am

Self modifying code is great

It can be used to make "things" relocatable - i.e. load some code into memory exactly (within reason) where you want it to run and execution will "fixup" itself

A good example is HiSoft Devpac GENS & MONS

A 1981 relocation article from a TRS-80 chap is here (page 47), but is relevant and a good read

uglifruit · Post by **uglifruit** » Mon Jan 22, 2024 10:08 am

NOPs into HALTs

Can be an easy self modify for a slow down/speed up, (waiting for interrupt or not)
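A sketch of that toggle, with invented labels (`slow_mode`, `fast_mode`, `frame_wait`, `update_game`). It assumes interrupts are enabled, otherwise the HALT never wakes up:

```asm
slow_mode:
        ld a, #76               ; opcode for halt
        jr set_wait
fast_mode:
        xor a                   ; #00 = opcode for nop
set_wait:
        ld (frame_wait), a      ; patch the single byte in the main loop
        ret

main_loop:
frame_wait:
        halt                    ; wait for the next interrupt, or nop in fast mode
        call update_game
        jr main_loop

update_game:                    ; placeholder for the real per-frame work
        ret
```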

catmeows · Post by **catmeows** » Mon Jan 22, 2024 4:54 pm

Morkin wrote: ↑Mon Jan 22, 2024 8:47 am Looks interesting, can you explain that one in a bit more detail? I can't quite get my head around it...

As HL decreases, the loop eventually rewrites the second byte of the JR instruction itself. Since a series of XORs over the chunk is deterministic, you can choose the start value in the A register so that the result of the XOR changes the parameter of JR to jump outside the loop.

Code: Select all

encryptStart
  ld hl, encrypted-1
  ld b, encryptedCount
  ld a, $FB                       ;initial value for encrypting parameter of JR instruction
encryptLoop
  xor (hl)
  ld (hl), a
  inc hl
  djnz encryptLoop
  ld (decryptSeed+1), a
  ret  

decrypt:
  ld hl, lastEncryptedByte
decryptSeed  
  ld a, 0
decryptLoop
  xor (hl)
  ld (hl), a
  dec hl
  jr encrypted ; $18 $00 after encryption it will be "JR decryptLoop"  $18  $FB
encrypted  
  ;encrypted code follows here

This is the case where we want to jump right after the decrypt loop, but it is perfectly possible to choose the initial encrypt value so that the decrypt loop jumps somewhere else.
