Multiplication routine in Z80 assembler
Posted: Sun Mar 10, 2024 3:44 pm
Here's a quick multiply routine. It uses a 256-byte lookup table.
I didn't find anything like this in the database, so I'm sharing it here.
The routine takes two unsigned numbers in the range 0..255 in b and c and returns the 16-bit product in hl.
Adding up the per-instruction T-state counts in the listing gives 208 T states, or 218 including the ret.
The algorithm splits the two numbers into four nibbles (half-bytes) and looks up the four partial products for the nibble pairs (b.h,c.h), (b.h,c.l), (c.h,b.l) and (b.l,c.l). The table entry at index 16*x + y holds x*y.
The product is the sum of the shifted partial products: b*c = 256*b.h*c.h + 16*(b.h*c.l + c.h*b.l) + b.l*c.l, where .h and .l denote the high and low nibble.
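To check the decomposition off-machine, here is a minimal Python sketch of the same arithmetic. It is not the Z80 code: MUL_TAB and mulu88_model are hypothetical names, and the table is built to the layout the routine expects (index 16*x + y holds x*y).

```python
# Stand-in for the 256-byte table: entry 16*x + y holds x*y for x, y in 0..15.
MUL_TAB = [(i >> 4) * (i & 15) for i in range(256)]


def mulu88_model(b: int, c: int) -> int:
    """Return b*c for 0 <= b, c <= 255 using four nibble-table lookups."""
    bh, bl = b >> 4, b & 15
    ch, cl = c >> 4, c & 15
    high = MUL_TAB[16 * bh + ch]                            # b.h*c.h
    middle = MUL_TAB[16 * bh + cl] + MUL_TAB[16 * ch + bl]  # b.h*c.l + c.h*b.l
    low = MUL_TAB[16 * bl + cl]                             # b.l*c.l
    return 256 * high + 16 * middle + low


if __name__ == "__main__":
    print(mulu88_model(200, 150))  # prints 30000
```

Only 16x16 products are ever looked up, which is why a single 256-byte table is enough for a full 8x8 multiply.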
The attached .tap file contains a BASIC program with the code, the lookup table, and a test.
https://drive.google.com/file/d/1xYcPam ... sp=sharing
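If you assemble the routine yourself instead of loading the .tap file, you need the table as data. A small Python sketch that prints it as assembler source, assuming an assembler that accepts a MulTab: label and defb; putting the table on a 256-byte boundary is left to your assembler's alignment directive. The helper name multab_source is hypothetical.

```python
def multab_source(label: str = "MulTab") -> list[str]:
    """Return assembler lines defining the 256-byte x*y table (hypothetical helper)."""
    lines = [f"{label}:"]  # this label must sit on a 256-byte boundary
    for x in range(16):
        # Row x holds x*0 .. x*15, i.e. table entries 16*x + 0 .. 16*x + 15.
        lines.append("    defb " + ",".join(str(x * y) for y in range(16)))
    return lines


if __name__ == "__main__":
    print("\n".join(multab_source()))
```

Each defb line is one row x of the table, so the byte at MulTab + 16*x + y is x*y.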
Code:
Mulu88:               ; multiply, hl := b*c (changes: af, c, de, hl; b is preserved)
                      ; T-states per instruction are given in the comments
    ld a,b            ;  4
    rlca              ;  4  rlca x4 swaps the nibbles of a
    rlca              ;  4
    rlca              ;  4
    rlca              ;  4
    xor c             ;  4
    ld d,a            ;  4
    and 240           ;  7
    xor c             ;  4  a = (b.l,c.l)
    ld h,MulTab/256   ;  7  high byte of the 256-aligned table for xy -> x*y where x,y in 0..15
    ld l,a            ;  4
    ld e,(hl)         ;  7  e = b.l*c.l
    ld a,d            ;  4
    and 15            ;  7
    xor c             ;  4  a = (c.h,b.h)
    ld l,a            ;  4
    ld d,(hl)         ;  7  de = 256*b.h*c.h + b.l*c.l
    ld a,b            ;  4
    xor c             ;  4
    and 15            ;  7
    xor c             ;  4  a = (c.h,b.l)
    ld l,a            ;  4
    ld a,b            ;  4
    xor c             ;  4
    and 240           ;  7
    xor c             ;  4  a = (b.h,c.l)
    ld c,(hl)         ;  7  c = c.h*b.l
    ld l,a            ;  4
    ld a,(hl)         ;  7  a = b.h*c.l
    add a,c           ;  4  (carry,a) = b.h*c.l + c.h*b.l
    rla               ;  4  rla x4 rotates the 9-bit sum left by 4 through carry
    rla               ;  4
    rla               ;  4
    rla               ;  4
    ld l,a            ;  4
    rla               ;  4
    and 31            ;  7
    ld h,a            ;  4  h = (b.h*c.l + c.h*b.l) >> 4
    ld a,l            ;  4
    and 240           ;  7
    ld l,a            ;  4  hl = 16 * (b.h*c.l + c.h*b.l)
    add hl,de         ; 11  hl = 256*b.h*c.h + 16*(b.h*c.l+c.h*b.l) + b.l*c.l
    ret               ; 10
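The listing relies on two bit tricks. First, ((x xor c) and mask) xor c takes the bits of x where mask is 1 and the bits of c elsewhere, which is how each nibble pair is assembled into a table index. Second, rla x4 rotates the 9-bit sum (carry, a) so that its upper five bits can be split off into h (and 31) and its low four bits, shifted up, into l (and 240). A register-level Python model can check this for all 65536 inputs. It is a sketch that mirrors the listing's instruction order; MUL_TAB, swap_nibbles, rla and mulu88_regs are hypothetical stand-ins, not part of the original code.

```python
MUL_TAB = [(i >> 4) * (i & 15) for i in range(256)]  # stand-in for MulTab


def swap_nibbles(a: int) -> int:
    """Effect of four rlca on an 8-bit value: rotate left by 4."""
    return ((a << 4) | (a >> 4)) & 0xFF


def rla(a: int, carry: int) -> tuple[int, int]:
    """One rla: 9-bit rotate left through carry; returns (new a, new carry)."""
    return ((a << 1) | carry) & 0xFF, a >> 7


def mulu88_regs(b: int, c: int) -> int:
    """Mirror Mulu88 step by step; b, c are 8-bit inputs, returns hl."""
    a = swap_nibbles(b) ^ c        # ld a,b / rlca x4 / xor c
    d = a                          # ld d,a
    a = (a & 0xF0) ^ c             # and 240 / xor c -> a = (b.l,c.l)
    e = MUL_TAB[a]                 # ld l,a / ld e,(hl): e = b.l*c.l
    a = (d & 0x0F) ^ c             # ld a,d / and 15 / xor c -> a = (c.h,b.h)
    d = MUL_TAB[a]                 # ld l,a / ld d,(hl): d = b.h*c.h
    l = ((b ^ c) & 0x0F) ^ c       # ld a,b / xor c / and 15 / xor c / ld l,a -> (c.h,b.l)
    a = ((b ^ c) & 0xF0) ^ c       # ld a,b / xor c / and 240 / xor c -> a = (b.h,c.l)
    c = MUL_TAB[l]                 # ld c,(hl): c = c.h*b.l
    a = MUL_TAB[a]                 # ld l,a / ld a,(hl): a = b.h*c.l
    s = a + c                      # add a,c: at most 450, a 9-bit sum
    a, carry = s & 0xFF, s >> 8    # (carry, a) holds the sum
    for _ in range(4):             # rla x4: sum rotated left by 4 within 9 bits
        a, carry = rla(a, carry)
    l = a                          # ld l,a: low nibble of the sum is now in bits 7..4
    a, carry = rla(a, carry)       # rla: bits 8..4 of the sum are now in bits 4..0
    h = a & 31                     # and 31 / ld h,a
    l &= 0xF0                      # ld a,l / and 240 / ld l,a
    hl = (h << 8) | l              # hl = 16 * (b.h*c.l + c.h*b.l)
    return (hl + ((d << 8) | e)) & 0xFFFF  # add hl,de
```

Because the largest product is 255*255 = 65025, the final add hl,de never overflows 16 bits.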