Euclidean dist/Sqrt/Multiplication routines 16bit

ParadigmShifter · Post by **ParadigmShifter** » Wed Apr 03, 2024 10:45 pm

sn3j wrote: ↑Wed Apr 03, 2024 10:41 pm There's something similar used on Z80 heaven.
The algorithms look quite similar to the division routines, the fastest afaik was some unrolled loop with 330T for a 16 bit integer root with remainder(?).
Problem is that anything that requires multiplication or division isn't going to work on a Z80.

You nearly always get the remainder as a side effect if you implement a division algorithm using long (binary) division (you don't with a lookup though of course).

Which is why many modern PC C/C++ compilers have an intrinsic divmod instruction which returns both. (Also sincos which gets sine and cosine in 1 call too). Optimisers these days are clever and if you use a division and also want the remainder they know about it and will use the single instruction anyway.

sn3j · Post by **sn3j** » Wed Apr 03, 2024 10:49 pm

ParadigmShifter wrote: ↑Wed Apr 03, 2024 10:41 pm Is that total cycles per address? Cos that would be good. I use Spin atm mainly because I am used to it (but there's no documentation any more I can find for it).

Yes. Be sure to DI before measuring, though.

ParadigmShifter wrote: ↑Wed Apr 03, 2024 10:41 pm Working out the exact tile-based metric distance might be faster though as per my last few posts in the A* thread... no multiplies just min and max and a scale by 1/sqrt(2) for the diagonal direction (which you can do using a lookup of floor 1/sqrt(2) anyway, then use the value with more diagonal steps as a tie breaker if 2 distances are equal).

I'm intending to keep both metrics and play a bit around with them.

sn3j · Post by **sn3j** » Wed Apr 03, 2024 10:54 pm

MustardTiger wrote: ↑Wed Apr 03, 2024 10:40 pm A hacky way of getting this distance between two 2d points is this
Code: Select all
determine largest and smallest values from x,y
length = largest + (smallest>>1) gives a worst case 12% error
if smallest >= (largest>>3) then length = largest + (smallest>>1) - (largest>>3) gives worst case ~6% error but mostly <2%

Yes, we were discussing this exact metric in some other thread.
It's Taxicab plus Chebyshev, as proven by ParadigmShifter. Ok, half of it.

ParadigmShifter · Post by **ParadigmShifter** » Wed Apr 03, 2024 10:59 pm

This is the other thread

viewtopic.php?t=11119

I impressed myself near the end with working out how to calculate the tile metric distance exactly, was the most interesting maths I've done for a while (although the Hull-Dobell theorem stuff was ok as well, I didn't have to prove anything for that though, just work through the algorithm). I included diagrams as well! I suppose it helped that Metric Spaces was the subject at uni I did the best in

EDIT: Although that was over 30 years ago now

This is the Hull-Dobell thread but GLFSR is better for ASM really. I was just surprised I hadn't heard about that theorem before.

viewtopic.php?t=11240

ParadigmShifter · Post by **ParadigmShifter** » Wed Apr 03, 2024 11:37 pm

MustardTiger wrote: ↑Wed Apr 03, 2024 10:40 pm A hacky way of getting this distance between two 2d points is this
Code: Select all
determine largest and smallest values from x,y
length = largest + (smallest>>1) gives a worst case 12% error
if smallest >= (largest>>3) then length = largest + (smallest>>1) - (largest>>3) gives worst case ~6% error but mostly <2%

That may be some kind of Newton Raphson approximation I guess? I've not seen that before and some info about it would be good, to see where the worst case approximations occur. EDIT: Probably a Taylor Series approx instead? Probably needs an else though maybe largest + (smallest>>1) - (smallest>>3) or something? That's just a guess though, beer o'clock is here.

Obviously that code can be optimised a lot (obvs an optimising compiler would do that for you), you already worked out largest + (smallest>>1) and you work out (largest>>3) before doing the if.

largest + (smallest>>1) = (Chebyshev + Taxicab)/2 (although it loses the low bit) as I proved (proof wasn't hard). (Ok, I proved largest * 2 + smallest is Chebyshev + Taxicab but you get the idea).

Chebyshev + Taxicab is where you have a cost of moving orthogonally of 2 and diagonally of 3 (like lots of games use e.g. Lords of Midnight).

ParadigmShifter · Post by **ParadigmShifter** » Thu Apr 04, 2024 5:00 pm

Found this which has more terms of the approximation.

https://www.flipcode.com/archives/Fast_ ... ions.shtml

Code: Select all

u32 approx_distance( s32 dx, s32 dy )
{
   u32 min, max;

   if ( dx < 0 ) dx = -dx;
   if ( dy < 0 ) dy = -dy;

   if ( dx < dy )
   {
      min = dx;
      max = dy;
   } else {
      min = dy;
      max = dx;
   }

   // coefficients equivalent to ( 123/128 * max ) and ( 51/128 * min )
   return ((( max << 8 ) + ( max << 3 ) - ( max << 4 ) - ( max << 1 ) +
            ( min << 7 ) - ( min << 5 ) + ( min << 3 ) - ( min << 1 )) >> 8 );
}

Can probably truncate some of the terms as well. No need for << 8 or >> 8 of course if you use a register pair obvs. There's a lot of shifts involved though. Probably faster than 2 multiplies and a sqrt though...

Art · Post by **Art** » Thu Apr 04, 2024 10:10 pm

ParadigmShifter wrote: ↑Thu Apr 04, 2024 5:00 pm Found this which has more terms of the approximation.

https://www.flipcode.com/archives/Fast_ ... ions.shtml
Code: Select all
u32 approx_distance( s32 dx, s32 dy )
{
   u32 min, max;

   if ( dx < 0 ) dx = -dx;
   if ( dy < 0 ) dy = -dy;

   if ( dx < dy )
   {
      min = dx;
      max = dy;
   } else {
      min = dy;
      max = dx;
   }

   // coefficients equivalent to ( 123/128 * max ) and ( 51/128 * min )
   return ((( max << 8 ) + ( max << 3 ) - ( max << 4 ) - ( max << 1 ) +
            ( min << 7 ) - ( min << 5 ) + ( min << 3 ) - ( min << 1 )) >> 8 );
} 
Can probably truncate some of the terms as well. No need for << 8 or >> 8 of course if you use a register pair obvs. There's a lot of shifts involved though. Probably faster than 2 multiplies and a sqrt though...

It looks interesting, but I don't understand it very well. How would it look in the Spectrum Basic? Or in Z80 assembly with 16-bit numbers dx and dy?

ParadigmShifter · Post by **ParadigmShifter** » Thu Apr 04, 2024 10:17 pm

<< is shift left. (Also known as multiplying by 2 for every shift). So max << 8 is max*256 which is same as loading the high byte of a 16 bit register.
>> is shift right. Also known as dividing by 2 for each shift done.

It's all integer maths here so in basic you have to use INT to truncate divisions to make them shift rights.

if and else should be obvious

u32 approx_distance( s32 dx, s32 dy )

is the function signature. It returns a u32 (unsigned 32 bit integer) and has 2 s32 arguments (signed 32 bit integers).

Obviously for Z80 and OP they would use signed 16 bit integers and return an unsigned 16 bit integer.

You really should learn a bit of C it will help a great deal if you want to understand 90% (conservative estimate) of code examples on the internet. It's the lingua franca of programmers really...

EDIT: So this

return ((( max << 8 ) + ( max << 3 ) - ( max << 4 ) - ( max << 1 ) +
( min << 7 ) - ( min << 5 ) + ( min << 3 ) - ( min << 1 )) >> 8 );

is doing (I assume you understand how min and max are worked out)

INT(((max*256 + max*8 - max*16 - max*2) + (min*128 - min*32 + min*8 - min*2))/256).

See website I linked to for an explanation of what it is doing (approximating one octant of a circle by using straight lines approximation), and max and min basically works out which octant of the circle you are in.

Art · Post by **Art** » Thu Apr 04, 2024 10:39 pm

Thank you, I understand. Yes, I didn't know what << meant. Several years ago, I made a 3D vector game in Z80 assembly with coloured non-transparent city buildings, so I needed to sort them according to their distances from the player "camera". It works well on the Spectrum, but I always look for new tricks to make the game faster. But I'm not a programmer, so sometimes I don't know some programmer's symbols.

MustardTiger · Post by **MustardTiger** » Fri Apr 05, 2024 8:14 am

ParadigmShifter wrote: ↑Wed Apr 03, 2024 11:37 pm That may be some kind of Newton Raphson approximation I guess? I've not seen that before and some info about it would be good, to see where the worst case approximations occur. EDIT: Probably a Taylor Series approx instead? Probably needs an else though maybe largest + (smallest>>1) - (smallest>>3) or something? That's just a guess though, beer o'clock is here.

I heard the half smallest thing from some old 8bit guys back in the day. The smallest>>3 was me trying to improve it. I've got some ZX basic for testing it and showing where the errors occur, I'll share it later when I get time.

DoctorRad · Post by **DoctorRad** » Fri Apr 05, 2024 9:59 am

sn3j wrote: ↑Sun Mar 31, 2024 11:26 am For example I'd need it for my A* algorithm as an admissive heuristic function. I did use the tile-distance (Chebyshev+Taxicab) as metric but the results are not good enough. Euclidean dist would be just the right metric.

Could you calculate the square root less often or not at all if you used a different admissible heuristic function? You could presumably use something which is faster to calculate but which meets the criterion of never overestimating the cost of reaching the goal?

sn3j · Post by **sn3j** » Fri Apr 05, 2024 9:02 pm

DoctorRad wrote: ↑Fri Apr 05, 2024 9:59 am Could you calculate the square root less often or not at all if you used a different admissible heuristic function? You could presumably use something which is faster to calculate but which meets the criterion of never overestimating the cost of reaching the goal?

If the estimation is too far off you might end up visiting too many nodes. Let's assume we take |dx| + |dy|, which is easy to compute and mostly > sqrt(x² + y²) and sometimes equal. Then, while looking through neighboring nodes, you might find more candidates than really needed, because your initial estimation is so far off from the true distance. Imagine a lattice of nodes, your start is top-left and your goal is bottom-right. With this metric (|x| + |y|) you'd end up visiting the whole lattice. (If I'm not mistaken.)

We've been talking about the metric n + m * sqrt(2), which measures the cost of an 8-way movement within an isometric grid.
This is quite interesting because in games you mostly move like that.
It doesn't matter if we're talking about a character cell grid or pixel grid. No matter how you're going to move your sprite, you'll find that the cost is well reflected by that metric.
It's an interesting topic and deserves some research. I've not delved very much into it, though. There's some lecture material I scraped from a stack overflow thread's comment, but lost the URL. I'll share it here if I manage to get hold of it.

DoctorRad · Post by **DoctorRad** » Fri Apr 12, 2024 11:14 pm

You need a function which is always < sqrt(x² + y²) though, such as 2(|dx| + |dy|)/3 (I think). That should never overestimate the cost.

ParadigmShifter · Post by **ParadigmShifter** » Fri Apr 12, 2024 11:56 pm

I think that means never overestimate the cost with the metric you are using? You don't have to use Euclidean distance.

I believe it just means you have to respect the triangle inequality (which all metrics do), i.e. if there are 3 points A, B and C then

d(A, C) <= d(A, B) + d(B, C)

i.e. it's never faster to go from A to C via an intermediate node B (but it may be the same speed).

All metrics (of which everything discussed so far are) satisfy the triangle inequality. (It's one of the axioms of a metric space).

https://en.wikipedia.org/wiki/Triangle_ ... tric_space

Your metric (2 * (|dx| + |dy|) / 3) is just a scaling of the Chebyshev metric and scaling a metric by a constant is still a metric.

Spectrum Computing

Euclidean dist/Sqrt/Multiplication routines 16bit

Re: Euclidean dist/Sqrt/Multiplication routines 16bit

Re: Euclidean dist/Sqrt/Multiplication routines 16bit

Re: Euclidean dist/Sqrt/Multiplication routines 16bit

Re: Euclidean dist/Sqrt/Multiplication routines 16bit

Re: Euclidean dist/Sqrt/Multiplication routines 16bit

Re: Euclidean dist/Sqrt/Multiplication routines 16bit

Re: Euclidean dist/Sqrt/Multiplication routines 16bit

Re: Euclidean dist/Sqrt/Multiplication routines 16bit

Re: Euclidean dist/Sqrt/Multiplication routines 16bit

Re: Euclidean dist/Sqrt/Multiplication routines 16bit

Re: Euclidean dist/Sqrt/Multiplication routines 16bit

Re: Euclidean dist/Sqrt/Multiplication routines 16bit

Re: Euclidean dist/Sqrt/Multiplication routines 16bit

Re: Euclidean dist/Sqrt/Multiplication routines 16bit