unsigned to ascii

The place for codemasters or beginners to talk about programming any language for the Spectrum.
Alcoholics Anonymous
Berk
Posts: 4
Joined: Mon Oct 08, 2018 2:36 am

Re: unsigned to ascii

Post by Alcoholics Anonymous » Mon Oct 08, 2018 2:48 am

dfzx wrote:
Thu Oct 04, 2018 7:07 pm
Z88DK has a utoa() function in the library ("unsigned to ASCII") which is hand coded assembler, but it takes a radix and uses a subroutine to do the division so it can handle big numbers. It's all a bit heavyweight for what I need.
The utoa function actually has optional special cases for bases 2, 8, 10 and 16 which can be enabled in the library configuration. The base 10 special case is enabled by default so you should get a quick base 10 conversion. But it does have a few extras as you say like a paremeter that takes radix and checks that the radix is legal.

You can call the lower level asm library code directly though.

The small implementation of decimal utoa is l_small_utoa and can be called from asm like this:

Code: Select all

EXTERN l_small_utoa
...
call l_small_utoa
You can write your own lightweight c interface to call from c.

The fast implementation is l_fast_utoa and can be called like this:

Code: Select all

EXTERN l_fast_utoa
...
call l_fast_utoa
The difference is it will check if the number is < 256 and do an 8-bit version instead of going through the motions of a full 16-bit conversion.

A nice feature of these functions is that you can ask for leading 0s by setting the carry flag before calling. So "1234" will actually convert to "01234", ie zero padded for 5 decimal digits.
0 x

dfzx
Dizzy
Posts: 92
Joined: Mon Nov 13, 2017 6:55 pm
Location: New Forest, UK
Contact:

Re: unsigned to ascii

Post by dfzx » Mon Oct 08, 2018 6:53 am

Ah, thanks AA, that all makes sense. I saw everything you describe in the Z88DK library sources but either couldn't figure out what it was or couldn't figure out how to use it. :)

I wrote a bunch of test cases over the weekend, timing some of the ideas offered here. I was going to post the results here later today, but now I need to give these things you suggest a try. I don't think anything is going to get near the "store it in a string and rotate the chars" approach for speed, but since that's also the least flexible approach I'll probably end up using something else.
0 x

Alcoholics Anonymous
Berk
Posts: 4
Joined: Mon Oct 08, 2018 2:36 am

Re: unsigned to ascii

Post by Alcoholics Anonymous » Mon Oct 08, 2018 7:10 am

It also appears I am wrong about utoa - the special decimal case is not enabled by default. It's the ascii to number decimal case that is enabled and not the number to ascii.

The clib options live in config_clib.m4 and you can see I've highlighted the option that controls whether the special decimal code is used. It needs to have bit 2 set to enable it. The library has to be re-built for any changes to take effect.
0 x

Metalbrain
Berk
Posts: 21
Joined: Thu Feb 15, 2018 2:14 pm

Re: unsigned to ascii

Post by Metalbrain » Tue Oct 09, 2018 10:27 am

There's a small optimization which saves 1 byte and 3 states for all solutions so far: 10/100 (and -10/-100, and -10+256/-100+256) have the same value for the high byte, so we can load only the lower byte for it. So, "ld e,-10" in my proposal, "ld e,10" in Bizzley's and "ld c,-100+256" in both fast and small z88dk routines.
0 x

dfzx
Dizzy
Posts: 92
Joined: Mon Nov 13, 2017 6:55 pm
Location: New Forest, UK
Contact:

Re: unsigned to ascii

Post by dfzx » Tue Oct 09, 2018 5:06 pm

I had a look at some of these options. My Z80 assembly language isn't much above beginner's level so I stuck with Z88DK/C, which works for me because that's what my game is written in. :)

I implemented the unsigned int to ACSII conversion several times and put each one in a loop 9999 times so I could time it using a stopwatch.

First up, here's the sprintf() version:

Code: Select all

/*
 * zcc +zx -vn -startup=0 -clib=sdcc_iy -SO3 --max-allocs-per-node200000 utoa_char_sprintf.c -o utoa_char_sprintf -create-app --list
 *
 * gcc -o utoa_char_sprintf utoa_char_sprintf.c
 */

#include <stdio.h>

unsigned char output_str[5];

int main()
{
  unsigned int i;

  for( i=9999; i>0; i-- ) {
    sprintf( output_str, "%04u", i );

#ifdef __GNUC__
    printf("%s\n", output_str);
#endif

  }

  return 0;
}
On a 48K Spectrum in Fuse this runs in 24 seconds, which is why I asked the original question! Slow... The Z88DK sprintf %u convertor uses the library's utoa() function under the covers, so replacing sprintf with utoa() results in pretty much the same code and timings.

However, as AA advised, there's a library build time switch which pulls in utoa() code optimised for the base 10 case. Using that brings the testcase run time down to 16 seconds, so that's a pretty useful optimisation. Still slow though. :p

Next up is the ASCII digit rolling technique suggested by @R-Tape and @spectron. Like this:

Code: Select all

/*
 * zcc +zx -vn -startup=0 -clib=sdcc_iy -SO3 --max-allocs-per-node200000 utoa_char_digits.c -o utoa_char_digits -create-app --list
 *
 * gcc -o utoc_char_digits utoa_char_digits.c
 */

#include <stdio.h>

unsigned char output_str[5] = "9999";

int main()
{

  while( output_str[0] != '0' || output_str[1] != '0' || output_str[2] != '0' || output_str[3] != '1' ) {
    if( --output_str[3] == 0x2F ) {
      output_str[3] = '9';
      if( --output_str[2] == 0x2F ) {
	output_str[2] = '9';
	if( --output_str[1] == 0x2F ) {
	  output_str[1] = '9';
	  if( --output_str[0] == 0x2F ) {
	    output_str[0] = '9';
	  }
	}
      }
    }

#ifdef __GNUC__
    printf("%s\n", output_str);
#endif

  }

  return 0;
}
This is faster still. Even on the 48K Spectrum this is almost too fast to time. It takes a bit over a second. The code compiles to this:

Code: Select all

 _main:
 l_main_00112:
 	ld	bc,_output_str+0
 	ld	a, (bc)
 	ld	d, a
 	ld	hl,+(_output_str + 0x0003)
 	ld	e, (hl)
 	ld	a, d
 	sub	a,0x30
 	jr	NZ,l_main_00113
 	ld	a,(_output_str + 0x0001)
 	sub	a,0x30
 	jr	NZ,l_main_00113
 	ld	a,(_output_str + 0x0002)
 	sub	a,0x30
 	jr	NZ,l_main_00113
 	ld	a, e
 	sub	a,0x31
 	jr	Z,l_main_00114
 l_main_00113:
 	dec	e
 	ld	hl, +(_output_str + 0x0003)
 	ld	(hl), e
 	ld	a, e
 	sub	a,0x2f
 	jr	NZ,l_main_00112
 	ld	(hl),0x39
 	ld	a,(_output_str + 0x0002)
 	add	a,0xff
 	ld	hl, +(_output_str + 0x0002)
   	ld	(hl), a
 	sub	a,0x2f
 	jr	NZ,l_main_00112
 	ld	(hl),0x39
 	ld	a,(_output_str + 0x0001)
 	add	a,0xff
 	ld	hl, +(_output_str + 0x0001)
 	ld	(hl), a
 	sub	a,0x2f
 	jr	NZ,l_main_00112
 	ld	(hl),0x39
 	ld	a, (bc)
 	add	a,0xff
 	ld	(bc), a
 	sub	a,0x2f
 	jr	NZ,l_main_00112
 	ld	a,0x39
 	ld	(bc), a
 	jr	l_main_00112
 l_main_00114:
 	ld	hl,0x0000
 	ret
I can follow that, just about. I don't quite get why it uses an 'add a,0xff' to do the decrement (surely 'dec a' would be the obvious choice?) but the rest of it makes sense.

The problem with this approach is that it's not very flexible. If I want to decrement the counter by, say, 10, I'd have the run it in a loop that many times. If I want to do any even moderate maths on the value, like add a bonus, then it gets a bit tricky. But it's fast though. :)

Here's the other option I played with, taking the divide-by-powers-of-10 approach advocated by @Metalbrain:

Code: Select all

/*
 * zcc +zx -vn -startup=0 -clib=sdcc_iy -SO3 --max-allocs-per-node200000 utoa_pow10_divs.c -o utoa_pow10_divs -create-app --list
 *
 * gcc -o utoa_pow10_divs utoa_pow10_divs.c
 */

#include <stdio.h>
#include <string.h>

unsigned char output_str[5];

int main()
{
  unsigned int i = 9999;

  while( i > 0 ) {
    unsigned int tmp = i;
    memcpy( output_str, "0000", 4 );

    while( tmp>=1000 ) {
      output_str[0]++;
      tmp -= 1000;
    }

    while( tmp>=100 ) {
      output_str[1]++;
      tmp -= 100;
    }

    while( tmp>=10 ) {
      output_str[2]++;
      tmp -= 10;
    }

    output_str[3]+=tmp;

#ifdef __GNUC__
    printf("%s\n", output_str);
#endif

    i--;
  }

  return 0;
}
This loop counts down in about 5 seconds, so it's still way faster than the sprintf and keeps the value in a scalar so it's easy to manipulate.

The code compiles to this:

Code: Select all

_main:
	ld	bc,0x270f
l_main_00110:
	ld	a, b
	or	a,c
	jr	Z,l_main_00112
	push	bc
	ld	de,_output_str
	ld	bc,0x0004
	ld	hl,___str_0
	ldir
	pop	bc
	ld	e, c
	ld	d, b
l_main_00101:
	ld	a, e
	sub	a,0xe8
	ld	a, d
	sbc	a,0x03
	jr	C,l_main_00119
	ld	a,(_output_str)
	inc	a
	ld	(_output_str),a
	ld	hl,0xfc18
	add	hl,de
	ex	de,hl
	jr	l_main_00101
l_main_00119:
l_main_00104:
	ld	a, e
	sub	a,0x64
	ld	a, d
	sbc	a,0x00
	jr	C,l_main_00121
	ld	hl,_output_str + 1
	inc	(hl)
	ld	hl,0xff9c
	add	hl,de
	ex	de,hl
	jr	l_main_00104
l_main_00121:
l_main_00107:
	ld	a, e
	sub	a,0x0a
	ld	a, d
	sbc	a,0x00
	jr	C,l_main_00109
	ld	hl,_output_str + 2
	inc	(hl)
	ld	hl,0xfff6
	add	hl,de
	ex	de,hl
	jr	l_main_00107
l_main_00109:
	ld	a,(_output_str + 0x0003)
	add	a, e
	ld	((_output_str + 0x0003)),a
	dec	bc
	jr	l_main_00110
l_main_00112:
	ld	hl,0x0000
	ret
I'm a bit out of my depth with this, but I get the idea.

Finally I tried switching to the small_utoa assembly language function described by AA, which is used internally in Z88DK. This does something similar to the tight powers-of-10 dividing code posted by @Bizzley:

Code: Select all

/*
 * zcc +zx -vn -startup=0 -clib=sdcc_iy -SO3 --max-allocs-per-node200000 small_utoa_main.c small_utoa.asm -o small_utoa -create-app --list
 *
 */

#include <stdio.h>
#include <string.h>

unsigned char output_str[5];

char* small_utoa( unsigned int, char* );

int main()
{
  unsigned int i;

  for( i=9999; i>0; i-- ) {
    small_utoa( i, output_str );
  }

  return 0;
}
It needs this (not very efficient because I'm still new to this) interface code to externalise what is normally an internal function:

Code: Select all

; char *small_utoa(unsigned int num, char *buf)

SECTION code_stdlib

PUBLIC _small_utoa

EXTERN l_small_utoa

_small_utoa:

   pop af
   pop hl
   pop de
   pop bc
   
   push bc
   push de
   push hl
   push af
   
   call l_small_utoa

   ex de,hl
   ld (hl),0
   ret
That takes about 3.5 seconds to do what is basically the same as my C code, only with hand coded ASM.

I'd like to try the BCD option but I don't I understand it enough to try. :)

So, bottom line: I knew the sprintf() would be slow which is why I went through this exercise. The divide-by-powers-of-10 approach is much faster, and the hand coded assembly language version internal to Z88DK is significantly faster than my C implementation, so that's the code to use for maximum flexibility. However, the way my game currently works I think I can use the digit rolling code. :)
0 x

Bizzley
Berk
Posts: 35
Joined: Thu Nov 16, 2017 10:47 am

Re: unsigned to ascii

Post by Bizzley » Tue Oct 09, 2018 6:18 pm

Here's a very simple and small z80 routine to add two five digit numbers held as ASCII strings together using string addition rather than 16bit conversion. It's generalised in that it will add any five digit "number" to another with a resultant less than 99999 but can be optimized by reducing the main loop for powers of ten that aren't used. e.g. if you are only going to be adding numbers between 1 and 999 to the Total then you can set the length of your strings to three and change the value in the ld bc,nn statement to this number. If you are only going to be using a limited set of numbers, such as multiples of 100 up to 9900 with no other digits, then you can do similar by changing the start position of the pointer into the Total string. You could optimise it for speed a little by presetting some of the values used (e.g. ld de,106*256+68) and using sub d and add e instead but I've left it as is so it's a bit clearer.

Code: Select all

;enter with HL pointing to the string holding the value in ASCII.
	ld hl,addstring
	call calcstring
;
calcstring	ld bc,5
	add hl,bc
	ld b,c
	ld ix,totalstring+5
calcstring1	dec ix
	dec hl
	ld a,(hl)
	add (ix+0)
	sub 106
	jr c,calcstring2
	sub 10
	inc (ix-1)
calcstring2	add 58
	ld (ix+0),a
	djnz calcstring1
	ret
;
addstring	dm "000000"
totalstring dm "000000"
1 x

Bizzley
Berk
Posts: 35
Joined: Thu Nov 16, 2017 10:47 am

Re: unsigned to ascii

Post by Bizzley » Wed Oct 10, 2018 2:22 am

Here's a 5 byte smaller and slightly faster version of the above, done by replacing IX with DE. I've also fixed the bug with the wrong length of the strings.

Code: Select all

calcstring	ld bc,5
	add hl,bc
	ld b,c
	ld de,totalstring+5
	ex de,hl
calcstring1	dec hl
	dec de
	ld a,(de)
	add (hl)
	sub 106
	jr c,calcstring2
	sub 10
	dec hl
	inc (hl)
	inc hl
calcstring2	add 58
	ld (hl),a
	djnz calcstring1
	ret
;
addstring	dm "00000"
totalstring	dm "00000"
0 x

Alcoholics Anonymous
Berk
Posts: 4
Joined: Mon Oct 08, 2018 2:36 am

Re: unsigned to ascii

Post by Alcoholics Anonymous » Wed Oct 10, 2018 4:18 pm

dfzx wrote:
Tue Oct 09, 2018 5:06 pm
However, as AA advised, there's a library build time switch which pulls in utoa() code optimised for the base 10 case. Using that brings the testcase run time down to 16 seconds, so that's a pretty useful optimisation. Still slow though. :p
sprintf has to deal with a lot of things. Even on the surface sprintf( output_str, "%04u", i ) is being asked to output the integer in a field width of four characters that will have leading 0 padding. Underneath, sprintf is using the printf engine with a string "device driver". The printf engine has to be able to write its output to any device including disk drives, printers, serial devices etc, so it must first generate portions of the output into a buffer (using utoa into a buffer on the stack for %u), then apply formatting as it sends output to the device using a couple of primitives (put character optionally repeated and put buffer). Then the string driver, in the sprintf case, carries out these operations with an ldir equivalent but it must keep track of the current string print position and count how many characters in total are output (the return value of the printf family tells you how many chars where sent in the course of printing). So it's a long road from sprintf to string. sprintf is fantastic for complicated output that mixes all sorts of things into the string but if you want just a 5-digit score in a fixed buffer of 5 chars, it's the slowboat :) That's where it's better to use direct functions like utoa or lower level things either from the library or what you've written.
I can follow that, just about. I don't quite get why it uses an 'add a,0xff' to do the decrement (surely 'dec a' would be the obvious choice?) but the rest of it makes sense.
We can't optimize it out as we currently have no way in the compiler to find out of the carry flag is important in following instructions. At one time I tried to optimize some of these sorts of things out for 16-bit numbers only to find out it broke 32-bit math because the 32-bit math was relying on the carry flag to borrow from the high word.

Code: Select all

; char *small_utoa(unsigned int num, char *buf)
You can change this to fastcall linkage if you also place the buffer at a fixed place in memory like the other examples here are doing:

Code: Select all


; extern void small_utoa(unsigned int num) __z88dk_fastcall;

SECTION code_user

PUBLIC _small_utoa

EXTERN l_small_utoa

_small_utoa:

   ; hl = num
   
   ; destination buffer address
   
   ld de,_score_buffer
   
   ; carry set for leading zeroes
   
   scf
   call l_small_utoa

   ; zero terminate (maybe not needed)
   
   xor a
   ld (de),a
   
   ret


; extern char score_buffer[5];

SECTION bss_user
PUBLIC _score_buffer

_score_buffer:  defs 6
I separated code from vars - it's good practice, even though not necessary for the zx if creating programs that load into ram. The proper assignment of stuff to sections allows the toolchain to generate romable code, which is something that could happen on the zx if you ever decided to make an if2 cartridge, eg.

But yeah the fastest will be adding in ascii as shown in the posts above.
1 x

Alcoholics Anonymous
Berk
Posts: 4
Joined: Mon Oct 08, 2018 2:36 am

Re: unsigned to ascii

Post by Alcoholics Anonymous » Wed Oct 10, 2018 4:56 pm

Metalbrain wrote:
Tue Oct 09, 2018 10:27 am
There's a small optimization which saves 1 byte and 3 states for all solutions so far ... "ld c,-100+256" in both fast and small z88dk routines.
I'll put that into my next commit.

More importantly there is a bug in the 32-bit version - it's missing the hundreds case. LDIRing those constants onto the stack is probably better too.
0 x

Post Reply