Pseudorandom number generator

ZedaZ80 · Post by **ZedaZ80** » Sun Mar 21, 2021 1:29 am

Hi!

I found out about this site when I saw someone had linked to a project of mine. I come from the world of Texas Instruments Z80-based calculators (TI-83+/84+ in particular). I love optimizing code and I mostly make math routines and parsers, though I occasionally dabble in 2-D graphics.

I think that of all the routines I made last year, this is the one I am most proud of, an 8-bit PRNG that occasionally moonlights as a true RNG:

Code: Select all

;This code snippet is 9 bytes and 43cc
;Inputs:
;   HL is the input seed and must be non-zero
;Outputs:
;   A is the 8-bit pseudo-random number
;   HL is the new seed value (will be non-zero)


                  ;opcode cc
    add hl,hl     ; 29    11
    sbc a,a       ; 9F     4
    and %00101101 ; E62D   7
    xor l         ; AD     4
    ld l,a        ; 6F     4
    ld a,r        ; ED5F   9
    add a,h       ; 84     4


;-------------------------------------------------------------------------------
;Technical details:
;   The concept behind this routine is to combine an LFSR (poor RNG) with a
; counter. The counter improves the RNG quality, while also extending the period
; length.
;   For this routine, I took advantage of the Z80's built-in counter, the `r`
; register. This means that we don't need to store the counter anywhere, and it
; is pretty fast to access!
;   Some caveats:
;     * r is a 7-bit counter
;     * r will increment some number of times between runs of the RNG. In most
;       cases, this will be constant, but if it increments an even number each
;       time, then the bottom bit is always the same, weakening the effect of
;       the counter. In the worst case, it increments a multiple of 128 times,
;       effectively making your RNG just as good/bad as the LFSR. Ideally, you
;       want `r` to increment an odd number of times between runs.
;     * In the best case, the bottom 7 bits have 50/50 chance of being 0 or 1.
;       The top bit is 1 with probability 1/2 + 1/(2^17-2) ~ .5000076295
;     * In the event that your main loop waits for user input between calls,
;       then congatulations, you might have a True RNG :)
;-------------------------------------------------------------------------------

It was created for a very speed-critical program: noise for a 5-channel sound engine on the TI-83+/84+ -- note that we don't have any sound hardware, but we have a link port that we can toggle the lines HIGH and LOW!

The routine is surprisingly good and obscenely fast.

If you are into truly obscene code, this square root routine will do it for you:

Code: Select all

; fast 16-bit isqrt by Zeda Thomas
;Feel free to use for whatever :)

sqrtHL:
;Input: HL
;Output: A is the integer square root of HL
;Destroys: HL,DE (D is actually 0)
;min: 343cc
;max: 380cc
;avg: 361.5cc
;88 bytes
  ld de,05040h  ; 10
  ld a,h        ; 4
  sub e         ; 4
  jr nc,sq7     ;\
  add a,e       ; | branch 1: 12cc
  ld d,16       ; | branch 2: 18cc
sq7:            ;/

; ----------

  cp d          ; 4
  jr c,sq6      ;\
  sub d         ; | branch 1: 12cc
  set 5,d       ; | branch 2: 19cc
sq6:            ;/

; ----------
  res 4,d       ; 8
  srl d         ; 8
  set 2,d       ; 8
  cp d          ; 4
  jr c,sq5      ;\
  sub d         ; | branch 1: 12cc
  set 3,d       ; | branch 2: 19cc
sq5:            ;/
  srl d         ; 8

; ----------

  inc a         ; 4
  sub d         ; 4
  jr nc,sq4     ;\
  dec d         ; | branch 1: 12cc
  add a,d       ; | branch 2: 19cc
  dec d         ; | <-- this resets the low bit of D, so `srl d` resets carry.
sq4:            ;/
  srl d         ; 8
  ld h,a        ; 4

; ----------

  ld a,e        ; 4
  sbc hl,de     ; 15
  jr nc,sq3     ;\
  add hl,de     ; | 12cc or 18cc
sq3:            ;/
  ccf           ; 4
  rra           ; 4
  srl d         ; 8
  rra           ; 4

; ----------

  ld e,a        ; 4
  sbc hl,de     ; 15
  jr c,sq2      ;\
  or 20h        ; | branch 1: 23cc
  db 254        ; |   <-- start of `cp *` which is 7cc to skip the next byte.
sq2:            ; | branch 2: 21cc
  add hl,de     ;/

  xor 18h       ; 7
  srl d         ; 8
  rra           ; 4

; ----------

  ld e,a        ; 4
  sbc hl,de     ; 15
  jr c,sq1      ;\
  or 8          ; | branch 1: 23cc
  db 254        ; |   <-- start of `cp *` which is 7cc to skip the next byte.
sq1:            ; | branch 2: 21cc
  add hl,de     ;/

  xor 6         ; 7
  srl d         ; 8
  rra           ; 4

; ----------

  ld e,a        ; 4
  sbc hl,de     ; 15

;This code would restore the square root
;   jr nc,sq0     ;\
;   add hl,de     ; | 12cc or 18cc
; sq0:            ;/

  sbc a,255     ; 7
  srl d         ; 8
  rra           ; 4
  ret           ; 10

This was inspired after I saw John Metcalf's implementation here. I thought it would be impossible to optimize further, but I changed the approach and managed to eke out a few more bytes and cycles.

Some of my projects on Github are:
Z80 Optimized Routines - These are optimized routines that I've either made or gathered (with permission). If you have some useful routines or optimizations that you'd like to contribute, please do!
z80float - This is a library of floating point routines for the Z80. There are 24-bit floats that are great for the Z80 (operands can be passed via registers), as well as two 32-bit float formats (7 digits), and an 80-bit format (19 digits). Each variant has addition, subtraction, multiplication, division, square roots, logarithms, exponentials (i.e. e^x, x^y, etc), cosine/sine/tangent, arccosine/arcsine/arctangent, hyperbolic cos/sin/tan, hyperbolic arccos/arcsin/arctan, and probably a few others that I missed. Please bug me about any issues!

Einar Saukas · Post by **Einar Saukas** » Sun Mar 21, 2021 2:55 am

Welcome to the forum

Would you mind if I ask a moderator to move your post to a new thread? I would like to comment about your post (probably other people too) but I don't want to hijack the "introduce yourself" topic!

ZedaZ80 · Post by **ZedaZ80** » Sun Mar 21, 2021 2:08 pm

Sure, that works for me!

R-Tape · Post by **R-Tape** » Sun Mar 21, 2021 2:25 pm

Posts being moved here. Nothing to see yet.

EDIT - New thread. I'll be interested to see how this performs compared to [mention]Patrik Rak[/mention]'s RNG. Welcome [mention]ZedaZ80[/mention]!

1024MAK · Post by **1024MAK** » Sun Mar 21, 2021 2:26 pm

Will they appear at random

utz · Post by **utz** » Sun Mar 21, 2021 2:59 pm

Wooooo Xeda! Good to see you're still going strong on TI and Z80.
I'm curious about that 5 channel sound routine, do you have a link to that?

If you just want a cheapo PRNG to generate noise, you can go even faster:

Code: Select all

  ld de,$2157

next_num
  add hl,de
  rlc h
  inc h     ; optional, improves quality
  ld a,h

Though your PRNG looks like it'll definitely give better quality, and of course it only needs one register pair. Think I might nick that for a sound routine. Btw,

ZedaZ80 wrote: ↑Sun Mar 21, 2021 1:29 am note that we don't have any sound hardware, but we have a link port that we can toggle the lines HIGH and LOW!

It works exactly the same on Spectrum 48K, except it's only one output line instead of two.

ZedaZ80 · Post by **ZedaZ80** » Sun Mar 21, 2021 4:06 pm

utz!

I don't know if they've posted the code yet, but I found this post related to the project (by fghsgh "fuggy"): https://www.cemetech.net/forum/viewtopi ... 32002d9239

Here is a link to Youtube video (the video itself is an oscilloscope output, and the sound is from the calc): https://www.youtube.com/watch?v=iphQzQZEgK8

As for the "noise generator" code I posted, this is what it looks like when I use it to fill the screen:

The average cycle length is only about 5592320 if I recall correctly (65535 * 128 * 2/3), so you'll eventually start to see cycles.

Einar Saukas · Post by **Einar Saukas** » Sun Mar 21, 2021 5:23 pm

ZedaZ80 wrote: ↑Sun Mar 21, 2021 1:29 amSome of my projects on Github are:

These pages are excellent!

What makes your routines really useful is that, besides being very well implemented, they are also very well organized and documented. Each of them contains a proper description, parameters ("inputs"), results ("outputs"), affected registers ("destroys"), size, time, and additional relevant information when needed (such as your comments about the R register period). This is awesome! Assembly developers always neglect the importance of this kind of information, actually I don't even recall anybody else that does it except myself

I have a few suggestions to improve "Z80-Optimized-Routines", I hope you don't mind:

To make your page even more useful, you could turn it into a central reference for assembly optimization. At the end of your README file, you could add a section "Links" containing links to optimized routines (for instance this and this) and useful pages about generic optimizations (for instance this).

Although most of your routines are very well documented, some of them aren't. For instance I don't think the purpose of this routine will be obvious to most people, this could be easily fixed with a short explanation in one sentence. And in this other routine it's the opposite, its purpose is obvious but it would be nice to have the same documentation quality here too.

IMHO it's better to provide minimalist versions of each routine whenever possible. For instance in this routine, instead of PUSH/POP all registers, it would be better to simply document they are all affected, and let users only save/restore the ones they need. It's a no-brainer to add PUSH/POP to an existing routine with a single RET, but it takes more work to remove PUSH/POP from an existing routine since you need to examine it first to ensure there will be no side-effects.

Anyway these are all cosmetic changes. The content itself looks perfect, I didn't notice anything to suggest differently. Good work!

Einar Saukas · Post by **Einar Saukas** » Sun Mar 21, 2021 5:58 pm

Actually I have a suggestion about code content: this routine could be split into 2 versions:

One routine called "cpHL_DE_shorter"

Code: Select all

    or a
    sbc hl,de
    add hl,de

Another routine called "cpHL_DE_faster"

Code: Select all

    ld a,h
    cp d
    jr nz,result
    ld a,l
    cp e
result:

ZedaZ80 · Post by **ZedaZ80** » Mon Mar 22, 2021 12:26 am

R-Tape wrote: ↑Sun Mar 21, 2021 2:25 pm EDIT - New thread. I'll be interested to see how this performs compared to @Patrik Rak's RNG. Welcome @ZedaZ80!

One of my go-to ways of assessing the quality of an RNG is by playing the Chaos Game. You give it a set of points, and each iteration, it picks one of those points at random and jumps halfway to it. When you have three points, it makes a Sierpinski's Triangle (albeit skewed, depending on where the three points are). With 4 points that form a rectangle, a quality RNG will fill it. In any event, bias in the RNG can often be seen visually. For example, here is my 8-bit RNG after a number of iterations (it wasn't plotting any new points):

That's not great (still far better than an LCG or LFSR).

Here is Patrik Rak's

I set the bottom-right corner as fourth point, and the upper-left as the first point. This RNG is causing the Chaos Game to (noticeably) favor the bottom right, so the RNG is biased. In this case, it means the upper 3 bits of the result are noticeably more often %11 and noticeably less often %00.

Then there is John Metcalf's 16-bit xorshift that is fantastic:

Technically, there is a bias against the upper-left, but it is a very tiny bias and it isn't noticeable. The only reason that I know it has that bias is because the RNG generates numbers 1-65535 with equal probability, and never generates a 0, so the upper-left corner is chosen 16383/65535 times, and the other corners are chosen 16384/65535 times each, which means the upper left is chosen about 0.0061% fewer times.

John's will eventually fill the screen and I highly recommend using it. John's also has the advantage of a quality 16-bit output as well as a quality 8-bit output. It does have a few issues, like the low cycle length (65535 versus mine's average of 5592320 and Patrik's 4294967295). That can be mitigated by pairing it with a counter, either explicitly or like the way mine uses the R register.

There is also the consideration of size and speed. In my case, it needed to fit within 52 clock cycles as that was all that could be spared, but in most cases, 100-200 clock cycles is reasonable. Not counting the RET:

Code: Select all

Zeda    9 bytes, 43cc
John    20 bytes, 86 cycles
Patrik  28 bytes, 122cc

So where you need noise and extreme speed, I'd advise using mine, but where you need speed and high quality, I'd go with John's. If you want to extend Johns' cycle length at cheap cost, just append to the end:

Code: Select all

  ld a,r
  add a,h
  ;ld h,a ; uncomment if you plan to use HL

That adds 13cc/3 bytes (still under 100cc!), but extends the cycle length to about 5592320 on average.

Einar Saukas wrote: ↑Sun Mar 21, 2021 5:23 pm
ZedaZ80 wrote: ↑Sun Mar 21, 2021 1:29 amSome of my projects on Github are:
These pages are excellent!

What makes your routines really useful is that, besides being very well implemented, they are also very well organized and documented. Each of them contains a proper description, parameters ("inputs"), results ("outputs"), affected registers ("destroys"), size, time, and additional relevant information when needed (such as your comments about the R register period). This is awesome! Assembly developers always neglect the importance of this kind of information, actually I don't even recall anybody else that does it except myself

Thanks ^_^ When I first started Z80 programming, I did it all in hexadecimal. When I finally got a computer and could learn the mnemonics and actually write legible source code, I was referencing a document (for my calculator) that had the input/output/destroys/[notes] syntax and that has stuck with me.

Einar Saukas wrote: ↑Sun Mar 21, 2021 5:23 pm I have a few suggestions to improve "Z80-Optimized-Routines", I hope you don't mind:

Yes please!

Einar Saukas wrote: ↑Sun Mar 21, 2021 5:23 pm
To make your page even more useful, you could turn it into a central reference for assembly optimization. At the end of your README file, you could add a section "Links" containing links to optimized routines (for instance this and this) and useful pages about generic optimizations (for instance this).

Thanks, I'll do that

That's a good idea, kind of like what DW0RKiN's 16-bit Floating Point Library does (which goes into great detail and analysis that I haven't done for mine).

Einar Saukas wrote: ↑Sun Mar 21, 2021 5:23 pm
Although most of your routines are very well documented, some of them aren't. For instance I don't think the purpose of this routine will be obvious to most people, this could be easily fixed with a short explanation in one sentence. And in this other routine it's the opposite, its purpose is obvious but it would be nice to have the same documentation quality here too.

I've updated those two (and added your faster variants).

Einar Saukas wrote: ↑Sun Mar 21, 2021 5:23 pm
IMHO it's better to provide minimalist versions of each routine whenever possible. For instance in this routine, instead of PUSH/POP all registers, it would be better to simply document they are all affected, and let users only save/restore the ones they need. It's a no-brainer to add PUSH/POP to an existing routine with a single RET, but it takes more work to remove PUSH/POP from an existing routine since you need to examine it first to ensure there will be no side-effects.

That's a good point. Routines that are wrapped like that were pulled from some of my other projects (where preserving variables for general use was expected). It'll take some work to implement those changes, but I'll put it on a to-do list.

Einar Saukas · Post by **Einar Saukas** » Mon Mar 22, 2021 6:29 pm

I also noticed some routines are duplicated. For instance there's "mul8.z80" and "mul8_faster_smaller.z80". I see no reason to keep the slower bigger version

ZedaZ80 · Post by **ZedaZ80** » Mon Mar 22, 2021 10:12 pm

Einar Saukas wrote: ↑Mon Mar 22, 2021 6:29 pm I also noticed some routines are duplicated. For instance there's "mul8.z80" and "mul8_faster_smaller.z80". I see no reason to keep the slower bigger version

In that case, the smaller/faster version destroys A in order to save a byte and 3cc or 4cc (3.5cc on average). A is a valuable register, so the "mul8_faster_smaller" will cost more if you need to preserve A.

Einar Saukas · Post by **Einar Saukas** » Mon Mar 22, 2021 10:22 pm

Good point!

Patrik Rak · Post by **Patrik Rak** » Tue Mar 23, 2021 2:01 pm

ZedaZ80 wrote: ↑Mon Mar 22, 2021 12:26 am Here is Patrik Rak's

I set the bottom-right corner as fourth point, and the upper-left as the first point. This RNG is causing the Chaos Game to (noticeably) favor the bottom right, so the RNG is biased. In this case, it means the upper 3 bits of the result are noticeably more often %11 and noticeably less often %00.

Interesting - would you mind sharing the code for the test? I find so much bias surprising... Did you mean top 2 bits instead? My only explanation would be the longer period takes longer to "wrap over" to bias the other bits, given the John's is the very same XS algorithm, just with shorter period...

Also, would you mind testing the CMWC one? Thanks.

Patrik

ZedaZ80 · Post by **ZedaZ80** » Tue Mar 23, 2021 9:45 pm

Patrik Rak wrote: ↑Tue Mar 23, 2021 2:01 pm
ZedaZ80 wrote: ↑Mon Mar 22, 2021 12:26 am Here is Patrik Rak's

I set the bottom-right corner as fourth point, and the upper-left as the first point. This RNG is causing the Chaos Game to (noticeably) favor the bottom right, so the RNG is biased. In this case, it means the upper 3 bits of the result are noticeably more often %11 and noticeably less often %00.
Interesting - would you mind sharing the code for the test? I find so much bias surprising... Did you mean top 2 bits instead? My only explanation would be the longer period takes longer to "wrap over" to bias the other bits, given the John's is the very same XS algorithm, just with shorter period...

Also, would you mind testing the CMWC one? Thanks.

Patrik

Oh boy, I found my mistake -__- Your xorshift destroyed DE and I was using DE for pixel coordinates. With it fixed, it looks much better:

Sorry!

As for that CMCW one, that one is just plain snazzy, and it performs well:

EDIT: For clarification, I stopped these at a point where any patterns would be obvious; these do fill the screen up if I let them run

Also, I'm not sure how to edit my other post with this update :/

For the source code, here is an agnostic version with the 4 rngs

Code: Select all

  ld de,(points_LUT)  ; Initialize the starting point's coordinates

main_loop:
  push de         ; save the current coords

  ; Iterate the PRNG, return in A register
  ; call cmwc       ; Patrik's cmwc
  call rnd        ; Patrik's xorshift
  ; call rand_zeda  ; Zeda's fast RNG
  ; call xrnd       ; John's xorshift

  pop de         ; restore current coords
  call calc_plot  ; Use the random number to calculate where to plot the point
  call check_exit ; Return z flag set if we need to exit.
  jr nz,main_loop
  ret


calc_plot:
; Choose from 4 points to jump halfway to

  ; Now we need to put 2 random bits in the bottom bits of A (to choose 0-3),
  ; then multiply by 2 to get the index into the LUT
  rlca
  rlca
  rlca
  and %00000110
  add a,points_LUT&255  ; Add a to the LSB of the LUT pointer
  ld c,a
  adc a,points_LUT>>8   ; now we'll continue by adding with carry to the MSB
  sub c                 ; but note that we are off by +C, so subtract C
  ld b,a

  ; Now read in the coordinates and jump halfway
  ld a,(bc)
  add a,e
  rra
  ld e,a
  inc bc
  ld a,(bc)
  add a,d
  rra
  ld d,a

  jp plot   ; must preserve HL and DE!

xrnd:
;By John Metcalf:
  ld hl,1       ; seed must not be 0
  ld a,h
  rra
  ld a,l
  rra
  xor h
  ld h,a
  ld a,l
  rra
  ld a,h
  rra
  xor l
  ld l,a
  xor h
  ld h,a
  ld (xrnd+1),hl
  ret

rand_zeda:
  ld hl,12345
  add hl,hl     ; 29    11
  sbc a,a       ; 9F     4
  and %00101101 ; E62D   7
  xor l         ; AD     4
  ld l,a        ; 6F     4
  ld a,r        ; ED5F   9
  add a,h       ; 84     4
  ld (rand_zeda+1),hl
  ret

rnd:
  ld  hl,$A280   ; yw -> zt
  ld  de,$C0DE   ; xz -> yw
  ld  (rnd+4),hl  ; x = y, z = w
  ld  a,l         ; w = w ^ ( w << 3 )
  add a,a
  add a,a
  add a,a
  xor l
  ld  l,a
  ld  a,d         ; t = x ^ (x << 1)
  add a,a
  xor d
  ld  h,a
  rra             ; t = t ^ (t >> 1) ^ w
  xor h
  xor l
  ld  h,e         ; y = z
  ld  l,a         ; w = t
  ld  (rnd+1),hl
  ret



cmwc:   ld   hl,table

        ld   a,(hl) ; i = ( i & 7 ) + 1
        and  7
        inc  a
        ld   (hl),a

        inc  l      ; hl = &cy

        ld   d,h    ; de = &q[i]
        add  a,l
        ld   e,a

        ld   a,(de) ; y = q[i]
        ld   b,a
        ld   c,a
        ld   a,(hl) ; ba = 256 * y + cy

        sub  c      ; ba = 255 * y + cy
        jr   nc,$+3
        dec  b

        sub  c      ; ba = 254 * y + cy
        jr   nc,$+3
        dec  b

        sub  c      ; ba = 253 * y + cy
        jr   nc,$+3
        dec  b

        ld   (hl),b ; cy = ba >> 8, x = ba & 255
        cpl         ; x = (b-1) - x = -x - 1 = ~x + 1 - 1 = ~x
        ld   (de),a ; q[i] = x

        ret

table:   db   0,0,82,97,120,111,102,116,20,15

        if (table/256)-((table+9)/256)
                error "whole table must be within single 256 byte block"
        endif


points_LUT:
  db 0,0
  db 64,0
  db 0,96
  db 64,96

You'll need to add your own check_exit routine (on my calc, it checks if the [ON] key is pressed) and a plot routine that takes coordinates in (D,E) and plots a point.
EDIT: You might also want to update the points_LUT. which are (X,Y) or (Y,X) pairs. It doesn't matter which, as long as you are consistent with those points and the plot routine.

Full code for the TI-83+/84+

This code works with the spasm assembler and is intended for the monochrome TI-83+/84+/SE variants.

Code: Select all

.db $BB,$6D
.org $9D95

; clear the graphics buffer
  ld hl,9340h
  xor a
  ld b,a
loop:
  ld (hl),a
  inc hl
  ld (hl),a
  inc hl
  ld (hl),a
  inc hl
  djnz loop


  di
  ld de,(points_LUT)  ; Initialize the starting point's coordinates

main_loop:
  push de         ; save the current coords

  ; Iterate the PRNG, return in A register
  ; call cmwc       ; Patrik's cmwc
  call rnd        ; Patrik's xorshift
  ; call rand_zeda  ; Zeda's fast RNG
  ; call xrnd       ; John's xorshift

  pop de         ; restore current coords
  call calc_plot  ; Use the random number to calculate where to plot the point
  call check_exit ; Return z flag set if we need to exit.
  jr nz,main_loop
  ret


calc_plot:
; Choose from 4 points to jump halfway to

  ; Now we need to put 2 random bits in the bottom bits of A (to choose 0-3),
  ; then multiply by 2 to get the index into the LUT
  rlca
  rlca
  rlca
  and %00000110
  add a,points_LUT&255  ; Add a to the LSB of the LUT pointer
  ld c,a
  adc a,points_LUT>>8   ; now we'll continue by adding with carry to the MSB
  sub c                 ; but note that we are off by +C, so subtract C
  ld b,a

  ; Now read in the coordinates and jump halfway
  ld a,(bc)
  add a,e
  rra
  ld e,a
  inc bc
  ld a,(bc)
  add a,d
  rra
  ld d,a

  jp plot   ; must preserve HL and DE!
points_LUT:
  .db 0,0
  .db 64,0
  .db 0,96
  .db 64,96

xrnd:
;By John Metcalf:
  ld hl,1       ; seed must not be 0
  ld a,h
  rra
  ld a,l
  rra
  xor h
  ld h,a
  ld a,l
  rra
  ld a,h
  rra
  xor l
  ld l,a
  xor h
  ld h,a
  ld (xrnd+1),hl
  ret

rand_zeda:
  ld hl,12345
  add hl,hl     ; 29    11
  sbc a,a       ; 9F     4
  and %00101101 ; E62D   7
  xor l         ; AD     4
  ld l,a        ; 6F     4
  ld a,r        ; ED5F   9
  add a,h       ; 84     4
  ld (rand_zeda+1),hl
  ret

rnd:
  ld  hl,$A280   ; yw -> zt
  ld  de,$C0DE   ; xz -> yw
  ld  (rnd+4),hl  ; x = y, z = w
  ld  a,l         ; w = w ^ ( w << 3 )
  add a,a
  add a,a
  add a,a
  xor l
  ld  l,a
  ld  a,d         ; t = x ^ (x << 1)
  add a,a
  xor d
  ld  h,a
  rra             ; t = t ^ (t >> 1) ^ w
  xor h
  xor l
  ld  h,e         ; y = z
  ld  l,a         ; w = t
  ld  (rnd+1),hl
  ret



cmwc:   ld   hl,table

        ld   a,(hl) ; i = ( i & 7 ) + 1
        and  7
        inc  a
        ld   (hl),a

        inc  l      ; hl = &cy

        ld   d,h    ; de = &q[i]
        add  a,l
        ld   e,a

        ld   a,(de) ; y = q[i]
        ld   b,a
        ld   c,a
        ld   a,(hl) ; ba = 256 * y + cy

        sub  c      ; ba = 255 * y + cy
        jr   nc,$+3
        dec  b

        sub  c      ; ba = 254 * y + cy
        jr   nc,$+3
        dec  b

        sub  c      ; ba = 253 * y + cy
        jr   nc,$+3
        dec  b

        ld   (hl),b ; cy = ba >> 8, x = ba & 255
        cpl         ; x = (b-1) - x = -x - 1 = ~x + 1 - 1 = ~x
        ld   (de),a ; q[i] = x

        ret

table:   .db   0,0,82,97,120,111,102,116,20,15

#if (table/256)-((table+9)/256)
.error "whole table must be within single 256 byte block"
#endif





check_exit:
; must preserve HL and DE!
  in a,(4)
  and 8
  ret

plot:
; must preserve HL and DE!
; Inputs: (D,E) = (x,y)
; Does not check for out-of-bounds pixels
  ld c,e

  ld b,0
  ld h,b
  ld l,c
  add hl,hl
  add hl,bc
  add hl,hl
  add hl,hl
  ld c,a
  srl c
  srl c
  srl c
  add hl,bc
  ld bc,$9340
  add hl,bc

  and 7
  ld bc,pixLUT
  add a,c
  ld c,a
  jr nc,$+3
  inc b
  ld a,(bc)
  scf

  or (hl)
  ld (hl),a

gbuf_to_lcd_6MHz:
;This copies the contents of gbuf to the LCD.
;Preserves all registers.
  push hl     ;11   11
  push de     ;11   22
  push bc     ;11   33
  push af     ;11   44
  ld hl,9340h  ;10   54
  ld a,$80    ;7    61
  out (16),a  ;11   0

  ld de,11    ;10   10
  ld a,$20    ;7    17
  ex (sp),hl  ;19   36 \ waste clock cycles
  ex (sp),hl  ;19   55 /
col:
  out (10h),a ;11   0
  ld b,(hl)   ;7    7
  xor b       ;4    11  \ waste clock cycles
  xor b       ;4    15  /
  ld bc,$8011 ;10   25
  nop         ;4    29

row:
  sbc hl,de   ;15   44  \ waste clock cycles
  add hl,de   ;11   55  /
  outi        ;16   5
  add hl,de   ;11   16
  djnz row    ;13/8 29/24
  inc a       ;4    28
  dec h       ;4    32
  dec h       ;4    36
  dec h       ;4    40
  inc hl      ;6    46
  cp $2C      ;7    53
  jp nz,col   ;10   63
  pop af      ;10
  pop bc      ;10
  pop de      ;10
  pop hl      ;10
  ret         ;10
pixLUT:
  .db $80,$40,$20,$10,$08,$04,$02,$01

Patrik Rak · Post by **Patrik Rak** » Thu Mar 25, 2021 9:00 am

ZedaZ80 wrote: ↑Tue Mar 23, 2021 9:45 pm Oh boy, I found my mistake -__- Your xorshift destroyed DE and I was using DE for pixel coordinates. With it fixed, it looks much better:

Oh, that's a relief! You had me worried there for a while.

I mean, quite a few people seem to use mine and it would be a shame if it was indeed behaving so poorly.

ZedaZ80 wrote: ↑Tue Mar 23, 2021 9:45 pm Also, I'm not sure how to edit my other post with this update :/

I think it can be only edited for some time after posting. Well, don't worry about it now, let's hope that interested reader can read few posts further and won't reject mine as crap right away.

Overall, thanks for this collection, superb work! I was not aware of John's, it can be indeed pretty useful in some situations. Of course, the 65535 period makes it unsuitable for some tasks (for example, it won't even fill the entire 256x256 square with PLOTs), but other than that, it's as good XS as any.

Patrik

introspec · Post by **introspec** » Thu Mar 25, 2021 9:16 am

ZedaZ80 wrote: ↑Sun Mar 21, 2021 1:29 am I think that of all the routines I made last year, this is the one I am most proud of, an 8-bit PRNG that occasionally moonlights as a true RNG:

Code: Select all

;This code snippet is 9 bytes and 43cc
;Inputs:
;   HL is the input seed and must be non-zero
;Outputs:
;   A is the 8-bit pseudo-random number
;   HL is the new seed value (will be non-zero)


                  ;opcode cc
    add hl,hl     ; 29    11
    sbc a,a       ; 9F     4
    and %00101101 ; E62D   7
    xor l         ; AD     4
    ld l,a        ; 6F     4
    ld a,r        ; ED5F   9
    add a,h       ; 84     4


;-------------------------------------------------------------------------------
;Technical details:
;   The concept behind this routine is to combine an LFSR (poor RNG) with a
; counter. The counter improves the RNG quality, while also extending the period
; length.
;   For this routine, I took advantage of the Z80's built-in counter, the `r`
; register. This means that we don't need to store the counter anywhere, and it
; is pretty fast to access!
;   Some caveats:
;     * r is a 7-bit counter
;     * r will increment some number of times between runs of the RNG. In most
;       cases, this will be constant, but if it increments an even number each
;       time, then the bottom bit is always the same, weakening the effect of
;       the counter. In the worst case, it increments a multiple of 128 times,
;       effectively making your RNG just as good/bad as the LFSR. Ideally, you
;       want `r` to increment an odd number of times between runs.
;     * In the best case, the bottom 7 bits have 50/50 chance of being 0 or 1.
;       The top bit is 1 with probability 1/2 + 1/(2^17-2) ~ .5000076295
;     * In the event that your main loop waits for user input between calls,
;       then congatulations, you might have a True RNG :)
;-------------------------------------------------------------------------------

It was fun to see this generator, because it is very similar to one of the generators I wrote back in 2014: http://forum.tslabs.info/viewtopic.php?f=3&t=391 (my apologies that the post is in Russian, but Google translate does miracles these days). I am particularly surprised by your choice of the feedback constant (#2D), because I wrote a program that tested every byte for the longest period of the resulting sequence and I do not think I had your constant in my list. So, either my program was broken (a definite possibility), or your generator is not getting the longest possible period (also possible!). In any case, I mainly wanted to compliment you for the idea to use R as the mechanism for bit-mixing; usually R is pretty poor generator of randomness, but as a nonlinear transformer of LSFR output it seems cool.

ZedaZ80 · Post by **ZedaZ80** » Thu Mar 25, 2021 9:50 pm

introspec wrote: ↑Thu Mar 25, 2021 9:16 am
ZedaZ80 wrote: ↑Sun Mar 21, 2021 1:29 am I think that of all the routines I made last year, this is the one I am most proud of, an 8-bit PRNG that occasionally moonlights as a true RNG:
Code: Select all
;This code snippet is 9 bytes and 43cc
;Inputs:
;   HL is the input seed and must be non-zero
;Outputs:
;   A is the 8-bit pseudo-random number
;   HL is the new seed value (will be non-zero)


                  ;opcode cc
    add hl,hl     ; 29    11
    sbc a,a       ; 9F     4
    and %00101101 ; E62D   7
    xor l         ; AD     4
    ld l,a        ; 6F     4
    ld a,r        ; ED5F   9
    add a,h       ; 84     4


;-------------------------------------------------------------------------------
;Technical details:
;   The concept behind this routine is to combine an LFSR (poor RNG) with a
; counter. The counter improves the RNG quality, while also extending the period
; length.
;   For this routine, I took advantage of the Z80's built-in counter, the `r`
; register. This means that we don't need to store the counter anywhere, and it
; is pretty fast to access!
;   Some caveats:
;     * r is a 7-bit counter
;     * r will increment some number of times between runs of the RNG. In most
;       cases, this will be constant, but if it increments an even number each
;       time, then the bottom bit is always the same, weakening the effect of
;       the counter. In the worst case, it increments a multiple of 128 times,
;       effectively making your RNG just as good/bad as the LFSR. Ideally, you
;       want `r` to increment an odd number of times between runs.
;     * In the best case, the bottom 7 bits have 50/50 chance of being 0 or 1.
;       The top bit is 1 with probability 1/2 + 1/(2^17-2) ~ .5000076295
;     * In the event that your main loop waits for user input between calls,
;       then congatulations, you might have a True RNG :)
;-------------------------------------------------------------------------------
It was fun to see this generator, because it is very similar to one of the generators I wrote back in 2014: http://forum.tslabs.info/viewtopic.php?f=3&t=391 (my apologies that the post is in Russian, but Google translate does miracles these days). I am particularly surprised by your choice of the feedback constant (#2D), because I wrote a program that tested every byte for the longest period of the resulting sequence and I do not think I had your constant in my list. So, either my program was broken (a definite possibility), or your generator is not getting the longest possible period (also possible!). In any case, I mainly wanted to compliment you for the idea to use R as the mechanism for bit-mixing; usually R is pretty poor generator of randomness, but as a nonlinear transformer of LSFR output it seems cool.

I got those taps from some paper that had a fairly comprehensive list. I just now ran a Python program to search for all taps that are entirely contained within the lower 8 bits of a 16-bit LFSR and came up with:

Code: Select all

  binary     hex
0b00101101   0x2d
0b00111001   0x39
0b00111111   0x3f
0b01010011   0x53
0b10111101   0xbd
0b11010111   0xd7

Note that this is a brute-force approach, but there are more efficient ways to find these taps.

Code: Select all

# brute force to get the lfsr taps
print('  binary     hex')
for i in range(1, 255):
    c = 0
    s = 1
    while (s != 1 or c == 0) and c != 65536:
        s += s
        if s&65536:
            s ^= (65536 + i)
        c += 1
    if c == 65535:
        print("0b{:0>8}   {}".format(bin(i)[2:], hex(i)))

And the beauty of using the R register is that for an LFSR, all you need to do is combine it with a counter (even just an n+1 ==> n counter). So if R is super predictable, then it works to extend the LFSR's period and smooths out probabilities of a given bit being 0 or 1. But if R itself is chaotic (or even random), then that only serves to help the RNG even more. In that regard, it's a win-win. The only real downside to using R is the possibility of it incrementing 2^k times between runs, making it a (7-k)-bit counter.

introspec · Post by **introspec** » Fri Mar 26, 2021 5:06 pm

There is nothing wrong with the brute-force approach in this case and I am glad to see that all my feedback values are in your list too, so clearly whatever I had did not look at all potential values, I do not remember why (my old code is now lost). And I agree that the use of R to transform output is neat and almost free. In any case, the fact that we arrived at pretty much identical codes suggests that we are very close to the local optimum. It may be the case that even quicker/smaller PRNGs are possible, but they'll probably have to be quite different in nature.

Mpk · Post by **Mpk** » Sun May 16, 2021 11:38 am

Idiot question time :

How does one get a random number in a range 1 - x, assuming one already has a random number in A using one of the routines above?

I was copying the examples in the Caldwell book that use AND to restrict the range, e.g.

call random ; get a 'random' number.
and 15 ; want vertical in range 0 to 15.

but that's only going to work properly for ranges that end in all 1s. Even numbers will never return an odd value. I think.

AndyC · Post by **AndyC** » Sun May 16, 2021 1:17 pm

The easiest way is to use something like AND to get it to the nearest power of two, then just re-try if you end up with a number that's out of range.

Mpk · Post by **Mpk** » Sun May 16, 2021 11:27 pm

In the end I've divided the random number by the upper range, then multiplied that result by the range again. The difference is the modulus, to which I just add 1.
Possibly wasteful but if it works, it works.

Patrik Rak · Post by **Patrik Rak** » Fri May 28, 2021 4:35 pm

Mpk wrote: ↑Sun May 16, 2021 11:38 am How does one get a random number in a range 1 - x, assuming one already has a random number in A using one of the routines above?

That's actually a pretty common question. See the discussion around here about how to do it right.

Mpk · Post by **Mpk** » Fri May 28, 2021 7:23 pm

Patrik Rak wrote: ↑Fri May 28, 2021 4:35 pm how to do it right.

But...I've already done it wrong!

Thanks though - your code is like half the size of mine.

Spectrum Computing

Pseudorandom number generator

Re: Introduce yourself!

Re: Introduce yourself!

Re: Introduce yourself!

Pseudorandom number generator

Re: Pseudorandom number generator

Re: Pseudorandom number generator

Re: Pseudorandom number generator

Re: Introduce yourself!

Re: Pseudorandom number generator

Re: Pseudorandom number generator

Re: Pseudorandom number generator

Re: Pseudorandom number generator

Re: Pseudorandom number generator

Re: Pseudorandom number generator

Re: Pseudorandom number generator

Re: Pseudorandom number generator

Re: Introduce yourself!

Re: Introduce yourself!

Re: Pseudorandom number generator

Re: Pseudorandom number generator

Re: Pseudorandom number generator

Re: Pseudorandom number generator

Re: Pseudorandom number generator

Re: Pseudorandom number generator