A statistical BASIC thought

The place for codemasters or beginners to talk about programming any language for the Spectrum.
equinox
Dynamite Dan
Posts: 1079
Joined: Mon Oct 08, 2018 1:57 am
Location: SE England

A statistical BASIC thought

Post by equinox »

The Spectrum's one-key BASIC input system (A=NEW, B=BORDER, C=CONTINUE) is very unusual and possibly unique, but in my opinion it's actually pretty good: once you learn the main keywords you can write BASIC very fast. There is the down-side where you occasionally need something weird like SGN and can't find the bugger.

Clearly the most common commands (like PRINT, REM, RUN) were allocated single keys, whereas weird stuff like CLOSE# and VAL$ was relegated to various shifts and extend-modes. Also, they tried to keep the main keys somewhat alphabetical: O is POKE, but O is close to P, and P was needed for PRINT. And so on. Still, looking back, there are some real mistakes regarding efficiency. I mean: should COPY (Z) really get a single key? Of course not. (Maybe they assumed everybody would buy a ZX Printer: it reminds one of the various unwanted "apps" pushed by Windows 10.)

In terms of BASIC programming, it might have been much wiser to put (say) INKEY$ or RND on the key Z. I just had an interesting idea: perhaps somebody would even like to run a quick (non-BASIC lol) program over our archive of BASIC, and find out the most common keywords, statistically, and how they could have been placed to keep things super-efficient. (We must however consider that CONTINUE, RUN, SAVE, LOAD etc. are common on the command line, but rare inside a program.)

wot do u reckon m8?
AndyC
Dynamite Dan
Posts: 1443
Joined: Mon Nov 13, 2017 5:12 am

Re: A statistical BASIC thought

Post by AndyC »

Now I'm wondering why ctrl+Z (or command+z if you're a weird Mac type) is the shortcut for COPY. Is it some grand conspiracy?

Probably the printer thing though. Back then everyone was assumed to need a printer, whereas now people look at you weird for wanting one.
Stefan
Manic Miner
Posts: 823
Joined: Mon Nov 13, 2017 9:51 pm
Location: Belgium

Re: A statistical BASIC thought

Post by Stefan »

AndyC wrote: Tue Oct 03, 2023 8:33 am Now I'm wondering why ctrl+Z (or command+z if you're a weird Mac type) is the shortcut for COPY. Is it some grand conspiracy?
As in copies what you just did into limbo?

ctrl+c = copy
ctrl+x = cut
ctrl+v = paste
ctrl+z = undo
ctrl+y = redo
AndyC
Dynamite Dan
Posts: 1443
Joined: Mon Nov 13, 2017 5:12 am

Re: A statistical BASIC thought

Post by AndyC »

This is why you shouldn't think too hard about things before drinking coffee.
ParadigmShifter
Manic Miner
Posts: 944
Joined: Sat Sep 09, 2023 4:55 am

Re: A statistical BASIC thought

Post by ParadigmShifter »

Kernighan and Ritchie analysed tons of Pascal programs when they were designing C, which is why the equality operator is == (as opposed to = in Pascal) and assignment is = (as opposed to := in Pascal): assignment is used a lot more than testing for equality.
TMD2003
Rick Dangerous
Posts: 2047
Joined: Fri Apr 10, 2020 9:23 am
Location: Airstrip One

Re: A statistical BASIC thought

Post by TMD2003 »

equinox wrote: Mon Oct 02, 2023 10:35 pm In terms of BASIC programming, it might have been much wiser to put (say) INKEY$ or RND on the key Z. I just had an interesting idea: perhaps somebody would even like to run a quick (non-BASIC lol) program over our archive of BASIC, and find out the most common keywords, statistically, and how they could have been placed to keep things super-efficient. (We must however consider that CONTINUE, RUN, SAVE, LOAD etc. are common on the command line, but rare inside a program.)

wot do u reckon m8?
COPY on key Z was a bit of a mis-step but INKEY$ and RND in the same place wouldn't work because they're functions that don't start a line and by the time they're needed, the cursor will be L and not K. About the only sensible keyword I can think of to put on key Z instead would be STOP, which was a K-cursor keyword (key S) on the ZX80 but had moved to shift-A by the arrival of the ZX81.

The ZX80 had some seemingly questionable choices as to where to put each keyword that the ZX81 fixed:

Image

PRINT on O instead of P, LET on K instead of L... but there must have been a good reason for it, as with only 4K to play with in the ROM, reading the keyboard must have meant testing bits sequentially and the programmers couldn't afford to waste a single byte scanning a key that would do nothing. That must be why it's the outermost keys P, L, M and Z that have no keyword on them. (Has anyone scoured the ZX80 ROM to confirm this? It'd take me a week.)

OR looks out of place on shift-B - couldn't it have been shift-1 for OR and shift-B for NOT? AND IF NOT why NOT?
Spectribution: Dr. Jim's Sinclair computing pages.
Features my own programs, modified type-ins, RZXs, character sets & UDGs, and QL type-ins... so far!
SkoolKid
Manic Miner
Posts: 416
Joined: Wed Nov 15, 2017 3:07 pm

Re: A statistical BASIC thought

Post by SkoolKid »

Stefan wrote: Tue Oct 03, 2023 9:19 am ctrl+c = copy
ctrl+x = cut
ctrl+v = paste
Off-topic, but I don't use these shortcuts. Instead I use:

Ctrl-Insert - copy
Shift-Delete - cut
Shift-Insert - paste

Is that unusual?
SkoolKit - disassemble a game today
Pyskool - a remake of Skool Daze and Back to Skool
SkoolKid
Manic Miner
Posts: 416
Joined: Wed Nov 15, 2017 3:07 pm

Re: A statistical BASIC thought

Post by SkoolKid »

On-topic: I ran a check of all the BASIC listings in the snapshots produced by the t2s files in the t2sfiles repository, and the top 10 most used BASIC keywords and their number of appearances are:

552181 - LET
350122 - TO
327669 - PRINT
307110 - IF
306159 - THEN
186846 - AT
134457 - INK
104069 - AND
100160 - RETURN
96485 - DATA

The bottom 10 are:

545 - COS
542 - ATN
509 - ACS
477 - TAN
477 - SQR
469 - INKEY$
453 - VAL$
292 - <=
167 - >=
75 - RND

(Note: I excluded 'OPEN #' and 'CLOSE #' from the check because they have spaces in them and so are more difficult to spot.)
ParadigmShifter
Manic Miner
Posts: 944
Joined: Sat Sep 09, 2023 4:55 am

Re: A statistical BASIC thought

Post by ParadigmShifter »

Guess you can't search easily for GO TO and GO SUB either then ;)

It makes no sense for RETURN to be listed as one of the top keywords but GO SUB isn't. RETURN without GO SUB error!

I'm surprised that trig functions and SQR are less popular than LN. Logarithms don't come up very often at all in my experience. (Inverse trig functions are rare, I admit.)

And RND the least used? I find that surprising, given how many games are written in BASIC.
Last edited by ParadigmShifter on Tue Oct 03, 2023 12:33 pm, edited 1 time in total.
SkoolKid
Manic Miner
Posts: 416
Joined: Wed Nov 15, 2017 3:07 pm

Re: A statistical BASIC thought

Post by SkoolKid »

ParadigmShifter wrote: Tue Oct 03, 2023 12:29 pm Guess you can't search easily for GO TO and GO SUB either then ;)
Very true!

I'll see if I can refine the search algorithm a bit. Also, it occurs to me that CODE would be missed because it's typically smushed up against "".
ParadigmShifter
Manic Miner
Posts: 944
Joined: Sat Sep 09, 2023 4:55 am

Re: A statistical BASIC thought

Post by ParadigmShifter »

Sounds to me like you need to strip non-alphanumerics like ( and ), which is probably why INT and RND aren't showing up?

EDIT: Can't you just parse the basic code in RAM for keyword tokens? They are all in a contiguous block > chr$ 128. Or is that way more complicated than it is worth

EDIT2: Pretty sure CONTINUE, RUN, NEW wouldn't be in many programs either - the latter two mainly because they clear the variables. Maybe RUN for a restart but that's a bit weird ;)
Last edited by ParadigmShifter on Tue Oct 03, 2023 12:46 pm, edited 3 times in total.
Stefan
Manic Miner
Posts: 823
Joined: Mon Nov 13, 2017 9:51 pm
Location: Belgium

Re: A statistical BASIC thought

Post by Stefan »

SkoolKid wrote: Tue Oct 03, 2023 12:02 pm Off-topic, but I don't use these shortcuts. Instead I use:

Ctrl-Insert - copy
Shift-Delete - cut
Shift-Insert - paste

Is that unusual?
Yes. :D
ParadigmShifter
Manic Miner
Posts: 944
Joined: Sat Sep 09, 2023 4:55 am

Re: A statistical BASIC thought

Post by ParadigmShifter »

I used to use the Brief editor on DOS when that was the best one, it had those keys.

VI and VIM can do one though, take that uber-nerds!
SkoolKid
Manic Miner
Posts: 416
Joined: Wed Nov 15, 2017 3:07 pm

Re: A statistical BASIC thought

Post by SkoolKid »

ParadigmShifter wrote: Tue Oct 03, 2023 12:35 pm EDIT: Can't you just parse the basic code in RAM for keyword tokens? They are all in a contiguous block > chr$ 128. Or is that way more complicated than it is worth
That's too complicated for now. I'd have to skip over floating point numbers, for example. Maybe later.

Anyway, quick and dirty search #2 gives the top 15 as:

555199 - LET
370137 - TO
352458 - AT
331535 - PRINT
314445 - IF
310840 - THEN
209552 - VAL
172524 - INK
151612 - IN
142573 - GO SUB
109846 - AND
103828 - RETURN
98984 - FOR
97469 - DATA
96172 - NEXT

And the bottom 15:

3070 - LN
3029 - CAT
3006 - MERGE
2626 - CONTINUE
2594 - POINT
2438 - FORMAT
2157 - VERIFY
1976 - ATN
1582 - ERASE
1503 - SQR
1475 - ASN
1322 - ACS
1319 - VAL$
857 - OPEN #
298 - CLOSE #
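
For reference, the token-based approach suggested earlier in the thread could look roughly like the sketch below. It is not the algorithm that produced the figures above. It assumes the caller has already extracted the tokenized program area from a snapshot (on a 48K machine that is the memory between the addresses held in the PROG and VARS system variables). The token table is typed from memory of the 48K ROM's token order and should be checked against the manual; `count_tokens` is a hypothetical helper name.

```python
from collections import Counter

# Keyword tokens of the 48K Spectrum, codes 0xA5 (RND) to 0xFF (COPY).
# Assumption: table typed from memory, verify against the manual.
TOKENS = [
    "RND", "INKEY$", "PI", "FN", "POINT", "SCREEN$", "ATTR", "AT", "TAB",
    "VAL$", "CODE", "VAL", "LEN", "SIN", "COS", "TAN", "ASN", "ACS", "ATN",
    "LN", "EXP", "INT", "SQR", "SGN", "ABS", "PEEK", "IN", "USR", "STR$",
    "CHR$", "NOT", "BIN", "OR", "AND", "<=", ">=", "<>", "LINE", "THEN",
    "TO", "STEP", "DEF FN", "CAT", "FORMAT", "MOVE", "ERASE", "OPEN #",
    "CLOSE #", "MERGE", "VERIFY", "BEEP", "CIRCLE", "INK", "PAPER", "FLASH",
    "BRIGHT", "INVERSE", "OVER", "OUT", "LPRINT", "LLIST", "STOP", "READ",
    "DATA", "RESTORE", "NEW", "BORDER", "CONTINUE", "DIM", "REM", "FOR",
    "GO TO", "GO SUB", "INPUT", "LOAD", "LIST", "LET", "PAUSE", "NEXT",
    "POKE", "PRINT", "PLOT", "RUN", "SAVE", "RANDOMIZE", "IF", "CLS",
    "DRAW", "CLEAR", "RETURN", "COPY",
]
FIRST_TOKEN = 0xA5
NUMBER_MARKER = 0x0E  # followed by a hidden 5-byte floating-point value
QUOTE = 0x22
REM = 0xEA


def count_tokens(program: bytes) -> Counter:
    """Count keyword tokens in a tokenized BASIC program area."""
    counts = Counter()
    pos = 0
    while pos + 4 <= len(program):
        # Line header: 2-byte line number (big-endian), which is not needed
        # here, then the 2-byte length of the line text (little-endian).
        length = int.from_bytes(program[pos + 2:pos + 4], "little")
        body = program[pos + 4:pos + 4 + length]
        pos += 4 + length
        in_string = False
        i = 0
        while i < len(body):
            byte = body[i]
            if byte == QUOTE:
                in_string = not in_string
            elif not in_string:
                if byte == NUMBER_MARKER:
                    i += 6  # skip the marker and its 5 data bytes
                    continue
                if byte >= FIRST_TOKEN:
                    counts[TOKENS[byte - FIRST_TOKEN]] += 1
                    if byte == REM:
                        break  # the rest of the line is comment text
            i += 1
    return counts
```

Skipping the five bytes after each 0x0E marker is the "skip over floating point numbers" step: those bytes can hold any value, including ones that look like tokens. Token bytes inside string literals and after REM are ignored as well, which also sidesteps the IF-inside-a-REM kind of miscount. Whether tokens inside strings should count is a judgment call; this sketch leaves them out.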
ParadigmShifter
Manic Miner
Posts: 944
Joined: Sat Sep 09, 2023 4:55 am

Re: A statistical BASIC thought

Post by ParadigmShifter »

GO TO must be higher than GO SUB. Is it falsely detecting it as the TO keyword instead?

EDIT: You should also see if it roughly follows Zipf's law

https://en.wikipedia.org/wiki/Zipf%27s_law
SkoolKid
Manic Miner
Posts: 416
Joined: Wed Nov 15, 2017 3:07 pm

Re: A statistical BASIC thought

Post by SkoolKid »

ParadigmShifter wrote: Tue Oct 03, 2023 12:48 pm GO TO must be higher than GO SUB. Is it falsely detecting it as the TO keyword instead?
Yep. Here's the top 15 from the third and final iteration of this now not-so-quick and dirty algorithm:

555199 - LET
352458 - AT
331535 - PRINT
314445 - IF
310840 - THEN
222733 - GO TO
209552 - VAL
172524 - INK
151612 - IN
147404 - TO
142576 - GO SUB
109846 - AND
103828 - RETURN
98984 - FOR
97469 - DATA
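
The GO TO versus TO mix-up is a general hazard of searching a text listing. Here is a minimal sketch of one way to avoid it, assuming the listings are plain text with keywords in capitals. It is not the search used for the figures above; `KEYWORDS` is a small illustrative subset, and `build_pattern` and `count_keywords` are hypothetical names.

```python
import re
from collections import Counter

# A small, illustrative subset of keywords; a real run would list them all.
KEYWORDS = [
    "GO TO", "GO SUB", "TO", "AT", "DATA", "IN", "INK", "INKEY$", "PRINT",
    "VAL", "VAL$", "CODE", "FOR", "IF", "THEN", "LET", "RETURN",
    "OPEN #", "CLOSE #",
]


def build_pattern(keywords):
    # Try the longest keywords first, so "GO TO" wins over "TO"
    # and "INK" wins over "IN".
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = "|".join(re.escape(word) for word in ordered)
    # Reject matches glued to other letters, so the AT inside DATA is
    # ignored, but allow punctuation straight after, as in CODE"".
    return re.compile(r"(?<![A-Za-z])(?:" + alternation + r")(?![A-Za-z$])")


PATTERN = build_pattern(KEYWORDS)


def count_keywords(listing: str) -> Counter:
    """Count keyword occurrences in a plain-text BASIC listing."""
    return Counter(PATTERN.findall(listing))
```

This still scans string literals and REM text, and a variable whose name happens to spell a keyword in capitals would be counted too, so the totals are only approximate.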
ParadigmShifter
Manic Miner
Posts: 944
Joined: Sat Sep 09, 2023 4:55 am

Re: A statistical BASIC thought

Post by ParadigmShifter »

Looks like it is following Zipf's law. I'd say you had about 1,100,000 keywords in the data set?

EDIT: Out by a factor of 10 ;) Maybe up to 1.5M anyway.

EDIT2: Nope, there's way more than that lol ;) It's too middle of the day to do maths. I'm not very familiar with Zipf's law, but I'm sure it will apply here.
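
A rough way to check the Zipf idea against the posted numbers: in its simplest form (exponent 1) Zipf's law predicts that the keyword at rank r appears about f(1)/r times, where f(1) is the count of the top keyword. The sketch below uses the third-iteration top 15 from this thread. The function names are hypothetical, and the figure of 91 distinct keywords is an assumption based on the 48K Spectrum's token set.

```python
# Counts copied from the third-iteration top 15 posted in this thread.
TOP_15 = [
    ("LET", 555199), ("AT", 352458), ("PRINT", 331535), ("IF", 314445),
    ("THEN", 310840), ("GO TO", 222733), ("VAL", 209552), ("INK", 172524),
    ("IN", 151612), ("TO", 147404), ("GO SUB", 142576), ("AND", 109846),
    ("RETURN", 103828), ("FOR", 98984), ("DATA", 97469),
]


def zipf_table(ranked):
    """Return (rank, keyword, count, predicted, ratio) rows for f(1)/r."""
    top = ranked[0][1]
    rows = []
    for rank, (name, count) in enumerate(ranked, start=1):
        predicted = top / rank
        rows.append((rank, name, count, round(predicted), count / predicted))
    return rows


def harmonic(n):
    """Sum of 1/k for k = 1..n."""
    return sum(1.0 / k for k in range(1, n + 1))


def zipf_total_estimate(top_count, distinct_keywords):
    # Under a pure 1/r law the grand total is f(1) * (1 + 1/2 + ... + 1/n).
    return top_count * harmonic(distinct_keywords)


if __name__ == "__main__":
    for rank, name, count, predicted, ratio in zipf_table(TOP_15):
        print(f"{rank:2d} {name:7s} {count:7d} {predicted:7d} {ratio:5.2f}")
    # Assumption: 91 distinct keyword tokens on the 48K Spectrum.
    print(round(zipf_total_estimate(TOP_15[0][1], 91)))
```

A ratio above 1 means the keyword is more frequent than the pure 1/r curve predicts. The last line estimates the total number of keywords the same law would imply, which is only meaningful if the fit is reasonable.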
spider
Dynamite Dan
Posts: 1106
Joined: Wed May 01, 2019 10:59 am
Location: Derby, UK

Re: A statistical BASIC thought

Post by spider »

equinox wrote: Mon Oct 02, 2023 10:35 pm The Spectrum's one-key BASIC input system (A=NEW, B=BORDER, C=CONTINUE) is very unusual and possibly unique, but in my opinion it's actually pretty good: once you learn the main keywords you can write BASIC very fast. There is the down-side where you occasionally need something weird like SGN and can't find the bugger.
I do agree. On the rubber-keyed machine I could usually hammer out BASIC very quickly. :)
Turtle_Quality
Manic Miner
Posts: 508
Joined: Fri Dec 07, 2018 10:19 pm

Re: A statistical BASIC thought

Post by Turtle_Quality »

Quite likely to have more RETURNs than GOSUBs

GOSUB <key-routine>

LET i$=INKEY$
IF i$="Q" THEN LET Y = Y + 1: RETURN
IF i$="A" THEN LET Y = Y - 1: RETURN
IF i$="O" THEN LET X = X - 1: RETURN
IF i$="P" THEN LET X = X + 1
RETURN
Definition of loop : see loop
equinox
Dynamite Dan
Posts: 1079
Joined: Mon Oct 08, 2018 1:57 am
Location: SE England

Re: A statistical BASIC thought

Post by equinox »

I have since realised that there was (apparently) a rule whereby A-Z were "commands" and not "functions". There's no good reason for this, but...
Yeah I wish we just had PUSH and POP (like a Sam Coupé!). sadly speccy basic with GOSUB still means that "n" is global -- it's horror...

Last night I went to the pub down the road and anyway -- never mind -- commands and functions ain't the same thing. (Mr Scary Yelly Beard was in her face going "i love you, i own you", it was so bad. When he was vanished I said "you realise you are being abused, there are organisations out there to help you" -- BRB GETTING A BEER)

TMD2003 (earlier in this thread) seemed to make the same point, but without noticing the issue.

8-bit BASIC has that general horror of "either ints or strings". speccy basic also realised that we can only possibly want 26 strings (A$-Z$) and they can't have any name beyond A to Z. (CONFIRMED!) interesting. i've got half an idea that we can't use multi-letter names even for integer arrays like A(123) but i can't fricken remember. anyway. spectrum basic does suck. Sam Coupé BASIC is definitely the best BASIC in the world ever. i've bored the "Discord" Speccy online chat nerds about this, very often, but it's true, Sam BASIC is really a world-beater, it's probably even better than Microsoft QuickBASIC, which expected some megabytes of RAM.

Love, xox, etc. I will release some shizz soon.
equinox
Dynamite Dan
Posts: 1079
Joined: Mon Oct 08, 2018 1:57 am
Location: SE England

Re: A statistical BASIC thought

Post by equinox »

Turtle_Quality wrote: Tue Oct 03, 2023 7:05 pm Quite likely to have more RETURNs than GOSUBs

GOSUB <key-routine>

LET i$=INKEY$
IF i$="Q" THEN LET Y = Y + 1: RETURN
IF i$="A" THEN LET Y = Y - 1: RETURN
IF i$="O" THEN LET X = X - 1: RETURN
IF i$="P" THEN LET X = X + 1
RETURN
I like the idea of "more RETURNs than GOSUBs" being what yer mum would tell you, as a traditional proverb.
= don't have more chickens than you can hatch

"hey mum! i just got a call from Ocean Software! they said they love my art, and they want to hire me as a graphic artist"
"wow Johnny that's great"
"so i drew a screen for Robocop 4, and Terminator 5, and all these films that are gonna come out--"
"now wait Johnny, don't make more returns than gosubs!"
equinox
Dynamite Dan
Posts: 1079
Joined: Mon Oct 08, 2018 1:57 am
Location: SE England

Re: A statistical BASIC thought

Post by equinox »

ParadigmShifter wrote: Tue Oct 03, 2023 12:29 pm It makes no sense for RETURN to be listed as one of the top keywords but GO SUB isn't.
Many callers and one call-ee :)
If we didn't have DEF FN and FN, then we would have to use GOSUB as the only way to perform a function (in which case, there is one RETURN [for the function] but lots of GOSUBS [every caller]). God, writing Speccy BASIC is hell, I'm writing a freaking "crap games competition" game, but it's still hellish -- I just expect a little scope -- oh -- owww -- aahhhh
I love it.
IvanBasic
Drutt
Posts: 48
Joined: Mon May 13, 2019 1:24 pm

Re: A statistical BASIC thought

Post by IvanBasic »

SkoolKid wrote: Tue Oct 03, 2023 1:01 pm Yep. Here's the top 15 from the third and final iteration of this now not-so-quick and dirty algorithm:

...
314445 - IF
310840 - THEN
IF and THEN should be exactly the same: there is no way to write a line with IF without THEN, and vice versa. This fact doesn't invalidate your valuable statistics.

(Edit: except if those commands are inserted inside a REM comment, or an alphanumeric DATA list)