A statistical BASIC thought

Posted: Mon Oct 02, 2023 10:35 pm
by equinox
The Spectrum's one-key BASIC input system (A=NEW, B=BORDER, C=CONTINUE) is very unusual and possibly unique, but in my opinion it's actually pretty good: once you learn the main keywords you can write BASIC very fast. There is the down-side where you occasionally need something weird like SGN and can't find the bugger.

Clearly the most common commands (like PRINT, REM, RUN) were allocated single keys, whereas weird stuff like CLOSE# and VAL$ was relegated to various shifts and extend-modes. Also, they tried to keep the main keys somewhat alphabetical: O is POKE, but O is close to P, and P was needed for PRINT. And so on. Still, looking back, there are some real mistakes regarding efficiency. I mean: should COPY (Z) really get a single key? Of course not. (Maybe they assumed everybody would buy a ZX Printer: reminds one of the various unwanted "apps" pushed by Windows 10.)

In terms of BASIC programming, it might have been much wiser to put (say) INKEY$ or RND on the key Z. I just had an interesting idea: perhaps somebody would even like to run a quick (non-BASIC lol) program over our archive of BASIC, and find out the most common keywords, statistically, and how they could have been placed to keep things super-efficient. (We must however consider that CONTINUE, RUN, SAVE, LOAD etc. are common on the command line, but rare inside a program.)

wot do u reckon m8?

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 8:33 am
by AndyC
Now I'm wondering why ctrl+Z (or command+z if you're a weird Mac type) is the shortcut for COPY. Is it some grand conspiracy?

Probably the printer thing though. Back then everyone was assumed to need a printer, whereas now people look at you weird for wanting one.

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 9:19 am
by Stefan
AndyC wrote: Tue Oct 03, 2023 8:33 am Now I'm wondering why ctrl+Z (or command+z if you're a weird Mac type) is the shortcut for COPY. Is it some grand conspiracy?
As in copies what you just did into limbo?

ctrl+c = copy
ctrl+x = cut
ctrl+v = paste
ctrl+z = undo
ctrl+y = redo

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 9:22 am
by AndyC
This is why you shouldn't think too hard about things before drinking coffee.

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 10:07 am
by ParadigmShifter
Kernighan and Ritchie analysed tons of Pascal programs when they were designing C, which is why the equality operator is == (as opposed to = in Pascal) and assignment is = (as opposed to := in Pascal): assignment is used a lot more than testing for equality.

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 11:53 am
by TMD2003
equinox wrote: Mon Oct 02, 2023 10:35 pm In terms of BASIC programming, it might have been much wiser to put (say) INKEY$ or RND on the key Z. I just had an interesting idea: perhaps somebody would even like to run a quick (non-BASIC lol) program over our archive of BASIC, and find out the most common keywords, statistically, and how they could have been placed to keep things super-efficient. (We must however consider that CONTINUE, RUN, SAVE, LOAD etc. are common on the command line, but rare inside a program.)

wot do u reckon m8?
COPY on key Z was a bit of a mis-step but INKEY$ and RND in the same place wouldn't work because they're functions that don't start a line and by the time they're needed, the cursor will be L and not K. About the only sensible keyword I can think of to put on key Z instead would be STOP, which was a K-cursor keyword (key S) on the ZX80 but had moved to shift-A by the arrival of the ZX81.

The ZX80 had some seemingly questionable choices as to where to put each keyword that the ZX81 fixed:

[Image: ZX80 keyboard layout]

PRINT on O instead of P, LET on K instead of L... but there must have been a good reason for it, as with only 4K to play with in the ROM, reading the keyboard must have meant testing bits sequentially and the programmers couldn't afford to waste a single byte scanning a key that would do nothing. That must be why it's the outermost keys P, L, M and Z that have no keyword on them. (Has anyone scoured the ZX80 ROM to confirm this? It'd take me a week.)

OR looks out of place on shift-B - couldn't it have been shift-1 for OR and shift-B for NOT? AND IF NOT why NOT?

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 12:02 pm
by SkoolKid
Stefan wrote: Tue Oct 03, 2023 9:19 am ctrl+c = copy
ctrl+x = cut
ctrl+v = paste
Off-topic, but I don't use these shortcuts. Instead I use:

Ctrl-Insert - copy
Shift-Delete - cut
Shift-Insert - paste

Is that unusual?

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 12:24 pm
by SkoolKid
On-topic: I ran a check of all the BASIC listings in the snapshots produced by the t2s files in the t2sfiles repository, and the top 10 most used BASIC keywords and their number of appearances are:

552181 - LET
350122 - TO
327669 - PRINT
307110 - IF
306159 - THEN
186846 - AT
134457 - INK
104069 - AND
100160 - RETURN
96485 - DATA

The bottom 10 are:

545 - COS
542 - ATN
509 - ACS
477 - TAN
477 - SQR
469 - INKEY$
453 - VAL$
292 - <=
167 - >=
75 - RND

(Note: I excluded 'OPEN #' and 'CLOSE #' from the check because they have spaces in them and so are more difficult to spot.)

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 12:29 pm
by ParadigmShifter
Guess you can't search easily for GO TO and GO SUB either then ;)

It makes no sense for RETURN to be listed as one of the top keywords but GO SUB isn't. RETURN without GO SUB error!

Surprised trig functions and SQR are less popular than LN? Logarithms don't come up very often at all in my experience. (Inverse trig functions rare I admit).

And RND the least used? I find that surprising in games written in BASIC?

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 12:33 pm
by SkoolKid
ParadigmShifter wrote: Tue Oct 03, 2023 12:29 pm Guess you can't search easily for GO TO and GO SUB either then ;)
Very true!

I'll see if I can refine the search algorithm a bit. Also, it occurs to me that CODE would be missed because it's typically smushed up against "".

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 12:35 pm
by ParadigmShifter
Sounds like you need to strip nonalphanumerics like ( and ) to me, probably why INT and RND aren't showing up?

EDIT: Can't you just parse the BASIC code in RAM for keyword tokens? They are all in a contiguous block > chr$ 128. Or is that way more complicated than it is worth?

EDIT2: Pretty sure CONTINUE, RUN, NEW wouldn't be in many programs either - latter mainly because they clear the variables. Maybe RUN for a restart but that's a bit weird ;)
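
A minimal sketch of the token-walking idea, in Python rather than BASIC. It assumes the usual 48K program layout: each line is a 2-byte line number (high byte first), a 2-byte length (low byte first), then the tokenised text ending in 0x0D, with every numeric literal followed by a 0x0E marker and a 5-byte binary value that has to be skipped. On the 48K machine the keyword tokens run from 0xA5 (RND) to 0xFF (COPY). The names `count_tokens` and `TOKENS` are made up for illustration, the token table is deliberately partial, and extracting the program bytes from a snapshot is not shown.

```python
from collections import Counter

# Partial token table (assumption: 48K Spectrum codes; extend as needed).
TOKENS = {
    0xA5: "RND", 0xA6: "INKEY$", 0xCB: "THEN", 0xCC: "TO",
    0xEA: "REM", 0xEC: "GO TO", 0xED: "GO SUB", 0xF1: "LET",
    0xF5: "PRINT", 0xFA: "IF", 0xFE: "RETURN",
}

def count_tokens(program: bytes) -> Counter:
    """Count keyword tokens in a tokenised BASIC program area."""
    counts = Counter()
    pos = 0
    while pos + 4 <= len(program):
        # Bytes 0-1 are the line number; bytes 2-3 are the body length.
        length = program[pos + 2] | (program[pos + 3] << 8)
        body = program[pos + 4 : pos + 4 + length]
        pos += 4 + length
        i = 0
        in_string = False
        while i < len(body):
            b = body[i]
            if b == 0x0E:
                i += 6  # skip the marker plus the 5-byte number
                continue
            if b == 0x22:  # a double quote toggles string state
                in_string = not in_string
            elif b >= 0xA5 and not in_string:
                counts[TOKENS.get(b, f"0x{b:02X}")] += 1
                if b == 0xEA:
                    break  # ignore everything after REM
            i += 1
    return counts
```

Skipping the hidden numbers matters because their bytes can look like tokens: the stored form of 250 contains 0xFA, which is also the code for IF.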

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 12:36 pm
by Stefan
SkoolKid wrote: Tue Oct 03, 2023 12:02 pm Off-topic, but I don't use these shortcuts. Instead I use:

Ctrl-Insert - copy
Shift-Delete - cut
Shift-Insert - paste

Is that unusual?
Yes. :D

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 12:38 pm
by ParadigmShifter
I used to use the Brief editor on DOS when that was the best one, it had those keys.

VI and VIM can do one though, take that uber-nerds!

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 12:46 pm
by SkoolKid
ParadigmShifter wrote: Tue Oct 03, 2023 12:35 pm EDIT: Can't you just parse the BASIC code in RAM for keyword tokens? They are all in a contiguous block > chr$ 128. Or is that way more complicated than it is worth?
That's too complicated for now. I'd have to skip over floating point numbers, for example. Maybe later.

Anyway, quick and dirty search #2 gives the top 15 as:

555199 - LET
370137 - TO
352458 - AT
331535 - PRINT
314445 - IF
310840 - THEN
209552 - VAL
172524 - INK
151612 - IN
142573 - GO SUB
109846 - AND
103828 - RETURN
98984 - FOR
97469 - DATA
96172 - NEXT

And the bottom 15:

3070 - LN
3029 - CAT
3006 - MERGE
2626 - CONTINUE
2594 - POINT
2438 - FORMAT
2157 - VERIFY
1976 - ATN
1582 - ERASE
1503 - SQR
1475 - ASN
1322 - ACS
1319 - VAL$
857 - OPEN #
298 - CLOSE #

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 12:48 pm
by ParadigmShifter
GO TO must be higher than GO SUB. Is it falsely detecting it as the TO keyword instead?

EDIT: You should also see if it roughly follows Zipf's law

https://en.wikipedia.org/wiki/Zipf%27s_law

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 1:01 pm
by SkoolKid
ParadigmShifter wrote: Tue Oct 03, 2023 12:48 pm GO TO must be higher than GO SUB. Is it falsely detecting it as the TO keyword instead?
Yep. Here's the top 15 from the third and final iteration of this now not-so-quick and dirty algorithm:

555199 - LET
352458 - AT
331535 - PRINT
314445 - IF
310840 - THEN
222733 - GO TO
209552 - VAL
172524 - INK
151612 - IN
147404 - TO
142576 - GO SUB
109846 - AND
103828 - RETURN
98984 - FOR
97469 - DATA
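
The GO TO versus TO mix-up is the classic substring problem with text searches. The sketch below is not the script used for the figures above; it is one hedged way to avoid the problem in Python, by trying longer keywords first and refusing matches that sit inside a longer word. `KEYWORDS` is a small illustrative subset, and `build_pattern` and `count_keywords` are made-up names.

```python
import re
from collections import Counter

# Illustrative subset only; a real run would list every keyword.
KEYWORDS = ["LET", "PRINT", "IF", "THEN", "TO", "GO TO", "GO SUB",
            "RETURN", "VAL", "VAL$", "IN", "INK", "INKEY$", "AT",
            "OPEN #", "CLOSE #"]

def build_pattern(keywords):
    # Longest first, so "GO TO" wins over "TO" and "INKEY$" over "IN".
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in ordered)
    # Lookarounds reject matches glued to letters (or a trailing $).
    return re.compile(r"(?<![A-Za-z])(?:" + alternation + r")(?![A-Za-z$])")

def count_keywords(listing: str, pattern=build_pattern(KEYWORDS)) -> Counter:
    return Counter(m.group(0) for m in pattern.finditer(listing))

line = '10 PRINT AT 0,0;"HI": GO TO 20: LET total=VAL a$: IF INKEY$="" THEN GO SUB 100'
counts = count_keywords(line)
print(counts["GO TO"], counts["TO"], counts["INKEY$"], counts["IN"])
```

This still counts keywords that appear inside string literals or REM comments, so it remains a quick and dirty approach rather than a parser.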

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 1:06 pm
by ParadigmShifter
Looks like it is following Zipf's law. I'd say you had about 1,100,000 keywords in the data set?

EDIT: Out by a factor of 10 ;) Maybe up to 1.5M anyway.

EDIT2: Nope there's way more than that lol ;) It's too middle of the day to do maths. Not very familiar with Zipf's law but I'm sure it will apply here.
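
For anyone who wants to do the maths properly, here is a rough Python sketch of one way to test the Zipf idea: fit a straight line to log(count) against log(rank) and look at the slope. A textbook Zipf distribution gives a slope close to -1. The figures are the top 15 from the third iteration above; `zipf_slope` is a made-up helper name, and fifteen data points are far too few for a serious fit, so treat the result as a hint only.

```python
import math

# Top 15 counts from the third iteration, in rank order.
TOP15 = [555199, 352458, 331535, 314445, 310840, 222733, 209552, 172524,
         151612, 147404, 142576, 109846, 103828, 98984, 97469]

def zipf_slope(counts):
    """Least-squares slope of log(count) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

print(f"fitted slope: {zipf_slope(TOP15):.2f}")
```

On these fifteen figures the slope comes out shallower than -1, which suggests the head of the keyword distribution is flatter than a pure Zipf curve. The full keyword list would be needed to say anything firmer.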

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 3:08 pm
by spider
equinox wrote: Mon Oct 02, 2023 10:35 pm The Spectrum's one-key BASIC input system (A=NEW, B=BORDER, C=CONTINUE) is very unusual and possibly unique, but in my opinion it's actually pretty good: once you learn the main keywords you can write BASIC very fast. There is the down-side where you occasionally need something weird like SGN and can't find the bugger.
I do agree. On the rubber-keyed machine I could usually hammer out BASIC very quickly. :)

Re: A statistical BASIC thought

Posted: Tue Oct 03, 2023 7:05 pm
by Turtle_Quality
Quite likely to have more RETURNs than GOSUBs

GOSUB <key-routine>

LET i$=INKEY$

IF i$="Q" THEN LET Y = Y + 1: RETURN

IF i$="A" THEN LET Y = Y -1: RETURN

IF i$="O" THEN LET X = X - 1: RETURN

IF i$="P" THEN LET X = X + 1

RETURN

Re: A statistical BASIC thought

Posted: Mon Oct 09, 2023 12:19 am
by equinox
I have since realised that there was (apparently) a rule whereby A-Z were "commands" and not "functions". There's no good reason for this, but...
Yeah I wish we just had PUSH and POP (like a Sam Coupé!). sadly speccy basic with GOSUB still means that "n" is global -- it's horror...

Last night I went to the pub down the road and anyway -- never mind -- commands and functions ain't the same thing. (Mr Scary Yelly Beard was in her face going "i love you, i own you", it was so bad. When he was vanished I said "you realise you are being abused, there are organisations out there to help you" -- BRB GETTING A BEER)

TMD2003 (earlier in this thread) seemed to make the same point, but without noticing the issue.

8-bit BASIC has that general horror of "either ints or strings". speccy basic also realised that we can only possibly want 26 strings (A$-Z$) and they can't have any name beyond A to Z. (CONFIRMED!) interesting. i've got half an idea that we can't use multi-letter names even for integer arrays like A(123) but i can't fricken remember. anyway. spectrum basic does suck. Sam Coupé BASIC is definitely the best BASIC in the world ever. i've bored the "Discord" Speccy online chat nerds about this, very often, but it's true, Sam BASIC is really a world-beater, it's probably even better than Microsoft QuickBASIC, which expected some megabytes of RAM.

Love, xox, etc. I will release some shizz soon.

Re: A statistical BASIC thought

Posted: Mon Oct 09, 2023 1:00 am
by equinox
Turtle_Quality wrote: Tue Oct 03, 2023 7:05 pm Quite likely to have more RETURNs than GOSUBs

GOSUB <key-routine>

LET i$=INKEY$

IF i$="Q" THEN LET Y = Y + 1: RETURN

IF i$="A" THEN LET Y = Y -1: RETURN

IF i$="O" THEN LET X = X - 1: RETURN

IF i$="P" THEN LET X = X + 1

RETURN
I like the idea of "more RETURNs than GOSUBs" being what yer mum would tell you, as a traditional proverb.
= don't have more chickens than you can hatch

"hey mum! i just got a call from Ocean Software! they said they love my art, and they want to hire me as a graphic artist"
"wow Johnny that's great"
"so i drew a screen for Robocop 4, and Terminator 5, and all these films that are gonna come out--"
"now wait Johnny, don't make more returns than gosubs!"

Re: A statistical BASIC thought

Posted: Mon Oct 09, 2023 1:05 am
by equinox
ParadigmShifter wrote: Tue Oct 03, 2023 12:29 pm It makes no sense for RETURN to be listed as one of the top keywords but GO SUB isn't.
Many callers and one call-ee :)
If we didn't have DEF FN and FN, then we would have to use GOSUB as the only way to perform a function (in which case, there is one RETURN [for the function] but lots of GOSUBS [every caller]). God, writing Speccy BASIC is hell, I'm writing a freaking "crap games competition" game, but it's still hellish -- I just expect a little scope -- oh -- owww -- aahhhh
I love it.

Re: A statistical BASIC thought

Posted: Tue Oct 10, 2023 7:53 pm
by IvanBasic
SkoolKid wrote: Tue Oct 03, 2023 1:01 pm Yep. Here's the top 15 from the third and final iteration of this now not-so-quick and dirty algorithm:

...
314445 - IF
310840 - THEN
The IF and THEN counts should be exactly the same: there is no way to write a line with IF without THEN, and vice versa. This fact doesn't invalidate your valuable statistics.

(Edit: except if those commands are inserted inside a REM comment, or an alphanumeric DATA list)
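
One hedged way to check that explanation with a text-based search is to blank out string literals and drop everything from REM onwards before counting. This is a Python sketch under those assumptions, not part of the original script, and `strip_noise` is a made-up name.

```python
import re

def strip_noise(line: str) -> str:
    """Blank out quoted strings, then cut the line at a REM statement."""
    # Replace each "..." literal with an empty pair of quotes.
    no_strings = re.sub(r'"[^"]*"', '""', line)
    # REM must stand alone, not be part of a longer word.
    rem = re.search(r"(?<![A-Za-z])REM(?![A-Za-z])", no_strings)
    return no_strings[: rem.start()] if rem else no_strings

print(strip_noise('10 PRINT "IF ONLY": REM IF THEN'))
print(strip_noise('20 IF a THEN PRINT "THEN"'))
```

If the IF and THEN totals agree after this clean-up, the mismatch was down to comments and strings.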