ParadigmShifter wrote: ↑Sun May 26, 2024 9:02 pm
If you up the CPU speed you may as well improve the CPU too though really.
Z80 is the nicest (and easiest IMO) 8-bit CPU to code for in ASM though I think, but Z80 has some issues.
* Non-orthogonal instruction set, i.e. you can't do the same thing with all registers. You'd want an ADD B, C instead of only being able to add to A.
* Wants a barrel shifter so you can shift and roll more than 1 bit at a time, e.g. SRL B, 2 to shift B down by 2. Also you want 16 bit shifts and rolls.
* You may as well throw in 8 and 16 bit multiply (probably have to make those non orthogonal of course since not many registers). And then you may as well add in a DIV as well as a MUL
* I know Ketmar wants EX SP, SP' which makes sense too
* I'd want PowerPC style instructions where you don't have to set the flags if you don't want
* Register autoincrement and decrement option (don't really need SP then either, can use any register as a stack), like the 68000 had. Don't need LDI either then. Handle LDIR with a REP prefix code like x86 maybe?
* Ability to load flags register with whatever you want to help out with stack abuse 16 bit reads/writes
* Hardware breakpoint instruction
* Obviously the default interrupt vector should be writable too (like on Amstrad I think?) so you aren't stuck with the ROM default IRQ.
* Multiple PUSH/POP maybe like ARM does? ARM allows you to push/pop several registers at once with one instruction.
* Atomic exchange any 8 or 16 bit register with another.
The 6809 has half of that and the 68HC16 probably has everything. The 6309 is somewhere in between. Really, if you look at "modern" designs with an 8-bit data bus, there is little if anything to improve.
The only exercise left is to cut down features to keep the transistor count around 10K (the Z80 has about 8.5K, the 6502 about 3.5K) so that your CPU feels a little bit authentic, and to organize the ISA to squeeze the maximum out of a narrow data path (modern beefed-up designs really don't care, as they are supposed to use fast static RAMs).
The graphics system has the same issue: if you want something in the fashion of a home computer, you will not get anything better than what the CPC has, because of memory bandwidth. All early consoles like the NES or Sega Master System had to use separate video RAM for graphics, because video would otherwise consume all the memory bandwidth (even the C64 cheats a little bit, with the VIC-II using a 12-bit wide data bus and 4-bit wide color RAM). You can introduce something like a display list with a color palette and memory pointer per scanline, as you can read a few bytes during horizontal blanking, but that is it. And you can easily read a few more bytes for playback of samples, but that is it. Any graphics of at least Atari ST quality really needs a 16-bit data path or separate memory for video data.
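To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch in Python. The function names are made up for this post, and the 40 µs visible-line time is an assumed round figure for a 320-pixel-wide mode, not a measurement of any particular machine.

```python
def bytes_per_line(width_px, bits_per_pixel):
    """Bytes of bitmap data needed for one scanline."""
    return width_px * bits_per_pixel // 8


def fetch_rate_needed(width_px, bits_per_pixel, active_line_us):
    """Bytes per microsecond the video circuit must fetch while the line is visible."""
    return bytes_per_line(width_px, bits_per_pixel) / active_line_us


ACTIVE_LINE_US = 40  # assumption: visible part of a scanline, in microseconds

four_colour = fetch_rate_needed(320, 2, ACTIVE_LINE_US)     # 2 bits per pixel
sixteen_colour = fetch_rate_needed(320, 4, ACTIVE_LINE_US)  # 4 bits per pixel
print(four_colour, sixteen_colour)
```

Under those assumptions the 16-colour mode needs twice as many bytes per line as the 4-colour one, so the video fetch either has to run twice as fast on an 8-bit bus or read 16 bits at a time.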
And then you may as well add in an FPU stack too for floating point (it would probably want to be 32 bit floats rather than the 40 bit floats the Speccy uses though). If you are very bold allow floating point and integer instructions to be interleaved so it can do 2 instructions simultaneously if you do one of each (Pentium did that).
Well, while designing a simple 8-bit CPU can be a fun way to kill half a year or so, designing a coprocessor seems like the next level. Though FP arithmetic would already benefit hugely from a barrel shifter and 32-bit operations.
Add some cache memory that is faster to access than normal memory either with a global page register (256 bytes of 256 byte aligned memory anywhere) or like what 6502 does with its zero page addressing.
These zero pages are just simplified index registers. Add a fast indexed addressing mode and you really don't need them.
One thing I really like about the 6809 is its index registers. It can index through X, Y, U and S (the stack pointer) with modes like [R], [R + 5-bit signed offset], [R + 8-bit offset], [R + 16-bit offset], [R post-increment], [R post-increment by 2], [R pre-decrement], [R pre-decrement by 2], [R + 8-bit accumulator] and [R + 16-bit accumulator]. You don't really need more (well, the 68HC16 has [R post-increment/pre-decrement by a 3-bit value]).
And three index registers seem to be enough most of the time.
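For anyone who hasn't used these modes, a minimal Python sketch of how the effective address could be worked out for each of them. The mode names ("off5", "postinc2" and so on) are labels I made up for this sketch, not 6809 assembler syntax, and it ignores the indirect and PC-relative variants.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def effective_address(regs, reg, mode, operand=0):
    """Return the 16-bit effective address; auto inc/dec modes update regs[reg]."""
    base = regs[reg]
    if mode == "none":
        ea = base
    elif mode in ("off5", "off8", "off16"):
        # Constant offset of 5, 8 or 16 bits, treated as signed.
        ea = base + sign_extend(operand, int(mode[3:]))
    elif mode in ("postinc1", "postinc2"):
        # Use the register first, then bump it by 1 or 2.
        ea = base
        regs[reg] = (base + int(mode[-1])) & 0xFFFF
    elif mode in ("predec1", "predec2"):
        # Drop the register by 1 or 2 first, then use it.
        regs[reg] = (base - int(mode[-1])) & 0xFFFF
        ea = regs[reg]
    elif mode == "acc8":
        # operand holds the 8-bit accumulator value, sign-extended.
        ea = base + sign_extend(operand, 8)
    elif mode == "acc16":
        # operand holds the 16-bit accumulator value.
        ea = base + operand
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ea & 0xFFFF
```

With pre-decrement and post-increment on any index register, every one of them can serve as a stack pointer, which is the same point made in the quoted list about autoincrement.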
And yes, the 6809 has a direct page (256 bytes) that the DP register can place anywhere in 64K. (To be honest, it has to, because it has just two 8-bit accumulators that also act as one 16-bit accumulator. IMHO, a design with a separate 8-bit accumulator and a separate 16-bit accumulator would be better.)
On the other hand, what I really like about the Z80 is that it lets you artificially concatenate its registers to form 32-bit values: ADD IX,BC; ADC HL,DE to add 32 bits, or rotate B, C, D, E in turn to shift 32 bits.
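In case the carry chaining isn't obvious, here is a small Python model of both tricks. The function names are mine, and the shift assumes the usual SRL B; RR C; RR D; RR E sequence, which is one way to do it rather than the only one.

```python
def add16(a, b, carry_in=0):
    """16-bit add returning (result, carry_out), like ADD/ADC on a register pair."""
    total = a + b + carry_in
    return total & 0xFFFF, total >> 16


def add32_via_pairs(hl, ix, de, bc):
    """Add DE:BC to HL:IX, treating each pair of pairs as one 32-bit value."""
    ix, carry = add16(ix, bc)          # ADD IX,BC  (low words, sets carry)
    hl, carry = add16(hl, de, carry)   # ADC HL,DE  (high words plus carry)
    return hl, ix, carry


def shift_right32_via_bytes(b, c, d, e):
    """Shift B:C:D:E right by one bit, as SRL B; RR C; RR D; RR E would."""
    carry = 0                          # SRL shifts a zero into the top bit
    out = []
    for reg in (b, c, d, e):
        out.append((carry << 7) | (reg >> 1))  # old carry enters at bit 7
        carry = reg & 1                        # bit 0 falls out into carry
    return tuple(out), carry
```

For example, add32_via_pairs(0x0001, 0xFFFF, 0x0000, 0x0001) models 0x0001FFFF + 1: the low add wraps IX to 0 and sets carry, and the ADC then bumps HL to 2.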