Spectrum Next: Technical questions for 3D engine development

Heimdall
Dizzy
Posts: 64
Joined: Sat Jun 13, 2020 12:44 pm

Spectrum Next: Technical questions for 3D engine development

Post by Heimdall »

Hi guys - I just discovered these forums today, after waiting for a week on account approval over at specnext forums (and still no response, hence googling for other spectrum forums). So, I figured I might as well start the thread here, as I can see there's few technical people from over there [around here].

Some of the scenarios I mention may already be fixed in latest core, but I recall reading about them on the forums (I perused whole programming section over there), hence why I ask if it's still an issue today.

So, I'm thinking of writing a Z80 backend for my multiplatform 3D engine, as a pet project [initially] as my current primary focus is on a different platform. I've written several of them in past (6502,6502C,68000-68040, DSP, RISC GPU, ...) but never for Z80. From initial look into the architecture (downloaded some Z80.pdfs), it appears that Z80 is quite phenomenal for an 8-bit - it has 2 sets of registers, 16-bit registers and some really nice ops, especially compared to 6502 !

Z80 extensions on Next, especially the bitshifting ops, are very welcome and will definitely speed up things like 3D transformations and scanline traversal or anything involving fixed point calculations.

Now, the questions:
1. CPU Speed: Setting 28 MHz : Register $07 appears to be able to set speed (3.5,7,14,28). Do I, as a programmer, have a full control over that without having to ask user to change speed manually ? Meaning, I set this register and poof, off I putz around at 28 MHz ?

2. CPU Speed: FPGA limit. Is 28 MHz at the limit of what the current FPGA board can pull off ? Or is there a hypothetical option in future that further speed upgrades could be made available via core update?

3. CPU Speed: Wait states at 28 MHz. One of the older threads mentioned that at this speed, there is a wait state inserted after each instruction, hence effectively slowing down the effective instruction throughput to something like 22-24 MHz. Is that still the case with latest core ?

4. CPU Speed: Real available cycles. Outside of the wait states@28 MHz, is there any system activity that would halt the CPU ? For example, on Atari, while the ANTIC chip was reading the framebuffer, the CPU was halted. This resulted in quite a few cycles "stolen" from the CPU by other chips in the system. How does Z80N fare here ? Say, at 28 MHz and 60 fps, do I really have 28,000,000 / 60 = 466,667 cycles per frame of CPU time ?

5. PAL/NTSC: 50/60 Hz: Register $06 appears to be R/W and one of the bits is reserved for 50/60. Again - is that all there is to it ? I am in NTSC land, but what happens if I force 60 Hz for PAL folks ? Hopefully in 2020 and uniform LCDs all over the world we wouldn't have to resort to having two different builds (PAL and NTSC) ?

6. LoRes Layer/Radasjimian double buffering - I intend to design everything around 128x96x256. From my current understanding it would appear it's not really possible to do proper double buffering in this mode like it's possible in other modes ? Hopefully I am just misinterpreting here ? I mean I need to be able to flip to the secondary framebuffer somehow. 16k-bank5 (primary) vs 16k-bank7 (secondary) ?

7. Bank-Switching speed : On Atari, the 6502 was able to switch banks right on the next cycle. Is it also that fast on Spectrum ? If not, what's the delay [in cycles] ?

8. Bank-Switching : MMU vs 128-style. It would appear that 128-style is quite inflexible and MMU 0-7 gives me full control over which target/source bank I need. Is there any clear disadvantage to using 8k scheme via 8 MMUs ?

9. EXX instruction: Exchanges BC, DE, and HL with their shadow registers. AF and AF' are not exchanged. Why is that ? Why can't I have a shadow accumulator exchanged ? My Z80 manual didn't mention anything about this, so I'm curious as to why Spectrum Next wouldn't allow it ? Now, I'm not complaining, compared to 6502, Z80 has 15 registers : A + 2x(B,C,D,E,H,L) + IX/IY. But why ?

As for the dev env, I downloaded CSpect and it even came bundled with SNASM assembler, with couple simple demos that will make it easy to change. In the meantime, I adjusted my rapid prototyping project in Visual Studio (C++/DirectX) to support 128x96x256 so I can quickly experiment with what's possible to do in that resolution before I dive into Assembler.
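For reference, the nominal per-frame budget from question 4 can be tabulated with a few lines of Python. This is only a back-of-envelope sketch: `cycles_per_frame` is a hypothetical helper name, the refresh rates are taken as exactly 50 and 60 Hz, and wait states, interrupts and DMA are all ignored.

```python
# Nominal clock cycles available per video frame.
# Assumption: refresh is exactly 50 or 60 Hz; wait states and DMA are ignored.

CPU_SPEEDS_MHZ = (3.5, 7.0, 14.0, 28.0)  # the speeds listed for register $07


def cycles_per_frame(clock_mhz: float, refresh_hz: int) -> int:
    """Return the nominal number of CPU cycles in one frame, rounded."""
    return round(clock_mhz * 1_000_000 / refresh_hz)


if __name__ == "__main__":
    for mhz in CPU_SPEEDS_MHZ:
        for hz in (50, 60):
            print(f"{mhz:>4} MHz @ {hz} Hz: {cycles_per_frame(mhz, hz):>7} cycles/frame")
```

At 28 MHz this gives 466,667 cycles at 60 Hz and 560,000 at 50 Hz, so a 50 Hz build would have about 20% more time per frame.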
PROSM
Manic Miner
Posts: 476
Joined: Fri Nov 17, 2017 7:18 pm
Location: Sunderland, England

Re: Spectrum Next: Technical questions for 3D engine development

Post by PROSM »

I'm not qualified to talk about any of the Next-specific points, but I can help you with your last query.
Heimdall wrote: Sat Jun 13, 2020 1:32 pm 9. EXX instruction: Exchanges BC, DE, and HL with their shadow registers. AF and AF' are not exchanged. Why is that ? Why can't I have a shadow accumulator exchanged ? My Z80 manual didn't mention anything about this, so I'm curious as to why Spectrum Next wouldn't allow it ? Now, I'm not complaining, compared to 6502, Z80 has 15 registers : A + 2x(B,C,D,E,H,L) + IX/IY. But why ?
You're right, EXX only swaps BC, DE and HL. To swap your accumulator, you'll need to use the EX AF,AF' instruction, which is available on all Z80s. You can even swap your DE and HL register pairs using EX DE, HL if needs be.
All software to-date
Working on something, as always.
Seven.FFF
Manic Miner
Posts: 744
Joined: Sat Nov 25, 2017 10:50 pm
Location: USA

Re: Spectrum Next: Technical questions for 3D engine development

Post by Seven.FFF »

1. Yes. The user can still toggle the speed manually with F8 or NMI+8 hotkey, but that can also be disabled in another nextreg.

2. It’s probably the limit. 28MHz was hard to get stable enough. In this design, SRAM settle time is the limiting factor, and SRAM is already heavily used for layer 2.

3. Yes.

4. Standard Spectrums have contended memory where the ULA stops the CPU while rendering the screen. On the Next this can be enabled (for compatibility) or disabled. At 28MHz, wait states also slow down instructions slightly. Enabling the expansion bus slows down the CPU to 3.5MHz because legacy hardware addons are not designed to run any faster.

5. Yes. Due to historical NTSC and PAL standards, a 60Hz picture has fewer lines per frame than 50Hz does. There is no room in the Next for a full screen scanbuffer and LCD timings are pretty inflexible, so the machine timings have to be bent to match. The Next can also be compatible with Timex models, and these 60Hz timings are the same as in those models or the rarer NTSC Spectrum.

6. 256 colour lores can’t be double buffered, but 16 colour radastan can (second buffer in bottom half of 128K shadow screen in 8k bank 14). ULA VRAM must be held in FPGA BRAM because SRAM can’t be switched fast enough, and there is insufficient free BRAM for four 8K VRAM banks, which double buffering jimistan would need.

7. Yes, bank switching speed is only limited by how fast the nextreg instructions are.

8. MMU switching is all you need. Paging two 8k banks in slots 6 and 7 is about as fast as a 128K page including the setup.

9. This is how the Z80 was designed. The ALU circuit is separate from the 16 bit registers, so I imagine separate switches were more efficient. Both ex af, af’ and exx are the fastest possible instructions at 4Ts each, and as a coder it is often convenient to not have your operating value switched when you switch addresses. I rarely need to use both instructions together, unless I’m preserving all registers in an ISR.
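To make the 8K paging in point 8 concrete, here is a small Python sketch of the bookkeeping. The helper names are hypothetical, and the register range $50 to $57 for MMU slots 0 to 7 is my understanding of the nextreg map rather than something stated in this thread, so check it against the current documentation. The 16K-to-8K bank relation matches point 6, where 128K bank 7 starts at 8k bank 14.

```python
# Bookkeeping for the 8 x 8 KB MMU slots that cover the 64 KB address space.
# Assumption: slots 0..7 are controlled by nextregs $50..$57.

MMU_SLOT_SIZE = 0x2000    # 8 KB per slot
MMU_NEXTREG_BASE = 0x50   # assumed nextreg for slot 0


def mmu_slot(address: int) -> int:
    """Return the MMU slot (0..7) that a 16-bit address falls into."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return address // MMU_SLOT_SIZE


def mmu_nextreg(address: int) -> int:
    """Return the (assumed) nextreg number that pages the slot holding address."""
    return MMU_NEXTREG_BASE + mmu_slot(address)


def banks_8k(bank_16k: int) -> tuple[int, int]:
    """Return the two 8 KB bank numbers that make up one 128K-style 16 KB bank."""
    return (2 * bank_16k, 2 * bank_16k + 1)
```

For example, `banks_8k(7)` returns `(14, 15)`, and an address of `0xC000` lands in slot 6, which is why replacing a 128K-style page at `0xC000` means writing slots 6 and 7.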
Robin Verhagen-Guest
SevenFFF / Threetwosevensixseven / colonel32
NXtel NXTP ESP Update ESP Reset CSpect Plugins
Alone Coder
Manic Miner
Posts: 401
Joined: Fri Jan 03, 2020 10:00 am

Re: Spectrum Next: Technical questions for 3D engine development

Post by Alone Coder »

Heimdall wrote: Sat Jun 13, 2020 1:32 pm So, I'm thinking of writing a Z80 backend for my multiplatform 3D engine, as a pet project [initially] as my current primary focus is on a different platform. I've written several of them in past (6502,6502C,68000-68040, DSP, RISC GPU, ...) but never for Z80.
What functions does your 3D engine supply?
For compatibility, I recommend Russian ATM2 standard, that is a full range of computers: ATM-Turbo2(+) @ 7 MHz 1 MB, ATM3 @ 7 MHz 4 MB, ZX Evo @ 14 MHz 4 MB, Pentagon 2.666LE @ 28 MHz nowait, 2 MB. The software library is over 400 titles and counting.
ATM2 wiring is simple, it can be supported in any FPGA based computer, including Spectrum Next.

If your engine performs slow drawing, such as texture mapping, you can write "chunky to planar" procedures for different Z80 computers. For example, 2x2 chunky c2p for ATM's hardware multicolor looks like this:

pop hl ;chunks 0,1
ldd ;top line to (de)
ld a,(hl)
ld (bc),a ;bottom line to (bc)
exx ;another screen layer
pop hl ;chunks 2,3
ldd ;top line to (de)
ld a,(hl)
ld (bc),a ;bottom line to (bc)
exx ;back to the first screen layer

Used in The Board demo:
https://www.youtube.com/watch?v=bbUGcA_ ... e=emb_logo
Heimdall
Dizzy
Posts: 64
Joined: Sat Jun 13, 2020 12:44 pm

Re: Spectrum Next: Technical questions for 3D engine development

Post by Heimdall »

Wow, just woke up and already have so many answers. Will get to each of them separately now.

Shouldn't have waited for a week on the other forum for activation that didn't happen (though, it just might - perhaps - later).
Heimdall
Dizzy
Posts: 64
Joined: Sat Jun 13, 2020 12:44 pm

Re: Spectrum Next: Technical questions for 3D engine development

Post by Heimdall »

PROSM wrote: Sat Jun 13, 2020 2:32 pm I'm not qualified to talk about any of the Next-specific points, but I can help you with your last query.
Heimdall wrote: Sat Jun 13, 2020 1:32 pm 9. EXX instruction: Exchanges BC, DE, and HL with their shadow registers. AF and AF' are not exchanged. Why is that ? Why can't I have a shadow accumulator exchanged ? My Z80 manual didn't mention anything about this, so I'm curious as to why Spectrum Next wouldn't allow it ? Now, I'm not complaining, compared to 6502, Z80 has 15 registers : A + 2x(B,C,D,E,H,L) + IX/IY. But why ?
You're right, EXX only swaps BC, DE and HL. To swap your accumulator, you'll need to use the EX AF,AF' instruction, which is available on all Z80s. You can even swap your DE and HL register pairs using EX DE, HL if needs be.
OMG. You're right - I totally missed it in the Z80 doc ! Funny thing is, while the EXX is on the right page, the EX AF,AF' is on the left page - like staring right at me. That takes a special skill to miss :lol:

So, this means that I have 16 registers at my disposal ! That is even coming close to Jaguar's 32 registers on both of its RISCs (DSP/GPU) and is absolutely destroying 6502, which only had 3 (A,X,Y).

Looks like I will actually be able to use algorithms from Jag and not really from 6502. Interesting ! Well, as long as I can keep the useable range in <0,255>, that is.
Heimdall
Dizzy
Posts: 64
Joined: Sat Jun 13, 2020 12:44 pm

Re: Spectrum Next: Technical questions for 3D engine development

Post by Heimdall »

Seven.FFF wrote: Sat Jun 13, 2020 2:42 pm 1. Yes. The user can still toggle the speed manually with F8 or NMI+8 hotkey, but that can also be disabled in another nextreg.

2. It’s probably the limit. 28MHz was hard to get stable enough. In this design, SRAM settle time is the limiting factor, and SRAM is already heavily used for layer 2.

3. Yes.

4. Standard Spectrums have contended memory where the ULA stops the CPU while rendering the screen. On the Next this can be enabled (for compatibility) or disabled. At 28MHz, wait states also slow down instructions slightly. Enabling the expansion bus slows down the CPU to 3.5MHz because legacy hardware addons are not designed to run any faster.

5. Yes. Due to historical NTSC and PAL standards, a 60Hz picture has fewer lines per frame than 50Hz does. There is no room in the Next for a full screen scanbuffer and LCD timings are pretty inflexible, so the machine timings have to be bent to match. The Next can also be compatible with Timex models, and these 60Hz timings are the same as in those models or the rarer NTSC Spectrum.

6. 256 colour lores can’t be double buffered, but 16 colour radastan can (second buffer in bottom half of 128K shadow screen in 8k bank 14). ULA VRAM must be held in FPGA BRAM because SRAM can’t be switched fast enough, and there is insufficient free BRAM for four 8K VRAM banks, which double buffering jimistan would need.

7. Yes, bank switching speed is only limited by how fast the nextreg instructions are.

8. MMU switching is all you need. Paging two 8k banks in slots 6 and 7 is about as fast as a 128K page including the setup.

9. This is how the Z80 was designed. The ALU circuit is separate from the 16 bit registers, so I imagine separate switches were more efficient. Both ex af, af’ and exx are the fastest possible instructions at 4Ts each, and as a coder it is often convenient to not have your operating value switched when you switch addresses. I rarely need to use both instructions together, unless I’m preserving all registers in an ISR.
Thanks for detailed info. Some more questions:

1. So, in short, can I, as a programmer, simply force 28 MHz at start-up of my game via NREG $07 ?
.
.
4. Oh, so this is what "contended" means. I was puzzled by it. But, as long as it can be disabled (hopefully via some NREG at start-up), all is good. I want all the cycles I can get :)
.
6. Unfortunate. I was hoping I misread. So, how exactly is it done, if one wants to use this mode ? DMA transfer after vblank ? Is that even fast enough for 12 KB of framebuffer ? Well, thinking about it, we don't need to transfer whole 12 KB. We can switch in the 8 KB and transfer just 4 KB. 4 KB should be doable in vblank, I think...
.
.
9. I was thinking of scanline traversal - where one set of registers would have Left Edge of polygon and shadow set of registers would have Right edge of polygon. In that scenario, I need all the registers I can get :)
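The left/right edge idea in point 9 can be prototyped before committing to registers. Below is a minimal Python sketch of fixed-point edge walking, assuming 8.8 fixed point (high byte is the pixel column, low byte the fraction) and two edges that span the same scanlines. The function names are hypothetical; on the Z80 the left-edge state would sit in the main register set and the right-edge state in the shadow set, with EXX switching between them.

```python
# Scanline edge walking in 8.8 fixed point.
# Assumption: both edges cover the same y range and y1 > y0.

FRAC_BITS = 8  # 8.8 format: integer column in the high byte, fraction in the low byte


def edge_step(x0: int, y0: int, x1: int, y1: int) -> int:
    """Per-scanline x increment of an edge, as a signed 8.8 fixed-point value."""
    return ((x1 - x0) << FRAC_BITS) // (y1 - y0)


def walk_spans(left, right):
    """Yield (y, x_left, x_right) for each scanline between two edges.

    Each edge is ((x0, y0), (x1, y1)) in integer pixel coordinates.
    """
    (lx0, y0), (lx1, y1) = left
    (rx0, _), (rx1, _) = right
    lx, rx = lx0 << FRAC_BITS, rx0 << FRAC_BITS   # current positions, 8.8
    ldx = edge_step(lx0, y0, lx1, y1)             # left edge increment
    rdx = edge_step(rx0, y0, rx1, y1)             # right edge increment
    for y in range(y0, y1):
        yield y, lx >> FRAC_BITS, rx >> FRAC_BITS
        lx += ldx
        rx += rdx


if __name__ == "__main__":
    for span in walk_spans(((64, 0), (32, 4)), ((64, 0), (96, 4))):
        print(span)
```

As long as x stays within 0 to 255, each edge needs one 16-bit position and one 16-bit increment, which is the kind of state that fits a register pair per value.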
Last edited by Heimdall on Sat Jun 13, 2020 11:55 pm, edited 1 time in total.
Heimdall
Dizzy
Posts: 64
Joined: Sat Jun 13, 2020 12:44 pm

Re: Spectrum Next: Technical questions for 3D engine development

Post by Heimdall »

Alone Coder wrote: Sat Jun 13, 2020 5:33 pm
Heimdall wrote: Sat Jun 13, 2020 1:32 pm So, I'm thinking of writing a Z80 backend for my multiplatform 3D engine, as a pet project [initially] as my current primary focus is on a different platform. I've written several of them in past (6502,6502C,68000-68040, DSP, RISC GPU, ...) but never for Z80.
For compatibility, I recommend Russian ATM2 standard, that is a full range of computers: ATM-Turbo2(+) @ 7 MHz 1 MB, ATM3 @ 7 MHz 4 MB, ZX Evo @ 14 MHz 4 MB, Pentagon 2.666LE @ 28 MHz nowait, 2 MB. The software library is over 400 titles and counting.
ATM2 wiring is simple, it can be supported in any FPGA based computer, including Spectrum Next.
Interesting, I wasn't aware of all those accelerated versions. So, the 28 MHz Next isn't the only fast one out there...

7 MHz - I'm not sure I want to butcher it that much. That's a 4:1 ratio in scene complexity (cycle budget)...
Approximately how many units of the 14/28 MHz are there ? Hundreds ? I need to do some research on this...

It's good to know about it as soon as possible so I can figure out if I can support more than one HW via simple #ifdef in the code.
Heimdall
Dizzy
Posts: 64
Joined: Sat Jun 13, 2020 12:44 pm

Re: Spectrum Next: Technical questions for 3D engine development

Post by Heimdall »

Alone Coder wrote: Sat Jun 13, 2020 5:33 pm
Heimdall wrote: Sat Jun 13, 2020 1:32 pm So, I'm thinking of writing a Z80 backend for my multiplatform 3D engine, as a pet project [initially] as my current primary focus is on a different platform. I've written several of them in past (6502,6502C,68000-68040, DSP, RISC GPU, ...) but never for Z80.
What functions does your 3D engine supply?

If your engine performs slow drawing, such as texture mapping, you can write "chunky to planar" procedures for different Z80 computers. For example, 2x2 chunky c2p for ATM's hardware multicolor looks like this:

pop hl ;chunks 0,1
ldd ;top line to (de)
ld a,(hl)
ld (bc),a ;bottom line to (bc)
exx ;another screen layer
pop hl ;chunks 2,3
ldd ;top line to (de)
ld a,(hl)
ld (bc),a ;bottom line to (bc)
exx ;back to the first screen layer

Used in The Board demo:
https://www.youtube.com/watch?v=bbUGcA_ ... e=emb_logo
Nice ! You got the textured characters ! I'm more a fan of flatshading, which could look quite nice at 256 colors on Next.

I have some texturing routines for axis-aligned polygons (walls/buildings/road) but am not a huge fan of pixelated textures at those resolutions - hence why I prefer flatshading, as it always looks nice and smooth.

The closest thing to compare it to would be something like StunRunner. I do have a working playable version with a 3D road (not just scanline 2D road), so a driving game is entirely within reach. With 28 MHz, something like OutRun but with polygonal cars is doable (though, admittedly, still a lot of work :) ).
Either that or something like Star Raiders (but with 3D ships).
MtM
Dizzy
Posts: 88
Joined: Sun May 17, 2020 10:09 pm

Re: Spectrum Next: Technical questions for 3D engine development

Post by MtM »

Heimdall wrote: Sat Jun 13, 2020 11:15 pm Wow, just woke up and already have so many answers. Will get to each of them separately now.

Shouldn't have waited for a week on the other forum for activation that didn't happen (though, it just might - perhaps - later).
The SpecNext forum doesn't really get much traffic, which is a pity because it is the perfect place for everything, but the community seems somewhat splintered via several FB groups, the SpecNext forum, these forums, discord, just all over the place. I do not know how some people have the time to even try and keep up with them all. Pity really because the community is very good - look at the superb replies you have received so far. I hope it continues and you write your engine, good luck with it.
Heimdall
Dizzy
Posts: 64
Joined: Sat Jun 13, 2020 12:44 pm

Re: Spectrum Next: Technical questions for 3D engine development

Post by Heimdall »

Yeah, FB is the worst possible outlet for the technical questions. It all disappears after two weeks and you can't find it anymore. Unless you friend three hundred more people...
Not to mention the horrible formatting. It's wall of text or nothing...

Forums give you option of searching for content and find info from five years ago.

It also helps that people don't tend to post their dinner photos on the forums :)


I never tried discord, though...
Seven.FFF
Manic Miner
Posts: 744
Joined: Sat Nov 25, 2017 10:50 pm
Location: USA

Re: Spectrum Next: Technical questions for 3D engine development

Post by Seven.FFF »

Heimdall wrote: Sun Jun 14, 2020 11:36 pm I never tried discord, though...
https://discord.gg/DM7n8Xa

Yes, you can switch CPU speed just by writing two bits of a nextreg.
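As a rough illustration of the "two bits" remark, the sketch below maps a requested speed to the value that would be written. The encoding shown (bits 1..0 of nextreg $07, with 0 meaning 3.5 MHz and 3 meaning 28 MHz) is my understanding of the register layout and is not spelled out in this thread, so treat it as an assumption; the names are hypothetical.

```python
# Assumed encoding of the CPU speed bits (bits 1..0 of nextreg $07).

CPU_SPEED_NEXTREG = 0x07
SPEED_BITS = {3.5: 0b00, 7.0: 0b01, 14.0: 0b10, 28.0: 0b11}


def speed_register_value(mhz: float) -> int:
    """Return the 2-bit value selecting the given CPU speed in MHz."""
    try:
        return SPEED_BITS[float(mhz)]
    except KeyError:
        raise ValueError(f"unsupported CPU speed: {mhz} MHz") from None
```

Under that assumption, forcing 28 MHz at start-up amounts to writing `speed_register_value(28)`, i.e. 3, to nextreg $07.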
Heimdall
Dizzy
Posts: 64
Joined: Sat Jun 13, 2020 12:44 pm

Re: Spectrum Next: Technical questions for 3D engine development

Post by Heimdall »

Seven.FFF wrote: Sun Jun 14, 2020 11:44 pm
Heimdall wrote: Sun Jun 14, 2020 11:36 pm I never tried discord, though...
https://discord.gg/DM7n8Xa

Yes, you can switch CPU speed just by writing two bits of a nextreg.
That's great. At least the player won't have to fiddle with the speed setting...

Thanks for invite - I just joined discord. It does, at first glance, look better suited for discussions than FB.
MtM
Dizzy
Posts: 88
Joined: Sun May 17, 2020 10:09 pm

Re: Spectrum Next: Technical questions for 3D engine development

Post by MtM »

Heimdall wrote: Mon Jun 15, 2020 12:50 am
Seven.FFF wrote: Sun Jun 14, 2020 11:44 pm https://discord.gg/DM7n8Xa

Yes, you can switch CPU speed just by writing two bits of a nextreg.
That's great. At least the player won't have to fiddle with the speed setting...

Thanks for invite - I just joined discord. It does, at first glance, look better suited for discussions than FB.
I joined that discord group a while back. I haven't visited in ages, just not fit for purpose for me, people seemed to like to go with the colourful texts and lots of on screen bling, none of which does anything. As you say people don't post as much dross in the forums. Just my opinion.
Heimdall
Dizzy
Posts: 64
Joined: Sat Jun 13, 2020 12:44 pm

Re: Spectrum Next: Technical questions for 3D engine development

Post by Heimdall »

I only joined yesterday but so far it's been much better than comparable Atari FB groups.

There's definitely quite a few people who are willing to share technical information and aren't in it just for the joy of shi*ing on other people's work, which is in very stark contrast to the Atari folks who are amongst the most toxic people on this planet I was blessed to interact with.

In less than 24 hours I already learned a lot from discord.

There's no ads like on FB and it appears to scroll freely all the way back.

It's really like IRC, from way back in the day.
Alcoholics Anonymous
Microbot
Posts: 194
Joined: Mon Oct 08, 2018 3:36 am

Re: Spectrum Next: Technical questions for 3D engine development

Post by Alcoholics Anonymous »

Just adding to what has been said already.
Heimdall wrote: Sat Jun 13, 2020 1:32 pm Now, the questions:
1. CPU Speed: Setting 28 MHz : Register $07 appears to be able to set speed (3.5,7,14,28). Do I, as a programmer, have a full control over that without having to ask user to change speed manually ? Meaning, I set this register and poof, off I putz around at 28 MHz ?
As mentioned you can set speed at any time and it will immediately change. There might be a delay of a few cycles as the clock circuit has to make sure there aren't any glitches or runt pulses in the clock.

What has not been mentioned is that the user has the power to change the cpu speed at any time too by pressing F8 (nmi+8 on a Next keyboard). This will cycle the cpu speed through 3.5, 7, 14 and 28MHz. If you don't want that to happen, you'll need to disable that key in nextreg 0x06. This hotkey is really there for original spectrum software.

Also if the expansion bus is on, the cpu speed is locked at 3.5MHz no matter the setting. This is because spectrum peripherals cannot run at faster than 3.5MHz. NextZXOS keeps the expansion bus off during normal operation so this is not something to worry about.
2. CPU Speed: FPGA limit. Is 28 MHz at the limit of what the current FPGA board can pull off ? Or is there a hypothetical option in future that further speed upgrades could be made available via core update?
We'll have to see. It is approximately the limit the way the implementation currently is with multiple clock domains in it. That might change when the design moves to a single clock domain but really there is no way to know until then.

The other thing that limits speed, besides the fpga itself, is that there is more than one independent device accessing the sram. There's the CPU and layer 2, which is allowed to sit at any location in ram. That means the available memory bandwidth has to be shared with suitable margins between each memory transaction.
3. CPU Speed: Wait states at 28 MHz. One of the older threads mentioned that at this speed, there is a wait state inserted after each instruction, hence effectively slowing down the effective instruction throughput to something like 22-24 MHz. Is that still the case with latest core ?
At 28MHz there is one wait state inserted on every memory read cycle. I'm hopeful that eventually this will be eliminated but it's still there.
This one cycle wait state also applies to dma reads from memory. With the dma programmed to do 2-cycle reads and 2-cycle writes for a total of 4 cycles per byte copied, at 28MHz that gets turned into 5 cycles per byte copied.
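To get a feel for what one wait state per memory read costs, here is a hedged Python model. It assumes that opcode fetches count as memory reads (so every instruction pays at least one extra cycle) and that writes are not stretched; neither detail is confirmed above. The T-state figures are the standard Z80 timings, the read counts are my own tally, and the instruction list is one half of the c2p loop posted earlier in the thread.

```python
# Effect of one wait state per memory read at 28 MHz.
# Assumptions: opcode fetches count as reads; memory writes add no wait state.


def stretched_tstates(tstates: int, mem_reads: int, waits_per_read: int = 1) -> int:
    """T-states after adding the wait states for every memory read."""
    return tstates + mem_reads * waits_per_read


def effective_mhz(tstates: int, mem_reads: int, clock_mhz: float = 28.0) -> float:
    """Clock rate that would give the same run time without wait states."""
    return clock_mhz * tstates / stretched_tstates(tstates, mem_reads)


# (mnemonic, documented T-states, memory reads including opcode fetches)
C2P_HALF = [
    ("pop hl", 10, 3),     # 1 fetch + 2 stack reads
    ("ldd", 16, 3),        # 2 fetches (ED prefix) + 1 data read, 1 write
    ("ld a,(hl)", 7, 2),   # 1 fetch + 1 data read
    ("ld (bc),a", 7, 1),   # 1 fetch, 1 write
    ("exx", 4, 1),         # 1 fetch
]


def sequence_cost(sequence):
    """Return (nominal T-states, stretched T-states) for an instruction list."""
    nominal = sum(item[1] for item in sequence)
    reads = sum(item[2] for item in sequence)
    return nominal, stretched_tstates(nominal, reads)
```

Under these assumptions a 4T instruction such as `exx` runs as if the clock were 22.4 MHz, and the c2p half-loop goes from 44 to 54 T-states, roughly 22.8 MHz effective, which is in line with the 22-24 MHz figure quoted in the question.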
4. CPU Speed: Real available cycles. Outside of the wait states@28 MHz, is there any system activity that would halt the CPU ? For example, on Atari, while the ANTIC chip was reading the framebuffer, the CPU was halted. This resulted in quite a few cycles "stolen" from the CPU by other chips in the system. How does Z80N fare here ? Say, at 28 MHz and 60 fps, do I really have 28,000,000 / 60 = 466,667 cycles per frame of CPU time ?
Nothing slows the CPU in the Next except the wait states applied at 28MHz. The original Spectrum has memory contention that pauses the CPU while it accesses memory the ULA is simultaneously accessing but this is an artificial construct in the Next that can be turned off. At speeds above 3.5MHz, there is no contention applied by the Next. If the speed is 3.5MHz, whether contention is applied is controlled by a setting in nextreg 0x08. NextZXOS normally keeps contention turned off so that native Next programs run without it. However, it turns it on when loading original Spectrum software.
5. PAL/NTSC: 50/60 Hz: Register $06 appears to be R/W and one of the bits is reserved for 50/60. Again - is that all there is to it ? I am in NTSC land, but what happens if I force 60 Hz for PAL folks ? Hopefully in 2020 and uniform LCDs all over the world we wouldn't have to resort to having two different builds (PAL and NTSC) ?
You can change 50/60Hz at any time; the change takes effect during the next vbi. As Robin mentioned, there isn't sufficient space on the fpga for a frame buffer so what this actually does is change from a PAL-like frame to an NTSC-like frame and vice versa. This necessarily means that there are fewer lines in an NTSC frame. The change in frame will cause the display to reacquire the signal and that will show a momentary flicker or blackout the length of which depends on the display. So you wouldn't want to change it back and forth willy-nilly :)
6. LoRes Layer/Radasjimian double buffering - I intend to design everything around 128x96x256. From my current understanding it would appear it's not really possible to do proper double buffering in this mode like it's possible in other modes ? Hopefully I am just misinterpreting here ? I mean I need to be able to flip to the secondary framebuffer somehow. 16k-bank5 (primary) vs 16k-bank7 (secondary) ?
Robin touched on this too. The memory shared with the ULA is held in bram rather than in sram so that the ULA (and lores and tilemap) can have high bandwidth access to memory without slowing the CPU. A full 16K is given to bank 5 but only the first 8K of bank 7 is in bram because there isn't enough bram to go around. A consequence of this is that only the 128K's shadow buffer can come from bank 7. The Timex modes and lores are confined to bank 5. While it's true a half size Radastan lores screen could be mapped to the first 8K of bank 7, in reality there is no real advantage to doing this when it can already be double buffered in bank 5. Perhaps moving Radastan to bank 7 without double buffer capability would free space in bank 5 if using the tilemap but the future is likely to have the tilemap addressing bank 7 too.

There are two lores modes -- Radastan from the uno at 128x96x4 and Jimistan at 128x96x8. Jimistan occupies both Timex display files at 0x4000 (the top half of the display) and 0x6000 (the bottom half of the display). Radastan is half the size and only occupies one Timex display file at either 0x4000 or 0x6000. So you can double buffer Radastan lores as there is a switch that will swap between the two sources (0x4000 or 0x6000).
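The split layout described above can be captured in a small address helper. This is a sketch that assumes one byte per pixel stored row by row within each half, which the paragraph implies but does not state outright; `jimistan_address` is a hypothetical name.

```python
# Pixel address in the 128x96, 8 bits per pixel lores mode.
# Assumption: linear, row-major layout within each half of the display.

LORES_WIDTH = 128
LORES_HEIGHT = 96
HALF_LINES = LORES_HEIGHT // 2   # 48 lines per display file
TOP_BASE = 0x4000                # top half of the display
BOTTOM_BASE = 0x6000             # bottom half of the display


def jimistan_address(x: int, y: int) -> int:
    """Return the CPU address of pixel (x, y)."""
    if not (0 <= x < LORES_WIDTH and 0 <= y < LORES_HEIGHT):
        raise ValueError("pixel outside the 128x96 screen")
    if y < HALF_LINES:
        return TOP_BASE + y * LORES_WIDTH + x
    return BOTTOM_BASE + (y - HALF_LINES) * LORES_WIDTH + x
```

Each half is 48 x 128 = 6144 bytes, so the two halves together account for the 12288 bytes of a full screen, with a gap between the end of the top half and 0x6000.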

DMA on the Next transfers about 110K per 50Hz frame for a copy at 28MHz and a full Jimistan lores screen is 12288 bytes. The DMA currently occupies the same bus as the CPU so only one of these can run at a time.
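The DMA figures can be cross-checked with simple arithmetic. The sketch below uses the 5 cycles per byte mentioned earlier for 28MHz and ignores the time needed to program the DMA, so it is an upper bound rather than a measurement; the function names are hypothetical.

```python
# Cross-check of the DMA throughput figures at 28 MHz.
# Assumption: 5 cycles per byte copied, no setup overhead.

JIMISTAN_BYTES = 128 * 96   # full 8 bits per pixel lores screen


def dma_bytes_per_frame(clock_hz: int = 28_000_000,
                        refresh_hz: int = 50,
                        cycles_per_byte: int = 5) -> int:
    """Bytes the DMA could copy if it ran for one entire frame."""
    return clock_hz // (refresh_hz * cycles_per_byte)


def dma_copy_cycles(num_bytes: int, cycles_per_byte: int = 5) -> int:
    """Clock cycles needed to copy num_bytes with the DMA."""
    return num_bytes * cycles_per_byte


if __name__ == "__main__":
    print(dma_bytes_per_frame())             # bytes per 50 Hz frame
    print(dma_copy_cycles(JIMISTAN_BYTES))   # cycles for one full screen copy
```

This gives 112,000 bytes per 50Hz frame, consistent with "about 110K", and 61,440 cycles for a full screen copy, which is about 11% of a 50Hz frame or 13% of a 60Hz frame during which the CPU cannot run.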
8. Bank-Switching : MMU vs 128-style. It would appear that 128-style is quite inflexible and MMU 0-7 gives me full control over which target/source bank I need. Is there any clear disadvantage to using 8k scheme via 8 MMUs ?
No but you may have to think about it (a little) if calling into the operating system.
9. EXX instruction: Exchanges BC, DE, and HL with their shadow registers. AF and AF' are not exchanged. Why is that ? Why can't I have a shadow accumulator exchanged ? My Z80 manual didn't mention anything about this, so I'm curious as to why Spectrum Next wouldn't allow it ? Now, I'm not complaining, compared to 6502, Z80 has 15 registers : A + 2x(B,C,D,E,H,L) + IX/IY. But why ?
Swapping AF separately allows you to communicate a single byte and flag state to the other exx set.

Layer 2 also offers 256x192x8, 320x256x8 and 640x256x4 resolutions. The screen sizes are larger but there are some new instructions related to ldi/ldir that can help somewhat.

You can maybe see some indication of the difference in speed from these demos, the first is using 256x192 layer 2 and the second is using lores. The lores version does not have sprites in it so maybe it's best to compare speed when the layer 2 version isn't showing any sprites as well. Despite doubling the resolution they look similar in quality; I think that must be due to the resolution of the source material.

https://www.youtube.com/watch?v=VJQCVfEmnI0

https://www.youtube.com/watch?v=lcQmN9YRl34
Heimdall
Dizzy
Posts: 64
Joined: Sat Jun 13, 2020 12:44 pm

Re: Spectrum Next: Technical questions for 3D engine development

Post by Heimdall »

Alcoholics Anonymous wrote: Wed Jun 17, 2020 4:51 am Just adding to what has been said already.

As mentioned you can set speed at any time and it will immediately change. There might be a delay of a few cycles as the clock circuit has to make sure there aren't any glitches or runt pulses in the clock.

What has not been mentioned is that the user has the power to change the cpu speed at any time too by pressing F8 (nmi+8 on a Next keyboard). This will cycle the cpu speed through 3.5, 7, 14 and 28MHz. If you don't want that to happen, you'll need to disable that key in nextreg 0x06. This hotkey is really there for original spectrum software.
Good catch. I would need to disable the user's ability to slow down CPU clock on me given that I would design the game around the ~0.5M cycle budget / frame.

Alcoholics Anonymous wrote: Wed Jun 17, 2020 4:51 am At 28MHz there is one wait state inserted on every memory read cycle. I'm hopeful that eventually this will be eliminated but it's still there.
This one cycle wait state also applies to dma reads from memory. With the dma programmed to do 2-cycle reads and 2-cycle writes for a total of 4 cycles per byte copied, at 28MHz that gets turned into 5 cycles per byte copied.
5c/byte should be pretty quick to copy whole screen's 12 KB - e.g. ~60k cycles out of ~0.5M. That's great.
Alcoholics Anonymous wrote: Wed Jun 17, 2020 4:51 am Nothing slows the CPU in the Next except the wait states applied at 28MHz. The original Spectrum has memory contention that pauses the CPU while it accesses memory the ULA is simultaneously accessing but this is an artificial construct in the Next that can be turned off. At speeds above 3.5MHz, there is no contention applied by the Next. If the speed is 3.5MHz, whether contention is applied is controlled by a setting in nextreg 0x08. NextZXOS normally keeps contention turned off so that native Next programs run without it. However, it turns it on when loading original Spectrum software.
I only learnt [somewhat] about Spectrum's contention in last two days. For 28 MHz mode, all I needed to hear is that there is no contention and I'm a happy man :)

Alcoholics Anonymous wrote: Wed Jun 17, 2020 4:51 am You can change 50/60Hz at any time; the change takes effect during the next vbi. As Robin mentioned, there isn't sufficient space on the fpga for a frame buffer so what this actually does is change from a PAL-like frame to an NTSC-like frame and vice versa. This necessarily means that there are fewer lines in an NTSC frame. The change in frame will cause the display to reacquire the signal and that will show a momentary flicker or blackout the length of which depends on the display. So you wouldn't want to change it back and forth willy-nilly :)
Yeah, that would be a one-time, start-up-only change. I still haven't figured exactly how to handle PAL/NTSC (especially audio!), but I presume I will make the adjustments during start-up.
Alcoholics Anonymous wrote: Wed Jun 17, 2020 4:51 am Robin touched on this too. The memory shared with the ULA is held in bram rather than in sram so that the ULA (and lores and tilemap) can have high bandwidth access to memory without slowing the CPU. A full 16K is given to bank 5 but only the first 8K of bank 7 is in bram because there isn't enough bram to go around. A consequence of this is that only the 128K's shadow buffer can come from bank 7. The Timex modes and lores are confined to bank 5. While it's true a half size Radastan lores screen could be mapped to the first 8K of bank 7, in reality there is no real advantage to doing this when it can already be double buffered in bank 5. Perhaps moving Radastan to bank 7 without double buffer capability would free space in bank 5 if using the tilemap but the future is likely to have the tilemap addressing bank 7 too.

There are two lores modes -- Radastan from the uno at 128x96x4 and Jimistan at 128x96x8. Jimistan occupies both Timex display files at 0x4000 (the top half of the display) and 0x6000 (the bottom half of the display). Radastan is half the size and only occupies one Timex display file at either 0x4000 or 0x6000. So you can double buffer Radastan lores as there is a switch that will swap between the two sources (0x4000 or 0x6000).
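To make the lores layout described above concrete, here is a minimal Python sketch of the address arithmetic. The base addresses and sizes come from the post; the assumption that each half is stored row by row, one byte per pixel, is mine, and the function names are hypothetical.

```python
# Sketch only: assumes a linear, row-major layout inside each half (an assumption,
# not confirmed in this thread). Base addresses and dimensions are from the post.
TOP_HALF_BASE = 0x4000     # first Timex display file: top half of the display
BOTTOM_HALF_BASE = 0x6000  # second Timex display file: bottom half of the display
WIDTH, HEIGHT = 128, 96
HALF_HEIGHT = HEIGHT // 2


def jimistan_pixel_address(x, y):
    """Return the assumed address of pixel (x, y) in 128x96x8 lores."""
    if not (0 <= x < WIDTH and 0 <= y < HEIGHT):
        raise ValueError("pixel outside the 128x96 lores screen")
    if y < HALF_HEIGHT:
        return TOP_HALF_BASE + y * WIDTH + x
    return BOTTOM_HALF_BASE + (y - HALF_HEIGHT) * WIDTH + x


def radastan_buffer_bytes():
    """Size of one 128x96x4 Radastan buffer (two pixels per byte)."""
    return WIDTH * HEIGHT * 4 // 8


if __name__ == "__main__":
    print(hex(jimistan_pixel_address(0, 48)))
    print(radastan_buffer_bytes())
```

A Radastan buffer works out to 6144 bytes, which is why one fits at 0x4000 and a second at 0x6000 for double buffering, while Jimistan needs both.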

DMA on the Next transfers about 110K per 50Hz frame for a copy at 28MHz and a full Jimistan lores screen is 12288 bytes. The DMA currently occupies the same bus as the CPU so only one of these can run at a time.
110k ? Sounds about right with the 5c/byte. So, no need to worry about double-buffering. I think I saw some sample DMA code flying around somewhere...
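The numbers above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes an exact 28 MHz clock and an exact 50 Hz frame, and it ignores DMA setup time and any CPU work sharing the bus.

```python
# Rough cycle budget, assuming exactly 28 MHz and exactly 50 frames per second.
CPU_HZ = 28_000_000
FRAME_HZ = 50
DMA_CYCLES_PER_BYTE = 5         # 2-cycle read + 2-cycle write + 1 wait state at 28 MHz
LORES_SCREEN_BYTES = 128 * 96   # full Jimistan lores screen, one byte per pixel

cycles_per_frame = CPU_HZ // FRAME_HZ
dma_bytes_per_frame = cycles_per_frame // DMA_CYCLES_PER_BYTE
screen_copy_cycles = LORES_SCREEN_BYTES * DMA_CYCLES_PER_BYTE
screen_copy_share = screen_copy_cycles / cycles_per_frame

if __name__ == "__main__":
    print(cycles_per_frame)      # cycles available per frame
    print(dma_bytes_per_frame)   # bytes the DMA could copy if it ran all frame
    print(screen_copy_cycles)    # cycles to copy one lores screen
    print(round(screen_copy_share * 100, 1), "% of the frame")
```

Under these assumptions the DMA moves 112,000 bytes per frame (about 109 KiB, matching the quoted ~110K), and one lores screen copy costs 61,440 cycles, roughly 11% of the frame.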

Alcoholics Anonymous wrote: Wed Jun 17, 2020 4:51 am Layer 2 also offers 256x192x8, 320x256x8 and 640x256x4 resolutions. The screen sizes are larger but there are some new instructions related to ldi/ldir that can help somewhat.
While RAM isn't a problem on Next (two 256x192x8 framebuffers consume ~100 KB), scanline traversal is. Doubling vertical resolution literally doubles the most expensive stage of the 3D pipeline - scanline traversal. I suspect the Pixel Fill stage would do something like LDIR, so doubling horizontal resolution won't affect framerate too much.

Any chance of configuring the video modes on Next programmatically ? I would love 256x96 - keeps scanline traversal in check but doubles the pixels (which are extra-super cheap [well, almost free] to draw in 256 color mode) :)
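As a rough illustration of the trade-off, the sketch below compares the modes mentioned in this thread against 128x96 lores. It assumes, as argued above, that scanline traversal cost scales with the number of lines and pixel fill with the number of pixels; the 256x96 entry is the hypothetical mode being asked about, not an existing one.

```python
# Sketch: relative cost of each mode versus 128x96 lores, under the assumption
# that traversal scales with line count and fill scales with pixel count.
def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed for one framebuffer of the given geometry."""
    return width * height * bits_per_pixel // 8


MODES = {
    "lores 128x96x8": (128, 96, 8),
    "layer 2 256x192x8": (256, 192, 8),
    "layer 2 320x256x8": (320, 256, 8),
    "layer 2 640x256x4": (640, 256, 4),
    "hypothetical 256x96x8": (256, 96, 8),  # not an existing mode
}


def compare(modes, baseline="lores 128x96x8"):
    """Return (name, bytes, line ratio, pixel ratio) for each mode."""
    base_w, base_h, _ = modes[baseline]
    rows = []
    for name, (w, h, bpp) in modes.items():
        rows.append((name, framebuffer_bytes(w, h, bpp),
                     h / base_h, (w * h) / (base_w * base_h)))
    return rows


if __name__ == "__main__":
    for name, size, line_ratio, pixel_ratio in compare(MODES):
        print(f"{name}: {size} bytes, {line_ratio:.2f}x lines, {pixel_ratio:.2f}x pixels")
```

Two 256x192x8 buffers come to 98,304 bytes, which is the ~100 KB figure above, while the hypothetical 256x96 mode would double the pixels without adding any lines.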

Alcoholics Anonymous wrote: Wed Jun 17, 2020 4:51 am You can maybe see some indication of the difference in speed from these demos, the first is using 256x192 layer 2 and the second is using lores. The lores version does not have sprites in it so maybe it's best to compare speed when the layer 2 version isn't showing any sprites as well. Despite doubling the resolution they look similar in quality; I think that must be due to the resolution of the source material.
Just a few weeks ago, on a different sub-100 MHz FPGA platform, I was running benchmarks of my ASM flatshader at up to 1920x1080, so I finally have a pretty good idea about each stage of the pipeline (benchmarked on a set of 10M polys): how many cycles it takes and how it scales. Of course, the Z80 implementation will be different, but the principles are identical regardless of platform - you clear the framebuffer, cull the world polygon soup into an on-screen subset, transform vertices, then do scanline traversal + pixel fill.

I don't really see a big problem in eventually switching from 128x96 to 256x192 or even 320x256. Maybe bank switching will complicate it a bit...

I will make sure that all resolution-related references are actual variables, not compile-time constants, which should make the switch to a higher res much easier, should the need arise.
Alcoholics Anonymous
Microbot
Posts: 194
Joined: Mon Oct 08, 2018 3:36 am

Re: Spectrum Next: Technical questions for 3D engine development

Post by Alcoholics Anonymous »

Heimdall wrote: Wed Jun 17, 2020 8:07 am Yeah, that would be a one-time, start-up-only change. I still haven't figured exactly how to handle PAL/NTSC (especially audio!), but I presume I will make the adjustments during start-up.
There are some things coming that may help there. A Z80 CTC (counter timer) with 8-channels is going in that can generate timed interrupts independent of the video frame (other options like the ula vbi interrupt and line interrupt are connected to the frame). The copper is able to do stereo digital audio; it waits on display position so is also connected to the video frame but it's probably not too bad to do with some cpu work each frame. The DMA can do mono digital audio but then you can't use it for anything else. The pi can do audio too but then you're requiring the user to have a pi installed. Longer term there might be other things but it will depend on fpga space.

The problem with the interrupts approach is that you'll be using the DMA in continuous mode for copies. That means it holds the bus during the entire operation and that can cause the cpu to miss interrupts (highly likely now but this is changing soon) or receive delayed interrupts. To make things work, you can shorten the length of each dma transfer to give an opportunity for interrupts to occur at short regular intervals. The DMA has a "continue" command that makes it easy to repeat the last operation with the same length but starting from where it left off (src and dst) that can make this easy.
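The chunking idea can be sketched as plain arithmetic. This Python sketch only plans the bursts; it does not program the DMA, and the function names, the 512-byte chunk size and the 5 cycles/byte figure (taken from earlier in the thread) are illustrative assumptions.

```python
# Sketch: split one long copy into short bursts so interrupts get a chance
# to be serviced between them. Planning only; no hardware access.
def dma_chunks(src, dst, total_len, chunk_len):
    """Yield (src, dst, length) for each burst of a chunked copy."""
    if total_len < 0 or chunk_len <= 0:
        raise ValueError("total_len must be >= 0 and chunk_len must be > 0")
    offset = 0
    while offset < total_len:
        length = min(chunk_len, total_len - offset)
        yield (src + offset, dst + offset, length)
        offset += length


def worst_case_bus_hold_cycles(chunk_len, cycles_per_byte=5):
    """Cycles the DMA holds the bus for one burst (assumes 5 cycles/byte)."""
    return chunk_len * cycles_per_byte


if __name__ == "__main__":
    plan = list(dma_chunks(0x8000, 0x4000, 12288, 512))
    print(len(plan), "bursts")
    print(worst_case_bus_hold_cycles(512), "cycles per burst")
```

When the total length is a multiple of the chunk length, every burst is identical and each one after the first maps onto a single "continue" command; a shorter final burst would presumably need the length reprogrammed.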
Any chance of configuring the video modes on Next programmatically ? I would love 256x96 - keeps scanline traversal in check but doubles the pixels (which are extra-super cheap [well, almost free] to draw in 256 color mode) :)
Ha, there are so many things yet to do and so little time to do it in :)
Maybe rendering a line and then copying or plotting it to the next line in 256x192 mode is a way to save some cycles in the meantime?
I will make sure that all resolution-related references are actual variables, not compile-time constants, which should make the switch to a higher res much easier, should the need arise.
I like the idea of keeping the options open :)
Heimdall
Dizzy
Posts: 64
Joined: Sat Jun 13, 2020 12:44 pm

Re: Spectrum Next: Technical questions for 3D engine development

Post by Heimdall »

Alcoholics Anonymous wrote: Thu Jun 18, 2020 1:28 am
Heimdall wrote: Wed Jun 17, 2020 8:07 am Yeah, that would be a one-time, start-up-only change. I still haven't figured exactly how to handle PAL/NTSC (especially audio!), but I presume I will make the adjustments during start-up.
There are some things coming that may help there. A Z80 CTC (counter timer) with 8-channels is going in that can generate timed interrupts independent of the video frame (other options like the ula vbi interrupt and line interrupt are connected to the frame). The copper is able to do stereo digital audio; it waits on display position so is also connected to the video frame but it's probably not too bad to do with some cpu work each frame. The DMA can do mono digital audio but then you can't use it for anything else. The pi can do audio too but then you're requiring the user to have a pi installed. Longer term there might be other things but it will depend on fpga space.

The problem with the interrupts approach is that you'll be using the DMA in continuous mode for copies. That means it holds the bus during the entire operation and that can cause the cpu to miss interrupts (highly likely now but this is changing soon) or receive delayed interrupts. To make things work, you can shorten the length of each dma transfer to give an opportunity for interrupts to occur at short regular intervals. The DMA has a "continue" command that makes it easy to repeat the last operation with the same length but starting from where it left off (src and dst) that can make this easy.
Well, I'm still many months away from needing audio code, but I need to at least understand the options now. I admit, I hate writing audio interrupts, it's so hard to hunt down random bugs there...
Also, I don't fancy writing synth code...

When do you think the 8-channel + CTC will go in ? Ballpark estimate ?
How many audio libraries are now available for Next ?
Alcoholics Anonymous wrote: Thu Jun 18, 2020 1:28 am
Any chance of configuring the video modes on Next programmatically ? I would love 256x96 - keeps scanline traversal in check but doubles the pixels (which are extra-super cheap [well, almost free] to draw in 256 color mode) :)
Maybe rendering a line and then copying or plotting it to the next line in 256x192 mode is a way to save some cycles in the meantime?
Yeah, duplicating the current scanline should be criminally primitive to implement. It'll be fun benchmarking it against 256x192 and seeing the exact cycle difference!
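For illustration, the line-doubling idea looks roughly like this in Python. It is a sketch of the data movement only: it assumes a linear 256-bytes-per-line buffer and ignores Layer 2 banking, and on the real machine the second copy of each line would be done with LDIR or the DMA rather than slicing.

```python
# Sketch: render 256x96, then write every rendered line twice to fill 256x192.
# Assumes a linear buffer, one byte per pixel; banking is ignored.
WIDTH = 256
RENDER_HEIGHT = 96
DISPLAY_HEIGHT = RENDER_HEIGHT * 2


def double_scanlines(rendered):
    """Expand a 256x96 buffer to 256x192 by duplicating each scanline."""
    if len(rendered) != WIDTH * RENDER_HEIGHT:
        raise ValueError("expected a 256x96 one-byte-per-pixel buffer")
    out = bytearray(WIDTH * DISPLAY_HEIGHT)
    for y in range(RENDER_HEIGHT):
        row = rendered[y * WIDTH:(y + 1) * WIDTH]
        top = 2 * y * WIDTH
        out[top:top + WIDTH] = row               # even display line
        out[top + WIDTH:top + 2 * WIDTH] = row   # odd display line, same pixels
    return out


if __name__ == "__main__":
    # Fill each rendered line with its own line number so the doubling is visible.
    rendered = bytes(y for y in range(RENDER_HEIGHT) for _ in range(WIDTH))
    doubled = double_scanlines(rendered)
    print(len(doubled), doubled[0], doubled[WIDTH], doubled[2 * WIDTH])
```

Scanline traversal still runs over only 96 lines; the extra cost is one 256-byte copy per line.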
Alcoholics Anonymous wrote: Thu Jun 18, 2020 1:28 am Ha, there are so many things yet to do and so little time to do it in :)
Well, in a year or two, you'll probably run out of FPGA space, though - right ?
Alcoholics Anonymous
Microbot
Posts: 194
Joined: Mon Oct 08, 2018 3:36 am

Re: Spectrum Next: Technical questions for 3D engine development

Post by Alcoholics Anonymous »

Heimdall wrote: Sat Jun 20, 2020 5:56 am When do you think the 8-channel + CTC will go in ? Ballpark estimate ?
Probably in the next few days. It really depends on when I can put together a few hours to work on it.
How many audio libraries are now available for Next ?
Not much in terms of libraries at the moment. People are doing copper stereo digi music, pi music and dma music. dma digital music is a little bit of a non-starter at the moment because there is only one dma channel and dma is too important to take just for audio in most programs. Pi music and dma music are fairly easy to set up.

There are several AY music players from the Spectrum and a new one that uses all 3 AY chips. At least one person is trying to do a mod player now.
Well, in a year or two, you'll probably run out of FPGA space, though - right ?
It's been there already for a year. The design is not optimal though so there is still room to improve on things there.
Heimdall
Dizzy
Posts: 64
Joined: Sat Jun 13, 2020 12:44 pm

Re: Spectrum Next: Technical questions for 3D engine development

Post by Heimdall »

Alcoholics Anonymous wrote: Sun Jun 21, 2020 3:21 am Probably in the next few days. It really depends on when I can put together a few hours to work on it.
Cool, I will keep an eye out :)
Alcoholics Anonymous wrote: Sun Jun 21, 2020 3:21 am Not much in terms of libraries at the moment. People are doing copper stereo digi music, pi music and dma music. dma digital music is a little bit of a non-starter at the moment because there is only one dma channel and dma is too important to take just for audio in most programs. Pi music and dma music are fairly easy to set up.

There are several AY music players from the Spectrum and a new one that uses all 3 AY chips. At least one person is trying to do a mod player now.
A sample program that sets up an IRQ and feeds the audio registers (from some array) would probably be more than enough to begin playing with.
I think I saw a copper sample somewhere, not sure about AY sample...
Alcoholics Anonymous wrote: Sun Jun 21, 2020 3:21 am It's been there already for a year. The design is not optimal though so there is still room to improve on things there.
So, for a year, you could only add new functionality by refactoring and trying to squeeze in new stuff ? Nice :)