CPI / CPIR instruction


Re: CPI / CPIR instruction

Post by Ast A. Moore » Fri Sep 27, 2019 11:10 pm

R-Tape wrote:
Fri Sep 27, 2019 8:47 pm
can anyone explain why they have the same number of tstates, but one is clearly doing a lot more than the other?
Uh . . . The short answer is: it’s complicated.

The long answer is itself complicated.

You see, when analyzing these combo instructions, it’s best not to rewrite them in pseudocode like you did. Your pseudocode is correct, but only in breaking down the logic of the instruction. It describes the result the CPU arrives at, not what the CPU is actually doing internally.

A better way of breaking down any instruction is to think of it in machine cycles, not T states. Each machine cycle can take several T states, and each instruction takes at least one M cycle: the opcode fetch. The absolute minimum number of T states in a fetch M cycle is four. Some instructions take just that many T states (say, INC A). That’s how long it takes to place the PC register on the address bus and read the opcode. Extended instructions (prefixed by ED, CB, DD, or FD) take an additional 4 T states, because their opcodes are two bytes long. IX and IY bit instructions (prefixed by DDCB and FDCB) take even longer. Compare, for example, the regular LD HL,(**) instruction (opcode 2A; 16 T states) with its undocumented counterpart (opcode ED6B; 20 T states).
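To make the M-cycle view concrete, here is a minimal Python sketch that adds up T states per machine cycle. The per-cycle figures are the ones usually quoted from the Zilog timing tables; the dictionary keys and the `t_states` helper are made up for this illustration and are not part of any real tool.

```python
# T states per machine cycle. The first entry is always the opcode fetch.
M_CYCLES = {
    "INC A": [4],                          # fetch only
    "LD HL,(nn) 2A": [4, 3, 3, 3, 3],      # fetch, two operand reads, two data reads
    "LD HL,(nn) ED6B": [4, 4, 3, 3, 3, 3], # same work plus a fetch for the ED prefix
}

def t_states(name):
    """Total T states for an instruction: the sum over its machine cycles."""
    return sum(M_CYCLES[name])

for name in M_CYCLES:
    print(f"{name:18} {len(M_CYCLES[name])} M cycles, {t_states(name)} T states")
```

The only difference between the two LD HL,(nn) encodings in this model is the extra 4 T state fetch for the prefix byte.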

Now, each of the pseudocode instructions that you wrote out doesn’t need to be fetched and parsed individually; only one instruction fetch happens in either LDI or CPI. Since those are extended instructions (with the ED prefix), the fetch machine cycle for each takes 8 T states.
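As a rough sketch of why the two totals match, the machine cycles of LDI and CPI can be laid side by side. The 4, 4, 3, 5 pattern is the one normally listed for both instructions; the labels on the last cycle are my own shorthand, and the variable names are hypothetical.

```python
# (label, T states) for each machine cycle.
LDI = [("fetch ED", 4), ("fetch A0", 4), ("read (HL)", 3), ("write (DE) + internal", 5)]
CPI = [("fetch ED", 4), ("fetch A1", 4), ("read (HL)", 3), ("internal compare/update", 5)]

def total(cycles):
    """Sum the T states of all machine cycles."""
    return sum(t for _, t in cycles)

for (l_label, l_t), (c_label, c_t) in zip(LDI, CPI):
    print(f"{l_label:24} {l_t}    {c_label:24} {c_t}")
print("LDI total:", total(LDI), " CPI total:", total(CPI))
```

Seen this way, the instructions differ only in what happens during the last cycle, not in how many cycles there are or how long each one lasts.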

The next machine cycles (if they exist at all) are for moving data between the CPU and the RAM/ROM or other devices (I/O). They can take anywhere from three to five T states. Some instructions don’t move any data (INC A) and thus take much less time. Incrementing an index register, however, will take longer, because, say, INC IXh is an extended instruction; it takes another 4 T states to fetch the second byte of its opcode. Yet something like EX (SP),IX can take as many as six machine cycles and 23 T states (!): two fetches, two memory reads, and two writes (one read and one write for each byte).
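The EX (SP),IX figure can be checked the same way. This is only a sketch: the 4, 4, 3, 4, 3, 5 breakdown is the commonly published one, while the byte order shown in the labels is an assumption on my part, and `summarize` is a hypothetical helper.

```python
# (label, T states) per machine cycle.
EX_SP_IX = [
    ("fetch DD", 4),
    ("fetch E3", 4),
    ("read low byte", 3),
    ("read high byte", 4),
    ("write high byte", 3),   # byte order here is assumed, see the note above
    ("write low byte", 5),
]
INC_IXH = [("fetch DD", 4), ("fetch 24", 4)]

def summarize(cycles):
    """Return (number of machine cycles, total T states)."""
    return len(cycles), sum(t for _, t in cycles)

print("EX (SP),IX:", summarize(EX_SP_IX))
print("INC IXh:   ", summarize(INC_IXH))
```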

The internal workings of the Z80 are not as easily broken down timing-wise and they do depend on numerous factors, including, as you put it, “the wiring.” Suffice it to say that actually incrementing a register (or register pair) doesn’t take up 6 T states. Moreover, increments and decrements can be grouped together and impose little to no overhead when executed simultaneously; they’re not necessarily cumulative. The incrementer/decrementer circuitry in the Z80 is quite clever and can do various things. It can also pass a value through without incrementing or decrementing it; thus, similar to the WZ register pair, it can be used for storing data temporarily.

The HL/DE register pairs can be very easily swapped in hardware. In fact, they are not, strictly speaking, physically separate registers at all. Instructions like EX DE,HL don’t actually exchange data between DE and HL, but it sure looks like it to the programmer.

Some internal operations in the Z80 can be pipelined and thus overlap, but not all. For example, you can’t directly copy a value from one register to another (yes, even the LD B,C mnemonic is a lie). The operation must be done through the ALU. But the ALU in the Z80 is 4-bit, and using it for transferring data between 16-bit registers would be too slow. It’s much faster to use the incrementer/decrementer circuitry for that. Now, the ALU (and register) operations can finish while the CPU is fetching another instruction. The fetch itself, however, requires the incrementer/decrementer latch, so an instruction that uses the latch must be done with it before the next instruction can be fetched. This explains why INC A is faster than INC HL, for example. Block transfers (LDIR, CPIR, etc.) sure use the incrementer latch a lot.

Like I said, it’s complicated. Hopefully, I’ve now confused you beyond reason, and you have no desire to investigate the matter any further. :lol:

Post by djnzx48 » Fri Sep 27, 2019 11:17 pm

Those instruction sequences aren't exactly equivalent, as CPI/LDI also update the flags to show whether BC has reached zero.

If LDI and CPI take the same number of T-states, my guess is that CPI uses the same circuitry but doesn't output the value to memory, or maybe just writes it back to HL.
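For reference, the programmer-visible effect of CPI and CPIR can be sketched in Python. This models only the logical result, not the internal timing discussed above; the function names and the flag dictionary are hypothetical, and the H and N flags are left out for brevity.

```python
def cpi(a, hl, bc, memory):
    """One CPI step: compare A with (HL), then HL += 1 and BC -= 1.

    Returns (hl, bc, flags). A is not modified and nothing is written to memory.
    """
    diff = (a - memory[hl]) & 0xFF
    hl = (hl + 1) & 0xFFFF
    bc = (bc - 1) & 0xFFFF
    flags = {
        "Z": diff == 0,          # set when A equals the byte at (HL)
        "S": bool(diff & 0x80),  # sign of the 8-bit difference
        "PV": bc != 0,           # cleared once BC has counted down to zero
    }
    return hl, bc, flags

def cpir(a, hl, bc, memory):
    """Repeat CPI until a match is found or BC reaches zero."""
    while True:
        hl, bc, flags = cpi(a, hl, bc, memory)
        if flags["Z"] or not flags["PV"]:
            return hl, bc, flags

memory = bytes([1, 2, 3, 4])
print(cpir(3, 0, 4, memory))   # search for the value 3 in four bytes
```

Note that after a match HL points one past the matching byte, which is why real code usually follows CPIR with a DEC HL.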

Post by 1024MAK » Sat Sep 28, 2019 8:58 am

The Z80 MPU has a number of features (including clever ideas) that make it a bit unconventional.

Remember, the mnemonics are only there to help humans remember the effect of the instruction. They do not necessarily indicate how the Z80 actually carries out the operation, as all sorts of hardware tricks take place. The exchange of the alternate register sets is a good example. No copy/swap operation takes place; instead, a single latch/flip-flop bit changes state to tell the Z80 which registers are the current in-use set.
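The flip-flop idea can be illustrated with a toy model. This is a sketch of the concept only, assuming two banks of BC/DE/HL and a one-bit selector; the class and attribute names are invented for the example and say nothing about the actual silicon layout.

```python
class RegisterFile:
    """Two banks of BC/DE/HL plus a one-bit selector, as a toy model of EXX."""

    def __init__(self):
        self.banks = [
            {"BC": 0, "DE": 0, "HL": 0},
            {"BC": 0, "DE": 0, "HL": 0},
        ]
        self.active = 0  # the "flip-flop": which bank is currently visible

    def exx(self):
        # No data moves; only the selector bit toggles.
        self.active ^= 1

    def __getitem__(self, name):
        return self.banks[self.active][name]

    def __setitem__(self, name, value):
        self.banks[self.active][name] = value & 0xFFFF

regs = RegisterFile()
regs["HL"] = 0x1234
regs.exx()
regs["HL"] = 0x5678
regs.exx()
print(hex(regs["HL"]))
```

The cost of `exx` in this model is constant no matter how many registers each bank holds, which is the point of the trick.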

The other thing to remember is that MPU/CPU design is closely tied in with memory performance. At the time the Z80 was designed, DRAM and ROM chips were painfully slow (in fact, DRAM is still painfully slow; we have just come up with many more tricks to make it look a bit faster). Hence MPU/CPU designers avoided unnecessary memory accesses where they could. Memory was also very expensive, so instructions that did a lot of useful work for not many bytes of code were favoured, which kept code compact.

One limiting factor with the Z80 design is that it was designed to run 8080 code, which rather limited the flexibility of the instruction set. Hence there are a lot of operations that take longer than they would have needed to with a clean-sheet approach.


Post by Juan F. Ramirez » Sat Sep 28, 2019 12:23 pm

I read the whole thread because of @R-Tape 's meme, I don't usually wander around this kind of threads! No idea of coding! :mrgreen:

Post by Morkin » Sat Sep 28, 2019 1:26 pm

Juan F. Ramirez wrote:
Sat Sep 28, 2019 12:23 pm
I read the whole thread because of @R-Tape 's meme, I don't usually wander around this kind of threads! No idea of coding! :mrgreen:
Heh - I'm the same with the hardware threads, enjoy reading them but no idea what anyone's talking about... :lol: