A new data compressor called ZX0

The place for codemasters or beginners to talk about programming any language for the Spectrum.
Einar Saukas
Bugaboo
Posts: 3100
Joined: Wed Nov 15, 2017 2:48 pm

A new data compressor called ZX0

Post by Einar Saukas »

When you need to compress data in your Spectrum programs, you usually have 2 options:
  • Use a complex tool like Exomizer, which will compress your data as much as possible, but then you will need a slower, larger routine to decompress it.
  • Use a simpler tool like ZX7, which gives you a small routine to decompress your data much faster, but then your compressed data won't be as small.
Wouldn't it be nice to have a simpler compressor that works like ZX7, but that could compress data as much as Exomizer? Is this even possible?

Yes, it is! It's called ZX0 :)

The full source code and documentation is here:

https://github.com/einar-saukas/ZX0

The only noticeable difference from ZX7 is that the cross-platform ZX0 compressor can take several seconds to compress data. For this reason, the ZX0 compressor provides an extra option "-q" (quick mode) for faster, non-optimal compression. Thus when you need to repeatedly modify, compress and retest a certain file, you can use option "-q" to speed up this process, then later remove this option to compress even better. Apart from this, everything else works exactly the same in ZX0 and ZX7. In particular, the standard ZX0 decompressor is only 82 bytes, and it decompresses data at about the same speed as ZX7.


TEST RESULTS

To ensure that my test results are not biased, I will be posting here a comparison between different compressors using someone else's data. This way, I can be sure I'm not giving preference (even involuntarily) to the kind of data that would conveniently work better for ZX0.

I just found quite a few interesting articles on the web that compare Z80 data compressor results. However, it seems the dataset used in these comparisons is only available for download in one of them:

https://hype.retroscene.org/blog/dev/740.html

This is an excellent article written by introspec. It's in Russian but Google translate does a good job here, so I recommend reading it. I hope he won't mind that I will use his dataset to complement his comparison!

This is the original table presented in this article:

Code: Select all

             unpacked   ApLib  Exomizer   Hrum    Hrust1   Hrust2    LZ4     MegaLZ  Pletter5 PuCrunch   ZX7

Calgary       273526    98192    96248   111380   103148   102742   120843   109519   106650    99041   117658
Canterbury     81941    26609    26968    31767    28441    28791    34976    31338    30247    27792    32268
Graphics      289927   169879   164868   173026   169221   171249   195544   172089   171807   169767   172140
Music         151657    59819    59857    62977    60902    62678    77617    62568    63661    63977    66692
Misc          436944   252334   248220   262508   251890   255363   293542   261396   263432   256278   265121

TOTAL:       1233995   606833   596161   641658   613602   620823   722522   636910   635797   616855   653879

Notice that, in the original article, Exomizer provided the best results overall, even though it was an older version. To be fair, I have tested it again using the latest version, Exomizer 3.1.0, which produced even better results. I have also tested ZX0 in both full and "quick" modes, for comparison. These are the new results:

Code: Select all

              ZX7       ZX0-q   Exomizer3    ZX0

Calgary      117658    116685     95997     99993
Canterbury    32268     30725     26365     26337
Graphics     172140    164692    163915    163647
Music         66692     61923     58586     58388
Misc         265121    249638    245434    244252

TOTAL:       653879    623663    590297    592617

These results indicate that ZX0 didn't perform as well in the Calgary test (Exomizer, ApLib and PuCrunch all compressed better in this case). However, ZX0 performed better than every other compressor in all the other test cases. The TOTAL shows ZX0 surpassed only by Exomizer3, purely because of the Calgary test. Does this mean that Exomizer3 is a better compressor overall? Well, I don't think so...

The Calgary test is a subset of the "classic" Calgary corpus. Exactly 7 of the 8 files in this subset are pure text, with an average size of 36Kb each. Although this is an interesting test, it doesn't represent the typical content you need to compress on a Spectrum. What kind of Spectrum program needs 36Kb of compressed text? Perhaps a text-only adventure, but in that case you would either compress it separately as multiple smaller blocks, or use dictionary-based custom compression so you could access sentences individually.

The Canterbury test files have more diversity, but they are not typical Spectrum content either. The remaining test cases best represent the kind of data that needs to be compressed in Spectrum programs: Graphics (Spectrum screens, etc), Music (AY, etc) and Misc (tape and disk images). In these test cases, ZX0 performed noticeably better.

However I'm not saying that ZX0 will always compress better than Exomizer3 in practice. My other tests seem to indicate that Exomizer3 works better in a few practical cases and ZX0 in others. The difference is so close, that it's hard to choose a winner.

The real advantage of ZX0 is achieving about the same compression as Exomizer3, while using a much smaller, faster decompressor.

So give it a try! Have fun :)
azesmbog
Manic Miner
Posts: 307
Joined: Sat May 16, 2020 8:43 am

Re: A new data compressor called ZX0

Post by azesmbog »

I am a rather lazy user, so I have always used something simpler; usability is sometimes more important to me :)
Therefore, I used mlz.
I tried this compressor, alongside mlz for comparison; here is the result:

Code: Select all

File name        Size              
-------------    ---------------   
crugi.bin                 10,987   
crugi.bin.zx0              1,406   
crugi.bin.mlz              1,548
  
krugi.scr                  6,912   
krugi.scr.zx0                657   
krugi.scr.mlz                692
My data is "loose", so it compresses well :)
I will use it, thanks!
* zx0 = standard compression
R-Tape
Site Admin
Posts: 6402
Joined: Thu Nov 09, 2017 11:46 am

Re: A new data compressor called ZX0

Post by R-Tape »

Nice one Einar.

I'm a regular ZX7 user, and of the screens I've checked so far, ZX0 saves at least 200 bytes per screen, which will be very useful in things like a woot slideshow.

Cheers!
utz
Microbot
Posts: 116
Joined: Wed Nov 15, 2017 9:04 am

Re: A new data compressor called ZX0

Post by utz »

Fantastic work, thanks a lot Einar! Running a few tests shows that ZX0 works very well for my most common use case, compressing data for music players. Very excited to see something that can beat apack.
Einar Saukas
Bugaboo
Posts: 3100
Joined: Wed Nov 15, 2017 2:48 pm

Re: A new data compressor called ZX0

Post by Einar Saukas »

Thank you!

In the meantime, I have implemented support for ZX0 in RCS:

https://github.com/einar-saukas/RCS

It works exactly like support for ZX7.
arkannoyed
Manic Miner
Posts: 436
Joined: Mon Feb 05, 2018 9:56 am
Location: Northamptonshire

Re: A new data compressor called ZX0

Post by arkannoyed »

Having read the stuff on GitHub, am I correct that ZX0 has the ability to use a ‘dictionary’ at a different location from where the code decompresses?

This really is wonderful. Any chance of an overview of the compression methods?
introspec
Dizzy
Posts: 53
Joined: Wed Jan 02, 2019 2:26 pm
Location: London

Re: A new data compressor called ZX0

Post by introspec »

Very impressive work Einar! I do not think this is better in terms of compression ratio than the latest Exomizer, but it is definitely in the same league, and with a much less demanding decompressor. On average, it beats older versions of Exomizer in my tests, and the decompression speeds I am seeing even with the standard decompressor are faster than the fastest Exomizer can manage at present. I think that if the decompression speed could get at least to the level of recent ApLib decompressors, you'd have a definite winner.

I do not have much time during the term time, but I'll definitely study your decompressors and will try to contribute to them, if/when I can.

P.S. I know you usually don't keep version numbers in your software, but could you please consider adding a version number to ZX0? It makes it so much easier to identify pieces of software evolving over the years.
Einar Saukas
Bugaboo
Posts: 3100
Joined: Wed Nov 15, 2017 2:48 pm

Re: A new data compressor called ZX0

Post by Einar Saukas »

    arkannoyed wrote: Wed Jan 20, 2021 12:07 pmHaving read the stuff on GitHub, am I correct that zx0 has the ability to utilise a ‘dictionary’ at a different location to where the code decompresses?
    Not exactly a dictionary. Compressing with a prefix (or suffix) means that, if you know there will be certain data in the area immediately before (or after) the place where you are decompressing, then you can use this information to improve compression.

    Although I agree this could be used like a dictionary. If you put together a block of common words and use it as a prefix when compressing other text blocks, it will help reduce the compressed size of those blocks. However, this is not exactly dictionary-based compression. If you need to compress lots of text, it's probably better to use dictionary-based custom compression instead.

    arkannoyed wrote: Wed Jan 20, 2021 12:07 pmThis really is wonderful. Any chance of an overview of the compression methods?
    The ZX0 format is very simple. There are only 3 kinds of blocks:
    • Literal (copy next N bytes from compressed input to output)

    Code: Select all

    0  <length>  <byte1>  <byte2>  ...  <byteN>
    • Copy from last offset (repeat N bytes from last offset to output)

    Code: Select all

    0  <length>
    • Copy from new offset (repeat N bytes from new offset to output)

    Code: Select all

    1  <MSB_offset>  <LSB_offset>  <length-1>
    We only need 1 bit to distinguish these blocks, because we cannot have 2 consecutive literal blocks, and reusing last offset only makes sense after a literal block.

    Both length and offset MSB are stored using Elias Gamma Coding. The offset LSB is stored using 7 bits instead of 8, because it produces better results in most practical cases.

    The compressed data always starts with a literal, so the first bit is omitted.
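For readers who want to experiment with the format, the Elias Gamma Coding used for lengths and offset MSBs can be sketched in a few lines of Python. This is an editorial illustration of the coding scheme only, not ZX0's actual C or Z80 code:

```python
def elias_gamma(n):
    """Classic Elias gamma code: (k-1) zeros followed by the k-bit binary of n."""
    assert n >= 1
    bits = bin(n)[2:]                    # e.g. 5 -> '101'
    return '0' * (len(bits) - 1) + bits  # e.g. 5 -> '00101'

def elias_gamma_decode(code):
    """Decode one gamma code given as a string of '0'/'1' characters."""
    it = iter(code)
    zeros = 0
    while next(it) == '0':               # count leading zeros up to the marker '1'
        zeros += 1
    n = 1                                # the marker '1' is the value's leading bit
    for _ in range(zeros):
        n = (n << 1) | (next(it) == '1')
    return n

# 1 -> '1', 2 -> '010', 3 -> '011' ... exactly the "1 -> 1, 01x -> 1x" scheme
assert elias_gamma(1) == '1'
assert elias_gamma_decode('00101') == 5
```

Note that every gamma code is an odd number of bits long; this property becomes relevant later in the thread.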
    introspec
    Dizzy
    Posts: 53
    Joined: Wed Jan 02, 2019 2:26 pm
    Location: London

    Re: A new data compressor called ZX0

    Post by introspec »

    Einar, in your decompressors you are using a new kind of optimization which I am struggling to get my head around. Specifically, you do something like this:

    Code: Select all

    copy_literals:
    	call read_elias_code	; for the length of literal run
    	ldir
    	add a,a : jr c,read_new_offset
    	call read_elias_code	; for match length
    	
    	; [skipped code for copying matches]
    	
    	add a,a : jr nc,copy_literals
    read_new_offset:
    	call read_elias_code_carry
    On both occasions with the command ADD A,A you get a new bit from the bit reservoir in A to see what needs to be done next. What I am struggling to understand is: why do you NOT need to check if A==0 after ADD A,A? I understand (and used in my decompressors before) a somewhat subtle idea that if after ADD A,A flag C is down, then we are certain that A does not need to be reloaded. However, I am failing to see, how the alternative works. If after ADD A,A flag C is up, how do you know that it is an actual 1 from the bitstream, and not the "signal bit" 1 that was pushed into A using LD A,(HL) : INC HL : RLA during bit stream parsing?

    UPDATE 1: OMG, I think I understand. Gamma codes always have an odd number of bits, so for your chosen format EVERY POSSIBLE COMMAND uses an exactly EVEN number of bits from the bitstream. Hence, the flag bit can never exhaust the bit reservoir. INSANE. I love it. And this also means that you cannot really use an 8-bit LSB with the same decompressor structure, as it will break literally EVERYTHING :)

    UPDATE 2: However, since this is the case, why did you not use interlaced gamma code? I.e. replace 1, 01x, 001xx, etc by 1, 0x1, 0x0x1, etc? Since your encoding can easily ensure which gamma bits occupy even and odd positions in the bitstream, this would be a marvelous speed optimization. I suspect that the size-optimized code can also be made smaller with this style of encoding...
    Last edited by introspec on Wed Jan 20, 2021 5:15 pm, edited 3 times in total.
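introspec's parity argument can be checked numerically. The sketch below is an editorial Python illustration; the bit accounting follows Einar's format description and the decompressor's `sra b : rr c` trick, where the first length bit of a new-offset match rides inside the offset LSB byte:

```python
def gamma_len(n):
    """Bit length of the Elias gamma code for n: always odd (2k+1)."""
    return 2 * (n.bit_length() - 1) + 1

def reservoir_bits(command, length=1, offset_msb=1):
    """Bits each ZX0 command consumes from the bit reservoir in A.
    Whole bytes are excluded: literal bytes and the offset LSB byte
    come straight from the byte stream, never through the reservoir."""
    if command == 'literal':             # flag bit + gamma(length)
        return 1 + gamma_len(length)
    if command == 'rep_match':           # flag bit + gamma(length)
        return 1 + gamma_len(length)
    if command == 'new_offset':          # flag + gamma(offset MSB) + gamma(length),
        # minus the first length bit, which rides inside the offset LSB byte
        return 1 + gamma_len(offset_msb) + gamma_len(length) - 1
    raise ValueError(command)

# every command consumes an even number of reservoir bits, so the '1' pushed
# in by LD A,(HL) : INC HL : RLA can never be mistaken for a data bit
for n in range(1, 300):
    for cmd in ('literal', 'rep_match', 'new_offset'):
        assert reservoir_bits(cmd, length=n, offset_msb=n) % 2 == 0
```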
    Joefish
    Rick Dangerous
    Posts: 2058
    Joined: Tue Nov 14, 2017 10:26 am

    Re: A new data compressor called ZX0

    Post by Joefish »

    That's what I used in my encoding! Actually I think I cached two or three recently used offsets in a stack, although your usage of one bit to pick from your three options is pretty ruddy clever.

    Although why do you say that using 'last offset' only makes sense after a literal block? If you cached the previous offset when you define a new one, you could still revert to it (or switch new and old) after copying from the new offset. There are going to be some savings to be found there too... maybe... with graphics, or anywhere there's lots of repetition with small variations from one copy to the next.
    Alone Coder
    Manic Miner
    Posts: 401
    Joined: Fri Jan 03, 2020 10:00 am

    Re: A new data compressor called ZX0

    Post by Alone Coder »

    Introspec has excluded native compressors RIP, mRIP, and ZXRar from his article. ZXRar can be easily tested in NedoOS:

    calgary - 98571 (lost to Exomizer3)
    canterbury - 27939
    graphics - 163173 (better than others)
    music - 63721
    misc - 254202

    RAR headers included.
    Decompressor size is 342 bytes (uses 1566 bytes for temporary buffer).

    TR-DOS version of ZXRar optionally supports compression with 2-byte blocks (helpful for code), with 386-byte decompressor. It has to be tested too.

    The other compressors use similar methods, but smaller decompressors (mRIP is 220 bytes), and no headers.

    These formats, if implemented on PC with optimal compression (like 7zip does with ZIP format), can be more effective with the same decompressors.
    introspec
    Dizzy
    Posts: 53
    Joined: Wed Jan 02, 2019 2:26 pm
    Location: London

    Re: A new data compressor called ZX0

    Post by introspec »

    Joefish wrote: Wed Jan 20, 2021 3:49 pmAlthough why do you say that using 'last offset' only makes sense after a literal block? If you cached the previous offset when you define a new one, you could still revert to it (or switch new and old) after copying from the new offset. There are going to be some savings to be found there too... maybe... with graphics, or anywhere there's lots of repetition with small variations from one copy to the next.
    If you do not have a literal block there, then it means you can encode the same data by simply using a longer match length. So it already represents a fairly substantial saving. This type of context-dependent encoding is popular in good modern compression algorithms, such as LZMA. I think the cleanest discussion of these ideas can be found in Charles Bloom's article about his compression format LZNIB: http://cbloomrants.blogspot.com/2019/01 ... rithm.html
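To make this point concrete: under LZ copy semantics, a rep-match that immediately follows another match at the same offset produces exactly the same output as one longer match, so a format gains nothing by allowing it. A small editorial Python sketch, not ZX0 code:

```python
def lz_copy(out, offset, length):
    """Append `length` bytes copied from `offset` positions back (overlap allowed)."""
    for _ in range(length):
        out.append(out[-offset])

a = bytearray(b'abcabc')
lz_copy(a, 3, 2)        # a match...
lz_copy(a, 3, 2)        # ...followed by a rep-match at the same offset

b = bytearray(b'abcabc')
lz_copy(b, 3, 4)        # one longer match instead

assert a == b           # identical output, so the longer match always suffices
```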
    Alone Coder
    Manic Miner
    Posts: 401
    Joined: Fri Jan 03, 2020 10:00 am

    Re: A new data compressor called ZX0

    Post by Alone Coder »

    Alone Coder wrote: Wed Jan 20, 2021 4:02 pm calgary - 98571 (lost to Exomizer3)
    canterbury - 27939
    graphics - 163173 (better than others)
    music - 63721
    misc - 254202

    RAR headers included.
    Decompressor size is 342 bytes (uses 1566 bytes for temporary buffer).

    TR-DOS version of ZXRar optionally supports compression with 2-byte blocks (helpful for code), with 386-byte decompressor. It has to be tested too.
    Temporarily added 2-byte blocks in NedoOS version:
    calgary - 98525
    canterbury - 27911
    graphics - 163943
    music - 62740
    misc - 250341

    So, some files pack better, and some others worse.
    I suppose a future ZXRar compressor should automatically switch 2-byte blocks on and off depending on the results for a given file.
    Einar Saukas
    Bugaboo
    Posts: 3100
    Joined: Wed Nov 15, 2017 2:48 pm

    Re: A new data compressor called ZX0

    Post by Einar Saukas »

    introspec wrote: Wed Jan 20, 2021 12:26 pmVery impressive work Einar!
    Thank you!

    introspec wrote: Wed Jan 20, 2021 12:26 pmI do not think this is better in terms of compression ratio than the latest Exomizer, but it is definitely in the same league, and with a lot less demanding decompressor.
    Agreed! This was exactly my conclusion.

    introspec wrote: Wed Jan 20, 2021 12:26 pmOn average, it beats older versions of Exomizer in my tests, and the decompression speeds I am seeing even with the standard decompressor are faster than the fastest Exomizer can do at present. I think that if the decompression speed could get at least to the speed of recent aplib decompressors, you'll have a definite winner.
    The "mega" ZX0 decompressor is faster than all ZX7 decompressors. And according to this link, ZX7 is faster than aplib. Unless there's a much faster aplib decompressor out there?

    introspec wrote: Wed Jan 20, 2021 12:26 pmI do not have much time during the term time, but I'll definitely study your decompressors and will try to contribute to them, if/when I can.
    Thanks!

    introspec wrote: Wed Jan 20, 2021 12:26 pmP.S. I know you usually don't keep version numbers in your software, but could you please consider adding a version number to ZX0? It makes it so much easier to identify pieces of software evolving over the years.
    ZX0 is already an optimal compressor, so I don't expect it to evolve any further. But if I make any significant changes, I will add a version number to distinguish it. OK?
    Joefish
    Rick Dangerous
    Posts: 2058
    Joined: Tue Nov 14, 2017 10:26 am

    Re: A new data compressor called ZX0

    Post by Joefish »

    introspec wrote: Wed Jan 20, 2021 4:03 pm If you do not have a literal block there, than it means you can encode the same data by simply using longer match length. So, it already represents a fairly substantial saving. This type of context dependent encodings are popular in good modern compression algorithms, such as LZMA. I think the cleanest discussion of this kind of ideas can be found in the Charles Bloom article about his compression format LZNIB: http://cbloomrants.blogspot.com/2019/01 ... rithm.html
    What's discussed there isn't relevant. When you're compressing blocks as small as 64K or less on a modern PC (which all Spectrum data is going to be), you can take the time to consider EVERY POSSIBLE compression route: whether to skip over a possible back-reference and stick to an unbroken string of literals, or to re-use an old reference for a partial copy, or to pick a new reference for a longer back-copy. You don't have to make those decisions on-the-fly during encoding based on statistical probabilities; you can simply brute-force it, follow every possible choice through to its end, and know for sure which is the best option.

    The choices come down to how many bits you give over in your encoding scheme to encode each type of data. I used the first few values in the smallest back-reference encoding to indicate a possibly cached, older back-reference. Then any higher number was a new back-reference. But it meant using the same number of bits as a new, small back-reference, to be able to re-use a cached one. Which made many potential re-uses irrelevant. By hitting on the idea of just using a single bit to signal a new reference or re-use the last (cached) reference, Einar's got a better encoding scheme. But he's not using it at every opportunity, I feel. Having said that, his decoder is way, way more efficiently written than anything I've ever done!

    I like his take on Elias Gamma Coding, particularly for speed, as it starts with a whole byte (where 7 bits are data and the last is an extension bit) rather than having to stream an arbitrary number of bits. A question though. Can it always guarantee it can copy enough literals? What if it can't represent a copy-count big enough and is forced to swap to a back-reference, and then can't find one?
    Einar Saukas
    Bugaboo
    Posts: 3100
    Joined: Wed Nov 15, 2017 2:48 pm

    Re: A new data compressor called ZX0

    Post by Einar Saukas »

    introspec wrote: Wed Jan 20, 2021 3:47 pmAnd this also means that you cannot really use 8 bit LSB with the same decompressor structure, as it will break literally EVERYTHING :)
    I actually implemented another alternative version with an 8-bit LSB, called ZXX. Using a full byte for the LSB simplifies other parts of the code, which somewhat compensates for not being able to use the optimization you mentioned. Because of this, the ZXX decoder was even shorter. It also compressed better in a few benchmarks (especially the Calgary test), but worse in most practical cases. In the end, I decided that ZXX was not worth it.

    You will find ZXX in this old commit if you are interested.

    introspec wrote: Wed Jan 20, 2021 3:47 pmUPDATE 2: However, since this is the case, why did you not use interlaced gamma code?
    Because using interlaced gamma code would always make each compressed file exactly 1 byte larger. I didn't think it was worth it.

    It's the same reason I didn't use interlaced gamma code in ZX7. This idea was originally discussed during ZX7 development here:

    https://worldofspectrum.org/forums/disc ... ent_673185

    I have considered using interlaced gamma code in ZX1, which is going to be a simplified version of ZX0 with a much smaller decompressor. But I still need to run more tests before I can make a final decision.
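For reference, the two codings being compared cost the same number of bits per value; only the ordering of the continuation bits and the value bits differs, which means the fixed 1-byte difference Einar mentions is a property of the stream framing rather than of the per-value cost. A hedged Python sketch of both encodings, following introspec's notation (1, 01x, 001xx versus 1, 0x1, 0x0x1):

```python
def gamma_standard(n):
    """1 -> '1', '1x' -> '01x', '1xx' -> '001xx', ..."""
    bits = bin(n)[2:]
    return '0' * (len(bits) - 1) + bits

def gamma_interlaced(n):
    """1 -> '1', '1x' -> '0x1', '1xx' -> '0x0x1', ...
    Each '0' means "another value bit follows"; the final '1' means stop."""
    bits = bin(n)[2:]
    out = ''
    for b in bits[1:]:                   # value bits after the implicit leading 1
        out += '0' + b
    return out + '1'

assert gamma_standard(5) == '00101'
assert gamma_interlaced(5) == '00011'
# identical cost for every value; only the bit ordering differs
assert all(len(gamma_standard(n)) == len(gamma_interlaced(n)) for n in range(1, 1000))
```

The speed advantage of the interlaced form is that the decoder sees one continuation bit, then one value bit, in strict alternation, so it never needs a separate length-counting pass.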
    Einar Saukas
    Bugaboo
    Posts: 3100
    Joined: Wed Nov 15, 2017 2:48 pm

    Re: A new data compressor called ZX0

    Post by Einar Saukas »

    Joefish wrote: Wed Jan 20, 2021 3:49 pm Although why do you say that using 'last offset' only makes sense after a literal block? If you cached the previous offset when you define a new one, you could still revert to it (or switch new and old) after copying from the new offset. There are going to be some savings to be found there too... maybe... with graphics, or anywhere there's lots of repetition with small variations from one copy to the next.
    Alternating between the last 2 offsets, or allowing reuse of the last N offsets, are all good ideas. However, providing more choices also requires more bits to distinguish them. Is it worth it? I have no idea :)

    Besides good compression, my other goals for ZX0 were a small fast decompressor, no additional storage, no self-modifying code, and using only the main registers. These goals wouldn't be achievable if it had to remember multiple offsets.
    Einar Saukas
    Bugaboo
    Posts: 3100
    Joined: Wed Nov 15, 2017 2:48 pm

    Re: A new data compressor called ZX0

    Post by Einar Saukas »

    Joefish wrote: Wed Jan 20, 2021 7:19 pmWhen you're compressing blocks as small as 64K or less on a modern PC (which all Spectrum data is going to be) then you can take the time to consider EVERY POSSIBLE compression combination route, whether that be skipping over a possible back-reference and just stick to an unbroken string of literals, or whether to re-use an old reference for a partial copy or pick a new reference for a longer back-copy. You don't have to make those decisions on-the-fly during encoding based on statistical probabilities; you can simply brute-force it and follow every possible choice through to its end and know for sure which is the best option.
    Not really :)

    The number of possibilities increases exponentially with file size. Because of this, my first brute-force implementation took a few hours to compress 64Kb! Unfortunately, most optimizations I originally created for the ZX7 compressor didn't apply to the ZX0 format, so I had to figure out new ones to make it viable.
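The parsing problem Einar describes is usually tamed with dynamic programming rather than raw enumeration. Below is a deliberately simplified editorial sketch in Python: the flat bit costs are hypothetical and there is no last-offset state, so this is NOT ZX0's actual algorithm, only the general idea of computing an optimal parse in polynomial time:

```python
# Hypothetical flat costs, for illustration only: a literal byte costs 9 bits
# (flag + byte), any match costs a flat 17 bits. ZX0's real cost model and its
# last-offset state make the problem considerably harder.
def optimal_parse_cost(data, cost_literal=9, cost_match=17, min_match=2):
    """Minimum encoding cost of `data` via dynamic programming,
    instead of enumerating every possible parse (which is exponential)."""
    n = len(data)
    best = [0] + [float('inf')] * n      # best[i] = cheapest way to encode data[:i]
    for i in range(n):
        best[i + 1] = min(best[i + 1], best[i] + cost_literal)   # emit a literal
        for j in range(i):               # brute-force match finder, for clarity
            length = 0
            while i + length < n and data[j + length] == data[i + length]:
                length += 1              # overlapping matches are fine in LZ
            for l in range(min_match, length + 1):
                best[i + l] = min(best[i + l], best[i] + cost_match)
    return best[n]

assert optimal_parse_cost(b'abcd') == 4 * 9            # no matches: all literals
assert optimal_parse_cost(b'abcabcabc') == 3 * 9 + 17  # 'abc' + one 6-byte match
```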

    Joefish wrote: Wed Jan 20, 2021 7:19 pmHaving said that, his decoder is way, way more efficiently written than anything I've ever done!
    Thank you!

    Joefish wrote: Wed Jan 20, 2021 7:19 pmCan it always guarantee it can copy enough literals? What if it can't represent a copy-count big enough and is forced to swap to a back-reference, and then can't find one?
    The copy-count limit in the Z80 version of the decompressors is 65535 bytes. It's enough to fill the entire memory :)

    The copy-count limit in the C version (both compressor and decompressor) is MAX_INT, which is usually over 2 billion. The only reason it's not MAX_LONG, is because I was trying to squeeze as much performance as possible from the compressor.
    Einar Saukas
    Bugaboo
    Posts: 3100
    Joined: Wed Nov 15, 2017 2:48 pm

    Re: A new data compressor called ZX0

    Post by Einar Saukas »

    Alone Coder wrote: Wed Jan 20, 2021 4:02 pm Introspec has excluded native compressors RIP, mRIP, and ZXRar from his article. ZXRar can be easily tested in NedoOS:
    That's impressive!
    introspec
    Dizzy
    Posts: 53
    Joined: Wed Jan 02, 2019 2:26 pm
    Location: London

    Re: A new data compressor called ZX0

    Post by introspec »

    Einar Saukas wrote: Wed Jan 20, 2021 7:08 pm
    introspec wrote: Wed Jan 20, 2021 12:26 pmI think that if the decompression speed could get at least to the speed of recent aplib decompressors, you'll have a definite winner.
    The "mega" ZX0 decompressor is faster than all ZX7 decompressors. And according to this link, ZX7 is faster than aplib. Unless there's a much faster aplib decompressor out there?
    Well, the fastest ZX7 decompressor on that diagram is actually unpublished, and it is definitely quite a bit faster than dzx7_mega.asm. However, I did make a mistake and you are indeed a bit faster than the fastest ApLib decompressor. Which is great, because I am sure dzx7_mega can be improved further! When my decompression time tests finish running, I'll give you more precise numbers.
    Einar Saukas wrote: Wed Jan 20, 2021 7:08 pm
    introspec wrote: Wed Jan 20, 2021 12:26 pmcould you please consider adding a version number to ZX0? It makes it so much easier to identify pieces of software evolving over the years.
    ZX0 is already an optimal compressor, so I don't expect it to evolve any further. But if I make any significant changes, I will add a version number to distinguish it. OK?
    I am secretly hoping for a much faster compressor :) Apultra proves it can be done, almost.
    introspec
    Dizzy
    Posts: 53
    Joined: Wed Jan 02, 2019 2:26 pm
    Location: London

    Re: A new data compressor called ZX0

    Post by introspec »

    Einar Saukas wrote: Wed Jan 20, 2021 7:08 pmThe "mega" ZX0 decompressor is faster than all ZX7 decompressors. And according to this link, ZX7 is faster than aplib. Unless there's a much faster aplib decompressor out there?
    [image: diagram of decompression speed vs. compression ratio for Z80 decompressors]
    Pretty amazing! This further supports my idea that the Pareto frontier should not have any kinks or dips - your new compressor is already top notch, and every improvement to the decompressors will move you further to the right!
    catmeows
    Manic Miner
    Posts: 716
    Joined: Tue May 28, 2019 12:02 pm
    Location: Prague

    Re: A new data compressor called ZX0

    Post by catmeows »

    Einar Saukas wrote: Wed Jan 20, 2021 9:32 pm

    Alternating between the last 2 offsets, or allowing to reuse the last N offsets, are all good ideas. However providing more choices also require more bits to distinguish them. Is it worth it? I have no idea :)
    According to Bloom, entries in the old-offset table have diminishing returns. The second entry gives less gain than the first, the third gives less gain than the second, and so on. ApLib keeps one entry, LZMA keeps 4. Exactly when the cost of additional entries (flags) starts to hurt compression depends on the data. It is generally a good idea to use some kind of adaptive coding to cancel out pathological cases when you play with many stored offsets.
    Anyway, I always thought ZX7 was quite old-fashioned and that someone should create something like ZX0. I think you really found the sweet spot.
    Proud owner of Didaktik M
    introspec
    Dizzy
    Posts: 53
    Joined: Wed Jan 02, 2019 2:26 pm
    Location: London

    Re: A new data compressor called ZX0

    Post by introspec »

    The current "mega" decompressor is very wasteful. Here is a 158/163 byte decompressor that is 10%/15% faster:

    Code: Select all

    ;	DEFINE	AllowSelfmodifyingCode
    
    		MACRO	GET_BIT
    			add a : call z,ReloadByte
    		ENDM
    
    		MACRO	FASTER_GET_BIT
    			add a : jp nz,1f : ld a,(hl) : inc hl : rla
    1
    		ENDM
    
    
    @DecompressZX0:		ld bc,#FFFF
    		IFNDEF	AllowSelfmodifyingCode
    			push bc						; default offset is -1
    		ELSE
    			ld (PrevOffset),bc
    		ENDIF
    			inc bc : inc c
    			ld a,#80 : jr RunOfLiterals
    
    UsualMatch:
    		IFNDEF	AllowSelfmodifyingCode
    			inc sp : inc sp					; discard last offset
    		ENDIF
    			FASTER_GET_BIT : call nc,ReadEliasGamma
    			dec b : ret z					; EoD
    
    			exa : xor a : sub c : ld b,a : exa
    			ld c,(hl) : inc hl
    			sra b : rr c					; last offset bit becomes first length bit
    
    		IFNDEF	AllowSelfmodifyingCode
            		push bc						; preserve new offset
    		ELSE
    			ld (PrevOffset),bc
    		ENDIF
    			ld bc,1 : call nc,ReadEliasGamma
    			inc bc
    
    		IFNDEF	AllowSelfmodifyingCode
    CopyMatch		ex (sp),hl					; preserve source, restore offset
            		push hl						; preserve offset
    			add hl,de
    			ldir : inc c
    			pop hl						; restore offset
    			ex (sp),hl					; ex (sp),hl - preserve offset, restore source
    		ELSE
    CopyMatch		push hl						; preserve source
    PrevOffset		EQU $+1 : ld hl,#FFFF				; restore offset (default offset is -1)
    			add hl,de
    			ldir : inc c
    			pop hl						; restore source
    		ENDIF
    
    			; after a match you can have either
    			; 0 + <elias length> = run of literals, or
    			; 1 + = another match
    			add a : jr c,UsualMatch
    
    RunOfLiterals:		FASTER_GET_BIT : call nc,ReadEliasGamma
    			ldir : inc c
    
    			; after a literal run you can have either
    			; 0 + <elias length> = match using a repeated offset, or
    			; 1 + = another match
    			add a : jr c,UsualMatch
    
    RepMatch:		FASTER_GET_BIT : call nc,ReadEliasGamma
    			jp CopyMatch
    
    
    ;
    ;  elias gamma reader
    ;  1 -> 1
    ;  01x -> 1x
    ;  001xx -> 1xx etc
    
    ReadEliasGamma:		add a : jr c,EliasCode01
    			FASTER_GET_BIT : jr c,EliasCode001
    			add a : jr c,EliasCode0001
    			GET_BIT : jr c,EliasCode00001
    
    			ld c,5-1
    .CodeLengthLoop		inc c : GET_BIT
    			jr nc,.CodeLengthLoop
    
    			ld ixl,c : ld c,1
    .CodeValueLoop		GET_BIT : rl c : rl b
    			dec ixl : jr nz,.CodeValueLoop
    			ret
    
    EliasCode00001		add a : rl c
    EliasCode0001		GET_BIT : rl c
    EliasCode001		add a : rl c
    EliasCode01		add a : jr z,ReloadByte1
    			rl c : ret
    ReloadByte1		ld a,(hl) : inc hl : rla
    			rl c : ret
    
    ReloadByte:		ld a,(hl) : inc hl : rla : ret
    I know you mentioned that you tried to avoid self-modifying code, but 5% on top of everything else is not small. The rest of the performance improvements come from processing the first bit of the Elias gamma inline, and also from aggressive inlining of the key code paths through it (esp. taking into account which bits are even and which are odd). The inner (general) code inside the Elias gamma reader is currently just lazy; it needs to be optimized for size (as it does not affect the performance anymore).
    introspec
    Dizzy
    Posts: 53
    Joined: Wed Jan 02, 2019 2:26 pm
    Location: London

    Re: A new data compressor called ZX0

    Post by introspec »

    This is an updated diagram, showing the improved performance of the new decompressor.

    [image: updated diagram of decompression speed vs. compression ratio]

    Einar, now you can definitely claim that ZX0 is faster than ZX7 :) at least on average.
    Einar Saukas
    Bugaboo
    Posts: 3100
    Joined: Wed Nov 15, 2017 2:48 pm

    Re: A new data compressor called ZX0

    Post by Einar Saukas »

    introspec wrote: Wed Jan 20, 2021 10:40 pm Well, the fastest ZX7 decompressor on that diagram is actually unpublished and it is definitely quite a bit faster than dzx7_mega.asm.
    I just noticed a faster ZX7 decompressor at the bottom of this page. It didn't seem to be faster than dzx0_mega, although I didn't have time to test it extensively.

    Is this the unpublished ZX7 decompressor that you mentioned, or is there another even faster?