A new data compressor called ZX0

TMD2003 · Post by **TMD2003** » Sun Sep 25, 2022 12:43 pm

No problems there, then. I can afford four bytes, come what may.

Prodatron · Post by **Prodatron** » Tue Jun 06, 2023 4:58 pm

Hi all, I am new to this forum, not (yet) a ZX Spectrum guy, but at least a Z80 guy, active on some other platforms like MSX, Enterprise and Amstrad CPC (my childhood system). Some weeks ago I discovered this fantastic ZX0 compressor and was so amazed that I decided to fully integrate it into my project called "SymbOS", an operating system for Z80 systems (unfortunately currently not for the Spectrum).

Thanks so much to Einar Saukas for this fantastic piece of software!!

The next SymbOS version supports loading of ZX0-compressed executables (EXE, COM etc.) and also provides functions that applications can use to decompress data. You can load compressed data from an opened file transparently, as if it were uncompressed, which makes it very easy to update even existing applications to handle compressed data.

I added a little header to all kinds of ZX0-compressed data...
- 1W: length of the compressed data
- 4B: last 4 bytes of uncompressed data
- 1W: number of bytes at the beginning, which should be skipped/shouldn't be compressed

The first word is necessary for the loading routine, which only knows the uncompressed size from the application.

The next 4 bytes are the last 4 bytes of the uncompressed data, which are not included in the compressed data. This makes it possible to load the compressed data exactly at the end of the area where it will be decompressed. I never saw a delta>3, so to be safe I chose to save 4 bytes, so there will never be a problem with the delta.

The last word is necessary for files which have some meta-data at the beginning. This part shouldn't be compressed, as you may want to load it separately without the need to decompress the whole thing. The prefix feature in ZX0 is therefore a great thing, as this part can still be used as a dictionary.
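To make the header layout above concrete, here is a minimal sketch of how such a header could be parsed. It assumes that "W" means a 16-bit little-endian word (the usual Z80 convention) and that the fields are packed without padding; the names `HEADER` and `parse_header` and the example values are hypothetical, not taken from SymbOS.

```python
import struct

# Assumed layout: 16-bit word, four raw bytes, 16-bit word, little-endian, no padding.
HEADER = struct.Struct("<H4sH")

def parse_header(blob):
    """Split a header-prefixed blob into its three fields and the remaining bytes."""
    compressed_len, last_four, skip = HEADER.unpack_from(blob, 0)
    return compressed_len, last_four, skip, blob[HEADER.size:]

# Hypothetical example values, not from a real SymbOS file.
example = HEADER.pack(300, b"\x01\x02\x03\x04", 16) + b"rest-of-file"
print(parse_header(example))
```

Under these assumptions the header costs 8 bytes per file.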

Now all EXE (GUI executables) and COM (command line executables) files can be compressed, as well as some special executables "*.WDG" (desktop widgets) and "*.SAV" (screen saver).
Picture files (SGX) can be compressed, too, including desktop background pictures, as well as help files (HLP).
Last but not least I decided to add a header for compressed music files, which are supported in the SymAmp music player. These are PT3, ST2 (Amstrad Soundtrakker 128), SKM (Amstrad Starkos Tracker) and SA2 (Adlib Tracker 2), which can all be ZX0-compressed now.

The compression ratio is just fantastic, in most cases it's better than ZIP! Now it's possible again to place the whole operating system with all system apps, background picture and additional stuff which is loaded during booting on one Amstrad standard disc side (178K).
I am using the Turbo decompressor, which is still small and has a great speed. It's directly integrated into the SymbOS kernel, which makes it possible to decompress data with the full linear size of up to 63K!

I hope to be able to release the next SymbOS version later this year. I know that currently all this is not interesting for you Spectrum guys, but I just wanted to share this info, as I am using this awesome software now. Maybe/hopefully there will be a Spectrum Next version in the future.

Again, many thanks and respect to Einar Saukas (for the whole thing) and Introspec (for the Turbo decompressor)!

Einar Saukas · Post by **Einar Saukas** » Wed Jun 07, 2023 2:42 am

That's awesome! Thank you!!!

Now here's an idea. Your current format is as follows:

- 1W: total length of data after decompression
- last 4 bytes of uncompressed data
- 1W: number of uncompressed bytes at the beginning
- N uncompressed bytes at the beginning
- remaining compressed bytes

My suggestion is to use a simpler header like this:

- 1W: total length of data after decompression
- 1W: number of uncompressed bytes at the beginning (at least 4)
- N uncompressed bytes at the beginning (N>=4)
- remaining bytes, compressed backwards

This way, you will only need N uncompressed bytes total, instead of N+4.

To decompress, you copy into memory exactly N-4 uncompressed bytes, immediately followed by the compressed block. Afterwards, decompress it backwards from the top of the memory area. Finally, copy the remaining 4 uncompressed bytes.
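The steps above can be sketched as address arithmetic. This is only one reading of the suggestion: it assumes the "total length" counts the N uncompressed bytes too, that the 4 held-back bytes land right where the compressed block started, and that `base` is the start of the destination area. The function name, its parameters and the example numbers are all hypothetical.

```python
def plan_backward_layout(base, total_len, n, compressed_len, safety=4):
    """Return the addresses used at each step, under the assumptions stated above."""
    if n < safety:
        raise ValueError("n must be at least the safety margin")
    compressed_start = base + n - safety
    return {
        # Step 1: copy the first n-4 uncompressed bytes, as (address, byte count).
        "prefix_copy": (base, n - safety),
        # Step 2: the compressed block follows immediately.
        "compressed_start": compressed_start,
        # Step 3: backward decompression reads from the block's last byte...
        "compressed_last": compressed_start + compressed_len - 1,
        # ...and writes downwards from the top of the destination area.
        "dest_last": base + total_len - 1,
        # Step 4: the remaining 4 uncompressed bytes are copied last.
        "late_copy": (compressed_start, safety),
    }

# Hypothetical numbers: 1000 bytes in total, 16 uncompressed, 600 compressed.
print(plan_backward_layout(base=16384, total_len=1000, n=16, compressed_len=600))
```

The point of holding back 4 bytes is that the compressed block starts 4 bytes below the region the decompressor writes to, which covers any delta up to 4.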

Compressing backwards may sound awkward, but you should get a slightly better compression in almost every file...

Prodatron · Post by **Prodatron** » Wed Jun 07, 2023 5:26 pm

Hi Einar, thanks a lot for your answer and your ideas!

Using backwards compression to have the "4 bytes" (for avoiding the delta problem) at the beginning in the uncompressed area is a good idea!

Einar Saukas wrote: ↑Wed Jun 07, 2023 2:42 am Compressing backwards may sound awkward, but you should get a slightly better compression in almost every file...

Is it really like this?
I made some quick and dirty tests with several SymbOS EXE files, and the ratio with backward compression was always a little bit worse compared to normal compression. Then I remembered that EXE files have the relocator table at the end, which may be a bad start for the dictionary. So I did the same with bitmap files, but the result is the same: all backward compressed files are a little bit larger than the normally compressed ones.
I don't really get this, as backward or forward is just swapping LDIR and LDDR, so why should the dictionary be any more optimal when you start from the beginning rather than from the end?
Anyway, I will make some more tests. In particular, I'd like to find out whether your suggestion saves some code on the OS side, which is always a very good thing.

Einar Saukas · Post by **Einar Saukas** » Thu Jun 08, 2023 12:21 am

Prodatron wrote: ↑Wed Jun 07, 2023 5:26 pm Is it really like this?

Oops, I should have explained what I meant!

The compression itself shouldn't make much difference. Some files work slightly better with forward compression, others with backward compression.

However the new format I suggested (using backward compression) will allow compressing 4 more bytes per file.

zara6502 · Post by **zara6502** » Wed Nov 08, 2023 2:50 pm

Hello friends. I do not know English, so I use a translator; please forgive me if it does not translate correctly.

Probably my question is addressed to Einar, but I will be glad if someone else answers.

I like just learning different algorithms and coding something in C# and ASM for ATARI. But my programming skills are very low.

I have read the description of the ZX0 algorithm on Github, but I cannot understand "Copy from the last offset (repeat N bytes from the last offset)": what are offset and last offset in this context, and where do they come from? What is new offset and how is it formed? Why is the second block "0" immediately following Literal?

PS: I made my compression algorithm based on the LZSS and Gamma-Code Elias paradigm even before I met ZX0 and I liked its results, but ZX0 literally ruined my plans XD I would be interested to know the answers to my questions, maybe it will help me improve my algorithm. Thx

PeterJ · Post by **PeterJ** » Wed Nov 08, 2023 7:44 pm

Welcome @zara6502,

I have used the mention system to alert @Einar Saukas to your message.

Translation services are very good these days. No need to apologise for using one.

XoRRoX · Post by **XoRRoX** » Wed Nov 08, 2023 8:28 pm

@Einar Saukas

In a new 128k game project, I am already using your zx0 de-compression for graphics in 128k banks to much satisfaction. But now, as it will be a physical tape game, to shorten the loading time I'd also like to compress the main-part. It occupies memory from 24832-65535=40703 which compresses down to 21.755 bytes (!!!)
I have read the documentation and think I should in this case use backward (de)compression but am until now unsuccessful in doing so.

Could you please explain how I should set this up? If needed, to create space for the loader & decompressor, I could shave necessary memory off the beginning of the file and load that last, for example.

Thank you in advance.

Einar Saukas · Post by **Einar Saukas** » Wed Nov 08, 2023 11:00 pm

zara6502 wrote: ↑Wed Nov 08, 2023 2:50 pm I have read the description of the ZX0 algorithm on Github, but I cannot understand "Copy from the last offset (repeat N bytes from the last offset)": what are offset and last offset in this context, and where do they come from? What is new offset and how is it formed? Why is the second block "0" immediately following Literal?

PS: I made my compression algorithm based on the LZSS and Gamma-Code Elias paradigm even before I met ZX0 and I liked its results, but ZX0 literally ruined my plans XD I would be interested to know the answers to my questions, maybe it will help me improve my algorithm. Thx

"Offset" and "literal" are concepts from standard LZSS. If you have already implemented a compression algorithm based on LZSS, you certainly used them yourself!

When decompressing, you are basically reading bytes from a compressed file ("source") while writing bytes to the decompression memory area ("destination"):

If source indicates that a compressed block is a literal, it simply means to copy the next byte from source to destination.
Otherwise the compressed block is a repetition. It simply means to repeat a few (previously decompressed) bytes again in destination. Basically, copy a few bytes ("length") from a few positions back in destination ("offset") to destination again.

For instance, imagine you have a sequence "ABCDBC". In LZSS, it will be compressed as follows:

Literal "A"
Literal "B"
Literal "C"
Literal "D"
Repeat 2 bytes from 3 positions behind (length=2 and offset=3), thus producing "BC".
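The example above can be replayed with a tiny decoder. This is only an illustrative sketch of the literal/repeat idea, not the ZX0 bitstream format: the token tuples and the name `lzss_decode` are made up for the demonstration.

```python
def lzss_decode(tokens):
    """Rebuild the data from ("literal", byte) and ("repeat", length, offset) tokens."""
    out = bytearray()
    for token in tokens:
        if token[0] == "literal":
            out.append(token[1])
        else:
            _, length, offset = token
            # Copy one byte at a time from `offset` positions back in the output.
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)

tokens = [
    ("literal", ord("A")),
    ("literal", ord("B")),
    ("literal", ord("C")),
    ("literal", ord("D")),
    ("repeat", 2, 3),  # length=2, offset=3
]
print(lzss_decode(tokens).decode("ascii"))  # ABCDBC
```

Copying byte by byte matters: it also lets a repetition overlap its own output, for example when length is larger than offset.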

There's a more detailed explanation in Wikipedia. You can also try Youtube, there are probably online LZSS tutorials in your native language.

Einar Saukas · Post by **Einar Saukas** » Thu Nov 09, 2023 2:19 am

XoRRoX wrote: ↑Wed Nov 08, 2023 8:28 pm It occupies memory from 24832-65535=40703 which compresses down to 21.755 bytes (!!!)
I have read the documentation and think I should in this case use backward (de)compression but am until now unsuccessful in doing so.

When you compress it backwards, take note of the "delta" that it will provide.

Suppose you got delta=4. It means you must load the compressed file to address 24832-4=24828 (or lower). You also need a ZX0 backwards decoder, for instance you can load dzx0_standard_back (69 bytes) at address 24828-69=24759.
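The address arithmetic above can be double-checked with a small helper. The function name and parameter names are hypothetical; the numbers are the ones from this example (destination at 24832, delta=4, a 69-byte decoder).

```python
def backward_load_plan(dest_start, delta, decoder_size):
    """Return (compressed file load address, decoder load address)."""
    # The compressed file must start at least `delta` bytes below the destination.
    compressed_addr = dest_start - delta
    # The decoder is placed directly below the compressed file.
    decoder_addr = compressed_addr - decoder_size
    return compressed_addr, decoder_addr

print(backward_load_plan(dest_start=24832, delta=4, decoder_size=69))  # (24828, 24759)
```

This places the decoder as high as it can go; loading either block lower also works, as long as nothing overlaps.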

If you load a compressed file with size 21755 bytes to address 24828, then the last byte of this compressed file will be at address 24828+21755-1=46582. That's your initial decompression address to decompress backwards. Now you only need to execute something like this: