ZAD Adventure System - Progress

Show us what you're working on, (preferably with screenshots).
ArtOfWalls
Drutt
Posts: 1
Joined: Fri Jan 19, 2018 2:17 am

Re: ZAD Adventure System - Progress

Post by ArtOfWalls »

This looks great! I'm willing to see your progress.
The closest things I remember in Spectrum are "The Trap Door" and "Flunky". Funny games with great graphics. If your engine can make games to this level then it will be a success!
User avatar
PROSM
Manic Miner
Posts: 473
Joined: Fri Nov 17, 2017 7:18 pm
Location: Sunderland, England
Contact:

Re: ZAD Adventure System - Progress

Post by PROSM »

R-Tape wrote: Thu Jan 18, 2018 10:50 pm Does this cyan blob have a name by the way? Tell me it's not Horace!
It's Thomas Stakes. Surprisingly, it's very hard (for me at least) to come up with an infrequently used name for a game character. I must've tried fifty others in Google before settling on that one.
All software to-date
Working on something, as always.
Wall_Axe
Manic Miner
Posts: 492
Joined: Mon Nov 13, 2017 11:13 pm

Re: ZAD Adventure System - Progress

Post by Wall_Axe »

if the main character can be customized, he could look like Dan Dare from Dan Dare 3?
I dont think that game used masking
Image
User avatar
PROSM
Manic Miner
Posts: 473
Joined: Fri Nov 17, 2017 7:18 pm
Location: Sunderland, England
Contact:

Re: ZAD Adventure System - Progress

Post by PROSM »

Wall_Axe wrote: Fri Jan 19, 2018 8:52 pm if the main character can be customized, he could look like Dan Dare from Dan Dare 3?
I dont think that game used masking
Image
I'm not sure what you mean by customized. If you mean for games that you write yourself, then yes, all you would have to do is convert the Dan Dare 3 sprites into my 'FRAME' format and set up the walking animation pointers accordingly so that the software knows where to look for each direction. The player character shown in my video is set up in exactly the same manner. However, you may need to disable the blinking animations if your player sprites don't use them.

In regards to masking, the ZAD adventure system allows a basic form of masking via attributes, where character blocks of frames can either be entirely opaque, take on the PAPER colour of the background tile being covered or can be completely empty, so the background character is not overwritten at all.
All software to-date
Working on something, as always.
Wall_Axe
Manic Miner
Posts: 492
Joined: Mon Nov 13, 2017 11:13 pm

Re: ZAD Adventure System - Progress

Post by Wall_Axe »

oh good, so the main character can look cool then :)
User avatar
PROSM
Manic Miner
Posts: 473
Joined: Fri Nov 17, 2017 7:18 pm
Location: Sunderland, England
Contact:

ZAD Adventure System - Progress Update

Post by PROSM »

Just a quick update on the progress of the ZAD Adventure System: A cursor has been implemented, allowing the player to select objects on the screen. When objects are moused over, the object name is displayed. This uses the same key input code as the player movement routine, but of course uses it for a different purpose. The different interaction modes will call the cursor routines, and when the ENTER key is pressed while the cursor is hovering over an object, the object reference will be passed back to the interaction routine selected so that it can carry out the appropriate action.

A quick note on player input: the directional movement is controlled via keys 6 to 9 on the keyboard. This not only makes it easier to program, but also allows for Sinclair joystick support (if one desires to use it, that is).

Here's a screenshot which shows the current development build of the engine, with the cursor selected, hovering over the player character, Thomas Stakes:

Image

I'll probably make a further update in the next few days, when I have got the region selection working as well. To make things clear, objects are the movable items on screen, but regions are the areas of the background that can be interacted with. They are, as a consequence, static and unmovable.
All software to-date
Working on something, as always.
User avatar
Uto
Drutt
Posts: 24
Joined: Wed Nov 15, 2017 10:28 am

Re: ZAD Adventure System - Progress

Post by Uto »

Very interesting tool! Congratulations!

Would you consider supporting Kempston mouse to move that cursor? It can be used in a real Spectrum with the original mouse interface, and most modern clones as ZX-Uno and Spectrum Next support it. In the end you can move that cursor with keys or joystick if you want, but also with mouse in case there is one plugged in.

Technical specs here, is quite simple:
http://uto.speccy.org/zxunofaq_en.html#7100
http://uto.speccy.org
Follow me at @uto_dev
User avatar
PROSM
Manic Miner
Posts: 473
Joined: Fri Nov 17, 2017 7:18 pm
Location: Sunderland, England
Contact:

Re: ZAD Adventure System - Progress

Post by PROSM »

Uto wrote: Sun Jan 28, 2018 3:37 pm Very interesting tool! Congratulations!

Would you consider supporting Kempston mouse to move that cursor? It can be used in a real Spectrum with the original mouse interface, and most modern clones as ZX-Uno and Spectrum Next support it. In the end you can move that cursor with keys or joystick if you want, but also with mouse in case there is one plugged in.

Technical specs here, is quite simple:
http://uto.speccy.org/zxunofaq_en.html#7100
Sorry for my late reply, [mention]Uto[/mention]. I will support the Kempston mouse interface, but right now it is not the highest priority since the rest of the engine has yet to be completed. It is definitely on the cards, however, as it doesn't look too hard to implement a driver for mouse functionality. It may even be a better method of control when playing via an emulator.
All software to-date
Working on something, as always.
User avatar
PROSM
Manic Miner
Posts: 473
Joined: Fri Nov 17, 2017 7:18 pm
Location: Sunderland, England
Contact:

ZAD Adventure System - Huffman, Proportional Text and Compilers

Post by PROSM »

Hello again!

I apologize for the lack of recent updates, because the project was briefly put on hiatus in order to write my CSSCGC 2018 entry, a conversion of Plumbers Don't Wear Ties. I will return to writing this engine and the accompanying development utilities.

To start, I've been looking at text compression lately as the amount of space text takes up currently makes that including speech or text descriptions rather costly in terms of memory. At the moment, text is stored using my own custom character mapping in a 6-bit per character format, which means that there is only space in the character map for capital letters. However, Huffman text compression is a very good choice for this engine, since strings are actually embedded within the bytecode scripts themselves (with the exception being strings that are duplicated across numerous scripts; the later instances are instead just references to the first embedded version). Because the strings are stored like this, strewn across memory, Lempel-Ziv compression would not be very effective because the window would have to be extremely large to allow for the use of repeated phrases.

The cost of the Huffman tree in memory would be outweighed by the compression gains, as I estimate that the size of text can be halved by Huffman coding. In addition to this, the limit upon the character map size imposed by the current 6-bit format would allow one to use lowercase letters or even accented characters if needed.

However, I am having trouble implementing a Huffman compressor in Python, as I'm finding it hard to understand how the tree structure and the nodes could be implemented. I use Python as a test bed for my code, and then port it to C for the development utilities, so that I can make sure that my comprehension of a concept is rock-solid before I invest time into writing a C program.

Now, we move onto proportional text. I've decided to do away with the 64 column text display and instead use variable spacing for each character. This will reduce the length of messages that can be printed there, but it will make said messages easier to read.

Finally, the compiler. The rooms of the game are split into 'modules', which are all independent of each other. These modules will be assembled individually (with jump targets yet to be defined) and then the compiler will try to find the best fit, one which allows all modules to co-exist in memory and leave the largest amount of free contiguous memory possible. This is being written in Python so that I can get the core bits correct before I move them into C.

I am not sure whether to create a C-style programming language, or simply let the end-user program directly in the bytecode language. The latter would be simpler for me and would give the user more control over the system, but the former may be more approachable for novices. Please leave your opinion below, as I need to hear from potential users about what they would want.

Thank you for reading! Please leave any suggestions or comments you have.
All software to-date
Working on something, as always.
Ralf
Rick Dangerous
Posts: 2279
Joined: Mon Nov 13, 2017 11:59 am
Location: Poland

Re: ZAD Adventure System - Progress

Post by Ralf »

I am not sure whether to create a C-style programming language, or simply let the end-user program directly in the bytecode language.
The easier it will be, the more users you will have. I understand that "easy for user"="harder for you" so you may want to find your compromise.

Personally I would go after Basic syntax, not C. Most people in Zx Spectrum world have experience with Basic but not necessarily C and may be afraid of all these brackets, not to mentions pointers ;)
To start, I've been looking at text compression lately
You can do it without compression if it's too much trouble. You can go down to 5 bits per letters: 32 combinations for 26 English letters. You can also use prefixes for some rare symbols (5 bits per prefix and next 5 bits for rare character which give you another 32 symbols). You can also have some logic like "after dot symbol there is always a capital letter" as start of new sentence. I have an unfinished game using such ideas so these are actually tricks that worked for me.
The rooms of the game are split into 'modules',
But don't foget to do some global variables. Actions in one location should be able to influence things in another location.

Good luck with your project! Don' forget to post here ;)
User avatar
PROSM
Manic Miner
Posts: 473
Joined: Fri Nov 17, 2017 7:18 pm
Location: Sunderland, England
Contact:

Re: ZAD Adventure System - Progress

Post by PROSM »

Ralf wrote: Thu Feb 15, 2018 8:41 pm
I am not sure whether to create a C-style programming language, or simply let the end-user program directly in the bytecode language.
The easier it will be, the more users you will have. I understand that "easy for user"="harder for you" so you may want to find your compromise.

Personally I would go after Basic syntax, not C. Most people in Zx Spectrum world have experience with Basic but not necessarily C and may be afraid of all these brackets, not to mentions pointers ;)
To start, I've been looking at text compression lately
You can do it without compression if it's too much trouble. You can go down to 5 bits per letters: 32 combinations for 26 English letters. You can also use prefixes for some rare symbols (5 bits per prefix and next 5 bits for rare character which give you another 32 symbols). You can also have some logic like "after dot symbol there is always a capital letter" as start of new sentence. I have an unfinished game using such ideas so these are actually tricks that worked for me.
The rooms of the game are split into 'modules',
But don't foget to do some global variables. Actions in one location should be able to influence things in another location.

Good luck with your project! Don' forget to post here ;)
A BASIC-style syntax should be easy to implement and also be easy to write for the user, but I'm not sure how to implement one of the ZAD system concepts in this style of programming.

To reduce the amount of time spent in the bytecode execution loop, there is a 'centre point'. This can be set anywhere in the code. The centre point is where bytecode execution begins from on each loop. This means that for an animation sequence, we can set the centre point to start executing the animation loop immediately from the beginning of a new execution loop, and when we're done, we can set the centre point back to the main logic loop for the entire room. I'm not entirely sure how I could implement that cleanly in a BASIC-style syntax. I suppose it could automatically centre at the start of each room module and then loops with their own centre points could be indented to make it clearer that they are separate.

In regards to the compression, I understand the concept of Huffman compression, but it's just trying to implement it. I'll keep picking away at it whilst I implement the other things.

Finally, in response to your note about global variables, the way that variables are allocated within the system technically makes them all global, but the way that the compiler will allocate them abstracts this from the end-user. Local variables are allocated upwards from $00 and global variables are allocated downwards from $5F. There are variables above $60 in the variable page, but these are given a dedicated purpose (object positions and their frame pointers, region information, system flags, etc.).
All software to-date
Working on something, as always.
Nomad
Manic Miner
Posts: 600
Joined: Thu Dec 28, 2017 12:38 pm

Re: ZAD Adventure System - Progress

Post by Nomad »

If I was that concerned about space and large amounts of text I would probably just go for tables of two byte tokens... per word.

65535 Token IDs then. The trick is you set space and other punctuation as its own token in the stream. Then you can have compound token groups in the stream and probably capture all of the English language. :lol:

I think that is a very rich vocabulary for English anyway.

http://www.slate.com/articles/life/the_ ... count.html

Well it depends what you want... I found this article interesting on the subject. Even with some crazy large number because this can do compounds you can capture the whole of the set of English words with a relatively small number of elements...

Also think that you can do cool stuff with this approach if its coded right... Make sure that you don't bake in the space in each token, have the space as its own distinct token. That way you can create compound words when encoding and save a lot of time and space in your definitions at the expense of a little speed.

: HE h e ;
: THE t he ;
: THEM the m ;

: AT a t ;
: TACK t a c k ;
: ATTACK at tack ;

But that would be pretty fancy you are on the verge of going phoneme route then - I always wondered what could be done if you associated graphics with phoneme instead of sounds in a limited target like the specy.

You would be looking at 44 tokens then for all the English sounds, associate that with a routine to do whatever - print a text chunk, make a sound, do some program effect. Everything can then be generated from combinations of the 44 tokens.
User avatar
PROSM
Manic Miner
Posts: 473
Joined: Fri Nov 17, 2017 7:18 pm
Location: Sunderland, England
Contact:

Re: ZAD Adventure System - Progress

Post by PROSM »

Nomad wrote: Fri Feb 16, 2018 12:19 am If I was that concerned about space and large amounts of text I would probably just go for tables of two byte tokens... per word.

65535 Token IDs then. The trick is you set space and other punctuation as its own token in the stream. Then you can have compound token groups in the stream and probably capture all of the English language. :lol:

I think that is a very rich vocabulary for English anyway.

http://www.slate.com/articles/life/the_ ... count.html

Well it depends what you want... I found this article interesting on the subject. Even with some crazy large number because this can do compounds you can capture the whole of the set of English words with a relatively small number of elements...

Also think that you can do cool stuff with this approach if its coded right... Make sure that you don't bake in the space in each token, have the space as its own distinct token. That way you can create compound words when encoding and save a lot of time and space in your definitions at the expense of a little speed.

: HE h e ;
: THE t he ;
: THEM the m ;

: AT a t ;
: TACK t a c k ;
: ATTACK at tack ;

But that would be pretty fancy you are on the verge of going phoneme route then - I always wondered what could be done if you associated graphics with phoneme instead of sounds in a limited target like the specy.

You would be looking at 44 tokens then for all the English sounds, associate that with a routine to do whatever - print a text chunk, make a sound, do some program effect. Everything can then be generated from combinations of the 44 tokens.
I like the idea, [mention]Nomad[/mention], but with phonemes, you are only able to spell the words phonetically, rather than how they are actually spelt, so it would be good for speech synthesis, but not for text displays. However, your idea about phonemes raises an interesting point, as you could instead use graphemes (how the phonemes are spelt). There are several more graphemes than phonemes since there are often several ways in which to spell them, but it should help the compression ratio significantly, as there are many three or four letter graphemes, which reduce down to a single token. This allows us to encode words like through as th r ough. With conventional ASCII, this word would take 7 bytes, but using the graphemes, it only takes 3 bytes, one for each grapheme token.

Coupled with Huffman compression, significant space savings could be made, as more frequent tokens will have shorter tokens, reducing the space taken even further.

Of course, this works only for English language games. For games in other languages, they could either define their own list of graphemes or they could just use a conventional alphabet character mapping and do away with the graphemes.
All software to-date
Working on something, as always.
User avatar
PROSM
Manic Miner
Posts: 473
Joined: Fri Nov 17, 2017 7:18 pm
Location: Sunderland, England
Contact:

Re: ZAD Adventure System - Progress

Post by PROSM »

PROSM wrote: Fri Feb 16, 2018 8:24 am I like the idea, @Nomad, but with phonemes, you are only able to spell the words phonetically, rather than how they are actually spelt, so it would be good for speech synthesis, but not for text displays. However, your idea about phonemes raises an interesting point, as you could instead use graphemes (how the phonemes are spelt). There are several more graphemes than phonemes since there are often several ways in which to spell them, but it should help the compression ratio significantly, as there are many three or four letter graphemes, which reduce down to a single token. This allows us to encode words like through as th r ough. With conventional ASCII, this word would take 7 bytes, but using the graphemes, it only takes 3 bytes, one for each grapheme token.

Coupled with Huffman compression, significant space savings could be made, as more frequent tokens will have shorter tokens, reducing the space taken even further.

Of course, this works only for English language games. For games in other languages, they could either define their own list of graphemes or they could just use a conventional alphabet character mapping and do away with the graphemes.
Well, I did some tests in Python and discovered that there are over 130 unique graphemes (at least from the list I used), so it would be impractical to both store them in memory and also use Huffman coding on them, since this works best when you have a limited character set. I think I'll stick to writing an implementation of Huffman coding, since the decompression tree can be really small. I've looked at this post on Battletoads' text compression, and the way they implement the decompression tree is really compact and fast as well.
All software to-date
Working on something, as always.
User avatar
djnzx48
Manic Miner
Posts: 729
Joined: Wed Dec 06, 2017 2:13 am
Location: New Zealand

Re: ZAD Adventure System - Progress

Post by djnzx48 »

I'd be interested to know the details of how you're planning to store the tree in memory, since I'm also considering using Huffman coding for in-game text in my own project. Storing the tree as a structure of nodes and pointers seems to me like it would use a lot of memory, but at least it would be very fast to decompress.

I remember reading somewhere that it was possible to store a tree going top to bottom, left to right, using 0 to represent a node and 1 followed by a symbol to represent a leaf. This would be a lot slower as you'd have to traverse most of the tree to decode a single symbol, but the tree itself would take up a lot less space - only 1 bit per node. Although maybe the stack space required to traverse the tree would outweigh the benefits in this case.

I also had the idea that if this method was used, it might be possible to change the text encoding scheme when compiling/assembling. Basically rather than using, for example, ASCII to encode the uncompressed text, you could arrange the characters in the order that they appear in the tree. So then instead of having to store the symbols in the tree itself, you could simply increment a counter as you traversed the tree and whatever value you stopped on would be the character's encoding. A side effect of this method would be that you wouldn't be able to use the table in the ROM for translating keypresses to characters, but this doesn't sound like it would be a major problem.
Nomad
Manic Miner
Posts: 600
Joined: Thu Dec 28, 2017 12:38 pm

Re: ZAD Adventure System - Progress

Post by Nomad »

Its better to think of it as a matrix or even better a topology. When you can abstract it in that way you can start to look for solutions from a wide number of sources. It then becomes a question of if the maths solution can be implemented effectively in 8-bits.

You already are in a good place, you have a known size to the data you are trying to represent (the size of the dictionary is bound by your implementation. - its got to fit into memory even if you are using a tokenizer.) If its a fixed, complete continuous series of values that are bound within a finite range then that sounds to me a lot like a ring. Ring theory is beautiful - there is lots of deep theory to it and its been extensively studied. Again the maths is not complicated - its only a little bit more difficult than set theory.

The big advantage to using commutative algebra (ring theory) is its already a pretty good fit for computer science. its dependent on binary. and commutative algebra is the gateway drug to abstract algebra. :lol:

With computer science/programming you already know binary, you will have a intuitive understanding how how this stuff works with a little work.

The problem is you are looking for implementation specific solutions already when there is a world of rich solutions waiting to be considered. Matrix algebra is not all that hard either its just a dry subject. :lol: I'd bet 20p that there are some slick techniques just sitting gathering dust in some journal/monograph somewhere that would solve ya problem.
User avatar
PROSM
Manic Miner
Posts: 473
Joined: Fri Nov 17, 2017 7:18 pm
Location: Sunderland, England
Contact:

ZAD Adventure System - Huffman Storage Technique

Post by PROSM »

I'm not able to post very frequently these days due to studies. The present situation will continue for a few months but I am still actively developing this engine in my spare time.
djnzx48 wrote: Mon Feb 19, 2018 9:01 am I'd be interested to know the details of how you're planning to store the tree in memory, since I'm also considering using Huffman coding for in-game text in my own project. Storing the tree as a structure of nodes and pointers seems to me like it would use a lot of memory, but at least it would be very fast to decompress.
I was planning on storing the tree as some sort of table. My method is inspired by how Battletoads stores its tree, but is optimized for the Z80 architecture rather than the 6502. Each node would have two entries of one byte each, for the left and right child of the node. To make space savings, if the child node is a symbol, then the symbol itself is instead stored in the node reference, with the MSB set to one to indicate that it is a symbol, not a node reference.

Below is a diagram, to help you visualize the tree that I will use in my example table:

Image

Here is the tree that I've assembled for the message "MISSISSIPPIRIVER" (no spaces). The most common letter, I, is given the shortest code, 11. Now here is the table of nodes that the engine would use to decompress messages encoded with this tree:

Image

The top table shows how the data is interpreted and the bottom table shows the raw data itself in the child node columns. Numbers in red denote a reference to another node. The text symbols are encoded in ASCII for this example, but this will probably not be the case for a real game's text.

The entire table would be encoded as:

Code: Select all

02 01 D3 C4 05 03 CD 04 D6 C5 D0 D2
For efficiency, the table is aligned on a page boundary. This will not leave any wasted space, however, because there is already a page-aligned structure in the engine that occupies exactly a page, so the tree can just follow it.
Nomad wrote: Mon Feb 19, 2018 10:52 am Its better to think of it as a matrix or even better a topology. When you can abstract it in that way you can start to look for solutions from a wide number of sources. It then becomes a question of if the maths solution can be implemented effectively in 8-bits.

You already are in a good place, you have a known size to the data you are trying to represent (the size of the dictionary is bound by your implementation. - its got to fit into memory even if you are using a tokenizer.) If its a fixed, complete continuous series of values that are bound within a finite range then that sounds to me a lot like a ring. Ring theory is beautiful - there is lots of deep theory to it and its been extensively studied. Again the maths is not complicated - its only a little bit more difficult than set theory.

The big advantage to using commutative algebra (ring theory) is its already a pretty good fit for computer science. its dependent on binary. and commutative algebra is the gateway drug to abstract algebra. :lol:

With computer science/programming you already know binary, you will have a intuitive understanding how how this stuff works with a little work.

The problem is you are looking for implementation specific solutions already when there is a world of rich solutions waiting to be considered. Matrix algebra is not all that hard either its just a dry subject. :lol: I'd bet 20p that there are some slick techniques just sitting gathering dust in some journal/monograph somewhere that would solve ya problem.
:? I'm afraid most of what you have said has flown over my head, [mention]Nomad[/mention]. Your ideas seem very interesting, but I'm not particularly well-versed in what you've just discussed, so I don't quite understand it. Sorry!
All software to-date
Working on something, as always.
Ralf
Rick Dangerous
Posts: 2279
Joined: Mon Nov 13, 2017 11:59 am
Location: Poland

Re: ZAD Adventure System - Progress

Post by Ralf »

I'm afraid most of what you have said has flown over my head, @Nomad
Absolutely agree. Total overkill ;)

Topology, ring theory... I've heard these words for the last time over 20 years ago. We had in high school class some ubernerd who actually understood such staff and read about it each day before going to sleep ;)
User avatar
djnzx48
Manic Miner
Posts: 729
Joined: Wed Dec 06, 2017 2:13 am
Location: New Zealand

Re: ZAD Adventure System - Progress

Post by djnzx48 »

Ah, OK. Looks like a reasonable tradeoff between speed and size if you're only dealing with a small number of symbols, as with ASCII. I was thinking about tokenising text into words and then using Huffman coding on the word references, but it's probably not a good idea since you'd end up with hundreds of different symbols and require a huge table.

Good luck on your engine. I mostly played the classic Sierra games (King's Quest, Space Quest) so it would be nice to see those kind of games accommodated for as well. Have you thought about including a RAMSAVE feature? ;)
User avatar
PROSM
Manic Miner
Posts: 473
Joined: Fri Nov 17, 2017 7:18 pm
Location: Sunderland, England
Contact:

Re: ZAD Adventure System - Progress

Post by PROSM »

djnzx48 wrote: Sun Feb 25, 2018 10:59 pm Ah, OK. Looks like a reasonable tradeoff between speed and size if you're only dealing with a small number of symbols, as with ASCII. I was thinking about tokenising text into words and then using Huffman coding on the word references, but it's probably not a good idea since you'd end up with hundreds of different symbols and require a huge table.

Good luck on your engine. I mostly played the classic Sierra games (King's Quest, Space Quest) so it would be nice to see those kind of games accommodated for as well. Have you thought about including a RAMSAVE feature? ;)
I suppose I could include a RAMSAVE feature, but it would take up 2.5K for the save file, so it would probably be 128K only. You could implement it in 48K machines as well, but that would depend on whether or not you wanted to sacrifice 8% of your usable game data space.
All software to-date
Working on something, as always.
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: ZAD Adventure System - Progress

Post by RMartins »

PROSM wrote: Fri Feb 16, 2018 8:24 am ...
However, your idea about phonemes raises an interesting point, as you could instead use graphemes (how the phonemes are spelt). There are several more graphemes than phonemes since there are often several ways in which to spell them, but it should help the compression ratio significantly, as there are many three or four letter graphemes, which reduce down to a single token. This allows us to encode words like through as th r ough. With conventional ASCII, this word would take 7 bytes, but using the graphemes, it only takes 3 bytes, one for each grapheme token.

Coupled with Huffman compression, significant space savings could be made, as more frequent tokens will have shorter tokens, reducing the space taken even further.
...
graphemes is a good way to make your tree shorter, by using a graphene per node, instead of a character per node, so do use graphemes.

But I would suggest a different approach for the tree, given the premiss of a lot of text, but not that many distinct words in the text.

I would build the tree (or set of trees, or sequences) in a different way.
I would define the leaf of a tree (or start of a sequence) as the beginning of a word, that is constructed, by following the nodes up to the root.
So you will have many roots, for example, one would be all words that end in "th". You only populate the words you actually use in your texts of course.

So to encode a text, you would keep a sequence of word IDs (how you encode this is up to you, to get maximum compression, or minimum use of bits). But given that words that end in the same grapheme will probably improve the space locality of the data, you can use relative distances, with variable bit sizes for length if needed, to link a node to it's parent node.

When decoding, you just grab the first word ID (typically the address of the left node that corresponds to that word), and you crawl up it's three, until you get to it's root, hence producing the required word, by concatenating the node contents while traversing it up.

Add a space and grab the next word ID, and repeat, until your text is done.

The largest data part, will surely be the multitude of word trees, but then adding extra text, should cost very little, at most 2 bytes per word, no matter how long the word is. And if you compress these sequence of IDs you can have a very compact and fast text decompression.

You can make optimizations for plurals, like "tree" and "trees", so that you don't have to create a new tree node sequence just to get a plural of a word.

You can take this idea, one level further, by building sequences of sentences (commonly used Sequences of word IDs ), if you find there are enough that repeat, to justify another structure for it.

I hope I managed to make it clear.

NOTE: you should optimize for the data size, in this case, and not code size, since data size will overwhelm any code size needed for this.

TIP: you code use a single bit, similar to some UTF encodings, to encode if the next byte is part of the current grapheme (0) or not(1). when this bit changes (to 1), the next char, will have the start of the relative position of the parent node. which can again be variable bit encoded, to minimize size.
User avatar
PROSM
Manic Miner
Posts: 473
Joined: Fri Nov 17, 2017 7:18 pm
Location: Sunderland, England
Contact:

Re: ZAD Adventure System - Progress

Post by PROSM »

RMartins wrote: Mon Feb 26, 2018 9:05 pm graphemes is a good way to make your tree shorter, by using a graphene per node, instead of a character per node, so do use graphemes.

But I would suggest a different approach for the tree, given the premiss of a lot of text, but not that many distinct words in the text.

I would build the tree (or set of trees, or sequences) in a different way.
I would define the leaf of a tree (or start of a sequence) as the beginning of a word, that is constructed, by following the nodes up to the root.
So you will have many roots, for example, one would be all words that end in "th". You only populate the words you actually use in your texts of course.

So to encode a text, you would keep a sequence of word IDs (how you encode this is up to you, to get maximum compression, or minimum use of bits). But given that words that end in the same grapheme will probably improve the space locality of the data, you can use relative distances, with variable bit sizes for length if needed, to link a node to it's parent node.

When decoding, you just grab the first word ID (typically the address of the left node that corresponds to that word), and you crawl up it's three, until you get to it's root, hence producing the required word, by concatenating the node contents while traversing it up.

Add a space and grab the next word ID, and repeat, until your text is done.

The largest data part, will surely be the multitude of word trees, but then adding extra text, should cost very little, at most 2 bytes per word, no matter how long the word is. And if you compress these sequence of IDs you can have a very compact and fast text decompression.

You can make optimizations for plurals, like "tree" and "trees", so that you don't have to create a new tree node sequence just to get a plural of a word.

You can take this idea, one level further, by building sequences of sentences (commonly used Sequences of word IDs ), if you find there are enough that repeat, to justify another structure for it.

I hope I managed to make it clear.

NOTE: you should optimize for the data size, in this case, and not code size, since data size will overwhelm any code size needed for this.

TIP: you code use a single bit, similar to some UTF encodings, to encode if the next byte is part of the current grapheme (0) or not(1). when this bit changes (to 1), the next char, will have the start of the relative position of the parent node. which can again be variable bit encoded, to minimize size.
I apologize for my late reply to your comment, [mention]RMartins[/mention]. I'm slightly confused by your tree explanation. I don't quite understand how a tree with many leaves and roots would be built and connected. The word ID concept is easy to follow, but I can't see how your tree would be constructed. Could you perhaps give an example of such a tree, please?
All software to-date
Working on something, as always.
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: ZAD Adventure System - Progress

Post by RMartins »

PROSM wrote: Mon Mar 05, 2018 8:07 pm ...

I apologize for my late reply to your comment, @RMartins. I'm slightly confused by your tree explanation. I don't quite understand how a tree with many leaves and roots would be built and connected. The word ID concept is easy to follow, but I can't see how your tree would be constructed. Could you perhaps give an example of such a tree, please?
It's not a single tree, but a tree for each set of words that end with the same grapheme.
And it's not a tree in the sense that you do not start at the root, so you do not even need to know or have a list of all the roots, at least not at runtime.

Each branch from root to a specific leaf will define a complete word.
We traverse the tree in reverse, from leaf to root, since the words end at root.

Let's grab 2 words from the text above "traverse" and "reverse".
They both end with the same grapheme "se", so lets decompose these.

Code: Select all

"tra"-"ver"-"se"
"re"-"ver"-"se"
We can clearly see that they share more than one graphene in sequence "ver"-"se".
So let's build a tree in reverse (using ASCII art).

Code: Select all

"tra" +\
         +- "ver" -+- "se"
"re" +/
"se" is the root node of this tree, but we will always refer to it's elements, by starting from a leave.
Take into account, we can start at something that does not look like a leaf, but it's actually a word within a word, like "verse" above, so we may start at that.

You can look at this not a like a tree, but a group of sequences, that share the ending parts of the sequence, to avoid any confusion with any concepts of tree you might already have set your mind to.

So you will have many of these groups of sequences.

Hope it helped.
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: ZAD Adventure System - Progress

Post by RMartins »

To simplify the main idea, if instead of graphemes, we would use single letters on each node, we would in fact have exactly 26 trees, since a word can only end in one of the 26 letters of the alphabet (not counting upper/lower case).
User avatar
PROSM
Manic Miner
Posts: 473
Joined: Fri Nov 17, 2017 7:18 pm
Location: Sunderland, England
Contact:

Re: ZAD Adventure System - Progress

Post by PROSM »

RMartins wrote: Tue Mar 06, 2018 2:00 pm ...
It's not a single tree, but a tree for each set of words that end with the same grapheme.
And it's not a tree in the sense that you do not start at the root, so you do not even need to know or have a list of all the roots, at least not at runtime.

Each branch from root to a specific leaf will define a complete word.
We traverse the tree in reverse, from leaf to root, since the words end at root.

Let's grab 2 words from the text above "traverse" and "reverse".
They both end with the same grapheme "se", so lets decompose these.

Code: Select all

"tra"-"ver"-"se"
"re"-"ver"-"se"
We can clearly see that they share more than one graphene in sequence "ver"-"se".
So let's build a tree in reverse (using ASCII art).

Code: Select all

"tra" +\
         +- "ver" -+- "se"
"re" +/
"se" is the root node of this tree, but we will always refer to it's elements, by starting from a leave.
Take into account, we can start at something that does not look like a leaf, but it's actually a word within a word, like "verse" above, so we may start at that.

You can look at this not a like a tree, but a group of sequences, that share the ending parts of the sequence, to avoid any confusion with any concepts of tree you might already have set your mind to.

So you will have many of these groups of sequences.

Hope it helped.
I can see how it would work now. I'll try to get something like that implemented in the next week or so, just to see how big of a space saving could be made using it. It looks like it would shrink the text down by quite a bit. Thank you very much!
All software to-date
Working on something, as always.
Post Reply