Database model ZXDB

This is the place for general discussion and updates about the ZXDB Database. This forum is not specific to Spectrum Computing.

Moderator: druellan

User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Database model ZXDB

Post by Einar Saukas »

Thanks for the kind words!

Regarding your question about emulators, I created a separate thread to avoid going off-topic here:

viewtopic.php?f=23&t=607
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Database model ZXDB

Post by Einar Saukas »

Yet another large ZXDB update is out! Download it here.

This time we added entries for almost a thousand electronic magazines. However the corresponding files are not available yet, it will be easier to add them in a separate update.
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Database model ZXDB

Post by Einar Saukas »

Next ZXDB update will have one more change:

* Table ROLES: Remove redundant column AUTHOR_SEQ.

You will notice that table AUTHORS already have natural key PK = (ENTRY_ID, LABEL_ID) instead of previous PK = (ENTRY_ID, AUTHOR_SEQ). Although column AUTHOR_SEQ won't be removed from table AUTHORS, anyone integrating or importing from ZXDB should use the new PK.

Likewise, table ROLES now have PK = (ENTRY_ID, LABEL_ID, ROLETYPE_ID) and FK = (ENTRY_ID, LABEL_ID). They were both based on AUTHOR_SEQ, that will be removed now.

I could have made this change in last ZXDB update, but I'm doing it in 2 steps on purpose (added ROLES.LABEL_ID in previous update and will remove ROLES.AUTHOR_SEQ in next update), to make it easier for other systems to adapt.
User avatar
moroz1999
Manic Miner
Posts: 329
Joined: Fri Mar 30, 2018 9:22 pm

Re: Database model ZXDB

Post by moroz1999 »

I really hope that LABEL_ID would stay the same.
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Database model ZXDB

Post by Einar Saukas »

moroz1999 wrote: Sun Apr 08, 2018 5:26 pm I really hope that LABEL_ID would stay the same.
Yes, it will stay the same.
User avatar
moroz1999
Manic Miner
Posts: 329
Joined: Fri Mar 30, 2018 9:22 pm

Re: Database model ZXDB

Post by moroz1999 »

Thanks! I've just already used them for marking ZX-ART authors, groups, author aliases and group aliases accordance to ZXDB :)
User avatar
pavero
Dynamite Dan
Posts: 1570
Joined: Sat Dec 09, 2017 11:49 pm
Location: The Czech Republic
Contact:

Re: Database model ZXDB

Post by pavero »

Einar Saukas wrote: Wed Apr 04, 2018 10:24 pm
The main changes in ZXDB will be:

* Table DOWNLOADS: Remove meaningless column ID. It will use natural key (ENTRY_ID, FILE_LINK) as PK.

* Table LABELFILES: Remove meaningless column ID. It will use natural key (LABEL_ID, FILE_LINK) as PK.

* Table MAGFILES: Remove meaningless column ID. It will use natural key (ISSUE_ID, FILE_LINK) as PK.
Hi Einar,

unfortunately I have found this post until now, when I have begun to investigate why my scripts with an online emulator totally stopped working. :(

My opinion to this matter:
Unique IDs in SQL tables are always useful even if they are redundant. Especially if you want to share informations between two PHP scripts via URL string.

Here is an example:

Until this moment I could easily use:
URL ... "playonline.php?dowload_id=?????"

Now I have to use:
URL ... "playonline.php?entry_id=?????&file_link=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

And there is always much better to use "short" numerical values in URLs, if there is possible, instead of "very long" strings.

(Note: all these scripts are dynamic, it doesn't matter at all if values of these IDs will change in the future or not.)
User avatar
pavero
Dynamite Dan
Posts: 1570
Joined: Sat Dec 09, 2017 11:49 pm
Location: The Czech Republic
Contact:

Re: Database model ZXDB

Post by pavero »

I only wanted to say that unique IDs are always useful in some situations and there was no reason to remove them.
User avatar
moroz1999
Manic Miner
Posts: 329
Joined: Fri Mar 30, 2018 9:22 pm

Re: Database model ZXDB

Post by moroz1999 »

I totally agree to pavero.
It's not a problem for me to use natural keys as long as it don't change, but I can foresee that file link format can really change in future because of some refactoring or file structure optimization. Stable non-changing numeric IDs are a most practical solution when it comes to integration with any outer systems - whether they are other archives or emulators.
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Database model ZXDB

Post by Einar Saukas »

Those are good points! Well, one the main reasons I post about ZXDB changes in advance is exactly to hear suggestions and feedback :)

In table DOWNLOADS, I considered that PK = (ENTRY_ID + FILE_LINK) would work better than PK = ID for systems integration but, to be honest, both options are error-prone. Consider the following (real) scenarios:

* A certain tape image is bugged, the authors provides a fixed file with the same name. The file is replaced in the repository, but if authors don't give version numbers to their releases, we don't invent version numbers ourselves either, thus the new file will be simply stored with the same filename as before. Therefore we will have a different file, but since there was no need to update the metadata in ZXDB, it will keep the same FILE_LINK and ID.

* A certain tape image seems bugged, so it's removed from the repository. Later we find out the authors provided file with incorrect extension TAP, it was actually a TZX file. Then the file is added again to ZXDB with a fixed filename. Therefore we will have the same file, with different FILE_LINK and, since it's getting added again in a different ZXDB release, it will also receive a different ID.

There are lots of other cases, but these 2 scenarios above should be enough to understand it's complicated to rely on either FILE_LINK or ID. What we really need is an identification associated to the actual content of each file, to identify it properly and detect when it has changed. In other words, we need a hash.

Even using a long hash, there's always a mathematical possibility of 2 different files with the same hash. Instead, I suggest identifying files with a short hash like CRC32, combined with ENTRY_ID. Therefore the primary key of table DOWNLOADS will be PK = (ENTRY_ID, CRC32). If ZXDB ever receives files with repeated hashes, at least it will happen within the same ENTRY_ID so it will be much easier for me to fix it.

Now here's my proposal:

* In table DOWNLOADS, I will immediately restore the old DOWNLOAD_ID (temporarily).

* Afterwards I will add ASAP a new column CRC32 in table DOWNLOADS, then remove again DOWNLOAD_ID and use PK = (ENTRY_ID, CRC32) instead.

In practice, it means Pavero will have to replace this:

Code: Select all

URL ... "playonline.php?dowload_id=106584"
with this:

Code: Select all

URL ... "playonline.php?id=30237&hash=229a2e3e"
Is this OK?
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Database model ZXDB

Post by Einar Saukas »

OK, there's already a new ZXDB update available here!

This update restores ID on tables DOWNLOADS, LABELFILES, and MAGFILES, for now. I temporarily reverted these changes in a hurry, because of the impact on Pavero's online emulator. Now I will wait for feedback on my proposed change to use ENTRY_ID+CRC32 instead.

This ZXDB release also contains about 500 new electronic magazine files, and a few minor archive updates.
User avatar
pavero
Dynamite Dan
Posts: 1570
Joined: Sat Dec 09, 2017 11:49 pm
Location: The Czech Republic
Contact:

Re: Database model ZXDB

Post by pavero »

Einar Saukas wrote: Sun Apr 08, 2018 11:03 pm
Now here's my proposal:

* In table DOWNLOADS, I will immediately restore the old DOWNLOAD_ID (temporarily).

* Afterwards I will add ASAP a new column CRC32 in table DOWNLOADS, then remove again DOWNLOAD_ID and use PK = (ENTRY_ID, CRC32) instead.

In practice, it means Pavero will have to replace this:

Code: Select all

URL ... "playonline.php?dowload_id=106584"
with this:

Code: Select all

URL ... "playonline.php?id=30237&hash=229a2e3e"
Is this OK?
Yes, it could be a compromise solution. The short hash in URL is much better than a string with full path. :)

But, is there really necessary to set ENTRY_ID+CRC32 as PK in downloads table?

And what about this solution?

ID - PK with autoincrement
ENTRY_ID+CRC32 - set INDEX only
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Database model ZXDB

Post by Einar Saukas »

I don't like the idea of using meaningless auto-incremented ID as PK, because eventually someone will assume again they can rely on it to identify changes. Even if I document it, it won't help. Nobody reads manuals nowadays. Frankly how many people here have read the entire "ZXDB.txt" documentation that I wrote?

I believe PK = (ENTRY_ID, CRC32) is a good tradeoff between providing a short convenient PK, that's more meaningful and fail proof.
User avatar
moroz1999
Manic Miner
Posts: 329
Joined: Fri Mar 30, 2018 9:22 pm

Re: Database model ZXDB

Post by moroz1999 »

I've been thinking about natural key containing CRC32. If file gets replaced, then CRC32 gets recalculated, and this is how the integrated systems would understand that the file has changed. This would work exactly the same way with integer PK really, so it won't bring the benefit :)
It has the following problem though:
1. File (or release) has been added to prod.
2. It gets synced with external system. External system remembers synced PK (CRC32+entry_id).
3. Somebody comments on file in external system reporting the problem, comment gets connected to that file.
4. File gets updated with bugfix, version changes in ZXDB, CRC32 changes as well.
Result: the comment is still connected to old PK using old CRC32. After re-syncing external system either gets two files (old+new) or loses the comment.

I suggest using entry_id + file download link, they are not the best solution possible, but it makes the continuous integration possible.
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Database model ZXDB

Post by Einar Saukas »

moroz1999 wrote: Tue Apr 10, 2018 8:11 pmI've been thinking about natural key containing CRC32. If file gets replaced, then CRC32 gets recalculated, and this is how the integrated systems would understand that the file has changed. This would work exactly the same way with integer PK really, so it won't bring the benefit :)
The benefit is to ensure the same file will always have the same PK, and a different file will always have a different PK. Using an arbitrary ID doesn't guarantee it. See the examples I mentioned in my last post.

moroz1999 wrote: Tue Apr 10, 2018 8:11 pmIt has the following problem though:
1. File (or release) has been added to prod.
2. It gets synced with external system. External system remembers synced PK (CRC32+entry_id).
3. Somebody comments on file in external system reporting the problem, comment gets connected to that file.
4. File gets updated with bugfix, version changes in ZXDB, CRC32 changes as well.
Result: the comment is still connected to old PK using old CRC32. After re-syncing external system either gets two files (old+new) or loses the comment.

I suggest using entry_id + file download link, they are not the best solution possible, but it makes the continuous integration possible.
I agree this should work better for you, but it won't completely eliminate this possibility. For instance, in step 4, Alessandro Grussu may send 26 new files called "Apulija-13V2.0*.zip", asking for previous versions "Apulija-13V1.1*.zip" to be removed. So you will have to decide what to do in this case.

Anyway I agree that ENTRY_ID + FILE_LINK is best for you. This is exactly what I had in mind in my previous change. But it's not appropriate for Pavero. So my proposal is that ZXDB uses PK = (ENTRY_ID, CRC32) for Pavero, but also declares UK = (ENTRY_ID, FILE_LINK). This way, when you import data, you can adopt (ENTRY_ID, FILE_LINK) as key on your side. Makes sense?
User avatar
moroz1999
Manic Miner
Posts: 329
Joined: Fri Mar 30, 2018 9:22 pm

Re: Database model ZXDB

Post by moroz1999 »

Einar Saukas wrote: Tue Apr 10, 2018 9:20 pm The benefit is to ensure the same file will always have the same PK, and a different file will always have a different PK. Using an arbitrary ID doesn't guarantee it. See the examples I mentioned in my last post.
I totally understand. The real question is: should the primary key really depend on file contents and get changed every time the file changes?
For me the release (download) is a separate entity, and it can exist without file at all, or the file can be updated and the release would still remain the same entity. I understand that's a different look at the info system architecture, but I'm seeing files as attachments to entities. For example, the post on this forum won't change it's ID if it's attached files get updated.
May be we are talking about different entities? For me the release and the download are the same thing. In ZXDB though they are separated (I know why, because of downloads combine different types of files).
May be I'm looking at the wrong thing? May be the table with releases is a one thing I need? Releases table doesn't have an id as well, it has PK entry_id+seq. What would happen if one software had more than one release, and would have to be deleted. Will the seq change? It should, but it means that any system/software/emulator relying on it will get a lot of problems. If releases could have static non-changeable numeric IDs, then I personally wouldn't care about downloads at all, because the ZXDB releases table would be a lot more logical to integrate ZX-Art releases with.
Einar Saukas wrote: Tue Apr 10, 2018 9:20 pm I agree this should work better for you, but it won't completely eliminate this possibility. For instance, in step 4, Alessandro Grussu may send 26 new files called "Apulija-13V2.0*.zip", asking for previous versions "Apulija-13V1.1*.zip" to be removed. So you will have to decide what to do in this case.
I think that new downloads should have been added instea
Einar Saukas wrote: Tue Apr 10, 2018 9:20 pm Anyway I agree that ENTRY_ID + FILE_LINK is best for you. This is exactly what I had in mind in my previous change. But it's not appropriate for Pavero. So my proposal is that ZXDB uses PK = (ENTRY_ID, CRC32) for Pavero, but also declares UK = (ENTRY_ID, FILE_LINK). This way, when you import data, you can adopt (ENTRY_ID, FILE_LINK) as key on your side. Makes sense?
Thanks! I don't really physically need a unique key, it's all down to business logic of import procedure. It's enough for me to know that file link won't change in future no matter what happens, so I can use them as reliable identificator in future as well. But I somehow doubt it is really possible, is it?
Anyway, having such PK seem to consume more memory that PK on integer, so is there any benefit really?
User avatar
moroz1999
Manic Miner
Posts: 329
Joined: Fri Mar 30, 2018 9:22 pm

Re: Database model ZXDB

Post by moroz1999 »

To summarize my position: for me this discussion has mostly academic value, because I'm working for 10 years in commerce web development and integrated a lot of really different systems of different age with each other, so it's interesting to hear the arguments. I'm not using ZXDB dynamically, it's near to impossible for ZX-Art, so you don't really have to build your architecture decisions on anything I suggest really.
For my synchronization mechanism I only need one reliable combination of columns to use as a stable non-changing GUID for releases which I would be sure will never change, whether it is file link+entry_id of separated integer.
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Database model ZXDB

Post by Einar Saukas »

moroz1999 wrote: Tue Apr 10, 2018 10:01 pmI totally understand. The real question is: should the primary key really depend on file contents and get changed every time the file changes?
Which PK?

The release PK shouldn't change, if it's a different file in the same release.

But the file PK should change, if it's a different file.

moroz1999 wrote: Tue Apr 10, 2018 10:01 pmFor me the release (download) is a separate entity, and it can exist without file at all, or the file can be updated and the release would still remain the same entity. I understand that's a different look at the info system architecture, but I'm seeing files as attachments to entities. For example, the post on this forum won't change it's ID if it's attached files get updated.
May be we are talking about different entities? For me the release and the download are the same thing.
For individual screens and songs, there was no reason to distinguish between a release and a download file. So it made perfect sense that ZX-Art didn't have this distinction. But when you expand ZX-Art to also store software, then this distinction becomes important.

Suppose a certain game was originally published by RetroWorks. This is a release, that may contain several downloads (perhaps tape and disk versions, separate files for each idiom, 48K and 128K versions, or even a separate playable file for each part of the game). Later the same game is re-published by Monument Microgames, perhaps with a few changes like a different publisher logo in the load screen, or a few extra levels. Again you may have a complete different set of files.

For screens and songs, you can simply assign each file to a separate release, so these will work like before. But take advantage of this distinction for programs.

Also I think it makes sense to assign comments to the release, instead of individual files. Even if a game has separate 48K and 128K versions, and I want to comment specifically about the AY music in 128K, it's still better to put this comment once in the release, instead of repeating this comment in the English 128K file and again in the Spanish 128K file. Or if I want to say that parts 1 and 2 are better than part 3, a single comment on the release still works better than separate comments on the file of each part.

moroz1999 wrote: Tue Apr 10, 2018 10:01 pmMay be the table with releases is a one thing I need?
Yes!

moroz1999 wrote: Tue Apr 10, 2018 10:01 pmReleases table doesn't have an id as well, it has PK entry_id+seq. What would happen if one software had more than one release, and would have to be deleted. Will the seq change? It should, but it means that any system/software/emulator relying on it will get a lot of problems. If releases could have static non-changeable numeric IDs, then I personally wouldn't care about downloads at all, because the ZXDB releases table would be a lot more logical to integrate ZX-Art releases with.
Releases are static non-changeable numeric IDs. You can rely on them!

The only chance RELEASE_SEQ will change, is if we find out that a release was duplicated by mistake. But this is no different than finding out an entire entry was duplicated by mistake so an ENTRY_ID should be removed. Anyway removing either ENTRY_ID or RELEASE_SEQ will happen very very rarely, and when it happens I will announce it here so you can easily adjust it manually on your side.

moroz1999 wrote: Tue Apr 10, 2018 10:01 pmThanks! I don't really physically need a unique key, it's all down to business logic of import procedure. It's enough for me to know that file link won't change in future no matter what happens, so I can use them as reliable identificator in future as well. But I somehow doubt it is really possible, is it?
Not really. Anyway it seems associating comments (and any other information) to ENTRY_ID + RELEASE_SEQ will work better for you than any other option.

moroz1999 wrote: Tue Apr 10, 2018 10:01 pmAnyway, having such PK seem to consume more memory that PK on integer, so is there any benefit really?
This difference in memory consumption is too small, to be even noticeable in any hardware capable of running a SQL database. There's no reason to consider sacrificing design, when the performance gain would be negligible.
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Database model ZXDB

Post by Einar Saukas »

moroz1999 wrote: Tue Apr 10, 2018 10:07 pm To summarize my position: for me this discussion has mostly academic value, because I'm working for 10 years in commerce web development and integrated a lot of really different systems of different age with each other, so it's interesting to hear the arguments. I'm not using ZXDB dynamically, it's near to impossible for ZX-Art, so you don't really have to build your architecture decisions on anything I suggest really.
For my synchronization mechanism I only need one reliable combination of columns to use as a stable non-changing GUID for releases which I would be sure will never change, whether it is file link+entry_id of separated integer.
The most stable GUID is certainly ENTRY_ID + RELEASE_SEQ.

And if you prefer a unique numeric value for each release, you can simply combine these numbers. The ENTRY_ID needs 7 digits and RELEASE_SEQ needs 2 digits, so (27762,1) can be represented as long value 002776201.
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Database model ZXDB

Post by Einar Saukas »

moroz1999 wrote: Tue Apr 10, 2018 10:01 pm
Einar Saukas wrote: Tue Apr 10, 2018 9:20 pm I agree this should work better for you, but it won't completely eliminate this possibility. For instance, in step 4, Alessandro Grussu may send 26 new files called "Apulija-13V2.0*.zip", asking for previous versions "Apulija-13V1.1*.zip" to be removed. So you will have to decide what to do in this case.
I think that new downloads should have been added instea
It depends. We preserve previous versions when they are relevant. For instance, if they had somewhat different gameplay that people may also find interesting to see. Or multiple AGD and PAW versions, because games developed in one version may not work in another.

But if a new game version simply fixes bugs in previous versions, there's no reason to keep bugged versions too.
User avatar
moroz1999
Manic Miner
Posts: 329
Joined: Fri Mar 30, 2018 9:22 pm

Re: Database model ZXDB

Post by moroz1999 »

Ok, let's synchronize the terms:
ZX-Art production = ZXDB entry
ZX-Art release = ZXDB release (currently I sticked to downloads of particular type but I will change this to use releases)

Software (production, ZXDB entry) can have original author, screenshots, comments, year of first publication
Release can have year, publisher, additional authors, additional screenshots, additional comments and most importantly - file download.
Einar Saukas wrote: Wed Apr 11, 2018 3:40 pm Which PK?
The release PK shouldn't change, if it's a different file in the same release.
But the file PK should change, if it's a different file.
Thanks! I totally agree, I'll just use the releases table now. That download table usage decision is legacy which I will change.
Einar Saukas wrote: Wed Apr 11, 2018 3:40 pm Also I think it makes sense to assign comments to the release, instead of individual files. Even if a game has separate 48K and 128K versions, and I want to comment specifically about the AY music in 128K, it's still better to put this comment once in the release, instead of repeating this comment in the English 128K file and again in the Spanish 128K file. Or if I want to say that parts 1 and 2 are better than part 3, a single comment on the release still works better than separate comments on the file of each part.
I think that both program (production, entry) and release should be commentable. Some releases can be mods which fix the issues, add new materials, change the graphics but still remain the same game. So there can be real need to comment them.

Einar Saukas wrote: Wed Apr 11, 2018 3:40 pm Releases are static non-changeable numeric IDs. You can rely on them!
The only chance RELEASE_SEQ will change, is if we find out that a release was duplicated by mistake. But this is no different than finding out an entire entry was duplicated by mistake so an ENTRY_ID should be removed. Anyway removing either ENTRY_ID or RELEASE_SEQ will happen very very rarely, and when it happens I will announce it here so you can easily adjust it manually on your side.
Thank you! That really seems logical.
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Database model ZXDB

Post by Einar Saukas »

Excellent! I agree with everything :)
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Database model ZXDB

Post by Einar Saukas »

There's now a new ZXDB thread here. Please take a look!

From now on, all announcements about ZXDB will be posted in the new thread.
Locked