What? Where did you get 1000 missing titles?This time, it contains over a thousand new titles and over a thousand new files.
Database model ZXDB
Moderator: druellan
Re: Database model ZXDB
- Einar Saukas
- Bugaboo
- Posts: 3143
- Joined: Wed Nov 15, 2017 2:48 pm
Re: Database model ZXDB
Peter just posted a detailed list of changes here. We have lots of people collaborating with ZXDB content now, so we are gathering new content faster than ever!
A few numbers:
* Exactly 5,108 new titles are now catalogued in ZXDB (including all available information about authors, publishers, etc). The original WoS archive catalogued 32,018 titles, therefore ZXDB titles increased 19%.
* Exactly 8,457 new files are now indexed in ZXDB. The original WoS archive indexed 96,126 files, therefore ZXDB files increased 8.8%.
* Exactly 1,650 author roles (programmer, artist, musician, etc) are now identified in ZXDB. The original WoS archive didn't distinguish any author roles.
* Exactly 19,484 new external references were added to ZXDB. The original WoS archive had 7,279 external references (1,009 Wikipedia references, 4,893 Tipshop references, 222 Lost in Translation references, and 1,155 Freebase references). ZXDB added 10,394 ZXSR references, 3,255 MobyGames references, 2,233 CASA references, 3,027 RZX Archive references, 651 ZX81Stuff references, 1,975 RZX YouTube Channel references, 107 ZX81 Youtube references, and a few others. Therefore integration with other websites is about 3.7 times larger than the original WoS archive. In practice, it means whenever you look at your favorite games at Spectrum Computing or ZX Info, you get many links to check additional information about the game almost everywhere else (including links to the old WoS archive).
Moreover, about 10 different Spectrum sites now support the same IDs from ZXDB (such as ZX-Art and Speccy Screenshot Maps), thus making it much easier to have automatic integration between all of them. There's also the ZX Pokemaster project (by Helga) that associates the entire TOSEC archive to their corresponding ZXDB entries, and the ZXDB API (by Thomas) that allows direct access to ZXDB from everywhere else.
I never thought this project would get so big!
Re: Database model ZXDB
It's big and beautiful
Thank you all for making ZXDB probably the best source for Sinclair related information.
I know you mentioned elsewhere, but it is also worth to mention that ZXDB has become a fab source for ZX81 as well - collection far more information about ZX81 releases, than seen before - including new releases, proper ZXPand and Chroma interface tagging, lots of screenshots, Youtube links etc.
/T
https://api.zxinfo.dk/v3/ - ZXDB API for developers
zxinfo-file-browser - Cross platform app to manage your files
https://zxinfo.dk - another ZXDB frontend
zxinfo-file-browser - Cross platform app to manage your files
https://zxinfo.dk - another ZXDB frontend
- Einar Saukas
- Bugaboo
- Posts: 3143
- Joined: Wed Nov 15, 2017 2:48 pm
Re: Database model ZXDB
Good point!
Next ZXDB update will have information about more than 1,700 programs for the ZX81. Over half of this data was catalogued by [mention]kolbeck[/mention] alone
Next ZXDB update will have information about more than 1,700 programs for the ZX81. Over half of this data was catalogued by [mention]kolbeck[/mention] alone
Re: Database model ZXDB
Crikey [mention]R-Tape[/mention] the next update page will need to be serialised across 10 volumes!
- Einar Saukas
- Bugaboo
- Posts: 3143
- Joined: Wed Nov 15, 2017 2:48 pm
Re: Database model ZXDB
No! I just mean next ZXDB update will reach over 1,700 ZX81 titles in total. It already has over 1,600 ZX81 titles right now.
Re: Database model ZXDB
Phew.... Stand done [mention]R-Tape[/mention]
Re: Database model ZXDB
I think its a minority view, but I look forward to the potential analytics from all these new programs. If they could be archived in a way that the code could be analyzed it could answer so many interesting questions about how people created software.
Especially stuff like the utilities/tools so little was written about them in the magazines compared to the games.
Especially stuff like the utilities/tools so little was written about them in the magazines compared to the games.
- Einar Saukas
- Bugaboo
- Posts: 3143
- Joined: Wed Nov 15, 2017 2:48 pm
Re: Database model ZXDB
This is a fairly technical post, targeted at people using ZXDB database in their websites or importing data directly from it. Everybody else should ignore this post!
I'm planning a few changes in database schema for the next ZXDB update. The main reasons are:
* Improve ZXDB model, by removing redundant or obsolete fields. Now that all relevant inconsistencies from original WoS archive were sorted out in ZXDB, we can finally afford to adopt a more strict model, that should help prevent most inconsistencies from ever happening again.
* Help integration with other databases and systems, by ensuring ZXDB tables use meaningful natural keys as PK whenever possible. This should induce any external references to ZXDB data to be more meaninful too, thus helping to better identify what exactly has changed in ZXDB in case of incremental imports.
The main changes in ZXDB will be:
* Table DOWNLOADS: Remove meaningless column ID. It will use natural key (ENTRY_ID, FILE_LINK) as PK.
* Table LABELFILES: Remove meaningless column ID. It will use natural key (LABEL_ID, FILE_LINK) as PK.
* Table MAGFILES: Remove meaningless column ID. It will use natural key (ISSUE_ID, FILE_LINK) as PK.
* Table LABELS: Remove columns IS_COMPANY and COMPANYTYPE_ID. These are deprecated fields, everybody should be already using LABELTYPE_ID instead.
There will be also changes in a few historical tables that atore deleted and "lost" files from old WoS archive, such as table EXTRAS. I'm sure nobody is using those other tables anyway, therefore nobody should be affected.
I'm planning a few changes in database schema for the next ZXDB update. The main reasons are:
* Improve ZXDB model, by removing redundant or obsolete fields. Now that all relevant inconsistencies from original WoS archive were sorted out in ZXDB, we can finally afford to adopt a more strict model, that should help prevent most inconsistencies from ever happening again.
* Help integration with other databases and systems, by ensuring ZXDB tables use meaningful natural keys as PK whenever possible. This should induce any external references to ZXDB data to be more meaninful too, thus helping to better identify what exactly has changed in ZXDB in case of incremental imports.
The main changes in ZXDB will be:
* Table DOWNLOADS: Remove meaningless column ID. It will use natural key (ENTRY_ID, FILE_LINK) as PK.
* Table LABELFILES: Remove meaningless column ID. It will use natural key (LABEL_ID, FILE_LINK) as PK.
* Table MAGFILES: Remove meaningless column ID. It will use natural key (ISSUE_ID, FILE_LINK) as PK.
* Table LABELS: Remove columns IS_COMPANY and COMPANYTYPE_ID. These are deprecated fields, everybody should be already using LABELTYPE_ID instead.
There will be also changes in a few historical tables that atore deleted and "lost" files from old WoS archive, such as table EXTRAS. I'm sure nobody is using those other tables anyway, therefore nobody should be affected.
Re: Database model ZXDB
Absolutely superb work Einar, as always, and a big thank you to you and all those helping you.
I'm paticularly interested in the ZX81 information which is currently growing as this was my first computer (10 years old, 1981). I have some very fond, but hazy (it was so long ago), memories of these early years. I keep checking in on the DB updates, keep seeing whats new and with people contributing to the ZX81 info, links have been appearing to sites which have been great to have a look through.
Only the other day, i managed to find "Hitch Hiker" and was playing it in an emulator. I loved that game.
Out of interest, what is a good zx81 emulator these days? One that i can just throw .p files at, good features and accurate emulation? I;ve got eightyone0.43c at the moment... Anyone?
Cheers.
I'm paticularly interested in the ZX81 information which is currently growing as this was my first computer (10 years old, 1981). I have some very fond, but hazy (it was so long ago), memories of these early years. I keep checking in on the DB updates, keep seeing whats new and with people contributing to the ZX81 info, links have been appearing to sites which have been great to have a look through.
Only the other day, i managed to find "Hitch Hiker" and was playing it in an emulator. I loved that game.
Out of interest, what is a good zx81 emulator these days? One that i can just throw .p files at, good features and accurate emulation? I;ve got eightyone0.43c at the moment... Anyone?
Cheers.
I don't have anything cool to put here, so i'll just be off now to see a priest with yeast stuck between his teeth and his friend called Keith who's a hairpiece thief...
- Einar Saukas
- Bugaboo
- Posts: 3143
- Joined: Wed Nov 15, 2017 2:48 pm
Re: Database model ZXDB
Thanks for the kind words!
Regarding your question about emulators, I created a separate thread to avoid going off-topic here:
viewtopic.php?f=23&t=607
Regarding your question about emulators, I created a separate thread to avoid going off-topic here:
viewtopic.php?f=23&t=607
- Einar Saukas
- Bugaboo
- Posts: 3143
- Joined: Wed Nov 15, 2017 2:48 pm
Re: Database model ZXDB
Yet another large ZXDB update is out! Download it here.
This time we added entries for almost a thousand electronic magazines. However the corresponding files are not available yet, it will be easier to add them in a separate update.
This time we added entries for almost a thousand electronic magazines. However the corresponding files are not available yet, it will be easier to add them in a separate update.
- Einar Saukas
- Bugaboo
- Posts: 3143
- Joined: Wed Nov 15, 2017 2:48 pm
Re: Database model ZXDB
Next ZXDB update will have one more change:
* Table ROLES: Remove redundant column AUTHOR_SEQ.
You will notice that table AUTHORS already have natural key PK = (ENTRY_ID, LABEL_ID) instead of previous PK = (ENTRY_ID, AUTHOR_SEQ). Although column AUTHOR_SEQ won't be removed from table AUTHORS, anyone integrating or importing from ZXDB should use the new PK.
Likewise, table ROLES now have PK = (ENTRY_ID, LABEL_ID, ROLETYPE_ID) and FK = (ENTRY_ID, LABEL_ID). They were both based on AUTHOR_SEQ, that will be removed now.
I could have made this change in last ZXDB update, but I'm doing it in 2 steps on purpose (added ROLES.LABEL_ID in previous update and will remove ROLES.AUTHOR_SEQ in next update), to make it easier for other systems to adapt.
* Table ROLES: Remove redundant column AUTHOR_SEQ.
You will notice that table AUTHORS already have natural key PK = (ENTRY_ID, LABEL_ID) instead of previous PK = (ENTRY_ID, AUTHOR_SEQ). Although column AUTHOR_SEQ won't be removed from table AUTHORS, anyone integrating or importing from ZXDB should use the new PK.
Likewise, table ROLES now have PK = (ENTRY_ID, LABEL_ID, ROLETYPE_ID) and FK = (ENTRY_ID, LABEL_ID). They were both based on AUTHOR_SEQ, that will be removed now.
I could have made this change in last ZXDB update, but I'm doing it in 2 steps on purpose (added ROLES.LABEL_ID in previous update and will remove ROLES.AUTHOR_SEQ in next update), to make it easier for other systems to adapt.
Re: Database model ZXDB
I really hope that LABEL_ID would stay the same.
- Einar Saukas
- Bugaboo
- Posts: 3143
- Joined: Wed Nov 15, 2017 2:48 pm
Re: Database model ZXDB
Yes, it will stay the same.
Re: Database model ZXDB
Thanks! I've just already used them for marking ZX-ART authors, groups, author aliases and group aliases accordance to ZXDB
- pavero
- Dynamite Dan
- Posts: 1586
- Joined: Sat Dec 09, 2017 11:49 pm
- Location: The Czech Republic
- Contact:
Re: Database model ZXDB
Hi Einar,Einar Saukas wrote: ↑Wed Apr 04, 2018 10:24 pm
The main changes in ZXDB will be:
* Table DOWNLOADS: Remove meaningless column ID. It will use natural key (ENTRY_ID, FILE_LINK) as PK.
* Table LABELFILES: Remove meaningless column ID. It will use natural key (LABEL_ID, FILE_LINK) as PK.
* Table MAGFILES: Remove meaningless column ID. It will use natural key (ISSUE_ID, FILE_LINK) as PK.
unfortunately I have found this post until now, when I have begun to investigate why my scripts with an online emulator totally stopped working.
My opinion to this matter:
Unique IDs in SQL tables are always useful even if they are redundant. Especially if you want to share informations between two PHP scripts via URL string.
Here is an example:
Until this moment I could easily use:
URL ... "playonline.php?dowload_id=?????"
Now I have to use:
URL ... "playonline.php?entry_id=?????&file_link=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
And there is always much better to use "short" numerical values in URLs, if there is possible, instead of "very long" strings.
(Note: all these scripts are dynamic, it doesn't matter at all if values of these IDs will change in the future or not.)
- pavero
- Dynamite Dan
- Posts: 1586
- Joined: Sat Dec 09, 2017 11:49 pm
- Location: The Czech Republic
- Contact:
Re: Database model ZXDB
I only wanted to say that unique IDs are always useful in some situations and there was no reason to remove them.
Re: Database model ZXDB
I totally agree to pavero.
It's not a problem for me to use natural keys as long as it don't change, but I can foresee that file link format can really change in future because of some refactoring or file structure optimization. Stable non-changing numeric IDs are a most practical solution when it comes to integration with any outer systems - whether they are other archives or emulators.
It's not a problem for me to use natural keys as long as it don't change, but I can foresee that file link format can really change in future because of some refactoring or file structure optimization. Stable non-changing numeric IDs are a most practical solution when it comes to integration with any outer systems - whether they are other archives or emulators.
- Einar Saukas
- Bugaboo
- Posts: 3143
- Joined: Wed Nov 15, 2017 2:48 pm
Re: Database model ZXDB
Those are good points! Well, one the main reasons I post about ZXDB changes in advance is exactly to hear suggestions and feedback
In table DOWNLOADS, I considered that PK = (ENTRY_ID + FILE_LINK) would work better than PK = ID for systems integration but, to be honest, both options are error-prone. Consider the following (real) scenarios:
* A certain tape image is bugged, the authors provides a fixed file with the same name. The file is replaced in the repository, but if authors don't give version numbers to their releases, we don't invent version numbers ourselves either, thus the new file will be simply stored with the same filename as before. Therefore we will have a different file, but since there was no need to update the metadata in ZXDB, it will keep the same FILE_LINK and ID.
* A certain tape image seems bugged, so it's removed from the repository. Later we find out the authors provided file with incorrect extension TAP, it was actually a TZX file. Then the file is added again to ZXDB with a fixed filename. Therefore we will have the same file, with different FILE_LINK and, since it's getting added again in a different ZXDB release, it will also receive a different ID.
There are lots of other cases, but these 2 scenarios above should be enough to understand it's complicated to rely on either FILE_LINK or ID. What we really need is an identification associated to the actual content of each file, to identify it properly and detect when it has changed. In other words, we need a hash.
Even using a long hash, there's always a mathematical possibility of 2 different files with the same hash. Instead, I suggest identifying files with a short hash like CRC32, combined with ENTRY_ID. Therefore the primary key of table DOWNLOADS will be PK = (ENTRY_ID, CRC32). If ZXDB ever receives files with repeated hashes, at least it will happen within the same ENTRY_ID so it will be much easier for me to fix it.
Now here's my proposal:
* In table DOWNLOADS, I will immediately restore the old DOWNLOAD_ID (temporarily).
* Afterwards I will add ASAP a new column CRC32 in table DOWNLOADS, then remove again DOWNLOAD_ID and use PK = (ENTRY_ID, CRC32) instead.
In practice, it means Pavero will have to replace this:
with this:
Is this OK?
In table DOWNLOADS, I considered that PK = (ENTRY_ID + FILE_LINK) would work better than PK = ID for systems integration but, to be honest, both options are error-prone. Consider the following (real) scenarios:
* A certain tape image is bugged, the authors provides a fixed file with the same name. The file is replaced in the repository, but if authors don't give version numbers to their releases, we don't invent version numbers ourselves either, thus the new file will be simply stored with the same filename as before. Therefore we will have a different file, but since there was no need to update the metadata in ZXDB, it will keep the same FILE_LINK and ID.
* A certain tape image seems bugged, so it's removed from the repository. Later we find out the authors provided file with incorrect extension TAP, it was actually a TZX file. Then the file is added again to ZXDB with a fixed filename. Therefore we will have the same file, with different FILE_LINK and, since it's getting added again in a different ZXDB release, it will also receive a different ID.
There are lots of other cases, but these 2 scenarios above should be enough to understand it's complicated to rely on either FILE_LINK or ID. What we really need is an identification associated to the actual content of each file, to identify it properly and detect when it has changed. In other words, we need a hash.
Even using a long hash, there's always a mathematical possibility of 2 different files with the same hash. Instead, I suggest identifying files with a short hash like CRC32, combined with ENTRY_ID. Therefore the primary key of table DOWNLOADS will be PK = (ENTRY_ID, CRC32). If ZXDB ever receives files with repeated hashes, at least it will happen within the same ENTRY_ID so it will be much easier for me to fix it.
Now here's my proposal:
* In table DOWNLOADS, I will immediately restore the old DOWNLOAD_ID (temporarily).
* Afterwards I will add ASAP a new column CRC32 in table DOWNLOADS, then remove again DOWNLOAD_ID and use PK = (ENTRY_ID, CRC32) instead.
In practice, it means Pavero will have to replace this:
Code: Select all
URL ... "playonline.php?dowload_id=106584"
Code: Select all
URL ... "playonline.php?id=30237&hash=229a2e3e"
- Einar Saukas
- Bugaboo
- Posts: 3143
- Joined: Wed Nov 15, 2017 2:48 pm
Re: Database model ZXDB
OK, there's already a new ZXDB update available here!
This update restores ID on tables DOWNLOADS, LABELFILES, and MAGFILES, for now. I temporarily reverted these changes in a hurry, because of the impact on Pavero's online emulator. Now I will wait for feedback on my proposed change to use ENTRY_ID+CRC32 instead.
This ZXDB release also contains about 500 new electronic magazine files, and a few minor archive updates.
This update restores ID on tables DOWNLOADS, LABELFILES, and MAGFILES, for now. I temporarily reverted these changes in a hurry, because of the impact on Pavero's online emulator. Now I will wait for feedback on my proposed change to use ENTRY_ID+CRC32 instead.
This ZXDB release also contains about 500 new electronic magazine files, and a few minor archive updates.
- pavero
- Dynamite Dan
- Posts: 1586
- Joined: Sat Dec 09, 2017 11:49 pm
- Location: The Czech Republic
- Contact:
Re: Database model ZXDB
Yes, it could be a compromise solution. The short hash in URL is much better than a string with full path.Einar Saukas wrote: ↑Sun Apr 08, 2018 11:03 pm
Now here's my proposal:
* In table DOWNLOADS, I will immediately restore the old DOWNLOAD_ID (temporarily).
* Afterwards I will add ASAP a new column CRC32 in table DOWNLOADS, then remove again DOWNLOAD_ID and use PK = (ENTRY_ID, CRC32) instead.
In practice, it means Pavero will have to replace this:
with this:Code: Select all
URL ... "playonline.php?dowload_id=106584"
Is this OK?Code: Select all
URL ... "playonline.php?id=30237&hash=229a2e3e"
But, is there really necessary to set ENTRY_ID+CRC32 as PK in downloads table?
And what about this solution?
ID - PK with autoincrement
ENTRY_ID+CRC32 - set INDEX only
- Einar Saukas
- Bugaboo
- Posts: 3143
- Joined: Wed Nov 15, 2017 2:48 pm
Re: Database model ZXDB
I don't like the idea of using meaningless auto-incremented ID as PK, because eventually someone will assume again they can rely on it to identify changes. Even if I document it, it won't help. Nobody reads manuals nowadays. Frankly how many people here have read the entire "ZXDB.txt" documentation that I wrote?
I believe PK = (ENTRY_ID, CRC32) is a good tradeoff between providing a short convenient PK, that's more meaningful and fail proof.
I believe PK = (ENTRY_ID, CRC32) is a good tradeoff between providing a short convenient PK, that's more meaningful and fail proof.
Re: Database model ZXDB
I've been thinking about natural key containing CRC32. If file gets replaced, then CRC32 gets recalculated, and this is how the integrated systems would understand that the file has changed. This would work exactly the same way with integer PK really, so it won't bring the benefit
It has the following problem though:
1. File (or release) has been added to prod.
2. It gets synced with external system. External system remembers synced PK (CRC32+entry_id).
3. Somebody comments on file in external system reporting the problem, comment gets connected to that file.
4. File gets updated with bugfix, version changes in ZXDB, CRC32 changes as well.
Result: the comment is still connected to old PK using old CRC32. After re-syncing external system either gets two files (old+new) or loses the comment.
I suggest using entry_id + file download link, they are not the best solution possible, but it makes the continuous integration possible.
It has the following problem though:
1. File (or release) has been added to prod.
2. It gets synced with external system. External system remembers synced PK (CRC32+entry_id).
3. Somebody comments on file in external system reporting the problem, comment gets connected to that file.
4. File gets updated with bugfix, version changes in ZXDB, CRC32 changes as well.
Result: the comment is still connected to old PK using old CRC32. After re-syncing external system either gets two files (old+new) or loses the comment.
I suggest using entry_id + file download link, they are not the best solution possible, but it makes the continuous integration possible.
- Einar Saukas
- Bugaboo
- Posts: 3143
- Joined: Wed Nov 15, 2017 2:48 pm
Re: Database model ZXDB
The benefit is to ensure the same file will always have the same PK, and a different file will always have a different PK. Using an arbitrary ID doesn't guarantee it. See the examples I mentioned in my last post.moroz1999 wrote: ↑Tue Apr 10, 2018 8:11 pmI've been thinking about natural key containing CRC32. If file gets replaced, then CRC32 gets recalculated, and this is how the integrated systems would understand that the file has changed. This would work exactly the same way with integer PK really, so it won't bring the benefit
I agree this should work better for you, but it won't completely eliminate this possibility. For instance, in step 4, Alessandro Grussu may send 26 new files called "Apulija-13V2.0*.zip", asking for previous versions "Apulija-13V1.1*.zip" to be removed. So you will have to decide what to do in this case.moroz1999 wrote: ↑Tue Apr 10, 2018 8:11 pmIt has the following problem though:
1. File (or release) has been added to prod.
2. It gets synced with external system. External system remembers synced PK (CRC32+entry_id).
3. Somebody comments on file in external system reporting the problem, comment gets connected to that file.
4. File gets updated with bugfix, version changes in ZXDB, CRC32 changes as well.
Result: the comment is still connected to old PK using old CRC32. After re-syncing external system either gets two files (old+new) or loses the comment.
I suggest using entry_id + file download link, they are not the best solution possible, but it makes the continuous integration possible.
Anyway I agree that ENTRY_ID + FILE_LINK is best for you. This is exactly what I had in mind in my previous change. But it's not appropriate for Pavero. So my proposal is that ZXDB uses PK = (ENTRY_ID, CRC32) for Pavero, but also declares UK = (ENTRY_ID, FILE_LINK). This way, when you import data, you can adopt (ENTRY_ID, FILE_LINK) as key on your side. Makes sense?