ZXDB and russian web resources
Moderator: druellan
ZXDB and russian web resources
1. zxn.ru
What do you think about importing zxn.ru into ZXDB?
It's a fork of https://zxaaa.net/, made by its main developer when the owner of zxaaa.net decided to delete the whole database to prevent it from being copied to other websites such as ZX-Art.
So now there are two databases, zxn.ru and zxaaa.net, developed by different people and with different frontends. I will certainly import all of zxn.ru's content into ZX-Art; it's only a matter of time.
1. What do you think about adding all these demos (the largest archive of TR-DOS demos) to ZXDB?
2. How can we avoid duplicated information? In ZX-Art, for example, I will store each archive's original IDs for all programs and releases, so that entries from different databases can be combined into a single entry.
I think I could arrange with the developer of zxn.ru to open the API.
2. vtrd.in
That's a really old TR-DOS archive, full of demos, games and system software.
3. ZX-ART
I don't have much unique content at the moment: 99% of it has been imported from ZXDB. However, this will change soon, as I'm going to add zxn.ru, vtrd.in and maybe www.worldofsam.org.
I'm ready to provide a detailed synchronization API for any system. I'm also storing the IDs of all entries, releases, authors, groups and aliases, so it will be possible to merge information from different sources and re-run the synchronization as often as we need.
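A minimal sketch of the ID bookkeeping described above, assuming every imported record can be identified by its source site, its entity type and that site's own ID. `IdRegistry`, `resolve` and `link` are hypothetical names, not ZX-Art's actual API, and a real system would keep this table in the database rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class IdRegistry:
    """Hypothetical in-memory table: (source, kind, source_id) -> local ID."""

    mapping: dict = field(default_factory=dict)
    next_id: int = 1

    def resolve(self, source: str, kind: str, source_id: str) -> tuple[int, bool]:
        """Return (local_id, created).

        Re-running an import with the same source IDs returns the same local
        IDs, so a second synchronization run does not create duplicates.
        """
        key = (source, kind, source_id)
        if key in self.mapping:
            return self.mapping[key], False
        local_id = self.next_id
        self.next_id += 1
        self.mapping[key] = local_id
        return local_id, True

    def link(self, source: str, kind: str, source_id: str, local_id: int) -> None:
        """Record that another site's ID refers to an existing local entity."""
        self.mapping[(source, kind, source_id)] = local_id


if __name__ == "__main__":
    registry = IdRegistry()
    release, created = registry.resolve("zxn.ru", "release", "1234")
    again, created_again = registry.resolve("zxn.ru", "release", "1234")
    print(release, created, again, created_again)  # 1 True 1 False
```

Because `link` lets several sites point at one local entity, a release found on two sites can end up as a single entry. The IDs in the example are made up.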
4. spectrum4ever.org
A collection of Russian tape cracks. It's mostly of historical value, and it infringes copyright. I will most likely import it into ZX-Art as well. What about ZXDB?
Re: ZXDB and russian web resources
I figure I would get as much data as possible first, then worry about removing duplicates at a later date.
Re: ZXDB and russian web resources
Fair enough, but we are dealing with thousands of software titles, so anything that can be automated should be automated.
Re: ZXDB and russian web resources
One thing I've been looking forward to is comparing all of the files in the database for duplicates. That is really going to open things up: seeing how code was reused throughout the community is going to be fascinating.
I take your point about the duplicates being recorded, but even if it's a few terabytes, it's not going to be the end of the world.
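A minimal sketch of such a whole-archive duplicate check, assuming the files sit in a local directory tree. It groups files by MD5 digest, so it only finds byte-identical files; spotting reused code inside otherwise different files would need a finer-grained comparison, for example of individual blocks extracted from tape and disk images.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """MD5 of a file's contents, read in chunks to keep memory use low."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group every file under root by MD5; keep only groups with 2+ files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```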
Re: ZXDB and russian web resources
If you need to extract files for deep inspection, you can use my PHP parsers for TAP, TRD and SCL. TZX support will certainly come in the future as well; it's just a bit more complicated than the others.
https://github.com/moroz1999/zx-files
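The parsers linked above are PHP. Purely as an illustration of what such a parser does, here is a minimal Python sketch for the simplest of these formats, TAP. It relies on the standard TAP layout (each block is a 16-bit little-endian length followed by a flag byte, the payload and an XOR checksum byte) and on the standard 17-byte header written by the Spectrum ROM; it is not a port of the zx-files code, and TRD, SCL and TZX are not covered.

```python
import struct
from dataclasses import dataclass

HEADER_TYPES = {0: "Program", 1: "Number array", 2: "Character array", 3: "Bytes"}


@dataclass
class TapBlock:
    flag: int          # ROM convention: 0x00 for a header block, 0xFF for data
    data: bytes        # payload, without the flag and checksum bytes
    checksum_ok: bool  # XOR of flag, payload and checksum must be zero


def read_tap(raw: bytes) -> list:
    """Split a .tap image into its blocks."""
    blocks = []
    pos = 0
    while pos + 2 <= len(raw):
        (length,) = struct.unpack_from("<H", raw, pos)
        body = raw[pos + 2:pos + 2 + length]
        if length < 2 or len(body) < length:
            raise ValueError(f"invalid or truncated block at offset {pos}")
        xor = 0
        for byte in body:
            xor ^= byte
        blocks.append(TapBlock(flag=body[0], data=body[1:-1], checksum_ok=xor == 0))
        pos += 2 + length
    return blocks


def describe_header(block: TapBlock) -> str:
    """Decode a standard 17-byte header payload: type, name, length, 2 params."""
    if block.flag != 0x00 or len(block.data) != 17:
        raise ValueError("not a standard header block")
    kind, name, length, _param1, _param2 = struct.unpack("<B10sHHH", block.data)
    # Names are decoded as Latin-1 for simplicity; the Spectrum character set
    # differs outside printable ASCII.
    label = HEADER_TYPES.get(kind, f"type {kind}")
    return f"{label}: {name.decode('latin-1').rstrip()} ({length} bytes)"
```

Blocks saved by custom loaders may use other flag values and non-standard lengths, which is why `describe_header` refuses anything that is not a 17-byte header.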
- Einar Saukas
- Bugaboo
- Posts: 3143
- Joined: Wed Nov 15, 2017 2:48 pm
Re: ZXDB and russian web resources
Sorry for the late reply, I didn't notice this thread before...
moroz1999 wrote: ↑Fri Mar 30, 2018 10:06 pm
1. zxn.ru
What do you think about importing zxn.ru into ZXDB?
It's a fork of https://zxaaa.net/, made by its main developer when the owner of zxaaa.net decided to delete the whole database to prevent it from being copied to other websites such as ZX-Art.
So now there are two databases, zxn.ru and zxaaa.net, developed by different people and with different frontends. I will certainly import all of zxn.ru's content into ZX-Art; it's only a matter of time.
1. What do you think about adding all these demos (the largest archive of TR-DOS demos) to ZXDB?
2. How can we avoid duplicated information? In ZX-Art, for example, I will store each archive's original IDs for all programs and releases, so that entries from different databases can be combined into a single entry.
I think I could arrange with the developer of zxn.ru to open the API.
2. vtrd.in
That's a really old TR-DOS archive, full of demos, games and system software.
3. ZX-ART
I don't have much unique content at the moment: 99% of it has been imported from ZXDB. However, this will change soon, as I'm going to add zxn.ru, vtrd.in and maybe www.worldofsam.org.
I'm ready to provide a detailed synchronization API for any system. I'm also storing the IDs of all entries, releases, authors, groups and aliases, so it will be possible to merge information from different sources and re-run the synchronization as often as we need.
4. spectrum4ever.org
A collection of Russian tape cracks. It's mostly of historical value, and it infringes copyright. I will most likely import it into ZX-Art as well. What about ZXDB?
Yes, I'm very interested in importing all Spectrum-related titles into ZXDB. Note that ZXDB already has several scene demos, TR-DOS games and SAM Coupe titles, many of them recovered from WoS internal "lost" files, so it makes perfect sense to complete this information in ZXDB now. Except for Russian cracks, they don't really fit into ZXDB and there's no need to catalogue them anyway, since these are not new titles. In this case, I think it will work better to simply let Archive.org store all cracks in TOSEC, and get them mapped to their corresponding entries in ZXDB, without storing them in ZXDB directly. In practice, the end result will be roughly the same.
My only concern is that it's our policy to contact site owners and request their approval before importing their data, and we only import as much data as they agree to. In return, ZXDB always stores links back to the corresponding pages on their websites. This way, integrating with ZXDB typically provides more visibility (and therefore visits) to their websites. The idea is that integration should be mutually beneficial to both sites, not steal away the other site's users. If you can help me get in touch with these site owners to discuss it, I will appreciate it!
Something else we need to decide is whether it makes more sense for you to integrate with all sites separately, or to integrate them with ZXDB first so ZX-Art can simply obtain this information from a single source afterwards. Again, let's talk about it in further detail to find out the best solution for everybody involved.
- Einar Saukas
Re: ZXDB and russian web resources
I disagree. At first this would be the easiest option, since you would get an immediate result. However, it would produce thousands of duplicates, and sorting them out afterwards would be a nightmare: every data inconsistency would require tracking down the original information source again to investigate it.
The best way to avoid duplicated information is to map corresponding titles in both systems first, before combining their data. It requires more patience, but saves time in the long run.
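The "map first, combine later" approach can be sketched as follows, assuming each catalogue can be exported as a dict from its own ID to (title, year). The matching key (case-folded title with spaces and punctuation removed, plus year) is a deliberately crude, hypothetical choice: only unambiguous one-to-one matches are accepted automatically, and everything else is left for manual review, so no duplicates are created by guessing.

```python
import re
from collections import defaultdict


def normalize(title: str) -> str:
    """Crude matching key: case-fold and drop spaces/punctuation (keeps Cyrillic)."""
    return re.sub(r"[\W_]+", "", title.casefold())


def map_titles(left: dict, right: dict):
    """Propose one-to-one matches between two catalogues.

    left and right map each catalogue's own ID to a (title, year) tuple.
    Returns (matches, review): matches maps left IDs to right IDs; review
    lists (key, left_ids, right_ids) groups that need a manual decision.
    """

    def index(catalogue: dict) -> dict:
        groups = defaultdict(list)
        for cid, (title, year) in catalogue.items():
            groups[(normalize(title), year)].append(cid)
        return groups

    left_idx, right_idx = index(left), index(right)
    matches, review = {}, []
    for key, left_ids in left_idx.items():
        right_ids = right_idx.get(key, [])
        if len(left_ids) == 1 and len(right_ids) == 1:
            matches[left_ids[0]] = right_ids[0]
        else:
            # Ambiguous (several candidates) or missing on the right side.
            review.append((key, left_ids, right_ids))
    for key in sorted(right_idx.keys() - left_idx.keys(), key=str):
        # Titles that exist only in the right-hand catalogue.
        review.append((key, [], right_idx[key]))
    return matches, review
```

With made-up data, "Lyra II" and "Lyra 2" would not be matched automatically and would both land in the review list, which is exactly the kind of case that needs a human decision.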
Re: ZXDB and russian web resources
Einar, does it mean that you don't want individual MIA Russian games to be submitted to Spectrum Computing and ZXDB like here viewtopic.php?f=30&t=591 because you are going to get it all in some big import?
Re: ZXDB and russian web resources
Einar Saukas wrote: ↑Tue Apr 10, 2018 3:19 am
Sorry for the late reply, I didn't notice this thread before...
There is no rush.
Einar Saukas wrote: ↑Tue Apr 10, 2018 3:19 am
Except for Russian cracks, they don't really fit into ZXDB and there's no need to catalogue them anyway, since these are not new titles. In this case, I think it will work better to simply let Archive.org store all cracks in TOSEC, and get them mapped to their corresponding entries in ZXDB, without storing them in ZXDB directly. In practice, the end result will be roughly the same.
It totally makes sense, thanks. I'll still try to archive them all, but with the cracks marked by the appropriate release type.
Einar Saukas wrote: ↑Tue Apr 10, 2018 3:19 am
My only concern is that it's our policy to contact site owners and request their approval before importing their data, and we only import as much data as they agree to. In return, ZXDB always stores links back to the corresponding pages on their websites. This way, integrating with ZXDB typically provides more visibility (and therefore visits) to their websites. The idea is that integration should be mutually beneficial to both sites, not steal away the other site's users.
Maybe you are right, but I'm a bit more pessimistic after all the nasty wars over file archives, where a man trusted by numerous authors to preserve and distribute their works suddenly decides he now owns the ZX Spectrum legacy just because he collected and maintained the archive. I've seen nasty cases, and I decided I would rather hurt somebody's feelings than allow the software legacy to be lost because of somebody's untimely ambitions.
My point is: the only one who decides whether a work gets distributed is its author, not some collection owner. Of course, if somebody doesn't want their work published on ZX-Art, I'll take down the download links, no questions asked. There have already been a couple of such cases.
But I totally understand why you are trying to organize things the way you want, and I sincerely hope you will be able to achieve it.
Einar Saukas wrote: ↑Tue Apr 10, 2018 3:19 am
If you can help me get in touch with these site owners to discuss it, I will appreciate it!
I've sent you the contacts in a private message. Please ask me if you need to contact anybody else; I can help in some cases.
Einar Saukas wrote: ↑Tue Apr 10, 2018 3:19 am
Something else we need to decide is whether it makes more sense for you to integrate with all sites separately, or to integrate them with ZXDB first so ZX-Art can simply obtain this information from a single source afterwards. Again, let's talk about it in further detail to find out the best solution for everybody involved.
That's really the most complicated question. Ralf has a point: new software is constantly being added to different databases even as we speak, so even if you import one of the archives, a week later you'll have to repeat it and somehow deal with the fact that the same software gets added to different, unrelated databases.
I'm going to resolve it this way:
1. I'll keep the unique GUIDs for each author, alias, group, production and release from each database. This will allow me to run the import procedure repeatedly, importing only the newly added information each time.
2. For every new file, I'll check the file's MD5, and if it already exists in the database, I'll just save an additional GUID instead of creating a duplicate.
3. For every new file not yet in the database, I'll try to find the existing author and that author's existing software (by name + year), and add a new release to that software.
This means I'll gather all the cracks, versions, mods and re-releases. Also, since every sync procedure will be run periodically, I will need to fix sync errors manually and keep improving the algorithms.
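The three rules above could look roughly like this. It is a toy, in-memory sketch: `LocalDb`, `import_release` and the matching key (author, title and year, compared case-insensitively) are hypothetical stand-ins for ZX-Art's real tables and author/software lookup, and a real importer would also store the files and handle authors, groups and aliases as separate entities.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class LocalDb:
    """Toy in-memory stand-in for the local archive (all names hypothetical)."""

    guids: dict = field(default_factory=dict)     # (source, guid) -> release ID
    by_md5: dict = field(default_factory=dict)    # file MD5 -> release ID
    software: dict = field(default_factory=dict)  # (author, title, year) -> software ID
    releases: dict = field(default_factory=dict)  # release ID -> software ID
    next_id: int = 1

    def new_id(self) -> int:
        self.next_id += 1
        return self.next_id - 1


def import_release(db: LocalDb, source: str, guid: str, author: str,
                   title: str, year: int, content: bytes) -> str:
    """Import one release record; return which rule handled it."""
    # Rule 1: this source record was already imported on an earlier run.
    if (source, guid) in db.guids:
        return "seen"
    md5 = hashlib.md5(content).hexdigest()
    # Rule 2: an identical file is already stored, so only the extra GUID is kept.
    if md5 in db.by_md5:
        db.guids[(source, guid)] = db.by_md5[md5]
        return "linked"
    # Rule 3: a new file becomes a new release of the matching software,
    # or of a newly created software entry if nothing matches.
    key = (author.casefold(), title.casefold(), year)
    if key in db.software:
        software_id, outcome = db.software[key], "new release"
    else:
        software_id = db.software[key] = db.new_id()
        outcome = "new software"
    release_id = db.new_id()
    db.releases[release_id] = software_id
    db.by_md5[md5] = release_id
    db.guids[(source, guid)] = release_id
    return outcome
```

Because rule 1 runs first, re-running the same import is harmless; only records with unseen GUIDs reach the MD5 and name + year checks.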
How are you planning to deal with this problem?
Re: ZXDB and russian web resources
Just one example: the CKD Remake we made last year was released on ZX-Art long before it was added to ZXDB.
https://zxart.ee/rus/soft/game/arcade/a ... izzy-2017/
This means that if I simply import the latest version of ZXDB without any clever algorithms or MD5 checking, I'll get duplicated software and a duplicated release. I also wouldn't want to delete the release before re-importing it from ZXDB, because I would lose meta-information such as the download count (later I will also track emulator playing time to show the most popular software in each category!). So deleting or losing anything is really not an option for me, and that's why we need GUIDs for each database.
ZX-Art naturally has such GUIDs for every entity: its own IDs, plus a mechanism for storing GUIDs from other databases.
- Einar Saukas
Re: ZXDB and russian web resources
Ralf wrote: ↑Tue Apr 10, 2018 8:18 am
Einar, does it mean that you don't want individual MIA Russian games to be submitted to Spectrum Computing and ZXDB like here viewtopic.php?f=30&t=591 because you are going to get it all in some big import?
Any integration will take considerable time to be done properly. In the meantime, we shouldn't stop adding titles to the database.
Please keep providing more titles, but whenever you have the corresponding page links on those Russian sites, please provide them too. We will store these links in ZXDB as well, so it will be much easier to integrate with these sites later.
- Einar Saukas
Re: ZXDB and russian web resources
Thanks a lot! Since you also have similar plans, I will certainly involve you in these discussions too, so we can find the solution that works best for everyone.
moroz1999 wrote: ↑Tue Apr 10, 2018 9:34 pm
Einar Saukas wrote: ↑Tue Apr 10, 2018 3:19 am
Something else we need to decide is whether it makes more sense for you to integrate with all sites separately, or to integrate them with ZXDB first so ZX-Art can simply obtain this information from a single source afterwards. Again, let's talk about it in further detail to find out the best solution for everybody involved.
That's really the most complicated question. Ralf has a point: new software is constantly being added to different databases even as we speak, so even if you import one of the archives, a week later you'll have to repeat it and somehow deal with the fact that the same software gets added to different, unrelated databases.
ZXDB already deals with integrations that need to be updated periodically, such as RZX Archive and ZXSR. It's not too much of a problem; we just need to be careful to establish a good process for each case.
What does "production" mean?
I agree about the others.
Also if a file disappears from ZXDB, it means the file was considered obsolete and replaced with a better version, so it may be better for you to replace it too.
Therefore it may work better for you if you simply drop all files from ZXDB, then import them all again. If any additional information you have (like comments) is associated with releases instead of individual files, then reimporting all files won't affect anything on your side.
This won't be necessary for new files reimported from ZXDB, because each one of them will already have ENTRY_ID and RELEASE_SEQ.
It won't be necessary for TOSEC files either, because you can obtain the ENTRY_ID for each of them from ZX Pokemaster.
It may not be necessary for Russian sites because, if ZXDB integrates with them, you should be able to obtain this information from ZXDB too. That's something we will need to decide.
ZXDB aims to have everything (versions, mods and re-releases) except cracks, while TOSEC aims to have everything including cracks, associated with ZXDB through ZX Pokemaster. ZXDB also aims to keep track of the corresponding information on every other site. Therefore, importing data from ZXDB would give you all the information you need to integrate with everybody else with minimum effort.
There's still a lot of work to be done before we get there, but we are moving faster than ever.
Re: ZXDB and russian web resources
Einar Saukas wrote: ↑Wed Apr 11, 2018 6:29 pm
ZXDB already deals with integrations that need to be updated periodically, such as RZX Archive and ZXSR. It's not too much of a problem; we just need to be careful to establish a good process for each case.
The difference is that new RZX files are being added only to RZX Archive, so there is no problem determining which ones you have and which ones you don't.
Imagine there were an online submission form for RZX files, reviews, software or releases on spectrumcomputing.co.uk that instantly modified the database. Then the integration task would become complicated.
Einar Saukas wrote: ↑Wed Apr 11, 2018 6:29 pm
What does "production" mean?
I agree about the others.
I just call an entry a "production"; otherwise it's mostly the same.
Einar Saukas wrote: ↑Wed Apr 11, 2018 6:29 pm
Also if a file disappears from ZXDB, it means the file was considered obsolete and replaced with a better version, so it may be better for you to replace it too.
Yes, I agree. This is why I don't want to use entry_id+crc32 as a GUID: after a file update, I wouldn't be able to find which release it previously belonged to.
Einar Saukas wrote: ↑Wed Apr 11, 2018 6:29 pm
Therefore it may work better for you if you simply drop all files from ZXDB, then import them all again. If any additional information you have (like comments) is associated with releases instead of individual files, then reimporting all files won't affect anything on your side.
I'm afraid that's not so easy. The import procedure already takes hours, and re-downloading every file on each import would make it worse. It would also put load on the external database, which I want to avoid by all means.
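One possible way to replace files that disappeared from ZXDB without re-downloading everything is to compare two snapshots of the file list and touch only what changed. In this sketch the snapshots are assumed to be dicts keyed by (ENTRY_ID, RELEASE_SEQ), with a set of file identifiers (for example download paths) as each value; the function name and data layout are hypothetical. Release-level data such as download counts is never touched, because only file links are added or dropped.

```python
def plan_file_sync(previous: dict, current: dict) -> tuple[list, list]:
    """Compare two snapshots of a catalogue's file list.

    Both snapshots map (ENTRY_ID, RELEASE_SEQ) to a set of file identifiers.
    Returns (to_download, to_drop) as lists of (release_key, file) pairs, so
    unchanged files are never fetched again and release records are kept.
    """
    to_download, to_drop = [], []
    for key in sorted(previous.keys() | current.keys()):
        old = previous.get(key, set())
        new = current.get(key, set())
        to_download += [(key, f) for f in sorted(new - old)]
        to_drop += [(key, f) for f in sorted(old - new)]
    return to_download, to_drop


if __name__ == "__main__":
    before = {(1, 0): {"a.tap", "b.tap"}}
    after = {(1, 0): {"a.tap", "b2.tap"}}
    print(plan_file_sync(before, after))
    # ([((1, 0), 'b2.tap')], [((1, 0), 'b.tap')])
```

The entry numbers and file names above are made up. Only the differences need to be fetched, which keeps the load on the external site proportional to what actually changed.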
Einar Saukas wrote: ↑Wed Apr 11, 2018 6:29 pm
This won't be necessary for new files reimported from ZXDB, because each one of them will already have ENTRY_ID and RELEASE_SEQ.
Thanks! ENTRY_ID + RELEASE_SEQ seems like the best choice for me at the moment.
Einar Saukas wrote: ↑Wed Apr 11, 2018 6:29 pm
It may not be necessary for Russian sites because, if ZXDB integrates with them, you should be able to obtain this information from ZXDB too. That's something we will need to decide.
I can practically guarantee that somebody will upload a release to zxn.ru, somebody else will upload it to ZX-Art, and somebody will submit it to WoS. They will all be identical, but all will have different IDs.
Basically, we are all building a distributed ZX Spectrum archive together. It's complicated, but it's better protected from extinction than a centralized solution with a single point of failure.
Einar Saukas wrote: ↑Wed Apr 11, 2018 6:29 pm
ZXDB aims to have everything (versions, mods and re-releases) except cracks, while TOSEC aims to have everything including cracks, associated with ZXDB through ZX Pokemaster. ZXDB also aims to keep track of the corresponding information on every other site. Therefore, importing data from ZXDB would give you all the information you need to integrate with everybody else with minimum effort.
There's still a lot of work to be done before we get there, but we are moving faster than ever.
Great! Let's see what we end up with.