ZXDB and russian web resources

This is the place for general discussion and updates about the ZXDB Database. This forum is not specific to Spectrum Computing.

Moderator: druellan

moroz1999
Manic Miner
Posts: 329
Joined: Fri Mar 30, 2018 9:22 pm

ZXDB and russian web resources

Post by moroz1999 »

1. zxn.ru
What do you think about importing zxn.ru into ZXDB?
It's a fork of https://zxaaa.net/, made by the main developer when the owner of zxaaa.net decided to delete the whole database to prevent it from being copied to other websites such as ZX-Art.

So now there are two databases, zxn.ru and zxaaa.net, developed by different people and with different frontends. I will certainly import all the content of zxn.ru into ZX-Art; it's only a question of time.
1. What do you think about adding all these demos (the largest archive of TR-DOS demos) to ZXDB?
2. How can we avoid duplicating information? I will store each archive's original IDs for all programs and releases in ZX-Art, for example, so it would be possible to combine the entries from different databases into a single entry.
I think I would be able to reach an agreement with the developer of zxn.ru on opening the API.

2. vtrd.in
That's a really old TR-DOS archive, full of demos, games and system software.

3. ZX-ART
I don't have much unique content at the moment: 99% has been imported from ZXDB. However, this will soon change, as I will add zxn.ru, vtrd.in and maybe www.worldofsam.org
I'm ready to provide a detailed API for synchronization with any system. I'm also storing the IDs for all entries, releases, authors, groups and aliases, so it would be possible to unite the information from the different sources and re-run the synchronization as often as we need.

4. spectrum4ever.org
A collection of Russian tape cracks. It is mostly of historical value, and it infringes copyright law. I will most likely import it into ZX-Art as well. What about ZXDB?
Nomad
Manic Miner
Posts: 600
Joined: Thu Dec 28, 2017 12:38 pm

Re: ZXDB and russian web resources

Post by Nomad »

I would figure: get as much data as possible, then worry about removing duplication at a later date.
moroz1999
Manic Miner
Posts: 329
Joined: Fri Mar 30, 2018 9:22 pm

Re: ZXDB and russian web resources

Post by moroz1999 »

Fair enough, but we are dealing with thousands of software titles, so anything that can be automated should be automated :)
Nomad
Manic Miner
Posts: 600
Joined: Thu Dec 28, 2017 12:38 pm

Re: ZXDB and russian web resources

Post by Nomad »

One thing I have been looking forward to doing is comparing all of the files in the database for duplication. That is going to really open things up. To see how code got re-used throughout a community is going to be fascinating.

I take your point about the duplicates that get recorded, but even if it's a few terabytes it's not going to be the end of the world.
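
As a rough illustration of the file comparison mentioned above, here is a minimal Python sketch that groups files by a hash of their contents. The function name, the input shape (a dict of file name to bytes) and the choice of SHA-256 are all assumptions made for the example. It only finds byte-identical files; spotting re-used code inside different files would need a finer-grained comparison.

```python
import hashlib
from collections import defaultdict


def find_duplicate_files(files):
    """Group file names by the SHA-256 digest of their contents.

    `files` maps a file name to its raw bytes (hypothetical input shape).
    Returns only the groups holding more than one name, i.e. the
    byte-identical duplicates.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Keep only digests shared by two or more files.
    return {
        digest: sorted(names)
        for digest, names in by_digest.items()
        if len(names) > 1
    }
```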
moroz1999
Manic Miner
Posts: 329
Joined: Fri Mar 30, 2018 9:22 pm

Re: ZXDB and russian web resources

Post by moroz1999 »

If you need file extraction for deep inspection, you can use my PHP parsers for TAP, TRD and SCL. TZX will surely come in the future as well; it's just a bit more complicated than the others :)
https://github.com/moroz1999/zx-files
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: ZXDB and russian web resources

Post by Einar Saukas »

moroz1999 wrote: Fri Mar 30, 2018 10:06 pm 1. zxn.ru
What do you think about importing zxn.ru into ZXDB?
That's a fork of https://zxaaa.net/ which was made by main developer when owner of zxaaa.net decided to delete the whole database to prevent database copying to other websites like ZX-Art.

So now there are two databases: zxn.ru and zxaaa.net, developed by different persons and having different frontends. I will surely import all the content of zxn.ru to ZX-Art, that's a question of time.
1. What do you think about adding all these demos (the largest archive of TRDOS demos) to ZXDB?
2. How can we avoid duplication of information? I will store all archives original IDs for all programs and releases in ZX-Art, for example, so it would be possible to combine the entries from different database into single entry.
I think that I would be able to agree opening the API with the developer of zxn.ru.

2. vtrd.in
That's a really old TR-DOS archive, full of demos, games and system software.

3. ZX-ART
I don't have really much unique content at the moment - 99% have been imported from ZXDB. However, this will soon change as I will add zxn.ru, vtrd.in and may be www.worldofsam.org
I'm ready to provide the detailed API for synchronization with any system really. I'm also storing the IDs for all entries, releases, authors, groups, aliases, so it would be possible to unite the information from the different sources and re-run the synchronization as much as we need.

4. spectrum4ever.org
A collection of Russian tape cracks. A historical value mostly, and infringes copyright law. I will most possibly import it to ZX-Art as well. What about ZXDB?
Sorry for the late reply, I didn't notice this thread before...

Yes, I'm very interested in importing all Spectrum-related titles into ZXDB. Notice that ZXDB already has several scene demos, TR-DOS games, and SAM Coupe titles. Many of them were recovered from WoS internal "lost" files. Therefore it makes perfect sense to complete this information in ZXDB now. The exception is Russian cracks: they don't really fit into ZXDB, and there's no need to catalogue them anyway, since these are not new titles. In this case, I think it will work better to simply let Archive.org store all cracks in TOSEC, and get them mapped to their corresponding entries in ZXDB, without storing them in ZXDB directly. In practice, the end result will be roughly the same.

My only concern is that it's our policy to contact site owners and request their approval before importing their data, and we only import as much data as they agree to. In return, ZXDB always stores links back to the corresponding pages on their websites. This way, integrating with ZXDB typically provides more visibility (and therefore visits) to their website. The idea is that integration should be mutually beneficial to both sites, not stealing away the other site's users. If you can help me get in touch with these site owners to discuss it, I will appreciate it!

Something else we need to decide is whether it makes more sense for you to integrate with all sites separately, or to integrate them with ZXDB first so ZX-Art can simply obtain this information from a single source afterwards. Again, let's talk about it in further detail to find the best solution for everybody involved.
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: ZXDB and russian web resources

Post by Einar Saukas »

Nomad wrote: Sat Mar 31, 2018 1:47 am I would figure get as much data as possible then worry about removing duplication at a later date.
I disagree. At first this would look like the easiest option, since you would get an immediate result. However, it would produce thousands of duplicates, and it would be a nightmare to sort them out afterwards. Every data inconsistency would require spending time tracking down the original information source again to investigate it.

The best way to avoid duplicated information is to map corresponding titles in both systems first, before combining their data. It requires more patience, but saves time in the long run.
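
To make the "map first, combine later" idea concrete, here is a small Python sketch. It is not how ZXDB does its mapping. The entry fields (`id`, `title`, `year`) and both function names are hypothetical, and matching on normalized title plus year is just one plausible heuristic. Anything that does not match exactly one candidate is set aside for manual review instead of being merged blindly.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Case-fold a title and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[\W_]+", " ", title.casefold()).strip()


def map_titles(zxdb_entries, other_entries):
    """Pair entries from another archive with ZXDB entries by (title, year).

    Each entry is a dict with the hypothetical keys "id", "title", "year".
    Returns (mapping, unresolved): mapping is {other_id: zxdb_id} for
    unambiguous matches; unresolved lists the other ids that had zero or
    several candidates and need a manual decision.
    """
    index = defaultdict(list)
    for entry in zxdb_entries:
        index[(normalize_title(entry["title"]), entry["year"])].append(entry["id"])

    mapping, unresolved = {}, []
    for entry in other_entries:
        key = (normalize_title(entry["title"]), entry["year"])
        candidates = index.get(key, [])
        if len(candidates) == 1:
            mapping[entry["id"]] = candidates[0]
        else:
            unresolved.append(entry["id"])
    return mapping, unresolved
```

Only once this mapping has been reviewed would the actual data be combined, so duplicates never enter the database in the first place.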
Ralf
Rick Dangerous
Posts: 2279
Joined: Mon Nov 13, 2017 11:59 am
Location: Poland

Re: ZXDB and russian web resources

Post by Ralf »

Einar, does it mean that you don't want individual MIA Russian games to be submitted to Spectrum Computing and ZXDB like here viewtopic.php?f=30&t=591 because you are going to get it all in some big import?
moroz1999
Manic Miner
Posts: 329
Joined: Fri Mar 30, 2018 9:22 pm

Re: ZXDB and russian web resources

Post by moroz1999 »

Einar Saukas wrote: Tue Apr 10, 2018 3:19 am Sorry for the late reply, I didn't notice this thread before...
There is no rush :)
Einar Saukas wrote: Tue Apr 10, 2018 3:19 am Except for Russian cracks, they don't really fit into ZXDB and there's no need to catalogue them anyway, since these are not new titles. In this case, I think it will work better to simply let Archive.org store all cracks in TOSEC, and get them mapped to their corresponding entries in ZXDB, without storing them in ZXDB directly. In practice, the end result will be roughly the same.
It totally makes sense, thanks. I'll still try to archive them all, marking the cracks with the appropriate release type.
Einar Saukas wrote: Tue Apr 10, 2018 3:19 am My only concern is, it's our policy to contact site owners and request their approval before importing their data, and we only import as much data as they agreed. In return, ZXDB always store links back to the corresponding pages in their websites. This way, integrating with ZXDB typically provides more visibility (therefore visits) to their website. The idea is that integration should be mutually beneficial to both sites, not stealing away the other site's users.

Maybe you are right. I'm a bit more pessimistic after all the nasty wars over file archives, where a man who was trusted by numerous authors to preserve and distribute their works suddenly decides that he now owns the legacy of the ZX Spectrum, only because he collected and maintained the archive. I've seen nasty cases, and I decided that I would rather hurt somebody's feelings than allow the software legacy to get lost because of somebody's untimely ambitions.
My point is: the only one who decides whether a work may be distributed is the work's author, not some collection owner. Of course, if somebody doesn't want his work to be published on ZX-Art, I'll take down the download links with no questions asked. There have already been a couple of such cases.

But I totally understand why you are trying to organize things the way you want. I sincerely hope that you will be able to achieve it.
Einar Saukas wrote: Tue Apr 10, 2018 3:19 amIf you can help me get in touch with these site owners to discuss it, I will appreciate it!
I've sent you the contacts in a private message. Please ask me if you need to contact anybody; I can help in some cases.
Einar Saukas wrote: Tue Apr 10, 2018 3:19 am Something else we need to decide is, if it makes more sense for you to integrate with all sites separately, or integrate them with ZXDB first so ZX-Art can simply obtain this information from a single source afterwards. Again, let's talk about it in further detail to find out the best solution for everybody involved.
That's the most complicated question really. Ralf has got the point: new software is constantly being added to different databases as we speak, so even if you import one of the archives, a week later you'll have to repeat it and somehow deal with the fact that the same software gets added to different, unrelated databases.
I'm going to resolve it this way:
1. I'll hold the unique GUIDs for each author, alias, group, production and release from each database. This will allow me to run the import procedure more than once, so that only newly added information gets imported.
2. For every new file I'll check the file's MD5, and if it already exists in the database, I'll just save an additional GUID instead of making a duplicate.
3. For every new file that doesn't exist in the database, I'll try to find the existing author and the author's existing software (by name+year) and add a new release to the existing software.
This means that I'll gather all the cracks, versions, mods and re-releases. Also, every sync procedure will be run periodically, and I will need to manually fix the sync errors and improve the algorithms.
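
The three steps above can be sketched roughly like this in Python. It is only an in-memory illustration and not ZX-Art's actual code. The `Archive` class, its field names and the returned status strings are all made up for the example, and a GUID is assumed to be a (source, source_id) pair.

```python
import hashlib


class Archive:
    """In-memory stand-in for the target database (hypothetical structure)."""

    def __init__(self):
        # Each release: {"title", "year", "author", "md5", "guids"}
        self.releases = []

    def import_file(self, source, source_id, title, year, author, data):
        """Import one file from an external database, following steps 1 to 3."""
        guid = (source, source_id)
        md5 = hashlib.md5(data).hexdigest()

        # Step 1: the GUID is already known from an earlier run, so skip it.
        for release in self.releases:
            if guid in release["guids"]:
                return "skipped"

        # Step 2: an identical file is already stored; just remember the new GUID.
        for release in self.releases:
            if release["md5"] == md5:
                release["guids"].add(guid)
                return "linked"

        # Step 3: a genuinely new file. Check whether the same author already
        # has software with this name and year, then store a new release.
        known = any(
            (r["author"], r["title"].lower(), r["year"])
            == (author, title.lower(), year)
            for r in self.releases
        )
        self.releases.append(
            {"title": title, "year": year, "author": author,
             "md5": md5, "guids": {guid}}
        )
        return "new release" if known else "new software"
```

Because every decision is keyed on GUIDs and file hashes, running the same import twice changes nothing, which is what makes periodic re-synchronization safe.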

How are you planning to deal with this problem?
moroz1999
Manic Miner
Posts: 329
Joined: Fri Mar 30, 2018 9:22 pm

Re: ZXDB and russian web resources

Post by moroz1999 »

Just one example: the CKD Remake we made last year was released on ZX-Art well before it was added to ZXDB.
https://zxart.ee/rus/soft/game/arcade/a ... izzy-2017/

This means that if I just import the latest version of ZXDB without any clever algorithms and MD5 checking, I'll get duplicated software and a duplicated release. I also wouldn't like to delete the release before re-importing it from ZXDB, because I would lose meta-information such as the download count (later I will also track emulator playing time, to display the most popular software in each category!). So, really, deleting or losing anything is not an option for me. And for that purpose we have to provide GUIDs for each database.
ZX-Art naturally has such GUIDs for every entity: its own IDs, plus a mechanism for storing the GUIDs from other databases.
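
A tiny Python sketch of the "update in place, never delete" idea described above. The store layout, the `downloads` field and the function name are assumptions for illustration only. Imported fields are refreshed on every sync, while locally collected metadata stays untouched.

```python
# Fields collected locally that an import must never overwrite (hypothetical).
LOCAL_FIELDS = {"downloads"}


def upsert_release(store, guid, imported_fields):
    """Create or update the release stored under an external GUID.

    `store` is a dict keyed by GUID. Fields coming from the import are
    overwritten, but anything listed in LOCAL_FIELDS is preserved.
    """
    record = store.setdefault(guid, {"downloads": 0})
    record.update(
        {k: v for k, v in imported_fields.items() if k not in LOCAL_FIELDS}
    )
    return record
```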
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: ZXDB and russian web resources

Post by Einar Saukas »

Ralf wrote: Tue Apr 10, 2018 8:18 am Einar, does it mean that you don't want individual MIA Russian games to be submitted to Spectrum Computing and ZXDB like here viewtopic.php?f=30&t=591 because you are going to get it all in some big import?
Any integration will take considerable time to be done properly. In the meantime, we shouldn't stop adding titles to the database.

Please continue providing more titles but, whenever you have the corresponding page links at those Russian sites, please provide them too. We will also store these links in ZXDB, so it will be much easier to integrate with these sites later.
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: ZXDB and russian web resources

Post by Einar Saukas »

moroz1999 wrote: Tue Apr 10, 2018 9:34 pmI've sent you contacts in private message. Please ask me if you need to contact anybody, I can help in some cases.
Thanks a lot! Since you also have similar plans, I will certainly involve you in these discussions too. So we can find the solution that works best for everyone.

moroz1999 wrote: Tue Apr 10, 2018 9:34 pm
Einar Saukas wrote: Tue Apr 10, 2018 3:19 amSomething else we need to decide is, if it makes more sense for you to integrate with all sites separately, or integrate them with ZXDB first so ZX-Art can simply obtain this information from a single source afterwards. Again, let's talk about it in further detail to find out the best solution for everybody involved.
That's the most complicated question really. Ralf has really got the point: new software has been added to different databases constantly as we are speaking, so even if you import one of archives, the week after you'll have to repeat it and somehow deal with the fact that same software gets added into different non-related databases.
ZXDB already deals with integrations that need to be updated periodically, such as RZX Archive and ZXSR. It's not too much of a problem; we just need to be careful to establish a good process for each case.

moroz1999 wrote: Tue Apr 10, 2018 9:34 pmI'm going to resolve it this way:
1. I'll hold the unique guids for each author, alias, group, production and release from each database. This will allow me to run the import procedure more than once, so only the added information would have been imported.
What does "production" mean?

I agree about the others.

moroz1999 wrote: Tue Apr 10, 2018 9:34 pm2. For every new file I'll check the file's MD5, and if it already exists in database, I'll just save an additional guid, not make a duplicate.
Also if a file disappears from ZXDB, it means the file was considered obsolete and replaced with a better version, so it may be better for you to replace it too.

Therefore it may work better for you if you simply drop all files from ZXDB, then import them all again. If any additional information you have (like comments) is associated with releases instead of individual files, then reimporting all files won't affect anything on your side.

moroz1999 wrote: Tue Apr 10, 2018 9:34 pm3. For every new file non-existing in database, I'll try to find the existing author and author's existing software (by name+year) and add a new release to the existing software.
This won't be necessary for new files reimported from ZXDB, because each one of them will already have ENTRY_ID and RELEASE_SEQ.

It won't be necessary for TOSEC files either, because you can obtain the ENTRY_ID for each of them from ZX Pokemaster.

It may not be necessary for Russian sites because, if ZXDB integrates with them, you should be able to obtain this information from ZXDB too. That's something we will need to decide.

moroz1999 wrote: Tue Apr 10, 2018 9:34 pmThis means that I'll gather all the cracks, versions, mods and rereleases. Also, every sync procedure would be run periodically, I will need to manually fix the sync errors and improve the algorithms.
ZXDB aims to have everything (versions, mods and re-releases), except cracks. And TOSEC aims to have everything including cracks, associated with ZXDB through ZX Pokemaster. ZXDB also aims to keep track of the corresponding entries on every other site. Therefore, importing data from ZXDB would give you all the information you need for integrating with everybody else with minimum effort.

There's still a lot of work to be done before we get there, but we are moving faster than ever :)
moroz1999
Manic Miner
Posts: 329
Joined: Fri Mar 30, 2018 9:22 pm

Re: ZXDB and russian web resources

Post by moroz1999 »

Einar Saukas wrote: Wed Apr 11, 2018 6:29 pmZXDB already deals with integrations that need to be updated periodically, such as RZX Archive and ZXSR. It's not too much of a problem, we just need to be careful on establishing a good process for each case.
The difference is that new RZXs are only being added to RZX Archive, so there is no problem determining which ones you have and which you don't.
Imagine there was an online submission form for RZX files, reviews, software or releases on spectrumcomputing.co.uk, which would instantly modify the database. Then the integration task becomes complicated :)

Einar Saukas wrote: Wed Apr 11, 2018 6:29 pm What does "production" mean?
I agree about the others.
I just call an entry a "production"; otherwise it's mostly the same.
Einar Saukas wrote: Wed Apr 11, 2018 6:29 pm Also if a file disappears from ZXDB, it means the file was considered obsolete and replaced with a better version, so it may be better for you to replace it too.
Yes, I agree. This is why I don't want to use entry_id+crc32 as the GUID: after a file update I wouldn't be able to find which release it previously belonged to.
Einar Saukas wrote: Wed Apr 11, 2018 6:29 pm Therefore it may work better for you if you simply drop all files from ZXDB, then import them all again. If any additional information you have (like comments) is associated with releases instead of individual files, then reimporting all files won't affect anything on your side.
I'm afraid that's not so easy. The import procedure already takes hours, and if I re-download each file during each import, the problem gets worse. Also, this would put a load on the external database, which I would like to avoid at all costs.
Einar Saukas wrote: Wed Apr 11, 2018 6:29 pmThis won't be necessary for new files reimported from ZXDB, because each one of them will already have ENTRY_ID and RELEASE_SEQ.
Thanks! ENTRY_ID + RELEASE_SEQ seems like the best choice for me at the moment.
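
A short Python sketch of why the release-based key survives a file replacement while a content-based key does not. The two helper names are hypothetical; only ENTRY_ID, RELEASE_SEQ and crc32 come from the discussion above.

```python
import zlib


def content_key(entry_id, data):
    """Key built from the entry id and the file's CRC32; changes with the file."""
    return (entry_id, zlib.crc32(data))


def release_key(entry_id, release_seq):
    """Key built from ENTRY_ID and RELEASE_SEQ; independent of file contents."""
    return (entry_id, release_seq)
```

If a file is replaced with a better dump, `content_key` for the same entry changes and the link to the old record is lost, whereas `release_key` stays the same.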
Einar Saukas wrote: Wed Apr 11, 2018 6:29 pmIt may not be necessary for Russian sites because, if ZXDB integrates with them, you should be able to obtain this information from ZXDB too. That's something we will need to decide.
I can guarantee that somebody will upload a release to zxn.ru, somebody will upload it to ZX-Art, and somebody will submit it to WoS. They will all be identical, but they will all have different IDs :)
We are all basically building a distributed ZX Spectrum archive together. It's complicated, but it's better protected from extinction than a centralized solution with a single point of failure.
Einar Saukas wrote: Wed Apr 11, 2018 6:29 pm ZXDB aims to have everything (versions, mods and re-releases), except cracks. And TOSEC aims to have everything including cracks, associated with ZXDB through ZX Pokemaster. Also ZXDB aims to keep information about corresponding information in every other site. Therefore importing data from ZXDB would give you all information you need for integrating with everybody else with minimum effort.

There's still a lot of work to be done before we get there, but we are moving faster than ever :)
Great! Let's see what we end up with.
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: ZXDB and russian web resources

Post by Einar Saukas »

Cool :)