New Database Model ZXDB

This is the place for general discussion and updates about the ZXDB Database. This forum is not specific to Spectrum Computing.

Moderator: druellan

Juan F. Ramirez
Bugaboo
Posts: 5101
Joined: Tue Nov 14, 2017 6:55 am
Location: Málaga, Spain

Re: New Database Model ZXDB

Post by Juan F. Ramirez »

Einar Saukas wrote: Sat Jun 01, 2019 5:43 am A new ZXDB update is available!

This new version contains game review scores from all major magazines (many thanks to Chris from ZXSR for this information!)
Great, a cool and much-awaited feature (at least for me!)
PeterJ
Site Admin
Posts: 6854
Joined: Thu Nov 09, 2017 7:19 pm
Location: Surrey, UK

Re: New Database Model ZXDB

Post by PeterJ »

We recently spoke about the 'little bug' threads and the need for some way of knowing when they have been processed.

@druellan (who kindly tracks and maintains the changes) now has rights to edit posts in the 'ZXDB Discussion' sub-forum (note: he can only edit this section of the forum).

When a change is accepted and has been processed for entry into ZXDB, he will add the following to the end of the post:
✓ Checked
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

Yet another ZXDB update is available!

This release mostly contains latest bugfixes compiled by @druellan based on people's contributions in this forum. :)
R-Tape
Site Admin
Posts: 6353
Joined: Thu Nov 09, 2017 11:46 am

Re: New Database Model ZXDB

Post by R-Tape »

Einar Saukas wrote: Wed Jun 05, 2019 3:23 am Yet another ZXDB update is available!

This release mostly contains latest bugfixes compiled by @druellan based on people's contributions in this forum. :)
Fantastic! Bloody well done Dru! You're the Tasmanian Devil of bugfixes (erm—in a good way that is).
druellan
Dynamite Dan
Posts: 1466
Joined: Tue Apr 03, 2018 7:19 pm

Re: New Database Model ZXDB

Post by druellan »

R-Tape wrote: Wed Jun 05, 2019 9:45 am Fantastic! Bloody well done Dru! You're the Tasmanian Devil of bugfixes (erm—in a good way that is).
And the site didn't explode! yay!
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

In the next ZXDB update, column "formattype_id" won't be used anymore. This column will still exist in all file-related tables (so it doesn't break SQL queries in sites that have not been updated yet), but it will always be NULL.
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

A new ZXDB update is available!

Special thanks to @R-Tape (new titles) and @pavero (hires inlays).
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

I forgot to mention that I have fixed references to magazine "Personal Computer News" in this last update!

These references were originally imported from Martijn's WoS database. The problem is, "Personal Computer News" was a weekly magazine, but Martijn's WoS only supported monthly magazines, so it couldn't distinguish between different issues released during the same month.

For instance, if you look at references to "Personal Computer News" for "Pssst" at WoS, you will notice that these full page links are missing:

http://www.worldofspectrum.org/infoseek ... id=0009401

Now if you look at the magazine sections for the same game at SC, you will notice that all "Personal Computer News" links work properly:

https://spectrumcomputing.co.uk/index.p ... 96&id=9401

For each page, you have the option to open just the specific page image, or download the entire magazine issue PDF, or view the PDF directly inside the browser. Or you can click on the magazine issue to see a list of other game references in that specific issue. Or even click on the magazine name to get a list of all magazine issues...

If the explanation above sounds confusing, just try it yourself. It's quite intuitive in practice.

Support for magazines is certainly not a new feature at SC. It has existed for ages, but I bet lots of users never noticed it :)
PeterJ
Site Admin
Posts: 6854
Joined: Thu Nov 09, 2017 7:19 pm
Location: Surrey, UK

Re: New Database Model ZXDB

Post by PeterJ »

Einar Saukas wrote: Tue Jun 18, 2019 10:55 pm I forgot to mention that I have fixed references to magazine "Personal Computer News" in this last update!
Fantastic, @Einar Saukas!

PCN was always one of my favourites. It had a very different style to the other weeklies.
djnzx48
Manic Miner
Posts: 729
Joined: Wed Dec 06, 2017 2:13 am
Location: New Zealand

Re: New Database Model ZXDB

Post by djnzx48 »

Not sure if this is a bug or not, but in the past I've noticed the magazine references often link to one past the intended page. For example, the Pssst link takes me to page 79 when the actual reference is on page 78. It seems like the page numbers in archive.org links start at zero, with no page number specified for the first page, so subtracting one from the page number in the SC links might fix the problem.
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

djnzx48 wrote: Wed Jun 19, 2019 6:27 am Not sure if this is a bug or not, but in the past I've noticed the magazine references often link to one past the intended page. For example, the Pssst link takes me to page 79 when the actual reference is on page 78. It seems like the page numbers in archive.org links start at zero, with no page number specified for the first page, so subtracting one from the page number in the SC links might fix the problem.
Yes, that's a problem. And it's something that needs to be fixed in ZXDB, not SC.

Let me explain:

When you click on a page number, SC opens the corresponding image file. So if it needs to show page 4 of issue #52 of magazine X, it simply has to open a file like "magx05200004.jpg" or something similar. That's easy.
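The filename above is only illustrative ("or something similar"), but it is consistent with a magazine prefix followed by the issue number zero-padded to 3 digits and the page number zero-padded to 5 digits. A minimal Python sketch under that assumption; `page_image_name` is a hypothetical helper, not SC's actual code:

```python
def page_image_name(prefix: str, issue: int, page: int) -> str:
    """Build an image filename such as "magx05200004.jpg".

    Hypothetical layout (an assumption, not the real naming scheme):
    prefix, issue zero-padded to 3 digits, page zero-padded to 5 digits.
    """
    return f"{prefix}{issue:03d}{page:05d}.jpg"


print(page_image_name("magx", 52, 4))  # magx05200004.jpg
```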

However, when you click on the "VIEW" link next to the page number, SC will try to open the corresponding page of the PDF file inside the browser. But how can it find out which page of the PDF that is?

* In certain magazines, the cover is not included in the numbering. Therefore page number 4 is actually the 5th page of the PDF file (as if the cover were page number zero).

* In other magazines, the cover counts as page number 1. Therefore page number 4 coincides with the 4th page of the PDF.

* In yet other magazines, page number 4 could be the 8th page of the PDF, for instance because of unnumbered index and advertising pages at the beginning (as if the cover were page number -3).
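All three cases reduce to a single offset: if each issue records the number its cover would carry (0, 1 or -3 in the examples above), the position inside the PDF is simply the printed page minus that number. A minimal sketch of the arithmetic, assuming such a per-issue value exists; `cover_page` and `pdf_page_index` are hypothetical names, not existing ZXDB columns or SC functions:

```python
def pdf_page_index(printed_page: int, cover_page: int) -> int:
    """Zero-based position of a printed page inside the PDF.

    cover_page is the number the cover would carry if the magazine's
    numbering were extended back to it (0, 1, -3, ... as above).
    """
    return printed_page - cover_page


# The three cases above, for printed page 4 (index + 1 = 1-based PDF page):
for cover in (0, 1, -3):
    print(f"cover={cover}: PDF page {pdf_page_index(4, cover) + 1}")
# cover=0: PDF page 5
# cover=1: PDF page 4
# cover=-3: PDF page 8
```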

The best way to solve this problem is to store this information in ZXDB for each magazine issue. The magazines currently indexed as PDF in ZXDB are listed below:

* Crash
* Jogos 80
* Micro Mart
* Mundo Spectrum
* Personal Computer News
* Planeta Sinclair Almanaque
* RetroMagazine
* Sinclair User
* Spectrum Today (EN)
* The Spectrum Show

If anyone is willing to help by checking "what would be the page number of the cover?" for even a few of these issues, it would help us a lot!
PeterJ
Site Admin
Posts: 6854
Joined: Thu Nov 09, 2017 7:19 pm
Location: Surrey, UK

Re: New Database Model ZXDB

Post by PeterJ »

Thanks for the great explanation @Einar Saukas

When I go to our MicroMart Page (I didn't know that was included in ZXDB).

https://spectrumcomputing.co.uk/index.p ... mag_id=280

I click on any PDF and it says Page not Found on archive.org. Is that something with the path? Every PDF link seems to be trying to go to:

https://archive.org/download/Micro-Mart ... pecial.pdf
djnzx48
Manic Miner
Posts: 729
Joined: Wed Dec 06, 2017 2:13 am
Location: New Zealand

Re: New Database Model ZXDB

Post by djnzx48 »

Einar Saukas wrote: Wed Jun 19, 2019 7:25 pm
djnzx48 wrote: Wed Jun 19, 2019 6:27 am Not sure if this is a bug or not, but in the past I've noticed the magazine references often link to one past the intended page. For example, the Pssst link takes me to page 79 when the actual reference is on page 78. It seems like the page numbers in archive.org links start at zero, with no page number specified for the first page, so subtracting one from the page number in the SC links might fix the problem.
Yes, that's a problem. And it's something that needs to be fixed in ZXDB, not SC.
OK, I didn't know whether the links were stored in the database or autogenerated on the website. But are you sure the problem isn't just a simple off-by-one error? Having a brief look, all the magazines I've seen so far have the correct page numbers on archive.org, except for:

PCN: the numbering starts at the contents page rather than the cover. So the archive.org numbers are two pages ahead.

The Spectrum Show Magazine (issues #0 and #1 only): the numbering starts on the next page after the cover. So the archive.org numbers are one page ahead.

In the archive.org URLs, #page/n1 refers to the second page of the magazine, #page/n2 refers to the third, and so on. For the title page, this parameter is simply omitted.
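Put as code, this means the archive.org page parameter is a zero-based index into the PDF. This post writes it as `#page/nN`, while the Crash #40 link quoted later in the thread uses the path form `/page/nN`; the sketch below follows the path form, which is an assumption, and `archive_reader_url` is a hypothetical helper. For a magazine whose cover counts as page 1, printed page 78 maps to index 77, which is the subtract-one fix suggested earlier.

```python
def archive_reader_url(identifier: str, index: int) -> str:
    """archive.org reader link for a zero-based PDF page index.

    Index 0 (the title page) gets no page suffix; index n > 0 gets
    "/page/n<n>", so n1 is the second page, n2 the third, and so on.
    """
    if index < 0:
        raise ValueError("page index cannot be negative")
    base = f"https://archive.org/details/{identifier}"
    return base if index == 0 else f"{base}/page/n{index}"


print(archive_reader_url("crash-magazine-40", 14))
# https://archive.org/details/crash-magazine-40/page/n14
```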

EDIT: Only, some magazine scans don't even have consistent numbering within themselves. For example, this Crash #40 scan has an extra page inserted after page 13 (some kind of fold-out poster?) Without verifying every scan, you can't be sure whether the page numbers are all correct.
djnzx48
Manic Miner
Posts: 729
Joined: Wed Dec 06, 2017 2:13 am
Location: New Zealand

Re: New Database Model ZXDB

Post by djnzx48 »

PeterJ wrote: Wed Jun 19, 2019 9:27 pm Thanks for the great explanation @Einar Saukas

When I go to our MicroMart Page (I didn't know that was included in ZXDB).

https://spectrumcomputing.co.uk/index.p ... mag_id=280

I click on any PDF and it says Page not Found on archive.org. Is that something with the path? Every PDF link seems to be trying to go to:

https://archive.org/download/Micro-Mart ... pecial.pdf
It's because all the links have '-Special' appended to them, when only a few of them are actually specials. Try for instance the 2015/1/22 link and it works.
hikoki
Manic Miner
Posts: 576
Joined: Thu Nov 16, 2017 10:54 am

Re: New Database Model ZXDB

Post by hikoki »

I wonder if the archive.org api search could be used for page numbers.
https://openlibrary.org/dev/docs/bookurls
I guess the search term would have to contain footer or header words, characteristic of every magazine.

For example, a search link for page 54 on a Crash magazine containing the quoted term "54 Crash"
https://archive.org/details/crash-magazine-40/search/%2254+crash%22
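The search link above is just the item URL plus `/search/` and the quoted phrase, percent-encoded. A minimal Python sketch that rebuilds it with the standard library; `archive_search_url` is a hypothetical helper, and whether a given search actually finds anything depends on the scan's OCR text:

```python
from urllib.parse import quote_plus


def archive_search_url(identifier: str, term: str) -> str:
    """Search link for an exact phrase inside an archive.org item.

    The phrase is wrapped in double quotes, then percent-encoded:
    quote_plus turns '"' into %22 and spaces into '+'.
    """
    phrase = quote_plus(f'"{term}"')
    return f"https://archive.org/details/{identifier}/search/{phrase}"


print(archive_search_url("crash-magazine-40", "54 crash"))
# https://archive.org/details/crash-magazine-40/search/%2254+crash%22
```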
Last edited by hikoki on Thu Jun 20, 2019 10:31 am, edited 5 times in total.
djnzx48
Manic Miner
Posts: 729
Joined: Wed Dec 06, 2017 2:13 am
Location: New Zealand

Re: New Database Model ZXDB

Post by djnzx48 »

You mean scanning the bottom of every page with OCR to find out the page numbers?

I suppose it might work in theory, but it could be kind of impractical. How would you distinguish them from any other number on the page? And a lot of the full-page advertisements don't have page numbers anyway.

EDIT: Heh, so it actually works? That's interesting. It still seems hackish though as you'd need a specific query for every magazine layout.
hikoki
Manic Miner
Posts: 576
Joined: Thu Nov 16, 2017 10:54 am

Re: New Database Model ZXDB

Post by hikoki »

djnzx48 wrote: Thu Jun 20, 2019 9:18 am And a lot of the full-page advertisements don't have page numbers anyway.

EDIT: Heh, so it actually works? That's interesting. It still seems hackish though as you'd need a specific query for every magazine layout.
Well, you could provide users with both the expected and the hackish links :)

A script that detects searches returning no results might be useful for locating the pages that are not numbered.

EDIT

An example of how to data-mine the Internet Archive:
https://programminghistorian.org/en/les ... et-archive
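The detection script suggested above could be as simple as the following Python sketch. It assumes a per-magazine footer phrase and a caller-supplied `hit_count` function that performs the actual search; both, and the helper name `pages_without_hits`, are hypothetical, and the hit counts in the demo are made up purely to exercise the logic.

```python
from typing import Callable, Iterable, List


def pages_without_hits(pages: Iterable[int],
                       footer_term: Callable[[int], str],
                       hit_count: Callable[[str], int]) -> List[int]:
    """Printed page numbers whose footer search returns no results.

    footer_term builds the per-magazine phrase for a page (e.g. "54 crash");
    hit_count runs a search and returns the number of matches. Both are
    supplied by the caller: this sketch does not contact archive.org.
    """
    return [p for p in pages if hit_count(footer_term(p)) == 0]


# Offline demo: a made-up table of hit counts stands in for real searches.
fake_hits = {"53 crash": 1, "54 crash": 1, "56 crash": 2}
missing = pages_without_hits(range(53, 57),
                             lambda p: f"{p} crash",
                             lambda term: fake_hits.get(term, 0))
print(missing)  # [55]
```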
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

hikoki wrote: Thu Jun 20, 2019 9:06 am I wonder if the archive.org api search could be used for page numbers.
https://openlibrary.org/dev/docs/bookurls
I guess the search term would have to contain footer or header words, characteristic of every magazine.

For example, a search link for page 54 on a Crash magazine containing the quoted term "54 Crash"
https://archive.org/details/crash-magaz ... /"54+crash"
Apparently only a few magazines have tagged pages. For instance it doesn't seem to work for Crash issue #84...
hikoki
Manic Miner
Posts: 576
Joined: Thu Nov 16, 2017 10:54 am

Re: New Database Model ZXDB

Post by hikoki »

Einar Saukas wrote: Thu Jun 20, 2019 1:09 pm Apparently only a few magazines have tagged pages. For instance it doesn't seem to work for Crash issue #84...
Footers in Crash 84 follow a different pattern: "january", a black square, then the page number.
If you know the character code of the black square, this should work:

https://archive.org/details/crash-magazine-84/search/%22january+(black square)+(page number)%22

EDIT

@Einar Saukas you are right in this case, as the footers don't seem to have been OCRed.
Well, you could automatically detect missing footers by crawling the magazines with different URLs. If your search term doesn't return a single result, then just provide the estimated /page URL.
*trying to outsmart Einar*
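The fallback described above, sketched in Python: use the footer-search link when the search finds at least one match, otherwise the estimated page link. `page_link` and the `hit_count` callable are hypothetical, the search term in the demo is invented, and the `/page/nN` path form is an assumption based on the Crash #40 link quoted in this thread.

```python
from typing import Callable
from urllib.parse import quote_plus


def page_link(identifier: str, term: str, estimated_index: int,
              hit_count: Callable[[str], int]) -> str:
    """Prefer a footer-search link; fall back to the estimated page link.

    hit_count is a caller-supplied search function (hypothetical here);
    estimated_index is the zero-based PDF page index (0 = title page).
    """
    base = f"https://archive.org/details/{identifier}"
    if hit_count(term) > 0:
        # At least one match: link to the exact-phrase search results.
        phrase = quote_plus('"' + term + '"')
        return f"{base}/search/{phrase}"
    # No match (e.g. footers not OCRed): use the estimated page instead.
    if estimated_index == 0:
        return base
    return f"{base}/page/n{estimated_index}"


def no_ocr(term: str) -> int:
    """Stand-in search that never finds anything."""
    return 0


print(page_link("crash-magazine-84", "january 12", 11, no_ocr))
# https://archive.org/details/crash-magazine-84/page/n11
```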
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

djnzx48 wrote: Thu Jun 20, 2019 4:00 am EDIT: Only, some magazine scans don't even have consistent numbering within themselves. For example, this Crash #40 scan has an extra page inserted after page 13 (some kind of fold-out poster?) Without verifying every scan, you can't be sure whether the page numbers are all correct.
Actually that's an error in the PDF version. This was supposed to be a single page:

https://archive.org/download/World_of_S ... 000015.jpg

But it was split into 2 pages in the PDF:

https://archive.org/details/crash-magazine-40/page/n14

If anyone provides a fixed PDF, I can ask people at Archive.org to replace it. That would fix the numbering in this issue.
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

hikoki wrote: Thu Jun 20, 2019 1:29 pm @Einar Saukas you are right in this case as footers don't seem to be OCRed
well, you can automatically detect the lack of footers by crawling mags with different URLs. If your search term doesn't return one single result then provide just the estimated /page url
I wouldn't mind storing in ZXDB a different "search term" for each magazine issue, if this was a reliable solution. But apparently it's not. I don't like the idea of adopting a solution that only works sometimes.

Therefore I think the best option we have is storing the "cover page number" for each magazine issue. It seems this approach will always work, except when there's an error in the PDF itself (but then we can work to fix the PDF instead).
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

djnzx48 wrote: Thu Jun 20, 2019 4:00 am
Einar Saukas wrote: Wed Jun 19, 2019 7:25 pm Yes, that's a problem. And it's something that needs to be fixed in ZXDB, not SC.
OK, I didn't know whether the links were stored in the database or autogenerated on the website. But are you sure the problem isn't just a simple off-by-one error? Having a brief look, all the magazines I've seen so far have the correct page numbers on archive.org, except for:

PCN: the numbering starts at the contents page rather than the cover. So the archive.org numbers are two pages ahead.

The Spectrum Show Magazine (issues #0 and #1 only): the numbering starts on the next page after the cover. So the archive.org numbers are one page ahead.

In the archive.org URLs, #page/n1 refers to the second page of the magazine, #page/n2 refers to the third, and so on. For the title page, this parameter is simply omitted.
Thanks a lot!!!

I will include this information in the next ZXDB update; afterwards I can help Peter incorporate this logic into SC.
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

BTW table "formattypes" and columns "formattype_id" will be removed in the next ZXDB update.

This table is now empty and these columns only contain NULL, so this change shouldn't affect anybody.
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

PeterJ wrote: Wed Jun 19, 2019 9:27 pm Thanks for the great explanation @Einar Saukas
You are welcome!

PeterJ wrote: Wed Jun 19, 2019 9:27 pm When I go to our MicroMart Page (I didn't know that was included in ZXDB).

https://spectrumcomputing.co.uk/index.p ... mag_id=280

I click on any PDF and it says Page not Found on archive.org. Is that something with the path? Every PDF link seems to be trying to go to:

https://archive.org/download/Micro-Mart ... pecial.pdf
It seems to be working now :)
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

There's a new ZXDB update already!

Special thanks to @druellan for bug fixes, @pavero for additional files, and @djnzx48 for magazine numbering information!