New Database Model ZXDB

This is the place for general discussion and updates about the ZXDB Database. This forum is not specific to Spectrum Computing.

Moderator: druellan

Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

djnzx48 wrote: Wed Jun 19, 2019 6:27 am Not sure if this is a bug or not, but in the past I've noticed the magazine references often link to one past the intended page. For example, the Pssst link takes me to page 79 when the actual reference is on page 78. It seems like the page numbers in archive.org links start at zero, with no page number specified for the first page, so subtracting one from the page number in the SC links might fix the problem.
Yes, that's a problem. And it's something that needs to be fixed in ZXDB, not SC.

Let me explain:

When you click on a page number, SC opens the corresponding image file. So if it needs to show page 4 of issue #52 of magazine X, it simply has to open a file like "magx05200004.jpg" or something similar. That's easy.

However, when you click on the "VIEW" link next to the page number, SC will try to open the corresponding page of the PDF file inside the browser. But how can it find out which page that is?

* In certain magazines, the cover is not included in the numbering. Therefore page number 4 is actually the 5th page of the PDF file (as if the cover were page number zero).

* In other magazines, the cover counts as page number 1. Therefore page number 4 coincides with the 4th page of the PDF.

* In other magazines, page number 4 could be the 8th page of the PDF, for instance, because of unnumbered index and advertising pages at the beginning (as if the cover were page number -3).
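The three cases above collapse into one formula once each issue stores the page number its cover would carry. A minimal sketch, assuming a hypothetical per-issue value called `cover_page` (the name is made up for illustration and is not an existing ZXDB column):

```python
def pdf_position(printed_page: int, cover_page: int) -> int:
    """Return the 1-based position inside the PDF of a printed page number.

    cover_page is the number the cover would carry if it were numbered:
    0 when numbering starts right after the cover, 1 when the cover is
    page 1, -3 when the cover and three more unnumbered pages precede
    page 1, and so on. The parameter name is hypothetical.
    """
    return printed_page - cover_page + 1


# The three cases described above, all for printed page 4.
print(pdf_position(4, cover_page=0))   # → 5 (cover not numbered)
print(pdf_position(4, cover_page=1))   # → 4 (cover is page 1)
print(pdf_position(4, cover_page=-3))  # → 8 (unnumbered pages before page 1)
```

Storing a single integer per issue keeps the lookup trivial, at the cost of assuming the scan has no extra or missing pages after the cover.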

The best way to solve this problem is to store this information in ZXDB for each magazine issue. The magazines currently indexed as PDF in ZXDB are listed below:

Crash

Jogos 80

Micro Mart

Mundo Spectrum

Personal Computer News

Planeta Sinclair Almanaque

RetroMagazine

Sinclair User

Spectrum Today (EN)

The Spectrum Show

If anyone is willing to help, by checking "what would be the corresponding page number for the cover?" for even a few of these issues, it would help us a lot!
PeterJ
Site Admin
Posts: 6858
Joined: Thu Nov 09, 2017 7:19 pm
Location: Surrey, UK

Re: New Database Model ZXDB

Post by PeterJ »

Thanks for the great explanation @Einar Saukas

When I go to our MicroMart Page (I didn't know that was included in ZXDB).

https://spectrumcomputing.co.uk/index.p ... mag_id=280

I click on any PDF and it says Page not Found on archive.org. Is that something with the path? Every PDF link seems to be trying to go to:

https://archive.org/download/Micro-Mart ... pecial.pdf
djnzx48
Manic Miner
Posts: 729
Joined: Wed Dec 06, 2017 2:13 am
Location: New Zealand

Re: New Database Model ZXDB

Post by djnzx48 »

Einar Saukas wrote: Wed Jun 19, 2019 7:25 pm
djnzx48 wrote: Wed Jun 19, 2019 6:27 am Not sure if this is a bug or not, but in the past I've noticed the magazine references often link to one past the intended page. For example, the Pssst link takes me to page 79 when the actual reference is on page 78. It seems like the page numbers in archive.org links start at zero, with no page number specified for the first page, so subtracting one from the page number in the SC links might fix the problem.
Yes, that's a problem. And it's something that needs to be fixed in ZXDB, not SC.
OK, I didn't know whether the links were stored in the database or autogenerated on the website. But are you sure the problem isn't just a simple off-by-one error? Having a brief look, all the magazines I've seen so far have the correct page numbers on archive.org, except for:

PCN: the numbering starts at the contents page rather than the cover. So the archive.org numbers are two pages ahead.

The Spectrum Show Magazine (issues #0 and #1 only): the numbering starts on the next page after the cover. So the archive.org numbers are one page ahead.

In the archive.org URLs, #page/n1 refers to the second page of the magazine, #page/n2 refers to the third, and so on. For the title page, this parameter is simply omitted.
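A rough sketch of that URL rule, assuming the `/page/nN` path form of the viewer link rather than the `#page/nN` fragment form; the function name and the item identifier in the example are only illustrations:

```python
def archive_page_url(item_id: str, pdf_position: int) -> str:
    """Build a viewer link for the page at a 1-based position in the scan.

    Assumption: the viewer counts pages from zero and takes no page
    suffix at all for the first page, as described above.
    """
    if pdf_position < 1:
        raise ValueError("pdf_position is 1-based")
    base = f"https://archive.org/details/{item_id}"
    if pdf_position == 1:
        return base  # title page: the page parameter is simply omitted
    return f"{base}/page/n{pdf_position - 1}"


print(archive_page_url("crash-magazine-40", 1))
print(archive_page_url("crash-magazine-40", 2))  # second page → n1
```

This also shows where a plain off-by-one can creep in: passing a 1-based position straight into the `n` suffix lands one page too far.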

EDIT: Only, some magazine scans don't even have consistent numbering within themselves. For example, this Crash #40 scan has an extra page inserted after page 13 (some kind of fold-out poster?) Without verifying every scan, you can't be sure whether the page numbers are all correct.
djnzx48
Manic Miner
Posts: 729
Joined: Wed Dec 06, 2017 2:13 am
Location: New Zealand

Re: New Database Model ZXDB

Post by djnzx48 »

PeterJ wrote: Wed Jun 19, 2019 9:27 pm Thanks for the great explanation @Einar Saukas

When I go to our MicroMart Page (I didn't know that was included in ZXDB).

https://spectrumcomputing.co.uk/index.p ... mag_id=280

I click on any PDF and it says Page not Found on archive.org. Is that something with the path? Every PDF link seems to be trying to go to:

https://archive.org/download/Micro-Mart ... pecial.pdf
It's because all the links have '-Special' appended to them, when only a few of them are actually specials. Try for instance the 2015/1/22 link and it works.
hikoki
Manic Miner
Posts: 576
Joined: Thu Nov 16, 2017 10:54 am

Re: New Database Model ZXDB

Post by hikoki »

I wonder if the archive.org api search could be used for page numbers.
https://openlibrary.org/dev/docs/bookurls
I guess the search term would have to contain footer or header words, characteristic of every magazine.

For example, a search link for page 54 on a Crash magazine containing the quoted term "54 Crash"
https://archive.org/details/crash-magazine-40/search/%2254+crash%22
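Building such a search link is mostly a matter of quoting the phrase. A small sketch, assuming the footer text is simply the page number followed by one characteristic word; the function name and the `footer_word` parameter are hypothetical:

```python
from urllib.parse import quote_plus


def footer_search_url(item_id: str, page: int, footer_word: str) -> str:
    """Build an in-book search link for a quoted footer phrase.

    The phrase is wrapped in double quotes so that it is searched as an
    exact phrase; quote_plus turns the quotes into %22 and the space
    into '+'.
    """
    phrase = f'"{page} {footer_word}"'
    return f"https://archive.org/details/{item_id}/search/{quote_plus(phrase)}"


print(footer_search_url("crash-magazine-40", 54, "crash"))
# → https://archive.org/details/crash-magazine-40/search/%2254+crash%22
```

Each magazine layout would need its own phrase pattern, so the footer word would have to be stored per magazine or per issue.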
Last edited by hikoki on Thu Jun 20, 2019 10:31 am, edited 5 times in total.
djnzx48
Manic Miner
Posts: 729
Joined: Wed Dec 06, 2017 2:13 am
Location: New Zealand

Re: New Database Model ZXDB

Post by djnzx48 »

You mean scanning the bottom of every page with OCR to find out the page numbers?

I suppose it might work in theory, but it could be kind of impractical. How would you distinguish them from any other number on the page? And a lot of the full-page advertisements don't have page numbers anyway.

EDIT: Heh, so it actually works? That's interesting. It still seems hackish though as you'd need a specific query for every magazine layout.
hikoki
Manic Miner
Posts: 576
Joined: Thu Nov 16, 2017 10:54 am

Re: New Database Model ZXDB

Post by hikoki »

djnzx48 wrote: Thu Jun 20, 2019 9:18 am And a lot of the full-page advertisements don't have page numbers anyway.

EDIT: Heh, so it actually works? That's interesting. It still seems hackish though as you'd need a specific query for every magazine layout.
well you could provide users with both the expected and hackish links :)

a script to detect such searches without results might be useful to locate which pages are not numbered

EDIT

An example of how to data-mine the Internet Archive:
https://programminghistorian.org/en/les ... et-archive
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

hikoki wrote: Thu Jun 20, 2019 9:06 am I wonder if the archive.org api search could be used for page numbers.
https://openlibrary.org/dev/docs/bookurls
I guess the search term would have to contain footer or header words, characteristic of every magazine.

For example, a search link for page 54 on a Crash magazine containing the quoted term "54 Crash"
https://archive.org/details/crash-magaz ... /"54+crash"
Apparently only a few magazines have tagged pages. For instance it doesn't seem to work for Crash issue #84...
hikoki
Manic Miner
Posts: 576
Joined: Thu Nov 16, 2017 10:54 am

Re: New Database Model ZXDB

Post by hikoki »

Einar Saukas wrote: Thu Jun 20, 2019 1:09 pm Apparently only a few magazines have tagged pages. For instance it doesn't seem to work for Crash issue #84...
Footers on Crash 84 follow a different pattern: january (black square) numberpage
If you know the ASCII code of the black square, it should work

https://archive.org/details/crash-magazine-84/search/%22january+(black square)+numberpage%22

EDIT

@Einar Saukas you are right in this case as footers don't seem to be OCRed
well, you can automatically detect the lack of footers by crawling mags with different URLs. If your search term doesn't return one single result then provide just the estimated /page url
*trying to outsmart Einar*
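The fallback idea could be sketched roughly as follows. This is only an illustration: `count_hits` is a hypothetical callable that reports how many matches a search link yields, and no real archive.org search API call is shown or implied.

```python
from typing import Callable


def best_page_link(
    search_link: str,
    estimated_link: str,
    count_hits: Callable[[str], int],
) -> str:
    """Pick the footer-search link when it finds something, else fall back.

    count_hits is injected by the caller (hypothetical); it should return
    the number of matches for the given search link.
    """
    if count_hits(search_link) > 0:
        return search_link
    return estimated_link  # footer not OCRed or page not numbered


# Usage with a stand-in checker that pretends nothing was found.
link = best_page_link(
    "https://archive.org/details/crash-magazine-84/search/%2254%22",
    "https://archive.org/details/crash-magazine-84/page/n53",
    count_hits=lambda url: 0,
)
print(link)
```

Running the same check over every page of an issue would also reveal which pages have no searchable footer at all.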
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

djnzx48 wrote: Thu Jun 20, 2019 4:00 am EDIT: Only, some magazine scans don't even have consistent numbering within themselves. For example, this Crash #40 scan has an extra page inserted after page 13 (some kind of fold-out poster?) Without verifying every scan, you can't be sure whether the page numbers are all correct.
Actually that's an error in the PDF version. This was supposed to be a single page:

https://archive.org/download/World_of_S ... 000015.jpg

But it was broken into 2 pages in PDF:

https://archive.org/details/crash-magazine-40/page/n14

If anyone provides a fixed PDF, I can ask people at Archive.org to replace it. It would fix numbering in this issue.
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

hikoki wrote: Thu Jun 20, 2019 1:29 pm @Einar Saukas you are right in this case as footers don't seem to be OCRed
well, you can automatically detect the lack of footers by crawling mags with different URLs. If your search term doesn't return one single result then provide just the estimated /page url
I wouldn't mind storing in ZXDB a different "search term" for each magazine issue, if this was a reliable solution. But apparently it's not. I don't like the idea of adopting a solution that only works sometimes.

Therefore I think the best option we have is storing the "cover page number" for each magazine issue. It seems this approach will always work, except when there's an error in the PDF itself (but then we can work to fix the PDF instead).
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

djnzx48 wrote: Thu Jun 20, 2019 4:00 am
Einar Saukas wrote: Wed Jun 19, 2019 7:25 pm Yes, that's a problem. And it's something that needs to be fixed in ZXDB, not SC.
OK, I didn't know whether the links were stored in the database or autogenerated on the website. But are you sure the problem isn't just a simple off-by-one error? Having a brief look, all the magazines I've seen so far have the correct page numbers on archive.org, except for:

PCN: the numbering starts at the contents page rather than the cover. So the archive.org numbers are two pages ahead.

The Spectrum Show Magazine (issues #0 and #1 only): the numbering starts on the next page after the cover. So the archive.org numbers are one page ahead.

In the archive.org URLs, #page/n1 refers to the second page of the magazine, #page/n2 refers to the third, and so on. For the title page, this parameter is simply omitted.
Thanks a lot!!!

I will include this information in the next ZXDB update. Afterwards I can help Peter incorporate this logic into SC.
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

BTW table "formattypes" and columns "formattype_id" will be removed in the next ZXDB update.

This table is now empty and these columns only contain NULL, so this change shouldn't affect anybody.
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

PeterJ wrote: Wed Jun 19, 2019 9:27 pm Thanks for the great explanation @Einar Saukas
You are welcome!

PeterJ wrote: Wed Jun 19, 2019 9:27 pm When I go to our MicroMart Page (I didn't know that was included in ZXDB).

https://spectrumcomputing.co.uk/index.p ... mag_id=280

I click on any PDF and it says Page not Found on archive.org. Is that something with the path? Every PDF link seems to be trying to go to:

https://archive.org/download/Micro-Mart ... pecial.pdf
It seems to be working now :)
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

There's a new ZXDB update already!

Special thanks to @druellan for bug fixes, @pavero for additional files, and @djnzx48 for magazine numbering information!
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

And another ZXDB update is available!

This update includes mostly new titles (thanks @R-Tape!) and hires inlays (thanks @pavero!)
Juan F. Ramirez
Bugaboo
Posts: 5102
Joined: Tue Nov 14, 2017 6:55 am
Location: Málaga, Spain

Re: New Database Model ZXDB

Post by Juan F. Ramirez »

Einar Saukas wrote: Wed Jul 03, 2019 5:05 am This update includes mostly new titles (thanks @R-Tape!) and hires inlays (thanks @pavero!)
As to the inlays... are they the ones mentioned here?
viewtopic.php?f=29&t=1661#p23769
pavero
Dynamite Dan
Posts: 1570
Joined: Sat Dec 09, 2017 11:49 pm
Location: The Czech Republic

Re: New Database Model ZXDB

Post by pavero »

Juan F. Ramirez wrote: Wed Jul 03, 2019 3:33 pm
Einar Saukas wrote: Wed Jul 03, 2019 5:05 am This update includes mostly new titles (thanks @R-Tape!) and hires inlays (thanks @pavero!)
As to the inlays... are they the ones mentioned here?
viewtopic.php?f=29&t=1661#p23769
No, @PeterJ must update the SC site first.
PeterJ
Site Admin
Posts: 6858
Joined: Thu Nov 09, 2017 7:19 pm
Location: Surrey, UK

Re: New Database Model ZXDB

Post by PeterJ »

All done @pavero & @Juan F. Ramirez
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

Another ZXDB version is available!

This update features integration with JSW Central (thanks @jetsetdanny!), hires inlays (thanks @pavero!) and more bugfixes (thanks @druellan!)
jetsetdanny
Dizzy
Posts: 85
Joined: Thu May 02, 2019 10:22 pm

Re: New Database Model ZXDB

Post by jetsetdanny »

Thanks a lot, Einar! ZXDB's integration with JSW Central is greatly appreciated! :)
Website: JSW Central
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

jetsetdanny wrote: Sun Jul 14, 2019 6:18 am Thanks a lot, Einar! ZXDB's integration with JSW Central is greatly appreciated! :)
You are welcome!
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

Yet another ZXDB version is now available!

This update includes new titles (thanks to @R-Tape!), new Danish magazines "SOFT" and "SOFT Today" (thanks to @kolbeck!), and fixed magazine references to New Computer Express (thanks to myself! :) )
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: New Database Model ZXDB

Post by Einar Saukas »

And another ZXDB version is available!

This update includes more titles (thanks @R-Tape!), more hires inlays (thanks @pavero!), and more magazine updates (I will describe these later)


IMPORTANT: From now on, the auxiliary tables to help database searches (prefixed with "search_by_") are distributed separately. If you have a website that uses these tables, don't forget to always execute ZXDB_help_search.sql after running the main ZXDB script.
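For anyone scripting the import, the ordering requirement could be enforced along these lines. This is a sketch under assumptions: the main script file name used in the example is hypothetical, and the `mysql` command-line client is assumed to pick up its credentials from an option file.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List

HELP_SCRIPT = "ZXDB_help_search.sql"  # name given in the announcement


def ordered_scripts(paths: Iterable[str]) -> List[str]:
    """Return the scripts with the search helper moved to the end."""
    paths = list(paths)
    main = [p for p in paths if Path(p).name != HELP_SCRIPT]
    helper = [p for p in paths if Path(p).name == HELP_SCRIPT]
    return main + helper


def run_with_mysql(path: str) -> None:
    """Feed one SQL script to the mysql command-line client via stdin."""
    with open(path, "rb") as script:
        subprocess.run(["mysql"], stdin=script, check=True)


def load_zxdb(paths: Iterable[str],
              runner: Callable[[str], None] = run_with_mysql) -> None:
    """Run every script in order, always finishing with the helper."""
    for path in ordered_scripts(paths):
        runner(path)


# "main_zxdb_script.sql" is a placeholder for the main ZXDB script.
print(ordered_scripts([HELP_SCRIPT, "main_zxdb_script.sql"]))
```

Passing a different `runner` makes it possible to dry-run the sequence without touching a database.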
R-Tape
Site Admin
Posts: 6353
Joined: Thu Nov 09, 2017 11:46 am

Re: New Database Model ZXDB

Post by R-Tape »

Einar Saukas wrote: Thu Jul 25, 2019 12:56 pm And another ZXDB version is available!
Excellent! We should start counting these, there are 35 updates so far in this thread (I'll look at the other thread later).

EDIT - The other thread has 19 updates, so plus the 35 so far takes the total (from this forum) to erm...54!

There were plenty announced at WoS to include if anyone wanted to count them too.