
Re: New Database Model ZXDB

Posted: Wed Jun 19, 2019 7:25 pm
by Einar Saukas
djnzx48 wrote: Wed Jun 19, 2019 6:27 am Not sure if this is a bug or not, but in the past I've noticed the magazine references often link to one past the intended page. For example, the Pssst link takes me to page 79 when the actual reference is on page 78. It seems like the page numbers in archive.org links start at zero, with no page number specified for the first page, so subtracting one from the page number in the SC links might fix the problem.
Yes, that's a problem. And it's something that needs to be fixed in ZXDB, not SC.

Let me explain:

When you click on a page number, SC opens the corresponding image file. So if it needs to show page 4 of issue #52 of magazine X, it simply has to open a file like "magx05200004.jpg" or something similar. That's easy.

However, when you click on the "VIEW" link next to the page number, SC will try to open the corresponding page of the PDF file inside the browser. But how can it find out which PDF page that is?

* In certain magazines, the cover is not included in the numbering. Therefore page number 4 is actually the 5th page of the PDF file (as if the cover were page number zero).

* In other magazines, the cover counts as page number 1. Therefore page number 4 coincides with the 4th page of the PDF.

* In others still, page number 4 could be the 8th page of the PDF, for instance, because of unnumbered index and advertising pages at the beginning (as if the cover were page number -3).
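
The three cases above come down to one stored value per issue: the number the cover would carry if the printed numbering were extended back to it. Here is a minimal Python sketch of that arithmetic. The function name `pdf_page_for` and its parameter names are hypothetical, chosen only for illustration; they are not part of ZXDB or SC.

```python
def pdf_page_for(printed_page: int, cover_page_number: int) -> int:
    """Return the 1-based PDF page that shows the given printed page number.

    cover_page_number is the number the cover would carry if the printed
    numbering were extended back to it (an assumed per-issue value).
    """
    return printed_page - cover_page_number + 1


# The three cases described above, each asking for printed page 4:
print(pdf_page_for(4, 0))   # cover not numbered: 5th page of the PDF
print(pdf_page_for(4, 1))   # cover is page 1: 4th page of the PDF
print(pdf_page_for(4, -3))  # unnumbered pages up front: 8th page of the PDF
```

With this, only one small integer needs to be recorded per magazine issue, and every page reference of that issue can be converted the same way.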

The best way to solve this problem is to store this information in ZXDB for each magazine issue. The magazines currently indexed as PDF in ZXDB are listed below:

Crash

Jogos 80

Micro Mart

Mundo Spectrum

Personal Computer News

Planeta Sinclair Almanaque

RetroMagazine

Sinclair User

Spectrum Today (EN)

The Spectrum Show

If anyone is willing to help by checking "what would be the corresponding page number for the cover?" for even a few of these issues, it would help us a lot!

Re: New Database Model ZXDB

Posted: Wed Jun 19, 2019 9:27 pm
by PeterJ
Thanks for the great explanation, @Einar Saukas.

I went to our Micro Mart page (I didn't know that was included in ZXDB):

https://spectrumcomputing.co.uk/index.p ... mag_id=280

When I click on any PDF, archive.org says "Page not Found". Is something wrong with the path? Every PDF link seems to be trying to go to:

https://archive.org/download/Micro-Mart ... pecial.pdf

Re: New Database Model ZXDB

Posted: Thu Jun 20, 2019 4:00 am
by djnzx48
Einar Saukas wrote: Wed Jun 19, 2019 7:25 pm
djnzx48 wrote: Wed Jun 19, 2019 6:27 am Not sure if this is a bug or not, but in the past I've noticed the magazine references often link to one past the intended page. For example, the Pssst link takes me to page 79 when the actual reference is on page 78. It seems like the page numbers in archive.org links start at zero, with no page number specified for the first page, so subtracting one from the page number in the SC links might fix the problem.
Yes, that's a problem. And it's something that needs to be fixed in ZXDB, not SC.
OK, I didn't know whether the links were stored in the database or autogenerated on the website. But are you sure the problem isn't just a simple off-by-one error? Having a brief look, all the magazines I've seen so far have the correct page numbers on archive.org, except for:

PCN: the numbering starts at the contents page rather than the cover. So the archive.org numbers are two pages ahead.

The Spectrum Show Magazine (issues #0 and #1 only): the numbering starts on the next page after the cover. So the archive.org numbers are one page ahead.

In the archive.org URLs, #page/n1 refers to the second page of the magazine, #page/n2 refers to the third, and so on. For the title page, this parameter is simply omitted.
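
To make the zero-based scheme concrete, here is a rough Python sketch that turns a 1-based page of the scan into a viewer link. It assumes the `/details/<item>/page/nN` form of the address, and `viewer_url` is a made-up helper name, not anything SC actually uses.

```python
def viewer_url(item_id: str, scan_page: int) -> str:
    """Build an archive.org viewer link for a 1-based page of the scan.

    Assumes the "/page/nN" address form; item_id is the archive.org item name.
    """
    if scan_page < 1:
        raise ValueError("scan_page is 1-based")
    base = f"https://archive.org/details/{item_id}"
    index = scan_page - 1  # the n-values count from zero
    # The first page takes no page parameter at all.
    return base if index == 0 else f"{base}/page/n{index}"


print(viewer_url("crash-magazine-40", 1))  # first page: no /page/ part
print(viewer_url("crash-magazine-40", 2))  # second page: ends in /page/n1
```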

EDIT: However, some magazine scans aren't even numbered consistently. For example, this Crash #40 scan has an extra page inserted after page 13 (some kind of fold-out poster?). Without verifying every scan, you can't be sure whether the page numbers are all correct.

Re: New Database Model ZXDB

Posted: Thu Jun 20, 2019 4:30 am
by djnzx48
PeterJ wrote: Wed Jun 19, 2019 9:27 pm Thanks for the great explanation @Einar Saukas

When I go to our MicroMart Page (I didn't know that was included in ZXDB).

https://spectrumcomputing.co.uk/index.p ... mag_id=280

I click on any PDF and it says Page not Found on archive.org. Is that something with the path? Every PDF link seems to be trying to go to:

https://archive.org/download/Micro-Mart ... pecial.pdf
It's because all the links have '-Special' appended to them, when only a few of them are actually specials. Try for instance the 2015/1/22 link and it works.

Re: New Database Model ZXDB

Posted: Thu Jun 20, 2019 9:06 am
by hikoki
I wonder if the archive.org search API could be used for page numbers.
https://openlibrary.org/dev/docs/bookurls
I guess the search term would have to contain footer or header words, characteristic of every magazine.

For example, here is a search link for page 54 of a Crash issue, using the quoted term "54 Crash":
https://archive.org/details/crash-magazine-40/search/%2254+crash%22
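
A link like that could be generated rather than typed by hand. This is only a sketch in Python, assuming the footer text is known for the magazine in question; `footer_search_url` is a hypothetical name.

```python
from urllib.parse import quote_plus


def footer_search_url(item_id: str, footer_text: str) -> str:
    """Build an in-book search link for an exact (quoted) footer phrase."""
    phrase = f'"{footer_text}"'  # the quotes ask for an exact phrase match
    # quote_plus encodes the quotes as %22 and the space as +
    return f"https://archive.org/details/{item_id}/search/{quote_plus(phrase)}"


print(footer_search_url("crash-magazine-40", "54 crash"))
```

Whether the search finds anything still depends on the footer having been OCRed for that particular scan.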

Re: New Database Model ZXDB

Posted: Thu Jun 20, 2019 9:18 am
by djnzx48
You mean scanning the bottom of every page with OCR to find out the page numbers?

I suppose it might work in theory, but it could be kind of impractical. How would you distinguish them from any other number on the page? And a lot of the full-page advertisements don't have page numbers anyway.

EDIT: Heh, so it actually works? That's interesting. It still seems hackish though as you'd need a specific query for every magazine layout.

Re: New Database Model ZXDB

Posted: Thu Jun 20, 2019 10:36 am
by hikoki
djnzx48 wrote: Thu Jun 20, 2019 9:18 am And a lot of the full-page advertisements don't have page numbers anyway.

EDIT: Heh, so it actually works? That's interesting. It still seems hackish though as you'd need a specific query for every magazine layout.
Well, you could provide users with both the expected and hackish links :)

A script to detect such searches without results might be useful for locating which pages are not numbered.

EDIT

An example of how to data-mine the Internet Archive:
https://programminghistorian.org/en/les ... et-archive

Re: New Database Model ZXDB

Posted: Thu Jun 20, 2019 1:09 pm
by Einar Saukas
hikoki wrote: Thu Jun 20, 2019 9:06 am I wonder if the archive.org api search could be used for page numbers.
https://openlibrary.org/dev/docs/bookurls
I guess the search term would have to contain footer or header words, characteristic of every magazine.

For example, a search link for page 54 on a Crash magazine containing the quoted term "54 Crash"
https://archive.org/details/crash-magaz ... /"54+crash"
Apparently only a few magazines have tagged pages. For instance it doesn't seem to work for Crash issue #84...

Re: New Database Model ZXDB

Posted: Thu Jun 20, 2019 1:29 pm
by hikoki
Einar Saukas wrote: Thu Jun 20, 2019 1:09 pm Apparently only a few magazines have tagged pages. For instance it doesn't seem to work for Crash issue #84...
The footers in Crash 84 follow a different pattern: january (black square) numberpage
If you know the character code of the black square, this should work:

https://archive.org/details/crash-magazine-84/search/
%22
january+(black square)+numberpage
%22

EDIT

@Einar Saukas, you are right in this case, as the footers don't seem to be OCRed.
Well, you can automatically detect the lack of footers by crawling mags with different URLs. If your search term doesn't return a single result, then provide just the estimated /page URL.
*trying to outsmart Einar*
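
Roughly what I mean, as a Python sketch. The `count_hits` argument is a placeholder for whatever code would actually run the search and count the results; nothing in this thread says how to do that against archive.org, so here it is just passed in and faked for the demo.

```python
from typing import Callable


def choose_link(search_url: str, estimated_url: str,
                count_hits: Callable[[str], int]) -> str:
    """Prefer the footer-search link, but only when it finds something."""
    try:
        hits = count_hits(search_url)
    except OSError:
        hits = 0  # network trouble: treat it as "no result"
    return search_url if hits > 0 else estimated_url


# Demo with a fake counter (hypothetical): only one link "has" results.
def fake_count(url: str) -> int:
    return 1 if url == "search-link-with-footer" else 0


print(choose_link("search-link-with-footer", "estimated-page-link", fake_count))
print(choose_link("search-link-no-footer", "estimated-page-link", fake_count))
```

The same check, run once per issue, would also tell you which scans have no OCRed footers at all.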

Re: New Database Model ZXDB

Posted: Thu Jun 20, 2019 10:28 pm
by Einar Saukas
djnzx48 wrote: Thu Jun 20, 2019 4:00 am EDIT: Only, some magazine scans don't even have consistent numbering within themselves. For example, this Crash #40 scan has an extra page inserted after page 13 (some kind of fold-out poster?) Without verifying every scan, you can't be sure whether the page numbers are all correct.
Actually that's an error in the PDF version. This was supposed to be a single page:

https://archive.org/download/World_of_S ... 000015.jpg

But it was broken into 2 pages in PDF:

https://archive.org/details/crash-magazine-40/page/n14

If anyone provides a fixed PDF, I can ask people at Archive.org to replace it. It would fix numbering in this issue.

Re: New Database Model ZXDB

Posted: Thu Jun 20, 2019 10:39 pm
by Einar Saukas
hikoki wrote: Thu Jun 20, 2019 1:29 pm @Einar Saukas you are right in this case as footers don't seem to be OCRed
well, you can automatically detect the lack of footers by crawling mags with different URLs. If your search term doesn't return one single result then provide just the estimated /page url
I wouldn't mind storing in ZXDB a different "search term" for each magazine issue, if this was a reliable solution. But apparently it's not. I don't like the idea of adopting a solution that only works sometimes.

Therefore I think the best option we have is storing the "cover page number" for each magazine issue. It seems this approach will always work, except when there's an error in the PDF itself (but then we can work to fix the PDF instead).

Re: New Database Model ZXDB

Posted: Thu Jun 20, 2019 10:40 pm
by Einar Saukas
djnzx48 wrote: Thu Jun 20, 2019 4:00 am
Einar Saukas wrote: Wed Jun 19, 2019 7:25 pm Yes, that's a problem. And it's something that needs to be fixed in ZXDB, not SC.
OK, I didn't know whether the links were stored in the database or autogenerated on the website. But are you sure the problem isn't just a simple off-by-one error? Having a brief look, all the magazines I've seen so far have the correct page numbers on archive.org, except for:

PCN: the numbering starts at the contents page rather than the cover. So the archive.org numbers are two pages ahead.

The Spectrum Show Magazine (issues #0 and #1 only): the numbering starts on the next page after the cover. So the archive.org numbers are one page ahead.

In the archive.org URLs, #page/n1 refers to the second page of the magazine, #page/n2 refers to the third, and so on. For the title page, this parameter is simply omitted.
Thanks a lot!!!

I will include this information in the next ZXDB update. Afterwards, I can help Peter incorporate this logic into SC.

Re: New Database Model ZXDB

Posted: Mon Jun 24, 2019 6:03 pm
by Einar Saukas
BTW table "formattypes" and columns "formattype_id" will be removed in the next ZXDB update.

This table is now empty and these columns only contain NULL, so this change shouldn't affect anybody.

Re: New Database Model ZXDB

Posted: Mon Jun 24, 2019 6:58 pm
by Einar Saukas
PeterJ wrote: Wed Jun 19, 2019 9:27 pm Thanks for the great explanation @Einar Saukas
You are welcome!

PeterJ wrote: Wed Jun 19, 2019 9:27 pm When I go to our MicroMart Page (I didn't know that was included in ZXDB).

https://spectrumcomputing.co.uk/index.p ... mag_id=280

I click on any PDF and it says Page not Found on archive.org. Is that something with the path? Every PDF link seems to be trying to go to:

https://archive.org/download/Micro-Mart ... pecial.pdf
It seems to be working now :)

Re: New Database Model ZXDB

Posted: Tue Jun 25, 2019 12:57 am
by Einar Saukas
There's a new ZXDB update already!

Special thanks to @druellan for bug fixes, @pavero for additional files, and @djnzx48 for magazine numbering information!

Re: New Database Model ZXDB

Posted: Wed Jul 03, 2019 5:05 am
by Einar Saukas
And another ZXDB update is available!

This update includes mostly new titles (thanks @R-Tape!) and hires inlays (thanks @pavero!)

Re: New Database Model ZXDB

Posted: Wed Jul 03, 2019 3:33 pm
by Juan F. Ramirez
Einar Saukas wrote: Wed Jul 03, 2019 5:05 am This update includes mostly new titles (thanks @R-Tape!) and hires inlays (thanks @pavero!)
As to the inlays... are they the ones mentioned here?
viewtopic.php?f=29&t=1661#p23769

Re: New Database Model ZXDB

Posted: Wed Jul 03, 2019 6:14 pm
by pavero
Juan F. Ramirez wrote: Wed Jul 03, 2019 3:33 pm
Einar Saukas wrote: Wed Jul 03, 2019 5:05 am This update includes mostly new titles (thanks @R-Tape!) and hires inlays (thanks @pavero!)
As to the inlays... are they the ones mentioned here?
viewtopic.php?f=29&t=1661#p23769
No, @PeterJ must update the SC site first.

Re: New Database Model ZXDB

Posted: Wed Jul 03, 2019 7:20 pm
by PeterJ
All done, @pavero & @Juan F. Ramirez.

Re: New Database Model ZXDB

Posted: Sun Jul 14, 2019 3:06 am
by Einar Saukas
Another ZXDB version is available!

This update features integration with JSW Central (thanks @jetsetdanny!), hires inlays (thanks @pavero!) and more bugfixes (thanks @druellan!)

Re: New Database Model ZXDB

Posted: Sun Jul 14, 2019 6:18 am
by jetsetdanny
Thanks a lot, Einar! ZXDB's integration with JSW Central is greatly appreciated! :)

Re: New Database Model ZXDB

Posted: Fri Jul 19, 2019 2:48 pm
by Einar Saukas
jetsetdanny wrote: Sun Jul 14, 2019 6:18 am Thanks a lot, Einar! ZXDB's integration with JSW Central is greatly appreciated! :)
You are welcome!

Re: New Database Model ZXDB

Posted: Fri Jul 19, 2019 2:52 pm
by Einar Saukas
Yet another ZXDB version is now available!

This update includes new titles (thanks to @R-Tape!), new Danish magazines "SOFT" and "SOFT Today" (thanks to @kolbeck!), and fixed magazine references to New Computer Express (thanks to myself! :) )

Re: New Database Model ZXDB

Posted: Thu Jul 25, 2019 12:56 pm
by Einar Saukas
And another ZXDB version is available!

This update includes more titles (thanks @R-Tape!), more hires inlays (thanks @pavero!), and more magazine updates (I will describe these later).


IMPORTANT: From now on, the auxiliary tables to help database searches (prefixed with "search_by_") are distributed separately. If you have a website that uses these tables, don't forget to always execute ZXDB_help_search.sql after running the main ZXDB script.

Re: New Database Model ZXDB

Posted: Thu Jul 25, 2019 6:43 pm
by R-Tape
Einar Saukas wrote: Thu Jul 25, 2019 12:56 pm And another ZXDB version is available!
Excellent! We should start counting these, there are 35 updates so far in this thread (I'll look at the other thread later).

EDIT - The other thread has 19 updates, so plus the 35 so far takes the total (from this forum) to erm...54!

There were plenty announced at WoS to include if anyone wanted to count them too.