Einar Saukas wrote: ↑Sat Jun 01, 2019 5:43 am A new ZXDB update is available!
This new version contains game review scores from all major magazines (many thanks to Chris from ZXSR for this information!)
Great, a cool and much-anticipated feature (at least for me!)
New Database Model ZXDB
Moderator: druellan
- Juan F. Ramirez
- Bugaboo
- Posts: 5147
- Joined: Tue Nov 14, 2017 6:55 am
- Location: Málaga, Spain
Re: New Database Model ZXDB
We recently spoke about the 'little bug' threads and some way of knowing they have been processed.
@druellan (who kindly tracks and maintains the changes) now has rights to edit posts in the 'ZXDB Discussion' sub-forum (Note - he can edit this section of the forum only).
When a change is accepted and has been processed for entry into ZXDB he will add the following to the end of the post:
✓ Checked
- Einar Saukas
- Bugaboo
- Posts: 3167
- Joined: Wed Nov 15, 2017 2:48 pm
Re: New Database Model ZXDB
Yet another ZXDB update is available!
This release mostly contains the latest bugfixes compiled by @druellan based on people's contributions in this forum.
Re: New Database Model ZXDB
Einar Saukas wrote: ↑Wed Jun 05, 2019 3:23 am Yet another ZXDB update is available!
This release mostly contains latest bugfixes compiled by @druellan based on people's contributions in this forum.
Fantastic! Bloody well done Dru! You're the Tasmanian Devil of bugfixes (erm, in a good way that is).
- Einar Saukas
- Bugaboo
- Posts: 3167
- Joined: Wed Nov 15, 2017 2:48 pm
Re: New Database Model ZXDB
In the upcoming ZXDB update, column "formattype_id" won't be used anymore. This column will still exist in all file-related tables (so it doesn't break SQL queries on sites that have not been updated yet), but it will always be NULL.
- Einar Saukas
- Bugaboo
- Posts: 3167
- Joined: Wed Nov 15, 2017 2:48 pm
Re: New Database Model ZXDB
A new ZXDB update is available!
Special thanks to @R-Tape (new titles) and @pavero (hires inlays).
- Einar Saukas
- Bugaboo
- Posts: 3167
- Joined: Wed Nov 15, 2017 2:48 pm
Re: New Database Model ZXDB
I forgot to mention that I have fixed references to magazine "Personal Computer News" in this last update!
These references were originally imported from Martijn's WoS database. The problem is, "Personal Computer News" was a weekly magazine, but Martijn's WoS only supported monthly magazines, so it couldn't distinguish between different issues released during the same month.
For instance, if you look at references to "Personal Computer News" for "Pssst" at WoS, you will notice that these full page links are missing:
http://www.worldofspectrum.org/infoseek ... id=0009401
Now if you look at the magazine sections for the same game at SC, you will notice that all "Personal Computer News" links work properly:
https://spectrumcomputing.co.uk/index.p ... 96&id=9401
For each page, you have the option to open just the specific page image, or download the entire magazine issue PDF, or view the PDF directly inside the browser. Or you can click on the magazine issue to see a list of other game references in that specific issue. Or even click on the magazine name to get a list of all magazine issues...
If this explanation above sounds confusing, just try it yourself. It's quite intuitive in practice.
Support for magazines is certainly not a new feature at SC. It has existed for ages, but I bet lots of users never noticed it.
Re: New Database Model ZXDB
Einar Saukas wrote: ↑Tue Jun 18, 2019 10:55 pm I forgot to mention that I have fixed references to magazine "Personal Computer News" in this last update!
Fantastic @Einar Saukas
PCN was always one of my favourites. It had a very different style to the other weeklies.
Re: New Database Model ZXDB
Not sure if this is a bug or not, but in the past I've noticed the magazine references often link to one past the intended page. For example, the Pssst link takes me to page 79 when the actual reference is on page 78. It seems like the page numbers in archive.org links start at zero, with no page number specified for the first page, so subtracting one from the page number in the SC links might fix the problem.
- Einar Saukas
- Bugaboo
- Posts: 3167
- Joined: Wed Nov 15, 2017 2:48 pm
Re: New Database Model ZXDB
djnzx48 wrote: ↑Wed Jun 19, 2019 6:27 am Not sure if this is a bug or not, but in the past I've noticed the magazine references often link to one past the intended page. For example, the Pssst link takes me to page 79 when the actual reference is on page 78. It seems like the page numbers in archive.org links start at zero, with no page number specified for the first page, so subtracting one from the page number in the SC links might fix the problem.
Yes, that's a problem. And it's something that needs to be fixed in ZXDB, not SC.
Let me explain:
When you click on a page number, SC opens the corresponding image file. So if it needs to show page 4 of issue #52 of magazine X, it simply has to open a file like "magx05200004.jpg" or something similar. That's easy.
However, when you click on the "VIEW" link next to the page number, SC will try to open the corresponding page of the PDF file inside the browser. But how can it find out which PDF page that is?
* In certain magazines, the cover is not included in numbering. Therefore page number 4 is actually the 5th page of the PDF file (as if cover was page number zero).
* In other magazines, the cover is considered as page number 1. Therefore page number 4 coincides with the 4th page of the PDF.
* In other magazines, page number 4 could be the 8th page of the PDF, due to additional index and advertising pages without numbering at the beginning (as if cover was page number -3), for instance.
The best way to solve this problem is to store this information in ZXDB for each magazine issue. The magazines currently indexed as PDF in ZXDB are listed below:
Crash
Jogos 80
Micro Mart
Mundo Spectrum
Personal Computer News
Planeta Sinclair Almanaque
RetroMagazine
Sinclair User
Spectrum Today (EN)
The Spectrum Show
If anyone is willing to help, by checking "what would be the corresponding page number for the cover?" for even a few of these issues, it would help us a lot!
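The three cover-numbering cases described above reduce to a single subtraction once a "cover page number" is known for an issue. Here is a minimal sketch of that arithmetic, assuming a hypothetical per-issue value called `cover_page_number` (the name is only illustrative, not an actual ZXDB column):

```python
def pdf_position(printed_page: int, cover_page_number: int) -> int:
    """Return the 1-based position inside the PDF of a printed page number.

    cover_page_number is the number the cover would carry if the magazine's
    own numbering were extended back to it (hypothetical per-issue value).
    """
    return printed_page - cover_page_number + 1


# The three cases described above, all for printed page 4:
print(pdf_position(4, 0))   # cover treated as page 0: 5th page of the PDF
print(pdf_position(4, 1))   # cover counted as page 1: 4th page of the PDF
print(pdf_position(4, -3))  # cover treated as page -3: 8th page of the PDF
```

This only holds when the scan has no missing or extra pages, so an issue with a broken PDF would still need the file itself corrected.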
Re: New Database Model ZXDB
Thanks for the great explanation @Einar Saukas
When I go to our MicroMart page (I didn't know that was included in ZXDB):
https://spectrumcomputing.co.uk/index.p ... mag_id=280
I click on any PDF and it says Page not Found on archive.org. Is that something with the path? Every PDF link seems to be trying to go to:
https://archive.org/download/Micro-Mart ... pecial.pdf
Re: New Database Model ZXDB
Einar Saukas wrote: ↑Wed Jun 19, 2019 7:25 pm
djnzx48 wrote: ↑Wed Jun 19, 2019 6:27 am Not sure if this is a bug or not, but in the past I've noticed the magazine references often link to one past the intended page. For example, the Pssst link takes me to page 79 when the actual reference is on page 78. It seems like the page numbers in archive.org links start at zero, with no page number specified for the first page, so subtracting one from the page number in the SC links might fix the problem.
Yes, that's a problem. And it's something that needs to be fixed in ZXDB, not SC.
OK, I didn't know whether the links were stored in the database or autogenerated on the website. But are you sure the problem isn't just a simple off-by-one error? Having a brief look, all the magazines I've seen so far have the correct page numbers on archive.org, except for:
PCN: the numbering starts at the contents page rather than the cover. So the archive.org numbers are two pages ahead.
The Spectrum Show Magazine (issues #0 and #1 only): the numbering starts on the next page after the cover. So the archive.org numbers are one page ahead.
In the archive.org URLs, #page/n1 refers to the second page of the magazine, #page/n2 refers to the third, and so on. For the title page, this parameter is simply omitted.
EDIT: Only, some magazine scans don't even have consistent numbering within themselves. For example, this Crash #40 scan has an extra page inserted after page 13 (some kind of fold-out poster?) Without verifying every scan, you can't be sure whether the page numbers are all correct.
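The URL rule described above (no page parameter for the first page, n1 for the second, n2 for the third) could be sketched roughly as follows. The function name is hypothetical, and the `/page/nK` path form is an assumption based on a reader link that appears elsewhere in this thread, whereas the post above writes it as `#page/nK`:

```python
def archive_page_url(item_id: str, pdf_position: int) -> str:
    """Build an archive.org reader link for a 1-based position in the scan."""
    if pdf_position < 1:
        raise ValueError("pdf_position must be 1 or greater")
    base = f"https://archive.org/details/{item_id}"
    # First page: the page parameter is simply omitted.
    if pdf_position == 1:
        return base
    # Second page is n1, third page is n2, and so on.
    return f"{base}/page/n{pdf_position - 1}"


print(archive_page_url("crash-magazine-40", 15))
```

Since the index is zero-based, a link built directly from the printed page number lands one page too far whenever the cover is counted as page 1.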
Re: New Database Model ZXDB
PeterJ wrote: ↑Wed Jun 19, 2019 9:27 pm Thanks for the great explanation @Einar Saukas
When I go to our MicroMart Page (I didn't know that was included in ZXDB).
https://spectrumcomputing.co.uk/index.p ... mag_id=280
I click on any PDF and it says Page not Found on archive.org. Is that something with the path? Every PDF link seems to be trying to go to:
https://archive.org/download/Micro-Mart ... pecial.pdf
It's because all the links have '-Special' appended to them, when only a few of them are actually specials. Try for instance the 2015/1/22 link and it works.
Re: New Database Model ZXDB
I wonder if the archive.org search API could be used for page numbers.
https://openlibrary.org/dev/docs/bookurls
I guess the search term would have to contain footer or header words, characteristic of every magazine.
For example, a search link for page 54 on a Crash magazine containing the quoted term "54 Crash"
https://archive.org/details/crash-magazine-40/search/%2254+crash%22
Last edited by hikoki on Thu Jun 20, 2019 10:31 am, edited 5 times in total.
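For anyone who wants to generate such search links in bulk, here is a rough sketch that reproduces the Crash #40 example above. The function name is hypothetical, and the footer wording ("54 crash") would have to be adapted to each magazine's layout:

```python
from urllib.parse import quote_plus


def footer_search_url(item_id: str, footer_text: str) -> str:
    """Build an in-book search link for an exact (quoted) footer phrase."""
    # Wrapping the phrase in double quotes asks for an exact match;
    # quote_plus encodes the quotes as %22 and the spaces as '+'.
    phrase = quote_plus(f'"{footer_text}"')
    return f"https://archive.org/details/{item_id}/search/{phrase}"


print(footer_search_url("crash-magazine-40", "54 crash"))
```

This only builds the link. Whether the search finds anything depends on the page footers having been OCRed for that scan.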
Re: New Database Model ZXDB
You mean scanning the bottom of every page with OCR to find out the page numbers?
I suppose it might work in theory, but it could be kind of impractical. How would you distinguish them from any other number on the page? And a lot of the full-page advertisements don't have page numbers anyway.
EDIT: Heh, so it actually works? That's interesting. It still seems hackish though as you'd need a specific query for every magazine layout.
Re: New Database Model ZXDB
well you could provide users with both the expected and hackish links
a script to detect such searches without results might be useful to locate which pages are not numbered
EDIT
An example of how to data-mine the Internet Archive:
https://programminghistorian.org/en/les ... et-archive
- Einar Saukas
- Bugaboo
- Posts: 3167
- Joined: Wed Nov 15, 2017 2:48 pm
Re: New Database Model ZXDB
hikoki wrote: ↑Thu Jun 20, 2019 9:06 am I wonder if the archive.org api search could be used for page numbers.
https://openlibrary.org/dev/docs/bookurls
I guess the search term would have to contain footer or header words, characteristic of every magazine.
For example, a search link for page 54 on a Crash magazine containing the quoted term "54 Crash"
https://archive.org/details/crash-magaz ... /"54+crash"
Apparently only a few magazines have tagged pages. For instance it doesn't seem to work for Crash issue #84...
Re: New Database Model ZXDB
Einar Saukas wrote: ↑Thu Jun 20, 2019 1:09 pm Apparently only a few magazines have tagged pages. For instance it doesn't seem to work for Crash issue #84...
Footers on Crash 84 follow a different pattern: january (black square) numberpage
If you know the ASCII code of the black square, it should work:
https://archive.org/details/crash-magazine-84/search/%22january+(black square)+numberpage%22
EDIT
@Einar Saukas you are right in this case as footers don't seem to be OCRed
well, you can automatically detect the lack of footers by crawling mags with different URLs. If your search term doesn't return one single result then provide just the estimated /page url
*trying to outsmart Einar*
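The fallback idea could look something like the sketch below. It assumes the number of search results per page has already been collected by some crawler (not shown, since the shape of the search response is not covered here), and all names and sample values are hypothetical:

```python
def choose_link(hit_count: int, search_url: str, estimated_url: str) -> str:
    """Prefer the footer search link; fall back when it found nothing."""
    return search_url if hit_count > 0 else estimated_url


def pages_without_hits(hit_counts: dict[int, int]) -> list[int]:
    """List printed pages whose footer search returned no results."""
    return sorted(page for page, hits in hit_counts.items() if hits == 0)


# Hypothetical hit counts for three printed pages of one issue.
sample_hits = {53: 1, 54: 1, 55: 0}
print(pages_without_hits(sample_hits))
```

Pages reported by `pages_without_hits` are candidates for unnumbered pages (full-page adverts, for instance), or for footers that were never OCRed at all.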
- Einar Saukas
- Bugaboo
- Posts: 3167
- Joined: Wed Nov 15, 2017 2:48 pm
Re: New Database Model ZXDB
djnzx48 wrote: ↑Thu Jun 20, 2019 4:00 am EDIT: Only, some magazine scans don't even have consistent numbering within themselves. For example, this Crash #40 scan has an extra page inserted after page 13 (some kind of fold-out poster?) Without verifying every scan, you can't be sure whether the page numbers are all correct.
Actually that's an error in the PDF version. This was supposed to be a single page:
https://archive.org/download/World_of_S ... 000015.jpg
But it was broken into 2 pages in PDF:
https://archive.org/details/crash-magazine-40/page/n14
If anyone provides a fixed PDF, I can ask people at Archive.org to replace it. It would fix numbering in this issue.
- Einar Saukas
- Bugaboo
- Posts: 3167
- Joined: Wed Nov 15, 2017 2:48 pm
Re: New Database Model ZXDB
hikoki wrote: ↑Thu Jun 20, 2019 1:29 pm @Einar Saukas you are right in this case as footers don't seem to be OCRed
well, you can automatically detect the lack of footers by crawling mags with different URLs. If your search term doesn't return one single result then provide just the estimated /page url
I wouldn't mind storing in ZXDB a different "search term" for each magazine issue, if this was a reliable solution. But apparently it's not. I don't like the idea of adopting a solution that only works sometimes.
Therefore I think the best option we have is storing the "cover page number" for each magazine issue. It seems this approach will always work, except when there's an error in the PDF itself (but then we can work to fix the PDF instead).
- Einar Saukas
- Bugaboo
- Posts: 3167
- Joined: Wed Nov 15, 2017 2:48 pm
Re: New Database Model ZXDB
djnzx48 wrote: ↑Thu Jun 20, 2019 4:00 am
Einar Saukas wrote: ↑Wed Jun 19, 2019 7:25 pm Yes, that's a problem. And it's something that needs to be fixed in ZXDB, not SC.
OK, I didn't know whether the links were stored in the database or autogenerated on the website. But are you sure the problem isn't just a simple off-by-one error? Having a brief look, all the magazines I've seen so far have the correct page numbers on archive.org, except for:
PCN: the numbering starts at the contents page rather than the cover. So the archive.org numbers are two pages ahead.
The Spectrum Show Magazine (issues #0 and #1 only): the numbering starts on the next page after the cover. So the archive.org numbers are one page ahead.
In the archive.org URLs, #page/n1 refers to the second page of the magazine, #page/n2 refers to the third, and so on. For the title page, this parameter is simply omitted.
Thanks a lot!!!
I will include this information in the next ZXDB update. Afterwards, I can help Peter incorporate this logic into SC.
- Einar Saukas
- Bugaboo
- Posts: 3167
- Joined: Wed Nov 15, 2017 2:48 pm
Re: New Database Model ZXDB
BTW table "formattypes" and columns "formattype_id" will be removed in the next ZXDB update.
This table is now empty and these columns only contain NULL, so this change shouldn't affect anybody.
- Einar Saukas
- Bugaboo
- Posts: 3167
- Joined: Wed Nov 15, 2017 2:48 pm
Re: New Database Model ZXDB
You are welcome!
PeterJ wrote: ↑Wed Jun 19, 2019 9:27 pm When I go to our MicroMart Page (I didn't know that was included in ZXDB).
https://spectrumcomputing.co.uk/index.p ... mag_id=280
I click on any PDF and it says Page not Found on archive.org. Is that something with the path? Every PDF link seems to be trying to go to:
https://archive.org/download/Micro-Mart ... pecial.pdf
It seems to be working now.
- Einar Saukas
- Bugaboo
- Posts: 3167
- Joined: Wed Nov 15, 2017 2:48 pm
Re: New Database Model ZXDB
There's a new ZXDB update already!
Special thanks to @druellan for bug fixes, @pavero for additional files, and @djnzx48 for magazine numbering information!