Search Tables

This is the place for general discussion and updates about the ZXDB Database. This forum is not specific to Spectrum Computing.

Moderator: druellan

Post Reply
User avatar
RMartins
Manic Miner
Posts: 387
Joined: Thu Nov 16, 2017 3:26 pm

Search Tables

Post by RMartins » Sun Jul 14, 2019 4:35 pm

ZXDB has a few search related tables, that seem to have been created to facilitate search, but the pattern, doesn't seem to be constant, and I would like a clarification, if possible:

NOTE: Probably an image, would explain things a bit better, but I'm not on my regular computer right now.

So we have "entries" table, that contains "title" and "library_title".

We have an "authors" table, that links the "entries" table to the "labels" table, to provide author names (as label.name).
So far so good.

But then there is a "search_by_authors" table, which has the same links (FKs with numeric IDs) as "authors" table, without the extra fields.
I can only see this as useful, to keep this pattern constant between similar search tables.

The same happens with "search_by_publishers" table, but here there is a difference betwen PKs, since "publishers" has one more field.
So this justifies the the normalization of these search tables. Cool.

However, when we get to "search_by_titles", instead of numeric fields for both FK relations using numeric IDs, we have 2 fields:
- entry_title (VARCHAR)
- entry_id (numeric)

Which is a bit odd, but somewhat justifiable, because "entries" do not use "labels" for the "titles", because they are included in the original "entries" table.

OK, but now I see there is a "search_by_names" ( which is similar "to "search_by_titles"), but relates only with the "labels" table.

What is the context of "labels" ?
So what is this table for ?

It seems that "labels" can name several things, like: authors and publishers.
But what is the context of a "name" in this "search_by_names" ?

We also have "aliases", but that is understood and logical, since it relates with different locales.

Also, why do "labels" have a "deceased" attribute ?
Only people (authors, owners, devs, artists, etc... ) can be deceased.
A publisher shouldn't be possible to be deceased, or can it ?

This is probably related with some legacy data, that still requires this.

Any feedback info, greatly appreciated.
Here looking at you @Einar Saukas. 8-)

NOTE: I understand that these "search_*" tables DO NOT have reals FKs, to the tables, since they are just an utility to simplify searches.
0 x

User avatar
StooB
Microbot
Posts: 127
Joined: Mon Oct 22, 2018 10:03 am

Re: Search Tables

Post by StooB » Sun Jul 14, 2019 7:59 pm

RMartins wrote:
Sun Jul 14, 2019 4:35 pm
Also, why do "labels" have a "deceased" attribute ?
Only people (authors, owners, devs, artists, etc... ) can be deceased.
"Labels" are the text string for every label_id, including people, publishers and actual software labels.
0 x

User avatar
RMartins
Manic Miner
Posts: 387
Joined: Thu Nov 16, 2017 3:26 pm

Re: Search Tables

Post by RMartins » Sun Jul 14, 2019 9:30 pm

There is a nice description of tables in github
https://github.com/zxdb/ZXDB/blob/master/README.md

But it didn't clarify all my doubts in enought detail.

So any more info on the search tables, more specifically the "name" context would be nice.
Thanks in advance.
0 x

AndyC
Manic Miner
Posts: 243
Joined: Mon Nov 13, 2017 5:12 am

Re: Search Tables

Post by AndyC » Mon Jul 15, 2019 5:04 am

They sound like a bit of a fudge, I haven't looked in ages but there was a fair few "tables" which seemed to originate from the fact the WoS "database" was basically a bunch of flat files. Decent indexes on the tables would probably outperform these "search by" most of the time and, if not, something like materialised/indexed views would be a better choice (depending on the specifics of your RDBMS)
0 x

User avatar
Einar Saukas
Manic Miner
Posts: 760
Joined: Wed Nov 15, 2017 2:48 pm

Re: Search Tables

Post by Einar Saukas » Mon Jul 15, 2019 3:51 pm

RMartins wrote:
Sun Jul 14, 2019 4:35 pm
It seems that "labels" can name several things, like: authors and publishers.
In ZXDB, table "labels" refers to all kinds of entities, including real people's names, nicknames, companies, user groups, fantasy names used by publishers, etc. This table contains an attribute called "labeltype_id" that distinguishes the type of label (person, nickname, company, etc). Each label has a name.

Table "entries" refers to all kinds of published (or unpublished) items: programs, books, computers and peripherals. Each entry has a (main) title.

Table "authors" associates "entries" with "labels". It indicates the entities that authored a certain entry.

Table "publishers" also associates "entries" with "labels". However it indicates the entities that published a certain entry.

RMartins wrote:
Sun Jul 14, 2019 4:35 pm
Also, why do "labels" have a "deceased" attribute ?
Only people (authors, owners, devs, artists, etc... ) can be deceased.
A publisher shouldn't be possible to be deceased, or can it ?
For people, attribute "deceased" indicates the date a person passed away.

For companies, attribute "deceased" indicates the date a company was closed.

RMartins wrote:
Sun Jul 14, 2019 4:35 pm
We also have "aliases", but that is understood and logical, since it relates with different locales.
Table "aliases" stores alternate titles of an entry. If an entry was published with 2 different titles, one of them is an alias. If the same entry is later re-released using a different title, this new title is also an alias.


PS: I will answer your other questions in a separate post...
0 x

User avatar
Einar Saukas
Manic Miner
Posts: 760
Joined: Wed Nov 15, 2017 2:48 pm

Re: Search Tables

Post by Einar Saukas » Mon Jul 15, 2019 6:38 pm

Tables prefixed with "search_by_" are auxiliary tables to help database searches. They are not really part of ZXDB content, since they only contain redundant information, taken from other ZXDB tables. They are only needed for systems that perform ZXDB searches directly in the database.

About these tables:
  • Table "search_by_titles" basically puts together, in the same place, all possible titles related to each entry. It includes the main game title, all re-release titles and aliases, and alternate titles from compilations. These titles are stored in lowercase, without punctuation, for convenience. The idea is, if you know part of one of the titles for an existing game, you will be able to find its ID. For instance, you can find this entry at SC when you search for "Cauldron 2", "Cauldron II", "pumpkin", "calabaza", or "hechizo".
  • Table "search_by_names" does something similar for labels. If you search for part of a name, you will find the corresponding label ID.
  • Table "search_by_authors" helps locate all entries authored by a certain label, even indirectly. This is the reason that, if you search at SC for titles authored by "Joffa", you will find several of his games, although he was always credited under his name "Jonathan M. Smith" instead of nickname "Joffa".
  • Table "search_by_publishers" helps locate all entries published by a certain label, even indirectly. This is the reason that, if you search at SC for titles published by "Ocean", the results will also include all titles published under the label "The Hit Squad" that belonged to Ocean.
In a nutshell, tables prefixed with "search_by_" simply put together data from other ZXDB tables. You should only use them to help improve your searches, nothing else. The actual data you need is stored in the other ZXDB tables.
0 x

User avatar
Einar Saukas
Manic Miner
Posts: 760
Joined: Wed Nov 15, 2017 2:48 pm

Re: Search Tables

Post by Einar Saukas » Mon Jul 15, 2019 6:58 pm

BTW here is the script that generates all auxiliary tables with prefix "search_by_", putting together information from other ZXDB tables:

https://github.com/zxdb/ZX-Ed/blob/mast ... eindex.sql

At the bottom of this script, there are a few examples on how to use these tables on searches. Take a look!

Right now, these auxiliary tables are already included in the official ZXDB distribution, just for convenience. However, if they are causing misunderstandings, I may consider to remove them. In this case, website owners interested in these tables could simply execute this script themselves, whenever ZXDB is updated...
0 x

User avatar
RMartins
Manic Miner
Posts: 387
Joined: Thu Nov 16, 2017 3:26 pm

Re: Search Tables

Post by RMartins » Mon Jul 15, 2019 7:16 pm

Einar Saukas wrote:
Mon Jul 15, 2019 6:58 pm
BTW here is the script that generates all auxiliary tables with prefix "search_by_", putting together information from other ZXDB tables:

https://github.com/zxdb/ZX-Ed/blob/mast ... eindex.sql
Cool 8-)
I always prefer code to an explanation, since code is self explanatory :D
Einar Saukas wrote:
Mon Jul 15, 2019 6:58 pm
...

Right now, these auxiliary tables are already included in the official ZXDB distribution, just for convenience. However, if they are causing misunderstandings, I may consider to remove them. In this case, website owners interested in these tables could simply execute this script themselves, whenever ZXDB is updated...
No confusion or misunderstandings, I was just trying to see, what was the mindset behind them, and if I could use them.
I like to understand stuff first, and then use.
No stress with them.
0 x

User avatar
Einar Saukas
Manic Miner
Posts: 760
Joined: Wed Nov 15, 2017 2:48 pm

Re: Search Tables

Post by Einar Saukas » Mon Jul 15, 2019 9:38 pm

RMartins wrote:
Sun Jul 14, 2019 4:35 pm
This is probably related with some legacy data, that still requires this.
AndyC wrote:
Mon Jul 15, 2019 5:04 am
They sound like a bit of a fudge, I haven't looked in ages but there was a fair few "tables" which seemed to originate from the fact the WoS "database" was basically a bunch of flat files.
No, it's neither legacy data or a fudge. The current ZXDB scheme is very much intentional. Although there are still a few details I'm gradually improving, I believe that current ZXDB structure is a good tradeoff between usability and performance.

Even so, suggestions for improvements all always welcome. If you see any opportunities for improvement, please post them here!

AndyC wrote:
Mon Jul 15, 2019 5:04 am
Decent indexes on the tables would probably outperform these "search by" most of the time and, if not, something like materialised/indexed views would be a better choice (depending on the specifics of your RDBMS)
I agree that these "search_by_" tables would have worked better as materialized views instead. However MySQL/MariaDB doesn't support materialized (or indexed) views.

I believe ZXDB already contains all "decent indexes" that are necessary. Unfortunately indexes can't really help in this case. Please take a look at the script link I posted, to see what I mean.

Although I'm not really a database expert, so there's always a chance that I missed something... If that's the case, please let me know! :)
0 x

AndyC
Manic Miner
Posts: 243
Joined: Mon Nov 13, 2017 5:12 am

Re: Search Tables

Post by AndyC » Tue Jul 16, 2019 4:13 am

Einar Saukas wrote:
Mon Jul 15, 2019 9:38 pm

No, it's neither legacy data or a fudge. The current ZXDB scheme is very much intentional. Although there are still a few details I'm gradually improving, I believe that current ZXDB structure is a good tradeoff between usability and performance.
That's kind of what I meant by "fudge" - it is just working around the limitations of a specific RDBMS engine, since you are effectively doing the same as a materialised view, it's just the "view" part of it in this case is actually a SQL script rather than an actual view. FWIW I do this so often in my day job that I have dynamic SQL to do it to any view (as Indexed Views in SQL Server have some awkward limitations). It wasn't meant as a criticism, more pointing to one of those times that database theory goes out the window when you have real queries to deal with. ;-)

I am surprised that it makes a big difference over just indexing the tables though, ZXDB isn't actually very big and the join conditions are fairly straightforward (because the table designs are good!) - but then I'm no MariaDB expert by a long way. I guess it's also trickier when you are not really in control of how the data might need to be queried and so can't really tune things for a real world scenario.
0 x

Post Reply