Predictive Search

RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: Predictive Search

Post by RMartins »

Einar Saukas wrote: Sat Jul 20, 2019 11:36 pm ...
RMartins wrote: Sat Jul 20, 2019 12:48 am Having said that, it would be nice to know what other users think about this subject, in order to get more feedback and ideas.
I suspect this discussion has been too abstract for most users. It's hard for anyone to evaluate what's the best interface without actually trying it.

I suggest you implement your idea in your alternate page (or even replace the main page), afterwards write a post describing the changes and asking for feedback.
For now the intent was to provide an autocomplete, and that's done and working.
I will probably change it a bit so that it fetches something like 40 entries but shows only 10 at a time, letting the user scroll to see the remaining 30.

To review how the search is being done, I first need to understand the data and the relations between the several tables involved in more detail.
In particular, how other info is kept in the labels table.



Some extra comments below.
Einar Saukas wrote: Sat Jul 20, 2019 12:05 am ...
RMartins wrote: Fri Jul 19, 2019 7:35 pm But I believe you are disagreeing, because I might not have explained it well enough.
No. I understood exactly what you meant. There's no need to discuss it technically, because the problem is not technical. We just happen to have different ideas for the intended behavior.

You assume that users searching for "cadáv" are only interested in words containing this accent. I assume they want "Cadàveriön" but didn't memorize the exact spelling.

You assume that users searching for "off-road" are only interested in words containing this hyphen. I assume they want all off road games regardless of spelling.

You assume that users searching for "night rally" don't want to see "Nightmare Rally" in the list of results. I assume they do.

Basically you assume most users will bother to learn the nuances of using capital letters or not, using punctuation or not, etc. I assume they just want to find a game with minimum effort.
Yes I assume all that, behind what I mentioned as user intent.

They can still search exactly like you describe, just by not using any case or accent.
You can call that a "nuance" they will have to learn. I have no problem with that.

The thing is, if they do not want to match every single "ana" in the search topic, but only the ones that match "aña", they can, and they will not get FALSE positives.
With the current solution they can't: it's case insensitive and accent insensitive all the way.

Having a choice for refining the search seems like an improvement or a win to me, but that might not be the opinion of the majority.

Ideally we would have a search engine, and someone to maintain it, but we don't.
Einar Saukas wrote: Sat Jul 20, 2019 12:05 am ...
RMartins wrote: Fri Jul 19, 2019 7:35 pm They just need several entries into the search index (not database index), like if they were alternative names (aliases).
And that's exactly what I wanted to avoid, because maintaining a list of alternative names would be a PITA.
OK, I understand that it involves some effort, but that's the only feasible (NON FALSE positives) way to do it, using only SQL.
Einar Saukas wrote: Sat Jul 20, 2019 12:05 am
RMartins wrote: Fri Jul 19, 2019 7:35 pm If you think about it, you already do this with the "aliases" table.
No, I don't. The "aliases" table contains alternative official names for the same game. It doesn't make sense to add other titles to this table that a game never had, just to help searches.
I believe I explained that it should be some other table, and not the "aliases" table, since that has a specific release context.
A table specific for this purpose, like "title_synonyms" or similar.
i.e. specific tables and data created for handling searching, like the search_by_* tables already are.
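To make the "title_synonyms" idea concrete, here is a minimal Python/sqlite3 sketch. Everything in it is an assumption for illustration: the "entries" table, the column names, the second title ("Off-Road Example", made up) and the choice of which spellings to store are hypothetical and not taken from ZXDB. It only shows how a case- and accent-sensitive match against several stored spellings behaves.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical search-index table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE title_synonyms (
            entry_id INTEGER NOT NULL REFERENCES entries(id),
            spelling TEXT NOT NULL
        );
    """)
    conn.executemany("INSERT INTO entries VALUES (?, ?)", [
        (1, "Cadàveriön"),
        (2, "Off-Road Example"),  # made-up title, for illustration only
    ])
    # Each entry gets its real spelling plus simplified spellings.
    conn.executemany("INSERT INTO title_synonyms VALUES (?, ?)", [
        (1, "Cadàveriön"), (1, "cadaverion"),
        (2, "Off-Road Example"), (2, "off-road example"),
        (2, "off road example"), (2, "offroad example"),
    ])
    return conn


def search(conn, term):
    """Return titles with at least one stored spelling containing `term`."""
    # instr() compares text exactly, so the match is case and accent sensitive.
    rows = conn.execute("""
        SELECT DISTINCT e.title
        FROM entries e
        JOIN title_synonyms s ON s.entry_id = e.id
        WHERE instr(s.spelling, ?) > 0
        ORDER BY e.title
    """, (term,)).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(search(conn, "cadav"))     # matched through the simplified spelling
print(search(conn, "Cadàv"))     # matched through the real spelling
print(search(conn, "cadáv"))     # wrong accent: no stored spelling matches
```

With this data, "cadav" and "Cadàv" both find the entry, while "cadáv" finds nothing unless that spelling is also stored. That is the tradeoff being discussed: no false positives, at the cost of maintaining the list of spellings.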
Einar Saukas wrote: Sat Jul 20, 2019 12:05 am
RMartins wrote: Fri Jul 19, 2019 7:35 pm You get the good but you also get the bad, by clipping data.
If instead you just keep the several spellings, you get the good part, but without the bad ones.
I know. False positives is the price to pay for easier, broader searches. I think that's a good tradeoff.
OK, I just believe we can improve on the current solution.
Einar Saukas wrote: Sat Jul 20, 2019 12:05 am ...
RMartins wrote: Fri Jul 19, 2019 7:35 pm Since we are using the poor man's SQL solution, we need to build the search index tables, with all valid combinations and then search that, using SQL.
Again, that's what I'm trying to avoid.
I understand that, but by creating the search_by_* tables, aren't you actually already doing something similar ?
Surely not as involved as it could be, but a start of what a typical "search index" is.
Einar Saukas wrote: Sat Jul 20, 2019 12:05 am ...
Your proposal wouldn't be able to find "Electra 9000", for instance. It's the title of a game re-released by Alternative, probably better known than the original title from the first K'Soft release.
I searched for "Electra" and it finds it, but searching for "electra" fails to find it, curious!
Maybe I missed something.

Are you referring to the fact, that current quick search query, doesn't include "labels" alone (without joining to authors or publishers) ?

If that's the case, I was just following the tip on the field "Search by Title, Author, Publisher".
Maybe we should include the "labels" individually too.

But to be fair, I still haven't grasped the "labels" table completely, because it seems a bit messy to have some of it related with other tables, giving it context (like authors or publishers), but part of it, is neither of those, and has not relation with any other extra context giving table.
I find it a bit odd, although I do understand that this is probably related with legacy data.

Again, thank you for your feedback.
I do appreciate it.
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Predictive Search

Post by Einar Saukas »

RMartins wrote: Sun Jul 21, 2019 3:27 pm I searched for "Electra" and it finds it, but searching for "electra" fails to find it, curious!
Maybe I missed something.
Both "Electra" and "electra" provide the same results in searching. But they show different results in autocomplete.

RMartins wrote: Sun Jul 21, 2019 3:27 pm Are you referring to the fact, that current quick search query, doesn't include "labels" alone (without joining to authors or publishers) ?
It's the opposite. Current quick search query includes "labels" alone (but autocomplete doesn't).

RMartins wrote: Sun Jul 21, 2019 3:27 pm If that's the case, I was just following the tip on the field "Search by Title, Author, Publisher".
A more accurate description would be "Search by Title, Author, Publisher, License Owner, Magazine Name, Team Name, Magazine Columnist, etc"

RMartins wrote: Sun Jul 21, 2019 3:27 pm Maybe we should include the "labels" individually too.
Yes.

RMartins wrote: Sun Jul 21, 2019 3:27 pm But to be fair, I still haven't grasped the "labels" table completely
In ZXDB, table "labels" contains all entities. It includes real people's names, nicknames, companies, user groups, fantasy names used by publishers, etc.

Some of them are authors (of certain titles), for instance Joffa.

Some of them are publishers (of certain titles), for instance Ocean Software.

Some of them are both authors and publishers, for instance Jonathan Cauldwell.

Some of them are owners (of certain licenses), for instance Daley Thompson.

Some of them are both authors and license owners, for instance Alfonso Azpiri.

There are a few "relationship" tables that associate entries to the entities that authored them, or published them, or own the licenses related to them. They are described here:

https://github.com/zxdb/ZXDB

RMartins wrote: Sun Jul 21, 2019 3:27 pm because it seems a bit messy to have some of it related with other tables, giving it context (like authors or publishers), but part of it, is neither of those, and has not relation with any other extra context giving table.
I find it a bit odd, although I do understand that this is probably related with legacy data.
No, it's not odd, not messy, and not related with legacy data. We already had this conversation here.

A certain person could have authored different games. Perhaps also published a few of them, worked as columnist in some magazine, or created some character that was later licensed to others. Don't you think it makes sense to store information about each person (country, nickname, homepage, etc) in a separate table, instead of duplicating it everywhere?

RMartins wrote: Sun Jul 21, 2019 3:27 pm Again, thank you for your feedback.
I do appreciate it.
You are welcome!

Post by RMartins »

Einar Saukas wrote: Mon Jul 22, 2019 5:58 pm
RMartins wrote: Sun Jul 21, 2019 3:27 pm I searched for "Electra" and it finds it, but searching for "electra" fails to find it, curious!
Maybe I missed something.
Both "Electra" and "electra" provide the same results in searching. But they show different results in autocomplete.

RMartins wrote: Sun Jul 21, 2019 3:27 pm Are you referring to the fact, that current quick search query, doesn't include "labels" alone (without joining to authors or publishers) ?
It's the opposite. Current quick search query includes "labels" alone (but autocomplete doesn't).
OK, I see what you mean.
The quick search is different from the autocomplete search, in current implementation.
That's true, due to the tip definition that I followed.

I can re-align that, if that's the intent.
Einar Saukas wrote: Mon Jul 22, 2019 5:58 pm ...
There are a few "relationship" tables that associate entries to the entities that authored them, or published them, or own the licenses related to them. They are described here:

https://github.com/zxdb/ZXDB
Yes, you have mentioned something along those lines before.
Einar Saukas wrote: Mon Jul 22, 2019 5:58 pm
RMartins wrote: Sun Jul 21, 2019 3:27 pm because it seems a bit messy to have some of it related with other tables, giving it context (like authors or publishers), but part of it, is neither of those, and has not relation with any other extra context giving table.
I find it a bit odd, although I do understand that this is probably related with legacy data.
No, it's not odd, not messy, and not related with legacy data. We already had this conversation here.
Maybe I haven't explained myself well enough.
I'm not criticizing the ZXDB tables or its data, per se.

Let me try and explain what I meant by "has not relation with any other extra context giving table".
And I may be wrong, so please correct me if that is the case.

"Labels" table, has a bunch of records, some:
- have a reference to the "authors" table, giving that "label" context in the sense that it's an author name.
- have a reference to the "publishers" table, giving that "label" context in the sense that it's a publisher label/name.
- have a reference to the "licensors" table, giving that "label" context in the sense that it's a License Owner label/name.
but there seems to exist more label data in "labels" table than just "authors", "publishers" or "licensors".

Assuming this is true (might not be), I haven't seen any extra table referenced, in order to give context to whatever other stuff is in the "labels" records.

At least, that's my current view, which can be wrong.
Einar Saukas wrote: Mon Jul 22, 2019 5:58 pm A certain person could have authored different games. Perhaps also published a few of them, worked as columnist in some magazine, or created some character that was later licensed to others. Don't you think it makes sense to store information about each person (country, nickname, homepage, etc) in a separate table, instead of duplicating it everywhere?
Yes, I do agree with this.

But from my current perspective (mentioned above), there seems to be some info that does not have context associated.
And that was my only complaint about feeling it's "messy" or a "bit odd".

Thank you for your feedback.

Post by Einar Saukas »

RMartins wrote: Mon Jul 22, 2019 7:27 pm OK, I see what you mean.
The quick search is different from the autocomplete search, in current implementation.
That's true, due to the tip definition that I followed.

I can re-align that, if that's the intent.
Yes, I think you should accept all "labels".

Right now, quick search also provides other results besides authors and publishers.

RMartins wrote: Mon Jul 22, 2019 7:27 pm Maybe I haven't explained myself well enough.
I'm not criticizing the ZXDB tables or its data, per se.

Let me try and explain what I meant by "has not relation with any other extra context giving table".
And I may be wrong, so please correct me if that is the case.

"Labels" table, has a bunch of records, some:
- have a reference to the "authors" table, giving that "label" context in the sense that it's an author name.
- have a reference to the "publishers" table, giving that "label" context in the sense that it's a publisher label/name.
- have a reference to the "licensors" table, giving that "label" context in the sense that it's a License Owner label/name.
but there seems to exist more label data in "labels" table than just "authors", "publishers" or "licensors".

Assuming this is true (might not be), I haven't seen any extra table referenced, in order to give context to whatever other stuff is in the "labels" records.
There are many other references:

* A label can be referenced to another label. For instance, label "Frobush" is a nickname that belongs to label "Jonathan M. Smith". Because of this, you can search for this nickname to find his games, although he was never credited as "Frobush" in any of them.

* A label can be a team of developers. For instance, label "Bizarre Developments" was a team behind the development of a few titles. It wasn't an individual author, publisher or license owner for any of them.

* A label can be a columnist responsible for a regular section in a magazine, in our magazine references. For instance "Lloyd Mangram".

* A label can be a company that's not a publisher, but it's the main company that owns several other publishers. For instance "British Telecom".

And so on.

You can take a look at the ZXDB schema to get all possible references. Or you can simply allow searches on all existing labels.
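As a rough illustration of the difference between the two options, here is a small Python/sqlite3 sketch. The table names "labels", "authors" and "publishers" come from this thread, but the columns, the entry ids and the reduced schema are assumptions made for the example; the real ZXDB schema has more tables and fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical columns; see the ZXDB schema for the real ones.
    CREATE TABLE labels (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE authors (entry_id INTEGER, label_id INTEGER REFERENCES labels(id));
    CREATE TABLE publishers (entry_id INTEGER, label_id INTEGER REFERENCES labels(id));

    INSERT INTO labels VALUES
        (1, 'Jonathan Cauldwell'), (2, 'Ocean Software'),
        (3, 'Bizarre Developments'), (4, 'Lloyd Mangram');
    -- Entry ids 100 and 101 are placeholders, not real ZXDB ids.
    INSERT INTO authors VALUES (100, 1);
    INSERT INTO publishers VALUES (100, 1), (101, 2);
""")


def search_credited_labels(conn, term):
    """Only labels that appear as an author or publisher of some entry."""
    rows = conn.execute("""
        SELECT name FROM labels
        WHERE name LIKE ?
          AND (id IN (SELECT label_id FROM authors)
               OR id IN (SELECT label_id FROM publishers))
        ORDER BY name
    """, (f"%{term}%",)).fetchall()
    return [row[0] for row in rows]


def search_all_labels(conn, term):
    """Every label, whatever role (if any) it has in the relationship tables."""
    rows = conn.execute(
        "SELECT name FROM labels WHERE name LIKE ? ORDER BY name",
        (f"%{term}%",),
    ).fetchall()
    return [row[0] for row in rows]


print(search_credited_labels(conn, "bizarre"))  # team label is filtered out
print(search_all_labels(conn, "bizarre"))       # team label is found
```

Searching all labels avoids having to enumerate every relationship table just to decide whether a label is eligible for autocomplete.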
Metalbrain
Microbot
Posts: 107
Joined: Thu Feb 15, 2018 2:14 pm

Re: Predictive Search

Post by Metalbrain »

Regarding the "aña" vs "ana" search, would it be possible to make an initial search that is case/accent insensitive, and then refine it to prioritize the case/accent sensitive results? This way the intended results would appear first, but if someone makes a mistake (such as the cadáv vs cadàv one), he'd still get the result.

I'd love to have this kind of search, and the fact that Google (and search engines in general) just do insensitive searches, no matter how specifically you write the terms, has always frustrated the hell out of me.
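A minimal Python sketch of that idea, assuming the titles are already available as a list of strings (in practice they would come from the database). The function names and the sample titles other than "Cadàveriön" are made up for the example.

```python
import unicodedata


def fold(text):
    """Strip accents and case, e.g. 'Aña' becomes 'ana'."""
    decomposed = unicodedata.normalize("NFD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.casefold()


def ranked_search(term, titles):
    """Insensitive match first, then move the sensitive matches to the top."""
    folded_term = fold(term)
    hits = [title for title in titles if folded_term in fold(title)]

    exact_term = unicodedata.normalize("NFC", term)

    def is_only_insensitive(title):
        return exact_term not in unicodedata.normalize("NFC", title)

    # False sorts before True, so titles containing the term exactly as typed
    # come first; sorted() is stable, so the order within each group is kept.
    return sorted(hits, key=is_only_insensitive)


titles = ["Ana", "Montaña", "Banana", "Cadàveriön"]
print(ranked_search("aña", titles))
print(ranked_search("cadáv", titles))
```

Nothing is lost compared with the insensitive search: a mistyped accent still finds the title, it just cannot push it above the exact matches.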

Post by RMartins »

Metalbrain wrote: Tue Jul 23, 2019 10:06 am Regarding the "aña" vs "ana" search, would it be possible to make an initial search that is case/accent insensitive, and then refine it to prioritize the case/accent sensitive results? This way the intended results would appear first, but if someone makes a mistake (such as the cadáv vs cadàv one), he'd still get the result.
Well, yes and no.

That means we have to make 2 queries, or enlarge the current query, to replicate every single subquery, which is basically the same.

However, even if we do this (at least for autocomplete), the number of displayable results is limited to some N (currently 10).
So adding other entries, could generate duplicates (if using separate queries).

On the other hand, if we get the positive matches, they will have the real name, which is correctly accented and cased.
So I'm not really sure if there is a benefit there.

The real benefit we would like is to avoid false positives, while still finding all the entries we are currently able to find.
Metalbrain wrote: Tue Jul 23, 2019 10:06 am I'd love to have this kind of search, and the fact that Google (and search engines in general) just do insensitive searches, no matter how specifically you write the terms, has always frustrated the hell out of me.
Yes, in general, having some control over the match is desirable, but Google's results are hugely more complex and come from a real search engine, with tons of configurable properties and rules. Not even comparable to something like an SQL query. :D

Post by RMartins »

Although I haven't had much feedback yet, I know there is some problem, when different case searches are returning different results.
Searching for "Chaos" or "chaos" gives a different result list.

As soon as I have some extra free time, I'll take a look and find the reason for it.
I already have an idea of what it might be.

If anyone has some other issue to report, please do so.
If I have all the known issues available, I can fix them all in one session, if possible.
Last edited by RMartins on Tue Jul 23, 2019 3:40 pm, edited 1 time in total.

Post by Metalbrain »

RMartins wrote: Tue Jul 23, 2019 2:57 pm
Metalbrain wrote: Tue Jul 23, 2019 10:06 am Regarding the "aña" vs "ana" search, would it be possible to make an initial search that is case/accent insensitive, and then refine it to prioritize the case/accent sensitive results? This way the intended results would appear first, but if someone makes a mistake (such as the cadáv vs cadàv one), he'd still get the result.
Well, yes and no.

That means we have to make 2 queries, or enlarge the current query, to replicate every single subquery, which is basically the same.

However, even if we do this (at least for autocomplete), the number of displayable results is limited to some N (currently 10).
So adding other entries, could generate duplicates (if using separate queries).

On the other hand, if we get the positive matches, they will have the real name, which is correctly accented and cased.
So I'm not really sure if there is a benefit there.

The real benefit we would like is to avoid false positives, while still finding all the entries we are currently able to find.
For autocomplete, we'd need 2 separate searches, each capped at the number of results (the sensitive one only if uppercase or special chars are used; otherwise the insensitive search alone would be enough). We'd show the first N results of the sensitive search at the top and, if there are not enough of them, fill up the remaining slots with results from the insensitive search, placed lower and with any duplicates of the sensitive results removed.

For the main search (without capping the number of results), the sensitive search could be made only within the results of the insensitive search, to make it faster (and once again, only if needed). And the results (if we get them) from the inner sensitive search would appear first, and the "only insensitive" results would appear lower.
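The autocomplete part could look roughly like this in Python. It is only a sketch under assumptions: the two search callables stand in for the real queries, the function names are made up, and "special chars" is interpreted here as any non-ASCII character, which may be narrower or wider than intended.

```python
def needs_sensitive_pass(term):
    """Assumption: only uppercase or non-ASCII characters trigger the extra search."""
    return any(ch.isupper() or not ch.isascii() for ch in term)


def merge_autocomplete(sensitive, insensitive, limit=10):
    """Sensitive results first, then insensitive ones, without duplicates."""
    merged = []
    seen = set()
    for title in list(sensitive) + list(insensitive):
        if len(merged) >= limit:
            break
        if title not in seen:
            seen.add(title)
            merged.append(title)
    return merged


def autocomplete(term, run_sensitive, run_insensitive, limit=10):
    """run_sensitive / run_insensitive are placeholders for the two capped queries."""
    sensitive = run_sensitive(term, limit) if needs_sensitive_pass(term) else []
    insensitive = run_insensitive(term, limit)
    return merge_autocomplete(sensitive, insensitive, limit)


# Usage with canned results instead of real queries:
print(merge_autocomplete(["Montaña"], ["Ana", "Montaña", "Banana"], limit=3))
```

Both queries are capped at the same limit, so the merge never needs more than 2 x N rows, and a plain lowercase term costs only the one query it costs today. A real version would probably deduplicate on entry id rather than on the title string.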