Predictive Search

Broken link? Feature request? Anything related to the Spectrum Computing website here.
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: Predictive Search

Post by RMartins »

RMartins wrote: Tue Jul 16, 2019 10:56 am ...
I intend to change the behaviour, so that it will only select an item, if we physically select an item, instead of defaulting to the first one on enter.
...
This behaviour was changed, it no longer auto focus to the list.

NOTE: Be sure to force a refresh (CTRL+5 on windows, CMD+Shift+R on MAC), to get the updated Javascript files.
hikoki
Manic Miner
Posts: 576
Joined: Thu Nov 16, 2017 10:54 am

Re: Predictive Search

Post by hikoki »

It doesn't work on Safari version 9.0.
hikoki
Manic Miner
Posts: 576
Joined: Thu Nov 16, 2017 10:54 am

Re: Predictive Search

Post by hikoki »

hikoki wrote: Wed Jul 17, 2019 3:52 am It doesn't work on Safari version 9.0.
It works only that just on the right top search box. Nice addition
User avatar
ZXDunny
Manic Miner
Posts: 498
Joined: Tue Nov 14, 2017 3:45 pm

Re: Predictive Search

Post by ZXDunny »

Doesn't work at all here, Win10, Chrome latest. Deleted cookies, still nothing.
User avatar
PeterJ
Site Admin
Posts: 6854
Joined: Thu Nov 09, 2017 7:19 pm
Location: Surrey, UK

Re: Predictive Search

Post by PeterJ »

hikoki wrote: Wed Jul 17, 2019 6:22 am
hikoki wrote: Wed Jul 17, 2019 3:52 am It doesn't work on Safari version 9.0.
It works only that just on the right top search box. Nice addition
Correct. As per the first post, it's just the quick search. Does that work on Safari?
User avatar
PeterJ
Site Admin
Posts: 6854
Joined: Thu Nov 09, 2017 7:19 pm
Location: Surrey, UK

Re: Predictive Search

Post by PeterJ »

ZXDunny wrote: Wed Jul 17, 2019 10:18 am Doesn't work at all here, Win10, Chrome latest. Deleted cookies, still nothing.
I had to CTRL + F5 then reboot on W10.
User avatar
ZXDunny
Manic Miner
Posts: 498
Joined: Tue Nov 14, 2017 3:45 pm

Re: Predictive Search

Post by ZXDunny »

PeterJ wrote: Wed Jul 17, 2019 10:39 am
ZXDunny wrote: Wed Jul 17, 2019 10:18 am Doesn't work at all here, Win10, Chrome latest. Deleted cookies, still nothing.
I had to CTRL + F5 then reboot on W10.
No effect here, I've cleared cookies, refreshed hard with CTRL+F5 and rebooted Win10.

Still no predictive search.
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: Predictive Search

Post by RMartins »

NOTE: The only place where it's implemented right now, is on the quick search box. Top right corner.

I a bit confused, about the rest.

Doesn't run on Safari 9.0 ?
I'll check that.

Doesn't run on Chrome on Win10 ?
I checked Chrome, and it works, but you have to clear your cache, because the javascript file is being cached.
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: Predictive Search

Post by RMartins »

ZXDunny wrote: Wed Jul 17, 2019 11:22 am
PeterJ wrote: Wed Jul 17, 2019 10:39 am I had to CTRL + F5 then reboot on W10.
No effect here, I've cleared cookies, refreshed hard with CTRL+F5 and rebooted Win10.

Still no predictive search.
Did you notice that you need to use a different URL ?
https://spectrumcomputing.co.uk/index_new.php
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: Predictive Search

Post by RMartins »

RMartins wrote: Wed Jul 17, 2019 11:40 am Doesn't run on Safari 9.0 ?
I'll check that.
Just confirmed that it works, On Safari 12.1.1, if using the correct URL.
https://spectrumcomputing.co.uk/index_new.php

Anyone else, having issues?
User avatar
ZXDunny
Manic Miner
Posts: 498
Joined: Tue Nov 14, 2017 3:45 pm

Re: Predictive Search

Post by ZXDunny »

RMartins wrote: Wed Jul 17, 2019 11:42 am
ZXDunny wrote: Wed Jul 17, 2019 11:22 am No effect here, I've cleared cookies, refreshed hard with CTRL+F5 and rebooted Win10.

Still no predictive search.
Did you notice that you need to use a different URL ?
https://spectrumcomputing.co.uk/index_new.php
Aha! That was the problem - I was just clicking on the banner at the top of the forum :)

Feel a bit of a pratt now!
User avatar
PeterJ
Site Admin
Posts: 6854
Joined: Thu Nov 09, 2017 7:19 pm
Location: Surrey, UK

Re: Predictive Search

Post by PeterJ »

No worries [mention]ZXDunny[/mention]

Just glad you are now able to test.
User avatar
ZXDunny
Manic Miner
Posts: 498
Joined: Tue Nov 14, 2017 3:45 pm

Re: Predictive Search

Post by ZXDunny »

Yeah, it works. It's a bit weird typing in "Cyb" and getting results that have "cyb" somewhere inside them, rather than starting with it. We may need to add in some modifiers for the user to specify where the results need to match.
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: Predictive Search

Post by RMartins »

ZXDunny wrote: Wed Jul 17, 2019 2:30 pm Yeah, it works. It's a bit weird typing in "Cyb" and getting results that have "cyb" somewhere inside them, rather than starting with it. We may need to add in some modifiers for the user to specify where the results need to match.
That's the current behaviour, of the search_by tables.

"Search" is a big subject. I know because I was responsible (tech lead) for web platform of "Yellow Pages" in my country, for about 3 years.
Doing search properly, requires a very different approach than using SQL and is something that takes constant effort to maintain.

Here the approach being followed, is to have the best we can just using SQL, so some limitations apply, but we will try to improve whenever we can.

Some things still need to be debated, taking into account, what features we would like to have and what can be done with an SQL only solution.
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: Predictive Search

Post by RMartins »

To the ones, that keep an eye on the Tab Icons, there is a proposal for a new one on this new page.

It was based on the site Logo, but its design was adapted to fit the size restrictions of a favicon.
Previous favicon was just a rescale of the site logo, and becomes unreadable and blurry at smaller versions.

NOTE: It might not show up on every browser, since there are a lot of settings for the favicon, and several formats, so some might still be missing.
User avatar
Metalbrain
Microbot
Posts: 107
Joined: Thu Feb 15, 2018 2:14 pm

Re: Predictive Search

Post by Metalbrain »

Tested in several browsers, it's working on all of them.

Regarding the Sir Fred problem, it can be found if we skip the space (like looking for sirfr). Looking just for "Sir ", we only get these results:
Sir Clive
Sir Clive and Mr. ZX
Sir Computers

and looking for "sir " in lowercase, we get nothing
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: Predictive Search

Post by RMartins »

"Sir Fred", or any other Title that is searched using spaces, will not easily find a match.

And the reason for this, is that search tables, effectively remove the spaces in the titles.
... replace( lower(title),' ','' ),

I'm not sure what was the reason/motivation behind removing the spaces, but I'm sure there must have been one.

The problem with removing spaces is that then it becomes possible to find text formed by concatenating 2 previous words.
For example, we can find "Green Beret" by searching for "nbe", which is not what users expect.

Another example, is that searching for "green beret" doesn't return any result.
But searching for "greenberet" will get you a result for "Green Beret".


From a usability stand point, humans typically search for words or parts of words, they never expect to find substrings of concatenations of words.


Another implicit usability rule, is that if users are very specific about something, is because they are passing an intent.
Examples are Upper case letters or special chars like "ñ".


So, if user uses Upper case in the search string, then it means they are looking for that specific case, for example to distinguish the start of a Word or a Name. But if the user uses all lower case, then it means that any match is acceptable including in the middle of a word or name.

If user uses special chars, it means they explicitly want to find that specific accentuated char. Searching for "aña" means that the user wants matches like "Cañas", but not "Anagrams" or "Zemana", but we currently get those.
However, if the user searches for "ana", We expect all of these to be valid hits: "Cañas", "Anagrams" or "Zemana".

The good news, is that all of the above, can be done using just SQL, a pré-processed search table and collation rules.

These are all things, that need some debate, to find the best compromise for an SQL only solution.
Last edited by RMartins on Thu Jul 18, 2019 4:15 pm, edited 1 time in total.
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: Predictive Search

Post by RMartins »

On another note, take into account, that the autocomplete feature, can not show a huge amount of results, so the result is being limited to the first 10 entries, listed in alphabetical ascending order.

If the user does not find the result he expects, he needs to drill down into a more specific content, by continuing to write a larger search string.

Also remember, that quick search, includes: Titles, Aliases, Authors and Publishers.

So, 10 of each are gathered and sorted individually, and then sorted as a whole (this guarantees the results are deterministic for the same query) and only the first 10 are shown (of potentially 40 results).

If you are unlucky, it can be 10 of the same type: 10 titles or 10 Authors, etc... so you might need to drill down until you get to the sweet spot, where what you are searching for, actually shows up in that 10 slot window.
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Predictive Search

Post by Einar Saukas »

RMartins wrote: Thu Jul 18, 2019 3:10 pmI'm not sure what was the reason/motivation behind removing the spaces, but I'm sure there must have been one.
The motivation is that users searching for either "Knight Lore" or "Knightlore" will work in both cases. Searching for either "Chop Lifter" or "Choplifter" will also work in both cases. Incidentally the exact spellings are "Knight Lore" and "Choplifter", but nobody needs to know that.

Ignoring special characters has the same motivation. Otherwise nobody would ever find "Cadàveriön". And it has to work both ways, otherwise searching for proper spellings like "iniciación" would not find anything, because most foreign titles imported from Martijn's WoS were missing accentuation (we are gradually fixing it but it's going to take some time).

Thus basically "search_by" tables were created to help searches work as good as possible. They were never intended to help autocomplete, and I doubt there's a single solution that would work for both. You may need to create your own auxiliary tables for that.
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: Predictive Search

Post by RMartins »

Einar Saukas wrote: Thu Jul 18, 2019 3:37 pm
RMartins wrote: Thu Jul 18, 2019 3:10 pmI'm not sure what was the reason/motivation behind removing the spaces, but I'm sure there must have been one.
The motivation is that users searching for either "Knight Lore" or "Knightlore" will work in both cases. Searching for either "Chop Lifter" or "Choplifter" will also work in both cases. Incidentally the exact spellings are "Knight Lore" and "Choplifter", but nobody needs to know that.
And I have nothing against that.
But that is why search is a hard topic.

For these kinds of wanted matches, a specific extra entry should be created for it, instead of just clipping the space on the existing one.
What I mean is the search table should have both spellings (or how many that title is usually searched for).

This is why I said, that a well fined tune search engine, takes time and effort to maintain.
Einar Saukas wrote: Thu Jul 18, 2019 3:37 pm Ignoring special characters has the same motivation. Otherwise nobody would ever find "Cadàveriön". And it has to work both ways, otherwise searching for proper spellings like "iniciación" would not find anything, because most foreign titles imported from Martijn's WoS were missing accentuation (we are gradually fixing it but it's going to take some time).
I also do not have anything against that.
However, it should not be "the only way" to search (case insensitive and accent insensitive).

For example, maybe add some entries in the "aliases" tables could be kept as search related aliases for titles.

NOTE: we can use a different COLLATION, on each query condition for example.

We can determine the collation to use, depending on the input search string for example (if user used upper case, or used special chars).
Look at my explanation on user intent. I can make an example later, if needed.
Einar Saukas wrote: Thu Jul 18, 2019 3:37 pm Thus basically "search_by" tables were created to help searches work as good as possible. They were never intended to help autocomplete, and I doubt there's a single solution that would work for both. You may need to create your own auxiliary tables for that.
What I'm proposing, is that we can in fact use the same system, and we probably should.
We just need to refine it a bit, to fit the intended user searches.

Any user would expect that if he can find something using quick search, to be able to find it on advanced search using the same text on the proper field.

But anyway, these are intended as ideas for debate.
Just trying to help out improve the search.
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Predictive Search

Post by Einar Saukas »

Sure, I understand your ideas. They are quite similar to my considerations when I created these tables about a year ago. But I still don't see how they could be improved.

I don't really need to store text in these tables as lowercase only, since all comparisons use "ci" collation. Everything is currently converted to lowercase just to reduce unnecessary duplicates. Also collation is currently defined in these tables just for convenience, so site owners don't need to remember to specify collation in their queries. Changing these details wouldn't improve searches or help autocomplete.

Moreover, I still think it's a good idea to remove spaces and punctuation from these tables, otherwise users searching for either "Offroad Racing" or "Off Road Racing" wouldn't find anything.

Admittedly I'm not an expert in this area so I could be wrong. If you have any changes to suggest, please go ahead!
User avatar
Einar Saukas
Bugaboo
Posts: 3070
Joined: Wed Nov 15, 2017 2:48 pm

Re: Predictive Search

Post by Einar Saukas »

RMartins wrote: Thu Jul 18, 2019 3:18 pmAlso remember, that quick search, includes: Titles, Aliases, Authors and Publishers.
You probably don't need to distinguish between authors and publishers.

In field "author" you can simply autocomplete against:

Code: Select all

SELECT name FROM labels WHERE id IN
(SELECT label_id FROM search_by_authors)
Analogously for "publisher":

Code: Select all

SELECT name FROM labels WHERE id IN
(SELECT label_id FROM search_by_publishers)
In field "title" you can autocomplete against:

Code: Select all

SELECT title FROM entries 
UNION ALL SELECT title FROM aliases
And in a generic field "text" in basic search, you can autocomplete against:

Code: Select all

SELECT title AS text FROM entries 
UNION ALL SELECT title AS text FROM aliases 
UNION ALL SELECT name AS text FROM labels
For simplicity I have omitted COLLATION, GROUP BY, and ORDER BY. They are left as exercise for the reader :)

This way, it's also easy to impose a limit of 40 results overall, instead of partial limits of 10 each.
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: Predictive Search

Post by RMartins »

Einar Saukas wrote: Thu Jul 18, 2019 8:48 pm Sure, I understand your ideas. They are quite similar to my considerations when I created these tables about a year ago. But I still don't see how they could be improved.

I don't really need to store text in these tables as lowercase only, since all comparisons use "ci" collation. Everything is currently converted to lowercase just to reduce unnecessary duplicates. Also collation is currently defined in these tables just for convenience, so site owners don't need to remember to specify collation in their queries. Changing these details wouldn't improve searches or help autocomplete.
I agree, except with the last sentence :)

NOTE:
If table data does not contain Upper case characters, we can never match them as Upper case characters.
However, if table data contains Upper case characters, then we can match or ignore them, depending on the collation being used.

So, there is an improvement to be gained, by not forcing the content to be all lower case, which is to allow matching case if required.

Another improvement, is that instead of using MYSQL collation utf8_unicode_ci , the MB4 version should be used (utf8mb4_unicode_ci), according to note on the following documentation page https://dev.mysql.com/doc/refman/8.0/en ... icode.html (this is mentioned in several places in the documentaiton, since the 3 byte version, is a MySQL fluke and non standard Umicode.

Einar Saukas wrote: Thu Jul 18, 2019 8:48 pm Moreover, I still think it's a good idea to remove spaces and punctuation from these tables, otherwise users searching for either "Offroad Racing" or "Off Road Racing" wouldn't find anything.
Punctuation in data is not required to be removed, for similar reasons as case (being possible to match it, if needed).

NOTE:
If we remove accents, we can no longer match them explicitly, since they are not there.
But if we have accents available in table data, we can ignore it (accent insensitive), just by using a different collation, or use it for mathching (accent sensitive).

NOTE:
Current collation in use utf8_unicode_ci, is equivalent to utf8_unicode_ai_ci (3 byte Unicode accent insensitive and case insensitive).
This is why searching for '%aña%' returns results that contain 'ana'.

Here is some documentation reference on this:
https://dev.mysql.com/doc/refman/8.0/en ... names.html

Regarding special cases, ( like "Off Road" versus "Offroad" ) the standard solution is to have a dictionary for those (where synonyms can be defined) or special match/translationrules (when using a search engine), but probably the best we can do here is to provide aliases (AKA synonym) for those known cases, including the entire title for simplification, for example in the "aliases" table or similar.
So there should be 2 entries like "Off Road Racing" versus "Offroad Racing".

This "aliases" (dictionary) can then be used, when generating the search indexes (AKA search_by_* tables in our case).
Einar Saukas wrote: Thu Jul 18, 2019 8:48 pm Admittedly I'm not an expert in this area so I could be wrong. If you have any changes to suggest, please go ahead!
Neither am I, because search is a huge subject, with many considerations and p´ros and cons, like any engineering solution, sometimes compromises need to be done.

But I can give you practical examples of this stuff I'm talking about.

For example, in every place in the world, there are streets or areas that are known by a different name, then their real name, usually for cultural reasons. And hence this is the name that most people use to search stuff in that area.

For example, in Lisbon, there was a big international event in 1998, The World International Exposition, also known as Expo 98.

The surrounding area of that event, got culturally known as "EXPO", which in fact, maps to 3 different parishes.
So when people search for stuff in EXPO, we know that we should limit the search to that geographic location.

The same happens, to a river side road that is culturally called "Marginal", since it's on the margin/bank of the river, but it has 3 or 4 names, since it crosses several parishes and 2 counties.

The same probably happens with some games, that might have some culturally distinct name.

No good example comes to mind right now, but a tentative example is "Chaos - The Battle of Wizards", that is known by many as just "Chaos". Not a very good example, because the cultural name is a substring of the real name.

Another example is "Live and Let Die" which is also known as "Aquablast". Although these different names are from different releases, but the idea behind is the same.
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: Predictive Search

Post by RMartins »

Einar Saukas wrote: Thu Jul 18, 2019 9:07 pm
RMartins wrote: Thu Jul 18, 2019 3:18 pmAlso remember, that quick search, includes: Titles, Aliases, Authors and Publishers.
You probably don't need to distinguish between authors and publishers.

...
Currently the Quick search is using this query:

Code: Select all

SELECT N.name
FROM(
(
	SELECT	E.title AS name
	FROM	entries E	INNER JOIN search_by_titles SBT ON (SBT.entry_id = E.id)
	WHERE	SBT.entry_title LIKE ?
	ORDER	BY E.title
	LIMIT 10
)
UNION
(
	SELECT	A.title AS name
	FROM	aliases A
	WHERE	A.title LIKE ?
	ORDER	BY A.title
	LIMIT 10
)
UNION
(
	SELECT	L.name AS name
	FROM	labels L	INNER JOIN search_by_authors SBA ON (SBA.label_id = L.id)
	WHERE	L.name LIKE ?
	ORDER	BY L.name
	LIMIT 10
)
UNION
(
	SELECT	L.name AS name
	FROM	labels L	INNER JOIN search_by_publishers SBP ON (SBP.label_id = L.id)
	WHERE	L.name LIKE ?
	ORDER	BY L.name
	LIMIT 10
)) N
ORDER BY N.name
LIMIT 10
Einar Saukas wrote: Thu Jul 18, 2019 9:07 pm This way, it's also easy to impose a limit of 40 results overall, instead of partial limits of 10 each.
Not sure I understood what you meant.

It does not matter, if we have 40 results of each, if we are only going to present 10 in total.
At most they could be 10 of the same type, so we will never show the other 30 of each type.
If you look at the above query, it should be self explanatory.

I would suggest we take the more technical detail somewhere else, to avoid loosing most of the current readers :lol:

However, I can give a simple example, on how to query for each intent.
Here is a simple explanation of how COLLATIONs work, using weights.
https://dev.mysql.com/doc/refman/8.0/en ... ation.html

Lets focus just on titles for example.

Code: Select all

SELECT	E.title AS name
FROM	entries E	INNER JOIN search_by_titles SBT ON (SBT.entry_id = E.id)
WHERE	SBT.entry_title LIKE ?
ORDER	BY E.title
LIMIT 10
Current COLLATION is utf8_unicode_ci, meaning the search is accent insensitive and case insensitive.

Now suppose the user, introduced a search string like 'aña':
- this means the user is interested in matching the special char 'ñ'.
- the user did not use any Upper case character ( so is not interested in a specific case match ).

From the observation of the search input string, we can infer that we should do an accent sensitive and case insensitive search.

Code: Select all

SELECT	E.title AS name
FROM	entries E	INNER JOIN search_by_titles SBT ON (SBT.entry_id = E.id)
WHERE	SBT.entry_title LIKE ? COLLATE utf8mb4_unicode_as_ci
ORDER	BY E.title
LIMIT 10
Notice the extra "COLLATE utf8mb4_unicode_as_ci"!

NOTE:
Remember that to reap the benefits of the several distinct collations, the table data needs to contain mixed case and special chars, so that the comparisons will work as expected.

So, for all the possible combinations, we would need the following 4 COLLATIONS:
- utf8_unicode_ai_ci <=> utf8_unicode_ci
- utf8_unicode_as_cs <=> utf8_unicode_cs
- utf8_unicode_as_ci
- utf8_unicode_ai_cs

And this is were most MySQL installs will give us trouble, since these COLLATIONS are not ALL available by default!

But this can be fixed, by adding them to the server.
https://dev.mysql.com/doc/refman/8.0/en ... e-uca.html
It might not be as easy as we would like (might require a recompile), for the mixed ones at least.

But at first try, we could live with just the first two, that are usually available or more easily installable.

NOTE: to simplify the example, I'm not using the utf8mb4, which we should.

To check for locally available/installed collations, run this command:

Code: Select all

SHOW COLLATION WHERE Charset = 'utf8mb4';
UPDATE:
Using specific COLLATIONS on a query, might trip MySQL into not using indexes.
Or might require extra indexes on the same field, but for a different collation.
Something to keep an eye on.
User avatar
RMartins
Manic Miner
Posts: 776
Joined: Thu Nov 16, 2017 3:26 pm

Re: Predictive Search

Post by RMartins »

RMartins wrote: Fri Jul 19, 2019 3:15 am ...

UPDATE:
Using specific COLLATIONS on a query, might trip MySQL into not using indexes.
...
But using SQL LIKE usually prevents the use of any index, so it might not be a big concern, since the database is small.
Post Reply