
Failures of Web Search Engines    

Google search began as a fully capable literal search engine with boolean operators. Putting a phrase in "quotes" forced it to honor word order within that phrase, and the user was trusted to spell search terms as intended.

Google was so popular that competitors were forced to provide similar services. However, over time, a number of changes have diluted the usefulness of search engines. They now deliver many results that not only fail to contain content matching the search terms, but indeed never contained matching content.

The changes can be grouped into stages, presented here in chronological order:

Stage 1 Recommendations: Spelling

Search engine algorithms try to guess what the user really wanted rather than what they actually requested. This includes "correcting" most apparent spelling errors (whether actual mistakes or intended spellings), substituting similar proper nouns (names etc.), and changing tense.

For example, stage 1 recommendations cause a search for /combinational probability/ to give results that begin with the line

Did you mean: combinatorial probability

and many of the results contain the word "combinations" instead of the desired "combinational".
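
The substitution described above can be imitated with a rough sketch. This is only an illustration of similarity-based matching using Python's standard difflib module; it is not the algorithm any search engine actually uses, and the vocabulary list, the function name suggest and the cutoff value are all made up for the example.

```python
import difflib

# Hypothetical list of "more common" words that the engine prefers.
COMMON_WORDS = ["combinatorial", "combinations", "probability"]

def suggest(term, vocabulary=COMMON_WORDS, cutoff=0.8):
    """Return vocabulary words whose similarity to `term` is at least `cutoff`."""
    return difflib.get_close_matches(term, vocabulary, n=3, cutoff=cutoff)

# The rare term is absent from the vocabulary, so only look-alikes come back.
print(suggest("combinational"))
```

A literal search engine would skip this step and look up the term exactly as typed.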

This feature began in late 2008 [1] as an optional and easy-to-ignore addition: results began with the familiar "Did you mean: more-common-spelling" followed by exactly two results for that spelling, then "Results for: your spelling" and the results you actually asked for.

Some time later, probably around the time of the Hummingbird algorithm changes [2], the spelling "correction" results became mandatory. If you search for something rare, it will give you results for some other spelling instead:

Showing results for "solstheim"
Search instead for "solsthelion"

It is still possible, however, to follow a link to the search as typed. One can even engineer one's way around the extra step by including a magic query parameter, &nfpr=1 or &tbs=li:1, in a search URL. Either of these will make it revert to the older "did you mean?" nag prompt.
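
As a minimal sketch of how such a URL could be put together: the two parameters nfpr=1 and tbs=li:1 are the ones named above, while the endpoint https://www.google.com/search, the q parameter for the query text, and the function name literal_search_url are assumptions made for this example. Whether the parameters continue to work is up to the search engine.

```python
from urllib.parse import urlencode

# Assumption: the usual Google search endpoint, with the query text in "q".
SEARCH_ENDPOINT = "https://www.google.com/search"

# The two "magic" parameters mentioned in the text, keyed by parameter name.
LITERAL_PARAMS = {"nfpr": "1", "tbs": "li:1"}

def literal_search_url(query, param="nfpr"):
    """Build a search URL carrying one of the two parameters (hypothetical helper)."""
    if param not in LITERAL_PARAMS:
        raise ValueError("param must be 'nfpr' or 'tbs'")
    fields = {"q": query, param: LITERAL_PARAMS[param]}
    # safe=":" keeps the colon in "li:1" unescaped, matching the form shown above.
    return SEARCH_ENDPOINT + "?" + urlencode(fields, safe=":")

print(literal_search_url("solsthelion"))
print(literal_search_url("combinational probability", param="tbs"))
```

The resulting URL can be bookmarked or set up as a custom search engine in the browser, so the extra parameter is added to every query.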

Stage 2 Recommendations: History

Search engine algorithms try to predict what the user wants (based on past history) rather than treating each new query as an independent request. This is fairly easy to avoid, by doing searches in a "private" or "incognito" window.

Stage 3 Recommendations: Promotion

Search engine algorithms promote certain results based on content provider or affiliation. These "promoted" results often push the "real" results even further down the list of results.

While advertising is well known to make up the majority of promotion in search engine results, other types of promotion or "demotion" are documented. For example, a Google algorithm update dubbed "Medic" by independent analysts was introduced in 2018. It aimed to demote sites deemed "non user-friendly" or "not providing good user experience", and to promote those from sources that rate well in "E.E.A.T.": (first-hand) Experience; Expertise; Authoritativeness; Trust(worthiness). These policies are in keeping with existing guidelines that Google maintains for its Search Quality Raters (several thousand people employed by external vendors). The 2018 changes seem to mostly affect sites serving topics that Google calls "YMYL" (your money, your life), which notably includes searches on medical information (nutrition, e.g. the keto diet; medical products, e.g. a glucometer) but also other vital or high-value topics such as finance or investment advice.

Auto-scrolling and Quasi-Scrollers

Many sites host content that is contributed by users. Many are intended to be easy to access and have a minimum of JavaScript. However, due to their quasi-live reverse scrolling design, search result links often no longer lead to a page containing the searched word or phrase (because it has been scrolled off the bottom).

Many good examples of this can be found on Reddit. Find a topic (subreddit) of interest, then find an individual item (question or "post") that is recent and looks fairly popular, then take note of some phrase that appears in the first or second reply-comment. Then come back a day later and see if that phrase still appears near the top; it often won't. Reddit orders comments by a heuristic indicating how likely they are to be of interest to readers. Eventually your phrase will have moved so far down that you need to use the "show more comments" link at the bottom. At this point, the URL to this particular post will probably be featured in results on a search engine such as Google if you search for your phrase ("in quotes"), but if you then follow the link to Reddit you can no longer find your phrase with the browser's "find in page" command.
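
The effect can be reproduced with a toy model. In this sketch a plain sort by score stands in for Reddit's actual ordering heuristic (which is not described here), the limit argument stands in for the point where "show more comments" takes over, and the comments and scores are invented for the example.

```python
def visible_comments(comments, limit):
    """Return the texts of the `limit` highest-scored comments, best first."""
    ranked = sorted(comments, key=lambda c: c[1], reverse=True)
    return [text for text, _score in ranked[:limit]]

def phrase_on_page(comments, phrase, limit):
    """True if `phrase` occurs in a comment shown without 'show more comments'."""
    return any(phrase in text for text in visible_comments(comments, limit))

# Invented data: (comment text, score). Day 1 is what a crawler might index.
day1 = [("try the blue widget first", 12), ("same problem here", 3)]
# A day later, newer and more popular replies have arrived.
day2 = day1 + [("new reply A", 40), ("new reply B", 25), ("new reply C", 18)]

print(phrase_on_page(day1, "blue widget", limit=2))  # True: phrase is visible
print(phrase_on_page(day2, "blue widget", limit=2))  # False: pushed below the cut
```

The search engine's index still reflects the day 1 page, so the link is offered for the phrase even though the page as served on day 2 no longer shows it.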


footnotes

1 : Danny Sullivan (for Search Engine Land), Google Testing Enhanced Listings, "Pagelinks" & Auto-Spelling Correction.

2 : Wikipedia, Google Search.


Robert Munafo's home pages on AWS    © 1996-2024 Robert P. Munafo.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Details here.

This page was written in the "embarrassingly readable" markup language RHTF, and was last updated on 2023 Dec 27.