Almost a year ago, Pro Web delivered a presentation at Rocket Science SEO, the event organized by Enrico Altavilla. There we explored how two of Google’s most important updates of recent years work: Hummingbird and RankBrain. The presentation was purely technical and far from easy, and for many months we seriously doubted that it could be turned into a post. Then we realized that by breaking the content into smaller parts we could write two or three interesting posts that, with a little effort, would not be too heavy.
So today we cover the part of the presentation about RankBrain: what it is, how it works, and the patent that is probably behind it.
What is RankBrain and how does it work?
RankBrain, a combination of artificial intelligence and words that become mathematical entities, is still one of Google’s most “mysterious” algorithms.
Everything began on 26 October 2015, when Bloomberg.com published an article in which journalist Jack Clark quoted Greg Corrado, Principal Scientist at Google. Corrado said that Google had already been using artificial intelligence and machine learning for months to refine the results in the SERPs, and that the algorithm doing this was called RankBrain. According to Clark, Corrado explained that RankBrain uses artificial intelligence to transform language into mathematical entities that a computer can understand; when RankBrain comes across words or phrases it is not familiar with, it tries to replace them with words or phrases of similar meaning, thus modifying the results in the SERPs. Corrado also mentioned the team of engineers involved in developing RankBrain, citing Yonghui Wu and Thomas Strohmann. These names probably mean little to you, but keep the latter in mind because we will soon be talking more about him.
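To make the idea of “words that become mathematical entities” more concrete, here is a minimal sketch in Python. It is not RankBrain’s implementation, which has never been published: the three-dimensional vectors are invented for illustration, and cosine similarity is simply a common way of comparing word vectors that we assume here. The names TOY_VECTORS, cosine_similarity and most_similar are our own.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# Real systems learn vectors with many more dimensions from large text corpora.
TOY_VECTORS = {
    "puzzle":       [0.9, 0.8, 0.1],
    "crossword":    [0.8, 0.9, 0.2],
    "subscription": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors):
    """Return the other word whose vector is closest to the vector of `word`."""
    target = vectors[word]
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


print(most_similar("puzzle", TOY_VECTORS))  # prints: crossword
```

With these made-up numbers, “crossword” sits much closer to “puzzle” than “subscription” does, which is the kind of signal a system could use when looking for a replacement of similar meaning.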
Around the same time, TheSEMPost.com also covered the topic through its founder, Jennifer Slegg, who quoted an unnamed Google spokesperson and added some details to those provided by Bloomberg:
- RankBrain works with all languages;
- It is especially, but not only, useful for long-tail keyword searches and for searches that have never been made before;
- It works with all types of queries;
- It does not learn in real time, but it is periodically updated.
But perhaps the most interesting article was the one by Bill Slawski, published on Gofishdigital.com, which offered some hypotheses about how RankBrain works in more detail. According to the article, RankBrain is believed to replace terms in a query based on two fundamental factors: the concept and the context. Slawski also tried to identify the patent that is probably behind it all. The patent is titled “Using concepts as contexts for query term substitutions” and is the work of a team of engineers among whom one name stands out: Thomas Strohmann. That’s right, the same Strohmann whom Corrado had mentioned as one of the founding fathers of RankBrain.
This coincidence led us to believe that Slawski’s intuition was probably right and to explore the patent in detail. It explains how the query replacement process takes place, i.e., the process through which RankBrain would replace the search terms entered by users with other terms to obtain better results in the SERPs. Bear in mind that Slawski’s idea, however convincing, remains a hypothesis, one that we will now try to support.
RankBrain: the possible patent
The picture below shows all of the steps in the substitution process, taking as an example the query “New York Times Puzzle”: a user searching Google for the New York Times’ puzzle games. The letters and numbers in the image support the explanation; we will concentrate on the letters, which mark the salient steps of the process, and quote only the numbers that are most useful for understanding it.
Here is a picture of the filed patent that explains the operation of RankBrain via the substitution method: the query concerns the crossword puzzles of the New York Times.
A: The user query (221 in the figure) is routed through a network to the Search System.
B: The query is submitted to the Query Reviser Engine that determines whether and how it should be revised.
C: If the Query Reviser Engine decides that the query must be revised, the terms composing it are forwarded separately to the Substitution Engine. In this case, therefore, the terms “New”, “York”, “Times” and “Puzzle” are forwarded separately (222 in the figure).
D: The Substitution Engine parses the individual terms, combines them together to identify one or more concepts and queries the Collection Of Concepts to check if there is a match between the concepts found and the concepts already known.
E: The Collection Of Concepts returns the concept found to the Substitution Engine. In this case the concept returned is “New York Times”.
F: The Substitution Engine forwards the concept “New York Times” to the Query Log.
G: The Query Log tries to combine the concept with the terms present in the original query and to replace these terms with others, obtaining a set of substitution rules which it ranks. The specific substitution rules generated by the Query Log are shown below:
- Rule No. 231: “New York Times Puzzle” can be replaced with “New York Times Crossword”. The Query Log gives this substitution rule a positive score (represented in the figure by an up arrow).
- Rule No. 232: “Puzzle New York Times” can be replaced with “Crossword New York Times”. The Query Log gives this substitution rule a negative score (represented in the figure by a down arrow).
- Rule No. 233: “New York Times Puzzle” can be replaced with “New York Times Subscription”. The Query Log gives this substitution rule a negative score.
- Rule No. 234: “Jigsaw Puzzle” can be replaced with “Jigsaw Crossword”. The Query Log does not assign any score to this rule.
- “Puzzle” can be replaced with “Crossword”. The Query Log does not assign any score to this rule.
H: The Query Log sends the substitution rules which it has ranked to the Substitution Engine.
I: The Substitution Engine determines that the only appropriate rule is the first one (the only one that has received a positive score, rule no. 231) and sends it to the Collection Of Substitution Rules. From now on this substitution rule will be used to refine the results in the SERPs.
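Steps C to I can be sketched in a few lines of Python. This is only our illustration of the flow described above, not code from the patent or from Google: the data structures, the function names and the numeric scores (+1 for an up arrow, -1 for a down arrow, None for no score) are all assumptions, and the Query Log is reduced to a fixed table of already scored rules.

```python
# Hypothetical stand-in for the Collection Of Concepts.
KNOWN_CONCEPTS = {("new", "york", "times")}

# Hypothetical stand-in for the Query Log output: (original, replacement, score).
# +1 = up arrow, -1 = down arrow, None = no score assigned.
SCORED_RULES = [
    ("new york times puzzle", "new york times crossword", +1),     # rule 231
    ("puzzle new york times", "crossword new york times", -1),     # rule 232
    ("new york times puzzle", "new york times subscription", -1),  # rule 233
    ("jigsaw puzzle", "jigsaw crossword", None),                    # rule 234
    ("puzzle", "crossword", None),
]


def find_concepts(terms, known_concepts):
    """Steps D-E: combine adjacent terms and keep the combinations that are known concepts."""
    found = []
    for start in range(len(terms)):
        for end in range(start + 1, len(terms) + 1):
            candidate = tuple(terms[start:end])
            if candidate in known_concepts:
                found.append(candidate)
    return found


def rules_for(concept, scored_rules):
    """Steps F-H: return the scored rules whose original phrase contains the concept."""
    phrase = " ".join(concept)
    return [rule for rule in scored_rules if phrase in rule[0]]


def select_rules(rules):
    """Step I: keep only the rules that received a positive score."""
    return [(orig, repl) for orig, repl, score in rules
            if score is not None and score > 0]


def revise(query):
    """Return the substituted query, or the original query if no rule applies."""
    terms = query.lower().split()  # Step C: the terms are handled separately
    accepted = []
    for concept in find_concepts(terms, KNOWN_CONCEPTS):
        accepted.extend(select_rules(rules_for(concept, SCORED_RULES)))
    normalized = " ".join(terms)
    for orig, repl in accepted:
        if orig == normalized:
            return repl
    return query


print(revise("New York Times Puzzle"))  # prints: new york times crossword
```

Only rule 231 survives the selection, so “New York Times Puzzle” becomes “new york times crossword”, while a query such as “Jigsaw Puzzle”, which contains no known concept in this toy setup, is returned unchanged.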
The consequences in SERPs
What are the consequences of the process described in the patent? Google has learned that users who type the query “New York Times Puzzle” are probably also searching for results related to the query “New York Times Crossword”, and it creates SERPs that are able to respond to both queries. The SERPs of the two queries are in fact very similar.
What is RankBrain?
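If you want to put a number on how similar two SERPs are, one simple option is the Jaccard overlap of their result URLs. The sketch below is our own suggestion, not something taken from the patent, and the URL lists are placeholders rather than real search results.

```python
def jaccard_overlap(results_a, results_b):
    """Share of distinct URLs that appear in both result lists (0.0 to 1.0)."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Placeholder result lists, NOT real SERP data.
serp_puzzle = ["example.com/a", "example.com/b", "example.com/c", "example.com/d"]
serp_crossword = ["example.com/a", "example.com/b", "example.com/c", "example.com/e"]

print(jaccard_overlap(serp_puzzle, serp_crossword))  # prints: 0.6
```

A value close to 1.0 would mean the two queries return almost the same pages. Note that this measure ignores the order of the results, so it is only a rough indicator.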
RankBrain is a Google algorithm that uses artificial intelligence and machine learning to help the core algorithm better interpret user queries.
What does it actually do?
It tries to refine the search results by replacing the terms that make up the query with more suitable terms.
On what basis is the substitution carried out?
On the basis of the concepts contained in the query and of the contexts that they can represent.
How does it affect SEO?
You can get there by reasoning, so let us let you think. Or maybe we’ll write a post shortly about it…