JavaScript SEO: the difficult and controversial relationship between Google and JavaScript

We hear more and more about JavaScript and the improvements Google has made in interpreting it. The language is now widely used by developers to create websites that are more interactive, dynamic, and appealing from a user experience perspective.

However, this technology has disadvantages: it adds a new layer of complexity when it comes to ensuring proper crawling, indexing, and ranking on search engines.

To understand the difficult relationship between Google and JavaScript, it is necessary to differentiate between two possible scenarios: on the one hand, websites that use JavaScript “only” to add more or less critical features; on the other hand, websites entirely based on JavaScript, such as Single Page Applications built with React, Angular, Vue.js, or other frameworks.

In the first case, it is sufficient to identify the areas in which Googlebot still shows limitations and implement the proper measures. In the second case, the necessary interventions may be more substantial, because the way content is indexed has to be reconsidered.

Capabilities and limits of Googlebot in interpreting JavaScript

Below are the main aspects to consider for a website that partially depends on JavaScript:

  • Redirects in JavaScript: in its official guidelines, Google implies that it can follow JavaScript redirects. However, a server-side 301 redirect is still the best option when redirecting a page or a website to a new address.
  • Links in JavaScript: at Google I/O ’18, a few Google representatives stated that the search engine can only follow links that contain the URL in the href attribute. However, some tests carried out by Search Engine Land show that Google can also follow JavaScript links without the URL in the href attribute.
  • Content generated by interactions: since Google doesn’t interact with the page, content generated through JavaScript as the result of a user interaction can’t be rendered. A typical example is infinite scroll: an SEO-friendly infinite scroll must also provide pagination with HTML links.
  • Lazy loading: lazy loading is a technique that delays the loading of content, especially images, to improve the speed of a website. However, images loaded through lazy loading cannot always be indexed by search engines; to ensure that Google can index lazy-loaded content, it is possible to:
    • Add a <noscript> tag for each image
    • Use structured data (schema.org/image)
    • Use the Intersection Observer API
  • Content dynamically inserted in the DOM: according to various tests, Google is able to render and index content dynamically added through JavaScript, such as text, images, meta tags, and other elements. However, it is highly recommended (by Google itself) to include critical content directly in the original HTML, especially:
    • rel=“canonical”
    • Status code
    • Meta robots
    • Title tag
    • Text-based content (main content)
    • Structured data
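
As a rough illustration of the last point, the following TypeScript sketch checks whether some of the critical elements listed above are already present in the raw HTML returned by the server, before any JavaScript runs. The names auditRawHtml and missingCriticalTags and the regular expressions are assumptions made for this example; a real audit would rely on a proper HTML parser, and the status code and the main content need separate checks.

```typescript
// Hypothetical report: one flag per critical element found in the raw HTML.
interface CriticalTagReport {
  title: boolean;
  canonical: boolean;
  metaRobots: boolean;
  structuredData: boolean;
}

// Looks for each element in the HTML as delivered by the server
// (no JavaScript is executed). The patterns are simplified assumptions.
function auditRawHtml(html: string): CriticalTagReport {
  return {
    // A <title> element with non-empty text
    title: /<title[^>]*>\s*[^<\s][^<]*<\/title>/i.test(html),
    // <link rel="canonical" ...>
    canonical: /<link\b[^>]*\brel\s*=\s*["']?canonical["']?[^>]*>/i.test(html),
    // <meta name="robots" ...>
    metaRobots: /<meta\b[^>]*\bname\s*=\s*["']?robots["']?[^>]*>/i.test(html),
    // <script type="application/ld+json"> (JSON-LD structured data)
    structuredData: /<script\b[^>]*\btype\s*=\s*["']application\/ld\+json["'][^>]*>/i.test(html),
  };
}

// Returns the names of the elements that are missing from the raw HTML.
function missingCriticalTags(html: string): string[] {
  const report = auditRawHtml(html);
  const keys = Object.keys(report) as Array<keyof CriticalTagReport>;
  return keys.filter((key) => !report[key]);
}

// Usage: an empty application shell, typical of a client-side rendered page.
const shellHtml =
  '<html><head></head><body><div id="root"></div>' +
  '<script src="/app.js"></script></body></html>';
const missingInShell = missingCriticalTags(shellHtml);
// missingInShell lists all four elements, since the shell contains none of them.
```

If the list returned for a page is not empty, those elements are presumably injected by JavaScript (or missing altogether) and are worth moving into the original HTML.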

The latest developments: Googlebot and Bing are now evergreen

Recently, both Google and Bing announced an update of the browser used for rendering. Google moved from Chrome 41 (2015) to Chrome 74 (2019) in May 2019. Representatives of both search engines stated that from now on the rendering engine will be updated regularly so that it stays aligned with the latest version of Chrome.

Does this mean that SEOs won’t have to worry about JavaScript anymore?

Not really. Before taking the rendering capabilities of search engines for granted, it is important to carry out the appropriate checks and verify case by case whether any limitations exist. In particular, it is important to remember a fundamental concept linked to the bot’s performance: the crawl budget or, more precisely, the render budget, meaning the set of resources that Googlebot allocates to crawling and rendering a specific website.

In other words, even if Google can crawl and correctly interpret websites developed in JavaScript, there is no guarantee that it has the necessary resources available to do so, or at least not within the time frame expected by the webmaster.

To better understand this statement, it is useful to take a step back and clarify how websites entirely based on JavaScript are indexed.

How websites entirely based on JavaScript are indexed by Googlebot

For applications developed entirely in JavaScript, the indexing process, which is usually immediate for traditional websites, happens in two phases, known as the two waves of indexing.

In the first wave, all content that is not generated through JavaScript, and is therefore already included in the source code of the page, is served directly to the search engine and added to the index.

In the second wave, the content generated with JavaScript is rendered. Here, the term rendering does not refer to the graphic rendering of the page but to the generation of the full HTML code, which happens by filling the template with data coming from APIs or the database.

However, since rendering is quite expensive in terms of time and resources, Googlebot may decide to defer it. Although Google recently stated that in the future rendering and indexing will happen simultaneously, as of today there is a real risk that content generated with JavaScript is indexed in a second phase, which delays the ranking of new pages.
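
To get a feel for how much of a page depends on the second wave, one can compare the initial HTML with the HTML obtained after rendering. The TypeScript sketch below is only an approximation under stated assumptions: visibleText and wordsOnlyAfterRendering are hypothetical helpers, the tag stripping is a simple regular expression, and the rendered HTML is assumed to have been captured elsewhere (for example, with a headless browser).

```typescript
// Approximates the visible text of a page: drops scripts, styles and tags,
// then collapses whitespace. A simplification, not a full HTML parser.
function visibleText(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Returns the words that appear in the rendered HTML but not in the initial
// HTML, i.e. text that would only be available after the rendering phase.
function wordsOnlyAfterRendering(initialHtml: string, renderedHtml: string): string[] {
  const initialWords = new Set(visibleText(initialHtml).toLowerCase().split(" "));
  const result: string[] = [];
  for (const word of visibleText(renderedHtml).toLowerCase().split(" ")) {
    if (word !== "" && !initialWords.has(word) && result.indexOf(word) === -1) {
      result.push(word);
    }
  }
  return result;
}

// Usage with two hypothetical snapshots of the same page.
const initialSnapshot =
  '<html><body><h1>Red shoes</h1><div id="app"></div>' +
  "<script>render()</script></body></html>";
const renderedSnapshot =
  '<html><body><h1>Red shoes</h1><div id="app"><p>Price: 49 EUR</p></div></body></html>';
const delayedWords = wordsOnlyAfterRendering(initialSnapshot, renderedSnapshot);
// delayedWords → ["price:", "49", "eur"]
```

In this example the heading is available in the first wave, while the price only exists after rendering and therefore risks being indexed later.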

The various types of rendering

To avoid this issue, it is necessary to make sure that the rendering cost does not fall entirely on the client, which in this case is Googlebot. To achieve this, different solutions can be implemented, such as:

  • Server-side rendering (or SSR): when it receives a request, the server executes the JavaScript and generates the full HTML code, which it provides to the search engines. This solution improves some performance metrics, including First Paint, First Contentful Paint, and Time to Interactive, because the client doesn’t necessarily have to download, parse, and run JavaScript before showing the first content to users. However, Time to First Byte may get worse, since the code is executed on the server for every request.
  • Server-side rendering + hydration: this hybrid approach also allows the immediate indexing of content, because JavaScript is executed on the server to generate the full HTML. Compared to the previous solution, the difference is that the same code is then reused by the browser to preserve the interactivity of the application. If implemented correctly, SSR + hydration has positive effects on First Paint and First Contentful Paint, but it may have a negative impact on Time to Interactive.
  • Dynamic rendering: this approach consists in serving pre-rendered content to bots, identified through the User-Agent, and serving users (browsers) the “normal” content to be rendered client-side.
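
The first two approaches in the list can be illustrated with a minimal TypeScript sketch in which the server fills the template with data before sending the response. Everything here is hypothetical and chosen for illustration: the Product type, the renderProductPage function, the window.__INITIAL_STATE__ variable and the /app.js bundle are assumptions, and real frameworks provide their own rendering and hydration mechanisms.

```typescript
// Hypothetical data that would normally come from an API or a database.
interface Product {
  name: string;
  price: number;
}

// Escapes characters that have a special meaning in HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Server-side rendering: the full HTML, including the main content and the
// title, is generated before the response reaches the client or the bot.
function renderProductPage(product: Product): string {
  const name = escapeHtml(product.name);
  // Serialized state that a client-side bundle could reuse for hydration.
  // "<" is escaped so the JSON cannot close the script element early.
  const state = JSON.stringify(product).replace(/</g, "\\u003c");
  return [
    "<!doctype html>",
    "<html>",
    `<head><title>${name}</title></head>`,
    "<body>",
    `<div id="app"><h1>${name}</h1><p>${product.price.toFixed(2)} EUR</p></div>`,
    `<script>window.__INITIAL_STATE__ = ${state};</script>`,
    '<script src="/app.js"></script>',
    "</body>",
    "</html>",
  ].join("\n");
}

// Usage: the content is already in the HTML, with no JavaScript executed.
const productHtml = renderProductPage({ name: "Red shoes", price: 49 });
```

With plain SSR the last two script lines could be omitted; with hydration they let the browser reuse the server-generated markup and state instead of rebuilding the page from scratch.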

Dynamic rendering marks an actual change in Google’s policies, because it is effectively a form of cloaking, a practice that previously seemed liable to penalization.

Among the recommended open source solutions for implementing dynamic rendering are Rendertron and Puppeteer, while the most used third-party tools are Prerender.io, SEO4Ajax, SnapSearch, and Brombone.

Even though dynamic rendering is certainly the easiest alternative, Google has recently stated that it should be considered only a workaround, that is, a temporary solution to be used until one’s web app is updated to support server-side or hybrid rendering.
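
The core of dynamic rendering, choosing the response based on the User-Agent, can be sketched in TypeScript as follows. This is a simplified illustration: the BOT_PATTERN list, isSearchBot and chooseVariant are assumptions made for this example, and a real setup would typically delegate the pre-rendering itself to one of the tools mentioned above.

```typescript
// Illustrative, non-exhaustive list of crawler tokens (an assumption).
const BOT_PATTERN = /googlebot|bingbot/i;

// Returns true when the User-Agent header looks like a search engine bot.
function isSearchBot(userAgent: string | undefined): boolean {
  return userAgent !== undefined && BOT_PATTERN.test(userAgent);
}

type Variant = "prerendered" | "client-side";

// Decides which version of the page to serve for a given request:
// static pre-rendered HTML for bots, the normal JavaScript app for users.
function chooseVariant(userAgent: string | undefined): Variant {
  return isSearchBot(userAgent) ? "prerendered" : "client-side";
}

// Usage with two example User-Agent strings.
const botVariant = chooseVariant(
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
);
const userVariant = chooseVariant(
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/74.0 Safari/537.36"
);
// botVariant → "prerendered", userVariant → "client-side"
```

Both variants should carry the same content; only the way it is delivered differs.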

JavaScript and SEO: When is it necessary to intervene?

While there are solutions to the possible issues linked to the two waves of indexing, it is also true that certain interventions are expensive in terms of money and/or development effort. Is it always worth it?

The answer is simple: it depends. If no critical content is generated through JavaScript, one can legitimately decide not to intervene.

Paolo Amorosi, SEO Coordinator at Pro Web Consulting