Google Doesn't Have A Limit On Long HTML Pages

by Clint Butler, Digitaleer

(SEO This Week) - With all the different SEO audit tools and webmaster tools that exist, there are bound to be some measurements that apply to one tool but not to others. After all, SEO is so subjective beyond the basic implementations that it's bound to happen.

A Twitter user found one such difference while using the Bing Webmaster Tools site audit feature. He got a message saying "HTML size too long" and so, logically, he asked Google's John Mueller if the same limit applied to Googlebot.

John's reply was as expected.

Based on the text of John's answer, he appears to have done the same Google search I had to do in order to see if there was actually a documented limit on page size.

Turns out, the last study (term used loosely) was in 2015 when someone submitted a copy of Pride and Prejudice to Google and the search engine only indexed a part of it. The same user did a search again in 2017 and the status hadn't changed.

This supports information from a since-deleted Google documentation page that placed the overall size limit at 30MB (anything bigger is completely ignored) and the HTML size limit at 2.5MB (which we know they are not following anymore, as modern web development is producing bigger pages).
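For anyone who wants to compare their own pages against those two historical figures, here is a minimal Python sketch. It assumes "MB" means 1024 × 1024 bytes and that size is measured on the UTF-8 encoded HTML; the function name and return labels are made up for illustration, and none of this reflects a limit Google currently documents.

```python
# Thresholds taken from the deleted documentation page mentioned above.
# Assumption: 1 MB = 1024 * 1024 bytes.
HTML_LIMIT_BYTES = int(2.5 * 1024 * 1024)
FILE_LIMIT_BYTES = 30 * 1024 * 1024


def classify_html_size(html: str) -> str:
    """Label a page by which historical threshold (if any) it exceeds."""
    size = len(html.encode("utf-8"))  # bytes, not characters
    if size > FILE_LIMIT_BYTES:
        return "over 30MB"
    if size > HTML_LIMIT_BYTES:
        return "over 2.5MB"
    return "under both"


# Example with a tiny hypothetical page.
label = classify_html_size("<html><body>Hello</body></html>")
```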

So if there isn't an official HTML size limit, but we know Google isn't indexing full documents over a given size, what is that size?

Well, it turns out, it's around 180,000 words.

Ted Kubaitis and Lee Witcher conducted a test by putting 1 million test keywords on a page. Google indexed the top 180K of them. Indexing took 48 hours and proceeded in order from the top of the page down. The test keywords ranged from 7 to 16 characters in length, with a white space in between.
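The setup of a test like this can be sketched in a few lines of Python. This is not the script Kubaitis and Witcher used. The random lowercase keywords, the uniqueness check, the function names, and the bare HTML wrapper are all assumptions for illustration. Only the 7 to 16 character length range and the single-space separator come from the description above.

```python
import random
import string


def make_keyword(rng: random.Random, min_len: int = 7, max_len: int = 16) -> str:
    """Return a random lowercase keyword between min_len and max_len characters."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def build_test_page(count: int, seed: int = 0):
    """Build an HTML page holding `count` unique keywords separated by spaces."""
    rng = random.Random(seed)
    seen = set()
    words = []
    while len(words) < count:
        word = make_keyword(rng)
        if word not in seen:  # unique, so each keyword can be searched on its own
            seen.add(word)
            words.append(word)
    html = (
        "<!DOCTYPE html>\n<html><head><title>Index depth test</title></head>"
        "<body><p>" + " ".join(words) + "</p></body></html>\n"
    )
    return words, html


# A small run for demonstration; the test described above used 1 million keywords.
words, html = build_test_page(1000)
size_bytes = len(html.encode("utf-8"))
```

Because the keywords are kept in page order, searching for the keyword at a given position would show whether that position made it into the index, which is how a top-down cutoff could be located.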

Kubaitis summarized that "There are likely multiple limits in play. There is a character limit which is how big a page can be because the HTML specification doesn't set any limits so a web page can be any size and still be valid HTML. From an engineering point of view, you have to set practical limits. John is likely referring to this first engineering limit where the Page can be 10s of MB and still get fully loaded by GoogleBot. That doesn't mean there aren't other limits too further into the process. The indexing limit is probably a second limit being used."

2021 - Copyright, All Rights Reserved, web design by Digitaleer with ❤️