(SEO This Week) – With all the different SEO audit tools and webmaster tools out there, there are bound to be measurements that apply to one tool but not to others. After all, beyond the basic implementations SEO is subjective enough that it's bound to happen.
A Twitter user ran into one such difference while using the Bing Webmaster Tools site audit feature, which flagged some pages with the message "HTML size is too long." Logically, he asked Google's John Mueller whether the same limit applied to Googlebot.
@JohnMu Sir – Does Google have a limit on crawling long pages – Bing Flags some of our long posts as “HTML Size is too long.”
— Kanuj⚡ (@kanuj5678) December 8, 2021
Bing has a soft limit of 125 KB (HTML page) – any such requirements from Google or is it best to paginate very long articles? https://t.co/oyw22IhapB pic.twitter.com/eTtthdhBWx
John’s reply was as expected.
We don’t have a documented limit, last I saw someone check it was 10’s-100’s of MB, so I wouldn’t worry about that. Giant HTML pages do slow things down, so it’s probably still something to keep on your to-do list.
— 🐄 John 🐄 (@JohnMu) December 8, 2021
Based on the wording of John's answer, he apparently did the same Google search I had to do to see whether there was actually a documented limit on page size.
It turns out the last study (term used loosely) was in 2015, when someone submitted a copy of Pride and Prejudice to Google and the search engine only indexed part of it. The same user checked again in 2017 and the status hadn't changed.
This supports information from a since-deleted Google documentation page that put the overall download limit at 30MB (anything bigger is completely ignored) and the HTML size limit at 2.5MB (a figure we know they are no longer holding to, since modern web development produces far bigger pages).
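If you want to see where one of your own pages sits relative to those numbers, a quick check of the raw HTML size is enough. This is only a minimal sketch: the URL is a placeholder, and the two thresholds are simply the figures quoted above, not anything Bing or Google exposes programmatically.

```python
import urllib.request

# Placeholder URL; swap in the page you want to check.
URL = "https://example.com/very-long-article/"

# Reference points from the discussion above: Bing's soft limit and the
# HTML figure from the deleted Google documentation page.
BING_SOFT_LIMIT = 125 * 1024                      # 125 KB
OLD_GOOGLE_HTML_LIMIT = int(2.5 * 1024 * 1024)    # 2.5 MB

with urllib.request.urlopen(URL) as resp:
    html = resp.read()

size = len(html)
print(f"Raw HTML size: {size / 1024:.1f} KB")
print("Over Bing's 125 KB soft limit" if size > BING_SOFT_LIMIT
      else "Under Bing's 125 KB soft limit")
print("Over the old 2.5 MB Google figure" if size > OLD_GOOGLE_HTML_LIMIT
      else "Under the old 2.5 MB Google figure")
```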
So if there isn’t an official HTML size limit, but we know Google isn’t indexing full documents over a given size, what is that size?
Well, it turns out, it’s around 180,000 words.
Ted Kubaitis from seotoollab.com and Lee Witcher conducted a test by putting 1 million test keywords on a single page. Google took 48 hours to index it, worked from the top down, and stopped at roughly the top 180K keywords. The test keywords ranged from 7 to 16 characters in length, separated by whitespace.
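For context, a page like the one described can be generated in a few lines. This is only a sketch of the general idea based on the description above (1 million unique nonsense keywords, 7 to 16 characters each, whitespace-separated), not the exact setup Kubaitis and Witcher used; the file name and keyword format are made up for illustration.

```python
import random
import string

random.seed(0)  # reproducible page contents

def index_tag(i: int) -> str:
    """Encode the keyword's position in lowercase letters (base 26)."""
    s = ""
    while True:
        s = string.ascii_lowercase[i % 26] + s
        i //= 26
        if i == 0:
            return s

def make_keyword(i: int) -> str:
    """A unique nonsense keyword, 7-16 characters, ending in its index tag."""
    length = random.randint(7, 16)
    tag = index_tag(i)
    filler = "".join(random.choices(string.ascii_lowercase, k=length - len(tag)))
    return filler + tag

keywords = " ".join(make_keyword(i) for i in range(1_000_000))
html = (
    "<!DOCTYPE html><html><head><title>Index limit test</title></head>"
    f"<body><p>{keywords}</p></body></html>"
)

with open("index_limit_test.html", "w", encoding="utf-8") as f:
    f.write(html)

print(f"Raw HTML size: {len(html) / (1024 * 1024):.1f} MB")
```

With a million tokens averaging around a dozen characters each, the resulting file lands in the tens of megabytes, right in the "10's of MB" territory John mentioned. Because each keyword is unique and tied to its position on the page, searching for keywords from different depths after indexing is what reveals where Google stopped.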
Kubaitis summarized that “There are likely multiple limits in play. There is a character limit which is how big a page can be because the HTML specification doesn’t set any limits so a web page can be any size and still be valid HTML. From an engineering point of view, you have to set practical limits. John is likely referring to this first engineering limit where the Page can be 10s of MB and still get fully loaded by GoogleBot. That doesn’t mean there aren’t other limits too further into the process. The indexing limit is probably a second limit being used.”