Google Doesn’t Have A Limit On Long HTML Pages

by Clint Butler, Digitaleer

(SEO This Week) – With all the different SEO audit tools and webmaster tools that exist, there are bound to be some measurements that apply to one tool but not to others. After all, beyond the basic implementations, SEO is so subjective that it's bound to happen.

A Twitter user found one such difference while using the site audit feature in Bing Webmaster Tools. After getting an “HTML size too long” message, he logically asked Google’s John Mueller whether the same limit applied to Googlebot.

John’s reply was as expected.

Judging by the text of John’s answer, he ran the same Google search I had to run to find out whether there is actually a documented page size limit.

It turns out the last study (a term used loosely) was in 2015, when someone submitted a copy of Pride and Prejudice to Google and the search engine indexed only part of it. The same user searched again in 2017, and nothing had changed.

This lines up with a now-deleted Google documentation page that set a file size limit of 30MB (anything bigger was ignored completely) and an HTML size limit of 2.5MB (which we know Google no longer follows, since modern web development is producing ever-bigger pages).

So if there isn’t an official HTML size limit, but we know Google isn’t indexing full documents over a given size, what is that size?

Well, it turns out, it’s around 180,000 words.

Ted Kubaitis from seotoollab.com and Lee Witcher ran a test in which they put 1 million test keywords on a single page. Over 48 hours, Google indexed the first 180K of them, in order from the top of the page down. The test keywords ranged from 7 to 16 characters in length, separated by a space.

Kubaitis summarized: “There are likely multiple limits in play. There is a character limit, which is how big a page can be, because the HTML specification doesn’t set any limits, so a web page can be any size and still be valid HTML. From an engineering point of view, you have to set practical limits. John is likely referring to this first engineering limit, where the page can be 10s of MB and still get fully loaded by Googlebot. That doesn’t mean there aren’t other limits too, further into the process. The indexing limit is probably a second limit being used.”
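To put the 180K-keyword cutoff in byte terms, here is a rough Python sketch. It assumes ASCII keywords, exactly one space between them, and keyword lengths spread evenly across the 7 to 16 character range. The test's actual page markup and keyword distribution aren't described here, so `keyword_span_bytes` and `make_test_page` are hypothetical helpers for illustration, not the tooling Kubaitis and Witcher used.

```python
import random
import string

# Assumptions (not from the original test): ASCII keywords, exactly one
# space between keywords, and keyword lengths drawn uniformly from 7..16.
MIN_LEN, MAX_LEN = 7, 16


def keyword_span_bytes(n: int, min_len: int = MIN_LEN, max_len: int = MAX_LEN) -> tuple[int, int, float]:
    """Return (smallest, largest, expected) byte size of n space-separated keywords."""
    separators = n - 1  # one space between each pair of adjacent keywords
    smallest = n * min_len + separators
    largest = n * max_len + separators
    # Mean length of a uniform integer length in [min_len, max_len].
    expected = n * (min_len + max_len) / 2 + separators
    return smallest, largest, expected


def make_test_page(n: int, seed: int = 0) -> str:
    """Build a hypothetical test page of n random keywords (illustration only)."""
    rng = random.Random(seed)
    words = [
        "".join(rng.choices(string.ascii_lowercase, k=rng.randint(MIN_LEN, MAX_LEN)))
        for _ in range(n)
    ]
    return "<html><body><p>" + " ".join(words) + "</p></body></html>"


if __name__ == "__main__":
    low, high, mean = keyword_span_bytes(180_000)
    print(f"180K keywords: {low:,} to {high:,} bytes (about {mean / 1e6:.2f} MB expected)")
```

Under those assumptions, the script prints a range of roughly 1.44MB to 3.06MB of keyword text, with about 2.25MB expected. That is well below the “10s of MB” Kubaitis says Googlebot can load, which fits his view that the indexing cutoff is a separate limit, though a back-of-the-envelope calculation like this can't prove it.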

Clint Butler
https://www.seothisweek.com
With more than 15 years of agency owner experience working as an advanced SEO, I help companies scale their business with the best content strategies and digital marketing campaigns.
