(SEO This Week) – Google’s John Mueller dropped a proverbial dime on how the search engine works when dealing with PDF files and the content in them. Actually, he didn’t, but some might claim he did, so let’s get ahead of it.
On the positive side, none of the contents of the PDFs are indexed, so they're not as easily findable for casual searches either. I didn't expect to appreciate bad SEO, but here we are :-).
— 🐄 John 🐄 (@JohnMu) January 18, 2022
In the tweet sequence, John was complaining that some official forms can only be filled out with two different technologies in place: Adobe's PDF reader installed and JavaScript activated.
TIL some PDFs aren't as P as expected. Apparently you have to both install Adobe's PDF reader & activate JavaScript in order to fill out some official forms o_O. https://t.co/Rrw3CprHwm
— 🐄 John 🐄 (@JohnMu) January 18, 2022
… and there are a bunch of these indexed too: https://t.co/0VdCUi3b7N
The story here is that John pointed out that the PDFs themselves were in the index. However, he went on to say that the content of the PDFs was not.
This assertion would lend credence to new testing observations showing that a web page can indeed be in Google’s index and findable when you look directly for that asset. However, at the same time, the web page won’t rank for any terms on that particular page, or in this case, PDF.
On closer review of John’s claim, the search result he shared tells a different story.
The PDFs he was referencing were in fact indexed, and so was their content. However, because the PDFs require Adobe Reader to open, there is a default message on all the documents. This default message is what was indexed.
A few things are going on here. First, the entities providing the PDF documents have technology in place to detect whether both Adobe Reader and JavaScript are active in a user’s browser. Second, if the browser doesn’t report those two pieces of tech, users are given this PDF with the template messaging on it. Third, their SEO teams are allowing those blank versions of the PDFs to be indexed.
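For sites in that position, one option is the `X-Robots-Tag: noindex` response header, which Google supports for non-HTML files such as PDFs. Here is a minimal Python sketch of the idea. The helper name `pdf_response_headers` and the `is_placeholder` flag are hypothetical, and how a server decides that it is sending the template-only version is assumed to be handled elsewhere.

```python
def pdf_response_headers(is_placeholder):
    """Build HTTP response headers for a PDF download (hypothetical helper)."""
    headers = {"Content-Type": "application/pdf"}
    if is_placeholder:
        # Ask search engines not to index the template-only version.
        headers["X-Robots-Tag"] = "noindex"
    return headers
```

The regular, fillable version is served without the extra header, so it stays eligible for indexing.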
In the end, this is more of an issue for users who think they are clicking on a PDF download link in the search results and end up getting a 45-page PDF with a “Please wait…” message on every page.
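If you want to audit your own PDFs for this pattern, the check is simple once you have the text of each page. The sketch below assumes the page text has already been extracted into a list of strings (the extraction step is out of scope here). The function name `looks_like_placeholder` and the default marker text are illustrative choices, not anything documented by Google or Adobe.

```python
def looks_like_placeholder(page_texts, marker="please wait"):
    """Return True if every page contains the marker text (case-insensitive)."""
    if not page_texts:
        # No pages means nothing to judge, so do not flag the file.
        return False
    needle = marker.lower()
    return all(needle in text.lower() for text in page_texts)
```

A file where only some pages carry the message is not flagged, since real content on the other pages would still give Google something to index.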
But on the bright side, this search result alone should put your mind at ease: Google is still converting PDF files into HTML, reading them, and ranking them based on the content inside them.