Google’s image search results now include images found inside PDF files. The change was rolled out in June this year, but only a few people noticed it.
One of those who noticed was Alex Chitu. The update was then confirmed by a Google staffer.
According to Google, the update was added in June. It builds on the OCR technology the company premiered in 2008, which indexes the content of PDF files and shows it as part of the normal web search results.
When the search engine giant started indexing PDFs in 2008, the company recommended that webmasters create HTML pages for any type of image. At that time, Google couldn’t index images within a PDF.
But that has changed now.
Although images within a PDF file are now included in Google image search results, the option for viewing them is different. Instead of “view image,” you’ll be given an option to “view PDF.”
The reason is that there is no way to link directly to an image file inside a PDF. This means that you will only see an image preview. To see the actual image, you will have to open the PDF file.
You may not be able to view and save the image without opening the PDF file. Still, this change is a big step toward making your PDFs searchable by search engines, particularly Google.
Google’s OCR technology can recognize more than 200 languages from across the globe. With that number, you can expect it to recognize all major languages in the world.
This update also gives you an additional avenue to test how images in PDFs rank compared to images on HTML pages. Unfortunately, most people are not likely to download PDFs, especially if they come from unknown or unfamiliar sites. This means that it could still lead to a low click-through rate.
Any image found in a PDF file can be reached by Googlebot and included in image searches. That said, if you don’t want the PDF files you upload to be indexed by Google, you should block them in your robots.txt file.
And if they contain sensitive material, you have no option but to take them offline.
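For the robots.txt route, here is a minimal sketch of what a blocking rule might look like and how you could check it before publishing. It assumes your PDFs live in a single directory; the `/pdfs/` path, the `example.com` URLs, and the `is_allowed` helper are all hypothetical and only there for illustration. The check uses Python's standard `urllib.robotparser`, which matches simple path prefixes, so treat it as a rough sanity test rather than a guarantee of how Googlebot will behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of a /pdfs/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A PDF inside the blocked directory (assumed URL).
pdf_allowed = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/pdfs/report.pdf")
# A regular HTML page outside the blocked directory (assumed URL).
page_allowed = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/index.html")

print("PDF crawlable:", pdf_allowed)
print("Page crawlable:", page_allowed)
```

Keeping PDFs in one directory makes the rule a single line. Remember that robots.txt is publicly readable and only asks crawlers to stay away, which is why sensitive files still need to come offline.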
Google invests heavily in improving the indexing capacity of its technology, so this move does not come as a surprise.