Google Now Indexes Text from Images – Uses OCR

November 1st, 2008 by Annkur

Google leads the search business, and with every passing day they convince me that no one is even getting close to them. Just recently they announced that they can now crawl and understand text in Flash animations, and now they have something even better! Apparently Google now uses OCR technology to read scanned documents / images (within PDF files).

This Optical Character Recognition (OCR) technology lets us convert a picture (of a thousand words) into a thousand words — words that can be searched and indexed, so that these valuable documents are more easily found. This is a small but important step forward in our mission of making all the world’s information accessible and useful. [Google Blog]

View this technology in action:

Repairing Aluminum Wiring PDF Scan || Repairing Aluminum HTML (Google Processed)

Update: Amit at Labno explains how to use this technology to convert your scanned documents to text.

Via TechCrunch


Categories: Concept / Educative, First look, Tech Industry News

1 Comment

  1. Back to SEO Basics – from Site Review Session at Google I/O 2009 | SEO Scientist - Applying the scientific method to SEO

    [...] this is clearly stated in Google Webmaster Help section, every now and then there are speculations popping up about how Google definitely reads text inside images. So [...]
