Last Updated on Friday, 17 August 2012 17:58
Microsoft Office includes a decent OCR engine, but it cannot create PDF files with a text layer. This project contains a script that takes a TIFF file and converts it to hOCR format (HTML + OCR). The hOCR output can then be processed with a simple Java program to produce a PDF file. Grab it here.
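To give a rough idea of what the Java step has to work with, here is a minimal sketch of reading word positions out of hOCR. It is not the program from this project: the class name `HocrWords`, its methods, and the sample markup are made up for illustration. It assumes each word is a `span` with class `ocrx_word` whose `title` attribute carries `bbox x0 y0 x1 y1` in image pixels, and that `class` appears before `title` in the tag. A real converter would use a proper HTML parser and a PDF library to place the text invisibly over the page image.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the project: pulls words and boxes out of hOCR.
final class HocrWords {

    // One recognized word and its bounding box in image pixels (origin top-left).
    static final class Word {
        final String text;
        final int x0, y0, x1, y1;

        Word(String text, int x0, int y0, int x1, int y1) {
            this.text = text;
            this.x0 = x0;
            this.y0 = y0;
            this.x1 = x1;
            this.y1 = y1;
        }
    }

    // Assumption: class comes before title, and the title contains "bbox x0 y0 x1 y1".
    private static final Pattern WORD = Pattern.compile(
        "<span[^>]*class=['\"]ocrx_word['\"][^>]*title=['\"][^'\"]*?"
            + "bbox (\\d+) (\\d+) (\\d+) (\\d+)[^'\"]*['\"][^>]*>(.*?)</span>",
        Pattern.DOTALL);

    // Returns every ocrx_word found in the hOCR text, in document order.
    static List<Word> parse(String hocr) {
        List<Word> words = new ArrayList<>();
        Matcher m = WORD.matcher(hocr);
        while (m.find()) {
            // Drop any inline markup (e.g. <strong>) nested inside the word span.
            String text = m.group(5).replaceAll("<[^>]+>", "").trim();
            words.add(new Word(text,
                Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4))));
        }
        return words;
    }

    // PDF user space uses points (1/72 inch), so pixels scale by 72 / dpi.
    static double toPoints(int pixels, int dpi) {
        return pixels * 72.0 / dpi;
    }

    // PDF's y axis starts at the bottom of the page, hOCR's at the top, so flip it.
    static double toPdfY(int pixelY, int pageHeightPixels, int dpi) {
        return toPoints(pageHeightPixels - pixelY, dpi);
    }

    public static void main(String[] args) {
        String sample =
            "<span class='ocrx_word' title='bbox 300 600 450 660'>Hello</span>";
        for (Word w : parse(sample)) {
            System.out.println(w.text + " at x=" + toPoints(w.x0, 300) + "pt");
        }
    }
}
```

The two conversion helpers cover the part that is easy to get wrong: hOCR boxes are in pixels measured from the top-left corner, while PDF text is positioned in points from the bottom-left, so the scan resolution (dpi) and page height are needed to line the text layer up with the image.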