marcinmilkowski.pl
modi2hocr
Microsoft Office contains a decent OCR engine, yet it does not create PDF files with a text layer. This project contains a script that takes a TIFF file and converts it into hOCR format (HTML + OCR). The hOCR output can then be processed with a simple Java program to get a PDF file. Grab it here.
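To give a feel for what the intermediate hOCR file holds, here is a minimal Python sketch that turns recognized words and their bounding boxes into hOCR markup. It is not the project's script: the function name `words_to_hocr` and the input tuple layout are assumptions made for illustration, and the words are passed in by hand rather than obtained from the Office OCR engine. Only the `ocr_page` and `ocrx_word` classes and the `bbox` property in the `title` attribute come from the hOCR format itself.

```python
from html import escape


def words_to_hocr(words, page_width, page_height):
    """Build a minimal hOCR page.

    `words` is an iterable of (text, left, top, right, bottom) tuples in
    pixel coordinates. This tuple layout is a hypothetical input format,
    not the one used by the project's script.
    """
    spans = []
    for i, (text, left, top, right, bottom) in enumerate(words, start=1):
        # Each word becomes a span whose title carries its bounding box.
        spans.append(
            f"<span class='ocrx_word' id='word_{i}' "
            f"title='bbox {left} {top} {right} {bottom}'>{escape(text)}</span>"
        )
    body = " ".join(spans)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        "<meta charset='utf-8'>\n"
        "<meta name='ocr-capabilities' content='ocr_page ocrx_word'>\n"
        "</head>\n<body>\n"
        f"<div class='ocr_page' id='page_1' "
        f"title='bbox 0 0 {page_width} {page_height}'>\n"
        f"{body}\n"
        "</div>\n</body>\n</html>\n"
    )
```

Calling `words_to_hocr([("Hello", 10, 20, 60, 40)], 200, 100)` returns an HTML string with one word span inside one page element. A converter can read the `bbox` values back from such a file to place invisible text over the scanned image in the PDF.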
 