I need to convert the following PDF into TIF.
https://www.dropbox.com/s/6yo378t1mu4j7 ... 36.idp.pdf
Later on, the result should be processed with Tesseract (OCR software).
Tesseract produces quite a lot of errors and misrecognized characters, and its available options are limited.
The difficulty with this particular PDF is that it uses user-defined fonts.
Code:
A
Type: Type 3
Encoding: Custom
Actual Font: A
Actual Font Type: Type 3
So far I have used the following command, but the results are not that good.
Code:
$ convert -monochrome -density 600 in.pdf out.tif
https://www.dropbox.com/s/uxwd4k6pb1orz69/tmp.tif
There are white dots inside the letters, and the edges are jagged with lots of stair steps. Apparently this is too much for Tesseract.
https://www.dropbox.com/s/c35hvnp07w8e9cd/tesseract.txt
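One variation that may be worth trying is sketched below. It assumes the white dots come from the dithering that -monochrome applies, which has not been verified for this file. Instead of -monochrome, it rasterizes the page, converts it to grayscale, and applies a hard threshold. The function name pdf_to_ocr_tif, the CONVERT override, and the 50% threshold are illustrative choices, not settings that have been tested on this PDF.

```shell
# Hypothetical helper: rasterize a PDF to a bilevel TIFF without dithering.
# Arguments: input PDF, output TIFF, optional density in DPI (default 600).
# CONVERT may be overridden (e.g. CONVERT=echo) to print the arguments
# instead of running ImageMagick.
pdf_to_ocr_tif() {
    # -density is given before the input so it applies to the PDF rasterization.
    # -colorspace Gray followed by -threshold yields pure black/white pixels
    # with no dither pattern; Group4 is a compact bilevel TIFF compression.
    "${CONVERT:-convert}" -density "${3:-600}" "$1" \
        -colorspace Gray -threshold 50% -compress Group4 "$2"
}
```

Calling `pdf_to_ocr_tif in.pdf out.tif` would run convert with these options. The threshold value will probably need tuning for this document.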
Many thanks,