PDF Conversion to PNG issues

Posted: 2014-04-28T06:59:44-07:00
by talexander
We currently use Alfresco as a CMS. It uses IM and GS to create thumbnail previews of all documents. This works fine for almost every document except a small subset of PDFs. These PDFs have been processed by ABBYY OCR 4 Linux and are therefore OCR'd documents. I am unsure of the exact command-line switches that Alfresco passes to convert, but I get the same result running the following command from an SSH session:

./convert -thumbnail x300 -density 300 -colorspace sRGB /root/test1.pdf[0] -background white -flatten /root/test1.png

or

./convert -thumbnail x300 -density 300 /root/test1.pdf[0] /root/test1.png

This is using GS version 8.56 and IM 6.8.6-6.
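Since the versions matter here, a minimal sketch for confirming which ImageMagick and Ghostscript builds are actually picked up from the PATH. The helper name `report_versions` is hypothetical. It assumes `convert` and `gs` are the binaries in use; if you run a local build such as `./convert`, check that one instead.

```shell
# report_versions is a hypothetical helper, not part of ImageMagick or Ghostscript.
# It prints one line per tool, or a "not found" note if the tool is not on PATH.
report_versions() {
    if command -v convert >/dev/null 2>&1; then
        # First line of the version banner, e.g. "Version: ImageMagick ..."
        convert -version | sed -n '1p'
    else
        echo "convert: not found"
    fi
    if command -v gs >/dev/null 2>&1; then
        # gs --version prints only the version number
        echo "Ghostscript $(gs --version)"
    else
        echo "gs: not found"
    fi
}

report_versions
```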

The results all look like this:

[Image: resulting thumbnail]

I am looking to remove the static/noise on the document, which is present neither when you open the original PDF that the preview is generated from nor in the Flash preview that Alfresco creates. I have tried a later version of Ghostscript, but this does not change the outcome. I am guessing it is some form of layer that the OCR process has left in the document, but I am struggling against the wealth of options to work out whether there is anything I can do to change how IM processes them. I would be grateful for any guidance that someone more in the know could send my way.

Am using RHEL 5 x64.

Link to PDF:

https://dl.dropboxusercontent.com/u/34148148/test1.pdf

Re: PDF Conversion to PNG issues

Posted: 2014-04-28T07:14:15-07:00
by snibgo
Please put your test1.pdf somewhere like dropbox.com and paste the URL here.

Re: PDF Conversion to PNG issues

Posted: 2014-04-28T08:20:54-07:00
by talexander
Please find the link to the PDF below:

https://dl.dropboxusercontent.com/u/34148148/test1.pdf

Re: PDF Conversion to PNG issues

Posted: 2014-04-28T08:31:42-07:00
by snibgo
With more recent versions of IM (v6.8.9-0) and GS (v9.10), on Windows 8.1, your command gives a much cleaner result; the background is essentially white. So does the simpler command:

convert test1.pdf t.png
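For comparison with the original thumbnail command, here is a rough sketch that puts the options in the order ImageMagick conventionally expects: the `-density` setting before the PDF is read, and the `-flatten` and `-thumbnail` operators after it. The helper name `make_thumbnail` and the `DRY_RUN` variable are hypothetical, and this is not claimed to remove the noise; it only makes the command easy to inspect and rerun against different IM/GS versions.

```shell
# make_thumbnail is a hypothetical helper. It builds a first-page thumbnail
# command, prints it, and runs it unless DRY_RUN is set to a non-empty value.
make_thumbnail() {
    src=$1
    dst=$2
    # Setting (-density) before the input; operators after the input.
    set -- convert -density 300 "${src}[0]" -background white -flatten -thumbnail x300 "$dst"
    echo "$*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Dry run: only prints the command that would be executed.
DRY_RUN=1 make_thumbnail test1.pdf t.png
```

Calling `make_thumbnail test1.pdf t.png` without `DRY_RUN` set runs the printed command, which requires `convert` on the PATH and Ghostscript available as the PDF delegate.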