
Efficiently convert PDF to bilevel BMP

Posted: 2012-06-20T05:59:16-07:00
by myicq
I am stumbling a bit on a project. I have a process that works, but I can't find out how to make it efficient.

Job: I have a PDF file with writing in Hindi. I need the output to be a 1-bit (bilevel) BMP file, preferably at a given size.

So far I have the following:

C:\IM>convert -verbose -type bilevel -depth 1 -density 200 d:\hindi1.pdf d:\hindi2.bmp

[ghostscript library] -q -dQUIET -dPARANOIDSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dEPSCrop -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r200x200"  "-sOutputFile=C:/../Temp/magick-AifKlr0d-%08d" "-fC:/../Temp/magick-y666brcZ" "-fC:/../Temp/magick-emE4zcNr"

/Temp/magick-AifKlr0d-00000001 PNG 1497x2206 1497x2206+0+0 8-bit DirectClass 241KB 0.156u 0:00.203
/Temp/magick-AifKlr0d-00000002 PNG 1497x2206 1497x2206+0+0 8-bit DirectClass 159KB 0.141u 0:00.139
/Temp/magick-AifKlr0d-00000003 PNG 1497x2206 1497x2206+0+0 8-bit DirectClass 200KB 0.141u 0:00.140
/Temp/magick-AifKlr0d-00000004 PNG 1497x2206 1497x2206+0+0 8-bit DirectClass 69.4KB 0.141u 0:00.140

d:\hindi1.pdf[0] PDF 1497x2206 1497x2206+0+0 1-bit DirectClass 241KB 0.438u 0:00.452
d:\hindi1.pdf[1] PDF 1497x2206 1497x2206+0+0 1-bit DirectClass 241KB 0.297u 0:00.312
d:\hindi1.pdf[2] PDF 1497x2206 1497x2206+0+0 1-bit DirectClass 241KB 0.156u 0:00.156
d:\hindi1.pdf[3] PDF 1497x2206 1497x2206+0+0 1-bit DirectClass 241KB 0.000u 0:00.000

d:\hindi1.pdf=>d:\hindi2-0.bmp[0] PDF 1497x2206 1497x2206+0+0 1-bit Bilevel PseudoClass2c 410KB 3.875u 0:03.281
d:\hindi1.pdf=>d:\hindi2-1.bmp[1] PDF 1497x2206 1497x2206+0+0 1-bit Bilevel DirectClass 9.896MB 4.000u 0:03.500
d:\hindi1.pdf=>d:\hindi2-2.bmp[2] PDF 1497x2206 1497x2206+0+0 1-bit Bilevel DirectClass 9.896MB 4.047u 0:03.656
d:\hindi1.pdf=>d:\hindi2-3.bmp[3] PDF 1497x2206 1497x2206+0+0 1-bit Bilevel DirectClass 9.896MB 4.094u 0:03.812
My problems are:
  • The first page of the PDF is converted correctly (not exactly to size, but OK for now), but all the other pages come out as 24-bit.
    Compare the PseudoClass2c on the first file with the DirectClass on the other three.
    This is my most serious problem.
  • I do not know how to set an exact size (specifying width = 4800 instead of a density).
Would any other conversion tool be better suited for this? Commercial is OK as well.

The original source of the document is MS Word, so I can also go other routes. Open for suggestions, the faster the better.

I can provide example source files if needed.
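
One way to confirm which output files really came out as 24-bit is to read the bit-count field from each BMP header. The following is a minimal sketch, not part of ImageMagick: the function names are made up for illustration, and it assumes the files use a BITMAPINFOHEADER or later header (40 bytes or more), where the bits-per-pixel value sits at byte offset 28.

```python
import struct


def bmp_bits_per_pixel(header: bytes) -> int:
    """Return the bits-per-pixel value from the first 30 bytes of a BMP file."""
    if len(header) < 30 or header[:2] != b"BM":
        raise ValueError("not a BMP file (too short or missing 'BM' signature)")
    # The DIB header size is a 32-bit little-endian value at offset 14.
    (dib_size,) = struct.unpack_from("<I", header, 14)
    if dib_size < 40:
        # Assumption: older 12-byte core headers are out of scope for this sketch.
        raise ValueError("unsupported older BMP header variant")
    # The bit count is a 16-bit little-endian value at offset 28.
    (bits,) = struct.unpack_from("<H", header, 28)
    return bits


def file_bits_per_pixel(path: str) -> int:
    """Read just enough of the file at `path` to report its bits per pixel."""
    with open(path, "rb") as f:
        return bmp_bits_per_pixel(f.read(30))
```

Calling `file_bits_per_pixel` on each of the four output files (hindi2-0.bmp to hindi2-3.bmp) should show directly which pages are 1-bit and which are 24-bit.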

PDF to BMP -- how to do with single convert ?

Posted: 2012-06-22T06:28:21-07:00
by myicq
I partly found the answer to the issue myself.

c:\IM>convert -type Grayscale -verbose  -density 200 d:\hindi1.pdf d:\hindi1.PNG

c:\IM>convert -type Bilevel d:\hindi1.png d:\hindi1.bmp
But how do I write this as ONE convert command? I do not need the PNG file; it could just as well exist only in memory.
I tried using a pipe with both the PNG and MIFF formats, but had no luck.

Can anyone help with this?

Re: Efficiently convert PDF to bilevel BMP

Posted: 2012-06-22T10:11:45-07:00
by fmw42
try just

convert -density 200 d:\hindi1.pdf -type Bilevel d:\hindi1.bmp

or

convert -density 200 d:\hindi1.pdf -colorspace gray -type Bilevel d:\hindi1.bmp

Re: Efficiently convert PDF to bilevel BMP

Posted: 2012-06-25T12:16:42-07:00
by myicq
fmw42 wrote: try just
convert -density 200 d:\hindi1.pdf -type Bilevel d:\hindi1.bmp
Tried this, but it did not give the desired result.
or
convert -density 200 d:\hindi1.pdf -colorspace gray -type Bilevel d:\hindi1.bmp
Will try later. Thanks for your suggestion!
Meanwhile I have found http://www.digitzone.com/pdftobmp.html, which IS commercial, but is still command-line and converts in a few seconds per page. I hope to find a way for IM to do the same.

Re: Efficiently convert PDF to bilevel BMP

Posted: 2012-06-25T13:22:02-07:00
by fmw42
Post a link to your pdf file so others can test and verify. If you have a correct result, then post that also.

Re: Efficiently convert PDF to bilevel BMP

Posted: 2012-06-29T04:00:29-07:00
by myicq
fmw42 wrote: Post a link to your pdf file so others can test and verify. If you have a correct result, then post that also.
I have uploaded two PDF examples of 4 pages each, plus the converted documents (8 x BMP), APPROXIMATELY as they need to be.
The BMPs are 3182 x 4687 px at 1 bpp; ideally I would like them 4800 px wide, but this is not extremely important if it is difficult to achieve.
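
Since -density controls how many pixels each inch of the page becomes, a target width can be turned into a density with simple proportion. The sketch below uses the numbers from the verbose output in the first post (1497 px wide at -density 200) and assumes the page is the same width as in hindi1.pdf; `density_for_width` is a hypothetical helper, not an ImageMagick feature.

```python
def density_for_width(target_px: int, known_px: int, known_density: float) -> float:
    """Density that should render the page about `target_px` wide.

    `known_px` is the width observed when rendering at `known_density`.
    """
    page_width_inches = known_px / known_density
    return target_px / page_width_inches


# Numbers taken from the first post: 1497 px wide at -density 200.
density = density_for_width(4800, 1497, 200)
print(f"-density {density:.2f}")
```

For these numbers the result is a density of about 641. Because the observed 1497 px is itself a rounded value, the rendered width may be off by a pixel or so; adding `-resize 4800x` after reading the PDF is one way to force the exact width.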

These BMPs were converted using commercial Windows software, but I would like to use IM if it is faster and better.

Does anyone have examples that can do this?

Files at:
http://ge.tt/8hWfXoJ/v/0?c

Re: Efficiently convert PDF to bilevel BMP

Posted: 2012-06-29T12:11:04-07:00
by fmw42
Your problem is that you have an alpha channel with text in it. The IM ghostscript delegate can only work in one of two ways, as determined by the sDEVICE setting in the delegates.xml file. Assuming you have pngalpha for the device, it can handle transparency, but only one frame at a time. Alternatively, you can process multiple frames all at once with another sDEVICE, but it won't handle transparency.

So assuming you have sDEVICE=pngalpha, you need to process each frame one at a time:

convert -density 288 sankul-ann-hin-9-15-103.pdf[0] -background white -flatten -depth 1 BMP3:sankul-ann-hin-9-15-103_0.bmp
convert -density 288 sankul-ann-hin-9-15-103.pdf[1] -background white -flatten -depth 1 BMP3:sankul-ann-hin-9-15-103_1.bmp
convert -density 288 sankul-ann-hin-9-15-103.pdf[2] -background white -flatten -depth 1 BMP3:sankul-ann-hin-9-15-103_2.bmp
convert -density 288 sankul-ann-hin-9-15-103.pdf[3] -background white -flatten -depth 1 BMP3:sankul-ann-hin-9-15-103_3.bmp


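In the above, I am using a density of 288, which is 4x the default of 72. This comes from the technique of supersampling: generate higher quality from the pdf by specifying a higher density (4x) and then resize afterwards by 1/4 = 25%. Note that the commands above do not include the resize step, so the output stays at the larger size.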
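
The arithmetic behind supersampling can be sketched as follows. This is only an illustration: `supersample_plan` is a made-up helper, and the base density of 72 is assumed to be the default in effect.

```python
def supersample_plan(base_density: int = 72, factor: int = 4):
    """Return the density to render at and the matching resize percentage.

    Rendering at `factor` times the base density and then resizing by
    100/factor percent gives an image of the base size, but with smoother
    detail than rendering at the base density directly.
    """
    render_density = base_density * factor
    resize_percent = f"{100 / factor:g}%"
    return render_density, resize_percent


density, percent = supersample_plan()
print(f"-density {density} ... -resize {percent}")
```

With a factor of 4 this gives the density of 288 used in the commands above, paired with a 25% resize for cases where the final image should be brought back to the base size.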

Alternatively, this also produces depth 1 with the current BMP format:

convert -density 288 sankul-ann-hin-9-15-103.pdf[0] -background white -flatten sankul-ann-hin-9-15-103_0.bmp

see
http://www.imagemagick.org/Usage/formats/#bmp
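
The four per-page commands earlier in this post differ only in the page index, so they can be generated in a loop. The sketch below only builds the argument lists and does not run anything; `per_page_commands` is a hypothetical helper, and the page count has to be supplied by the caller.

```python
from typing import List


def per_page_commands(pdf: str, pages: int, density: int = 288) -> List[List[str]]:
    """Build one convert command per PDF page, mirroring the commands above."""
    stem = pdf.rsplit(".", 1)[0]  # file name without its extension
    commands = []
    for page in range(pages):
        commands.append([
            "convert", "-density", str(density), f"{pdf}[{page}]",
            "-background", "white", "-flatten", "-depth", "1",
            f"BMP3:{stem}_{page}.bmp",
        ])
    return commands


for cmd in per_page_commands("sankul-ann-hin-9-15-103.pdf", 4):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where `convert` is on the PATH, which keeps the one-frame-at-a-time processing that the pngalpha device needs.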