Complex Cropping Problem
Posted: 2013-04-24T14:49:35-07:00
Ok, so here's the scenario:
I have a large archive of PDF files (1000s of pages). These PDFs have *NOT* been built with cropbox, trimbox, etc. defined (le-sigh). Some pages are RGB, some 8-bit gray, some mono, some CMYK, just to make it more fun! The geometry of the desired content is variable, as is the overall image size. They will need to be converted to TIFF for the next part of my process, so we can batch convert them at -density 300 right off. The resulting images are typically in the 3000x6000 pixel range (with quite a bit of variance, and in differing color-spaces as mentioned above).
There are extraneous marks and comments in the margins of the file, primarily at the top and bottom, but NOT ALWAYS. These marks and comments are totally inconsistent from file to file. There are EXACTLY TWO consistent marks on the image: a top-center crop guide and a bottom-center crop guide. Printer's marks for side cropping vary by image color-space and are sometimes absent... (did I mention this is a PITA?). As far as I can tell, the geometry of these top-center and bottom-center guides is consistent even when the page size varies. These guide marks are often flanked by text, or by extraneous marks like ink strips or other inconsistent printer's marks. However, their vertical end points always mark the correct vertical crop location for the desired content.
Goal: find the location of the top-center and bottom-center crop guides, crop the image height at these points, and then figure out how wide to crop it based on dumb image ratios, possibly with a check to make sure we got inside the printer's marks.
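To make the "dumb image ratios" step concrete, here is a minimal sketch of the arithmetic once the two guide positions are known. Everything in it is an assumption for illustration: the function names, the idea that the guides sit on the horizontal center of the content, and the width-to-height ratio passed in. It only computes a crop rectangle and formats it as an ImageMagick-style geometry string (WxH+X+Y); it does not locate the guides.

```python
def crop_box(center_x, top_y, bottom_y, aspect_ratio, page_width):
    """Return (left, top, width, height) for the content crop.

    center_x     -- x position of the center guides (assumed content center)
    top_y        -- y of the lower end of the top guide
    bottom_y     -- y of the upper end of the bottom guide
    aspect_ratio -- assumed content width / height
    page_width   -- width of the rasterized page, used as a sanity check
    """
    if bottom_y <= top_y:
        raise ValueError("bottom guide must be below top guide")
    height = bottom_y - top_y
    width = round(height * aspect_ratio)
    left = center_x - width // 2
    # Reject crops that would fall off either side of the page.
    if left < 0 or left + width > page_width:
        raise ValueError("computed crop does not fit on the page")
    return left, top_y, width, height


def geometry(box):
    """Format a (left, top, width, height) box as WxH+X+Y."""
    left, top, width, height = box
    return f"{width}x{height}+{left}+{top}"


if __name__ == "__main__":
    box = crop_box(center_x=1500, top_y=500, bottom_y=5500,
                   aspect_ratio=0.5, page_width=3000)
    print(geometry(box))
```

The resulting string could be handed to something like `convert rawpage.tif -crop <geometry> +repage cropped.tif`. The check against the side printer's marks would still need to be added separately.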
I'm not so much concerned about speed, just automating the process so man-hours can be spent elsewhere. HOWEVER, I've had a 3.4GHz i7-3770 box chewing on "compare -subimage-search -metric rmse rawpage.tif topcropguide.tif result.png" for over 2 hours now without result...
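For a sense of why the full-page search is so slow: a brute-force subimage search has to test every offset at which the small image fits inside the large one. Since the guides are known to sit top-center and bottom-center, one option is to search only a window around those spots. The sketch below counts candidate offsets for a full page versus a top-center window. The guide size (100x50) and the window percentages are made-up numbers, not measurements from my files.

```python
def search_positions(image_w, image_h, tmpl_w, tmpl_h):
    """Number of offsets at which the template fits inside the image."""
    if tmpl_w > image_w or tmpl_h > image_h:
        return 0
    return (image_w - tmpl_w + 1) * (image_h - tmpl_h + 1)


def top_center_window(page_w, page_h, pct_w=20, pct_h=15):
    """Return (left, top, width, height) of a window at the top center.

    pct_w and pct_h are the window size as integer percentages of the
    page; both defaults are arbitrary illustrative values.
    """
    win_w = page_w * pct_w // 100
    win_h = page_h * pct_h // 100
    left = (page_w - win_w) // 2
    return left, 0, win_w, win_h


if __name__ == "__main__":
    page_w, page_h = 3000, 6000      # typical page size from above
    tmpl_w, tmpl_h = 100, 50         # hypothetical guide image size
    full = search_positions(page_w, page_h, tmpl_w, tmpl_h)
    left, top, win_w, win_h = top_center_window(page_w, page_h)
    windowed = search_positions(win_w, win_h, tmpl_w, tmpl_h)
    print(full, windowed)
    print(f"{win_w}x{win_h}+{left}+{top}")
```

With these assumed sizes the window has roughly 40 times fewer offsets to test than the full page. Any match found inside the window has to be shifted by the window's left/top offset to get page coordinates.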
Any ideas?
Input much appreciated!