Ok, so here's the scenario:
I have a large archive of PDF files (1000s of pages). These PDFs have *NOT* been built with cropbox, trimbox, etc. defined (le-sigh). Some pages are RGB, some 8-bit gray, some mono, some CMYK, just to make it more fun! The geometry of the desired content is variable, as is the overall image size. They will need to be converted to TIFF format for the next part of my process, so we can batch convert them at -density 300 right off. Typical result image resolution is in the 3000x6000 pixel range (with quite a bit of variance, and in differing color-spaces as mentioned above).
There are extraneous marks and comments in the margins of the file, primarily at the top and bottom, but NOT ALWAYS. These marks and comments are totally inconsistent from file to file. There are EXACTLY TWO consistent marks on the image: a top-center crop guide and a bottom-center crop guide. Printer's marks for side cropping vary by image color-space and are sometimes absent... (did I mention this is a PITA?). As far as I can tell the geometry of these top-center and bottom-center guides is consistent even when size varies. These guide marks are often flanked by text, or extraneous marks like ink strips or other inconsistent printer's marks. However, their vertical end points always mark the correct vertical crop location for the desired content.
Goal: find the location of the top- and bottom-center crop guides, crop the image height at these points, and then figure out how wide to crop it based on dumb image ratios, possibly with a check to make sure we got inside the printer's marks.
I'm not so much concerned about speed, just automating the process so man hours can be spent elsewhere.... HOWEVER I've had a 3.4GHz i7-3770 box chewing on "compare -subimage-search -metric rmse rawpage.tif topcropguide.tif result.png" for over 2 hours now without result...
Any ideas?
Input much appreciated!
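For the goal stated above (crop between the two guides, then derive the width from a fixed ratio), the arithmetic might look roughly like the sketch below. The function name crop_geometry, the width:height ratio and the idea of centering the crop on the x position of the guide marks are all assumptions for illustration; nothing in the post fixes them.

```shell
#!/bin/sh
# Sketch only: crop_geometry is a hypothetical helper name.

# Arguments: x of the guide marks, y of the top guide, y of the bottom guide,
# and the desired width:height ratio as two integers.
# Prints a WxH+X+Y geometry string usable with ImageMagick's -crop.
crop_geometry() {
    center_x=$1
    top_y=$2
    bottom_y=$3
    ratio_w=$4
    ratio_h=$5
    h=$(( bottom_y - top_y ))            # height is fixed by the two guides
    w=$(( h * ratio_w / ratio_h ))       # width from the assumed ratio
    x=$(( center_x - w / 2 ))            # assumes content is centered on the guides
    if [ "$x" -lt 0 ]; then x=0; fi      # never start left of the page edge
    echo "${w}x${h}+${x}+${top_y}"
}
```

With made-up coordinates this would be used as: convert rawpage.tif -crop "$(crop_geometry 1800 300 5700 1 2)" +repage cropped.tif. A check against the side printer's marks would still have to be added on top.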
Complex Cropping Problem
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Complex Cropping Problem
Are the crop marks exactly the same size, shape and color? Perhaps you should post (links to) one or two examples so that others can see what you are trying to describe.
Re: Complex Cropping Problem
The crop marks are the same SHAPE and nominally the same color (black, as defined by the color space of the image, on a nominally white background, though sometimes it is emulated paper white, not color-space white), but size is not identical as the scale of the entire page is variable. I don't have access to example images at the moment, but the marks are consistent in their geometry --if not their scale-- as far as I can tell. I have not looked at every image...
- fmw42
Re: Complex Cropping Problem
The only thing I can suggest is to make a template of the crosses and use compare in subimage-search mode to locate the cross marks. But it would work best if the background around the cross were of consistent color.
see
http://www.imagemagick.org/Usage/compare/
http://www.imagemagick.org/script/compare.php
viewtopic.php?f=1&t=14613&p=51076&hilit ... ric#p51076
- snibgo
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Complex Cropping Problem
jawzx wrote:HOWEVER I've had a 3.4GHz i7-3770 box chewing on "compare -subimage-search -metric rmse rawpage.tif topcropguide.tif result.png" for over 2 hours now without result...
Subimage search is slow, but various techniques can accelerate it hugely. For my work (photographs), I generally resize both images to 10% so the search takes about 1% of the time. This gives an approximate location, so I repeat the search at full size but just searching a very small part of the image.
If you can provide a couple of extreme examples, perhaps we can suggest something more specific.
snibgo's IM pages: im.snibgo.com
- fmw42
Re: Complex Cropping Problem
I have done the same thing as user snibgo: reduce the image and template each to 10% of full size, find the match, then repeat the search at full resolution in some reasonable region around the match location. It speeds things up quite a bit if your template image can take the reduction to 10%.
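A minimal sketch of the coarse pass described above, assuming ImageMagick's convert and compare are on the PATH and that each TIFF holds a single page. The helper names (parse_offset, coarse_search) and the temporary file names are made up for illustration; only the idea of reducing both page and template to 10% before the subimage search comes from the thread. compare reports the best match on stderr in the form "metric (normalized) @ x,y", and that offset is what gets parsed out.

```shell
#!/bin/sh
# Sketch only: parse_offset and coarse_search are hypothetical helper names.

# Extract "x y" from compare's stderr report, e.g. "1234.56 (0.0188) @ 152,31".
# Prints nothing if the report contains no "@ x,y" part.
parse_offset() {
    printf '%s\n' "$1" |
        sed -n 's/.*@ *\([0-9][0-9]*\),\([0-9][0-9]*\).*/\1 \2/p'
}

# Find the template in the page at 10% scale; prints the coarse "x y" offset.
coarse_search() {
    page=$1
    template=$2
    tmp=$(mktemp -d) || return 1
    convert "$page" -resize 10% "$tmp/page.png"
    convert "$template" -resize 10% "$tmp/template.png"
    # The result image is discarded (null:); only the stderr report is kept.
    report=$(compare -metric rmse -subimage-search \
        "$tmp/page.png" "$tmp/template.png" null: 2>&1)
    rm -rf "$tmp"
    parse_offset "$report"
}
```

Calling coarse_search rawpage.tif topcropguide.tif would then print the match position in the reduced image, which still has to be scaled back up before the full-size pass.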
Re: Complex Cropping Problem
Thanks! I'll try the 10% reduction, it should still be useful at that size... how do you then search in a defined area for tighter match?
- fmw42
Re: Complex Cropping Problem
jawzx wrote:Thanks! I'll try the 10% reduction, it should still be useful at that size... how do you then search in a defined area for tighter match?
You find the match coordinates, compute the coordinates scaled by 10x, then use that as the center and crop some appropriate size that at least allows the full-size template to move +-10 pixels in each direction. I would allow more than 10 though.
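The coordinate step above could be sketched as follows. The helper name refine_region, the default margin of 20 pixels and the file names in the comments are assumptions. The sketch treats the coarse offset as the top-left corner of the match (which is what compare reports) and pads the template size by the margin on every side.

```shell
#!/bin/sh
# Sketch only: refine_region is a hypothetical helper name.

# Arguments: coarse offset x y (found at 10% scale), full-size template
# width and height, and an optional margin in pixels (default 20).
# Prints the -crop geometry of the search window for the full-size pass.
refine_region() {
    x=$1
    y=$2
    tw=$3
    th=$4
    margin=${5:-20}
    x0=$(( x * 10 - margin ))    # scale back to full size, step out by margin
    y0=$(( y * 10 - margin ))
    # Clamp to the page edge so the geometry never has a negative offset.
    if [ "$x0" -lt 0 ]; then x0=0; fi
    if [ "$y0" -lt 0 ]; then y0=0; fi
    echo "$(( tw + 2 * margin ))x$(( th + 2 * margin ))+${x0}+${y0}"
}

# Example use (file names assumed):
#   region=$(refine_region 152 31 200 80 20)
#   convert rawpage.tif -crop "$region" +repage window.png
#   compare -metric rmse -subimage-search window.png topcropguide.tif null:
# The offset reported for window.png is relative to the window, so the
# window's own +X+Y has to be added back to get page coordinates.
```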
- snibgo
Re: Complex Cropping Problem
Yes, as fmw says. I allow plus-or-minus 20 pixels. In practice, the correct result is rarely outside +-10, and I've never seen it as much as +-15.
In theory, the search at 10% resize could return a false positive that a full-size search wouldn't find. In practice, I've never seen this.
I've even used this technique for searching sound-within-sound. The problem was synchronising the outputs of a video camera and a separate sound recorder. I converted the two sounds to two images, and used ImageMagick to find one image within the other.
snibgo's IM pages: im.snibgo.com
Re: Complex Cropping Problem
Thanks for the hints guys! By carefully optimizing my match image for the crop marks, scaling to 10%, and further subdividing the search area I have cut down the process time per image (including rendering a 300dpi anti-aliased tiff from my PDFs!) to about 45 seconds average per image. My script performs the compare actions for top and bottom crops in parallel using background functions and in combination with xargs and a hot CPU the throughput is frankly better than I could have hoped! Hundreds of man hours will be saved! (And my sanity!) :D
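The actual script was not posted, so the following is only a guess at the general shape of "background functions plus xargs". The names find_guide and process_page, the output file suffixes and bottomcropguide.tif are hypothetical, and find_guide is a placeholder for whatever search routine is really used.

```shell
#!/bin/sh
# Sketch only: find_guide and process_page are hypothetical names.

# Placeholder search: prints compare's report for one page/template pair.
find_guide() {
    compare -metric rmse -subimage-search "$1" "$2" null: 2>&1
}

# Run the top and bottom searches for one page concurrently and store each
# report next to the page file.
process_page() {
    page=$1
    find_guide "$page" topcropguide.tif > "$page.top.txt" &
    find_guide "$page" bottomcropguide.tif > "$page.bottom.txt" &
    wait    # block until both background searches have finished
}

# Fanning out over many pages, assuming the functions above are saved in a
# file called functions.sh (name assumed) and xargs supports -P:
#   find . -name '*.tif' -print0 |
#       xargs -0 -n 1 -P 4 sh -c '. ./functions.sh; process_page "$1"' sh
```

Each page then occupies two processes at a time, so the -P value would normally be chosen with the number of CPU cores in mind.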