Multicropping Dictionary Entries (Based on Whitespace)

rramphal
Posts: 2
Joined: 2012-08-14T22:50:05-07:00

Post by rramphal »

Hi Everyone,

This forum is amazing and I am hoping that someone will be able to help me out with this project:

I have a set of scans of a dictionary. Here are four examples (I left them as links because they are all quite large):
http://img189.imageshack.us/img189/9287/00011y.png
http://img854.imageshack.us/img854/29/03022.png
http://img826.imageshack.us/img826/2843/07452.png
http://img443.imageshack.us/img443/7585/09922.png

I want to run a script on all the files that would multicrop them into their individual entries. Ideally, each image would be split along the red lines shown below and the pieces named sequentially (e.g. 0001.1-01.png, 0001.1-02.png, ..., 0001.1-13.png), so that in the end I would have a set of images, each containing a single dictionary entry.

[Image: sample scan with red lines marking the desired entry boundaries]

I had the idea that I could replace rows of consecutive white pixels with one line and then crop from there (like viewtopic.php?f=1&t=20766). I also found this topic: viewtopic.php?f=1&t=16041 which is similar; however, the particular issue with this project is that there is also whitespace between lines within an entry. It seems as though the difference in the whitespace heights is enough to separate just the entries and not the lines, but I'm not sure how to start. I hope that this is clear enough. I would appreciate any ideas or suggestions!

Thanks!
Ravi
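
The whitespace-height idea in the post above can be sketched roughly as follows. This is only an illustration under stated assumptions: `row_has_ink` is a hypothetical list with one boolean per pixel row (True if the row contains any text), produced however you like from the scan, and `min_gap` is a hypothetical threshold in pixels that you would have to tune so that gaps between entries pass and gaps between lines within an entry do not. The function names are made up for this sketch.

```python
from typing import List, Tuple


def blank_runs(row_has_ink: List[bool]) -> List[Tuple[int, int]]:
    """Return (start_row, length) for every run of consecutive blank rows."""
    runs = []
    start = None
    for y, ink in enumerate(row_has_ink):
        if not ink and start is None:
            start = y                      # a blank run begins here
        elif ink and start is not None:
            runs.append((start, y - start))
            start = None
    if start is not None:                  # blank run reaching the bottom edge
        runs.append((start, len(row_has_ink) - start))
    return runs


def entry_cuts_by_gap(row_has_ink: List[bool], min_gap: int) -> List[int]:
    """Return one cut row in the middle of each interior blank run
    that is at least min_gap rows tall (assumed to separate entries)."""
    cuts = []
    for start, length in blank_runs(row_has_ink):
        # Ignore the page margins at the very top and bottom.
        interior = start > 0 and start + length < len(row_has_ink)
        if interior and length >= min_gap:
            cuts.append(start + length // 2)
    return cuts
```

Whether a single `min_gap` works across all the scans depends on how consistent the spacing is, which would need checking on the real pages.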
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Multicropping Dictionary Entries (Based on Whitespace)

Post by fmw42 »

Are you able to insert the red lines yourself? If so, the first reference you gave will provide the solution. If not, note that each entry starts with some bold characters that begin at the left side of the image, and none of the text that follows in the entry lies in that region. So you should be able to use the first few columns to locate those bold characters. Use -scale to reduce the first few columns to a single column and look for the beginning of each dark region; those mark the bold characters. Then move up by half the distance between the characters and the bottom of the line just above them. That gives you the Y coordinates for the crops; the crops span the full width of the image, so only the Y coordinate changes. Loop over each Y coordinate extracted from the dark areas of the column and do your crops accordingly.
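
A minimal sketch of the coordinate logic described above, in Python rather than as ImageMagick commands. Everything named here is an assumption for illustration: `margin` is a hypothetical list of grayscale values (0 to 255), one per row, standing for the left few columns scaled down to one column; `row_has_ink` is a hypothetical list of booleans saying whether each full-width row contains any text; and the `dark_below` threshold of 128 is arbitrary. How those lists are exported from the scan is left open.

```python
from typing import List


def headword_starts(margin: List[int], dark_below: int = 128) -> List[int]:
    """Return the first row of each dark run in the left-margin column."""
    starts = []
    in_dark = False
    for y, value in enumerate(margin):
        is_dark = value < dark_below
        if is_dark and not in_dark:
            starts.append(y)               # beginning of a dark region
        in_dark = is_dark
    return starts


def cut_rows(starts: List[int], row_has_ink: List[bool]) -> List[int]:
    """For each headword, cut halfway between it and the bottom of the
    text line just above it."""
    cuts = []
    for y in starts:
        top = y
        # Walk up through the blank rows directly above the headword.
        while top > 0 and not row_has_ink[top - 1]:
            top -= 1
        if top == 0:
            continue                       # nothing above: first entry on the page
        cuts.append((top + y) // 2)
    return cuts


def crop_geometries(cuts: List[int], width: int, height: int) -> List[str]:
    """Turn cut rows into full-width WxH+X+Y geometry strings."""
    edges = [0] + cuts + [height]
    return [f"{width}x{b - a}+0+{a}" for a, b in zip(edges, edges[1:]) if b > a]
```

Each returned string is in the WxH+X+Y form that -crop accepts, so a loop over them could write one output file per entry. One caveat with this sketch: a bold character whose strokes leave a lighter row inside it could register as two dark runs, so the threshold or a minimum run height may need adjusting on real scans.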
rramphal

Re: Multicropping Dictionary Entries (Based on Whitespace)

Post by rramphal »

Wow, that is so ingenious! Thank you so much, Fred! I'll try it out and see if I can get it to work.