Multicropping Dictionary Entries (Based on Whitespace)
Posted: 2012-08-14T23:55:23-07:00
Hi Everyone,
This forum is amazing and I am hoping that someone will be able to help me out with this project:
I have a set of scans of a dictionary. Here are four examples (I left them as links because they are all quite large):
http://img189.imageshack.us/img189/9287/00011y.png
http://img854.imageshack.us/img854/29/03022.png
http://img826.imageshack.us/img826/2843/07452.png
http://img443.imageshack.us/img443/7585/09922.png
I want to run a script on all the files that would multicrop them into their individual entries. Ideally, each image would be split along the red lines below, and the pieces would be named sequentially (e.g., 0001.1-01.png, 0001.1-02.png, ..., 0001.1-13.png), so that in the end I would have a set of images, each containing a single dictionary entry.

I had the idea that I could collapse each run of consecutive all-white pixel rows into a single line and then crop from there (like viewtopic.php?f=1&t=20766). I also found this topic, viewtopic.php?f=1&t=16041, which is similar. The particular issue with this project, however, is that there is also whitespace between the lines within an entry. The gaps between entries seem to be noticeably taller than the gaps between lines, so gap height alone might be enough to separate the entries without splitting the lines, but I'm not sure how to start. I hope this is clear enough. I would appreciate any ideas or suggestions!
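A minimal sketch of the gap-height idea in Python, under several assumptions: the scan has already been loaded as a grayscale array of rows (0 = black, 255 = white), the entries run in a single column, and the blank gaps between entries are taller than the gaps between lines. The near-white cutoff `white=250` and the gap threshold `min_gap` are hypothetical values that would need tuning on the real scans, and the function names are illustrative only (they are not part of ImageMagick or the linked threads).

```python
from typing import List, Sequence, Tuple


def blank_rows(pixels: Sequence[Sequence[int]], white: int = 250) -> List[bool]:
    """Mark each row True if every pixel in it is near-white (>= `white`)."""
    return [all(p >= white for p in row) for row in pixels]


def entry_bands(blank: Sequence[bool], min_gap: int) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, one per entry.

    A run of at least `min_gap` blank rows ends the current entry; shorter
    runs (ordinary line spacing) stay inside the entry.
    """
    bands: List[Tuple[int, int]] = []
    start = None      # first inked row of the current entry
    last_ink = -1     # most recent inked row of the current entry
    gap = 0           # length of the current run of blank rows
    for y, is_blank in enumerate(blank):
        if is_blank:
            gap += 1
            if start is not None and gap >= min_gap:
                bands.append((start, last_ink + 1))
                start = None
        else:
            if start is None:
                start = y
            last_ink = y
            gap = 0
    if start is not None:  # close an entry that runs to the bottom edge
        bands.append((start, last_ink + 1))
    return bands


def entry_names(stem: str, count: int) -> List[str]:
    """Sequential output names in the style 0001.1-01.png, 0001.1-02.png, ..."""
    return [f"{stem}-{i:02d}.png" for i in range(1, count + 1)]


if __name__ == "__main__":
    # Tiny synthetic page: rows of 4 pixels, 0 = ink, 255 = white.
    ink, white = [0] * 4, [255] * 4
    page = [ink, white, ink, white, white, white, ink, ink]
    bands = entry_bands(blank_rows(page), min_gap=3)
    print(bands)  # [(0, 3), (6, 8)]
    print(entry_names("0001.1", len(bands)))  # ['0001.1-01.png', '0001.1-02.png']
```

In the synthetic page, the single blank row at index 1 is treated as line spacing, while the three-row gap splits the page into two entries. With Pillow, the grayscale rows could come from `img.convert("L")`, and each band could then be cropped with `img.crop((0, top, img.width, bottom))` and saved under the matching name. If the pages turn out to have two columns, each column would need to be isolated first.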
Thanks!
Ravi