Re: Split images by white space
Posted: 2010-04-21T11:55:10-07:00
by Bonzo
Anthony has some install notes here:
http://www.imagemagick.org/Usage/api/#building
I installed it on a CentOS 5.2 (I think) server using:
Code:
# uninstall old ImageMagick
yum remove ImageMagick
# get new ImageMagick sources
wget ftp://ftp.imagemagick.org/pub/ImageMagick/ImageMagick.tar.gz
# or, since the default version did not work for me, fetch a specific release
wget ftp://ftp.imagemagick.org/pub/ImageMagick/ImageMagick-6.6.0-0.tar.gz
# untar
tar -zxvf ImageMagick*.tar.gz
cd ImageMagick*
# Extra steps recommended by snibgo. I think I managed to install OK without them before, but then started getting a "shared libraries: libMagickCore.so.3" error
export LDFLAGS="-L/usr/local/lib -Wl,-rpath,/usr/local/lib"
export LD_LIBRARY_PATH="/usr/local/lib"
# check which shared libraries convert resolves
ldd /usr/local/bin/convert
# The ldd line above didn't work on one server but did work on another
# End of extra steps
# configure and make
./configure
make
# install
make install
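The libMagickCore.so.3 error mentioned in the comments above means the dynamic loader cannot find ImageMagick's shared libraries. A minimal sketch of a check for that, assuming a Linux system that has `ldd` (as CentOS does); `missing_libs` is a hypothetical helper name, not part of ImageMagick:

```bash
#!/bin/sh
# Hypothetical helper (not part of ImageMagick): list the shared libraries
# that the dynamic loader cannot resolve for a binary, based on ldd output.
# Exit status: 0 = everything resolved, 1 = something missing, 2 = bad argument.
missing_libs() {
  bin="$1"
  if [ ! -x "$bin" ]; then
    echo "not an executable file: $bin" >&2
    return 2
  fi
  # ldd reports unresolved libraries as "libFoo.so.N => not found";
  # keep only the library name (first field) of those lines.
  missing="$(ldd "$bin" 2>/dev/null | awk '/not found/ {print $1}')"
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  return 0
}
```

For example, `missing_libs /usr/local/bin/convert` should print nothing after a good install; if it prints libMagickCore.so.3, the LDFLAGS/LD_LIBRARY_PATH steps above are the fix this post describes.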
Re: Split images by white space
Posted: 2010-04-21T12:16:56-07:00
by hm2k
I manually upgraded.
Code:
[user@blade ~]# convert -version
Version: ImageMagick 6.6.1-4 2010-04-21 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2010 ImageMagick Studio LLC
Features:
That seemed to do the trick:
Code:
[user@blade artwork]# ./multicrop p29cE.jpg p29cE_out.jpg
Processing Image 0
Size: 398x136
Page Geometry: 443x540+17+44
Processing Image 1
Size: 404x145
Page Geometry: 443x540+17+222
Processing Image 2
Size: 404x127
Page Geometry: 443x540+20+393
[user@blade artwork]# ls p29cE*
p29cE.jpg p29cE_out-0.jpg p29cE_out-1.jpg p29cE_out-2.jpg
data:image/s3,"s3://crabby-images/904e0/904e0168ab918ee4c3574d031ad055e4bab3dd1e" alt="Smile :)"
Re: Split images by white space
Posted: 2010-04-21T13:17:02-07:00
by hm2k
I just tried this in production on 10 files and it worked perfectly.
Thanks very much for your assistance.
Keep up the good work.
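Running multicrop over a whole batch of files, as in the production run above, is just a loop. A minimal sketch, assuming multicrop is called as `multicrop INPUT OUTPUT` like in the example run; `out_name` and `batch_split` are hypothetical helper names, and the `_out` naming simply copies that example:

```bash
#!/bin/sh
# Hypothetical helper: derive an output name the way the example run does,
# e.g. p29cE.jpg -> p29cE_out.jpg (multicrop then writes p29cE_out-0.jpg, ...).
out_name() {
  case "${1##*/}" in
    *.*) printf '%s_out.%s\n' "${1%.*}" "${1##*.}" ;;
    *)   printf '%s_out\n' "$1" ;;  # file name has no extension
  esac
}

# Hypothetical helper: run a splitting command (e.g. ./multicrop) on each file
# as "COMMAND INPUT OUTPUT", report failures on stderr and keep going.
# Returns 1 if any file failed, 0 otherwise.
batch_split() {
  splitter="$1"; shift
  status=0
  for f in "$@"; do
    if ! "$splitter" "$f" "$(out_name "$f")"; then
      echo "failed: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage would be something like `batch_split ./multicrop *.jpg`, run in a directory that holds only the originals, since a second run with the same glob would also pick up the earlier `*_out-N.jpg` results.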
Re: Split images by white space
Posted: 2010-04-21T13:18:53-07:00
by fmw42
You're welcome. Glad it was of help.
Re: Split images by white space
Posted: 2014-12-16T10:13:12-07:00
by johnbent
Is anyone still monitoring this really old thread? I have 350+ images that I'd love to split along "large" regions of white space. Can multicrop handle this? I couldn't figure out which arguments to use. Basically, what I have is 350+ scanned pages of a dictionary, and I'd like to convert them to text (I have permission from the copyright holder). It's too much work for me to do myself, so I want to use Mechanical Turk, with one task for each individual word in the dictionary. So is there a way to use multicrop to separate out each word entry in this picture:
data:image/s3,"s3://crabby-images/1899a/1899a9e436184fed563bcceeed6627d1a9a013ab" alt="Image"
Re: Split images by white space
Posted: 2014-12-16T10:24:00-07:00
by snibgo
It is generally best to start a new thread for new questions. By all means, refer back to previous threads.
I would tackle it like this:
1. Deskew each page.
2. Chop off the head and tail (header and footer) of each page.
3. Divide each page into two columns, both trimmed left and right.
4. Divide each column into lines (but with no further trimming).
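Step 4 is essentially a horizontal projection profile: rows with no ink separate one text line from the next. A minimal sketch of that splitting logic, assuming you have already reduced a column image to one number per pixel row (how many dark pixels that row contains); producing those numbers, for example from ImageMagick's `txt:` output, is left out here, and `text_rows` is a hypothetical helper name:

```bash
#!/bin/sh
# Hypothetical helper: find runs of "inked" rows in one column image.
# stdin:  one number per pixel row, top to bottom (dark-pixel count of that row)
# $1:     THRESH, rows with a count above this are treated as text (default 0)
# stdout: "START END" (0-based, inclusive) for each run of text rows
text_rows() {
  awk -v thresh="${1:-0}" '
    {
      row = NR - 1
      if ($1 > thresh) {
        if (!inrun) { start = row; inrun = 1 }   # a new text line begins
        last = row
      } else if (inrun) {
        print start, last                        # blank row closes the line
        inrun = 0
      }
    }
    END { if (inrun) print start, last }         # text ran to the bottom edge
  '
}
```

Each START END pair corresponds to an ImageMagick crop geometry of `WIDTHx(END-START+1)+0+START` using the full column width, so the lines are not trimmed further, as the plan says. A threshold above 0 can help when specks of scanner noise sit between lines.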
Now, you have one image per line in the dictionary. Each image that has a character at the far left is the start of a definition. Each image with white space at the left is a continuation.
So you then join up all the lines for each definition, and send that to the OCR.
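The last two paragraphs amount to a simple grouping rule. A minimal sketch, assuming you have measured, for each line image in reading order, the x offset of its first dark pixel: a line starting within MARGIN pixels of the left edge begins a new definition, and anything indented further continues the current one. `group_entries` and the MARGIN value are hypothetical, and a continuation at the very top of a page (carried over from the previous page) simply gets its own group here:

```bash
#!/bin/sh
# Hypothetical helper: group line images into dictionary entries.
# stdin:  one number per line image, in reading order: x offset of its first ink
# $1:     MARGIN in pixels; offsets <= MARGIN start a new entry
# stdout: "ENTRY LINE" pairs, both 0-based
group_entries() {
  awk -v margin="$1" '
    BEGIN { entry = -1 }
    {
      # Start a new entry at the left margin; also open one if the very
      # first line is indented (no entry exists yet to continue).
      if ($1 <= margin || entry < 0) entry++
      print entry, NR - 1
    }
  '
}
```

Stacking the line images that share an ENTRY number (for example with ImageMagick's `-append`, which joins images top to bottom) would then give one image per definition.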
Re: Split images by white space
Posted: 2014-12-16T10:37:28-07:00
by johnbent
That's a great suggestion! Thanks very much. However, I'm a total newbie to ImageMagick. I'm willing to work to figure out how to do all of the above, but if you know the commands to perform each of those steps automatically for each of the 350+ pages, that would be a much appreciated head start. I think I'll also follow your suggestion and start a new thread for this.
PS: I haven't had good luck with automated OCR on this, since I believe most OCR engines rely on language context, and there is no such context available for Palauan. So my OCR plan is Mechanical Turk (human workers on Amazon).