Re: Split images by white space

Posted: 2010-04-21T11:55:10-07:00
by Bonzo
Anthony has some install notes here: http://www.imagemagick.org/Usage/api/#building

I installed on a CentOS 5.2(?) server using:

# uninstall old ImageMagick
yum remove ImageMagick

# get new ImageMagick sources
wget ftp://ftp.imagemagick.org/pub/ImageMagick/ImageMagick.tar.gz
# or, if the default tarball does not work, fetch a specific release instead
wget ftp://ftp.imagemagick.org/pub/ImageMagick/ImageMagick-6.6.0-0.tar.gz

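# untar (assumes only one of the two tarballs above was downloaded)
tar -zxvf ImageMagick*.tar.gz
# the trailing slash makes the glob match the source directory, not the tarball
cd ImageMagick-*/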

# Extra steps recommended by snibgo. I think I managed to install OK without them before, but I was starting to get a "shared libraries: libMagickCore.so.3" error.

export LDFLAGS="-L/usr/local/lib -Wl,-rpath,/usr/local/lib"
export LD_LIBRARY_PATH="/usr/local/lib"

ldd /usr/local/bin/convert
# The ldd line above failed on one server but worked on another.

# End of extra steps

# configure and make
./configure
make

# install
make install
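
For anyone hitting the same libMagickCore.so.3 error, a check along these lines can be run after make install. It is only a sketch: missing_libs is a made-up helper name, and the path assumes the default /usr/local prefix used above. Running ldconfig as root is a common way to get the dynamic linker to pick up libraries in /usr/local/lib, but whether it is needed depends on the server.

```shell
# Sketch: report shared libraries that a binary cannot resolve.
# missing_libs is a hypothetical helper, not part of ImageMagick.
# It reads ldd output on stdin, prints the unresolved entries,
# and returns non-zero if it found any.
missing_libs() {
    ! grep 'not found'
}

# Assumes the default /usr/local install prefix from the steps above.
bin=/usr/local/bin/convert
if [ -x "$bin" ]; then
    if ldd "$bin" | missing_libs; then
        "$bin" -version
    else
        echo "Unresolved libraries; as root, try: ldconfig /usr/local/lib" >&2
    fi
fi
```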


Re: Split images by white space

Posted: 2010-04-21T12:16:56-07:00
by hm2k
I manually upgraded.

[user@blade ~]# convert -version
Version: ImageMagick 6.6.1-4 2010-04-21 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2010 ImageMagick Studio LLC
Features:

That seemed to do the trick:

[user@blade artwork]# ./multicrop p29cE.jpg p29cE_out.jpg

Processing Image 0
  Size: 398x136
  Page Geometry: 443x540+17+44
Processing Image 1
  Size: 404x145
  Page Geometry: 443x540+17+222
Processing Image 2
  Size: 404x127
  Page Geometry: 443x540+20+393

[user@blade artwork]# ls p29cE*
p29cE.jpg  p29cE_out-0.jpg  p29cE_out-1.jpg  p29cE_out-2.jpg
:)
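
To repeat the run above over a whole directory of scans, a loop along these lines may be enough. It is a sketch under a few assumptions: multicrop sits in the current directory as in the session above, the inputs end in .jpg, and the helper names out_name and batch_multicrop are made up here.

```shell
# Sketch: run multicrop on every .jpg in a directory.
# out_name and batch_multicrop are hypothetical helper names.

# Build the output name passed to multicrop, following the
# p29cE.jpg -> p29cE_out.jpg pattern used above.
out_name() {
    printf '%s_out.jpg\n' "${1%.jpg}"
}

# Process every .jpg in the given directory (default: current one).
batch_multicrop() {
    dir=${1:-.}
    for f in "$dir"/*.jpg; do
        [ -e "$f" ] || continue                    # glob matched nothing
        case $f in *_out-*.jpg) continue ;; esac   # skip earlier results
        ./multicrop "$f" "$(out_name "$f")"
    done
}
```

Calling batch_multicrop with no argument processes the current directory; files that already look like multicrop output (name_out-N.jpg) are skipped so that a second run does not split them again.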

Re: Split images by white space

Posted: 2010-04-21T13:17:02-07:00
by hm2k
I just tried this in production on 10 files and it worked perfectly.

Thanks very much for your assistance.

Keep up the good work.

Re: Split images by white space

Posted: 2010-04-21T13:18:53-07:00
by fmw42
You are welcome. Glad it was of help.

Re: Split images by white space

Posted: 2014-12-16T10:13:12-07:00
by johnbent
Is anyone still monitoring this really old thread? I have more than 350 images that I'd love to split along "large" regions of whitespace. Can multicrop handle this? I couldn't figure out which arguments to use. Basically, what I have is 350+ scanned pages of a dictionary that I'd like to convert to text (I have permission from the copyright holder). It's too much work for me to do myself, so I want to use Mechanical Turk. I'd like to create a task for each individual word in the dictionary. So is there a way to use multicrop to separate out each word entry in this picture:

Image

Re: Split images by white space

Posted: 2014-12-16T10:24:00-07:00
by snibgo
It is generally best to start a new thread for new questions. By all means, refer back to previous threads.

I would tackle it like this:

1. Deskew each page.

2. Chop off head and tail of each page.

3. Divide each page into two columns, both trimmed left and right.

4. Divide each column into lines (but with no further trimming).
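
The four steps above could be sketched with ImageMagick 6 commands roughly as follows. This is a starting point under stated assumptions, not a tested recipe: every pixel size and percentage is a guess to tune against the real scans, the page is assumed to split into columns at its exact middle, and the function names are made up. Lines are found by squashing each pixel row to one pixel and looking for runs of rows that contain ink, so tightly set lines whose descenders touch the next line will come out merged.

```shell
#!/bin/sh
# Sketch only. All pixel sizes and percentages are guesses to tune
# against the real scans; the function names are made up for this post.

# Steps 1-3: deskew, chop header and footer, split into two columns.
# Writes PREFIX-col-0.png (left) and PREFIX-col-1.png (right).
split_page() {
    page=$1 prefix=$2
    convert "$page" -background white -deskew 40% +repage \
        -gravity North -chop 0x120 \
        -gravity South -chop 0x80 +repage \
        "$prefix-body.png"
    # -trim removes blank margins on all four sides, which is slightly
    # more than the left/right trim described in step 3.
    convert "$prefix-body.png" -crop '2x1@' +repage \
        -fuzz 10% -trim +repage "$prefix-col-%d.png"
}

# Print one line per pixel row: 1 if the row contains ink, 0 if blank.
# Each row is squashed to a single pixel; any ink leaves it non-black
# after -negate, and -threshold 0 turns that into pure white.
row_profile() {
    convert "$1" -colorspace Gray -threshold 50% -negate \
        -scale '1x!' -threshold 0 txt:- |
    awk 'NR > 1 { v = ($0 ~ /#[Ff]/) ? 1 : 0; print v }'
}

# Read a 0/1 row profile on stdin and print "TOP HEIGHT" for every
# run of consecutive inked rows (one run per text line, ideally).
ink_runs() {
    awk '
        $1 == 1 && !in_run { start = NR - 1; in_run = 1 }
        $1 == 0 &&  in_run { print start, NR - 1 - start; in_run = 0 }
        END { if (in_run) print start, NR - start }
    '
}

# Step 4: cut a column into full-width line images, PREFIX-000.png etc.
# The full width is kept so that left-hand white space survives.
split_lines() {
    col=$1 prefix=$2 n=0
    w=$(identify -format '%w' "$col")
    row_profile "$col" | ink_runs | while read -r y h; do
        convert "$col" -crop "${w}x${h}+0+${y}" +repage \
            "$(printf '%s-%03d.png' "$prefix" "$n")"
        n=$((n + 1))
    done
}

# Example (hypothetical file names):
#   split_page page-001.png p001
#   split_lines p001-col-0.png p001-left
#   split_lines p001-col-1.png p001-right
```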

Now, you have one image per line in the dictionary. Each image that has a character at the far left is the start of a definition. Each image with white space at the left is a continuation.

So you then join up all the lines for each definition, and send that to the OCR.
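
Assuming one image per dictionary line already exists, in reading order, the start/continuation test and the joining could be sketched like this. The function names, the 10% fuzz, and the def-NNNN.png output names are assumptions for illustration. The left margin is read from the %@ escape, which reports the box that -trim would keep without changing the image, and -append stacks the lines of one definition vertically.

```shell
#!/bin/sh
# Sketch only; function names and the 10% fuzz are assumptions.

# Extract X from a geometry string such as 398x136+17+44.
geometry_x() {
    printf '%s\n' "$1" | sed 's/^[^+]*+\([0-9]*\)+.*$/\1/'
}

# Left margin of a line image in pixels: the X offset of the box that
# -trim would keep (%@ reports it without modifying the image).
left_margin() {
    geometry_x "$(convert "$1" -fuzz 10% -format '%@' info:)"
}

# Read "MARGIN FILE" pairs on stdin, in reading order. Print one
# definition per output line as a space-separated list of files.
# A margin of at most MAX pixels ($1) marks the start of a definition.
group_lines() {
    awk -v max="$1" '
        NR > 1 && $1 <= max { print group; group = "" }
        { group = (group == "" ? $2 : group " " $2) }
        END { if (group != "") print group }
    '
}

# Join the line images given after MAX into def-0000.png, def-0001.png...
# File names must not contain spaces.
join_definitions() {
    max=$1 n=0
    shift
    for f in "$@"; do
        printf '%s %s\n' "$(left_margin "$f")" "$f"
    done | group_lines "$max" | while read -r files; do
        # $files is left unquoted on purpose so it splits into names.
        convert $files -append "$(printf 'def-%04d.png' "$n")"
        n=$((n + 1))
    done
}
```

A call such as join_definitions 5 line-*.png would treat any line whose ink starts within 5 pixels of the left edge as the start of a new definition. A continuation line at the top of a column ends up in a group of its own and would need to be attached to the last definition of the previous column by hand.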

Re: Split images by white space

Posted: 2014-12-16T10:37:28-07:00
by johnbent
That's a great suggestion! Thanks very much. I'm a total newbie to ImageMagick, however. I'm willing to work out how to do all of the above, but if you know any of the command lines that would perform each of those steps automatically for all 350+ pages, that would be a much-appreciated head start. I think I'll also follow your suggestion and start a new thread for this.

PS: I haven't had good luck with automated OCR on this. I believe most OCR engines rely on language context, and there is no such context available for Palauan. So my OCR plan is Mechanical Turk (human workers on Amazon).