Hello dear members,
I am very new to image processing and have a question about cleaning up images before OCR. I am working on scanned mortgage documents, mostly TIFF and PDF images. Tesseract OCR is not giving the expected output because most of the images have noise, punch holes, broken letters, etc.
I am developing this module to run in a Windows environment only. I have gone through Fred's TextCleaner script, but I don't want to use Cygwin to run it.
I need to follow these steps:
1. Find out whether an image really needs cleanup. Passing every image through preprocessing is not a good option because of the extra processing time and the risk of distorting letters.
2. If image really needs processing then process / cleanup the image
3. Pass the image for OCR
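One possible way to approach step 1 is to measure how much of the "ink" on a page is isolated specks. The following is only a minimal sketch under stated assumptions: the page has already been binarized into a list of rows of 0/1 values (1 = dark pixel), and the names `speckle_ratio` and `needs_cleanup` and the 5% threshold are hypothetical choices for illustration, not anything provided by ImageMagick or Tesseract. The threshold would need tuning on real scans.

```python
def speckle_ratio(bitmap):
    """Return the fraction of ink pixels (value 1) that have no ink pixel
    among their 8 neighbours, i.e. that look like isolated specks."""
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    ink = 0
    isolated = 0
    for y in range(height):
        for x in range(width):
            if not bitmap[y][x]:
                continue
            ink += 1
            has_neighbour = False
            # Look at the 8 surrounding pixels, staying inside the image.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and bitmap[ny][nx]:
                        has_neighbour = True
            if not has_neighbour:
                isolated += 1
    return isolated / ink if ink else 0.0


def needs_cleanup(bitmap, threshold=0.05):
    """Hypothetical decision rule: clean up only when more than `threshold`
    of the ink pixels are isolated specks. The default is an assumption."""
    return speckle_ratio(bitmap) > threshold
```

Pages for which `needs_cleanup` returns False could then go straight to OCR, which avoids the extra processing time and the risk of distorting clean text.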
The following basic functions are required for image cleanup:
1. Image scaling
2. Image cropping at the text region
3. Image clipping
4. Image rotation
5. Lines straightening
6. Remove noise
7. Enhance local contrast
8. Autodetection of page orientation (90, 180, and 270 degrees)
9. Automated image de-skewing
10. Image despeckling
I have gone through the ImageMagick forum and other links found through Google, but I could not find a proper answer that gives me command lines I can run as a basic operation on all images, i.e. detect, then clean up. Please help me with this. Thanks in advance.
Image processing for better OCR result
-
- Posts: 3
- Joined: 2014-10-14T06:57:45-07:00
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Location: Sunnyvale, California, USA
Re: Image processing for better OCR result
I do not know how to auto-detect whether an image needs modification, but ImageMagick (IM) has most of the commands you need for your operations. You should read
http://www.imagemagick.org/script/comma ... ptions.php
http://www.imagemagick.org/Usage/reference.html
http://www.imagemagick.org/Usage/
1. Image scaling (see -resize)
2. Image cropping at the text region (see -crop)
3. Image clipping (not sure what you mean by clipping, but see -contrast-stretch)
4. Image rotation (see -rotate)
5. Lines straightening (not sure what you mean here)
6. Remove noise (see -despeckle and -enhance and -morphology open/close)
7. Enhance local contrast (see -lat, though it thresholds)
8. Autodetection of page orientation (90, 180, and 270 degrees) (see -auto-orient, though it relies on the image's EXIF orientation tag, which scans often lack)
9. Automated image de-skewing (see -deskew)
10. Image despeckling (see -despeckle)
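Since the module will run on Windows without Cygwin, several of the options above can be chained in a single ImageMagick call driven from a script. The following is a rough sketch, not a tuned recipe: the function name `build_cleanup_command`, its parameters, and the default executable name `magick` (ImageMagick 7; older installs use `convert`, which on Windows can clash with the system tool of the same name) are assumptions for illustration. The `40%` deskew threshold and the choice and order of operations are starting points to experiment with.

```python
def build_cleanup_command(src, dst, executable="magick", scale_percent=None,
                          deskew=True, despeckle=True, stretch=None):
    """Build an ImageMagick argument list for a basic cleanup pass.

    Only options named in this thread are used: -deskew, -despeckle,
    -contrast-stretch and -resize. Nothing is executed here; the caller
    decides how to run the returned command.
    """
    cmd = [executable, src]
    if deskew:
        # Straighten a slightly rotated scan; 40% is a starting threshold.
        cmd += ["-deskew", "40%"]
    if despeckle:
        # Reduce isolated speckle noise.
        cmd.append("-despeckle")
    if stretch is not None:
        # Example value: "2%" to clip a small share of the darkest pixels.
        cmd += ["-contrast-stretch", stretch]
    if scale_percent is not None:
        # Enlarge small text before OCR, for example 200 for 200%.
        cmd += ["-resize", f"{scale_percent}%"]
    cmd.append(dst)
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which runs without a shell and so avoids quoting problems with Windows paths. For example, `build_cleanup_command("page.tif", "clean.tif", scale_percent=200)` yields the arguments for deskewing, despeckling, and doubling the size of `page.tif`.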
Re: Image processing for better OCR result
Thank you very much for your quick response. I will go through all the links and try to develop a script that serves the purpose. Thanks again.