I've been trying to remove the background from images like the ones you can see below.
[Image]
[Image]
[Image]
My current process is this: first I run a simple hand-made filter to remove some of the noise (it keeps only the black pixels that are surrounded by 8 other black pixels): https://github.com/vkruoso/receita-tool ... aFilter.py - After that I just run Tesseract and hope the result will be good.
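To make the filtering idea concrete, here is a minimal sketch of it. This is not the code from the repository linked above, just an illustration under some assumptions: the image is already binarized into a list of rows where 1 means black and 0 means white, the function name `keep_surrounded_pixels` is made up for this example, and border pixels (which cannot have 8 neighbours) are simply cleared.

```python
def keep_surrounded_pixels(image):
    """Keep a black pixel only if all 8 of its neighbours are black too.

    `image` is a list of rows; 1 means black (ink), 0 means white.
    Border pixels have fewer than 8 neighbours, so they are cleared
    (an assumption made for this sketch).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if image[y][x] != 1:
                continue
            # Collect the 8 neighbours around (y, x), skipping the pixel itself.
            neighbours = [
                image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            ]
            if all(n == 1 for n in neighbours):
                result[y][x] = 1
    return result


# A 3x3 block of ink plus one isolated noise pixel on the right edge.
sample = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 0],
]
filtered = keep_surrounded_pixels(sample)
```

With this rule only the centre of the 3x3 block survives, which shows why the filter also eats thin strokes of the characters and not just the noise.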
I'm providing a free web service that gets information from a government site to make that information easier to access (this really should be provided by the government). With this process I manage to decode the text successfully about 25% of the time, but that's not good enough to provide a good service.
I have very little background in image processing, so I'm hoping someone here can give me some hints on how to approach this particular kind of image.
--
Thanks a lot.