Improve OCR Accuracy

This article describes the factors that affect OCR accuracy.

OCR is a tricky thing. It requires a good, clear document. If the letters are too bold and blur together, the OCR engine will have a hard time telling them apart. Conversely, if the letters are too faint and have "open" sections where strokes have dropped out, the engine will also be thrown off. This is quite common with faxed documents.

Another common problem is extra speckles or "noise" on the scan, which can confuse the engine. Skewed text can also throw it off, since the OCR engine expects text that is relatively horizontal. You will also want to avoid decorative fonts, since these can be hard to recognize.
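To make the idea of "noise" concrete, here is a minimal sketch of speckle removal. It is only an illustration, not the method used by any particular scanner or OCR engine. It assumes the scan has already been reduced to a grid of 1s (black) and 0s (white), and the function name remove_speckles is hypothetical.

```python
def remove_speckles(image):
    """Return a copy of a binary image with isolated black pixels cleared.

    `image` is a list of rows; 1 means black (ink), 0 means white (paper).
    A black pixel is treated as a speckle when none of its 8 neighbours
    is black. This is an assumed, simplified rule for illustration.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]  # leave the input untouched
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            has_black_neighbour = False
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the pixel itself
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == 1:
                        has_black_neighbour = True
            if not has_black_neighbour:
                cleaned[y][x] = 0
    return cleaned


scan = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # a lone dot: likely a speckle
    [0, 0, 0, 1, 1],  # connected pixels: likely part of a letter
    [0, 0, 0, 1, 0],
]
cleaned_scan = remove_speckles(scan)
```

In this example the lone dot is cleared while the three connected pixels are kept. A rule this simple would also erase genuinely small marks such as a tiny period, which is one reason a clean scan is better than cleanup after the fact.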

The best image for OCR is black and white at 200 to 300 dpi. Ideally, it will use standard font faces, like Times or Arial. It should be clear of background noise and have as few images as possible.
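As a quick way to apply these guidelines, the sketch below checks a pair of scan settings against the range given above and works out how many pixels a page produces at a given resolution. The function names and the "bw" mode label are assumptions made for this example; they are not settings from any specific scanner driver or from Enterprise Organizer Pro.

```python
def check_scan_settings(dpi, color_mode):
    """Return a list of warnings for settings outside the suggested range.

    `color_mode` is a hypothetical label; "bw" stands for black and white.
    """
    warnings = []
    if dpi < 200:
        warnings.append("resolution below 200 dpi")
    elif dpi > 300:
        warnings.append("resolution above 300 dpi")
    if color_mode != "bw":
        warnings.append("not black and white")
    return warnings


def page_pixels(width_in, height_in, dpi):
    """Return the (width, height) in pixels of a page scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


print(check_scan_settings(300, "bw"))     # no warnings
print(check_scan_settings(150, "color"))  # two warnings
print(page_pixels(8.5, 11, 300))          # US Letter page at 300 dpi
```

A US Letter page (8.5 x 11 inches) at 300 dpi works out to 2550 x 3300 pixels, and at 200 dpi to 1700 x 2200 pixels.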

Some scanning problems can be cleaned up automatically. For example, many scanners will automatically deskew scans, especially sheet-fed scanners where sheets are sometimes pulled through crooked. Some scanners also have automatic exposure options, which can reduce background noise and make sure that text has the right "weight".
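The "weight" of the text depends largely on where the line is drawn between black and white when the scan is converted. The sketch below illustrates that effect with a simple fixed threshold. It is not how any particular scanner's automatic exposure works, and the function names and sample values are assumptions chosen for the example.

```python
def binarize(gray, threshold):
    """Convert grayscale rows (0 = black, 255 = white) to 1 = ink, 0 = paper.

    Any pixel darker than `threshold` becomes ink.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def ink_ratio(binary):
    """Return the fraction of pixels that are ink (0.0 for an empty image)."""
    total = sum(len(row) for row in binary)
    return sum(sum(row) for row in binary) / total if total else 0.0


# One row across a pen stroke: paper, soft edge, dark core, soft edge, paper.
stroke = [[250, 120, 30, 120, 250]]

thin = binarize(stroke, 100)  # only the dark core counts as ink
bold = binarize(stroke, 200)  # the soft edges count as ink too

print(thin, ink_ratio(thin))
print(bold, ink_ratio(bold))
```

With the lower threshold only the core of the stroke survives, so letters come out thin and may break apart. With the higher threshold the soft edges are kept, so letters come out heavier and may start to blur together.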

Another factor is the OCR engine itself. Enterprise Organizer Pro includes an "Advanced" OCR engine, which has excellent accuracy. As an alternative, if you have Microsoft Office 2003 or newer installed on your machine, you can use the "Microsoft Office Document Imaging" (MODI) engine, which also has very good accuracy and good speed.
