Improve OCR Accuracy

This article describes the factors that affect OCR accuracy.

OCR depends on a good, clear source document. If the letters are too bold and blur together, the OCR engine will have a hard time telling them apart. Conversely, if the letters are too faint and have "open" sections where parts of the strokes drop out, the engine will also misread them. This is quite common with faxed documents.

Another common problem is extra speckles, or "noise", on the scan, which can confuse the engine. Skewed text can also throw it off, since the OCR engine expects text that is relatively horizontal. You will also want to avoid decorative fonts, since these can be hard to recognize.
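To make the idea of scan "noise" concrete, here is a minimal Python sketch of one simple despeckling approach: clearing black pixels that have no black neighbors. It is only an illustration, not how any particular scanner or OCR engine cleans an image. The function name `despeckle`, the `min_neighbors` parameter, and the list-of-rows image format are all assumptions made for this example.

```python
def despeckle(image, min_neighbors=1):
    """Return a copy of a binary image with isolated black pixels cleared.

    image is a list of rows; 1 means black (ink), 0 means white (paper).
    A black pixel is kept only if at least min_neighbors of its 8
    surrounding pixels are also black. (Hypothetical helper for illustration.)
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]  # copy so the input is not modified
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Count black pixels among the 8 neighbors that lie inside the image.
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        neighbors += image[ny][nx]
            if neighbors < min_neighbors:
                cleaned[y][x] = 0  # isolated speck: treat it as paper
    return cleaned


# A vertical stroke (part of a letter) plus one stray speck on the right.
scan = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
cleaned_scan = despeckle(scan)
```

In this example the stroke survives because each of its pixels touches another black pixel, while the lone speck is removed. Larger specks need a stronger filter, and an aggressive setting can erase real detail such as periods or the dots on an "i", which is one reason a clean original is preferable to cleanup after the fact.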

The best image for OCR is black and white, scanned at 200-300 dpi. Ideally, it uses standard typefaces, like Times or Arial. It should be free of background noise and contain as few images as possible.
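If you are not sure what resolution an existing image was scanned at, you can estimate it from the image's pixel dimensions and the size of the original page, since dpi is simply pixels divided by inches. The Python sketch below shows that arithmetic. The function names and the default 200-300 range check are assumptions made for this example, taken from the recommendation above.

```python
def effective_dpi(pixel_width, pixel_height, page_width_in, page_height_in):
    """Estimate scan resolution from image size in pixels and page size in inches.

    Returns the lower of the horizontal and vertical values, since the
    weaker axis limits how much detail is available.
    """
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


def in_recommended_range(dpi, low=200, high=300):
    """Check a resolution against the 200-300 dpi range recommended for OCR."""
    return low <= dpi <= high


# A US Letter page (8.5 x 11 inches) scanned to 2550 x 3300 pixels.
letter_dpi = effective_dpi(2550, 3300, 8.5, 11)

# The same page saved at 850 x 1100 pixels is only 100 dpi.
low_res_dpi = effective_dpi(850, 1100, 8.5, 11)
```

Here `letter_dpi` works out to 300, which is inside the recommended range, while `low_res_dpi` works out to 100, which is likely too coarse for reliable recognition.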

Some scanning problems can be cleaned up automatically. For example, many scanners will automatically deskew scans, especially sheet-fed scanners where sheets are sometimes pulled through crooked. Some scanners also have automatic exposure options, which can reduce background noise and make sure that text has the right "weight".

Another factor is the OCR engine itself. Enterprise Organizer Pro includes an "Advanced" OCR engine, which has excellent accuracy. Alternatively, if you have Microsoft Office 2003 or newer installed on your machine, you can use the "Microsoft Office Document Imaging" (MODI) engine, which also has very good accuracy and good speed.
