Optical Character Recognition


When you scan a document, you create an image file. Even though the image seems to have text, to the computer it is just a picture.

Optical character recognition (OCR) is a process that extracts text from a scanned image by looking for recognizable letters and words. A good OCR engine can extract the text from a scanned image with high accuracy.

Why Bother with OCR?

If your only concern is archiving paper documents as electronic files, OCR may not matter to you. But if you want to copy and paste text from a scan, or do a text search of the scan’s contents, you will need OCR.

Where Does the OCR Text End Up?

If you convert your files to PDF format, QuikFile embeds the OCR text in the PDF file, which makes the PDF searchable.

QuikFile can also create a plain text file, which is just the unformatted OCR text without the original document image.
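
Because the plain text file contains only the OCR text, it can be searched with ordinary tools or a short script. The following is a minimal sketch, not part of QuikFile: the function name find_lines is hypothetical, and it assumes the text file is encoded as UTF-8, which may not match your output settings.

```python
from pathlib import Path


def find_lines(text_path, term):
    """Return (line_number, line) pairs whose text contains `term`, ignoring case."""
    needle = term.casefold()
    hits = []
    # Assumption: the OCR text file is UTF-8. Undecodable bytes are replaced
    # rather than raising an error, so a stray character does not stop the search.
    with Path(text_path).open(encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line.casefold():
                hits.append((number, line.rstrip("\n")))
    return hits
```

For example, calling find_lines with the path to an OCR text file and the phrase "total due" would return each matching line together with its line number. The match is only as good as the OCR text itself, so a misread character in the scan can cause a search to miss.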

How Do I Turn On OCR?

When you set up your job, you’ll be asked whether you want to include OCR as part of the process.

Can I Re-Run OCR?

You can redo OCR by re-converting the file. You’ll need to set up a Conversion Job. Under the Source option, set the input file type to PDF and select the Redo PDFs That Are Already Searchable option. PDF files that already have OCR text will be re-converted: QuikFile will discard the old OCR text, re-run OCR, and embed the new text in the file.

What OCR Engine Is QuikFile Using?

When you set up a Conversion Job, you can choose among several OCR engines. See Optical Character Recognition for more information.



Article ID: 327
Created On: Fri, Jan 3, 2014 at 3:28 PM
Last Updated On: Tue, Jul 29, 2014 at 1:20 AM
Authored by: KB Admin01

Online URL: https://kb.quikbox.com/article.php?id=327