OCR Options


The following options are available when you run OCR:

Make Searchable PDF

Select Make Searchable PDF to run text recognition on the PDF so that it becomes searchable and indexable. You will also be able to highlight and copy text in the PDF.

Process Separators   (Enterprise Organizer Pro Professional)

Enterprise Organizer Pro supports special Separators, which are cover sheets that mark where a new file should start. The Process Separators option tells Enterprise Organizer Pro to split the document at each separator.
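
As a rough illustration of what splitting at separators means, the sketch below groups a scanned stack of pages into separate files. It is not Enterprise Organizer Pro's actual implementation. The function name split_at_separators, the is_separator check, and the decision to drop the separator sheets from the output are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")


def split_at_separators(
    pages: Sequence[Page],
    is_separator: Callable[[Page], bool],
) -> List[List[Page]]:
    """Group pages into documents, starting a new document at each separator.

    Assumptions (not taken from the product): separator sheets are dropped
    from the output, and back-to-back separators do not create empty documents.
    """
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_separator(page):
            # A separator closes the document collected so far.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    # Keep whatever follows the last separator.
    if current:
        documents.append(current)
    return documents
```

For example, a stack of two pages, a separator, and three more pages would come out as two files: one with two pages and one with three.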

Auto-Rotate Pages

Pages that were scanned in the wrong orientation will be rotated so that the text is upright. For example, landscape pages usually have to be scanned as portrait, which leaves the text sideways. Auto-Rotate Pages detects this and rotates the page back to landscape so that it is readable.

Send Text to Word

This option will send the OCR text to your word processor for editing.

Tip:

The text will go to whichever program is registered on your system to open RTF files.

PDF/A-1b   (Enterprise Organizer Pro Professional)

Select this option to create a PDF/A-compatible file. PDF/A is an archival-quality PDF intended for long-term storage. Some government agencies require the PDF/A format.

NOTE: PDF/A does not allow the full range of PDF features. Avoid modifying a PDF/A file after it is created, because edits can introduce elements that are not compatible with PDF/A. PDF/A files are also larger than standard PDF files.

OCR Engine

There are a number of different OCR engines you might be able to use. Each engine has its own strengths and weaknesses:

Standard
This engine comes standard with Enterprise Organizer Pro. Its biggest advantage is speed – the engine is very fast. Its accuracy is typically 80-90% on clean documents – high enough to make your documents searchable. It handles poor images gracefully, but its accuracy degrades as the image quality goes down. This engine only recognizes English characters.
Advanced   (Enterprise Organizer Pro Professional)
The advanced engine is somewhat slower than the standard engine, but its accuracy is much better – usually above 97%. This engine supports and automatically detects Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Spanish, and Swedish. It handles light text and dark, dirty backgrounds quite well.
Microsoft Office (MODI)
Microsoft Office 2003 and 2007 include Microsoft Office Document Imaging, or MODI. MODI offers a fast, capable OCR engine. The MODI engine is actually licensed from ScanSoft, the maker of OmniPage and PaperPort, so its performance is comparable to those engines. It does not recognize rotated pages.
OmniPage
OmniPage is an excellent engine with very high accuracy. However, that accuracy comes at the cost of speed, and you will notice the difference. It will recognize non-English characters if you have the appropriate language packs installed.

IMPORTANT:

These engines will only be available as options if they are detected on your system. Enterprise Organizer Pro does not bundle or install any engine except for the Standard engine and, in some cases, the Advanced engine.

Embedded Text

If you selected the Make Searchable PDF option, you can control how the text gets embedded in your PDF:

Word-Aligned
The OCR text will be embedded invisibly behind the scanned image, with each word positioned behind its image in the scan. With this option, you can actually "select" the words in the scan to copy or highlight them. This is normally the best choice.
Embed at Top of Page
The OCR text will be placed at the top of the page in one continuous, hidden paragraph. This is a good option if you do not need to select and copy text in the PDF. It results in slightly smaller PDF files.

Line Breaks

If you are using the Send Text to Word option, use this setting to tell Enterprise Organizer Pro where to insert line breaks (returns) in the text.

By Paragraph
Enterprise Organizer Pro will try to figure out where paragraphs end based on punctuation. For example, if a "." falls at the end of a line, it’s probably the last sentence in the paragraph. Enterprise Organizer Pro will insert two returns wherever it thinks a paragraph ends. If you choose this option, you should proofread the text to make sure all of the line breaks were handled correctly.
By Line
Enterprise Organizer Pro will preserve the original lines from the document. This means that wherever a line wraps in the document, Enterprise Organizer Pro will insert a line break. Enterprise Organizer Pro will not try to figure out paragraph endings.
None
Enterprise Organizer Pro will not insert any line breaks. The text will come out as one continuous line of text.
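
The three modes can be sketched roughly as follows. This is an illustration of the behavior described above, not the product's code. The function name apply_line_breaks, the mode strings, the set of sentence-ending characters, and the use of a single space to join wrapped lines are assumptions.

```python
# Assumed set of characters that may end a paragraph; the article only
# gives "." as an example.
PARAGRAPH_END = (".", "!", "?")


def apply_line_breaks(lines, mode):
    """Join OCR text lines according to a line-break mode.

    mode is one of "paragraph", "line", or "none" (hypothetical names).
    """
    lines = [ln.strip() for ln in lines if ln.strip()]
    if mode == "line":
        # Preserve the original line wraps.
        return "\n".join(lines)
    if mode == "none":
        # One continuous line of text.
        return " ".join(lines)
    if mode == "paragraph":
        pieces = []
        for i, ln in enumerate(lines):
            pieces.append(ln)
            if i == len(lines) - 1:
                break
            # Punctuation at the end of a line is taken as a paragraph end,
            # which gets two returns; otherwise the line is joined to the next.
            pieces.append("\n\n" if ln.endswith(PARAGRAPH_END) else " ")
        return "".join(pieces)
    raise ValueError(f"unknown mode: {mode!r}")
```

The sketch also shows why proofreading matters in By Paragraph mode: a line that happens to wrap right after an abbreviation such as "Mr." would be treated as the end of a paragraph.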

OCR Timeout

This is the maximum amount of time that Enterprise Organizer Pro will spend looking for text on a single page. Some pages can cause the OCR engine to hang, especially if they contain many non-text elements, such as graphics, or if the image is unclear. If the OCR engine has not finished by the timeout, it gives up and moves on. For reference, a typical, clean page can be recognized in around ten seconds.
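
The give-up-and-move-on behavior can be sketched as a per-page timeout. This is a minimal illustration under stated assumptions, not how Enterprise Organizer Pro is implemented. The names recognize_with_timeout and ocr_document are hypothetical, and recognize stands in for whatever call runs the OCR engine on one page.

```python
import threading


def recognize_with_timeout(recognize, page, timeout_seconds):
    """Run recognize(page), giving up after timeout_seconds.

    Returns the recognized text, or None if the engine did not finish in time.
    """
    result = {}

    def worker():
        result["text"] = recognize(page)

    # A daemon thread will not keep the program alive if the engine hangs.
    # Note: the abandoned thread keeps running in the background; a real
    # product would more likely stop the engine outright (assumption).
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_seconds)  # wait at most timeout_seconds
    if thread.is_alive():
        return None  # timed out: give up on this page
    return result.get("text")


def ocr_document(recognize, pages, timeout_seconds):
    """OCR every page in order, recording None for pages that timed out."""
    return [recognize_with_timeout(recognize, page, timeout_seconds) for page in pages]
```

A page that times out simply contributes no text, and processing continues with the next page.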

Limit OCR to ___ Pages

To make a PDF searchable, you might not need to OCR the entire document. For example, many documents have all of the relevant keywords in the first few pages. If you have a 100-page document, you can save a lot of time by not running OCR on the other 90+ pages. Use the Limit OCR setting to stop OCR after the number of pages you specify.
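
To put a rough number on the savings, the sketch below estimates OCR time with and without a page limit. It assumes about ten seconds per page, the figure quoted for a typical clean page under OCR Timeout. Actual times depend on the engine and the scan quality. The function names, and the idea that a missing limit means "OCR every page", are assumptions for the example.

```python
def pages_to_ocr(total_pages, page_limit=None):
    """Number of pages that would be sent through OCR.

    page_limit=None is assumed to mean "no limit".
    """
    if page_limit is None:
        return total_pages
    return min(total_pages, page_limit)


def estimated_seconds(total_pages, page_limit=None, seconds_per_page=10):
    """Rough OCR time estimate; seconds_per_page is an assumed average."""
    return pages_to_ocr(total_pages, page_limit) * seconds_per_page
```

Under these assumptions, a 100-page document takes about 1,000 seconds to OCR in full and about 100 seconds with a 10-page limit.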

Save As Default

This option will save your current settings as the default OCR settings.



Article ID: 138
Created On: Mon, Oct 28, 2013 at 12:21 PM
Last Updated On: Fri, Jun 20, 2014 at 2:25 PM
Authored by: KB Admin02

Online URL: https://kb.quikbox.com/article.php?id=138