Job Advanced Options

These advanced options are available when you add or edit a conversion job (Jobs tab > New/Edit > Advanced tab).

OCR Options

These options relate to how OCR will be performed.

Make Searchable PDF

Run OCR to make the PDF searchable. If you don’t select this option, no text will be embedded in the PDF.

Process Separators

QuikFile’s separators are cover sheets that mark the beginning of a new document (where a new file should start) and, optionally, where to save that file. QuikFile recognizes separators by special codes printed at the top of the page. If you select the Process Separators option, QuikFile looks for separators as it runs OCR.
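QuikFile's internal logic isn't documented here, but the splitting behavior can be illustrated with a minimal Python sketch. It assumes OCR has already flagged each page as a separator or not (the `Page` class and its `is_separator` and `separator_dest` fields are hypothetical), and it assumes the separator sheet itself is dropped from the output:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Page:
    image: str                            # stand-in for the scanned page image
    is_separator: bool = False            # hypothetical: set when OCR finds a separator code
    separator_dest: Optional[str] = None  # hypothetical: optional save location on the separator

def split_at_separators(pages, default_dest):
    """Return a list of (destination, pages) pairs, starting a new document at each separator."""
    documents = []
    current, dest = [], default_dest
    for page in pages:
        if page.is_separator:
            if current:                   # close the document in progress
                documents.append((dest, current))
            current = []
            # Use the separator's destination if it has one, else the job's output folder.
            dest = page.separator_dest or default_dest
        else:
            current.append(page)
    if current:
        documents.append((dest, current))
    return documents
```

For example, pages `p1`, a separator routed to `Invoices`, `p2`, `p3`, a plain separator, and `p4` would yield three documents: `p1` in the job's output folder, `p2` and `p3` in `Invoices`, and `p4` back in the output folder.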

Auto-Rotate Pages

Pages whose text is not upright will be rotated so that it is. The most common case is landscape pages, which usually have to be scanned as portrait, leaving the text sideways. Auto-Rotate Pages detects that the text is running vertically and rotates the page back to landscape so it’s readable.

Remove Blank Pages

You can automatically drop blank pages from the documents. To identify a blank page, QuikFile looks at how much of the page is white. If it’s almost completely white, QuikFile will consider it blank and drop it from the document.

You can adjust the sensitivity of the blank page removal. Use Blank Page Sensitivity at the bottom of the dialog. At one end of the scale, only completely blank pages will be dropped (no black at all). At the other end of the scale, even pages with some black markings will be dropped. The default setting will drop pages with minor speckling, but anything the size of a letter or number will be kept.
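The "how much of the page is white" test can be sketched in a few lines of Python. This is an illustration, not QuikFile's actual algorithm: it assumes pages are grayscale pixel grids (0 = black, 255 = white), treats anything darker than 128 as a mark, and uses a hypothetical `max_dark_fraction` parameter to stand in for the Blank Page Sensitivity slider:

```python
def dark_fraction(pixels, dark_level=128):
    """Fraction of pixels darker than dark_level (0 = black, 255 = white)."""
    total = sum(len(row) for row in pixels)
    dark = sum(1 for row in pixels for p in row if p < dark_level)
    return dark / total if total else 0.0

def is_blank(pixels, max_dark_fraction=0.001):
    """Treat a page as blank if at most max_dark_fraction of it is dark.

    max_dark_fraction plays the role of the sensitivity setting:
    0.0 drops only pages with no dark pixels at all, while larger
    values also drop pages with some markings.
    """
    return dark_fraction(pixels) <= max_dark_fraction

def drop_blank_pages(pages, max_dark_fraction=0.001):
    """Keep only the pages that are not considered blank."""
    return [page for page in pages if not is_blank(page, max_dark_fraction)]
```

On a 100 x 100 page, five dark speckles give a dark fraction of 0.0005, so the page is dropped at the default threshold, while a 10 x 10 block (roughly the size of a character) gives 0.01 and the page is kept.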

OCR Engine

QuikFile can use a number of different OCR engines. Each engine has its own strengths and weaknesses:

Lite

This is a basic engine whose biggest advantage is speed. Its accuracy is typically above 90% on a clean page with crisp text, which is high enough to make your documents searchable. It handles poor images gracefully, although its accuracy degrades as image quality goes down. This engine recognizes only English characters.

Standard

This engine falls between the Lite and Advanced engines. It is nearly as fast as the Lite engine but more accurate, approaching 100% on clean pages with crisp text. Like the Lite engine, however, its accuracy degrades as image quality goes down. This engine recognizes only English characters.

Advanced

The Advanced engine is somewhat slower than the Lite and Standard engines, but its accuracy is much better, usually at or near 100%. It also handles poor scans well. This engine supports, and automatically detects, Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Spanish, and Swedish.

Microsoft Office (MODI)

Microsoft Office 2003 and 2007 include Microsoft Office Document Imaging, or MODI. MODI includes a fast, capable OCR engine. The MODI engine is actually licensed from ScanSoft, the maker of OmniPage and PaperPort, so its performance is comparable to those engines.

ScanSoft OmniPage

OmniPage is an excellent engine with very high accuracy, but that accuracy comes at the cost of speed, and you will notice the difference. It recognizes non-English characters if you have the appropriate language packs installed.

IMPORTANT:

These engines will only be available as options if they are detected on your system. QuikFile does not bundle or install any engine except for the Standard and Advanced engines.

Embed OCR Text

If you are converting to PDF or running OCR on existing PDF files, the OCR text can be embedded in the PDF to make it indexable and searchable. The PDF file will still have the original scanned image, but it will also have text invisibly embedded on the pages. Use these options to determine if and how your OCR text will be embedded in the PDF:

Do Not Embed

The OCR text will not be embedded in the PDF file. The PDF will not be indexable or searchable. In other words, it will be an "image-only PDF".

Embed Word-Aligned

The OCR text will be embedded behind the scanned image, with each word aligned behind its image in the scan. With this option, you can select words in the scan and copy them.

Embed at Top of Page

The OCR text will be placed at the top of the page in one continuous, hidden paragraph. This is a good option if you don’t need to select and copy text in the PDF, and it produces slightly smaller PDF files.

Page Timeout

Some pages can cause the OCR engine to hang. To let the OCR engine move on past pages like these, you can specify a page timeout. If the engine hasn’t been able to successfully OCR a page before the timeout lapses, it will give up and move on to the next page. Failed pages will be reported in the log.
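The timeout idea can be sketched in Python by running each page in a worker thread and giving up if it doesn't finish in time. This is an illustrative sketch, not QuikFile's implementation: `ocr_page` is a hypothetical stand-in for the OCR engine, and a Python thread cannot be forcibly stopped, so a hung page's thread is simply abandoned (it is a daemon thread, so it won't keep the program alive):

```python
import threading

def ocr_with_timeout(ocr_page, page, timeout_s):
    """Run ocr_page(page) in a worker thread; return its text, or None if it timed out or failed."""
    result = {}

    def worker():
        result["text"] = ocr_page(page)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        return None          # timed out: abandon the hung thread and move on
    return result.get("text")  # None if the worker raised before storing a result

def ocr_document(ocr_page, pages, timeout_s, log):
    """OCR every page, recording pages that failed or timed out in log."""
    texts = []
    for number, page in enumerate(pages, start=1):
        text = ocr_with_timeout(ocr_page, page, timeout_s)
        if text is None:
            log.append(f"Page {number}: OCR failed or timed out after {timeout_s}s")
        texts.append(text)
    return texts
```

A production engine would more likely run OCR in a separate process that can be terminated, since an abandoned thread keeps consuming resources until it finishes.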

Other Options

Job Priority

You can give one job priority over other jobs. A higher priority will let the job’s files skip to the front of the line in the conversion queue.

This is useful in situations like the following. Suppose you have two jobs: one is converting the company’s old repositories and the other is converting new scans coming off the network scanner. Normally, QuikFile converts files in the order it receives them, so new scans could get stuck behind hundreds or thousands of old files. By giving the network scanner job a higher priority, its files can jump straight to the front of the line.
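A conversion queue that behaves this way is a priority queue with first-in, first-out ordering among files of equal priority. The following is a minimal Python sketch of that idea, not QuikFile's internal code; the `ConversionQueue` class and its method names are hypothetical:

```python
import heapq
import itertools

class ConversionQueue:
    """Files from higher-priority jobs come out first; equal priorities stay in arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # increasing counter used as a tie-breaker

    def add(self, filename, priority=0):
        # heapq is a min-heap, so priority is negated to pop the highest first;
        # the arrival number keeps equal-priority files first in, first out.
        heapq.heappush(self._heap, (-priority, next(self._arrival), filename))

    def next_file(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

With two old files queued at the default priority and a new scan added afterward at priority 10, `next_file()` returns the new scan first, then the two old files in the order they arrived.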

Line Break Options

Use these options to tell QuikFile where to insert line breaks (hard returns) in the OCR text.

NOTE: These options are only relevant if you choose Text as the output file format. When you convert to PDF and embed hidden text, the OCR text will always be character-aligned behind the image.

By Paragraph

QuikFile will try to figure out where paragraphs end based on punctuation. For example, if a "." falls at the end of a line, it’s probably the last sentence in the paragraph. QuikFile will insert two returns wherever it thinks a paragraph ends. If you choose this option, you should proofread your OCR text to make sure all of the line breaks were handled correctly.

By Line

QuikFile will preserve the original lines from the document. This means that wherever a line wraps in the document, QuikFile will insert a line break. QuikFile will not try to figure out paragraph endings.

None

QuikFile will not insert any line breaks. The OCR text will come out as one continuous line.
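The three line-break modes can be compared side by side in a short Python sketch. It only illustrates the behavior described above and is not QuikFile's actual algorithm; the `join_ocr_lines` function and its mode names are hypothetical, and the paragraph heuristic simply treats a line ending in ".", "!" or "?" as the end of a paragraph:

```python
SENTENCE_END = (".", "!", "?")

def join_ocr_lines(lines, mode):
    """Join OCR text lines using mode "paragraph", "line", or "none"."""
    lines = [line.strip() for line in lines]
    if mode == "line":
        # By Line: keep a break wherever the original line wrapped.
        return "\n".join(lines)
    if mode == "none":
        # None: one continuous line (a space keeps words from running together).
        return " ".join(lines)
    if mode == "paragraph":
        # By Paragraph: a line ending in sentence punctuation probably ends
        # a paragraph, so close it there and separate paragraphs with two returns.
        paragraphs, current = [], []
        for line in lines:
            current.append(line)
            if line.endswith(SENTENCE_END):
                paragraphs.append(" ".join(current))
                current = []
        if current:
            paragraphs.append(" ".join(current))
        return "\n\n".join(paragraphs)
    raise ValueError(f"unknown mode: {mode!r}")
```

The paragraph heuristic shows why proofreading is advisable: a line that happens to end with an abbreviation such as "etc." would be treated as a paragraph ending even in the middle of a paragraph.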

Output Date Stamp

If you want to preserve a file’s original Windows created/modified timestamps, select Use Original Date/Time. Otherwise, your converted files will be stamped with the date and time they were converted.
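As an illustration of what "use the original date/time" means at the file-system level, the following Python sketch copies an original file's access and modification times onto its converted file. It is an assumption-laden sketch, not how QuikFile does it: `os.utime` does not change a file's creation time, which on Windows requires a platform-specific API:

```python
import os

def copy_timestamps(original, converted):
    """Give the converted file the original's access and modification times.

    Note: this does not copy the creation time; setting that on Windows
    requires a platform-specific API.
    """
    st = os.stat(original)
    os.utime(converted, ns=(st.st_atime_ns, st.st_mtime_ns))
```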

Miscellaneous Settings

Automatically Begin New Document Every ___ Pages

Automatically splits input files into separate documents at the page interval you specify.
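Splitting at a fixed page interval amounts to chunking the page list. A minimal Python sketch, where the `split_every` name is hypothetical: with an interval of 3, a 7-page file becomes documents of 3, 3, and 1 pages, since the last document holds whatever pages remain.

```python
def split_every(pages, n):
    """Split a list of pages into consecutive documents of at most n pages."""
    if n < 1:
        raise ValueError("page interval must be at least 1")
    return [pages[i:i + n] for i in range(0, len(pages), n)]
```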

Documents Routed by Separators are Also Routed to Job Destinations

Normally, if you use separators, a document is saved to the location specified on the separator instead of the job’s output folder. This option saves converted files to both locations.

Skip Input File if Output Destination Already Exists

If the output folder already contains a converted file with the same name as a file in the input folder, that input file will be skipped.
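The check can be sketched in Python by comparing each input file's base name against the output folder. This is a sketch under a stated assumption: it treats "the same name" as the same base name with the output format's extension (here `.pdf`), which may not match QuikFile's exact rule. The `files_to_convert` function is hypothetical:

```python
from pathlib import Path

def files_to_convert(input_dir, output_dir, output_ext=".pdf"):
    """Yield input files that do not already have a converted counterpart in output_dir."""
    out = Path(output_dir)
    for src in sorted(Path(input_dir).iterdir()):
        # Assumption: the converted file keeps the input's base name with a new extension.
        if src.is_file() and not (out / (src.stem + output_ext)).exists():
            yield src
```

For example, if the input folder holds `a.tif` and `b.tif` and the output folder already holds `a.pdf`, only `b.tif` is yielded for conversion.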

Make Output PDFs PDF/A-1b Compliant

Saves new PDFs in conformance with the PDF/A-1b archival standard.

Specify a Backup Path

Before converting a file, QuikFile always tries to make a backup copy of the original. If a conversion has errors, or if you want to redo a conversion, you will need a copy of the original file. Also, no computer operation is foolproof, so keeping backups is always wise.

Specify a folder where QuikFile can put the backup copies. This folder should not be the source folder of any other conversion job; otherwise, the backup copies will get converted. If you don’t specify a backup folder, QuikFile will use the default backup folder (see General Settings). If QuikFile is also converting files in subfolders, it will duplicate the folder structure so that backup copies keep their original layout.
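Mirroring the folder structure in the backup folder can be sketched in Python as follows. This is an illustration only, not QuikFile's code: the `backup_copy` function is hypothetical, and the guard rejecting a backup folder inside the source folder reflects the advice above. It requires Python 3.9 or later for `Path.is_relative_to`:

```python
import shutil
from pathlib import Path

def backup_copy(src, source_root, backup_root):
    """Copy src into backup_root, mirroring its path relative to source_root."""
    # A backup folder inside the source folder would itself get picked up for conversion.
    if Path(backup_root).resolve().is_relative_to(Path(source_root).resolve()):
        raise ValueError("backup folder must not be inside the source folder")
    src = Path(src)
    dest = Path(backup_root) / src.relative_to(source_root)
    dest.parent.mkdir(parents=True, exist_ok=True)  # recreate the subfolder layout
    shutil.copy2(src, dest)                         # copy contents and timestamps
    return dest
```

For example, backing up `scans/2014/jan/doc.tif` with source root `scans` and backup root `backup` produces `backup/2014/jan/doc.tif`.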

IMPORTANT:

The burden is on you to ensure that you have reliable backups of your files. While QuikFile does its best to preserve your originals, they can still be deleted, or the backup may fail. You should never rely on a single backup solution.
