Zone OCR Tab

An OCR zone is a part of a page where optical character recognition is performed to find barcodes or simple text. The recognized barcode value or the recognized text in the zone (called the extracted data of the zone) can be used as metadata in any other property field, for example, Bates stamp, PDF properties, file or folder name.

An OCR zone can be applied on all pages or on a specified page of the incoming pages or the output documents. The OCR zone can be filtered and, based on its extracted data, the output document can be split or pages can be removed from the output document.

Nuance recommends performing scans using 300 dpi resolution, and Black&White color mode, as this combination provides the best recognition results.

 

 

 

The above screenshot displays the list of already defined OCR zones. You can view and edit all properties of the OCR zones on this tab. Alternatively, the separate Preview window provides a more convenient way to edit the OCR zones.

Use the Preview button to open the Preview window. The first time this window is opened, an example document has to be selected. On later openings, the window tries to load the same document again. The example document can be a TIFF, JPEG or PDF file with single or multiple pages. It is recommended to use an example document similar to an everyday work document (in the number of pages and position of the texts) so that the OCR zones can be defined as accurately as possible.

 

The above screenshot shows the Preview window with an example document already loaded. There are four parts of the window: Preview area, Toolbar, Status bar and the OCR zone properties pane.

 

Preview area: Shows a page of the example document with the specified zooming level. The OCR zones can be selected by clicking on them. The selected OCR zone can be moved or resized with the mouse. A new OCR zone can be created simply by drawing a rectangle with the mouse over the example document. The properties of the selected OCR zone can be changed in the OCR zone properties pane on the right side of the window. The selected OCR zone can be deleted by pressing the Del key or selecting the Delete option in the context menu of the zone.

 

Status bar: Located at the bottom of the window. Shows information about the page displayed in the preview area. From left to right, the information is:

  • Page: Shows the page number of the currently visible page of the example document and the page count of the example document.
  • Coordinates: Shows the X and Y coordinates of the mouse cursor relative to the top-left corner of the image in the preview area.
    These values are reflected in the measurement unit selected in the toolbar.

 

Toolbar: The following buttons are located in the toolbar, going from left to right:

  • Open: Load another example document.
  • Zoom in: Enlarge the view of the example document in the preview area.
  • Zoom out: Reduce the view of the example document in the preview area.
  • Fit width: Zoom the currently visible page of the example document in or out so that it occupies the whole width of the preview area.
  • Fit height: Zoom the currently visible page of the example document in or out so that it occupies the whole height of the preview area.
  • Fit whole: Zoom the currently visible page of the example document so that the whole page is visible in the preview area.
  • Previous page: Show the previous page of the example document in the preview area. This button is disabled when displaying the first page of the document.
  • Next page: Show the next page of the example document in the preview area. This button is disabled when displaying the last page of the document.
  • Measurement unit: Switch the measurement unit between dots, mm and inch. The selected unit affects the following:
    • Coordinates in the status bar;
    • Location properties of the OCR zones;
    • Size properties of the OCR zones.
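
The relationship between the three measurement units can be sketched as follows. This is only an illustration, assuming that one dot corresponds to one pixel of the scanned image and that the scan uses the recommended 300 dpi resolution; the function name dots_to is hypothetical and is not part of the product.

```python
MM_PER_INCH = 25.4


def dots_to(value_in_dots: float, unit: str, dpi: int = 300) -> float:
    """Convert a length given in dots to "dots", "mm" or "inch".

    Assumption: one dot equals one pixel at the given scan resolution.
    """
    if unit == "dots":
        return float(value_in_dots)
    inches = value_in_dots / dpi
    if unit == "inch":
        return inches
    if unit == "mm":
        return inches * MM_PER_INCH
    raise ValueError(f"unknown unit: {unit}")


# A zone that is 600 dots wide on a 300 dpi scan
print(dots_to(600, "inch"))  # 2.0
print(dots_to(600, "mm"))    # 50.8
```

Under these assumptions, the same zone has different Location and Size numbers depending on the selected unit, while its position on the page stays the same.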

 

OCR zone properties pane: Here the properties of the selected OCR zone can be reviewed and changed. The following properties are available:

  • Metadata: Name of the metadata containing the recognized value of the OCR zone. By default, a name is generated automatically (OCRZONEx). The name cannot contain spaces.
    It is a combo box, which contains the already defined OCR zones. If an OCR zone is selected here, it is also selected in the preview area.
  • Type: Type of the data in the OCR zone. It can be Barcode or Text.
  • Barcode: The barcode format expected in the OCR zone. This property is only available if the Type is Barcode.
    The following barcode formats are supported: Codabar, Code 128, Code 2 of 5, Code 2 of 5 interleaved, Code 39, Code 39 extended, Code 93, EAN-13, EAN-8, PDF-417, UPC-A, and UPC-E.
    Note: By convention, Code 39 True Type barcode fonts use the asterisk (*) as the start/stop character, for example: *1234ABCD*. Ensure that you use the asterisk (*) as the start/stop character when utilizing the free Code 39 barcode font that is supplied in the installation package.
    Note: To use Code 2 of 5 without a header or trailer, select the Code 2 of 5 option.
    Note: Some barcode types should not be used together on the same page, as combining them can cause complications. Take the following recommendations into consideration:
    • Use the BAR_POSTNET barcode type alone.
    • Do not use 1D and 2D barcode types together in the same OCR zone.
    • Do not use BAR_UPC_A together with BAR_EAN in the same OCR zone.
    • Do not enable BAR_C128 together with BAR_UCC128 in the same OCR zone.
  • Language: Selected language of the text in the OCR zone. This property is only available if the Type is Text.
    Setting the language correctly significantly helps the OCR engine recognize the text in the zone. A large number of OCR languages is available.
  • Auto-correct text: Toggles the use of the OCR engine's text recognition dictionary. The default and recommended setting is On.
  • Direction: Direction of the barcode in the OCR zone. This property is only available if the Type is Barcode.
    The following directions are supported: Horizontal, Vertical, and Horizontal or Vertical. Only barcodes in the selected direction are recognized.
  • Job separation: The OCR zone can be used to separate the incoming pages and generate multiple output documents. The selected separation option is performed only if the OCR zone has non-empty extracted data.
    The available options are:
    • None: The output document is not separated because of this OCR zone.
    • Beginning of job: The page where this zone has extracted data will be the first page of an output document. The previous page will be in a separate output document.
    • End of job: The page where this zone has extracted data will be the last page of an output document. The next page will be in a separate output document.
    The Filter setting affects this option, because it can modify the extracted data of a zone. For example, if a zone has extracted data but after applying the filter the data becomes empty, job separation is not performed because of that zone.
    If the On Page property of the OCR Zone is a page number, the only valid option for Job separation is None.
  • Filter: Apply a filter on the extracted data of the OCR zone.
    The available options are:
    • None: No filter is used; the whole extracted data is stored in the metadata.
    • Simple: A simple text filter can be used combining fixed values and wild cards. The filter can be set in the textbox below the Filter combo box. The filtered extracted data of the zone is returned, not the whole data.
      Supported wild cards: An asterisk (*) can be used for zero or more characters; a question mark (?) represents a single character.
      For example: Filter "Dep*" returns the extracted data of the zone, if it starts with "Dep". Filter "Invoice???" returns this part of the extracted data, if it starts with "Invoice" and there are 3 characters after it, for example, "Invoice001".
    • Regular Expression: A Regular Expression can be used to filter the extracted data. The rule expression can be set in the textbox below the Filter combo box. The filtered extracted data of the zone is returned, not the whole data.
      Please refer to How to use Regular Expressions or Wikipedia for more information about Regular Expressions.
      Note: Depending on the purchased license, the Regular Expression option may not be available for this installation.
    Additional settings help to make the filter as accurate as possible. These options are only available if the Filter option is Simple or Regular Expression.
    • Case sensitive: Matching of the filter will handle the upper-case and lower-case letters differently.
    • Remove spaces: Spaces are removed from the extracted data of the OCR zone. Removing spaces can influence the filter result.
  • On page: Define a page number, where the OCR zone will be used. The available options are:
    • All: The OCR zone will be applied to all pages of a job.
    • A page number: The OCR zone will be applied just on a specified page.
      The available page numbers depend on the example document loaded in the Preview window (this is one of the reasons to use an example document that is similar to a regular job).
      If a page number is selected, the appropriate page of the example document will be shown in the preview area.
      If the Job separation setting is Beginning of job or End of job, this option cannot be selected.
    During processing there are two kinds of documents: the incoming pages and the output document(s). The selected page number can refer to either of them, so the additional Based on setting defines which one is meant. It is only available if the On page setting is a page number:
    • Input: The selected page number refers to the incoming pages. It means that the OCR zone will be applied just on one specified page.
    • Output: The selected page number refers to the output document(s). It means that the OCR zone will be applied zero or more times, depending on the number of pages in the output document(s).
    Pages not saved in the output document(s) (that is, either the Remove page option of the OCR zone or blank page removal is used) are also counted.
    As an example, assume there are 4 incoming pages and two OCR zones. OCR zone #1 has On page set to 3 and Based on set to Input. OCR zone #2 has On page set to 1 and Based on set to Output. Because of other settings (e.g. blank page job separation), two output documents are created: one containing the first two incoming pages and one containing the last two. OCR zone #1 is applied only on the 3rd incoming page. OCR zone #2 is applied on the 1st and 3rd incoming pages, because it is applied on the 1st page of every output document.
  • Keep value: When a job is processed, multiple output documents may be generated (e.g. the selected output format is JPEG, or blank page separation is used). By default, if an OCR zone is recognized successfully on a page, its extracted data can be used only for the output document to which that page belongs. If this option is selected, the extracted data can also be used for all subsequent output documents.
  • Remove page: If the OCR zone has extracted data on a page, that page is not saved in the output document(s).
    If this option is selected, the background color of the OCR zone becomes red.
  • Full page: The OCR zone covers the whole page regardless of the page size. It is useful if there is just one barcode on a page, but the position of the barcode cannot be defined in advance.
  • Location: Its textboxes show the X and Y coordinates of the upper left corner of the OCR zone. It is possible to type in values instead of moving the zone with the mouse.
    These values are reflected in the measurement unit selected in the toolbar.
  • Size: Its textboxes show the width and height of the OCR zone. It is possible to type in values instead of resizing the zone with the mouse.
    These values are reflected in the measurement unit selected in the toolbar.
  • Test OCR zone: Click this button to test the recognition of the OCR zone. The extracted data of the OCR zone is shown in a message box. The filter is applied on the extracted data, so with this button the filter can be tested as well.
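
The behavior of the Simple filter described above can be approximated with the following sketch. It is not the product's implementation. It assumes that the pattern is matched from the start of the extracted data, that the matched part is what gets stored, and that matching is case-insensitive unless Case sensitive is selected. The function name simple_filter and its parameters are hypothetical.

```python
import re


def simple_filter(extracted: str, pattern: str,
                  case_sensitive: bool = False,
                  remove_spaces: bool = False) -> str:
    """Return the part of `extracted` matched by a wildcard pattern, or ""."""
    if remove_spaces:
        # Spaces are removed before the filter is evaluated.
        extracted = extracted.replace(" ", "")
    # Translate wildcards: * is zero or more characters, ? is exactly one.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    flags = re.DOTALL | (0 if case_sensitive else re.IGNORECASE)
    match = re.match("".join(parts), extracted, flags)
    return match.group(0) if match else ""


print(simple_filter("Department 12", "Dep*"))           # Department 12
print(simple_filter("Invoice001 March", "Invoice???"))  # Invoice001
print(simple_filter("Order 77", "Dep*") == "")          # True
```

An empty result matters for the other properties: according to the Job separation description, a zone whose filtered data is empty does not trigger separation.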

 

When the OCR zones are ready, click the OK button to save the zones. The list on the Zone OCR tab will reflect the changes.
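
To double-check On page and Based on settings before saving, the example with 4 incoming pages given for the On page property can be traced with a small sketch. It only models the page selection rule as described in this article; the function pages_for_zone is hypothetical, and each output document is assumed to be listed with all pages assigned to it, including pages that are later removed.

```python
def pages_for_zone(output_docs, on_page, based_on="Input"):
    """Return the incoming page numbers (1-based) where a zone is applied.

    output_docs: list of output documents, each a list of incoming page numbers.
    on_page: "All" or a 1-based page number.
    based_on: "Input" or "Output" (only relevant if on_page is a number).
    """
    all_pages = [page for doc in output_docs for page in doc]
    if on_page == "All":
        return all_pages
    if based_on == "Input":
        # The number refers to the incoming pages: at most one page matches.
        return [page for page in all_pages if page == on_page]
    # Output: the n-th page of every output document that has at least n pages.
    return [doc[on_page - 1] for doc in output_docs if len(doc) >= on_page]


# 4 incoming pages split into two output documents of two pages each
docs = [[1, 2], [3, 4]]
print(pages_for_zone(docs, 3, "Input"))   # [3]     (OCR zone #1)
print(pages_for_zone(docs, 1, "Output"))  # [1, 3]  (OCR zone #2)
```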

 

 

 
