Extraction Rules

Extraction rules let you attach additional rules to a field to improve how its value is extracted.

How to set them up?

Extraction rules are configured in the advanced tab of the document type configuration menu. Each field can have its own set of rules.

When adding extraction rules, you can either add them only for the current tenant (a) or add them to the root of the field (b), which enables the rules for all tenants where the field is used. Adding rules to the field root is only available for individual fields.

Conditions and Actions

An extraction rule has two parts: conditions and actions. The conditions define when the rule applies, and the actions are the changes made when all the conditions are met.

Conditions 

First, define a condition. A condition can combine several parameters; the following types are available:

  • Coordinates
    Defines a region of the document in which the value must be found. Switching the condition to outside inverts it, so the value must lie outside the region. Coordinates range from 0 to 1.
  • Pages
    Defines the pages on which the value must be found. The condition can also be switched to outside, so that only values outside the specified pages match. Page numbering starts at 0, so page 0 is the first page. You can specify individual pages or a range of pages.
  • Regex
    Defines a regular expression the value must match. The condition can also be switched to do not match, in which case it is true only if the value does not match the pattern.
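To make the three condition types concrete, here is a conceptual sketch in Python. It is not the Parashift Platform's API: the names `Candidate`, `coordinates_condition`, `pages_condition` and `regex_condition` are hypothetical, and it assumes that a candidate counts as "inside" a region only when its whole bounding box lies within it, and that a regex must match the entire value.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical extraction candidate: text, 0-based page, box in 0-1 coordinates."""
    text: str
    page: int
    box: tuple  # (x0, y0, x1, y1), each between 0 and 1


def coordinates_condition(c, region, outside=False):
    """True if the candidate's box lies fully inside `region`; inverted when outside=True."""
    x0, y0, x1, y1 = region
    inside = x0 <= c.box[0] and y0 <= c.box[1] and c.box[2] <= x1 and c.box[3] <= y1
    return inside != outside


def pages_condition(c, pages, outside=False):
    """True if the 0-based page is in `pages` (a set or range); inverted when outside=True."""
    return (c.page in pages) != outside


def regex_condition(c, pattern, do_not=False):
    """True if the whole value matches `pattern` (assumed full match); inverted when do_not=True."""
    matched = re.fullmatch(pattern, c.text) is not None
    return matched != do_not


invoice_no = Candidate(text="INV-2023-001", page=0, box=(0.6, 0.05, 0.9, 0.1))
print(coordinates_condition(invoice_no, (0.5, 0.0, 1.0, 0.2)))  # True
print(pages_condition(invoice_no, range(1, 3), outside=True))   # True: page 0 is outside pages 1-2
print(regex_condition(invoice_no, r"INV-\d{4}-\d{3}"))          # True
```

Each "switched" variant (outside, do not match) is modeled as a boolean flag that simply inverts the result, which mirrors how the options are described above.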

Conditions can be combined into a set that has to be matched, and grouped into blocks.

For each block, you specify whether one, many, none, or all of its conditions must be met.
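The block options can be sketched as a small evaluator over the true/false results of each block's conditions. This is a conceptual model, not the platform's implementation: the names `Mode`, `block_satisfied` and `rule_fires` are hypothetical, and the exact meaning of "one" (assumed here: exactly one) and "many" (assumed here: at least one) is an assumption. It also assumes the rule's actions run only when every block is satisfied.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical reading of the four block options; the platform's semantics may differ.
    ONE = "one"    # exactly one condition is true
    MANY = "many"  # at least one condition is true
    NONE = "none"  # no condition is true
    ALL = "all"    # every condition is true


def block_satisfied(results, mode):
    """Evaluate one block from the boolean results of its conditions."""
    hits = sum(results)  # True counts as 1, False as 0
    if mode is Mode.ONE:
        return hits == 1
    if mode is Mode.MANY:
        return hits >= 1
    if mode is Mode.NONE:
        return hits == 0
    return hits == len(results)  # Mode.ALL


def rule_fires(blocks):
    """Assumed: the actions run only when every (results, mode) block is satisfied."""
    return all(block_satisfied(results, mode) for results, mode in blocks)


blocks = [
    ([True, True], Mode.ALL),  # e.g. a pages condition and a coordinates condition
    ([False], Mode.NONE),      # e.g. a regex condition that must not hold
]
print(rule_fires(blocks))  # True
```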

Actions

If all conditions are fulfilled, an action is triggered. Two actions are available: Confidence and Label.

  • Confidence 
    The confidence action changes the confidence of the label (extraction candidate): you can increase it, decrease it, or set it to a specific value.
  • Label (extraction prediction)
    The label action creates or removes labels. Here, a label means an extraction prediction, so this action either adds a new extraction prediction or removes an existing one. You must specify a confidence level for the label.
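The effect of the two actions on a list of predictions can be sketched as follows. Again this is only a conceptual model: `Prediction`, `confidence_action`, `add_label` and `remove_label` are hypothetical names, and it assumes confidence is a number between 0 and 1 that is clamped to that range after each change.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical extraction prediction (label) with a confidence in [0, 1]."""
    value: str
    confidence: float


def clamp(x):
    """Keep a confidence within [0, 1] (assumed range)."""
    return max(0.0, min(1.0, x))


def confidence_action(pred, mode, amount):
    """Confidence action: mode is 'increase', 'decrease' or 'set'."""
    if mode == "increase":
        pred.confidence = clamp(pred.confidence + amount)
    elif mode == "decrease":
        pred.confidence = clamp(pred.confidence - amount)
    elif mode == "set":
        pred.confidence = clamp(amount)
    else:
        raise ValueError(f"unknown mode: {mode}")


def add_label(predictions, value, confidence):
    """Label action (create): a new prediction always carries a confidence."""
    predictions.append(Prediction(value, clamp(confidence)))


def remove_label(predictions, value):
    """Label action (remove): drop existing predictions with this value, in place."""
    predictions[:] = [p for p in predictions if p.value != value]


preds = [Prediction("2023-01-31", 0.55), Prediction("31.01.2023", 0.40)]
confidence_action(preds[0], "increase", 0.25)
remove_label(preds, "31.01.2023")
add_label(preds, "2023-02-28", 0.9)
print([(p.value, round(p.confidence, 2)) for p in preds])
# [('2023-01-31', 0.8), ('2023-02-28', 0.9)]
```

Requiring a confidence argument in `add_label` reflects the rule above that every created label needs a confidence level.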

Recommended Readings

To better understand the workflows of the Parashift Platform, the following readings and videos are recommended: