How to use verifiers to filter out false positive results
Verifiers are a rule-based validation approach for filtering out false positives. They let you narrow down extraction results by applying additional logic and checks to the candidates. For example, you can specify that a certain type of information always occurs on the first page of the document, or that it must be located in the lower half of the page.
Verifiers are processed after the transformers.
Verifiers marked with an asterisk (*) will follow in the next release.
| Verifier | Description |
| --- | --- |
| coords: Position Range | Rejects candidates that are not within the rectangle defined by the `left`, `right`, `top`, and `bottom` coordinates, e.g. candidates outside the top half of the page. |
| coords: Position Range Center | Rejects candidates whose center is not within the rectangle defined by the `left`, `right`, `top`, and `bottom` coordinates, e.g. candidates whose center is outside the top half of the page. |
| image: Aspect ratio | Restricts the results to a range of aspect ratios. |
| key-value: Pair Orientation* | Defines keywords that must be found near the candidate (horizontally or vertically). |
| number: Page Number | Checks whether a candidate is on the specified page. Candidates found on other pages are ignored. |
| number: Value Range | Can only be used if the `output_data_type` is numeric (integer or float). Candidates with a value smaller than `minimum_value` or larger than `maximum_value` are rejected. |
| pick_nth | Picks the n-th element of the sorted candidate list. |
| string: Ends With | Candidates that do not end with the specified string are rejected. |
| string: Has Not Pattern | Candidates containing the specified `pattern` are rejected. |
| string: Has Pattern | Candidates that do not contain the specified `pattern` are rejected, e.g. to allow only Swiss VAT numbers. |
| string: Starts Not With | Rejects candidates whose predicted value starts with the specified string. |
| string: Starts With | Candidates that do not start with the specified string are rejected. |
| string: String Length | Rejects candidates that do not have the right number of characters. |
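To illustrate the processing order, here is a minimal Python sketch of a pipeline that applies transformers first and verifiers afterwards. The function `run_pipeline` and the callable types are hypothetical and only model the idea; they are not the product's actual API.

```python
from typing import Callable, Iterable, List

# Hypothetical types: a transformer rewrites a value, a verifier accepts or rejects it.
Transformer = Callable[[str], str]
Verifier = Callable[[str], bool]


def run_pipeline(
    candidates: Iterable[str],
    transformers: List[Transformer],
    verifiers: List[Verifier],
) -> List[str]:
    """Apply all transformers first, then keep candidates that pass every verifier."""
    results = []
    for value in candidates:
        for transform in transformers:
            value = transform(value)
        if all(verify(value) for verify in verifiers):
            results.append(value)
    return results


raw = [" CHE-123.456.789 ", "DE123456789", "CHE-1"]
kept = run_pipeline(
    raw,
    transformers=[str.strip],
    verifiers=[lambda v: v.startswith("CHE"), lambda v: len(v) >= 15],
)
print(kept)  # ['CHE-123.456.789']
```

Each verifier only removes candidates, so chaining several of them narrows the result step by step.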
Verifier "coords: Position Range"
Defines the rectangle within which the candidate has to be found. The verifier can restrict the result horizontally (from the left coordinate to the right coordinate) and vertically (from the top coordinate to the bottom coordinate). All values are floats between 0 and 1, expressed as a fraction of the page width or height (for example, 0.5 for 50 %), using a point as the decimal separator.
| Parameter | Description |
| --- | --- |
| Left coordinate | Left coordinate of the rectangle (relative to page width), 0 <= value <= 1 |
| Right coordinate | Right coordinate of the rectangle (relative to page width), 0 <= value <= 1 |
| Top coordinate | Top coordinate of the rectangle (relative to page height), 0 <= value <= 1 |
| Bottom coordinate | Bottom coordinate of the rectangle (relative to page height), 0 <= value <= 1 |
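The check can be sketched in Python as follows. This is a minimal illustration under two assumptions that the description does not state explicitly: the whole candidate box must lie inside the rectangle, and `top = 0` is the upper edge of the page. The `Box` class and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical bounding box in relative page coordinates (0..1)."""
    left: float
    right: float
    top: float
    bottom: float


def within_position_range(candidate: Box, allowed: Box) -> bool:
    """Return True if the whole candidate box lies inside the allowed rectangle."""
    return (
        allowed.left <= candidate.left
        and candidate.right <= allowed.right
        and allowed.top <= candidate.top
        and candidate.bottom <= allowed.bottom
    )


# Allowed region: top half of the page (assuming top = 0 is the upper page edge).
top_half = Box(left=0.0, right=1.0, top=0.0, bottom=0.5)
header_date = Box(left=0.60, right=0.80, top=0.05, bottom=0.08)
footer_date = Box(left=0.60, right=0.80, top=0.90, bottom=0.93)

print(within_position_range(header_date, top_half))  # True
print(within_position_range(footer_date, top_half))  # False
```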
Verifier "coords: Position Range Center"
Defines the rectangle within which the center of the candidate has to be found. The verifier can restrict the result horizontally (from the left coordinate to the right coordinate) and vertically (from the top coordinate to the bottom coordinate). All values are floats between 0 and 1, expressed as a fraction of the page width or height (for example, 0.5 for 50 %), using a point as the decimal separator.
| Parameter | Description |
| --- | --- |
| Left coordinate | Left coordinate of the rectangle (relative to page width), 0 <= value <= 1 |
| Right coordinate | Right coordinate of the rectangle (relative to page width), 0 <= value <= 1 |
| Top coordinate | Top coordinate of the rectangle (relative to page height), 0 <= value <= 1 |
| Bottom coordinate | Bottom coordinate of the rectangle (relative to page height), 0 <= value <= 1 |
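A minimal Python sketch of the center-based check is shown below. It assumes that the center is the midpoint of the candidate's bounding box and that boxes are given as `(left, right, top, bottom)` tuples; both the representation and the function name are hypothetical.

```python
def center_in_range(box, allowed):
    """Check whether the midpoint of `box` lies inside `allowed`.

    Both arguments are (left, right, top, bottom) tuples in relative
    page coordinates (0..1).
    """
    left, right, top, bottom = box
    center_x = (left + right) / 2
    center_y = (top + bottom) / 2
    a_left, a_right, a_top, a_bottom = allowed
    return a_left <= center_x <= a_right and a_top <= center_y <= a_bottom


top_half = (0.0, 1.0, 0.0, 0.5)
straddling = (0.10, 0.30, 0.45, 0.53)  # crosses the middle line, center_y = 0.49
below = (0.10, 0.30, 0.48, 0.60)       # center_y = 0.54

print(center_in_range(straddling, top_half))  # True
print(center_in_range(below, top_half))       # False
```

A candidate that crosses the border of the rectangle can still be accepted here, as long as its center is inside.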
Verifier "key-value: Pair Orientation"
Defines keywords that must be found near the candidate, either horizontally or vertically.
| Parameter | Description |
| --- | --- |
| Orientation (string) | Search string (regex) |
Verifier "image: Aspect ratio"
Checks the aspect ratio of the result tokens (length / height).
| Parameter | Description |
| --- | --- |
| Minimum value | Minimum ratio |
| Maximum value | Maximum ratio |
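As a rough Python illustration, the check divides the length (width) of the candidate by its height and compares the result with the two limits. The function name is hypothetical, and the limits are assumed to be inclusive.

```python
def aspect_ratio_in_range(width: float, height: float,
                          minimum: float, maximum: float) -> bool:
    """Return True if width / height lies between minimum and maximum (inclusive)."""
    if height <= 0:
        # A degenerate box has no meaningful aspect ratio; reject it.
        return False
    return minimum <= width / height <= maximum


# A wide text line (300 x 20 pixels) has a ratio of 15.0.
print(aspect_ratio_in_range(300, 20, minimum=2.0, maximum=20.0))  # True
# A square stamp (20 x 20 pixels) has a ratio of 1.0.
print(aspect_ratio_in_range(20, 20, minimum=2.0, maximum=20.0))   # False
```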
Verifier "number: Page Number"
Checks whether a candidate is on the specified page. Candidates found on other pages are ignored. For example, this can be used to extract the invoice date, which is normally placed on the first page of an invoice.
| Parameter | Description |
| --- | --- |
| Page number | Page number (1 = first page) |
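A minimal Python sketch of this filter, assuming each candidate is a hypothetical `(value, page)` tuple with a one-based page number:

```python
from typing import List, Tuple

# Hypothetical candidate representation: (value, one-based page number).
Candidate = Tuple[str, int]


def filter_by_page(candidates: List[Candidate], page_number: int) -> List[Candidate]:
    """Keep only candidates found on the given page (1 = first page)."""
    return [c for c in candidates if c[1] == page_number]


dates = [("2023-04-01", 1), ("2023-03-15", 2), ("2023-04-30", 1)]
print(filter_by_page(dates, page_number=1))
# [('2023-04-01', 1), ('2023-04-30', 1)]
```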
Verifier "number: Value Range"
Can only be used if the `output_data_type` is numeric (integer or float). Candidates with a value that is smaller than `minimum_value` or larger than `maximum_value` are rejected. For example, this can be used to restrict candidates to percentage values (0 <= value <= 100).
| Parameter | Description |
| --- | --- |
| Minimum value | Minimum value |
| Maximum value | Maximum value |
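The percentage example can be sketched in Python like this. The function name is hypothetical; the inclusive limits follow the rule that only values smaller than the minimum or larger than the maximum are rejected.

```python
from typing import List


def filter_by_value_range(values: List[float],
                          minimum_value: float,
                          maximum_value: float) -> List[float]:
    """Reject values smaller than minimum_value or larger than maximum_value."""
    return [v for v in values if minimum_value <= v <= maximum_value]


# Only plausible percentage values survive.
print(filter_by_value_range([19.0, 7.7, 2023.0, -5.0], 0, 100))  # [19.0, 7.7]
```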
Verifier "pick_nth"
This verifier picks the n-th element of the sorted candidate list. The index is zero-based, so an index of 1 picks the second element.
| Parameter | Description |
| --- | --- |
| Index number | Index of the value (0 = first value) |
Result before restriction:
Result after restriction:
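The selection can be illustrated with a short Python sketch. It assumes the candidate list is already sorted (the sort criterion is not specified here) and that an index outside the list yields no result; both points and the function name are assumptions.

```python
from typing import List


def pick_nth(sorted_candidates: List[str], index: int) -> List[str]:
    """Return a list holding only the candidate at the zero-based index.

    Assumption: an index outside the list yields an empty result.
    """
    if 0 <= index < len(sorted_candidates):
        return [sorted_candidates[index]]
    return []


candidates = ["1'250.00", "250.00", "19.00"]
print(pick_nth(candidates, 1))  # ['250.00']
print(pick_nth(candidates, 5))  # []
```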
Verifier "string: Ends With"
Candidates that do not end with the specified string are rejected. For example, this can be used to restrict candidates to amounts with two decimal places.
| Parameter | Description |
| --- | --- |
| Pattern (string) | Search string (regex) |
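The two-decimal-places example might look like this in Python. The sketch assumes the pattern is a regular expression that is matched at the very end of the value; the function name and the concrete pattern are illustrative only.

```python
import re


def ends_with(value: str, pattern: str) -> bool:
    """Return True if the regex pattern matches at the very end of the value."""
    return re.search(rf"(?:{pattern})\Z", value) is not None


# Illustrative pattern: a decimal separator followed by exactly two digits.
two_decimals = r"[.,]\d{2}"
amounts = ["1250.00", "1250", "12.5", "99,90"]
print([a for a in amounts if ends_with(a, two_decimals)])  # ['1250.00', '99,90']
```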
Verifier "string: Has Not Pattern"
Candidates containing the specified pattern are rejected. For example, this can be used to ensure that the result is not a German IBAN.
| Parameter | Description |
| --- | --- |
| Pattern (string) | Search string (regex) |
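A minimal Python sketch of the German IBAN example follows. The pattern `DE\d{20}` is an illustrative assumption for IBANs written without spaces (country code `DE` followed by 20 digits), and the function name is hypothetical.

```python
import re


def has_not_pattern(value: str, pattern: str) -> bool:
    """Return True if the regex pattern occurs nowhere in the value."""
    return re.search(pattern, value) is None


# Illustrative pattern for a German IBAN without spaces.
german_iban = r"DE\d{20}"
ibans = ["DE89370400440532013000", "CH9300762011623852957"]
print([i for i in ibans if has_not_pattern(i, german_iban)])
# ['CH9300762011623852957']
```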
Verifier "string: Has Pattern"
Candidates that do not contain the specified pattern are rejected. For example, this can be used to allow only Swiss VAT numbers.
| Parameter | Description |
| --- | --- |
| Pattern (string) | Search string (regex) |
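The Swiss VAT example can be sketched in Python as shown below. The pattern is an illustrative assumption based on the common `CHE-123.456.789` notation with optional separators, not the pattern used by the product; the function name is hypothetical.

```python
import re


def has_pattern(value: str, pattern: str) -> bool:
    """Return True if the regex pattern occurs anywhere in the value."""
    return re.search(pattern, value) is not None


# Illustrative pattern: "CHE" followed by nine digits, separators optional.
swiss_vat = r"CHE-?\d{3}\.?\d{3}\.?\d{3}"
candidates = ["CHE-123.456.789 MWST", "DE123456789", "CHE123456789"]
print([c for c in candidates if has_pattern(c, swiss_vat)])
# ['CHE-123.456.789 MWST', 'CHE123456789']
```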
Verifier "string: Starts Not With"
Rejects candidates whose predicted value starts with the specified string. For example, this can be used to remove all tax identification number candidates with a German identifier.
| Parameter | Description |
| --- | --- |
| Pattern (string) | Search string (regex) |
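In Python, the German identifier example could be sketched as follows, assuming the pattern is a regular expression matched at the start of the value and that German IDs begin with `DE`. The function name is hypothetical.

```python
import re


def starts_not_with(value: str, pattern: str) -> bool:
    """Return True if the regex pattern does NOT match at the start of the value."""
    return re.match(pattern, value) is None


tax_ids = ["DE123456789", "ATU12345678", "CHE-123.456.789"]
print([t for t in tax_ids if starts_not_with(t, r"DE")])
# ['ATU12345678', 'CHE-123.456.789']
```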
Verifier "string: Starts With"
Rejects candidates whose predicted value does not start with the specified string. For example, this can be used to remove all tax identification number candidates except Austrian IDs.
| Parameter | Description |
| --- | --- |
| Pattern (string) | Search string (regex) |
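The Austrian example might be sketched in Python like this, assuming the pattern is a regular expression matched at the start of the value and that Austrian IDs begin with `ATU`. The function name is hypothetical.

```python
import re


def starts_with(value: str, pattern: str) -> bool:
    """Return True if the regex pattern matches at the start of the value."""
    return re.match(pattern, value) is not None


tax_ids = ["DE123456789", "ATU12345678", "CHE-123.456.789"]
print([t for t in tax_ids if starts_with(t, r"ATU")])  # ['ATU12345678']
```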
Verifier "string: String Length"
Rejects candidates whose number of characters is smaller than the minimum length or larger than the maximum length.
| Parameter | Description |
| --- | --- |
| Minimum length | Minimum length of the value |
| Maximum length | Maximum length of the value |
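A minimal Python sketch of the length check, assuming both limits are inclusive and that every character counts, including whitespace and punctuation. The function name is hypothetical.

```python
def has_valid_length(value: str, minimum_length: int, maximum_length: int) -> bool:
    """Return True if the number of characters lies within the inclusive limits."""
    return minimum_length <= len(value) <= maximum_length


# Keep only four-character codes.
codes = ["8001", "80010", "801"]
print([c for c in codes if has_valid_length(c, 4, 4)])  # ['8001']
```

Setting the minimum and maximum to the same number enforces an exact length.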