Extract document values automatically

Learn about the automatic extraction of values.

After the documents have been classified, their data is extracted. Each document type consists of different fields and field sets, and each of these has its own recognition logic.

The invoice number is best identified by the keywords around it combined with its structure. An IBAN or ESR number, by contrast, is easiest to identify by its structure.
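
The two recognition strategies can be illustrated with a minimal Python sketch. It is not the product's recognition logic: the keyword pattern, the function names and the sample text are hypothetical. The invoice number is found through a keyword in front of it plus a simple structural pattern, while IBAN candidates are found purely by structure (country code, check digits, account part) and confirmed with the standard IBAN mod-97 checksum (ISO 13616). The ESR reference number uses its own check-digit scheme and is not covered here.

```python
import re
from typing import List, Optional

# Hypothetical keyword-based pattern: "Invoice No.", "Invoice Number" or "Invoice #",
# followed by a value that starts with a letter or digit and is at least 4 characters long.
INVOICE_PATTERN = re.compile(
    r"Invoice\s+(?:No\.?|Number|#)\s*[:.]?\s*([A-Z0-9][A-Z0-9-]{3,})",
    re.IGNORECASE,
)

# Structure-based IBAN pattern: 2-letter country code, 2 check digits, then
# 11-30 alphanumeric characters, optionally grouped with single spaces.
IBAN_PATTERN = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b")


def find_invoice_number(text: str) -> Optional[str]:
    """Return the first value that follows an invoice-number keyword, or None."""
    match = INVOICE_PATTERN.search(text)
    return match.group(1) if match else None


def is_valid_iban(iban: str) -> bool:
    """Mod-97 check: move the first 4 characters to the end, map A-Z to 10-35, remainder must be 1."""
    compact = iban.replace(" ", "").upper()
    if not 15 <= len(compact) <= 34 or not (compact.isascii() and compact.isalnum()):
        return False
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def find_ibans(text: str) -> List[str]:
    """Return all structurally matching IBANs that also pass the checksum, without spaces."""
    return [
        m.group(0).replace(" ", "")
        for m in IBAN_PATTERN.finditer(text)
        if is_valid_iban(m.group(0))
    ]
```

For example, on the text `"Invoice No.: INV-2024-0042\nIBAN: GB82 WEST 1234 5698 7654 32"`, `find_invoice_number` yields `INV-2024-0042` and `find_ibans` yields `["GB82WEST12345698765432"]`. The checksum is what makes the structure-only approach reliable for IBANs: a single mistyped digit makes the check fail.
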

During automatic data extraction, the fields are processed one after another. For each field, the system collects possible candidates in the background and assigns each of them a probability. A candidate that exceeds the threshold is entered in the field. This threshold consists of two values: an OCR threshold and a threshold for the probability of the value, which is based on the defined recognition mechanisms.
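
The candidate selection described above can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: the `Candidate` and `Threshold` types are hypothetical, both scores are assumed to lie between 0.0 and 1.0, a candidate is assumed to need to exceed both parts of the threshold, and the most probable eligible candidate is assumed to win.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    value: str
    ocr_confidence: float  # hypothetical: how reliably OCR read the characters (0.0-1.0)
    probability: float     # hypothetical: likelihood from the recognition rules (0.0-1.0)


@dataclass
class Threshold:
    ocr: float          # OCR confidence a candidate must exceed
    probability: float  # rule-based probability a candidate must exceed


def select_candidate(candidates: List[Candidate], threshold: Threshold) -> Optional[Candidate]:
    """Return the most probable candidate that exceeds both threshold values, or None."""
    eligible = [
        c for c in candidates
        if c.ocr_confidence > threshold.ocr and c.probability > threshold.probability
    ]
    if not eligible:
        return None  # no candidate qualifies, so the field stays empty
    return max(eligible, key=lambda c: c.probability)
```

With the threshold `Threshold(ocr=0.8, probability=0.7)`, a candidate that was read cleanly but matches the recognition rules poorly (for example OCR 0.95, probability 0.6) is rejected, just like a well-matching candidate that was read unreliably (OCR 0.5, probability 0.99).
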

Two threshold values per field are relevant for data extraction. The left limit determines whether a candidate is entered in the field; the right limit determines whether the field must be presented to the user for validation.
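
The effect of the two limits can be expressed as a small classification function. This is a hedged sketch: the `FieldStatus` names are hypothetical, the score is assumed to be a single value between 0.0 and 1.0 for the field's best candidate, and the left limit is assumed to be less than or equal to the right limit.

```python
from enum import Enum


class FieldStatus(Enum):
    EMPTY = "empty"                        # no candidate is entered
    NEEDS_VALIDATION = "needs_validation"  # candidate entered, field offered for validation
    ACCEPTED = "accepted"                  # candidate entered, no validation needed


def field_status(score: float, left: float, right: float) -> FieldStatus:
    """Compare the best candidate's score with the field's left and right limits."""
    if not 0.0 <= left <= right <= 1.0:
        raise ValueError("expected 0 <= left <= right <= 1")
    if score <= left:
        return FieldStatus.EMPTY
    if score <= right:
        return FieldStatus.NEEDS_VALIDATION
    return FieldStatus.ACCEPTED
```

With limits of 0.5 and 0.8, a score of 0.3 leaves the field empty, 0.6 enters the value but requires validation, and 0.9 enters it without validation.
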


If the "Validation required" option was set during upload and at least one field does not exceed its right threshold, the document must be post-processed manually.
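
The document-level rule can be sketched as follows. The function and parameter names are hypothetical, and a field for which no candidate was found is assumed to have a score of 0.0, so it never exceeds its right threshold.

```python
from typing import Mapping


def requires_manual_validation(
    field_scores: Mapping[str, float],
    right_limits: Mapping[str, float],
    validation_required: bool,
) -> bool:
    """True if "Validation required" is set and any field does not exceed its right limit."""
    if not validation_required:
        return False
    # Assumption: a field without any candidate counts as score 0.0.
    return any(field_scores.get(name, 0.0) <= right for name, right in right_limits.items())
```

A single weak field is enough: with right limits of 0.9 for the invoice number and 0.95 for the IBAN, scores of 0.85 and 0.97 send the document to manual post-processing, whereas 0.92 and 0.97 do not.
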