Benchmarking
Parashift provides benchmarks that give an accurate overview of the extraction quality.
How does it work?
The benchmark compares the values manually validated by the user with the values extracted by the machine, and shows the differences between the old and the new training model after manual validation.
How is the benchmark set up?
The benchmark is provided as a single Excel file named Results, which contains several sheets.
Benchmark per field

- Fieldset identifier: Identifier of the fieldset.
- Field identifier: Identifier of the field.
- Count: Total count of fields.
- Exact Match: Compares the user annotations with the machine predictions; the value shown is the number of annotations that exactly match the predictions.
- Similarity (Levenshtein): Like the exact match, but predictions that almost match the annotations also count as correct.
- Result: Result score as a percentage.
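The two match metrics above can be illustrated with a short Python sketch. The sample pairs, the normalised similarity formula, and the 0.9 "almost match" threshold are illustrative assumptions, not Parashift's documented cutoffs:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert/delete/substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                   # deletion
                           cur[j - 1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))     # substitution
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    # Normalised similarity in [0, 1]; 1.0 means identical strings.
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

# Hypothetical (annotation, prediction) pairs from one field column.
pairs = [("Invoice 1001", "Invoice 1001"), ("Invoice 1001", "Invoice 100l")]
exact = sum(ann == pred for ann, pred in pairs)                 # exact matches
near = sum(similarity(ann, pred) >= 0.9 for ann, pred in pairs) # assumed threshold
```

With these sample pairs the exact-match count is 1, while the Levenshtein-based count is 2, because the OCR confusion of `1` with `l` still scores above the assumed threshold.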
Dark Processing
In this sheet, you can review the percentage of dark processing.
Dark processing refers to fields that required no manual interaction; they were automatically extracted by the platform or accepted without validation.
Extraction Benchmark
The field sheets provide an in-depth view of each document, showing what was extracted by the machine and what was entered by the user.

- DocumentID: Parashift ID of the document.
- Identifier: Identifier of the field.
- FieldsetIdentifier: Identifier of the fieldset the field belongs to.
- ItemIndex: Index of the item (used for repeatable fieldsets, e.g., line items).
- PageNumber: Page number where the value was extracted.
- Value: Final validated value (ground truth).
- RecognitionValue: Value extracted by OCR (raw recognition).
- RecognitionConfidence: Confidence score of the OCR recognition.
- PredictionValue: Value predicted by the machine learning model.
- PredictionConfidence: Confidence score of the ML prediction.
- Confidence: Final confidence score used by the system (can be a combination of OCR and ML).
- ValidationStatus: Status of validation (skipped, done).
- CreatedAt: Timestamp when the document was created.
- UpdatedAt: Timestamp of the last update.
- TP (True Positive): Correctly predicted and validated value.
- FP (False Positive): Incorrect prediction where a value was predicted but should not have been.
- TN (True Negative): Correctly identified absence of a value.
- FN (False Negative): Missed prediction where a value should have been detected.
- LevenshteinSimilarity: A measure of similarity between the predicted and actual values.
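The TP/FP/TN/FN columns can be reproduced from `Value` and `PredictionValue` along the lines of the sketch below. Treating a prediction that is present but differs from the validated value as a false positive is one common convention, not Parashift's documented rule, and the sample rows are hypothetical:

```python
from collections import Counter

def classify(value: str, prediction: str) -> str:
    # value = validated ground truth, prediction = model output;
    # empty strings are treated as "no value".
    if value and prediction:
        return "TP" if prediction == value else "FP"  # assumed convention
    if prediction:
        return "FP"  # predicted where nothing should have been
    if value:
        return "FN"  # missed a value that should have been detected
    return "TN"      # correctly detected absence of a value

# Hypothetical (Value, PredictionValue) rows from one field sheet.
rows = [("CHF 100.00", "CHF 100.00"), ("", "CHF 50.00"), ("Bern", ""), ("", "")]
counts = Counter(classify(v, p) for v, p in rows)
precision = counts["TP"] / (counts["TP"] + counts["FP"])
recall = counts["TP"] / (counts["TP"] + counts["FN"])
```

On these four sample rows each outcome occurs once, giving a precision and recall of 0.5 each.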
How to analyze a benchmark
The information in the benchmark can be used to analyze the quality of the machine's extractions. The first step is to review the overview sheet. The value that best indicates the quality of a field is the exact match, so start by finding fields with a low exact-match average. Afterwards, a deeper analysis in the specific field sheets is recommended.
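The first step of that analysis can be sketched as a ranking of the per-field sheet by exact-match rate, worst first; the field names and counts below are hypothetical:

```python
# Hypothetical rows from the "Benchmark per field" sheet:
# (field identifier, count, exact matches)
rows = [
    ("invoice_number", 200, 196),
    ("total_amount",   200, 158),
    ("iban",           180, 90),
]

# Rank fields by exact-match rate, worst first, to pick candidates
# for a deeper look in their individual field sheets.
ranked = sorted(rows, key=lambda r: r[2] / r[1])
for name, count, exact in ranked:
    print(f"{name}: {exact / count:.1%} exact match")
```

Here `iban` surfaces first at 50% exact match, so its field sheet would be the natural place to start the deeper analysis.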
Requesting a Benchmark
A benchmark can be requested from Parashift Support (support@parashift.io). The following information must be provided:
- Tenant ID
- Document Type
- Time range or Document IDs