Manually Validate Extracted Data: Extraction Validation

Learn about Extraction Validation, overview, navigation, warnings & errors, comments, flags and more.

The new Extraction Validation interface described in this article is still in beta, and we are actively collecting feedback.

To use it, open the document in Extraction Validation from the three-dot menu:

  1. Go to the "Overview" or "Extraction" list.
  2. Click on the three-dot menu.
  3. Select "Extraction beta".

After the beta, this interface will replace the old Extraction Validation interface.

Introduction

The Extraction Validation user interface is a central component of the Parashift Platform App. You use it to manually validate automatically extracted data and to train new Fields. It is easy to use for casual users and also caters to the needs of power users. This article gives insight into the different components and best practices for using Extraction Validation.

Overview

The interface can be split into the following five areas:

1. Breadcrumbs

Shows document information such as:

  1. path to the menu from which the Extraction Validation was opened, clickable for quick navigation
  2. document ID or document name, ability to change the document name

2. Actions

Shows the Document Type as well as all the actions for a document.

  1. "Document Type"
  2. The "Back" button opens the list from which the user entered Extraction Validation, or the main Extraction list.
  3. Switch between Single and Serial mode (See: Serial Mode in-depth explanation)
  4. Secondary Actions (See: Done, Save, Force-Done and Forwarding to 1st/2nd/3rd Level)
  5. Finish Validation - This button is only active if all fields are properly validated (See: Done, Save, Force-Done and Forwarding to 1st/2nd/3rd Level)

3. Viewer

View the document, navigate through pages and more.

  1. Page navigation - skim through pages and/or jump to the start or the end of your document.
  2. Rotation - If our preprocessing was unable to rotate your document, you can do so manually.
  3. Zoom in/out - If you need to take a closer look at your document, you can zoom in/out.
  4. Ruler - If you're validating pages full of line items, the ruler helps you keep your place on the page while you look over to the validation area.
  5. Help - Here you'll find a list of shortcuts to all features that come in handy when validating.

4. Editor

Displays all the extracted data. Together with the Info Box - Field tab, this is one of the most important screens in Extraction Validation.

Fields are grouped into Sections and Field Sets. Layout, size and section names either come pre-configured when you use one of our many standards or can be customized by an administrator. The Field Editor lazy-loads data, so the Extraction Validation interface should stay fast and reliable even with large amounts of data.

To fully explain the functionality we need to look at the editor together with the Info Box - Field tab further down in this article.

5. Info-Box

Document

General information about the document: its owner, Document Type, recognized language and average recognition confidence, as well as some upload parameters and relevant dates.

Hover over any title or data point to get an in-depth description or more information.

Field

This tab is open by default when navigating through fields. Together with the Field Editor, this is one of the most important screens in Extraction Validation.

All the details about field validation can be found further down.

  1. Preview of the selected value on the document. This lets you quickly compare the read value with the field value, e.g. when recognition (OCR) confidence is low.
  2. Field status and any open error or warning messages.
  3. Prediction Confidence
    1. In percent: how sure the machine was that it predicted the correct value.
    2. Displays a small user icon if a value was picked manually and not predicted.
  4. Recognition Confidence
    1. In percent: how sure the machine was that it read the value on the document correctly (OCR/Barcode/more).
    2. Displays a small user icon if a value was corrected manually and not recognized.

Deep Dive: Difference between Recognition and Prediction

Flags

Change whether the document is part of the general training pool, or mark a document as unprocessable.

Comments

Leave a comment on the document or read the comments of other users. As soon as a document has comments, this tab shows a small notification bubble.

Field Validation - Status, Warnings & Errors

For an overview of the Editor and the Info Box - Field tab, see further up.

A field generally has one of three states:

1. Valid (green or deactivated look)
Valid fields don't need any manual validation, either because configured thresholds were met (e.g. predicted with a high enough confidence), the field is optional or was already validated. They are intentionally shown in grey to not draw any attention since they should not need any.

2. Warnings (yellow)
Fields with warnings need to be either corrected or confirmed by a user.

3. Errors (red)
Fields with errors always need manual correction; they cannot simply be confirmed.
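
The three states can be sketched in a few lines of Python. This is only an illustration, not the platform's implementation: the names `Status`, `FieldWarning`, `Field`, `field_status` and `can_finish` are hypothetical, and the rule that an open error outranks a warning is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    VALID = "valid"      # green or deactivated look
    WARNING = "warning"  # yellow
    ERROR = "error"      # red


@dataclass
class FieldWarning:
    message: str
    confirmed: bool = False  # set once a user confirms the warning


@dataclass
class Field:
    name: str
    warnings: list[FieldWarning] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def field_status(f: Field) -> Status:
    # Assumption: an open error takes precedence over any warning.
    if f.errors:
        return Status.ERROR
    # A warning only counts while it has not been confirmed.
    if any(not w.confirmed for w in f.warnings):
        return Status.WARNING
    return Status.VALID


def can_finish(fields: list[Field]) -> bool:
    # Mirrors the "Finish Validation" button: active only if every field is valid.
    return all(field_status(f) is Status.VALID for f in fields)
```

In this sketch, confirming a warning flips `confirmed` to `True` and the field becomes valid, while an error can only disappear by removing it from `errors`, i.e. by correcting the value.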

Valid

The platform differentiates between fields that were automatically valid with no user interaction (Field valid) and valid fields that a user had to interact with (Validated).

Warning

A field can have one or multiple warnings. Each warning needs to be either corrected (value changed) or confirmed (ENTER). The Info Box always gives a longer description of the currently active warning.

Once a warning is confirmed it still shows in the Info Box in green to signal that this warning was present, but confirmed by a user.

Most Common Warnings

Warning: Prediction confidence low
Message: "Please confirm the field or select a new value."
Description: Check that the value itself is the correct one.
Configuration: Through the configuration of the Extraction Threshold, Admins can choose when this warning is triggered.

Warning: Recognition confidence low
Message: "Please confirm that the field value matches the document."
Description: Check that the value matches the document. Low recognition confidence indicates badly read characters, e.g. 1 instead of I (uppercase i), or 0 (zero) instead of O (uppercase o).
Configuration: Currently, the threshold for low recognition confidence is fixed at 95%; everything below triggers this warning.

Warning: Field empty
Message: "Please confirm the field or select a value."
Configuration: Through the configuration of the Extraction Threshold or by setting a field to optional, Admins can choose when this warning is triggered.

Warning: Verification
Description: Custom
Configuration: Admins can configure custom warnings with custom warning texts, e.g. to force a certain text format.
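
As a rough illustration of how the common warnings above relate to their configuration, the following Python sketch derives them for a single field. Only the 95% recognition threshold comes from this article. The function name `common_warnings`, its parameters, the use of fractions between 0 and 1 for confidences, and the handling of empty optional fields are assumptions made for the example.

```python
from typing import Optional

# Fixed at 95% according to this article; everything below triggers the warning.
RECOGNITION_THRESHOLD = 0.95


def common_warnings(
    value: Optional[str],
    prediction_confidence: float,
    recognition_confidence: float,
    extraction_threshold: float,  # hypothetical per-field setting chosen by an Admin
    optional: bool = False,
) -> list[str]:
    """Return the names of the warnings that would apply to one field."""
    if not value:
        # Assumption: an empty optional field raises no warning at all.
        return [] if optional else ["Field empty"]
    warnings = []
    if prediction_confidence < extraction_threshold:
        warnings.append("Prediction confidence low")
    if recognition_confidence < RECOGNITION_THRESHOLD:
        warnings.append("Recognition confidence low")
    return warnings
```

For example, a value predicted with 50% confidence and read with 90% confidence against an extraction threshold of 80% would carry both confidence warnings in this sketch.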

Error

A field can have one or multiple errors. Each error needs to be resolved (corrected). The Info Box always gives a longer description of the currently active error.

Most Common Errors

Error: Coordinates required
Message: "Please select a value or area from the document."
Description: Many fields require coordinates (not only user input) to properly train on the field.
Configuration: Admins can configure fields to not require coordinates.

Error: Not valid X
Message: "Value couldn't be converted into a proper date/number."
Description: Manually provide the correct value.
Configuration: Fields can be set up as date, number or text fields (and more). Depending on the type, the text must be convertible to it.

Error: Value out of range
Message: "Please provide a Date/Number between/bigger/smaller than the configured values"
Configuration: Admins can configure min/max values for date and number fields.

Error: Verification
Description: Custom
Configuration: Admins can configure custom errors with custom error texts, e.g. to force a certain text format.
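
The conversion and range errors above can be illustrated with a small Python sketch. It is not the platform's logic: the helper names `check_number` and `check_date` are hypothetical, the error strings only mimic the names listed above, and accepting dates solely in ISO format (YYYY-MM-DD) is a simplifying assumption.

```python
from datetime import date
from typing import Optional


def check_number(
    text: str,
    minimum: Optional[float] = None,
    maximum: Optional[float] = None,
) -> list[str]:
    """Return the errors a number field would show for the given text."""
    try:
        number = float(text)
    except ValueError:
        return ["Not valid number"]
    too_small = minimum is not None and number < minimum
    too_large = maximum is not None and number > maximum
    return ["Value out of range"] if too_small or too_large else []


def check_date(
    text: str,
    earliest: Optional[date] = None,
    latest: Optional[date] = None,
) -> list[str]:
    """Return the errors a date field would show for the given text."""
    try:
        parsed = date.fromisoformat(text)  # assumption: ISO dates only
    except ValueError:
        return ["Not valid date"]
    too_early = earliest is not None and parsed < earliest
    too_late = latest is not None and parsed > latest
    return ["Value out of range"] if too_early or too_late else []
```

An empty result means the value converts cleanly and lies within the configured min/max values, so no error would be shown in this sketch.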

Navigation

General

The jump order is from top right to bottom left.

On Opening

When a document is opened, the first invalid field is in focus; any valid fields before it are skipped.

Mouse - for casual users

The Editor can of course be navigated by mouse: just click on any field to edit it.

Keyboard - for power users

It is also possible to use the Editor entirely with the keyboard. We recommend using ENTER to jump directly to the next invalid field; valid fields are skipped. This allows users to focus only on fields that need attention. What happens when there are no more invalid fields is described under "End of document" below.

It is of course also possible to navigate with TAB; however, in this case the field state is not taken into consideration.

End of document

At the end of the document, if there are any warnings/errors left, the focus jumps back to the first invalid field in the document.

If no invalid fields are left, a pop-up says so, and pressing ENTER again ends the validation. Then either the next document opens or the list view is loaded (see: Single vs. Serial mode).
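
The ENTER navigation described in this section can be summarized in a short Python sketch. The function `next_invalid` and its representation of fields as a list of booleans (True for a valid field) are hypothetical and only model the behavior described here: jump to the next invalid field, wrap around to the first invalid field at the end of the document, and report when nothing is left.

```python
from typing import Optional, Sequence


def next_invalid(valid: Sequence[bool], current: int) -> Optional[int]:
    """Index of the field ENTER would focus next, or None if all fields are valid."""
    n = len(valid)
    # Look at the fields after the current one first, then wrap around to the
    # start of the document; the current field itself is checked last.
    for step in range(1, n + 1):
        index = (current + step) % n
        if not valid[index]:
            return index
    return None  # nothing left to validate: the pop-up would be shown
```

Calling `next_invalid(valid, -1)` gives the field that is focused when a document is opened, since the search then starts at the first field.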