Documents and document_fields are linked so that structured information can be attached to a document; this relationship is the basis for most integrations. This article also includes some API examples and recommended further reading.
Introduction
The Parashift Platform is designed to extract information from any kind of document. Depending on the use case, clients can have one, two, or hundreds of document types (configurations) active, each with similar or unique data points to be extracted.
It is therefore very important to have a standardized way to output all this captured information. This article describes the relationship between documents and fields and gives some examples of how to best download results via API.
Relationships & Structure
In a nutshell, the following diagram shows how the different objects (documents and document_fields) are linked with each other.
Main takeaways
- One document consists of zero or more document_fields, while each document_field is always linked to exactly one document.
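The one-to-many relationship described above can be pictured as a small data model. This is only an illustrative sketch: the class names and the `identifier` and `value` attributes are assumptions made for this example, not the platform's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DocumentField:
    id: str
    document_id: str  # each document_field points at exactly one document
    identifier: str   # hypothetical attribute, e.g. "invoice_number"
    value: str        # hypothetical attribute holding the extracted value


@dataclass
class Document:
    id: str
    # A document may have no document_fields at all, hence the empty default.
    document_fields: List[DocumentField] = field(default_factory=list)

    def add_field(self, field_id: str, identifier: str, value: str) -> DocumentField:
        """Create a document_field that is linked back to this document."""
        new_field = DocumentField(field_id, self.id, identifier, value)
        self.document_fields.append(new_field)
        return new_field


document = Document(id="123456")
document.add_field("1", "invoice_number", "2024-001")
document.add_field("2", "total_amount", "99.50")
```

Every `DocumentField` created this way carries the `id` of its parent, which mirrors the rule that a document_field never exists without a document.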
Example API Calls
One document with its document_fields
GET /documents/123456/?include=document_fields
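As a rough sketch, such a call could be issued from Python with only the standard library. The base URL below is a deliberate placeholder, and the bearer-token header and media type are assumptions; the Postman API documentation has the real values for your tenant.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Placeholder only: replace with the real API base URL for your tenant.
BASE_URL = "https://api.example.invalid/v2/TENANT_ID"


def build_url(path, params=None, base_url=BASE_URL):
    """Join base URL, path and query parameters, keeping '[' ']' ',' readable."""
    query = urlencode(params or {}, safe="[],")
    url = f"{base_url.rstrip('/')}/{path.strip('/')}/"
    return f"{url}?{query}" if query else url


def get_json(url, token):
    """Send an authenticated GET request and decode the JSON body.

    The Authorization scheme and Accept header are assumptions.
    """
    request = urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.api+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


url = build_url("documents/123456", {"include": "document_fields"})
# payload = get_json(url, token="YOUR_API_TOKEN")  # performs the network call
```

The network call is left commented out so the snippet can be run without credentials; `build_url` alone is enough to check that the request path matches the example above.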
Alternative: All document_fields belonging to one document
GET /document_fields/?filter[document_id]=123456&include=document
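When querying document_fields directly, it can be useful to confirm which document each returned field belongs to. The sketch below assumes a JSON:API-style response body (a top-level `data` list whose items have `type`, `id`, and `relationships`); the sample payload is hand-written for illustration and is not real API output.

```python
from collections import defaultdict


def group_fields_by_document(payload):
    """Map each document id to the ids of the document_fields linked to it."""
    grouped = defaultdict(list)
    for resource in payload.get("data", []):
        if resource.get("type") != "document_fields":
            continue  # ignore anything that is not a document_field
        # Assumed shape: relationships.document.data.id holds the parent id.
        document_id = resource["relationships"]["document"]["data"]["id"]
        grouped[document_id].append(resource["id"])
    return dict(grouped)


# Hand-written sample in the assumed response shape.
sample_payload = {
    "data": [
        {
            "type": "document_fields",
            "id": "1",
            "relationships": {"document": {"data": {"type": "documents", "id": "123456"}}},
        },
        {
            "type": "document_fields",
            "id": "2",
            "relationships": {"document": {"data": {"type": "documents", "id": "123456"}}},
        },
    ]
}

fields_by_document = group_fields_by_document(sample_payload)
```

With the `filter[document_id]` parameter in place, the resulting dictionary should contain a single key, the filtered document's id.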
One document with its document_fields and their extraction_candidates
GET /documents/123456/?include=document_fields&extra_fields[document_fields]=extraction_candidates
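Query strings that combine `include` with bracketed `extra_fields[...]` parameters are easy to mistype, so a small helper can assemble them. This is a minimal sketch; `build_query` is a hypothetical helper, and joining several values with commas is an assumption based on the usual JSON:API convention.

```python
from urllib.parse import urlencode


def build_query(include=(), extra_fields=None):
    """Build a query string from included relations and per-type extra fields."""
    params = {}
    if include:
        params["include"] = ",".join(include)
    for resource_type, names in (extra_fields or {}).items():
        # e.g. extra_fields[document_fields]=extraction_candidates
        params[f"extra_fields[{resource_type}]"] = ",".join(names)
    # Keep brackets and commas unescaped so the URL stays readable.
    return urlencode(params, safe="[],")


query = build_query(
    include=["document_fields"],
    extra_fields={"document_fields": ["extraction_candidates"]},
)
print(f"GET /documents/123456/?{query}")
```

The printed request line should match the call shown above, which makes the helper easy to verify against the documented examples.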
All the documents and their classification_candidates
GET /documents/?extra_fields[documents]=classification_candidates
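A request for all documents may return its results in several pages. The sketch below assumes JSON:API-style pagination, where each response carries a `links.next` URL until the last page; whether and how the API paginates should be checked in the API documentation. The `fetch` callable is injected so the loop can be tried with fake pages instead of real requests.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_documents(first_url: str, fetch: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every document resource, following links.next until it is absent."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield from page.get("data", [])
        url = (page.get("links") or {}).get("next")


# Fake pages standing in for real responses (illustrative data only).
FIRST_URL = "/documents/?extra_fields[documents]=classification_candidates"
SECOND_URL = FIRST_URL + "&page[number]=2"
FAKE_PAGES = {
    FIRST_URL: {
        "data": [{"type": "documents", "id": "1"}, {"type": "documents", "id": "2"}],
        "links": {"next": SECOND_URL},
    },
    SECOND_URL: {
        "data": [{"type": "documents", "id": "3"}],
        "links": {"next": None},
    },
}

document_ids = [doc["id"] for doc in iter_documents(FIRST_URL, FAKE_PAGES.__getitem__)]
```

In real use, `fetch` would be a function that performs the authenticated GET request and returns the decoded JSON body.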
One document with recognition_text (OCR)
GET /documents/123456/?extra_fields[documents]=recognition_text
Recommended Reading
I strongly recommend the following two articles, which go into detail about the other attributes available on a document and a document_field. They explain, for example, where classification results are stored and how the flat list of document_fields can be structured back into a table.
Also, check out our Postman API Documentation.