How to integrate the Parashift Platform into your software, environment or process: which resources and documentation are available, some basic concepts (Workflow, Relationships, Data Schemas), and how to upload documents and fetch data.
Introduction
This article is a quick guide that gives you an overview of what it takes to integrate the Parashift Platform into your software, environment or process. We'll first look at the resources we provide online (documentation), then at the basic requirements and the principles of how the Platform works in terms of relationships, data structures and our API. Finally, we'll do a quick integration: upload a document and fetch the extracted data, with some basic examples.
Documentation
If you want to go deeper than this quick introduction, feel free to check out our other publicly available resources. We try to cater them to different audiences; the following list is ordered from least technical to most technical:
Documentation | Target Audience | Description |
https://support.parashift.io/user-guide | Users & Configurators | Focused on Users that work with Documents and our Validation Interfaces and Configurators that set up the Parashift Platform, configure new Document Types and Workflows. |
https://support.parashift.io/technical-api-documentation | Configurators & Developers | Focused on Configurators that want to understand the underlying principles of the Platform better and Developers that need to know these principles for their integrations. |
https://docs.parashift.io/ | Developers | Focused on Developers, an interactive Postman Collection API Documentation with all available endpoints and tons of examples. |
Requirements
API Key
To get started, you need an active tenant with Parashift and an API Key:
We will use this key to authenticate any requests against our API in the upcoming examples.
Tenant ID
While you are getting your API key, also note down your Tenant ID; we will reference this ID in some URLs further down in this article. (See the green boxes in the following image.)
Example Document
To upload a document, we need to convert a pdf, jpg, tiff or png file into base64 so that it can be put into the request's body.
You can either use one of your own documents and transform it to base64 programmatically, or use a file and the following website to quickly transform any file into base64:
https://base64.guru/converter/encode/pdf
(Disclaimer: Parashift is not associated in any way with the website above, and we don't know what they do with the uploaded files besides converting them to base64.)
If you don't have any example data at hand you can also use the following pdf to convert to base64
or even use the already converted base64 string inside this text document
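If you prefer to do the conversion programmatically, the following is a minimal Python sketch using only the standard library. The function name `encode_file_to_base64` and the file name in the usage note are illustrative assumptions, not part of the Parashift API.

```python
import base64
from pathlib import Path


def encode_file_to_base64(path: str) -> str:
    """Read a file as raw bytes and return its content as a base64 string."""
    raw = Path(path).read_bytes()
    # b64encode returns bytes; decode to ASCII so it can be placed in a JSON body.
    return base64.b64encode(raw).decode("ascii")
```

For example, `encode_file_to_base64("example.pdf")` (a hypothetical file name) returns the string you would put into the request body.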
Example Integration
Description
We are now going to upload a single document to the Parashift Platform, check the state of that document (once by polling and once via webhook) and then fetch the results.
All you need to know for now is that a document, after processing, has a document type and fields. The number and types of fields are defined through the document type.
Of course, you can do a lot more with the Platform, and it can be helpful to also read the following articles regarding data structures, relationships and workflow. But again, for a first integration, you don't have to go that deep.
- Relationships & Structure of Batches, Documents, Pages and Files
- Relationships & Structure of Documents & Fields
- Basics: Document & Batch Workflow
API - JSON API & Headers
All requests you send to our API follow the same schema. We implemented our public API (available at https://api.parashift.io/v2/) following the JSON API Standard. The full API documentation is available at https://docs.parashift.io. You can give that a read as well if you want, but for this initial integration you don't have to. (We just like to be thorough in providing more reading material and links for the interested.)
To make your first request, you can simply use a tool such as Postman: use it from the web or download and install it, and you are ready to go.
Add request headers
Add these headers for every request you submit to the API.
Key | Value | Description |
Content-Type | application/vnd.api+json | Use EXACTLY this value, otherwise you will get a 406 error |
Authorization | Bearer [Your API Key] | e.g. Bearer 12345678974465. See above how to get your API Key. |
Base URL
https://api.parashift.io/v2/
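If you call the API from code rather than from Postman, the headers and base URL above can be kept in one place. This is a small Python sketch; the helper names `build_headers` and `api_url` are assumptions made for illustration.

```python
BASE_URL = "https://api.parashift.io/v2/"


def build_headers(api_key: str) -> dict:
    """Return the two headers that every request in this article sends."""
    return {
        "Content-Type": "application/vnd.api+json",
        "Authorization": f"Bearer {api_key}",
    }


def api_url(path: str) -> str:
    """Join an endpoint path such as 'documents' onto the base URL."""
    return BASE_URL + path.lstrip("/")
```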
Upload your first document
To upload your first document now all you need to do is POST it to the /documents endpoint with the pre-defined body including your base64 encoded document.
curl --location --request POST 'https://api.parashift.io/v2/documents' \
--header 'Content-Type: application/vnd.api+json' \
--header 'Authorization: Bearer [Your API Key]' \
--data-raw '{
"data": {
"type": "documents",
"attributes": {
"files": [
{
"base64_file": "YourBase64File"
}
]
}
}
}'
You should receive a successful "201 Created" reply, including the newly created document in the body. Save the ID at the top somewhere; this is the document ID that we will need later to fetch the extracted data.
{
"data": {
"id": "1633723",
"type": "documents",
"attributes": {
"batch_id": "1633723",
"batch_index": 0,
"classification_lower_threshold": 0,
"classification_upper_threshold": 1,
"created_at": "2022-06-17T08:01:50.016934Z",
"custom_fields": {},
"document_type_identifier": null,
"error_codes": [],
"exported_at": null,
"external_id": null,
"language": null,
"language_confidence": null,
"name": "#1633723",
"not_for_training": false,
"recognition_confidence": null,
"sla_deadline": null,
"status": "pending",
"tenant_id": "543",
"updated_at": "2022-06-17T08:01:50.427226Z",
"upload_configuration": "client",
"validation_required": true,
"workflow_status": "started",
"workflow_step": "inbound"
},
"relationships": {
"comments": {
"links": {
"related": "api.parashift.io/v2/comments?filter[document_id]=1633723"
}
},
"document_fields": {
"links": {
"related": "api.parashift.io/v2/document_fields?filter[document_id]=1633723"
}
},
"output_files": {
"links": {
"related": "api.parashift.io/v2/files?filter[document_id]=1633723&filter[file_type]=output_file"
}
},
"pages": {
"links": {
"related": "api.parashift.io/v2/pages?filter[document_id]=1633723"
}
}
}
},
"meta": {}
}
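The same upload body can be assembled in code, and the document ID can be read from the reply. The sketch below only builds and parses JSON; it does not send anything. The function names are hypothetical, and the structure mirrors the curl body and the response shown above.

```python
import json


def build_upload_body(base64_file: str) -> str:
    """Build the JSON body for POST /documents from a base64-encoded file."""
    body = {
        "data": {
            "type": "documents",
            "attributes": {"files": [{"base64_file": base64_file}]},
        }
    }
    return json.dumps(body)


def extract_document_id(response_text: str) -> str:
    """Read the document ID ("data" -> "id") from the upload response body."""
    return json.loads(response_text)["data"]["id"]
```

With the response shown above, `extract_document_id` would return the string "1633723".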
Of course, there are many more options available, but for now this simple upload is enough. To learn more about uploading documents, you can also read this article:
Fetch the document
To get the extracted data from that document, take the document ID that was returned after the upload and send a GET request to the /documents endpoint with the ID appended.
curl --location --request GET 'https://api.parashift.io/v2/documents/1633723' \
--header 'Content-Type: application/vnd.api+json' \
--header 'Authorization: Bearer [Your API Key]'
You should receive a successful "200 OK" reply, including the document in the body.
{
"data": {
"id": "1633723",
"type": "documents",
"attributes": {
"batch_id": "1633723",
"batch_index": 0,
"classification_lower_threshold": 0.0,
"classification_upper_threshold": 1.0,
"created_at": "2022-06-17T08:01:50.016934Z",
"custom_fields": {},
"document_type_identifier": "pp-correspondence",
"error_codes": [],
"exported_at": null,
"external_id": null,
"language": "en",
"language_confidence": 0.999996366183802,
"name": "#1633723",
"not_for_training": false,
"recognition_confidence": 0.976840708680727,
"sla_deadline": "2022-06-17T11:01:50.016934Z",
"status": "in_progress",
"tenant_id": "543",
"updated_at": "2022-06-17T09:10:40.379639Z",
"upload_configuration": "client",
"validation_required": true,
"workflow_status": "in_progress",
"workflow_step": "extraction_validation"
},
"relationships": {
"comments": {
"links": {
"related": "api.parashift.io/v2/comments?filter[document_id]=1633723"
}
},
"document_fields": {
"links": {
"related": "api.parashift.io/v2/document_fields?filter[document_id]=1633723"
}
},
"output_files": {
"links": {
"related": "api.parashift.io/v2/files?filter[document_id]=1633723&filter[file_type]=output_file"
}
},
"pages": {
"links": {
"related": "api.parashift.io/v2/pages?filter[document_id]=1633723"
}
}
}
},
"meta": {}
}
Fetch extracted field data
Processing a document with all of its fields can take some time, so make sure to check the current status to see whether the document has been processed and all the field data is available.
To do that, look at the body of the previous response and check whether the following three attributes have one of these combinations:
status | workflow_step | workflow_status | Description |
done | done | done | The document is completely processed, and all data can be fetched, including export files, document type and of course field data |
in_progress | extraction_validation | in_progress | The document is waiting for manual interaction through a user, data can of course already be fetched but may change with validation (user interaction) |
(also see Basics: Document & Batch Workflow)
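The check against the table above can be sketched in Python as follows. This is an illustration under stated assumptions: `is_ready`, `wait_until_ready` and the `fetch` callable are hypothetical names, the number of attempts and the delay are arbitrary, and treating the extraction_validation combination as "ready" follows the note in the table that data can already be fetched but may still change.

```python
import time
from typing import Callable

# (status, workflow_step, workflow_status) combinations from the table above.
READY_STATES = {
    ("done", "done", "done"),
    ("in_progress", "extraction_validation", "in_progress"),
}


def is_ready(document: dict) -> bool:
    """Return True if the parsed document response matches a row of the table."""
    attrs = document["data"]["attributes"]
    state = (attrs["status"], attrs["workflow_step"], attrs["workflow_status"])
    return state in READY_STATES


def wait_until_ready(fetch: Callable[[], dict], attempts: int = 10,
                     delay_seconds: float = 5.0) -> dict:
    """Call fetch() repeatedly until the document is ready or attempts run out.

    fetch is any function that performs the GET request and returns the
    parsed JSON body; it is injected so this sketch stays independent of
    the HTTP client you use.
    """
    for attempt in range(attempts):
        document = fetch()
        if is_ready(document):
            return document
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    raise TimeoutError("document was not ready after the given attempts")
```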
Then you can send the same request as before, but this time include the document_fields (you could also have included them from the start). To reduce the returned payload a little, we will also filter out all attributes except:
identifier -> used to identify the field
value -> what was extracted
confidence -> how good is the result
curl --location -g --request GET 'https://api.parashift.io/v2/documents/1633723/?include=document_fields&fields[document_fields]=identifier,value,confidence' \
--header 'Content-Type: application/vnd.api+json' \
--header 'Authorization: Bearer [Your API Key]'
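If you build this URL in code, the query string can be generated instead of typed by hand. The following Python sketch uses `urllib.parse.urlencode` and keeps the brackets and commas unescaped so the result matches the curl example; the function name `document_fields_url` is an assumption for illustration.

```python
from typing import Iterable
from urllib.parse import urlencode

BASE_URL = "https://api.parashift.io/v2/"


def document_fields_url(document_id: str, attributes: Iterable[str]) -> str:
    """Build the document URL that includes document_fields with selected attributes."""
    query = {
        "include": "document_fields",
        "fields[document_fields]": ",".join(attributes),
    }
    # safe="[]," leaves brackets and commas readable, as in the curl example.
    return f"{BASE_URL}documents/{document_id}/?" + urlencode(query, safe="[],")
```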
You will get the document again and then all the associated fields in a flat list.
{
"data": {
"id": "1633723",
"type": "documents",
"attributes": {
...
"document_type_identifier": "pp-correspondence",
...
"status": "in_progress",
...
"workflow_status": "in_progress",
"workflow_step": "extraction_validation"
},
"relationships": {
"comments": {
"links": {
"related": "api.parashift.io/v2/comments?filter[document_id]=1633723"
}
},
"document_fields": {
"links": {
"related": "api.parashift.io/v2/document_fields?extra_fields[document_fields]=extraction_candidates&filter[document_id]=1633723"
},
"data": [
{
"type": "document_fields",
"id": "95305517"
},
...
{
"type": "document_fields",
"id": "95305525"
}
]
},
"output_files": {
"links": {
"related": "api.parashift.io/v2/files?filter[document_id]=1633723&filter[file_type]=output_file"
}
},
"pages": {
"links": {
"related": "api.parashift.io/v2/pages?filter[document_id]=1633723"
}
}
}
},
"included": [
{
"id": "95305517",
"type": "document_fields",
"attributes": {
"confidence": 0.98221,
"identifier": "pp-subject",
"value": "Awesome First Integration"
},
"relationships": {
"document": {
"links": {
"related": "api.parashift.io/v2/documents/1633723"
}
}
}
},
{
"id": "95305525",
"type": "document_fields",
"attributes": {
"confidence": 0.98451,
"identifier": "pp-receiver-address-city",
"value": "Ankh-Morpork"
},
"relationships": {
"document": {
"links": {
"related": "api.parashift.io/v2/documents/1633723"
}
}
}
}
],
"meta": {}
}
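To work with this flat list, it can help to group the included fields by their identifier. The Python sketch below is one possible way to do that; the function name `fields_by_identifier` is hypothetical. It collects values in a list per identifier, on the assumption that an identifier may occur more than once (for example in arrays or tables).

```python
def fields_by_identifier(response: dict) -> dict:
    """Group (value, confidence) pairs of included document_fields by identifier."""
    grouped = {}
    for item in response.get("included", []):
        # Only document_fields are of interest; skip any other included type.
        if item.get("type") != "document_fields":
            continue
        attrs = item["attributes"]
        pair = (attrs.get("value"), attrs.get("confidence"))
        grouped.setdefault(attrs["identifier"], []).append(pair)
    return grouped
```

For the response above, the entry for "pp-receiver-address-city" would contain one pair with the value "Ankh-Morpork".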
If you want more data, there are many additional attributes we provide, such as the original OCR value, coordinates, confidences, machine learning candidates and more.
We strongly recommend reading this article, which explains which field attributes are available, how you can use them to map the Parashift fields to your data, and how to reconstruct more complex objects like arrays and tables from the flat list of fields that we return.
To get more data, just leave out the "fields" filter and maybe even add an extra_fields parameter such as extraction_candidates:
curl --location -g --request GET 'https://api.parashift.io/v2/documents/1633723/?include=document_fields&extra_fields[document_fields]=extraction_candidates' \
--header 'Content-Type: application/vnd.api+json' \
--header 'Authorization: Bearer [Your API Key]'
The response you get now includes all this extra data to help you and the integration you build in any way possible.
{
"data": {
"id": "1633723",
"type": "documents",
"attributes": {
...
"document_type_identifier": "pp-correspondence",
...
"status": "in_progress",
...
"workflow_status": "in_progress",
"workflow_step": "extraction_validation"
},
"relationships": {
"comments": {
"links": {
"related": "api.parashift.io/v2/comments?filter[document_id]=1633723"
}
},
"document_fields": {
"links": {
"related": "api.parashift.io/v2/document_fields?extra_fields[document_fields]=extraction_candidates&filter[document_id]=1633723"
},
"data": [
{
"type": "document_fields",
"id": "95305517"
},
...
{
"type": "document_fields",
"id": "95305525"
}
]
},
"output_files": {
"links": {
"related": "api.parashift.io/v2/files?filter[document_id]=1633723&filter[file_type]=output_file"
}
},
"pages": {
"links": {
"related": "api.parashift.io/v2/pages?filter[document_id]=1633723"
}
}
}
},
"included": [
{
"id": "95305517",
"type": "document_fields",
"attributes": {
"confidence": 0.98221,
"coordinates": {
"top": 0.3172386272944932,
"left": 0.11901081916537867,
"right": 0.4152498712004122,
"bottom": 0.3391859537110934
},
"created_at": "2022-06-17T09:10:00.118511Z",
"data_type": "string",
"document_id": 1633723,
"recognition_value": "Awesome First Integration",
"recognition_confidence": 0.98838,
"extraction_lower_threshold": 0.3,
"extraction_upper_threshold": 0.95,
"external_value": null,
"fieldset_identifier": null,
"identifier": "pp-subject",
"item_index": null,
"page_number": 0,
"prediction_value": "Awesome First Integration",
"prediction_confidence": 0.99376,
"status": "in_progress",
"validation_status": "skipped",
"tenant_id": "543",
"updated_at": "2022-06-17T09:10:37.502286Z",
"user_value": null,
"value": "Awesome First Integration",
"extraction_candidates": [
{
"recognition_value": "Awesome First Integration",
"prediction_value": "Awesome First Integration",
"page_number": 0,
"coordinates": {
"top": 0.3172386272944932,
"left": 0.11901081916537867,
"right": 0.4152498712004122,
"bottom": 0.3391859537110934
},
"confidence": 0.9822109963621605,
"recognition_confidence": 0.9883790810902914,
"prediction_confidence": 0.9937593936920166
},
{
"recognition_value": "Parashift AG, Hauptstrasse 134, 4450 Sissach, Schweiz",
"prediction_value": "Parashift AG, Hauptstrasse 134, 4450 Sissach, Schweiz",
"page_number": 0,
"coordinates": {
"top": 0.11572226656025539,
"left": 0.12107161257083977,
"right": 0.5141679546625451,
"bottom": 0.12609736632083002
},
"confidence": 0.10058518687352651,
"recognition_confidence": 0.9896196637834821,
"prediction_confidence": 0.1016402468085289
}
]
},
"relationships": {
"document": {
"links": {
"related": "api.parashift.io/v2/documents/1633723"
}
}
}
},
{
"id": "95305525",
"type": "document_fields",
"attributes": {
"confidence": 0.98451,
"coordinates": {
"top": 0.18754988028731046,
"left": 0.16125708397733127,
"right": 0.26841834106130863,
"bottom": 0.19992019154030327
},
"created_at": "2022-06-17T09:10:00.403340Z",
"data_type": "string",
"document_id": 1633723,
"recognition_value": "Ankh-Morpork",
"recognition_confidence": 0.98956,
"extraction_lower_threshold": 0.25,
"extraction_upper_threshold": 0.95,
"external_value": null,
"fieldset_identifier": "pp-receiver-address",
"identifier": "pp-receiver-address-city",
"item_index": null,
"page_number": 0,
"prediction_value": "Ankh-Morpork",
"prediction_confidence": 0.9949,
"status": "in_progress",
"validation_status": "skipped",
"tenant_id": "543",
"updated_at": "2022-06-17T09:10:38.010412Z",
"user_value": null,
"value": "Ankh-Morpork",
"extraction_candidates": [
{
"recognition_value": "Ankh-Morpork",
"prediction_value": "Ankh-Morpork",
"page_number": 0,
"coordinates": {
"top": 0.18754988028731046,
"left": 0.16125708397733127,
"right": 0.26841834106130863,
"bottom": 0.19992019154030327
},
"confidence": 0.9845102115129194,
"recognition_confidence": 0.9895561337471008,
"prediction_confidence": 0.9949008226394653
}
]
},
"relationships": {
"document": {
"links": {
"related": "api.parashift.io/v2/documents/1633723"
}
}
}
}
],
"meta": {}
}
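As one example of using this extra data, the extraction candidates of a field can be ranked by their confidence. This is a Python sketch under stated assumptions: the function name `rank_candidates` and the `limit` parameter are hypothetical, and the attribute names are taken from the response above.

```python
def rank_candidates(field: dict, limit: int = 3) -> list:
    """Return up to `limit` (prediction_value, confidence) pairs, best first."""
    candidates = field["attributes"].get("extraction_candidates") or []
    ranked = sorted(candidates, key=lambda c: c["confidence"], reverse=True)
    return [(c["prediction_value"], c["confidence"]) for c in ranked[:limit]]
```

Applied to the "pp-subject" field above, the first pair would be the candidate "Awesome First Integration", since its confidence is the highest of the two.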