Intelligent page separation

Learn how to split a batch document in an intelligent way

This functionality uses Machine Learning (ML) to split a batch document intelligently into separate, individual documents.

ML in action:

  • A generic Machine Learning (ML) model for document separation is trained on a representative subsample of documents from across the Parashift Data Center (PDC).
  • Upon request, tenants can train an individual separation model, where
    • the generic model is used as the starting point for transfer learning
    • ground truth is collected from the corresponding tenant and hierarchy
  • Individual separation models are retrained continuously on a predefined schedule as new documents are validated for separation
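
To make the idea of separation concrete, the following is a minimal sketch of how per-page predictions could be turned into individual documents. It is not Parashift's implementation: the function name `split_batch` and the assumption that a model outputs one "starts a new document" flag per page are hypothetical and used for illustration only.

```python
from typing import List


def split_batch(starts_new_document: List[bool]) -> List[List[int]]:
    """Group page indices of a batch into individual documents.

    starts_new_document[i] is True when page i is predicted to be the
    first page of a new document (hypothetical model output).
    """
    documents: List[List[int]] = []
    for page_index, is_start in enumerate(starts_new_document):
        # The first page always opens a document, whatever the prediction says.
        if is_start or not documents:
            documents.append([])
        documents[-1].append(page_index)
    return documents


# A six-page batch with predicted document starts on pages 0, 3 and 4.
predictions = [True, False, False, True, True, False]
print(split_batch(predictions))  # [[0, 1, 2], [3], [4, 5]]
```

In this picture, separation validation amounts to a person correcting the per-page flags, and the corrected flags serve as ground truth for the next training run.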

Set up:

  • Upload batch documents with the "batch" checkbox selected
  • Perform separation validation on at least 5 documents to start separation model training in the background. You may need to validate more documents to get a good model, just as with the classification and extraction models on our platform
  • Observe how well the automated separation is working
  • Once you are happy with the automated separation, you can skip the manual separation-validation step by deselecting "MANUAL VALIDATION SEPARATION REQUIRED" in the upload profile. Alternatively, you can keep manual validation as a permanent part of your process to continuously improve the models.

The list of upload profiles can be found by clicking "Upload" in the sidebar.

After opening the relevant upload profile, you can find the "Separation Settings" section.