Our Document Classification settings fall into two categories:
Additional and Semi-structured Documents
Structured Layout Match Threshold
Additional and Semi-structured Documents
Semi-structured Classification
When this setting is enabled, you can train a classification model to automatically match documents to Additional and Semi-structured layouts.
To enable Semi-structured Classification, select this setting in the “Classification” section of your flow’s settings.
For more information, see Automatic Document Classification.
Semi-structured target accuracy
If the estimated accuracy of the model's prediction is below this percentage, the system will generate a Classification Supervision task for the document.
You can set this target accuracy in the “Classification” section of your flow’s settings.
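The target-accuracy rule above can be read as a simple comparison. The following is a minimal sketch for illustration only, not the product's implementation; the function name and parameters are hypothetical, and both values are assumed to be percentages from 0 to 100.

```python
def needs_classification_supervision(estimated_accuracy: float,
                                     target_accuracy: float) -> bool:
    """Return True when a Classification Supervision task would be generated.

    Hypothetical helper: both arguments are percentages (0-100).
    A task is generated only when the estimate is strictly below the target.
    """
    return estimated_accuracy < target_accuracy


# A 92% estimate against a 95% target falls below the target.
print(needs_classification_supervision(92.0, 95.0))
```

Under this reading, a document whose estimated accuracy equals the target does not generate a task, because the text specifies "below this percentage."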
Continuous Classification model improvement
When this setting is enabled, the system automatically trains new models and deploys them once training is complete. You can see a history of model training and deployments in the “Model Activity” card on the Classification Model detail page.
You can enable this setting in the “Document Classification” section of the application settings (Administration > Settings).
Semi-structured QA sample rate
This setting determines the percentage of documents that the system randomly selects for Classification QA. In Classification QA tasks, users confirm that the model's classifications are correct.
You can set this sample rate in the “Classification” section of your flow’s settings.
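To illustrate how a sample rate translates into a random selection, here is a minimal sketch. It assumes each document is selected independently with a probability equal to the sample rate; the actual selection method is not described here, and the function and parameter names are hypothetical.

```python
import random


def select_for_qa(document_ids, sample_rate_percent, rng=None):
    """Pick documents for Classification QA (illustrative sketch only).

    Assumption: each document is chosen independently with probability
    sample_rate_percent / 100. Pass a seeded random.Random for repeatability.
    """
    rng = rng or random.Random()
    probability = sample_rate_percent / 100
    # rng.random() returns a float in [0.0, 1.0), so a rate of 100 selects
    # every document and a rate of 0 selects none.
    return [doc_id for doc_id in document_ids if rng.random() < probability]


docs = [f"doc-{i}" for i in range(1000)]
sampled = select_for_qa(docs, 10, rng=random.Random(0))
print(len(sampled))  # roughly 10% of 1000; the exact count varies with the seed
```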
Semi-structured grouping logic
When multiple pages in a submission are matched to the same layout, you can handle them in one of three ways:
Consecutive pages as separate documents — Every page matched to the same layout will be treated as an individual document.
Consecutive pages as a document — Consecutive pages that are matched to the same layout will be treated as one document (for PDF or TIFF files, only consecutive pages of the same file will be grouped as one document).
Manual review of consecutive pages — Consecutive pages that are matched to the same layout will create a Classification Supervision task for the submission (for PDF or TIFF files, only consecutive pages of the same file will be grouped as one document).
You can select one of these options for your flow in the “Classification” section of your flow’s settings.
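The three grouping options can be sketched as follows. This is an illustrative model only, not the product's implementation: the `Page` type and the mode names are hypothetical, pages are assumed to arrive in submission order, and all pages are assumed to come from PDF or TIFF files, so runs never span two files.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Page:
    file: str      # source file name (assumed PDF or TIFF)
    number: int    # page number within that file
    layout: str    # layout the page was matched to


def group_pages(pages, mode):
    """Return (documents, needs_review) for a hypothetical grouping mode.

    "separate":      every page becomes its own document.
    "group":         consecutive same-layout pages of one file form a document.
    "manual_review": same runs as "group", but any run longer than one page
                     flags the submission for Classification Supervision.
    """
    if mode == "separate":
        return [[page] for page in pages], False

    # groupby only merges adjacent items, which matches "consecutive pages".
    runs = [list(run) for _, run in
            groupby(pages, key=lambda p: (p.file, p.layout))]

    if mode == "group":
        return runs, False
    if mode == "manual_review":
        return runs, any(len(run) > 1 for run in runs)
    raise ValueError(f"unknown grouping mode: {mode!r}")


pages = [
    Page("a.pdf", 1, "Invoice"),
    Page("a.pdf", 2, "Invoice"),
    Page("a.pdf", 3, "Receipt"),
    Page("b.pdf", 1, "Receipt"),
]
for mode in ("separate", "group", "manual_review"):
    documents, needs_review = group_pages(pages, mode)
    print(mode, len(documents), needs_review)
```

In this example, the two Invoice pages of `a.pdf` form one run, while the two Receipt pages stay separate because they belong to different files.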
Structured Layout Match Threshold
The Structured Layout Match Threshold determines whether a Structured page is matched to a layout. Pages with confidence scores below the threshold are sent to Classification Supervision (if enabled) or marked as “No Layout Found.”
You can set this threshold in the “Classification” section of your flow’s settings.
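The routing described for Structured pages can be summarized in a short sketch. The function name, parameters, and return labels below are hypothetical and serve only to illustrate the decision; the scale of the confidence score is also an assumption.

```python
def route_structured_page(confidence: float,
                          threshold: float,
                          supervision_enabled: bool) -> str:
    """Decide what happens to a Structured page (illustrative sketch only).

    Assumption: confidence and threshold use the same scale. Pages scoring
    strictly below the threshold are not matched automatically.
    """
    if confidence >= threshold:
        return "matched"
    if supervision_enabled:
        return "classification_supervision"
    return "no_layout_found"


print(route_structured_page(0.42, 0.60, supervision_enabled=True))
print(route_structured_page(0.42, 0.60, supervision_enabled=False))
```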