Document Eligibility Filtering indicates whether a document is eligible for training, based on internal checks in the application and our machine learning logic. It provides additional information about documents that were excluded from the training set. With Document Eligibility Filtering, you can see which documents are incompatible with training and why, allowing you to address any issues accordingly and achieve better model performance.
Using Document Eligibility Filtering
Before using Document Eligibility Filtering, make sure that you’ve:
uploaded the required number of documents for training in Training Data Management, and
analyzed the data.
To learn more about uploading training documents and analyzing data, see Labeling Anomaly Detection.
Go to Library > Models, and click on the name of the model you want to view document-eligibility information for.
A Field Identification Model or Table Identification Model card shows information about the training documents that are currently uploaded.
Required to train - number of additional documents needed to run model training.
Eligible for training - number of documents that will be used in your training set. This number may change each time you analyze your training data, as more documents are annotated.
The Training Data Health card shows information about the quality of your training data. The bar indicates how many documents you need to meet the minimum required for training.
All documents with the Training Status Ready to Annotate or Never appear as ineligible for training until you change the status and reanalyze the data.
To see ineligibility details for the training set, click See Ineligibility details >> in the Field Identification Model or Table Identification Model card.
Ineligibility information is displayed in the right-hand sidebar, showing the reasons documents are ineligible for training and how many documents are ineligible for each reason.
Always make sure to reanalyze your data to see updated information on your training dataset.
If documents have been added, removed, or modified since the last analysis, the ineligibility details may be outdated.
Depending on the results of the analysis, a yellow indicator may appear on the left-hand side of a document’s record in the Training Data card. Hover over it to see whether an anomaly was detected in the document or the document is ineligible for training.
Learn more about anomalies in Labeling Anomaly Detection.
You can filter the documents by training-ineligibility reason by clicking Filters and selecting a reason in the Training Eligibility drop-down list. Click Apply Filters to view the results.
To view ineligibility details for a particular document, click its ID in the Training Data table.
Ineligibility information is displayed in the right-hand sidebar, showing the reasons the document is ineligible for training.
Ineligibility reasons
Reason | Description |
---|---|
Ineligible status | A document will always be ineligible for training if its Training Status is Ready to Annotate or Never. |
Incompatible layout version | The information about the layout is incompatible with the documents provided for training. |
Overlapping bounding boxes | If a document has overlapping bounding boxes, it is ineligible for training. |
Consecutive page breaks | The bounding boxes for a field with multiple bounding boxes span more than two consecutive pages. |
Max Pages per doc are exceeded | The document has more than the maximum number of pages per document, as defined in the system. The default maximum is 5000 pages. For more information about the default value, contact the Support team. |
Max Segments per page exceeded | The document has more than the maximum number of text segments per page, as defined in the system. The default maximum is 900. |
Max total pages exceeded | The training set contains more than the maximum number of total pages, as defined in the system. The default maximum is 5000. Contact the Support team for more information and assistance. |
Max Segments exceeded | The training set contains more than the maximum number of total text segments, as defined in the system. The default maximum is 900. For more information, contact the Support team. |
Unexpected Multiple Occurrences | Keyers can annotate multiple occurrences in Supervision and QA, even if the Multiple Occurrences checkbox is NOT selected in the Layout Editor. In Training Data Management, these documents appear as ineligible for training after Training Data Analysis. |
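
Several of the reasons above amount to simple per-document checks. As a rough illustration only, the following Python sketch shows how a few of them could be expressed. It is not the application's implementation: the `Document` class, the `Box` tuple, and the function names are hypothetical, and the limits are just the default values quoted in the table.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed limits, copied from the defaults quoted in the table above.
MAX_PAGES_PER_DOC = 5000
MAX_SEGMENTS_PER_PAGE = 900
INELIGIBLE_STATUSES = {"Ready to Annotate", "Never"}

# Hypothetical box representation: (page, x0, y0, x1, y1).
Box = Tuple[int, float, float, float, float]


@dataclass
class Document:
    """Hypothetical, simplified view of a training document."""
    training_status: str
    segments_per_page: List[int]  # one entry per page
    boxes: List[Box] = field(default_factory=list)


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if two boxes on the same page share an area."""
    if a[0] != b[0]:
        return False  # boxes on different pages never overlap
    return a[1] < b[3] and b[1] < a[3] and a[2] < b[4] and b[2] < a[4]


def ineligibility_reasons(doc: Document) -> List[str]:
    """Collect the reasons (if any) that apply to a single document."""
    reasons = []
    if doc.training_status in INELIGIBLE_STATUSES:
        reasons.append("Ineligible status")
    if len(doc.segments_per_page) > MAX_PAGES_PER_DOC:
        reasons.append("Max Pages per doc are exceeded")
    if any(n > MAX_SEGMENTS_PER_PAGE for n in doc.segments_per_page):
        reasons.append("Max Segments per page exceeded")
    # Compare every pair of boxes once.
    if any(
        boxes_overlap(a, b)
        for i, a in enumerate(doc.boxes)
        for b in doc.boxes[i + 1:]
    ):
        reasons.append("Overlapping bounding boxes")
    return reasons
```

Under these assumptions, a document with the Training Status Ready to Annotate and a page containing 901 text segments would be reported with both the Ineligible status and Max Segments per page exceeded reasons.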