Document Eligibility Filtering

Document Eligibility Filtering indicates whether a document is eligible for training, based on internal checks in the application and our machine learning logic. It shows which documents were excluded from the training set and why, so you can address the underlying issues and achieve better model performance.

Using Document Eligibility Filtering

Before using Document Eligibility Filtering, make sure that you’ve: 

  • uploaded the required number of documents for training in Training Data Management, and

  • analyzed the data.

  1. Go to Library > Models, and click the name of the model whose document-eligibility information you want to view.
    The Field Identification Model or Table Identification Model card shows information about the training documents that are currently uploaded:

    • Required to train - number of additional documents needed to run a model training

    • Eligible for training - number of documents that will be used in your training set. This number may change as documents are annotated and each time you analyze your training data.

The Training Data Health card shows information about the quality of your training data. The bar indicates how many documents you need to meet the minimum required for training. 

All documents with the Training Status Ready to Annotate or Never appear as ineligible for training until you change the status and reanalyze the data.
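The status rule and the Required to train count can be pictured with a minimal sketch. It is an illustration only, not the application's implementation: the Document class, the helper names, the "Annotated" status, and the minimum of 10 documents are all hypothetical; only the two ineligible statuses come from this page.

```python
from dataclasses import dataclass

# Training Statuses that this page lists as always ineligible.
INELIGIBLE_STATUSES = {"Ready to Annotate", "Never"}


@dataclass
class Document:
    doc_id: str
    training_status: str


def is_status_eligible(doc: Document) -> bool:
    """Return True unless the Training Status alone rules the document out."""
    return doc.training_status not in INELIGIBLE_STATUSES


def required_to_train(eligible_count: int, minimum_required: int) -> int:
    """Additional documents still needed before a training can run."""
    return max(0, minimum_required - eligible_count)


docs = [
    Document("doc-1", "Ready to Annotate"),
    Document("doc-2", "Annotated"),  # hypothetical status name
    Document("doc-3", "Never"),
]
eligible = [d for d in docs if is_status_eligible(d)]

# minimum_required=10 is an assumed value; the real minimum is set by the system.
print(len(eligible), required_to_train(len(eligible), minimum_required=10))
```

Other checks can still make a document ineligible, so passing the status check is necessary but not sufficient.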

  2. To see ineligibility details for the training set, click See Ineligibility details >> in the Field Identification Model or Table Identification Model card.
    Ineligibility information is displayed in the right-hand sidebar, showing the reasons documents are ineligible for training and how many documents are ineligible for each reason.

    Reanalyze your data whenever documents have been added, removed, or modified since the last analysis. Otherwise, the ineligibility details may be outdated.

Depending on the results of the analysis, a yellow indicator may appear on the left-hand side of a document’s record in the Training Data card. Hover over it to see whether an anomaly was detected in the document or the document is ineligible for training.


You can filter the documents by training-ineligibility reason by clicking Filters and selecting a reason in the Training Eligibility drop-down list. Click Apply Filters to view the results.
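Conceptually, the sidebar counts and the Training Eligibility filter are a tally and a selection over the same analysis results. The sketch below illustrates that idea under stated assumptions: the analysis dictionary, the document IDs, and both function names are hypothetical, and the application's actual data model is not documented here.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical analysis result: document ID -> ineligibility reasons.
# An empty list means the document is eligible for training.
analysis: Dict[str, List[str]] = {
    "doc-1": ["Ineligible status"],
    "doc-2": [],
    "doc-3": ["Overlapping bounding boxes", "Consecutive page breaks"],
    "doc-4": ["Overlapping bounding boxes"],
}


def count_by_reason(results: Dict[str, List[str]]) -> Counter:
    """Number of documents affected by each reason (the sidebar view)."""
    return Counter(reason for reasons in results.values() for reason in reasons)


def filter_by_reason(results: Dict[str, List[str]], reason: str) -> List[str]:
    """IDs of documents that are ineligible for the selected reason."""
    return sorted(doc_id for doc_id, reasons in results.items() if reason in reasons)


print(count_by_reason(analysis))
print(filter_by_reason(analysis, "Overlapping bounding boxes"))
```

A document can appear under more than one reason, so the per-reason counts may add up to more than the number of ineligible documents.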

  3. To view ineligibility details for a particular document, click its ID in the Training Data table.
    Ineligibility information is displayed in the right-hand sidebar, showing the reasons the document is ineligible for training.

Ineligibility reasons

Reason

Description

Ineligible status

A document will always be ineligible for training if its Training Status is Ready to Annotate or Never.

Incompatible layout version

The information about the layout is incompatible with the documents provided for training.

Overlapping bounding boxes

If a document has overlapping bounding boxes, it is ineligible for training.

Consecutive page breaks

A field with multiple bounding boxes spans more than two consecutive pages.

Example: 

  • A field starts on Page 1, continues on Page 2, and continues again on Page 3.

  • Pages 1 and 2 are consecutive, so a field limited to those two pages would be acceptable. Because the field also extends to Page 3, its bounding boxes span three pages, which makes the document ineligible for training.

Max Pages per doc are exceeded

The document has more than the maximum number of pages per document, as defined in the system. The default maximum is 5000 pages. For more information about the default value, contact the Support team.

Max Segments per page exceeded

The document has more than the maximum number of text segments per page, as defined in the system. The default maximum is 900.

Max total pages exceeded

The training set contains more than the maximum number of total pages, as defined in the system. The default maximum is 5000. Contact the Support team for more information and assistance. 

Max Segments exceeded

The training set contains more than the maximum number of total text segments, as defined in the system. The default maximum is 900. For more information, contact the Support team.
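To make the document-level reasons concrete, here is a rough sketch of how such checks could be expressed. It is one reading of the descriptions above, not the application's logic. The Box class, all function names, and the coordinate convention are hypothetical. Two further assumptions: boxes that only touch at an edge do not count as overlapping, and a field is flagged when its first and last pages are more than two pages apart. The limits use the defaults quoted in the table; the training-set-level limits are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Defaults quoted in the table above; the real values are defined in the system.
MAX_PAGES_PER_DOC = 5000
MAX_SEGMENTS_PER_PAGE = 900


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box for one part of a field."""
    field: str
    page: int
    left: float
    top: float
    right: float
    bottom: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two boxes on the same page share any interior area."""
    if a.page != b.page:
        return False
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def has_overlapping_boxes(boxes: List[Box]) -> bool:
    """Check every pair of boxes once."""
    return any(boxes_overlap(a, b)
               for i, a in enumerate(boxes) for b in boxes[i + 1:])


def fields_spanning_too_many_pages(boxes: List[Box]) -> List[str]:
    """Fields whose boxes reach across more than two pages (assumed reading)."""
    pages: Dict[str, Set[int]] = {}
    for box in boxes:
        pages.setdefault(box.field, set()).add(box.page)
    return sorted(f for f, p in pages.items() if max(p) - min(p) + 1 > 2)


def ineligibility_reasons(boxes: List[Box], page_count: int,
                          segments_per_page: List[int]) -> List[str]:
    """Collect every document-level reason that applies."""
    reasons = []
    if has_overlapping_boxes(boxes):
        reasons.append("Overlapping bounding boxes")
    if fields_spanning_too_many_pages(boxes):
        reasons.append("Consecutive page breaks")
    if page_count > MAX_PAGES_PER_DOC:
        reasons.append("Max Pages per doc are exceeded")
    if any(n > MAX_SEGMENTS_PER_PAGE for n in segments_per_page):
        reasons.append("Max Segments per page exceeded")
    return reasons


# The example from the table: one field continuing across Pages 1, 2, and 3.
example_boxes = [
    Box("address", 1, 0, 700, 200, 800),
    Box("address", 2, 0, 0, 200, 100),
    Box("address", 3, 0, 0, 200, 50),
]
print(ineligibility_reasons(example_boxes, page_count=3,
                            segments_per_page=[120, 80, 40]))
```

Returning a list of reasons rather than a single flag mirrors the sidebar, where one document can be ineligible for several reasons at once.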