TDM for Classification Models

Accessing this feature
Your access to the feature described in this article depends on your license package and pricing plan.
To learn which features are available to your organization and how to add more, contact your Hyperscience representative.

Classification models are a crucial part of document processing, as they help the system determine which layout should be used to process each page you upload. Training Data Management for Classification allows you to add, remove, and update training pages for Classification (also known as NLC) models to achieve more accurate classification results. In this way, TDM helps you maximize the performance of your Classification models.

NLC (Non-structured layout classifier) finds the correct Semi-structured or Additional layout for a given set of submission pages, based on the words in the submitted documents. Note that NLC works on a page level. Learn more in Automatic Document Classification.

Each release contains a set of layouts. The creation of a release generates a single Classification model. For example, if you create a release with two layouts, then one Classification model will be generated. It will be trained to identify the document pages submitted through the release’s flow.

If you create a new layout and create a new release for it, then a new Classification model will be created. Note that you need to add your new layout to a new release or a copy of an existing release for document pages to be matched to that layout. Learn more in Adding a New Release.

TDM for Classification logic

TDM for Classification operates on a document level. However, the Training Data tab displays the number of uploaded documents and the required and recommended number of pages per layout. Learn more in the Training Data Tab section of this article.

TDM for Classification allows you to manage example documents that should be included in or excluded from your model’s training:

Layouts eligible to train - These are the layouts that meet the minimum number of pages required for training. To ensure this requirement is met, upload documents that:
- have pages that match your layout and
- are diverse but still represent your layout.

Our recommendation for a robust model is 120 page examples per layout.
You need at least 10 page examples to meet the minimum requirements for model training.
Do not upload the same document multiple times.

Excluded documents - TDM uses these as examples of documents that you expect to process but don't want to match. They serve as counter-examples of the documents that your model should not classify.

Access TDM for Classification

To access TDM for Classification, go to Library > Models and click on Classification Models in the drop-down menu at the top of the page.

A table with all Classification models appears.

You can filter the Classification Models table by release with the Filter by release drop-down list, which is located on the right-hand side of the page.
You can also access the model management page for a particular model by clicking on its name in the table.

Importing Classification Models
In v41.0.1 and below, you can import models from the Import Model button in the upper-right corner of the page. To learn more, see

The Classification models table contains the following columns:

Model shows the name of your Classification model.
Compatible Releases indicates the number of releases for which the Classification model can make predictions.
Status shows the model's current state (e.g., Needs Training or Live).
Training Status indicates the current state of the model training (e.g., Pending, In Progress, Failed, Canceled, or Last trained on [date]).

To access TDM features for your Classification model:

Go to Library > Releases.
Click on the release's name for the model you want to manage training data for.
Click View Model in the Automatic Document Classification card.

Using TDM for Classification

Overview Tab

Projected Automation Chart

The Projected Automation chart displays your model's predicted automation rate based on the target accuracy. Learn more about these metrics in our Accuracy article.

Expand it by clicking the arrow button.

The chart displays how the target accuracy affects automation: the lower the target accuracy, the higher the automation, and vice versa. To learn more, see Automation.
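The trade-off between target accuracy and automation can be illustrated with a minimal sketch. It assumes, hypothetically, that each page prediction has a confidence score, that the most confident pages are automated first, and that the remaining pages go to manual review. The function name and the sample data are invented for illustration and are not how Hyperscience necessarily computes the chart.

```python
def automation_at_target(predictions, target_accuracy):
    """Return the largest share of pages that can be automated while the
    automated pages still meet target_accuracy.

    predictions: list of (confidence, is_correct) pairs (hypothetical data).
    """
    # Rank pages from most to least confident; automating the top k pages
    # is equivalent to choosing a confidence threshold.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    best = 0.0
    correct = 0
    for k, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        if correct / k >= target_accuracy:
            best = k / len(ranked)
    return best


# Hypothetical sample: eight pages with distinct confidence scores.
sample = [
    (0.99, True), (0.95, True), (0.90, True), (0.80, False),
    (0.70, True), (0.60, False), (0.50, True), (0.40, False),
]

for target in (1.0, 0.8, 0.7):
    print(target, automation_at_target(sample, target))
```

With this sample, lowering the target accuracy from 1.0 to 0.8 to 0.7 raises the automation rate from 0.375 to 0.625 to 0.875.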

Margin of Error (MoE)
The Margin of Error (MoE) indicates the allowable range of inaccuracy in the system’s results. It shows you how much the output can differ from the true value while still being acceptable. A smaller margin of error means the system is more accurate.
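This article does not state how the MoE is calculated. As an illustration only, one common statistical definition is the normal-approximation margin of error for a measured proportion, z * sqrt(p * (1 - p) / n), where p is the measured accuracy, n is the number of evaluated pages, and z is 1.96 for a 95% confidence level. The function below is a sketch under that assumption, not the product's formula.

```python
import math


def margin_of_error(accuracy, sample_size, z=1.96):
    """Normal-approximation margin of error for a proportion.

    Assumption: this is a textbook formula used for illustration; the
    article does not say which formula the product uses.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return z * math.sqrt(accuracy * (1.0 - accuracy) / sample_size)


# With the same measured accuracy, more evaluated pages give a smaller margin.
small_sample = margin_of_error(0.9, 100)
large_sample = margin_of_error(0.9, 400)
print(round(small_sample, 4), round(large_sample, 4))
```

Under this definition, quadrupling the number of evaluated pages halves the margin of error.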

Model Activity Table

The Model Activity table shows the training history for your classification models and has the following columns:

Training Started indicates the start date and hour of the training process.
Status displays the status of your model (e.g., Training in Progress, Training Failed, Training Canceled, Scheduled Training, and Needs Training).
Actions allows you to download the last trained version of your model.
- You can also download the current version of your model from the drop-down menu next to the Run Training button.

If you download the training data for the model, it may contain personally identifiable information. Learn more about managing your data in PII Data Deletion.

The System Version is the Hyperscience version in which the model was trained. You can filter the table by version using the System Version drop-down menu.

Use the pagination options at the bottom of the table to display all activities for your Classification model.

Model Compatibility Table

The Model Compatibility table indicates the releases that your model is compatible with and contains the columns described below.

Release name displays the names of the releases to which your layouts are added. Click on the column header to sort its contents by name.
Created On shows the creation date of the release. Sort its contents chronologically by clicking on the column header.
Status indicates whether your release is live or locked. Learn more in What is a Release?. Sort this column’s contents by clicking its header.

You can use the pagination options at the bottom of the table to display all releases that are compatible with your model.

Training Data Tab

Training Data Summary Card

The Training Data Summary card displays insights on the status of your training dataset.

Training Data Status indicates your training data's health based on the number of pages uploaded for each layout:
- Requirements Not Met - The minimum number of pages required for each layout is 10. If you’ve added fewer than 10, you won’t be able to run a training.
You need to upload at least 10 examples per layout for the model to learn what documents should be considered as a part of your training set. Note that you can run a training without Excluded examples if your release has more than one layout. However, we recommend adding documents in the Excluded section as well, as they serve as counter-examples.
- Not Optimized - Hyperscience recommends uploading at least 120 pages per layout to build a robust classification model. If you upload at least 10 but fewer than 120 pages, the status will indicate that your training data can be optimized. However, you’ll still be able to proceed with training.
- Ready To Train - This status will be displayed after you’ve reached the minimum required and the recommended number of uploaded pages to start a model training.

Layouts Eligible to train - indicates the number of layouts that meet the minimum requirements for training.
Excluded Documents - indicates the number of documents used as counter-examples. The excluded documents train the model on what should not be matched. Note that they are recommended but not required when your release contains more than one layout.

Excluded Documents Required
Excluded documents are required when you have only one layout in your release.

Last Training - displays the date and hour of the last model training.
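The status rules above can be summarized in a short sketch. The thresholds (10 and 120 pages per layout) and the rule that excluded documents are required for a single-layout release come from this article. The function names, the layout names, and the assumption that the overall status is driven by the layout with the fewest pages are hypothetical.

```python
MIN_PAGES = 10           # minimum pages per layout, per this article
RECOMMENDED_PAGES = 120  # recommended pages per layout, per this article


def training_data_status(pages_per_layout):
    """Map page counts per layout to a Training Data Status label.

    Assumption: the layout with the fewest pages determines the status.
    """
    if not pages_per_layout:
        return "Requirements Not Met"
    fewest = min(pages_per_layout.values())
    if fewest < MIN_PAGES:
        return "Requirements Not Met"
    if fewest < RECOMMENDED_PAGES:
        return "Not Optimized"
    return "Ready To Train"


def eligible_layouts(pages_per_layout):
    """Return the layouts that meet the minimum page requirement."""
    return [name for name, pages in pages_per_layout.items() if pages >= MIN_PAGES]


def can_run_training(pages_per_layout, excluded_documents):
    """Excluded documents are required only when there is a single layout."""
    if training_data_status(pages_per_layout) == "Requirements Not Met":
        return False
    return len(pages_per_layout) > 1 or excluded_documents > 0


# Hypothetical layout names and counts.
print(training_data_status({"Invoice": 8, "Receipt": 150}))
print(training_data_status({"Invoice": 10, "Receipt": 150}))
print(can_run_training({"Invoice": 50}, excluded_documents=0))
```

In this sketch, a release with one layout and no excluded documents cannot be trained even when the layout has enough pages, which mirrors the Excluded Documents Required note.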

Training Data Health Card

The Training Data Health card displays a breakdown of your dataset. It shows all layouts included in the Classification model, as well as bars next to each layout indicating the number of uploaded pages. Note that the card also shows the required and recommended numbers of pages for each layout.

Follow the steps below to add training data to your model:

Click the Add Training Data button.
Select the layout you want to add data for from the Upload To Layout drop-down.
Drag and drop your files into the dialog box or click Browse.
Once you’ve uploaded your files, click Continue.

Training Data Table

The Training Data table shows all documents available for use as training data for the model. It contains the following columns:

Document ID shows the unique ID number of the document.

Hover over the preview icon to see the pages of the uploaded document. Freeze the preview by clicking on the preview icon. Page through the document using the arrow keys on your keyboard or the arrows in the preview dialog. Click anywhere on the page to hide the preview.

Note that TDM for Classification works on a document level (i.e. when you edit the classification in TDM, you will classify the whole document and not a single page to a specific layout), whereas QA operates at the page level.

Pages displays the number of pages in the document.
Layout shows the layout this example corresponds to.
Usage Rule indicates the way the system will use the specific document for training:
- Always - The document will always be used in future model trainings. It will never be deleted, regardless of the system’s data-deletion settings.
- Auto - The document may be used in future model trainings until it is automatically deleted according to the system’s data-deletion settings.
- Never - The document will never be used in future model trainings and will be automatically deleted according to the system’s data-deletion settings. Documents processed through Supervision or QA will always display a status of 'Never' in TDM.
- Loading - The document has just been uploaded and is going through pre-processing. Once it loads, the status will change to 'Auto'.
Source - indicates how the document was added to Training Data Management:
- Upload - The document was uploaded manually through TDM.
- Processing - The document was uploaded through Submissions.
- Anomaly - The model was not confident enough for a document, and an anomaly was generated after the model training. Review the anomaly and continue with the process.
  The machine might classify a page with high confidence yet still be incorrect. This type of mistake is known as a high-confidence error. To confirm and correct such errors, users must complete Model Validation Tasks (MVTs), which are shown as anomalies in TDM. Learn more in Document Classification Model Validation Tasks.
- QA - Indicates the legacy documents that were processed through QA in v37 or earlier.
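The Usage Rule values described above can be modeled as a small enumeration. This is a sketch for illustration: the class, function, and field names are hypothetical, and treating a document in the Loading state as not yet usable for training is an assumption, since the article only says that it becomes 'Auto' after pre-processing.

```python
from enum import Enum


class UsageRule(Enum):
    ALWAYS = "Always"    # always used in training; never auto-deleted
    AUTO = "Auto"        # used in training until auto-deleted
    NEVER = "Never"      # never used in training; auto-deleted
    LOADING = "Loading"  # still pre-processing; becomes Auto afterwards


def is_used_for_training(rule):
    """Assumption: documents that are still Loading are not used yet."""
    return rule in (UsageRule.ALWAYS, UsageRule.AUTO)


def training_documents(documents):
    """Keep only the documents whose usage rule allows training."""
    return [d for d in documents if is_used_for_training(d["usage_rule"])]


# Hypothetical documents.
docs = [
    {"id": 1, "usage_rule": UsageRule.ALWAYS},
    {"id": 2, "usage_rule": UsageRule.AUTO},
    {"id": 3, "usage_rule": UsageRule.NEVER},
    {"id": 4, "usage_rule": UsageRule.LOADING},
]
print([d["id"] for d in training_documents(docs)])
```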

Excluded Training Data

The Excluded Training Data table displays the documents used as counter-examples for your classification model. The columns are the same as those described above for the Training Data table.

Excluded Documents Required vs Recommended
Excluded documents help train the model on what should not be matched. They serve as counter-examples and are only required if your release contains a single layout.
If you have multiple layouts in your release, we recommend uploading excluded documents to achieve a higher model performance.

You can change the displayed columns by clicking on Manage Columns… in the drop-down menu.

You can filter the tables by:

Layout
Usage Rule
Source
Scheduled Deletion
Has Anomalies - Filters the table to show documents that are incorrectly classified for this layout.

You can also search by Document ID. The Actions drop-down menu provides options to bulk-delete, edit, or download training data, as well as to download the entire training dataset.

Training a Classification Model

Follow the steps described below to learn how to train a classification model using TDM.

Upload your documents

To upload documents to TDM for Classification:

Click the Add Training Data button on the right-hand side of the Training Data Health card.
Choose Upload Files or Import Training Data from the dialog box.

You can import the following training data:
A Hyperscience export
A .ZIP file containing sub-folders, where the name of each folder is the name of an existing layout. We recommend naming each layout differently to avoid any confusion when importing training data. Do not include special characters in the names of the layouts, as doing so could lead to unexpected behavior during import.

You can also add examples directly to a layout by clicking add documents next to that layout. The same dialog box will appear, but the options for importing training data and uploading to a layout will be grayed out.
Alternatively, you can add documents directly from a layout’s details page by clicking Upload Documents.
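The .ZIP structure described above (one sub-folder per existing layout) can be prepared with a short script. This is a minimal sketch: the function name, the folder names, and the set of characters treated as "not special" (letters, digits, spaces, hyphens, and underscores) are assumptions, as is placing the documents directly inside each layout folder.

```python
import re
import zipfile
from pathlib import Path


def build_training_zip(source_dir, zip_path):
    """Zip source_dir so that each immediate sub-folder is a layout folder.

    Returns the list of archive paths that were written.
    """
    source = Path(source_dir)
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for layout_dir in sorted(p for p in source.iterdir() if p.is_dir()):
            # The article advises against special characters in layout names.
            # The allowed character set used here is an assumption.
            if not re.fullmatch(r"[A-Za-z0-9 _-]+", layout_dir.name):
                raise ValueError(f"Special characters in layout name: {layout_dir.name!r}")
            for doc in sorted(p for p in layout_dir.iterdir() if p.is_file()):
                arcname = f"{layout_dir.name}/{doc.name}"
                archive.write(doc, arcname)
                added.append(arcname)
    return added
```

For example, a source folder containing the hypothetical sub-folders Invoice and Receipt produces an archive whose entries look like Invoice/a.pdf and Receipt/b.pdf. Each sub-folder name must match the name of a layout that already exists.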

Review your documents

Review your documents using the Training Document View. It helps you match each document to a specific layout.

TDM for Classification works on a document level
You will classify the whole document and not a single page to a specific layout.

Assign a layout to the training document from the Layout drop-down menu on the right-hand side of the page.
Change the status of your document from the Training Status section.

Click Save Changes after you’ve classified your document.

Training a Classification model

Classification models data
Different Classification models across flows share the same data in TDM. Any changes applied to the training data (e.g., updating or removing documents) are also applied to all releases and flows.

After you’ve reached the requirements and recommendations, you’ll see a message that indicates that your model is ready to be trained for the first time in the Overview tab.

You can run training from either tab by clicking the Run Training button in the upper-right corner of the page.

Classification models are automatically deployed after training.

You can cancel your training at any time from the drop-down menu in the upper-right corner of the page.

Learn more about Classification in Document Classification.

Importing a Classification model

In v41.0.1 and below, you can import your classification models through the Classification Models table. Follow the steps below to learn how to import classification models.


To import a model:

Filter by the release for which you want to import a Classification model.
Click Import Model in the upper-right corner of the page.
A dialog box appears.
Drag and drop your ZIP file directly into the dialog box, or click Browse to find the file on your machine and open it.
Click Import.

Importing Classification models
When importing classification models, make sure they are trained for the same release version as the currently opened classification model.
If the model is from a different release, an error message will appear in the UI indicating the mismatch.
You can import models only by themselves; you cannot import them with their training data.