Model Validation Tasks

Accessing this feature
Your access to the feature described in this article depends on your license package and pricing plan.
To learn which features are available to your organization and how to add more, contact your Hyperscience representative.

Model Validation tasks (MVTs) indicate potential human annotation errors and help you evaluate the model’s automation against your annotations. To learn more about the annotation process, see Training a Semi-structured Model. To learn more about Semi-structured (NLC) Classification, see TDM for Classification Models.

In this article, you’ll learn more about MVTs and how they help improve your model’s performance.

Understanding Anomalies and MVTs in Training Data Management

Understanding the difference between MVTs and Anomalies when working with Classification and Identification models in Training Data Management (TDM) will help you improve your model’s performance more effectively. This section explains their purposes and how they fit into the model training process. To learn more about model training, see TDM for Classification Models and TDM for Identification Models.

MVTs and TDM
Previously, Model Validation tasks were flagged as QA tasks after model training. They are now treated as part of the anomalies in Training Data Management:
- For Identification models, in v37 and later. To learn more, see TDM for Identification Models.
- For Classification models, in v39 and later. To learn more, see TDM for Classification Models.

In TDM, both Anomalies and MVTs help improve model quality, but they are triggered at different stages and serve different purposes. Learn more in the sections below.

Anomalies

Anomalies are generated after Training Data Analysis, before the model is trained. They highlight potential issues and discrepancies, such as inconsistencies or missing fields. They are identified by comparing how similar documents in your dataset are labeled. Learn more in Training Data Curator and Text Segmentation.
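
To make the idea of comparing labels across similar documents concrete, the sketch below flags documents that lack a field most of their peers have. It is only an illustration, not the analysis TDM performs: the "group" key, the field names, and the 80% threshold are hypothetical and were chosen for the example.

```python
from collections import Counter, defaultdict


def find_missing_field_anomalies(docs, threshold=0.8):
    """Flag documents that lack a field most similar documents have.

    docs: list of dicts with hypothetical keys "id", "group", and "fields".
    threshold: hypothetical share of peers that must carry a field
               before its absence counts as an inconsistency.
    """
    # Bucket documents by their (hypothetical) similarity group.
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["group"]].append(doc)

    anomalies = {}
    for members in groups.values():
        # Count how many documents in the group annotate each field.
        counts = Counter(f for doc in members for f in set(doc["fields"]))
        common = {f for f, n in counts.items() if n / len(members) >= threshold}
        for doc in members:
            missing = common - set(doc["fields"])
            if missing:
                anomalies[doc["id"]] = sorted(missing)
    return anomalies


docs = [
    {"id": "a", "group": "invoice", "fields": ["total", "date"]},
    {"id": "b", "group": "invoice", "fields": ["total", "date"]},
    {"id": "c", "group": "invoice", "fields": ["total", "date"]},
    {"id": "d", "group": "invoice", "fields": ["total", "date"]},
    {"id": "e", "group": "invoice", "fields": ["total"]},
]
print(find_missing_field_anomalies(docs))
```

In this toy dataset, document "e" is the only one reported, because it lacks the "date" field that the other four documents annotate.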

Consistent annotations
Consistent annotations are crucial for the high quality of your model. To learn more, see Step 5 of our Training a Semi-structured Model article.

Detecting anomalies in Identification models

Anomalies are generated for Identification models only
Anomalies are part of the Training Data Management features that help you improve the quality of the training data for your Identification models. They are not available for Classification or Long-form Extraction models.

To detect and address potential anomalies:

Upload and annotate your training documents in TDM. (A minimum of 100 documents is required; 400 documents are recommended.)
Click Analyze data.
- After the analysis is done, the documents with inconsistencies are flagged as anomalies.
Review and resolve anomalies by editing or confirming your annotations.
Click Analyze data again to confirm consistency. To learn more about anomalies, see Labeling Anomaly Detection.
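
Before starting the analysis, you may want to check your dataset size against the thresholds in the first step. The following is a minimal sketch of that check. The function name and the status strings are hypothetical; only the values 100 and 400 come from this article.

```python
MIN_DOCS = 100          # minimum required, per the first step above
RECOMMENDED_DOCS = 400  # recommended, per the first step above


def training_set_status(annotated_count):
    """Classify a dataset size against the documented thresholds."""
    if annotated_count < MIN_DOCS:
        return "below minimum"
    if annotated_count < RECOMMENDED_DOCS:
        return "meets minimum"
    return "meets recommendation"


for count in (60, 250, 400):
    print(count, training_set_status(count))
```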

Model Validation tasks

Model Validation tasks are generated after model training has finished. They indicate the model’s high-confidence predictions and help you determine the accuracy of the trained model compared to your annotations. They appear as anomalies in Training Data Management, allowing you to identify potential issues in the model’s predictions that could impact performance.

Improving Identification Models Using MVTs

To get the most out of MVTs, follow the steps below:

1. Confirm the consistency of the annotations in your training set by following the steps in the Detecting anomalies in Identification models section of this article.
2. Initiate the first round of model training.
3. After the initial training is done, review and resolve the MVTs, which are flagged as anomalies. Learn more about anomaly indicators in Labeling Anomaly Detection.
4. Retrain the model with the corrected data.

Two rounds of training
For best results, we recommend running two rounds of Field Identification or Table Identification model training. Doing so gives you more control over your model’s performance. However, if you observe a higher number of anomalies after the second round, repeat steps 3 and 4 in the Improving Identification Models Using MVTs section of this article.
We have found that the first round of MVTs has a higher impact on your model’s performance. To improve performance further, see Step 7 - Evaluating Model Performance in our Training a Semi-structured Model article.

MVTs in Classification models

In Semi-structured Classification models, MVTs are also displayed after model training. They are flagged as anomalies when the layout assigned to a document doesn’t match the layout the model predicts with high confidence.
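
The flagging rule described above can be sketched as a single comparison. This is an illustration only: the function name is hypothetical, and the 0.9 threshold is an assumption, since this article does not state what counts as high confidence.

```python
def is_layout_mismatch(assigned_layout, predicted_layout, confidence,
                       min_confidence=0.9):
    """Return True when a confident prediction disagrees with the assigned layout.

    min_confidence is a hypothetical threshold chosen for this example.
    """
    return predicted_layout != assigned_layout and confidence >= min_confidence


# A confident disagreement is flagged; a low-confidence one is not.
print(is_layout_mismatch("Invoice", "Receipt", 0.97))
print(is_layout_mismatch("Invoice", "Receipt", 0.55))
```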

TDM for Classification works on the document level
In TDM for Classification, you match the whole document—rather than a single page—to a specific layout. Therefore, a single layout is applied to the entire document, instead of a separate layout being applied to each page. To learn more, see TDM for Classification Models.
If your document contains pages with different layouts (e.g., the first two pages follow one layout and the last two a different one), we recommend splitting the file into separate documents, one for each layout, and uploading them individually. Contact our Support team for more information.
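
If you already know the layout of each page, you can plan the split by grouping consecutive pages that share a layout. The sketch below only works out which page ranges belong together. It does not split or upload any files, and the function name and layout labels are hypothetical.

```python
from itertools import groupby


def split_by_layout(page_layouts):
    """Group consecutive pages that share a layout.

    page_layouts: one layout name per page, in page order.
    Returns a list of (layout, [1-based page numbers]) tuples.
    """
    runs = []
    numbered = enumerate(page_layouts, start=1)
    for layout, group in groupby(numbered, key=lambda item: item[1]):
        runs.append((layout, [page for page, _ in group]))
    return runs


# The four-page case from the example above: two pages per layout.
print(split_by_layout(["Layout A", "Layout A", "Layout B", "Layout B"]))
```

Each tuple in the result corresponds to one document to upload separately.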

Follow the steps below to review and resolve layout mismatches flagged as MVTs:

Go to the Training Data tab, and then click Filter on the Training Data table.
In the Has Anomalies drop-down list, click Contains anomalies.
Click Apply Filters.

Number of anomalies
In TDM for Classification models, the number of anomalies is not displayed. However, you can still check the affected documents by applying the Has Anomalies filter in the Training Data tab.

Open a flagged document to access the Training Document view.

Review documents after model training
Documents with layout mismatches are flagged with a red border around the Layout drop-down list. This border indicates that the model predicts a different layout than the one initially assigned.

Based on the model’s prediction:
- Update the layout to match the model’s prediction if the model’s suggestion is correct.
  - If some of the pages have a different layout, re-upload those pages as separate documents and delete the original document.
- Keep the current layout only if your original classification is correct.

Layout drop-down list
The layout displayed in the Layout drop-down is the one that was initially selected when the model was added to TDM. After training, the model may predict a different layout, as described in the guidance above the drop-down list.

Click Save Changes.

Following these steps helps to refine the model in future training rounds. Learn more about Semi-structured Classification in TDM for Classification Models.