Training Data Management

Training Data Management (TDM), formerly Keyer Data Management, lets users improve and supervise models by working directly with the training data (“ground truth”) obtained from each document in the training set. Users can group their documents, identify incompatible ones, annotate representative parts of them, and detect potential inconsistencies. To learn more, see Step 3 and Step 4 of our Training a Semi-Structured Model and Training Data Management for Classification articles.

The performance of your models depends on the quality of the pages, the diversity of the documents, and, for Identification models, the consistency of the annotations. For more information on model-training results, see Evaluating Model Training Results.

TDM includes tools for controlling and managing Identification and Classification models’ performance.

TDM for Identification models

TDM for Identification models includes the following features: 

  • Document Eligibility Filtering: indicates whether a document is eligible for training, based on internal checks in the application and our machine learning logic. It also provides additional information about documents that were excluded from the training set.

  • Training Data Curator: labels each training document as having high or low importance. The importance is calculated by determining which data would best contribute to the model’s performance.

  • Labeling Anomaly Detection for Fields and Tables: identifies potential discrepancies in the training datasets before you run model training. Once the annotations are ready, you can analyze the data to find inconsistencies and help ensure a top-performing locator model.

Learn how to use these features to maximize the performance of your Identification model in our Training a Semi-Structured Model article.

TDM for Classification models

TDM for Classification lets you add, remove, and update the training pages of Classification models. Learn more in TDM for Classification.

Accessing Training Data Management tools

If you have the View Training Data permission (given to System Admin and Business Admin permission groups by default), you can access the Training Data Management tools for a model:

  1. Go to Library > Models. Learn more about the models table in Model Management.

  2. Choose the type of models you want to view from the drop-down menu.

  3. Click the name of the model whose training data you want to view.

    • For Identification models:

      • Click the Field Identification or the Table Identification tab, depending on the type of training data you would like to view.

      • The Training Data Management tools are located on the Training Data Health card. 

    • For Classification models: 

      • Click the Training Data tab to edit the documents used for training.

Continuous Model Training

If you import a model from another environment while the Continuous Field Locator model improvement and/or Continuous Classification model improvement settings are enabled, your automation rates may be reduced. Models are trained only on data from their current environment, so if the new environment does not have enough training data, the imported model will be overwritten by a lower-performing one. For optimal performance, we recommend that you train models manually and keep both the Continuous Field Locator model improvement and Continuous Classification model improvement settings disabled. Only enable these settings if instructed to do so by a Hyperscience representative.
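
The condition described above can be written as a simple check. The following Python sketch is purely illustrative and is not part of the Hyperscience application or its API: the class, field names, and function are hypothetical, and the minimum number of training documents is an assumed value that you would choose for your own use case.

```python
from dataclasses import dataclass


@dataclass
class ImportedModelContext:
    """Hypothetical snapshot of the situation described above (not a Hyperscience API)."""
    continuous_improvement_enabled: bool   # either continuous-improvement setting is on
    imported_from_other_environment: bool  # the model was imported, not trained here
    local_training_documents: int          # training documents in the current environment
    required_training_documents: int       # assumed minimum, chosen by you


def overwrite_risk(ctx: ImportedModelContext) -> bool:
    """Return True if the imported model could be replaced by a weaker retrained one."""
    if not ctx.continuous_improvement_enabled:
        return False  # models are only trained manually, so nothing is overwritten
    if not ctx.imported_from_other_environment:
        return False  # assumed: the model was already trained on this environment's data
    # Retraining uses only local data, so too little of it puts the model at risk.
    return ctx.local_training_documents < ctx.required_training_documents
```

For example, `overwrite_risk(ImportedModelContext(True, True, 40, 400))` returns `True`, while the same context with the continuous-improvement settings disabled returns `False`.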