Importing and Exporting Training Data

Using features for Semi-structured documents
This article mentions features used in the processing of Semi-structured documents. Your access to those features depends on your license package and pricing plan.
To learn which features are available to your organization and how to add more, contact your Hyperscience representative.

To ensure that you do not lose any training data during application upgrades and model setups, you can move your training data between environments. The ability to export your models’ training data from production to lower environments can also help you debug issues with your deployed models.

You can export and import training data for the same layout across different instances. For example, you can move ground-truth data across variations of the same layout in different instances, but you cannot move this data across entirely different layouts.

Exporting Training Data

To export a model’s training data, you need to:

Go to Library > Models.
Click on a model name from the table to view its Model Details page.
Click the Download Training Documents button.

You can track the progress of the export task in the Notifications tab, located in the upper-right corner next to your name. Once the export is ready, a notification with a Download link appears in that tab.

The export consists of a training data ZIP file with the following elements:

A JSON file with training data
Document images
Layout version
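
Before moving an export to another instance, it can help to confirm that the archive contains what you expect. The following is a minimal sketch using Python's standard zipfile module; it only counts entries by file extension. The entry names used in the demo (training_data.json, images/page_1.png) are hypothetical placeholders, since the actual file names inside the export are not documented here.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_export(zip_source):
    """Count the entries in a training data ZIP by file extension."""
    with zipfile.ZipFile(zip_source) as archive:
        # Skip directory entries; only real files are counted.
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
    return Counter(PurePosixPath(name).suffix.lower() or "(none)" for name in names)


# Build a small in-memory archive for the demo.
# These entry names are hypothetical, not the product's real naming scheme.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("training_data.json", "{}")
    archive.writestr("images/page_1.png", b"")
    archive.writestr("images/page_2.png", b"")
demo.seek(0)

summary = summarize_export(demo)
print(dict(summary))
```

To check a real export, pass the path of the downloaded ZIP file to summarize_export instead of the in-memory demo archive.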

A training data ZIP file contains up to 500 pages. If a model has more than 500 pages of training data, the data is split into multiple ZIP files. For example, if a model has 900 pages of training data, two training data ZIP files will be available for download.
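
The number of ZIP files to expect follows from the 500-page limit by rounding up. The sketch below assumes that each file is filled up to the limit before a new one is started; how pages are actually distributed across the files is not stated here, so treat the result as the minimum number of files. The function name zip_file_count is illustrative only.

```python
PAGES_PER_ZIP = 500  # page limit per training data ZIP file, as stated above


def zip_file_count(total_pages):
    """Return the minimum number of ZIP files needed for total_pages pages."""
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    # Integer ceiling division: round up without using floats.
    return (total_pages + PAGES_PER_ZIP - 1) // PAGES_PER_ZIP


print(zip_file_count(900))  # 2, matching the example above
```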

“Model Training at Risk” warning message

If you’ve enabled the PII data deletion setting in Administration > System Settings, a Model Training at Risk warning message appears on your Model Details pages.

The Model Training at Risk warning message has a Save Data button. Clicking this button opens a popup window that prompts you to download your model’s training data.

Importing Training Data

To import a model’s training data, you need to:

Go to Library > Models.
Click on a model name from the table to view its Model Details page.
Click the Upload Training Documents button.

The Upload Documents dialog has two tabs:

Create New
Upload Existing

The “Create New” tab

The Create New tab allows you to add new documents for annotation and model training. Once the documents are successfully uploaded, you have to annotate them and train the Field Locator or Table Locator model with them.

When uploading documents, you can enable the following Image Upload settings:

Image correction
Captured image enhancement

To learn more about these two settings, see the Image Readability section of the Identification Settings article.

If you add multiple documents for upload, you can also select one of the following Upload As settings:

A single document – creates a single document from all uploaded files.
A document for each file – creates a document for each uploaded file.

The “Upload Existing” tab

The Upload Existing tab allows you to upload existing training data that has already been created for this layout in another instance.

Note that you can upload only one ZIP file at a time.

Before uploading a ZIP file, you need to choose how you’d like to handle duplicate documents. A document that you export, make changes to, and then reimport is considered a duplicate. Select one of the following options:

Skip if a duplicate file exists
Override existing documents with data from .ZIP
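
The difference between the two options can be illustrated with a small sketch. This is not the product's implementation: it assumes that each document is identified by an ID, and the names import_documents, "skip", and "override" are hypothetical. How the application actually detects duplicates is not described in this article.

```python
def import_documents(existing, incoming, on_duplicate):
    """Merge incoming documents into existing ones.

    Both arguments map a (hypothetical) document ID to its training data.
    on_duplicate is "skip" or "override" and decides what happens when
    an incoming document has the same ID as an existing one.
    """
    if on_duplicate not in ("skip", "override"):
        raise ValueError("on_duplicate must be 'skip' or 'override'")
    merged = dict(existing)  # leave the caller's data untouched
    for doc_id, data in incoming.items():
        if doc_id in merged and on_duplicate == "skip":
            continue  # keep the existing version of the duplicate
        merged[doc_id] = data  # new document, or duplicate being overridden
    return merged


existing = {"doc-1": "original", "doc-2": "original"}
incoming = {"doc-2": "edited", "doc-3": "new"}

skipped = import_documents(existing, incoming, "skip")
overridden = import_documents(existing, incoming, "override")
print(skipped["doc-2"], overridden["doc-2"])
```

In both cases, documents that do not yet exist in the target are added. Only the handling of the duplicate (doc-2 in this sketch) differs: skipping keeps the existing version, while overriding replaces it with the data from the ZIP file.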