Identification Settings

Our Identification settings can be categorized as follows:

Supervision Tasks
Accuracy
Image Readability

Supervision Tasks

Table output manual review

When enabled, the Table output manual review setting always sends Table Transcription tasks to a keyer, who reviews the table columns and rows drawn during Table ID Supervision.

When disabled, a Table Transcription task will only be generated if one or more cells in the table have transcribed values that fall below the defined accuracy thresholds.

You can enable this setting in your flow’s Manual Transcription Block.

Field Identification Quality Assurance

The Field Identification Quality Assurance process samples a user-defined portion of fields and sends them to Field Identification QA so the system can gather data regarding machine and data keyer accuracy. “Field Identification” refers to the location of the field and is specific to Semi-structured documents.

You can enable this setting in the “Extraction” section of your flow’s settings. If you enable Field Identification Quality Assurance, you will also need to specify a Field Identification QA sample rate, as described below.

Field Identification QA sample rate

The Field Identification QA sample rate is the percentage of documents that will be selected for Field Identification QA tasks. Semi-structured documents require full-document QA because certain fields are located in relation to other specific fields, and Hyperscience's machine learning models need data for all of the document’s fields in order to improve.

The most important question to consider when determining a QA sample rate is: what is your acceptable margin of error? The larger the QA dataset, the more precisely the system can estimate accuracy. You should also take the time frame into consideration: for example, the margin of error will be lower when aggregating the volume over a span of seven days than over one day, because more documents are sampled.
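To illustrate how the margin of error shrinks as the QA dataset grows, here is a minimal sketch using the standard normal-approximation formula for a proportion. It is not a description of how Hyperscience computes accuracy internally; the function name, the 95% confidence level (z = 1.96), the assumed 95% observed accuracy, and the sample counts are all illustrative assumptions.

```python
import math


def margin_of_error(sample_size: int, observed_accuracy: float = 0.95, z: float = 1.96) -> float:
    """Normal-approximation margin of error for an accuracy estimate.

    All defaults are illustrative assumptions, not product values.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(observed_accuracy * (1 - observed_accuracy) / sample_size)


# Hypothetical volume: 50 documents sampled for QA per day.
docs_sampled_per_day = 50
one_day = margin_of_error(docs_sampled_per_day)
seven_days = margin_of_error(docs_sampled_per_day * 7)

print(f"1 day:  +/- {one_day:.1%}")
print(f"7 days: +/- {seven_days:.1%}")
```

Under these assumptions, aggregating seven days of QA data gives a noticeably narrower margin than a single day, which is why both the sample rate and the time frame matter.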

You can set your Field Identification QA sample rate in the “Extraction” section of your flow’s settings. Please contact your deployment manager for additional help with determining the right sample rate for your team.

Accuracy

Field Identification target accuracy

The system uses QA data and the Field Identification target accuracy to calculate the optimal confidence threshold that will allow the system to reach the target accuracy with the minimum amount of manual effort.
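As a rough illustration of the idea, the sketch below picks the lowest confidence threshold at which the fields that would be automated still meet a target accuracy, given QA results. It is a simplified, hypothetical model and not the product's actual calculation; the function name and the data shape (pairs of machine confidence and QA-verified correctness) are assumptions.

```python
from typing import List, Optional, Tuple


def lowest_threshold_meeting_target(
    qa_samples: List[Tuple[float, bool]], target_accuracy: float
) -> Optional[float]:
    """Return the lowest confidence threshold whose automated fields meet the target.

    qa_samples holds hypothetical (machine confidence, machine was correct) pairs.
    Returns None if no threshold reaches the target.
    """
    # Rank from most to least confident, then grow the automated set step by step.
    ranked = sorted(qa_samples, key=lambda sample: sample[0], reverse=True)
    best = None
    correct = 0
    for i, (confidence, is_correct) in enumerate(ranked):
        correct += is_correct
        # Only evaluate once all samples sharing this confidence are counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != confidence
        if last_of_tie and correct / (i + 1) >= target_accuracy:
            best = confidence  # fields at or above this confidence are automated
    return best


samples = [(0.99, True), (0.97, True), (0.95, True),
           (0.90, False), (0.85, True), (0.60, False)]
strict = lowest_threshold_meeting_target(samples, target_accuracy=0.95)
relaxed = lowest_threshold_meeting_target(samples, target_accuracy=0.75)
```

In this toy data, the stricter target yields a higher threshold, so fewer fields are automated and more go to manual review. A lower target allows a lower threshold and more automation.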

When models for Semi-structured layouts are trained, the projected accuracy and automation estimates assume a 95% target accuracy for Field Identification, which may differ from your system's configuration. If your system's target accuracy is set higher than 95%, you should expect to see lower automation, and if your system's target accuracy is set lower than 95%, you should expect to see greater automation.

By fixing these estimates to a static 95% target accuracy, you can easily compare the estimates across models, even if you have changed your target accuracy for Field Identification between trainings.

You can set your Field Identification target accuracy in the “Extraction” section of your flow’s settings.

Continuous Field Locator model improvement

When enabled, Continuous Field Locator model improvement will automatically train and deploy Field Locator models with better performance. This setting is enabled by default.

When disabled, users will have to manually trigger Field Locator model training to benefit from new documents and training data.

Note that if Continuous Field Locator model improvement is enabled and you import a model from another environment, you might end up with lower automation rates. Models only use training data from their current environment, and if you do not have enough training data in your new environment, the model you imported will be overwritten by a worse one. For optimal performance, we recommend that you train models manually and disable the Continuous Field Locator model improvement setting. Only enable Continuous Field Locator model improvement if instructed to do so by a Hyperscience representative.

In order for a new training job to be triggered, both of the following conditions must be met:

Training frequency – at least 3 days must have passed since the last training job was completed.
New QA document requirements – the number of new documents must be at least 5% of the number of documents that were used to train the live model.

For example, if the live model was trained on 150 documents, there must be at least 8 new documents in order for a new training job to be created (150 * 0.05 = 7.5, rounded up).
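The two trigger conditions and the rounding in the example above can be sketched as follows. This is an illustrative check only, not the product's implementation; the function and constant names are hypothetical, and the values 3 days and 5% are taken from the conditions described above.

```python
from datetime import datetime, timedelta

MIN_DAYS_BETWEEN_TRAININGS = 3   # from the training frequency condition
NEW_DOCUMENT_PERCENT = 5         # from the new QA document requirement


def required_new_documents(live_model_training_docs: int) -> int:
    """5% of the live model's training set, rounded up (integer arithmetic)."""
    return (live_model_training_docs * NEW_DOCUMENT_PERCENT + 99) // 100


def should_trigger_training(
    last_training_completed: datetime,
    now: datetime,
    live_model_training_docs: int,
    new_documents: int,
) -> bool:
    """Both conditions must hold for a new training job to be created."""
    enough_time = now - last_training_completed >= timedelta(days=MIN_DAYS_BETWEEN_TRAININGS)
    enough_docs = new_documents >= required_new_documents(live_model_training_docs)
    return enough_time and enough_docs


last_run = datetime(2024, 1, 1)
print(required_new_documents(150))
print(should_trigger_training(last_run, datetime(2024, 1, 5), 150, 8))
```

Integer arithmetic is used for the rounding so that the result does not depend on floating-point representation of 0.05.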

You can enable this setting in the “Identification” section of the application settings (Administration > Settings).

Continuous Field Locator model improvement and Keyer Data Management

In Keyer Data Management, documents with completed Supervision and QA tasks are included as training data, but by default, their Training Status is Never. If you don't manually change the Training Status of these documents to Auto or Always, and you don't upload new training documents, automatic Field Locator model training will never occur, even if Continuous Field Locator model improvement is enabled. In order for this automatic training to occur, there must be a 5% increase in the number of eligible training documents since the last training of the model.

For more information on managing training data, see Keyer Data Management.

Image Readability

Image Correction

If the Image Correction setting is enabled, the machine corrects the rotation of pages that are submitted to Hyperscience.

This setting is enabled by default, and you can find it in your flow’s Machine Classification Block. To learn more, see Flow Blocks.

Captured Image Enhancement

If the Captured Image Enhancement setting is enabled, the machine improves the readability of Semi-structured pages captured by mobile devices. To improve image quality, the machine properly adjusts pages’ orientation and crops backgrounds.

The Captured Image Enhancement setting is disabled by default, but you can enable it in your flow’s Machine Classification Block. To learn more, see Flow Blocks.

Before enabling the Captured Image Enhancement setting, make sure that the majority of the pages you will be processing are captured by mobile devices. Contact your Hyperscience representative for more information.