Identification Settings

Our Identification settings can be categorized as follows:

- Supervision Tasks
- Accuracy
- Image Readability

Supervision Tasks

Table output manual review

When enabled, the Table output manual review setting will always send Table Transcription tasks to a keyer to review the table columns and rows drawn during Table ID Supervision.

When disabled, a Table Transcription task will only be generated if one or more cells in the table have transcribed values that fall below the defined accuracy thresholds.

You can enable this setting in your flow’s Manual Transcription Block.
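The behavior of this setting can be summarized in a short sketch. This is an illustration only, not Hyperscience's implementation: the function name, the per-cell confidence values, and the single `accuracy_threshold` parameter are hypothetical stand-ins for the "defined accuracy thresholds" mentioned above.

```python
from typing import Sequence


def needs_table_transcription_task(
    manual_review_enabled: bool,
    cell_confidences: Sequence[float],
    accuracy_threshold: float,
) -> bool:
    """Decide whether a Table Transcription task should be generated.

    All names here are hypothetical; they only mirror the rule in the text.
    """
    if manual_review_enabled:
        # Setting enabled: the table always goes to a keyer for review.
        return True
    # Setting disabled: a task is generated only if at least one cell's
    # transcription falls below the threshold.
    return any(conf < accuracy_threshold for conf in cell_confidences)


always_reviewed = needs_table_transcription_task(True, [0.99, 0.98], 0.90)
all_cells_confident = needs_table_transcription_task(False, [0.99, 0.95], 0.90)
one_weak_cell = needs_table_transcription_task(False, [0.99, 0.50], 0.90)
```

In this sketch, only the second case (setting disabled, every cell above the threshold) skips the Table Transcription task.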

Field Identification Quality Assurance

The Field Identification Quality Assurance process samples a user-defined portion of fields and sends them to Field Identification QA so the system can gather data regarding machine and data keyer accuracy. “Field Identification” refers to the location of the field and is specific to Semi-structured documents.

You can enable this setting in the “Extraction” section of your flow’s settings. If you enable Field Identification Quality Assurance, you will also need to specify a Field Identification QA sample rate, as described below.

Field Identification QA sample rate

The Field Identification QA sample rate refers to the percentage of documents that will be selected for Field Identification QA tasks. In earlier versions, Semi-structured documents required full-document QA because certain fields are located in relation to other specific fields, and Hyperscience's machine learning models need data for all of the document’s fields in order to improve.

In v37 and above, you can use the Training Data Management tool to adjust your ground truth data accordingly. Learn more in Training Data Management.

The most important question to consider when determining a QA sample rate is: what is your acceptable margin of error? The larger the QA dataset, the more precisely the system can estimate accuracy. You should also take the time frame into consideration. For example, the margin of error will be lower when aggregating the volume over a span of seven days than over one day.
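To make the relationship between sample size and margin of error concrete, here is a minimal sketch that uses the standard normal-approximation formula for a proportion. It is an assumption that this formula is a reasonable stand-in; the text does not state how the system computes its accuracy estimates, and the daily sample size and observed accuracy below are made-up illustration values.

```python
import math


def margin_of_error(sample_size: int, observed_accuracy: float, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    z = 1.96 corresponds to a 95% confidence level.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = observed_accuracy
    return z * math.sqrt(p * (1.0 - p) / sample_size)


# Hypothetical numbers: 50 QA'd fields per day at 95% observed accuracy.
daily_sample = 50
one_day = margin_of_error(daily_sample, 0.95)
seven_days = margin_of_error(7 * daily_sample, 0.95)

print(f"1 day:  +/- {one_day:.3f}")
print(f"7 days: +/- {seven_days:.3f}")
```

Because the margin shrinks with the square root of the sample size, aggregating seven days of QA data narrows it by a factor of about 2.6 compared with a single day at the same sample rate.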

You can set your Field Identification QA sample rate in the “Extraction” section of your flow’s settings. Please contact your deployment manager for additional help with determining the right sample rate for your team.

Accuracy

Identification target accuracy

The system uses the Field Identification Target Accuracy and the Table Identification Target Accuracy to calculate the optimal confidence threshold that will allow the system to reach the target accuracy with the minimum amount of manual effort.

When models for Semi-structured layouts are trained, the projected accuracy and automation estimates assume a 95% target accuracy for Field Identification and a 96% target accuracy for Table Identification, which may differ from your system's configuration. If your flow’s target accuracies are set higher than these percentages, you should expect to see lower automation than projected, and if they are set lower than these percentages, you should expect to see greater automation.

By fixing these estimates to a static 95% target accuracy for fields and 96% target accuracy for table columns, you can easily compare the estimates across models, even if you have changed your target accuracies between trainings.
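The trade-off between target accuracy and automation can be illustrated with a simplified sketch. It assumes a single confidence score per prediction and a labeled validation set. The function names and data are hypothetical, and the actual thresholding mechanism is more involved (it uses multiple confidence scores).

```python
from typing import List, Optional, Tuple

Scored = List[Tuple[float, bool]]  # (confidence, prediction was correct)


def pick_threshold(scored: Scored, target_accuracy: float) -> Optional[float]:
    """Lowest threshold whose auto-accepted predictions meet the target.

    Returns None if no threshold reaches the target accuracy.
    """
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    best = None
    correct = 0
    for i, (confidence, is_correct) in enumerate(ranked):
        correct += is_correct
        # Only evaluate once every prediction sharing this confidence is counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != confidence
        if last_of_tie and correct / (i + 1) >= target_accuracy:
            best = confidence
    return best


def automation_rate(scored: Scored, threshold: Optional[float]) -> float:
    """Fraction of predictions accepted without manual review."""
    if threshold is None:
        return 0.0
    return sum(conf >= threshold for conf, _ in scored) / len(scored)


# Made-up validation data for illustration.
validation: Scored = [
    (0.99, True), (0.97, True), (0.95, True), (0.90, True), (0.85, False),
    (0.80, True), (0.70, True), (0.60, False), (0.50, True), (0.40, False),
]

strict = pick_threshold(validation, 0.95)
relaxed = pick_threshold(validation, 0.85)
```

With this data, the 95% target yields a threshold of 0.90 and automates 40% of predictions, while the 85% target yields a threshold of 0.70 and automates 70%.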

You can set your Identification target accuracy values in the Identification section of your flow’s settings. To learn more, see Document Processing Subflow Settings.

Targets for specific fields or table columns

If you have certain fields or table columns (a.k.a. “entries”) that require a higher or lower level of accuracy than the rest of the entries in a flow’s release, you can set individual accuracy targets for each of those entries. For example, if you’re processing documents that contain addresses and policy or account numbers, and their identification needs to be as accurate as possible, you can set accuracy targets for these fields that are higher than those set for your documents overall. Doing so eliminates the need for higher accuracy standards to be applied to the remaining entries, which may reduce the number of Identification Supervision tasks that are generated.

Automation rates for higher target accuracies

When testing this feature using a single document and multiple target accuracies:
- Due to the thresholding mechanism used in Identification, if a document is sent to Supervision for a given target accuracy, there is no guarantee that it will be sent to Supervision for all higher target accuracies.
- The underlying causes of that behavior are the following design choices:
  - We have multiple confidence scores and thresholds for them.
  - We select the best combination of these thresholds for each accuracy level independently from other accuracy levels.
- For example, if we send a document to Supervision because of Threshold 1 at a target accuracy of 90%, it's possible that, at a target accuracy of 95%, Threshold 1 is less strict, and we don't send that particular document to Supervision.

When processing varied submissions and documents with multiple target accuracies over time:
- Higher target accuracies for a given entry will result in lower automation rates and more Supervision tasks being created for that entry.
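
The single-document behavior described above can be reproduced with a toy example. The two confidence scores, the threshold values, and the rule "send to Supervision if any score is below its threshold" are all hypothetical; they are chosen only to show how independently selected threshold combinations can produce this effect.

```python
from typing import Dict, Tuple

# Hypothetical threshold combinations, chosen independently per target accuracy.
# Each tuple is (Threshold 1, Threshold 2); keys are target accuracies in percent.
THRESHOLDS_BY_TARGET: Dict[int, Tuple[float, float]] = {
    90: (0.80, 0.30),
    95: (0.60, 0.85),  # Threshold 1 is less strict, Threshold 2 is stricter
}


def sent_to_supervision(scores: Tuple[float, float], target_percent: int) -> bool:
    """A document is automated only if every score clears its threshold."""
    thresholds = THRESHOLDS_BY_TARGET[target_percent]
    return any(score < threshold for score, threshold in zip(scores, thresholds))


document_a = (0.70, 0.90)  # weak on score 1, strong on score 2
document_b = (0.90, 0.50)  # strong on score 1, weak on score 2

a_at_90 = sent_to_supervision(document_a, 90)
a_at_95 = sent_to_supervision(document_a, 95)
b_at_90 = sent_to_supervision(document_b, 90)
b_at_95 = sent_to_supervision(document_b, 95)
```

Here `document_a` goes to Supervision at 90% but not at 95%, while `document_b` does the opposite. A single document is therefore not a reliable test of the feature; the effect of a higher target shows up across many documents.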

No additional QA tasks are generated if you set accuracy targets at the entry level, and the sampling rates specified in flows’ settings still apply.

You can set target accuracy values for specific entries in the Identification section of your flow’s settings. To learn more, see Document Processing Subflow Settings.

Continuous Field Locator model improvement

When enabled, Continuous Field Locator model improvement will automatically train and deploy Field Locator models with better performance. By default, this setting is enabled in v38.0.1 and later v38.0.x releases, and disabled in v38.1 and later.

When disabled, users will have to manually trigger Field Locator model training to gain the benefit from new documents and training data.

Note that if Continuous Field Locator model improvement is enabled and you import a model from another environment, you might end up with lower automation rates. Models only use training data from their current environment, and if you do not have enough training data in your new environment, the model you imported will be overwritten by a lower-performing one. For optimal performance, we recommend that you train models manually and disable the Continuous Field Locator model improvement setting. Only enable Continuous Field Locator model improvement if instructed to do so by a Hyperscience representative.

In order for a new training job to be triggered, the two following conditions must be met:

- Training frequency: training will occur at least 3 days after the last training job was completed.
- New QA document requirements: 5% of the number of documents that were used to train the live model are needed.

For example, if the live model was trained on 150 documents, there must be 8 new documents in order for a new training job to be created (150 * 0.05 = 7.5, rounded up).
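
The two conditions can be checked together as in the sketch below. The function and constant names are hypothetical, and the code only restates the rules above; it is not the product's scheduler.

```python
from datetime import datetime, timedelta

MIN_DAYS_BETWEEN_TRAININGS = 3  # from the training-frequency condition
NEW_DOCUMENT_PERCENT = 5        # from the new-QA-document condition


def required_new_documents(live_model_training_docs: int) -> int:
    """5% of the live model's training set, rounded up.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return (live_model_training_docs * NEW_DOCUMENT_PERCENT + 99) // 100


def should_trigger_training(
    last_training_completed: datetime,
    now: datetime,
    live_model_training_docs: int,
    new_qa_documents: int,
) -> bool:
    """Both conditions must hold for a new training job to be created."""
    waited_long_enough = now - last_training_completed >= timedelta(
        days=MIN_DAYS_BETWEEN_TRAININGS
    )
    enough_new_documents = new_qa_documents >= required_new_documents(
        live_model_training_docs
    )
    return waited_long_enough and enough_new_documents


last_run = datetime(2024, 1, 1, 9, 0)
needed = required_new_documents(150)
ready = should_trigger_training(last_run, datetime(2024, 1, 5, 9, 0), 150, 8)
too_soon = should_trigger_training(last_run, datetime(2024, 1, 2, 9, 0), 150, 8)
too_few = should_trigger_training(last_run, datetime(2024, 1, 5, 9, 0), 150, 7)
```

For a live model trained on 150 documents, `needed` is 8, matching the example above. Training is triggered only in the `ready` case, where at least 3 days have passed and 8 new documents are available.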

You can enable this setting in the “Identification” section of the application settings (Administration > System Settings).

Continuous Field Locator model improvement and Training Data Management

In Training Data Management, documents with completed Supervision and QA tasks are included as training data, but by default, their Training Status is Never. If you don't manually change the Training Status of these documents to Auto or Always, or if you don't upload new training documents, Continuous Field Locator model training will never occur, even if the setting is enabled. In order for this automatic training to occur, there must be a 5% increase in the number of eligible training documents since the last training of the model.

For more information on managing training data, see Training Data Management.

Image Readability

Image Correction

If the Image Correction setting is enabled, the machine corrects the rotation of pages that are submitted to Hyperscience.

This setting is enabled by default, and you can find it in your flow’s Machine Classification Block. To learn more, see Flow Blocks.

Captured Image Enhancement

If the Captured Image Enhancement setting is enabled, the machine improves the readability of Semi-structured pages captured by mobile devices. To improve image quality, the machine properly adjusts pages’ orientation and crops backgrounds.

The Captured Image Enhancement setting is disabled by default, but you can enable it in your flow’s Machine Classification Block. To learn more, see Flow Blocks.

Before enabling the Captured Image Enhancement setting, make sure that the majority of the pages you will be processing are captured by mobile devices. Contact your Hyperscience representative for more information.