Transcription Settings

Our Transcription settings fall into two categories:

  • Supervision Tasks 

  • Accuracy

Supervision Tasks

Manual Transcription Supervision

The Manual Transcription Supervision setting enables manual transcription tasks in Hyperscience. If disabled, all submissions will be transcribed as Machine Only, and the Consensus and Required settings in the Layout Editor will be disregarded.

You can enable Manual Transcription Supervision in the “Extraction” section of your flow’s settings.

To reduce your handling time when transcribing tables with blank cells, disable the Create Manual Transcription task for tables with blank cells setting in the Manual Transcription Block in your flow. It is enabled by default to ensure that blank cells always go to Transcription Supervision in order to avoid potential data loss. 

You can disable this setting by clicking on your flow’s Manual Transcription Block in Flow Studio and deselecting the Create Manual Transcription task for tables with blank cells checkbox.

Transcription Quality Assurance

The Quality Assurance (QA) process samples a user-defined portion of fields and sends them for QA Supervision so the system can gather data on machine and data keyer accuracy. 

The Transcription Quality Assurance setting enables all Transcription QA tasks in your flow. Note that accuracy cannot be measured if this setting is disabled. When you enable Transcription QA, you must also set the QA sample rate. 

You can enable Transcription Quality Assurance in the General Transcription section of your flow’s settings.

Automatic QA Sample Rate

If the Automatic QA Sample Rate setting is enabled, the system calculates and sets the percentage of fields that should be selected for Transcription QA. This setting calculates QA sample rates daily for your:

  • Structured text, 

  • Structured checkbox,

  • Structured signature, and 

  • Semi-structured fields.

To calculate the automatic QA sample rate, the system uses the number of QA records you currently have:

  • Fewer than 5000 QA records – 10% sample rate

  • 5000 to 9999 QA records – 5% sample rate

  • 10000 to 14999 QA records – 3% sample rate

  • 15000 to 49999 QA records – 2.5% sample rate

  • 50000 or more QA records – 1% sample rate
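The tiers above can be sketched as a small lookup. This is an illustration only, not Hyperscience code: the function name `automatic_qa_sample_rate` is hypothetical, and the sketch assumes that exactly 4999 records falls into the 10% tier and exactly 50000 records falls into the 1% tier.

```python
def automatic_qa_sample_rate(qa_record_count: int) -> float:
    """Return the QA sample rate, in percent, for a given number of QA records.

    Hypothetical helper that mirrors the documented tiers. Boundary handling
    (4999 -> 10%, 50000 -> 1%) is an assumption.
    """
    if qa_record_count < 0:
        raise ValueError("qa_record_count must be non-negative")

    # Each entry is (exclusive upper bound on record count, sample rate in percent).
    tiers = [
        (5_000, 10.0),
        (10_000, 5.0),
        (15_000, 3.0),
        (50_000, 2.5),
    ]
    for upper_bound, rate in tiers:
        if qa_record_count < upper_bound:
            return rate
    # Everything at or above the last bound uses the lowest rate.
    return 1.0


if __name__ == "__main__":
    for count in (0, 4_999, 5_000, 12_000, 49_999, 50_000):
        print(count, automatic_qa_sample_rate(count))
```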

If you want to edit the above-mentioned sample rates, contact your Hyperscience representative.

You can enable the Automatic QA Sample Rate setting in the General Transcription section of your flow’s settings. 

To ensure that the automatic QA sample rate is calculated and updated, a live release must be assigned to the flow.

Transcription QA sample rate

This sample rate refers to the percentage of fields that will be selected for Transcription QA. A high sampling rate reduces the margin of error in accuracy calculations but also has a higher labor cost.

The most important question to consider when deciding on the sample rate is: what is the acceptable margin of error? The more documents you QA, the closer you can get to the true accuracy of the system. You also have to take time into consideration: the margin of error is lower when volume is aggregated over a span of seven days than over a single day, as shown in the chart below:

You can set your Transcription QA sample rate in the General Transcription section of your flow’s settings. Please contact your deployment manager for additional help with determining the right sample rate for your team. 
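To get a rough feel for how sample size affects the margin of error, you can use the standard normal approximation for a proportion. The sketch below is not how Hyperscience computes accuracy; the function name, the 95% confidence level (z = 1.96), and the daily volume, sample rate, and accuracy figures are all assumptions chosen for illustration.

```python
import math


def margin_of_error(sample_size: int, observed_accuracy: float, z: float = 1.96) -> float:
    """Approximate margin of error for an accuracy estimate.

    Uses the normal approximation z * sqrt(p * (1 - p) / n).
    z = 1.96 corresponds to roughly 95% confidence (an assumption).
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= observed_accuracy <= 1.0:
        raise ValueError("observed_accuracy must be between 0 and 1")
    return z * math.sqrt(observed_accuracy * (1.0 - observed_accuracy) / sample_size)


# Hypothetical inputs: 2000 fields per day, 5% QA sample rate, 98% observed accuracy.
daily_fields = 2_000
sample_rate = 0.05
accuracy = 0.98

one_day = margin_of_error(round(daily_fields * sample_rate), accuracy)
seven_days = margin_of_error(round(7 * daily_fields * sample_rate), accuracy)

if __name__ == "__main__":
    print(f"1 day:  +/- {one_day:.4f}")
    print(f"7 days: +/- {seven_days:.4f}")
```

Because the margin shrinks with the square root of the sample size, aggregating seven days of QA records narrows it by a factor of about 2.6 compared with a single day at the same sample rate.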

Accuracy

Transcription Automation Training

Transcription Automation Training is a flow setting that enables on-premise training for transcription. It uses accuracy information from QA to reach your desired target system accuracy while minimizing human intervention. 

  • You can choose whether to enable Transcription Automation Training for Structured document transcription, Semi-structured document transcription, or both.

  • You can set a different target accuracy for each field type. Note that when you set the target accuracy, the system will automatically calculate the automation level it can achieve while hitting that target. The thresholds will also be automatically updated.

  • When these thresholds are set, the system will automatically process all fields with machine confidence above the corresponding threshold and send all fields below the corresponding threshold to a data keyer for Transcription Supervision.
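The routing rule in the last bullet can be pictured as follows. This is a simplified sketch, not the product's implementation: the `Field` class, the `route_fields` function, and the field-type keys are hypothetical, and treating a field whose confidence exactly equals the threshold as automated is an assumption, since the text only covers fields above and below it.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    field_type: str    # hypothetical keys, e.g. "text", "checkbox", "signature"
    confidence: float  # machine confidence between 0 and 1


def route_fields(fields, thresholds):
    """Split fields into (automated, needs_supervision) using per-type thresholds."""
    automated, needs_supervision = [], []
    for field in fields:
        threshold = thresholds[field.field_type]
        # Assumption: a confidence equal to the threshold counts as automated.
        if field.confidence >= threshold:
            automated.append(field)
        else:
            needs_supervision.append(field)
    return automated, needs_supervision


if __name__ == "__main__":
    thresholds = {"text": 0.90, "checkbox": 0.80}
    fields = [
        Field("first_name", "text", 0.97),
        Field("address", "text", 0.62),
        Field("opt_in", "checkbox", 0.85),
    ]
    automated, supervised = route_fields(fields, thresholds)
    print([f.name for f in automated])
    print([f.name for f in supervised])
```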

Note that, in v37, Transcription Automation Training for Semi-structured documents also includes fields with multiple bounding boxes.

The Period of Records to Use setting determines how many days in the past the system pulls data from. For example, let’s say you set the Period of Records to Use setting to 30 days. To use PII data for Transcription Automation Training, you then need to complete a submission’s QA tasks within 30 days of the submission’s creation.

To make sure Transcription Automation Training has enough time to use the PII data, you need to set the PII deletion policy for a longer period than the one set in Period of Records to Use.
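The two timing rules above can be expressed as simple checks. The function names below are hypothetical and do not correspond to any Hyperscience API; the sketch only restates the rules, assuming both periods are measured in whole days.

```python
from datetime import datetime, timedelta


def pii_deletion_outlasts_training_window(period_of_records_days: int,
                                          pii_deletion_days: int) -> bool:
    """True if the PII deletion period is longer than the Period of Records to Use."""
    return pii_deletion_days > period_of_records_days


def qa_completed_in_window(created_at: datetime,
                           qa_completed_at: datetime,
                           period_of_records_days: int) -> bool:
    """True if QA finished within the period, counted from submission creation."""
    return qa_completed_at - created_at <= timedelta(days=period_of_records_days)


if __name__ == "__main__":
    print(pii_deletion_outlasts_training_window(30, 45))
    print(qa_completed_in_window(datetime(2024, 1, 1), datetime(2024, 1, 20), 30))
```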

The PII deletion setting deletes the submissions’ images and transcribed data without resetting the Transcription Automation Training confidence to 0.

  • Transcription Automation Training can use PII-deleted records only if there’s not enough new data from non-PII-deleted records. Note that, when training on PII-deleted records, the model won’t perform as well as it would if you were training on non-PII-deleted records.

  • Submission record deletion can only be configured to occur after the Transcription Automation Training period. To learn more, see the “Submission record deletion policy” section in PII Data Deletion.

You can configure both Transcription Automation Training and the Period of Records to Use in the “Structured Document Transcription” and “Semi-structured Document Transcription” sections of your flow’s settings.

Requirements for Transcription Automation Training

When Transcription Automation Training is enabled, the system will automatically review QA data to set optimal thresholds for:

  • text, checkbox, and signature fields for Structured documents

  • text and checkbox fields for Semi-structured documents

Each threshold will be set independently once the minimum number of QA records required for each field type is achieved. The accuracy training job is scheduled to run nightly and whenever you edit Accuracy settings.

Minimum number of QA records per field type:

  • Structured documents require 5000 records for text fields, while checkbox and signature fields require 2000 records for each field type.

  • Semi-structured documents require 2000 records for text fields, 2000 records for table cells, and 2000 records for checkboxes.
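As a quick way to see which field types have enough QA data, the minimums above can be written as a lookup table. The dictionary keys and the function name are hypothetical labels chosen for this sketch; only the record counts come from the list above.

```python
# Minimum QA records per (document type, field type), taken from the list above.
# The key names are illustrative labels, not product identifiers.
MIN_QA_RECORDS = {
    ("structured", "text"): 5000,
    ("structured", "checkbox"): 2000,
    ("structured", "signature"): 2000,
    ("semi_structured", "text"): 2000,
    ("semi_structured", "table_cell"): 2000,
    ("semi_structured", "checkbox"): 2000,
}


def field_types_ready_for_training(qa_counts):
    """Map each field type to True if it has reached its minimum QA record count.

    Field types missing from qa_counts are treated as having zero records.
    """
    return {
        key: qa_counts.get(key, 0) >= minimum
        for key, minimum in MIN_QA_RECORDS.items()
    }


if __name__ == "__main__":
    counts = {("structured", "text"): 4200, ("structured", "checkbox"): 2500}
    for key, ready in field_types_ready_for_training(counts).items():
        print(key, "ready" if ready else "using manually set threshold")
```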

Until the minimum number of records for a field type is met, the manually set machine confidence thresholds will be used. Thresholds will be updated once training completes, which may take up to 30 minutes. Under the Transcription Automation Training setting in your flow’s settings, you will find the last time a training job successfully updated the thresholds.

These thresholds are layout-specific, so if you deploy a release with updated layout versions, then your automation performance will decrease until enough QA data has been associated with each updated layout. For more information about this process, see What is a Release?

Note that custom field data types (CFDTs) will be included with "Entry" field types.

Thresholds

Transcription Quality Assurance must be enabled in order for the system to calculate optimal values for the Thresholds. These Thresholds determine when fields should be sent to Transcription Supervision.
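To illustrate why QA data is needed, the sketch below shows one simple way a threshold could be derived from QA records: pick the lowest confidence cutoff at which the fields at or above it still meet the target accuracy. This is a conceptual example only and is not Hyperscience's actual training procedure; the function name, the record format, and the choice to treat a field exactly at the cutoff as automated are all assumptions.

```python
def lowest_threshold_for_target(records, target_accuracy):
    """Find the lowest confidence cutoff that still meets the target accuracy.

    records: list of (confidence, is_correct) pairs from QA (hypothetical format).
    Returns (threshold, automation_rate), or None if no cutoff meets the target.
    Fields with confidence >= threshold count as automated (an assumption).
    """
    if not records:
        return None

    # Walk from the most confident record downward, tracking running accuracy.
    ordered = sorted(records, key=lambda record: record[0], reverse=True)
    best = None
    correct = 0
    for index, (confidence, is_correct) in enumerate(ordered, start=1):
        correct += int(is_correct)
        # Evaluate only after the last record in a run of equal confidences,
        # so a cutoff never splits records that share the same score.
        last_of_run = index == len(ordered) or ordered[index][0] != confidence
        if last_of_run and correct / index >= target_accuracy:
            best = (confidence, index / len(ordered))
    return best


if __name__ == "__main__":
    qa_records = [(0.99, True), (0.95, True), (0.90, False), (0.80, True)]
    print(lowest_threshold_for_target(qa_records, 1.0))
    print(lowest_threshold_for_target(qa_records, 0.75))
```

A stricter target pushes the cutoff up and the automation rate down, which is the trade-off the target accuracy setting controls.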

Note that if Transcription Automation Training or Transcription Quality Assurance is disabled, you must manually set the Thresholds for each field type. You can set them in the Structured Document Transcription and Semi-structured Document Transcription sections of your flow’s settings.

Minimum legibility thresholds 

If manual transcription is not possible in your business process and you choose to disable the Manual Transcription Supervision feature, then you must set minimum legibility thresholds for each field type in the Structured Document Transcription and Semi-structured Document Transcription sections of your flow’s settings.

If Manual Transcription Supervision is disabled, the system will automatically process all fields above the minimum legibility threshold. Note that at very low confidence scores, the transcriptions can be quite inaccurate.

Projected Automation graph

The “Projected Automation Based on Target Accuracy” graph displays a separate line for each field type. Note that if you add new layouts that the system has not seen before, the actual automation rate will be lower than the projection.

Note that the graph reports combined projected automation for all language families. To learn more about language families, see Supported Languages.

You can find this graph in the “Transcription” section of the application settings (Administration > System Settings). You can go to the graph directly by clicking See Projections under the Transcription Automation Training setting in your flow’s settings.

[Figure: “Projected Automation Based on Target Accuracy” graph (ProjectedAutomationGraph.png)]