Transcription Accuracy and Automation

Overview

Accuracy is the proportion of data for which machine predictions are correct — that is, for which the prediction matches the actual value the data represents. To establish the true value of a piece of data, the machine uses Quality Assurance (QA) tasks. When QA is enabled, humans give feedback to the machine, improving accuracy over time.

Transcription accuracy and automation

Transcription accuracy is measured at the field level. For example, if a Social Security number is “123-45-6789” and the machine transcribes it as “123-45-678”, the machine is scored as 0% accurate for that field, even though 8 of the 9 digits are correctly transcribed. A transcription is considered accurate only if the full field is correctly transcribed. For the machine to have 100% transcription accuracy on this field, all 9 digits of the SSN need to be transcribed correctly.
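As a quick illustration (this is a toy sketch, not Hyperscience code, and the field values are hypothetical), exact-match field-level accuracy can be expressed in a few lines of Python:

```python
# Illustrative sketch: field-level transcription accuracy counts a field
# as correct only on an exact, full-field match.
def field_accuracy(predictions, ground_truth):
    """Proportion of fields whose transcription matches exactly."""
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)

# A 9-digit SSN with one missing digit scores 0 for that field,
# even though 8 of 9 characters are right.
preds = ["123-45-678", "123-45-6789"]
truth = ["123-45-6789", "123-45-6789"]
print(field_accuracy(preds, truth))  # 0.5
```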

Accuracy and automation

Automation is defined as the proportion of data for which the machine is confident enough to make a prediction without human supervision. An automated field is one that no human needs to review or transcribe. If 90% of the fields are automated, you only need human effort to process the remaining 10%.

There is a relationship between accuracy and automation. A higher accuracy target causes the machine to send more fields for human transcription or review. In other words, you can trade higher accuracy for lower automation, and vice versa.
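The trade-off above can be sketched numerically. In this toy example (hypothetical confidence scores and correctness labels, not real product data), raising the confidence threshold increases the accuracy of the automated fields but lowers the automation rate:

```python
# Illustrative sketch: fields at or above the confidence threshold are
# automated; the rest go to a human for transcription or review.
def automation_and_accuracy(fields, threshold):
    automated = [f for f in fields if f["confidence"] >= threshold]
    automation = len(automated) / len(fields)
    accuracy = (sum(f["correct"] for f in automated) / len(automated)
                if automated else 1.0)
    return automation, accuracy

fields = [
    {"confidence": 0.99, "correct": True},
    {"confidence": 0.95, "correct": True},
    {"confidence": 0.90, "correct": False},
    {"confidence": 0.80, "correct": True},
]
print(automation_and_accuracy(fields, 0.85))  # (0.75, ~0.67)
print(automation_and_accuracy(fields, 0.92))  # (0.5, 1.0)
```

Raising the threshold from 0.85 to 0.92 drops automation from 75% to 50% but brings the accuracy of the automated fields to 100%.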

Accuracy is continuously measured, and the machine adapts to customer data every day. We will take a look at how you can improve both transcription accuracy and automation in the sections below. 

Accuracy targets

Hyperscience provides the ability to set accuracy targets for document transcription once certain conditions are met. These targets will direct both the confidence thresholds and automation levels. 

The benefits of setting accuracy targets are the following:

  • Accuracy targets enable both fine-tuning and auto-thresholding. Read more about fine-tuning and auto-thresholding in the sections below.

  • Accuracy targets lead to increased automation as the model improves its decisions about which fields to send to Supervision.

Each field type can have a different accuracy target. The minimum number of QA records per field type is:

  • For Structured text - 5000 fields

  • For Semi-structured text - 2000 fields

  • For checkboxes - 2000 fields

  • For signatures - 2000 fields

  • For table text - 2000 table cells

The accuracy targets are all configurable on a flow level. You can access these settings by clicking on Flows in the left-hand sidebar and clicking on the name of a flow. You can set accuracy targets for Structured or Semi-structured text only if the Transcription Automation Training setting is enabled for that document type. To learn more about Transcription Automation Training settings, see the “Structured Document Transcription” and “Semi-structured Document Transcription” sections in Flow Settings.

Period of records to use is a flow setting that applies to all field types, with separate settings for Structured and Semi-structured documents. It determines how many days in the past data is pulled from. You need to maintain the minimum number of QA records for each field type during the period you set.

Machine confidence

Machine confidence is an internal, non-configurable number. Machine confidence values only have meaning inside the specific machine learning models that produce them. Note that machine confidence is different from accuracy and probability, and the relationship between accuracy and confidence is not straightforward. For example, a machine confidence of 0.8726 does not mean that the field is 87% accurate, 87% likely to be accurate, or transcribed with 87% confidence.

  • The confidence scores and confidence thresholds are automatically adjusted over time.

  • You control the accuracy requirement – you indicate how accurate you want the transcription to be. Hyperscience adjusts all the other metrics, so you achieve the desired accuracy.

  • Let’s say you require 99.5% accuracy. This accuracy target may lead to 75% automation. When the machine starts learning from QA results, the automation may increase to 94% while maintaining the same level of accuracy. You can learn more about improving automation while keeping the desired level of accuracy in the section below.

Confidence distribution from QA data

Results from QA tasks are used to create a confidence distribution. In the example below, you can see such a confidence distribution:

[Figure: confidence distribution from QA data (AccuracyAndAutomationDiagram.png)]

As mentioned in the previous section, the confidence values do not mean anything outside of the particular machine learning models. If a field is transcribed with 0.96 confidence, this does not mean that the model is 96% sure that the data was transcribed correctly. As observed in the chart above, 0.96 for this particular confidence distribution means that the model is about 99.74% confident in its transcription accuracy (100% - 0.26%). Therefore, if you set a 99.5% overall accuracy target, the model picks the threshold at which the number of errors among automated fields divided by the total number of automated fields is 0.5%.
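The threshold-picking step can be sketched as follows. This is an illustrative toy version only — not the actual auto-thresholding algorithm — and the QA records are hypothetical. It greedily grows the automated set from the highest-confidence fields down, keeping the lowest threshold at which the observed error rate stays within the budget (0.5% errors for a 99.5% accuracy target):

```python
# Illustrative sketch: pick the lowest confidence threshold whose
# automated set stays within the error budget, based on QA ground truth.
def pick_threshold(qa_fields, max_error_rate=0.005):
    # Sort descending by confidence; extend the automated set greedily.
    ordered = sorted(qa_fields, key=lambda f: -f["confidence"])
    best = 1.0  # automate nothing if the budget is never met
    errors = 0
    for i, f in enumerate(ordered, start=1):
        errors += not f["correct"]
        if errors / i <= max_error_rate:
            best = f["confidence"]
    return best

qa = [{"confidence": 0.99, "correct": True}] * 3 + [
    {"confidence": 0.90, "correct": False}]
print(pick_threshold(qa, max_error_rate=0.25))  # 0.9
print(pick_threshold(qa, max_error_rate=0.20))  # 0.99
```

With a looser 25% error budget, the threshold drops to 0.90 and all four fields are automated; tightening it to 20% raises the threshold to 0.99, excluding the erroneous field.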

Fine-tuning and auto-thresholding

QA tasks allow the confidence distribution to further adapt to the data the model sees in your instance. To ensure that verified results contain very little ground-truth error, we use consensus. To learn more about consensus, see Scoring Transcription Accuracy.

Fine-tuning is the process behind Transcription Automation Training. During fine-tuning, machine learning models use verified QA tasks to produce new, fine-tuned confidence scores, based on the observed ground truth and the results from current and previous QA tasks. Ground truth is used to fine-tune the models by:

  • Decreasing confidence on high-confidence errors - cases where the machine incorrectly transcribed fields with high confidence. The machine learns from these high-confidence errors and improves accuracy.

  • Increasing confidence on low-confidence correct extractions - cases where the machine correctly transcribed fields but the fields were still sent to Supervision. The machine learns from these low-confidence correct extractions and improves automation.

The original confidence is the confidence the model has in reading fields correctly. The fine-tuned confidence incorporates both the original confidence as well as all QA data, thus making the fine-tuned confidence more accurate. 
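Fine-tuning in Hyperscience is a model training process, but the core idea — using QA ground truth to recalibrate confidence — can be illustrated with a simple binned recalibration. This toy sketch (hypothetical data; not the actual fine-tuning method) maps each confidence bin to the accuracy observed in QA, so high-confidence errors pull confidence down and low-confidence correct reads push it up:

```python
# Illustrative sketch: recalibrate confidence per bin using QA ground truth.
def fine_tune_confidence(qa_fields, bins=10):
    """Map each confidence bin to the observed accuracy in that bin."""
    totals = [0] * bins
    correct = [0] * bins
    for f in qa_fields:
        b = min(int(f["confidence"] * bins), bins - 1)
        totals[b] += 1
        correct[b] += f["correct"]
    return [correct[b] / totals[b] if totals[b] else None
            for b in range(bins)]

qa = ([{"confidence": 0.95, "correct": True}] * 3
      + [{"confidence": 0.95, "correct": False}]
      + [{"confidence": 0.55, "correct": True}] * 2)
print(fine_tune_confidence(qa)[9])  # 0.75
```

Here, fields reported at ~0.95 confidence were only 75% accurate in QA, so their fine-tuned confidence is lowered accordingly, while the 0.55-confidence fields were always correct and have theirs raised.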

Using fine-tuned confidence, the model creates a new confidence threshold in a process called auto-thresholding. Auto-thresholding is one way our machine learns and improves over time. Auto-thresholding is scheduled to run nightly and whenever you edit a flow’s Structured Document Transcription settings or Semi-structured Document Transcription settings.

  • With learnings from QA tasks, machine accuracy can be measured across the confidence spectrum.

  • Thresholds are automatically set, based on the desired level of accuracy.

  • To reach the desired level of accuracy, the model adjusts the thresholds automatically.