Document Processing Subflow Settings

This article describes the settings available in the Document Processing Subflow included in v41. The settings available in custom flows may differ from those described here, depending on which blocks are included in those flows. To learn more about the settings for individual blocks, see Flow Blocks.

As part of our efforts to give you more precise control over your Hyperscience processes, we’ve made many of our settings configurable at the flow level.

While you can build custom flows, each instance of Hyperscience includes a Document Processing flow. To learn more about the version of this flow that comes with v41, see Document Processing Flow in V41.

The Document Processing flow contains several subflows, including the Document Processing Subflow. This article focuses on the settings available in that subflow.

View the subflow’s settings

To view the settings of the Document Processing Subflow:

  1. Click Flows in the left-hand sidebar, and click on the name of the Document Processing flow that contains the Document Processing Subflow whose settings you would like to view.

  2. Click Edit Flows.

  3. On the Flow Studio canvas, click the Start Document Processing Subflow Block.

  4. Click the Settings Type drop-down list, and click on a setting type.

Edit the subflow’s settings

After you’ve viewed the subflow’s settings, you can make any necessary changes, and then click Save in the upper-right corner of the page. You can save changes to multiple setting types at once.

Available settings

The sections below describe the settings available for each setting type.

File Filter

Setting

Description

Default Value

All Files or Images Only

Determines whether the filters in this block are applied to all files in submissions (Apply to all files) or only to image files, i.e., files whose MIME type is image (Apply to images only).

Apply to all files

Minimum Image Width (px)

The minimum width in pixels that an image needs to have in order to be allowed by the filter.

This filter applies only to images (i.e., files whose MIME type is image) and has no impact on other files.

(Blank)

Minimum Image Height (px)

The minimum height in pixels that an image needs to have in order to be allowed by the filter.

This filter applies only to images (i.e., files whose MIME type is image) and has no impact on other files.

(Blank)

Minimum File Size (KB)

The minimum size in kilobytes that a file needs to have in order to be allowed by the filter.

(Blank)

File Extension Action

Select one of the following options:

  • Do not filter files by extension

  • Allow only these file extensions

  • Deny files with these extensions

Do not filter files by extension

File Extensions

A list of file extensions that the filter will allow or deny, based on the option selected in File Extension Action. Select the checkboxes for the file extensions that you would like to filter by.

If zip is selected as a file extension, the filter will not decompress ZIP files included in submissions. Each ZIP file will be treated as an individual file, regardless of the number or types of files compressed within it.

If there are file extensions that you want to filter by that do not appear in the drop-down list, select other, and enter the extensions in Other File Extensions.

This field only appears if Allow only these file extensions or Deny files with these extensions is selected in File Extension Action.

(Does not appear)

Other File Extensions

A comma-separated list of file extensions that do not appear in File Extensions.

This field only appears if other is selected in File Extensions.

(Does not appear)
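The File Filter settings above can be read as a sequence of checks applied to each file. The following is a minimal sketch of that reading, not the product's implementation. The names `FileInfo`, `FileFilterSettings`, and `is_allowed` are hypothetical, blank settings are modeled as `None`, and the sketch assumes that non-image files pass through untouched when Apply to images only is selected.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class FileInfo:
    """Hypothetical description of one file in a submission."""
    name: str
    mime_type: str
    size_kb: float
    width_px: Optional[int] = None   # only meaningful for images
    height_px: Optional[int] = None  # only meaningful for images


@dataclass
class FileFilterSettings:
    """Hypothetical mirror of the File Filter settings (blank = None)."""
    images_only: bool = False            # True models "Apply to images only"
    min_width_px: Optional[int] = None
    min_height_px: Optional[int] = None
    min_size_kb: Optional[float] = None
    extension_action: str = "none"       # "none", "allow", or "deny"
    extensions: Set[str] = field(default_factory=set)


def is_allowed(f: FileInfo, s: FileFilterSettings) -> bool:
    """Return True if the file passes every configured check."""
    is_image = f.mime_type.startswith("image/")
    # Assumption: with "Apply to images only", other files are not filtered.
    if s.images_only and not is_image:
        return True
    # Width and height checks apply only to images.
    if is_image:
        if s.min_width_px is not None and (f.width_px or 0) < s.min_width_px:
            return False
        if s.min_height_px is not None and (f.height_px or 0) < s.min_height_px:
            return False
    if s.min_size_kb is not None and f.size_kb < s.min_size_kb:
        return False
    # A ZIP file is judged by its own extension; it is never decompressed here.
    ext = f.name.rsplit(".", 1)[-1].lower() if "." in f.name else ""
    if s.extension_action == "allow" and ext not in s.extensions:
        return False
    if s.extension_action == "deny" and ext in s.extensions:
        return False
    return True
```

With `min_width_px=100` and a deny list containing `zip`, for example, a 50-pixel-wide PNG and any `.zip` file would be rejected, while a PDF would pass because the width check does not apply to it.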

Submission Bootstrap

S3 Submission Retrieval Store

If you are using an S3 bucket as your submission retrieval store and you are not authenticating through IAM roles, provide your AWS access key ID and secret access key in the S3 Submission Retrieval Store field.

To enter your credentials:

  1. Click Edit.

  2. Enter your credentials in JSON format:

    {
    "aws_access_key_id": "<your_access_key_id>",
    "aws_secret_access_key": "<your_secret_key>"
    }
  3. Click Done.

  4. Click Save in the upper-right corner of the page.

  5. In the dialog box that appears, click Save & Deploy.

For more information about AWS access key IDs and secret access keys, see Amazon's Understanding and getting your AWS credentials.
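Because the credentials are entered as JSON, it can help to check the text before pasting it into the S3 Submission Retrieval Store field. The sketch below is one way to do that locally, assuming only the two keys shown in the steps above. The function name `validate_credentials` is hypothetical and is not part of Hyperscience or AWS tooling.

```python
import json

# The two keys shown in the credential format above.
REQUIRED_KEYS = {"aws_access_key_id", "aws_secret_access_key"}


def validate_credentials(text: str) -> dict:
    """Parse the JSON text and check that both expected keys are present."""
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed
    # input, including curly quotes copied from a word processor.
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("credentials must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

A common failure is pasting text whose straight quotes were converted to curly quotes. Strict JSON parsers reject that input, and this check surfaces the problem before you save the setting.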

Classification

Setting

Description

Default Value

Structured Layout Match Threshold

The minimum confidence score a page must have in order for it to be matched to a layout. If the page's confidence score is below this value, the system sends it to Classification Supervision (if enabled) or marks it as "No Layout Found."

0.6

Semi-structured Classification

Enables the management of a model that automatically classifies Semi-structured and Additional documents.

Enabled

Manual Classification Supervision

Enables Classification Supervision.

Disabled

Semi-structured Classification Target Accuracy

Your desired accuracy for the classification of Semi-structured and Additional documents. If the estimated accuracy of the model's prediction for a document is below this value, the system will send the document to Classification Supervision (if enabled) or mark it as "No Layout Found."

99

Semi-structured Classification Grouping Logic

Determines how multiple pages that are matched to the same layout variation in a given submission will be handled.

To learn more about this setting, see Document Classification Settings.

Consecutive pages as a document

Semi-structured QA Sample Rate

The percentage of documents that the system will randomly select for Classification QA.

5
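The routing described for Structured Layout Match Threshold can be sketched as a small decision function. This is an illustration only. The function name `route_page` is hypothetical, the defaults are taken from the table above (0.6, with Manual Classification Supervision disabled), and treating a score exactly equal to the threshold as a match is an assumption based on the word "minimum."

```python
def route_page(confidence: float,
               threshold: float = 0.6,
               supervision_enabled: bool = False) -> str:
    """Decide what happens to a page based on its layout-match confidence."""
    # Assumption: a score equal to the threshold counts as a match.
    if confidence >= threshold:
        return "Matched"
    # Below the threshold, the outcome depends on whether
    # Manual Classification Supervision is enabled.
    if supervision_enabled:
        return "Classification Supervision"
    return "No Layout Found"
```

With the default settings, a page scoring 0.4 is marked "No Layout Found." Enabling Manual Classification Supervision would instead route the same page to a Supervision task.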

Identification

Setting

Description

Default Value

Identification Target Accuracy (Entry Level)

Allows you to set flow-level Identification Target Accuracy values for fields and table columns (“entries”) included in the flow’s release.

To enter target accuracies, click the pencil icon below Identification Target Accuracy (Entry Level).

Then, click the Target Accuracy cell for a field or table column to enter a target accuracy for it. Click outside of the cell to save the value you entered. If you have many entries and layouts, clicking Filter and entering criteria for the Layout Name, Entry Name, and Entry Type filters may be helpful.

Automation rates for higher target accuracies

  • When testing this feature using a single document and multiple target accuracies:

    • Due to the thresholding mechanism used in Identification, if a document is sent to Supervision for a given target accuracy, there is no guarantee that it will be sent to Supervision for all higher target accuracies.

    • The underlying causes of that behavior are the following design choices:

      • We have multiple confidence scores and thresholds for them.

      • We select the best combination of these thresholds for each accuracy level independently from other accuracy levels.

    • For example, if we send a document to Supervision because of Threshold 1 at a target accuracy of 90%, it's possible that, for a target accuracy of 95%, Threshold 1 is less strict, and we don't send that particular document to Supervision.

  • When processing varied submissions and documents with multiple target accuracies over time:

    • Higher target accuracies for a given entry will result in lower automation rates and more Supervision tasks being created for that entry.

To learn more about entry-level Identification Target Accuracy, see Identification Settings.

The value set in Field Identification Target Accuracy or Table Identification Target Accuracy

Field Identification Target Accuracy

Your desired accuracy for the identification of fields. If the estimated accuracy of the model's prediction for a field is below this value, the system will send the field and all its occurrences (if any) to Field ID Supervision.

95

Table Identification Target Accuracy

Your desired accuracy for the identification of tables. If the estimated accuracy of the model's prediction for a table is below this value, the system will send the table to Table ID Supervision.

96

Manual Identification Supervision

Enables Field ID Supervision and Table ID Supervision.

Enabled

Field Identification Quality Assurance

Enables Field ID Quality Assurance. If disabled, the system won't have data to retrain existing Semi-structured models, and Field Identification on new Semi-structured layouts cannot be automated.

Enabled

Field Identification QA Sample Rate

The percentage of documents that the system will randomly select for Field ID QA.

This setting is only available if Field Identification Quality Assurance is enabled.

5

Table Identification Quality Assurance

Enables Table ID Quality Assurance. If disabled, the system won't have data to retrain existing Semi-structured models with tables, and Table Identification on new Semi-structured layouts cannot be automated.

Enabled

Table Identification QA Sample Rate

The percentage of documents that the system will randomly select for Table ID QA.

This setting is only available if Table Identification Quality Assurance is enabled.

5

Manual Identification Notification Flow

Determines which Notification subflow is used to send Manual Identification updates to downstream systems.

Submission State Notifications

Default Task Restrictions

Determines which users can access Supervision Tasks created by the Manual Identification Block. To learn more, see Task Restrictions Overview.

None
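The note under "Automation rates for higher target accuracies" can be made concrete with a toy example. The threshold values, score names, and function name below are all made up for illustration and do not reflect real model output. The only point is that when each target accuracy gets its own independently chosen combination of thresholds, a single document can be flagged at 90% and not at 95%.

```python
# Hypothetical threshold combinations, chosen independently per target accuracy.
# Note that the score_1 threshold is less strict at 95 than at 90.
THRESHOLDS = {
    90: {"score_1": 0.80, "score_2": 0.40},
    95: {"score_1": 0.70, "score_2": 0.60},
}


def needs_supervision(scores: dict, target_accuracy: int) -> bool:
    """A document goes to Supervision if any score falls below its threshold."""
    limits = THRESHOLDS[target_accuracy]
    return any(scores[name] < limit for name, limit in limits.items())


# One hypothetical document with two confidence scores.
document_scores = {"score_1": 0.75, "score_2": 0.65}

sent_at_90 = needs_supervision(document_scores, 90)  # score_1 0.75 < 0.80
sent_at_95 = needs_supervision(document_scores, 95)  # both scores clear their limits
```

Here the document is sent to Supervision at 90% but not at 95%. Over a large and varied set of documents, the higher target still produces more Supervision tasks overall, as the second bullet in that note states.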

General Transcription

Setting

Description

Default Value

Customize Field Transcription

Allows you to:

  • set flow-level Transcription Target Accuracy values for fields included in the flow’s release, and

  • view any Transcription Target Accuracy values set in the Field Dictionary for those fields  

If you enter a target accuracy for a field that already has a target accuracy set in the Field Dictionary, the value you enter here will override the one in the Field Dictionary.

Note that field-specific accuracy targets apply only to fields in Structured documents.

To enter target accuracies, click Customize Field Transcription, and click the Target Accuracy cell for a field to enter a target accuracy for it. Click outside of the cell to save the value you entered. If you have many fields and layouts, clicking Filter and entering criteria for the Layout Name and Field Name filters may be helpful.

Setting a particular field’s Target Accuracy to 99%, for example, does not guarantee that 99% accuracy will always be reached. The feature uses the accuracy thresholds of the entire Transcription Fine-tuning model, which apply across all fields and are not available per field. Even so, setting a Target Accuracy for a particular field to 99% guarantees that the field will have a higher accuracy threshold compared to setting the Structured Text Target Accuracy for all fields.

To learn more about field-level Transcription Target Accuracy, see Transcription Accuracy and Automation.

The value set in Structured Text Target Accuracy, or the Transcription Target Accuracy value set in the Field Dictionary (if any)

Customize Transcription Target Accuracy

For Structured documents, allows you to:

  • set flow-level Transcription Target Accuracy values for fields included in the flow’s release, and

  • view any Transcription Target Accuracy values set in the Field Dictionary for those fields  

For Semi-structured documents, allows you to:

  • set flow-level Transcription Target Accuracy values for specific fields or table columns (a.k.a. “entries”) included in the flow’s release.

If you enter a target accuracy for a field in a Structured layout that already has a target accuracy set in the Field Dictionary, the value you enter here will override the one in the Field Dictionary.

To enter target accuracies, click Customize Transcription Target Accuracy, and click the Target Accuracy cell for an entry to enter a target accuracy for it. Click outside of the cell to save the value you entered. If you have many entries and layouts, clicking Filter and entering criteria for the types and names of entries and layouts may be helpful.

Setting a particular entry’s Target Accuracy to 99%, for example, does not guarantee that 99% accuracy will always be reached. The feature uses the accuracy thresholds of the entire Transcription Fine-tuning model, which apply across all entries and are not available per entry. Even so, setting a Target Accuracy for a particular entry to 99% guarantees that the entry will have a higher accuracy threshold compared to setting the Structured Text Target Accuracy or Semi-structured Text Target Accuracy for all entries.

To learn more about Transcription Target Accuracy for individual entries, see Transcription Automation and Accuracy.

  • For fields in Structured documents:

    • The value set in Structured Text Target Accuracy, or the Transcription Target Accuracy value set in the Field Dictionary (if any)

  • For fields or table columns in Semi-structured documents:

    • The value set in Semi-structured Text Target Accuracy

Manual Transcription Supervision

Enables Transcription Supervision.

Enabled

Transcription Quality Assurance

Enables Transcription Quality Assurance. If disabled, the system won't have the data needed to determine the accuracy of transcriptions.

Enabled

Automatic QA Sample Rate

If enabled, based on the QA records you have, the system automatically calculates QA sample rates for your:

  • Structured text,

  • Structured checkbox,

  • Structured signature, and

  • Semi-structured fields.

When enabled, this setting overrides the following flow settings:

  • Structured Text Transcription QA Sample Rate

  • Structured Checkbox Transcription QA Sample Rate

  • Structured Signature Transcription QA Sample Rate

  • Semi-structured Transcription QA Sample Rate

Disabled

Structured Text Transcription QA Sample Rate

The percentage of text fields in Structured documents that the system randomly samples for Transcription QA.

5

Structured Checkbox Transcription QA Sample Rate

The percentage of checkbox fields in Structured documents that the system randomly samples for Transcription QA.

5

Structured Signature Transcription QA Sample Rate

The percentage of signature fields in Structured documents that the system randomly samples for Transcription QA.

5

Semi-structured Transcription QA Sample Rate

The percentage of fields in Semi-structured documents that the system randomly samples for Transcription QA.

5

Table Transcription Quality Assurance

Enables Table Transcription Quality Assurance. If disabled, the system won't have the data needed to determine the accuracy of table cell transcriptions.

Disabled

Table Transcription QA Sample Rate

The percentage of table cells that the system randomly samples for Transcription QA.

5

Finetuning Only For Trained Layouts

If enabled, finetuning (Transcription Automation) only uses the layouts its model was trained on.

If you add a new layout, the layout does not use finetuning and defaults to manually entered thresholds until a new finetuning model is trained with that layout.

Enabled

Force Normalization Errors To Supervision

This setting allows you to send fields with normalization errors to Supervision. When enabled, the normalization errors are flagged for human review, ensuring data accuracy within the platform.

Disabled

Force Missing/Blank Fields To Supervision

Fields marked as Required in the Layout Editor can be sent to Supervision when missing or left blank. When this setting is enabled, blank fields are sent to Transcription Supervision, while missing fields are sent to Identification Supervision. Doing so ensures that critical fields are handled within Hyperscience.

Disabled
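The override order described under Customize Transcription Target Accuracy for fields in Structured documents can be summarized as a simple fallback chain. The sketch below reflects one reading of that description: a flow-level value wins, then a Field Dictionary value, then Structured Text Target Accuracy. The function and parameter names are hypothetical, and `None` stands for "no value set."

```python
from typing import Optional


def effective_target_accuracy(flow_level: Optional[float],
                              field_dictionary: Optional[float],
                              structured_text_target: float = 95.0) -> float:
    """Pick the target accuracy that applies to one Structured field."""
    # A value entered at the flow level overrides the Field Dictionary.
    if flow_level is not None:
        return flow_level
    # Otherwise, a Field Dictionary value is used if one exists.
    if field_dictionary is not None:
        return field_dictionary
    # Otherwise, fall back to Structured Text Target Accuracy.
    return structured_text_target
```

For example, a field with 97 in the Field Dictionary and 99 entered at the flow level would use 99, and a field with neither value set would use the Structured Text Target Accuracy (95 by default).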

Structured Document Transcription

Setting

Description

Default Value

Transcription Automation Training

This feature enables the system to use QA data to calculate the optimal mix of data keyer and machine transcriptions to reach a specified target accuracy with the minimum amount of data keyer effort.

To see a graph of current projected automations, click See Projections. You will be redirected to the Transcription section of Administration > System Settings.

Disabled

Period of Records to Use

This setting is only available if Transcription Automation Training is enabled. It determines how far in the past to draw training data from.

100 days

Improved Threshold Accuracy

This setting helps to ensure that your target accuracy is being met on all processed fields.

Enabled

Structured Text Target Accuracy

Your desired accuracy for the transcription of fields in Structured documents.

If Transcription Automation Training for Structured documents is enabled, the system uses this value to calculate the Structured Text Automation once the minimum amount of training data is obtained through Transcription QA.

95

Structured Text Automation

This setting shows the level of automation you can expect when the system is working to reach the target accuracy set in Structured Text Target Accuracy.

The system automatically calculates this value after the minimum amount of training data is obtained through Transcription QA.

N/A

Structured Text Threshold

This setting determines the minimum confidence threshold needed for a field to be automatically processed.

Fields with confidence scores above this threshold are automatically processed. If a field’s confidence score is below this threshold, it will be sent to Transcription Supervision.

If Transcription Automation Training for Structured documents is enabled, any value you enter manually will be overwritten by the value calculated by the system, based on your target accuracy.

0.5

Structured Text Minimum Legibility Threshold

The minimum confidence score a text field must have in a Structured document in order for the system to automatically process the field. If a field's confidence score is below this value, the system will mark the field as illegible.

This setting only applies when Manual Transcription Supervision is disabled, the field has a Supervision Autotranscribe override, or the submission has a Machine-only override.

0.1

Structured Checkbox Target Accuracy

Your desired accuracy for the transcription of checkboxes in Structured documents.

If Transcription Automation Training for Structured documents is enabled, the system uses this value to calculate the Structured Checkbox Automation once the minimum amount of training data is obtained through Transcription QA.

95

Structured Checkbox Automation

This setting shows the level of automation you can expect when the system is working to reach the target accuracy set in Structured Checkbox Target Accuracy.

The system automatically calculates this value after the minimum amount of training data is obtained through Transcription QA.

N/A

Structured Checkbox Threshold

This setting determines the minimum confidence threshold needed for a checkbox to be automatically processed.

Checkboxes with confidence scores above this threshold are automatically processed. If a checkbox’s confidence score is below this threshold, it will be sent to Transcription Supervision.

If Transcription Automation Training for Structured documents is enabled, any value you enter manually will be overwritten by the value calculated by the system, based on your target accuracy.

0.56

Structured Checkbox Minimum Legibility Threshold

The minimum confidence score a checkbox field must have in a Structured document in order for the system to automatically process the field. If a checkbox field's confidence score is below this value, the system will mark it as illegible.

This setting only applies when Manual Transcription Supervision is disabled, the field has a Supervision Autotranscribe override, or the submission has a Machine-only override.

0.25

Structured Signature Target Accuracy

Your desired accuracy for the transcription of signatures in Structured documents.

If Transcription Automation Training for Structured documents is enabled, the system uses this value to calculate the Structured Signature Automation once the minimum amount of training data is obtained through Transcription QA.

95

Structured Signature Automation

This setting shows the level of automation you can expect when the system is working to reach the target accuracy set in Structured Signature Target Accuracy.

The system automatically calculates this value after the minimum amount of training data is obtained through Transcription QA.

N/A

Structured Signature Threshold

This setting determines the minimum confidence threshold needed for a signature to be automatically processed.

Signatures with confidence scores above this threshold are automatically processed. If a signature’s confidence score is below this threshold, it will be sent to Transcription Supervision.

If Transcription Automation Training for Structured documents is enabled, any value you enter manually will be overwritten by the value calculated by the system, based on your target accuracy.

0.56

Structured Signature Minimum Legibility Threshold

The minimum confidence score a signature field must have in a Structured document in order for the system to automatically process the field. If a signature field's confidence score is below this value, the system will mark it as illegible.

This setting only applies when Manual Transcription Supervision is disabled, the field has a Supervision Autotranscribe override, or the submission has a Machine-only override.

0.5
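The relationship between a threshold setting and its Minimum Legibility Threshold can be sketched as follows, using the Structured Text defaults (0.5 and 0.1). This is one interpretation rather than the product's logic. The function name `route_field` is hypothetical, and the single flag `supervision_bypassed` stands for any of the three conditions listed above: Manual Transcription Supervision is disabled, the field has a Supervision Autotranscribe override, or the submission has a Machine-only override. How a score exactly equal to a threshold is treated is also an assumption.

```python
def route_field(confidence: float,
                threshold: float = 0.5,
                min_legibility: float = 0.1,
                supervision_bypassed: bool = False) -> str:
    """Decide the outcome for one transcribed field."""
    # Assumption: a score equal to the threshold is processed automatically.
    if confidence >= threshold:
        return "Automated"
    # Normal case: low-confidence fields go to a person.
    if not supervision_bypassed:
        return "Transcription Supervision"
    # Supervision is bypassed, so the legibility threshold decides
    # whether the machine transcription is kept.
    if confidence >= min_legibility:
        return "Automated"
    return "Illegible"
```

Under these assumptions, a field scoring 0.3 goes to Transcription Supervision by default, is processed automatically when Supervision is bypassed, and would be marked illegible only if its score fell below 0.1.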

Semi-structured Document Transcription

Setting

Description

Default Value

Transcription Automation Training

This feature enables the system to use QA data to calculate the optimal mix of data keyer and machine transcriptions to reach a specified target accuracy with the minimum amount of data keyer effort.

To see a graph of current projected automations, click See Projections. You will be redirected to the Transcription section of Administration > System Settings.

Disabled

Period of Records to Use

This setting is only available if Transcription Automation Training is enabled. It determines how far in the past to draw training data from.

100 days

Improved Threshold Accuracy

This setting helps to ensure that your target accuracy is being met on all processed fields.

Enabled

Semi-structured Text Target Accuracy

Your desired accuracy for the transcription of fields in Semi-structured documents.

If Transcription Automation Training for Semi-structured documents is enabled, the system uses this value to calculate the Semi-structured Text Automation once the minimum amount of training data is obtained through Transcription QA.

95

Semi-structured Text Automation

This setting shows the level of automation you can expect when the system is working to reach the target accuracy set in Semi-structured Text Target Accuracy.

The system automatically calculates this value after the minimum amount of training data is obtained through Transcription QA.

N/A

Semi-structured Text Threshold

This setting determines the minimum confidence threshold needed for a field to be automatically processed.

Fields with confidence scores above this threshold are automatically processed. If a field’s confidence score is below this threshold, it will be sent to Transcription Supervision.

If Transcription Automation Training for Semi-structured documents is enabled, any value you enter manually will be overwritten by the value calculated by the system, based on your target accuracy.

0.5

Semi-structured Text Minimum Legibility Threshold

The minimum confidence score a text field must have in a Semi-structured document in order for the system to automatically process the field. If a field's confidence score is below this value, the system will mark the field as illegible.

This setting only applies when Manual Transcription Supervision is disabled, the field has a Supervision Autotranscribe override, or the submission has a Machine-only override.

0.1

Table Target Accuracy

Your desired accuracy for the transcription of table cells in Semi-structured documents.

If Transcription Automation Training for Semi-structured documents is enabled, the system uses this value to calculate Table Automation once the minimum amount of training data is obtained through Transcription QA.

95

Table Automation

This setting shows the level of automation you can expect when the system is working to reach the target accuracy set in Table Target Accuracy.

The system automatically calculates this value after the minimum amount of training data is obtained through Transcription QA.

N/A

Table Threshold

This setting determines the minimum confidence threshold needed for a table cell to be automatically processed.

Table cells with confidence scores above this threshold are automatically processed. If a table cell’s confidence score is below this threshold, it will be sent to Transcription Supervision.

If Transcription Automation Training for Semi-structured documents is enabled, any value you enter manually will be overwritten by the value calculated by the system, based on your target accuracy.

0.52

Table Minimum Legibility Threshold

The minimum confidence score a table cell must have in a Semi-structured document in order for the system to automatically process the table cell. If a table cell’s confidence score is below this value, the system will mark the table cell as illegible.

This setting only applies when Manual Transcription Supervision is disabled or the submission has a Machine-only override.

0.1

Semi-structured Checkbox Target Accuracy

Your desired accuracy for the transcription of checkboxes in Semi-structured documents.

If Transcription Automation Training for Semi-structured documents is enabled, the system uses this value to calculate the Semi-structured Checkbox Automation once the minimum amount of training data is obtained through Transcription QA.

95

Semi-structured Checkbox Automation

This setting shows the level of automation you can expect when the system is working to reach the target accuracy set in Semi-structured Checkbox Target Accuracy.

The system automatically calculates this value after the minimum amount of training data is obtained through Transcription QA.

N/A

Semi-structured Checkbox Threshold

This setting determines the minimum confidence threshold needed for a checkbox to be automatically processed.

Checkboxes with confidence scores above this threshold are automatically processed. If a checkbox’s confidence score is below this threshold, it will be sent to Transcription Supervision.

If Transcription Automation Training for Semi-structured documents is enabled, any value you enter manually will be overwritten by the value calculated by the system, based on your target accuracy.

0.56

Semi-structured Checkbox Minimum Legibility Threshold

The minimum confidence score a checkbox field must have in a Semi-structured document in order for the system to automatically process the field. If a checkbox field's confidence score is below this value, the system will mark it as illegible.

This setting only applies when Manual Transcription Supervision is disabled, the field has a Supervision Autotranscribe override, or the submission has a Machine-only override.

0.25