Glossary

This article provides a list of terms as a reference to help you better understand key concepts related to the Hyperscience platform. It clarifies commonly used terminology, ensuring consistency across documentation and conversations. By using these definitions, you’ll gain a clearer understanding of how our platform works and how to make the most of its features.

A

Accuracy

Accuracy measures the effectiveness of the models based on the proportion of correct predictions out of all predictions made.

It helps you understand how often the system correctly predicts values compared to the actual values that reached consensus during QA. Accuracy can be influenced by factors like imbalanced datasets or inconsistent annotations.

Accuracy
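As a simplified illustration of the metric (a Python sketch, not the platform’s internal calculation), accuracy can be computed as the share of predicted values that match the values confirmed during QA; the field names below are hypothetical.

    # Illustrative only: accuracy as the share of predictions matching QA consensus.
    predictions = {"name": "Jane Doe", "total": "104.50", "date": "03/02/2024"}
    consensus   = {"name": "Jane Doe", "total": "104.50", "date": "03/12/2024"}

    correct = sum(1 for field, value in predictions.items() if consensus.get(field) == value)
    accuracy = correct / len(predictions)
    print(f"Accuracy: {accuracy:.0%}")  # 67%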

Additional Layouts

A layout type used to categorize pages where no data is extracted (e.g., a fax cover sheet). These layouts allow users to define custom categories for unmatched pages, helping to improve classification accuracy. Additional layouts can apply to Structured, Semi-structured, or Unstructured documents.

Average Handling Time

A metric that represents the average time it takes to process a submission. The value is averaged across multiple documents or submissions within a specific date range.

Operational Value Reporting

Annotation

A user-provided input that defines the correct prediction for a given machine learning task. Annotations are used to train supervised machine learning models.

Anomaly

A potential inconsistency or error in how a field or table is labeled in a document. Anomalies are flagged to help ensure consistent and accurate training data, which improves model performance.

Labeling Anomaly Detection

API Blocks

These blocks enable Hyperscience to interact with external systems through APIs, facilitating tasks like data retrieval, validation, or sending information to other applications.

Flow Blocks

Auto Thresholding

An automated process that calculates a confidence threshold based on a target accuracy. Predictions below this threshold are sent to Supervision for a human review to ensure the system meets the desired accuracy.
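A minimal sketch of the idea (assumed logic in Python, not the platform’s actual algorithm): pick the lowest confidence threshold at which the predictions that clear it still meet the target accuracy, and route everything below it to Supervision.

    # Illustrative sketch: choose the lowest threshold whose accepted predictions
    # still meet the target accuracy. The sample data and logic are assumptions.
    samples = [  # (confidence, was_correct) pairs from predictions already checked in QA
        (0.99, True), (0.95, True), (0.90, True), (0.85, False),
        (0.80, True), (0.70, False), (0.60, True),
    ]
    target_accuracy = 0.95

    def accuracy_at(threshold):
        accepted = [correct for conf, correct in samples if conf >= threshold]
        return sum(accepted) / len(accepted) if accepted else 1.0

    candidates = sorted({conf for conf, _ in samples})
    threshold = next((t for t in candidates if accuracy_at(t) >= target_accuracy), 1.0)
    print(f"Auto threshold: {threshold}")  # predictions below this value would go to Supervision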

Automation

The processing of data without the need for human intervention.

Automation

Automation Rate

The extent to which a machine can process data independently without requiring human supervision. It represents the proportion of extracted data with confidence scores exceeding a specified threshold. This threshold is determined by the level of accuracy you want your extracted data to have.

Automation
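For illustration only (the confidence values below are invented), the automation rate can be thought of as the share of extracted values whose confidence meets or exceeds the threshold:

    # Illustrative only: automation rate as the share of extracted fields whose
    # confidence meets or exceeds the configured threshold.
    field_confidences = [0.99, 0.97, 0.92, 0.81, 0.64, 0.98]
    threshold = 0.90

    automated = [c for c in field_confidences if c >= threshold]
    automation_rate = len(automated) / len(field_confidences)
    print(f"Automation rate: {automation_rate:.0%}")  # 67%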

Auto-Splitting

A feature in Hyperscience that automatically groups pages into documents using rules you define. It helps organize Semi-structured documents by deciding where one document ends and another begins based on page count, text patterns (like titles), or layout-specific logic.

Auto-Splitting
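A rough sketch of one kind of splitting rule (a Python illustration only; actual Auto-Splitting rules are configured in the platform, and the title pattern below is hypothetical):

    # Illustrative only: start a new document whenever a page matches a title pattern.
    import re

    TITLE = re.compile(r"^invoice\b", re.IGNORECASE)  # hypothetical "new document starts here" pattern
    pages = ["Invoice #1001 ...", "line items ...", "Invoice #1002 ...", "totals ..."]

    documents, current = [], []
    for page in pages:
        if TITLE.match(page) and current:
            documents.append(current)  # a matching title closes the previous document
            current = []
        current.append(page)
    documents.append(current)
    print(len(documents))  # 2 documents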

B

Bounding Box

A rectangular subregion of a given page that specifies the location of text to be processed downstream or to be displayed to the user.

Segmentation
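A hypothetical way to represent a bounding box in code (the platform’s internal format may differ); the coordinates describe a rectangle on the page image:

    # Hypothetical representation of a bounding box; units and coordinate system are assumptions.
    from dataclasses import dataclass

    @dataclass
    class BoundingBox:
        left: float
        top: float
        right: float
        bottom: float

    box = BoundingBox(left=0.12, top=0.30, right=0.45, bottom=0.34)
    width, height = box.right - box.left, box.bottom - box.top
    print(round(width, 2), round(height, 2))  # 0.33 0.04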

Bundle

A packaged file that contains everything needed to install or upgrade the Hyperscience platform. It includes the application and all required tools, helping to streamline setup and upgrade processes.

Bypass Validation if Layout ID is Missing

A flow-level setting that bypasses validation by layout identifier if the matched Structured layout variation doesn’t have an identifier specified. The bypass allows the system to continue classifying documents even without layout identifiers, ensuring that documents that are not tied to a specific layout variation are still processed.

Structured Classification and Layout Identifiers

C

Calibration

A quality check performed after QA on Structured documents to evaluate model performance. It helps set target accuracy levels, define baseline automation thresholds, and assess how well different layouts, fields, or data types are processed before going live.

Contact your Hyperscience Representative for more information.

Case

A group of related documents, files, or pages that are processed together using a unique Case ID.

Case Collation

Cell

A value in a table that holds a single piece of data, such as a name, number, date, or multiline entries like an address or description. In Hyperscience, cells are key to reading and extracting data from tables accurately.

Table Identification

Character

Any single letter, number, or symbol found in a document. Hyperscience reads characters to understand and extract text.

Checkbox

A non-text field used to capture two-option answers like “Yes/No” or “True/False.”

Checkboxes and Signatures

Classification Model

A machine learning model that automatically identifies a document’s type—Structured, Semi-structured, or Additional—and matches it to the correct layout. This classification helps Hyperscience process different document types accurately without manual intervention.

Classify Using Layout Identifier

A flow-level setting that allows Structured documents to be matched using a layout identifier.

Structured Classification and Layout Identifiers

Clustering

The process of grouping similar documents or data points based on shared characteristics, often using machine learning algorithms. Clustering helps the platform better organize and interpret large volumes of data by recognizing patterns and similarities.

Document Drift Management (Layout Triage)

Collation

The process of grouping related files, documents, or pages into a single case using a unique identifier called a Case ID. For example, if you submit multiple documents for a loan application, collation ensures that all these documents are grouped together under one case for streamlined processing and review.

Case Collation
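As a simple illustration of the grouping idea (the document names and Case IDs below are invented):

    # Illustrative only: grouping submitted documents into cases by Case ID.
    from collections import defaultdict

    documents = [
        {"name": "pay_stub.pdf", "case_id": "LOAN-001"},
        {"name": "tax_return.pdf", "case_id": "LOAN-001"},
        {"name": "w2.pdf", "case_id": "LOAN-002"},
    ]

    cases = defaultdict(list)
    for doc in documents:
        cases[doc["case_id"]].append(doc["name"])  # files with the same Case ID end up together

    print(dict(cases))
    # {'LOAN-001': ['pay_stub.pdf', 'tax_return.pdf'], 'LOAN-002': ['w2.pdf']}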

Column

A list of values in a table that are of the same type of information, like names or prices, with one value per row. In simple tables, columns usually appear as vertical sections. However, in more complex tables, columns may not follow a vertical layout but still represent the same kind of data across rows.

Table Identification

Consensus

A process used to confirm the correct value of a transcribed field. Consensus is reached when two matching transcriptions are provided for the same field, usually one from a human and one from the machine, or two separate human-provided entries. This process ensures higher accuracy, especially when the system's confidence is low.

Transcription Supervision Consensus
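A minimal sketch of the matching logic (an assumption for illustration, not the platform’s implementation):

    # Illustrative only: a value is confirmed once two independent transcriptions
    # of the same field match (machine + human, or two humans).
    def has_consensus(transcriptions):
        seen = set()
        for value in transcriptions:
            if value in seen:
                return True  # two matching entries reached consensus
            seen.add(value)
        return False

    print(has_consensus(["$1,250.00", "$1,250.00"]))  # True: the two entries agree
    print(has_consensus(["$1,250.00", "$1,260.00"]))  # False: another entry is needed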

Continuous Field Locator model improvement

When this setting is enabled, the system automatically retrains and updates Field Locator models using newly available QA data. This process allows the model to improve over time without manual intervention. It helps enhance accuracy for identifying field locations in Semi-structured documents.

This setting should only be enabled if there’s enough training data in the environment to support it.

Identification Settings

Copycat

After you’ve annotated a single row of a table, you can use the Copycat feature to copy the annotations to the remaining rows. Copycat is not always accurate, so make sure to double-check the annotations before you submit.

Custom Code Block

A flexible component in Hyperscience flows that allows you to add custom Python logic to transform, validate, or enrich data before it's sent to downstream systems. It lets you apply your own business rules as part of document processing.
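The snippet below only illustrates the kind of Python logic such a block might contain; the function, field names, and threshold are hypothetical, and the block’s actual interface is not shown.

    # Hypothetical example of custom business logic: trim values and flag large totals.
    def enrich_fields(fields):
        """Trim whitespace and flag totals above a business-defined limit."""
        cleaned = {name: value.strip() for name, value in fields.items()}
        cleaned["requires_review"] = float(cleaned.get("total", "0") or "0") > 10_000
        return cleaned

    print(enrich_fields({"vendor": " Acme Corp ", "total": "12500.00"}))
    # {'vendor': 'Acme Corp', 'total': '12500.00', 'requires_review': True}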

Custom Data Type

A user-defined format that tells Hyperscience what a specific type of data should look like, such as a Social Security Number or a policy ID. Custom data types help the system validate and extract field values more accurately based on expected patterns.
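For example, a pattern-based check like the one sketched below (illustrative Python; the exact way patterns are defined in the platform may differ) captures the idea of validating a value against an expected format such as a Social Security Number:

    # Illustrative only: validating a value against an expected pattern.
    import re

    SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

    def is_valid_ssn(value):
        return bool(SSN_PATTERN.match(value))

    print(is_valid_ssn("123-45-6789"))  # True
    print(is_valid_ssn("123456789"))    # False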

Custom Supervision

A configurable task in Hyperscience that you can tailor to your business needs. It allows you to manually review, validate, or enrich data using flexible logic, custom fields, and decision types.

Custom Supervision

D

Database Block

A specific type of block that allows you to connect Hyperscience to external databases. These blocks allow the system to fetch or validate information during document processing.

Flow Blocks

Data Extraction

The process of pulling specific information—like names, dates, or amounts—from a document. In Hyperscience, extraction happens after a page is matched to a layout and uses trained models to identify and capture the right data.

Dataset

A group of documents used to help the system learn or improve. Datasets are used for training, testing, or evaluation of how well the system reads and extracts information.

Data Type

A property that defines the format of the data expected in a field, like numbers, dates, or email addresses. For example, the data type Date accepts only valid dates (e.g., MM/DD/YYYY). Data types help Hyperscience understand what’s expected in a field and flag anything that doesn’t match.

What is a Data Type?

Deployed Model

A trained machine learning model that has been activated within Hyperscience to process documents in real time. Once deployed, the model is live and is used to process documents—classifying them, locating fields, and extracting data based on what it has learned.

Training a Semi-structured Model

Document

A group of one or more pages processed as a single unit in Hyperscience. Documents are categorized as Structured, Semi-structured, or Additional based on how consistent their layouts are and how fields can be extracted from their pages.

Understanding Document Types

Document Classification Quality Assurance

A task where you review and confirm whether Hyperscience correctly identified the type of each page. This process helps improve the system’s ability to match pages to the right layout, supports the training of the Classification model, and is used for document-classification reporting.

Document Classification

Document Classification Task

The first step in Supervision. It is used to categorize and combine pages that were not classified by the machine.

Document Classification

Document Drift Management (Layout Triage)

A post-processing feature that helps you manage documents that don't match layouts during Classification. When submissions don't meet the Structured Layout Match Threshold or are manually flagged as having incorrect or missing layouts, their pages are marked as unmatched.

Document Drift Management (Layout Triage)

Document Eligibility Filtering

A feature in Training Data Management that indicates whether a document is eligible for training based on internal checks in the application and our machine learning logic. It provides additional information about documents that were excluded from the training set.

Document Eligibility Filtering

Document Renderer Block

A step in a Hyperscience flow that turns processed documents into downloadable PDFs and generates links to access them. You can customize the page size and image quality of the PDFs to meet your needs.

Flow Blocks

Dropout

A field-level setting that tells the system to ignore background text like pre-printed labels or symbols. When enabled, the system removes this background content and transcribes only new or handwritten text, helping to improve accuracy.

This setting is enabled by default.

Training a Structured Model

E

Eligible Number of Documents

The number of documents that meet the requirements for training Identification models in Hyperscience. A minimum of 100 is needed to train, with 400 recommended for best results.

This number applies to Field Identification and Table Identification models.

“.env” File

A configuration file used to define environment-specific variables, such as API keys or database credentials. It allows Hyperscience to run securely and consistently across different instances.

Excluded Documents

Documents added to a Classification training set to show the system what should not be matched to a specific layout. They help improve model accuracy by teaching your model to ignore documents that look similar but don’t belong.

TDM for Classification Models

F

False Negatives

A type of error or outcome in machine learning evaluation. In Hyperscience, a false negative happens when the model fails to extract a field value that is clearly present and should have been captured.

False negatives apply to Field Identification and Table Identification models.

False Positive

A type of error or outcome in machine learning evaluation. In Hyperscience, a false positive happens when the model predicts a field value in the wrong location or extracts something that shouldn’t be considered a valid field at all.

False positives apply to Field Identification and Table Identification models.

Field

A labeled piece of information you want to capture from a document, like “Name,” “Date of Birth,” or “Total Amount.” In Hyperscience, fields allow you to specify the values that will be extracted from your documents.

Field Identification

Field Customization

A feature that allows you to override a field's default settings on a per-release basis. For example, you can set a layout's "Name" field to always be sent to Supervision for a given release, but not for other releases the field's layout appears in.

Creating Field Customizations

Field Dictionary

A centralized place in Hyperscience where you define and manage field customizations for Structured layouts. It helps keep field names, data types, and output settings consistent across different documents.

Navigating the Field Dictionary

Field Identification Quality Assurance

A manual quality check for Semi-structured documents where you review and correct the system’s predicted field locations or humans’ input. Doing so helps improve how accurately your model finds and extracts information in future documents. The human’s input in QA is used to measure human performance for reporting purposes.

Field Identification Quality Assurance

Field Identification Task

A manual task in Hyperscience where you confirm or correct the location of fields in Semi-structured documents. You can adjust or draw bounding boxes around field values to help your model learn where to look for the data you want to extract. You may need to perform a Field ID task when the machine is not confident enough in its prediction for a field, based on the target accuracy.

Field Identification

Field Identification Model / Field Locator Model

A machine learning model in Hyperscience that learns where fields are located in Semi-structured documents. It uses examples from training to predict the position of each field on a page so the system can extract the right data.

Training a Semi-Structured Model

Finetuning

A process that improves accuracy by using your data to adjust system thresholds automatically. It helps ensure the system makes more accurate predictions and flags uncertain results for review, reducing errors and improving overall performance.

Managing Transcription Models

Field-Level Accuracy Targets (FLAT)

A configuration in Hyperscience that allows you to set different accuracy levels for specific fields or table columns. For example, if you need higher accuracy for fields like addresses or account numbers, you can set a higher target for them while keeping other fields at a lower target accuracy. Doing so helps improve the precision of critical fields without adding extra tasks.

Identification Settings

Flexible Extraction

A task in Hyperscience that involves human intervention to validate or correct data extraction for Structured documents. This task is used when automatic extraction isn’t fully reliable, allowing you to transcribe or adjust specific fields to ensure accuracy.

Transcription

Flexible Extraction Block

A component in Hyperscience that allows you to define when documents or fields should undergo Flexible Extraction. It enables validating transcriptions or adding data to documents that were manually categorized or that skipped regular Transcription Supervision. To use it, you need a Custom Code Block to set the specific rules.

Flow Blocks

Flow (Workflow)

A customizable workflow in Hyperscience that automates processes, including steps like classification, data extraction, validation, and output. Flows streamline operations by handling tasks sequentially with minimal manual effort.

Flows Overview

Flow Identifiers

A unique name or identifier for a specific flow within the system. It helps distinguish flows clearly, especially during testing and debugging. This identifier may differ from labels displayed in the UI. For more information, contact your Hyperscience representative.

Testing and Debugging Flows

Flow Run

The complete execution cycle of a specific flow within Hyperscience. Each flow run encompasses all steps from initiation to completion for a given submission, allowing you to monitor, troubleshoot, and manage document processing.

Flow Runs Page

Full Page Transcription (FPT)

A process in Hyperscience that captures all visible text on a page, not just specific fields. It’s useful for documents with Unstructured layouts, enabling broader data extraction and analysis.

Text Classification

G

Graphics Processing Unit (GPU)

A processing unit initially designed to increase the speed of graphics calculations. It is efficient in completing matrix computations, making it faster than a central processing unit (CPU) in some calculations related to machine learning.

Ground Truth

Manually annotated data used to train our machine learning models. We use a subset of this data to assess the performance of your models.

Training a Semi-Structured Model

Groups

A collection of documents in the Training Data Curator used to organize training data for machine learning models. Groups help you manage, annotate, and track documents based on specific use cases, such as invoice processing.

Training Data Curator

H

Human-in-the-Loop (HITL)

A process where people review and correct data that the Hyperscience platform has low confidence in. It helps improve accuracy and ensures high-quality results, especially when the system is unsure. Also known as Supervision.

What is Supervision?

I

Intelligent Document Processing (IDP)

The customized automation of data extraction from paper-based documents or document images to integrate with specific digital business processes.

IDP Flow

The end-to-end process in Hyperscience where documents are uploaded, classified, and automatically processed to extract key data. It includes steps like document ingestion, data extraction, validation (with Human-in-the-Loop if needed), and structured output delivery to downstream systems.

Flows Overview

Image

A file (e.g., a scanned document or photo) that the Hyperscience platform processes to extract text and data from.

Image Correction

A setting in the Machine Classification Block that identifies and corrects the orientation of page images by automatically rotating them.

Flow Blocks

Incremental Training

A process that enables you to efficiently update your existing model by incorporating new data or minor annotation changes without losing previously learned information.

Incremental Training

Input Blocks

Blocks that enable you to integrate and process documents from your organization's data sources (such as inboxes, message queues, or network folders) within the system.

J

Job

A logical unit of work to be accomplished within the system.

Jobs Page

K

Knowledge Store

A database-like feature that allows you to store business data (e.g., vendor names, addresses) that can be displayed as decision choices during Custom Supervision. Configuring Custom Supervision tasks to retrieve validated sets of choices from the Knowledge Store prevents errors and reduces time spent on Supervision.

Knowledge Store
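As a rough illustration of the concept (not the actual Knowledge Store API; the vendor list and matching logic below are invented), decision choices can be limited to validated values:

    # Illustrative only: offer reviewers validated choices instead of free-text entry.
    KNOWN_VENDORS = {"Acme Corp", "Globex LLC", "Initech"}

    def choices_for(extracted_value):
        # offer only stored vendors that contain the extracted text
        return sorted(v for v in KNOWN_VENDORS if extracted_value.lower() in v.lower())

    print(choices_for("acme"))  # ['Acme Corp']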

L

Layout

The way a page is arranged in a document. In Hyperscience, layouts guide how data is extracted from different types of documents, like forms, invoices, and purchase orders. There are three types of layouts: Structured, Semi-structured, and Additional, depending on how consistent the content and its arrangement are across pages.

Layout Identifier

A string of characters that appears in a specific location on a page in a Structured layout. It allows the machine to distinguish among similar layouts that a submission may match. It can contain letters, numbers, or a combination of both. Layout identifiers help the machine achieve the best match for submitted pages.

Layout Identifiers

Layout Variation

Layout variations occur when documents of the same type (such as HCFA or W-8 forms in the US) have the same key information but differ in how that information is arranged on the page.

Training a Structured Model

Layout Version

An iteration of a layout that reflects the state of a layout at a given point in time. You can create new versions and restore older ones based on your needs.

What is a Layout Version?

Large Language Models (LLMs)

Advanced models that understand and generate human-like text. They are used to enhance document-processing tasks, such as data extraction, summarization, and classification, by interpreting complex content. As a result, LLMs can improve automation and accuracy.

Flow Blocks

Large Language Models Install Block

A component that checks for and installs a Large Language Model (LLM) in the system if it’s not already present. This block ensures the necessary LLM is available for processing tasks that require advanced language understanding.

Flow Blocks

M

Machine Classification Block

A flow component that automatically detects and categorizes document types using AI, helping route them through the correct processing steps.

Flow Blocks

Machine Collation Block

A flow component that automatically groups related documents or data points, streamlining the organization and processing of information.

Flow Blocks

Machine Identification Block

A flow component that automatically finds and labels key fields on a page so the system knows what data to extract.

Flow Blocks

Machine Transcription Block

A flow component that reads and converts text from a document into structured, machine-readable data.

Flow Blocks

Manual Classification Block

A flow component that enables human reviewers to assign the correct document type when the model is not sure.

Flow Blocks

Manual Identification Block

A flow component that allows a human to annotate fields on a page to teach the model where information is located.

Flow Blocks

Manual Transcription Block

A flow component that allows a human reviewer to transcribe text when the system is not confident that its transcription is accurate.

Flow Blocks

Margin of Error (MoE)

The range of uncertainty in the system's estimate of accuracy. It shows you how much the estimate may differ from the true value. The smaller the margin of error is, the more confident the system is in its estimate.

Accuracy
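As a rough illustration, the standard 95%-confidence formula for a proportion (which may differ from the platform’s exact calculation) shows how the margin of error shrinks as more QA samples accumulate:

    # Illustrative only: margin of error for an estimated accuracy at 95% confidence.
    import math

    def margin_of_error(accuracy, sample_size, z=1.96):
        return z * math.sqrt(accuracy * (1 - accuracy) / sample_size)

    print(f"{margin_of_error(0.95, 100):.3f}")   # ~0.043 with 100 QA'd samples
    print(f"{margin_of_error(0.95, 1000):.3f}")  # ~0.014 with 1,000 QA'd samples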

Model

A construct in our system that performs specific tasks, like classifying documents, identifying fields, or extracting text. The model improves over time as it learns from more examples.

Multiple Bounding Boxes (MBB)

Used to annotate a single field's value when it spans line or page breaks, ensuring the accurate capture of information across multiple locations or pages.

Field Identification

Multiple Occurrences (MO)

Used to identify multiple distinct instances of a field.

Field Identification

Model Validation Tasks (MVTs)

Tasks used to review and correct data predictions made by Hyperscience models. They help evaluate model accuracy and provide feedback that can be used to improve future model performance.

Document Classification Model Validation Tasks

Multiline

A setting in the Layout Editor for fields or columns that may span more than one line of text (e.g., Address, Description). When this setting is not enabled and a field's value spans more than one line, it will not be extracted in its entirety.

N

Named Entity Recognition Block (NER)

A component in Hyperscience that automatically identifies and extracts specific types of information—like names, addresses, and organizations—from unstructured text. This block is used in tandem with full-page transcription to process documents containing freeform text.

Flow Blocks

Nested Table

Data structure that represents a table within another table. It is embedded as a row into another table, creating a hierarchical or a “nested” structure. Nested tables allow you to extract data from tables with complicated structures where child-row data points inherit data points from parent rows.

What is a Nested Table?
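A hypothetical representation of the structure (invented invoice data), where child rows inherit context from their parent row:

    # Hypothetical nested-table structure: each parent row carries its own values
    # plus child rows that inherit the parent's data points.
    invoice_lines = [
        {
            "item": "Laptop",
            "amount": "1,200.00",
            "child_rows": [
                {"item": "Extended warranty", "amount": "99.00"},
                {"item": "Docking station", "amount": "150.00"},
            ],
        },
    ]

    for parent in invoice_lines:
        for child in parent["child_rows"]:
            # a child row inherits context (here, the parent item) from its parent
            print(f"{parent['item']} -> {child['item']}: {child['amount']}")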

Non-Structured Layout Classifier (NLC)

Finds the correct Semi-structured or Additional layout for a given set of submission pages based on the words in the submitted documents. Note that NLC works on a page level.

Automatic Document Classification

Normalization

The process of converting extracted data into a consistent format. In Hyperscience, normalization helps standardize values like dates, amounts, or addresses so they’re easier to use in downstream systems.
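For example (an illustrative Python sketch, not the platform’s implementation), dates written in different formats can be converted to one standard form:

    # Illustrative only: normalizing differently formatted dates to one standard format.
    from datetime import datetime

    RAW_FORMATS = ["%m/%d/%Y", "%d %B %Y", "%Y-%m-%d"]

    def normalize_date(value):
        for fmt in RAW_FORMATS:
            try:
                return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
            except ValueError:
                continue
        raise ValueError(f"Unrecognized date format: {value}")

    print(normalize_date("03/02/2024"))    # 2024-03-02
    print(normalize_date("2 March 2024"))  # 2024-03-02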

O

Optical Character Recognition (OCR)

A technology that automatically identifies and converts printed or handwritten text within digital images into machine-encoded text. In Hyperscience, OCR is a key step that enables the platform to extract and work with text during document processing.

Out-of-Memory (OOM) Error

An error that happens when a process in Hyperscience tries to use more memory than is available. It can interrupt processing and usually indicates that the job or document is too large or complex for the allocated resources.

Memory Management

Out-of-the-Box (OOTB) Model

A pre-trained machine learning model that comes ready to use in Hyperscience without needing additional training. It works well for common document types and use cases, helping teams get started quickly.

Output Blocks

After processing, these blocks send extracted data to designated destinations like databases, applications, or other systems, ensuring seamless integration with existing workflows.

Flow Blocks

P

Page

A logical entity in Hyperscience. A submission consists of one or more pages, and each page belongs to at most one document. Each page is analyzed individually for pre-processing and classification and as part of a document for subsequent tasks (where applicable).

PDF Extraction

A tool in the system that assists in creating layouts from PDFs by automatically suggesting field locations and names. PDF Extraction can only be used when creating the first variation of a layout; it cannot be used when creating subsequent layout variations. It is only available for Structured layouts and is disabled by default.

Application Settings Overview

Permission Group

A group of users who share the same set of permissions.

Permission Groups

Personally Identifiable Information (PII)

Any data that can be used to identify an individual, such as a name, address, or Social Security number.

PII Data Deletion

PII Data Deletion

A setting that allows you to remove PII from submissions to enhance security or comply with organizational policies. In Hyperscience, enabling this feature deletes all document image data, including extracted data and the original and processed images.

PII Data Deletion

PII Deletion Policy

A configurable setting in Hyperscience that determines when PII is deleted based on either the submission-completion date or the date submitted. You can specify the number of days after the chosen date and set the exact time for deletion.

PII Data Deletion

Production Environment

The live Hyperscience system, where real documents and data are processed as part of day-to-day business operations. This environment is used by end users and must meet high standards for performance, stability, and data security. It is distinct from development or testing environments.

Deployment Information

Projected Automation

The predicted automation based on the desired target accuracy. The projection is derived from the model’s training data. The system automatically ensures that the same data is not used for both projections and training.

Automation

Q

Quality Assurance (QA)

Process that ensures the accuracy and reliability of system outputs. In Hyperscience, QA tasks allow users to review and correct errors in classification, identification, and transcription. Documents may be randomly sampled for QA from all processed data.

What is Quality Assurance?

Quality Assurance (QA) Records

Results of Quality Assurance (QA) tasks in Hyperscience. These records are used to evaluate accuracy and help the system calculate the QA sample rate, which determines how many fields are selected for review to ensure consistent data quality.

Transcription Settings

R

Recommended Number of Documents

The suggested minimum and optimal amount of labeled data required to train machine learning models in Hyperscience.

  • For Classification models, at least 10 pages per layout are needed, with 120 pages per layout recommended.

  • For Identification models, the minimum is 100 documents, and the recommended amount is 400 documents.

Providing diverse, high-quality examples ensures better model accuracy and reliability.

Release

A release is a package of one or more committed layout variation versions. This collection of layouts or layout variations should reflect the various document types that the machine should expect. To use the layouts that you have created to process documents, you will need to deploy a flow that contains a release.

What is a Release?

Reprocessing Block

When initiated from other Supervision tasks, the Document Classification task allows users to manually classify machine-misclassified documents. As a result, users can reclassify all pages of the submission and submit them for reprocessing.

Reprocessing

Required Number of Documents

The minimum amount of annotated data needed to train machine learning models in Hyperscience.

  • For Classification models, at least 10 pages per layout are required.

  • For Identification models, the system requires a minimum of 100 documents.

Meeting these thresholds ensures the models can be successfully trained and begin learning layout- or document-type patterns.

Resubmitting a Submission

In Hyperscience, the process of reprocessing a previously submitted document using the same flow and configuration. This action is typically used to address issues or errors encountered during the initial processing. "Resubmitting" and "retrying" are used interchangeably and retain the original submission ID.

Testing and Debugging Flows

Retrying a Submission

In Hyperscience, the process of reprocessing a previously submitted document using the same flow and configuration. This action is typically used to address issues or errors encountered during the initial processing. "Resubmitting" and "retrying" are used interchangeably and retain the original submission ID.

Testing and Debugging Flows

Routing Blocks

Blocks that direct documents or data through specific paths in the flow based on predefined conditions, ensuring each item follows the appropriate processing route.

Flow Blocks

Row

A single horizontal grouping of field values extracted by the Table Identification model within a tabular region of a document.

Table Identification

S

Segment

A distinct region on a document image that contains text, as identified by the Segmentation model. Segments are the building blocks used for downstream tasks like Classification and Transcription. Each segment includes positional data (in the form of a bounding box) and text content, helping the system understand the document’s layout and structure.

Segmentation

Segmentation

The process of partitioning an image into regions containing text. It is the first step of downstream processing tasks such as Classification and Transcription.

Segmentation

Semi-structured Layout

A configuration within Hyperscience designed to process documents where fields and table cells are present, but their positions can vary among documents. Unlike Structured layouts, Semi-structured layouts do not rely on fixed field locations. Instead, a model is trained to find the fields and cells based on provided training examples.

Creating Semi-structured layouts

Signature Field

A non-text field data type used to detect and extract handwritten signatures from documents. In Hyperscience, the Signature data type allows the system to identify areas where a person has signed, facilitating the extraction of these signatures for verification or record-keeping purposes. Compatible with both Structured and Semi-structured layouts.

Checkboxes and Signatures

Software-Defined Management (SDM)

An approach to managing infrastructure and operations through software-based controls rather than manual or hardware-specific methods. In Hyperscience, this approach enables centralized, flexible configuration and orchestration of workflows, environments, and resources, supporting automation, scalability, and easier maintenance.

Software Development Kit (SDK)

A collection of tools, libraries, and documentation that developers use to build software applications for a specific platform or system. In Hyperscience, the SDK allows teams to create custom integrations, automate tasks, or extend platform functionality—such as building Code Blocks or interacting with the API—within a developer-friendly framework.

Custom Supervision

Straight-Through Processing (STP)

Any data that passes through the system without human intervention. When used as a metric, STP allows you to measure the number of documents that pass through our system without human review. However, STP is not recommended, as it may bypass critical quality checks. Human-in-the-loop validation ensures accuracy, especially for high-impact or sensitive data, and helps maintain trust and compliance in real-world operations.

Structured Layout

In Hyperscience, a Structured layout is used to process documents where key fields appear in consistent positions. This layout type helps the system quickly and accurately find and extract data from standardized documents like W-8 or HCFA forms in the US.

Creating Structured Layouts

Subflow

A smaller flow that's part of a larger flow group. It's used to break down complex flows into reusable pieces. When you deploy a flow group, all its subflows are deployed together, making it easier to manage and reuse common processing steps across different workflows.

Submission

A set of files uploaded together for processing. The system interprets each file as a page, matches it to existing layouts, and groups it into documents based on these matches.

Submission Bootstrap

In Hyperscience, the Submission Bootstrap refers to the Submission Initialization Block within a document-processing flow. This block manages the initial setup and configuration for incoming submissions, including data-ingestion parameters. By configuring the Submission Bootstrap, you can control how submissions are initialized.

Connecting Flow Blocks to Other Flows

Submission Initialization Block

The first step in a document-processing flow. It sets up the submission by determining how documents are grouped and where they are ingested from.

Flow Blocks

Supervision

A manual task that is created when the system’s confidence in a prediction is below the confidence threshold. Supervision allows a human to review and correct the output, ensuring data accuracy through human-in-the-loop input.

What is Supervision?

T

Table

A logical structure used to organize and present information in rows and columns. It is used to present values in a readable format.

Table Identification

Table Identification Quality Assurance

The process of reviewing and validating the system's identification of tables within Semi-structured documents. This quality-assurance task allows you to measure the system’s accuracy in table identification.

Table Identification Quality Assurance

Table Identification Task

A manual task in Hyperscience where you confirm or correct the location of tables in Semi-structured documents. You may adjust or draw bounding boxes around the table cells to help your model learn where to look for the data you want to extract. Rows are represented by horizontal separators.

Table Identification

Table Locator

A machine learning model in Hyperscience that learns where cells and rows are located in Semi-structured documents. It uses examples from training to predict the position of each cell and row on a page so the system can extract the correct data.

Training a Semi-structured Model

Target Accuracy

A setting specified by the user. It indicates the desired overall system accuracy, including tasks performed by humans. It allows you to evaluate how well the system is expected to perform.

Task

A step in processes where a human reviews or confirms part of the system’s output. Tasks are generated based on the system settings when the system is uncertain about a result or needs to check accuracy.

There are two main types:

  • Supervision Tasks

  • Quality Assurance (QA) Tasks

Task Queue

A list of tasks waiting for human review. It organizes and stores all active Supervision and Quality Assurance tasks that need to be completed.

Navigating the Task Queue

Template Row

The lead row in your table. It doesn’t need to be the first one, but it should be representative of the rows in your table. Hyperscience uses the Copycat tool to populate the annotation for the rest of the rows in your table. The Copycat is not always accurate, so make sure to double-check the annotations.

Table Identification

Text Classification

A machine learning model that reads unstructured text, such as comments, emails, or notes, and assigns it to predefined categories. This categorization helps automate decisions and organize freeform text based on business rules.

Text Classification

Threshold

The confidence limit used to decide if a machine prediction should be sent for human review to ensure accuracy.
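A minimal sketch of the routing idea (the threshold value and function below are illustrative assumptions):

    # Illustrative only: send a prediction to Supervision when its confidence
    # falls below the configured threshold.
    def route(prediction, confidence, threshold=0.90):
        if confidence >= threshold:
            return f"auto-accept: {prediction}"
        return f"send to Supervision: {prediction} (confidence {confidence:.2f})"

    print(route("Jane Doe", 0.97))  # auto-accept: Jane Doe
    print(route("Jame Doe", 0.62))  # send to Supervision: Jame Doe (confidence 0.62)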

Top-Level Flow

The main flow that manages the end-to-end document processing, coordinating with subflows to handle specific components of the process.

Document Processing Flow in v41

Trainer

A separate machine dedicated to handling resource-heavy tasks like training Identification models. It operates independently and connects to the main application through the API.

Trainer

Training Data

The input used to teach machine learning models how to process documents accurately. Its structure depends on the model type:

  • For Classification models, training data consists of uploaded document pages grouped by layout.

  • For Identification models, training data includes manually annotated fields and tables to train the model to extract specific data points.

Training Data Analysis

A tool in TDM that analyzes your training data to compute the importance of each training document and identify issues such as missing labels, overlapping fields or columns, or inconsistent annotations. This analysis helps you prioritize which documents to annotate and ensures clean, accurate data before you train a model.

Training Data Curator

A tool in Hyperscience that helps you select the most valuable documents for training your model. It highlights documents that are likely to improve accuracy so you can focus your annotation efforts where they matter most.

Training Data Curator

Training Data Management (TDM)

A tool used to annotate, manage, import, and export training documents. It is also used to train models by working directly with the training data (“ground truth”) obtained from each document in the training set.

Training Data Management

Training Set

A dataset used to teach the system how to recognize and extract information. It includes documents with labeled fields so the system can learn from real examples.

Transcription Task

A Supervision task that allows you to review or enter text the system couldn’t confidently read from a document. This task enables you to ensure accurate final data when the system’s confidence is low.

Transcription

Transcription Model

A machine learning model that automatically extracts text from scanned document images. It supports both printed and handwritten text. When the model’s confidence in the extracted text is low, the system generates a Transcription task for human review to ensure accuracy.

Managing Transcription Models

Transcription Quality Assurance

A quality check where a set percentage of transcribed fields or table cells are reviewed by a human. It helps measure and improve the accuracy of both machine and human transcriptions.

Transcription Quality Assurance

True Positive

A result where the model correctly predicts something as positive; for example, it says a field is present, and it is present.

Technical Validation Event (TVE)

A trial phase where a potential customer tests the platform with clear success criteria. It’s designed to show that the product works well for their needs. This phase ensures that both sides are aligned before moving forward.

U

Unmatched Document

A set of pages that the system couldn’t match to any known layout during Classification.

Unstructured Extraction

Processing documents with little to no layout consistency. Key information appears anywhere, often embedded in long paragraphs or freeform text. Examples include contracts, title deeds, and annual reports.

V

Visual Language Models (VLM)

Models that understand both the text and images in a document. They combine what’s written with where it appears on the page to help the system read and extract information more accurately.

Visual Page Classifier (VPC)

An automated component that matches submission pages to the correct layouts from the Layout Library. It ensures accurate and efficient processing of Structured documents.