Segmentation

Text segmentation is the process of partitioning a page image into meaningful, distinct regions (blocks) of text. It is the first step for downstream processing tasks such as classification, text transcription, field extraction, and table extraction.

In this article, you will learn how to leverage segmentation to improve the performance of your models.

Segmentation in Hyperscience

Segmentation is a crucial first step in handling semi-structured documents. It detects regions in a page image that contain text. These regions (also known as “text segments”) are identified by the model and returned as coordinates that define their position. Text segments can be visualized as bounding boxes drawn around the text. The segments can then be processed using a transcription model.

Segment properties

After the segmentation model identifies the text segments on a page, the transcription model extracts the text within each segment. As a result, each text segment has the following properties:

Location — The bounding box containing the text.

Text — The extracted text within the bounding box.
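To make the two properties concrete, here is a minimal sketch of how a text segment could be represented in code. The class names, field names, and the pixel-coordinate convention (origin at the top-left corner of the page) are assumptions made for illustration; they are not the format returned by the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    # Hypothetical layout: pixel coordinates with the origin at the top-left.
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top


@dataclass(frozen=True)
class TextSegment:
    location: BoundingBox  # where the segment sits on the page
    text: str              # what the transcription model read inside it


# Example values are invented for illustration.
segment = TextSegment(
    location=BoundingBox(left=120, top=40, right=310, bottom=62),
    text="ACME HOLDINGS LTD",
)
print(segment.text, segment.location.width, segment.location.height)
```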

Sometimes our segmentation model might create segments in areas without text or fail to create a segment where text is present. This is more likely with content such as the following:

  • Watermarks

  • Vertical text

  • Text that is faded, has a background, or has low contrast

When the system does not create a segment for a specific piece of text, our training pipelines and downstream processing are not aware of that text when evaluating the model’s predictions. As a result, the model cannot extract any information from it during training or processing: if a segment is missing, any key data in that area goes unrecognized.

Model training and segmentation

Each model uses specific information. The table below describes the data used by each type of model.

Model                   Segmentation properties used
Classification          Uses only text.
Field ID                Uses both text and location information.
Table ID                Uses both text and location information.
Long-form Extraction    Uses both text and location information.
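The same mapping can be written as a small lookup, which may help when reasoning about what each model type actually sees. This is an illustrative sketch only: `MODEL_INPUTS`, `inputs_for`, and the dictionary layout of a segment are hypothetical names, not part of the product.

```python
# Which segment properties each model type uses, as described in this article.
MODEL_INPUTS = {
    "Classification": {"text"},
    "Field ID": {"text", "location"},
    "Table ID": {"text", "location"},
    "Long-form Extraction": {"text", "location"},
}


def inputs_for(model_type, segment):
    """Return only the segment properties that the given model type uses."""
    used = MODEL_INPUTS[model_type]
    return {name: value for name, value in segment.items() if name in used}


# Hypothetical segment: location is (left, top, right, bottom) in pixels.
segment = {"text": "Invoice", "location": (10, 10, 90, 30)}

print(inputs_for("Classification", segment))
print(inputs_for("Field ID", segment))
```

Because classification uses only text, a segment in the wrong position does not affect it, whereas Field ID, Table ID, and Long-form Extraction depend on both properties being correct.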

Segmentation and Signatures

In v40.2 and later, signature segmentation is trained as a standalone model, separate from text and checkbox segmentation, allowing it to focus exclusively on identifying signatures in documents. That way, the model produces more reliable and complete bounding boxes around signatures.

The bounding boxes from text segmentation are used to refine the segmentation of signatures, ensuring that fragmented or partial signatures are consolidated for better accuracy.

Segmentation and annotations

Annotations are automatically mapped to the identified segments in the selected region. We recommend using these locations to ensure accuracy. During this process, automatic bounding boxes appear around the segments, as shown below:

These annotations are later sent to the Trainer for model training. Learn more in our What is the Trainer? article.

The system expects full segments when evaluating annotations. As a result, adjusting the bounding box to capture part of the field does not mean that only that part will be sent to the Trainer. For example, if you adjust the bounding box to omit the “D” in “LTD,” the system still sends the entire original segment for model training, because most of the text is within the bounding box.
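The behavior described above can be pictured as snapping an annotation to whole segments. The sketch below is an approximation for illustration, not the product's actual logic: the function names, the tuple layout `(left, top, right, bottom)`, and the 0.5 threshold standing in for "most of the text" are all assumptions.

```python
def overlap_fraction(segment_box, annotation_box):
    """Fraction of the segment's area that lies inside the annotation box."""
    s_left, s_top, s_right, s_bottom = segment_box
    a_left, a_top, a_right, a_bottom = annotation_box
    inter_w = max(0.0, min(s_right, a_right) - max(s_left, a_left))
    inter_h = max(0.0, min(s_bottom, a_bottom) - max(s_top, a_top))
    segment_area = (s_right - s_left) * (s_bottom - s_top)
    return (inter_w * inter_h) / segment_area if segment_area > 0 else 0.0


def segments_for_annotation(segments, annotation_box, threshold=0.5):
    """Select whole segments that are mostly covered by the annotation box.

    The threshold is an assumed value, chosen only to illustrate the idea.
    """
    return [
        seg for seg in segments
        if overlap_fraction(seg["box"], annotation_box) > threshold
    ]


segments = [
    {"text": "LTD", "box": (100, 10, 160, 30)},
    {"text": "Invoice", "box": (300, 10, 380, 30)},
]

# Annotation drawn to leave out the "D": it covers 40 of the segment's 60 pixels.
annotation = (100, 10, 140, 30)

selected = segments_for_annotation(segments, annotation)
print([seg["text"] for seg in selected])
```

Even though the annotation box stops short of the “D,” the full “LTD” segment is selected, which mirrors why the entire segment is sent for training.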

Partial segments are not supported. If you want to capture only part of a segment, handle it in post-processing.
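As a rough illustration of what such post-processing might look like, the sketch below trims an extracted segment down to the part you need by using a regular expression. The function name `extract_part`, the sample text, and the pattern are hypothetical; where and how you run post-processing logic depends on your own setup.

```python
import re


def extract_part(segment_text, pattern):
    """Return the first part of the segment text matching pattern, or None."""
    match = re.search(pattern, segment_text)
    return match.group(0) if match else None


# The whole segment is extracted; only the amount is kept afterwards.
full_segment = "Total: 1,250.00"
amount = extract_part(full_segment, r"[\d,]+\.\d{2}")
print(amount)
```

Returning `None` when nothing matches keeps the failure visible, so that unexpected segment text can be reviewed instead of being silently passed through.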

Best practices

In this section, you’ll learn how to work with segments during the annotation process.

  • Make sure to capture the entire segment in the bounding box.

Adjust the bounding box only if two segments overlap, so that the values you want to extract are not disrupted. Alternatively, use multiple bounding boxes. Learn more in Document Eligibility Filtering.

  • If you want to annotate partial lines from a multiline field, use our multiple bounding boxes feature. See the example below:

  • If you want to annotate the entire field, you can use a single bounding box:

To learn more about the model training process, see Training a Semi-structured model.