Supported Characters and Default Data Types

Overview

Data types tell the system which kinds of characters to expect, or not to expect, in a field. For example, the Numeric type is best used for values where only numbers are expected, whereas the Generic Text type is best used for values containing sentences. Some data types expect specific formatting or value lengths, such as Date, Currency Amount, or Email Address. There are also data types for non-text fields, such as Signature and Checkbox.

List of data types

To view all data types in the system, navigate to Library > Data Types. A starting set of data types is included in each new Hyperscience instance. Additional data types enter the system through layout and release imports, or are created using the Create Data Type option in the top right of this screen.

The Data Types screen lists the following information:

  • Name - the display name of the data type. These values are shown in many other parts of the system (Field Dictionary, Layout Editor, Layout Version Viewer, Output Pages, Supervision transcription tasks, and Data Quality reports), so descriptive, human-readable names are recommended.

  • ML Configuration - the specific data type (Numeric, Date, etc.) used by Hyperscience’s language models for transcription.

  • Status - indicates whether a given data type is enabled or disabled. Enabled data types are eligible to be selected for field definitions in the Field Dictionary and Layout Editor, while disabled data types are not.

Full list of supported characters

The system supports the following characters in the contexts described below.

Note that specific data types may have their own additional limitations on which characters are supported.

Alphanumeric characters

In addition to digits (0-9), spaces, and uppercase (A-Z) and lowercase (a-z) English characters, we support the following:

  • The 28 characters of the Arabic alphabet and the following special characters are accepted in any field with a data type that supports letters: ٩, ٨, ٧, ٦, ٥, ٤, ٣, ٢, ١, ٠, ؊, ؉, ؈, ؇, ؆, ٭, ٬, ٫, ٪, ،, ؟, ؛, ى, ـ, ة, ء.

  • Bulgarian characters are accepted in any field with a data type that supports letters.

  • Chinese characters are accepted in any field with a data type that supports letters.

  • Czech characters are accepted in any field with a data type that supports letters.

  • Estonian characters are accepted in any field with a data type that supports letters.

  • French characters (É, é, À, à, È, è, Ù, ù, Â, â, Ê, ê, Î, î, Ô, ô, Û, û, Ç, ç, Ä, ä, Ë, ë, Ï, ï, Ü, ü, Œ, œ) are accepted in any field with a data type that supports letters.

  • German characters (Ä, ä, Ö, ö, Ü, ü, ß) are accepted in any field with a data type that supports letters.

  • Hebrew characters are accepted in any field with a data type that supports letters.

  • Italian characters (À, à, È, è, É, é, Ì, ì, Ò, ò, Ó, ó, Ù, ù) are accepted in any field with a data type that supports letters.

  • Japanese characters are accepted in any field with a data type that supports letters.

  • Kazakh characters are accepted in any field with a data type that supports letters.

  • Korean characters are accepted in any field with a data type that supports letters.

  • Latvian characters are accepted in any field with a data type that supports letters.

  • Lithuanian characters are accepted in any field with a data type that supports letters.

  • Polish characters (Ą, ą, Ć, ć, Ę, ę, Ł, ł, Ń, ń, Ó, ó, Ś, ś, Ź, ź, Ż, ż) are accepted in any field with a data type that supports letters. However, these characters are not fully supported in fields with the Alphanumeric data type.

  • Portuguese characters (Á, á, É, é, Í, í, Ó, ó, Ú, ú, À, à, Â, â, Ê, ê, Ô, ô, Ã, ã, Õ, õ, Ü, ü, Ç, ç) are accepted in any field with a data type that supports letters.

  • Russian characters are accepted in any field with a data type that supports letters.

  • Slovak characters are accepted in any field with a data type that supports letters.

  • Spanish characters (Ñ, ñ, Á, á, É, é, Í, í, Ó, ó, Ú, ú, Ü, ü, ¿, ¡) are accepted in any field with a data type that supports letters.

  • Thai characters are accepted in any field with a data type that supports letters.

  • Turkish characters are accepted in any field with a data type that supports letters.
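
Because Polish characters are not fully supported in fields with the Alphanumeric data type, it can be useful to flag values that contain them before choosing a data type. The following is a minimal sketch that uses only the Polish character list above. The function name polish_letters_in is hypothetical and is not part of Hyperscience. Note that Ó and ó also appear in the Italian, Portuguese, and Spanish lists, so a match does not prove that the text is Polish.

```python
import unicodedata

# Polish characters as listed above, normalized so precomposed forms are compared.
POLISH_LETTERS = frozenset(unicodedata.normalize("NFC", "ĄąĆćĘęŁłŃńÓóŚśŹźŻż"))


def polish_letters_in(value: str) -> list[str]:
    """Return the distinct listed Polish characters found in value, sorted by code point."""
    # NFC turns "Z" + combining dot above into the single character "Ż".
    normalized = unicodedata.normalize("NFC", value)
    return sorted(set(normalized) & POLISH_LETTERS)


print(polish_letters_in("Łódź"))  # → ['ó', 'Ł', 'ź']
print(polish_letters_in("Lodz"))  # → []
```

A non-empty result suggests that a data type other than Alphanumeric may be a safer choice for that field.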

Currency characters

We support the following characters:

  • International currency characters (₸, €, ₽‎, ฿, ₺, £, ¥, $, ؋, دج, ﷼, .د.ب, ع.د, ينار, ك, د.ت, ج.م, ل.ل, ج.س, ر.ع, ر.ق, ₪, ₹, ₩) in any currency with unit field after using the formatting override (F2).

MICR characters

We support MICR characters in any field with the MICR data type after using the formatting override (F2).

Punctuation and symbols

We support the following characters in fields with data types that support letters:

Character    Unicode code point
!            0x21
"            0x22
#            0x23
$            0x24
%            0x25
&            0x26
'            0x27
(            0x28
)            0x29
*            0x2a
+            0x2b
,            0x2c
-            0x2d
.            0x2e
/            0x2f
:            0x3a
;            0x3b
<            0x3c
=            0x3d
>            0x3e
?            0x3f
@            0x40
[            0x5b
\            0x5c
]            0x5d
^            0x5e
_            0x5f
`            0x60
{            0x7b
|            0x7c
}            0x7d
~            0x7e
⅟            0x215f
́            0x301
̀            0x300
̂            0x302
̈            0x308
̧            0x327
̃            0x303
«            0xab
»            0xbb
ª            0xaa
°            0xb0
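
The code points listed above can be turned into a simple pre-check for text you expect to see in a field. This is a minimal sketch, not part of Hyperscience: the names SUPPORTED_SYMBOLS and unsupported_symbols are hypothetical, and the check covers only the punctuation and symbols in this section. Letters, digits, and spaces are skipped because they are described under Alphanumeric characters, and individual data types may impose further limits.

```python
# Code points copied from the punctuation and symbols list in this section.
SUPPORTED_SYMBOLS = (
    set(range(0x21, 0x30))    # ! " # $ % & ' ( ) * + , - . /
    | set(range(0x3A, 0x41))  # : ; < = > ? @
    | set(range(0x5B, 0x61))  # [ \ ] ^ _ `
    | set(range(0x7B, 0x7F))  # { | } ~
    | {0x215F, 0x300, 0x301, 0x302, 0x303, 0x308, 0x327, 0xAB, 0xBB, 0xAA, 0xB0}
)


def unsupported_symbols(text: str) -> list[str]:
    """Return, in order of first appearance, symbols in text that are not listed."""
    found: list[str] = []
    for ch in text:
        if ch.isalnum() or ch.isspace():
            continue  # letters, digits, and whitespace are out of scope here
        if ord(ch) not in SUPPORTED_SYMBOLS and ch not in found:
            found.append(ch)
    return found


print(unsupported_symbols("Total: 50% (approx.)"))  # → []
print(unsupported_symbols("§3, §4 and ¶2"))         # → ['§', '¶']
```

Currency characters such as € are reported by this check because they are covered separately under Currency characters.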

Default data types

You can filter the Data Types list by name, ML Configuration, and status. These filters are available above the list.

To view the expanded details of a particular data type, click its name in the leftmost column. The details of the data type are shown along with its unique identifier.

Note that when Hyperscience is installed for the first time, all default data types are enabled, with the exception of the Medical data type. When an existing installation is updated to a new version, any new default data types are disabled by default.

To learn more about default data types, see Default Data Types.

Supported barcode types

The barcode data type fully supports CODE128, CODE39, EAN13, EAN8, Interleaved 2 of 5, ISBN-13, Data Matrix barcodes, and QR codes (Versions 1, 2, 3, 4, 10, 25, and 40).

Note that ISBN-10 is supported, but the transcription is the value of the barcode and not the value of the ISBN. Both UPC-A and UPC-E are read as EAN13.
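
If downstream systems expect UPC-A or ISBN-10 values, the transcribed barcode value may need to be converted after export. The sketch below relies on two assumptions that this article does not confirm: that a UPC-A code read as EAN13 is returned as 13 digits with a leading zero (the standard relationship between the two symbologies), and that the barcode on an ISBN-10 book carries a 978-prefixed EAN-13 value. All function names are hypothetical. UPC-E is not handled.

```python
def _is_digits(value: str, length: int) -> bool:
    """True if value consists of exactly `length` ASCII digits."""
    return len(value) == length and value.isascii() and value.isdigit()


def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit: weights alternate 1, 3 from the left."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    return _is_digits(code, 13) and ean13_check_digit(code[:12]) == int(code[12])


def upca_to_ean13(upca: str) -> str:
    """Assumption: a UPC-A value corresponds to the EAN-13 value with a leading 0."""
    if not _is_digits(upca, 12):
        raise ValueError("UPC-A values have exactly 12 digits")
    return "0" + upca


def isbn13_to_isbn10(isbn13: str) -> str:
    """Assumption: the transcribed barcode value is a 978-prefixed EAN-13."""
    if not (is_valid_ean13(isbn13) and isbn13.startswith("978")):
        raise ValueError("expected a valid 978-prefixed EAN-13 value")
    core = isbn13[3:12]  # the nine ISBN-10 digits before its check character
    total = sum(int(d) * w for d, w in zip(core, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))


print(upca_to_ean13("036000291452"))      # → 0036000291452
print(isbn13_to_isbn10("9780306406157"))  # → 0306406152
```

Validating the check digit before converting helps catch values that were mis-transcribed or that do not follow the assumed encoding.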