Default Data Types

Data type

Output

Description

Versions

Allowed characters

Raw transcription example

Transcription normalized

Address

String

Used for all types of address fields. All special characters except "-" are removed, and multiple spaces in a row are reduced to one space. Dashes are allowed in both normalization and the supervision mask.


Alphanumeric, special characters ", # -" and spaces only

"54 W. 21st Street, Suite 503, N.Y., NY 10010-1234"

"54 W 21ST STREET SUITE 503 NY NY 10010-1234"

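As a rough illustration of the Address rules above, the following sketch reproduces the example row. It is not the product's implementation: `normalize_address` is a hypothetical name, uppercasing is inferred from the example rather than stated in the description, and only ASCII letters are kept.

```python
import re


def normalize_address(raw: str) -> str:
    """Hypothetical helper mirroring the Address rules described above."""
    # Drop every character that is not an ASCII letter, digit, space, or dash.
    kept = re.sub(r"[^A-Za-z0-9 \-]", "", raw)
    # Reduce runs of spaces to one space and trim the ends.
    collapsed = re.sub(r" +", " ", kept).strip()
    # Uppercasing is an assumption taken from the example row.
    return collapsed.upper()


print(normalize_address("54 W. 21st Street, Suite 503, N.Y., NY 10010-1234"))
```
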
AlphaNumeric

String

Used for fields where the application should not expect real words. Expects only letters, digits, spaces, or dashes. Spaces and dashes will be removed.

Any other special characters are considered invalid and will reduce confidence. If they are present in a form field, a normalization error exception is returned. For a wider range of special characters, use Freeform Alphanumeric or Freeform Characters.


Alphanumeric only

"A12-231 48"

"A1223148"

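A minimal sketch of the AlphaNumeric behavior described above, assuming ASCII input. `NormalizationError` and `normalize_alphanumeric` are hypothetical names used only for illustration; the real exception is reported by the application, not raised in Python.

```python
import re


class NormalizationError(ValueError):
    """Hypothetical stand-in for the normalization error exception."""


def normalize_alphanumeric(raw: str) -> str:
    # Only letters, digits, spaces, and dashes are expected in the input.
    if re.search(r"[^A-Za-z0-9 \-]", raw):
        raise NormalizationError(f"unexpected character in {raw!r}")
    # Spaces and dashes are removed from the normalized value.
    return re.sub(r"[ \-]", "", raw)


print(normalize_alphanumeric("A12-231 48"))
```
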
AUS Postcode

4 character string

Used for Australian Post Codes. Leading and trailing zeros are allowed.


4 digits, numeric only

"2150"

"2150"

AUS State

2-3 character string

Used for Australian States and Territories. Variations in state names are normalized to the standard two- to three-character abbreviations. Includes optimized spelling correction.


Letters only, spaces allowed

"N.S.W." or "New South Wales"

"NSW"

Barcode

String

Used to read data in barcodes. All letters are converted to uppercase, leading and trailing spaces are removed, and consecutive spaces are consolidated to a single space.

The barcode data type fully supports CODE128, CODE39, EAN13, EAN8, Interleaved 2 of 5, ISBN-13, Data Matrix barcodes, and QR codes (Version 1, 2, 3, 4, 10, 25, 40).

Note that ISBN-10 is supported, but the transcription is the value of the barcode and not the value of the ISBN. Both UPC-A and UPC-E are read as EAN13.


N/A



CAN Postcode

6 character string

Used for Canadian postal codes. The value is a 6-character alphanumeric string.

32.0.0 and later

Allowed letters: ABCEGHJKLMNPRSTVXY. W and Z are also allowed, but not as the first character. Spaces are removed during normalization.

"A1B 2C3"

"A1B2C3"
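
The letter rules above can be sketched as a regular expression. This is an illustration under assumptions, not the product's validator: the alternating letter-digit layout is the standard Canadian postal code pattern rather than something this table states, uppercasing the input is assumed, and `normalize_can_postcode` is a hypothetical name.

```python
import re
from typing import Optional

FIRST_LETTERS = "ABCEGHJKLMNPRSTVXY"   # letters allowed in any position
OTHER_LETTERS = FIRST_LETTERS + "WZ"   # W and Z are allowed, but not first

# Assumed layout: letter, digit, letter, digit, letter, digit.
_POSTCODE = re.compile(
    rf"[{FIRST_LETTERS}][0-9][{OTHER_LETTERS}][0-9][{OTHER_LETTERS}][0-9]"
)


def normalize_can_postcode(raw: str) -> Optional[str]:
    """Return the 6-character code, or None if it breaks the letter rules."""
    compact = raw.replace(" ", "").upper()  # spaces are removed in normalization
    return compact if _POSTCODE.fullmatch(compact) else None


print(normalize_can_postcode("A1B 2C3"))
print(normalize_can_postcode("W1B 2C3"))
```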

Capitalized Name

String

Used for all types of names, including people, places, and companies, where the first letter of each word is capitalized.

33.0.0 and later


"Anne-Marie Smith"

"ANNE-MARIE SMITH"

Clause

String

Used for fields where long sentences/paragraphs are expected. These fields are essentially treated as Freeform Characters (e.g., a Generic Text field). Only supported for Unstructured Extraction.

38.0.1 and later




Currency - X,XXX.XX

String ending with a decimal and two digits

This currency format expects commas to group thousands and a dot to separate up to two decimals. Values are returned with two decimal places and abbreviations (e.g., K, MM) are converted into digits. If decimals aren't written, ".00" is added. If the figure is surrounded by parentheses, the value will be normalized to negative.


Numeric, special character "-" to signify negative amounts, "$" and cents written out by default, entered numbers are expressed right-to-left.

"1,000" or "1K"

"1000.00"

Currency - X.XXX,XX

String with dots grouping thousands, ending with a comma and two digits


33.0.0 and later




Currency Trailing Sign - X,XXX.XX

String ending with a decimal and two digits

This currency format expects commas to group thousands and a dot to separate up to two decimals. Values are returned with two decimal places and abbreviations (e.g., K, MM) are converted into digits. If decimals aren't written, ".00" is added. If the figure is surrounded by parentheses, followed by CR, or has a trailing negative sign after the value, the value will be normalized to negative.

33.0.0 and later


"1,000-" or "1000CR"

"-1000.00"

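The negative-value rules for this format (parentheses, a trailing "CR", or a trailing minus sign) can be sketched as follows. This is a hedged illustration only: `normalize_currency_trailing_sign` is a hypothetical name, "K" and "MM" are assumed to mean thousands and millions, and rounding of inputs with more than two decimals is not covered by the description.

```python
from decimal import Decimal, InvalidOperation

# Assumed meanings of the abbreviations named in the description.
_ABBREVIATIONS = (("MM", Decimal(1_000_000)), ("K", Decimal(1_000)))


def normalize_currency_trailing_sign(raw: str) -> str:
    text = raw.strip().upper().replace("$", "").replace(" ", "")
    negative = False
    # Parentheses, a trailing "CR", or a trailing "-" all mark a negative value.
    if text.startswith("(") and text.endswith(")"):
        negative, text = True, text[1:-1]
    if text.endswith("CR"):
        negative, text = True, text[:-2]
    if text.endswith("-"):
        negative, text = True, text[:-1]
    factor = Decimal(1)
    for suffix, value in _ABBREVIATIONS:
        if text.endswith(suffix):
            factor, text = value, text[: -len(suffix)]
            break
    try:
        # Grouping commas are dropped; the result always has two decimals.
        amount = (Decimal(text.replace(",", "")) * factor).quantize(Decimal("0.01"))
    except InvalidOperation as exc:
        raise ValueError(f"cannot normalize {raw!r}") from exc
    if negative:
        amount = -amount
    return format(amount, "f")


for sample in ("1,000-", "1000CR", "(1,000)", "1K"):
    print(sample, "->", normalize_currency_trailing_sign(sample))
```
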
Currency Trailing Sign - X.XXX,XX

String ending with a comma and two digits

This currency format expects dots to group thousands and a comma to separate up to two decimals. Values are returned with two decimal places and abbreviations (e.g., K, MM) are converted into digits. If decimals aren't written, ",00" is added. If the figure is surrounded by parentheses, followed by CR, or has a trailing negative sign after the value, the value will be normalized to negative.

33.0.0 and later


"1.000-" or "1000CR"

"-1000,00"

Currency Trailing Sign No Rounding - X,XXX.XXXX

String ending with a decimal and any number of digits

This currency format expects commas to group thousands and a dot to separate decimals. Values are returned without rounding, and abbreviations (e.g., K, MM) are converted into digits. If the figure is surrounded by parentheses, followed by CR, or has a trailing negative sign after the value, the value will be normalized to negative.

33.0.0 and later


"1,000.123400-" or "1000.123400CR"

"-1000.123400"

Separated Currency - X,XXX XX

String ending with a decimal and two digits

This currency format expects commas to group thousands and is to be used where two digits after a decimal are mandated, yet the printed vertical line on the underlying template may not always be a reliable indicator of this separation. Values are returned with a dot to separate up to two decimals. If decimals aren't written, the last two digits are assumed to be the decimals. If the figure is surrounded by parentheses, the value will be normalized to negative.


Numeric, special character "-" to signify negative amounts, "$" and cents written out by default, entered numbers are expressed right-to-left.

"100000"

"1000.00"

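The "last two digits are the decimals" rule can be sketched as below. This is a minimal illustration, assuming that inputs shorter than three digits are left-padded with zeros (the description does not cover that case); `normalize_separated_currency` is a hypothetical name.

```python
def normalize_separated_currency(raw: str) -> str:
    text = raw.strip().replace("$", "")
    # Parentheses or a leading "-" mark a negative amount.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    if text.startswith("-"):
        negative, text = True, text[1:]
    # Grouping commas and the gap left by the printed separator are dropped.
    text = text.replace(",", "").replace(" ", "")
    if "." in text:
        whole, _, cents = text.partition(".")
    else:
        # No decimal written: the last two digits are taken as the decimals.
        padded = text.zfill(3)  # assumption: "5" is read as "0.05"
        whole, cents = padded[:-2], padded[-2:]
    if not (whole.isdigit() and cents.isdigit() and len(cents) <= 2):
        raise ValueError(f"cannot normalize {raw!r}")
    sign = "-" if negative else ""
    return f"{sign}{int(whole)}.{cents.ljust(2, '0')}"


print(normalize_separated_currency("100000"))
print(normalize_separated_currency("(100000)"))
```
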
Separated Currency - X.XXX XX

String ending with a comma and two digits

This currency format expects dots to group thousands and is to be used where two digits after a decimal are mandated, yet the printed vertical line on the underlying template may not always be a reliable indicator of this separation. Values are returned with a comma to separate up to two decimals. If decimals aren't written, the last two digits are assumed to be the decimals. If the figure is surrounded by parentheses, the value will be normalized to negative.


Numeric, special character "-" to signify negative amounts, "$" and cents written out by default, entered numbers are expressed right-to-left.

"(100000)"

"-1000,00"

Currency with Unit - X,XXX.XX

String beginning with amount ending with a decimal and two digits, followed by a space and currency code or symbol if present

This currency format is to be used when expecting multiple types of currencies (e.g., USD and EUR) in a given field. The normalized value will start with the amount and it will use a dot to separate up to two decimals. If the figure is surrounded by parentheses, the value will be normalized to negative. Any currency code or symbol, if present, will be appended to the amount after a space. In the absence of a currency code or symbol, no space will be included.

Supported currencies include $, €, £, ¥, all ISO abbreviations, and currencies written with Latin-based characters.


Alphanumeric, all special characters allowed

"USD 200" or "200USD"

"200.00 USD"

Currency - X.XXX,XX

String ending with a comma and two digits

This currency format expects dots to group thousands and a comma to separate up to two decimals. Values are returned with two decimal places and abbreviations (e.g., K, MM) are converted into digits. If decimals aren't written, ",00" is added. If the figure is surrounded by parentheses, the value will be normalized to negative.


Numeric, special character "-" to signify negative amounts, "$" and cents written out by default, entered numbers are expressed right-to-left.

"1.000" or "1K"

"1000,00"

Currency with Unit - X.XXX,XX

String beginning with amount ending with a comma and two digits, followed by a space and currency code or symbol if present

This currency format is to be used when expecting multiple types of currencies (e.g., USD and EUR) in a given field. The normalized value will start with the amount and it will use a comma to separate up to two decimals. If the figure is surrounded by parentheses, the value will be normalized to negative. Any currency code or symbol, if present, will be appended to the amount after a space. In the absence of a currency code or symbol, no space will be included.

Supported currencies include $, €, £, ¥, all ISO abbreviations, and currencies written with Latin-based characters.


Alphanumeric, all special characters allowed

"USD 200" or "200USD"

"200,00 USD"

Date - MMDDYYYY

10 character string "MM/DD/YYYY"

Standard US date format. If the date is written with numbers only, then the month is assumed to come first (US convention). If the format does not match, the system may try to validate it in a different one to avoid a normalization error (e.g., if the raw transcription is "82 10 06", the system will normalize it as "10/06/1982").


Numeric only, MM must be between 1-12, DD must be within 1-31, YYYY must be 4 digits, "/" written automatically between time units.

"Jan 1 2015" or "1/1/2015"

"01/01/2015"

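A minimal sketch of month-first normalization using Python's standard `datetime` module. The list of accepted input layouts is an assumption made for illustration, English month names rely on the default C locale, and the fallback to other orderings mentioned in the description is not implemented here. `normalize_us_date` is a hypothetical name.

```python
from datetime import datetime

# Assumed input layouts, tried in order; all of them put the month first.
_FORMATS = (
    "%m/%d/%Y",
    "%m-%d-%Y",
    "%m %d %Y",
    "%b %d %Y",
    "%B %d %Y",
    "%b %d, %Y",
    "%B %d, %Y",
)


def normalize_us_date(raw: str) -> str:
    text = raw.strip()
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # layout did not match, try the next one
        return parsed.strftime("%m/%d/%Y")
    raise ValueError(f"cannot normalize {raw!r} as MM/DD/YYYY")


print(normalize_us_date("Jan 1 2015"))
print(normalize_us_date("1/1/2015"))
```
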
Date - DDMMYYYY

10 character string "DD/MM/YYYY"

DD/MM/YYYY format. If the date is written with numbers only, then the day is assumed to come first. If the format does not match, the system may try to validate it in a different one to avoid a normalization error (e.g., if the raw transcription is "82 10 06", the system will normalize it as "10/06/1982").


Numeric only, MM must be between 1-12, DD must be within 1-31, YYYY must be 4 digits, "/" written automatically between time units.

"1 Feb 2015" or "01/02/2015"

"01/02/2015"

Date - MMYYYY

7 character string "MM/YYYY"

MM/YYYY format. If the date is written with numbers only, then the four-digit component is assumed to be the year regardless of order. If both components are two digits and each is <= 12, a "normalization_error" exception is returned in the exceptions array.


Numeric only, MM must be between 1-12, YYYY must be 4 digits, "/" written automatically between time units.

"Feb 2015" or "02/2015"

"02/2015"

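The ordering rule for numbers-only input can be sketched as follows. This covers only the two cases the description spells out (a four-digit year in either position, and the ambiguous two-digit case); anything else is rejected. `NormalizationError` and `normalize_month_year` are hypothetical names.

```python
import re


class NormalizationError(ValueError):
    """Hypothetical stand-in for the "normalization_error" exception."""


def normalize_month_year(raw: str) -> str:
    parts = re.findall(r"[0-9]+", raw)
    if len(parts) != 2:
        raise NormalizationError(f"expected two numeric components in {raw!r}")
    first, second = parts
    if len(first) == 4 and len(second) <= 2:
        year, month = first, second        # year written first
    elif len(second) == 4 and len(first) <= 2:
        month, year = first, second        # year written last
    elif (len(first), len(second)) == (2, 2) and int(first) <= 12 and int(second) <= 12:
        raise NormalizationError(f"{raw!r} is ambiguous: either part could be the month")
    else:
        raise NormalizationError(f"{raw!r} is not covered by this sketch")
    if not 1 <= int(month) <= 12:
        raise NormalizationError(f"month out of range in {raw!r}")
    return f"{int(month):02d}/{year}"


print(normalize_month_year("02/2015"))
print(normalize_month_year("2015 2"))
```
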
Date with Punctuations

String

Used for dates that are preceded by an opening parenthesis and followed by a closing parenthesis, comma, or period, or any combination of those characters.

33.1.7 and later

Numeric only, MM must be between 1-12, DD must be within 1-31, YYYY must be 4 digits, "/" written automatically between time units.

"(June 3, 2021)."

"06/03/2021"

Date - Korean

String

Standard Korean Date format. If the date is written with numbers only, then the day is assumed to come last. If transcription.raw is an impossible date, such as "2022년 33월 4일", then a "normalization_error" exception is returned in the exceptions array.

34.0.0 and later


"2022년 3월 4일" or "2022 March 04"

"2022/03/04"

Email Address

String

Uses language model


Alphanumeric, all special characters allowed

"[email protected]"

"[email protected]"

Email Address International

String


33.0.0 and later

Alphanumeric, all special characters allowed

[email protected]

[email protected]

Enhanced Korean Freeform

String

Used for post-processing and enhancing the results of the Korean-English data type.

38.0.1 and later




Freeform Characters

String

Used for fields where the application should not expect real words. All letters are converted to uppercase, leading and trailing spaces are removed, and all special characters are retained.


Alphanumeric, all special characters allowed

"lakdoia3902u73.393837y4.3-3938"

"LAKDOIA3902U73.393837Y4.3-3938"

Freeform Characters (American English)

String

Used for fields where the application should not expect real words. All letters are converted to uppercase, leading and trailing spaces are removed, and all special characters are retained. Characters are restricted to ASCII printable characters, which include unaccented letters, numbers, and common symbols and punctuation in American English.

28.0.3 and later

Alphanumeric, no accented letters allowed. Special characters are limited to the following: @#$%&*()-=+[];:'"\,.<>/!?_{|}~ and spaces.

"lakdoia3902u73.393837y4.3-3938"

"LAKDOIA3902U73.393837Y4.3-3938"

Freeform AlphaNumeric

String

Used for fields where the application should not expect real words. All letters are converted to uppercase, and leading and trailing spaces are removed. Unlike Freeform Characters, all special characters and spaces are removed.


Alphanumeric, no special characters or spaces allowed

"123 ABC-89/XYZ"

"123ABC89XYZ"

Generic Text

String

Used for fields where words or sentences are expected. All letters are converted to uppercase, leading and trailing spaces are removed, and consecutive spaces are consolidated to a single space.


Alphanumeric, all special characters allowed

"Data extraction is hard."

"DATA EXTRACTION IS HARD."

Legal Amount - X,XXX.XX

String ending with a decimal and two digits

Used for converting legal amounts found on English language checks to their numeric, courtesy amount equivalent.


Numeric, "$" and cents written out by default, entered numbers are expressed right-to-left.

"Four hundred twenty-seven + 45/100"

"427.45"

Legal Amount - X.XXX,XX

String ending with a comma and two digits

Used for converting legal amounts found on English language checks to their numeric, courtesy amount equivalent.


Numeric, "$" and cents written out by default, entered numbers are expressed right-to-left.

"Four hundred twenty-seven + 45/100"

"427,45"

Length - X,XXX.XX

String with a number and an optional unit. If a unit is present, there will be a single space between the number and unit. In the absence of a unit, no space will be added.

This format is to be used when expecting numeric length/height values that may also include a unit of measure (e.g., “5 ft”). Leading zeros and trailing decimal zeros that don't add meaning to such numbers are removed. The normalized value will use a dot to separate decimals if present.

28.0.0 and later

Alphanumeric, all special characters allowed

"5’10.45”"

"5 ft 10.45 in"

Length - X.XXX,XX

String with a number and an optional unit. If a unit is present, there will be a single space between the number and unit. In the absence of a unit, no space will be added.

This format is to be used when expecting numeric length/height values that may also include a unit of measure (e.g., “5 ft”). Leading zeros and trailing decimal zeros that don't add meaning to such numbers are removed. The normalized value will use a comma to separate decimals if present.

28.0.0 and later

Alphanumeric, all special characters allowed

"1,7 metres"

"1,7 m"

MICR Font

String

Used for fields printed in MICR font (often found on checks). Normalization will retain all special MICR symbols, but remove any spaces between elements.


Numeric characters along with special MICR symbols, accessible via keyboard shortcuts

"⑆123456789⑆ 12345678 123"

"⑆123456789⑆12345678123"

Name

String

Used for all types of names, including people, places, companies, etc. All special characters are removed. Multiple spaces in a row are reduced to one space.


Alphanumeric, special characters "- , . () &"

"T.J. Madison"

"TJ MADISON"

Number - X,XXX.XX

String

Used for values that represent numbers on which a mathematical operation can be performed (addition, subtraction, etc.). This number format expects commas to group thousands and a dot to separate decimals. Leading zeros and trailing decimal zeros that don't add meaning to such numbers are removed (e.g., "0321" becomes "321" and "100.00" becomes "100"). Abbreviations (e.g., K, MM) are converted into digits, and separator commas are removed.


Numeric, special characters "-" to signify a negative amount and "." for decimal.

"-4.5K"
"0321"
"100.00"

"-4500"
"321"
"100"

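The three example rows above can be reproduced with Python's `decimal` module. This is a sketch under assumptions rather than the product's logic: "K" and "MM" are taken to mean thousands and millions, and `normalize_number` is a hypothetical name.

```python
from decimal import Decimal

# Assumed meanings of the abbreviations named in the description.
_ABBREVIATIONS = (("MM", Decimal(1_000_000)), ("K", Decimal(1_000)))


def normalize_number(raw: str) -> str:
    # Separator commas are removed before parsing.
    text = raw.strip().upper().replace(",", "")
    factor = Decimal(1)
    for suffix, value in _ABBREVIATIONS:
        if text.endswith(suffix):
            factor, text = value, text[: -len(suffix)]
            break
    number = Decimal(text) * factor
    if number == 0:
        return "0"
    # normalize() strips trailing decimal zeros; "f" avoids exponent notation.
    return format(number.normalize(), "f")


for sample in ("-4.5K", "0321", "100.00"):
    print(sample, "->", normalize_number(sample))
```
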
Number with Unit - X,XXX.XX

String beginning with number, followed by a space and unit if present

This number format is to be used when expecting numeric values that may also include a unit of measure (e.g., 10kg). The normalized value will start with the number and it will use a dot to separate decimals if present. Any unit of measure, if present, will be appended to the number after a space. In the absence of a unit of measure, no space will be included.


Alphanumeric, all special characters allowed

"10kg"

"10 kg"

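Splitting the number from its unit can be sketched with one regular expression. This is an illustration only: the pattern is an assumption about what counts as the numeric part, and `normalize_number_with_unit` is a hypothetical name.

```python
import re

# Number (optional sign, grouping commas, optional decimals), then any unit text.
_NUMBER_WITH_UNIT = re.compile(r"\s*(-?[0-9][0-9,]*(?:\.[0-9]+)?)\s*(.*?)\s*")


def normalize_number_with_unit(raw: str) -> str:
    match = _NUMBER_WITH_UNIT.fullmatch(raw)
    if match is None:
        raise ValueError(f"cannot normalize {raw!r}")
    number, unit = match.groups()
    number = number.replace(",", "")
    # A single space separates number and unit; no space when the unit is absent.
    return f"{number} {unit}" if unit else number


print(normalize_number_with_unit("10kg"))
print(normalize_number_with_unit("10"))
```
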
Number - X.XXX,XX

String

Used for values that represent numbers on which a mathematical operation can be performed (addition, subtraction, etc.). This number format expects dots to group thousands and a comma to separate up to two decimals. Leading zeros and trailing decimal zeros that don't add meaning to such numbers are removed (e.g., "0321" becomes "321" and "100,00" becomes "100"). Abbreviations (e.g., K, MM) are converted into digits, and separator dots are removed.


Numeric, special characters "-" to signify a negative amount and "," for decimal.

"-4,5K"
"0321"
"100,00"

"-4500"
"321"
"100"

Number with Unit - X.XXX,XX

String beginning with number, followed by a space and unit if present

This number format is to be used when expecting numeric values that may also include a unit of measure (e.g., 10 kg). The normalized value will start with the number and it will use a comma to separate decimals if present. Any unit of measure, if present, will be appended to the number after a space. In the absence of a unit of measure, no space will be included.


Alphanumeric, all special characters allowed

"10kg"

"10 kg"

Numeric Text

String

Numeric data. No abbreviations are taken into consideration and leading and trailing zeros are kept intact.


Numeric

"0038937313200"

"0038937313200"

Phone Number

String with digits only

All punctuation (e.g., +, (), -) is removed such that only the numbers are returned.


Numeric, special character "+"

"+1 (555) 555-5555"

"15555555555"

SSN/EIN/TIN

9 digit string "#########"

US Government Tax Identification numbers. When written, these can include dashes in different places but must be 9 digits. If the transcription.raw does not have 9 digits (e.g., it has letters or has more than 9 digits), then a "normalization_error" exception is returned in the exceptions array.


Numeric, must be 9 digits in length

"987-65-4321"

"987654321"

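The nine-digit check can be sketched as below. `NormalizationError` and `normalize_tax_id` are hypothetical names; in the product the failure surfaces as a "normalization_error" entry in the exceptions array rather than as a Python exception.

```python
import re


class NormalizationError(ValueError):
    """Hypothetical stand-in for the "normalization_error" exception."""


def normalize_tax_id(raw: str) -> str:
    # Dashes and spaces may appear in different places and are ignored.
    compact = re.sub(r"[\s\-]", "", raw)
    # Anything other than exactly nine digits is rejected.
    if not re.fullmatch(r"[0-9]{9}", compact):
        raise NormalizationError(f"{raw!r} does not contain exactly 9 digits")
    return compact


print(normalize_tax_id("987-65-4321"))
```
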
UK Postcode

5 - 7 character string

Used for UK Postcodes. Space between Outward and Inward codes is removed.


Alphanumeric, must be in an acceptable UK postcode format.

"SW1W 0NY"

"SW1W0NY"

US Zip Code

5 or 9 digit string

Used for US Zip Codes. Dashes or spaces between the Zip Code and the optional +4 are removed.


Numeric, must be either 5 digits or 9 digits in length.

"10010-7356"

"100107356"

US State

2 character string

Used for US States and Territories. Variations in states are normalized to standard two character abbreviations.


Letters, must be 2 characters in length, must match a state abbreviation.

"Arizona" or "Ariz."

"AZ"

Weight - X,XXX.XX

String with a number and an optional unit. If a unit is present, there will be a single space between the number and unit. In the absence of a unit, no space will be added.

This format is to be used when expecting numeric weight values that may also include a unit of measure (e.g., “10.5 kg”). Leading zeros and trailing decimal zeros that don't add meaning to such numbers are removed. The normalized value will use a dot to separate decimals if present.

28.0.0 and later

Alphanumeric, all special characters allowed

"9 pounds and 8 ozs"

"9 lbs 8 ozs"

Weight - X.XXX,XX

String with a number and an optional unit. If a unit is present, there will be a single space between the number and unit. In the absence of a unit, no space will be added.

This format is to be used when expecting numeric weight values that may also include a unit of measure (e.g., “10,5 kg”). Leading zeros and trailing decimal zeros that don't add meaning to such numbers are removed. The normalized value will use a comma to separate decimals if present.

28.0.0 and later

Alphanumeric, all special characters allowed

"5 kg & 9,33 grammes"

"5 kg 9,33 g"

Medical

String

Used for fields where medical terminology or sentences are expected. All letters are converted to uppercase, leading and trailing spaces are removed, and consecutive spaces are consolidated to a single space. Note that spaces between slashes and characters are removed.


Alphanumeric, all special characters allowed

"120 systolic / 80 diastolic"

"120 SYSTOLIC/80 DIASTOLIC"

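A minimal sketch of the Medical rules above, including the slash handling. `normalize_medical` is a hypothetical name, and the order in which the steps are applied is an assumption.

```python
import re


def normalize_medical(raw: str) -> str:
    # Consolidate whitespace, trim the ends, and uppercase all letters.
    text = re.sub(r"\s+", " ", raw).strip().upper()
    # Remove spaces on either side of a slash.
    return re.sub(r"\s*/\s*", "/", text)


print(normalize_medical("120 systolic / 80 diastolic"))
```
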
Checkbox

Boolean

No normalization, true indicates that the checkbox was checked.


N/A

N/A

N/A

Signature

Boolean

No normalization, true indicates that a signature was present.


N/A

N/A

N/A