Data type | Output | Description | Versions | Allowed characters | Raw transcription example | Transcription normalized |
---|---|---|---|---|---|---|
Address | String | Used for all types of address fields. All special characters except "-" are removed. Multiple spaces in a row are reduced to one space. Allow for dash in normalization and supervision mask. | | Alphanumeric, special characters ", # -" and spaces only | "54 W. 21st Street, Suite 503, N.Y., NY 10010-1234" | "54 W 21ST STREET SUITE 503 NY NY 10010-1234" |
AlphaNumeric | String | Used for fields where the application should not expect real words. Expects only letters, digits, spaces, or dashes. Spaces and dashes will be removed. | | Alphanumeric only | "A12-231 48" | "A1223148" |
AUS Postcode | 4 character string | Used for Australian Post Codes. Leading and trailing zeros are allowed. | | 4 digits, numeric only | "2150" | "2150" |
AUS State | 2-3 character string | Used for Australian States and Territories. Variations in states are normalized to standard two to three character abbreviations. Optimized spelling correction. | | Letters only, spaces allowed | "N.S.W." or "New South Wales" | "NSW" |
Barcode | String | Used to read data in barcodes. All letters are converted to uppercase, leading and trailing spaces are removed, consecutive spaces are consolidated to a single space. | | N/A | | |
CAN Postcode | 6-character string | Used for Canadian postal codes. The value is a 6-character alphanumeric string. | 32.0.0 | Allowed letters: ABCEGHJKLMNPRSTVXY | "A1B 2C3" | "A1B2C3" |
Capitalized Name | String | Used for all types of names, including people, places, and companies, where the first letter of each word is capitalized. | 33.0.0 | | "Anne-Marie Smith" | "ANNE-MARIE SMITH" |
Clause | String | Used for fields where long sentences/paragraphs are expected. These fields are essentially treated as Freeform Characters (e.g., a Generic Text field). Only supported for Unstructured Extraction. | 38.0.1 | | | |
Currency - X,XXX.XX | String ending with a decimal and two digits | This currency format expects commas to group thousands and a dot to separate up to two decimals. Values are returned with two decimal places and abbreviations (e.g., K, MM) are converted into digits. If decimals aren't written, ".00" is added. If the figure is surrounded by parentheses, the value will be normalized to negative. | | Numeric, special character "-" to signify negative amounts, "$" and cents written out by default, entered numbers are expressed right-to-left. | "1,000" or "1K" | "1000.00" |
Currency - X.XXX,XX | String separated with a dot, ending with a comma and two digits | | 33.0.0 | | | |
Currency Trailing Sign - X,XXX.XX | String ending with a decimal and two digits | This currency format expects commas to group thousands and a dot to separate up to two decimals. Values are returned with two decimal places and abbreviations (e.g., K, MM) are converted into digits. If decimals aren't written, ".00" is added. If the figure is surrounded by parentheses, followed by CR, or has a trailing negative sign after the value, the value will be normalized to negative. | 33.0.0 | | "1,000-" or "1000CR" | "-1000.00" |
Currency Trailing Sign - X.XXX,XX | String ending with a comma and two digits | This currency format expects dots to group thousands and a comma to separate up to two decimals. Values are returned with two decimal places and abbreviations (e.g., K, MM) are converted into digits. If decimals aren't written, ",00" is added. If the figure is surrounded by parentheses, followed by CR, or has a trailing negative sign after the value, the value will be normalized to negative. | 33.0.0 | | "1.000-" or "1000CR" | "-1000,00" |
Currency Trailing Sign No Rounding - X,XXX.XXXX | String ending with a decimal and any number of digits | This currency format expects commas to group thousands and a dot to separate decimals. Values are returned without rounding, and abbreviations (e.g., K, MM) are converted into digits. If the figure is surrounded by parentheses, followed by CR, or has a trailing negative sign after the value, the value will be normalized to negative. | 33.0.0 | | "1,000.123400-" or "1000.123400CR" | "-1000.123400" |
Separated Currency - X,XXX XX | String ending with a decimal and two digits | This currency format expects commas to group thousands and is to be used where two digits after a decimal are mandated, yet the printed vertical line on the underlying template may not always be a reliable indicator of this separation. Values are returned with a dot to separate up to two decimals. If decimals aren't written, the last two digits are assumed to be the decimals. If the figure is surrounded by parentheses, the value will be normalized to negative. | | Numeric, special character "-" to signify negative amounts, "$" and cents written out by default, entered numbers are expressed right-to-left. | "100000" | "1000.00" |
Separated Currency - X.XXX XX | String ending with a comma and two digits | This currency format expects dots to group thousands and is to be used where two digits after a decimal are mandated, yet the printed vertical line on the underlying template may not always be a reliable indicator of this separation. Values are returned with a comma to separate up to two decimals. If decimals aren't written, the last two digits are assumed to be the decimals. If the figure is surrounded by parentheses, the value will be normalized to negative. | | Numeric, special character "-" to signify negative amounts, "$" and cents written out by default, entered numbers are expressed right-to-left. | "(100000)" | "-1000,00" |
Currency with Unit - X,XXX.XX | String beginning with amount ending with a decimal and two digits, followed by a space and currency code or symbol if present | This currency format is to be used when expecting multiple types of currencies (e.g., USD and EUR) in a given field. The normalized value will start with the amount and it will use a dot to separate up to two decimals. If the figure is surrounded by parentheses, the value will be normalized to negative. Any currency code or symbol, if present, will be appended to the amount after a space. In the absence of a currency code or symbol, no space will be included. | | Alphanumeric, all special characters allowed | "USD 200" or "200USD" | "200.00 USD" |
Currency - X.XXX,XX | String ending with a comma and two digits | This currency format expects dots to group thousands and a comma to separate up to two decimals. Values are returned with two decimal places and abbreviations (e.g., K, MM) are converted into digits. If decimals aren't written, ",00" is added. If the figure is surrounded by parentheses, the value will be normalized to negative. | | Numeric, special character "-" to signify negative amounts, "$" and cents written out by default, entered numbers are expressed right-to-left. | "1.000" or "1K" | "1000,00" |
Currency with Unit - X.XXX,XX | String beginning with amount ending with a comma and two digits, followed by a space and currency code or symbol if present | This currency format is to be used when expecting multiple types of currencies (e.g., USD and EUR) in a given field. The normalized value will start with the amount and it will use a comma to separate up to two decimals. If the figure is surrounded by parentheses, the value will be normalized to negative. Any currency code or symbol, if present, will be appended to the amount after a space. In the absence of a currency code or symbol, no space will be included. | | Alphanumeric, all special characters allowed | "USD 200" or "200USD" | "200,00 USD" |
Date - MMDDYYYY | 10 character string "MM/DD/YYYY" | Standard US date format. If the date is written with numbers only, then the month is assumed to come first (US convention). If the format does not match, the system may try to validate it in a different one to avoid a normalization error (e.g., if the raw transcription is "82 10 06", the system will normalize it as "10/06/1982"). | | Numeric only, MM must be between 1-12, DD must be within 1-31, YYYY must be 4 digits, "/" written automatically between time units. | "Jan 1 2015" or "1/1/2015" | "01/01/2015" |
Date - DDMMYYYY | 10 character string "DD/MM/YYYY" | DD/MM/YYYY format. If the date is written with numbers only, then the day is assumed to come first. If the format does not match, the system may try to validate it in a different one to avoid a normalization error (e.g., if the raw transcription is "82 10 06", the system will normalize it as "10/06/1982"). | | Numeric only, MM must be between 1-12, DD must be within 1-31, YYYY must be 4 digits, "/" written automatically between time units. | "1 Feb 2015" or "01/02/2015" | "01/02/2015" |
Date - MMYYYY | 7 character string "MM/YYYY" | MM/YYYY format. If the date is written with numbers only, then the four-digit component is assumed to be the year regardless of order. If both components are two digits, and each is <= 12, a "normalization_error" exception is returned in the exceptions array. | | Numeric only, MM must be between 1-12, YYYY must be 4 digits, "/" written automatically between time units. | "Feb 2015" or "02/2015" | "02/2015" |
Date with Punctuations | String | Used for dates that are preceded by an opening parenthesis and followed by a closing parenthesis, comma, or period, or any combination of those characters. | 33.1.7 | Numeric only, MM must be between 1-12, DD must be within 1-31, YYYY must be 4 digits, "/" written automatically between time units. | "(June 3, 2021)." | "06/03/2021" |
Date - Korean | String | Standard Korean Date format. If the date is written with numbers only, then the day is assumed to come last. If transcription.raw is an impossible date, such as "2022년 33월 4일", then a "normalization_error" exception is returned in the exceptions array. | 34.0.0 | | "2022년 3월 4일" or "2022 March 04" | "2022/03/04" |
Email Address | String | Uses language model | | Alphanumeric, all special characters allowed | | |
Email Address International | String | | 33.0.0 | Alphanumeric, all special characters allowed | | |
Enhanced Korean Freeform | String | Used for post-processing and enhancing the results of the Korean-English data type. | 38.0.1 | | | |
Freeform Characters | String | Used for fields where the application should not expect real words. All letters are converted to uppercase, leading and trailing spaces are removed, and all special characters are retained. | | Alphanumeric, all special characters allowed | "lakdoia3902u73.393837y4.3-3938" | "LAKDOIA3902U73.393837Y4.3-3938" |
Freeform Characters (American English) | String | Used for fields where the application should not expect real words. All letters are converted to uppercase, leading and trailing spaces are removed, and all special characters are retained. Characters are restricted to ASCII printable characters, which include unaccented letters, numbers, and common symbols and punctuation in American English. | 28.0.3 | Alphanumeric, no accented letters allowed. Special characters are limited to the following: @#$%&*()-=+[];:'"\,.<>/!?_{\|}~ and spaces. | "lakdoia3902u73.393837y4.3-3938" | "LAKDOIA3902U73.393837Y4.3-3938" |
Freeform AlphaNumeric | String | Used for fields where the application should not expect real words. All letters are converted to uppercase and leading and trailing spaces are removed. Unlike Freeform Characters, all special characters and spaces are removed. | | Alphanumeric, no special characters or spaces allowed | "123 ABC-89/XYZ" | "123ABC89XYZ" |
Generic Text | String | Used for fields where words / sentences are expected. All letters are converted to uppercase, leading and trailing spaces are removed, consecutive spaces are consolidated to a single space. | | Alphanumeric, all special characters allowed | "Data extraction is hard." | "DATA EXTRACTION IS HARD." |
Legal Amount - X,XXX.XX | String ending with a decimal and two digits | Used for converting legal amounts found on English language checks to their numeric, courtesy amount equivalent. | | Numeric, "$" and cents written out by default, entered numbers are expressed right-to-left. | "Four hundred twenty-seven + 45/100" | "427.45" |
Legal Amount - X.XXX,XX | String ending with a comma and two digits | Used for converting legal amounts found on English language checks to their numeric, courtesy amount equivalent. | | Numeric, "$" and cents written out by default, entered numbers are expressed right-to-left. | "Four hundred twenty-seven + 45/100" | "427,45" |
Length - X,XXX.XX | String with a number and an optional unit. If a unit is present, there will be a single space between the number and unit. In the absence of a unit, no space will be added. | This format is to be used when expecting numeric length/height values that may also include a unit of measure (e.g., “5 ft”). Leading zeros and trailing decimal zeros that don't add meaning to such numbers are removed. The normalized value will use a dot to separate decimals if present. | 28.0.0 | Alphanumeric, all special characters allowed | "5’10.45”" | "5 ft 10.45 in" |
Length - X.XXX,XX | String with a number and an optional unit. If a unit is present, there will be a single space between the number and unit. In the absence of a unit, no space will be added. | This format is to be used when expecting numeric length/height values that may also include a unit of measure (e.g., “5 ft”). Leading zeros and trailing decimal zeros that don't add meaning to such numbers are removed. The normalized value will use a comma to separate decimals if present. | 28.0.0 | Alphanumeric, all special characters allowed | "1,7 metres" | "1,7 m" |
MICR Font | String | Used for fields printed in MICR font (often found on checks). Normalization will retain all special MICR symbols, but remove any spaces between elements. | | Numeric characters along with special MICR symbols, accessible via keyboard shortcuts | "⑆123456789⑆ 12345678 123" | "⑆123456789⑆12345678123" |
Name | String | Used for all types of names, including people, places, companies, etc. All special characters are removed. Multiple spaces in a row are reduced to one space. | | Alphanumeric, special characters "- , . () &" | "T.J. Madison" | "TJ MADISON" |
Number - X,XXX.XX | String | Used for values that represent numbers which can have a mathematical operation performed (addition, subtraction, etc). This number format expects commas to group thousands and a dot to separate decimals. Leading zeros and trailing decimal zeros that don't add meaning to such numbers are removed. Abbreviations (e.g., K, MM) are converted into digits, separator commas are removed, e.g., "0321" becomes "321" and "100.00" becomes "100". | | Numeric, special characters "-" to signify a negative amount and "." for decimal. | "-4.5K" | "-4500" |
Number with Unit - X,XXX.XX | String beginning with number, followed by a space and unit if present | This number format is to be used when expecting numeric values that may also include a unit of measure (e.g., 10kg). The normalized value will start with the number and it will use a dot to separate decimals if present. Any unit of measure, if present, will be appended to the number after a space. In the absence of a unit of measure, no space will be included. | | Alphanumeric, all special characters allowed | "10kg" | "10 kg" |
Number - X.XXX,XX | String | Used for values that represent numbers which can have a mathematical operation performed (addition, subtraction, etc). This number format expects dots to group thousands and a comma to separate up to two decimals. Leading zeros and trailing decimal zeros that don't add meaning to such numbers are removed. Abbreviations (e.g., K, MM) are converted into digits, separator dots are removed, e.g., "0321" becomes "321" and "100,00" becomes "100". | | Numeric, special characters "-" to signify a negative amount and "," for decimal. | "-4,5K" | "-4500" |
Number with Unit - X.XXX,XX | String beginning with number, followed by a space and unit if present | This number format is to be used when expecting numeric values that may also include a unit of measure (e.g., 10 kg). The normalized value will start with the number and it will use a comma to separate decimals if present. Any unit of measure, if present, will be appended to the number after a space. In the absence of a unit of measure, no space will be included. | | Alphanumeric, all special characters allowed | "10kg" | "10 kg" |
Numeric Text | String | Numeric data. No abbreviations are taken into consideration and leading and trailing zeros are kept intact. | | Numeric | "0038937313200" | "0038937313200" |
Phone Number | String with digits only | All punctuation (e.g., +, (), -) is removed such that only the numbers are returned. | | Numeric, special character "+" | "+1 (555) 555-5555" | "15555555555" |
SSN/EIN/TIN | 9 digit string "#########" | US Government Tax Identification numbers. When written, these can include dashes in different places but must be 9 digits. If the transcription.raw does not have 9 digits (e.g., it has letters or has more than 9 digits), then a "normalization_error" exception is returned in the exceptions array. | | Numeric, must be 9 digits in length | "987-65-4321" | "987654321" |
UK Postcode | 5 - 7 character string | Used for UK Postcodes. Space between Outward and Inward codes is removed. | | Alphanumeric, must be in acceptable UK postcode format. | "SW1W 0NY" | "SW1W0NY" |
US Zip Code | 5 or 9 digit string | Used for US Zip Codes. Dashes or spaces between the Zip Code and the optional +4 are removed. | | Numeric, must be either 5 digits or 9 digits in length. | "10010-7356" | "100107356" |
US State | 2 character string | Used for US States and Territories. Variations in states are normalized to standard two character abbreviations. | | Letters, must be 2 characters in length, must match a state abbreviation. | "Arizona" or "Ariz." | "AZ" |
Weight - X,XXX.XX | String with a number and an optional unit. If a unit is present, there will be a single space between the number and unit. In the absence of a unit, no space will be added. | This format is to be used when expecting numeric weight values that may also include a unit of measure (e.g., “10.5 kg”). Leading zeros and trailing decimal zeros that don't add meaning to such numbers are removed. The normalized value will use a dot to separate decimals if present. | 28.0.0 | Alphanumeric, all special characters allowed | "9 pounds and 8 ozs" | "9 lbs 8 ozs" |
Weight - X.XXX,XX | String with a number and an optional unit. If a unit is present, there will be a single space between the number and unit. In the absence of a unit, no space will be added. | This format is to be used when expecting numeric weight values that may also include a unit of measure (e.g., “10,5 kg”). Leading zeros and trailing decimal zeros that don't add meaning to such numbers are removed. The normalized value will use a comma to separate decimals if present. | 28.0.0 | Alphanumeric, all special characters allowed | "5 kg & 9,33 grammes" | "5 kg 9,33 g" |
Medical | String | Used for fields where medical terminology / sentences are expected. All letters are converted to uppercase, leading and trailing spaces are removed, consecutive spaces are consolidated to a single space. To note, spaces between slashes and characters are removed. | | Alphanumeric, all special characters allowed | "120 systolic / 80 diastolic" | "120 SYSTOLIC/80 DIASTOLIC" |
Checkbox | Boolean | No normalization, true indicates that the checkbox was checked. | | N/A | N/A | N/A |
Signature | Boolean | No normalization, true indicates that a signature was present. | | N/A | N/A | N/A |
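
To make a few of the normalization rules in the table concrete, the following Python sketch reproduces the documented raw-to-normalized examples for Phone Number, SSN/EIN/TIN, Generic Text, Freeform AlphaNumeric, and Currency - X,XXX.XX. It is an illustration only, not the product's implementation: every function name and the `NormalizationError` class are hypothetical, and edge-case behavior (for example rounding, or abbreviations other than K and MM) is an assumption.

```python
import re
from decimal import Decimal, InvalidOperation


class NormalizationError(ValueError):
    """Hypothetical stand-in for the "normalization_error" exception."""


def normalize_phone(raw: str) -> str:
    # Phone Number: remove all punctuation so only digits are returned.
    return re.sub(r"\D", "", raw)


def normalize_ssn(raw: str) -> str:
    # SSN/EIN/TIN: dashes may appear anywhere, but exactly 9 digits are required.
    candidate = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"\d{9}", candidate):
        raise NormalizationError(raw)
    return candidate


def normalize_generic_text(raw: str) -> str:
    # Generic Text: trim, collapse consecutive spaces, convert to uppercase.
    return re.sub(r" +", " ", raw.strip()).upper()


def normalize_freeform_alphanumeric(raw: str) -> str:
    # Freeform AlphaNumeric: drop special characters and spaces, then uppercase.
    return "".join(ch for ch in raw if ch.isalnum()).upper()


# Assumption: only the two abbreviations named in the table are handled.
_ABBREVIATIONS = {"K": Decimal(1_000), "MM": Decimal(1_000_000)}


def normalize_currency_us(raw: str) -> str:
    # Currency - X,XXX.XX: commas group thousands, a dot separates decimals.
    text = raw.strip().upper().replace("$", "").replace(" ", "")
    # Parentheses around the figure mean a negative amount.
    negative = text.startswith("(") and text.endswith(")")
    text = text.strip("()")
    if text.startswith("-"):
        negative, text = True, text[1:]
    multiplier = Decimal(1)
    for suffix, factor in _ABBREVIATIONS.items():
        if text.endswith(suffix):
            multiplier, text = factor, text[: -len(suffix)]
            break
    try:
        amount = Decimal(text.replace(",", "")) * multiplier
    except InvalidOperation as exc:
        raise NormalizationError(raw) from exc
    if negative:
        amount = -amount
    # Always return two decimal places, so "1,000" becomes "1000.00".
    return f"{amount:.2f}"
```

For example, `normalize_phone("+1 (555) 555-5555")` returns `"15555555555"` and `normalize_currency_us("1K")` returns `"1000.00"`, matching the corresponding rows of the table.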
Default Data Types
- Updated on Oct 4, 2024
- Published on Oct 4, 2024