Working with Regular Expressions

Regular expressions describe simple languages: an expression defines which words are allowed in a language and which are not.

The ABBYY FlexiCapture SDK regular expression alphabet is described in the following table:

Item name                           Sign            Usage examples and explanations
Any character                       *               "c"*"t" allows such words as cat, cot, etc.
Character from a character range    []              [b-d]"ell" denotes words like “bell”, “cell”, “dell”
or set                                              [ty]"ell" denotes the words “tell” and “yell”
Character out of a character range  [^]             [^y]"ell" denotes words like “dell”, “cell”, “tell”, but forbids “yell”
                                                    [^n-s]"ell" denotes words like “bell”, “cell”, but forbids “nell”, “oell”, “pell”, “qell”, “rell”, and “sell”
Letter or digit                     X               X allows any standalone letter or digit
Letter                              C               C"ot" allows such words as Rot, pot, cot, Dot, mot, etc.
Capital letter                      A               A"ot" allows such words as Rot, Cot, Mot, Dot, etc.
Small letter                        a               a"ot" allows such words as rot, cot, mot, dot, etc.
Digit                               N               N"th" allows such words as 5th, 4th, 6th, etc.
Any number of repetitions (applies  {-}             [AB74]{-} allows combinations of the characters A, B, 7, 4 of any length
to the expression or subexpression
on the left)
Number of repetitions n             {n}             N{2}"th" allows such words as 25th, 84th, 11th, etc.
From n to m repetitions             {n-m}           N{1-3}"th" allows such words as 5th, 84th, 111th, etc.
From 0 to n repetitions             {-n}            N{-2}"th" allows such words as 84th, 5th, etc.
From n repetitions and more         {n-}            N{2-}"th" allows such words as 25th, 834th, 311th, 34576th, etc.
Subexpression                       ()
OR                                  |               "c"("a"|"u")"t" denotes the words “cat” and “cut”
Space                               [\s]
Hyphen symbol                       [\-]
Slash symbol                        [\\]
Word from dictionary                @(Dictionary)   The Dictionary parameter sets the path to the user dictionary from which words must be taken.
                                                    Backslashes in the path must be doubled. For example: @(D:\\MyFolder\\MyDictionary.amd).
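
As a rough illustration of how the signs in the table relate to a more familiar syntax, the following Python sketch translates a subset of this notation into patterns for Python's re module. It is not part of the ABBYY FlexiCapture SDK. The names CLASS_MAP, convert_repetition, to_python_regex, and matches are hypothetical, the letter and digit classes are assumed to be ASCII only, a multi-character quoted text is assumed to repeat as a unit, and @(Dictionary) is not handled.

```python
import re

# Assumed ASCII equivalents of the single-character classes; the SDK's own
# definition of "letter" and "digit" may be broader.
CLASS_MAP = {
    "*": ".",            # any character
    "X": "[A-Za-z0-9]",  # letter or digit
    "C": "[A-Za-z]",     # letter
    "A": "[A-Z]",        # capital letter
    "a": "[a-z]",        # small letter
    "N": "[0-9]",        # digit
}


def convert_repetition(body: str) -> str:
    """Turn the text between { and } into a Python quantifier."""
    if body.isdigit():                               # {n}
        return "{" + body + "}"
    low, dash, high = body.partition("-")
    valid = dash and (low.isdigit() or not low) and (high.isdigit() or not high)
    if not valid:
        raise ValueError(f"bad repetition {{{body}}}")
    if not low and not high:                         # {-}
        return "*"
    return "{" + (low or "0") + "," + high + "}"     # {n-m}, {-n}, {n-}


def to_python_regex(expr: str) -> str:
    """Translate a table-style expression into Python re syntax."""
    out = []
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == '"':                                # quoted literal text
            end = expr.find('"', i + 1)
            if end == -1:
                raise ValueError("unterminated quoted literal")
            text = re.escape(expr[i + 1:end])
            # Group multi-character literals so a following {...} covers all of it.
            out.append(f"(?:{text})" if end - i > 2 else text)
            i = end + 1
        elif ch == "[":                              # character class, copied unchanged
            j = i + 1
            while j < len(expr) and expr[j] != "]":
                j += 2 if expr[j] == "\\" else 1     # skip escaped characters
            if j >= len(expr):
                raise ValueError("unterminated character class")
            out.append(expr[i:j + 1])
            i = j + 1
        elif ch == "{":                              # repetition count
            end = expr.find("}", i)
            if end == -1:
                raise ValueError("unterminated repetition")
            out.append(convert_repetition(expr[i + 1:end]))
            i = end + 1
        elif ch in "()|":                            # grouping and alternation
            out.append(ch)
            i += 1
        elif ch in CLASS_MAP:                        # single-character class
            out.append(CLASS_MAP[ch])
            i += 1
        else:
            raise ValueError(f"unsupported character {ch!r} at position {i}")
    return "".join(out)


def matches(expr: str, value: str) -> bool:
    """Check whether the whole value is allowed by the expression."""
    return re.fullmatch(to_python_regex(expr), value) is not None


print(to_python_regex('N{1-3}"th"'))     # [0-9]{1,3}(?:th)
print(matches('[b-d]"ell"', "cell"))     # True
print(matches('[^y]"ell"', "yell"))      # False
```

Unquoted characters outside the table's alphabet raise an error instead of being treated as literal text. This keeps the sketch strict about the distinction between quoted text and class signs such as a or N.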

Examples of regular expressions:

1. Postal code: [0-9]{6}. Sample value: "142172"

2. Zip code (USA): [0-9]{5}("-"[0-9]{4}){-1}. Sample values: "55416", "33701-4313"

3. Income: N{4-8}[,]N{2}. Sample values: "15000,00", "4499,00"

4. Month in the numerical form: ((|"0")[1-9])|("10")|("11")|("12"). Sample values: "4", "05", "12"

5. Fraction: ("-"|)([0-9]{1-})(|(("."|",")([0-9]{1-}))). Sample values: "1234,567", "0.99", "100,0", "-345.6788903"

6. E-mail: [A-Za-z0-9_]{1-}(("."|"-")[A-Za-z0-9_]{1-}){-3}"@"[A-Za-z0-9_]{1-}(("."|"-")[A-Za-z0-9_]{1-}){-4}"."([A-Za-z]{2-4}|"asia"|"museum"|"travel"|"example"|"localhost"). Sample values: "", "", ""
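
The sample values listed for examples 1 to 5 can be checked with hand-translated equivalents in Python's re syntax. This is only a sketch for experimenting outside the SDK: the Python patterns, the EXAMPLES dictionary, and the check_examples function are assumptions made for illustration, N and [0-9] are taken to mean ASCII digits, and a value is accepted only if the whole string matches.

```python
import re

# name -> (hand-translated Python pattern, sample values from the list above)
EXAMPLES = {
    "postal code": (r"[0-9]{6}", ["142172"]),
    "zip code (USA)": (r"[0-9]{5}(-[0-9]{4})?", ["55416", "33701-4313"]),
    "income": (r"[0-9]{4,8},[0-9]{2}", ["15000,00", "4499,00"]),
    "month": (r"0?[1-9]|10|11|12", ["4", "05", "12"]),
    "fraction": (r"-?[0-9]+([.,][0-9]+)?",
                 ["1234,567", "0.99", "100,0", "-345.6788903"]),
}


def check_examples():
    """Return {name: [(sample, matched), ...]} using whole-string matching."""
    results = {}
    for name, (pattern, samples) in EXAMPLES.items():
        compiled = re.compile(pattern)
        results[name] = [(s, compiled.fullmatch(s) is not None) for s in samples]
    return results


for name, checks in check_examples().items():
    for sample, ok in checks:
        print(f"{name}: {sample!r} -> {'match' if ok else 'no match'}")
```

Whole-string matching matters here: with a search-style match, a value such as "13" would be accepted by the month pattern because its first digit alone satisfies one of the alternatives.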

3/24/2023 8:48:38 AM