Working with Regular Expressions

Regular expressions are used to describe simple languages, that is, to define which words belong to a language and which do not.

The ABBYY FlexiCapture SDK regular expression alphabet is described in the following table:

Item name                  Sign           Usage examples and explanations
-------------------------  -------------  -------------------------------
Any character              *              "c"*"t" allows such words as "cat", "cot", etc.
Character from a range     []             [b-d]"ell" denotes words like "bell", "cell", "dell"
                                          [ty]"ell" denotes the words "tell" and "yell" (a list of characters instead of a range)
Character not in a range   [^]            [^y]"ell" denotes words like "dell", "cell", "tell", but forbids "yell"
                                          [^n-s]"ell" denotes words like "bell", "cell", but forbids "nell", "oell", "pell", "qell", "rell", and "sell"
Letter or digit            X              X allows any single letter or digit
Letter                     C              C"ot" allows such words as "Rot", "pot", "cot", "Dot", "mot", etc.
Capital letter             A              A"ot" allows such words as "Rot", "Cot", "Mot", "Dot", etc.
Small letter               a              a"ot" allows such words as "rot", "cot", "mot", "dot", etc.
Digit                      N              N"th" allows such words as "5th", "4th", "6th", etc.
Any number of repetitions  {-}            applies to the item or subexpression on the left;
                                          [AB74]{-} allows any combination of the characters A, B, 7, and 4, of any length
Exactly n repetitions      {n}            N{2}"th" allows such words as "25th", "84th", "11th", etc.
From n to m repetitions    {n-m}          N{1-3}"th" allows such words as "5th", "84th", "111th", etc.
From 0 to n repetitions    {-n}           N{-2}"th" allows such words as "84th", "5th", etc.
n or more repetitions      {n-}           N{2-}"th" allows such words as "25th", "834th", "311th", "34576th", etc.
Subexpression              ()             groups items so that | or a repetition applies to the whole group
OR                         |              "c"("a"|"u")"t" denotes the words "cat" and "cut"
Space                      [\s]
Hyphen                     [\-]
Backslash                  [\\]
Word from dictionary       @(Dictionary)  takes words from the user dictionary whose path is set by the Dictionary parameter;
                                          backslashes in the path must be doubled, for example: @(D:\\MyFolder\\MyDictionary.amd)
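The alphabet maps closely onto conventional regular expressions. The Python sketch below translates a subset of it into Python re syntax so that an expression can be tried out outside FlexiCapture. It is a sketch under stated assumptions, not part of the SDK: the names abbyy_to_python, repetition, and ITEMS are hypothetical, letter classes are assumed to be ASCII-only, * is assumed to match exactly one character, and @(Dictionary) is rejected rather than translated.

```python
import re

# Single-character items mapped to ASCII-only Python classes.
# Assumption: the SDK's letter classes may also accept non-ASCII letters.
ITEMS = {
    "X": "[A-Za-z0-9]",  # letter or digit
    "C": "[A-Za-z]",     # letter
    "A": "[A-Z]",        # capital letter
    "a": "[a-z]",        # small letter
    "N": "[0-9]",        # digit
    "*": ".",            # any character (assumed: exactly one)
}


def repetition(body: str) -> str:
    """Translate the text inside {...} into a Python quantifier."""
    if body == "-":
        return "*"                               # {-}: any number
    m = re.fullmatch(r"(\d*)-(\d*)|(\d+)", body)
    if m is None:
        raise ValueError(f"bad repetition: {{{body}}}")
    if m.group(3) is not None:
        return "{" + m.group(3) + "}"            # {n}
    low = m.group(1) or "0"                      # {-n} starts at 0
    return "{" + low + "," + m.group(2) + "}"    # {n-m}, {-n}, {n-}


def abbyy_to_python(expr: str) -> str:
    """Translate a subset of the alphabet into a Python regex string."""
    out, i = [], 0
    while i < len(expr):
        ch = expr[i]
        if ch == '"':                            # quoted literal text
            end = expr.index('"', i + 1)
            out.append("(?:" + re.escape(expr[i + 1:end]) + ")")
            i = end + 1
        elif ch == "[":                          # range: copied verbatim
            j = i + 1
            while expr[j] != "]":
                j += 2 if expr[j] == "\\" else 1  # skip an escaped char
            out.append(expr[i:j + 1])
            i = j + 1
        elif ch == "{":                          # repetition count
            end = expr.index("}", i)
            out.append(repetition(expr[i + 1:end]))
            i = end + 1
        elif ch == "(":                          # subexpression
            out.append("(?:")
            i += 1
        elif ch in ")|":
            out.append(ch)
            i += 1
        elif ch in ITEMS:
            out.append(ITEMS[ch])
            i += 1
        else:                                    # e.g. @(Dictionary)
            raise ValueError(f"unsupported item {ch!r} at {i}")
    return "".join(out)


if __name__ == "__main__":
    pattern = abbyy_to_python('N{1-3}"th"')
    for word in ["5th", "111th", "1111th"]:
        print(word, re.fullmatch(pattern, word) is not None)
    # 5th True / 111th True / 1111th False
```

Each quoted literal is wrapped in a non-capturing group so that a following repetition applies to the whole quoted text; the sketch assumes that this matches how the SDK treats a repeated literal.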

Examples of regular expressions:

1. Postal code: [0-9]{6}. Sample value: "142172"

2. Zip code (USA): [0-9]{5}("-"[0-9]{4}){-1}. Sample values: "55416", "33701-4313"

3. Income: N{4-8}[,]N{2}. Sample values: "15000,00", "4499,00"

4. Month in numerical form: ((|"0")[1-9])|("10")|("11")|("12"). Sample values: "4", "05", "12"

5. Fraction: ("-"|)([0-9]{1-})(|(("."|",")([0-9]{1-}))). Sample values: "1234,567", "0.99", "100,0", "-345.6788903"

6. E-mail: [A-Za-z0-9_]{1-}(("."|"-")[A-Za-z0-9_]{1-}){-3}"@"[A-Za-z0-9_]{1-}(("."|"-")[A-Za-z0-9_]{1-}){-4}"."([A-Za-z]{2-4}|"asia"|"museum"|"travel"|"example"|"localhost")
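One way to check what these example expressions accept is to hand-translate them into Python re syntax and run the sample values through re.fullmatch. The sketch is illustrative only: the translations are not produced by the SDK, letter ranges are assumed to be ASCII-only, the helper name accepts is hypothetical, and the two e-mail addresses are made-up inputs because no usable e-mail samples are given above. The mapping is mechanical: quoted text becomes a literal, {n-m} becomes {n,m}, {-1} becomes {0,1}, {1-} becomes {1,}, and (...) becomes a non-capturing group (?:...).

```python
import re

# Hand-translated Python equivalents of the six examples (illustrative,
# not generated by the SDK). Assumption: letter ranges are ASCII-only.
EXAMPLES = [
    ("Postal code", r"[0-9]{6}", ["142172"]),
    ("Zip code (USA)", r"[0-9]{5}(?:-[0-9]{4}){0,1}", ["55416", "33701-4313"]),
    ("Income", r"[0-9]{4,8},[0-9]{2}", ["15000,00", "4499,00"]),
    ("Month", r"(?:|0)[1-9]|10|11|12", ["4", "05", "12"]),
    ("Fraction", r"(?:-|)[0-9]{1,}(?:|[.,][0-9]{1,})",
     ["1234,567", "0.99", "100,0", "-345.6788903"]),
    # No usable e-mail samples are given, so these two inputs are hypothetical.
    ("E-mail",
     r"[A-Za-z0-9_]{1,}(?:[.-][A-Za-z0-9_]{1,}){0,3}@"
     r"[A-Za-z0-9_]{1,}(?:[.-][A-Za-z0-9_]{1,}){0,4}\."
     r"(?:[A-Za-z]{2,4}|asia|museum|travel|example|localhost)",
     ["john.smith@mail.example.com", "info@example.museum"]),
]


def accepts(pattern: str, value: str) -> bool:
    """Return True only if the whole value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    for name, pattern, samples in EXAMPLES:
        for value in samples:
            print(f"{name}: {value!r} -> {accepts(pattern, value)}")
```

Every listed sample prints True. Trying a few invalid inputs, such as "13" for the month or "55416-" for the zip code, is a quick way to see where each expression draws its boundary.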

