In a simple language, a regular expression defines which words are allowed in the language and which are not.
The ABBYY FlexiCapture SDK regular expression alphabet is described in the following table:
Item name | Conventional regular expression sign | Usage examples and explanations
Any character | * | "c"*"t": allows such words as cat, cot, etc.
Character from a character range (also can be used as ) | [] | [b-d]"ell": denotes words like “bell”, “cell”, “dell”; [ty]"ell": denotes the words “tell” and “yell”
Character out of a character range | [^] | [^y]"ell": denotes words like “dell”, “cell”, “tell”, but forbids “yell”; [^n-s]"ell": denotes words like “bell”, “cell”, but forbids “nell”, “oell”, “pell”, “qell”, “rell”, and “sell”
Letter or digit | X | X: allows any single letter or digit
Letter | C | C"ot": allows such words as Rot, pot, cot, Dot, mot, etc.
Capital letter | A | A"ot": allows such words as Rot, Cot, Mot, Dot, etc.
Small letter | a | a"ot": allows such words as rot, cot, mot, dot, etc.
Digit | N | N"th": allows such words as 5th, 4th, 6th, etc.
Any number of repetitions (applies to the expression or subexpression on the left) | {-} | [AB74]{-}: allows any combination of the characters A, B, 7, 4 of any length.
Exactly n repetitions | {n} | N{2}"th": allows such words as 25th, 84th, 11th, etc.
From n to m repetitions | {n-m} | N{1-3}"th": allows such words as 5th, 84th, 111th, etc.
From 0 to n repetitions | {-n} | N{-2}"th": allows such words as 84th, 5th, etc.
n or more repetitions | {n-} | N{2-}"th": allows such words as 25th, 834th, 311th, 34576th, etc.
Subexpression | () |
OR | | | "c"("a"|"u")"t": denotes the words “cat” and “cut”
Space | [\s] |
Hyphen symbol | [\-] |
Backslash symbol | [\\] |
Word from dictionary | @(Dictionary) | The Dictionary parameter sets the path to the user dictionary from which words must be taken. Backslashes in the path must be doubled. For example: @(D:\\MyFolder\\MyDictionary.amd).
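To experiment with this notation outside the SDK, the table can be approximated by a small translator into Python `re` syntax. This is only an illustrative sketch, not part of the ABBYY FlexiCapture SDK: the names `to_python_regex` and `CLASS_MAP` are hypothetical, the letter classes are assumed to cover ASCII characters only, bracket sets are assumed to be compatible with Python as written, a repetition after a quoted literal is assumed to apply to the whole literal, and `@(Dictionary)` is not handled.

```python
import re

# Assumed ASCII-only equivalents of the single-letter classes from the table.
CLASS_MAP = {
    "X": "[A-Za-z0-9]",  # letter or digit
    "C": "[A-Za-z]",     # letter
    "A": "[A-Z]",        # capital letter
    "a": "[a-z]",        # small letter
    "N": "[0-9]",        # digit
}


def to_python_regex(expr: str) -> str:
    """Translate a subset of the table's notation into Python `re` syntax."""
    out = []
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == '"':
            # Quoted literal: escape it and wrap it in a non-capturing group
            # so a following repetition covers the whole literal (assumption).
            end = expr.index('"', i + 1)
            out.append("(?:" + re.escape(expr[i + 1:end]) + ")")
            i = end + 1
        elif ch == "[":
            # Character set: copied through unchanged; escaped characters
            # such as \] are skipped while looking for the closing bracket.
            end = i + 1
            while expr[end] != "]":
                end += 2 if expr[end] == "\\" else 1
            out.append(expr[i:end + 1])
            i = end + 1
        elif ch == "{":
            # Repetition: {n}, {n-m}, {-n}, {n-}, {-} -> {n}, {n,m}, {0,n}, {n,}, {0,}
            end = expr.index("}", i)
            low, dash, high = expr[i + 1:end].partition("-")
            if dash:
                out.append("{" + (low or "0") + "," + high + "}")
            else:
                out.append("{" + low + "}")
            i = end + 1
        elif ch == "*":
            out.append(".")  # any single character
            i += 1
        elif ch in CLASS_MAP:
            out.append(CLASS_MAP[ch])
            i += 1
        elif ch in "()|":
            out.append(ch)  # grouping and OR are written the same way
            i += 1
        elif ch.isspace():
            i += 1  # ignore whitespace outside quoted literals
        else:
            raise ValueError(f"unsupported construct at position {i}: {ch!r}")
    return "".join(out)


pattern = to_python_regex('N{1-3}"th"')
print(pattern)                               # [0-9]{1,3}(?:th)
print(bool(re.fullmatch(pattern, "111th")))  # True
```

`re.fullmatch` is used because an expression in this notation describes a whole word rather than a substring.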
Examples of regular expressions:
1. Postal code: [0-9]{6}. Sample value: "142172"
2. ZIP code (USA): [0-9]{5}("-"[0-9]{4}){-1}. Sample values: "55416", "33701-4313"
3. Income: N{4-8}[,]N{2}. Sample values: "15000,00", "4499,00"
4. Month in the numerical form: ((|"0")[1-9])|("10")|("11")|("12"). Sample values: "4", "05", "12"
5. Fraction: ("-"|)([0-9]{1-})(|(("."|",")([0-9]{1-}))). Sample values: "1234,567", "0.99", "100,0", "-345.6788903"
6. E-mail: [A-Za-z0-9_]{1-}(("."|"-")[A-Za-z0-9_]{1-}){-3}"@"[A-Za-z0-9_]{1-}(("."|"-")[A-Za-z0-9_]{1-}){-4}"."([A-Za-z]{2-4}|"asia"|"museum"|"travel"|"example"|"localhost"). Sample values: "support@abbyy.com", "my-name@company.org.ru", "info@gallery.museum"
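The sample values can be sanity-checked against hand-translated Python `re` equivalents of the six expressions. This is an approximation for experimentation only: the Python patterns and the names `EXAMPLES` and `check_examples` are illustrative and not part of the SDK, and N is assumed to mean an ASCII digit.

```python
import re

# Hand-translated Python equivalents of examples 1-6, each paired with the
# sample values listed above. {n-m} becomes {n,m}, {-n} becomes {0,n},
# {n-} becomes {n,}, and quoted literals are written out directly.
EXAMPLES = {
    "postal code": (r"[0-9]{6}", ["142172"]),
    "ZIP code": (r"[0-9]{5}(-[0-9]{4}){0,1}", ["55416", "33701-4313"]),
    "income": (r"[0-9]{4,8}[,][0-9]{2}", ["15000,00", "4499,00"]),
    "month": (r"((|0)[1-9])|(10)|(11)|(12)", ["4", "05", "12"]),
    "fraction": (
        r"(-|)([0-9]{1,})(|((\.|,)([0-9]{1,})))",
        ["1234,567", "0.99", "100,0", "-345.6788903"],
    ),
    "e-mail": (
        r"[A-Za-z0-9_]{1,}((\.|-)[A-Za-z0-9_]{1,}){0,3}"
        r"@[A-Za-z0-9_]{1,}((\.|-)[A-Za-z0-9_]{1,}){0,4}"
        r"\.([A-Za-z]{2,4}|asia|museum|travel|example|localhost)",
        ["support@abbyy.com", "my-name@company.org.ru", "info@gallery.museum"],
    ),
}


def check_examples() -> dict:
    """Map each example name to True when all of its samples match in full."""
    return {
        name: all(re.fullmatch(pattern, sample) for sample in samples)
        for name, (pattern, samples) in EXAMPLES.items()
    }


for name, ok in check_examples().items():
    print(f"{name}: {'all samples match' if ok else 'mismatch'}")
```

Each pattern is applied with `re.fullmatch`, so a value such as "1234" is rejected by the postal code pattern instead of being accepted as a partial match.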
See also
ISimpleLanguage::RegularExpression