Glossary

A

active area The currently selected area on an image. An active area can be deleted, moved or modified. To make an area active, click it. The frame enclosing an active area is bold and has sizing handles that can be dragged to change its size.

area A section of an image enclosed by a frame and containing a certain type of data. Before performing OCR, Verification Station detects text, picture, table, and barcode areas in order to determine which sections of the image should be recognized and in what order.

area template A template that contains information about the sizes and locations of the areas in similar-looking documents.

B

background picture area An image area that contains a picture with text printed over it.

barcode area An image area that contains a barcode.

base form The form of a word to which endings, prefixes or suffixes are added.

C

code page A table of correspondences between characters and their codes. Users can select the characters they need from those available in a code page.
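As a small illustration of the "table of correspondences" idea, the sketch below uses Python's built-in codecs (not anything from Verification Station) to show that one byte value stands for different characters in different code pages, and that a character missing from a code page cannot be encoded with it. The choice of code pages 1252 and 1251 is only an example.

```python
# The same byte value maps to different characters in different code pages.
raw = bytes([0xE9])

western = raw.decode("cp1252")   # Windows Western European code page
cyrillic = raw.decode("cp1251")  # Windows Cyrillic code page
print(western, cyrillic)


def is_available(char, code_page):
    """Return True if `char` has a code in the given code page."""
    try:
        char.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


print(is_available("й", "cp1251"))  # the Cyrillic letter exists in cp1251
print(is_available("й", "cp1252"))  # but not in cp1252
```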

compound word A word made up of two or more existing words. For the program, a compound word is a word that it cannot find in its dictionary but can build from two or more dictionary words.
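The idea of building a word from dictionary words can be sketched as follows. This is only an illustration under simple assumptions, not the program's actual algorithm: the function name `split_compound`, the `min_part` limit, and the tiny sample dictionary are all hypothetical.

```python
def split_compound(word, dictionary, min_part=2):
    """Return dictionary words that concatenate to `word`, or None.

    A word that is itself in the dictionary is not treated as a compound.
    """
    if word in dictionary:
        return None

    def helper(rest):
        # An empty remainder means the whole word has been covered.
        if not rest:
            return []
        for i in range(min_part, len(rest) + 1):
            head = rest[:i]
            if head in dictionary:
                tail = helper(rest[i:])
                if tail is not None:
                    return [head] + tail
        return None

    return helper(word)


sample_dictionary = {"book", "shelf", "key", "board"}
print(split_compound("bookshelf", sample_dictionary))
print(split_compound("keyboard", sample_dictionary))
```

A real dictionary would also need to handle linking elements and inflected parts, which this sketch ignores.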

D

document analysis The process of identifying the logical structure of a document and areas that contain various types of data. Document analysis can be carried out automatically or manually.

document type A parameter that tells the program how the original text was printed (e.g. on a laser printer or on a typewriter). For laser-printed texts, select Auto; for typewritten texts, select Typewriter; for faxes, select Fax.

F

font effects The appearance of a font (e.g. bold, italic, underlined, strikethrough, subscript, superscript, small caps).

H

headers and footers Images or text in the top or bottom margin of a page. Headers are located at the top of the page and footers are located at the bottom.

I

ignored characters Any non-letter characters found in words (e.g. syllable characters or stress marks). These characters are ignored during the spell check.

inverted image An image with white characters printed against a dark background.
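To make the term concrete, the sketch below inverts a tiny grayscale image stored as a list of rows, turning light characters on a dark background into dark characters on a light background. The 0 to 255 pixel range and the function name `invert` are assumptions made for the example; the program's own image processing is not described here.

```python
def invert(image, max_value=255):
    """Invert a grayscale image given as a list of rows of pixel values."""
    return [[max_value - pixel for pixel in row] for row in image]


# 0 is black, 255 is white: a white stroke on a black background.
white_on_black = [
    [0, 255, 0],
    [0, 255, 0],
]
black_on_white = invert(white_on_black)
print(black_on_white)
```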

K

keyboard shortcuts Keys or combinations of keys that trigger a specific action when pressed. Using keyboard shortcuts can significantly increase your productivity.

L

ligature A combination of two or more characters that are "stuck" together (e.g. fi, fl, ffi). Such characters are difficult for Verification Station to separate. Treating them as one character improves OCR accuracy.
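Some ligatures also exist as single characters in Unicode. As a side illustration (this concerns text encoding, not how the program separates glyphs on an image), Python's standard `unicodedata` module can expand such characters into their component letters using compatibility normalization:

```python
import unicodedata


def expand_ligatures(text):
    """Replace Unicode ligature characters with their component letters."""
    return unicodedata.normalize("NFKC", text)


# U+FB01 is the single-character "fi" ligature, U+FB03 is "ffi".
print(expand_ligatures("\ufb01nal o\ufb03ce"))
```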

low-confidence characters Characters that may have been recognized by the program incorrectly.

low-confidence words Words that contain one or more low-confidence characters.
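The relationship between the two terms above can be sketched as follows. The per-character confidence scores, the 0.8 threshold, and the function name are assumptions for the example only; the glossary does not say how the program measures confidence.

```python
def low_confidence_words(words, threshold=0.8):
    """Return the words that contain at least one low-confidence character.

    Each word is a list of (character, confidence) pairs, where confidence
    is a hypothetical score between 0 and 1.
    """
    flagged = []
    for word in words:
        if any(confidence < threshold for _, confidence in word):
            flagged.append("".join(char for char, _ in word))
    return flagged


recognized = [
    [("c", 0.99), ("a", 0.97), ("t", 0.98)],
    [("c", 0.95), ("l", 0.42), ("o", 0.91), ("g", 0.93)],
]
print(low_confidence_words(recognized))
```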

M

monospaced font A font (such as Courier New) in which all characters are equally spaced. For better OCR results on monospaced fonts, on the OCR tab of the Options dialog box, select Typewriter in the Document type group of options.

O

OCR (Optical Character Recognition) A technology that enables computers to read text and to detect pictures, tables, and other formatting elements.

omnifont system A recognition system that recognizes characters set in any font without prior training.

P

page layout The arrangement of text, tables, pictures, paragraphs, and columns on a page. The fonts, font sizes, font colors, text background, and text orientation are also part of the page layout.

paradigm All grammatical forms of a word.

pattern A set of associations between averaged character images and their respective names. Patterns are created when you train Verification Station on a specific text.

picture area An image area that contains a picture. This type of area may enclose an actual picture or any other object (e.g. a text fragment) that should be displayed as a picture.

prohibited characters Characters that you think will never occur in a text to be recognized. Specifying prohibited characters increases the speed and quality of OCR.

R

recognition area An image area that Verification Station should analyze automatically.

resolution A scanning parameter measured in dots per inch (dpi). A resolution of 300 dpi should be used for texts set in 10 pt fonts and larger; 400 to 600 dpi is preferable for texts set in smaller fonts (9 pt and less).
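The rule of thumb above can be written as a small helper. This is a sketch, not part of the program: the function names are hypothetical, and treating every size below 10 pt as "small" is an assumption, since the entry does not cover sizes between 9 and 10 pt. The second function uses the standard definition of a point as 1/72 inch to show how many pixels a font size spans at a given resolution.

```python
POINTS_PER_INCH = 72  # 1 pt = 1/72 inch


def recommended_dpi(font_size_pt):
    """Return the (minimum, maximum) suggested scanning resolution in dpi."""
    if font_size_pt >= 10:
        return (300, 300)
    # Assumption: any size below 10 pt is treated as a small font.
    return (400, 600)


def font_size_px(font_size_pt, dpi):
    """Return the nominal font size in pixels at the given resolution."""
    return font_size_pt / POINTS_PER_INCH * dpi


print(recommended_dpi(12), recommended_dpi(8))
print(font_size_px(9, 400))
```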

S

scanning mode A scanning parameter that determines whether an image must be scanned in black and white, grayscale, or color.

separators Symbols that can separate words (e.g. /, \, dash) and that are separated by spaces from the words themselves.

shortcut menu The menu that appears when you right-click something, such as an area or another part of a document.

T

table area An image area that contains data in tabular form. When the program reads this type of area, it draws vertical and horizontal separators inside the area to form a table. This area is then rendered as a table in the output text.

text area An image area that contains text. Text areas should only contain single-column text.

training The process of establishing a correspondence between a character image and the character itself. See also: If your printed document contains non-standard fonts.

U

Unicode An international text encoding standard developed by the Unicode Consortium (Unicode, Inc.). The Unicode standard provides an easily extendible system for encoding symbols from almost all contemporary languages. Originally designed as a 16-bit system, it now defines code points from U+0000 to U+10FFFF, which are stored using encodings such as UTF-8 and UTF-16. It specifies how symbols should be encoded and determines which algorithms and character properties should be used during the encoding process.
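As an illustration of how symbols map to codes, the sketch below uses only Python's built-in string functions to show a character's code point and how much space it takes in two common encodings. The helper name `describe` is hypothetical. Note that some symbols, such as emoji, have code points above U+FFFF and take two 16-bit units in UTF-16.

```python
def describe(char):
    """Return (code point, UTF-8 length in bytes, UTF-16 length in 16-bit units)."""
    code_point = f"U+{ord(char):04X}"
    utf8_bytes = len(char.encode("utf-8"))
    utf16_units = len(char.encode("utf-16-le")) // 2
    return (code_point, utf8_bytes, utf16_units)


for symbol in ["A", "й", "€", "😀"]:
    print(symbol, describe(symbol))
```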

26.03.2024 13:49:49
