
Paragraph Object (IParagraph Interface)

This object exposes methods and properties for working with a single paragraph of the recognized text.

In the ABBYY FineReader Engine object model, a paragraph is an elementary unit of text. Through this object, you can access:

  • the recognized text (Text property)
  • paragraph parameters (ExtendedParams, ListParams, and ParagraphStyle properties)
  • the collections of lines and words in the paragraph (Lines and Words properties)
  • parameters of individual characters (GetCharParams, SetCharParams, and GetDropCapCharParams methods)
  • bookmarks (Bookmark and UserBookmark properties)

Notes:

  • The coordinates of the paragraph borders (Left, Top, Right, Bottom properties) are not available for the paragraphs of barcodes.
  • Bookmarks in ABBYY FineReader Engine are either internal (technical) or custom (user) entities whose names are encoded with keywords (prefixes). The set of these keywords may vary depending on the version of the technologies used.

Properties

Name Type Description
Application Engine, read-only

Returns the Engine object.

Paragraph text, words, lines
Text BSTR, read-only

Provides access to the recognized text of the paragraph as a Unicode string. This is the property to use to retrieve the recognized text. The string may contain the following special characters:

  • 0x2028 — Line break symbol
  • L'\n' — Paragraph break symbol
  • 0xFFFC — Object replacement character (denotes an embedded picture inside the text)
  • 0x0009 — Tabulation
  • 0x005E — Circumflex accent (^), used by ABBYY FineReader Engine as a replacement for unrecognized characters
  • 0x00AC — Soft hyphen
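Once the Text value has been read, the special characters above can be handled with plain string operations. The following is a minimal Python sketch and is not part of the ABBYY FineReader Engine API. It assumes the Text value is already available as a Python str; how it is obtained (for example, through a COM bridge) is out of scope. The names split_paragraph_text and ParagraphTextInfo are hypothetical. The sketch also assumes that every ^ is an unrecognized-character placeholder, which over-counts if the original text genuinely contains a circumflex.

```python
from __future__ import annotations

from dataclasses import dataclass

LINE_BREAK = "\u2028"          # line break inside the paragraph
PARAGRAPH_BREAK = "\n"         # paragraph break at the end of Text
OBJECT_REPLACEMENT = "\ufffc"  # embedded picture placeholder
UNRECOGNIZED = "^"             # assumed: always a placeholder for an unrecognized character
SOFT_HYPHEN = "\u00ac"         # soft hyphen code point as listed for the Text property


@dataclass
class ParagraphTextInfo:
    lines: list[str]
    picture_count: int
    unrecognized_count: int


def split_paragraph_text(text: str) -> ParagraphTextInfo:
    """Split a paragraph's Text string into lines and count special characters."""
    # Drop the trailing paragraph break, if present.
    if text.endswith(PARAGRAPH_BREAK):
        text = text[:-1]
    picture_count = text.count(OBJECT_REPLACEMENT)
    unrecognized_count = text.count(UNRECOGNIZED)
    # Remove picture placeholders and soft hyphens, then split on line breaks.
    cleaned = text.replace(OBJECT_REPLACEMENT, "").replace(SOFT_HYPHEN, "")
    return ParagraphTextInfo(cleaned.split(LINE_BREAK), picture_count, unrecognized_count)


info = split_paragraph_text("Hel^o\u2028wor\u00ac\u2028ld\ufffc\n")
print(info.lines)  # ['Hel^o', 'wor', 'ld']
```

Tab characters (0x0009) are deliberately left in place, because they may stand for replaced input symbols rather than for layout.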

Note: If the paragraph has a right-to-left writing direction (as with Hebrew), the Text string contains the characters of the paragraph in the order in which they are read. For example, Hebrew text is returned in reading order, that is, from right to left.

Note that the recognized text may differ slightly from the original, because some input symbols can be replaced with special characters. For example, a "..." symbol may be replaced with a tab character. As a result, the number of characters in the recognized text can differ from that in the original. To access the input word without replaced symbols, use IWord::Text.

Words Words, read-only

Provides access to the collection of words in the paragraph.

Note: Unlike the Text property, if the paragraph has a right-to-left writing direction (as with Hebrew), each word in the collection is a string that contains the characters of the word from left to right. For example, a Hebrew word is returned with its characters in left-to-right (visual) order, that is, reversed relative to its reading order.
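Text returns a right-to-left paragraph in reading order, while Words returns each word left to right. Because of this, a word string would need to be reversed before it can be matched against Text. The following is a minimal Python sketch under two assumptions: the word contains only right-to-left characters (digits or embedded Latin text would break the simple reversal), and you already know the paragraph's writing direction. The helper names and the right_to_left flag are hypothetical.

```python
def word_in_reading_order(word: str, right_to_left: bool) -> str:
    """Convert a word string taken from the Words collection to reading order."""
    # For right-to-left paragraphs, Words stores characters left to right,
    # i.e. reversed relative to reading order (assumes purely RTL characters).
    return word[::-1] if right_to_left else word


def locate_word(paragraph_text: str, word: str, right_to_left: bool) -> int:
    """Return the index of the word inside the Text string, or -1 if not found."""
    return paragraph_text.find(word_in_reading_order(word, right_to_left))


# Hypothetical values: Text in reading order, the word as Words would return it.
text = "שלום עולם\n"
word_from_words = "עולם"[::-1]
print(locate_word(text, word_from_words, right_to_left=True))  # 5
```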

Lines ParagraphLines, read-only

Provides access to the collection of lines in the paragraph. The property returns a constant object.

Additional paragraph elements
BookmarkCount int, read-only

Returns the number of bookmarks in the paragraph.

Bookmark BSTR, read-only

Provides access to a bookmark of any type (technical or user) by its index in the paragraph's internal bookmark collection. The name of a bookmark accessed via this property includes its prefix.

Hyperlink Hyperlink, read-only

Returns a reference to the Hyperlink object that describes the hyperlink at the specified position. If there is no hyperlink at that position, this property is set to 0.

TabPositions TabPositions, read-only

Provides access to all tab stops in the paragraph.

UserBookmark BSTR, read-only

Provides access to a user bookmark by its index in the paragraph's internal bookmark collection. The name of a bookmark accessed via this property does not include a prefix.

UserBookmarkCount int, read-only

Returns the number of user bookmarks in the paragraph.

Paragraph attributes
Length int, read-only

Contains the number of characters in the paragraph. This value equals the number of characters in the string returned by the Text property.

Note: The paragraph break symbol at the end of the paragraph is included in the Text property and counted in the Length property.

ExtendedParams ParagraphParams

Provides access to the paragraph parameters, which are exposed through a ParagraphParams object.

ListParams ListParams, read-only

Provides access to the parameters of the list to which the paragraph belongs. If the paragraph does not belong to a list, the IListParams::List property returns NULL.

ParagraphStyle ParagraphStyle

Provides access to the parameters of the paragraph style. These parameters become accessible only after document synthesis.

Note: The property returns a constant object.

DropCapCharsCount int

Provides access to the number of characters in the dropped capital of the paragraph. The first DropCapCharsCount characters of the paragraph are treated as the dropped capital. This property is not updated when the paragraph is edited, so its value may exceed the paragraph length.

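DropCapCharsCount can exceed the paragraph length after editing, so code that extracts the dropped capital should clamp the value first. The following is a minimal Python sketch. drop_cap_text is a hypothetical helper, and len() of the Text string stands in for the Length property.

```python
def drop_cap_text(paragraph_text: str, drop_cap_chars_count: int) -> str:
    """Return the characters treated as the dropped capital."""
    # DropCapCharsCount is not updated on edits, so clamp it to [0, length].
    count = max(0, min(drop_cap_chars_count, len(paragraph_text)))
    return paragraph_text[:count]


print(drop_cap_text("Once upon a time\n", 1))  # O
```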
ColumnNumber int, read-only

Stores the number of the column to which the character at the specified position belongs.

Coordinates
Bottom int, read-only

Stores the coordinate of the bottom border of the paragraph as it is positioned on the image.

Note: This property is not available for the paragraphs of barcodes.

Left int, read-only

Stores the coordinate of the left border of the paragraph as it is positioned on the image.

Note: This property is not available for the paragraphs of barcodes.

Right int, read-only

Stores the coordinate of the right border of the paragraph as it is positioned on the image.

Note: This property is not available for the paragraphs of barcodes.

Top int, read-only

Stores the coordinate of the top border of the paragraph as it is positioned on the image.

Note: This property is not available for the paragraphs of barcodes.
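The four border properties describe a rectangle, so derived values such as width and height follow by subtraction. The following is a minimal Python sketch under two assumptions. First, image coordinates grow downward, so Bottom is not less than Top. Second, the caller already knows whether the paragraph holds a barcode; the is_barcode flag is hypothetical, and this page does not say how to detect a barcode paragraph. ParagraphBox and paragraph_box are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParagraphBox:
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        # Assumes image coordinates grow downward, so bottom >= top.
        return self.bottom - self.top


def paragraph_box(left: int, top: int, right: int, bottom: int,
                  is_barcode: bool) -> Optional[ParagraphBox]:
    # Border coordinates are not available for barcode paragraphs.
    if is_barcode:
        return None
    return ParagraphBox(left, top, right, bottom)


box = paragraph_box(10, 20, 110, 70, is_barcode=False)
print(box.width, box.height)  # 100 50
```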

Methods

Name Description
DeleteBookmark Deletes the specified bookmark of any type (technical or user) from the paragraph.
GetBookmarkRange Determines, by bookmark name, the index of the first character and the length of the string that forms the bookmark.
GetCharParams Provides access to parameters of a single character.
GetDropCapCharParams Provides access to the parameters of a paragraph's dropped capital.
GetHyperlinkRange Takes a single character of a hyperlink and determines the index of the first character and the length of the string that forms the hyperlink.
GetWordRecognitionVariants Returns a collection of variants of a word's recognition in the current position inside the text of a paragraph.
Insert Inserts a string into the text of the paragraph.
InsertParagraphBreak Divides the paragraph into two parts.
InsertTab Inserts a tab stop at the specified position in the text.
InsertText Inserts the specified text into the text of the paragraph.
NextGroup Finds the next character in the paragraph whose selected parameters differ from those of the character at which the search begins. This method can be used to find all bold or italic words in the paragraph, all uncertainly recognized characters, and so on.
Range Returns a substring from the text of the paragraph.
Remove Deletes a range from the text of the paragraph.
SetBookmark Sets a user bookmark on a string within the paragraph.
SetCharParams Sets parameters for a group of characters.
SetHyperlink Sets a hyperlink on a string within the paragraph.
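NextGroup can be used to collect, for example, all bold words. The following Python sketch is a hypothetical analog of that idea, not the engine's method or its signature. Per-character parameters are modeled as a plain list, and next_group returns the first index whose value differs from the value at the starting index. Repeated calls then partition the paragraph into runs, each reported as a start index and a length, the same form in which GetBookmarkRange and GetHyperlinkRange describe a range.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def next_group(params: Sequence[T], start: int) -> int:
    """Return the index of the first character after `start` whose parameter
    value differs from params[start], or len(params) if there is none."""
    i = start + 1
    while i < len(params) and params[i] == params[start]:
        i += 1
    return i


def groups(params: Sequence[T]) -> Iterator[Tuple[int, int, T]]:
    """Yield (start, length, value) for each run of equal parameter values."""
    start = 0
    while start < len(params):
        end = next_group(params, start)
        yield start, end - start, params[start]
        start = end


# Hypothetical per-character "bold" flags for the text "ab CD ef".
text = "ab CD ef"
bold = [False, False, False, True, True, False, False, False]
bold_runs = [text[s:s + n] for s, n, is_bold in groups(bold) if is_bold]
print(bold_runs)  # ['CD']
```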

Related objects


Output parameter

This object is the output parameter of the following methods:

Input parameter

This object is the input parameter of the IndexOf method of the Paragraphs object.

Samples


The object is used in the following code samples: CustomLanguage and RecognizedTextProcessing; and in the following demo tools: Camera OCR, Engine Predefined Processing Profiles, and Image Preprocessing.

See also

Paragraphs

Working with Text

Working with Properties

17.09.2024 15:14:41
