IFieldExtractor

Purpose

Identifies fields in the text of a document.

Note: Can only be changed in an extraction script.

Methods

Name Description
ExtractRegularExpression( regularExpression : string, resultCollectionName : string )

Specifies a regular expression for identifying text spans.


The resultCollectionName parameter sets a name for the resulting collection of objects. The name of the collection can be used in XML queries run on documents. You can also access the resulting collection by its name.
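
The relationship between a regular expression, the text spans it identifies, and the named result collection can be pictured with a short standalone Python sketch. It does not call the ABBYY API: the function extract_regular_expression, the collections dictionary, and the half-open (start, end) span representation are assumptions made for illustration only, and Python's re dialect may differ from the one the product uses.

```python
import re


def extract_regular_expression(source_text, regular_expression,
                               result_collection_name, collections):
    """Hypothetical model: store every match as a (start, end) span under a name."""
    spans = [match.span() for match in re.finditer(regular_expression, source_text)]
    collections[result_collection_name] = spans
    return spans


source_text = "Invoice INV-1024 replaces INV-0999."
collections = {}  # hypothetical stand-in for the named result collections
extract_regular_expression(source_text, r"INV-\d{4}", "InvoiceNumbers", collections)

# The collection is later looked up by the name it was given.
invoice_texts = [source_text[start:end]
                 for start, end in collections["InvoiceNumbers"]]
```

In this model, the name "InvoiceNumbers" plays the role of resultCollectionName: it is the only handle needed to retrieve the identified spans later.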

ExtractNerObjects()

Tells the field identification mechanism to identify NER entities in the text of a document. Once the objects are identified, the field identification mechanism will have collections available with the following predefined names: NerPerson, NerOrg, NerGeo, NerAddress, NerMoney, and NerDate.

Note: The NerMoney and NerDate objects are used only in extraction scripts and are not available in ABBYY FlexiLayout Studio.

ExtractWordsFromUserDictionary( userDictionaryName : string, languageName : string )

Tells the field identification mechanism to identify words from a user dictionary in the text of a document. Words may occur in the text in any inflected form. A user dictionary can be selected on the Properties tab of the script rule. The dictionary will be accessed by its name.


The languageName parameter specifies the language in which to generate the inflected forms of the words in the user dictionary.

ParseAddress()

Parses the text in a field or section into address components.

ParseAddressInPosition( resultCollectionNamePrefix : string, startPos : int, endPos : int )

Parses the text fragment between the specified start and end positions in a field or section into address components.

ParseAddressInSpan( resultCollectionNamePrefix : string, span : IInterval )

Parses the text fragment within the specified interval in a field or section into address components.

RunQuery( xmlQuery : string, queryName : string ) : IExtractedObjects

Runs an XML query on the text of a document and on the identified text spans. Returns the results as a collection of text spans, each containing a matched string.

The queryName parameter specifies a name for the query, which can then be used to get the resulting collection from the field identification mechanism.

RunQueryAndSaveToField( xmlQuery : string, queryName : string, fieldName : string )

Runs an XML query on the text of a document and identified text spans and saves the results to a document field.

SaveSpanToField( span : IInterval, fieldName : string )

Saves the text fragment covered by the specified span to a document field.

SaveTextToField( startPos : int, endPos : int, fieldName : string )

Saves the text fragment between the specified start and end positions to a document field.

ExtractedObjects( collectionName : string, [optional] objectTypeName : VARIANT ) : IExtractedObjects

Allows accessing a collection of identified objects by the name of the collection.

For collections of NER objects identified as address components, do one of the following:

  • For collectionName, use the name passed as resultCollectionNamePrefix to the ParseAddress... methods, and set objectTypeName to the name of the object type in the collection (e.g. "NerStreet" or "NerCity"), or
  • Specify collectionName as [resultCollectionNamePrefix]_[objectTypeName] and omit the optional objectTypeName argument.
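
The two addressing options above can be modelled in a few lines of standalone Python. This is a sketch, not ABBYY code: the helper address_collection_key and the prefix "Addr" are hypothetical, and the only behavior assumed is the underscore-joined naming pattern described in the list.

```python
def address_collection_key(collection_name, object_type_name=None):
    """Hypothetical helper: build the full collection name for an address component."""
    if object_type_name is None:
        # Second option: the caller already passed "<prefix>_<type>".
        return collection_name
    # First option: prefix and object type name are passed separately.
    return f"{collection_name}_{object_type_name}"


# "Addr" stands for a prefix that was passed as resultCollectionNamePrefix.
by_two_arguments = address_collection_key("Addr", "NerCity")
by_full_name = address_collection_key("Addr_NerCity")
```

Under these assumptions both calls resolve to the same collection, which is why the optional argument can be left out when the full name is given.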

QueryResults( queryName : string ) : IExtractedObjects

Allows accessing the result of an XML query by the name of the query.

Properties

Name Type Permissions Value
SourceText() string Read The text of the document or field to which the field identification mechanism is applied.

SourceNode() IField Read The field or the section to which the field identification mechanism is applied.
SourceDocument() IDocument Read The document that contains SourceNode.

12/1/2020 7:03:59 AM

