#include <vtkTextExtraction.h>
Given a table containing document ids, URIs, MIME types, and document contents, vtkTextExtraction extracts plain text from each document and generates a list of 'tags' that delineate ranges of text. The actual work of extracting text and generating tags is performed by an ordered list of vtkTextExtractionStrategy objects.
By default, vtkTextExtraction has just a single strategy for extracting plain text documents. Callers will almost certainly want to supplement or replace the default with their own strategies.
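The ordered strategy list described above can be pictured with a small stand-alone model. This is only a sketch and does not use VTK: `StrategyList`, `Strategy`, `PlainTextStrategy`, and `TrimmedTextStrategy` are hypothetical names, and the rule that strategies are tried in list order with the first success winning is an assumption, not something this page states.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a vtkTextExtractionStrategy: returns true and
// fills `text` when it can handle the MIME type, returns false otherwise.
using Strategy = std::function<bool(const std::string& mimeType,
                                    const std::string& content,
                                    std::string& text)>;

// Hypothetical model of the strategy list held by the filter.
class StrategyList {
public:
  void Clear() { strategies_.clear(); }
  void Prepend(Strategy s) { strategies_.push_front(std::move(s)); }
  void Append(Strategy s) { strategies_.push_back(std::move(s)); }

  // Assumption: strategies are tried in list order and the first one that
  // reports success handles the document.
  bool Extract(const std::string& mimeType, const std::string& content,
               std::string& text) const {
    for (const Strategy& s : strategies_) {
      if (s(mimeType, content, text)) return true;
    }
    return false;  // no strategy could handle this document
  }

private:
  std::deque<Strategy> strategies_;
};

// Models the default behaviour: only plain text documents are handled.
bool PlainTextStrategy(const std::string& mimeType, const std::string& content,
                       std::string& text) {
  if (mimeType != "text/plain") return false;
  text = content;
  return true;
}

// A custom strategy: accepts any "text/..." type and trims outer whitespace.
bool TrimmedTextStrategy(const std::string& mimeType,
                         const std::string& content, std::string& text) {
  if (mimeType.rfind("text/", 0) != 0) return false;
  const std::string ws = " \t\r\n";
  const std::string::size_type first = content.find_first_not_of(ws);
  if (first == std::string::npos) {
    text.clear();
    return true;
  }
  const std::string::size_type last = content.find_last_not_of(ws);
  text = content.substr(first, last - first + 1);
  return true;
}
```

Under that assumption, prepending a custom strategy lets it take precedence over the default, while appending it makes it a fallback for documents the earlier strategies decline.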
Inputs: Input port 0: (required) A vtkTable containing document ids, URIs, MIME types and document contents (which could be binary).
Outputs: Output port 0: The same table with an additional "text" column that contains the text extracted from each document. Output port 1: A table of document tags that includes "document", "uri", "begin", "end", and "type" columns.
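To make the tag table on output port 1 more concrete, here is a stand-alone sketch (no VTK) of one tag row and how it might select a range of the extracted text. The `Tag` struct and `TaggedText` helper are hypothetical. Treating "begin" and "end" as a half-open range of byte offsets into the extracted text is an assumption made for illustration; the page does not specify the offset convention.

```cpp
#include <cstddef>
#include <string>

// Hypothetical row of the tag table; the fields mirror the documented
// "document", "uri", "begin", "end", and "type" columns.
struct Tag {
  long long document;  // id of the document the tag belongs to
  std::string uri;     // URI of that document
  std::size_t begin;   // assumed: offset of the first element in the range
  std::size_t end;     // assumed: offset one past the last element
  std::string type;    // label describing what the range represents
};

// Returns the slice of `text` covered by `tag`, or an empty string when the
// range is inverted or runs past the end of the text.
std::string TaggedText(const std::string& text, const Tag& tag) {
  if (tag.begin > tag.end || tag.end > text.size()) return std::string();
  return text.substr(tag.begin, tag.end - tag.begin);
}
```

A half-open range is chosen here because it makes the length of a tagged span simply `end - begin`; a real pipeline should check which convention its strategies actually use.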
Use SetInputArrayToProcess(0, ...) to specify the input table column that contains document ids (must be a vtkIdTypeArray). Default: "document".
Use SetInputArrayToProcess(1, ...) to specify the input table column that contains URIs (must be a vtkStringArray). Default: "uri".
Use SetInputArrayToProcess(2, ...) to specify the input table column that contains MIME types (must be a vtkStringArray). Default: "mime_type".
Use SetInputArrayToProcess(3, ...) to specify the input table column that contains document contents (must be a vtkStringArray). Default: "content".
Definition at line 80 of file vtkTextExtraction.h.
Public Types
typedef vtkTableAlgorithm Superclass
Public Member Functions
virtual const char * GetClassName ()
virtual int IsA (const char *type)
void PrintSelf (ostream &os, vtkIndent indent)
void ClearStrategies ()
void PrependStrategy (vtkTextExtractionStrategy *strategy)
void AppendStrategy (vtkTextExtractionStrategy *strategy)
virtual void SetOutputArray (const char *)
virtual char * GetOutputArray ()
Static Public Member Functions
static vtkTextExtraction * New ()
static int IsTypeOf (const char *type)
static vtkTextExtraction * SafeDownCast (vtkObject *o)
Protected Member Functions
vtkTextExtraction ()
~vtkTextExtraction ()
virtual int RequestData (vtkInformation *request, vtkInformationVector **inputVector, vtkInformationVector *outputVector)
vtkTextExtraction::vtkTextExtraction() [protected]
vtkTextExtraction::~vtkTextExtraction() [protected]
static vtkTextExtraction* vtkTextExtraction::New() [static]
Create an object with Debug turned off, modified time initialized to zero, and reference counting on.
Reimplemented from vtkTableAlgorithm.
virtual const char* vtkTextExtraction::GetClassName() [virtual]
Reimplemented from vtkTableAlgorithm.
static int vtkTextExtraction::IsTypeOf(const char *name) [static]
Returns 1 if this class is the same type as (or a subclass of) the named class; returns 0 otherwise. This method works in combination with vtkTypeMacro found in vtkSetGet.h.
Reimplemented from vtkTableAlgorithm.
virtual int vtkTextExtraction::IsA(const char *name) [virtual]
Returns 1 if this object is of the same type as (or a subclass of) the named class; returns 0 otherwise. This method works in combination with vtkTypeMacro found in vtkSetGet.h.
Reimplemented from vtkTableAlgorithm.
static vtkTextExtraction* vtkTextExtraction::SafeDownCast(vtkObject *o) [static]
Reimplemented from vtkTableAlgorithm.
void vtkTextExtraction::PrintSelf(ostream &os, vtkIndent indent) [virtual]
Methods invoked by print to print information about the object including superclasses. Typically not called by the user (use Print() instead) but used in the hierarchical print process to combine the output of several classes.
Reimplemented from vtkTableAlgorithm.
void vtkTextExtraction::ClearStrategies()
Clear the list of strategies.
void vtkTextExtraction::PrependStrategy(vtkTextExtractionStrategy *strategy)
Prepend a strategy to the list of strategies. vtkTextExtraction assumes ownership of the supplied object.
void vtkTextExtraction::AppendStrategy(vtkTextExtractionStrategy *strategy)
Append a strategy to the list of strategies. vtkTextExtraction assumes ownership of the supplied object.
virtual void vtkTextExtraction::SetOutputArray(const char *) [virtual]
Specifies the name of the output text array. Default: "text".
virtual char* vtkTextExtraction::GetOutputArray() [virtual]
Returns the name of the output text array. Default: "text".
virtual int vtkTextExtraction::RequestData(vtkInformation *request, vtkInformationVector **inputVector, vtkInformationVector *outputVector) [protected, virtual]
This is called by the superclass. This is the method you should override.
Reimplemented from vtkTableAlgorithm.