vtkTextExtraction Class Reference

#include <vtkTextExtraction.h>



Detailed Description

Extracts text from documents based on their MIME type.

Given a table containing document ids, URIs, MIME types, and document contents, extracts plain text from each document and generates a list of 'tags' that delineate ranges of text. The actual work of extracting text and generating tags is performed by an ordered list of vtkTextExtractionStrategy objects.

By default, vtkTextExtraction has just a single strategy for extracting plain text documents. Callers will almost certainly want to supplement or replace the default with their own strategies.
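
The ordered-strategy design described above can be pictured with a small stand-alone sketch. It does not use VTK: `Strategy`, `PlainTextStrategy`, and `extractText` are hypothetical names, and the rule that the first strategy accepting a MIME type wins is an assumption about how the ordered list is consulted, not something this page states.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for vtkTextExtractionStrategy (not the VTK API).
struct Strategy {
  virtual ~Strategy() = default;
  // Returns true and fills `text` if this strategy handles `mimeType`.
  virtual bool Extract(const std::string& mimeType,
                       const std::string& content,
                       std::string& text) const = 0;
};

// Handles "text/plain" documents by passing the content through unchanged.
struct PlainTextStrategy : Strategy {
  bool Extract(const std::string& mimeType, const std::string& content,
               std::string& text) const override {
    if (mimeType != "text/plain") return false;
    text = content;
    return true;
  }
};

// Assumption: strategies are tried in list order and the first one that
// accepts the document wins. Returns false if no strategy handles it.
inline bool extractText(const std::vector<std::unique_ptr<Strategy>>& strategies,
                        const std::string& mimeType,
                        const std::string& content,
                        std::string& text) {
  for (const auto& s : strategies) {
    if (s->Extract(mimeType, content, text)) return true;
  }
  text.clear();
  return false;
}
```

Under this model, a document whose MIME type no strategy recognizes yields empty text, which is one reason to supplement the default plain-text strategy.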

Inputs:
Input port 0: (required) A vtkTable containing document ids, URIs, MIME types, and document contents (which could be binary).

Outputs:
Output port 0: The same table with an additional "text" column that contains the text extracted from each document.
Output port 1: A table of document tags that includes "document", "uri", "begin", "end", and "type" columns.
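
To make the tag table on output port 1 concrete, here is a small stand-alone sketch of one tag row and how its range could be applied to extracted text. `Tag` and `taggedText` are hypothetical names, and treating "begin"/"end" as a half-open character range [begin, end) is an assumption; this page does not say whether "end" is inclusive or what units the offsets use.

```cpp
#include <cstddef>
#include <string>

// Hypothetical in-memory form of one row of the tag table (not a VTK type).
struct Tag {
  long long document;  // document id
  std::string uri;
  std::size_t begin;
  std::size_t end;     // assumed to be exclusive
  std::string type;
};

// Returns the slice of `text` covered by `tag`, clamped to the text length.
inline std::string taggedText(const std::string& text, const Tag& tag) {
  if (tag.begin >= text.size() || tag.end <= tag.begin) return std::string();
  std::size_t end = tag.end < text.size() ? tag.end : text.size();
  return text.substr(tag.begin, end - tag.begin);
}
```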

Use SetInputArrayToProcess(0, ...) to specify the input table column that contains document ids (must be a vtkIdTypeArray). Default: "document".

Use SetInputArrayToProcess(1, ...) to specify the input table column that contains URIs (must be a vtkStringArray). Default: "uri".

Use SetInputArrayToProcess(2, ...) to specify the input table column that contains MIME types (must be a vtkStringArray). Default: "mime_type".

Use SetInputArrayToProcess(3, ...) to specify the input table column that contains document contents (must be a vtkStringArray). Default: "content".

Warning:
The input document contents array must be a string array, even though the individual document contents may be binary data.
See also:
vtkTextExtractionStrategy, vtkPlainTextExtractionStrategy
Thanks:
Developed by Timothy M. Shead (tshead@sandia.gov) at Sandia National Laboratories.
Events:
vtkCommand::ProgressEvent
Tests:
vtkTextExtraction (Tests)
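
The warning above has a practical consequence when filling the content column: binary data may contain NUL bytes, so it has to be copied with an explicit length rather than as a C string. The sketch below shows the difference using only `std::string`; `makeContent` is a hypothetical helper, and whether embedded NULs survive every step of a given pipeline is something to verify for your VTK version.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copy `size` raw bytes into a string, keeping any NULs.
// Constructing from `data` alone would stop at the first NUL byte instead.
inline std::string makeContent(const char* data, std::size_t size) {
  return std::string(data, size);
}
```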

Definition at line 80 of file vtkTextExtraction.h.


Public Types

typedef vtkTableAlgorithm Superclass

Public Member Functions

virtual const char * GetClassName ()
virtual int IsA (const char *type)
void PrintSelf (ostream &os, vtkIndent indent)
void ClearStrategies ()
void PrependStrategy (vtkTextExtractionStrategy *strategy)
void AppendStrategy (vtkTextExtractionStrategy *strategy)
virtual void SetOutputArray (const char *)
virtual char * GetOutputArray ()

Static Public Member Functions

static vtkTextExtraction * New ()
static int IsTypeOf (const char *type)
static vtkTextExtraction * SafeDownCast (vtkObject *o)

Protected Member Functions

 vtkTextExtraction ()
 ~vtkTextExtraction ()
virtual int RequestData (vtkInformation *request, vtkInformationVector **inputVector, vtkInformationVector *outputVector)

Member Typedef Documentation

typedef vtkTableAlgorithm vtkTextExtraction::Superclass

Reimplemented from vtkTableAlgorithm.

Definition at line 85 of file vtkTextExtraction.h.


Constructor & Destructor Documentation

vtkTextExtraction::vtkTextExtraction (  )  [protected]

vtkTextExtraction::~vtkTextExtraction (  )  [protected]


Member Function Documentation

static vtkTextExtraction* vtkTextExtraction::New (  )  [static]

Create an object with Debug turned off, modified time initialized to zero, and reference counting on.

Reimplemented from vtkTableAlgorithm.

virtual const char* vtkTextExtraction::GetClassName (  )  [virtual]

Reimplemented from vtkTableAlgorithm.

static int vtkTextExtraction::IsTypeOf ( const char *  name  )  [static]

Return 1 if this class type is the same type of (or a subclass of) the named class. Returns 0 otherwise. This method works in combination with vtkTypeMacro found in vtkSetGet.h.

Reimplemented from vtkTableAlgorithm.

virtual int vtkTextExtraction::IsA ( const char *  name  )  [virtual]

Return 1 if this class is the same type of (or a subclass of) the named class. Returns 0 otherwise. This method works in combination with vtkTypeMacro found in vtkSetGet.h.

Reimplemented from vtkTableAlgorithm.
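
The name-based type checks above can be illustrated with a stand-alone sketch of the general pattern: each class compares the requested name with its own and otherwise defers to its superclass. `Base` and `Derived` are hypothetical classes written by hand; in VTK the equivalent methods are generated by vtkTypeMacro, whose exact expansion is not reproduced here.

```cpp
#include <cstring>

// Hypothetical root class (stands in for the top of a class hierarchy).
class Base {
public:
  virtual ~Base() = default;
  static int IsTypeOf(const char* name) {
    return std::strcmp("Base", name) == 0 ? 1 : 0;
  }
  // Virtual, so the check starts from the object's dynamic type.
  virtual int IsA(const char* name) { return Base::IsTypeOf(name); }
};

class Derived : public Base {
public:
  static int IsTypeOf(const char* name) {
    if (std::strcmp("Derived", name) == 0) return 1;
    return Base::IsTypeOf(name);  // walk up to the superclass
  }
  int IsA(const char* name) override { return Derived::IsTypeOf(name); }
};
```

The static form answers the question for a class, while the virtual form answers it for an object reached through a base-class pointer.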

static vtkTextExtraction* vtkTextExtraction::SafeDownCast ( vtkObject *  o  )  [static]

Reimplemented from vtkTableAlgorithm.

void vtkTextExtraction::PrintSelf ( ostream &  os,
vtkIndent  indent 
) [virtual]

Methods invoked by print to print information about the object including superclasses. Typically not called by the user (use Print() instead) but used in the hierarchical print process to combine the output of several classes.

Reimplemented from vtkTableAlgorithm.

void vtkTextExtraction::ClearStrategies (  ) 

Clear the list of strategies.

void vtkTextExtraction::PrependStrategy ( vtkTextExtractionStrategy *  strategy  ) 

Prepend a strategy to the list of strategies. vtkTextExtraction assumes ownership of the supplied object.

void vtkTextExtraction::AppendStrategy ( vtkTextExtractionStrategy *  strategy  ) 

Append a strategy to the list of strategies. vtkTextExtraction assumes ownership of the supplied object.
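
How the three list-editing calls interact can be sketched with a plain container. This is a stand-alone model, not VTK code: `StrategyList` is hypothetical, strategies are represented by name only, and the ownership transfer mentioned above is not modelled.

```cpp
#include <deque>
#include <string>

// Hypothetical model of the ordered strategy list; names stand in for objects.
class StrategyList {
public:
  void Clear() { names_.clear(); }
  // Places the entry at the front of the list.
  void Prepend(const std::string& n) { names_.push_front(n); }
  // Places the entry at the back of the list.
  void Append(const std::string& n) { names_.push_back(n); }
  const std::deque<std::string>& Names() const { return names_; }

private:
  std::deque<std::string> names_;
};
```

In this model, replacing the default amounts to calling Clear() first and then adding strategies in the desired order.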

virtual void vtkTextExtraction::SetOutputArray ( const char *   )  [virtual]

Specifies the name of the output text array. Default: "text".

virtual char* vtkTextExtraction::GetOutputArray (  )  [virtual]

Returns the name of the output text array. Default: "text".

virtual int vtkTextExtraction::RequestData ( vtkInformation *  request,
vtkInformationVector **  inputVector,
vtkInformationVector *  outputVector 
) [protected, virtual]

This is called by the superclass. This is the method you should override.

Reimplemented from vtkTableAlgorithm.


The documentation for this class was generated from the following file:

Generated on Wed Aug 24 12:12:29 2011 for VTK by  doxygen 1.5.6