VTK
vtkKMeansStatistics Class Reference

A class for KMeans clustering. More...

#include <vtkKMeansStatistics.h>

Public Types

typedef vtkStatisticsAlgorithm Superclass

Public Member Functions

virtual int IsA (const char *type)
vtkKMeansStatistics * NewInstance () const
virtual void PrintSelf (ostream &os, vtkIndent indent)
virtual void SetDistanceFunctor (vtkKMeansDistanceFunctor *)
virtual vtkKMeansDistanceFunctor * GetDistanceFunctor ()
virtual void SetDefaultNumberOfClusters (int)
virtual int GetDefaultNumberOfClusters ()
virtual void SetKValuesArrayName (const char *)
virtual char * GetKValuesArrayName ()
virtual void SetMaxNumIterations (int)
virtual int GetMaxNumIterations ()
virtual void SetTolerance (double)
virtual double GetTolerance ()
virtual void Aggregate (vtkDataObjectCollection *, vtkMultiBlockDataSet *)
virtual bool SetParameter (const char *parameter, int index, vtkVariant value)

Static Public Member Functions

static int IsTypeOf (const char *type)
static vtkKMeansStatistics * SafeDownCast (vtkObjectBase *o)
static vtkKMeansStatistics * New ()

Protected Member Functions

virtual vtkObjectBase * NewInstanceInternal () const
 vtkKMeansStatistics ()
 ~vtkKMeansStatistics ()
virtual void Derive (vtkMultiBlockDataSet *)
virtual vtkIdType GetTotalNumberOfObservations (vtkIdType numObservations)
virtual void Learn (vtkTable *, vtkTable *, vtkMultiBlockDataSet *)
virtual void Assess (vtkTable *, vtkMultiBlockDataSet *, vtkTable *)
virtual void Test (vtkTable *, vtkMultiBlockDataSet *, vtkTable *)
virtual void SelectAssessFunctor (vtkTable *inData, vtkDataObject *inMeta, vtkStringArray *rowNames, AssessFunctor *&dfunc)
virtual void UpdateClusterCenters (vtkTable *newClusterElements, vtkTable *curClusterElements, vtkIdTypeArray *numMembershipChanges, vtkIdTypeArray *numElementsInCluster, vtkDoubleArray *error, vtkIdTypeArray *startRunID, vtkIdTypeArray *endRunID, vtkIntArray *computeRun)
int InitializeDataAndClusterCenters (vtkTable *inParameters, vtkTable *inData, vtkTable *dataElements, vtkIdTypeArray *numberOfClusters, vtkTable *curClusterElements, vtkTable *newClusterElements, vtkIdTypeArray *startRunID, vtkIdTypeArray *endRunID)
virtual void CreateInitialClusterCenters (vtkIdType numToAllocate, vtkIdTypeArray *numberOfClusters, vtkTable *inData, vtkTable *curClusterElements, vtkTable *newClusterElements)

Protected Attributes

int DefaultNumberOfClusters
char * KValuesArrayName
int MaxNumIterations
double Tolerance
vtkKMeansDistanceFunctor * DistanceFunctor

Detailed Description

A class for KMeans clustering.

This class takes as input an optional vtkTable on port LEARN_PARAMETERS specifying initial set(s) of cluster values of the following form:

           K     | Col1            |  ...    | ColN
      -----------+-----------------+---------+---------------
           M     |clustCoord(1, 1) |  ...    | clustCoord(1, N)
           M     |clustCoord(2, 1) |  ...    | clustCoord(2, N)
           .     |       .         |   .     |        .
           .     |       .         |   .     |        .
           .     |       .         |   .     |        .
           M     |clustCoord(M, 1) |  ...    | clustCoord(M, N)
           L     |clustCoord(1, 1) |  ...    | clustCoord(1, N)
           L     |clustCoord(2, 1) |  ...    | clustCoord(2, N)
           .     |       .         |   .     |        .
           .     |       .         |   .     |        .
           .     |       .         |   .     |        .
           L     |clustCoord(L, 1) |  ...    | clustCoord(L, N)
 

Because the desired value of K is often not known in advance and the results of the algorithm are dependent on the initial cluster centers, we provide a mechanism for the user to test multiple runs or sets of cluster centers within a single call to the Learn phase. The first column of the table identifies the number of clusters K in the particular run (the entries in this column should be of type vtkIdType), while the remaining columns are a subset of the columns contained in the table on port INPUT_DATA. We require that all user specified clusters be of the same dimension N and consequently, that the LEARN_PARAMETERS table have N+1 columns. Due to this restriction, only one request can be processed for each call to the Learn phase and subsequent requests are silently ignored. Note that, if the first column of the LEARN_PARAMETERS table is not of type vtkIdType, then the table will be ignored and a single run will be performed using the first DefaultNumberOfClusters input data observations as initial cluster centers.

When the user does not supply an initial set of clusters, then the first DefaultNumberOfClusters input data observations are used as initial cluster centers and a single run is performed.
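The row layout of the LEARN_PARAMETERS table can be illustrated with a small standalone sketch. This is not VTK code: SplitRuns is a hypothetical helper, and the K column is modeled as a plain std::vector<long long> rather than a vtkIdTypeArray. It assumes that a run with K clusters occupies exactly K consecutive rows, each carrying the value K in the first column, as in the table above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of VTK): given the K column of a
// LEARN_PARAMETERS-style table, return one [start, end) row range per run.
std::vector<std::pair<std::size_t, std::size_t>>
SplitRuns(const std::vector<long long>& kColumn)
{
  std::vector<std::pair<std::size_t, std::size_t>> runs;
  std::size_t row = 0;
  while (row < kColumn.size())
  {
    const long long k = kColumn[row];
    // Precondition: K is positive and the run fits inside the table.
    assert(k > 0);
    const std::size_t count = static_cast<std::size_t>(k);
    assert(row + count <= kColumn.size());
    runs.emplace_back(row, row + count);
    row += count;
  }
  return runs;
}
```

For a K column of 2, 2, 3, 3, 3 this yields two runs: rows 0 to 1 (K = 2) and rows 2 to 4 (K = 3).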

This class provides the following functionalities, depending on the operation in which it is executed:

Learn: Calculates new cluster centers for each run. The output metadata on port OUTPUT_MODEL is a multiblock dataset containing at a minimum one vtkTable with columns specifying the following for each run: the run ID, number of clusters, number of iterations required for convergence, total error associated with the cluster (sum of squared Euclidean distances from each observation to its nearest cluster center), the cardinality of the cluster, and the new cluster coordinates.
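The quantities reported by Learn can be made concrete with a minimal, VTK-independent sketch of a single k-means run. RunKMeans, RunResult, and Point are hypothetical names introduced only for illustration. The stopping rule (fraction of observations that changed cluster falling below a tolerance, or an iteration cap) is an assumption based on the Tolerance and MaxNumIterations descriptions on this page, not a copy of the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Hypothetical record of the per-run quantities described above.
struct RunResult
{
  std::vector<Point> centers;           // new cluster coordinates
  std::vector<std::size_t> cardinality; // observations per cluster
  double totalError;                    // sum of squared distances to nearest center
  int iterations;                       // iterations actually performed
};

// Squared Euclidean distance between two points of equal dimension.
double SquaredDistance(const Point& a, const Point& b)
{
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    const double d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

RunResult RunKMeans(const std::vector<Point>& data, std::vector<Point> centers,
                    int maxIterations, double tolerance)
{
  assert(!data.empty() && !centers.empty());
  const std::size_t k = centers.size();
  const std::size_t dim = data[0].size();
  RunResult result{ centers, std::vector<std::size_t>(k, 0), 0.0, 0 };
  std::vector<std::size_t> label(data.size(), k); // k means "not yet assigned"

  for (int iter = 0; iter < maxIterations; ++iter)
  {
    std::vector<Point> sums(k, Point(dim, 0.0));
    std::vector<std::size_t> counts(k, 0);
    double error = 0.0;
    std::size_t changes = 0;

    for (std::size_t i = 0; i < data.size(); ++i)
    {
      // Assign observation i to its nearest current center.
      std::size_t best = 0;
      double bestDist = std::numeric_limits<double>::max();
      for (std::size_t c = 0; c < k; ++c)
      {
        const double d = SquaredDistance(data[i], result.centers[c]);
        if (d < bestDist) { bestDist = d; best = c; }
      }
      if (label[i] != best) { label[i] = best; ++changes; }
      error += bestDist;
      ++counts[best];
      for (std::size_t j = 0; j < dim; ++j) sums[best][j] += data[i][j];
    }

    // Move each non-empty cluster's center to the mean of its members.
    for (std::size_t c = 0; c < k; ++c)
    {
      if (counts[c] == 0) continue; // an empty cluster keeps its old center
      for (std::size_t j = 0; j < dim; ++j)
        result.centers[c][j] = sums[c][j] / static_cast<double>(counts[c]);
    }

    result.cardinality = counts;
    result.totalError = error; // measured against the centers used for assignment
    result.iterations = iter + 1;

    // Assumed stopping rule: too few observations changed cluster.
    const double changed = static_cast<double>(changes) / static_cast<double>(data.size());
    if (changed < tolerance) break;
  }
  return result;
}
```

With the one-dimensional observations 0, 1, 10, 11 and the first two observations as initial centers, the sketch settles on centers 0.5 and 10.5 with two members each.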

Derive: An additional vtkTable is stored in the multiblock dataset output on port OUTPUT_MODEL. This table contains columns that store for each run: the run ID, number of clusters, total error for all clusters in the run, local rank, and global rank. The local rank is computed by comparing squared Euclidean errors of all runs with the same number of clusters. The global rank is computed analogously across all runs.
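The two ranks can be sketched in a few lines of standalone C++. RunSummary and RankRuns are hypothetical names, and two details are assumptions rather than documented behavior: ranks start at 1 for the smallest total error, and ties are broken by run order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <vector>

// Hypothetical per-run summary, loosely modeled on the Derive table columns.
struct RunSummary
{
  long long numClusters;
  double totalError;
  int localRank;  // rank among runs with the same number of clusters
  int globalRank; // rank among all runs
};

// Fill in localRank and globalRank (assumed 1-based, smallest error first).
void RankRuns(std::vector<RunSummary>& runs)
{
  // Order run indices by increasing total error; stable_sort keeps ties in run order.
  std::vector<std::size_t> order(runs.size());
  std::iota(order.begin(), order.end(), std::size_t(0));
  std::stable_sort(order.begin(), order.end(),
                   [&runs](std::size_t a, std::size_t b) {
                     return runs[a].totalError < runs[b].totalError;
                   });

  std::map<long long, int> seenPerK; // runs already ranked for each cluster count
  for (std::size_t pos = 0; pos < order.size(); ++pos)
  {
    RunSummary& run = runs[order[pos]];
    run.globalRank = static_cast<int>(pos) + 1;
    run.localRank = ++seenPerK[run.numClusters];
    assert(run.localRank <= run.globalRank); // a local rank never exceeds the global one
  }
}
```

A run can therefore be the best among its peers with the same K (local rank 1) while ranking lower globally.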

Assess: This requires a multiblock dataset (as computed from Learn and Derive) on input port INPUT_MODEL and tabular data on input port INPUT_DATA that contains column names matching those of the tables on input port INPUT_MODEL. The assess mode reports the closest cluster center and associated squared Euclidean distance of each observation in port INPUT_DATA's table to the cluster centers for each run in the multiblock dataset provided on port INPUT_MODEL.
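What Assess reports for one observation and one run can be sketched as follows. This is an illustration under stated assumptions, not the library's code: Assessment and AssessObservation are hypothetical names, and observations and cluster centers are modeled as plain vectors of doubles.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical result of assessing one observation against one run.
struct Assessment
{
  std::size_t closestCluster; // index of the nearest cluster center
  double squaredDistance;     // squared Euclidean distance to that center
};

Assessment AssessObservation(const std::vector<double>& observation,
                             const std::vector<std::vector<double>>& centers)
{
  assert(!centers.empty());
  Assessment best{ 0, std::numeric_limits<double>::max() };
  for (std::size_t c = 0; c < centers.size(); ++c)
  {
    assert(centers[c].size() == observation.size());
    double dist = 0.0;
    for (std::size_t j = 0; j < observation.size(); ++j)
    {
      const double d = observation[j] - centers[c][j];
      dist += d * d;
    }
    if (dist < best.squaredDistance)
    {
      best.closestCluster = c;
      best.squaredDistance = dist;
    }
  }
  return best;
}
```

Applying this to every row of the input table, once per run in the model, gives one (cluster, distance) pair per observation and run.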

The code can handle a wide variety of data types as it operates on vtkAbstractArrays and is not limited to vtkDataArrays. A default distance functor (vtkKMeansDistanceFunctor) is provided; it computes the squared Euclidean distance between two objects, that is, the sum of the squared coordinate differences. The default distance functor can be overridden to use alternative distance metrics.
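The idea of swapping the distance metric can be sketched with a simplified stand-in. The classes below are hypothetical and do not reproduce the interface of vtkKMeansDistanceFunctor; they only show how a virtual call operator lets a subclass replace the default squared Euclidean metric, here with a Manhattan distance chosen purely as an example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Hypothetical stand-in for a distance functor; the default is squared Euclidean.
class DistanceFunctor
{
public:
  virtual ~DistanceFunctor() = default;
  virtual double operator()(const Point& a, const Point& b) const
  {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
    return sum;
  }
};

// Example override: Manhattan (L1) distance.
class ManhattanFunctor : public DistanceFunctor
{
public:
  double operator()(const Point& a, const Point& b) const override
  {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += std::fabs(a[i] - b[i]);
    return sum;
  }
};

// Index of the center nearest to the observation under the given metric.
std::size_t Nearest(const DistanceFunctor& distance, const Point& observation,
                    const std::vector<Point>& centers)
{
  assert(!centers.empty());
  std::size_t best = 0;
  for (std::size_t c = 1; c < centers.size(); ++c)
    if (distance(observation, centers[c]) < distance(observation, centers[best]))
      best = c;
  return best;
}
```

The choice of metric can change cluster membership: for the observation (0, 0) and centers (3, 0) and (2, 2), the squared Euclidean metric prefers the second center (8 versus 9), while the Manhattan metric prefers the first (3 versus 4).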

Thanks:
Thanks to Janine Bennett, David Thompson, and Philippe Pebay of Sandia National Laboratories for implementing this class. Updated by Philippe Pebay, Kitware SAS, 2012.

Definition at line 113 of file vtkKMeansStatistics.h.


Member Typedef Documentation

typedef vtkStatisticsAlgorithm vtkKMeansStatistics::Superclass

Reimplemented from vtkStatisticsAlgorithm.

Reimplemented in vtkPKMeansStatistics.

Definition at line 116 of file vtkKMeansStatistics.h.


Constructor & Destructor Documentation

vtkKMeansStatistics::vtkKMeansStatistics ( ) [protected]

vtkKMeansStatistics::~vtkKMeansStatistics ( ) [protected]


Member Function Documentation

static int vtkKMeansStatistics::IsTypeOf ( const char *  name) [static]

Return 1 if this class type is the same type of (or a subclass of) the named class. Returns 0 otherwise. This method works in combination with vtkTypeMacro found in vtkSetGet.h.

Reimplemented from vtkStatisticsAlgorithm.

Reimplemented in vtkPKMeansStatistics.

virtual int vtkKMeansStatistics::IsA ( const char *  name) [virtual]

Return 1 if this class is the same type of (or a subclass of) the named class. Returns 0 otherwise. This method works in combination with vtkTypeMacro found in vtkSetGet.h.

Reimplemented from vtkStatisticsAlgorithm.

Reimplemented in vtkPKMeansStatistics.

static vtkKMeansStatistics* vtkKMeansStatistics::SafeDownCast ( vtkObjectBase *  o) [static]

Reimplemented from vtkStatisticsAlgorithm.

Reimplemented in vtkPKMeansStatistics.

virtual vtkObjectBase* vtkKMeansStatistics::NewInstanceInternal ( ) const [protected, virtual]

Reimplemented from vtkStatisticsAlgorithm.

Reimplemented in vtkPKMeansStatistics.

vtkKMeansStatistics* vtkKMeansStatistics::NewInstance ( ) const

Reimplemented from vtkStatisticsAlgorithm.

Reimplemented in vtkPKMeansStatistics.

virtual void vtkKMeansStatistics::PrintSelf ( ostream &  os,
vtkIndent  indent 
) [virtual]

Methods invoked by print to print information about the object including superclasses. Typically not called by the user (use Print() instead) but used in the hierarchical print process to combine the output of several classes.

Reimplemented from vtkStatisticsAlgorithm.

Reimplemented in vtkPKMeansStatistics.

static vtkKMeansStatistics* vtkKMeansStatistics::New ( ) [static]

Create an object with Debug turned off, modified time initialized to zero, and reference counting on.

Reimplemented from vtkTableAlgorithm.

Reimplemented in vtkPKMeansStatistics.

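virtual void vtkKMeansStatistics::SetDistanceFunctor ( vtkKMeansDistanceFunctor *  ) [virtual]

Set the DistanceFunctor.

virtual vtkKMeansDistanceFunctor* vtkKMeansStatistics::GetDistanceFunctor ( ) [virtual]

Get the DistanceFunctor.

virtual void vtkKMeansStatistics::SetDefaultNumberOfClusters ( int  ) [virtual]

Set/get the DefaultNumberOfClusters, used when no initial cluster coordinates are specified.

virtual int vtkKMeansStatistics::GetDefaultNumberOfClusters ( ) [virtual]

Set/get the DefaultNumberOfClusters, used when no initial cluster coordinates are specified.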

virtual void vtkKMeansStatistics::SetKValuesArrayName ( const char *  ) [virtual]

Set/get the KValuesArrayName.

virtual char* vtkKMeansStatistics::GetKValuesArrayName ( ) [virtual]

Set/get the KValuesArrayName.

virtual void vtkKMeansStatistics::SetMaxNumIterations ( int  ) [virtual]

Set/get the MaxNumIterations used to terminate iterations on cluster center coordinates when the relative tolerance cannot be met.

virtual int vtkKMeansStatistics::GetMaxNumIterations ( ) [virtual]

Set/get the MaxNumIterations used to terminate iterations on cluster center coordinates when the relative tolerance cannot be met.

virtual void vtkKMeansStatistics::SetTolerance ( double  ) [virtual]

Set/get the relative Tolerance used to terminate iterations on cluster center coordinates.

virtual double vtkKMeansStatistics::GetTolerance ( ) [virtual]

Set/get the relative Tolerance used to terminate iterations on cluster center coordinates.

virtual void vtkKMeansStatistics::Aggregate ( vtkDataObjectCollection * ,
vtkMultiBlockDataSet *  
) [inline, virtual]

Given a collection of models, calculate the aggregate model. NB: not implemented.

Implements vtkStatisticsAlgorithm.

Definition at line 156 of file vtkKMeansStatistics.h.

virtual bool vtkKMeansStatistics::SetParameter ( const char *  parameter,
int  index,
vtkVariant  value 
) [virtual]

A convenience method for setting properties by name.

Reimplemented from vtkStatisticsAlgorithm.

virtual void vtkKMeansStatistics::Learn ( vtkTable * ,
vtkTable * ,
vtkMultiBlockDataSet *  
) [protected, virtual]

Execute the calculations required by the Learn option.

Implements vtkStatisticsAlgorithm.

virtual void vtkKMeansStatistics::Derive ( vtkMultiBlockDataSet * ) [protected, virtual]

Execute the calculations required by the Derive option.

Implements vtkStatisticsAlgorithm.

virtual void vtkKMeansStatistics::Assess ( vtkTable * ,
vtkMultiBlockDataSet * ,
vtkTable *  
) [protected, virtual]

Execute the calculations required by the Assess option.

Implements vtkStatisticsAlgorithm.

virtual void vtkKMeansStatistics::Test ( vtkTable * ,
vtkMultiBlockDataSet * ,
vtkTable *  
) [inline, protected, virtual]

Execute the calculations required by the Test option.

Implements vtkStatisticsAlgorithm.

Definition at line 191 of file vtkKMeansStatistics.h.

virtual void vtkKMeansStatistics::SelectAssessFunctor ( vtkTable * inData,
vtkDataObject * inMeta,
vtkStringArray * rowNames,
AssessFunctor *&  dfunc 
) [protected, virtual]

Provide the appropriate assessment functor.

Implements vtkStatisticsAlgorithm.

virtual void vtkKMeansStatistics::UpdateClusterCenters ( vtkTable * newClusterElements,
vtkTable * curClusterElements,
vtkIdTypeArray * numMembershipChanges,
vtkIdTypeArray * numElementsInCluster,
vtkDoubleArray * error,
vtkIdTypeArray * startRunID,
vtkIdTypeArray * endRunID,
vtkIntArray * computeRun 
) [protected, virtual]

Subroutine to update new cluster centers from the old centers. Called from within Learn (and will be overridden by vtkPKMeansStatistics to handle distributed datasets).

Reimplemented in vtkPKMeansStatistics.

virtual vtkIdType vtkKMeansStatistics::GetTotalNumberOfObservations ( vtkIdType  numObservations) [protected, virtual]

Subroutine to get the total number of observations. Called from within Learn (and will be overridden by vtkPKMeansStatistics to handle distributed datasets).

Reimplemented in vtkPKMeansStatistics.

int vtkKMeansStatistics::InitializeDataAndClusterCenters ( vtkTable * inParameters,
vtkTable * inData,
vtkTable * dataElements,
vtkIdTypeArray * numberOfClusters,
vtkTable * curClusterElements,
vtkTable * newClusterElements,
vtkIdTypeArray * startRunID,
vtkIdTypeArray * endRunID 
) [protected]

Subroutine to initialize the cluster centers using those provided by the user in input port LEARN_PARAMETERS. If no cluster centers are provided, the subroutine uses the first DefaultNumberOfClusters input data points as initial cluster centers. Called from within Learn.

virtual void vtkKMeansStatistics::CreateInitialClusterCenters ( vtkIdType  numToAllocate,
vtkIdTypeArray * numberOfClusters,
vtkTable * inData,
vtkTable * curClusterElements,
vtkTable * newClusterElements 
) [protected, virtual]

Subroutine to initialize cluster centers if not provided by the user. Called from within Learn (and will be overridden by vtkPKMeansStatistics to handle distributed datasets).

Reimplemented in vtkPKMeansStatistics.


Member Data Documentation

int vtkKMeansStatistics::DefaultNumberOfClusters [protected]

This is the default number of clusters used when the user does not provide initial cluster centers.

Definition at line 253 of file vtkKMeansStatistics.h.

char* vtkKMeansStatistics::KValuesArrayName [protected]

This is the name of the column that specifies the number of clusters in each run. This is only used if the user has not specified initial clusters.

Definition at line 257 of file vtkKMeansStatistics.h.

int vtkKMeansStatistics::MaxNumIterations [protected]

This is the maximum number of iterations allowed if the new cluster centers have not yet converged.

Definition at line 260 of file vtkKMeansStatistics.h.

double vtkKMeansStatistics::Tolerance [protected]

This is the percentage of data elements that swap cluster IDs.

Definition at line 262 of file vtkKMeansStatistics.h.

vtkKMeansDistanceFunctor* vtkKMeansStatistics::DistanceFunctor [protected]

This is the distance functor. The default is Euclidean distance; however, this can be overridden.

Definition at line 265 of file vtkKMeansStatistics.h.


The documentation for this class was generated from the file vtkKMeansStatistics.h.