VTK/Parallel Pipeline: Difference between revisions
Revision as of 20:36, 30 April 2014
Introduction
VTK uses a form of parallelism called data parallelism. In this form, the data is divided amongst the processes, and each process performs the same operation on its piece of data. Advantages of data parallelism include scalability, simplified load balancing, and reduced communications overhead.
Figure 1 shows how VTK filters can be set up when running in parallel. In this example, each reader reads in a piece of the data. Usually, this can be done with very little communication between processes. Then, each reader feeds into a pipeline identical to those on the other processes. Because many of the filters in VTK use algorithms that independently process each point or cell, these filters can run in parallel with little or no communication between them. Let us see a simple example of how that works.
Consider the simple 2D grid, partitioned into three pieces, shown in Figure 2. Assume that the three pieces reside on separate processes of a distributed-memory computer. Because no process has global information, communication costs will be minimized if each process performs its operation only on its local information. Of course, we can only do this if the end result of the parallel operation is equivalent to the same operation on the global data.
Figure 3 demonstrates a clip filter (vtkClipDataSet, for example) processing data in parallel. Each process is given the same parameters for the clipping plane, which is then applied independently to each piece of the data. When the pieces are brought back together, we see that the result is the entire data set properly clipped by the plane.
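To make the independence argument concrete, here is a small standalone C++ sketch (it does not use VTK). It assumes a simplified clip that keeps each whole cell whose center lies on the kept side of the plane; a real clip filter such as vtkClipDataSet also splits cells that straddle the plane. The names `Cell`, `ClipPiece`, `Partition` and `ClipInParallel` are hypothetical. Because each cell is decided on its own, clipping the pieces separately and merging the results should give the same cells as clipping the whole grid.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical cell record: an id plus the coordinates of the cell center.
struct Cell {
  int id;
  Vec3 center;
};

// Simplified clip: keep each cell whose center satisfies normal . (center - origin) >= 0.
// The decision for a cell depends only on that cell, so no neighbor data is needed.
std::vector<int> ClipPiece(const std::vector<Cell>& piece, const Vec3& origin, const Vec3& normal) {
  std::vector<int> kept;
  for (const Cell& c : piece) {
    double d = 0.0;
    for (int k = 0; k < 3; ++k) d += normal[k] * (c.center[k] - origin[k]);
    if (d >= 0.0) kept.push_back(c.id);
  }
  return kept;
}

// Hypothetical partitioner: distribute cells round-robin over numPieces pieces
// (a stand-in for placing pieces on separate processes).
std::vector<std::vector<Cell>> Partition(const std::vector<Cell>& cells, int numPieces) {
  assert(numPieces > 0);
  std::vector<std::vector<Cell>> pieces(numPieces);
  for (std::size_t i = 0; i < cells.size(); ++i) pieces[i % numPieces].push_back(cells[i]);
  return pieces;
}

// Clip every piece with the same plane, then merge the kept ids (sorted for comparison).
std::vector<int> ClipInParallel(const std::vector<Cell>& cells, int numPieces,
                                const Vec3& origin, const Vec3& normal) {
  std::vector<int> merged;
  for (const std::vector<Cell>& piece : Partition(cells, numPieces)) {
    std::vector<int> part = ClipPiece(piece, origin, normal);
    merged.insert(merged.end(), part.begin(), part.end());
  }
  std::sort(merged.begin(), merged.end());
  return merged;
}
```

For example, on a 4x4 grid of unit cells clipped by the plane x = 2, the serial result and the merged three-piece result both keep the eight cells in the right half of the grid.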
Not all visualization algorithms can operate on pieces without information on neighboring cells. Consider the operation of extracting external faces as shown in Figure 4. The external face operation identifies all faces that have no local neighbors. When we bring the pieces together, we see that some internal faces have been incorrectly identified as being external. These false positives on the faces occur whenever two neighboring cells are placed in separate processes.
The external face operation in our example fails because some important global information is missing from the local processing. The processes need some data that is not local to them, but they do not need all the data. They only need to know about cells in other partitions that are neighbors to the local cells.
We can solve this local/global problem with the introduction of ghost cells. Ghost cells are cells that belong to one partition of the data and are duplicated on other partitions. Ghost cells are introduced using neighborhood information and are organized in levels. For a given partition, any cell that neighbors a cell in the partition but does not belong to the partition itself is a ghost cell at level 1. Any cell neighboring a level-1 ghost cell that belongs neither to level 1 nor to the original partition is at level 2. Further levels are defined recursively. We define ghost cells in this way because it provides a simple distance metric to the cells of a partition and allows filters to easily specify the minimal or near minimal set of ghost cells required for proper operation.
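The recursive definition of ghost levels amounts to a breadth-first search outward from the partition. The standalone C++ sketch below (not VTK code) computes the level of every cell of a 2D grid of cells relative to one partition. `GhostLevels` is a hypothetical name, and the sketch assumes that two cells are neighbors when they share an edge; other neighborhood definitions, such as sharing any point, would only change the neighbor offsets.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical helper. owner[j * nx + i] is the partition that owns cell (i, j) of an
// nx-by-ny grid of cells. Returns, for each cell: 0 if it belongs to `partition`,
// k (1..maxLevel) if it is a level-k ghost cell, and -1 if it is farther away.
// Assumption: two cells are neighbors when they share an edge.
std::vector<int> GhostLevels(int nx, int ny, const std::vector<int>& owner,
                             int partition, int maxLevel) {
  assert(static_cast<int>(owner.size()) == nx * ny);
  std::vector<int> level(nx * ny, -1);
  std::queue<int> frontier;
  for (int c = 0; c < nx * ny; ++c) {
    if (owner[c] == partition) {
      level[c] = 0;
      frontier.push(c);
    }
  }
  // Breadth-first search: a cell first reached from level k-1 is at level k, i.e. it
  // neighbors a level k-1 cell and does not belong to any lower level.
  const int di[4] = {1, -1, 0, 0};
  const int dj[4] = {0, 0, 1, -1};
  while (!frontier.empty()) {
    const int c = frontier.front();
    frontier.pop();
    if (level[c] == maxLevel) continue;  // do not grow past the requested level
    const int i = c % nx, j = c / nx;
    for (int k = 0; k < 4; ++k) {
      const int ni = i + di[k], nj = j + dj[k];
      if (ni < 0 || ni >= nx || nj < 0 || nj >= ny) continue;
      const int n = nj * nx + ni;
      if (level[n] == -1) {
        level[n] = level[c] + 1;
        frontier.push(n);
      }
    }
  }
  return level;
}
```

For a 3x3 grid whose left column belongs to partition 0, the middle column is at level 1 and the right column is at level 2 (or is left unmarked, -1, if only one ghost level is requested).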
Let us apply the use of ghost cells to our example of extracting external faces. Figure 5 shows the same partitions with a layer of ghost cells added. When the external face algorithm is run again, some faces are still inappropriately classified as external. However, all of these faces are attached to ghost cells. These ghost faces are easily culled, and the end result is the appropriate external faces.
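The ghost-cell fix can be checked on a tiny case. In the standalone C++ sketch below (not VTK code), the faces of a 2D cell are its four edges, and the hypothetical function `ExternalFaces` reports every face of a non-ghost cell that has no neighboring cell in the local data. Ghost cells still count as neighbors, but their own faces are culled by skipping those cells, which is equivalent to computing their faces and discarding them afterwards. The sketch assumes cells are addressed by integer grid coordinates (i, j).

```cpp
#include <cassert>
#include <set>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical cell of a 2D structured grid, identified by (i, j); `ghost` marks ghost cells.
struct LocalCell {
  int i, j;
  bool ghost;
};

// A face (an edge, in 2D) identified by its cell and a direction: 0:+x, 1:-x, 2:+y, 3:-y.
using Face = std::tuple<int, int, int>;

// Returns the faces of non-ghost cells that have no neighboring cell in the local data.
std::set<Face> ExternalFaces(const std::vector<LocalCell>& cells) {
  std::set<std::pair<int, int>> present;  // every local cell, ghost or not
  for (const LocalCell& c : cells) present.insert(std::make_pair(c.i, c.j));

  const int di[4] = {1, -1, 0, 0};
  const int dj[4] = {0, 0, 1, -1};
  std::set<Face> faces;
  for (const LocalCell& c : cells) {
    if (c.ghost) continue;  // cull ghost faces
    for (int d = 0; d < 4; ++d) {
      if (present.count(std::make_pair(c.i + di[d], c.j + dj[d])) == 0) {
        faces.insert(std::make_tuple(c.i, c.j, d));
      }
    }
  }
  return faces;
}
```

For a 4x1 strip split into two pieces of two cells each, the pieces report 12 external faces in total without ghost cells (the two faces on the shared boundary are false positives) and 10 with one ghost level, matching the 10 external faces of the whole strip.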
VTK Pipeline Support for Data Parallelism
Demand-driven data parallelism is natively supported by VTK's execution mechanism. This is achieved by utilizing various pipeline passes and a specific set of meta-data and request objects. An introduction to VTK's pipeline can be found in the VTK User's Guide and this page (http://www.vtk.org/Wiki/VTK/Tutorials/New_Pipeline). Unless you are familiar with the VTK pipeline, we recommend taking a look at these documents before continuing with this one. Also note that certain keys pertaining to parallelism have changed since VTK 6.1 and are described in more detail here. The three main pipeline passes pertaining to data parallelism are RequestInformation, RequestUpdateExtent and RequestData:
RequestInformation
This is where data sources (e.g. readers) provide meta-data about their capabilities and the data they can produce. Filters downstream can also modify this meta-data when they add or reduce capabilities or change what data can be made available downstream. This pass can usually be ignored with respect to data parallelism. The only exception is that readers that can produce data in a partitioned way need to notify the pipeline by providing the CAN_HANDLE_PIECE_REQUEST() key as follows:
<source lang="cpp">
int vtkSphereSource::RequestInformation(
  vtkInformation *vtkNotUsed(request),
  vtkInformationVector **vtkNotUsed(inputVector),
  vtkInformationVector *outputVector)
{
  // get the info object
  vtkInformation *outInfo = outputVector->GetInformationObject(0);

  // notify the pipeline that this source can produce partitioned data
  outInfo->Set(CAN_HANDLE_PIECE_REQUEST(), 1);
  return 1;
}
</source>
A serial reader should not set this key. This case is described in more detail below.
RequestUpdateExtent
This pass is where consumers can request a specific subset of the available data from upstream. Upstream filters can further modify this request to fit their needs. Three keys are used to implement data parallelism:
- UPDATE_NUMBER_OF_PIECES: This key together with UPDATE_PIECE_NUMBER controls how data should be partitioned by the data source. It is usually set equal to the number of MPI ranks in the current MPI group.
- UPDATE_PIECE_NUMBER: This key determines which partition should be loaded on the current process. It is usually set to the MPI rank of the current process.
- UPDATE_NUMBER_OF_GHOST_LEVELS: This key determines the number of ghost levels requested by a particular filter. A filter usually adds the number of ghost levels it needs to the number requested by its downstream consumers.
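As an illustration of how a source might honor these keys, the following standalone C++ sketch (not the VTK API) maps a piece number and a number of pieces to a contiguous block of cells, then grows the block by the requested number of ghost levels. `PieceCellRange` and `WithGhostLevels` are hypothetical helpers, and the ghost computation assumes a 1D chain of cells, where level-k ghost cells are the cells k positions past either end of the piece.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Half-open range [first, second) of cell ids.
using CellRange = std::pair<long long, long long>;

// Hypothetical helper: the block of cells a source could load for piece `piece`
// out of `numPieces` (the roles of UPDATE_PIECE_NUMBER and UPDATE_NUMBER_OF_PIECES).
// Piece sizes differ by at most one cell.
CellRange PieceCellRange(long long numCells, int piece, int numPieces) {
  assert(numPieces > 0 && piece >= 0 && piece < numPieces);
  return CellRange(numCells * piece / numPieces, numCells * (piece + 1) / numPieces);
}

// Hypothetical helper for a 1D chain of cells: grow the range by ghostLevels cells
// on each side (the role of UPDATE_NUMBER_OF_GHOST_LEVELS), clamped to the data set.
CellRange WithGhostLevels(CellRange r, int ghostLevels, long long numCells) {
  return CellRange(std::max(0LL, r.first - ghostLevels),
                   std::min(numCells, r.second + ghostLevels));
}
```

With 10 cells and 3 pieces, piece 1 covers cells [3, 6), and requesting one ghost level extends that to [2, 7). In a parallel run, the piece and number of pieces would typically come from the MPI rank and the number of ranks.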
These keys are usually set by a data consumer and possibly modified by filters upstream. Usually UPDATE_NUMBER_OF_GHOST_LEVELS is the only one modified by filters. It is also possible to set them manually as follows:
<source lang="cpp">
afilter->UpdateInformation();
vtkInformation* outInfo = afilter->GetOutputInformation(0);
outInfo->Set(vtkStreamingDemandDrivenPipeline::UPDATE_PIECE_NUMBER(), controller->GetLocalProcessId());
outInfo->Set(vtkStreamingDemandDrivenPipeline::UPDATE_NUMBER_OF_PIECES(), controller->GetNumberOfProcesses());
outInfo->Set(vtkStreamingDemandDrivenPipeline::UPDATE_NUMBER_OF_GHOST_LEVELS(), 0);

// or in short, using the static helper on the same output information object
vtkStreamingDemandDrivenPipeline::SetUpdateExtent(outInfo, controller->GetLocalProcessId(), controller->GetNumberOfProcesses(), 0);
</source>
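The way ghost level requests accumulate on their way upstream can be modeled with a few lines of standalone C++. This is a hypothetical model, not VTK code: `UpdateRequest` stands in for the three keys, `ToyFilter` for a filter that needs some ghost levels, and `PropagateToSource` walks from the consumer back to the source the way the RequestUpdateExtent pass does, adding each filter's requirement and leaving the piece number and number of pieces untouched.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the three keys carried by the RequestUpdateExtent pass.
struct UpdateRequest {
  int piece;        // plays the role of UPDATE_PIECE_NUMBER
  int numPieces;    // plays the role of UPDATE_NUMBER_OF_PIECES
  int ghostLevels;  // plays the role of UPDATE_NUMBER_OF_GHOST_LEVELS
};

// Hypothetical filter that needs some ghost levels on its input to produce correct output.
struct ToyFilter {
  int ghostLevelsNeeded;
};

// filters.front() reads from the source; filters.back() feeds the consumer.
// Walks upstream from the consumer and returns the request that reaches the source.
UpdateRequest PropagateToSource(const std::vector<ToyFilter>& filters, UpdateRequest request) {
  for (auto it = filters.rbegin(); it != filters.rend(); ++it) {
    assert(it->ghostLevelsNeeded >= 0);
    // Piece number and number of pieces pass through unchanged;
    // each filter adds the ghost levels it needs to what downstream asked for.
    request.ghostLevels += it->ghostLevelsNeeded;
  }
  return request;
}
```

For instance, if a consumer requests piece 1 of 4 with no ghost levels through a chain of filters needing 1, 0 and 2 ghost levels, the source is asked for piece 1 of 4 with 3 ghost levels.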