Proposals:GridComputing: Difference between revisions
Line 45: | Line 45: | ||
## '''Resampling''' Resampling is the process of taking one set of data sampled on a discrete lattice and interpolating the values of that data on another discrete lattice. Resampling is required in many areas of image analysis because most transformations that are applied to a data set will move the data points off the original pixel lattice and these new points will consequently need to be placed back on a new lattice in order to be displayed as an image. The following set of tasks will be performed: | ## '''Resampling''' Resampling is the process of taking one set of data sampled on a discrete lattice and interpolating the values of that data on another discrete lattice. Resampling is required in many areas of image analysis because most transformations that are applied to a data set will move the data points off the original pixel lattice and these new points will consequently need to be placed back on a new lattice in order to be displayed as an image. The following set of tasks will be performed: | ||
### Needs analysis of the current state of resampling in ITK | ### Needs analysis of the current state of resampling in ITK | ||
### Profiling of the current ITK algorithm to find areas which are relatively fast, those that are slow and can be optimized, and any true performance bottlenecks. | ### Profiling of the current ITK algorithm to find areas which are relatively fast, those that are slow and can be optimized, and any true performance bottlenecks. | ||
## '''Mattes Mutal Information Rigid Registration''' | ### Testing of both thread scaling performance of current algorithms (with multiple threads and different hardware platforms) on 1-16 CPU SMPs. | ||
### Development and implementation of a resampling algorithm for complex number data-types currently not supported by ITK, required for interpolation of Fourier transforms. | |||
### Thread-level parallelization with the resampling algorithms | |||
### CPU specific cache size, pipeline scheduling and instruction set optimizations for different hardware platforms. | |||
## '''Mattes Mutal Information Rigid Registration''' The Mattes MI rigid registration algorithm is a mutual-information based registration technique that uses a single spatial sample to calculate the marginal and joint probabilities at discrete points (or bins) uniformly spread within the dynamic range of the images. The following set of tasks will be performed: | |||
### Needs analysis of the current state of Mattes MI | |||
### Profiling of the current ITK algorithm to find areas which are relatively fast, those that are slow and can be optimized, and any true performance bottlenecks. | |||
### Testing of the sensitivity of this algorithm to initial conditions in the placement of data sets. | |||
### Comparison of the performance of Mattes MI with respect to other algorithms (such as Wells Viola MI) | |||
### Thread-level parallelization of the Mattes MI algorithm | |||
### Cpu-specific optimization of the Mattes MI algorithm for multiple architectures | |||
# Fourier Transform in ITK | # Fourier Transform in ITK | ||
# Grid protocols and Batch-level processing with ITK | # Grid protocols and Batch-level processing with ITK |
Revision as of 19:04, 21 August 2005
Development and extension of ITK to be applied to grid computing
For more information contact: Simon Warfield, Steve Pieper, Ron Kikinis
There is an increasing trend to deploy grid computing infrastructures to support computations on extremely large datasets like those found in the Visible Human Project. We therefore believe that it is important for critical aspects of the architecture of ITK revisited and refined to support the emerging standards in the grid computing community and to develop example applications to demonstrate the power of the ITK/grid combination in real-world research computing scenarios.
Task 1: Survey of existing grid infrastructure software including the Globus Toolkit and associated middleware. Needs analysis of the bottlenecks in the current ITK as applied to collaborator projects within the SPL. Creation of initial ITK on grid computing examples using task-level parallelization of large population data sets (n > 1000).
Other examples of task level parallelism include parametric search in which an algorithm is applied to a single or small number of data sets many times (n > 1000) with different parameter settings.
Task 2: Focus on optimizing the current ITK inner-loops and threading model for CPU architectures and compilers deployed as compute grid nodes. Work with computer science collaborators to identify bottlenecks that can benefit from machine-level code optimizations to accommodate issues such as cache size, special instruction sets, and memory bandwidth hierarchies in the context of a heterogeneous compute grid environment.
Cache size varies from CPU model to CPU model and can have a tremendous impact upon performance. Also, utilization of vector instruction sets such as SSE, and scheduling and branch prediction must be carefully optimized for the specific CPU model, instruction cache size, data cache size and scheduling pipeline for best performance. Compilers, such as GCC or the Intel compiler, can generate optimized code that outperforms hand-tuned code. Our intention is to investigate the potential for performance gains on certain platforms through the generation of cache-optimized, instruction-set optimized, scheduler optimized binaries. We will also test the accuracy and precision of highly optimized code to ensure correct results are obtained.
We also want to optimize and accelerate specific algorithms, such as STAPLE. We think that at this stage, performance boosts can come from optimized thread parallel implementations of key algorithms and inner loops.
Task 3: Integrate message passing parallelism using the globus compatible MPI implementations (mpich-g2 or its successors) first to distribute ITK processing pipelines to among grid processing nodes and second to break individual algorithms across nodes.
The grid protocols have been undergoing tremendous change recently. There is no clear preferred protocol for distributing ITK processing pipelines among grid processing nodes at this time. Scheduling and access across Teragrid nodes has also been evolving. Some Teragrid sites are using tools such as PBS for managing jobs. We will focus our efforts on utilizing the protocols which are most stable and provide for the most significant benefits for the effort invested.
Task 4: Ongoing support of ITK grid tools at the task, threading, and message passing layers. Review of any newly developed grid middleware emerging during the project period and rework of ITK architecture to accommodate same.
In our initial effort we used Image Guided Neurosurgery as the clinical application driver, and optimized algorithms and performance especially suited for this application. This application requirements range from the need for highly optimized special instruction set inner loops and thread parallel execution for specific algorithms, such as intrasubject registration, to task level parallelism inherent in the statistical atlas construction that underlies robust segmentation methods. This also leverages our NSF funded ITR 0426558 A Novel Grid Architecture Integrating Real-Time Data and Intervention During Image Guided Therapy which provides active collaboration with Teragrid sites and access to expertise in the evolving grid protocols. We intend to maintain these application specific optimizations as ITK evolves, while adding new application drivers that ensure the changes we make are relevant and will have significant clinical impact.
Compute platforms
We would like to get feedback about the compute platforms used in the ITK community. In particular, we would like to know what effort on which hardware platforms would have the largest impact on the community.
CPU model family, user count:
Intel : 1 AMD : 0 SPARC : 0
Specific Tasks
- Thread-level parallelization of three key algorithms
- STAPLE STAPLE is an expectation-maximization algorithm for the simultaneous truth and performance level estimation of a collection of segmentations via the computation of a probabilistic estimate of the true segmentation and a measure of the performance level represented by each segmentation. The source of each segmentation in the collection may be an appropriately trained human rater or raters, or may be an automated segmentation algorithm. The probabilistic estimate of the true segmentation is formed by estimating an optimal combination of the segmentations, weighting each segmentation depending upon the estimated performance level, and incorporating a prior model for the spatial distribution of structures being segmented as well as spatial homogeneity constraints. The following tasks will be performed:
- Needs analysis of the current state of STAPLE in ITK. The current implementation is from the algorithm published by Warfield, Zou and Wells in MICCAI 2002 which is limited to binary labels. We will extend the implementation to allow for multi-category labels as described in (Warfield, Zou, Wells IEEE TMI 2004).
- Profiling of the current STAPLE algorithm to find areas which are relatively fast, those that are slow and can be optimized, and any true performance bottlenecks.
- Evaluation of thread scaling performance as a function of number of CPUs (1 to 16 CPUs on SMP machines).
- Evaluation of impact of different hardware platform on performance.
- Thread-level parallelization within STAPLE.
- CPU specific cache size, pipeline scheduling and instruction set optimizations for different hardware platforms.
- Resampling Resampling is the process of taking one set of data sampled on a discrete lattice and interpolating the values of that data on another discrete lattice. Resampling is required in many areas of image analysis because most transformations that are applied to a data set will move the data points off the original pixel lattice and these new points will consequently need to be placed back on a new lattice in order to be displayed as an image. The following set of tasks will be performed:
- Needs analysis of the current state of resampling in ITK
- Profiling of the current ITK algorithm to find areas which are relatively fast, those that are slow and can be optimized, and any true performance bottlenecks.
- Testing of both thread scaling performance of current algorithms (with multiple threads and different hardware platforms) on 1-16 CPU SMPs.
- Development and implementation of a resampling algorithm for complex number data-types currently not supported by ITK, required for interpolation of Fourier transforms.
- Thread-level parallelization with the resampling algorithms
- CPU specific cache size, pipeline scheduling and instruction set optimizations for different hardware platforms.
- STAPLE STAPLE is an expectation-maximization algorithm for the simultaneous truth and performance level estimation of a collection of segmentations via the computation of a probabilistic estimate of the true segmentation and a measure of the performance level represented by each segmentation. The source of each segmentation in the collection may be an appropriately trained human rater or raters, or may be an automated segmentation algorithm. The probabilistic estimate of the true segmentation is formed by estimating an optimal combination of the segmentations, weighting each segmentation depending upon the estimated performance level, and incorporating a prior model for the spatial distribution of structures being segmented as well as spatial homogeneity constraints. The following tasks will be performed:
- Mattes Mutal Information Rigid Registration The Mattes MI rigid registration algorithm is a mutual-information based registration technique that uses a single spatial sample to calculate the marginal and joint probabilities at discrete points (or bins) uniformly spread within the dynamic range of the images. The following set of tasks will be performed:
- Needs analysis of the current state of Mattes MI
- Profiling of the current ITK algorithm to find areas which are relatively fast, those that are slow and can be optimized, and any true performance bottlenecks.
- Testing of the sensitivity of this algorithm to initial conditions in the placement of data sets.
- Comparison of the performance of Mattes MI with respect to other algorithms (such as Wells Viola MI)
- Thread-level parallelization of the Mattes MI algorithm
- Cpu-specific optimization of the Mattes MI algorithm for multiple architectures
- Mattes Mutal Information Rigid Registration The Mattes MI rigid registration algorithm is a mutual-information based registration technique that uses a single spatial sample to calculate the marginal and joint probabilities at discrete points (or bins) uniformly spread within the dynamic range of the images. The following set of tasks will be performed:
- Fourier Transform in ITK
- Grid protocols and Batch-level processing with ITK