VTK/Wrapper Update 2010

The new wrappers for VTK are not really "new", but they are a drastic clean-up of the original wrappers, which were completed circa 1998. The wrapper renewal project is currently an open project, with four definite goals and a long wish list of desired new features. The four main goals of this project are:

  1. Clean up the wrapper code by removing hard-coded hexadecimal constants and reducing the amount of code voodoo.
  2. Properly wrap vtkStdString, because it is a crucial interface type.
  3. Wrap vtkVariant in Python, especially for use in ParaView.
  4. Eliminate the need for BTX/ETX markers in the code.

The original author of this document is David Gobbi. He can be reached on the VTK developers mailing list.

Overview

The main design goals for VTK wrappers have not changed since 1998. The wrappers must be:

  1. Scalable to a very large number of classes in multiple directories.
  2. Able to wrap VTK classes as automatically as possible, with a minimal amount of hinting.
  3. Able to support multiple wrapper back-ends for different wrapper languages.

The core of the wrapper is a lex/yacc parser that reads C++ header files and stores information about the classes in C data structures that can be used by the wrapper-generator back-ends. This parser and its data structures are what have received the most attention during this wrapper update. Some important points about the parser (both the new and the old) are as follows:

  1. It only parses the header file in question; it does not pull in the included header files. Correction: it does, since VTK 5.10 (see below).
  2. It understands all (or nearly all) of the VTK macros defined in vtkSetGet.

These two points are important for the efficiency and simplicity of the parser. The parser does not have a C preprocessor, and it does not read more than one file at a time. Instead, it relies on its built-in knowledge of the VTK macros. The new parser front-end does have the ability to read and parse multiple header files, but this feature is only used by the vtkWrapHierarchy tool to generate a text file describing the VTK hierarchy; it is not used by the individual wrapper-generators. (Note: As of Aug 2010, the parser does have a preprocessor, and it does pull in the included header files, but it only searches those files for macro definitions; it does not parse them.)
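As a rough illustration of what "built-in knowledge of the VTK macros" means, the sketch below maps three macros from vtkSetGet.h to the method signatures they declare. The real parser does this inside its lex/yacc grammar, not with a regular expression; the table `MACRO_METHODS` and the function `methods_from_macro` are hypothetical names used only for this example, and only a small subset of the macros is covered.

```python
import re

# Method signatures declared by a few vtkSetGet.h macros (illustrative subset).
MACRO_METHODS = {
    "vtkSetMacro": ["void Set{name}({type})"],
    "vtkGetMacro": ["{type} Get{name}()"],
    "vtkBooleanMacro": ["void {name}On()", "void {name}Off()"],
}

# Matches invocations of the form: vtkXxxMacro(Name, type);
MACRO_CALL = re.compile(
    r"^\s*(vtk\w+Macro)\s*\(\s*(\w+)\s*,\s*([^)]+?)\s*\)\s*;?\s*$"
)


def methods_from_macro(line):
    """Return the method signatures implied by one macro invocation."""
    match = MACRO_CALL.match(line)
    if match is None:
        return []
    macro, name, ctype = match.groups()
    templates = MACRO_METHODS.get(macro, [])  # unknown macros yield nothing
    return [t.format(name=name, type=ctype) for t in templates]


print(methods_from_macro("vtkSetMacro(Radius, double);"))
print(methods_from_macro("vtkBooleanMacro(Capping, int);"))
```

Because the macro table is built in, no preprocessor is needed to discover these methods, which is the design trade-off described above.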

The four main items

The big cleanup

The first part of the cleanup was to remove all the hexadecimal constants like 0x303 from the files in the Wrapping directory, and replace them with named constants defined in a new header file called vtkParseType.h. This was a tedious job, but by itself it was enough to make the code much more readable. For example, VTK_PARSE_CHAR_PTR is obvious, while 0x303 is not.
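The readability gain can be seen in a small sketch. The split of 0x303 into a base-type part and a pointer part is an assumption made for this example; the authoritative definitions are in vtkParseType.h, and the helper function names here are hypothetical.

```python
# Assumed decomposition, chosen so that CHAR | POINTER == 0x303 as in the
# example above. The real values are defined in vtkParseType.h.
VTK_PARSE_CHAR = 0x003
VTK_PARSE_POINTER = 0x300
VTK_PARSE_CHAR_PTR = VTK_PARSE_CHAR | VTK_PARSE_POINTER


def is_c_string_old_style(type_code):
    # The reader has to know what the magic number means.
    return type_code == 0x303


def is_c_string_new_style(type_code):
    # The named constant documents itself.
    return type_code == VTK_PARSE_CHAR_PTR


print(hex(VTK_PARSE_CHAR_PTR))
```

Both functions behave identically; only the second can be understood without looking up the encoding.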

The second stage of cleanup was to create a new vtkParseMain.c to hold a shared "main()" method for the wrapper-generators and to provide a better way of parsing command-line arguments. This new file gives the wrapper-generators the ability to receive "-I" arguments so that they can access all of the VTK include directories, though none of the VTK wrapper-generators utilize this feature yet. The coding style of vtkWrapPython.c was also improved, by using consistent method naming and by eliminating global variables.

The third, and most crucial, part of the cleanup was a reorganization of vtkParse.y and vtkParse.l, which hold the yacc and lex rules for the parser, as well as an update of the data structures in vtkParse.h in which the class information is stored. These files were close to incomprehensible, and though they are still very difficult to read and understand, they are at least manageable now. Also, the hard-coded limitations in these files have been removed, and the data structures have been updated to capture the full richness of C++ types.

Note that many of these improvements to the parser have not yet been propagated to the wrapper-generators for Python, Tcl, and Java. For example, the new parser stores information about template types, multi-dimensional arrays, enums, preprocessor constants and constant variables, namespaces, typedefs, etcetera. It will still require a substantial programming effort to implement these features in the language wrappers.

Wrapping vtkStdString

The vtkStdString type was introduced in VTK 5.0, as a VTK-standard subclass of std::string. It was initially wrapped via the expedient of adding a "const char *" typecast operator to it so that the wrappers could simply treat vtkStdString return values as if they were "const char *". This trick unfortunately only works for methods that return "vtkStdString&", i.e. methods that return a reference to a persistent string. As a result, VTK methods that returned vtkStdString by value had to be surrounded by BTX/ETX because, if they were wrapped, they would return a temporary vtkStdString object to the wrappers, which would then grab a pointer to the internal "char *", which would immediately become invalid. This issue of having to BTX/ETX methods that return vtkStdString persisted from 2005 to 2010, with only a select few methods that returned "vtkStdString&" being properly wrapped. The original addition of vtkStdString to the wrappers is logged as follows:

ENH: Wrap vtkStringArray by adding vtkStdString as a special token and mapping 
it to "const char *" in the wrappers.  vtkStringArray::GetValue() was changed to 
return a reference because otherwise c_str() is called on a temporary 
vtkStdString object.
dgobbi (author) May 21, 2005

In the new wrappers, vtkStdString (and vtkstd::string and std::string) are recognized by the parser as a distinct type, rather than as "const char *", so now all VTK methods that use vtkStdString can be properly and safely wrapped. The parser also recognizes vtkUnicodeString, but only the Python wrappers handle this type. In the Python wrappers, vtkUnicodeString is synonymous with Python's unicode type, with automatic conversion between the two.

Wrapping vtkVariant

The vtkVariant type is a VTK type that can hold any of the types commonly used in VTK, such as the C++ numeric types, vtkObjects, vtkStdString, and vtkUnicodeString. It is, in other words, an interface to a union of these types. An increasing number of classes in VTK use it as an interface type, so there was a strong interest in wrapping it, particularly for use in ParaView's Python scripting engine. The new wrappers make vtkVariant available in Python, but not in Java or Tcl (and not, as of yet, in ParaView's ClientServer wrapper).

Two approaches could have been taken for wrapping vtkVariant. The first approach would have been to make vtkVariant invisible from Python, i.e. methods taking vtkVariant arguments would automatically convert the given Python type into a vtkVariant, and methods returning vtkVariant would automatically convert the vtkVariant to a native Python type (or to a vtkObject). The second approach was to explicitly wrap vtkVariant and make it possible to construct and use vtkVariant objects within Python. This latter approach was taken, because it makes Python VTK code much easier to compare with and convert to C++ VTK code.

One concession was made, however. The VTK/Python wrappers were modified to support automatic argument conversion via the vtkVariant constructors. So if a VTK method accepts a vtkVariant, then you can pass a numeric value, a string, a unicode string, or a vtkObject and the vtkVariant will be automatically constructed and passed as an argument. This kind of implicit argument conversion is standard in C++ but is not something Python normally does; the VTK/Python wrappers are the exception.
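The idea of converting arguments through a constructor can be sketched in pure Python. The class `Variant` below is a hypothetical stand-in for vtkVariant, and the decorator `accepts_variant` is an invented helper; neither is part of VTK, and the real wrappers perform this conversion in generated C code.

```python
import functools


class Variant:
    """Hypothetical stand-in for vtkVariant: holds one value of a supported type."""

    SUPPORTED = (int, float, str)

    def __init__(self, value):
        if isinstance(value, Variant):
            value = value.value  # copy construction
        elif not isinstance(value, self.SUPPORTED):
            raise TypeError(f"cannot build a Variant from {type(value).__name__}")
        self.value = value


def accepts_variant(func):
    """Convert every positional argument to a Variant before calling func."""

    @functools.wraps(func)
    def wrapper(*args):
        return func(*(Variant(a) for a in args))

    return wrapper


@accepts_variant
def describe(v):
    # Inside the method, the argument is always a Variant.
    return f"{type(v.value).__name__}: {v.value}"


print(describe(3))
print(describe("abc"))
print(describe(Variant(2.5)))
```

The caller may pass either a plain value or an explicitly constructed `Variant`, which mirrors the choice described above: the type is visible and constructible, but does not have to be constructed by hand.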

The method used to wrap vtkVariant is generic, and can be applied to other special VTK types. Currently the special-wrapped types for VTK/Python are vtkVariant, vtkTimeStamp, vtkArrayCoordinates, vtkArrayExtents, and vtkArrayRange.

Also see the following project page: Wrapping special types (start Apr 28, 2010, finish Jun 18, 2010)

Eliminating BTX/ETX from VTK header files

There were two main uses for BTX/ETX in the VTK header files. The first use was to block off code that the VTK wrapper parser could not parse, since it did not understand all C++ syntax. The second use was to block off methods that, if they were wrapped, would cause the wrappers to either refuse to compile, or to compile and then segfault if the method was called. The new wrappers tackle both of these issues in order to make it possible to remove BTX/ETX from the code. The main feature of the new parser is that it is a full C++ parser, and is likely to only be confused by the use of unrecognized preprocessor macros (since the wrapper's parser lacks a true preprocessor).

The second issue, i.e. the problem of wrapped methods either not compiling or segfaulting when used, was due to the inability of the wrappers to properly recognize anything but basic C types and vtkIdType. When the wrappers saw a vtkSomething as an argument, they would always assume that this was a vtkObjectBase-derived type. There are only two ways for the wrappers to be able to figure out types: the first is to have them go through all included header files and look for class definitions and typedefs, and the second is for them to be given a list of types that they can consult.

This "list of types" is provided by the new vtkWrapHierarchy tool, which has its own project page. The vtkWrapHierarchy tool reads all the VTK header files in one go, and spits out a file that lists all the classes, typedefs, and enums that are defined within the kit. The wrappers then read this file and use it to properly wrap method arguments and return types.
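A minimal sketch of how a wrapper-generator might consult such a list follows. The text format used in `HIERARCHY_TEXT` is invented for this example and is not the actual vtkWrapHierarchy output format; the function names are likewise hypothetical.

```python
# Hypothetical listing: "name : superclass" for derived classes,
# "name = kind" for non-class types, and a bare name for a root class.
HIERARCHY_TEXT = """\
vtkObjectBase
vtkObject : vtkObjectBase
vtkAlgorithm : vtkObject
vtkVariant
vtkStdString
vtkIdType = typedef
"""


def load_hierarchy(text):
    """Parse the listing into {name: (kind, superclass_or_None)}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if " = " in line:
            name, kind = line.split(" = ", 1)
            entries[name] = (kind, None)
        elif " : " in line:
            name, parent = line.split(" : ", 1)
            entries[name] = ("class", parent)
        else:
            entries[line] = ("class", None)
    return entries


def is_vtk_object(name, entries):
    """True if name is vtkObjectBase or derives from it."""
    seen = set()
    while name in entries and name not in seen:
        if name == "vtkObjectBase":
            return True
        seen.add(name)
        kind, parent = entries[name]
        if kind != "class" or parent is None:
            return False
        name = parent
    return False


entries = load_hierarchy(HIERARCHY_TEXT)
for type_name in ("vtkAlgorithm", "vtkVariant", "vtkIdType"):
    print(type_name, is_vtk_object(type_name, entries))
```

With a lookup like this, a vtkSomething argument no longer has to be assumed to be a vtkObjectBase-derived type.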

Wish list items

The four items listed above are the only items that are certain to be done for the VTK 6.0 release. There are several desired features, however, that might be added as resources allow:

A vtkWrapUtil.c to share between the wrapper-generators

There is a lot of code duplication between the wrappers with minor variations. This makes maintenance of the wrappers very difficult, particularly where external wrapper-generators like ParaView's vtkWrapClientServer are concerned. The utility file should contain the following shared utility functions:

  1. A check for whether a method is wrappable, given the types that a particular wrapper-generator supports. Right now there is a lot of very messy code in the wrapper-generators for excluding unwrappable methods.
  2. A check for excluding method overloads that use different C++ types that map to the same wrapper-language type. For example, there are many VTK method overloads for "float" and "double", where only the "double" overload should be wrapped.
  3. A method for formatting comments for use by the wrappers. (done, see vtkWrapText.c)
  4. A method for generating temp variables for storing wrapped method arguments and return values. (done, see vtkWrap.c)

The existence of these utility functions would also help to force a certain level of consistency between the various wrapper languages.
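The first two checks in the list above could look roughly like the following sketch. The table `TARGET_TYPE`, its ranks, and the function `prune_overloads` are hypothetical; a real implementation would be written in C and would work on the parser's data structures instead of on strings.

```python
# Hypothetical mapping from C++ types to (wrapper-language type, rank).
# A lower rank marks the preferred C++ type when several map to one target.
TARGET_TYPE = {
    "double": ("float", 0),
    "float": ("float", 1),
    "int": ("int", 0),
    "short": ("int", 1),
    "const char *": ("str", 0),
}


def prune_overloads(overloads):
    """Keep one overload per wrapper-language signature.

    overloads is a list of (method_name, [C++ argument types]).
    Overloads that use an unsupported type are dropped as unwrappable.
    """
    best = {}
    for name, args in overloads:
        if any(arg not in TARGET_TYPE for arg in args):
            continue  # check 1: the method is not wrappable
        key = (name, tuple(TARGET_TYPE[arg][0] for arg in args))
        rank = sum(TARGET_TYPE[arg][1] for arg in args)
        if key not in best or rank < best[key][0]:
            best[key] = (rank, args)  # check 2: prefer e.g. double over float
    return [(key[0], value[1]) for key, value in best.items()]


methods = [
    ("SetPoint", ["float", "float", "float"]),
    ("SetPoint", ["double", "double", "double"]),
    ("SetValue", ["int"]),
    ("SetData", ["vtkUnknown *"]),
]
print(prune_overloads(methods))
```

Keeping this logic in one shared place is what would let every wrapper language exclude the same methods for the same reasons.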

Complete vtkParse.y and vtkParse.l

The new parser is able to parse virtually any C++ code, but there are a few bits of information that it does not yet store for use by the wrappers. The short list of definitions that are not stored for use by the wrappers is as follows:

  1. Nested classes
  2. Unions (done!)
  3. Anonymous structs and classes
  4. Typedefs that include a struct, class, or enum definition within the typedef statement
  5. Variables (done!)

None of these items are used as part of the VTK interface, which is why they are not yet supported. However, it would still be nice if vtkParse.y were complete in this regard.

A more serious problem with the parser was that it lacked a preprocessor. This has changed, and now vtkParse.l will pass all preprocessor directives to vtkParsePreprocess.c for evaluation. This means that #if conditionals are correctly processed, and #define statements add macros that the #if directives can use. Macros are not expanded, since they need to be passed along to the parser for use by the wrappers. The preprocessor has its own simple parser that can evaluate macro definitions and provide numerical (integer) results.
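The behaviour described here (conditionals are evaluated, #define statements are recorded, but macros in the code are left unexpanded) can be sketched as follows. This is not the logic of vtkParsePreprocess.c; it is a simplified illustration that only handles #define, #ifdef, #ifndef, #else, #endif, and an #if whose condition is a single integer or macro name. The function names and the macro `USE_FEATURE` are hypothetical.

```python
def evaluate(token, macros, depth=0):
    """Integer value of a literal or macro name; undefined names are 0, as in C."""
    try:
        return int(token, 0)
    except ValueError:
        pass
    if token in macros and depth < 16:  # depth limit guards against cycles
        return evaluate(macros[token], macros, depth + 1)
    return 0


def preprocess(lines, macros=None):
    """Return (kept_lines, macros). Kept lines are passed through unexpanded."""
    macros = dict(macros or {})
    stack = []  # one (parent_active, condition) pair per open conditional
    kept = []
    active = True
    for line in lines:
        words = line.strip().split()
        directive = words[0] if words and words[0].startswith("#") else None
        if directive in ("#ifdef", "#ifndef", "#if"):
            if directive == "#ifdef":
                cond = words[1] in macros
            elif directive == "#ifndef":
                cond = words[1] not in macros
            else:
                cond = evaluate(words[1], macros) != 0
            stack.append((active, cond))
            active = active and cond
        elif directive == "#else":
            parent, cond = stack[-1]
            active = parent and not cond
        elif directive == "#endif":
            active, _ = stack.pop()
        elif directive == "#define":
            if active:
                macros[words[1]] = words[2] if len(words) > 2 else ""
        elif active:
            kept.append(line)  # code and other directives pass through
    return kept, macros


source = [
    "#define USE_FEATURE 1",
    "#if USE_FEATURE",
    "void NewMethod();",
    "#else",
    "void OldMethod();",
    "#endif",
    "#ifdef MISSING",
    "void Hidden();",
    "#endif",
    "vtkSetMacro(Radius, double);",
]
kept, macros = preprocess(source)
print(kept)
```

Note that the `vtkSetMacro` line reaches the output untouched, which matches the point above that macros must be passed along to the parser instead of being expanded.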

Support doxygen comments in the parser

The markers for doxygen comments are fairly simple: /**, /*!, ///, //!, ///<, //@{, etcetera. Making the parser recognize these comments and attach them to objects that it parses would be a nice little project. It would have minimal initial effect since VTK has its own comment style, but it could pave the way to VTK eventually switching over to doxygen completely. The new Python wrappers already have the ability to recognize and deal with many doxygen text-formatting statements.
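A minimal sketch of "recognize these comments and attach them to objects" is given below. It handles only the `///` and `//!` line-comment forms that precede a declaration, ignores the block and trailing-member forms, and uses a hypothetical function name; a real implementation would live in the lexer, vtkParse.l.

```python
import re

# "///" or "//!" at the start of a line, but not "////" (a plain comment)
# and not the trailing-member forms "///<" and "//!<".
DOXY_LINE = re.compile(r"^//[/!](?![/<])\s?(.*)$")


def attach_comments(lines):
    """Map each declaration to the doxygen text found directly above it."""
    docs = {}
    pending = []
    for line in lines:
        text = line.strip()
        match = DOXY_LINE.match(text)
        if match:
            pending.append(match.group(1).strip())
        elif text.startswith("//") or not text:
            pending = []  # a plain comment or blank line breaks the association
        else:
            if pending:
                docs[text] = " ".join(pending)
            pending = []
    return docs


header = [
    "/// Set the radius of the sphere.",
    "//! The default is 0.5.",
    "void SetRadius(double r);",
    "",
    "// plain comment",
    "double GetRadius();",
]
print(attach_comments(header))
```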

Generalized special type wrapping for Python

The technique that was used to wrap vtkVariant in Python is general and extensible. It only requires that the wrapped types have a public copy constructor and assignment operator; these are needed because the wrappers always use pass-by-value for these types in order to avoid memory management issues.

A few highly desirable features for special-type wrapping are:

  1. support for hierarchies of special types (i.e. vtkValue, vtkDataValue)
  2. support for templated VTK types in Python
  3. support for more operators (right now only comparison operators are wrapped)
  4. support for handling pointers to these types

Also see this wiki page: Python wrapper enhancements

Wrapping of constants in Tcl and Java

The new parser provides a list of all constants defined in the header files. These constants are automatically wrapped for Python, but should be wrapped in all of the wrapper languages.

Unicode in Java, Tcl

Adding vtkUnicodeString support to the Python wrappers was a simple feat that required only a few hours of work. It should be similarly easy to add support for Tcl and Java.

Cleanup of the Java and Tcl wrappers

The worst code-style offences in vtkWrapJava.c and vtkWrapTcl.c should be fixed by: 1) eliminating global variables, 2) reducing code duplication, and 3) using better method names and improving code documentation.

Unified wrapper for VTK and ITK?

It is uncertain whether adding recognition of ITK macros to the parser would be enough to make it able to parse ITK header files. It would also have to deal with VNL header files, and would have to recognize all the basic STL container types. It might, in fact, be necessary to make it recognize #include directives and parse through all included header files so that it can keep track of all typedefs and other relevant information. The parser does, however, already handle templates and typedef statements.

The back-end wrapper generators for Python and Tcl would need to be modified to utilize the template information and all the other ITK-relevant information that the parser provides.

The advantage to this approach is that it would be possible to wrap classes that utilize both ITK and VTK types in their interfaces. It could also make wrapper compilation for ITK much faster, and in general it would make it much easier to use ITK and VTK together in the wrapper languages.