VTK/Wrapping Special Types


This project began on April 28, 2010 and arose from a desire to improve the ability of the VTK wrappers to handle special VTK types, i.e. types that are not derived from vtkObjectBase. The primary goals are to wrap vtkVariant, and to fix the wrapping of vtkStdString since the current code will sometimes push an invalid "const char *" to the wrapper language. The wrapping of vtkVariant will be done via a general-purpose mechanism that can be easily extended to other special classes in VTK.

Background

There are some very useful classes in VTK that are not derived from vtkObjectBase and are therefore not wrapped. Some, like vtkTimeStamp, are trivial classes. Others, like vtkVariant, are more comprehensive. The vtkStdString isn't wrapped, but is coerced to "const char *" by the wrappers in a manner that can leave dangling references to temporary objects.

State of the code

The wrapper generator code is a mess of hexadecimal literals, static variables, and badly-named functions. It should be updated to VTK code standards, in both the front-end and the back-end. This process has been started by the addition of vtkParseType.h, which defines useful constants for the wrapper code.

The vtkStdString issue

When vtkStdString was introduced to VTK, support for it was added to the wrappers in the following manner:

If a VTK method returned a vtkStdString or vtkStdString&, it would be stored in a "const char *" variable. Because type conversion from "string" to "char *" is automatic, this required minimal changes to the wrappers. Code that originally handled "const char *" would now handle strings, as well.

This scheme works fine for methods that return "vtkStdString&" because the string is stored somewhere and is guaranteed to be around at least until the contents of the "char *" are copied or otherwise used by the wrapper language. However, methods returning "vtkStdString" are problematic, because they create a temporary vtkStdString object from which the wrappers then get a "char *". The temporary is destroyed at the end of the full expression in which the "char *" is acquired, so the pointer is dangling before the wrappers have a chance to use it, even if the freed memory sometimes still holds the old contents.

The fix should be straightforward. When a method returns a vtkStdString, the wrappers should store it in a vtkStdString variable that remains valid until the wrappers have copied the contents.

Changes Needed

New parser types for vtkStdString and vtkUnicodeString

These two types must be handled transparently by the wrapper languages: we want vtkStdString to be automatically converted to the wrapper language's native string type, as opposed to defining a special "vtkStdString" type in the wrapper language. In other words, we need to add two slots to the list of parse types. Unfortunately, because the parser types are enumerated as single hexadecimal digits, there are only 15 slots available, and 14 of those slots are used.

The parser internally uses 32-bit ints for types. The first 16 bits are used for the array count, e.g. "float arg[count]". The last 16 bits are used to store four hexadecimal digits that describe the following:

  • 1st digit: const (0x1) or static (0x2) or function pointer (0x5)
  • 2nd digit: reference (0x1) or pointer (0x3) and variations thereof up to 0x9
  • 3rd digit: unsigned (0x1)
  • 4th digit: base type (0x1 through 0xE)

This is poor usage of the available bits. If we squash the bitfield to remove bits that are always zero, then a full 8 bits or two hexadecimal digits can be reserved for the base type.
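As a rough illustration of the layout described above, the following Python sketch decodes the four hexadecimal digits of the old type field and shows one possible squashed layout. The mask names, the placement of the array count in the high 16 bits, and the squashed bit positions are all assumptions made for this example; the actual values live in vtkParseType.h and may differ.

```python
# Old layout as described above: four hex digits describe the type.
# Mask names are hypothetical; the count is assumed to sit in the high 16 bits.
OLD_QUALIFIER = 0xF000   # 1st digit: const, static, function pointer
OLD_INDIRECT = 0x0F00    # 2nd digit: reference, pointer, and variations
OLD_UNSIGNED = 0x00F0    # 3rd digit: unsigned
OLD_BASE = 0x000F        # 4th digit: base type, only 15 nonzero values


def decode_old(type_bits):
    """Split a 32-bit parser type into its fields (old layout)."""
    return {
        "count": (type_bits >> 16) & 0xFFFF,
        "qualifier": (type_bits & OLD_QUALIFIER) >> 12,
        "indirection": (type_bits & OLD_INDIRECT) >> 8,
        "unsigned": (type_bits & OLD_UNSIGNED) >> 4,
        "base": type_bits & OLD_BASE,
    }


def pack_squashed(qualifier, indirection, unsigned, base):
    """One possible squashed layout (an assumption, not the real header).

    The unsigned digit only ever holds 0 or 1, so it needs a single bit.
    That leaves 8 bits for the base type within the same 16 bits:
    bits 0-7 base, bit 8 unsigned, bits 9-12 indirection, bits 13-15 qualifier.
    """
    if not (0 <= base <= 0xFF and unsigned in (0, 1)):
        raise ValueError("field out of range")
    if not (0 <= indirection <= 0xF and 0 <= qualifier <= 0x7):
        raise ValueError("field out of range")
    return (qualifier << 13) | (indirection << 9) | (unsigned << 8) | base


fields = decode_old(0x1313)          # const, pointer, unsigned, base type 0x3
packed = pack_squashed(1, 3, 1, 0x21)  # base type 0x21 would not fit in one digit
```

In this sketch the packed value still fits in 16 bits, yet the base type can take any of 255 nonzero values instead of 15.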

I have added a vtkParseType.h header file that defines the bitfields for the types, and have modified vtkParse so that it uses this header file. So it is now possible to change the type bitfields simply by changing vtkParseType.h.

Once this has been done, vtkStdString and vtkUnicodeString will be visible to vtkWrapPython.c and the other wrapper back-ends. The back-ends will have to be modified to convert back and forth between these types and the native string types of the wrapper languages.

Question: do we also want to wrap vtkstd::string and std::string?

Marking special classes in CMake

Unlike vtkStdString, we want vtkVariant to appear in the wrapper languages as a type called "vtkVariant" that has the same methods as the C++ class. In other words, we want it to be wrapped in a similar manner (but with some distinct differences) to the vtkObjectBase-derived classes.

The differences between wrapping "special classes" like vtkVariant vs. vtkObjectBase-derived classes are:

  • special classes must not be added to the vtkInstantiators
  • special classes must always be handled with copy semantics in the wrappers, since they don't have reference counts
  • the wrapper languages will probably not handle polymorphism for special classes

Because of these differences, there must be a way of marking special classes within CMake. Right now, their header files are simply marked as "WRAP_EXCLUDE". They must instead be marked as "WRAP_SPECIAL" so that they can be kept out of the instantiators, while being sent to whatever language wrappers are able to handle them. It is likely that support for special classes will only be added to the python wrappers (unless there are volunteers for the other wrapper languages).

It will probably be best if setting WRAP_SPECIAL implicitly sets WRAP_EXCLUDE, and then wrapper generators that know they are smart enough to wrap special types will ignore the WRAP_EXCLUDE if WRAP_SPECIAL is also set. By doing this, we can avoid breaking backwards compatibility with any third-party wrapper generators out there that are unable to wrap special types.
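The intended decision logic can be sketched in a few lines of Python. This is only a model of the rule described above, not CMake code: the function name and the idea of passing the header's properties as a set are hypothetical.

```python
def should_wrap(properties, handles_special_types):
    """Decide whether a wrapper generator should process a header.

    `properties` is the set of property names set on the header file.
    Under the proposal, WRAP_SPECIAL always comes together with WRAP_EXCLUDE.
    """
    if "WRAP_SPECIAL" in properties:
        # Generators that understand special types ignore WRAP_EXCLUDE here;
        # older generators never look at WRAP_SPECIAL and skip the header.
        return handles_special_types
    return "WRAP_EXCLUDE" not in properties


special_header = {"WRAP_SPECIAL", "WRAP_EXCLUDE"}
```

A generator that predates WRAP_SPECIAL behaves like `handles_special_types=False`, which is why backwards compatibility is preserved.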

Python Specifics

PyVTKSpecialObjectType

The python wrappers already have a PyVTKSpecialObjectType that was originally developed for this purpose several years ago. However, it was unused because it was decided that it would be more expedient to only wrap vtkObjectBase-derived types.

To properly use this new "special object" type, vtkPythonUtil.cxx needs a new map that can be used to store information about these special types. This will map the class name to a PyVTKSpecialTypeInfo struct, which will contain a pointer to a table of all the methods for the type, along with other important information such as the docstring.

For now, at least, there will be no polymorphism, i.e. the PyVTKSpecialTypeInfo struct will not provide info about the superclass. This is something that could be added in the future.

Each PyVTKSpecialObject will contain a pointer to its own copy of the underlying C++ object. That is, if Python ever receives a vtkVariant as a return value from a VTK method, then it will make a PyVTKSpecialObject for that vtkVariant, and then will use the copy constructor to make its own copy. The use of copy semantics will eliminate the need for a garbage collection scheme.
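The following Python sketch models the two ideas above: a map from class name to type information, and a wrapper object that owns a private copy of the value. The names SpecialTypeInfo, special_types, register_special_type and SpecialObject are hypothetical stand-ins, and a plain dictionary plays the role of the C++ vtkVariant; the real structures are C/C++ and will look different.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SpecialTypeInfo:
    """Hypothetical Python analogue of the PyVTKSpecialTypeInfo struct."""
    name: str
    docstring: str
    methods: dict = field(default_factory=dict)  # method name -> callable


# Map from class name to type info, analogous to the map proposed above.
special_types = {}


def register_special_type(info):
    special_types[info.name] = info


class SpecialObject:
    """Wrapper that owns a private copy of the wrapped value."""

    def __init__(self, class_name, value):
        self.info = special_types[class_name]  # KeyError if unregistered
        # deepcopy stands in for calling the C++ copy constructor.
        self.value = copy.deepcopy(value)


register_special_type(SpecialTypeInfo("vtkVariant", "A type-tagged value."))
original = {"type": "int", "data": [42]}  # toy stand-in for a C++ vtkVariant
wrapped = SpecialObject("vtkVariant", original)
original["data"][0] = 0  # later changes to the original do not reach the copy
```

Because the wrapper never shares storage with the object it was created from, nothing has to track who else might still be using that object.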

Resolving arguments and calling the correct signature

Currently, if there are multiple signatures for a particular VTK object method, then when a method is called the wrapper will try each signature until one of them is able to process the arguments. So if one method signature takes "float" and another method signature takes "int", then the one that is defined first is called regardless of whether the passed argument is "float" or "int". This behaviour is completely different from C++, where the compiler works hard to resolve ambiguities between method signatures.

This is a particularly bad situation for vtkVariant, which has a multitude of constructors that take various argument types. After the default constructor, the first constructor defined is "vtkVariant(char c)" which will accept an int or a float with silent conversion. In other words, constructing a float or an int vtkVariant is impossible.

To fix this problem, the python wrappers will have to compare the passed arguments against the available method signatures in order to optimally match the latter to the former. This way, the best method will be called instead of the "first match". There will still be some situations where two or more signatures are equally good matches for the arguments. In this case, we say that the caller is "ambiguous" about which signature they want, and there are two ways to resolve this ambiguity:

  1. ignore the ambiguity and just choose one of the methods
  2. raise a python exception

I prefer the second option. Ambiguities are a bad thing, and they should be flagged. If Python's None is passed as an argument to an overloaded method, it can easily cause ambiguity because it converts equally well to "vtkObject *", "bool", and "void *". Even so, this is a better situation than in C++, where a literal zero converts equally well to any pointer or numeric type.
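A minimal Python sketch of best-match resolution is shown here, assuming a simple additive cost model. The cost values, the parameter kind strings, and the function names are invented for illustration; the actual wrappers work on C-level argument descriptions and may rank conversions differently.

```python
# Python types that match a parameter kind exactly.
EXACT = {bool: "bool", int: "int", float: "double", str: "string"}


def conversion_cost(value, kind):
    """Return a cost for converting `value` to `kind`, or None if impossible."""
    if value is None:
        # None converts equally well to these three kinds.
        return 1 if kind in ("vtkObject*", "bool", "void*") else None
    t = type(value)
    if EXACT.get(t) == kind:
        return 0
    if t is bool:
        return {"int": 1, "double": 2, "char": 3}.get(kind)
    if t is int:
        return {"double": 1, "char": 2, "bool": 2}.get(kind)
    if t is float:
        return {"int": 2, "char": 2, "bool": 2}.get(kind)
    return None


def resolve(signatures, args):
    """Pick the signature with the lowest total cost, or raise TypeError."""
    scored = []
    for sig in signatures:
        if len(sig) != len(args):
            continue
        costs = [conversion_cost(a, k) for a, k in zip(args, sig)]
        if None not in costs:
            scored.append((sum(costs), sig))
    if not scored:
        raise TypeError("no signature accepts these arguments")
    best = min(cost for cost, _ in scored)
    winners = [sig for cost, sig in scored if cost == best]
    if len(winners) > 1:
        raise TypeError("ambiguous call: %r match equally well" % (winners,))
    return winners[0]


# Constructor kinds in declaration order, with "char" first as in vtkVariant.
ctors = [("char",), ("int",), ("double",), ("bool",), ("string",)]
```

With this model, `resolve(ctors, (3,))` selects `("int",)` and `resolve(ctors, (2.5,))` selects `("double",)`, whereas a first-match strategy would pick `("char",)` for both. Passing None to a method overloaded on "vtkObject*" and "void*" raises TypeError.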

Proper "bool" support

Python has had a native "bool" type since python 2.3, but the wrappers do not yet distinguish between bool and int. This unfortunately makes it impossible to construct a bool-valued vtkVariant in python. Because PyArg_ParseTuple doesn't provide a format character for bool, a fair bit of work is needed to check boolean arguments when resolving overloaded methods.
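Part of the difficulty is that bool is a subclass of int in Python, so a check that accepts ints also accepts True and False. The Python-level sketch below shows the kind of exact-type test that is needed; the helper names are hypothetical, and the wrappers themselves would perform the equivalent test in C.

```python
def is_bool_arg(value):
    """True only for genuine bools, not for ints that merely convert."""
    return type(value) is bool


def is_int_arg(value):
    """True for ints, excluding bool even though bool subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)
```

Without the exclusion in `is_int_arg`, a call that passes True would match an "int" signature just as well as a "bool" one.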

Tricky issues

How will wrappers know vtkObjectBase-derived args from non-vtkObjectBase args?

In order for vtkVariant and its ilk to be wrapped, the BTX/ETX will have to be removed from methods in classes that use vtkVariant. But if only the python wrappers properly support vtkVariant, what will happen if someone calls these methods in Tcl or Java? Well, right now the Java wrappers won't even compile, because they will try to instantiate the vtkVariant in Java when the vtkVariant type doesn't exist in Java. For the Tcl wrappers, the code will compile, but will segfault if a method attempts to construct a vtkVariant.

There is a heuristic that can be used by the wrapper generators to distinguish vtkObjectBase-derived classes from special VTK types:

  • vtkObjectBase-derived objects are always handled via pointers; this is enforced
  • vtkVariant and other "special types" are rarely handled via pointers

So "vtkObj *" can be assumed to be a vtkObjectBase, while "vtkObj" or "vtkObj&" can be assumed to be something else. Sounds simple, right? As long as these assumptions hold, we are scot-free. But how safe are these assumptions? If they are not safe, what kind of checks can be done, preferably at code-generation time or compile-time? Or, if the checks have to be done at run-time, how efficient would it be for Tcl to compare an "unknown" object name against all the known vtkObjectBase-derived class names?
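The heuristic can be written down as a short Python function that works on the textual form of an argument type. The function name and the returned labels are hypothetical, and the test on the "vtk" prefix is an extra assumption used here to separate VTK classes from built-in types.

```python
def guess_kind(arg_type):
    """Guess whether a C++ argument type is a vtkObjectBase or a special type."""
    spaced = arg_type.replace("*", " * ").replace("&", " & ")
    tokens = [t for t in spaced.split() if t != "const"]
    if not tokens or not tokens[0].startswith("vtk"):
        return "other"
    if tokens[-1] == "*":
        return "vtkObjectBase"  # pointers are assumed to be reference-counted objects
    return "special"            # values and references are assumed to be special types
```

The weak point is visible immediately: `guess_kind("vtkVariant *")` returns "vtkObjectBase", which is the wrong answer whenever a special type is passed by pointer.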

Status

30 April 2010

For python, vtkVariant and vtkTimeStamp are both automatically wrapped. There is an issue with constructors for vtkVariant that still needs to be fixed. The wrapper code tries out the various constructor signatures in order until one is able to use the constructor arguments. Because the "char" constructor is tried first, any attempt to make a "double" or "int" variant results in the creation of a "char" variant instead. Creating a variant from a string or a vtkObject works fine, though. Storing variants in a vtkVariantArray also works.

A problem was found with the parser: the leading "~" of the destructor signature is thrown away by the parser, so the destructor isn't distinguishable from a constructor.

5 May 2010

No vtkStdString yet, but the special type support for vtkVariant is done:

  • Special-type constructors can be used to resolve arguments, e.g. vtkVariantArray.InsertNextItem(vtkVariant value) can take any value that is accepted as a constructor argument for vtkVariant.
  • Overload resolution is done, so the correct methods are called now. Ambiguities cause a python TypeError exception.
  • The destructor is properly identified and is never wrapped.
  • Bool is now handled properly.

There are two improvements that could be made for special-type support, though neither of them is easy:

  • Operator wrapping
  • Special python-only methods, e.g. a GetContents() method for vtkVariant

6 May 2010

Finally some motion on the string front:

  • Expand the VTK_PARSE_BASE_TYPE bitfield from 4 bits to 8 bits
  • Add vtkStdString and vtkUnicodeString as fundamental types in vtkParse
  • Add vtkStdString and vtkUnicodeString support to python wrappers
  • Add vtkStdString support to tcl and java wrappers

In vtkWrapTcl.c, I don't check if argv[i] is NULL before doing tmpstring = argv[i]. I still need to determine whether such a check is necessary, and if so, how to respond.

Side-fix to vtkParse:

  • allow struct forward declarations in vtkParse, i.e. "struct vtkHelperStruct;" won't confuse the parser anymore, it will be ignored in exactly the same way as "class vtkHelperClass;"

Changes to vtkVariant:

  • Add constructor vtkVariant(const vtkVariant &v, unsigned int type)
  • From python, it's possible to do this: v = vtk.vtkVariant(255, vtk.VTK_UNSIGNED_CHAR)

10 May 2010

Enhancements to Python wrappers:

  • Add support for bool array arguments
  • Add support for comparison operators < <= == != > >=, special-cased for vtkTimeStamp and vtkVariant for now
  • Add hash functions for vtkTimeStamp and vtkVariant, so they can be used as map keys

Split monster-file vtkPythonUtil.cxx into smaller files:

  • PyVTKObject.h, .cxx
  • PyVTKClass.h, .cxx
  • PyVTKSpecialObject.h, .cxx
  • vtkPythonCommand.h, .cxx
  • vtkPythonUtil.h, .cxx

Also, now vtkPythonUtil is a proper class, not just a mess of functions. I intend to move all of these files into a separate library called libvtkPython.so that resides in Wrapping/Python.

Enhancements for vtkVariant wrapping:

  • Changed ToInt(bool *valid) and friends to ToInt(bool valid[1]) to allow wrapping; "valid" is now returned in the supplied list or array.
  • Add vtkVariantCreate(), vtkVariantExtract(), vtkVariantCast() functions to vtk module

Tested on Windows. A few fixes were needed due to linker issues in order to get it to compile and run.

12 May 2010

The original set of goals has been completed with this update.

Reorganization:

  • Moved vtkPythonUtil to Wrapping/Python along with PyVTKObject et al.
  • Added new library "vtkPythonCore" for vtkPythonUtil
  • Tested install of header files and libraries

Documentation:

  • Documented PyVTKClass.cxx, PyVTKObject.cxx, PyVTKSpecialObject.cxx
  • Updated Python README.txt and README_WRAP.txt

More vtkVariant support:

  • Added vtkVariantStrictWeakOrder(), vtkVariantStrictEquality()

Fixes:

  • Compiler warnings fixed
  • Destructor was being mistaken as a constructor

26 May 2010

A follow-up project: enhancing the lex/yacc parser. The following items have been completed:

  • a more functional C++ parser, capable of reading all VTK class header files without generating a syntax error
  • operator methods are now stored in the FunctionInfo struct, with names like "operator="
  • fixes to the way that signatures and IDs are generated: even if vtkParse assigns VTK_PARSE_UNKNOWN as the class type, the ClassName will now contain the full and correct name of the type.
  • parsing of typedefs, scoped names, enums, etc... eventually these will be stored in the FileInfo struct

The main goal of the above changes was to make it possible to remove all //BTX //ETX markers from the header files, and this goal has nearly been achieved.

16 Jun 2010

Code merged into VTK master.

Future Work: Things Not Yet Supported

This list was written on June 21, 2010. Support for some of these items may have been added since then.

More string support

  • Support "std::string" like vtkStdString - could be added easily to vtkParse
  • vtkUnicodeString in Tcl - Fairly easy
  • vtkUnicodeString in Java - probably very easy

Special types in Tcl and Java

  • Difficulty level probably similar to Python

Proper detection of special types in method arguments

The wrappers guess that "vtkSomething *" is a vtkObject and that "vtkSomething &" or "vtkSomething" is a special type.

Unfortunately, special types are often accessed via pointers. The solution is:

  • make a "pre-wrapper" called vtkWrapHierarchy that will go through the header files and print the hierarchy to a file
  • add a utility method for vtkParse to get the class hierarchy, so that the wrappers will know what classes are vtkObjects
  • once this is done, all BTX/ETX marks can be removed from the code
  • also, WRAP_SPECIAL can be eliminated because the wrappers will know what objects aren't vtkObjects.
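Once the hierarchy is available, the check itself is simple. The sketch below assumes the hierarchy has already been loaded into a dictionary that maps each class name to its superclass (or None); that in-memory form and the function name are assumptions, since the output format of vtkWrapHierarchy is not specified here.

```python
def is_vtk_object(class_name, superclass_of):
    """Walk the superclass chain until vtkObjectBase or the root is reached."""
    seen = set()  # guards against cycles in a malformed hierarchy
    while class_name is not None and class_name not in seen:
        if class_name == "vtkObjectBase":
            return True
        seen.add(class_name)
        class_name = superclass_of.get(class_name)
    return False


# A small hand-written excerpt standing in for the generated hierarchy file.
hierarchy = {
    "vtkObjectBase": None,
    "vtkObject": "vtkObjectBase",
    "vtkAbstractArray": "vtkObject",
    "vtkDataArray": "vtkAbstractArray",
    "vtkVariant": None,
    "vtkTimeStamp": None,
}
```

With such a lookup, an argument of type "vtkVariant *" is classified correctly regardless of whether it is passed by pointer, reference, or value.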

The pre-wrappers should also:

  • make a list of what header files contain what classes.
  • this will allow even "helper classes" that don't have their own header to be accessed by wrappers.

Improvements to vtkParse

The parser needs several enhancements:

  • Recognize std::string as a string type (easy)
  • Save operator signatures in FunctionInfo (easy)
  • Recognize default argument values
  • Save enum constants with the FileInfo
  • Save #define constants with the FileInfo
  • Parsing of simple templates
  • General improvements to reduce need for BTX/ETX

Improvements to Python wrappers

  • Allow hierarchies of special types
  • More operator support than just "< <= == != > >="
  • Wrapping of templated types - will always be limited to selected types, but can still be very useful
  • Wrapping of pointer args - requires a better hinting system
  • Wrapping of reference args for returning values - would be easy

Hierarchies of special types in Python wrappers

If each special type had its own PyTypeObject struct (to be generated by vtkWrapPython.c) then:

  • Types could have a hierarchy via python's subclass system
  • Type-specific protocols (number, sequence, buffer, etc) could be supported, this would require proper parsing of operators

Templated type handling in Python

Should be made to look similar to numpy, e.g. vtkValue(1, 'f') would create a vtkValue<float>. To python, the templated type would look like a variadic type. It would be necessary to change vtkParse so that it recognized templates.
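A toy Python model of this interface is sketched here. The Value class and the type-code table are hypothetical; vtkValue is only the example name used above, and a real implementation would dispatch to pre-instantiated C++ templates instead of storing a Python number.

```python
# Hypothetical mapping from numpy-style type codes to C++ template arguments.
TEMPLATE_ARGS = {"f": "float", "d": "double", "i": "int"}


class Value:
    """Toy stand-in for a wrapped vtkValue<T> selected by a type code."""

    def __init__(self, value, typecode="d"):
        if typecode not in TEMPLATE_ARGS:
            raise TypeError("unsupported type code %r" % (typecode,))
        self.typecode = typecode
        # Name of the C++ instantiation this object would correspond to.
        self.cpp_type = "vtkValue<%s>" % TEMPLATE_ARGS[typecode]
        self.value = int(value) if typecode == "i" else float(value)


v = Value(1, "f")
```

Only the type codes listed in the table are accepted, which mirrors the restriction that templated types can be wrapped for a selected set of template arguments only.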

Pointer arg wrapping

The "count" for pointer args should be hinted so that they can be properly wrapped. E.g.

  1. vtkVariant::ToInt(bool *vtkSingleValue(valid))
  2. vtkVariant::ToInt(bool *vtkOptionalSingleValue(valid)) - can be safely set to NULL
  3. vtkDataArray::SetTuple(double *vtkMultiValue(tuple, GetNumberOfComponents))

In the last example, the name of the method to get the count is supplied in the hint. Recognizing these macros in vtkParse would be easy. Unfortunately, they would make the header files ugly. An alternative is to extend the "hints" file so that it can hint arg counts.

Reference arg wrapping: &arg

This is trivial to add, only a few lines would have to be added to vtkWrapPython. For the reference arg, the user would have to pass a container object that supported both the sequence protocol and the number protocol, e.g. like a numpy array. For example, the user could make an array([0], 'f') and pass it, and after the call the result would be stored in the array.
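From the caller's side, the convention would look roughly like the Python sketch below. The function get_range_max is a made-up stand-in for a wrapped method with a "float &result" argument, and the standard library's array type is used in place of a numpy array so that the example has no dependencies.

```python
from array import array


def get_range_max(values, result):
    """Toy stand-in for a wrapped method with a `float &result` argument."""
    result[0] = max(values)  # write the output through the container


out = array("f", [0.0])            # one-element container for the result
get_range_max([1.5, 4.0, 2.25], out)
```

After the call, `out[0]` holds 4.0. Any mutable sequence with one element would work in this sketch, including a plain list.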