VTK/Unwrappable Code

From KitwarePublic
Latest revision as of 02:28, 18 October 2015

The VTK Tcl, Java, and Python wrappers use a custom parser to read the VTK C++ header files. This parser consists of the following pieces:

  • a C++ preprocessor
  • a lex/yacc C++ parser (a GNU bison GLR parser)
  • a set of data structures for describing a C++ API

As of this writing, the above are based on the C++11 grammar and are being updated for C++14 and C++17.

Syntax that the wrappers cannot parse

The parser was written based on the ISO draft standards for C++98, C++11, and C++14. However, there are specific parts of the C++ grammar that were not implemented. These are described below.

Backslash line continuation in odd places

According to the C++ standard, a backslash at the end of a line (unless it occurs within a raw string) splices that line onto the next one: the backslash and the newline that follows it are both removed. The wrapper preprocessor, however, does not allow this kind of line continuation within any token except a string literal.

This code will work:

#define mymacro(x) \
  (2*(x))

const char *s = "this is a long\
 string broken in two.";

This code will not work:

class MyClassHasAVeryLongNameSo\
   IBrokeItWithABackslash;

const int i = 'A\
  ';
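
One way to stay clear of this limitation is to break lines only between tokens. The following is a minimal sketch, not taken from the VTK sources: the names MY_DOUBLE, kLongString, and kLetterA are hypothetical, and it assumes that the wrapper preprocessor handles adjacent string literals the way standard C++ does.

```cpp
#include <cassert>
#include <cstring>

// A backslash-newline between tokens of a macro definition does not
// split any token, which is the case the text above says is accepted.
#define MY_DOUBLE(x) \
  (2 * (x))

// Standard C++ concatenates adjacent string literals, so a long string
// can be broken across lines without a backslash at all.
// (Assumption: the wrappers accept this form.)
const char* const kLongString =
  "this is a long"
  " string broken in two.";

// Keep identifiers and character literals on a single line.
class MyClassHasAVeryLongNameSoIKeptItOnOneLine;
const int kLetterA = 'A';
```

The string built here is identical to the one produced by the backslash-continued literal in the working example above.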

Universal character names in identifiers

In C++11, universal character names \uXXXX and \UXXXXXXXX can be used in place of non-ASCII characters. The wrapper preprocessor allows these only in string literals and character literals, not in identifiers. However, the wrappers do allow UTF-8 encoding in identifiers (as well as in strings, characters, and comments).

 // this is fine
 const char16_t *s = u"Hello\u00A0There";
 
 // this will break things
 const char *encyclop\u00C6dia = "Britannica";
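
As a rough illustration of the forms that stay within this limitation, the sketch below keeps universal character names inside literals and uses an ASCII-only identifier. The names kLatinCapitalAE, kGreeting, and encyclopaedia are hypothetical. The UTF-8 identifier form appears only in a comment, because whether a given C++ compiler accepts it depends on the compiler and on the source file encoding.

```cpp
#include <cassert>

// Universal character names inside character and string literals.
constexpr char16_t kLatinCapitalAE = u'\u00C6';
static_assert(kLatinCapitalAE == 0x00C6, "the UCN names code point U+00C6");

const char16_t* const kGreeting = u"Hello\u00A0There";

// An ASCII-only identifier sidesteps the question entirely.
const char* const encyclopaedia = "Britannica";

// Per the text above, the wrappers also accept the identifier written
// directly in UTF-8 (shown as a comment only):
//   const char *encyclopædia = "Britannica";
```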

Ambiguous member variable definition

C++ has an ambiguous grammar. One of the most common sources of ambiguity is that a name will sometimes be identified as a type, and sometimes as a function or variable name, depending on context.

struct x {
  typedef int z;

  // this kinda looks like a constructor
  x(z);
 
  // so would you believe that this defines a variable y of type z?
  z(y);

  // it does, because it is equivalent to writing this!
  z y;
};

The wrappers' parser does not distinguish type names from other names within its grammar rules, so it cannot disambiguate between the constructor declaration at the top and the funny-looking variable declaration in the middle. It will try to interpret both as constructor declarations.
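
The equivalence of the two declaration forms can be checked with a compiler. The sketch below is illustrative only: the namespaces original and rewritten are hypothetical and exist just so that both versions of the struct can sit in one file. The rewritten version uses only the plain "z y;" form, which does not resemble a constructor declaration.

```cpp
#include <cassert>
#include <type_traits>

namespace original {
struct x {
  typedef int z;
  x(z);   // constructor declaration
  z(y);   // member variable y of type z, written with parentheses
};
// A C++ compiler resolves z(y) as a data member of type int.
static_assert(std::is_same<decltype(x::y), int>::value,
              "z(y) declares a member of type z");
}

namespace rewritten {
struct x {
  typedef int z;
  x(z value) : y(value) {}  // constructor
  z y;                      // same member, written without parentheses
};
static_assert(std::is_same<decltype(x::y), int>::value,
              "z y declares the same member");
}
```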

Ambiguous angle brackets

The wrappers fail when angle brackets occur on the right-hand side of an assignment, unless the assignment takes place within a function body (the parser ignores all function bodies, because they are part of the "implementation" rather than part of the "interface").

// this looks totally natural, and is valid C++ code
const T unity = static_cast<T>(1.0);

// whitespace makes it look a bit less natural
const T unity = static_cast < T > (1.0);

The difficulty is that our parser thinks that the angle brackets might be less-than and greater-than operators, because it doesn't know that T names a type and not a constant. So it conks out after reporting "syntax is ambiguous". It is possible to disambiguate by writing the code as follows, which causes the parser to take a different path:

// this causes the parse to succeed
const T unity = (static_cast<T>(1.0));
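
The kind of ambiguity described above also exists in standard C++ itself, where the token sequence "name < T > (1.0)" means different things depending on what the names refer to. The sketch below is only an illustration of that point, not a description of the wrapper parser's internals. The namespaces and the names name, T, and result are hypothetical, and a compiler may warn about the chained comparison in the first namespace.

```cpp
#include <cassert>

namespace as_values {
constexpr int name = 1;
constexpr int T = 2;
// Both names are constants, so this is (name < T) > (1.0):
// (1 < 2) is true, and true > 1.0 is false.
constexpr bool result = name < T > (1.0);
}

namespace as_template {
typedef int T;
template <typename U>
constexpr U name(double v) { return static_cast<U>(v); }
// Here T is a type and name is a template, so the same tokens
// form the call name<int>(1.0).
constexpr int result = name < T > (1.0);
}

static_assert(as_values::result == false, "parsed as two comparisons");
static_assert(as_template::result == 1, "parsed as a template call");
```

A C++ compiler resolves the two cases through name lookup; a parser that does not track which names are types has to consider both readings.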