This chapter primarily describes SWIG's internal organization and the process by which new target languages can be developed. It also provides details for debugging and extending existing target languages and functionality.
First, a brief word of warning---SWIG is continually evolving. The information in this chapter is mostly up to date, but changes are ongoing. Expect a few inconsistencies. Also, this chapter is not meant to be a hand-holding tutorial. As a starting point, you should probably look at one of SWIG's existing modules.
In order to extend SWIG, it is useful to have the following background:
Since SWIG is essentially a specialized C++ compiler, it may be useful to have some prior experience with compiler design (perhaps even a compilers course) to better understand certain parts of the system. A number of books will also be useful. For example, "The C Programming Language" by Kernighan and Ritchie (a.k.a. "K&R") and the C++ standard, "ISO/IEC 14882 Programming Languages - C++", will be of great use.
Also, it is useful to keep in mind that SWIG primarily operates as an extension of the C++ type system. At first glance, this might not be obvious, but almost all SWIG directives as well as the low-level generation of wrapper code are driven by C++ datatypes.
SWIG is a special purpose compiler that parses C++ declarations to generate wrapper code. To make this conversion possible, SWIG makes three fundamental extensions to the C++ language:
It is important to emphasize that virtually all SWIG features reduce to one of these three fundamental concepts. The type system and pattern matching rules also play a critical role in making the system work. For example, both typemaps and declaration annotation are based on pattern matching and interact heavily with the underlying type system.
When you run SWIG on an interface, processing is handled in stages by a series of system components:
The next few sections briefly describe some of these stages.
The preprocessor plays a critical role in the SWIG implementation. This is because a lot of SWIG's processing and internal configuration is managed not by code written in C, but by configuration files in the SWIG library. In fact, when you run SWIG, parsing starts with a small interface file like this (note: this explains the cryptic error messages that new users sometimes get when SWIG is misconfigured or installed incorrectly):
%include "swig.swg"         // Global SWIG configuration
%include "langconfig.swg"   // Language specific configuration
%include "yourinterface.i"  // Your interface file
The swig.swg file contains global configuration information. In addition, this file defines many of SWIG's standard directives as macros. For instance, part of swig.swg looks like this:
...
/* Code insertion directives such as %wrapper %{ ... %} */
#define %begin       %insert("begin")
#define %runtime     %insert("runtime")
#define %header      %insert("header")
#define %wrapper     %insert("wrapper")
#define %init        %insert("init")
/* Access control directives */
#define %immutable   %feature("immutable", "1")
#define %mutable     %feature("immutable")
/* Directives for callback functions */
#define %callback(x) %feature("callback") `x`;
#define %nocallback  %feature("callback");
/* %ignore directive */
#define %ignore         %rename($ignore)
#define %ignorewarn(x)  %rename("$ignore:" x)
...
The fact that most of the standard SWIG directives are macros is intended to simplify the implementation of the internals. For instance, rather than having to support dozens of special directives, it is easier to have a few basic primitives such as %feature or %insert.
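To make the idea concrete, the following is a minimal C++ sketch of a lookup table that maps a few of the directives quoted above to the primitive they stand for. It is not SWIG's preprocessor, which performs real macro expansion; the names directiveTable and expandDirective are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical table mirroring a few of the #define lines quoted from swig.swg.
// It only covers parameterless directives; it is not SWIG's macro expander.
const std::map<std::string, std::string> &directiveTable() {
  static const std::map<std::string, std::string> table = {
    {"%header",    "%insert(\"header\")"},
    {"%wrapper",   "%insert(\"wrapper\")"},
    {"%init",      "%insert(\"init\")"},
    {"%immutable", "%feature(\"immutable\", \"1\")"},
    {"%mutable",   "%feature(\"immutable\")"},
  };
  return table;
}

// Returns the primitive form of a directive, or the input unchanged when the
// name is not in the table (for example, %insert itself).
std::string expandDirective(const std::string &directive) {
  assert(!directive.empty() && directive[0] == '%');
  const std::map<std::string, std::string> &table = directiveTable();
  std::map<std::string, std::string>::const_iterator it = table.find(directive);
  return it == table.end() ? directive : it->second;
}
```

In this sketch every convenience directive collapses to either %insert or %feature, which is the reduction the paragraph above describes.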
The langconfig.swg file is supplied by the target language. This file contains language-specific configuration information. More often than not, this file provides run-time wrapper support code (e.g., the type-checker) as well as a collection of typemaps that define the default wrapping behavior. Note: the name of this file depends on the target language and is usually something like python.swg or perl5.swg.
As a debugging aid, the text that SWIG feeds to its C++ parser can be obtained by running swig -E interface.i. This option, like the same option that regular C/C++ compilers support, generates the preprocessed output and is useful for looking at how macros have been expanded as well as everything else that goes into the low-level construction of the wrapper code. Also, like a regular C/C++ compiler, the preprocessed output can be generated in one invocation and then fed back in with a second invocation. This is the approach that the CCache tool uses as part of its strategy to speed up repeated builds with the same inputs.
The current C++ parser handles a subset of C++. Most incompatibilities with C are due to subtle aspects of how SWIG parses declarations. Specifically, SWIG expects all C/C++ declarations to follow this general form:
storage type declarator initializer;
storage is a keyword such as extern, static, typedef, or virtual. type is a primitive datatype such as int or void. type may be optionally qualified with a qualifier such as const or volatile. declarator is a name with additional type-construction modifiers attached to it (pointers, arrays, references, functions, etc.). Examples of declarators include *x, **x, x[20], and (*x)(int, double). The initializer may be a value assigned using = or body of code enclosed in braces { ... }.
This declaration format covers most common C++ declarations. However, the C++ standard is somewhat more flexible in the placement of the parts. For example, it is technically legal, although uncommon, to write something like int typedef const a in your program. SWIG simply doesn't bother to deal with this case.
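As an illustration of the general form, here is a small set of ordinary C++ declarations laid out in the storage, type, declarator, initializer order. All names (scale, limit, grid, binary_op, add, demo) are made up for this sketch and do not come from SWIG.

```cpp
#include <cassert>

// General form: storage type declarator initializer;
// All names below are illustrative and do not come from SWIG.
extern double scale;                                // storage + type + declarator
double scale = 2.0;                                 // no storage keyword, '=' initializer
static const int limit = 10;                        // type qualified with const
static int grid[3] = {1, 2, 3};                     // array declarator
typedef int (*binary_op)(int, int);                 // function-pointer declarator
static int add(int a, int b) { return a + b; }      // function declarator, '{ ... }' body

// Exercises the declarations above: add(1, 3) + 10.
int demo() {
  binary_op op = add;
  assert(op != nullptr);
  return op(grid[0], grid[2]) + limit;
}
```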
The other significant difference between C++ and SWIG is in the treatment of typenames. In C++, if you have a declaration like this,
int blah(Foo *x, Bar *y);
it won't parse correctly unless Foo and Bar have been previously defined as types either using a class definition or a typedef. The reasons for this are subtle, but this treatment of typenames is normally integrated at the level of the C tokenizer---when a typename appears, a different token is returned to the parser instead of an identifier.
SWIG does not operate in this manner--any legal identifier can be used as a type name. The reason for this is primarily motivated by the use of SWIG with partially defined data. Specifically, SWIG is supposed to be easy to use on interfaces with missing type information.
Because of the different treatment of typenames, the most serious limitation of the SWIG parser is that it can't process type declarations where an extra (and unnecessary) grouping operator is used. For example:
int (x);         /* A variable x */
int (y)(int);    /* A function y */
The placing of extra parentheses in type declarations like this is already recognized by the C++ community as a potential source of strange programming errors. For example, Scott Meyers' "Effective STL" discusses this problem in a section on avoiding C++'s "most vexing parse."
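The short sketch below shows that a regular C++ compiler accepts both grouped declarations, which is why such code can exist in headers even though, according to the text above, SWIG's parser cannot process it. The function useLocal and the definition of y are additions made for illustration only.

```cpp
#include <cassert>

// Both declarations are valid C++; the parentheses around the names are redundant.
int (x);        // a variable x, zero-initialized at namespace scope
int (y)(int);   // a function y taking an int

// Definition of y, written the usual way.
int y(int value) { return value + 1; }

// The same grouping inside a function: "int (local);" declares a variable
// named local, it does not call anything.
int useLocal() {
  int (local);
  local = y(x);
  assert(local == 1);
  return local;
}
```

Some compilers may emit a warning about the unnecessary parentheses, which is consistent with the remark that this style is considered error-prone.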
The parser is also unable to handle declarations with no return type or bare argument names. For example, in an old C program, you might see things like this:
foo(a, b) {
...
}
In this case, the return type as well as the types of the arguments are taken by the C compiler to be an int. However, SWIG interprets the above code as an abstract declarator for a function returning a foo and taking types a and b as arguments.
The SWIG parser produces a complete parse tree of the input file before any wrapper code is actually generated. Each item in the tree is known as a "Node". Each node is identified by a symbolic tag. Furthermore, a node may have an arbitrary number of children. The parse tree structure and tag names of an interface can be displayed using swig -debug-tags. For example:
$ swig -c++ -python -debug-tags example.i
 . top (example.i:1)
 . top . include (example.i:1)
 . top . include . typemap (/r0/beazley/Projects/lib/swig1.3/swig.swg:71)
 . top . include . typemap . typemapitem (/r0/beazley/Projects/lib/swig1.3/swig.swg:71)
 . top . include . typemap (/r0/beazley/Projects/lib/swig1.3/swig.swg:83)
 . top . include . typemap . typemapitem (/r0/beazley/Projects/lib/swig1.3/swig.swg:83)
 . top . include (example.i:4)
 . top . include . insert (/r0/beazley/Projects/lib/swig1.3/python/python.swg:7)
 . top . include . insert (/r0/beazley/Projects/lib/swig1.3/python/python.swg:8)
 . top . include . typemap (/r0/beazley/Projects/lib/swig1.3/python/python.swg:19)
 ...
 . top . include (example.i:6)
 . top . include . module (example.i:2)
 . top . include . insert (example.i:6)
 . top . include . include (example.i:9)
 . top . include . include . class (example.h:3)
 . top . include . include . class . access (example.h:4)
 . top . include . include . class . constructor (example.h:7)
 . top . include . include . class . destructor (example.h:10)
 . top . include . include . class . cdecl (example.h:11)
 . top . include . include . class . cdecl (example.h:11)
 . top . include . include . class . cdecl (example.h:12)
 . top . include . include . class . cdecl (example.h:13)
 . top . include . include . class . cdecl (example.h:14)
 . top . include . include . class . cdecl (example.h:15)
 . top . include . include . class (example.h:18)
 . top . include . include . class . access (example.h:19)
 . top . include . include . class . cdecl (example.h:20)
 . top . include . include . class . access (example.h:21)
 . top . include . include . class . constructor (example.h:22)
 . top . include . include . class . cdecl (example.h:23)
 . top . include . include . class . cdecl (example.h:24)
 . top . include . include . class (example.h:27)
 . top . include . include . class . access (example.h:28)
 . top . include . include . class . cdecl (example.h:29)
 . top . include . include . class . access (example.h:30)
 . top . include . include . class . constructor (example.h:31)
 . top . include . include . class . cdecl (example.h:32)
 . top . include . include . class . cdecl (example.h:33)
Even for the most simple interface, the parse tree structure is larger than you might expect. For example, in the above output, a substantial number of nodes are actually generated by the python.swg configuration file which defines typemaps and other directives. The contents of the user-supplied input file don't appear until the end of the output.
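The shape of this listing can be reproduced with a simple depth-first walk. The sketch below uses a hypothetical TagNode type (a tag plus children) rather than SWIG's real Node, and it omits the file and line information shown in the actual output.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a parse tree node: a tag plus child nodes.
struct TagNode {
  std::string tag;
  std::vector<TagNode> children;
};

// Appends one line per node, in depth-first order, in the " . top . include"
// style. File and line locations are left out of this sketch.
void collectTagPaths(const TagNode &node, const std::string &prefix,
                     std::vector<std::string> &out) {
  assert(!node.tag.empty());
  const std::string path = prefix + " . " + node.tag;
  out.push_back(path);
  for (const TagNode &child : node.children) {
    collectTagPaths(child, path, out);
  }
}
```

Each line repeats the tags of all ancestors, which is why even a small interface produces long output once the library files are included.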
The contents of each parse tree node consist of a collection of attribute/value pairs. Internally, the nodes are simply represented by hash tables. A display of the entire parse-tree structure can be obtained using swig -debug-top <n>, where n is the stage being processed. There are a number of other parse tree display options, for example, swig -debug-module <n> will avoid displaying system parse information and only display the parse tree pertaining to the user's module at stage n of processing. Adding the -debug-quiet option is recommended as it removes some noise which is not usually needed, that is, the display of many linked list pointers and symbol table pointers. See Debugging Options for a full list.
$ swig -c++ -python -debug-module 1 -debug-quiet example.i
debug-module stage 1
+++ module ----------------------------------------
| name         - "example"
| 
+++ insert ----------------------------------------
| code         - "\n#include \"example.h\"\n"
| 
+++ include ----------------------------------------
| name         - "example.h"
      +++ class ----------------------------------------
      | abstracts    - 0x7f4f15182930
      | allows_typedef - "1"
      | kind         - "class"
      | name         - "Shape"
      | sym:name     - "Shape"
            +++ access ----------------------------------------
            | kind         - "public"
            | 
            +++ constructor ----------------------------------------
            | access       - "public"
            | code         - "{\n    nshapes++;\n  }"
            | decl         - "f()."
            | feature:new  - "1"
            | ismember     - "1"
            | name         - "Shape"
            | sym:name     - "Shape"
            | 
            +++ destructor ----------------------------------------
            | access       - "public"
            | code         - "{\n    nshapes--;\n  }"
            | decl         - "f()."
            | ismember     - "1"
            | name         - "~Shape"
            | storage      - "virtual"
            | sym:name     - "~Shape"
            | 
            +++ cdecl ----------------------------------------
            | access       - "public"
            | decl         - ""
            | ismember     - "1"
            | kind         - "variable"
            | name         - "x"
            | sym:name     - "x"
            | type         - "double"
            | 
            +++ cdecl ----------------------------------------
            | access       - "public"
            | decl         - ""
            | ismember     - "1"
            | kind         - "variable"
            | name         - "y"
            | sym:name     - "y"
            | type         - "double"
            | 
            +++ cdecl ----------------------------------------
            | access       - "public"
            | decl         - "f(double,double)."
            | ismember     - "1"
            | kind         - "function"
            | name         - "move"
            | parms        - 'double dx,double dy'
            | sym:name     - "move"
            | type         - "void"
            | 
            +++ cdecl ----------------------------------------
            | abstract     - "1"
            | access       - "public"
            | decl         - "f()."
            | ismember     - "1"
            | kind         - "function"
            | name         - "area"
            | storage      - "virtual"
            | sym:name     - "area"
            | type         - "double"
            | value        - "0"
            | valuetype    - "int"
            | 
            +++ cdecl ----------------------------------------
            | abstract     - "1"
            | access       - "public"
            | decl         - "f()."
            | ismember     - "1"
            | kind         - "function"
            | name         - "perimeter"
            | storage      - "virtual"
            | sym:name     - "perimeter"
            | type         - "double"
            | value        - "0"
            | valuetype    - "int"
            | 
            +++ cdecl ----------------------------------------
            | access       - "public"
            | decl         - ""
            | ismember     - "1"
            | kind         - "variable"
            | name         - "nshapes"
            | storage      - "static"
            | sym:name     - "nshapes"
            | type         - "int"
            | 
      +++ class ----------------------------------------
      | allows_typedef - "1"
      | baselist     - 0x7f4f15182ad0
      | kind         - "class"
      | name         - "Circle"
      | privatebaselist - 0x7f4f15182b10
      | protectedbaselist - 0x7f4f15182af0
      | sym:name     - "Circle"
            +++ access ----------------------------------------
            | kind         - "private"
            | 
            +++ cdecl ----------------------------------------
            | access       - "private"
            | decl         - ""
            | ismember     - "1"
            | kind         - "variable"
            | name         - "radius"
            | type         - "double"
            | 
            +++ access ----------------------------------------
            | kind         - "public"
            | 
            +++ constructor ----------------------------------------
            | access       - "public"
            | code         - "{ }"
            | decl         - "f(double)."
            | feature:new  - "1"
            | ismember     - "1"
            | name         - "Circle"
            | parms        - 'double r'
            | sym:name     - "Circle"
            | 
            +++ cdecl ----------------------------------------
            | access       - "public"
            | decl         - "f()."
            | ismember     - "1"
            | kind         - "function"
            | name         - "area"
            | storage      - "virtual"
            | sym:name     - "area"
            | type         - "double"
            | 
            +++ cdecl ----------------------------------------
            | access       - "public"
            | decl         - "f()."
            | ismember     - "1"
            | kind         - "function"
            | name         - "perimeter"
            | storage      - "virtual"
            | sym:name     - "perimeter"
            | type         - "double"
            | 
      +++ class ----------------------------------------
      | allows_typedef - "1"
      | baselist     - 0x7f4f15183830
      | kind         - "class"
      | name         - "Square"
      | privatebaselist - 0x7f4f15183870
      | protectedbaselist - 0x7f4f15183850
      | sym:name     - "Square"
            +++ access ----------------------------------------
            | kind         - "private"
            | 
            +++ cdecl ----------------------------------------
            | access       - "private"
            | decl         - ""
            | ismember     - "1"
            | kind         - "variable"
            | name         - "width"
            | type         - "double"
            | 
            +++ access ----------------------------------------
            | kind         - "public"
            | 
            +++ constructor ----------------------------------------
            | access       - "public"
            | code         - "{ }"
            | decl         - "f(double)."
            | feature:new  - "1"
            | ismember     - "1"
            | name         - "Square"
            | parms        - 'double w'
            | sym:name     - "Square"
            | 
            +++ cdecl ----------------------------------------
            | access       - "public"
            | decl         - "f()."
            | ismember     - "1"
            | kind         - "function"
            | name         - "area"
            | storage      - "virtual"
            | sym:name     - "area"
            | type         - "double"
            | 
            +++ cdecl ----------------------------------------
            | access       - "public"
            | decl         - "f()."
            | ismember     - "1"
            | kind         - "function"
            | name         - "perimeter"
            | storage      - "virtual"
            | sym:name     - "perimeter"
            | type         - "double"
Attributes of parse tree nodes are often prepended with a namespace qualifier. For example, the attributes sym:name and sym:symtab are attributes related to symbol table management and are prefixed with sym:. As a general rule, only those attributes which are directly related to the raw declaration appear without a prefix (type, name, declarator, etc.).
Target language modules may add additional attributes to nodes to assist the generation of wrapper code. The convention for doing this is to place these attributes in a namespace that matches the name of the target language. For example, python:foo or perl:foo.
During parsing, all symbols are managed in the space of the target language. The sym:name attribute of each node contains the symbol name selected by the parser. Normally, sym:name and name are the same. However, the %rename directive can be used to change the value of sym:name. You can see the effect of %rename by trying it on a simple interface and dumping the parse tree. For example:
%rename(foo_i) foo(int);
%rename(foo_d) foo(double);

void foo(int);
void foo(double);
void foo(Bar *b);
There are various -debug- options that can be useful for debugging and analysing the parse tree. For example, the -debug-top <n> and -debug-module <n> options dump the entire parse tree or only the module subtree, respectively, at one of the four stages n of processing. The parse tree can be viewed after the final stage of processing by running SWIG:
$ swig -debug-top 1 -debug-quiet example.i
...
            +++ cdecl ----------------------------------------
            | decl         - "f(int)."
            | name         - "foo"
            | parms        - int
            | sym:name     - "foo_i"
            | type         - "void"
            |
            +++ cdecl ----------------------------------------
            | decl         - "f(double)."
            | name         - "foo"
            | parms        - double
            | sym:name     - "foo_d"
            | type         - "void"
            |
            +++ cdecl ----------------------------------------
            | decl         - "f(p.Bar)."
            | name         - "foo"
            | parms        - Bar *
            | sym:name     - "foo"
            | type         - "void"
All symbol-related conflicts and complaints about overloading are based on sym:name values. For instance, the following example uses %rename in reverse to generate a name clash.
%rename(foo) foo_i(int);
%rename(foo) foo_d(double);

void foo_i(int);
void foo_d(double);
void foo(Bar *b);
When you run SWIG on this you now get:
$ ./swig example.i
example.i:6. Overloaded declaration ignored.  foo_d(double )
example.i:5. Previous declaration is foo_i(int )
example.i:7. Overloaded declaration ignored.  foo(Bar *)
example.i:5. Previous declaration is foo_i(int )
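The rule that conflicts are decided on sym:name rather than name can be sketched as follows. This is a simplified model, not SWIG's symbol table: the Decl struct, symName, and findClashes are hypothetical, and the sketch ignores the overloading support that applies when wrapping C++.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical declaration record: C++ name, parameter signature, and an
// optional %rename target (empty when no rename applies).
struct Decl {
  std::string name;
  std::string parms;
  std::string renamedTo;
};

// Mirrors the rule in the text: sym:name equals name unless %rename changed it.
std::string symName(const Decl &d) {
  assert(!d.name.empty());
  return d.renamedTo.empty() ? d.name : d.renamedTo;
}

// Reports every declaration whose sym:name was already taken by an earlier one.
std::vector<std::string> findClashes(const std::vector<Decl> &decls) {
  std::map<std::string, std::string> seen;  // sym:name -> first declaration
  std::vector<std::string> clashes;
  for (const Decl &d : decls) {
    const std::string shown = d.name + "(" + d.parms + ")";
    std::pair<std::map<std::string, std::string>::iterator, bool> inserted =
        seen.insert(std::make_pair(symName(d), shown));
    if (!inserted.second) {
      clashes.push_back(shown + " clashes with " + inserted.first->second);
    }
  }
  return clashes;
}
```

Fed the first %rename example, the sketch finds no clashes because the three sym:name values differ; fed the reversed example, the second and third declarations both collide with foo_i(int), matching the pattern of the messages above.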
A number of SWIG directives such as %exception are implemented using the low-level %feature directive. For example:
%feature("except") getitem(int) {
  try {
     $action
  } catch (badindex) {
     ...
  }
}
...
class Foo {
public:
  Object *getitem(int index) throw(badindex);
  ...
};
The behavior of %feature is very easy to describe--it simply attaches a new attribute to any parse tree node that matches the given prototype. When a feature is added, it shows up as an attribute in the feature: namespace. You can see this when running with the -debug-top 4 -debug-quiet option. For example:
 +++ cdecl ----------------------------------------
 | decl         - "f(int).p."
 | feature:except - "{\n    try {\n       $action\n    } catc..."
 | name         - "getitem"
 | parms        - int
 | sym:name     - "getitem"
 | type         - "Object"
 |
Feature names are completely arbitrary and a target language module can be programmed to respond to any feature name that it wants to recognize. The data stored in a feature attribute is usually just a raw unparsed string. For example, the exception code above is simply stored without any modifications.
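A rough model of this behavior is shown below. It treats each node as a flat attribute table and matches on exact name and decl strings only; SWIG's real matching is more elaborate. The Attrs alias and attachFeature function are hypothetical names used for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical node: a flat attribute table, as in the dumps above.
typedef std::map<std::string, std::string> Attrs;

// Attaches "feature:<name>" to every node whose name and decl match the
// prototype exactly. Returns the number of nodes that were annotated.
int attachFeature(std::vector<Attrs> &nodes, const std::string &feature,
                  const std::string &name, const std::string &decl,
                  const std::string &value) {
  assert(!feature.empty());
  int matched = 0;
  for (Attrs &node : nodes) {
    Attrs::const_iterator n = node.find("name");
    Attrs::const_iterator d = node.find("decl");
    if (n != node.end() && d != node.end() &&
        n->second == name && d->second == decl) {
      node["feature:" + feature] = value;  // stored as a raw, unparsed string
      ++matched;
    }
  }
  return matched;
}
```

Because the value is stored untouched, it is up to whichever language module reads feature:except to decide what the text means.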
Language modules work by defining handler functions that know how to respond to different types of parse-tree nodes. These handlers simply look at the attributes of each node in order to produce low-level code.
In reality, the generation of code is somewhat more subtle than simply invoking handler functions. This is because parse-tree nodes might be transformed. For example, suppose you are wrapping a class like this:
class Foo {
public:
  virtual int *bar(int x);
};
When the parser constructs a node for the member bar, it creates a raw "cdecl" node with the following attributes:
nodeType     : cdecl
name         : bar
type         : int
decl         : f(int).p
parms        : int x
storage      : virtual
sym:name     : bar
To produce wrapper code, this "cdecl" node undergoes a number of transformations. First, the node is recognized as a function declaration. This adjusts some of the type information--specifically, the declarator is joined with the base datatype to produce this:
nodeType     : cdecl
name         : bar
type         : p.int        <-- Notice change in return type
decl         : f(int).p
parms        : int x
storage      : virtual
sym:name     : bar
Next, the context of the node indicates that the node is really a member function. This produces a transformation to a low-level accessor function like this:
nodeType     : cdecl
name         : bar
type         : p.int
decl         : f(int).p
parms        : Foo *self, int x            <-- Added parameter
storage      : virtual
wrap:action  : result = (arg1)->bar(arg2)  <-- Action code added
sym:name     : Foo_bar                     <-- Symbol name changed
In this transformation, notice how an additional parameter was added to the parameter list and how the symbol name of the node has suddenly changed into an accessor using the naming scheme described in the "SWIG Basics" chapter. A small fragment of "action" code has also been generated--notice how the wrap:action attribute defines the access to the underlying method. The data in this transformed node is then used to generate a wrapper.
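The two transformations can be imitated on a plain attribute table. The sketch below is a deliberately simplified model: joinReturnType and toAccessor are hypothetical helpers, only a trailing pointer in the declarator is handled, and the generated action assumes exactly one argument besides self.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, std::string> Attrs;

// Step 1: fold the pointer part of the declarator into the return type,
// so type "int" with decl "f(int).p" becomes type "p.int".
void joinReturnType(Attrs &node) {
  const std::string decl = node["decl"];
  const std::string suffix = ".p";
  if (decl.size() >= suffix.size() &&
      decl.compare(decl.size() - suffix.size(), suffix.size(), suffix) == 0) {
    node["type"] = "p." + node["type"];
  }
}

// Step 2: turn the member into a low-level accessor for class cls.
// The action string hard-codes a single argument (arg2) for brevity.
void toAccessor(Attrs &node, const std::string &cls) {
  assert(!cls.empty());
  const std::string name = node["name"];
  const std::string parms = node["parms"];
  node["parms"] = cls + " *self, " + parms;
  node["wrap:action"] = "result = (arg1)->" + name + "(arg2)";
  node["sym:name"] = cls + "_" + name;
}
```

Applied in order to the raw bar node, these two steps produce the same attribute values as the listings above, while name, decl, and storage are left untouched.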
Language modules work by registering handler functions for dealing with various types of nodes at different stages of transformation. This is done by inheriting from a special Language class and defining a collection of virtual methods. For example, the Python module defines a class as follows:
class PYTHON : public Language {
protected:
public :
  virtual void main(int, char *argv[]);
  virtual int  top(Node *);
  virtual int  functionWrapper(Node *);
  virtual int  constantWrapper(Node *);
  virtual int  variableWrapper(Node *);
  virtual int  nativeWrapper(Node *);
  virtual int  membervariableHandler(Node *);
  virtual int  memberconstantHandler(Node *);
  virtual int  memberfunctionHandler(Node *);
  virtual int  constructorHandler(Node *);
  virtual int  destructorHandler(Node *);
  virtual int  classHandler(Node *);
  virtual int  classforwardDeclaration(Node *);
  virtual int  insertDirective(Node *);
  virtual int  importDirective(Node *);
};
The role of these functions is described shortly.
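Until then, the general pattern of dispatching on node type through virtual methods can be sketched with reduced stand-ins. MiniNode, MiniLanguage, and ToyModule below are hypothetical and far simpler than SWIG's Node and Language classes; in particular, the real dispatch passes through more intermediate handlers than this sketch shows.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily reduced stand-in for a parse tree node.
struct MiniNode {
  std::string nodeType;  // e.g. "class", "cdecl"
  std::string name;
  std::vector<MiniNode> children;
};

// Hypothetical base class: default handlers walk the tree, and derived
// classes override only the handlers they care about.
class MiniLanguage {
public:
  virtual ~MiniLanguage() {}
  virtual int classHandler(MiniNode &n) { return emitChildren(n); }
  virtual int functionWrapper(MiniNode &) { return 0; }

  // Chooses a handler from the node's type tag.
  int emit(MiniNode &n) {
    if (n.nodeType == "class") return classHandler(n);
    if (n.nodeType == "cdecl") return functionWrapper(n);
    return emitChildren(n);
  }

protected:
  int emitChildren(MiniNode &n) {
    int count = 0;
    for (MiniNode &child : n.children) count += emit(child);
    return count;
  }
};

// A toy target language that records one line per wrapped function.
class ToyModule : public MiniLanguage {
public:
  std::vector<std::string> output;
  int functionWrapper(MiniNode &n) override {
    assert(n.nodeType == "cdecl");
    output.push_back("wrap " + n.name);
    return 1;
  }
};
```

The design choice illustrated here is that the tree walk lives in the base class, so a new target language only needs to supply the code generation for each kind of node.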
Much of SWIG's current parser design was originally motivated by interest in using XML to represent SWIG parse trees. Although XML is not currently used in any direct manner, the parse tree structure, use of node tags, attributes, and attribute namespaces are all influenced by aspects of XML parsing. Therefore, in trying to understand SWIG's internal data structures, it may be useful to keep XML in the back of your mind as a model.
In addition to the options beginning with -debug- for dumping SWIG's parse tree in a simple text format, the parse tree can also be dumped as XML. There are three options below, where file specifies the name of the file into which the resulting XML is written:
-xmlout <file>
Use this XML option in addition to any target language options. The parse tree is dumped after the final stage of processing, that is, after the target language has finished processing. It is useful for seeing the parse tree at stage 4 after any target language processing. The parse tree dumped is very similar in content to that generated by -debug-top 4.
-xml -o <file>
Use this XML option without specifying a target language. The parse tree that is dumped is the same as if there was a no target language option. It is useful for seeing the parse tree at stage 3, which is the same as stage 4, as there is no target language processing. The parse tree dumped is very similar in content to that generated by -debug-top 3.
-xml -xmllite -o <file>
Same as above, except the addition of -xmllite reduces the output by skipping some type information, that is, the typescope and typetab nodes.
The parse tree generated using -xmlout is much bigger than that using -xml as the target languages process numerous extra SWIG system files and also add to the parse tree quite considerably in stage 4.
The XML is a dump of SWIG's internal parse tree and as such it is subject to change at any time as and when SWIG's implementation changes.
Most of SWIG is constructed using three basic data structures: strings, hashes, and lists. These data structures are dynamic in the same way as similar structures found in many scripting languages. For instance, you can have containers (lists and hash tables) of mixed types and certain operations are polymorphic.
This section briefly describes the basic structures so that later sections of this chapter make more sense.
When describing the low-level API, the following type name conventions are used:
In most cases, other typenames in the source are aliases for one of these primitive types. Specifically:
typedef String SwigType;
typedef Hash   Parm;
typedef Hash   ParmList;
typedef Hash   Node;
typedef Hash   Symtab;
typedef Hash   Typetab;
String *NewString(const String_or_char *val)
String *NewStringf(const char *fmt, ...)
String *Copy(String *s)
void Delete(String *s)
int Len(const String_or_char *s)
char *Char(const String_or_char *s)
void Append(String *s, const String_or_char *t)
void Insert(String *s, int pos, const String_or_char *t)
int Strcmp(const String_or_char *s, const String_or_char *t)
int Strncmp(const String_or_char *s, const String_or_char *t, int len)
char *Strstr(const String_or_char *s, const String_or_char *pat)
char *Strchr(const String_or_char *s, char ch)
void Chop(String *s)
int Replace(String *s, const String_or_char *pat, const String_or_char *rep, int flags)
Replaces the pattern pat with rep in string s. flags is a combination of the following flags:
DOH_REPLACE_ANY       - Replace all occurrences
DOH_REPLACE_ID        - Valid C identifiers only
DOH_REPLACE_NOQUOTE   - Don't replace in quoted strings
DOH_REPLACE_FIRST     - Replace first occurrence only.
Returns the number of replacements made (if any).
At most one of DOH_REPLACE_ANY and DOH_REPLACE_FIRST should be specified. DOH_REPLACE_ANY is the default if neither is specified.
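The interplay of the "any" and "first" flags and the return count can be sketched on std::string. This is not the DOH implementation: replaceText and the REPLACE_* values are hypothetical, and the identifier-only and no-quote modes are not modeled.

```cpp
#include <cassert>
#include <string>

// Hypothetical flag values; the real DOH_REPLACE_* constants are defined by DOH.
enum ReplaceFlags { REPLACE_ANY = 1, REPLACE_FIRST = 2 };

// Replaces pat with rep in s and returns the number of replacements made.
// "Replace all" is assumed when neither flag is given, as the text states.
int replaceText(std::string &s, const std::string &pat,
                const std::string &rep, int flags) {
  assert(!pat.empty());
  const bool firstOnly = (flags & REPLACE_FIRST) != 0;
  int count = 0;
  std::string::size_type pos = 0;
  while ((pos = s.find(pat, pos)) != std::string::npos) {
    s.replace(pos, pat.size(), rep);
    pos += rep.size();  // continue after the inserted text
    ++count;
    if (firstOnly) break;
  }
  return count;
}
```

Skipping past the inserted text keeps the loop from matching inside its own replacement, which matters when rep contains pat.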
Hash *NewHash()
Hash *Copy(Hash *h)
void Delete(Hash *h)
int Len(Hash *h)
Object *Getattr(Hash *h, const String_or_char *key)
int Setattr(Hash *h, const String_or_char *key, const Object_or_char *val)
int Delattr(Hash *h, const String_or_char *key)
List *Keys(Hash *h)
List *SortedKeys(Hash *h, int (*cmp) (const DOH *, const DOH *))
List *NewList()
List *Copy(List *x)
void Delete(List *x)
int Len(List *x)
Object *Getitem(List *x, int n)
int Setitem(List *x, int n, const Object_or_char *val)
int Delitem(List *x, int n)
void Append(List *x, const Object_or_char *t)
void Insert(List *x, int pos, const Object_or_char *t)
Object *Copy(Object *x)
void Delete(Object *x)
void Setfile(Object *x, String_or_char *f)
String *Getfile(Object *x)
void Setline(Object *x, int n)
int Getline(Object *x)
Iterator First(Object *x)
Iterator Next(Iterator i)
Returns an iterator that points to the next item in a list or hash table. Here are two examples of iteration:
List *l = (some list);
Iterator i;
for (i = First(l); i.item; i = Next(i)) {
  Printf(stdout, "%s\n", i.item);
}
Hash *h = (some hash);
Iterator j;
for (j = First(h); j.item; j = Next(j)) {
  Printf(stdout, "%s : %s\n", j.key, j.item);
}
int Printf(String_or_FILE *f, const char *fmt, ...)
int Printv(String_or_FILE *f, String_or_char *arg1, ..., NULL)
int Putc(int ch, String_or_FILE *f)
int Write(String_or_FILE *f, void *buf, int len)
int Read(String_or_FILE *f, void *buf, int maxlen)
int Getc(String_or_FILE *f)
int Ungetc(int ch, String_or_FILE *f)
int Seek(String_or_FILE *f, int offset, int whence)
long Tell(String_or_FILE *f)
File *NewFile(const char *filename, const char *mode, List *newfiles)
File *NewFileFromFile(FILE *f)
There's no explicit function to close a file, just call Delete(f) - this decreases the reference count, and the file will be closed when the reference count reaches zero.
The use of the above I/O functions and strings plays a critical role in SWIG. It is common to see small fragments of code generated using code like this:
/* Print into a string */
String *s = NewString("");
Printf(s, "Hello\n");
for (i = 0; i < 10; i++) {
  Printf(s, "%d\n", i);
}
...
/* Print string into a file */
Printf(f, "%s\n", s);
Similarly, the preprocessor and parser all operate on string-files.
String *nodeType(Node *n)
Node *nextSibling(Node *n)
Node *previousSibling(Node *n)
Node *firstChild(Node *n)
Node *lastChild(Node *n)
Node *parentNode(Node *n)
The following macros can be used to change all of the above attributes. Normally, these macros are only used by the parser. Changing them without knowing what you are doing is likely to be dangerous.
void set_nodeType(Node *n, const String_or_char)
void set_nextSibling(Node *n, Node *s)
void set_previousSibling(Node *n, Node *s)
void set_firstChild(Node *n, Node *c)
void set_lastChild(Node *n, Node *c)
void set_parentNode(Node *n, Node *p)
The following utility functions are used to alter the parse tree (at your own risk):
void appendChild(Node *parent, Node *child)
void deleteNode(Node *node)
Since parse tree nodes are just hash tables, attributes are accessed using the Getattr(), Setattr(), and Delattr() operations. For example:
int functionHandler(Node *n) {
  String *name    = Getattr(n, "name");
  String *symname = Getattr(n, "sym:name");
  SwigType *type  = Getattr(n, "type");
  ...
}
New attributes can be freely attached to a node as needed. However, when new attributes are attached during code generation, they should be given a namespace prefix. For example:
...
Setattr(n, "python:docstring", doc);     /* Store docstring */
...
A quick way to check the value of an attribute is to use the checkAttribute() function like this:
if (checkAttribute(n, "storage", "virtual")) {
  /* n is virtual */
  ...
}
Changing the values of existing attributes is allowed and is sometimes done to implement node transformations. However, if a function/method modifies a node, it is required to restore modified attributes to their original values. To simplify the task of saving/restoring attributes, the following functions are used:
int Swig_save(const char *ns, Node *n, const char *name1, const char *name2, ..., NIL)
int Swig_restore(Node *n)
Restores the attributes saved by the previous call to Swig_save(). Those attributes that were supplied to Swig_save() will be restored to their original values.
The Swig_save() and Swig_restore() functions must always be used as a pair. That is, every call to Swig_save() must have a matching call to Swig_restore(). Calls can be nested if necessary. Here is an example that shows how the functions might be used:
int variableHandler(Node *n) {
  Swig_save("variableHandler", n, "type", "sym:name", NIL);
  String *symname = Getattr(n, "sym:name");
  SwigType *type  = Getattr(n, "type");
  ...
  Append(symname, "_global");        // Change symbol name
  SwigType_add_pointer(type);        // Add pointer
  ...
  /* generate wrappers */
  ...
  Swig_restore(n);                  // Restore original values
  return SWIG_OK;
}
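To make the save/restore discipline concrete, here is a minimal standalone sketch. It does not use the SWIG API: Node, save_attrs() and restore_attrs() are hypothetical stand-ins, with a std::map playing the role of the node's hash table. It assumes that restoring only needs to put back attributes that existed when they were saved, which is a simplification.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <string>
#include <vector>

// Toy node: attribute name -> value. Real SWIG nodes are DOH hash tables.
struct Node {
  std::map<std::string, std::string> attrs;
  // One snapshot per outstanding save; the stack is what makes nesting work.
  std::vector<std::map<std::string, std::string>> saved;
};

// Hypothetical analogue of Swig_save(): remember the current values of the
// named attributes.
void save_attrs(Node &n, std::initializer_list<std::string> names) {
  std::map<std::string, std::string> snapshot;
  for (const std::string &name : names) {
    auto it = n.attrs.find(name);
    if (it != n.attrs.end()) snapshot[name] = it->second;
  }
  n.saved.push_back(snapshot);
}

// Hypothetical analogue of Swig_restore(): put back the most recent snapshot.
void restore_attrs(Node &n) {
  assert(!n.saved.empty());  // every restore needs a matching save
  for (const auto &kv : n.saved.back()) n.attrs[kv.first] = kv.second;
  n.saved.pop_back();
}
```

A handler would call save_attrs() on entry, modify "type" or "sym:name" freely, and call restore_attrs() before returning, so that callers further up never see the temporary values.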
int Swig_require(const char *ns, Node *n, const char *name1, const char *name2, ..., NIL)
SWIG implements the complete C++ type system including typedef, inheritance, pointers, references, and pointers to members. A detailed discussion of type theory is impossible here. However, let's cover the highlights.
All types in SWIG consist of a base datatype and a collection of type operators that are applied to the base. A base datatype is almost always some kind of primitive type such as int or double. The operators consist of things like pointers, references, arrays, and so forth. Internally, types are represented as strings that are constructed in a very precise manner. Here are some examples:
C datatype                     SWIG encoding (strings)
-----------------------------  --------------------------
int                            "int"
int *                          "p.int"
const int *                    "p.q(const).int"
int (*x)(int, double)          "p.f(int, double).int"
int [20][30]                   "a(20).a(30).int"
int (F::*)(int)                "m(F).f(int).int"
vector<int> *                  "p.vector<(int)>"
Reading the SWIG encoding is often easier than figuring out the C code: just read it from left to right. For example, the type "p.f(int, double).int" is a "pointer to a function(int, double) that returns int".
The following operator encodings are used in type strings:
Operator              Meaning
-------------------   -------------------------------
p.                    Pointer to
a(n).                 Array of dimension n
r.                    C++ reference
m(class).             Member pointer to class
f(args).              Function.
q(qlist).             Qualifiers
In addition, type names may be parameterized by templates. This is represented by enclosing the template parameters in <( ... )>. Variable length arguments are represented by the special base type of v(...).
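The left-to-right reading rule can be sketched as a small standalone decoder. This is not SWIG code: describe_type() and matching_paren() are hypothetical helpers written only for illustration, and the sketch assumes a well-formed encoding that uses the operators listed above (template parameters and varargs are simply treated as part of the base type).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Index of the ')' that matches the '(' at position `open`,
// or std::string::npos if the parentheses are unbalanced.
static std::size_t matching_paren(const std::string &s, std::size_t open) {
  assert(open < s.size() && s[open] == '(');
  int depth = 0;
  for (std::size_t i = open; i < s.size(); ++i) {
    if (s[i] == '(') {
      ++depth;
    } else if (s[i] == ')' && --depth == 0) {
      return i;
    }
  }
  return std::string::npos;
}

// Hypothetical helper (not a SWIG function): spell out an encoded type by
// consuming operators from left to right until only the base type remains.
std::string describe_type(const std::string &ty) {
  std::string out;
  std::size_t pos = 0;
  while (pos < ty.size()) {
    if (ty.compare(pos, 2, "p.") == 0) { out += "pointer to "; pos += 2; continue; }
    if (ty.compare(pos, 2, "r.") == 0) { out += "reference to "; pos += 2; continue; }

    // The remaining operators carry an argument: a(n). q(qlist). m(class). f(args).
    const char op = ty[pos];
    const bool takes_arg = (op == 'a' || op == 'q' || op == 'm' || op == 'f') &&
                           pos + 1 < ty.size() && ty[pos + 1] == '(';
    if (!takes_arg) break;  // reached the base type
    const std::size_t close = matching_paren(ty, pos + 1);
    if (close == std::string::npos || close + 1 >= ty.size() || ty[close + 1] != '.') break;

    const std::string arg = ty.substr(pos + 2, close - pos - 2);
    if (op == 'a') out += "array[" + arg + "] of ";
    else if (op == 'q') out += arg + " ";
    else if (op == 'm') out += "member pointer of class " + arg + " to ";
    else out += "function(" + arg + ") returning ";
    pos = close + 2;
  }
  return out + ty.substr(pos);
}
```

For instance, describe_type("p.q(const).int") reads as "pointer to const int". The English phrases are this sketch's own wording, not output that SWIG produces.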
If you want to experiment with type encodings, the raw type strings can be inserted into an interface file using backticks `` wherever a type is expected. For instance, here is an extremely perverted example:
`p.a(10).p.f(int, p.f(int).int)` foo(int, int (*x)(int));
This corresponds to the immediately obvious C declaration:
(*(*foo(int, int (*)(int)))[10])(int, int (*)(int));
Aside from the potential use of this declaration on a C programming quiz, it motivates the use of the special SWIG encoding of types. The SWIG encoding is much easier to work with because types can be easily examined, modified, and constructed using simple string operations (comparison, substrings, concatenation, etc.). For example, in the parser, a declaration like this
int *a[30];
is processed in a few pieces. In this case, you have the base type "int" and the declarator of type "a(30).p.". To make the final type, the two parts are just joined together using string concatenation.
The following functions are used to construct types. You should use these functions instead of trying to build the type strings yourself.
void SwigType_add_pointer(SwigType *ty)
void SwigType_del_pointer(SwigType *ty)
void SwigType_add_reference(SwigType *ty)
void SwigType_add_array(SwigType *ty, const String_or_char *size)
void SwigType_del_array(SwigType *ty)
int SwigType_array_ndim(SwigType *ty)
String* SwigType_array_getdim(SwigType *ty, int n)
void SwigType_array_setdim(SwigType *ty, int n, const String_or_char *rep)
void SwigType_add_qualifier(SwigType *ty, const String_or_char *q)
void SwigType_add_memberpointer(SwigType *ty, const String_or_char *cls)
void SwigType_add_function(SwigType *ty, ParmList *p)
void SwigType_add_template(SwigType *ty, ParmList *p)
SwigType *SwigType_pop(SwigType *ty)
void SwigType_push(SwigType *ty, SwigType *op)
SwigType *SwigType_pop_arrays(SwigType *ty)
SwigType *SwigType_pop_function(SwigType *ty)
SwigType *SwigType_base(SwigType *ty)
SwigType *SwigType_prefix(SwigType *ty)
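Because types are plain strings, the constructor functions amount to careful string edits. The following standalone sketch models a few of them with std::string. The function names are hypothetical stand-ins rather than the SwigType_* functions themselves, and the real functions may do more (for example when combining qualifiers), so treat this only as an illustration of the encoding.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the SwigType_add_*() family. Each operator is
// prepended, so the most recently added operator is the first one read.
void add_pointer(std::string &ty) { ty.insert(0, "p."); }
void add_reference(std::string &ty) { ty.insert(0, "r."); }
void add_array(std::string &ty, const std::string &size) { ty.insert(0, "a(" + size + ")."); }
void add_qualifier(std::string &ty, const std::string &q) { ty.insert(0, "q(" + q + ")."); }

// A type is a pointer when its outermost (leftmost) operator is "p.".
bool is_pointer(const std::string &ty) { return ty.compare(0, 2, "p.") == 0; }

// Remove a leading pointer operator; the type must start with "p.".
void del_pointer(std::string &ty) {
  assert(is_pointer(ty));
  ty.erase(0, 2);
}

// Build the encoding for `int *a[30]`: base "int", then pointer, then array.
std::string example_array_of_pointers() {
  std::string ty = "int";
  add_pointer(ty);
  add_array(ty, "30");
  return ty;  // "a(30).p.int"
}
```

The result matches the declarator "a(30).p." joined to the base type "int" from the earlier parser example.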
The following functions can be used to test properties of a datatype.
int SwigType_ispointer(SwigType *ty)
int SwigType_ismemberpointer(SwigType *ty)
int SwigType_isreference(SwigType *ty)
int SwigType_isarray(SwigType *ty)
int SwigType_isfunction(SwigType *ty)
int SwigType_isqualifier(SwigType *ty)
int SwigType_issimple(SwigType *ty)
int SwigType_isconst(SwigType *ty)
int SwigType_isvarargs(SwigType *ty)
int SwigType_istemplate(SwigType *ty)
The behavior of a typedef declaration is to introduce a type alias. For instance, typedef int Integer makes the identifier Integer an alias for int. The treatment of typedef in SWIG is somewhat complicated due to the pattern matching rules that get applied in typemaps and the fact that SWIG prefers to generate wrapper code that closely matches the input to simplify debugging (a user will see the typedef names used in their program instead of the low-level primitive C datatypes).
To handle typedef, SWIG builds a collection of trees containing typedef relations. For example,
typedef int Integer;
typedef Integer *IntegerPtr;
typedef int Number;
typedef int Size;
produces two trees like this:
                 int               p.Integer
               ^  ^  ^                 ^
              /   |   \                |
             /    |    \               |
        Integer  Size   Number    IntegerPtr
To resolve a single typedef relationship, the following function is used:
SwigType *SwigType_typedef_resolve(SwigType *ty)
Typedefs are only resolved in simple typenames that appear in a type, for example in the base type name and in function parameters. When resolving types, the process starts in the leaf nodes and moves up the tree towards the root. Here are a few examples that show how it works:
Original type            After typedef_resolve()
------------------------ -----------------------
Integer                  int
a(30).Integer            a(30).int
p.IntegerPtr             p.p.Integer
p.p.Integer              p.p.int
For complicated types, the process can be quite involved. Here is the reduction of a function pointer:
p.f(Integer, p.IntegerPtr, Size).Integer          : Start
p.f(Integer, p.IntegerPtr, Size).int
p.f(int, p.IntegerPtr, Size).int
p.f(int, p.p.Integer, Size).int
p.f(int, p.p.int, Size).int
p.f(int, p.p.int, int).int                        : End
Two types are equivalent if their full type reductions are the same. The following function will fully reduce a datatype:
SwigType *SwigType_typedef_resolve_all(SwigType *ty)
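The difference between a single resolution step and a full reduction can be sketched with a small lookup table. This is a standalone model, not the SWIG implementation: TypedefTable and the function names are hypothetical, and the sketch only resolves the base type name, ignoring typedef names inside function parameters and any scoping.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Alias name -> the type it was declared as (in encoded form).
using TypedefTable = std::map<std::string, std::string>;

// The typedefs from the example above.
TypedefTable example_table() {
  return TypedefTable{{"Integer", "int"},
                      {"IntegerPtr", "p.Integer"},
                      {"Number", "int"},
                      {"Size", "int"}};
}

// Position where the base type starts: just past the last '.' that is not
// nested inside parentheses.
static std::size_t base_start(const std::string &ty) {
  std::size_t start = 0;
  int depth = 0;
  for (std::size_t i = 0; i < ty.size(); ++i) {
    if (ty[i] == '(') ++depth;
    else if (ty[i] == ')') --depth;
    else if (ty[i] == '.' && depth == 0) start = i + 1;
  }
  return start;
}

// One reduction step. Returns true and rewrites `ty` if the base name is an
// alias; returns false when nothing is left to resolve.
bool typedef_resolve(const TypedefTable &table, std::string &ty) {
  const std::size_t start = base_start(ty);
  auto it = table.find(ty.substr(start));
  if (it == table.end()) return false;
  ty.replace(start, std::string::npos, it->second);
  return true;
}

// Full reduction: repeat single steps until the type stops changing.
std::string typedef_resolve_all(const TypedefTable &table, std::string ty) {
  int steps = 0;
  while (typedef_resolve(table, ty)) {
    ++steps;
    assert(steps < 1000);  // a cyclic table would never terminate
  }
  return ty;
}
```

With this model, two types can be compared for equivalence by comparing the strings returned by typedef_resolve_all().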
When generating wrapper code, it is necessary to emit datatypes that can be used on the left-hand side of an assignment operator (an lvalue). However, not all C datatypes can be used in this way---especially arrays and const-qualified types. To generate a type that can be used as an lvalue, use the following function:
SwigType *SwigType_ltype(SwigType *ty)
The creation of lvalues is fully aware of typedef and other aspects of the type system. Therefore, the creation of an lvalue may produce unexpected results. Here are a few examples:
typedef double Matrix4[4][4];
Matrix4 x;    // type = 'Matrix4', ltype='p.a(4).double'

typedef const char * Literal;
Literal y;    // type = 'Literal', ltype='p.char'
The following functions produce strings that are suitable for output.
String *SwigType_str(SwigType *ty, const String_or_char *id = 0)
String *SwigType_lstr(SwigType *ty, const String_or_char *id = 0)
String *SwigType_lcaststr(SwigType *ty, const String_or_char *id = 0)
String *SwigType_rcaststr(SwigType *ty, const String_or_char *id = 0)
String *SwigType_manglestr(SwigType *ty)
Several type-related functions involve parameter lists. These include functions and templates. Parameter lists are represented as a list of nodes with the following attributes:
"type"      -  Parameter type  (required)
"name"      -  Parameter name  (optional)
"value"     -  Initializer     (optional)
Typically parameters are denoted in the source by using a typename of Parm * or ParmList *. To walk a parameter list, simply use code like this:
Parm *parms;
Parm *p;
for (p = parms; p; p = nextSibling(p)) {
  SwigType *type  = Getattr(p, "type");
  String   *name  = Getattr(p, "name");
  String   *value = Getattr(p, "value");
  ...
}
Note: this code is exactly the same as what you would use to walk parse tree nodes.
An empty list of parameters is denoted by a NULL pointer.
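The walking pattern, including the NULL convention for an empty list, can be modeled in a few lines. This is a standalone sketch with a hypothetical Parm struct whose `next` pointer plays the role of nextSibling(); real SWIG parameters are hash nodes, and the helper names here are invented for illustration.

```cpp
#include <cassert>
#include <string>

// Toy parameter node carrying the three attributes described above.
struct Parm {
  std::string type;
  std::string name;
  std::string value;  // empty when there is no initializer
  const Parm *next;   // stands in for nextSibling()
};

// Number of parameters; a null pointer is simply an empty list.
int parm_count(const Parm *parms) {
  int n = 0;
  for (const Parm *p = parms; p; p = p->next) ++n;
  return n;
}

// The parameter at `index`, or a null pointer if the list is shorter.
const Parm *parm_at(const Parm *parms, int index) {
  assert(index >= 0);
  const Parm *p = parms;
  while (p && index-- > 0) p = p->next;
  return p;
}

// Comma-separated list of the parameter types.
std::string parm_types(const Parm *parms) {
  std::string out;
  for (const Parm *p = parms; p; p = p->next) {
    if (p != parms) out += ",";
    out += p->type;
  }
  return out;
}
```

Because an empty list is just a null pointer, none of these loops needs a special case for functions that take no arguments.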
Since parameter lists are fairly common, the following utility functions are provided to manipulate them:
Parm *CopyParm(Parm *p);
ParmList *CopyParmList(ParmList *p);
int ParmList_len(ParmList *p);
String *ParmList_str(ParmList *p);
String *ParmList_protostr(ParmList *p);
int ParmList_numrequired(ParmList *p);
One of the easiest routes to supporting a new language module is to copy an already supported language module implementation and modify it. Be sure to choose a language that is similar in nature to the new language. All language modules follow a similar structure and this section briefly outlines the steps needed to create a bare-bones language module from scratch. Since the code is relatively easy to read, this section describes the creation of a minimal Python module. You should be able to extrapolate this to other languages.
Code generation modules are defined by inheriting from the Language class, currently defined in the Source/Modules directory of SWIG. Starting from the parsing of command line options, all aspects of code generation are controlled by different methods of the Language class that must be defined by your module.
To define a new language module, first create a minimal implementation using this example as a guide:
#include "swigmod.h"
class PYTHON : public Language {
public:
  virtual void main(int argc, char *argv[]) {
    printf("I'm the Python module.\n");
  }
  virtual int top(Node *n) {
    printf("Generating code.\n");
    return SWIG_OK;
  }
};
extern "C" Language *
swig_python(void) {
  return new PYTHON();
}
The "swigmod.h" header file contains, among other things, the declaration of the Language base class and so you should include it at the top of your language module's source file. Similarly, the "swigconfig.h" header file contains some other useful definitions that you may need. Note that you should not include any header files that are installed with the target language. That is to say, the implementation of the SWIG Python module shouldn't have any dependencies on the Python header files. The wrapper code generated by SWIG will almost always depend on some language-specific C/C++ header files, but SWIG itself does not.
Give your language class a reasonable name, usually the same as the target language. By convention, these class names are all uppercase (e.g. "PYTHON" for the Python language module) but this is not a requirement. This class will ultimately consist of a number of overrides of the virtual functions declared in the Language base class, in addition to any language-specific member functions and data you need. For now, just use the dummy implementations shown above.
The language module ends with a factory function, swig_python(), that simply returns a new instance of the language class. As shown, it should be declared with extern "C" linkage so that it can be called from C code. It should also return a pointer to the base class (Language) so that only the interface (and not the implementation) of your language module is exposed to the rest of SWIG.
Save the code for your language module in a file named "python.cxx" and place this file in the Source/Modules directory of the SWIG distribution. To ensure that your module is compiled into SWIG along with the other language modules, modify the file Source/Makefile.am to include the additional source files. In addition, modify the file Source/Modules/swigmain.cxx with an additional command line option that activates the module. Read the source---it's straightforward.
Next, at the top level of the SWIG distribution, re-run the autogen.sh script to regenerate the various build files:
$ ./autogen.sh
Next re-run configure to regenerate all of the Makefiles:
$ ./configure
Finally, rebuild SWIG with your module added:
$ make
Once it finishes compiling, try running SWIG with the command-line option that activates your module. For example, swig -python foo.i. The messages from your new module should appear.
When SWIG starts, the command line options are passed to your language module. This occurs before any other processing occurs (preprocessing, parsing, etc.). To capture the command line options, simply use code similar to this:
void PYTHON::main(int argc, char *argv[]) {
  for (int i = 1; i < argc; i++) {
    if (argv[i]) {
      if (strcmp(argv[i], "-interface") == 0) {
        if (argv[i+1]) {
          interface = NewString(argv[i+1]);
          Swig_mark_arg(i);
          Swig_mark_arg(i+1);
          i++;
        } else {
          Swig_arg_error();
        }
      } else if (strcmp(argv[i], "-globals") == 0) {
        if (argv[i+1]) {
          global_name = NewString(argv[i+1]);
          Swig_mark_arg(i);
          Swig_mark_arg(i+1);
          i++;
        } else {
          Swig_arg_error();
        }
      } else if ((strcmp(argv[i], "-proxy") == 0)) {
        proxy_flag = 1;
        Swig_mark_arg(i);
      } else if (strcmp(argv[i], "-keyword") == 0) {
        use_kw = 1;
        Swig_mark_arg(i);
      } else if (strcmp(argv[i], "-help") == 0) {
        fputs(usage, stderr);
      }
      ...
    }
  }
}
The exact set of options depends on what you want to do in your module. Generally, you would use the options to change code generation modes or to print diagnostic information.
If a module recognizes an option, it should always call Swig_mark_arg() to mark the option as valid. If you forget to do this, SWIG will terminate with an unrecognized command line option error.
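The marking protocol can be illustrated with a small standalone model. Nothing here is SWIG code: Args, Options, parse_module_options() and first_unrecognized() are hypothetical names, and the sketch assumes the core simply rejects any argument that no one has marked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Command line plus one "claimed" flag per argument.
struct Args {
  std::vector<std::string> argv;
  std::vector<bool> marked;
  explicit Args(std::vector<std::string> a)
      : argv(std::move(a)), marked(argv.size(), false) {}
  // Plays the role of Swig_mark_arg().
  void mark(std::size_t i) {
    assert(i < marked.size());
    marked[i] = true;
  }
};

struct Options {
  std::string interface_name;
  bool proxy = false;
};

// Module side: consume the options we understand and mark each one,
// including the value that follows an option such as -interface.
Options parse_module_options(Args &args) {
  Options opts;
  for (std::size_t i = 1; i < args.argv.size(); ++i) {
    if (args.argv[i] == "-interface" && i + 1 < args.argv.size()) {
      opts.interface_name = args.argv[i + 1];
      args.mark(i);
      args.mark(i + 1);
      ++i;
    } else if (args.argv[i] == "-proxy") {
      opts.proxy = true;
      args.mark(i);
    }
  }
  return opts;
}

// Core side: index of the first argument nobody claimed, or -1 if none.
int first_unrecognized(const Args &args) {
  for (std::size_t i = 1; i < args.argv.size(); ++i) {
    if (!args.marked[i]) return static_cast<int>(i);
  }
  return -1;
}
```

Forgetting a mark() call in the module is exactly the situation in which the core-side check reports an otherwise valid option as unrecognized.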
In addition to looking at command line options, the main() method is responsible for some initial configuration of the SWIG library and preprocessor. To do this, insert some code like this:
void main(int argc, char *argv[]) {
  ... command line options ...
  /* Set language-specific subdirectory in SWIG library */
  SWIG_library_directory("python");
  /* Set language-specific preprocessing symbol */
  Preprocessor_define("SWIGPYTHON 1", 0);
  /* Set language-specific configuration file */
  SWIG_config_file("python.swg");
  /* Set typemap language (historical) */
  SWIG_typemap_lang("python");
}
The above code does several things: it registers the name of the language module with the core, it supplies some preprocessor macro definitions for use in input files (so that they can determine the target language), and it registers a start-up file. In this case, the file python.swg will be parsed before any part of the user-supplied input file.
Before proceeding any further, create a directory for your module in the SWIG library (the Lib directory). Now, create a configuration file in the directory. For example, python.swg.
Just to review, your language module should now consist of two files: an implementation file python.cxx and a configuration file python.swg.
SWIG is a multi-pass compiler. Once the main() method has been invoked, the language module does not execute again until preprocessing, parsing, and a variety of semantic analysis passes have been performed. When the core is ready to start generating wrappers, it invokes the top() method of your language class. The argument to top is a single parse tree node that corresponds to the top of the entire parse tree.
To get the code generation process started, the top() procedure needs to do several things:
An outline of top() might be as follows:
int PYTHON::top(Node *n) {
  /* Get the module name */
  String *module = Getattr(n, "name");
  /* Get the output file name */
  String *outfile = Getattr(n, "outfile");
  /* Initialize I/O (see next section) */
  ...
  /* Output module initialization code */
  ...
  /* Emit code for children */
  Language::top(n);
  ...
  /* Cleanup files */
  ...
  return SWIG_OK;
}
Within SWIG wrappers, there are five main sections. These are (in order): begin, runtime, header, wrapper, and init.
Different parts of the SWIG code will fill different sections, then upon completion of the wrapping all the sections will be saved to the wrapper file.
Doing this requires several additions to the code in various places, such as:
class PYTHON : public Language {
protected:
  /* General DOH objects used for holding the strings */
  File *f_begin;
  File *f_runtime;
  File *f_header;
  File *f_wrappers;
  File *f_init;
public:
  ...
};
int PYTHON::top(Node *n) {
  ...
  /* Initialize I/O */
  f_begin = NewFile(outfile, "w", SWIG_output_files());
  if (!f_begin) {
    FileErrorDisplay(outfile);
    Exit(EXIT_FAILURE);
  }
  f_runtime = NewString("");
  f_init = NewString("");
  f_header = NewString("");
  f_wrappers = NewString("");
  /* Register file targets with the SWIG file handler */
  Swig_register_filebyname("begin", f_begin);
  Swig_register_filebyname("header", f_header);
  Swig_register_filebyname("wrapper", f_wrappers);
  Swig_register_filebyname("runtime", f_runtime);
  Swig_register_filebyname("init", f_init);
  /* Output module initialization code */
  Swig_banner(f_begin);
  ...
  /* Emit code for children */
  Language::top(n);
  ...
  /* Write all to the file */
  Dump(f_runtime, f_begin);
  Dump(f_header, f_begin);
  Dump(f_wrappers, f_begin);
  Wrapper_pretty_print(f_init, f_begin);
  /* Cleanup files */
  Delete(f_runtime);
  Delete(f_header);
  Delete(f_wrappers);
  Delete(f_init);
  Delete(f_begin);
  return SWIG_OK;
}
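The point of buffering each section separately is that code can be emitted in any order and still land in the right place in the output. Here is a minimal standalone sketch of that idea. WrapperFile and its methods are hypothetical and use std::ostringstream instead of DOH strings; only the section names and their order are taken from the code above.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Fixed output order of the sections.
static const char *const kSectionOrder[] = {"begin", "runtime", "header", "wrapper", "init"};

class WrapperFile {
public:
  WrapperFile() {
    // Create one empty buffer per section name.
    for (const char *name : kSectionOrder) sections_[name];
  }

  // Look up a section buffer by name, much like a registered file target.
  std::ostream &section(const std::string &name) {
    auto it = sections_.find(name);
    assert(it != sections_.end());  // unknown section name
    return it->second;
  }

  // Concatenate the buffers in the fixed order, regardless of write order.
  std::string assemble() const {
    std::string out;
    for (const char *name : kSectionOrder) out += sections_.at(name).str();
    return out;
  }

private:
  std::map<std::string, std::ostringstream> sections_;
};
```

A handler that discovers late that it needs an extra declaration can still write it to the "header" section, and it will appear ahead of the wrappers in the assembled file.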
Using this to process a file will generate a wrapper file. However, the wrapper will only consist of the common SWIG code as well as any inline code which was written in the .i file. It does not contain any wrappers for any of the functions or classes.
The wrappers themselves are generated by the various member functions, which have not been touched so far. We will look at functionWrapper() as this is the most commonly used function. In fact, many of the other wrapper routines call it to do their work.
A simple modification to write some basic details to the wrapper looks like this:
int PYTHON::functionWrapper(Node *n) {
  /* Get some useful attributes of this function */
  String   *name   = Getattr(n, "sym:name");
  SwigType *type   = Getattr(n, "type");
  ParmList *parms  = Getattr(n, "parms");
  String   *parmstr= ParmList_str_defaultargs(parms); // to string
  String   *func   = SwigType_str(type, NewStringf("%s(%s)", name, parmstr));
  String   *action = Getattr(n, "wrap:action");
  Printf(f_wrappers, "functionWrapper   : %s\n", func);
  Printf(f_wrappers, "           action : %s\n", action);
  return SWIG_OK;
}
This will now produce some useful information within your wrapper file.
functionWrapper   : void delete_Shape(Shape *self)
           action : delete arg1;
functionWrapper   : void Shape_x_set(Shape *self, double x)
           action : if (arg1) (arg1)->x = arg2;
functionWrapper   : double Shape_x_get(Shape *self)
           action : result = (double) ((arg1)->x);
functionWrapper   : void Shape_y_set(Shape *self, double y)
           action : if (arg1) (arg1)->y = arg2;
...
As ingenious as SWIG is, and despite all its capabilities and the power of its parser, the low-level code generation takes a lot of work to write properly, mainly because every language insists on its own manner of interfacing to C/C++. To write the code generators you will need a good understanding of how to manually write an interface to your chosen language, so make sure you have your documentation handy.
At this point it is also probably a good idea to take a very simple file (just one function), and try letting SWIG generate wrappers for many different languages. Take a look at all of the wrappers generated, and decide which one looks closest to the language you are trying to wrap. This may help you to decide which code to look at.
In general most language wrappers look a little like this:
/* wrapper for TYPE3 some_function(TYPE1, TYPE2); */
RETURN_TYPE _wrap_some_function(ARGS){
  TYPE1 arg1;
  TYPE2 arg2;
  TYPE3 result;
  if(ARG1 is not of TYPE1) goto fail;
  arg1=(convert ARG1);
  if(ARG2 is not of TYPE2) goto fail;
  arg2=(convert ARG2);
  result=some_function(arg1, arg2);
  convert 'result' to whatever the language wants;
  do any tidy up;
  return ALL_OK;
  fail:
  do any tidy up;
  return ERROR;
}
Yes, it is rather vague and not very clear. But each language works differently so this will have to do for now.
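To make the skeleton a little less abstract, here is one way it could look for an imaginary scripting language. Everything in this sketch is hypothetical: Value is a toy stand-in for a target language object, scale() is an invented C function, and early returns replace the goto-based fail label of the skeleton.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy dynamically typed value standing in for a scripting language object.
struct Value {
  enum Kind { Int, Double, Error } kind;
  long i;
  double d;
  std::string msg;
};

Value make_int(long v) { return Value{Value::Int, v, 0.0, ""}; }
Value make_double(double v) { return Value{Value::Double, 0, v, ""}; }
Value make_error(const std::string &m) { return Value{Value::Error, 0, 0.0, m}; }

bool is_number(const Value &v) { return v.kind == Value::Int || v.kind == Value::Double; }

// Convert a numeric Value to double; callers must check is_number() first.
double to_double(const Value &v) {
  assert(is_number(v));
  return v.kind == Value::Int ? static_cast<double>(v.i) : v.d;
}

// The C function being wrapped.
double scale(int n, double factor) { return n * factor; }

// Wrapper for: double scale(int, double);
Value wrap_scale(const std::vector<Value> &args) {
  // Check the argument count and types, failing early with an error value.
  if (args.size() != 2) return make_error("scale() takes exactly 2 arguments");
  if (args[0].kind != Value::Int) return make_error("argument 1 must be an int");
  if (!is_number(args[1])) return make_error("argument 2 must be a number");

  // Convert the language objects to C values.
  const int arg1 = static_cast<int>(args[0].i);
  const double arg2 = to_double(args[1]);

  // Call the wrapped function and convert the result back.
  const double result = scale(arg1, arg2);
  return make_double(result);
}
```

The check, convert, call, convert-back sequence is the part that carries over to real modules; in SWIG the checks and conversions come from typemaps rather than being written by hand in each wrapper.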
Tackling this problem will be done in two stages:
The first step will be done in the code, the second will be done in typemaps.
Our first step will be to write the code for functionWrapper(). What is shown below is NOT the solution, merely a step in the right direction. There are a lot of issues to address.
virtual int functionWrapper(Node *n) {
  /* get useful attributes */
  String   *name   = Getattr(n, "sym:name");
  SwigType *type   = Getattr(n, "type");
  ParmList *parms  = Getattr(n, "parms");
  ...
  /* create the wrapper object */
  Wrapper *wrapper = NewWrapper();
  /* create the wrapper function's name */
  String *wname = Swig_name_wrapper(name);
  /* deal with overloading */
  ....
  /* write the wrapper function definition */
  Printv(wrapper->def, "RETURN_TYPE ", wname, "(ARGS) {", NIL);
  /* if any additional local variables are needed, add them now */
  ...
  /* write the list of locals/arguments required */
  emit_args(type, parms, wrapper);
  /* check arguments */
  ...
  /* write typemaps(in) */
  ....
  /* write constraints */
  ....
  /* Emit the function call */
  emit_action(n, wrapper);
  /* return value if necessary  */
  ....
  /* write typemaps(out) */
  ....
  /* add cleanup code */
  ....
  /* Close the function(ok) */
  Printv(wrapper->code, "return ALL_OK;\n", NIL);
  /* add the failure cleanup code */
  ...
  /* Close the function(error) */
  Printv(wrapper->code, "return ERROR;\n", "}\n", NIL);
  /* final substitutions if applicable */
  ...
  /* Dump the function out */
  Wrapper_print(wrapper, f_wrappers);
  /* tidy up */
  Delete(wname);
  DelWrapper(wrapper);
  return SWIG_OK;
}
Executing this code will produce wrappers that have our basic skeleton, but without the typemaps there is still work to do.
At the time of this writing, SWIG supports nearly twenty languages, which means that for continued sanity in maintaining the configuration files, the language modules need to follow some conventions. These are outlined here along with the admission that, yes it is ok to violate these conventions in minor ways, as long as you know where to apply the proper kludge to keep the overall system regular and running. Engineering is the art of compromise, see...
Much of the maintenance regularity depends on choosing a suitable nickname for your language module (and then using it in a controlled way). Nicknames should be all lower case letters with an optional numeric suffix (no underscores, no dashes, no spaces). Some examples are: foo, bar, qux99.
The numeric suffix variant, as in the last example, is somewhat tricky to work with because sometimes people expect to refer to the language without this number but sometimes that number is extremely relevant (especially when it corresponds to language implementation versions with incompatible interfaces). New language modules that unavoidably require a numeric suffix in their nickname should include that number in all uses, or be prepared to kludge.
The nickname is used in four places:
| usage | transform | 
| "skip" tag | (none) | 
| Examples/ subdir name | (none) | 
| Examples/test-suite/ subdir name | (none) | 
As you can see, most usages are direct.
The configure.ac file is processed by autoconf to generate the configure script. This is where you need to add shell script fragments and autoconf macros to detect the presence of whatever development support your language module requires, typically directories where headers and libraries can be found, and/or utility programs useful for integrating the generated wrapper code.
Use the AC_ARG_WITH, AC_MSG_CHECKING, AC_SUBST macros and so forth (see other languages for examples). Avoid using the [ and ] character in shell script fragments. The variable names passed to AC_SUBST should begin with the nickname, entirely upcased.
At the end of the new section is the place to put the aforementioned nickname kludges (should they be needed). See Perl5 for examples of what to do. [If this is still unclear after you've read the code, ping me and I'll expand on this further. --ttn]
Some of the variables AC_SUBSTituted are essential to the support of your language module. Fashion these into a shell script "test" clause and assign that to a skip tag using "-z" and "-o":
This means if those vars should ever be empty, qux99 support should be considered absent and so it would be a good idea to skip actions that might rely on it.
Here is where you may also define an alias (but then you'll need to kludge --- don't do this):
Lastly, you need to modify each of check-aliveness, check-examples, check-test-suite and lib-languages (var). Use the nickname for these, not the alias. Note that you can do this even before you have any tests or examples set up; the Makefile rules do some sanity checking and skip around these kinds of problems.
When you have modified these files, please make sure that the new language module is completely ignored if it is not installed and detected on a box, that is, make check-examples and make check-test-suite politely display the ignoring language message.
Discuss the kinds of functions typically needed for SWIG runtime support (e.g. SWIG_ConvertPtr() and SWIG_NewPointerObj()) and the names of the SWIG files that implement those functions.
The set of standard library files that most languages supply keeps growing as SWIG matures. The following are the minimum that are usually supported:
Please copy these and modify for any new language.
Each of the language modules provides one or more examples. These examples are used to demonstrate different features of the language module to SWIG end-users, but you'll find that they're useful during development and testing of your language module as well. You can use examples from the existing SWIG language modules for inspiration.
Each example is self-contained and consists of (at least) a Makefile, a SWIG interface file for the example module, and a 'runme' script that demonstrates the functionality for that module. All of these files are stored in the same subdirectory under the Examples/[lang] directory. There are two classic examples which should be the first to convert to a new language module. These are the "simple" C example and the "class" C++ example. These can be found, for example for Python, in Examples/python/simple and Examples/python/class.
By default, all of the examples are built and run when the user types make check. To ensure that your examples are automatically run during this process, see the section on configuration files.
A test-driven development approach is central to the improvement and development of SWIG. Most modifications to SWIG are accompanied by additional regression tests, and all tests are checked to ensure that no regressions have been introduced.
The regression testing is carried out by the SWIG test-suite. The test-suite consists of numerous testcase interface files in the Examples/test-suite directory as well as target language specific runtime tests in the Examples/test-suite/[lang] directory. When the test-suite is run, the following steps are executed for each testcase:
For example, the ret_by_value testcase consists of two components. The first component is the Examples/test-suite/ret_by_value.i interface file. The name of the SWIG module must always be the name of the testcase, so the ret_by_value.i interface file thus begins with:
%module ret_by_value
The testcase code will then follow the module declaration, usually within a %inline %{ ... %} section for the majority of the tests.
The second component is the optional runtime tests. Any runtime tests are named using the following convention: [testcase]_runme.[ext], where [testcase] is the testcase name and [ext] is the normal extension for the target language file. In this case, the Java and Python target languages implement a runtime test, so their files are respectively, Examples/test-suite/java/ret_by_value_runme.java and Examples/test-suite/python/ret_by_value_runme.py.
The goal of the test-suite is to test as much as possible in a silent manner. This way any SWIG or compiler errors or warnings are easily visible. Should there be any warnings, changes must be made to either fix them (preferably) or suppress them. Compilation or runtime errors result in a testcase failure and will be immediately visible. It is therefore essential that the runtime tests are written in a manner that displays nothing to stdout/stderr on success but error/exception out with an error message on stderr on failure.
In order for the test-suite to work for a particular target language, the language must be correctly detected and configured during the configure stage so that the correct Makefiles are generated. Most development occurs on Linux, so usually it is a matter of installing the development packages for the target language and simply configuring as outlined earlier.
If you get a message that the test was skipped when running the test-suite commands that follow, it indicates that the configure stage is missing information needed to compile and run everything for that language.
The test-suite can be run in a number of ways. The first group of commands are for running multiple testcases in one run and should be executed in the top level directory. To run the entire test-suite (can take a long time):
make -k check-test-suite
To run the test-suite just for target language [lang], replace [lang] with one of csharp, java, perl5, python, ruby, tcl etc:
make check-[lang]-test-suite
Note that if a runtime test is available, a message "(with run test)" is displayed when run. For example:
$ make check-python-test-suite
checking python test-suite
checking python testcase argcargvtest (with run test)
checking python testcase python_autodoc
checking python testcase python_append (with run test)
checking python testcase callback (with run test)
The files generated on a previous run can be deleted using the clean targets, either the whole test-suite or for a particular language:
make clean-test-suite
make clean-[lang]-test-suite
The test-suite can be run in a partialcheck mode where just SWIG is executed, that is, the compile, link and running of the testcases is not performed. Note that the partialcheck does not require the target language to be correctly configured and detected and unlike the other test-suite make targets, is never skipped. Once again, either all the languages can be executed or just a chosen language:
make partialcheck-test-suite
make partialcheck-[lang]-test-suite
If your computer has more than one CPU, you are strongly advised to use parallel make to speed up execution. This can be done with any of the make targets that execute more than one testcase. For example, a dual core processor can efficiently use 2 parallel jobs:
make -j2 check-test-suite
make -j2 check-python-test-suite
make -j2 partialcheck-java-test-suite
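If the test-suite is driven from a script, the job count can be derived from the machine instead of being hard-coded. The sketch below assumes one job per CPU reported by the operating system, which is a common rule of thumb and not a SWIG recommendation; parallel_make_command is a hypothetical helper name.

```python
import os


def parallel_make_command(target, jobs=None):
    """Return a make argument list with a -j option (a sketch)."""
    # os.cpu_count() can return None when the count is unknown; fall back to 1.
    if jobs is None:
        jobs = os.cpu_count() or 1
    return ["make", f"-j{jobs}", target]
```

The returned list can be passed to subprocess.run from the top level directory.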
The second group of commands are for running individual testcases and should be executed in the appropriate target language directory, Examples/test-suite/[lang]. Testcases can contain either C or C++ code and when one is written, a decision must be made as to which of these input languages is to be used. Replace [testcase] in the commands below with the name of the testcase.
For a C language testcase, add the testcase under the C_TEST_CASES list in Examples/test-suite/common.mk and execute individually as:
make -s [testcase].ctest
For a C++ language testcase, add the testcase under the CPP_TEST_CASES list in Examples/test-suite/common.mk and execute individually as:
make -s [testcase].cpptest
A third category of tests are C++ language testcases testing multiple modules (the %import directive). These require more than one shared library (dll/shared object) to be built and so are separated out from the normal C++ testcases. Add the testcase under the MULTI_CPP_TEST_CASES list in Examples/test-suite/common.mk and execute individually as:
make -s [testcase].multicpptest
To delete the generated files, execute:
make -s [testcase].clean
If you would like to see the exact commands being executed, drop the -s option:
make [testcase].ctest
make [testcase].cpptest
make [testcase].multicpptest
Some real examples of each:
make -s ret_by_value.clean
make -s ret_by_value.ctest
make -s bools.cpptest
make -s imports.multicpptest
Advanced usage of the test-suite facilitates running tools on some of the five stages. The make variables SWIGTOOL and RUNTOOL specify a tool used to invoke SWIG and to execute the runtime test, respectively. See the Examples/test-suite/common.mk file for details. In short, the classic usage is to run Valgrind for memory checking. For example, to check for memory leaks when running the runtime test in the target language interpreter:
make ret_by_value.ctest RUNTOOL="valgrind --leak-check=full"
This will probably make more sense if you look at the output of the above as it will show the exact commands being executed. SWIG can be analyzed for bad memory by first rebuilding swig with just the -g option. Also define DOH_DEBUG_MEMORY_POOLS, see section. SWIG can then be invoked via valgrind using:
make ret_by_value.ctest SWIGTOOL="valgrind --tool=memcheck --trace-children=yes"
A debugger can also be invoked easily on an individual test, for example gdb:
make ret_by_value.ctest RUNTOOL="gdb --args"
SWIG reads the SWIG_FEATURES environment variable to obtain options in addition to those passed on the command line. This is particularly useful because the entire test-suite or a particular testcase can be run with additional arguments. For example, the -O optimization flag can be added, as shown below for the bash shell:
env SWIG_FEATURES=-O make check-python-test-suite
The syntax for setting environment variables varies from one shell to the next, but SWIG_FEATURES can also be passed on the make command line as shown in the example below, where some typemap debugging is added:
make ret_by_value.ctest SWIG_FEATURES="-debug-tmsearch"
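When the test-suite is launched from a script rather than a shell, the shell-specific syntax can be avoided by building the environment explicitly. The following is a sketch under that assumption; env_with_swig_features is a hypothetical helper and nothing here is provided by SWIG.

```python
import os


def env_with_swig_features(options, base=None):
    """Return a copy of the environment with SWIG_FEATURES set (a sketch)."""
    # Copy so that the caller's environment is left untouched.
    env = dict(os.environ if base is None else base)
    env["SWIG_FEATURES"] = options
    return env


# Usage (not executed here): pass the result as env= to subprocess.run, e.g.
# subprocess.run(["make", "check-python-test-suite"],
#                env=env_with_swig_features("-O"))
```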
There is also a special 'errors' test-suite which is a set of regression tests checking SWIG warning and error messages. It can be run in the same way as the other language test-suites, replacing [lang] with errors, such as make check-errors-test-suite. The test cases used and the way it works are described in Examples/test-suite/errors/Makefile.in.
Don't forget to write end-user documentation for your language module. Currently, each language module has a dedicated chapter. You shouldn't rehash things that are already covered in sufficient detail in the SWIG Basics and SWIG and C++ chapters. There is no fixed format for what, exactly, you should document about your language module, but you'll obviously want to cover issues that are unique to your language.
Some topics that you'll want to be sure to address include:
The coding guidelines for the C/C++ source code are pretty much K&R C style. The style can be inferred from the existing code base and is largely dictated by the indent code beautifier tool set to K&R style. The code can be formatted using the make targets in the Source directory. Below is an example of how to format the emit.cxx file:
$ cd Source
$ make beautify-file INDENTFILE=Modules/emit.cxx
Of particular note, indentation is set to 2 spaces and a tab is used in place of 8 spaces. The generated C/C++ code should also follow this style as closely as possible. However, tabs should be avoided in generated code because, unlike the SWIG developers, users will never have consistent tab settings.
Target languages are given a status of either 'Supported', 'Experimental' or 'Deprecated' depending on their maturity as broadly outlined in the Target language introduction. This section provides more details on how this status is given.
A target language is given the 'Supported' status when
A target language is given the 'Experimental' status when
Some minimum requirements and notes about languages with the 'Experimental' status:
Unfortunately target languages that once met 'Experimental' or 'Supported' status can become non-functional and simply bit rot due to neglect or due to the language's C/C++ API evolving and changing over time. If there is no language maintainer and the maintenance burden becomes too much for the core SWIG developers to keep the test-suite working with easily available versions of the target language, the language is put into the 'Deprecated' status.
Changing the status to 'Deprecated' is an unfortunate step that is not taken lightly. It is done only if recent versions cannot successfully run the test-suite and examples on the GitHub Actions Continuous Integration platform. As the target language would once have had a working set of examples and test-suite, this usually only occurs when the more modern operating system distributions available on GitHub can no longer run any distribution supplied version of the target language.
Changing status to 'Deprecated' flags it for removal from SWIG in a subsequent release (usually one SWIG release). This step becomes the final plea for help from the community who use the target language. The language will need updating by an interested community member to meet the requirements of at least 'Experimental' status in order to prevent removal. If you are a user of a 'Deprecated' target language and would like to keep it available in future releases, please contact the SWIG developers for details of how you can help.
New target language modules can be included in SWIG and contributions are encouraged for popular languages. In order to be considered for inclusion, a language must at a minimum fit the 'Experimental' status described above.
Below are some practical steps that should help meet these requirements.
Once accepted into the official Git repository, development efforts should concentrate on getting the entire test-suite to work in order to migrate the language module to the 'Supported' status. Runtime tests should be added for existing testcases and new test cases can be added should there be an area not already covered by the existing tests.
There are various command line options which can aid debugging a SWIG interface as well as debugging the development of a language module. These are as follows:
-debug-classes    - Display information about the classes found in the interface
-debug-module <n> - Display module parse tree at stages 1-4, <n> is a csv list of stages
-debug-symtabs    - Display symbol tables information
-debug-symbols    - Display target language symbols in the symbol tables
-debug-csymbols   - Display C symbols in the symbol tables
-debug-lsymbols   - Display target language layer symbols
-debug-quiet      - Display less parse tree node debug info when using other -debug options
-debug-tags       - Display information about the tags found in the interface
-debug-template   - Display information for debugging templates
-debug-top <n>    - Display entire parse tree at stages 1-4, <n> is a csv list of stages
-debug-typedef    - Display information about the types and typedefs in the interface
-debug-typemap    - Display information for debugging typemaps
-debug-tmsearch   - Display typemap search debugging information
-debug-tmused     - Display typemaps used debugging information
The complete list of command line options for SWIG is available by running swig -help.
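The two options that take a stage list, -debug-module and -debug-top, expect the stages as a comma-separated value. The sketch below builds that argument from a list of stage numbers; debug_stage_option is a hypothetical helper, not part of SWIG, and the validation it performs is an assumption based on the option descriptions above.

```python
def debug_stage_option(option, stages):
    """Build e.g. ["-debug-module", "1,4"] from stage numbers (a sketch)."""
    if option not in ("-debug-module", "-debug-top"):
        raise ValueError(f"{option} does not take a stage list")
    if not stages:
        raise ValueError("at least one stage is required")
    for stage in stages:
        if stage not in (1, 2, 3, 4):
            raise ValueError(f"stage must be in the range 1-4, got {stage}")
    # <n> is a csv list of stages.
    return [option, ",".join(str(stage) for stage in stages)]
```

The resulting pair can be added to a swig argument list alongside the target language option and the interface file.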
This section describes the different parse tree nodes and their attributes.
cdecl
Describes general C declarations including variables, functions, and typedefs. A declaration is parsed as "storage T D" where storage is a storage class, T is a base type, and D is a declarator.
"name"          - Declarator name
"type"          - Base type T
"decl"          - Declarator type (abstract)
"storage"       - Storage class (static, extern, typedef, etc.)
"parms"         - Function parameters (if a function)
"code"          - Function body code (if supplied)
"value"         - Default value (if supplied)
constructor
C++ constructor declaration.
"name"          - Name of constructor
"parms"         - Parameters
"decl"          - Declarator (function with parameters)
"code"          - Function body code (if any)
"feature:new"   - Set to indicate return of new object.
destructor
C++ destructor declaration.
"name"          - Name of destructor
"code"          - Function body code (if any)
"storage"       - Storage class (set if virtual)
"value"         - Default value (set if pure virtual).
access
C++ access change.
"kind" - public, protected, private
constant
Constant created by %constant or #define.
"name"          - Name of constant.
"type"          - Base type.
"value"         - Value.
"storage"       - Set to %constant
"feature:immutable" - Set to indicate read-only
class
C++ class definition or C structure definition.
"name"          - Name of the class.
"kind"          - Class kind ("struct", "union", "class")
"symtab"        - Enclosing symbol table.
"tdname"        - Typedef name. Use for typedef struct { ... } A.
"abstract"      - Set if class has pure virtual methods.
"baselist"      - List of base class names.
"storage"       - Storage class (if any)
"unnamed"       - Set if class is unnamed.
enum
Enumeration.
"name"          - Name of the enum (if supplied).
"storage"       - Storage class (if any)
"tdname"        - Typedef name (typedef enum { ... } name).
"unnamed"       - Set if enum is unnamed.
enumitem
Enumeration value.
"name"          - Name of the enum value.
"type"          - Type (integer or char)
"value"         - Enum value (if given)
"feature:immutable" - Set to indicate read-only
namespace
C++ namespace.
"name"          - Name of the namespace.
"symtab"        - Symbol table for enclosed scope.
"unnamed"       - Set if unnamed namespace
"alias"         - Alias name. Set for namespace A = B;
using
C++ using directive.
"name"          - Name of the object being referred to.
"uname"         - Qualified name actually given to using.
"node"          - Node being referenced.
"namespace"     - Namespace name being referenced (using namespace name)
classforward
A forward C++ class declaration.
"name"          - Name of the class.
"kind"          - Class kind ("union", "struct", "class")
insert
Code insertion directive. For example, %{ ... %} or %insert(section).
"code"          - Inserted code
"section"       - Section name ("header", "wrapper", etc.)
top
Top of the parse tree.
"module" - Module name
extend
%extend directive.
"name"          - Module name
"symtab"        - Symbol table of enclosed scope.
apply
%apply pattern { patternlist }.
"pattern"       - Source pattern.
"symtab"        - Symbol table of enclosed scope.
clear
%clear patternlist;
"firstChild" - Patterns to clear
include
%include directive.
"name"          - Filename
"firstChild"    - Children
import
%import directive.
"name"          - Filename
"firstChild"    - Children
module
%module directive.
"name" - Name of the module
typemap
%typemap directive.
"method"        - Typemap method name.
"code"          - Typemap code.
"kwargs"        - Keyword arguments (if any)
"firstChild"    - Typemap patterns
typemapcopy
%typemap directive with copy.
"method"        - Typemap method name.
"pattern"       - Typemap source pattern.
"firstChild"    - Typemap patterns
typemapitem
%typemap pattern. Used with %apply, %clear, %typemap.
"pattern"       - Typemap pattern (a parameter list)
"parms"         - Typemap parameters.
types
%types directive.
"parms"         - List of parameter types.
"convcode"      - Code which replaces the default casting / conversion code
extern
extern "X" { ... } declaration.
"name" - Name "C", "Fortran", etc.
There is further documentation available on the internals of SWIG, API documentation and debugging information. This is shipped with SWIG in the Doc/Devel directory.