How to use C++ with Bison, V 1.0.2

 

Introduction

This is a mini-tutorial that describes how to use C++ with Bison. It was done in reply to a thread in comp.compilers that questioned how %union declarations would interfere with C++ strong type checking.

I assume that you already know C++ and how to use Bison, so I will talk just about the important parts.

Grammar

For the example lets use the following BNF grammar. It's too simple and has ambiguity (it will be solved later on) but it will serve our purposes well enough.
prompt -> exp '\n'
       |  prompt exp '\n'.

exp -> exp '+' exp
    |  exp '*' exp
    |  ident
    |  number
    |  ident '='exp.

ident -> 'A'|..|'Z'.

number -> digit
       | number digit.

digit -> '0'|..|'9'.

Ast Classes

In order to create a feasible example it's necessary to use C++ classes, so lets create a very simple class hierarchy to represent expressions.

Since the example is very simple I have chosen to declare all classes in a single file and the function members are defined in the class interface. The file is ast.h .

class Expression {
public:
  virtual ~Expression () {}

  //  It's necessary because we need to clone objects without
  // knowing the exact type.
  virtual Expression *clone () = 0;

  // The value represented by the expression
  virtual int value () = 0;
};

// For addictive expressions
class Plus : public Expression {
  Expression *m_left, *m_right;

public:
   
  Plus (Expression *left, Expression *right): m_left (left), m_right (right) {}

  // Copy constructor
  Plus (const Plus &other) {
    m_left = other.m_left->clone ();
    m_right = other.m_right->clone ();
  }
  
  virtual ~Plus () 
  {
    delete m_left;
    delete m_right;
  }

  Plus &operator = (const Plus &other) {
    if (&other != this) {
      delete m_left;
      delete m_right;
      
      m_left = other.m_left->clone ();
      m_right = other.m_right->clone ();
    }
  }
  

  virtual Expression *clone () { return new Plus (*this); }
  
  virtual int value () { return m_left->value () + m_right->value (); }
  
};

// For multiplicative expressions
class Times : public Expression {
  Expression *m_left, *m_right;

public:
   
  Times (Expression *left, Expression *right): m_left (left), m_right (right) {}

  // Copy constructor
  Times (const Times &other) {
    m_left = other.m_left->clone ();
    m_right = other.m_right->clone ();
  }

  virtual ~Times () 
  {
    delete m_left;
    delete m_right;
  }

  Times &operator = (const Times &other) {
    if (&other != this) {
      delete m_left;
      delete m_right;
      
      m_left = other.m_left->clone ();
      m_right = other.m_right->clone ();
    }
  }
  

  virtual Expression *clone () { return new Times (*this); }
  
  virtual int value () { return m_left->value () * m_right->value (); }
  
};

// For numbers
class Number : public Expression {
  int m_val;

public:
   
  Number (int val): m_val (val) {}

  // Copy constructor
  Number (const Number &other) { m_val = other.m_val; }

  Number &operator = (const Number &other) {
    if (&other != this)
      m_val = other.m_val;
  }

  virtual Expression *clone () { return new Number (*this); }

  virtual int value () { return m_val; }
};

// For identifiers
class Ident : public Expression {
  int *m_val;

public:
   
  Ident (int *val): m_val (val) {}

  // Copy constructor
  Ident (const Ident &other) { m_val = other.m_val; }

  Ident &operator = (const Ident &other) {
    if (&other != this)
      m_val = other.m_val;
  }

  virtual Expression *clone () { return new Ident (*this); }
  

  virtual int value () { return *m_val; }
};

 

The Bison file

Now we need to create a Bison file that matches the grammar described above. It will create a sort of AST using the AST classes already described.

There is a subtle difference between using Bison with C and using Bison with C++, function prototypes. In the example you see the prototypes for yyerror () and for yylex () , since you have to put them in case of C++, but in C you can forget them (I don't advice you to do that, because it's bad practice).

The problem that appeared in the thread that I made mention above, was about how to cope with the restrictions that C++ makes to unions and at the same time use unions in Bison. The solution I think is more manageable is to declare the fields that are classes as pointers, in that case there isn't any problem, because they are pointers and there aren't any restrictions to pointer data members.

In the first version of this tutorial I said that I wouldn't discuss the problem of deleting the nodes that are in the parse stack when a parse error happens, but after some suggestions done to this tutorial I've decided to add the code to solve the mentionated situation.

I've opted to use a stack that contains all the nodes that were allocated until the moment that the error has occurred. In that moment the stack is cleared and its nodes deleted. When the parser reckonizes a + or a * the two top nodes are removed because they are the children that will be handled by the father node and the father is pushed in the stack.

Now for the file (it's implemented in grammar.y) :

%{
  #include <iostream.h>
  #include <cctype>
  #include <cstring>
  #include <vector>
  #include <stack>

  #include "ast.h"

  // Bring the standard library into the
  // global namespace
  using namespace std;

  // Prototypes to keep the compiler happy
  void yyerror (const char *error);
  int  yylex ();
  void clear_stack ();
  
  // Variables
  int vars ['Z'- 'A' + 1];

  // stack class that takes care of all the nodes that were allocated
  stack <Expression *> nodes;
%}

%token IDENT NUMBER

%union {
  Expression *exp;  /* For the expressions. Since it is a pointer, no problem. */
  int       value;  /* For the lexical analyser. NUMBER tokens */
  char      ident;  /* For the lexical analyser. IDENT tokens */
}

/* Lets inform Bison about the type of each terminal and non-terminal */
%type <exp>   exp
%type <value> NUMBER
%type <ident> IDENT

/* Precedence information to resolve ambiguity */
%left '+'
%left '*'
%%

prompt : exp  '\n'             { 
                                 if ($1) {
                                   cout << $1->value () << endl;
                                   clear_stack ();
                                 }
                               }
       |  prompt  exp  '\n'    {
                                 if ($2) {
                                   cout << $2->value () << endl;
                                   clear_stack ();
                                 }
                               }
       | error '\n'            { clear_stack (); }
       ;

exp : exp '+' exp              {
                                 $$ = new Plus ($1, $3);
                                 nodes.pop ();  //  The childreen are handled by Plus so we
                                 nodes.pop ();  // take them of the allocated nodes stack.
                                 nodes.push ($$);
                               }
    | exp '*' exp              {
                                 $$ = new Times ($1, $3);
                                 nodes.pop ();  // The same as above.
                                 nodes.pop ();
                                 nodes.push ($$);
                               } 
    | IDENT                    { $$ = new Ident (&vars [$1 - 'A']); nodes.push ($$); } 
    | NUMBER                   { $$ = new Number ($1); nodes.push ($$); } 
    | IDENT '=' exp            { vars [$1 - 'A'] = $3->value (); $$ = $3; nodes.push ($$); } 
    ;
%%

// we need to provid the following functions
int main ()
{
  memset (vars, 0, sizeof (vars));
  return yyparse ();
}

void yyerror (const char *error)
{
  cout << error << endl;
}

int yylex ()
{
  char ch;

  do {
   ch = cin.peek ();
   if (isalpha (ch)) {
     cin.get ();

     yylval.ident = ch;
     return IDENT;
   }
   else if (isdigit (ch)) {
     int value = 0;
     while (!cin.eof () && isdigit (ch)) {
       cin.get ();

       value = value * 10 + ch - '0';
       ch = cin.peek ();
     }

     yylval.value = value;
     return NUMBER;
  }
  else if (ch == '+' || ch == '\n' || ch == '*' || ch == '=') {
     cin.get ();
 
     return ch;
  }
  else
    cin.get ();

 } while (!cin.eof ());

 return -1;
}


// Deletes all the nodes that were allocated
void clear_stack ()
{
  while (!nodes.empty ()) {
    delete nodes.top ();
    nodes.pop ();
  }
}

 

To test this example use the script create, and run the example file. Please note that this tutorial was tested using g++ 3.3.6 with the libstdc++.
 

Acknowledges

I would like to thank the following people for their opinions regarding this tutorial :