More than addition

Introducing multiplication and division into out compiler means that we have to think about operator precedence if we want our language implementation to adhere to the general conventions of math. We all know from basic math that

1
1+2*3
should be interpreted as
1
1+(2*3)
.

As it turns out, there is a rather simple trick to this - define

1
term
in terms of multiplication.

1
<term> := <factor>[<mulop><factor>]*

Now we call a single digit a

1
factor
, and a
1
term
is any result of a multiplication operation involving these factors. What this effectively means is that all multiplication has happened before we get to addition, and thus operator precedence is preserved.

Before we start implementing this we will need some new data constructors to handle the

1
Mul
and
1
Div
. We can extend the existing
1
Expression
quite simply. We’ll also need a
1
mulOp
similar to
1
addOp
.

data Expression = 
  Num Char 
  | Add Expression Expression
  | Sub Expression Expression
  | Mul Expression Expression
  | Div Expression Expression
  deriving (Show) 

mulOp :: Parser (Expression -> Expression -> Expression)
mulOp = literal '*' >>> (\_ -> Mul)
    <|> literal '/' >>> (\_ -> Div)

Factor is our new jargon for single digits so we can simply rename

1
term
to
1
factor
. We then need to define a new
1
term
that looks very similar to
1
expression
.

term :: Parser Expression
term = factor +> term'
term' e = mulOp <+> term >>> buildOp e +> term'
      <|> result e

factor :: Parser Expression
factor = digit >>> Num

And that’s really all we need to do. We now have the four basic math operators obeying the laws of operator precedence. A little play in

1
GHCi
shows that the stack is built correctly - multiplication before addition moving left to right.

1
2
3
4
*Main> expression "1+7*3"
Just (Add (Num '1') (Mul (Num '7') (Num '3')),"")
*Main> expression "1+7*3-4"
Just (Sub (Add (Num '1') (Mul (Num '7') (Num '3'))) (Num '4'),"")

More operator precedence

The precedence between

1
mulOp
and
1
addOp
fell out due to the way we defined our expressions. Left associativety, i.e. strings such as “8/4/2” should be parsed as “(8/4)/2” and not “8/(4/2)”, was due to the way we defined
1
buildOp
. We could confuse everyone and create a right associative expression parser simply by changing the order of the expressions in
1
buildOp
. We won’t because we are nice, but it’s nice to know we can.

But what about when you really do mean “8/(4/2)”? In regular math we need to use parentheses, and we shall follow conventions here too. With a bit of knowledge and thinking, you might realise that a parenthesised expression can be considered to be a factor, giving us a new definition for factor:

1
<factor> ::= <number>|(<expression>)

This is surprisingly easy now.

factor :: Parser Expression
factor = digit >>> Num 
     <|> literal '(' <-+> expression <+-> literal ')'

And checking it we can see that parenthesised expressions take precedence.

*Main> expression "1+7*3-4"
Just (Sub (Add (Num '1') (Mul (Num '7') (Num '3'))) (Num '4'),"")
*Main> expression "1+7*(3-4)"
Just (Add (Num '1') (Mul (Num '7') (Sub (Num '3') (Num '4'))),"")

More than single digits

We’ve been working on the assumption that everything is a single character because it kept things easy. However, with our new techniques, it is not very hard at all to extend this to multi-characters.

Looking at variables first, if we extend them to be any number of letters we would write it as:

letters :: Parser String
letters :: iter letter

1
iter
is a function that iterates over the input string until ‘letter’ fails, joining the results as we go.

iter :: Parser a -> Parser [a]
iter m = m <+> iter m >>> (\(x, y) -> x:y) 
	 <|> result []

Pretty simple - no new parsers or combinators in there. If we change the

1
Assign
data constructor to use
1
String
, and alter
1
assign
to use
1
letters
instead of
1
letter
, we can now use arbitrary length variables in our assignment statements.

data Assign = Assign String Expression 
	deriving Show

assign :: Parser (String, Expression)
assign = letters <+-> literal '=' <+> expression
*Main> parse "asdf=1"
Assign "asdf" (Num '1')

We can something very simialr for multi-character numbers. A few type changes later we can simply keep collecting numbers until

1
digit
fails. We need to change the order of the alternatives in
1
factor
to avoid the greedy nature of
1
iter
.

data Expression = 
  Num String 
  | Add Expression Expression
  | Sub Expression Expression
  | Mul Expression Expression
  | Div Expression Expression
  deriving (Show) 

factor :: Parser Expression
factor = literal '(' <-+> expression <+-> literal ')'
	 <|> digits >>> Num

digits :: Parser String
digits = iter digit

And now we are starting to get somewhere.

*Main> parse "asdf=(123+53)*5"
Assign "asdf" (Mul (Add (Num "123") (Num "53")) (Num "5"))

Adding whitespace

Now that we have

1
iter
we can write a function that chews up all the whitespace between the stuff we are interested in. These things are generally called tokens so we will call our function
1
token
.

token :: Parser a -> Parser a
token = (<+-> iter space)

This uses

1
iter
in conjunction with
1
<+->
which, as you will recall, drops the right hand side from the result. We can use token like so:

assign :: Parser (String, Expression)
assign = token(letters) <+-> token(literal '=') <+> expression

expression :: Parser Expression
expression = token(term) +> expression'
expression' e = addOp <+> term >>> buildOp e +> expression'
            <|> result e

factor :: Parser Expression
factor = token number 
	 <|> token (literal '(') <-+> token expr <+-> token (literal ')')
	 <|> token var

If you also add

1
token
to
1
mulOp
and
1
addOp
you can now do even more amazing things like:

*Main> parse "asdf = (123 + 53) * 5"
Assign "asdf" (Mul (Add (Num "123") (Num "53")) (Num "5"))
*Main> parse "asdf = ( 123 + 53 ) * 5"
Assign "asdf" (Mul (Add (Num "123") (Num "53")) (Num "5"))

More than numbers in expressions

We have all the infrastucture to add variables into

1
Expression
so that we can write things like
1
a = x * y
. First we will need to extend
1
Expression
to include a
1
Var
data constructor which takes a string similar to
1
Num
. We will then need to figure out where
1
Var
s can be created - and seeing as they are stand ins for
1
Num
we can add a pattern to to
1
factor
to accomodate them.

If you simply add the new pattern to

1
factor
you will run into some issues due to the way
1
iter
works. For the alternatives to be processed we need the parsers to return
1
Nothing
on failure but
1
iter
will never fail - rather it will return an empty list. This was alluded to earlier when we needed to put the parenthesis checking before the number checking.

To make

1
iter
conform to our
1
Parser
type we will rewrite it to check if the result is an empty list then it should be return
1
Nothing
. To do this we can pass the result of
1
iter
through our comparison combinator like so.

iter :: Parser Char -> Parser String
iter m = (iterS m) <=> (/="")
iterS m = m <+> iterS m >>> (\(x, y) -> x:y)
	 <|> result []

The one exception to using the new

1
iter
everywhere is
1
token
as is is optional space so it should not return a
1
Nothing
.
1
token
will need to use the sub
1
iterS
.

token :: Parser a -> Parser a
token = (<+-> iterS space)

Now we can write add the new pattern to

1
factor
safely.

data Expression = 
  Num Integer 
  | Var String
  | Add Expression Expression
  | Sub Expression Expression
  | Mul Expression Expression
  | Div Expression Expression
  deriving (Show) 

factor :: Parser Expression
factor = token (literal '(') <-+> token(expression) <+-> token(literal ')')
	 <|> digits >>> Num
	 <|> letters >>> Var

And let’s check that it looks good.

*Main Data.Char> parse "a = 1 * y"
Assign "a" (Mul (Num "1") (Var "y"))
*Main Data.Char> parse "alpha = 13 * ( yaxis + 234 )"
Assign "alpha" (Mul (Num "13") (Add (Var "yaxis") (Num "234")))

More than strings

Sometimes we may want a parser to return something other than a string value. For example we would prefer that

1
Num
use a real number rather than a string representation. We have the transform combinator at our disposal so this is not really difficult to do.

Because we now have a safe iterator being used in

1
digits
we can use
1
read
to convert the string to an integer. I tried doing this before changing
1
iter
and the empty string was cases
1
read
to throw and unrecoverable exception. We could use
1
reads
which can be recovered from if it fails to read but it over complicates things a fair bit. Now that
1
digits
is gaurenteed to return a string with at least one digit or
1
Nothing
we can safely use
1
read
and keep things simple.

data Expression = 
  Num Integer 
  | Var String
  | Add Expression Expression
  | Sub Expression Expression
  | Mul Expression Expression
  | Div Expression Expression
  deriving (Show) 

factor :: Parser Expression
factor = token (literal '(') <-+> token(expression) <+-> token(literal ')')
	 <|> number >>> Num
	 <|> letters >>> Var

number :: Parser Integer
number = literal '-' <-+> digits >>> (\n -> -1 * (read n :: Integer) )
     <|> digits >>> (\n -> read n :: Integer)

We’ve changed the

1
Num
data constructor to use and integer and created a new
1
number
parser that returns a
1
Parser
wrapped
1
Integer
using
1
read
. This is then used in
1
factor
instead of the basic
1
digits
. I also slipped the ability to prefix a number with a sign allowing expressions such as
1
-1 * 4
.

More than…?

We’ve come a long way in this chapter. From single digit addition with no spacing to being able to use multicharachter terms and factors, adding in multiplication, parenthesis and variables and using whitespace to make the equations look nicer. And it was all rather simple due to the combination of small simple functions.