Introducing multiplication and division into out compiler means that we have to think about operator precedence if we want our language implementation to adhere to the general conventions of math. We all know from basic math that
should be interpreted as 1
1+2*3
.1
1+(2*3)
As it turns out, there is a rather simple trick to this - define
in terms of multiplication.1
term
1
<term> := <factor>[<mulop><factor>]*
Now we call a single digit a
, and a 1
factor
is any result of a multiplication operation involving these factors. What this effectively means is that all multiplication has happened before we get to addition, and thus operator precedence is preserved.1
term
Before we start implementing this we will need some new data constructors to handle the
and 1
Mul
. We can extend the existing 1
Div
quite simply. We’ll also need a 1
Expression
similar to 1
mulOp
.1
addOp
data Expression =
Num Char
| Add Expression Expression
| Sub Expression Expression
| Mul Expression Expression
| Div Expression Expression
deriving (Show)
mulOp :: Parser (Expression -> Expression -> Expression)
mulOp = literal '*' >>> (\_ -> Mul)
<|> literal '/' >>> (\_ -> Div)
Factor is our new jargon for single digits so we can simply rename
to 1
term
. We then need to define a new 1
factor
that looks very similar to 1
term
.1
expression
term :: Parser Expression
term = factor +> term'
term' e = mulOp <+> term >>> buildOp e +> term'
<|> result e
factor :: Parser Expression
factor = digit >>> Num
And that’s really all we need to do. We now have the four basic math operators obeying the laws of operator precedence. A little play in
shows that the stack is built correctly - multiplication before addition moving left to right.1
GHCi
1
2
3
4
*Main> expression "1+7*3"
Just (Add (Num '1') (Mul (Num '7') (Num '3')),"")
*Main> expression "1+7*3-4"
Just (Sub (Add (Num '1') (Mul (Num '7') (Num '3'))) (Num '4'),"")
The precedence between
and 1
mulOp
fell out due to the way we defined our expressions. Left associativety, i.e. strings such as “8/4/2” should be parsed as “(8/4)/2” and not “8/(4/2)”, was due to the way we defined 1
addOp
. We could confuse everyone and create a right associative expression parser simply by changing the order of the expressions in 1
buildOp
. We won’t because we are nice, but it’s nice to know we can.1
buildOp
But what about when you really do mean “8/(4/2)”? In regular math we need to use parentheses, and we shall follow conventions here too. With a bit of knowledge and thinking, you might realise that a parenthesised expression can be considered to be a factor, giving us a new definition for factor:
1
<factor> ::= <number>|(<expression>)
This is surprisingly easy now.
factor :: Parser Expression
factor = digit >>> Num
<|> literal '(' <-+> expression <+-> literal ')'
And checking it we can see that parenthesised expressions take precedence.
*Main> expression "1+7*3-4"
Just (Sub (Add (Num '1') (Mul (Num '7') (Num '3'))) (Num '4'),"")
*Main> expression "1+7*(3-4)"
Just (Add (Num '1') (Mul (Num '7') (Sub (Num '3') (Num '4'))),"")
We’ve been working on the assumption that everything is a single character because it kept things easy. However, with our new techniques, it is not very hard at all to extend this to multi-characters.
Looking at variables first, if we extend them to be any number of letters we would write it as:
letters :: Parser String
letters :: iter letter
is a function that iterates over the input string until ‘letter’ fails, joining the results as we go.1
iter
iter :: Parser a -> Parser [a]
iter m = m <+> iter m >>> (\(x, y) -> x:y)
<|> result []
Pretty simple - no new parsers or combinators in there. If we change the
data constructor to use 1
Assign
, and alter 1
String
to use 1
assign
instead of 1
letters
, we can now use arbitrary length variables in our assignment statements.1
letter
data Assign = Assign String Expression
deriving Show
assign :: Parser (String, Expression)
assign = letters <+-> literal '=' <+> expression
*Main> parse "asdf=1"
Assign "asdf" (Num '1')
We can something very simialr for multi-character numbers. A few type changes later we can simply keep collecting numbers until
fails. We need to change the order of the alternatives in 1
digit
to avoid the greedy nature of 1
factor
.1
iter
data Expression =
Num String
| Add Expression Expression
| Sub Expression Expression
| Mul Expression Expression
| Div Expression Expression
deriving (Show)
factor :: Parser Expression
factor = literal '(' <-+> expression <+-> literal ')'
<|> digits >>> Num
digits :: Parser String
digits = iter digit
And now we are starting to get somewhere.
*Main> parse "asdf=(123+53)*5"
Assign "asdf" (Mul (Add (Num "123") (Num "53")) (Num "5"))
Now that we have
we can write a function that chews up all the whitespace between the stuff we are interested in. These things are generally called tokens so we will call our function 1
iter
.1
token
token :: Parser a -> Parser a
token = (<+-> iter space)
This uses
in conjunction with 1
iter
which, as you will recall, drops the right hand side from the result. We can use token like so:1
<+->
assign :: Parser (String, Expression)
assign = token(letters) <+-> token(literal '=') <+> expression
expression :: Parser Expression
expression = token(term) +> expression'
expression' e = addOp <+> term >>> buildOp e +> expression'
<|> result e
factor :: Parser Expression
factor = token number
<|> token (literal '(') <-+> token expr <+-> token (literal ')')
<|> token var
If you also add
to 1
token
and 1
mulOp
you can now do even more amazing things like:1
addOp
*Main> parse "asdf = (123 + 53) * 5"
Assign "asdf" (Mul (Add (Num "123") (Num "53")) (Num "5"))
*Main> parse "asdf = ( 123 + 53 ) * 5"
Assign "asdf" (Mul (Add (Num "123") (Num "53")) (Num "5"))
We have all the infrastucture to add variables into
so that we can write things like 1
Expression
. First we will need to extend 1
a = x * y
to include a 1
Expression
data constructor which takes a string similar to 1
Var
. We will then need to figure out where 1
Num
s can be created - and seeing as they are stand ins for 1
Var
we can add a pattern to to 1
Num
to accomodate them.1
factor
If you simply add the new pattern to
you will run into some issues due to the way 1
factor
works. For the alternatives to be processed we need the parsers to return 1
iter
on failure but 1
Nothing
will never fail - rather it will return an empty list. This was alluded to earlier when we needed to put the parenthesis checking before the number checking.1
iter
To make
conform to our 1
iter
type we will rewrite it to check if the result is an empty list then it should be return 1
Parser
. To do this we can pass the result of 1
Nothing
through our comparison combinator like so.1
iter
iter :: Parser Char -> Parser String
iter m = (iterS m) <=> (/="")
iterS m = m <+> iterS m >>> (\(x, y) -> x:y)
<|> result []
The one exception to using the new
everywhere is 1
iter
as is is optional space so it should not return a 1
token
. 1
Nothing
will need to use the sub 1
token
.1
iterS
token :: Parser a -> Parser a
token = (<+-> iterS space)
Now we can write add the new pattern to
safely.1
factor
data Expression =
Num Integer
| Var String
| Add Expression Expression
| Sub Expression Expression
| Mul Expression Expression
| Div Expression Expression
deriving (Show)
factor :: Parser Expression
factor = token (literal '(') <-+> token(expression) <+-> token(literal ')')
<|> digits >>> Num
<|> letters >>> Var
And let’s check that it looks good.
*Main Data.Char> parse "a = 1 * y"
Assign "a" (Mul (Num "1") (Var "y"))
*Main Data.Char> parse "alpha = 13 * ( yaxis + 234 )"
Assign "alpha" (Mul (Num "13") (Add (Var "yaxis") (Num "234")))
Sometimes we may want a parser to return something other than a string value. For example we would prefer that
use a real number rather than a string representation. We have the transform combinator at our disposal so this is not really difficult to do.1
Num
Because we now have a safe iterator being used in
we can use 1
digits
to convert the string to an integer. I tried doing this before changing 1
read
and the empty string was cases 1
iter
to throw and unrecoverable exception. We could use 1
read
which can be recovered from if it fails to read but it over complicates things a fair bit. Now that 1
reads
is gaurenteed to return a string with at least one digit or 1
digits
we can safely use 1
Nothing
and keep things simple.1
read
data Expression =
Num Integer
| Var String
| Add Expression Expression
| Sub Expression Expression
| Mul Expression Expression
| Div Expression Expression
deriving (Show)
factor :: Parser Expression
factor = token (literal '(') <-+> token(expression) <+-> token(literal ')')
<|> number >>> Num
<|> letters >>> Var
number :: Parser Integer
number = literal '-' <-+> digits >>> (\n -> -1 * (read n :: Integer) )
<|> digits >>> (\n -> read n :: Integer)
We’ve changed the
data constructor to use and integer and created a new 1
Num
parser that returns a 1
number
wrapped 1
Parser
using 1
Integer
. This is then used in 1
read
instead of the basic 1
factor
. I also slipped the ability to prefix a number with a sign allowing expressions such as 1
digits
.1
-1 * 4
We’ve come a long way in this chapter. From single digit addition with no spacing to being able to use multicharachter terms and factors, adding in multiplication, parenthesis and variables and using whitespace to make the equations look nicer. And it was all rather simple due to the combination of small simple functions.