In this chapter we will start implementing programming structures beyond the basic universal mathmatical expressions. Some decisions need to made first on what the new language looks like. We will look back to Crenshaw and emulate his simple procedural style language based on Pascal. The specifics of the language being implemented do not really effect the way the compiler is constructed at this early stage.
The single statement assignments we have been creating until now are not particularly useful when we start getting into branching and looping. We need a construct that can hold multiple statments which we can call
. We’ll define a 1
Program
as a collection, or 1
Program
of 1
Block
s ending with the keyword 1
Statment
.1
end
First we start with some data types. For now we will make all our
be of the 1
Statments
type. This will allow us to create a simple ‘program’ like 1
Assign
.1
a=1 end
data Program = Program Block deriving (Show)
type Block = [Statement]
data Statement = Statement Assign deriving (Show)
We’ll create a parser for each of the components from the bottom up to see how they feed into each other. We also need to update the existing
parser to return a 1
assign
. This will break the main parsing function but we’ll update that to parse 1
Parser Assign
later.1
Program
statement :: Parser Statement
statement = assign >>> Statement
block :: Parser Block
block = iterS statement
program :: Parser Program
program = block <+-> accept "end" >>> Program
assign :: Parser Assign
assign = token(letters) <+-> token(literal '=') <+> expression >>> (\(x, y) -> Assign x y)
Currently
is a wrapper for 1
statement
but we will add branching and looping later. 1
assign
uses the non-space-checking version of our iteration combinator 1
block
which will create a list of 1
iterS
. Finally 1
Statements
calls 1
program
and then uses a new combinator 1
block
to detect the end keyword.1
accept
is defined using existing combinators to 1
accept
ize a list of 1
token
that match the input.1
letters
accept :: String -> Parser String
accept w = token (letters <=> (==w))
Now we can check the output of
. We can now detect valid and invalid programs and multiline programs. An added bonus is that we can use keywords in our variables.1
program
*Main> program "a=1 end"
Just (Program [Statement (Assign "a" (Num 1))],"")
*Main> program "end=1 end"
Just (Program [Statement (Assign "end" (Num 1))],"")
*Main> program "1 end"
Nothing
*Main>program "a=1 x=a+1 end"
Program [Statement (Assign "a" (Num 1)),Statement (Assign "x" (Add (Var "a") (Num 1)))]
After updating the main module to use the new
we now start to have a more useful compiler.1
program
main :: IO ()
main = getArgs >>= p . head
parse :: String -> Program
parse s = case program s of
Nothing -> error "Invalid program"
Just (a, b) -> a
-- | Parse and print. Utility and test function for use in @ghci@.
p = putStrLn . show . parse
-- | Parse and emit.
-- The emitter does not know how to handle a Program yet
-- e = putStrLn . emit . parse
The first control structure we will look at is the basic if statement. This takes a condition, and if the condition is true, it will execute the body of the if statment.
The hardest part of this section is deciding what our syntax will be. To keep things simple, we will stick to Crenshaws recommended syntax of
1
if <condition> <block> end
By having an explicit block terminator
we avoid the ambiguity of deciding ‘is this still part of the if body?’1
end
We create a new
type constructor which I called 1
Statement
. A 1
Branch
will hold the 1
Branch
and the body of the if statement, which is really just another 1
Condition
. For now a condition will just be a string.1
Block
data Statement = Statement Assign
| Branch Condition Block
deriving (Show)
type Condition = String
The parser is quite simple to write using existing parsers and combinators. I’ve created a
that accepts anything other than keywords and used that as a fill-in for 1
tempPlaceholder
until we get to relational algebra. We also update 1
condition
to use the new 1
statment
as well.1
ifthen
statement :: Parser Statement
statement = assign >>> Statement
<|> ifthen
ifthen :: Parser Statement
ifthen = accept "if" <-+> condition <+> block <+-> accept "end" >>> buildBranch
where buildBranch (c, b) = Branch c b
condition = tempPlaceholder
-- |This is a temporary parser that accepts anything except keywords
tempPlaceholder :: Parser String
tempPlaceholder = token letters <=> (\x -> not $ any (==x) keywords)
where keywords = ["if", "else", "end", "while", "until"]
Because we use
as part of the function, we can parse multi-statement bodies, and even nested if statments such as:1
block
*Main> ifthen "if cond a=1 if cond b=2 end end "
Just (Branch "cond" [Statement (Assign "a" (Num 1)),Branch "cond" [Statement (Assign "b" (Num 2))]],"")
*Main> let pro = "x=0 \nif a \n\tx=1 \n\tif b \n\t\tb=3 \n\tend \nend \nb=4 \nend"
*Main> putStrLn pro
x=0
if a
x=1
if b
b=3
end
end
b=4
end
*Main> program pro
Just (Program [Statement (Assign "x" (Num 0)),
Branch "a" [Statement (Assign "x" (Num 1)),
Branch "b" [Statement (Assign "b" (Num 3))]
],
Statement (Assign "b" (Num 4))
],"")
Got to love the power of recursion.
Now that we have the basic concepts of creating branches, extending it to the if-else statement not so hard. Extending the definition of if to include the else statement looks like:
1
if <condition> <block> [ else <block>] end
You could treat
as an optional part of 1
else
but I chose to seperate them into seperate type constructors and functions.1
ifthen
data Statement =
Statement Assign
| Branch Condition Block
| Branch2 Condition Block Block
deriving (Show)
statement :: Parser Statement
statement = assign >>> Statement
<|> ifelse
<|> ifthen
ifelse :: Parser Statement
ifelse = accept "if" <-+> condition <+> block <+-> accept "else" <+> block <+-> accept "end" >>> buildBranch
where buildBranch ((c, b1), b2) = Branch2 c b1 b2
As you can see, it is much the same as the basic if statement. The only difference is that we add a second
to the type constructor to allow for the else body. Again, recusrion allows for nested if statements in nested if-else statements all the way down to turtles.1
Block
There are a few kinds of loops in your average imperitive langauage. There are pre and post-condition loops, loops which you can break out of the middle and then there a loops that iterate over a collection.
Parseing a while loop is actually extremely similar to the
we have already done. The real difference comes down to how the code generator treats them. Our definition of while looks like:1
ifThen
1
while <condition> <block> end
This is so simialar to the
case that you should be able to see where this is headed.1
ifThen
data Statement =
Statement Assign
| Branch Condition Block
| Branch2 Condition Block Block
| While Condition Block
deriving (Show)
statement :: Parser Statement
statement = assign >>> Statement
<|> ifelse
<|> ifthen
<|> while
while :: Parser Statement
while = accept "while" <-+> condition <+> block <+-> accept "end" >>> buildWhile
where buildWhile (c, b) = While c b
We could go on adding other looping constructs such as
or 1
do...until condition
etc but I think this one is sufficient to see that they are all conceptually similar and will result be very similar code. Most of the interesting part of the different types of looping constructs are in the generated code which we will visit at a later stage.1
loop...if cond break...endloop