Basic Control Structures

In this chapter we will start implementing programming structures beyond the basic universal mathmatical expressions. Some decisions need to made first on what the new language looks like. We will look back to Crenshaw and emulate his simple procedural style language based on Pascal. The specifics of the language being implemented do not really effect the way the compiler is constructed at this early stage.

Multiple Statments

The single statement assignments we have been creating until now are not particularly useful when we start getting into branching and looping. We need a construct that can hold multiple statments which we can call

1
Program
. We’ll define a
1
Program
as a collection, or
1
Block
of
1
Statment
s ending with the keyword
1
end
.

First we start with some data types. For now we will make all our

1
Statments
be of the
1
Assign
type. This will allow us to create a simple ‘program’ like
1
a=1 end
.

data Program = Program Block deriving (Show)
type Block = [Statement]
data Statement = Statement Assign deriving (Show)

We’ll create a parser for each of the components from the bottom up to see how they feed into each other. We also need to update the existing

1
assign
parser to return a
1
Parser Assign
. This will break the main parsing function but we’ll update that to parse
1
Program
later.

statement :: Parser Statement
statement = assign >>> Statement

block :: Parser Block
block = iterS statement

program :: Parser Program
program = block <+-> accept "end" >>> Program

assign :: Parser Assign
assign = token(letters) <+-> token(literal '=') <+> expression >>> (\(x, y) -> Assign x y)

Currently

1
statement
is a wrapper for
1
assign
but we will add branching and looping later.
1
block
uses the non-space-checking version of our iteration combinator
1
iterS
which will create a list of
1
Statements
. Finally
1
program
calls
1
block
and then uses a new combinator
1
accept
to detect the end keyword.

1
accept
is defined using existing combinators to
1
token
ize a list of
1
letters
that match the input.

accept :: String -> Parser String
accept w = token (letters <=> (==w))

Now we can check the output of

1
program
. We can now detect valid and invalid programs and multiline programs. An added bonus is that we can use keywords in our variables.

*Main> program "a=1 end"     
Just (Program [Statement (Assign "a" (Num 1))],"")
*Main> program "end=1 end"    
Just (Program [Statement (Assign "end" (Num 1))],"")
*Main> program "1 end"        
Nothing
*Main>program "a=1 x=a+1 end"
Program [Statement (Assign "a" (Num 1)),Statement (Assign "x" (Add (Var "a") (Num 1)))]

After updating the main module to use the new

1
program
we now start to have a more useful compiler.

main :: IO ()
main = getArgs >>= p . head

parse :: String -> Program
parse s = case program s of 
            Nothing -> error "Invalid program"
            Just (a, b) -> a

-- | Parse and print. Utility and test function for use in @ghci@.
p = putStrLn . show . parse

-- | Parse and emit.
-- The emitter does not know how to handle a Program yet
-- e = putStrLn . emit . parse 

If-Then Statements

The first control structure we will look at is the basic if statement. This takes a condition, and if the condition is true, it will execute the body of the if statment.

The hardest part of this section is deciding what our syntax will be. To keep things simple, we will stick to Crenshaws recommended syntax of

1
if <condition> <block> end

By having an explicit block terminator

1
end
we avoid the ambiguity of deciding ‘is this still part of the if body?’

We create a new

1
Statement
type constructor which I called
1
Branch
. A
1
Branch
will hold the
1
Condition
and the body of the if statement, which is really just another
1
Block
. For now a condition will just be a string.

data Statement = Statement Assign 
  | Branch Condition Block
  deriving (Show)

type Condition = String

The parser is quite simple to write using existing parsers and combinators. I’ve created a

1
tempPlaceholder
that accepts anything other than keywords and used that as a fill-in for
1
condition
until we get to relational algebra. We also update
1
statment
to use the new
1
ifthen
as well.

statement :: Parser Statement
statement = assign >>> Statement
  <|> ifthen 

ifthen :: Parser Statement		
ifthen = accept "if" <-+> condition <+> block <+-> accept "end" >>> buildBranch
    where buildBranch (c, b) = Branch c b

condition = tempPlaceholder

-- |This is a temporary parser that accepts anything except keywords
tempPlaceholder :: Parser String
tempPlaceholder = token letters <=> (\x -> not $ any (==x) keywords) 
  where keywords = ["if", "else", "end", "while", "until"]

Because we use

1
block
as part of the function, we can parse multi-statement bodies, and even nested if statments such as:

*Main> ifthen "if cond a=1 if cond b=2 end end "
Just (Branch "cond" [Statement (Assign "a" (Num 1)),Branch "cond" [Statement (Assign "b" (Num 2))]],"")
*Main> let pro = "x=0 \nif a \n\tx=1 \n\tif b \n\t\tb=3 \n\tend \nend \nb=4 \nend"
*Main> putStrLn pro
x=0
if a
	x=1
	if b
		b=3
	end
end
b=4
end
*Main> program pro
Just (Program [Statement (Assign "x" (Num 0)),
               Branch "a" [Statement (Assign "x" (Num 1)),
                           Branch "b" [Statement (Assign "b" (Num 3))]
                          ],
               Statement (Assign "b" (Num 4))
              ],"")

Got to love the power of recursion.

If-Else

Now that we have the basic concepts of creating branches, extending it to the if-else statement not so hard. Extending the definition of if to include the else statement looks like:

1
if <condition> <block> [ else <block>] end

You could treat

1
else
as an optional part of
1
ifthen
but I chose to seperate them into seperate type constructors and functions.

data Statement = 
	Statement Assign 
  | Branch Condition Block
  | Branch2 Condition Block Block
  deriving (Show)
              
statement :: Parser Statement
statement = assign >>> Statement
  <|> ifelse
  <|> ifthen 
              
ifelse :: Parser Statement		
ifelse = accept "if" <-+> condition <+> block <+-> accept "else" <+> block <+-> accept "end" >>> buildBranch
    where buildBranch ((c, b1), b2) = Branch2 c b1 b2

As you can see, it is much the same as the basic if statement. The only difference is that we add a second

1
Block
to the type constructor to allow for the else body. Again, recusrion allows for nested if statements in nested if-else statements all the way down to turtles.

Basic loop with while

There are a few kinds of loops in your average imperitive langauage. There are pre and post-condition loops, loops which you can break out of the middle and then there a loops that iterate over a collection.

Parseing a while loop is actually extremely similar to the

1
ifThen
we have already done. The real difference comes down to how the code generator treats them. Our definition of while looks like:

1
while <condition> <block> end

This is so simialar to the

1
ifThen
case that you should be able to see where this is headed.

data Statement = 
	Statement Assign 
  | Branch Condition Block
  | Branch2 Condition Block Block
  | While Condition Block
  deriving (Show)

statement :: Parser Statement
statement = assign >>> Statement
  <|> ifelse
  <|> ifthen 
  <|> while

while :: Parser Statement		
while = accept "while" <-+> condition <+> block <+-> accept "end" >>> buildWhile
    where buildWhile (c, b) = While c b

We could go on adding other looping constructs such as

1
do...until condition
or
1
loop...if cond break...endloop
etc but I think this one is sufficient to see that they are all conceptually similar and will result be very similar code. Most of the interesting part of the different types of looping constructs are in the generated code which we will visit at a later stage.