Laziness
This lecture is based on Brent Yorgey’s lecture notes.
One of the core (and controversial) features of Haskell is lazy evaluation. Today we will explain what this means, how to take advantage of laziness and what problems you may encounter.
> module Laziness where
Evaluation
Evaluation of an expression (e.g., 2^3) is the process that runs the expression to get a value (e.g., 8). Haskell’s evaluation strategy is lazy (intuitively, an expression is only run when its value is needed), as opposed to the strict evaluation used by C, Java, …
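One way to watch this in action is GHCi’s :sprint command, which prints how much of a value has been evaluated so far, showing _ for unevaluated thunks. A small illustrative session (the exact output may differ between GHC versions):
ghci> let x = 2^3 :: Int
ghci> :sprint x
x = _
ghci> x
8
ghci> :sprint x
x = 8
The binding x starts out as an unevaluated thunk; only asking for its value forces the computation.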
Strict evaluation
Before we talk about lazy evaluation it will be useful to look at some examples of its opposite, strict evaluation.
Under a strict evaluation strategy, function arguments are completely evaluated before passing them to the function. For example, suppose we have defined
> f x y = x + 2
In a strict language, evaluating f 5 (29^35792) will first completely evaluate 5 (already done) and 29^35792 (which is a lot of work) before passing the results to f.
Of course, in this particular example, this is silly, since f ignores its second argument, so all the work to compute 29^35792 was wasted. So why would we want this?
The benefit of strict evaluation is that it is easy to predict when and in what order things will happen. Usually languages with strict evaluation will even specify the order in which function arguments should be evaluated (e.g. from left to right).
For example, in Java if we write
f (release_monkeys(), increment_counter())
we know that the monkeys will be released, and then the counter will be incremented, and then the results of doing those things will be passed to f, and it does not matter whether f actually ends up using those results.
If the releasing of monkeys and incrementing of the counter could independently happen, or not, in either order, depending on whether f happens to use their results, it would be extremely confusing. When such “side effects” are allowed, strict evaluation is really what you want.
Side effects and purity
So, what’s really at issue here is the presence or absence of side effects. By “side effect” we mean anything that causes evaluation of an expression to interact with something outside itself. The root issue is that such outside interactions are time-sensitive. For example:
- Modifying a global variable – it matters when this happens since it may affect the evaluation of other expressions
- Printing to the screen – it matters when this happens since it may need to be in a certain order with respect to other writes to the screen
- Reading from a file or the network – it matters when this happens since the contents of the file can affect the outcome of the expression
Question: What is another side effect?
Lazy evaluation makes it hard to reason about when things will be evaluated; hence including side effects in a lazy language would be extremely unintuitive. Historically, this is the reason Haskell is pure: initially, the designers of Haskell wanted to make a lazy functional language, and quickly realized it would be impossible unless it also disallowed side effects.
But… a language with no side effects would not be very useful. The only thing you could do with such a language would be to load up your programs in an interpreter and evaluate expressions. You would not be able to get any input from the user, or print anything to the screen, or read from a file. The challenge facing the Haskell designers was to come up with a way to allow such effects in a principled, restricted way that did not interfere with the essential purity of the language. They finally did come up with something (namely, the IO monad).
Lazy evaluation
So now that we understand strict evaluation, let’s see what lazy evaluation actually looks like. Under a lazy evaluation strategy, evaluation of function arguments is delayed as long as possible: they are not evaluated until it actually becomes necessary to do so. When some expression is given as an argument to a function, it is simply packaged up as an unevaluated expression (called a “thunk”, don’t ask me why) without doing any actual work.
For example, when evaluating f 5 (29^35792), the second argument will simply be packaged up into a thunk without doing any actual computation, and f will be called immediately. Since f never uses its second argument, the thunk will just be thrown away by the garbage collector.
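If you load this module into GHCi you can check this yourself: undefined crashes the moment it is evaluated, yet f returns happily because it never forces its second argument. An illustrative session:
ghci> f 5 undefined
7
ghci> f 5 (29^35792)
7
The second call also returns immediately: the huge power is packaged up as a thunk and then simply discarded.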
Pattern matching drives evaluation
So, when is it “necessary” to evaluate an expression? The examples above concentrated on whether a function used its arguments, but this is actually not the most important distinction. Consider the following examples:
> -- data Maybe a = Just a | Nothing
>
> f1 :: Maybe Int -> [Maybe Int]
> f1 m = [Nothing,m]
>
> f2 :: Maybe Int -> [Int]
> f2 Nothing = []
> f2 (Just x) = [5, x]
f1 and f2 both use their argument. But there is still a big difference between them.
Questions: What is the result of
- f1 (error "die")
- head (f1 (error "die"))
- f2 (error "die")
- f2 (Just (error "die"))
- head (f2 (Just (error "die")))
- f2 (safeHead [3^500, 49])
where
> safeHead :: [a] -> Maybe a
> safeHead (x:_) = Just x
> safeHead _ = Nothing
Although f1 uses its argument m, it does not need to know anything about it. m can remain completely unevaluated, and the unevaluated expression is simply put in a list. Put another way, the result of f1 e does not depend on the shape of e.
f2, on the other hand, needs to know something about its argument in order to proceed: was it constructed with Nothing or Just? That is, in order to evaluate f2 e, we must first evaluate e, because the result of f2 depends on the shape of e.
The other important thing to note is that thunks are evaluated only enough to allow a pattern match to proceed, and no further! For example, suppose we wanted to evaluate f2 (safeHead [3^500, 49]). f2 would force evaluation of the call to safeHead [3^500, 49], which would evaluate to Just (3^500) – note that the 3^500 is not evaluated, since safeHead does not need to look at it, and neither does f2. Whether the 3^500 gets evaluated later depends on how the result of f2 is used.
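Trying the questions above in GHCi confirms this (the exact formatting of the exception depends on your GHC version):
ghci> head (f1 (error "die"))
Nothing
ghci> f2 (error "die")
*** Exception: die
ghci> head (f2 (Just (error "die")))
5
ghci> head (f2 (safeHead [3^500, 49]))
5
Note that evaluating f1 (error "die") on its own at the prompt would still blow up, but only because GHCi has to evaluate the entire list in order to print it.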
The slogan to remember is “pattern matching drives evaluation”. To reiterate the important points:
- Expressions are only evaluated when pattern-matched
- …only as far as necessary for the match to proceed, and no farther!
Let’s do a slightly more interesting example: we’ll evaluate take 3 (repeat 7). For reference, here are the definitions of repeat and take:
repeat :: a -> [a]
repeat x = x : repeat x
take :: Int -> [a] -> [a]
take n _ | n <= 0 = []
take _ [] = []
take n (x:xs) = x : take (n-1) xs
Carrying out the evaluation step-by-step looks something like this:
take 3 (repeat 7)
{ 3 <= 0 is False, so we proceed to the second clause,
which needs to match on the second argument.
So we must expand repeat 7 one step. }
= take 3 (7 : repeat 7)
{ the second clause doesn't match but the third
does. Note that (3-1) doesn't get evaluated yet! }
= 7 : take (3-1) (repeat 7)
{ In order to decide on the first clause, we must test
(3-1) <= 0 which requires evaluating (3-1). }
= 7 : take 2 (repeat 7)
{ 2 <= 0 is False, so we expand repeat 7 again. }
= 7 : take 2 (7 : repeat 7)
{ The rest is similar. }
= 7 : 7 : take (2-1) (repeat 7)
= 7 : 7 : take 1 (repeat 7)
= 7 : 7 : take 1 (7 : repeat 7)
= 7 : 7 : 7 : take (1-1) (repeat 7)
= 7 : 7 : 7 : take 0 (repeat 7)
= 7 : 7 : 7 : []
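In GHCi:
ghci> take 3 (repeat 7)
[7,7,7]
Even though repeat 7 describes an infinite list, only the three cells that take demands are ever built.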
(Note that although evaluation could be implemented exactly like the above, most Haskell compilers will do something a bit more sophisticated. In particular, GHC uses a technique called graph reduction, where the expression being evaluated is actually represented as a graph, so that different parts of the expression can share pointers to the same subexpression. This ensures that work is not duplicated unnecessarily. For example, if f x = [x,x], evaluating f (1+1) will only do one addition, because the subexpression 1+1 will be shared between the two occurrences of x.)
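You can observe this sharing with trace from Debug.Trace, which prints its message exactly when the value it wraps is forced. A small sketch (the trace output goes to stderr, so its position relative to the result may vary):
ghci> import Debug.Trace (trace)
ghci> f x = [x, x]
ghci> sum (f (trace "1+1 forced" (1 + 1)))
1+1 forced
4
The message appears only once: both occurrences of x point to the same thunk, and the second use finds it already evaluated.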
Consequences
Laziness has some very interesting, pervasive, and nonobvious consequences. Let’s explore a few of them.
Purity
As we’ve already seen, choosing a lazy evaluation strategy essentially forces you to also choose purity (assuming you don’t want programmers to go insane).
Understanding space usage
Laziness is not all roses. One of the downsides is that it sometimes becomes tricky to reason about the space usage of your programs. Consider the following (innocuous-seeming) example:
-- Standard library function foldl, provided for reference
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl _ z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
Question: How will foldl (+) 0 [1,2,3] evaluate?
foldl (+) 0 [1,2,3]
= foldl (+) (0 + 1) [2,3]
= foldl (+) ((0 + 1) + 2) [3]
= foldl (+) (((0 + 1) + 2) + 3) []
= (((0 + 1) + 2) + 3)
Since the value of the accumulator is not demanded until after recursing through the entire list, the accumulator simply builds up a big unevaluated expression (((0+1)+2)+3), which finally gets reduced to a value at the end. There are at least two problems with this. One is that it’s simply inefficient: there’s no point in transferring all the numbers from the list into a different list-like thing (the accumulator thunk) before actually adding them up. The second problem is more subtle, and more insidious: evaluating the expression (((0+1)+2)+3) actually requires pushing the 3 and 2 onto a stack before being able to compute 0+1, and then unwinding the stack, adding along the way. This is not a problem for this small example, but for very long lists it’s a big problem: there is usually not as much space available for the stack, so this can lead to a stack overflow.
The solution in this case is to use the foldl' function (from Data.List) instead of foldl, which adds a bit of strictness: in particular, foldl' requires its second argument (the accumulator) to be evaluated before it proceeds, so a large thunk never builds up:
foldl' (+) 0 [1,2,3]
= foldl' (+) (0+1) [2,3]
= foldl' (+) 1 [2,3]
= foldl' (+) (1+2) [3]
= foldl' (+) 3 [3]
= foldl' (+) (3+3) []
= foldl' (+) 6 []
= 6
As you can see, foldl' does the additions along the way, which is what we really want. But the point is that in this case laziness got in the way and we had to make our program less lazy.
(If you’re interested in learning about how foldl' achieves this, you can read about seq on the Haskell wiki.)
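The rough idea: foldl' behaves more or less like the sketch below (here called foldl'', a made-up name to avoid clashing with the real one). seq forces its first argument to weak head normal form before returning its second, so the accumulator is always a plain number rather than a growing thunk. This is only a sketch, not GHC’s actual definition:
-- a sketch of a strict left fold using seq (the real foldl' lives in Data.List)
foldl'' :: (b -> a -> b) -> b -> [a] -> b
foldl'' _ z []     = z
foldl'' f z (x:xs) = let z' = f z x            -- compute the new accumulator
                     in  z' `seq` foldl'' f z' xs  -- force it before recursing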
Abusing laziness for efficiency
Consider the Fibonacci function:
> fib :: Int -> Int
> fib i | i <= 0 = 0
> fib 1 = 1
> fib i = fib (i-1) + fib (i-2)
Question: Why is it so slow?
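Expanding a small call by hand shows the problem: subresults are recomputed over and over, so the number of calls grows exponentially with the argument.
fib 5
= fib 4 + fib 3
= (fib 3 + fib 2) + (fib 2 + fib 1)
= ((fib 2 + fib 1) + (fib 1 + fib 0)) + ((fib 1 + fib 0) + fib 1)
= ...
Here fib 3 is computed twice and fib 2 three times; nothing is remembered between the recursive calls.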
Let’s compute the first 6 Fibonacci numbers, but evaluate each of them only once. Here is the definition we will end up with; the rest of this section derives it step by step:
> fibs :: [Int]
> fibs = 1:1:zipWith (+) fibs (tail fibs)
> -- fibi i = fibs!!i + tail fibs!!i
Now use the transformation l!!(i+1) = tail l!!i (indexing into the tail of a list shifts the index down by one):
> fibs1 :: [Int]
> fibs1 = fib01:fib11:fib21:fib31:fib41:fib51:[{- ... -}]
> fib01 = 0
> fib11 = 1
> fib21 = fibs1!!0 + tail fibs1!!0
> fib31 = fibs1!!1 + tail fibs1!!1
> fib41 = fibs1!!2 + tail fibs1!!2
> fib51 = fibs1!!3 + tail fibs1!!3
Observation 1: The code for computing the Fibonacci numbers after the first two is the same (modulo indexing).
> fibs2 :: [Int]
> fibs2 = fib02:fib12:fib2i 0:fib2i 1 :fib2i 2:fib2i 3:[{- ... -}]
> fib02 = 0
> fib12 = 1
> fib2i i = (+) (fibs2!!i) (tail fibs2!!i)
Observation 2: We are applying the same function to all the elements of the (infinite) Fibonacci list! Let’s zip it! The zip function comes from the Prelude:
zip :: [a] -> [b] -> [(a,b)]
Similarly, zipWith combines the two lists using a given function:
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
> fibs' :: [Int]
> fibs' = 0:1:zipWith (+) fibs' (tail fibs')
Now you can get the i-th Fibonacci number as fibs'!!i. This works because of laziness: fibs' is an infinite list, but it only gets computed as far as it is needed.
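For example, in GHCi:
ghci> take 10 fibs'
[0,1,1,2,3,5,8,13,21,34]
ghci> fibs' !! 30
832040
Each element of fibs' is computed at most once and then remembered, so fibs' !! 30 returns immediately, whereas the naive fib 30 already takes noticeably longer.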