Data Types

{- OPTIONS_GHC -fplugin=LiquidHaskell #-}
{- LIQUID "--no-termination" @-}
module Lecture_02_DataTypes where

import Data.List (foldl')
import Prelude hiding (head, tail, (!!), map, zipWith, zip, take, drop, reverse)
import Data.Maybe (fromJust)

main :: IO ()
main = return ()

In the last lecture, we saw refinement types on primitive values and functions, and the language of predicates, which includes arithmetic, boolean operators, and uninterpreted functions. Today, we will see how to use refinement types on data types. Concretely,

  1. We will define and use new logical functions on user-defined data types.
  2. We will use refinements on data type definitions to specify invariants.
  3. We will see how to use LiquidHaskell to reason about Haskell’s lists.

Measures

We will start with the most famous data type, the list, and see how we can use refinement types for safe indexing in lists. To do so, we will

  1. define the length of a list,
  2. compute the length of a list, and
  3. restrict the indexing of lists to valid indices.

Here is the standard list data type in Haskell:

data List a = Nil | Cons a (List a)

We use the measure definition to define the length of a list.

{-@ measure llen @-}
{-@ llen :: List a -> Nat @-}
llen :: List a -> Int
llen Nil        = 0
llen (Cons x l) = 1 + llen l

Note: The measure keyword has two uses in LiquidHaskell.

  1. Last time we saw that the measure keyword is used to define an uninterpreted SMT function.
  2. Used without a type signature, and with the name of an existing Haskell function, the measure keyword lifts that Haskell function into the refinement logic.

Concretely, a “measure” is a function with exactly one argument, which must be an algebraic data type (ADT), like a list. The one-argument restriction is very important because it allows LiquidHaskell to automate the verification.[1] The measure definition “lifts” the Haskell function to the refinement logic by refining the types of the data constructors with the exact definition of the function.

For example, the llen measure definition refines the types of the list constructors to be:

Nil  :: {v:List a | llen v = 0}
Cons :: x:a -> l:List a -> {v:List a | llen v = 1 + llen l}

With these refinements, verification can reason about the length of lists:

{-@ twoElems :: {v:List Int | llen v == 2} @-}
twoElems :: List Int
twoElems = Cons 4 (Cons 2 Nil)

Type checking twoElems, using ANF, looks like this:


let l0 = Nil        :: {v:List a | llen v = 0}
    l1 = Cons 2 l0  :: {v:List a | llen v = 1 + llen l0}
in Cons 4 l1        :: {v:List a | llen v = 1 + llen l1}
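The ANF steps above can be replayed in ordinary Haskell. The sketch below is only illustrative: the name twoElemsANF is hypothetical, and the comments spell out the arithmetic that the refined constructor types are expected to give the checker (0, then 1 + 0, then 1 + 1).

```haskell
data List a = Nil | Cons a (List a)

{-@ measure llen @-}
{-@ llen :: List a -> Nat @-}
llen :: List a -> Int
llen Nil        = 0
llen (Cons _ l) = 1 + llen l

-- Hypothetical ANF-style restatement of twoElems.
-- Each binding is annotated with the length fact its constructor provides.
{-@ twoElemsANF :: {v:List Int | llen v == 2} @-}
twoElemsANF :: List Int
twoElemsANF =
  let l0 = Nil          -- llen l0 = 0
      l1 = Cons 2 l0    -- llen l1 = 1 + llen l0 = 1
  in  Cons 4 l1         -- llen v  = 1 + llen l1 = 2
```

Chaining the three equalities yields llen v = 2, which is exactly the refinement required by the signature.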

Multiple Measures

We can define multiple measures for the same data type, in which case, the refinements are conjoined together.

For example, we can define a measure that checks emptiness of a list.

{-@ measure isempty @-}
isempty :: List a -> Bool
isempty Nil = True
isempty _   = False

With these two measure definitions, the types of the list constructors are refined to:

Nil  :: {v:List a | llen v = 0 && isempty v}
Cons :: x:a -> l:List a -> {v:List a | llen v = 1 + llen l && not (isempty v)}

Question: Let’s define the head and tail functions for lists.

head :: List a -> a
head = undefined

tail :: List a -> List a
tail = undefined

Question: Can you give a type for tail that is strong enough to verify the length of its result?

{- oneElem :: {v:List Int | llen v == 1} @-}
oneElem :: List Int
oneElem = tail twoElems
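One possible answer to the head and tail questions is sketched below. It is an assumption about how the exercise could be solved, not the lecture's official solution: both functions require a non-empty input through llen, and tail additionally promises that its result is one element shorter. The error branches are there only so that plain GHC sees total functions; under the stated preconditions they are expected to be unreachable.

```haskell
import Prelude hiding (head, tail)

data List a = Nil | Cons a (List a)

{-@ measure llen @-}
{-@ llen :: List a -> Nat @-}
llen :: List a -> Int
llen Nil        = 0
llen (Cons _ l) = 1 + llen l

-- Precondition: the input has at least one element.
{-@ head :: {v:List a | llen v > 0} -> a @-}
head :: List a -> a
head (Cons x _) = x
head Nil        = error "head: unreachable when llen > 0"

-- The result type records that exactly one element was dropped.
{-@ tail :: xs:{v:List a | llen v > 0} -> {v:List a | llen v == llen xs - 1} @-}
tail :: List a -> List a
tail (Cons _ xs) = xs
tail Nil         = error "tail: unreachable when llen > 0"

{-@ twoElems :: {v:List Int | llen v == 2} @-}
twoElems :: List Int
twoElems = Cons 4 (Cons 2 Nil)

-- With the output refinement on tail, 2 - 1 == 1 follows directly.
{-@ oneElem :: {v:List Int | llen v == 1} @-}
oneElem :: List Int
oneElem = tail twoElems
```

A weaker signature such as tail :: List a -> List a would still compile, but it says nothing about the result, so the specification of oneElem could not be established from it.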

Question: Let’s now define a safe indexing function for lists.

{-@ (!!) :: xs:List a -> {i:Int | 0 <= i && i < llen xs } -> a @-}
(!!) :: List a -> Int -> a
(!!) = undefined

Question: Let’s now define a safe lookup function for lists, using the path sensitivity of refinement types.

safeLookup :: List a -> Int -> Maybe a
safeLookup = undefined
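As a hedged sketch of where these two questions lead, here is one way the indexing and lookup functions could be written. The implementations are assumptions (the lecture leaves both as undefined). The point of safeLookup is that the guard establishes 0 <= i && i < llen xs on the branch that calls (!!), which is exactly the precondition the refined signature asks for.

```haskell
import Prelude hiding ((!!))

data List a = Nil | Cons a (List a)

{-@ measure llen @-}
{-@ llen :: List a -> Nat @-}
llen :: List a -> Int
llen Nil        = 0
llen (Cons _ l) = 1 + llen l

-- One possible implementation of safe indexing.
{-@ (!!) :: xs:List a -> {i:Int | 0 <= i && i < llen xs} -> a @-}
(!!) :: List a -> Int -> a
(Cons x _)  !! 0 = x
(Cons _ xs) !! i = xs !! (i - 1)   -- here i > 0, so i - 1 is a valid index of xs
Nil         !! _ = error "(!!): unreachable, no index is valid for Nil"

-- The guard is a runtime test; inside the first branch the checker
-- may assume the guard holds, so the call to (!!) meets its precondition.
safeLookup :: List a -> Int -> Maybe a
safeLookup xs i
  | 0 <= i && i < llen xs = Just (xs !! i)
  | otherwise             = Nothing
```

For instance, safeLookup (Cons 4 (Cons 2 Nil)) 1 evaluates to Just 2, while any index outside 0 and 1 yields Nothing.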

Recursive Functions

Let’s write a recursive function that adds up the values of an integer list.

listSum :: List Int -> Int
listSum xs = go 0 0
  where
    go acc i
      | i < llen xs = go (acc + (xs !! i)) (i+1)
      | otherwise   = acc

Question: What happens if you replace the guard with i <= llen xs?

Question: Write a variant of the above function that computes the absolute sum of the list, i.e., the sum of the absolute values of the elements.

{-@ absSum :: List Int -> Int @-}
absSum :: List Int -> Int
absSum = undefined

LiquidHaskell verifies listSum, or, to be precise, the safety of its list indexing. The verification works because LiquidHaskell is able to automatically infer

go :: Int -> {v:Int | 0 <= v && v <= llen xs} -> Int

which states that the second parameter i is between 0 and the length of the list (inclusive). LiquidHaskell uses this and the test that i < llen xs to verify that the indexing is safe.

Note: LiquidHaskell automatically checks the termination of recursive functions. The default termination metric fails for the above functions. Later, we will see how to fix this. For now, we can disable termination checking by declaring the functions as lazy.

{-@ lazy listSum @-}
{-@ lazy absSum @-}

Question: Why does the type of go have v <= llen xs and not v < llen xs?

Higher-Order Functions

We already used the go structure twice, so let’s generalize the common pattern! Let’s refactor the above low-level recursive function into a generic higher-order loop.

{-@ lazy loop @-}
loop :: Int -> Int -> a -> (Int -> a -> a) -> a
loop lo hi base f = go base lo
  where
    go acc i
      | i < hi    = go (f i acc) (i + 1)
      | otherwise = acc

We can now use loop to implement listSum:

{-@ lazy listSum' @-}
listSum' :: List Int -> Int
listSum' xs = loop 0 n 0 body
  where
    body i acc = acc + (xs !! i)
    n          = llen xs


Inference is a convenient option. LiquidHaskell finds:

{-@ type Btwn Lo Hi = {v:Int | Lo <= v && v < Hi} @-}
{-@ loop :: lo:Nat -> hi:{Nat|lo <= hi} -> a -> (Btwn lo hi -> a -> a) -> a @-}

In English, the above type states that

  • lo, the loop lower bound, is a non-negative integer,
  • hi, the loop upper bound, is greater than or equal to lo,
  • f, the loop body, is only called with integers between lo and hi.
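The third point can be observed at runtime with a small experiment. In the sketch below, visited and sumTo are hypothetical helpers that are not part of the lecture: visited uses loop to record every index passed to the body, which makes the half-open range from lo up to, but not including, hi visible.

```haskell
{-@ type Btwn Lo Hi = {v:Int | Lo <= v && v < Hi} @-}

{-@ lazy loop @-}
{-@ loop :: lo:Nat -> hi:{Nat|lo <= hi} -> a -> (Btwn lo hi -> a -> a) -> a @-}
loop :: Int -> Int -> a -> (Int -> a -> a) -> a
loop lo hi base f = go base lo
  where
    go acc i
      | i < hi    = go (f i acc) (i + 1)
      | otherwise = acc

-- Hypothetical helper: collect the indices the body is called with.
-- The body conses each index onto the accumulator, so we reverse at the end.
visited :: Int -> Int -> [Int]
visited lo hi = reverse (loop lo hi [] (:))

-- Hypothetical helper: sum of the indices 0 .. n-1.
sumTo :: Int -> Int
sumTo n = loop 0 n 0 (+)
```

Here visited 2 5 evaluates to [2,3,4] and sumTo 5 to 10: the body never sees hi itself, which is why Btwn uses a strict upper bound.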

It can be tedious to have to keep typing things like the above. If we wanted to make loop a public or exported function, we could use the inferred type to generate an explicit signature.

At the call loop 0 n 0 body, the parameters lo and hi are instantiated with 0 and n respectively, which, by the way, is where the inference engine deduces non-negativity. Thus LiquidHaskell concludes that body is only called with values of i that are between 0 and (llen xs), which verifies the safety of the call xs !! i.

Question: Complete the implementation of absoluteSum' below. When you are done, what is the type that is inferred for body?

{-@ absoluteSum' :: List Int -> Nat @-}
absoluteSum' :: List Int -> Int
absoluteSum' xs = loop 0 n 0 body
  where
    body i acc = undefined
    n          = llen xs

Question: The following uses loop to compute dot products. Why does LiquidHaskell flag an error? Fix the code or specification so that LiquidHaskell accepts it.

{-@ ignore dotProduct @-}
-- >>> dotProduct ([1,2,3]) ( [4,5,6])
-- 32
{-@ dotProduct :: x:List Int -> y:List Int -> Int @-}
dotProduct :: List Int -> List Int -> Int
dotProduct x y = loop 0 sz 0 body
  where
    body i acc = acc + (x !! i) * (y !! i)
    sz         = llen x

Folding (Indexed Lists)

Let’s now use lists to represent sparse vectors, meaning vectors with many zeros.

{-@ type SparseN a N = [(Btwn 0 N, a)] @-}

Implicitly, all indices other than those in the list have the value 0 (or the equivalent value for the type a).


The alias SparseN is just a shorthand for the (longer) type on the right; it does not define a new type. If you are familiar with the index-style length encoding, e.g. as found in DML or Agda, then note that despite appearances, our SparseN definition is not indexed.

Let’s write a function to compute a sparse product:

{-@ sparseProduct :: x:List Int -> SparseN Int (llen x) -> Int @-}
sparseProduct :: List Int -> [(Int, Int)] -> Int
sparseProduct x y = go 0 y
  where
    go n []         = n
    go n ((i,v):y') = go (n + (x!!i) * v) y'

LiquidHaskell verifies the above by using the specification to conclude that for each tuple (i, v) in the list y, the value of i is within the bounds of the list x, thereby proving x !! i safe.
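A small worked instance may help here. In the sketch below, the body of (!!) is one assumed implementation (the lecture leaves it as an exercise), and dense, sparse, and example are hypothetical names. Every index in sparse is below llen dense, which is the property the SparseN alias is meant to capture.

```haskell
import Prelude hiding ((!!))

data List a = Nil | Cons a (List a)

{-@ measure llen @-}
{-@ llen :: List a -> Nat @-}
llen :: List a -> Int
llen Nil        = 0
llen (Cons _ l) = 1 + llen l

{-@ type Btwn Lo Hi = {v:Int | Lo <= v && v < Hi} @-}
{-@ type SparseN a N = [(Btwn 0 N, a)] @-}

-- Assumed implementation of safe indexing.
{-@ (!!) :: xs:List a -> {i:Int | 0 <= i && i < llen xs} -> a @-}
(!!) :: List a -> Int -> a
(Cons x _)  !! 0 = x
(Cons _ xs) !! i = xs !! (i - 1)
Nil         !! _ = error "(!!): unreachable, no index is valid for Nil"

{-@ sparseProduct :: x:List Int -> SparseN Int (llen x) -> Int @-}
sparseProduct :: List Int -> [(Int, Int)] -> Int
sparseProduct x y = go 0 y
  where
    go n []         = n
    go n ((i,v):y') = go (n + (x !! i) * v) y'

-- Hypothetical inputs: a dense vector of length 3 and a sparse one
-- whose indices 0 and 2 both lie in the range [0, 3).
{-@ dense :: {v:List Int | llen v == 3} @-}
dense :: List Int
dense = Cons 1 (Cons 2 (Cons 3 Nil))

{-@ sparse :: SparseN Int 3 @-}
sparse :: [(Int, Int)]
sparse = [(0, 10), (2, 5)]

example :: Int
example = sparseProduct dense sparse   -- 1*10 + 3*5 = 25
```

Changing an entry of sparse to something like (3, 1) would put the index outside Btwn 0 3, so the specification of sparse should no longer hold; at runtime the same entry would hit the unreachable branch of (!!).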

The sharp reader will have undoubtedly noticed that the sparse product can be more cleanly expressed as a fold:

foldl' :: (a -> b -> a) -> a -> [b] -> a

We can simply fold over the sparse vector, accumulating the sum as we go along:

{-@ sparseProduct' :: x:List Int -> SparseN Int (llen x) -> Int @-}
sparseProduct' :: List Int -> [(Int, Int)] -> Int
sparseProduct' x y = foldl' body 0 y
  where
    body sum (i, v) = sum + (x !! i) * v

LiquidHaskell digests this without difficulty. The main trick is in how the polymorphism of foldl' is instantiated.

  1. GHC infers that at this site, the type variable b from the signature of foldl' is instantiated to the Haskell type (Int, a).

  2. Correspondingly, LiquidHaskell infers that in fact b can be instantiated to the refined (Btwn 0 (llen x), a).

Thus, the inference mechanism saves us a fair bit of typing and allows us to reuse existing polymorphic functions over containers and such without ceremony.

Data Invariants: Sparse Vectors

LiquidHaskell lets us write invariants on data types. As an example, let’s revisit the sparse vector representation that we saw earlier. The SparseN type alias we used got the job done, but it is not pleasant to work with because we have no way of determining the dimension of the sparse vector. Instead, let’s create a new datatype to represent such vectors:

data Sparse a = SP { spDim   :: Int
                   , spElems :: [(Int, a)] }

Thus, a sparse vector is a pair of a dimension and a list of index-value tuples. Implicitly, all indices other than those in the list have the value 0 or the equivalent value for the type a.

Sparse vectors satisfy two crucial properties. First, the dimension stored in spDim is non-negative. Second, every index in spElems must be valid, i.e. between 0 and the dimension. Unfortunately, Haskell’s type system does not make it easy to ensure that illegal vectors are not representable.[2]
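To make the two properties concrete, here is a plain Haskell predicate that tests them at runtime. The name isLegal is hypothetical and not part of the lecture; the sketch reads "between 0 and the dimension" as 0 <= i < spDim, matching the Btwn alias defined earlier.

```haskell
data Sparse a = SP { spDim   :: Int
                   , spElems :: [(Int, a)] }

-- Hypothetical runtime check of the two properties:
--   1. the dimension is non-negative;
--   2. every stored index i satisfies 0 <= i < dimension.
isLegal :: Sparse a -> Bool
isLegal (SP dim elts) = dim >= 0 && all inBounds elts
  where
    inBounds (i, _) = 0 <= i && i < dim
```

For example, isLegal (SP 5 [(0, "cat"), (3, "dog")]) is True, while replacing the index 3 by 6 makes it False. A dynamic check like this only reports a violation after an illegal value has already been built.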


Data Invariants: LiquidHaskell lets us enforce these invariants with a refined data definition:

{-@ data Sparse a = SP { spDim   :: Nat
                       , spElems :: [(Btwn 0 spDim, a)]} @-}


Refined Data Constructors: The refined data definition is internally converted into refined types for the data constructor SP:

-- Generated Internal representation
data Sparse a where
  SP :: spDim:Nat
     -> spElems:[(Btwn 0 spDim, a)]
     -> Sparse a

In other words, by using refined input types for SP we have automatically converted it into a smart constructor that ensures that every instance of a Sparse is legal. Consequently, LiquidHaskell verifies:

okSP :: Sparse String
okSP = SP 5 [ (0, "cat")
            , (3, "dog") ]

but rejects, due to the invalid index:

{-@ ignore badSP @-}
badSP :: Sparse String
badSP = SP 5 [ (0, "cat")
             , (6, "dog") ]


Field Measures: It is convenient to write an alias for sparse vectors of a given size N. We can use the field name spDim as a measure, like llen. That is, we can use spDim inside refinements:[3]

{-@ type SparseIN a N = {v:Sparse a | spDim v == N} @-}

Let’s write a function to compute a sparse product:

{-@ dotProd :: x:List Int -> SparseIN Int (llen x) -> Int @-}
dotProd :: List Int -> Sparse Int -> Int
dotProd x (SP _ y) = go 0 y
  where
    go sum ((i, v) : y') = go (sum + (x !! i) * v) y'
    go sum []            = sum

LiquidHaskell verifies the above by using the specification to conclude that for each tuple (i, v) in the list y, the value of i is within the bounds of the list x, thereby proving x !! i safe.


Folded Product: We can port the fold-based product to our new representation:

{-@ dotProd' :: x:List Int -> SparseIN Int (llen x) -> Int @-}
dotProd' :: List Int -> Sparse Int -> Int
dotProd' x (SP _ y) = foldl' body 0 y
  where
    body sum (i, v) = sum + (x !! i) * v

As before, LiquidHaskell checks the above by automatically instantiating refinements for the type parameters of foldl', saving us a fair bit of typing and enabling the use of the elegant polymorphic, higher-order combinators we know and love.


Exercise: (Sanitization): Invariants are all well and good for data computed inside our programs. The only way to ensure the legality of data coming from outside, i.e. from the “real world”, is to write a sanitizer that checks the appropriate invariants before constructing a Sparse vector. Write the specification and implementation of a sanitizer fromList, so that the following typechecks:



Hint: You need to check that all the indices in elts are less than dim; the easiest way is to compute a new Maybe [(Int, a)] which is Just the original pairs if they are valid, and Nothing otherwise.

fromList :: Int -> [(Int, a)] -> Maybe (Sparse a)
fromList dim elts = undefined

{- test :: SparseIN String 3 @-}
test :: Maybe (Sparse String)
test = fromList 3 [(0, "cat"), (2, "mouse")]


Exercise: (Addition): Write the specification and implementation of a function plus that performs the addition of two Sparse vectors of the same dimension, yielding an output of that dimension. When you are done, the following code should typecheck:



plus :: (Num a) => Sparse a -> Sparse a -> Sparse a
plus x y = undefined

{- testPlus :: SparseIN Int 3 @-}
testPlus :: Sparse Int
testPlus = plus vec1 vec2
  where
    vec1 = SP 3 [(0, 12), (2, 9)]
    vec2 = SP 3 [(0, 8), (1, 100)]

Ordered Lists

As a second example of refined data types, let’s consider a different problem: representing ordered sequences. Here’s a type for sequences that mimics the classical list:

data IncList a = Emp
               | (:<) { hd :: a, tl :: IncList a }

infixr 9 :<

The Haskell type above does not state that the elements are in order, of course, but we can specify that requirement by refining every element in tl to be greater than or equal to hd:

{-@ data IncList a = Emp | (:<) { hd :: a, tl :: IncList {v:a | hd <= v}} @-}


Refined Data Constructors: Once again, the refined data definition is internally converted into a “smart” refined data constructor

-- Generated Internal representation
data IncList a where
  Emp  :: IncList a
  (:<) :: hd:a -> tl:IncList {v:a | hd <= v} -> IncList a

which ensures that we can only create legal ordered lists.

okList :: IncList Int
okList = 1 :< 2 :< 3 :< Emp      -- accepted by LH

{-@ ignore badList @-}
badList :: IncList Int
badList = 2 :< 1 :< 3 :< Emp     -- rejected by LH

It’s all very well to specify ordered lists. Next, let’s see how it’s equally easy to establish these invariants by implementing several textbook sorting routines.

First, let’s implement insertion sort, which converts an ordinary list [a] into an ordered list IncList a.

insertSort :: (Ord a) => [a] -> IncList a
insertSort []     = Emp
insertSort (x:xs) = insert x (insertSort xs)

The hard work is done by insert which places an element into the correct position of a sorted list. LiquidHaskell infers that if you give insert an element and a sorted list, it returns a sorted list.

insert :: (Ord a) => a -> IncList a -> IncList a
insert y Emp = y :< Emp
insert y (x :< xs)
  | y <= x    = y :< x :< xs
  | otherwise = x :< insert y xs
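To see the two functions at work on concrete data, the sketch below repeats the definitions and adds a hypothetical helper, toList, that converts an IncList back into an ordinary list for inspection. The helper is not part of the lecture.

```haskell
data IncList a = Emp
               | (:<) { hd :: a, tl :: IncList a }

infixr 9 :<

{-@ data IncList a = Emp | (:<) { hd :: a, tl :: IncList {v:a | hd <= v}} @-}

insert :: (Ord a) => a -> IncList a -> IncList a
insert y Emp = y :< Emp
insert y (x :< xs)
  | y <= x    = y :< x :< xs          -- y is a lower bound for x and for all of xs
  | otherwise = x :< insert y xs      -- x < y, so x still bounds the new tail

insertSort :: (Ord a) => [a] -> IncList a
insertSort []     = Emp
insertSort (x:xs) = insert x (insertSort xs)

-- Hypothetical helper for inspecting results as ordinary lists.
toList :: IncList a -> [a]
toList Emp       = []
toList (x :< xs) = x : toList xs
```

For example, toList (insertSort [3, 1, 2]) evaluates to [1,2,3]. Duplicates are kept, since the invariant uses <= rather than a strict comparison.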


Exercise: (Insertion Sort): Complete the implementation of the function below to use foldr to eliminate the explicit recursion in insertSort.



insertSort' :: (Ord a) => [a] -> IncList a
insertSort' xs = foldr f b xs
  where
    f = undefined    -- Fill this in
    b = undefined    -- Fill this in

We will come back to the concept of increasing lists and see how one can provide such a specification for Haskell’s lists. But for now, let’s study easier properties of lists.

Summary

Today we saw how refinements interact with data types. Concretely, we saw how to define measures to specify properties of user-defined data and how to refine the definitions of data types to specify invariants. Finally, we saw how all these features interact with existing Haskell libraries, and concretely how to use LiquidHaskell to reason about Haskell’s lists.


  1. In a later lecture we will see how to use reflection to lift functions with more than one argument into the logic, but then verification is no longer automated.

  2. The standard approach is to use abstract types and smart constructors, but even then there is only the informal guarantee that the smart constructor establishes the right invariants.

  3. Note that inside a refined data definition, a field name like spDim refers to the value of the field, but outside it refers to the field selector measure or function.