In the last lecture, we saw refinement types on primitive values and functions, along with the language of predicates, which includes arithmetic, boolean operators, and uninterpreted functions. Today, we will see how to use refinement types on data types. Concretely, we will start with the most famous data type, the list, and see how we can use refinement types for safe indexing in lists.
Here is the standard list data type in Haskell:
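A minimal sketch of such a definition, assuming the constructor names Nil and Cons that appear in the refined signatures later in this section. The derived Show and Eq instances are an addition for convenience only.

```haskell
-- A user-defined list that mirrors the built-in [a].
data List a
  = Nil               -- the empty list
  | Cons a (List a)   -- an element followed by the rest of the list
  deriving (Show, Eq)
```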
We use the measure definition to define the length of a list.
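As a sketch, the measure could be written as follows. The List type is repeated so the example stands alone, and the Nat result type is an assumption that simply records that lengths are non-negative.

```haskell
data List a = Nil | Cons a (List a)

-- Lift llen into the refinement logic.
{-@ measure llen @-}
{-@ llen :: List a -> Nat @-}
llen :: List a -> Int
llen Nil         = 0
llen (Cons _ xs) = 1 + llen xs
```

Because the annotations live in comments, the file is still ordinary Haskell and compiles without LiquidHaskell.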
Note: The measure keyword has two uses in LiquidHaskell.
When it is given only a type signature, the measure keyword is used to define an uninterpreted SMT function.
When it names a Haskell function, the measure keyword is used to lift that function to the refinement logic.
Concretely, a “measure” is a function that has one argument, which is an Algebraic Data Type (ADT), like a list. The one-argument restriction is very important because it allows LiquidHaskell to automate the verification.1 The measure definition “lifts” the Haskell function to the refinement logic by refining the types of the data constructors with the exact definition of the function.
For example, the llen measure definition refines the types of the list constructors to be:
Nil :: {v:List a | llen v = 0}
Cons :: x:a -> l:List a -> {v:List a | llen v = 1 + llen l}
With these refinements, verification can reason about the length of lists:
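A small sketch of the kind of specification this enables. The name twoElems and its elements are taken from the type-checking trace in this section; the exact signature is an assumption.

```haskell
data List a = Nil | Cons a (List a)

{-@ measure llen @-}
{-@ llen :: List a -> Nat @-}
llen :: List a -> Int
llen Nil         = 0
llen (Cons _ xs) = 1 + llen xs

-- The refinement states that the list has exactly two elements.
{-@ twoElems :: {v:List Int | llen v == 2} @-}
twoElems :: List Int
twoElems = Cons 4 (Cons 2 Nil)
```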
Type checking twoElems, using A-normal form (ANF), looks like this:
let l0 = Nil       :: {v:List a | llen v = 0}
    l1 = Cons 2 l0 :: {v:List a | llen v = 1 + llen l0}
in  Cons 4 l1      :: {v:List a | llen v = 1 + llen l1}
We can define multiple measures for the same data type, in which case, the refinements are conjoined together.
For example, we can define a measure that checks the emptiness of a list.
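A sketch of such a measure, assuming the name isempty used in the refined constructor types in this section:

```haskell
data List a = Nil | Cons a (List a)

-- A boolean-valued measure: True exactly for the empty list.
{-@ measure isempty @-}
isempty :: List a -> Bool
isempty Nil        = True
isempty (Cons _ _) = False
```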
With these two measure definitions, the types of the list constructors are refined to:
Nil :: {v:List a | llen v = 0 && isempty v}
Cons :: x:a -> l:List a -> {v:List a | llen v = 1 + llen l && not (isempty v)}
Question: Let’s define the head and tail functions for lists.
Question: Can you give a strong enough type for tail to verify the length of the result?
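One possible answer, offered as a sketch rather than the intended solution: both functions require a non-empty input, and tail promises a result that is exactly one element shorter. The specific signatures are assumptions.

```haskell
import Prelude hiding (head, tail)

data List a = Nil | Cons a (List a)

{-@ measure llen @-}
{-@ llen :: List a -> Nat @-}
llen :: List a -> Int
llen Nil         = 0
llen (Cons _ xs) = 1 + llen xs

-- The precondition rules out Nil, so the missing case is unreachable.
{-@ head :: {v:List a | llen v > 0} -> a @-}
head :: List a -> a
head (Cons x _) = x

-- The result type ties the output length to the input length.
{-@ tail :: xs:{v:List a | llen v > 0} -> {v:List a | llen v == llen xs - 1} @-}
tail :: List a -> List a
tail (Cons _ xs) = xs
```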
Question: Let’s now define a safe indexing function for lists.
Question: Let’s now define a safe lookup function for lists, using the case sensitivity of refinement types.
Let’s write a recursive function that adds up the values of an integer list.
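A sketch of such a function, using a local helper go with an accumulator and an index. The indexing operator and its refinement are assumptions standing in for the safe indexing function from the questions above, and the lazy annotation is assumed to switch off termination checking for listSum.

```haskell
import Prelude hiding ((!!))

data List a = Nil | Cons a (List a)

{-@ measure llen @-}
{-@ llen :: List a -> Nat @-}
llen :: List a -> Int
llen Nil         = 0
llen (Cons _ xs) = 1 + llen xs

-- Safe indexing: the index must lie within the list.
{-@ (!!) :: xs:List a -> {i:Int | 0 <= i && i < llen xs} -> a @-}
(!!) :: List a -> Int -> a
(Cons x _)  !! 0 = x
(Cons _ xs) !! i = xs !! (i - 1)

-- Assumption: disables the termination check for this function.
{-@ lazy listSum @-}
listSum :: List Int -> Int
listSum xs = go 0 0
  where
    -- acc is the running total, i the next index to visit.
    go acc i
      | i < llen xs = go (acc + (xs !! i)) (i + 1)
      | otherwise   = acc
```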
Question: What happens if you replace the guard with i <= llen xs?
Question: Write a variant of the above function that computes the absoluteSum of the list, i.e., the sum of the absolute values of the elements.
LiquidHaskell verifies listSum, or, to be precise, the safety of list indexing. The verification works because LiquidHaskell is able to automatically infer the type
go :: Int -> {v:Int | 0 <= v && v <= llen xs} -> Int
which states that the second parameter i is between 0 and the length of the list (inclusive). LiquidHaskell uses this and the test that i < llen xs to verify that the indexing is safe.
Note: LiquidHaskell automatically checks the termination of recursive functions. The default termination metric for the above functions fails. Later, we will see how to fix this. For now, we can disable termination checking by declaring functions as lazy.
Question: Why does the type of go have v <= llen xs and not v < llen xs?
We already used the go structure twice, so let’s generalize the common pattern! Let’s refactor the above low-level recursive function into a generic higher-order loop.
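A sketch of such a combinator. The argument order (lower bound, upper bound, initial accumulator, body) is inferred from the call loop 0 n 0 body discussed in this section; the parameter names base and f are assumptions.

```haskell
-- Run f on every index i with lo <= i < hi, threading an accumulator.
loop :: Int -> Int -> a -> (Int -> a -> a) -> a
loop lo hi base f = go base lo
  where
    go acc i
      | i < hi    = go (f i acc) (i + 1)
      | otherwise = acc
```

For instance, loop 0 5 0 (\i acc -> acc + i) adds up the indices 0 through 4.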
We can now use loop to implement listSum:
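A sketch of that implementation, with the supporting definitions repeated so it stands alone. The name listSum' and the refinement on the indexing operator are assumptions.

```haskell
import Prelude hiding ((!!))

data List a = Nil | Cons a (List a)

{-@ measure llen @-}
{-@ llen :: List a -> Nat @-}
llen :: List a -> Int
llen Nil         = 0
llen (Cons _ xs) = 1 + llen xs

{-@ (!!) :: xs:List a -> {i:Int | 0 <= i && i < llen xs} -> a @-}
(!!) :: List a -> Int -> a
(Cons x _)  !! 0 = x
(Cons _ xs) !! i = xs !! (i - 1)

loop :: Int -> Int -> a -> (Int -> a -> a) -> a
loop lo hi base f = go base lo
  where
    go acc i
      | i < hi    = go (f i acc) (i + 1)
      | otherwise = acc

-- The loop body only ever sees indices in [0, llen xs).
listSum' :: List Int -> Int
listSum' xs = loop 0 n 0 body
  where
    body i acc = acc + (xs !! i)
    n          = llen xs
```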
Inference is a convenient option. LiquidHaskell finds:
In English, the above type states that
lo, the loop lower bound, is a non-negative integer,
hi, the loop upper bound, is greater than or equal to lo,
f, the loop body, is only called with integers between lo and hi.
It can be tedious to have to keep typing things like the above. If we wanted to make loop a public or exported function, we could use the inferred type to generate an explicit signature.
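A sketch of what such an explicit signature could look like, written to match the three properties just listed. The alias Btwn with a half-open range (lower bound included, upper bound excluded) is an assumption, chosen so that the body can safely index with its argument.

```haskell
-- Integers v with Lo <= v < Hi.
{-@ type Btwn Lo Hi = {v:Int | Lo <= v && v < Hi} @-}

{-@ loop :: lo:Nat -> hi:{Nat | lo <= hi} -> a -> (Btwn lo hi -> a -> a) -> a @-}
loop :: Int -> Int -> a -> (Int -> a -> a) -> a
loop lo hi base f = go base lo
  where
    go acc i
      | i < hi    = go (f i acc) (i + 1)
      | otherwise = acc
```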
At the call loop 0 n 0 body, the parameters lo and hi are instantiated with 0 and n respectively, which, by the way, is where the inference engine deduces non-negativity. Thus LiquidHaskell concludes that body is only called with values of i that are between 0 and (llen xs), which verifies the safety of the call xs !! i.
Question: Complete the implementation of absoluteSum' below. When you are done, what is the type that is inferred for body?
Question: The following uses loop to compute dotProducts. Why does LiquidHaskell flag an error? Fix the code or specification so that LiquidHaskell accepts it.
Let’s now use lists to represent sparse vectors, meaning vectors with many zeros.
Implicitly, all indices other than those in the list have the value 0 (or the equivalent value for the type a).
The alias SparseN is just a shorthand for the (longer) type on the right; it does not define a new type. If you are familiar with the index-style length encoding, e.g. as found in DML or Agda, then note that despite appearances, our Sparse definition is not indexed.
Let’s write a function to compute a sparse product
LiquidHaskell verifies the above by using the specification to conclude that for each tuple (i, v) in the list y, the value of i is within the bounds of the list x, thereby proving x !! i safe.
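A sketch of the alias and the product, under several assumptions: the dense vector x is the user-defined List measured by llen, Btwn is a half-open range, and the names sparseProduct and SparseN follow the surrounding prose.

```haskell
import Prelude hiding ((!!))

data List a = Nil | Cons a (List a)

{-@ measure llen @-}
{-@ llen :: List a -> Nat @-}
llen :: List a -> Int
llen Nil         = 0
llen (Cons _ xs) = 1 + llen xs

{-@ (!!) :: xs:List a -> {i:Int | 0 <= i && i < llen xs} -> a @-}
(!!) :: List a -> Int -> a
(Cons x _)  !! 0 = x
(Cons _ xs) !! i = xs !! (i - 1)

{-@ type Btwn Lo Hi = {v:Int | Lo <= v && v < Hi} @-}

-- A sparse vector of dimension N: every index is a valid position.
{-@ type SparseN a N = [(Btwn 0 N, a)] @-}

{-@ sparseProduct :: x:List Int -> SparseN Int (llen x) -> Int @-}
sparseProduct :: List Int -> [(Int, Int)] -> Int
sparseProduct x y = go 0 y
  where
    go n []            = n
    go n ((i, v) : y') = go (n + (x !! i) * v) y'
```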
The sharp reader will have undoubtedly noticed that the sparse product can be more cleanly expressed as a fold:
foldl' :: (a -> b -> a) -> a -> [b] -> a
We can simply fold over the sparse vector, accumulating the sum as we go along.
LiquidHaskell digests this without difficulty. The main trick is in how the polymorphism of foldl' is instantiated.
GHC infers that at this site, the type variable b from the signature of foldl' is instantiated to the Haskell type (Int, a).
Correspondingly, LiquidHaskell infers that in fact b can be instantiated to the refined (Btwn 0 (vlen x), a).
Thus, the inference mechanism saves us a fair bit of typing and allows us to reuse existing polymorphic functions over containers and such without ceremony.
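A sketch of the fold-based variant. It assumes the dense vector is the user-defined List measured by llen (where the prose writes vlen for the length measure); the name sparseProduct' is hypothetical.

```haskell
import Data.List (foldl')
import Prelude hiding ((!!))

data List a = Nil | Cons a (List a)

{-@ measure llen @-}
{-@ llen :: List a -> Nat @-}
llen :: List a -> Int
llen Nil         = 0
llen (Cons _ xs) = 1 + llen xs

{-@ (!!) :: xs:List a -> {i:Int | 0 <= i && i < llen xs} -> a @-}
(!!) :: List a -> Int -> a
(Cons x _)  !! 0 = x
(Cons _ xs) !! i = xs !! (i - 1)

{-@ type Btwn Lo Hi = {v:Int | Lo <= v && v < Hi} @-}
{-@ type SparseN a N = [(Btwn 0 N, a)] @-}

-- foldl' is used at b := (Int, Int); the refined instance keeps the index bound.
{-@ sparseProduct' :: x:List Int -> SparseN Int (llen x) -> Int @-}
sparseProduct' :: List Int -> [(Int, Int)] -> Int
sparseProduct' x y = foldl' body 0 y
  where
    body acc (i, v) = acc + (x !! i) * v
```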
LiquidHaskell allows us to write invariants on data types. As an example, let’s revisit the sparse vector representation that we saw earlier. The SparseN type alias we used got the job done, but is not pleasant to work with because we have no way of determining the dimension of the sparse vector. Instead, let’s create a new datatype to represent such vectors:
Thus, a sparse vector is a pair of a dimension and a list of index-value tuples. Implicitly, all indices other than those in the list have the value 0 (or the equivalent value for the type a).
Sparse vectors satisfy two crucial properties. First, the dimension stored in spDim is non-negative. Second, every index in spElems must be valid, i.e. between 0 and the dimension. Unfortunately, Haskell’s type system does not make it easy to ensure that illegal vectors are not representable.2
Data Invariants. LiquidHaskell lets us enforce these invariants with a refined data definition:
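A sketch of the plain datatype together with a refined definition, using the constructor SP and the fields spDim and spElems named in this section. The half-open Btwn alias is an assumption.

```haskell
-- The ordinary Haskell datatype: a dimension plus index-value pairs.
data Sparse a = SP
  { spDim   :: Int
  , spElems :: [(Int, a)]
  }

{-@ type Btwn Lo Hi = {v:Int | Lo <= v && v < Hi} @-}

-- The refined definition: a non-negative dimension and in-range indices.
{-@ data Sparse a = SP
      { spDim   :: Nat
      , spElems :: [(Btwn 0 spDim, a)]
      }
  @-}
```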
Refined Data Constructors. The refined data definition is internally converted into refined types for the data constructor SP:
-- Generated Internal representation
data Sparse a where
  SP :: spDim:Nat
     -> spElems:[(Btwn 0 spDim, a)]
     -> Sparse a
In other words, by using refined input types for SP, we have automatically converted it into a smart constructor that ensures that every instance of a Sparse is legal. Consequently, LiquidHaskell verifies:
but rejects, due to the invalid index:
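A sketch of one accepted and one rejected value. The names okSP and badSP and their contents are hypothetical; both compile as plain Haskell, and only LiquidHaskell is expected to tell them apart.

```haskell
data Sparse a = SP
  { spDim   :: Int
  , spElems :: [(Int, a)]
  }

{-@ type Btwn Lo Hi = {v:Int | Lo <= v && v < Hi} @-}
{-@ data Sparse a = SP
      { spDim   :: Nat
      , spElems :: [(Btwn 0 spDim, a)]
      }
  @-}

-- Every index (0 and 3) is below the dimension 5.
okSP :: Sparse String
okSP = SP 5 [(0, "cat"), (3, "dog")]

-- Index 6 is out of range for dimension 5, so the refined
-- constructor type should not admit this value.
badSP :: Sparse String
badSP = SP 5 [(0, "cat"), (6, "dog")]
```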
Field Measures. It is convenient to write an alias for sparse vectors of a given size N. We can use the field name spDim as a measure, like llen. That is, we can use spDim inside refinements.3
Let’s write a function to compute a sparse product.
LiquidHaskell verifies the above by using the specification to conclude that for each tuple (i, v) in the list y, the value of i is within the bounds of the list x, thereby proving x !! i safe.
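A sketch of the alias and the product over the new representation. It assumes the dense vector x is a built-in Haskell list, measured by LiquidHaskell's len and indexed with the Prelude's !!; the function name dotProd is hypothetical.

```haskell
data Sparse a = SP
  { spDim   :: Int
  , spElems :: [(Int, a)]
  }

{-@ type Btwn Lo Hi = {v:Int | Lo <= v && v < Hi} @-}
{-@ data Sparse a = SP
      { spDim   :: Nat
      , spElems :: [(Btwn 0 spDim, a)]
      }
  @-}

-- Sparse vectors whose stored dimension is exactly N.
{-@ type SparseN a N = {v:Sparse a | spDim v == N} @-}

{-@ dotProd :: x:[Int] -> SparseN Int (len x) -> Int @-}
dotProd :: [Int] -> Sparse Int -> Int
dotProd x (SP _ y) = go 0 y
  where
    go acc []            = acc
    go acc ((i, v) : y') = go (acc + (x !! i) * v) y'
```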
Folded Product. We can port the fold-based product to our new representation:
As before, LiquidHaskell checks the above by automatically instantiating refinements for the type parameters of foldl', saving us a fair bit of typing and enabling the use of the elegant polymorphic, higher-order combinators we know and love.
Exercise: (Sanitization): Invariants are all well and good for data computed inside our programs. The only way to ensure the legality of data coming from outside, i.e. from the “real world”, is to write a sanitizer that will check the appropriate invariants before constructing a Sparse vector. Write the specification and implementation of a sanitizer fromList, so that the following typechecks:
Hint: You need to check that all the indices in elts are less than dim; the easiest way is to compute a new Maybe [(Int, a)] which is Just the original pairs if they are valid, and Nothing otherwise.
Exercise: (Addition): Write the specification and implementation of a function plus that performs the addition of two Sparse vectors of the same dimension, yielding an output of that dimension. When you are done, the following code should typecheck:
As a second example of refined data types, let’s consider a different problem: representing ordered sequences. Here’s a type for sequences that mimics the classical list:
The Haskell type above does not state that the elements are in order, of course, but we can specify that requirement by refining every element in tl to be greater than or equal to hd:
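A sketch of the datatype and its refinement, using the constructors Emp and :< and the fields hd and tl that appear in the generated representation in this section. The fixity declaration and the two example values are assumptions.

```haskell
-- An ordinary list-like sequence with an infix cons.
data IncList a
  = Emp
  | (:<) { hd :: a, tl :: IncList a }

infixr 9 :<

-- Every element of the tail must be at least the head.
{-@ data IncList a = Emp
                   | (:<) { hd :: a, tl :: IncList {v:a | hd <= v} } @-}

okList :: IncList Int
okList = 1 :< 2 :< 3 :< Emp

-- Compiles as plain Haskell, but 2 :< 1 violates hd <= v,
-- so LiquidHaskell should reject it.
badList :: IncList Int
badList = 2 :< 1 :< 3 :< Emp
```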
Refined Data Constructors. Once again, the refined data definition is internally converted into a “smart” refined data constructor:
-- Generated Internal representation
data IncList a where
  Emp  :: IncList a
  (:<) :: hd:a -> tl:IncList {v:a | hd <= v} -> IncList a
which ensures that we can only create legal ordered lists.
It’s all very well to specify ordered lists. Next, let’s see how it’s equally easy to establish these invariants by implementing several textbook sorting routines.
First, let’s implement insertion sort, which converts an ordinary list [a] into an ordered list IncList a.
The hard work is done by insert, which places an element into the correct position of a sorted list. LiquidHaskell infers that if you give insert an element and a sorted list, it returns a sorted list.
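A sketch of the two functions, with the refined IncList repeated so the example stands alone. The names insertSort and insert come from the prose; the bodies are one straightforward way to write them.

```haskell
data IncList a
  = Emp
  | (:<) { hd :: a, tl :: IncList a }

infixr 9 :<

{-@ data IncList a = Emp
                   | (:<) { hd :: a, tl :: IncList {v:a | hd <= v} } @-}

-- Sort by inserting each element into an already sorted tail.
insertSort :: Ord a => [a] -> IncList a
insertSort []       = Emp
insertSort (x : xs) = insert x (insertSort xs)

-- Place y before the first element that is not smaller than it.
insert :: Ord a => a -> IncList a -> IncList a
insert y Emp = y :< Emp
insert y (x :< xs)
  | y <= x    = y :< x :< xs
  | otherwise = x :< insert y xs
```

No signature mentions sortedness explicitly: the ordering requirement is carried by the refined constructor type of :<.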
Exercise: (Insertion Sort): Complete the implementation of the function below to use foldr to eliminate the explicit recursion in insertSort.
We will come back to the concept of increasing lists and see how one can provide such a specification for Haskell’s lists. But for now, let’s study easier properties of lists.
Today we saw how refinements interact with data types. Concretely, we saw how to define measures to specify properties of user-defined data and how to refine the definitions of data types to specify invariants. Finally, we saw how all these features interact with existing Haskell libraries, and concretely how to use LiquidHaskell to reason about Haskell’s lists.
1. In a later lecture we will see how one can use reflection to lift functions with more than one argument into the logic, but then verification is no longer automated.
2. The standard approach is to use abstract types and smart constructors, but even then there is only the informal guarantee that the smart constructor establishes the right invariants.
3. Note that inside a refined data definition, a field name like spDim refers to the value of the field, but outside it refers to the field selector measure or function.