On this page:
1 Purpose
2 Outline
8.7

Lecture 11: Rosette & Finitization

1 Purpose

Finitization of problems to make them fit into Rosette.

2 Outline

A bigger example, for a long-standing bug

This example comes from the Rosette documentation, which has many other great ones!

The bad-mult-prop example is nice, but it’s a bit contrived. Let’s look at a real bug.

First, we’ll define a way of dealing with 32bit numbers, representing them as vectors of 32 bits.

; int32? is a shorthand for the type (bitvector 32).
(define int32? (bitvector 32))

; int32 takes as input an integer literal and returns
; the corresponding 32-bit bitvector value.
(define (int32 i)
  (bv i int32?))

Next, we’ll define a "midpoint" function that takes two numbers and returns the middle, using bitvector addition and division:

; Returns the midpoint of the interval [lo, hi].
(define (bvmid lo hi)  ; (lo + hi) / 2
  (bvsdiv (bvadd lo hi) (int32 2)))

Remember: SMT solvers have the ability to natively reason about bitvectors, so stating things this way will allow us to best leverage the solver.

To talk about correctness, we need to think about how this function is supposed to work: both the assumptions it makes about the inputs (that both numbers are positive, that the first is less than the second). We can encode those as assumptions, and then compute assertions about the result of the function. In this case, we’ll assert that the middle is between the lo and the hi, and that the difference from middle to lo and middle to hi are nearly the same.

(define (check-mid impl lo hi)     ; Assuming that
  (assume (bvsle (int32 0) lo))    ; 0 ≤ lo and
  (assume (bvsle lo hi))           ; lo ≤ hi,
  (define mi (impl lo hi))         ; and letting mi = impl(lo, hi) and
  (define diff                     ; diff = (hi - mi) - (mi - lo),
    (bvsub (bvsub hi mi)
           (bvsub mi lo)))         ; we require that
  (assert (bvsle lo mi))           ; lo ≤ mi,
  (assert (bvsle mi hi))           ; mi ≤ hi,
  (assert (bvsle (int32 0) diff))  ; 0 ≤ diff, and
  (assert (bvsle diff (int32 1)))) ; diff ≤ 1.

We can test this on various inputs (and we could use quickcheck, though won’t in this case), and perhaps unsurprisingly (given this was a common and longstanding bug: https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues), we don’t detect any problems:

; (check-mid bvmid (int32 0) (int32 0))
; (check-mid bvmid (int32 0) (int32 1))
; (check-mid bvmid (int32 0) (int32 2))
; (check-mid bvmid (int32 10) (int32 10000))

But we can do better with Rossette, and unsurprisingly, it finds a counterexample:

(define-symbolic l h int32?)


(verify (check-mid bvmid l h))

Finitization

This example is more interesting, but still just involves adding and dividing numbers. While we won’t look at the actual encodings, this happens by translating the problem into a query in the language that the SMT language can accomodate: different SMT solvers have different capabilities, but typically, not only Booleans (as SAT solvers can deal with), but also finite sized integers (bit-arrays), and possibly strings, uninterpreted functions, etc.

What if we want to handle more complicated queries? What about recursive functions, or unbounded data types like lists?

In this case, what we have to do is finitize. For recursive functions, that typically means a certain number of recursive calls; for lists, it would be lists of up to a certain length, etc. This is because the SMT queries cannot be self-referential, so we cannot directly express a recursive computation, or a recursive data structure.

How do we finitize? There are two issues. First, there is a question of recursion. If we have functions on symbolic values that are recursive, naively trying to reason about them with Rosette runs into problems. In particular, on each recursive call, Rosette will create a new symbolic value, and will never realize that it gets to a stop condition. Consider: here we allow it to go to 5 levels of recursion; that isn’t enough to find the bug, so Rosette reports "unsat". If we up the fuel to high enough, it finds it just fine.

(define (bad-fact k n)
  (assume (> k 0) "out of fuel")
  (if (> n 10)
      0
      (if (zero? n)
          1
          (* n (bad-fact (sub1 k) (sub1 n))))))

(define (fact k n)
  (assume (> k 0) "out of fuel")
  (if (zero? n)
      1
      (* n (fact (sub1 k) (sub1 n)))))

(define (fact-prop k n)
  (equal? (fact k n)
          (bad-fact k n)))

(define-symbolic p integer?)
(assert (>= p 0))
(define m (verify (assert (fact-prop 5 p))))
(clear-vc!)

Recursive data is handled slightly differently. There is no direct way to create a symbolic piece of recursive data, so likely, we’ll end up creating a concrete list that contains symbolic values. As a result, we’ll end up with a finite list, and, at least structurally recursive functions will then be fine, since Rosette will be able to figure out when the program stops. If the program was generative, we’d likely have to do the above finitization trick.

(require quickcheck)
(require rackunit)

(define-symbolic a b c d e f n integer?)
(assert (<= n 6))
(define xs (take (list a b c d e f) n))

This allows us to test on lists of up to 6, since we have a symbolic value that we use to determine how many of the elements of the list we actually care about. Another way of thinking about how this will be used is that we are going to enumerate all the lists of one element, two elements, three elements, etc. up to 6 elements, and then check each of those. So while "list" is not something we can reason about symbolically (to really handle them, we’ll have to wait for Lean), we can still handle at least small cases, which are often enough to reveal bugs.

Let’s see how to do this for a recursive function. In the presentation of SAT, we manually unrolled and finitized our recursion. With Rosette, we don’t have to do that, though we do still have to think about how large of a space are we asking for it to be searching through, as otherwise the queries may take a long time to run!

Let’s return to the example of sorting. We can start with a decent property – not perfect, but not terrible – to capture what a sorting function (calling it sort1 here) should do:

(define (ordered l)
  (cond [(empty? l) #t]
        [(cons? l) (and (andmap (lambda (e) (<= (first l) e)) (rest l))
                        (ordered (rest l)))]))

(define (sort1-prop l)
  (and (equal? (length l) (length (sort1 l)))
       (ordered (sort1 l))))

This says that sort1 should take a list, and return a list of the same length, where the elements in the resulting list are in sorted order.

Now, we could just jump right into using Rosette, but one weakness of it is that if our programs report errors, then Rosette considers whatever input triggered that a counterexample. If there is a bug in our code, it will thus report that any input could be treated as a counterexample, but it won’t show the error – it will hide them. One easy way to avoid that (and, general good practice) is to first do some basic testing. We’ll show how to do it here with some unit tests, though we certainly could use PBT, and probably it would be better. Because of how Rosette works, we can use the same functions for both purposes: only the inputs we passed (concrete for the testts, or symbolic, for rosette) will change.

(define (sort1 l)
  (cond [(empty? l) l]
        [(cons? l)
         (insert1 (first l)
                  (sort1 (rest l)))]))

(define (insert1 x l)
  (cond [(empty? l) (list x)]
        [(cons? l)
         (if (< x (first l))
             (cons x (rest l))
             (cons (first l) (insert1 x (rest l))))]))

(check-pred sort1-prop (list))
(check-pred sort1-prop (list 1 1))
(check-pred sort1-prop (list 3 2 1))

Now we can also use Rosette to find counterexamples to this property. We can do this with:

(verify (assert (sort1-prop xs)))

This will find a counterexample, and print it out, as it turns out there is a bug in our code. At this point, we can either use the counterexample that Rosette gave us, or we can turn to another tool to look for more, or smaller ones. Here, we’ll see what Quickcheck gives us (if anything):

(quickcheck (property ([l (arbitrary-list arbitrary-integer)])
                      (sort1-prop l)))

This is a smaller input, which is nice, and it allows us to realize what is going wrong: our insert is dropping the element after the one it inserts, assuming it doesn’t insert at the end of the list. Since our concrete unit tests all inserted at the end of the list, we missed this.

We can fix this by changing our insert function, running all of the tests again, and this time, both Quickcheck and Rosette will report no issues.

(define (sort2 l)
  (cond [(empty? l) l]
        [(cons? l)
         (insert2 (first l)
                 (sort2 (rest l)))]))

(define (insert2 n l)
  (cond [(empty? l) (list n)]
        [(cons? l)
         (if (< n (first l))
             (cons n l)
             (cons (first l) (insert2 n (rest l))))]))



(define (sort2-prop l)
  (and (equal? (length l) (length (sort2 l)))
       (ordered (sort2 l))))

(check-pred sort2-prop (list))
(check-pred sort2-prop (list 1 1))
(check-pred sort2-prop (list 3 2 1))

(quickcheck (property ([l (arbitrary-list arbitrary-integer)])
                      (sort2-prop l)))

(verify (assert (sort2-prop xs)))
(clear-vc!)