Trends in Functional Programming Volume 7
By Henrik Nilsson Intellect Ltd
Copyright © 2007 Intellect
All rights reserved.
ISBN: 978-1-84150-993-8
CHAPTER 1
Proving Termination Using Dependent Types: the Case of Xor-Terms
Jean-François Monin, Judicaël Courant
Abstract: We study a normalization function in an algebra of terms quotiented by an associative, commutative and involutive operator (logical xor). This study is motivated by the formal verification of cryptographic systems, relying on a normalization function for xor-terms. Such a function is easy to define using general recursion. However, as it is to be used in a type theoretic proof assistant, we also need a proof of its termination. Instead of using a mixture of various rewriting orderings, we follow an approach involving the power of Type Theory with dependent types. The results are to be applied in the proof of the security API described in.
1.1 INTRODUCTION
This work originates in the verification of a cryptographic system in the Coq proof assistant. In the course of this verification, we modelled plaintext and encrypted data as first-order sorted terms. For instance, a typical term could be {K1 ⊕ K2}H(KM,H(EXP,KP)) where {x}y denotes the encryption of x under a key y, H(x,y) denotes the fingerprint (cryptographic checksum) of the pair (x, y), and x ⊕ y denotes the bitwise exclusive or of x and y. The cryptographic primitives {_}_ and H are commonly assumed to be perfect, and can thus be treated as free constructors. By contrast, we need to deal with the equational theory of the exclusive or function, as many (potential or effective) attacks are based on its algebraic properties.
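Such a term algebra can be pictured as a small inductive type. The following is a minimal sketch under our own simplifying assumptions (all atomic constants are collapsed into a single constructor indexed by nat, and the constructor names are ours, not those of the actual development):

    Inductive term : Type :=
      | Zero  : term                     (* the xor unit 0 *)
      | Xor   : term -> term -> term     (* x ⊕ y, bitwise exclusive or *)
      | Enc   : term -> term -> term     (* {x}y, encryption of x under key y *)
      | Hash  : term -> term -> term     (* H(x,y), fingerprint of the pair (x,y) *)
      | Const : nat -> term.             (* atomic constants such as K1, K2, KM *)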
Deduction in the presence of a non-trivial equational theory is notoriously difficult in the general case: the matching problem in the presence of an associative-commutative (AC) function symbol is known to be NP-complete, even in very restricted cases, and AC matching is still an active research topic. In order to solve our particular problem, we only needed a way to define a canonical form for closed terms and a proof that each term has a canonical form. More precisely, we needed a canonicalization function putting any term into its canonical form. On paper, we would use results from term rewriting theory in order to prove the existence of a normalization function with respect to some rewriting system. Unfortunately, the formalization of these results in Coq is still only partial. We therefore tried to apply the methods of term rewriting theory to our problem, but stopped when we realized the amount of work needed to complete the definition of the normalization function (section 1.2).
Then we observed that the existence of such a normalization function could be reduced to the existence of a normalization function for terms built over ⊕ and 0 only. In other words, we can layer our terms into alternating levels of free and non-free constructors, and normalization of the overall term can be deduced from the normalization of all layers (section 1.3).
We formalized such a proof in Coq (section 1.4). Our results are to be applied in our proof of security of an API. However, the approach investigated here is not tied to the specific term algebra of this case study. We reflected this at several places in our formal development, by stating some key definitions and lemmas in an abstract setting, independently of any specific application.
1.2 BACKGROUND
1.2.1 The Need for a Canonicalization Function
In order to reason about our cryptographic system, we need to define some functions or properties over terms. In order to be consistent, we must ensure that these definitions are compatible with the equational theory E induced by the commutativity, associativity and cancellation laws of ⊕ and 0. We can define such a property P in one of the following ways:
We define P on the whole set of terms (for instance by structural induction). We check afterwards that P is indeed compatible with equality, that is, we check that ∀ t t', t =E t' ⟹ (P(t) ⟺ P(t')).
We give a canonicalization function N for terms, i.e. a function N from terms to terms such that ∀ t t', t =E t' ⟺ N(t) = N(t'). Then we define P over canonical forms only and extend it in the only compatible way. More formally, we first define some auxiliary predicate P' over terms (for instance by structural induction) and P(t) is defined as P'(N(t)). Thus P is compatible with E by construction.
In our case, the latter approach seems the only reasonable one. For instance, we needed to check whether some secret constant appears in a given term. Let us consider an example: it is clear that KM does not appear in {K1}H(EXP,KEK). But checking that it does not appear in KM ⊕ {K1}H(EXP,KEK) ⊕ KM is a bit more difficult: although KM syntactically is a subterm of this latter term, the occurrences of KM have no semantic significance here because of the self-cancellation law. We therefore do not see how to give a definition of this notion by structural induction on the whole set of terms. On the other hand, once all possible simplification rules have been applied (self-cancellation and neutral element cancellation, modulo associativity and commutativity), we just need to check for a syntactic occurrence of KM.
Therefore, we need to define a canonicalization function N over terms which is also a simplification function, i.e. which ensures that for every term t, no simplification rule applies to N(t).
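Concretely, with such a function in hand, the E-compatible occurrence check is obtained by composition. A minimal Coq sketch, where the term type, the canonicalization function and the purely syntactic check are left abstract (the names are ours, used only for illustration):

    Section OccurrenceModuloE.
      Variable term : Type.
      Variable norm : term -> term.                (* the canonicalization function N *)
      Variable occurs_syn : term -> term -> bool.  (* purely syntactic occurrence check P' *)

      (* "c appears in t" modulo E is decided on the canonical form only. *)
      Definition occurs (c t : term) : bool := occurs_syn c (norm t).
    End OccurrenceModuloE.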
1.2.2 What Term Rewriting Theory Says
The equational theory of ⊕ and 0 can be described by the following equations:
Associativity x ⊕ (y ⊕ z) ≈ (x ⊕ y) ⊕ z (1.1)
Commutativity x ⊕ y ≈ y ⊕ x (1.2)
Neutral element x ⊕ 0 ≈ x (1.3)
Involutivity x ⊕ x ≈ 0 (1.4)
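As an illustration of these laws at work (our own worked example), the term from section 1.2.1 simplifies as follows, writing t for {K1}H(EXP,KEK):

    KM ⊕ t ⊕ KM ≈ t ⊕ KM ⊕ KM    by (1.1) and (1.2)
                ≈ t ⊕ 0          by (1.4)
                ≈ t              by (1.3)

so the two occurrences of KM indeed cancel out.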
Term rewriting theory modulo associativity and commutativity (AC) tells us that this equational theory is generated by the following rewrite system, where ⊕ is an associative-commutative symbol:
0 ⊕ x → x (1.5)
x ⊕ x → 0 (1.6)
If this system terminates and its critical pairs converge, the existence of a normal form, unique up to associativity and commutativity, is ensured. In order to get a truly unique normal form, one just has to sort the arguments of repeated applications of ⊕ according to a given arbitrary total ordering.
Tools for automatically checking termination and critical pair convergence have been developed for years by the term rewriting community. For instance, the tool CiME checks these properties instantaneously (code figure 1.1).
1.2.3 Applicability to Type Theory
However, formally giving such a normalization function in Type Theory and formally proving its correctness is much more challenging.
The first show-stopper is termination. In Type Theory, all functions are total and terminating by construction. This means that general fixpoints are not allowed for defining functions. In order to achieve this fundamental property while keeping good expressive power, Type Theory limits recursion to higher-order primitive recursion and structural recursion over all inductively defined types. The theory and support tools automatically provide combinators for natural numbers, lists and user-defined inductive types, which are enough in most cases. Alternatively, in recent versions of Coq, the user can define a function recursively, provided she points out to the system a parameter of this function which structurally decreases at each recursive call. A special and very important case is well-founded recursion: the decreasing argument is a proof, formalized as an inductive object, witnessing that a given value is accessible for some binary relation. In complex cases, the latter approach is by far the more convenient. But there is no such thing as a free lunch: while type-checking ensures totality and strong normalization, the user has to design the right types and provide the right arguments. Of course, standard libraries about well-founded relations may help here. Besides, one can consider additional tools or methodology such as those developed by Bertot and Balaa, or Bove and Capretta.
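For the record, this accessibility-based machinery is already provided by Coq's standard library (Coq.Init.Wf, loaded by default); the following lines merely inspect it, as a reminder of what the user relies on rather than as part of the development discussed here:

    (* Acc R x states that every R-decreasing chain starting from x is finite;
       its proofs are inductive objects, so a function may recurse structurally
       on such a proof.  well_founded R is defined as forall a, Acc R a, and
       Fix is the generic recursion combinator built on top of Acc. *)
    Check Acc.
    Check well_founded.
    Check Fix.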
In the case of our canonicalization function, using standard rewriting arguments is surprisingly difficult in a proof assistant such as Coq:
Although some theoretical work addresses the addition of rewriting to the Calculus of Constructions, this is yet to be implemented.
Some work provides ways to define tactics for reasoning over associative-commutative theories, but such tactics only normalize given terms; they do not provide a normalization function.
We therefore tried to define our own specific rewriting relation corresponding to the defining equations of ⊕, but found this approach really costly:
We had to give a well-founded ordering. As no library for recursive path orderings or lexicographic path orderings was available in Coq, we used the lexicographic combination of a partial ordering ≤1 with a total ordering ≤2, where ≤1 is a polynomial ordering and ≤2 is a lexicographic ordering. Although ≤2 is not well-founded, the set of terms having a given weight for the polynomial defining ≤1 is finite; therefore we could prove in Coq that the lexicographic combination of ≤1 and ≤2 is well-founded.
Then we defined a rewriting relation ⇝. The difficult part here is to take into account commutativity and associativity. In order to avoid AC-matching issues, we decided to add a rewriting rule for associativity and a conditional rewriting rule for commutativity (x ⊕ y would rewrite to y ⊕ x if and only if x is smaller than y). Moreover, we had to complete our rewriting system in order to close critical pairs such as x ⊕ x ⊕ y, which could be rewritten to y or to x ⊕ (x ⊕ y).
We still had to define the normalization function. As mentioned above, the definition of such a function using well-founded induction in Coq is difficult. Therefore we stopped there and used another approach instead.
Even once this was done, we would still have to prove that the transitive closure of our rewriting relation is irreflexive, that our normalization function is sound with respect to it, and that it computes normal forms. Essentially, the main results to prove here would be ∀t, t ⇝* N(t) and ∀t t', ¬(N(t) ⇝ t').
1.3 OUR APPROACH
1.3.1 Intuitive Presentation
Although implementing normalization in Coq is hard in general, it is much easier in some particular cases. For instance, consider the term t1 = K1 ⊕ K2 ⊕ K1. In the examples considered here, K1 and K2 are constants (they are taken from our cryptographic application). Normalizing t1 is easy as it contains only constants and the exclusive or function. Such terms can be normalized as follows: we first compute the list of the constants the term contains, here [K1; K2; K1]; then we sort this list with respect to some total ordering on these constants, and get, say, [K2; K1; K1]. Then we easily detect repeated elements and apply self-cancellation: we therefore get the list [K2], hence the normal form of K1 ⊕ K2 ⊕ K1 is K2.
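A minimal Coq sketch of this flat case, under our own encoding (constants are represented by natural numbers and a ⊕-only term by the list of its constants; all names are ours): both the insertion sort and the cancellation of equal neighbours are structurally recursive, so Coq accepts them without any termination argument.

    Require Import List Arith.
    Import ListNotations.

    (* Insertion sort on the list of constants of a ⊕-only term. *)
    Fixpoint insert (x : nat) (l : list nat) : list nat :=
      match l with
      | [] => [x]
      | y :: tl => if Nat.leb x y then x :: l else y :: insert x tl
      end.

    Fixpoint isort (l : list nat) : list nat :=
      match l with
      | [] => []
      | x :: tl => insert x (isort tl)
      end.

    (* In a sorted list equal constants are adjacent, so removing equal
       neighbours pairwise implements the self-cancellation law x ⊕ x ≈ 0. *)
    Fixpoint cancel_pairs (l : list nat) : list nat :=
      match l with
      | [] => []
      | x :: tl =>
          match tl with
          | [] => [x]
          | y :: rest =>
              if Nat.eqb x y then cancel_pairs rest else x :: cancel_pairs tl
          end
      end.

    Definition norm_consts (l : list nat) : list nat := cancel_pairs (isort l).

    (* With K1 encoded as 1 and K2 as 2, K1 ⊕ K2 ⊕ K1 normalizes to K2. *)
    Eval compute in norm_consts [1; 2; 1].   (* = [2] : list nat *)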
Consider now the following term t2, where EXP and KM are two other constants, while E is a free binary constructor – E(u, v) stands for the encryption of u by v, usually denoted by {u}v:
t2 = {K1 ⊕ K2 ⊕ K1}KM⊕EXP ⊕ KP ⊕ {K2 ⊕ 0}KM⊕EXP
We first normalize the subterms K1 ⊕ K2 ⊕ K1 and K2 ⊕ 0. Replacing them in the initial term, we get:
{K2}KM⊕EXP ⊕ KP ⊕ {K2}KM⊕EXP
Now, both occurrences of {K2}KM⊕EXP behave as constants with respect to normalization. Indeed, they are in normal form, and moreover the applications of the function symbol E at their heads act as a barrier preventing their subterms from any interaction with outside subterms with respect to rewriting. Therefore, we can normalize the whole term t2 as we did for t1: we sort the list [{K2}KM⊕EXP; KP; {K2}KM⊕EXP] with respect to some total ordering, then detect repeated elements, apply self-cancellation, and get KP as the normal form.
We generalize this approach to all terms as follows, using typical features of Type Theory. In a first stage, the term to be normalized is layered in such a way that each level is built up from terms belonging to the previous level. These levels alternate between layers built up using only ⊕ constructors and layers built up using only other constructors, as lasagnas alternate between pasta-only layers and sauce layers (mixed up to your taste of tomato, meat, and cheese – in fact anything but pasta). At the term level, this stage is nothing else than constructor renaming. In a second stage, layers are normalized bottom-up. Normalizing a ⊕-layer only requires us to sort a list and remove pairs of identical items, while normalization of a non-⊕-layer is just the identity.
The second stage is easy, although we had to avoid some pitfalls (see section 1.4.3). Surprisingly, the first stage requires more work than expected: the new version of a term has a much more precise type whose computation turns out to be non-trivial (see section 1.4.2). In the whole development, we need the full power of programming with polymorphic dependent types: each layer owns its specific ordering relation, which depends on the ordered structure of the previous layer.
1.3.2 Layering Types
Let us now outline our approach in more detail. We call T the first-order inductive type of terms to be normalized. Let {⊕, 0} ⊎ N be the set of constructors of T. For instance, in our application, we have N = {PC, SC, E, Hash} with
PC : public_const → T, SC : secret_const → T, E : T → T → T, Hash : T → T → T
where public_const and secret_const are suitable enumerated types.
As explained in section 1.3.1, we want to split a T-term into layers. Moreover, we have to define a number of functions and prove a number of lemmas for each layer. For modularity reasons, it is better to handle each layer separately. Each layer provides a data type, a comparison function and a sorting function on this type, as well as correctness lemmas. Intuitively, a layer could be seen as a module in the sense of Harper, Lillibridge and Leroy, or better: a functor, because each layer relies on the interface of the previous layer. HLL modules have been adapted to the Calculus of Inductive Constructions and implemented in Coq. But our case is out of their scope, because here the number of layers is a dynamic notion which depends on a piece of data, namely the term we normalize. Therefore we only use the features of basic CIC, namely dependent, polymorphic and inductive types.
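To fix ideas, here is a deliberately simplified, non-dependent Coq sketch of the alternation (our own illustration with ad-hoc names, not the actual development): each layer is parameterized by the type of the layer below it.

    (* A ⊕-layer: the collection of its direct subterms, represented as a list
       to be sorted; the empty list plays the role of 0. *)
    Definition xor_layer (A : Type) : Type := list A.

    (* A non-⊕ layer: free constructors applied to subterms of the layer below. *)
    Inductive free_layer (A : Type) : Type :=
      | FConst : nat -> free_layer A        (* atomic constants *)
      | FEnc   : A -> A -> free_layer A     (* {u}v *)
      | FHash  : A -> A -> free_layer A.    (* H(u,v) *)

    (* A layered term then lives in an iterated type such as
       xor_layer (free_layer (xor_layer (free_layer Empty_set))), whose nesting
       depth depends on the term being normalized; computing such types is
       where dependent types come in. *)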
(Continues...)