Chapter 7 Generalized algebraic datatypes

Generalized algebraic datatypes, or GADTs, extend usual sum types in two ways: constraints on type parameters may change depending on the value constructor, and some type variables may be existentially quantified. Adding constraints is done by giving an explicit return type, where type parameters are instantiated:

type _ term = | Int : int -> int term | Add : (int -> int -> int) term | App : ('b -> 'a) term * 'b term -> 'a term

This return type must use the same type constructor as the type being defined, and have the same number of parameters. Variables are made existential when they appear inside a constructor’s argument, but not in its return type. Since the use of a return type often eliminates the need to name type parameters in the left-hand side of a type definition, one can replace them with anonymous types _ in that case.

The constraints associated to each constructor can be recovered through pattern-matching. Namely, if the type of the scrutinee of a pattern-matching contains a locally abstract type, this type can be refined according to the constructor used. These extra constraints are only valid inside the corresponding branch of the pattern-matching. If a constructor has some existential variables, fresh locally abstract types are generated, and they must not escape the scope of this branch.

1 Recursive functions

We write an eval function:

let rec eval : type a. a term -> a = function | Int n -> n (* a = int *) | Add -> (fun x y -> x+y) (* a = int -> int -> int *) | App(f,x) -> (eval f) (eval x) (* eval called at types (b->a) and b for fresh b *)

And use it:

let two = eval (App (App (Add, Int 1), Int 1))

val two : int = 2

It is important to remark that the function eval is using the polymorphic syntax for locally abstract types. When defining a recursive function that manipulates a GADT, explicit polymorphic recursion should generally be used. For instance, the following definition fails with a type error:

let rec eval (type a) : a term -> a = function | Int n -> n | Add -> (fun x y -> x+y) | App(f,x) -> (eval f) (eval x)

Error: The value f has type ($b -> a) term but an expression was expected of type 'a The type constructor $b would escape its scope Hint: $b is an existential type bound by the constructor App.

In absence of an explicit polymorphic annotation, a monomorphic type is inferred for the recursive function. If a recursive call occurs inside the function definition at a type that involves an existential GADT type variable, this variable flows to the type of the recursive function, and thus escapes its scope. In the above example, this happens in the branch App(f,x) when eval is called with f as an argument. In this branch, the type of f is ($App_'b -> a) term. The prefix $ in $App_'b denotes an existential type named by the compiler (see 7.5). Since the type of eval is 'a term -> 'a, the call eval f makes the existential type $App_'b flow to the type variable 'a and escape its scope. This triggers the above error.

2 Type inference

Type inference for GADTs is notoriously hard. This is due to the fact some types may become ambiguous when escaping from a branch. For instance, in the Int case above, n could have either type int or a, and they are not equivalent outside of that branch. As a first approximation, type inference will always work if a pattern-matching is annotated with types containing no free type variables (both on the scrutinee and the return type). This is the case in the above example, thanks to the type annotation containing only locally abstract types.

In practice, type inference is a bit more clever than that: type annotations do not need to be immediately on the pattern-matching, and the types do not have to be always closed. As a result, it is usually enough to only annotate functions, as in the example above. Type annotations are propagated in two ways: for the scrutinee, they follow the flow of type inference, in a way similar to polymorphic methods; for the return type, they follow the structure of the program, they are split on functions, propagated to all branches of a pattern matching, and go through tuples, records, and sum types. Moreover, the notion of ambiguity used is stronger: a type is only seen as ambiguous if it was mixed with incompatible types (equated by constraints), without type annotations between them. For instance, the following program types correctly.

let rec sum : type a. a term -> _ = fun x -> let y = match x with | Int n -> n | Add -> 0 | App(f,x) -> sum f + sum x in y + 1

val sum : 'a term -> int = <fun>

Here the return type int is never mixed with a, so it is seen as non-ambiguous, and can be inferred. When using such partial type annotations we strongly suggest specifying the -principal mode, to check that inference is principal.

The exhaustiveness check is aware of GADT constraints, and can automatically infer that some cases cannot happen. For instance, the following pattern matching is correctly seen as exhaustive (the Add case cannot happen).

let get_int : int term -> int = function | Int n -> n | App(_,_) -> 0

3 Refutation cases

Usually, the exhaustiveness check only tries to check whether the cases omitted from the pattern matching are typable or not. However, you can force it to try harder by adding refutation cases, written as a full stop. In the presence of a refutation case, the exhaustiveness check will first compute the intersection of the pattern with the complement of the cases preceding it. It then checks whether the resulting patterns can really match any concrete values by trying to type-check them. Wild cards in the generated patterns are handled in a special way: if their type is a variant type with only GADT constructors, then the pattern is split into the different constructors, in order to check whether any of them is possible (this splitting is not done for arguments of these constructors, to avoid non-termination). We also split tuples and variant types with only one case, since they may contain GADTs inside. For instance, the following code is deemed exhaustive:

type _ t = | Int : int t | Bool : bool t let deep : (char t * int) option -> char = function | None -> 'c' | _ -> .

Namely, the inferred remaining case is Some _, which is split into Some (Int, _) and Some (Bool, _), which are both untypable because deep expects a non-existing char t as the first element of the tuple. Note that the refutation case could be omitted here, because it is automatically added when there is only one case in the pattern matching.

Another addition is that the redundancy check is now aware of GADTs: a case will be detected as redundant if it could be replaced by a refutation case using the same pattern.

4 Advanced examples

The term type we have defined above is an indexed type, where a type parameter reflects a property of the value contents. Another use of GADTs is singleton types, where a GADT value represents exactly one type. This value can be used as runtime representation for this type, and a function receiving it can have a polytypic behavior.

Here is an example of a polymorphic function that takes the runtime representation of some type t and a value of the same type, then pretty-prints the value as a string:

type _ typ = | Int : int typ | String : string typ | Pair : 'a typ * 'b typ -> ('a * 'b) typ let rec to_string: type t. t typ -> t -> string = fun t x -> match t with | Int -> Int.to_string x | String -> Printf.sprintf "%S" x | Pair(t1,t2) -> let (x1, x2) = x in Printf.sprintf "(%s,%s)" (to_string t1 x1) (to_string t2 x2)

Another frequent application of GADTs is equality witnesses.

type (_,_) eq = Eq : ('a,'a) eq let cast : type a b. (a,b) eq -> a -> b = fun Eq x -> x

Here type eq has only one constructor, and by matching on it one adds a local constraint allowing the conversion between a and b. By building such equality witnesses, one can make equal types which are syntactically different.

Here is an example using both singleton types and equality witnesses to implement dynamic types.

let rec eq_type : type a b. a typ -> b typ -> (a,b) eq option = fun a b -> match a, b with | Int, Int -> Some Eq | String, String -> Some Eq | Pair(a1,a2), Pair(b1,b2) -> begin match eq_type a1 b1, eq_type a2 b2 with | Some Eq, Some Eq -> Some Eq | _ -> None end | _ -> None type dyn = Dyn : 'a typ * 'a -> dyn let get_dyn : type a. a typ -> dyn -> a option = fun a (Dyn(b,x)) -> match eq_type a b with | None -> None | Some Eq -> Some x

5 Existential type names in error messages

The typing of pattern matching in the presence of GADTs can generate many existential types. When necessary, error messages refer to these existential types using compiler-generated names. Currently, the compiler generates these names according to the following nomenclature:

First, types whose name starts with a $ are existentials.
$a denotes an existential type introduced for the type variable 'a of a GADT constructor:

type any = Any : 'name -> any let escape (Any x) = x

Error: The value x has type $name but an expression was expected of type 'a The type constructor $name would escape its scope Hint: $name is an existential type bound by the constructor Any.
$'a if the existential variable was unified with the type variable 'a during typing:

type ('arg,'result,'aux) fn = | Fun: ('a ->'b) -> ('a,'b,unit) fn | Mem1: ('a ->'b) * 'a * 'b -> ('a, 'b, 'a * 'b) fn let apply: ('arg,'result, _ ) fn -> 'arg -> 'result = fun f x -> match f with | Fun f -> f x | Mem1 (f,y,fy) -> if x = y then fy else f x

Error: This pattern matches values of type ($'arg, 'result, $'arg * 'result) fn but a pattern was expected which matches values of type ($'arg, 'result, unit) fn The type constructor $'arg would escape its scope
$n (n a number) is an internally generated existential which could not be named using one of the previous schemes.

As shown by the last item, the current behavior is imperfect and may be improved in future versions.

6 Explicit naming of existentials

As explained above, pattern-matching on a GADT constructor may introduce existential types. Syntax has been introduced which allows them to be named explicitly. For instance, the following code names the type of the argument of f and uses this name.

type _ closure = Closure : ('a -> 'b) * 'a -> 'b closure let eval = fun (Closure (type a) (f, x : (a -> _) * _)) -> f (x : a)

All existential type variables of the constructor must be introduced by the (type ...) construct and bound by a type annotation on the outside of the constructor argument.

One can additionally bind existentials that were freshly introduced by the refinement of another existential type, if they appear in the type of the arguments.

type _ ty = | Int : int ty | Pair : 'b ty * 'c ty -> ('b * 'c) ty let rec default : type a. a ty -> a = function | Int -> 0 | Pair (type b c) (b, c : b ty * c ty) -> (default b : b), (default c : c)

7 Equations on non-local abstract types

GADT pattern-matching may also add type equations to non-local abstract types. The behaviour is the same as with local abstract types. Reusing the above eq type, one can write:

module M : sig type t val x : t val e : (t,int) eq end = struct type t = int let x = 33 let e = Eq end let x : int = let Eq = M.e in M.x

8 Types used as GADT indices

In our discussion of refutation cases (7.3), we used the following example:

type _ t = | Int : int t | Bool : bool t let deep : (char t * int) option -> char = function | None -> 'c' | _ -> .

The type-checker is able to verify that the Some _ case is impossible because the type char t is empty. This relies on the fact that the types occurring in the GADT index are known to be distinct: the constructor Int returns an int t, and we know that int and char are distinct types, so Int is impossible at type char t – and similarly for Bool. In other words, checking that a GADT case is impossible (at a specific type) requires reasoning on inequalities between types.

For this reason, using abstract types as GADT indices does not work as intended. Consider for instance:

(* type-level natural numbers, in unary notation *) module TypeNat = struct type zero type !'a succ end type ('a, _) sized_list = | Nil : ('a, TypeNat.zero) sized_list | Cons : 'a * ('a, 'n) sized_list -> ('a, 'n TypeNat.succ) sized_list

This definition does not allow to rule out certain cases as impossible as we intended, for example the following fails:

let of_singleton : ('a, TypeNat.(zero succ)) sized_list -> 'a = function | Cons (v, Nil) -> v | _ -> .

Error: This match case could not be refuted. Here is an example of a value that would reach it: Nil

Because zero and succ are abstract types, the compiler assumes that their definition is unknown, and that they might in fact be the same type. In particular, with this signature the types zero and zero succ are not known to be distinct (even if their implementations are distinct) and the compiler cannot agree that the Nil case is impossible at type zero succ:¹

One way to reason about this is that a module with the same signature could have been implemented as follows, where clearly zero and zero succ are not incompatible:

module OtherTypeNat : sig type zero type !'a succ end = struct type zero = unit type 'a succ = 'a end

The recommended way to avoid this problem is to give a concrete definition to types such as zero and succ, that are only meant to be used in GADT indices. (This is counter-intuitive, as we will never manipulate values of this type.) You could use:

module TypeNat = struct type zero = Zero type 'a succ = Succ end

The types are now incompatibles as their constructor names are exposed and distinct.

As an additional refinement, we can define those index types using using a private annotation, which clarifies that they are not meant to be used to build values:

module TypeNat = struct type zero = private Zero type 'a succ = private Succ end

Remark: Usually private is meant to be used in interfaces, not in implementations; see the documentation of private types in 12.3. The constructors of a datatype that is defined as private cannot be used to build values, so the type is effectively empty. This makes it more explicit that the types are only intended to be used as GADT indices.

8.1 Legacy behavior: types in the current module

Early versions of OCaml, up to OCaml 5.4, had a special handling of types defined in the current module. This special case does not apply to the definitions of succ and zero discussed above, which are in separate submodules.

So the following definitions used to suffice to define sized_list, as the compiler would accept that zero and 'a succ are always distinct:

type zero type !'a succ

The reasoning was that while in general abstract types could be defined as anything somewhere in the program, abstract types that we just introduced are not defined at all, so it is correct to assume that they are distinct. Similarly, datatypes defined in the current module were considered distinct even if they used the exact same constructors or field names.

This special case made the system more complex and less modular. For example, GADT examples would work well when everything is defined in the same module, but function definitions located in another module would fail, which is very confusing to users.

Programs using this previously-supported feature can always be rewritten to use explicit definitions instead:

type zero = private Zero type 'a succ = private Succ

1: We also needed an explicit injectivity annotation, type !'a succ, otherwise the definition of sized_list is also rejected because succ is abstract.

Chapter 7 Generalized algebraic datatypes

1 Recursive functions

﻿2 Type inference

﻿3 Refutation cases

﻿4 Advanced examples

﻿5 Existential type names in error messages

﻿6 Explicit naming of existentials

﻿7 Equations on non-local abstract types

﻿8 Types used as GADT indices

﻿8.1 Legacy behavior: types in the current module