package tezos-protocol-environment
include module type of struct include Data_encoding end
This page is for the API documentation of data-encoding: the technical description of each of the available type and combinator exported by the data-encoding library.
For a high-level view and a tutorial, see tutorial
.
Data Encoding
Overview
This module provides type-safe serialization and deserialization of data structures. Backends are provided to both /ad hoc/ binary, JSON and BSON.
This works by writing type descriptors by hand, using the provided combinators. These combinators can fine-tune the binary representation to be compact and efficient, but also provide proper field names and meta information. As a result, an API that uses those descriptors can be automatically introspected and documented.
Here is an example encoding for type (int * string)
.
let enc = obj2 (req "code" uint16) (req "message" string)
In JSON, this encoding maps values of type int * string
to JSON objects with a field code
whose value is a number and a field message
whose value is a string.
In binary, this encoding maps to two raw bytes for the int
followed by the size of the string in bytes, and finally the raw contents of the string. This binary format is mostly tagless, meaning that serialized data cannot be interpreted without the encoding that was used for serialization.
Regarding binary serialization, encodings are classified as either:
- fixed size (booleans, integers, numbers) data is always the same size for that type ;
- dynamically sized (arbitrary strings and bytes) data is of unknown size and requires an explicit length field ;
- variable size (special case of strings, bytes, and arrays) data makes up the remainder of an object of known size, thus its size is given by the context, and does not have to be serialized.
JSON operations are delegated to json-data-encoding
.
Module structure
This Data_encoding
module provides multiple submodules:
Encoding
contains the necessary types and constructors for making the type descriptors.Json
,Bson
, andBinary
contain functions to serialize and deserialize values.
include module type of struct include Data_encoding.Encoding end
Ground descriptors
voids
ground numerical types
All encodings are big-endians.
- 8-bit integers (signed or unsigned) are encoded over 1 single byte.
- 16-bit integers (signed or unsigned) are encoded over 2 bytes.
- 31-bit integers are always signed and always encoded over 4 bytes.
- 32-bit integers are always signed and always encoded over 4 bytes.
- 64-bit integers are always signed and always encoded over 8 bytes.
A note on 31-bit integers. The internal representation of integers in OCaml reserves one bit for GC tagging. The remaining bits encode a signed integer. For compatibility with 32-bit machine, we restrict these native integers to the 31-bit range.
Other ground type encodings
Descriptor combinators
Product descriptors
Constructors for objects with N fields
These are serialized to binary by converting each internal object to binary and placing them in the order of the original object. These are serialized to JSON as a JSON object with the field names. An object might only contains one 'variable' field, typically the last one. If the encoding of more than one field are 'variable', the first ones should be wrapped with dynamic_size
.
Constructors for tuples with N fields
These are serialized to binary by converting each internal object to binary and placing them in the order of the original object. These are serialized to JSON as JSON arrays/lists. Like objects, a tuple might only contains one 'variable' field, typically the last one. If the encoding of more than one field are 'variable', the first ones should be wrapped with dynamic_size
.
Sum descriptors
Predicates over descriptors
Specialized descriptors
Documenting descriptors
module With_version = Data_encoding.With_version
Create a Data_encoding.t
value which records knowledge of older versions of a given encoding as long as one can "upgrade" from an older version to the next (if upgrade is impossible one should consider that the encoding is completely different).
module Json = Data_encoding.Json
module Bson = Data_encoding.Bson
module Binary_schema = Data_encoding.Binary_schema
module Binary_stream = Data_encoding.Binary_stream
module Binary = Data_encoding.Binary
type json = Json.t
val json : json Data_encoding.Encoding.t
json
is an encoding for JSON values. It is mostly intended for internal use or for defining your own low-level combinators.
WARNING! Due to a limitation of BSON, this encoding does not safely roundtrip. Specifically, Json.destruct json (Json.construct json v)
is not guaranteed to be equal to v
. More specifically, in BSON, top-level Arrays are represented as number-indexed Objects and this library has no way to distinguish between the two, doubly so for empty collections.
See Json.destruct
's ?bson_relaxation
optional parameter.
type json_schema = Json.schema
val json_schema : json_schema Data_encoding.Encoding.t
type bson = Bson.t
module Registration = Data_encoding.Registration
module Encoding : sig ... end
include module type of struct include Encoding end
type 'a t = 'a Data_encoding.Encoding.t
The type descriptors for values of type 'a
.
type 'a encoding = 'a t
Ground descriptors
voids
val null : unit encoding
Special value null
in JSON, nothing in binary.
val empty : unit encoding
Empty object (not included in binary, encoded as empty object in JSON).
val unit : unit encoding
Unit value, omitted in binary. Serialized as an empty object in JSON, accepts any object when deserializing.
val constant : string -> unit encoding
Constant string (data is not included in the binary data).
ground numerical types
All encodings are big-endians.
- 8-bit integers (signed or unsigned) are encoded over 1 single byte.
- 16-bit integers (signed or unsigned) are encoded over 2 bytes.
- 31-bit integers are always signed and always encoded over 4 bytes.
- 32-bit integers are always signed and always encoded over 4 bytes.
- 64-bit integers are always signed and always encoded over 8 bytes.
A note on 31-bit integers. The internal representation of integers in OCaml reserves one bit for GC tagging. The remaining bits encode a signed integer. For compatibility with 32-bit machine, we restrict these native integers to the 31-bit range.
val int8 : int encoding
Signed 8 bit integer (data is encoded as a byte in binary and an integer in JSON).
val uint8 : int encoding
Unsigned 8 bit integer (data is encoded as a byte in binary and an integer in JSON).
val int16 : int encoding
Signed 16 bit integer (data is encoded as a short in binary and an integer in JSON).
val uint16 : int encoding
Unsigned 16 bit integer (data is encoded as a short in binary and an integer in JSON).
val int31 : int encoding
Signed 31 bit integer, which corresponds to type int on 32-bit OCaml systems (data is encoded as a 32 bit int in binary and an integer in JSON).
val int32 : int32 encoding
Signed 32 bit integer (data is encoded as a 32-bit int in binary and an integer in JSON).
val int64 : int64 encoding
Signed 64 bit integer (data is encoded as a 64-bit int in binary and a decimal string in JSON).
val ranged_int : int -> int -> int encoding
Integer with bounds in a given range. Both bounds are inclusive.
val z : Z.t encoding
Big number
In JSON, data is encoded as a string containing the decimal representation of the number.
In binary, data is encoded as a variable length sequence of bytes, with a running unary size bit: the most significant bit of each byte tells is this is the last byte in the sequence (0) or if there is more to read (1). The second most significant bit of the first byte is reserved for the sign (positive if zero). Sizing and sign bits ignored, data is then the binary representation of the absolute value of the number in little-endian order.
val n : Z.t encoding
Positive big number.
In JSON, data is encoded as a string containing the decimal representation of the number.
In binary, data is encoded similarly to z
but the sign bit is omitted. In other words:
Data is encoded as a variable length sequence of bytes, with a running unary size bit: the most significant bit of each byte tells is this is the last byte in the sequence (0) or if there is more to read (1). Sizing bits ignored, data is then the binary representation of the number in little-endian order.
val uint_like_n : ?max_value:int -> unit -> int encoding
uint_like_n ()
is an encoding for int
which uses the same representation as n
.
For compatibility with 32-bit machines, this encoding supports the same range of encodings as int31
, but only the positive ones. I.e., it supports the inclusive range 0
to (1 lsl 30) - 1
.
The optional parameter ?max_value
can be used to further restrict the range of values. If max_value
is set and is greater than (1 lsl 30) - 1
then the function raises Invalid_argument
.
The encoding is partial: attempting to de/serialise values which are outside of the supported range will fail. In addition, in binary, a maximum size for the serialised representation is computed based on the maximum value in the range, and the de/serialisation process fails before attempting any conversion if the size is exceeded.
val int_like_z : ?min_value:int -> ?max_value:int -> unit -> int encoding
int_like_z ()
is an encoding for int
which uses the same representation as z
.
For compatibility with 32-bit machines, this encoding supports the same range of encodings as int31
. I.e., it supports the inclusive range -(1 lsl 30)
to (1 lsl 30) - 1
.
The optional parameters ?min_value
and ?max_value
can be used to further restrict the range of values. If min_value
is set and less than -(1 lsl 30)
or if max_value
is set and is greater than (1 lsl 30) - 1
then the function raises Invalid_argument
.
The encoding is partial: attempting to de/serialise values which are outside of the supported range will fail. In addition, in binary, a maximum size for the serialised representation is computed based on the encoding's range, and the de/serialisation process fails before attempting any conversion if the size is exceeded.
val float : float encoding
Encoding of floating point number (encoded as a floating point number in JSON and a double in binary).
val ranged_float : float -> float -> float encoding
Float with bounds in a given range. Both bounds are inclusive
Other ground type encodings
val bool : bool encoding
Encoding of a boolean (data is encoded as a byte in binary and a boolean in JSON).
val string' :
?length_kind:[ `N | `Uint30 | `Uint16 | `Uint8 ] ->
string_json_repr ->
string encoding
Encoding of a string
- In binary, encoded as a byte sequence prefixed by the length of the string. The length is represented as specified by the
length_kind
parameter (default`Uint30
). - in JSON when
string_json_repr = Plain
, encoded as a string - in JSON when
string_json_repr = Hex
, encoded via hex.
val bytes' :
?length_kind:[ `N | `Uint30 | `Uint16 | `Uint8 ] ->
string_json_repr ->
Stdlib.Bytes.t encoding
Encoding of arbitrary bytes. See string'
val string : string encoding
same as string' Plain
val bytes : Stdlib.Bytes.t encoding
same as bytes' Hex
Descriptor combinators
Combinator to make an optional value (represented as a 1-byte tag followed by the data (or nothing) in binary and either the raw value or a null in JSON).
Note that the JSON representation is only weakly discriminating. Specifically, the value Some None
is represented as the raw value None
and so the two are indistinguishable. For this reason, this combinator does not support nesting, nor does it support use within a recursive (mu
) encoding.
Combinator to make a result
value (represented as a 1-byte tag followed by the data of either type in binary, and either unwrapped value in JSON (the caller must ensure that both encodings do not collide)).
Array combinator.
- encoded as an array in JSON
- encoded as the concatenation of all the element in binary prefixed its size in bytes
val array_with_length :
?max_length:int ->
[ `N | `Uint8 | `Uint16 | `Uint30 ] ->
'a encoding ->
'a array encoding
Array combinator.
- encoded as an array in JSON
- encoded as the concatenation of its length (number of elements) and all the element in binary
List combinator.
- encoded as an array in JSON
- encoded as the concatenation of all the element in binary prefixed its size in bytes
val list_with_length :
?max_length:int ->
[ `N | `Uint8 | `Uint16 | `Uint30 ] ->
'a encoding ->
'a list encoding
List combinator.
- encoded as an array in JSON
- encoded as the concatenation of its length (number of elements) and all the element in binary
val conv_with_guard :
('a -> 'b) ->
('b -> ('a, string) Stdlib.result) ->
?schema:Json_schema.schema ->
'b encoding ->
'a encoding
conv_with_guard
is similar to conv
but the function that takes in the value from the outside (untrusted) world has a chance to fail.
Specifically, if the function returns Error msg
then the decoding is interrupted with an error carrying the message msg
. If the function returns Ok _
then the decoding proceeds normally.
with_decoding_guard g e
is similar to e
but decoding fails if g
returns Error _
on the decoded value.
Association list. An object in JSON, a list of pairs in binary.
Product descriptors
type 'a field = 'a Data_encoding.Encoding.field
An enriched encoding to represent a component in a structured type, augmenting the encoding with a name and whether it is a required or optional. Fields are used to encode OCaml tuples as objects in JSON, and as sequences in binary, using combinator obj1
and the like.
Optional field. Omitted entirely in JSON encoding if None. Omitted in binary if the only optional field in a `Variable
encoding, otherwise a 1-byte prefix (`0` or `255`) tells if the field is present or not.
Optional field of variable length. Only one can be present in a given object.
Required field with a default value. If the default value is passed, the field is omitted in JSON. The value is always serialized in binary.
Constructors for objects with N fields
These are serialized to binary by converting each internal object to binary and placing them in the order of the original object. These are serialized to JSON as a JSON object with the field names. An object might only contains one 'variable' field, typically the last one. If the encoding of more than one field are 'variable', the first ones should be wrapped with dynamic_size
.
Create a larger object from the encodings of two smaller ones.
module With_field_name_duplicate_checks =
Encoding.With_field_name_duplicate_checks
With_field_name_duplicate_checks
is a subset of Encoding
where all the constructed objects are checked for duplicates.
Constructors for tuples with N fields
These are serialized to binary by converting each internal object to binary and placing them in the order of the original object. These are serialized to JSON as JSON arrays/lists. Like objects, a tuple might only contains one 'variable' field, typically the last one. If the encoding of more than one field are 'variable', the first ones should be wrapped with dynamic_size
.
Create a large tuple encoding from two smaller ones.
Sum descriptors
type 't case = 't Data_encoding.Encoding.case
A partial encoding to represent a case in a variant type. Hides the (existentially bound) type of the parameter to the specific case, providing its encoder, and converter functions to and from the union type.
type 'a matching_function = 'a -> match_result
A sum descriptor can be optimized by providing a specific matching_function
which efficiently determines in which case some value of type 'a
falls.
Note that in general you should use a total function (i.e., one defined over the whole of the 'a
type) for the matching_function
. However, in the case where you have a good reason to use a partial function, you should raise No_case_matched
in the dead branches. Reasons why you may want to do so include:
'a
is an open variant and you will complete the matching function later, and- there is a code invariant that guarantees that
'a
is not fully inhabited.
and match_result = Data_encoding.Encoding.match_result
val matched :
?tag_size:[ `Uint8 | `Uint16 ] ->
int ->
'a encoding ->
'a ->
match_result
val case :
title:string ->
?description:string ->
case_tag ->
'a encoding ->
('t -> 'a option) ->
('a -> 't) ->
't case
Encodes a variant constructor. Takes the encoding for the specific parameters, a recognizer function that will extract the parameters in case the expected case of the variant is being serialized, and a constructor function for deserialization.
The tag must be less than the tag size of the union in which you use the case. An optional tag gives a name to a case and should be used to maintain compatibility.
An optional name for the case can be provided, which is used in the binary documentation.
val matching :
?tag_size:[ `Uint8 | `Uint16 ] ->
't matching_function ->
't case list ->
't encoding
Create a single encoding from a series of cases.
In JSON, all cases are tried one after the other using the case list
. The caller is responsible for avoiding collisions. If there are collisions (i.e., if multiple cases produce the same JSON output) then the encoding and decoding processes might not be inverse of each other. In other words, destruct e (construct e v)
may not be equal to v
.
In binary, a prefix tag is added to discriminate quickly between cases. The default is `Uint8
and you must use a `Uint16
if you are going to have more than 256 cases.
The matching function is used during binary encoding of a value v
to efficiently determine which of the cases corresponds to v
. The case list is used during decoding to reconstruct a value based on the encoded tag. (Decoding is optimised internally: tag look-up has a constant cost.)
The caller is responsible for ensuring that the matching_function
and the case list
describe the same encoding. If they describe different encodings, then the decoding and encoding processes will not be inverses of each others. In other words, of_bytes e (to_bytes e v)
will not be equal to v
.
If you do not wish to be responsible for this, you can use the unoptimised union
that uses a case list
only (see below). Beware that in union
the complexity of the encoding is linear in the number of cases.
Following: a basic example use. Note that the matching_function
uses the same tags, payload conversions, and payload encoding as the case list
.
type t = A of string | B of int * int | C
let encoding_t =
(* Tags and payload encodings for each constructors *)
let a_tag = 0 and a_encoding = string in
let b_tag = 1 and b_encoding = obj2 (req "x" int) (req "y" int) in
let c_tag = 2 and c_encoding = unit in
matching
(* optimised encoding function *)
(function
| A s -> matched a_tag a_encoding s
| B (x, y) -> matched b_tag b_encoding (x, y)
| C -> matched c_tag c_encoding ())
(* decoding case list *)
[
case ~title:"A"
(Tag a_tag)
a_encoding
(function A s -> Some s | _ -> None) (fun s -> A s);
case ~title:"B"
(Tag b_tag)
b_encoding
(function B (x, y) -> Some (x, y) | _ -> None) (fun (x, y) -> B (x, y));
case ~title:"C"
(Tag c_tag)
c_encoding
(function C -> Some () | _ -> None) (fun () -> C);
]
Same as matching except that the matching function is a linear traversal of the cases.
module With_JSON_discriminant = Encoding.With_JSON_discriminant
With_JSON_discriminant
is a subset of Encoding
where the union/matching combinators (and associated functions) add discriminant for the JSON backend.
Predicates over descriptors
val is_obj : 'a encoding -> bool
Is the given encoding serialized as a JSON object?
val is_tup : 'a encoding -> bool
Does the given encoding encode a tuple?
val classify : 'a encoding -> [ `Fixed of int | `Dynamic | `Variable ]
Classify the binary serialization of an encoding as explained in the preamble.
Specialized descriptors
val string_enum : (string * 'a) list -> 'a encoding
Encode enumeration via association list
- represented as a string in JSON and
- represented as an integer representing the element's position in the list in binary. The integer size depends on the list size.
module Fixed = Encoding.Fixed
Create encodings that produce data of a fixed length when binary encoded. See the preamble for an explanation.
module Variable = Encoding.Variable
Create encodings that produce data of a variable length when binary encoded. See the preamble for an explanation.
module Bounded = Encoding.Bounded
Mark an encoding as being of dynamic size. Forces the size to be stored alongside content when needed. Typically used to combine two variable encodings in a same objects or tuple, or to use a variable encoding in an array or a list.
check_size size encoding
ensures that the binary encoding of a value will not be allowed to exceed size
bytes. The reader and the writer fails otherwise. This function do not modify the JSON encoding.
Recompute the encoding definition each time it is used. Useful for dynamically updating the encoding of values of an extensible type via a global reference (e.g., exceptions).
Define different encodings for JSON and binary serialization.
val mu :
string ->
?title:string ->
?description:string ->
('a encoding -> 'a encoding) ->
'a encoding
Combinator for recursive encodings.
Notice that the function passed to mu
must be pure. Otherwise, the behavior is unspecified.
A stateful recursive encoding can still be put under a delayed
combinator to make sure that a new encoding is generated each time it is used. Caching the encoding generation when the state has not changed is then the responsability of the client.
Documenting descriptors
Give a name to an encoding and optionally add documentation to an encoding.
type 'a lazy_t = 'a Data_encoding.Encoding.lazy_t
See lazy_encoding
below.
Combinator to have a part of the binary encoding lazily deserialized. This is transparent on the JSON side.
val force_decode : 'a lazy_t -> 'a option
Force the decoding (memoized for later calls), and return the value if successful.
val force_bytes : 'a lazy_t -> Stdlib.Bytes.t
Obtain the bytes without actually deserializing. Will serialize and memoize the result if the value is not the result of a lazy deserialization.
val apply_lazy :
fun_value:('a -> 'b) ->
fun_bytes:(Stdlib.Bytes.t -> 'b) ->
fun_combine:('b -> 'b -> 'b) ->
'a lazy_t ->
'b
Apply on structure of lazy value, and combine results
module Compact = Encoding.Compact