Module Data_encoding.Compact
This module provides specialized encoding combinators that are implemented to reduce the size of the serialization result.
The main trick this module relies on is the notion of “shared tags”. In Data_encoding, the union combinator uses (at least) one byte every time it is used, to “tag” the output and distinguish between various disjunction cases. As a consequence, if n union are composed together to define one encoding, (at least) n bytes are being allocated. However, in practice, only few bits are used in each tags, which means the rest is “wasted.”
As an example, consider this type:
type t =
| T1 of { f1 : int option; f2 : (int, bool) Either.t }
| T2 of { f3: int }
A value of t using the constructor T1 will be serialized into a binary array of this form:
┌────────┬─────────┬─────────────┬─────────┬─────────────┐
│ tag(t) │ tag(f1) │ payload(f1) │ tag(f2) │ payload(f2) │
└────────┴─────────┴─────────────┴─────────┴─────────────┘
1 byte 1 byte N bytes 1 byte M bytes
Where tag(f) is a value used by Data_encoding to distinguish between several encoding alternatives for f, and payload(f) is the resulting binary array.
For both option and Either.t, the tag of the encoding only uses one bit in practice. Which means that for T1, encoding the pair (f1, f2) needs two bits, but the default approach of Data_encoding uses two bytes instead. Similarly, to distinguish between T1 and T2 needs one bit, but the default approach is to use an additional tag (one additional byte).
This module provides an approach to tackle this issue, by allocating only one tag (i.e., one byte) that is used to store the useful bits to distinguish between the disjunction cases. We call this tag the “shared tag” of the encoding. The bits of the shared tag describes precisely the layout of the encoded data.
For instance, considering a compact encoding for t, the third bit of the tag can be used to distinguish between T1 and T2. In case the third bit is 0, the first bit of the tag determines the case of option, and the second the case of Either.t.
As a consequence the resulting binary array for the constructor T1 is, using
_ to represent meaningless bits,0 and 1 to represent actual bit values,e to represent the bit used to distinguish the Either case of f2, ando to represent the bit used to distinguish the Option case of f1:
┌──────────┬─────────────┬─────────────┐
│ _____0eo │ payload(f1) │ payload(f2) │
└──────────┴─────────────┴─────────────┘
1 byte N bytes M bytes
while the resulting binary array for the constructor T2 is
┌──────────┬─────────────┐
│ _____100 │ payload(f3) │
└──────────┴─────────────┘
1 byte N bytes
type 'a t = 'a Tezos_base.TzPervasives.Data_encoding.Compact.tThe description of a compact encoding.
val make : ?tag_size:[ `Uint0 | `Uint8 | `Uint16 ] -> 'a t -> 'a encodingTurn a compact encoding into a regular Data_encoding.t.
The default is `Uint0, i.e., no tag at all. This is can only represent values which use 0 bits of tags.
It is recommended to set the tag_size explicitly.
val tag_bit_count : 'a t -> inttag_bit_count c is the number of bits of tag that a compact encoding uses.
Combinators
Similarly to Data_encoding, we provide various combinators to compose compact encoding together.
Base types
A type with no inhabitant.
A compact encoding used to denote an impossible case inside of conjunction operators such as union.
Uses 0 bit of tag.
refute x can be used to refute a branch of a match which exhibits a value of type void.
A compact encoding of the singleton value unit, which has zero memory footprint.
Uses zero (0) bits of tag.
In JSON it is represented as the empty object {}.
A compact encoding of the singleton value unit, which has zero memory footprint.
Uses zero (0) bits of tag.
In JSON it is represented as null.
Efficient encoding of boolean values. It uses one (1) bit in the shared tag, and zero bit in the payload.
payload encoding unconditionally uses encoding in the payload, and uses zero (0) bit in the shared tag.
val option : 'a t -> 'a option tUses one (1) bit in the tag to encode an option.
Conversion
val conv : ?json:'a encoding -> ('a -> 'b) -> ('b -> 'a) -> 'b t -> 'a tconv ?json f g e reuses the encoding e for type b to encode a type a using the isomorphism (f, g). The optional argument allows to overwrite the encoding used for JSON, in place of the one computed by default.
Conjunctions
tup1 e wraps the underlying encoding of e in a tup1 (from the parent module). This is only useful in that, provided you use make ~tag_size:`Uint0 to translate the returned compact encoding, it allows you to call merge_tups on it.
Uses as many bits of tag as e.
val tup2 : 'a t -> 'b t -> ('a * 'b) ttup2 e1 e2 concatenates the shared tags and payloads of e1 and e2.
Uses as many bits of tags as the sum of the tags used by its arguments.
val tup3 : 'a t -> 'b t -> 'c t -> ('a * 'b * 'c) ttup3 e1 e2 e3 concatenates the shared tags and payloads of e1, e2, and e3.
Uses as many bits of tags as the sum of the tags used by its arguments.
val tup4 : 'a t -> 'b t -> 'c t -> 'd t -> ('a * 'b * 'c * 'd) ttup4 e1 e2 e3 e4 concatenates the shared tags and payloads of e1, e2, e3 and e4.
Uses as many bits of tags as the sum of the tags used by its arguments.
val tup5 :
'f1 t ->
'f2 t ->
'f3 t ->
'f4 t ->
'f5 t ->
('f1 * 'f2 * 'f3 * 'f4 * 'f5) tval tup6 :
'f1 t ->
'f2 t ->
'f3 t ->
'f4 t ->
'f5 t ->
'f6 t ->
('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6) tval tup7 :
'f1 t ->
'f2 t ->
'f3 t ->
'f4 t ->
'f5 t ->
'f6 t ->
'f7 t ->
('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6 * 'f7) tval tup8 :
'f1 t ->
'f2 t ->
'f3 t ->
'f4 t ->
'f5 t ->
'f6 t ->
'f7 t ->
'f8 t ->
('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6 * 'f7 * 'f8) tval tup9 :
'f1 t ->
'f2 t ->
'f3 t ->
'f4 t ->
'f5 t ->
'f6 t ->
'f7 t ->
'f8 t ->
'f9 t ->
('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6 * 'f7 * 'f8 * 'f9) tval tup10 :
'f1 t ->
'f2 t ->
'f3 t ->
'f4 t ->
'f5 t ->
'f6 t ->
'f7 t ->
'f8 t ->
'f9 t ->
'f10 t ->
('f1 * 'f2 * 'f3 * 'f4 * 'f5 * 'f6 * 'f7 * 'f8 * 'f9 * 'f10) tval req : string -> 'a t -> 'a fieldreq "f" compact can be used in conjunction with objN to create compact encoding with more readable JSON encoding, as an alternative of tupN. The JSON output is a dictionary which contains the field f with a value encoded using compact.
val opt : string -> 'a t -> 'a option fieldSame as req, but the field is optional.
An objN compact encoding uses as many bits of tags as its number of opt fields.
obj1 can be used in conjunction with req or opt to produce more readable JSON outputs.
Uses as many bits of tags as there are opt fields in its arguments.
An alternative to tup2 which can be used in conjunction with req and opt to produce more readable JSON outputs based on dictionary.
Uses as many bits of tags as there are opt fields in its arguments.
An alternative to tup3 which can be used in conjunction with req and opt to produce more readable JSON outputs based on dictionary.
Uses as many bits of tags as there are opt fields in its arguments.
An alternative to tup4 which can be used in conjunction with req and opt to produce more readable JSON outputs based on dictionary.
Uses as many bits of tags as there are opt fields in its arguments.
A compact encoding for int32 values. It uses 2 bits in the shared tag, to determine how many bytes are used in the payload:
00: from 0 to 255, one byte.01: from 256 to 65,535, two bytes.10: from 65,536 to Int32.max_int and for negative values, four bytes.
Note that by itself, this compact encoding is not necessarily more economical in space. However, in combination with other compact encodings (say, when you have two bits of tag to spare anyway) or given a very skewed distribution of values (say, when the vast majority of your values are in the 0–255 interval), then it can help you save some space.
Uses two (2) bits of tag.
A compact encoding for int64 values. It uses 2 bits in the shared tag, to determine how many bytes are used in the payload:
00: from 0 to 255, one byte.01: from 256 to 65,535, two bytes.10: from 65,536 to 4,294,967,295 four bytes.11: from 4,294,967,295 and for negative values eight bytes.
See int32 for usage recommendations.
Uses two (2) bits of tag.
val list : bits:int -> 'a encoding -> 'a list tlist ~bits:n encoding uses n bits in the shared tag to encode the size of small lists.
For instance, list ~bits:2 encoding,
00: the payload is empty, because it is the empty list01: the singleton list, whose element is encoded using encoding.10: a list of two elements encoded with encoding11: a list of more than two elements, prefixed with its encoded size (i.e., the number of bytes it takes to represent the whole value) (which uses 4 bytes)
With ~bits:3, lists of 0 to 6 items are encoded with tags 000 to 110, and lists of 7 or more are encoded with tag 111 and the length.
Uses n bits of tags.
Disjunctions
val case :
title:string ->
?description:string ->
'b t ->
('a -> 'b option) ->
('b -> 'a) ->
'a caseUsage: case name encode decode encoding
Intended to be used inside a union.
val union : ?union_tag_bits:int -> ?cases_tag_bits:int -> 'a case list -> 'a tunion cases creates a new compact encoding to encompass a disjunction of cases.
The value uses some tag bits to distinguish the different cases of the union (see discussion of parameter union_tag_bits) and some tag bits (potentially 0) to distinguish the values within a case (see discussion of parameter cases_tag_bits).
E.g., Given type t = A of bool | B of int option and the encoding
let c =
union [
case "A" (function A b -> Some b | _ -> None) (fun b -> A b) bool;
case "B" (function B i -> Some i | _ -> None) (fun i -> B b) (option (payload int));
in
make ~tag_size:`Uint8 cthen a value can have either of the following 4 tags:
- 0b00000000: case
A, false - 0b00000001: case
A, true - 0b00000010: case
B, Some (a payload of 4 bytes follows) - 0b00000011: case
B, None
In other words, the second bit of this tag is used to discriminate the cases of the union, whilst the first bit is used to discriminate within each case.
Note that the compact union can be combined with more compact encoding before being passed to make in which case the two bits of tags will be combined with the tags of the other compact encodings. E.g., make ~tag_size:`Uint8 (tup2 c c).
If not provided explicitly, union_tag_bits is inferred: it is set to the smallest value which can accommodate the provided cases.
It is recommended to set union_tag_bits explicitly.
You can over-provision the union_tag_bits if you expect the cases to grow in the future.
It is recommended to set cases_tag_bits explicitly if you need the layout to be stable even if the cases or one of its element changes.
You can over-provision the cases_tag_bits if you expect one of the cases to change to use more bits of tag or if you expect that a new case using more tag bits will be added in the future.
E.g., passing ~cases_tag_bits:7 to the union in the example above will cause the values to be represented as follows:
- 0b00000000: case
A, false - 0b00000001: case
A, true - 0b10000000: case
B, Some (a payload of 4 bytes follows) - 0b10000001: case
B, None
E.g., union ~cases_tag_bits:0 [case "Bool" Option.some Fun.id bool] raises Invalid_argument because bool uses one bit of tag which is more than 0.
val void_case : title:string -> 'a casevoid_case ~title is an impossible case. It is provided so you can add unused tags within a union. E.g., union [case _; void_case ~title:"reserved-for-v04-compatibility"; case _; case _] uses two bits of tag for the discrimination of the union, but the tag 01 is unused (reserved for some version compatibility).
val or_int32 :
int32_title:string ->
alt_title:string ->
?alt_description:string ->
'a encoding ->
(int32, 'a) Either.t tor_int32 ~i32_title ~alt_title ?alt_description c creates a new compact encoding for the disjunction of any type a (see case) with int32. It uses the same number of bits as int32, that is 2, and uses the spare tag (11) within this size for values of type a.
Custom
This module can be used to write compact encoding for complex types without relying on the existing combinators.