package ppx_deriving_protobuf
Install
Dune Dependency
Authors
Maintainers
Sources
sha256=51fe022ddb2e001f00559482631276876e5be641227491449358bcd05eae3117
md5=2e3ba97f2354ba51cb470f899e3ccc25
README.md.html
[@@deriving protobuf]
deriving protobuf is a ppx_deriving plugin that generates Google Protocol Buffers serializers and deserializes from an OCaml type definition.
Sponsored by Evil Martians. protoc export sponsored by MaxProfitLab.
Installation
deriving protobuf can be installed via OPAM:
$ opam install ppx_deriving_protobuf
Usage
In order to use deriving protobuf, require the package ppx_deriving_protobuf
.
Syntax
deriving protobuf is not a replacement for protoc and it does not attempt to generate code based on protoc definitions. Instead, it generates code based on OCaml type definitions. It can also generate input files for protoc.
deriving protobuf-generated serializers are derived from the structure of the type and several attributes: @key
, @encoding
, @bare
and @default
. Generation of the serializer is triggered by a [@@deriving Protobuf]
attribute attached to the type definition.
deriving protobuf generates two functions per type:
type t = ... [@@deriving protobuf]
val t_from_protobuf : Protobuf.Decoder.t -> t
val t_to_protobuf : t -> Protobuf.Encoder.t -> unit
In order to deserialize a value of type t
from bytes message
, use:
let output = Protobuf.Decoder.decode_exn t_from_protobuf message in
...
In order to serialize a value input
of type t
, use:
let message = Protobuf.Encoder.encode_exn t_to_protobuf input in
...
Records
A record is the most obvious counterpart for a Protobuf message. In a record, every field must have an explicitly defined key. For example, consider this protoc definition and its deriving protobuf equivalent:
message SearchRequest {
required string query = 1;
optional int32 page_number = 2;
optional int32 result_per_page = 3;
}
type search_request = {
query : string [@key 1];
page_number : int option [@key 2];
result_per_page : int option [@key 3];
} [@@deriving protobuf]
deriving protobuf recognizes and maps option
to optional fields, and list
and array
to repeated fields.
Optional and default fields
A [@default]
attribute attached to a required field converts it to an optional field; if the field is not present, its value is assumed to be the default one, and conversely, if the value of the field is same as the default value, it is not serialized:
message Defaults {
optional int32 results = 1 [default = 10];
}
type defaults = {
results : int [@key 1] [@default 10];
}
Note that protoc's default behavior is to assign a type-specific default value to optional fields missing from message, i.e. 0
to integer fields, ""
to string fields, and so on. With deriving protobuf, optional fields are represented with the option
type; it is possible to emulate protoc's behavior by explicitly specifying int [@default 0]
, etc.
Integers
Unlike protoc, deriving protobuf allows a much more flexible mapping between wire representations of integral types and their counterparts in OCaml. Any combination of the known integral types (int
, int32
, int64
, Int32.t
, Int64.t
, Uint32.t
and Uint64.t
) and wire representations (varint
, zigzag
, bits32
and bits64
) is valid. The wire representation is specified using the @encoding
attribute.
For example, consider this protoc definition and a compatible deriving protobuf one:
message Integers {
required int32 bar = 1;
required fixed64 baz = 2;
}
type integers = {
bar : Uint64.t [@key 1] [@encoding `varint];
baz : int [@key 2] [@encoding `bits64];
}
When parsing or serializing, the values will be appropriately extended or truncated. If a value does not fit into the narrower type used for serialization or deserialization, Decoder.Error Decoder.Overflow
or Encoder.Error Encoder.Overflow
is raised.
The following table summarizes equivalence between integral types of protoc and encodings of deriving protobuf:
Encoding | protoc type |
---|---|
varint | int32, int64, uint32, uint64 |
zigzag | sint32, sint64 |
bits32 | fixed32, sfixed32 |
bits64 | fixed64, sfixed64 |
By default, OCaml types use the following encoding:
OCaml type | Encoding | protoc type |
---|---|---|
int | varint | int32 or int64 |
int32 or Int32.t | bits32 | sfixed32 |
Uint32.t | bits32 | fixed32 |
int64 or Int64.t | bits64 | sfixed64 |
Uint64.t | bits64 | fixed64 |
Note that no OCaml type maps to zigzag-encoded sint32
or sint64
by default. It is necessary to use [@encoding `zigzag]
explicitly.
Floats
Similarly to integers, float
maps to protoc's double
by default, but it is possible to specify the encoding explicitly:
message Floats {
required float foo = 1;
required double bar = 2;
}
type floats = {
foo : float [@key 1] [@encoding `bits32];
bar : float [@key 2];
} [@@deriving protobuf]
Booleans
bool
maps to protoc's bool
and is encoded on wire using varint
:
message Booleans {
required bool bar = 1;
}
type booleans = {
bar : bool [@key 1];
} [@@deriving protobuf]
Strings and bytes
All of string
, String.t
, bytes
and Bytes.t
map to protoc's string
or bytes
and are encoded on wire using bytes
:
Note that unlike protoc, which has an additional invariant that the contents of a string
must be valid UTF-8 text, deriving protobuf does not have this invariant. However, you still should observe it in your programs.
message Strings {
required string bar = 1;
required bytes baz = 2;
}
type strings = {
bar : string [@key 1];
baz : bytes [@key 2];
} [@@deriving protobuf]
Tuples
A tuple is treated in exactly same way as a record, except that keys are derived automatically starting at 1. The definition of search_request
above could be rewritten as:
type search_request' = string * int option * int option
[@@deriving protobuf]
Additionally, a tuple can be used in any context where a scalar value is expected; in this case, it is equivalent to an anonymous inner message:
message Nested {
message StringFloatPair {
required string str = 1;
required float flo = 2;
}
required int32 foo = 1;
optional StringFloatPair bar = 2;
}
type nested = {
foo : int [@key 1];
bar : (string * float) option [@key 2];
} [@@deriving protobuf]
Variants
An OCaml variant types is normally mapped to an entire Protobuf message by deriving protobuf, as opposed to protoc, which maps an enum
to a simple varint
. This is done because OCaml constructors can have arguments, but protoc's enum
s can not.
Note that even if a type doesn't have any constructor with arguments, it is still mapped to a message, because it would not be possible to extend the type later with a constructor with arguments otherwise.
Every constructor must have an explicitly specified key; if the constructor has one argument, it is mapped to an optional field with the key corresponding to the key of the constructor plus one. If there is more than one argument, they're treated like a tuple.
Consider this example:
message Variant {
enum T {
A = 1;
B = 2;
C = 3;
D = 4;
}
message C {
required string foo = 1;
required string bar = 2;
}
message D {
required string s1 = 1;
required string s2 = 2;
}
required T t = 1;
optional int32 b = 3; // (B = 2) + 1
optional C c = 4; // (C = 3) + 1
optional D d = 5; // (D = 4) + 1
}
type variant =
| A [@key 1]
| B of int [@key 2]
| C of string * string [@key 3]
| D of {s1: string ; s2: string} [@key 4]
[@@deriving protobuf]
Note that decoder considers messages which contain more than one optional field invalid and rejects them.
In order to achieve better compatibility with protoc, it is possible to embed a variant where no constructors have arguments without wrapping it in a message:
enum BareVariant {
A = 1;
B = 2;
}
message Container {
required T value = 1;
}
type bare_variant =
| A [@key 1]
| B [@key 2]
and container = {
value : bare_variant [@key 1] [@bare];
} [@@deriving protobuf]
In practice, if a variant has no constructors with arguments, additional two functions are generated with the following signatures:
type t = A | B | ... [@@deriving protobuf]
val t_from_protobuf_bare : Protobuf.Decoder.t -> t
val t_to_protobuf_bare : Protobuf.Encoder.t -> t -> unit
These functions do not expect additional framing; they just parse or serialize a single varint
.
Polymorphic variants
Polymorphic variants are handled in exactly same way as regular variants. However, you can also embed them directly, like tuples, in which case the semantics is the same as defining an alias for the variant and then using that type.
This feature can be combined with the [@bare]
annotation to create a useful shorthand:
message Packet {
enum Type {
REQUEST = 1;
REPLY = 2;
}
required Type type = 1;
required int32 value = 2;
}
type packet = {
type : [ `Request [@key 1] | `Reply [@key 2] ] [@key 1] [@bare];
value : int [@key 2];
} [@@deriving protobuf]
Type aliases
A type alias (statement of form type a = b
) is treated by deriving protobuf as a definition of a message with one field with key 1:
message Alias {
required int32 val = 1;
}
type alias = int [@@deriving protobuf]
Nested messages
When deriving protobuf encounters a non-scalar type, it generates a call to the serialization or deserialization function corresponding to the full path to the type.
Consider this definition:
type foo = bar * Baz.Quux.t [@@deriving protobuf]
The generated deserializer code will refer to bar_from_protobuf
and Baz.Quux.t_from_protobuf
; the serializer code will call bar_to_protobuf
and Baz.Quux.t_to_protobuf
.
Packed fields
Types which are encoded as varint
, bits32
or bits64
, that is, numeric fields or bare variants, can be declared as "packed" with the [@packed]
attribute, in which case the serializer emits a more compact representation. Only protoc newer than 2.3.0 will recognize this representation. Note that the deserializer understands it regardless of the presence of [@packed]
attribute.
message Packed {
repeated int32 elem = 1 [packed=true];
}
type packed = int list [@key 1] [@packed] [@@deriving protobuf]
Parametric polymorphism
deriving protobuf is able to handle polymorphic type definitions. In this case, the serializing or deserializing function will accept one additional argument for every type variable; correspondingly, the value of this argument will be passed to serializer or deserializer of any nested parametric type.
Consider this example:
type 'a mylist =
| Nil [@key 1]
| Cons of 'a * 'a mylist [@key 2]
[@@deriving protobuf]
Here, the following functions will be generated:
val mylist_from_protobuf : (Protobuf.Decoder.t -> 'a) -> Protobuf.Decoder.t ->
'a mylist
val mylist_to_protobuf : (Protobuf.Decoder.t -> 'a -> unit) -> Protobuf.Decoder.t ->
'a mylist -> unit
An example usage would be:
type a = int [@@deriving protobuf]
let get_ints message =
let decoder = Protobuf.Decoder.of_bytes message in
mylist_from_protobuf a_from_protobuf decoder
It's also possible to specify concrete types as parameters; in this case, deriving protobuf will infer the serializer/deserializer functions automatically. For example:
(* Combining two samples above *)
type b = a mylist [@@deriving protobuf]
Error handling
Both serializers and deserializers rigorously verify their input data. The only possible exception that can be raised during serialization is Protobuf.Encoder.Failure
, and during deserialization is Protobuf.Decoder.Failure
.
Decoder errors
The decoder attempts to annotate its failures with useful location information, but only if that wouldn't cost too much in terms of performance and complexity.
In general, as long as you're using the same protocol on both sides, deserialization or should never raise. The errors would mainly arise when interoperating with code generated by protoc that doesn't observe OCaml-specific invariants, or when handling malicious input.
It discerns these types of failure (represented with Decoder.Failure
exception):
Incomplete
: the message was truncated or using invalid wire format. Frame overruns are likely to produce this as well.Overlong_varint
: avarint
greater than 2⁶⁴-1 was encountered.Malformed_field
: an invalid wire type was encountered.Overflow fld
: an integer field in the message contained a value outside the range of the corresponding type, e.g. avarint
field corresponding toint32
contained0xffffffff
.Unexpected_payload (fld, kind)
: a key corresponding to fieldfld
had a wire type incompatible with the specified encoding, e.g. avarint
wire type for a nested message.Missing_field fld
: a required fieldfld
was missing from the message.Malformed_variant fld
: a variantfld
contained a key not corresponding to any defined constructor.
The decoder errors refer to fields via so-called "paths"; a path corresponds to the OCaml syntax for referring to a type, field or constructor, but can contain additional /<number>
(e.g. /0
) component for an immediate tuple.
For example, the string
field will have the path Foo.r.ra/1
:
(* foo.ml *)
type r = {
ra: (int * string) option [@key 1];
} [@@deriving protobuf]
Encoder errors
The encoder discerns these types of failure (represented with Encoder.Failure
exception):
Overflow fld
: an integer value was outside the range of its corresponding encoding, e.g. aint64
containing0xffffffffffff
was serialized tobits32
.
The encoder errors use the same "path" convention as decoder errors.
Extending protocols
In real-world applications, implementations using multiple versions of the same protocol must coexist. Protocol Buffers offer an imperfect and sometimes complicated, but very powerful and practical solution to this problem.
The wire protocol is designed in a way that allows to safely extend it if one follows a set of constraints.
Always
Any of the following changes may be applied to either the sender or receiver of the message without breaking protocol:
Adding an optional field to a record, or an optional element to a tuple, or an optional argument to a constructor with multiple arguments.
Converting an optional field, tuple element or constructor argument into a repeated one.
Converting an optional field, tuple element or constructor argument into a required field with a default value, or vice versa.
Converting a repeated field, tuple element or constructor argument into an optional one (this is not recommended, as it silently ignores some of input data).
Turning an alias into a record that has a field marked
[@key 1]
.Turning an alias into a tuple where the first element is the former type of the alias (this is not recommended for reasons of code clarity).
Never
When communicating bidirectionally, violating any of the following constraints always results in exceptions or receiving garbage data:
Never change
[@key]
or[@encoding]
annotations; never add or remove[@bare]
annotation.Never change primitive (i.e. excluding
list
,option
orarray
qualifiers) types of existing fields, tuple elements or constructor arguments.Never remove required fields, tuple elements or constructor arguments.
Never replace a primitive type of a field, tuple element or constructor argument with a tuple, even if the first element of the replacing tuple is the former primitive type.
Never add arguments to an argument-less variant constructor, or vice versa.
The following sections list some exceptions to this rule when the communication is unidirectional.
On sender
Any of the following changes may be applied exclusively to the sender without breaking the existing receivers:
Adding a required field, tuple element, or argument to a constructor with multiple arguments.
Converting an optional or repeated field, tuple element or constructor argument into a required one.
Replacing an integer type with a narrower one while preserving the encoding (it's a good idea to add the
[@encoding]
annotation explicitly).Adding a variant constructor, but never actually sending it.
On receiver
Any of the following changes may be applied exclusively to the receiver without losing the ability to decode messages from existing senders:
Removing a required field, tuple element, or argument to a constructor with more than two arguments.
Replacing an integer type with a wider one while preserving the encoding (it's a good idea to add the
[@encoding]
annotation explicitly).
Protoc export
deriving protobuf can export message types in proto2 language, the format that protoc accepts; protoc version 2.6 or later is required.
To enable protoc export, pass a protoc
option to deriving protobuf:
(* foo.ml *)
type msg = ... [@@deriving protobuf { protoc }]
Compiling this file will create a file called Foo.protoc
(note the capitalization) in a directory adjacent to foo.ml
; if you are using ocamlbuild and foo.ml
is located in directory src/
, the file will be generated at _build/src/Foo.protoc
. This can be customized by providing a path explicitly, e.g. [@@deriving protobuf { protoc = "Bar.protoc" }]
; the path is interpreted relative to the source file.
The mapping between OCaml types and protoc messages is straightforward.
OCaml modules become protoc packages with the same name. A nested module, e.g. module Bar
in our foo.ml
, becomes a nested package, Foo.Bar
; it will be emitted in a file Foo.Bar.protoc
, placed adjacent to Foo.protoc
, since protoc requires every package to reside in its own file.
OCaml records and their fields become protoc messages and fields with the same name:
type msg = {
name: string [@key 1];
value: int [@key 2];
} [@@deriving protobuf { protoc }]
message msg {
required string name = 1;
required int64 value = 2;
}
OCaml variants and their constructors become protoc messages and fields with the same name; additionally generated are a nested enum called _tag
whose constants have the same name as constructors with _tag
appended, and a field named tag
with the type _tag
:
type msg =
| A [@key 1]
| B of string [@key 2]
[@@deriving protobuf { protoc }]
message msg {
enum _tag {
A_tag = 1;
B_tag = 2;
}
required _tag tag = 1;
oneof value {
string B = 3;
}
}
OCaml tuples become protoc messages with the same name whose fields are called _N
with N
being the field index:
type msg = int * string
[@@deriving protobuf { protoc }]
message msg {
required int64 _0 = 1;
required string _1 = 2;
}
OCaml aliases become protoc messages with one field called _
:
type msg = int
[@@deriving protobuf { protoc }]
message msg {
required int64 _ = 1;
}
Sometimes, a single toplevel OCaml type definition has to be translated into several messages, e.g. when a field or a constructor contains a tuple or a polymorphic variant. In this case, such messages become nested messages whose name is the name of the field or constructor with _
prepended:
type msg = {
field: int * string [@key 1]
}
[@@deriving protobuf { protoc }]
message msg {
message _field {
required int64 _0 = 1;
required string _1 = 2;
}
required _field field = 1;
}
Normally, when a type from another module is referenced, deriving protobuf automatically generates the corresponding protoc import
directive:
type imported = Other.msg
[@@deriving protobuf { protoc }]
import "Other.protoc";
message imported {
required Other.msg _ = 1;
}
However, when a type is referenced that was defined in a module defined earlier in the same file, the produced import
directive is incorrect. (deriving protobuf does not have an accurate model of OCaml's module scoping.) In this case, the protoc_import
option can help:
(* foo.ml *)
module Bar = struct
type msg = int [@@deriving protobuf { protoc }]
end
type alias = Bar.msg
[@@deriving protobuf { protoc; protoc_import = ["Foo.Bar.protoc"] }]
// Foo.protoc
package Foo;
import "Foo.Bar.protoc";
message alias {
required Bar.msg _ = 1;
}
// Foo.Bar.protoc
package Foo.Bar;
message msg {
required int64 _ = 1;
}
Compatibility
Protocol Buffers specification suggests that if a message contains multiple instances of a required
or optional
nested message, those nested messages should be merged. However, there is no concept of "merging messages" accessible to deriving protobuf, and this feature can be considered harmful anyway: it is far too forgiving of invalid input. Thus, deriving protobuf doesn't implement this merging.
deriving protobuf is more strict than protoc with numeric types; it raises Failure (Overflow fld)
rather than silently truncate values. It is thought that accidentally losing 32th or 64th bit with OCaml's int
type would be a common error without this countermeasure.
Everything else should be entirely compatible with protoc.
API Documentation
The documentation for internal API is available at GitHub pages.