Release v4.5.0
What is MirageOS?
MirageOS is a library operating system that can build standalone unikernels on various platforms. More precisely, the architecture can be divided into:
- Operating system libraries that implement kernel and protocol functionality, ranging from low-level network card drivers to full reimplementations of the TLS protocol and of the Git protocol (used to store versioned data).
- A set of typed signatures to make sure these libraries are consistent and can interoperate. As almost all of the libraries are pure OCaml code, we have defined a set of OCaml module types that encode these conventions in a statically enforceable way. We make no compatibility guarantees at the C level, but compile those on a best-effort basis.
- Finally, MirageOS is also a metaprogramming compiler that generates OCaml code. It takes as input the OCaml source code of a program and all of its dependencies, and the full description of the deployment target, including configuration values (like the HTTP port to listen on, or the private key of the service being deployed). The `mirage` CLI tool uses all of these to generate an executable unikernel: a specialised binary artefact containing only the code that is needed to run on the given deployment platform and no more.
It is possible to write high-level MirageOS applications, such as HTTPS, email or CalDAV servers, which can be deployed on very heterogeneous and embedded platforms by changing only a few compilation parameters. The supported platforms range from minimal virtual machines running on cloud providers to processes running inside Docker containers configured with a tight security profile. In general, these platforms do not have a full POSIX environment; MirageOS does not try to emulate POSIX and focuses on providing a small, well-defined, typed interface with the system components. The nearest equivalent to the MirageOS approach is the WASI (wasi.dev) set of interfaces for WebAssembly.
Is everything really written in OCaml?
While most of the code is written in OCaml, a typed, high-level language with many good safety properties, there are pieces of MirageOS which are still written in C. These pieces fall into three categories:
- The OCaml runtime is written in C. It needs to be ported to each platform that MirageOS targets, since these platforms do not support POSIX. Hence, the first component to port to a new platform is the OCaml runtime.
- The low-level device drivers (network, console, clock, etc.) also need some C code.
- The usual C bindings: some libraries are widely used and (unfortunately) very hard (but not impossible) to replace completely without taking a big performance hit or having to trust code without much real-world usage. This is the case for low-level bit handling in crypto code (even though we try to make sure allocation is always handled by the OCaml runtime), as well as for arbitrary-precision numeric computation (e.g. gmp). Ideally, we could imagine rewriting all of these libraries in OCaml if we had an infinite amount of time on our hands.
MirageOS as a cross-compiler
The MirageOS compiler is basically a cross-compiler, where the host and target toolchains are identical, but with different flags for the C bindings: for instance, it is necessary to pass -freestanding to all C bindings so that they do not use POSIX headers. The MirageOS compiler also uses a custom linker: e.g. not only does it need a custom OCaml runtime libasmrun.a, but it also needs to run a different linker to generate specialised executable images.
Historically, the OCaml ecosystem has only had partial support for cross-compilation: for instance, the ocaml-cross approach is to duplicate all existing opam packages by adding a -windows suffix to their names and dependencies; this allows normal packages and Windows packages to be co-installed in the same opam switch.
MirageOS 3.x
MirageOS 3.x solves this by duplicating only the packages defining C bindings. It relies on every MirageOS backend registering a set of CFLAGS with pkg-config. Then every binding uses pkg-config to configure its CFLAGS, and ocamlfind to register link-time predicates, e.g. additional link-time options like the names of the C archives. The final link step is then done by querying ocamlfind (using the custom registered predicates) to link the dependencies' object files with the result of the OCaml compiler's --output-obj option.
MirageOS 4.x
MirageOS 4 solves this by relying on dune's built-in support for cross-compilation. This is done by gathering all the sources of the dependencies locally with opam-monorepo, and by creating a `dune-workspace` file describing the C flags to use in each cross-compilation "context". Once this is set up, a single dune build can cross-compile the unikernel target with all its local sources.
MirageOS eDSL
The rest of the document describes Functoria, the embedded domain-specific language used in config.ml files to describe how the typed libraries have to be assembled.
Combinators
The type for values representing module types.
typ t is a value representing the module type t.
val (@->) : 'a typ -> 'b typ -> ('a -> 'b) typ
Construct a functor type from a type and an existing functor type. This corresponds to prepending a parameter to the list of functor parameters. For example:
kv_ro @-> ip @-> kv_ro
This describes a functor type that accepts two arguments, a kv_ro and an ip device, and returns a kv_ro.
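The following is a minimal, self-contained sketch of how a GADT can give @-> a precise type. It is a toy model, not Functoria's actual definition of typ: the abstract types kv_ro and ip, the constructors Type and Function, and the helpers is_function, to_string and arity are hypothetical names chosen only for illustration.

```ocaml
(* Toy model of typed module types; NOT Functoria's real definition. *)

(* Abstract phantom types standing in for the kv_ro and ip module types. *)
type kv_ro
type ip

type _ typ =
  | Type : string -> 'a typ (* a named module type *)
  | Function : 'a typ * 'b typ -> ('a -> 'b) typ (* a functor type *)

(* Operators starting with '@' are right-associative in OCaml,
   so a @-> b @-> c parses as a @-> (b @-> c). *)
let ( @-> ) a b = Function (a, b)

let kv_ro : kv_ro typ = Type "KV_RO"
let ip : ip typ = Type "IP"

let is_function : type a. a typ -> bool = function
  | Type _ -> false
  | Function _ -> true

(* Render a functor type, parenthesising functor-typed parameters. *)
let rec to_string : type a. a typ -> string = function
  | Type name -> name
  | Function (a, b) ->
      let arg = to_string a in
      let arg = if is_function a then "(" ^ arg ^ ")" else arg in
      arg ^ " -> " ^ to_string b

(* Number of functor parameters. *)
let rec arity : type a. a typ -> int = function
  | Type _ -> 0
  | Function (_, b) -> 1 + arity b

(* The OCaml type records the parameters in order: kv_ro, then ip. *)
let kv_ro_functor : (kv_ro -> ip -> kv_ro) typ = kv_ro @-> ip @-> kv_ro

let () =
  print_endline (to_string kv_ro_functor);
  Printf.printf "arity = %d\n" (arity kv_ro_functor)
```

Because the phantom type parameter records each functor parameter, mismatches are caught by the OCaml type checker when config.ml is compiled, rather than at run time.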
The type for values representing module implementations.
m $ a applies the functor m to the module a.
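In the same spirit, a toy GADT can show why $ only type-checks when the argument matches the functor's next parameter. This is a sketch under stated assumptions, not Functoria's implementation; Module, App, make, store and net are hypothetical names.

```ocaml
(* Toy model of typed implementations; NOT Functoria's real definition. *)
type kv_ro
type ip

type _ impl =
  | Module : string -> 'a impl (* a named module implementation *)
  | App : ('a -> 'b) impl * 'a impl -> 'b impl (* a functor application *)

(* Operators starting with '$' are left-associative in OCaml,
   so m $ a $ b parses as (m $ a) $ b. *)
let ( $ ) m a = App (m, a)

let rec to_string : type a. a impl -> string = function
  | Module name -> name
  | App (m, a) -> to_string m ^ "(" ^ to_string a ^ ")"

(* Hypothetical implementations: a functor and two arguments. *)
let make : (kv_ro -> ip -> kv_ro) impl = Module "Make"
let store : kv_ro impl = Module "Store"
let net : ip impl = Module "Net"

(* Arguments must come in the order the functor type expects;
   [make $ net] would be rejected by the type checker. *)
let applied : kv_ro impl = make $ store $ net

let () = print_endline (to_string applied)
```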
Same as impl but with hidden type.
dep t is the (build-time) dependency towards t.
Keys
The type for configure-time command-line arguments.
The type for runtime command-line arguments.
The type for abstract keys.
The type for keys' parsing context. See Key.context.
The type for values parsed from the command-line. See Key.value.
key k is an untyped representation of k.
if_impl v impl1 impl2 is impl1 if v is resolved to true and impl2 otherwise.
match_impl v cases ~default chooses the implementation amongst cases by matching v's value. default is chosen if no value matches.
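Conceptually, once the configure-time value v has been resolved, if_impl and match_impl reduce to a conditional and a lookup. The sketch below assumes v is already a plain OCaml value (in the real eDSL it is a deferred value computed from keys); the target type and the implementation strings are hypothetical.

```ocaml
(* Toy model: v is assumed to be already resolved to a plain value. *)
let if_impl (v : bool) impl1 impl2 = if v then impl1 else impl2

(* Pick the implementation paired with v, falling back to default. *)
let match_impl v cases ~default =
  match List.assoc_opt v cases with
  | Some impl -> impl
  | None -> default

(* Hypothetical targets and implementation names. *)
type target = Unix | Xen | Hvt

let stack_for target =
  match_impl target
    [ (Unix, "socket stack"); (Xen, "direct stack") ]
    ~default:"direct stack"

let () =
  print_endline (if_impl true "impl1" "impl2");
  print_endline (stack_for Unix);
  print_endline (stack_for Hvt)
```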
Package dependencies
For specifying opam package dependencies, the type package is used. It consists of the opam package name, the ocamlfind names, and optional lower and upper bounds. The version constraints are merged with other modules.
The type for opam packages.
Installation scope of a package.
val package :
?scope:scope ->
?build:bool ->
?sublibs:string list ->
?libs:string list ->
?min:string ->
?max:string ->
?pin:string ->
?pin_version:string ->
string ->
package
Application Builder
Values of type impl are tied to concrete module implementations with the device and main constructs. Module implementations of type job can then be registered into an application builder. The builder is in charge of parsing the command-line arguments and of generating code for the final application. See Functoria.Lib for details.
The type for build information.
val main :
?pos:(string * int * int * int) ->
?packages:package list ->
?packages_v:package list value ->
?runtime_args:Functoria.Runtime_arg.t list ->
string ->
'a typ ->
'a impl
main name typ is the functor name, having the module type typ. The connect code will call <name>.start.
- If packages or packages_v is set, then the given packages are installed before compiling the current application.
Devices
val code :
pos:(string * int * int * int) ->
('a, Stdlib.Format.formatter, unit, 'b code) Stdlib.format4 ->
'a
of_device t is the implementation device t.
impl ...
is of_device @@ Device.v ...
Jobs
General mirage devices
For the Qubes target, the Qubes database from which to look up dynamic runtime configuration information.
A default qubes database, guessed from the usual valid configurations.
Time
Abstract type for timers.
Implementations of the Mirage_time.S signature.
The default timer implementation.
Clocks
Abstract type for POSIX clocks.
Implementations of the Mirage_clock.PCLOCK signature.
The default mirage-clock Mirage_clock.PCLOCK implementation.
Abstract type for monotonic clocks.
Implementations of the Mirage_clock.MCLOCK signature.
The default mirage-clock Mirage_clock.MCLOCK implementation.
Log reporters
The type for log reporters.
Implementation of the log reporter type.
default_reporter ?clock ?level () is the log reporter that prints log messages to the console, timestamped with clock. If not provided, the default clock is default_posix_clock. level is the default log threshold. It is Some Logs.Info if not specified.
no_reporter disables log reporting.
Random
Abstract type for random sources.
Implementations of the Mirage_random.S signature.
Default PRNG device to be used in unikernels. It uses getrandom/getentropy on Unix, and a Fortuna PRNG on other targets.
rng () is the device Mirage_crypto_rng.Make.
Block devices
Abstract type for raw block device configurations.
Implementations of the Mirage_block.S signature.
Use the given file as a raw block device.
val block_of_xenstore_id : string -> block impl
Use the given XenStore ID (e.g. /dev/xvdi1 or 51760) as a raw block device.
Use a ramdisk with the given name.
val generic_block :
?group:string ->
?key:[ `XenstoreId | `BlockFile | `Ramdisk ] value ->
string ->
block impl
Static key/value stores
Abstract type for read-only key/value store.
Implementations of the Mirage_kv.RO signature.
Crunch a directory. The contents of the directory are transformed into OCaml code, which is then compiled as part of the unikernel.
tar_kv_ro block is a read-only tar archive.
Direct access to the underlying filesystem as a key/value store for Unix. For other backends, this is equivalent to crunch.
Use a FAT formatted block device.
val generic_kv_ro :
?group:string ->
?key:[ `Crunch | `Direct ] value ->
string ->
kv_ro impl
Generic key/value that will choose dynamically between direct_kv_ro and crunch. To use a filesystem implementation, try kv_ro_of_fs.
If no key is provided, it uses Key.kv_ro to create a new one.
val docteur :
?mode:[ `Fast | `Light ] ->
?name:string key ->
?output:string key ->
?analyze:bool runtime_arg ->
?branch:string ->
?extra_deps:string list ->
string ->
kv_ro impl
docteur ?mode ?name ?output ?analyze remote is a read-only key-value store device. Data is stored on that device using the Git PACK file format, version 2. This format has very good compression factors for many similar files of relatively small size. For instance, 14 GB of HTML files can be compressed into a disk image of 240 MB.
Unlike crunch, docteur produces an external image, which means that less memory is used to store and retrieve files. The image can be produced from many sources:
- A local Git repository (like file://path/to/the/git/repository/)
- A simple directory (like file://path/to/a/simple/directory/)
- A remote Git repository (via SSH, HTTP(S) or TCP/IP, as git clone expects)
If you use a Git repository, you can choose a specific branch with the ?branch argument (like refs/heads/main). Otherwise, this argument is ignored.
If you use a simple directory, it can be a path relative to your unikernel project (relativize://directory) or an absolute path (file://home/user/directory).
If a required file is produced by a dune rule, you must declare it via the extra_deps argument.
For a Solo5 target, users must attach the image as a block device:
$ solo5-hvt --block:<name>=<path-to-the-image> -- unikernel.{hvt,...}
The user can specify the name of the block device (defaults to "docteur"). The user can also specify the output of docteur.make, the tool which generates the image (defaults to "disk.img").
For the Unix target, the program opens the image at the beginning of the process. An integrity check of the image can be done via the analyze value (defaults to true).
The file-system can be used in two modes:
- `Light: any access requires reconstructing the path to the requested file. That means that we need to extract a few additional objects before extracting the requested one. `Light does not cache anything in memory, but it can be slower if the requested file is deep in the directory structure.
- `Fast: reconstructs and caches the layout of the directory structure when the unikernel starts, which may increase boot time and memory requirements. However, `Fast allows the device to decode only the requested object, so it is faster than the `Light mode.
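The trade-off between the two modes can be illustrated with a toy directory tree. This sketch is not docteur's code: node, light_find, fast_index and fast_find are hypothetical names, and the "decoded" counter merely stands in for the extra PACK objects that `Light has to extract on each access.

```ocaml
(* Toy model of docteur's two modes; NOT docteur's implementation.
   Every node stands for one object that has to be decoded. *)
type node = File of string | Dir of (string * node) list

(* `Light-style lookup: walk from the root on every access, decoding each
   directory on the way. Returns the contents and how many objects were
   decoded (root included). *)
let light_find root path =
  let rec walk node decoded segments =
    match (node, segments) with
    | File contents, [] -> Some (contents, decoded)
    | Dir entries, name :: rest -> (
        match List.assoc_opt name entries with
        | Some child -> walk child (decoded + 1) rest
        | None -> None)
    | _ -> None
  in
  walk root 1 (String.split_on_char '/' path)

(* `Fast-style start-up: index every file path once (costs boot time and
   memory)... *)
let fast_index root =
  let table = Hashtbl.create 16 in
  let rec index prefix = function
    | File contents -> Hashtbl.replace table prefix contents
    | Dir entries ->
        List.iter
          (fun (name, child) ->
            let path = if prefix = "" then name else prefix ^ "/" ^ name in
            index path child)
          entries
  in
  index "" root;
  table

(* ...after which an access is a single table lookup. *)
let fast_find table path = Hashtbl.find_opt table path

(* A hypothetical tree containing a/b/c.html *)
let root = Dir [ ("a", Dir [ ("b", Dir [ ("c.html", File "<p>hi</p>") ]) ]) ]

let () =
  (match light_find root "a/b/c.html" with
   | Some (_, decoded) -> Printf.printf "light: %d objects decoded\n" decoded
   | None -> print_endline "light: not found");
  match fast_find (fast_index root) "a/b/c.html" with
  | Some contents -> Printf.printf "fast: %s\n" contents
  | None -> print_endline "fast: not found"
```

In this toy, the deeper the file, the more objects the `Light-style walk decodes, while the `Fast-style index pays that cost once up front.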
Abstract type for read-write key/value store.
Implementations of the Mirage_kv.RW signature.
Direct access to the underlying filesystem as a key/value store. Only available on Unix backends.
An in-memory key-value store using mirage-kv-mem.
chamelon ~program_block_size returns a kv_rw filesystem which is an implementation of littlefs in OCaml. The chamelon device expects a block device:
let program_block_size =
let doc = Key.Arg.info [ "program-block-size" ] in
Key.(create "program_block_size" Arg.(opt int 16 doc))
let block = block_of_file "db"
let fs = chamelon ~program_block_size block
For Solo5 targets, you can then launch the unikernel with:
$ solo5-hvt --block:db=db.img unikernel.hvt
The block device must be well-formed and formatted by the chamelon tool:
$ dd if=/dev/zero of=db.img bs=1M count=1
$ chamelon format db.img 512
tar_kv_rw block is a read/write tar archive. Note that the filesystem is append-only: files generally cannot be removed, set_partial only works on what is allocated, and there are restrictions on rename.
ccm_block key block returns a new block which is an AES-CCM encrypted disk.
Note also that the available size of an encrypted block is always half of its real size: a 512M block can only contain 256M of data if it is encrypted.
You can use a fresh block device as encrypted storage. This does not need any preparation: just use ccm_block with the desired key. If you have an existing disk image that you want to encrypt, you can use the ccmblock tool provided by the mirage-block-ccm opam package.
$ ccmblock enc -i db.img -k 0x10786d3a9c920d0b3ec80dfaaac557a7 -o edb.img
Then, in your config.ml, you just need to compose your block device with ccm_block:
let aes_ccm_key =
let doc =
Key.Arg.info [ "aes-ccm-key" ]
~doc:"The key of the block device (hex formatted)"
in
Key.(create "aes-ccm-key" Arg.(required string doc))
let block = block_of_file "edb"
let encrypted_block = ccm_block aes_ccm_key block
Finally, with Solo5, you can launch your unikernel like this:
$ solo5-hvt --block:edb=edb.img \
--arg="--aes-ccm-key=0x10786d3a9c920d0b3ec80dfaaac557a7" \
unikernel.hvt
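Since the key is passed hex-formatted on the command line, a sketch of decoding such a value may help. This is not mirage-block-ccm's parser: accepting an optional 0x prefix and requiring an AES key length of 16, 24 or 32 bytes are assumptions of this example, and decode_hex_key, hex_value and strip_0x are hypothetical helpers.

```ocaml
(* Sketch only: decoding a hex-formatted key such as the --aes-ccm-key
   value. The optional "0x" prefix and the 16/24/32-byte length check
   are assumptions, not documented behaviour of mirage-block-ccm. *)
exception Bad_hex

let hex_value = function
  | '0' .. '9' as c -> Char.code c - Char.code '0'
  | 'a' .. 'f' as c -> Char.code c - Char.code 'a' + 10
  | 'A' .. 'F' as c -> Char.code c - Char.code 'A' + 10
  | _ -> raise Bad_hex

let strip_0x s =
  if String.length s >= 2 && String.sub s 0 2 = "0x" then
    String.sub s 2 (String.length s - 2)
  else s

let decode_hex_key s =
  let s = strip_0x s in
  let n = String.length s in
  if n mod 2 <> 0 then Error "odd number of hex digits"
  else
    (* Each pair of hex digits becomes one byte of the key. *)
    match
      String.init (n / 2) (fun i ->
          Char.chr ((16 * hex_value s.[2 * i]) + hex_value s.[(2 * i) + 1]))
    with
    | exception Bad_hex -> Error "invalid hex digit"
    | key when List.mem (String.length key) [ 16; 24; 32 ] -> Ok key
    | key ->
        Error (Printf.sprintf "unexpected key length: %d bytes" (String.length key))

let () =
  match decode_hex_key "0x10786d3a9c920d0b3ec80dfaaac557a7" with
  | Ok key -> Printf.printf "decoded a %d-byte key\n" (String.length key)
  | Error msg -> print_endline msg
```

The key used in the commands above has 32 hex digits, i.e. 16 bytes.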
You can then compose a filesystem such as chamelon with this block device (and you have an encrypted filesystem!):
let fs = chamelon ~program_block_size encrypted_block
Network interfaces
Abstract type for network configurations.
Implementations of the Mirage_net.S signature.
default_network is a dynamic network implementation which attempts to do something reasonable based on the target.
A custom network interface. Exposes a Key.interface key.
Ethernet configuration
Implementations of the Ethernet.S signature.
ARP configuration
Implementation of the Arp.S signature.
ARP implementation provided by the arp library.
IP configuration
Implementations of the Tcpip.Ip.S signature.
Abstract type for IP configurations.
The Tcpip.Ip.S module signature with ipaddr = Ipaddr.V4.
The Tcpip.Ip.S module signature with ipaddr = Ipaddr.V6.
The Tcpip.Ip.S module signature with ipaddr = Ipaddr.t.
Types for manual IPv4 configuration.
Types for manual IPv6 configuration.
Use an IPv4 address. Exposes the keys Key.V4.network and Key.V4.gateway. If provided, the values of these keys will override those supplied in the ipv4 configuration record, if that has been provided.
Use a given initialized QubesDB to look up and configure the appropriate IPv4 interface.
Use an IPv6 address. Exposes the keys Key.V6.network and Key.V6.gateway.
UDP configuration
Implementation of the Tcpip.Udp.S signature.
TCP configuration
Implementation of the Tcpip.Tcp.S signature.
Network stack configuration
Dual IPv4 and IPv6
Implementation of the Tcpip.Stack.V4V6 signature.
Direct network stack with given ip.
Network stack with sockets.
Build a stackv4v6 by checking the Key.V6.network and Key.V6.gateway keys for IPv4 and IPv6 configuration information, filling in unspecified information from ?config, then building a stack on top of that.
Generic stack using a net key: Key.net.
If a key is not provided, it uses Key.net (with the group argument) to create it.
tcpv4v6_of_stackv4v6 stackv4v6 is a helper that extracts the TCP/IP stack, regardless of the UDP/IP stack, as expected by some devices such as protocols.
Resolver configuration
DNS client
A DNS client is a module which implements:
- getaddrinfo, to request a query_type-dependent response from a nameserver about a domain name, such as the MX record
- gethostbyname, to request the A record of a domain name
- gethostbyname6, to request the AAAA record of a domain name
generic_dns_client stackv4v6 creates a new DNS value which is able to resolve domain names using nameservers. It requires a network stack to communicate with these nameservers.
The nameservers argument is a list of strings, each in one of these formats:
- udp:ipaddr(:port)? to communicate with a DNS resolver via UDP
- tcp:ipaddr(:port)? to communicate with a DNS resolver via TCP/IP
- tls:ipaddr(:port)?(!<authenticator>) to communicate with a DNS resolver via TLS. You can supply an <authenticator> (see the documentation of X509.Authenticator.of_string for an explanation of its format); otherwise, by default, we use trust anchors from NSS' certdata.txt.
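To make the grammar above concrete, here is a sketch of a parser for these strings. It is not the implementation used by generic_dns_client: it only handles IPv4 literals (IPv6 addresses contain ':' and would need different handling), and the default ports (53, and 853 for DNS over TLS) are assumptions of this sketch. The type and function names, and the "AUTH" placeholder authenticator, are hypothetical.

```ocaml
(* Sketch of parsing nameserver strings; not generic_dns_client's code.
   IPv4 literals only; default ports 53 / 853 are assumptions. *)
type proto = Udp | Tcp | Tls of string option (* optional authenticator *)
type nameserver = { proto : proto; ip : string; port : int }

let parse_nameserver s =
  (* Split off "!<authenticator>", if present. *)
  let s, authenticator =
    match String.index_opt s '!' with
    | Some i ->
        (String.sub s 0 i, Some (String.sub s (i + 1) (String.length s - i - 1)))
    | None -> (s, None)
  in
  let make proto default_port ip port =
    match port with
    | None -> Ok { proto; ip; port = default_port }
    | Some p -> (
        match int_of_string_opt p with
        | Some port when port > 0 && port < 65536 -> Ok { proto; ip; port }
        | _ -> Error ("invalid port: " ^ p))
  in
  (* Only tls: accepts an authenticator. *)
  match (String.split_on_char ':' s, authenticator) with
  | [ "udp"; ip ], None -> make Udp 53 ip None
  | [ "udp"; ip; p ], None -> make Udp 53 ip (Some p)
  | [ "tcp"; ip ], None -> make Tcp 53 ip None
  | [ "tcp"; ip; p ], None -> make Tcp 53 ip (Some p)
  | [ "tls"; ip ], auth -> make (Tls auth) 853 ip None
  | [ "tls"; ip; p ], auth -> make (Tls auth) 853 ip (Some p)
  | _ -> Error ("unrecognised nameserver: " ^ s)

let () =
  List.iter
    (fun s ->
      match parse_nameserver s with
      | Ok ns -> Printf.printf "%s -> %s port %d\n" s ns.ip ns.port
      | Error e -> print_endline e)
    [ "udp:192.0.2.1"; "tcp:192.0.2.1:5353"; "tls:192.0.2.1!AUTH" ]
```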
Happy-eyeballs
Happy-eyeballs is an implementation of RFC 8305, which specifies how to connect to a remote host using either IP protocol version 4 or IP protocol version 6 from a stackv4v6 network implementation.
The given device is able to resolve a remote host via a dns_client device, and both must share the same stackv4v6 implementation.
generic_happy_eyeballs stackv4v6 dns_client creates a new happy-eyeballs value which is able to resolve and connect to a remote host, and finally to allocate a connected flow from the given network implementation stackv4v6.
This device takes several optional key arguments for timeouts, specified in nanoseconds.
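Two ideas from RFC 8305 can be sketched in a few lines: interleaving candidate addresses by family (starting with IPv6), and expressing a timeout in nanoseconds. This is not the happy-eyeballs library's code; addr, interleave_families and ns_of_ms are hypothetical names, and the RFC's preliminary address sorting (RFC 6724) is omitted.

```ocaml
(* Sketch of two RFC 8305 ideas; not the happy-eyeballs library's code.
   Addresses are plain strings here for simplicity. *)
type addr = V6 of string | V4 of string

(* Interleave address families, starting with IPv6. RFC 8305 first sorts
   addresses per RFC 6724; that step is omitted here. *)
let interleave_families addrs =
  let v6 = List.filter (function V6 _ -> true | V4 _ -> false) addrs in
  let v4 = List.filter (function V4 _ -> true | V6 _ -> false) addrs in
  let rec go v6 v4 =
    match (v6, v4) with
    | [], rest | rest, [] -> rest
    | a :: v6', b :: v4' -> a :: b :: go v6' v4'
  in
  go v6 v4

(* Timeouts are given in nanoseconds; RFC 8305 recommends a default
   Connection Attempt Delay of 250 ms. *)
let ns_of_ms ms = Int64.mul (Int64.of_int ms) 1_000_000L
let connection_attempt_delay_ns = ns_of_ms 250

let () = Printf.printf "delay = %Ld ns\n" connection_attempt_delay_ns
```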
Syslog configuration
Syslog exfiltrates log messages (generated by libraries using the logs library) via a network connection. The log level of the log sources is controlled via the Mirage_key.logs key. The functionality is provided by the logs-syslog package.
type syslog_config = {
hostname : string;
server : Ipaddr.t option;
port : int option;
truncate : int option;
}
Implementation of the syslog type.
Emit log messages via UDP to the configured host.
Emit log messages via TCP to the configured host.
Emit log messages via TLS to the configured host, using the credentials (private key, certificate, trust anchor) provided in the KV_RO using the keyname.
Conduit configuration
Mimic devices
Some implementations need to communicate with external resources (such as a web server or a Git server). For them, we must hide the underlying, target-dependent implementations they rely on, such as the network stack.
The aim of mimic is primarily to offer the ability to initiate a TCP/IP connection independently of the chosen target (see mimic_happy_eyeballs).
The resulting device can then be composed with other protocols like TLS, Git or HTTP and it is through this resulting device that other devices can initiate an internet connection to a peer (like a webserver or a Git server).
mimic_happy_eyeballs stackv4v6 dns happy_eyeballs creates a device which initiates a global happy-eyeballs loop. This way, an underlying instance can initiate a TCP/IP connection from an IP address or a domain name.
For domain-name resolution, we ask the happy-eyeballs instance to resolve the given domain name via the DNS instance created by dns (which takes several arguments, such as the nameservers to use; see generic_dns_client for more information).
The resulting device can be used and re-used for any client which needs to initiate a connection (like alpn_client or git_tcp).
HTTP configuration
cohttp_server starts a Cohttp server.
httpaf_server starts a http/af server.
paf_server ~port tcpv4v6 creates an instance which will start to listen on the given port. With this instance and the produced module HTTP_server, the user can initiate:
- a simple HTTP server
- a simple HTTPS server (with a TLS configuration)
- a simple ALPN (http/1.1 & h2) server with TLS
This is a simple example of how to launch an HTTP server:
unikernel.ml
module Make (HTTP_server : Paf_mirage.S with type ipaddr = Ipaddr.t) =
struct
let error_handler (_ipaddr, _port) ?request:_ _error _send = ()
let request_handler :
HTTP_server.TCP.flow -> Ipaddr.t * int -> Httpaf.Reqd.t -> unit =
fun _socket (_ipaddr, _port) reqd ->
let contents = "Hello World!\n" in
let headers =
Httpaf.Headers.of_list
[
("content-length", string_of_int (String.length contents));
("content-type", "text/plain");
("connection", "close");
]
in
let response = Httpaf.Response.create ~headers `OK in
Httpaf.Reqd.respond_with_string reqd response contents
let start http_server =
let service =
HTTP_server.http_service ~error_handler request_handler
in
let (`Initialized thread) = HTTP_server.serve service http_server in
thread
end
config.ml
open Mirage
let port =
let doc =
Key.Arg.info ~doc:"Port of the HTTP service." [ "p"; "port" ]
in
Key.(create "port" Arg.(opt int 8080 doc))
let main = main "Unikernel.Make" (http_server @-> job)
let stackv4v6 = generic_stackv4v6 default_network
let http_server = paf_server ~port (tcpv4v6_of_stackv4v6 stackv4v6)
let () = register "main" [ main $ http_server ]
Abstract type for ALPN HTTP clients
paf_client tcpv4v6 dns creates an ALPN device which can make HTTP (http/1.1 & h2) requests as an HTTP client. The allocated device represents the values required to initiate a connection to HTTP web servers. The user can then use the function Http_mirage_client.request to communicate with HTTP web servers. This is an example of how to use the ALPN devices:
unikernel.ml
module Make (HTTP_client : Http_mirage_client.S) = struct
open Lwt.Infix
let start http =
Http_mirage_client.request http "https://google.com"
(fun _response buf str -> Buffer.add_string buf str ; Lwt.return buf)
(Buffer.create 0x100) >>= function
| Ok (response, buf) ->
let body = Buffer.contents buf in
...
| Error _ -> ...
end
config.ml
open Mirage
let main = main "Unikernel.Make" (alpn_client @-> job)
let stackv4v6 = generic_stackv4v6 default_network
let dns = generic_dns_client stackv4v6
let alpn_client =
let dns =
mimic_happy_eyeballs stackv4v6 dns (generic_happy_eyeballs stackv4v6 dns)
in
paf_client (tcpv4v6_of_stackv4v6 stackv4v6) dns
let () = register "main" [ main $ alpn_client ]
Argv configuration
default_argv is a dynamic argv implementation which attempts to do something reasonable based on the target.
no_argv disables command-line parsing and sets argv to [|""|].
Git client configuration
Users can connect to a remote Git repository in many ways: via TCP/IP, SSH or HTTP(S). The devices defined below provide these in composable ways. The git_client impl they return can be passed to Git or Irmin in order to fetch from and push to a Git repository.
Users can restrict or extend the set of protocols their application supports. For instance, a user can allow only SSH to communicate with a Git repository, or allow both TCP/IP and SSH as possible protocols to communicate with a peer.
For instance, a device which is able to communicate via TCP/IP and SSH can be implemented like:
let dns = generic_dns_client stackv4v6
let git_client =
let dns =
mimic_happy_eyeballs stackv4v6 dns (generic_happy_eyeballs stackv4v6 dns)
in
let ssh = git_ssh ~key ~password (tcpv4v6_of_stackv4v6 stackv4v6) dns in
let tcp = git_tcp (tcpv4v6_of_stackv4v6 stackv4v6) dns in
merge_git_clients ssh tcp
The type for devices that implement the Git protocol.
merge_git_clients a b is a device that can connect to remote Git repositories using either the device a or the device b.
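The idea of merging two clients can be modelled with plain functions. This is a toy, not the real device: a "client" here either accepts a remote URL (returning a description of how it would connect) or declines it, and tcp_client, ssh_client and merge_clients are hypothetical names. Trying a first and falling back to b is a choice made by this sketch.

```ocaml
(* Toy model of merge_git_clients; NOT the real device. A "client" either
   accepts a remote URL (describing how it would connect) or declines it. *)
type client = string -> string option

let scheme_of remote =
  match String.index_opt remote ':' with
  | Some i -> String.sub remote 0 i
  | None -> ""

let tcp_client : client =
 fun remote -> if scheme_of remote = "git" then Some ("tcp " ^ remote) else None

let ssh_client : client =
 fun remote -> if scheme_of remote = "ssh" then Some ("ssh " ^ remote) else None

(* This sketch tries a first and falls back to b. *)
let merge_clients (a : client) (b : client) : client =
 fun remote -> match a remote with Some _ as r -> r | None -> b remote

let both = merge_clients ssh_client tcp_client

let () =
  List.iter
    (fun remote ->
      match both remote with
      | Some how -> print_endline how
      | None -> Printf.printf "no client for %s\n" remote)
    [ "git://example.org/repo.git"; "https://example.org/repo.git" ]
```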
git_tcp tcpv4v6 dns is a device able to connect to a remote Git repository using TCP/IP.
git_ssh ?authenticator ~key ~password tcpv4v6 dns is a device able to connect to a remote Git repository using an SSH connection with the given private key or password. The identity of the remote Git repository can be verified using authenticator.
The format of the private key is <type>:<seed or b64 encoded>. <type> can be rsa or ed25519: if the type is RSA, we expect the seed of the private key; otherwise (if the type is Ed25519), we expect the b64-encoded private key.
The format of the authenticator is SHA256:<b64-encoded-public-key>, the output of:
$ ssh-keygen -lf <(ssh-keyscan -t rsa|ed25519 remote 2>/dev/null)
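The key and authenticator formats above can be checked with a small parser. This sketch only inspects the shape of the strings (nothing is decoded or validated cryptographically); ssh_key, parse_ssh_key and is_sha256_authenticator are hypothetical names, and "myseed" / "AAAA" are placeholder values.

```ocaml
(* Sketch: check the shape of git_ssh's ~key and ~authenticator strings as
   described above. No key material is decoded or validated. *)
type ssh_key = Rsa_seed of string | Ed25519_b64 of string

let parse_ssh_key s =
  match String.index_opt s ':' with
  | None -> Error "expected <type>:<value>"
  | Some i -> (
      let value = String.sub s (i + 1) (String.length s - i - 1) in
      match String.sub s 0 i with
      | "rsa" -> Ok (Rsa_seed value) (* RSA: the seed of the private key *)
      | "ed25519" -> Ok (Ed25519_b64 value) (* Ed25519: b64-encoded key *)
      | other -> Error ("unknown key type: " ^ other))

let is_sha256_authenticator s =
  let prefix = "SHA256:" in
  let n = String.length prefix in
  String.length s > n && String.sub s 0 n = prefix

let () =
  match parse_ssh_key "ed25519:AAAA" with
  | Ok (Ed25519_b64 _) -> print_endline "ed25519 key"
  | Ok (Rsa_seed _) -> print_endline "rsa seed"
  | Error e -> print_endline e
```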
git_http ?authenticator ?headers tcpv4v6 dns is a device able to connect to a remote Git repository via an HTTP(S) connection, using the provided HTTP headers. The identity of the remote Git repository can be verified using authenticator.
The format of the authenticator is:
- none: no authentication
- key(:<hash>)?:<b64-encoded fingerprint> to authenticate via the key fingerprint
- cert(:<hash>)?:<b64-encoded fingerprint> to authenticate via the cert fingerprint
- trust-anchor(:<der-encoded cert>)+ to authenticate via a list of certificates
By default, we use X.509 trust anchors extracted from Mozilla's NSS.
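A sketch of classifying these authenticator strings follows the grammar above. It assumes that fingerprints and encoded certificates contain no ':' (true for base64 text) and does not decode or verify anything; authenticator and parse_authenticator are hypothetical names, and the hash label "SHA256" in the usage is illustrative.

```ocaml
(* Sketch: classify git_http authenticator strings by the grammar above.
   Assumes fingerprints and certificates contain no ':'; nothing is
   decoded or verified. *)
type authenticator =
  | No_auth
  | Key_fingerprint of string option * string (* optional hash, fingerprint *)
  | Cert_fingerprint of string option * string (* optional hash, fingerprint *)
  | Trust_anchors of string list (* one or more encoded certificates *)

let parse_authenticator s =
  match String.split_on_char ':' s with
  | [ "none" ] -> Ok No_auth
  | [ "key"; fp ] -> Ok (Key_fingerprint (None, fp))
  | [ "key"; hash; fp ] -> Ok (Key_fingerprint (Some hash, fp))
  | [ "cert"; fp ] -> Ok (Cert_fingerprint (None, fp))
  | [ "cert"; hash; fp ] -> Ok (Cert_fingerprint (Some hash, fp))
  | "trust-anchor" :: (_ :: _ as certs) -> Ok (Trust_anchors certs)
  | _ -> Error ("unrecognised authenticator: " ^ s)

let () =
  match parse_authenticator "key:SHA256:abc=" with
  | Ok (Key_fingerprint (Some hash, _)) -> Printf.printf "key fingerprint (%s)\n" hash
  | Ok _ -> print_endline "other authenticator"
  | Error e -> print_endline e
```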
Other devices
job is the combinator for representing main tasks.
noop is a job that does nothing, has no dependency and returns ().
runtime_args argv is a job that loads argv.
Application registering
val register :
?argv:argv impl ->
?reporter:reporter impl ->
?src:[ `Auto | `None | `Some of string ] ->
string ->
job impl list ->
unit
register name jobs registers the application named by name, which will execute the given jobs.
val connect_err : string -> ?max:int -> int -> 'a